Manage Data Pipeline with Terraform & Track Data Versions using VDK
Azure Cognitive Services, Prompt Design, Whisper JAX vs PyTorch & Dataflow ML on Apache Beam
May 5
👋 Hey,

"The quality of your prompts is the biggest lever for achieving good results with GPTs and LLMs." - [Sharif Shameem, Founder of Lexica & Debuild](

If you're looking to get the best results out of LLMs and GPTs, one skill that really helps is prompt engineering. By getting better at it, you can improve the quality of your outputs and even expand the range of tasks your model can handle. There's also an amazing new tool, the [Automatic Prompt Engineer](, that makes prompt engineering for GPT-3 much easier and can unlock a whole new world of possibilities.

On that note, let's move on to this week's focus in [DataPro#42](, which is all about exploring ways to enhance AI capabilities in our work and stay ahead in today's fast-paced technological landscape. Our upcoming book, [Practical Guide to Azure Cognitive Services](, will be the centerpiece of this effort. We will also be delving into [building high-accuracy Generative AI applications](, featuring a course on [ChatGPT Prompt Engineering by OpenAI and Andrew Ng](, and exploring [AI self-play for algorithm design]( [integrating neural networks]( and much more. I hope these resources help you take your data and machine learning practice to the next level and achieve great outcomes. Get ready for a productive learning experience!

If you're interested in sharing ideas to foster the growth of the data community, this survey is for you. Share your thoughts and get a FREE bestselling Packt book, The Applied Artificial Intelligence Workshop, as a PDF. Jump on in!

[TELL US WHAT YOU THINK](

Cheers,
Merlyn Shelley
Editor in Chief, Packt

This Week's Key Highlights:

- [The Art of Prompt Design: Use Clear Syntax](
- [Whisper JAX vs PyTorch: Truth about ASR Performance on GPUs](
- [Running ML models with new Dataflow ML on Apache Beam](
- [Automatically Managing Data Pipeline Infrastructures With Terraform](
- [How to Keep Track of Data Versions Using Versatile Data Kit](

Latest Forks on GitHub

- [AudioGPT:]( Understanding and generating speech, music, sound, and talking heads.
- [LLMsPracticalGuide:]( A curated list of practical guides and resources for LLMs (LLMs Tree, Examples, Papers).
- [Track-Anything:]( A flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.
- [WizardLM:]( Empowering large pre-trained language models to follow complex instructions.
- [tango:]( Code and model for the paper "Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model".
- [LaMini-LM:]( A diverse herd of distilled models trained on large-scale instructions.
- [datacomp:]( Design image-text datasets for pre-training CLIP models to improve downstream task performance.
- [hidet:]( An open-source, efficient deep learning framework/compiler written in Python.
- [mPLUG-Owl:]( A new training paradigm with a modularized design for large multi-modal language models.
- [nanoGPT:]( The simplest, fastest repository for training/fine-tuning medium-sized GPTs.

Industry Insights

AWS ML

- [Build high-accuracy Generative AI applications on enterprise data using Amazon Kendra, LangChain, and large language models:]( Generative AI built on large language models can provide valuable insights, but enterprise use cases require generating those insights from enterprise content. A Retrieval Augmented Generation (RAG) approach keeps answers in-domain and helps avoid hallucinations, and a high-accuracy semantic search service such as Amazon Kendra improves the quality of the insights an LLM generates. Kendra also provides data source connectors, supports common file formats, and offers security features, making it well suited to enterprise use cases. Refer to "[Building with Generative AI on AWS]( for more information.
- [Hosting ML Models on Amazon SageMaker using Triton: XGBoost, LightGBM, and Treelite Models:]( XGBoost is a popular tree-based model for classification and regression problems. [Amazon SageMaker]( can serve XGBoost models using [NVIDIA Triton Inference Server](, with real-time inference workloads handled by SageMaker real-time endpoints. [Single]( and [multi-model endpoints]( let you deploy one or more machine learning models behind a logical endpoint, depending on your requirements.

Microsoft Research

- [AI self-play for algorithm design:]( A self-play pipeline improves a language model's ability to solve puzzles. The LM generates puzzles and attempts to solve each one multiple times; the candidate solutions are filtered for correctness, and the LM is then trained further on verified correct solutions to its own synthetic puzzles. Repeating this loop yields significant improvements in the LM's ability to solve held-out test puzzles. The self-play technique can improve AI programming abilities, but high-level abstract reasoning remains challenging for AI systems.

Google AI & ML

- [Running ML models now easier with new Dataflow ML innovations on Apache Beam:]( Only 20% of companies use their AI models in production, according to [Harvard Business Review](. Google Cloud Dataflow is built on Apache Beam, a unified programming model for developing batch and streaming pipelines. Three new machine-learning-focused features, developed with the Apache Beam community, have been added to Dataflow to simplify running streaming ML models at scale in production. Developers can use the Dataflow ML features today, and the RunInference transform is continuously updated to make ML development and production more flexible. A minimal RunInference sketch follows below.
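To make that last item concrete, here is a minimal sketch of Beam's RunInference transform with the scikit-learn model handler. The pickled-model URI and the example arrays are placeholders, and handler options can vary between Beam releases.

```python
# A minimal sketch of Beam's RunInference transform, assuming a pickled
# scikit-learn model trained on two features; the model URI is a placeholder.
import apache_beam as beam
import numpy as np
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy

model_handler = SklearnModelHandlerNumpy(
    model_uri="gs://my-bucket/model.pkl"  # placeholder path to a pickled model
)

with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | "CreateExamples" >> beam.Create([np.array([1.0, 2.0]),
                                           np.array([3.0, 4.0])])
        | "RunInference" >> RunInference(model_handler)  # emits PredictionResults
        | "Print" >> beam.Map(print)
    )
```

The same pattern works for streaming sources: swap `beam.Create` for an unbounded read and Beam batches elements into model calls for you.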
Just for Laughs!

Why did the LLM refuse to answer the question about whether or not it was conscious? Because it was in a state of deep learning!

Get the Latest Updates from Industry Experts

Are you curious to find out how the AI era will impact the tasks we perform at work? Our Packt team worked with top Microsoft Azure experts to create a condensed guide to the new AI-infused technology offered by the Azure cloud. We're excited to offer you a sneak peek from our upcoming book, "[Practical Guide to Azure Cognitive Services](, which will give you an insight into how this technology can help you build your ideas into valuable products that fit the market.

Practical Guide to Azure Cognitive Services – By [Chris Seferlis]( [Christopher Nellis]( and [Andy Robert](

Azure Cognitive Services provides a collection of pre-built APIs for AI solutions that can be integrated into existing applications, allowing users to utilize Microsoft's renowned vision, speech, text, and decision-making AI capabilities. With this practical guide, you can deploy AI solutions with ease and follow industry-specific implementation examples to take them into production swiftly. Optimize operations, reduce costs, and deliver state-of-the-art AI solutions by leveraging the power of Azure OpenAI.

Exploring native AI enrichments in Azure Cognitive Search

Semantic search

Semantic search is the newest feature added to Azure Cognitive Search. It is a premium capability that provides more accurate search results through an enhanced relevance score and language-understanding elements. Semantic search uses the inverted indexing process that happens with a full-text search, as described previously, and then enhances the search with the following steps (a sketch of the corresponding query follows this list):

- Spell correction – First, semantic search ensures that search terms are spelled correctly, fixing any spelling errors before processing.
- Full-text search – Next, the standard search process takes place.
- Enhanced scoring – After the standard search completes and returns scored documents, they are re-scored with semantic search. This process uses the default similarity [scoring algorithm and calculates]( the score based on the similarity of terms and the results from the index, then re-ranks the documents based on the enhanced score.
- Captions and answers – Finally, each document is analyzed further to extract the key sentences and paragraphs that describe its broader context. These can be surfaced so that a user can quickly understand what the full document contains, and they also provide the basis for an answer if a question is asked with the initial search request.
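For a feel of how those steps surface in code, here is a hedged sketch of a semantic query using the azure-search-documents Python SDK. The endpoint, key, index, field names, and semantic configuration name are placeholders, and parameter names can vary between SDK versions.

```python
# A hedged sketch of a semantic query via the azure-search-documents SDK.
# Endpoint, key, index, fields, and configuration names are all placeholders.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

client = SearchClient(
    endpoint="https://my-service.search.windows.net",  # placeholder service
    index_name="docs",                                 # placeholder index
    credential=AzureKeyCredential("<api-key>"),        # placeholder key
)

results = client.search(
    search_text="what does semantic search add to full-text search?",
    query_type="semantic",                             # enable semantic re-ranking
    semantic_configuration_name="my-semantic-config",  # placeholder config
    query_caption="extractive",                        # key sentences per document
    query_answer="extractive",                         # direct answers to questions
)

# Answers come back alongside the re-ranked documents.
for answer in results.get_answers() or []:
    print("Answer:", answer.text)
for doc in results:
    print("Document:", doc["title"])                   # placeholder field
```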
The above content is extracted from our upcoming book, [Practical Guide to Azure Cognitive Services]( written by [Chris Seferlis]( [Christopher Nellis]( and [Andy Robert](. To find out more about what it has to offer, check out the button below.

[DISCOVER FRESH CONCEPTS & KEEP READING!](

Featured on Prompt Engineering

- [OpenAI and Andrew Ng’s ChatGPT Prompt Engineering Course: Guidelines and Summary:]( Andrew Ng and OpenAI have released a free [ChatGPT prompt engineering course for developers](. Poorly constructed prompts can lead to irrelevant or incorrect outputs, so investing time in crafting efficient prompts is crucial for optimal results. This article summarizes the course's guidelines for crafting effective prompts and shows how to apply them in applications such as chatbots, language translation, and content generation.
- [In-Context Learning, In Context:]( In-context learning (ICL) is a paradigm in which a large language model (LLM) learns to solve a new task at inference time by being fed a prompt containing examples of that task. Recent work explores how to [manipulate prompts to help LLMs]( perform certain tasks more easily. ICL has an important relationship with prompting, and recent studies suggest that even small models may benefit from it.
- [The Art of Prompt Design: Use Clear Syntax:]( The article discusses the importance of clear syntax when building prompts for large language models such as ChatGPT or GPT-4. Using an open-source StableLM model, it shows how clear syntax can improve an LLM's output, making it easier to parse and ensuring that it matches the intended meaning. Code examples are [provided in a notebook]( for readers to reproduce.
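As a tiny illustration of the clear-syntax idea (not the article's actual code): explicit, named delimiters tell the model where each field begins and ends and make the output trivial to parse. The `complete` function below is a hypothetical stand-in for whatever LLM client you use, and the tag format is just one convention.

```python
# A small sketch of prompting with clear syntax. `complete` is a hypothetical
# placeholder for an LLM call that generates text until a stop sequence.

def complete(prompt: str, stop: str) -> str:
    """Placeholder LLM call; returns a canned answer for this sketch."""
    return "The article argues that explicit delimiters improve LLM output."

article = "Large language models map prompts to continuations..."

# Vague: instruction, context, and question all run together.
vague_prompt = f"Summarize this {article} in one sentence what does it say"

# Clear syntax: named, delimited sections plus an explicit stop sequence.
clear_prompt = (
    "<article>\n"
    f"{article}\n"
    "</article>\n"
    "<task>Summarize the article in one sentence.</task>\n"
    "<summary>"
)

summary = complete(clear_prompt, stop="</summary>")
print(summary)  # everything before the stop tag is the parseable summary
```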
Find Out What’s New

- [How to Keep Track of Data Versions Using Versatile Data Kit:]( The Versatile Data Kit (VDK) is a tool that helps manage data versioning needs by offering comprehensive solutions for ingesting and processing data from different sources, using SQL or Python. [VDK]( enables users to build data lakes and efficiently track changes made to their critical business information.
- [Integrating Neural Net: Deriving the Normal Distribution CDF:]( The article presents a technique for training a neural network to derive integrals of functions, including those without closed-form solutions. The method uses a custom loss function and automatic differentiation, with an example that trains a neural network to integrate the PDF of the normal distribution and produce its cumulative distribution function (CDF).
- [Grad-CAM in Pytorch: Use of Forward and Backward Hooks:]( This article discusses [implementing the Grad-CAM algorithm in PyTorch]( using forward and backward hooks, without changing the model's forward function. It builds on Stepan Ulyanin's approach and aims to contribute to it, and the method can be applied to any convolutional neural network; see the sketch after this list.
- [Automatically Managing Data Pipeline Infrastructures With Terraform:]( This article explains how Terraform, a cloud-agnostic Infrastructure as Code (IaC) tool, can be used to develop data projects with cloud services such as S3, Lambda, Glue, and EMR. Terraform uses a standardized language, HCL (HashiCorp Configuration Language), to interact with cloud providers' APIs, and it enables infrastructure versioning through plain-text files that can be managed easily with Git.
- [Whisper JAX vs PyTorch: Uncovering the Truth about ASR Performance on GPUs:]( The author discusses Whisper JAX, a faster implementation of the Whisper speech recognition model that uses a different backend framework than OpenAI's PyTorch version. The article tests both CPU and GPU implementations and measures accuracy and execution time; Whisper JAX is shown to outperform the PyTorch implementation on CPU platforms, with a speedup factor of approximately two.
- [Full Stack 7-Steps MLOps Framework:]( The article explains how to create a batch-serving architecture and an ETL pipeline that loads data into a feature store and creates a versioned training dataset, and it emphasizes the importance of feature stores in ML systems. The author suggests [Hopsworks]( as the feature store because of its serverless nature and essential features, though in big data scenarios [Spark]( can be used instead. For a better understanding, it is recommended to run the code while reading the course, via the [GitHub repository](.
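To make the Grad-CAM item above concrete, here is a minimal hook-based sketch. The choice of ResNet-18, the target layer, and the random input are illustrative (untrained weights are fine for demonstrating the mechanics); this is not the article's exact code.

```python
# A minimal Grad-CAM sketch using forward/backward hooks, so the model's
# forward function never needs to change. Layer and input are illustrative.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights=None).eval()  # untrained weights suffice for a demo
activations, gradients = {}, {}

def forward_hook(module, inputs, output):
    activations["value"] = output.detach()        # feature maps on the way up

def backward_hook(module, grad_input, grad_output):
    gradients["value"] = grad_output[0].detach()  # gradients on the way down

layer = model.layer4  # last conv block; a common Grad-CAM target
layer.register_forward_hook(forward_hook)
layer.register_full_backward_hook(backward_hook)

x = torch.randn(1, 3, 224, 224)    # stand-in for a preprocessed image
score = model(x)[0].max()          # logit of the top predicted class
model.zero_grad()
score.backward()

# Weight each activation map by its average gradient, then combine and rescale.
weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
print(cam.shape)  # torch.Size([1, 1, 224, 224]) heatmap over the input
```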
See you next time!

As a GDPR-compliant company, we want you to know why you’re getting this email. The _datapro team, as part of Packt Publishing, believes that you have a legitimate interest in our newsletter and its products. Our research shows that you opted in to email communication with Packt Publishing in the past, and we think your previous interest warrants our communication. If you do not feel that you should have received this email, or you are no longer interested in _datapro, you can opt out of our emails by clicking the link below.
© 2023 Packt Publishing, All rights reserved.
Our mailing address is: Packt Publishing
Livery Place, 35 Livery Street, Birmingham, West Midlands B3 2PB
United Kingdom
[Unsubscribe]()