Newsletter Subject

Google DeepMind AlphaDev, Cox Regressions, PyTorch 2.0 on AWS, PyArrow for pandas and Dask workflows, Machine Lear...

From

substack.com

Email Address

packtdatapro1@substack.com

Sent On

Fri, Jun 9, 2023 02:02 PM

Email Preheader Text

Neuro-Symbolic Programming in Python, Integrate LLM workflows with Knowledge Graph            

👋 Hey,

"Machine learning orchestration is the art of harmonizing data, models, and infrastructure into a symphony of intelligent systems." - [Andrew Ng, Founder of DeepLearning.AI, Founder & CEO of Landing AI](

In a world of ever-expanding technological possibilities, machine learning orchestration emerges as the conductor of innovation. Like a master composer, it harmonizes the intricate elements of data, models, and infrastructure, transforming them into a symphony of intelligent systems. Through this art, we unlock the untapped potential of AI, unraveling the mysteries of our complex world and paving the way for groundbreaking discoveries that challenge the limits of human imagination.

Let's now dive into the captivating lineup of articles and topics highlighted in this week's [DataPro#47]( edition. The central focus of this edition is [Neuro-Symbolic AI]( skillfully authored by [Alexiei Dingli]( and [David Farrugia](. Additionally, we'll embark on a journey into the world of Amazon SageMaker Canvas, where we can [retrain machine learning models]( and automate batch predictions. For PyTorch enthusiasts, there's a special treat in store.
We'll delve into the latest advancements in [PyTorch 2.0 on AWS]( and explore the capabilities of [Gen App Builder's Enterprise Search](. But wait, there's more! We'll also get a glimpse of the [Generative AI support features available on Vertex AI](. We'll investigate how more [versatile AI tools]( can optimize computer systems, [simplify data cleansing]( using large language models, develop an intuitive understanding of [Cox regressions](, leverage [PyArrow]( to enhance pandas and Dask workflows, and discover how [machine learning orchestration]( is revolutionizing MLOps. Get ready for an immersive experience as we explore profound concepts, cutting-edge research, practical solutions, and an abundance of interesting resources.

Your insights matter! Shape Packt's impactful AI/ML reports by taking our quick survey on preferred topics, formats, and solutions. Join us to create practical reports that address challenges and showcase industry trends!

[Take the Survey, Shape our Reports!](

Key Highlights:

- [Analyzing Leakage of PII in Language Models](
- [Integrate LLM workflows with Knowledge Graph using Neo4j and APOC](
- [Machine learning-ready datasets from SageMaker offline Feature Store](
- [Graph Neural Network Relational Classifiers](
- [Helping businesses with generative AI](

Would you be interested in connecting with our DataPro Newsletter Editor-in-Chief for a user interview, where you can share your ideas and feedback? We value your input and would love to customize the content modules according to your preferences. To get started, simply fill in your email ID in the survey below, and we'll be in touch with you soon!

Be sure to join our feedback program! Complete a call and claim free credit as a reward. On top of that, those who complete the survey below will also receive a FREE Packt ebook, "[The Applied Artificial Intelligence Workshop](" in PDF format. Let's make the DataPro Newsletter even better together! Don't miss out!
[Share your Feedback!](

Cheers,
[Merlyn Shelley]( Editor-in-Chief, Packt

Latest Research on GitHub

- [polyaxon]( MLOps tools for managing and orchestrating the machine learning lifecycle.
- [AgnostiqHQ]( Pythonic tool for running machine-learning, high-performance, and quantum-computing workflows in heterogeneous environments.
- [taskfleet]( Cloud- and container-native task orchestrator optimized for machine learning jobs.
- [shubham-goel]( Code repository for the paper [Humans in 4D: Reconstructing and Tracking Humans with Transformers](.
- [salesforce]( CodeTF: a one-stop Transformer library for state-of-the-art code LLMs.
- [Rikorose]( A low-complexity speech enhancement framework for full-band audio (48 kHz) using deep filtering.
- [facebookresearch]( Hiera: a hierarchical vision transformer that is fast, powerful, and, above all, simple.
- [DAMO-NLP-SG]( Video-LLaMA: an instruction-tuned audio-visual language model for video understanding.

[Pledge your support](

Industry Insights

AWS ML How-To Tutorials

- [Retrain ML models and automate batch predictions in Amazon SageMaker Canvas using updated datasets:]( Amazon SageMaker [Canvas]( now allows you to retrain machine learning models and automate batch prediction workflows with updated datasets, facilitating continuous learning and improved model performance. This enhances efficiency and decision-making by processing multiple data points simultaneously and analyzing the predictions for insights.
- [Build high-performance ML models using PyTorch 2.0 on AWS:]( This post highlights the benefits and process of running large-scale, high-performance distributed machine learning (ML) model training and deployment using PyTorch 2.0 on AWS. It demonstrates fine-tuning a RoBERTa (Robustly Optimized BERT Pretraining Approach) model for sentiment analysis and deploying it on [AWS Graviton]( C7g EC2 instances with improved speed.
It provides an example for beginners to get started with PyTorch 2.0 on AWS for ML projects.
- [Build machine learning-ready datasets from the Amazon SageMaker offline Feature Store using the Amazon SageMaker Python SDK:]( The Feature Store in the SageMaker Python SDK has been [updated]( to simplify dataset creation from the offline store. New methods in the SDK enable creating datasets without SQL queries, supporting operations such as time travel, duplicate-record filtering, and accurately joining multiple feature groups. This post demonstrates using the updated SDK to build ML-ready datasets effortlessly, without SQL statements, providing a seamless experience for Python-oriented customers.

Google AI & ML

- [Helping businesses with generative AI:]( Google has expanded access to generative AI technologies and introduced educational programs, consulting services, industry blueprints, and an extensive partner ecosystem. The aim is to support customers in learning, building, and deploying generative AI solutions, with examples of successful innovation from companies like [Character.ai]( [Deutsche Bank]( [Uber]( and [Wendy's](. [Google Cloud Consulting]( offers learning journeys for different audiences, while new consulting offerings focus on AI-enabled search, information summarization, process automation, and personalized content creation. Additionally, sample reference architectures and workflows are provided to accelerate generative AI projects across various industries. Google remains committed to openness and continually adds new partners to its [AI ecosystem](.
- [Improving search experiences with Enterprise Search on Gen App Builder:]( Google's Enterprise Search on Generative AI App Builder (Gen App Builder) enables organizations to quickly create custom chatbots and semantic search applications with minimal coding.
It combines internal data with Google's search technologies and generative models, delivering personalized search experiences for enterprise applications and websites. The platform supports multimodal data, provides control over answer summaries, and simplifies app development with out-of-the-box capabilities.
- [Generative AI support on Vertex AI is now generally available:]( Google is announcing the general availability of Generative AI support on Vertex AI, giving customers access to platform capabilities for building custom generative AI applications. Developers can utilize text models, the embeddings API, and [Generative AI Studio]( for tuning and deploying models. With strong data governance and security, Vertex AI empowers customers to access foundation models, customize them, and build generative AI applications while maintaining control over their data. Google ensures [responsible AI practices]( and maintains user security, data management, and access controls.

Expert Insights from Packt Community

Neuro-Symbolic Programming in Python - By [Alexiei Dingli]( and [David Farrugia](

Neuro-symbolic artificial intelligence (NSAI) systems are not constrained by standardized principles or confined by special requirements. The only consideration in NSAI systems is the combination of symbolic artificial intelligence (AI) and neural networks (NNs).

The first objective is to identify a strategy for designing and implementing an NSAI system. There are various ways to build such a system. Here, we build a system based on the logic tensor network (LTN) architecture.

Environment and data setup

For this purpose, we will work with the Red and White Wine Dataset ([( – publicly available on Kaggle. This dataset consists of 12 features describing different wine characteristics (such as density and residual sugar, to name a couple) and a binary label representing whether a given wine is red or white.
As our development environment, we will use the Google Colaboratory (Colab) platform (. You can find the Jupyter notebook and the dataset in the book's official GitHub repository: [( You can download the dataset from Kaggle and upload it to the Colab session.

```python
import numpy as np
import pandas as pd

# Load the wine dataset and drop the 'quality' column, which we won't use.
df = pd.read_csv('/content/wine_dataset.csv')
df.drop('quality', axis=1, inplace=True)

# Encode the binary label: True for red wine, False for white.
df['style'] = np.where(df['style'] == 'red', True, False)

# Shuffle the rows.
df = df.sample(frac=1)
```

Logic Tensor Network (LTN) framework

For the more interested reader, you can read the full LTN paper at [( Additionally, we will use the excellent Python library LTNtorch (. LTNtorch is an LTN implementation based on the deep learning package PyTorch (. LTNtorch also comprehensively explains the mathematical foundations of the LTN algorithm.

Another benefit of LTNs (and, subsequently, NSAI) is their high transparency. We have stated multiple times that NSAI systems are explainable by default. The LTN training process involves translating first-order logic into tensor embeddings. We start off with logical relationships between variables and predicates. Then, we apply quantifiers over the variables and predicates (returning either True or False) and transform them into multi-dimensional vectors (using tensor operations).

This excerpt is taken from the recently published book "[Neuro-Symbolic AI]( written by [Alexiei Dingli]( and [David Farrugia]( published in May 2023. To get a preview of the book's content, be sure to read the [whole chapter available here]( or sign up for a [7-day free trial]( to access the complete Packt digital library. To explore more, click on the button below.

[Discover Fresh Concepts, Keep Reading!](

Latest Developments in LLMs & GPTs

- [Optimising computer systems with more generalised AI tools:]( AlphaZero and MuZero, AI models based on reinforcement learning, have achieved superhuman performance in games. They are now being applied to optimize data centers and video compression.
[AlphaDev]( a specialized version of AlphaZero, has discovered new sorting algorithms that accelerate software, improving efficiency in computing. These advances are shaping the future of computing, benefiting billions of people. AlphaDev's algorithm increases sorting efficiency by 70% for short sequences and 1.7% for larger sequences, saving time and energy when processing search results.
- [Analyzing Leakage of Personally Identifiable Information in Language Models:]( This work focuses on the risk of Personally Identifiable Information (PII) leakage from Language Models (LMs). It introduces game-based definitions for PII leakage via black-box extraction, inference, and reconstruction attacks on LMs. Empirical evaluations reveal that differential privacy mitigates PII disclosure but still leaks about 3% of PII sequences. Additionally, novel attacks are introduced that extract significantly more PII sequences than existing attacks, and a connection between record-level membership inference and PII reconstruction is established.
- [Planning for Multi-Object Manipulation with Graph Neural Network Relational Classifiers:]( Nvidia proposes a novel graph neural network framework for multi-object manipulation, enabling robots to reason about inter-object relations and predict changes during interactions. The model operates on partial-view point clouds, allowing dynamic interactions and multi-step planning, and it transfers successfully from simulation to the real world, enabling robots to rearrange objects of different shapes and sizes using various manipulation skills.

Find Out What's New

- [From Chaos to Clarity: Streamlining Data Cleansing Using Large Language Models:]( Advanced AI tools, equipped with natural language processing and pattern recognition, have the potential to revolutionize data cleansing, making data more usable.
These tools, such as Large Language Models (LLMs) like OpenAI's GPT-3.5 Turbo, can reshape data and unlock valuable insights. In a specific use case of normalizing survey responses, LLMs were employed to classify the responses, automating the process and completing the task at a cost of less than one dollar. This paves the way for improved customer experiences.
- [Integrate LLM workflows with Knowledge Graph using Neo4j and APOC:]( Neo4j is working on integrating Large Language Models (LLMs) with knowledge graphs using [Awesome Procedures On Cypher (APOC)]( extended procedures. They currently support OpenAI and Vertex AI endpoints, with plans to support more. LLMs can generate [Cypher statements]( or enrich text embeddings to retrieve connected information and provide context for queries. The advancements in Neo4j and APOC will lead to improved data handling and processing. You can find the code on [GitHub](.
- [Unbox the Cox: Intuitive Guide to Cox Regressions:]( Cox regression models the relationship between predictor variables and the time to an event. A hypothetical dataset with 5 subjects is used to illustrate this: each subject either experiences an event (event=1) or does not (event=0) and has a predictor variable (x). Hazards are probabilities per unit time, while likelihoods relate to event probabilities. Stacked bar charts help analyze maximum likelihood estimation (MLE) by visualizing negative log-likelihoods and their relationship with the predictors.
- [Utilizing PyArrow to Improve pandas and Dask Workflows:]( This post explores the use of PyArrow to enhance [pandas]( and [Dask]( workflows. PyArrow's general support for dtypes in pandas 2.0 addresses issues with missing values and non-standard dtypes. It significantly reduces memory usage, especially for string columns, and offers performance improvements. The PyArrow dtype_backend can also improve I/O times. Ongoing developments aim to further enhance PyArrow support and speed up workflows.
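The Cox-regression item above describes estimating a predictor's effect on event hazards via maximum likelihood. As a minimal pure-Python sketch of that idea, here is the Cox partial likelihood for a single predictor on a hypothetical 5-subject dataset (the article's actual numbers are not reproduced here), with a crude grid search standing in for a proper optimizer:

```python
import math

# Hypothetical survival data: (time, event, x).
# event=1 means the event was observed at `time`; event=0 means censored.
subjects = [(2.0, 1, 0.5), (3.0, 1, 1.0), (4.0, 0, 0.2),
            (5.0, 1, 1.5), (6.0, 0, 0.0)]

def neg_log_partial_likelihood(beta):
    """Cox negative log partial likelihood for one predictor."""
    nll = 0.0
    for t_i, event, x_i in subjects:
        if not event:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = [math.exp(beta * x_j) for t_j, _, x_j in subjects if t_j >= t_i]
        # Each event contributes exp(beta*x_i) / sum over the risk set.
        nll -= beta * x_i - math.log(sum(risk))
    return nll

# Crude MLE: pick the beta on a grid that minimizes the negative log-likelihood.
grid = [b / 100 for b in range(-300, 301)]
beta_hat = min(grid, key=neg_log_partial_likelihood)
print(beta_hat)
```

Here, higher values of x coincide with earlier events, so the fitted beta comes out positive, meaning the predictor raises the hazard; in practice one would use a library such as lifelines rather than a grid search.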
- [Machine Learning Orchestration vs MLOps:]( MLOps systems, like other workflow-based systems, rely on an orchestrator to manage and coordinate complex processes. An orchestrator such as [Apache Airflow]( plays a crucial role in running machine learning pipelines by providing the environment for executing pipeline steps and ensuring proper sequencing. Airflow serves as both an orchestrator and an MLOps tool, enabling efficient deployment and maintenance of machine learning models.

See you next time!

As a GDPR-compliant company, we want you to know why you're getting this email. The _datapro team, as a part of Packt Publishing, believes that you have a legitimate interest in our newsletter and its products. Our research shows that you opted in for email communication with Packt Publishing in the past, and we think your previous interest warrants our appropriate communication. If you do not feel that you should have received this or are no longer interested in _datapro, you can opt out of our emails by clicking the link below.

[Unsubscribe]()

© 2023 Packt Publishing, All rights reserved. Our mailing address is: Packt Publishing, Livery Place, 35 Livery Street, Birmingham, West Midlands B3 2PB, United Kingdom
