Packt DataPro #30: Creating Tensors, Open-Source Vizier & Prediction Pipeline using Azure ML
February 14, 2023 | DataPro #30

👋 Hey {NAME},

Thank you for being a valued member of the Packt Publishing family! We appreciate your support and want to offer you our complimentary data newsletter. Since you read our technical books, we think this newsletter will interest you: it keeps you up to date on the latest developments and advancements in data and machine learning. Below is the latest issue. Feel free to contact us if you have any questions.

"AI is not just another technology, it is a foundational technology that will transform how we live and work." - Andrew Ng, Co-founder of Google Brain and former VP & Chief Scientist of Baidu.

AI is more than a tool; it is a building block for our future. With that kind of power, it's important that we use it in the best way possible. Think of it this way: AI is a cornerstone of the world we want to build, and by working together, we can make that world amazing!

This week, we focus on creating tensors, building prediction pipelines, and using Azure Functions to simplify data practices and bridge the gap between advanced technology and practical application.

Key Insights:
- Creating Tensors in PyTorch
- Tutorial: Creating Microsoft Azure Function Apps
- MLOps made simple

If you find the newsletter useful, share it with your friends! Also, add our email address to your contact list so you don't miss any updates. Jump on in!

Cheers,
Merlyn Shelley
Associate Editor in Chief, Packt

Keep up with cutting-edge research on GitHub
- microsoft: DNN inference latency prediction toolkit for accurate modelling and prediction on edge devices.
- microsoft: Distributed ANN library for large-scale vector search, with high-quality indexing and serving toolkits.
- microsoft: The implementation of BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining.
- Machine-Learning-Interactive_Tools: Interactive tools for machine learning, deep learning, and mathematics.
- tpot: A Python automated machine learning tool that optimizes machine learning pipelines using genetic programming.
- doccano: Open-source annotation tool for machine learning practitioners.
- label-studio: Label Studio is a multi-type data labelling and annotation tool with a standardized output format.
- mljs: Machine learning tools in JavaScript. This library is a compilation of the tools developed in the mljs organization.
- polyaxon: MLOps tools for managing and orchestrating the machine learning lifecycle.
- healthcareai-py: Python tools for healthcare machine learning. The aim of healthcareai is to streamline machine learning in healthcare.
- cleanlab: The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
- alteryx: A machine learning tool for automated prediction engineering.
- h2oai: H2O is an open-source, distributed, fast, and scalable machine learning platform.
- autogluon: AutoGluon automates machine learning tasks, enabling you to easily achieve strong predictive performance in your applications.
- robotgo: Go desktop automation. Control the mouse and keyboard, read the screen, and work with processes, window handles, images, bitmaps, and global event listeners.

Stay informed about Data & ML Industry Insights

AWS
- Amazon SageMaker Automatic Model Tuning now supports three new completion criteria for hyperparameter optimization: Amazon SageMaker introduces support for three new completion criteria in automatic model tuning: maximum tuning time, improvement monitoring, and convergence detection. These give you more control over when tuning stops while searching for the best hyperparameter configuration for a model.
- Create powerful self-service experiences with Amazon Lex on Talkdesk CX Cloud contact center: Contact centers use AI and NLP through conversational bots to provide personalized customer experiences and efficient self-service support. The Amazon Lex API brings its conversational AI capabilities to Talkdesk's contact center solution, which includes voice biometrics for caller authentication. Amazon Lex also gives agents contextual information so they can assist callers better.

Google Cloud
- Application security with Cloud SQL IAM database authentication: Hardening complex applications is challenging, especially those with multiple layers of different authentication schemes. Cloud SQL's IAM database authentication maps Cloud IAM users or service accounts to database roles, creating logins with matching email addresses.
- Pub/Sub Lite's Apache Spark Structured Streaming Connector is now Generally Available: The open-source Pub/Sub Lite Apache Spark connector now supports Apache Spark 3.x distributions and is officially GA. Pub/Sub Lite is a Google Cloud messaging service for asynchronous message exchange between independent applications: publisher apps send messages to topics, and subscriber apps receive messages from subscriptions. The connector is publicly available from the Maven Central repository.

Just for Laughs!
Why did the data scientist cross the road? To get to the other side of the prediction!
Understanding Data & ML Core Concepts

Creating Tensors in PyTorch, by Sebastian Raschka, Yuxi (Hayden) Liu, and Vahid Mirjalili

We can create a tensor from a list or a NumPy array using the torch.tensor or the torch.from_numpy function as follows:

```python
import torch
import numpy as np

np.set_printoptions(precision=3)
a = [1, 2, 3]
b = np.array([4, 5, 6], dtype=np.int32)
t_a = torch.tensor(a)
t_b = torch.from_numpy(b)
print(t_a)
print(t_b)
```

```
tensor([1, 2, 3])
tensor([4, 5, 6], dtype=torch.int32)
```

This results in tensors t_a and t_b, both with shape=(3,). Their dtype is inferred from the source: t_a gets torch.int64 (PyTorch's default for Python integers, which is why no dtype is printed), while t_b keeps the int32 type of its NumPy array.

As with NumPy arrays, we can create tensors filled with a constant value and inspect their properties:

```python
t_ones = torch.ones(2, 3)
print(t_ones.shape)
print(t_ones)
```

```
torch.Size([2, 3])
tensor([[1., 1., 1.],
        [1., 1., 1.]])
```

Finally, creating a tensor of random values can be done as follows:

```python
rand_tensor = torch.rand(2, 3)
print(rand_tensor)
```

```
tensor([[0.1409, 0.2848, 0.8914],
        [0.9223, 0.2924, 0.7889]])
```

Because torch.rand samples uniformly from [0, 1), your values will differ unless you fix the random seed (for example, with torch.manual_seed).

Manipulating the data type and shape of a tensor

Learning how to manipulate tensors is necessary to make them compatible as input to a model or an operation. In this section, you will learn how to manipulate tensor data types and shapes via several PyTorch functions that cast, reshape, transpose, and squeeze (remove dimensions).

The Tensor.to() method can be used to change the data type of a tensor to a desired type. Here, the int32 tensor t_b is cast to int64:

```python
t_b_new = t_b.to(torch.int64)
print(t_b_new.dtype)
```

```
torch.int64
```

See the PyTorch documentation on torch.dtype for all other data types.

Certain operations require that the input tensors have a certain number of dimensions (that is, rank) with a certain number of elements along each dimension (shape). Thus, we might need to change the shape of a tensor, add a new dimension, or squeeze an unnecessary dimension. PyTorch provides useful functions (or operations) to achieve this, such as torch.transpose(), torch.reshape(), and torch.squeeze().
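One detail of the creation functions above is worth checking directly: torch.tensor copies its input, whereas torch.from_numpy returns a tensor that shares memory with the NumPy array, and casting with .to() to a different dtype returns a new tensor instead of changing the original in place. The following minimal sketch verifies this; the variable names are illustrative and not taken from the book.

```python
import numpy as np
import torch

# Python integers default to torch.int64; a NumPy array keeps its own dtype.
t_list = torch.tensor([1, 2, 3])
arr = np.array([4, 5, 6], dtype=np.int32)
t_copy = torch.tensor(arr)        # torch.tensor copies the data
t_shared = torch.from_numpy(arr)  # torch.from_numpy shares memory with arr

print(t_list.dtype, t_copy.dtype, t_shared.dtype)
# torch.int64 torch.int32 torch.int32

# Changing the NumPy array is visible through t_shared, but not through t_copy.
arr[0] = 100
print(t_shared[0].item(), t_copy[0].item())  # 100 4

# .to() with a different dtype returns a new tensor; t_list itself is unchanged.
t_float = t_list.to(torch.float32)
print(t_list.dtype, t_float.dtype)  # torch.int64 torch.float32
```

Because of this sharing, in-place changes to a NumPy array made after calling torch.from_numpy also show up in the tensor; use torch.tensor (or call .clone() on the tensor) when you need an independent copy.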
Let's take a look at some examples.

Transposing a tensor:

```python
t = torch.rand(3, 5)
t_tr = torch.transpose(t, 0, 1)
print(t.shape, ' --> ', t_tr.shape)
```

```
torch.Size([3, 5])  -->  torch.Size([5, 3])
```

Reshaping a tensor (for example, from a 1D vector to a 2D array):

```python
t = torch.zeros(30)
t_reshape = t.reshape(5, 6)
print(t_reshape.shape)
```

```
torch.Size([5, 6])
```

Removing unnecessary dimensions (dimensions of size 1):

```python
t = torch.zeros(1, 2, 1, 4, 1)
t_sqz = torch.squeeze(t, 2)  # remove only dimension 2, which has size 1
print(t.shape, ' --> ', t_sqz.shape)
```

```
torch.Size([1, 2, 1, 4, 1])  -->  torch.Size([1, 2, 4, 1])
```

This curated content was taken from the book Machine Learning with PyTorch and Scikit-Learn (packtpub.com). To learn more, check out the book.

Find Out What's New in Data & ML
- Tutorial: Creating Microsoft Azure Function Apps: This tutorial teaches the basics of using Azure Functions for serverless app development, which makes development faster and more cost-effective by eliminating manual scaling and server provisioning. It walks step by step through creating a Python Function App in Azure and highlights the benefits of Azure's serverless technology. By the end of the tutorial, you will know how to create, test, and delete a Python Function App in Azure.
- Open Source Vizier: Towards reliable and flexible hyperparameter and blackbox optimization: Vizier is a blackbox optimization system used across Google to optimize products such as Search, Ads, and YouTube. It offers AutoML researchers a collection of algorithms and benchmarks under common APIs for assessing the performance of proposed methods. TensorFlow Probability lets researchers use the JAX-based Gaussian Process Bandit algorithm, which is based on the default algorithm in Google Vizier.
- Code in the Cloud With Anaconda: Anaconda has made its cloud notebook, previously available only with paid subscription plans, free for anyone to use. The move aims to promote data literacy and make it easier for people to start coding in data science. The cloud notebook provides cloud-hosted Python computing and is easy to access: create an account on Anaconda's cloud platform and click "Notebooks", or navigate to nb.anaconda.cloud.
- Checking survey representativeness by looking at canary variables: A canary variable is a diagnostic tool used in statistical modeling. It is a variable with a known distribution that is not adjusted for in the model and is used to check the validity of the sampling procedure. Comparing the canary variable's estimated distribution to external knowledge provides a way to verify that the sample is representative. Although not foolproof, this check is a useful step toward ensuring the accuracy of a statistical model.
- MLOps made simple: how to run a batch prediction pipeline using Azure Machine Learning components: Azure has introduced components that data scientists can use in their machine learning pipelines. Components are self-contained pieces of code that can be assembled into workflows and easily shared with team members. In this article, a Feature Matching pipeline is built using components. Azure Machine Learning (AML) provides a UI to manage components, data assets, and compute resources, making it easier for data scientists to adopt MLOps principles in their workflows.
- Easily Validate User-Generated Data Using Pydantic: Pydantic is a powerful tool for data validation in Python, particularly for data from Excel. It simplifies checking and filtering user-generated data to ensure it is valid, without having to write complex Pandas functions.
Pydantic allows granular data validation, helps you understand the data, and can validate most forms of data input. The author, a data engineer, found Pydantic an effective solution for data validation that saved time and effort compared to previous methods.
- The Subtleties of Converting a Model from TensorFlow to PyTorch: In this blog post, the author explains how to migrate a TensorFlow model to PyTorch, highlighting important differences between the two frameworks. The author shares tips for converting models between frameworks, along with the resulting MLPerf ResNet-50 and MobileNet-v1 PyTorch checkpoints, which are available on the author's GitHub repository.

See you next time!

Copyright (C) 2023 Packt Publishing. All rights reserved.

Hello,
Thank you for being a part of the DataPro weekly newsletter.
Team Packt.
As a GDPR-compliant company, we want you to know why you're getting this email. The _datapro team, as a part of Packt Publishing, believes that you have a legitimate interest in our newsletter and its products. Our research shows that you, {EMAIL}, opted in to email communication with Packt Publishing in the past, and we think your previous interest warrants our appropriate communication. If you do not feel that you should have received this email or are no longer interested in _datapro, you can opt out of our emails by clicking the link below.
Our mailing address is:
Packt Publishing
Livery Place
35 Livery Street
Birmingham, West Midlands B3 2PB
United Kingdom
Want to change how you receive these emails?
You can update your preferences or unsubscribe.