A Collection of Top Data Engineering News, Articles, and Presentations, July 2021 [InfoQ]
Data Engineering Round-Up
Sponsored by Cockroach Labs

[Latest Content](#latest-content), [Top Viewed Content](#top-viewed-content), [Top News](#news), [Top Articles](#top-articles), [Top Presentations](#top-presentations-and-interviews)

In this special newsletter, we bring you up to date on all the new content and news related to data engineering on InfoQ. We also maintain a portal page for this content on InfoQ.

Latest Content on InfoQ

LinkedIn Open Sources Greykite, a Python-based Forecasting Library (news, Jun 30, 2021)
Google Cloud Announces Managed Machine Learning Platform Vertex AI (news, Jun 06, 2021)
Accelerating Deep Learning on the JVM with Apache Spark and NVIDIA GPUs (articles, Jun 11, 2021)
Facebook Open-Sources Expire-Span Method for Scaling Transformer AI (news, Jun 15, 2021)
Google Trains Two Billion Parameter AI Vision Model (news, Jun 22, 2021)

Architecting for Scale [Two Free Chapters] (by O'Reilly)
Learn techniques for building systems that can handle huge quantities of traffic, data, and demand without affecting the quality your customers expect. Architects, managers, and directors in engineering and operations organizations will learn how to build applications at scale that run more smoothly. (Sponsored content)

Top Viewed Content on InfoQ

Google Open-Sources Trillion-Parameter AI Language Model Switch Transformer (news, Feb 16, 2021)
OpenAI Announces GPT-3 Model for Image Generation (news, Feb 02, 2021)
The Future of Data Engineering (articles, Feb 16, 2021)
AWS Announces Amazon Aurora Supports PostgreSQL 12 (news, Feb 07, 2021)
InfoQ eMag: Modern Data Engineering (eMags, Feb 22, 2021)

Top News

NVIDIA Announces AI Training Dataset Generator DatasetGAN
Researchers at NVIDIA have created DatasetGAN, a system for generating synthetic images with annotations to create datasets for training AI vision models. DatasetGAN can be trained with as few as 16 human-annotated images and performs as well as fully supervised systems that require 100x more annotated images.

Microsoft Announces Limited Access to Its Neural Text-to-Speech AI
Recently, Microsoft announced limited access to Custom Neural Voice, its neural text-to-speech AI. The service allows developers to create custom synthetic voices.

Pyodide Brings Python and Its Scientific Stack to the Browser with WebAssembly
Mozilla announced that Pyodide, which aims to provide a full Python data science stack running entirely in the browser, has become an independent, community-driven project. Pyodide uses the CPython 3.8 interpreter compiled to WebAssembly. This makes it possible to run Python, NumPy, Pandas, Matplotlib, SciPy, and more in Iodide, an experimental interactive scientific computing environment for the web.

Certified Kubernetes Application Developer (CKAD) Study Guide
This study guide covers in depth the topics you need to pass the CKAD exam from the Cloud Native Computing Foundation. Learn the core principles of services and networking, and gain a thorough understanding of state persistence and volumes. Practice with real sample exercises. (Sponsored content)

TensorFlow 2.4 Release Includes CUDA 11 Support and API Updates
The TensorFlow project announced the release of version 2.4.0 of the deep-learning framework. It adds support for CUDA 11, cuDNN 8, and NVIDIA's Ampere GPU architecture, as well as new strategies and profiling tools for distributed training. Other API updates include mixed precision in Keras and a NumPy frontend.

AWS Announces a Data Management and Analytics Solution Called Amazon FinSpace
Recently, AWS announced Amazon FinSpace, a data management and analytics solution purpose-built for the Financial Services Industry (FSI). The service aims to reduce the time financial analysts spend finding and accessing all types of financial data for analysis.

Top Articles

How Optimizing MLOps Can Revolutionize Enterprise AI
In this article, the author discusses data science architecture, containerization, and how new solutions such as feature stores can help across the full lifecycle of machine learning processes.

How to Build Interactive Data Visualizations for Python with Bokeh
In this article, the author shows how to use Bokeh, a powerful Python library, to create interactive data visualizations with custom charts.

Performance Tuning Techniques of Hive Big Data Table
In this article, author Sudhish Koloth discusses how to tackle performance problems when using Hive Big Data tables.

Why a Serverless Data API Might Be Your Next Database
In this article, author Pieter Humphrey discusses database as a service (DBaaS) and serverless data APIs for cloud-based data management.

Indestructible Storage in the Cloud with Apache Bookkeeper
At Salesforce, we required a storage system that could work with two kinds of streams. As pioneers in cloud computing, we also required our storage system to be cloud-aware.

Benchmarking the Cloud: AWS vs. GCP vs. Azure
The 2021 Cloud Report compares AWS, Azure, and GCP on benchmarks that reflect critical applications and workloads. Read the report to learn which cloud is the most cost-efficient, how to evaluate performance trade-offs, and how to assess the cost/benefit of disks and CPUs. (Sponsored content)

Top Presentations

Building Latency Sensitive User Facing Analytics via Apache Pinot
Chinmay Soman discusses how LinkedIn, Uber, and other companies keep latency low for analytical database queries despite high throughput.

Using DevEx to Accelerate GraphQL Federation Adoption @Netflix
Paul Bakker and Kavitha Srinivasan discuss the build vs. buy (open source) trade-offs they made and the socio-technical aspects of working with many teams on a single shared schema.

Evolving Analytics in the Data Platform
Blanca Garcia-Gil discusses the BBC's analytics platform architecture, the failure modes they designed for, and how they investigated new unknowns and automated them away.

Designing IoT Data Pipelines for Deep Observability
Shrijeet Paliwal discusses how Tesla handles large-scale data ingestion and processing, the challenges of collecting and processing IoT data, and how to address them.

Scaling & Optimizing the Training of Predictive Models
Nicholas Mitchell presents the core building blocks of a complete toolchain that can handle the challenges of large amounts of data in a scalable industrial system.

You have received this message because you are subscribed to the "Special Reports Newsletter".

C4Media Inc. (InfoQ.com),
2275 Lake Shore Boulevard West,
Suite #325,
Toronto, Ontario, Canada,
M8V 3Y3