A Collection of Top Software Data Engineering News, Articles, Presentations December 2022 [InfoQ]
Data Engineering Round-Up Sponsored by [ScyllaDB]
In this special newsletter we bring you up to date on all the new content and news related to Data Engineering on InfoQ. We are also maintaining a portal page for this content on InfoQ.

Latest Content on InfoQ
[Google Open-Sources Secure ML Operating System KataOS]( (news, Nov 22, 2022)
[Create Your Distributed Database on Kubernetes with Existing Monolithic Databases]( (articles, Nov 16, 2022)
[Uber Freight Near-Real-Time Analytics Architecture]( (news, Nov 08, 2022)
[Meta Announces Next Generation AI Hardware Platform Grand Teton]( (news, Nov 08, 2022)
[Anaconda Publishes 2022 State of Data Science Report]( (news, Nov 01, 2022)

Sponsored content: [SQL to NoSQL: Architecture Differences and Considerations for Migration](
Read this white paper to learn about the architectural differences between SQL and NoSQL, the tradeoffs between flexibility, scale and cost, and considerations for successful SQL to NoSQL migrations.

Top Viewed Content on InfoQ
[Stability AI Open-Sources Image Generation Model Stable Diffusion]( (news, Sep 06, 2022)
[Microsoft Releases Stream Analytics No-Code Editor into General Availability]( (news, Oct 21, 2022)
[Next Generation of Data Movement and Processing Platform at Netflix]( (news, Aug 29, 2022)
[The InfoQ eMag: Modern Data Architectures, Pipelines, & Streams]( (minibooks, Oct 04, 2022)
[PyTorch Becomes Linux Foundation Top-Level Project]( (news, Oct 18, 2022)

Top News
[Apache InLong: Integration Framework for Massive Data](
Apache InLong, an integration framework designed for massive data, was originally built at Tencent, where it was used in production for more than eight years to support massive data reporting services in big data scenarios. The project officially graduated as an Apache top-level project three years after entering the Apache Incubator.
[Grab Shared Its Experience in Designing Distributed Data Platform](
GrabApp is an application through which customers select and buy their daily needs from merchants. To be scalable and manageable, the data platform and its ingestion should be designed to be distributed and fault-tolerant. To design this data platform, two classes of data stores are considered: OLTP and OLAP.
[Netflix's New Algorithm Offers Optimal Recommendation Lists for Users with Finite Time Budget](
Netflix developed a new machine learning algorithm based on reinforcement learning to create an optimal list of recommendations considering a finite time budget for the user. Recommendation use cases often ignore the finite time a user has to make a decision.

Sponsored content: [7 Essentials When Selecting a NoSQL Database-as-a-Service (DBaaS)](
With the move into the "Zettabyte era," many are looking at database-as-a-service (DBaaS) options. This paper outlines 7 key considerations that help teams realize the many benefits a DBaaS has to offer, without falling into some of the common traps.

[Amazon's AlexaTM 20B Model Outperforms GPT-3 on NLP Benchmarks](
Researchers at Amazon Alexa AI have announced Alexa Teacher Models (AlexaTM 20B), a 20-billion-parameter sequence-to-sequence (seq2seq) language model that exhibits state-of-the-art performance on 1-shot and few-shot NLP tasks. AlexaTM 20B outperforms GPT-3 on SuperGLUE and SQuADv2 benchmarks while having fewer than 1/8 the number of parameters.
[Netflix's Fraud Detection Framework for Streaming Services](
Netflix has developed a fraud and abuse detection framework for streaming services, based on artificial intelligence models and data-driven anomaly detection trained on the behavior of users. Streaming services potentially have a large number of onboarded users on multiple devices.

Top Articles
[Migrating Netflix's Viewing History from Synchronous Request-Response to Async Events](
Sharma Podila shares lessons from migrating to asynchronous processing at scale, requiring attention to managing data loss, a highly available infrastructure, and elasticity to handle bursts.
[article]( [AutoML: the Promise vs. Reality According to Practitioners](
This article discusses the types of AutoML and what AutoML does and does not bring to a project, and raises the question of the best path to responsible AI.
[article]( [Streaming-First Infrastructure for Real-Time Machine Learning](
The benefits of streaming-first infrastructure for real-time ML are online prediction for fast responses and continual learning for adapting to change in data distributions in production.
[article]( [Creating a Secure Distributed Database Cluster Leveraging Your Existing Database Management System](
Database Plus, a technology applicable to any database, answers Big Data challenges and eliminates switching costs and vendor lock-in. Here's how to easily create a distributed and encrypted database.
[article]( [How to Migrate an Oracle Database to MySQL Using AWS Database Migration Service](
In this article, author Deepak Vohra discusses the details of migrating an on-prem database to a MySQL database on the cloud, using AWS Database Migration Service.
[article](

Sponsored content: [7 Reasons Not to Put an External Cache in Front of Your Database](
Modern applications rely on memory architectures that are both extremely fast and globally distributed. The ideal architecture finds a balance between memory (RAM) and storage (SSD) to deliver high performance along with reliability and consistency. As this whitepaper shows, caching solutions that are not integral to the database can cause substantial issues.

Top Presentations
[GraphQL Caching on the Edge](
Max Stoiber discusses why and how to edge cache production GraphQL APIs at scale.
[Max Stoiber]( [What You Should Know before Deploying ML in Production](
Francesca Lazzeri shares an overview of the most popular MLOps tools and best practices, and presents a set of tips and tricks useful before deploying a solution in production.
[Francesca Lazzeri]( [Modern Data Pipelines in AdTech: Life in the Trenches](
Roksolana Diachuk discusses how to use modern data pipelines for reporting and analytics as well as the case of historical data reprocessing in AdTech.
[Roksolana Diachuk]( [Protecting User Data via Extensions on Metadata Management Tooling](
Alyssa Ransbury gives an overview of the current state of metadata management tooling, and details how Square implemented security on its data.
[Alyssa Ransbury]( [Streaming-First Infrastructure for Real-Time ML](
Chip Huyen discusses the state of continual learning for ML, its motivations, challenges, and possible solutions.
[Chip Huyen](