Newsletter Subject

Data Science Insider: December 15th, 2023

From

superdatascience.com

Email Address

support@superdatascience.com

Sent On

Fri, Dec 15, 2023 05:09 PM

Email Preheader Text

In This Week’s SuperDataScience Newsletter: LLMs Break New Ground. How to Build a Million-Paramete


In This Week’s SuperDataScience Newsletter:

LLMs Break New Ground.
How to Build a Million-Parameter LLM.
McDonald's Unveils Hexagonal Cloud Architecture for Data Science Clarity.
Mastering RAG: Advanced Strategies for Data Scientists.
Putin Confronts AI Deep Fake in Public Q&A.

Cheers,
- The SuperDataScience Team

P.S. Have friends and colleagues who could benefit from these weekly updates? Send them to this link to subscribe to the Data Science Insider.

---------------------------------------------------------------

LLMs Break New Ground

In brief: AI researchers at DeepMind have claimed a groundbreaking discovery, asserting that large language models (LLMs) like OpenAI's ChatGPT can produce genuinely new scientific insights. While chatbots have previously repackaged existing information, the researchers developed "FunSearch," a system that uses LLMs to write computer programs for problem-solving. FunSearch tackled two puzzles, producing new solutions for the cap set problem in mathematics and for the bin packing problem (an optimization problem in which items of different sizes must be packed into a finite number of bins in a way that minimizes the number of bins used). The breakthrough suggests that LLMs can contribute to algorithmic discovery, transforming how computer science approaches problem-solving, with potential applications for programmers and for collaboration between AI and human mathematicians.

Why this is important: This discovery underscores the evolving capabilities of LLMs beyond generating text and introduces a novel approach to problem-solving using FunSearch.

How to Build a Million-Parameter LLM

In brief: Fareed Khan's comprehensive article on Level Up Coding provides a step-by-step guide to constructing a million-parameter large language model (LLM) from scratch in Python, following the LLaMA architecture.
Khan emphasizes practical implementation over theoretical discussion, using a basic dataset and avoiding the need for a high-end GPU. The process involves understanding LLaMA's Transformer architecture and incorporating techniques such as RMSNorm for pre-normalization, the SwiGLU activation function, and rotary embeddings. The article also covers hyperparameter experimentation and saving the LLM. Prerequisites include basic knowledge of object-oriented programming and neural networks, with PyTorch familiarity recommended. Data scientists will benefit from this guide because it goes beyond theory and offers a tangible blueprint for constructing LLMs.

Why this is important: Practical insights into implementation nuances, hyperparameter tuning, and model saving enhance data scientists' ability to create efficient language models without relying on extensive computing resources. The article's emphasis on simplicity and accessibility aligns with the growing interest in democratizing language model development, making it an essential resource for data scientists.

McDonald's Unveils Hexagonal Cloud Architecture for Data Science Clarity

In brief: For the second week running, we are exploring McDonald’s. Last week we learned how they’re partnering with Google; this week, on their Technical Blog, a McDonald's Solutions Architect explores the Hexagonal Architecture pattern as a solution for managing complexity in cloud ecosystems. This architectural approach separates external systems from the core application using a hexagon shape, with ports, adapters, and a focus on Domain-Driven Design (DDD) principles. The hexagon encapsulates the application domain, preventing business logic from leaking outside. The author, Rice, illustrates this with an example of building an application for file uploads, emphasizing the importance of clear, easily understandable design.
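As a rough illustration of the ports-and-adapters idea summarized above, here is a minimal Python sketch of a file-upload application. It is not code from the McDonald's article: the names `FileStorage`, `InMemoryStorage`, and `UploadService`, the `memory://` location format, and the size limit are all hypothetical, chosen only to show how the core logic can depend on a port rather than on a concrete external system.

```python
from typing import Dict, Protocol


class FileStorage(Protocol):
    """Port (hypothetical): what the core application needs from any storage backend."""

    def save(self, name: str, data: bytes) -> str:
        ...


class InMemoryStorage:
    """Adapter (hypothetical): a stand-in for a real cloud storage adapter."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def save(self, name: str, data: bytes) -> str:
        self.files[name] = data
        return f"memory://{name}"


class UploadService:
    """Core domain logic: depends only on the port, never on a concrete adapter."""

    def __init__(self, storage: FileStorage, max_bytes: int = 1_000_000) -> None:
        self.storage = storage
        self.max_bytes = max_bytes  # assumed business rule, for illustration only

    def upload(self, name: str, data: bytes) -> str:
        # Business rules stay inside the hexagon.
        if not name:
            raise ValueError("file name must not be empty")
        if len(data) > self.max_bytes:
            raise ValueError("file is too large")
        # The external system is reached only through the port.
        return self.storage.save(name, data)


# Wire the core to an adapter at the edge of the application.
storage = InMemoryStorage()
service = UploadService(storage)
location = service.upload("report.csv", b"a,b\n1,2\n")
```

Swapping `InMemoryStorage` for an adapter that talks to a real storage service would not require any change to `UploadService`, which is the separation the pattern is meant to provide.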
The article teases the next steps in the series, highlighting how connecting ports and adapters facilitates complexity management.

Why this is important: By separating concerns and following DDD principles, data scientists can create robust applications that are easier to maintain, test, and scale in complex cloud ecosystems. The architecture's emphasis on clean design and encapsulation aligns with best practices, enhancing collaboration and comprehension among data science teams working on intricate projects.

Mastering RAG: Advanced Strategies for Data Scientists

In brief: In this comprehensive guide on Towards Data Science, Leonie Monigatti gives data scientists advanced strategies for improving the performance of Retrieval-Augmented Generation (RAG) pipelines. Recognizing data science as an experimental science, the article emphasizes the absence of a universal algorithm and introduces the concept of "hyperparameters" and tuning strategies for RAG applications. The guide delves into the ingestion stage, covering data cleaning, chunking, embedding models, metadata, multi-indexing, and indexing algorithms. Moving to the inferencing stage, it explores query transformations, retrieval parameters, advanced retrieval strategies, re-ranking models, large language models (LLMs), and prompt engineering. The article concludes by highlighting the increasing importance of discussing strategies for bringing RAG pipelines to production-ready performance.

Why this is important: The details of data cleaning, chunking, and embedding models in the ingestion stage contribute to the quality of information retrieval. In the inferencing stage, considerations like query transformations, advanced retrieval strategies, and LLM fine-tuning directly affect the relevance and accuracy of generated responses.
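To make the idea of RAG "hyperparameters" a little more concrete, here is a toy Python sketch of two of the stages mentioned above: chunking at ingestion and retrieval at inference. It is not taken from Monigatti's guide. The function names `chunk_words` and `top_k` are hypothetical, and the word-overlap score is a deliberately simple stand-in for the embedding similarity a real pipeline would use.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 8, overlap: int = 2) -> List[str]:
    """Split text into word chunks; chunk_size and overlap are tunable hyperparameters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


def top_k(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks sharing the most words with the query (toy similarity)."""
    query_terms = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(query_terms & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]


document = (
    "RAG pipelines split documents into chunks before embedding them "
    "so that retrieval can return focused passages"
)
chunks = chunk_words(document, chunk_size=6, overlap=2)
best = top_k("embedding chunks", chunks, k=1)
```

Changing `chunk_size`, `overlap`, or `k` changes which passages reach the language model, which is why such settings are worth tuning experimentally rather than fixing once.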
Putin Confronts AI Deep Fake in Public Q&A

In brief: In a recent public Q&A session, Russian President Vladimir Putin encountered an apparent AI-generated "deep fake" version of himself, which posed questions about the prevalence of twins and concerns regarding neural networks and AI. The event, part of Putin's annual phone-in with the Russian public, highlights the growing use of AI deep fakes to spread misinformation globally. As nations compete for leadership in AI, Putin, like other world leaders, aims to position Russia as a frontrunner in the technology. This incident underscores the urgent need for data scientists to address the risks associated with deep fakes and to develop tools that detect and counteract AI-generated misinformation in political and public discourse.

Why this is important: The viral video of this incident may appear comical, but it also highlights serious issues about deep fake technology that we should all be aware of.

Super Data Science podcast

In this week's SuperDataScience Podcast episode, Ingmar Schuster, CEO and Co-Founder of Exazyme, explains how he uses AI to design proteins. He speaks with Jon Krohn about their wider applications in pharmaceuticals and chemistry, how kernel methods make the design of synthetic biological catalysts more efficient, and when to use shallow machine learning over deep learning.

---------------------------------------------------------------

What is the Data Science Insider?

This email is a briefing of the week's most disruptive, interesting, and useful resources curated by the SuperDataScience team for Data Scientists who want to take their careers to the next level.

Want to take your data science skills to the next level? Check out the SuperDataScience platform and sign up for membership today!
Know someone who would benefit from getting The Data Science Insider? Send them this link to sign up.

If you wish to stop receiving our emails or change your subscription options, please manage your subscription.

SuperDataScience Pty Ltd (ABN 91 617 928 131), 15 Macleay Crescent, Pacific Paradise, QLD 4564, Australia




