In This Week’s SuperDataScience Newsletter: Meta AI Predicts Shape of 600 Million Proteins. Scientists Struggle to Identify Why AI Works. Google Announces Giant AI Language Model. Backpropagation in Neural Networks. Trends to Watch for 2023.

Cheers,
- The SuperDataScience Team

P.S. Have friends and colleagues who could benefit from these weekly updates? Send them to [this link]( to subscribe to the Data Science Insider.

---------------------------------------------------------------

[Meta AI Predicts Shape of 600 Million Proteins](

In brief: Regular readers of the SuperDataScience weekly newsletters will be aware of the huge splash made earlier this year when London-based DeepMind unveiled research revealing the predicted structures of some 220 million proteins, covering nearly every protein from known organisms in DNA databases. Now Meta has followed suit: this week, its researchers announced that they have used a model that predicts the 3D structure of proteins from their amino acid sequences to build a first-of-its-kind metagenomic database, the ESM Metagenomic Atlas. This differs from previous work in the sphere, such as DeepMind’s, in that Meta’s AI is based on a language model, whereas DeepMind’s is based on a shape-and-sequence matching algorithm. The new AI is claimed to determine proper protein folds 60 times faster than previous methods, revealing how novel proteins fit together into functioning molecules.

Why this is important: By creating the first large-scale characterization of metagenomic proteins, Meta has opened up the possibility that researchers may soon understand the function of a protein’s active site at the biochemical level – information that could be invaluable for drug discovery and development.

[Click here to learn more!](

[Scientists Struggle to Identify Why AI Works](

In brief: The growth of deep neural networks (DNNs) – artificial neural networks with multiple hidden layers between the input and output layers – has led to an increase in the use of black-box models, where systems are viewed only in terms of their inputs and outputs, with no understanding of the process happening within.
This has resulted in scientists having less clarity about their models’ results. Emily M. Bender, a professor of linguistics at the University of Washington, says: “If you have a dataset consisting of such inputs and outputs, it's always possible to train a black box system that can produce outputs of the right type—but often much, much harder to evaluate whether they are correct. Furthermore, there are lots of cases where it's impossible to make a system where the outputs would be reliably correct because the inputs just don't contain enough information.”

Why this is important: Black-box systems have proved controversial for producing results that incorporate racial and gender biases. Without understanding why these results occur, we cannot hope to overcome them.

[Click here to read on!](

[Google Announces Giant AI Language Model](

In brief: Google used this week’s AI keynote event in New York City to unveil a slew of new projects, including a universal speech model designed to support the world’s top 1,000 languages – what it claims is the “largest language model coverage seen in a speech model today.” The 1,000 Languages Initiative won’t be ready for many years, but so far Google has developed a Universal Speech Model (USM) trained on over 400 languages, the broadest coverage of any speech model to date. Large language models (LLMs) have become a rich area of discovery for the tech giant, which has arguably always placed language and AI at the heart of its business model. In this article, Google claims the model will make it easier to bring various AI functionalities to languages that are currently poorly represented.

Why this is important: This is the latest in a long line of Google developments seeking to expand its language capabilities. Most recently, it added 24 more languages to Google Translate and enabled voice typing for nine more African languages on Gboard.
[Click here to discover more!](

[Backpropagation in Neural Networks](

In brief: The rise of artificial neural networks in the world of ML has made backpropagation an increasingly widely used technique. Backpropagation is short for “backward propagation of errors.” It is the method of fine-tuning the weights of a neural network based on the error obtained in the previous epoch (one complete pass through the entire training dataset), and it is the heart of neural network training. By tuning the weights correctly, you can reduce error rates and make the model more reliable by improving its generalization. This article by Built In will teach you how to set up the model components of a backpropagation neural network, how to build a neural network, how forward propagation works, when to use backpropagation in neural networks, how to calculate deltas in backpropagation neural networks, and how to update the weights in backpropagation for a neural network.

Why this is important: Backpropagation is the standard method of training artificial neural networks and, as such, data scientists should know what it is and how to apply it.

[Click here to see the full picture!](

[Trends to Watch for 2023](

In brief: It’s that time again: as we hurtle towards Christmas, predictions for the new year begin to dominate the news. This year, it’s Forbes who’ve kicked off proceedings with their ‘Top 5 Data Science and Analytics Trends In 2023.’ They argue that a move toward data-driven business models will be fundamental as increasing numbers of companies seek digital transformation.
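To make the backpropagation brief above concrete, here is a minimal NumPy sketch (our own illustration, not taken from the Built In article) of forward propagation followed by the backward propagation of errors for a tiny one-hidden-layer network learning XOR; the architecture, learning rate, and epoch count are all illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Toy dataset: XOR, a classic problem that needs a hidden layer.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Randomly initialised weights and biases for a 2-4-1 network.
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 1.0  # learning rate

losses = []
for epoch in range(10_000):        # one epoch = one full pass over X
    # Forward propagation.
    h = sigmoid(X @ W1 + b1)       # hidden-layer activations
    out = sigmoid(h @ W2 + b2)     # network output
    losses.append(float(np.mean((out - y) ** 2)))

    # Backward propagation of errors: each delta is the error signal
    # scaled by the derivative of the sigmoid at that layer.
    delta_out = (out - y) * out * (1 - out)
    delta_h = (delta_out @ W2.T) * h * (1 - h)

    # Fine-tune every weight against its error gradient.
    W2 -= lr * (h.T @ delta_out)
    b2 -= lr * delta_out.sum(axis=0)
    W1 -= lr * (X.T @ delta_h)
    b1 -= lr * delta_h.sum(axis=0)

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Each epoch runs one forward pass, measures the error, and then pushes that error backwards layer by layer to update the weights, which is exactly the loop the article walks through in more depth.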
As such, the key trends are predicted to be:

- Data Democratization – a holistic approach to data analytics taken across companies, not left solely to data scientists or engineers
- Artificial Intelligence – with a prediction that businesses will use AI to draw insights from data more quickly than ever before
- Cloud and Data-as-a-Service – allowing companies to access data sources that have been collected and curated by third parties
- Real-Time Data – letting companies know what’s happening right now
- Data Governance and Regulation – taking the lead from new legislation

Why this is important: Now is a good time, before the end-of-year rush, to take stock of where your business is and how these trends may impact the data science industry. Knowing what is likely to happen puts you in a better position to prepare.

[Click here to find out more!](

[Super Data Science Podcast](

On this week's [Super Data Science Podcast](, Shashank Kalanithi, the man who makes a sport out of YouTube and data analytics out of sports, talks about how he got started producing YouTube videos on data science, the essential differences between data science roles, and how data could shape the future of the sports industry.

---------------------------------------------------------------

What is the Data Science Insider? This email is a briefing of the week's most disruptive, interesting, and useful resources, curated by the SuperDataScience team for data scientists who want to take their careers to the next level.

Want to take your data science skills to the next level? Check out the [SuperDataScience platform]( and sign up for membership today!

Know someone who would benefit from getting The Data Science Insider? Send them [this link to sign up.](

If you wish to stop receiving our emails or change your subscription options, please [Manage Your Subscription](
SuperDataScience Pty Ltd (ABN 91 617 928 131), 15 Macleay Crescent, Pacific Paradise, QLD 4564, Australia