In this week’s SuperDataScience newsletter: How Data Science Can Help Find Your Perfect Property. AI Used to 'Predict the Next Coronavirus'. Databases that Support in-Database ML. ML Learns from Dog’s Noses to Smell Disease.

Cheers,
- The SuperDataScience Team

P.S. Have friends and colleagues who could benefit from these weekly updates? Send them to [this link]( to subscribe to the Data Science Insider.

---------------------------------------------------------------

[How Data Science Can Help Find Your Perfect Property]( In brief: This article in TNW tells the tale of data scientist and software engineer Andrea Ialenti, who decided to use his data science skills to help him find the right house at the right price. His experience is based on the Dublin real-estate market, but the lessons learned are applicable to housing across the world. Ialenti began by gathering data on every property available on the market; he then wrote a scraper to collect basic information. With the help of Google Cloud, he was able to enhance the dataset with utility fields to make filtering easier. The enhanced dataset was then turned into a dashboard using Google Data Studio, and the raw data was placed on a map to harness the power of geolocalized data. With this in place, Ialenti was able to drill down further in his search and eventually find his ideal house.

Why this is important: As data scientists, we are generally focused on developing work that will benefit our careers or the world at large. This story, however, is a nice reminder of how our skills can be used to help us in our personal lives.

[Click here to find out!](

[AI Used to 'Predict the Next Coronavirus']( In brief: Scientists have used AI to figure out where the next novel coronavirus may emerge. The team of researchers combined ML with fundamental biology, and their AI algorithm predicted more potential hosts of coronavirus strains than have previously been detected. The researchers were able to plug existing biological evidence into an algorithm, teaching a computer how to spot viruses and host species that were most likely to be a source of viral recombination.
The team asked their algorithm to find biological patterns to predict which mammals might be susceptible to known coronaviruses. Next, they looked for species that were able to harbour several viruses at once. Researchers were able to use existing biological knowledge to teach the algorithm to search for patterns that made this more likely to happen. This step showed that many more mammals were potential hosts for new coronaviruses than previous surveillance work had shown.

Why this is important: The scientists say their findings could help to target surveillance for new diseases, possibly helping prevent the next pandemic before it starts. Something we all can hope for!

[Click here to read on!](

[Databases that Support in-Database ML]( In brief: When it comes to building models on very large data sets, the ideal is to build the model where the data already resides, so that no mass data transmission is needed. There are currently several databases that, to a limited extent, support that. The next best case is for the data to be on the same high-speed network as the model-building software, which typically means within the same data center. Even moving the data from one data center to another within a cloud availability zone can introduce a significant delay if you have terabytes (TB) or more. Therefore, it’s natural to want to understand which databases support internal ML and how they do it. This InfoWorld article gives you a list of eight databases (including Google Cloud BigQuery and IBM Db2 Warehouse) that support in-database ML, with a detailed description of their offerings and benefits.

Why this is important: By ensuring your knowledge of databases is up to date, you will be best placed to choose the one that offers the best support for your large data set needs.
[Click here to discover more!](

[Data Centers Must Adapt to Tackle Staffing Crisis]( In brief: Executives from Google Data Centers and Compass Datacenters have called for a drastic rethink in how companies recruit, according to research by the Uptime Institute. The Global Data Center Staffing Forecast is the first forecast of global data center workforce needs by region, by data center type, and by education requirements. It has found that, as the data center build-out continues across the globe, many more people will have to be hired to design, build and operate the critical infrastructure. The demand will exacerbate staffing shortages which are already pronounced. The new report projects global demand for new data center jobs to increase by about 2 percent annually through 2023 and 3 percent by 2025. One of the ways the report advises tackling this is by adjusting data center job requirements to allow for “equivalent experience” as an alternative to a college degree or certification.

Why this is important: The growing gap in data recruitment is something we’ve looked at many times in these SuperDataScience newsletters, but this forecast offers some very worrying statistics on the reality of the situation. It also offers a potential solution, which we should fully explore for the good of our industry.

[Click here to see the full picture!](

[ML Learns from Dog’s Noses to Smell Disease]( In brief: Inspired by the incredible olfactory senses of dogs, scientists have been developing and demonstrating different types of “electronic noses” that can sniff out things like cancer, nerve gases or explosives. Now MIT researchers have increased the sensitivity and capabilities of these devices by pairing them with ML to mimic the canine ability to interpret different scents.
The system can detect and identify tiny traces of molecules with a sensitivity 200 times greater than that of a dog’s nose, but interpreting what those molecules mean is where the technology had previously been lacking. To address this, the team built an ML algorithm that was trained on 50 urine samples taken from prostate cancer patients and a control group. By analysing the molecular differences and similarities in the air surrounding each set of samples, the program was able to recognise patterns of volatile organic compounds representative of the disease with high reliability.

Why this is important: In follow-up testing, the researchers say their system was able to match the accuracy of the dogs in identifying prostate cancer, both achieving a success rate of more than 70 percent. More research is needed to develop the system further, but the team hopes to one day compact the technology into a scent detector that can be built into smartphones.

[Click here to find out more!](

[SuperDataScience podcast]( In this week's [SuperDataScience Podcast](, Sinan Ozdemir joins us to discuss his work in conversational AI algorithms and the challenges of maintaining up-to-date chatbots, especially during a global pandemic.

---------------------------------------------------------------

What is the Data Science Insider? This email is a briefing of the week's most disruptive, interesting, and useful resources curated by the SuperDataScience team for Data Scientists who want to take their careers to the next level.

Want more conversations like this? Last year, we held our first-ever DSGO Virtual Conferences, where more than 3,500 data scientists gathered to learn, grow, and connect! If you missed them or want to repeat this fantastic experience, stay tuned for our upcoming virtual and in-person events that will take your DS career to the next level.
DSGO is your go-to place to elevate your technical skills, gain life-long career lessons from industry experts, and build lasting connections with data-driven peers. If you want to learn more and register for our future events, [click here](.

Know someone who would benefit from getting The Data Science Insider? Send them [this link to sign up.](

If you wish to stop receiving our emails or change your subscription options, please [Manage Your Subscription](
SuperDataScience Pty Ltd (ABN 91 617 928 131), 63 Blamey St., Kelvin Grove, QLD 4059, Australia