Newsletter Subject

Horse-based AI

From

bloombergbusiness.com

Email Address

noreply@mail.bloombergbusiness.com

Sent On

Fri, Sep 22, 2023 11:05 AM

Email Preheader Text

Hi everyone, it’s Ellen, reporting from Copenhagen, where I learned that Danish AI language mod

Hi everyone, it’s Ellen, reporting from Copenhagen, where I learned that Danish AI language models are being trained on data about horses. But first...

Three things you need to know today:

• Cisco is buying Splunk for $28 billion
• Microsoft is adding its AI assistant to Windows
• Europe’s Adevinta could be the year’s biggest buyout

Horsing around

In 2021, a group of researchers wanted to build a Danish-language data set with which to train artificial intelligence, but they ran into some issues. A lot of Danish writing, such as news articles, was under fairly restrictive copyright. The researchers had access to texts like the Danish tax code, but they knew those dry tomes weren’t a good representation of how Danes actually write or speak.

So they turned to an only-in-Denmark solution: heste-nettet.dk. Heste-Nettet, which translates to “the horse net,” is a Danish web forum created in 1997 for equestrians, breeders and other equine enthusiasts to talk about horses. It also happened to be one of the first Danish forums on the internet, and the focus of its discussions soon expanded to much more than horses: relationship dilemmas, pediatrician recommendations, high-school math problems, how many minutes one should soft-boil an egg.

Practically all Danes know Heste-Nettet. Often, when Googling a question in Danish, searchers end up on the Horse Net. It’s a place where “every possible question in the universe has been asked, and answered,” one user wrote on Reddit. “It’s like Yahoo answers but better. Most people use Heste-Nettet instead of Wikipedia.”

Heste-Nettet’s sprawl mirrors the way other early-internet forums in the pre-social media age evolved from niche topics into general-purpose Q&A repositories. Other, less horse-focused examples include Bodybuilding.com and Stackoverflow.com. Large language models, which allow things like ChatGPT to engage with such fluency, are growing more popular and powerful, and anyone hoping to develop non-English language versions will need to find their own Heste-Nettets to get the necessary data.

Today, Heste-Nettet maintains a distinctly Web 1.0 aesthetic. Its front page has posts about the best riding gloves for autumn, stallions ready for breeding and mares available for purchase.

Heste-Nettet posts account for 22% of the Danish data set, which makes it the biggest single source of material in what appears to be the leading option for AI training data in the language. Neither Reddit nor X (formerly Twitter) offers the volume of casual Danish writing needed to train the AI, said Leon Derczynski, a computer science professor in Copenhagen who led the project: “We were left with Heste-Nettet.”

From a researcher’s perspective, the horse- and non-horse-related chitchat is “very rich” and includes casual slang, Derczynski said. It also helps that it’s openly available for use. Those qualities make it valuable, even with its quirks.

“There is definitely a horse bias,” Derczynski said. “If you want to know something about horses, it’s definitely in there.”

[Ellen Huet](mailto:ehuet4@bloomberg.net)

The big story

TikTok is in the process of opening its systems to researchers and academics, but many are hesitant to accept the strict terms. The rules require academics to share prepublished data.

One to watch

Watch the Bloomberg Technology TV interview with Nasdaq’s Jeff Thomas on the health of the IPO market.

Get fully charged

Employees at ByteDance, TikTok’s parent company, accused bosses of racism and retaliation in a lawsuit.

YouTube announced AI editing techniques for video creators utilizing generative AI.

The CEO of the search engine operator DuckDuckGo testified in the Google antitrust case that users find it difficult to switch from Google as their default.

More from Bloomberg

Live event: The Bloomberg Technology Summit in London will host top technology leaders, business executives, innovators and entrepreneurs on Oct. 24. The event will explore the rapid advance of AI, green technology, the escalation of cyber warfare and more.

Get Bloomberg Tech weeklies in your inbox:

- Cyber Bulletin for coverage of the shadow world of hackers and cyber-espionage
- Game On for reporting on the video game business
- Power On for Apple scoops, consumer tech news and more
- Screentime for a front-row seat to the collision of Hollywood and Silicon Valley
- Soundbite for reporting on podcasting, the music industry and audio trends
- Q&AI for answers to all your questions about AI

You received this message because you are subscribed to Bloomberg's Tech Daily newsletter.

Bloomberg L.P. 731 Lexington Avenue, New York, NY 10022

Email Content Statistics

Subject Line Length

Data show that subject lines of 6 to 10 words generate a 21 percent higher open rate.

Number of Words

The more words in the content, the more time the user will need to spend reading. Get straight to the point with catchy short phrases and interesting photos and graphics.

Number of Images

More images, or larger images, may cause the email to load more slowly. Aim for a balance of words and images.

Time to Read

Longer reading time requires more attention and patience from users. Aim for short phrases and catchy keywords.

Spam Score

Spam score is determined by a large number of checks performed on the content of the email. For the best delivery results, it is advised to lower your spam score as much as possible.

Flesch reading score

The Flesch reading score measures how complex a text is: the lower the score, the harder the text is to read. It is calculated from the average sentence length (in words) and the average number of syllables per word. Text with a very high score (about 100) is straightforward and easy to read, with short sentences and no words of more than two syllables. A score of 60-70 is usually considered acceptable for web copy.
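The calculation described above can be sketched in a few lines. This is a minimal illustration using the commonly published Flesch reading ease constants (206.835, 1.015 and 84.6). The page does not say which exact formula or syllable-counting method it uses, so the constants, the function name `flesch_reading_ease` and the sample counts are assumptions, and the caller is expected to supply the word, sentence and syllable counts.

```python
def flesch_reading_ease(total_words: int, total_sentences: int, total_syllables: int) -> float:
    """Return a Flesch reading ease score from raw counts.

    Assumes the standard published constants; how words, sentences and
    syllables are counted is left to the caller.
    """
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    avg_sentence_length = total_words / total_sentences      # words per sentence
    avg_syllables_per_word = total_syllables / total_words   # syllables per word
    return 206.835 - 1.015 * avg_sentence_length - 84.6 * avg_syllables_per_word


# Hypothetical counts: 100 words in 5 sentences with 150 syllables.
score = flesch_reading_ease(total_words=100, total_sentences=5, total_syllables=150)
print(round(score, 1))
```

With these made-up counts the score lands just below the 60-70 band mentioned above. Shorter sentences or shorter words would push it higher.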

Technologies

What powers this email? Every email we receive is parsed to determine the sending ESP and any additional email technologies used.

Copyright © 2019–2024 SimilarMail.