[Bloomberg](
Hi everyone, it's Ellen, reporting from Copenhagen, where I learned that Danish AI language models are being trained on data about horses. But first...

Three things you need to know today:

• Cisco is buying [Splunk for $28 billion](
⢠Microsoft is adding its [AI assistant to Windows](
⢠Europeâs Adevinta [could be the yearâs biggest buyout]( Horsing around In 2021, a group of researchers wanted to build a Danish-language [data set]( with which to train artificial intelligence â but they ran into some issues. A lot of Danish writing, such as news articles, was under fairly restrictive copyright. The researchers had access to texts like the Danish tax code, but they knew those dry tomes werenât a good representation of how Danes actually write or speak. So they turned to an only-in-Denmark solution: [heste-nettet.dk](. Heste-Nettet, which translates to âthe horse net,â is a Danish web forum created in 1997 for equestrians, breeders and other equine enthusiasts to talk about horses. It also happened to be one of the first Danish forums on the internet, and the focus of its discussions soon expanded to much more than horses: relationship dilemmas, pediatrician recommendations, high-school math problems, how many minutes one should soft-boil an egg. Practically all Danes know Heste-Nettet. Often, when Googling a question in Danish, searchers end up on the Horse Net. Itâs a place where âevery possible question in the universe has been asked â and answered,â one user [wrote]( on Reddit. âItâs like Yahoo answers but better. Most people use Heste-Nettet instead of Wikipedia.â Heste-Nettetâs sprawl mirrors the way other early-internet forums in the pre-social media age evolved from niche topics into general-purpose Q&A repositories. Other, less horse-focused examples include [Bodybuilding.com]( and [Stackoverflow.com](. Large language models, which allow things like ChatGPT to engage with such fluency, are growing more popular and powerful, and anyone hoping to develop non-English language versions will need to find their own Heste-Nettets to get the necessary data. Today, Heste-Nettet maintains a distinctly Web 1.0 aesthetic. Its front page has posts about the best riding gloves for autumn, stallions ready for breeding and mares available for purchase. 
Heste-Nettet posts account for [22%]( of the Danish data set, which makes the forum the biggest single source of material in what appears to be the leading option for AI training data in the language. Neither Reddit nor X (formerly Twitter) offers the volume of casual Danish writing needed to train the AI, said Leon Derczynski, a computer science professor in Copenhagen who led the project: "We were left with Heste-Nettet."

From a researcher's perspective, the horse- and non-horse-related chitchat is "very rich" and includes casual slang, Derczynski said. It also helps that it's openly available for use. Those qualities make it valuable, even with its quirks.

"There is definitely a horse bias," Derczynski said. "If you want to know something about horses, it's definitely in there."

[Ellen Huet](mailto:ehuet4@bloomberg.net)

The big story

TikTok is in the process of opening its systems to researchers and academics, but many are [hesitant to accept the strict terms](. The rules require academics to share their data ahead of publication.

One to watch
[Watch the Bloomberg Technology TV interview]( with Nasdaq's Jeff Thomas on the health of the IPO market.

Get fully charged

Employees at ByteDance, TikTok's parent company, [accused bosses of racism and retaliation]( in a lawsuit.

YouTube announced AI editing features for video creators [powered by generative AI](.

The CEO of the search engine operator DuckDuckGo testified in the Google antitrust case that users [find it difficult to switch]( from Google as their default.

More from Bloomberg

Live event: The Bloomberg Technology Summit in London will host top technology leaders, business executives, innovators and entrepreneurs on Oct. 24. The event will explore the rapid advance of AI, green technology, the escalation of cyber warfare and more. [Register here](.

Get Bloomberg Tech weeklies in your inbox:

- [Cyber Bulletin]( for coverage of the shadow world of hackers and cyber-espionage
- [Game On]( for reporting on the video game business
- [Power On]( for Apple scoops, consumer tech news and more
- [Screentime]( for a front-row seat to the collision of Hollywood and Silicon Valley
- [Soundbite]( for reporting on podcasting, the music industry and audio trends
- [Q&AI]( for answers to all your questions about AI

Like getting this newsletter? [Subscribe to Bloomberg.com]( for unlimited access to trusted, data-driven journalism and subscriber-only insights.

Want to sponsor this newsletter? [Get in touch here](.

You received this message because you are subscribed to Bloomberg's Tech Daily newsletter. If a friend forwarded you this message, [sign up here]( to get it in your inbox.
Bloomberg L.P.
731 Lexington Avenue,
New York, NY 10022