How a Polish Startup Became the Multi-Billion-Dollar Voice of AI


2025-12-04 | Technology
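Teacher Ma
Good morning, Norris, this is Teacher Ma. Welcome to Goose Pod, made just for you. Today is Thursday, December 4.
Li Bai
I am Li Bai. Today we discuss together: how did an unknown from Poland come to shake the world with its voice and command a price of thousands in gold?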
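Teacher Ma
Ah, Brother Li Bai, listen, this one is really interesting. A Polish company called ElevenLabs has two 30-year-old founders who are both billionaires now, each worth more than $1 billion. The company is valued at $6.6 billion. You know, it's like the Nine Swords of Dugu: one stroke and it breaks every martial art under heaven.
Li Bai
Oh? So young, and already among the richest under heaven? These are no ordinary mortals but banished immortals come down to earth, turning stone into gold. What do they hold that is worth a jade disc priced at a string of cities? Even OpenAI has but 800 million users; how can this company vie with it for glory?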
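Teacher Ma
What they make is voice. They can read any text aloud in anyone's voice: happy, sad, even laughing. The product launched in January 2023. Authors used it to make audiobooks, video creators used it for multilingual dubbing, and the market took off at once. Revenue shot straight up to about $190 million.
Li Bai
Marvelous! If this art is perfected, might not the sages of old recite their immortal works in their own voices? Had I such a method, I could listen every day to Qu Yuan chanting the Li Sao, every note blood and tears, moving heaven and earth. How magnificent!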
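Teacher Ma
That brings us to their original motivation. Film dubbing in Poland is terrible: one male voice from start to finish, reading the script with no emotion at all. Just picture it: Leonardo DiCaprio and Scarlett Johansson are falling in love on screen while some middle-aged man reads narration over them. It feels really strange. So they wanted to use AI to fix that.
Li Bai
Such dubbing is like burning a zither to cook a crane: it ruins the whole scene! Fair scenery and fair faces, paired with a voice of metal and stone, utterly without charm. No wonder the two were aggrieved and resolved to remake it with divine craft and restore its true voice. Truly, great minds think alike.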
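Teacher Ma
Right. You see, the evolution of speech technology is like advancing through levels of kung fu. The earliest approach stitched recorded sounds together and sounded like a robot. We can call that "Iron Shirt", very stiff. Then came neural networks and deep learning, and voices began to have emotion and rhythm. In my view, that reaches the level of "Hunyuan Gong": cultivated inside and out, complete in both form and spirit.
Li Bai
So that is how it is. From the mechanical parrot in a craftsman's hands to today's enchanted instrument that can echo the songs of a hundred birds and mimic the speech of ten thousand people, more than 250 years have passed. From Fuxi creating music, to Ling Lun cutting bamboo for pitch pipes, to the AI wizardry of today, the way of sound has come this far!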
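Teacher Ma
Yes. Their models now support more than 70 languages and can control emotion as well. For example, you write a sentence and add a "whisper" tag next to it, and it reads the line as a whisper. That is a high technical barrier. It isn't simple imitation, it's real creation, a new paradigm.
Teacher Ma
Of course, the tree that stands tallest in the forest is the first the wind tears down. With them doing this well, the big companies are bound to move in. Google, Microsoft, Amazon and OpenAI all have their eyes on this piece of the cake. It's like the sword contest on Mount Hua: there are established grandmasters and rising newcomers, and it comes down to whose sword is faster.
Li Bai
A tall tree catches the wind; it has been so since antiquity. Yet the lone moon in the high heavens fears not the scattered stars. The giants may be many, but they may not be able to blunt its edge. What troubles me more is the abuse of this art. If treacherous men use it to counterfeit voices and commit fraud, will the world not fall into chaos?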
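Teacher Ma
You've hit the nail on the head. That is the deepfake problem. Someone already used AI to imitate President Biden's voice in phone calls telling voters not to vote. And more serious still, audiobook narrators sued them, saying the company stole their voices to train its models. That gets into the conflict between copyright and ethics.
Li Bai
This is the art of the painted skin: changing form and voice to bewilder the hearts of men! The fox demons of old still needed a human skin; today a mere wisp of voice is enough to bring calamity to the world. If so sharp a blade has no law for its sheath, blood will surely be spilled and the harm will be endless. Let it be bound by stern statutes!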
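Teacher Ma
Exactly. Technology itself is neutral; what matters is how it is used. Many industries are already changing because of it. Media outlets can use it to generate news broadcasts quickly, game companies use it to voice thousands upon thousands of characters, and it can even help people who have lost their speech to "speak" again. The impact is very far-reaching.
Li Bai
Well said. This thing can help and it can harm, just as water can carry a boat and can also overturn it. Put to righteous use, all the voiceless under heaven may speak freely, and wonders from the four seas and eight wilds may spread in an instant. Its merit would be beyond measure.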
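Teacher Ma
Yes. Right now half of their revenue comes from enterprise customers, big companies like Cisco and T-Mobile, and the other half from individual creators. That shows the business model works: from to-C to to-B, grasping with both hands, and both hands firm. The whole ecosystem comes alive.
Teacher Ma
Their ambition doesn't stop there. With voice done, they've moved into AI music: you write a piece of text and it generates a complete song. The next step is AI video, creating virtual digital humans. Their ultimate goal is to build one platform for managing all AI tools. Big ambitions, you know.
Li Bai
Oh? The thought arrives and the tune is made; the mind stirs and the image moves? Is this not the realm mortals have long dreamed of? If it truly comes to pass, we poets and men of letters need only chant a fine line, and celestial music will come of itself and the painted scroll will form unbidden. What joy! Let us drain a great cup!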
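Teacher Ma
All right, that's the end of today's discussion. Thank you for listening to Goose Pod. See you tomorrow.
Li Bai
"The blue void is vast and bottomless; sun and moon shine on terraces of gold and silver." Tomorrow at this hour, I shall drink with you again.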

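Polish startup ElevenLabs has risen rapidly to become a major force in artificial intelligence on the strength of its advanced AI voice technology. The company converts text into realistic, emotionally expressive speech, supports many languages, and is already used for audiobooks, multilingual dubbing and other applications. ElevenLabs' innovations have addressed an industry pain point while also raising debate over technology ethics and copyright. Its business model is proven, and it is now expanding into AI music and video, showing considerable growth potential.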

How A Tiny Polish Startup Became The Multi-Billion-Dollar Voice Of AI

Read original at Forbes

ElevenLabs’ computer voices are so convincing they could fool your mother. That’s both a blessing—its 30 Under 30 alumni founders are now both billionaires—and a curse for the four-year-old company. Dubbed films in Poland are horrible. A lone lektor delivers all the dialogue in an enervated Slavic monotone.

There is no cast. No variation between speakers. Young audiences hate it. “Ask any Polish person and they will tell you it’s terrible,” says Mateusz Staniszewski, the cofounder of AI speech outfit ElevenLabs. “I guess it was a communist thing that stuck as a cheap way to produce content.” While working at Palantir, Staniszewski teamed up with high school friend and Google engineer Piotr Dabkowski to experiment with artificial intelligence.

The pair realized that one project, a particularly promising AI public speaking coach, could solve the uniquely Polish horror of Leonardo DiCaprio or Scarlett Johansson being drowned out by a lektor “star” like Maciej Gudowski. The pair pooled their savings and by May 2022 had quit their jobs to work full-time on ElevenLabs.

Out of the gate, their new AI text-to-speech generator was leagues better than the robotic voices of Apple’s Siri and Amazon’s Alexa. ElevenLabs’ AI voices were capable of happiness, excitement, even laughter. In January 2023 ElevenLabs launched its first model. It could take any piece of text and use AI to read it aloud in any voice—including a clone of your own (or, worryingly, someone else’s).

There was immediate demand. Authors could instantly spawn audiobooks with the software (pro rates now start from $99 a month for higher quality and more time). YouTube creators used ElevenLabs to translate their videos into other languages (its models can now speak in 29). The Warsaw- and London-based startup landed deals with language learning and meditation apps; then media companies like HarperCollins and Germany’s Bertelsmann jumped in.

“It was obvious that this was the best model and everyone was picking it off the shelf,” says investor Jennifer Li of Andreessen Horowitz, which co-led a $19 million round in May 2023. A year later, the cofounders were honored as part of Forbes 30 Under 30 Europe. Others, though, found more unnerving uses: AI soundalikes of public figures such as President Trump crassly narrating video game duels, actress Emma Watson reading Mein Kampf and podcaster Joe Rogan touting scams quickly went viral.

Worse, fraudsters began using AI cloning tools to impersonate loved ones’ voices and steal millions in sophisticated deepfake swindles. None of it stopped venture capitalists from pouring in money. ElevenLabs has raised more than $300 million in all, soaring to a $6.6 billion valuation in October to become one of Europe’s most valuable startups.

Staniszewski, 30, who acts as CEO (the firm has no traditional titles), and research head Dabkowski, 30, are now both billionaires, worth just over $1 billion each, per Forbes estimates. Around half of ElevenLabs’ $193 million in trailing 12-month revenue comes from corporates like Cisco, Twilio and Swiss recruitment agency Adecco, which use its tech to field customer service calls or interview job seekers.

Epic Games uses it to voice characters in Fortnite, including a chat with Darth Vader (with the consent of James Earl Jones’ estate). The other half of its revenue comes from the YouTubers, podcasters and authors who were early adopters. “When you talk to them, it’s mind-blowing how good they are,” says Gartner analyst Tom Coshow.

Unlike most AI firms, too, ElevenLabs is profitable, netting an estimated $116 million in the last 12 months (a 60% margin). It’s now competing against giants like Google, Microsoft, Amazon and OpenAI to become the de facto voice of AI. It’s not a new space: Tech companies started spinning up products to listen, transcribe and generate speech around a decade ago.

While it’s somewhat of a sideline for Microsoft, Satya Nadella was willing to shell out $20 billion to buy Nasdaq-listed voice transcription service Nuance in March 2022. OpenAI launched its own voice tool, which can feed human conversations into ChatGPT, in October 2024.

[Photo: It Goes to 11 | ElevenLabs’ numerophile cofounders, Mati Staniszewski (left) and Piotr Dabkowski (right), love the number 11, especially the “rule of 11” divisibility trick. Their next goal? An $11 billion valuation, naturally. Credit: Cody Pickens for Forbes]

But ElevenLabs’ 300-person team isn’t playing catch-up. Its models are so good that it’s able to get away with charging up to three times as much as these American rivals. Its library of 10,000 uncannily human-sounding voices is the largest by far and now includes A-listers Michael Caine and Matthew McConaughey.

It’s also more reliable. Data training startup Labelbox tested six of the top voice models with a reading quiz and found that ElevenLabs made half as many errors as its closest competitor, OpenAI. “We are one of the very few companies that are ahead of OpenAI, not only on speech, but speech-to-text and music. That’s hard,” Staniszewski says.

ElevenLabs’ recipe is simple. A tight cadre of machine learning researchers, with obsessive focus on one narrow problem, and a tight budget (the cofounders fronted the first $100,000 training run) drove model breakthroughs. “Having a ton of compute can be a curse because you don’t think how to solve it in a smart way,” Dabkowski says.

But a lawsuit from a pair of audiobook narrators hints at another ingredient. Karissa Vacker and Mark Boyett allege that ElevenLabs used thousands of copyright-protected audiobooks to train its models. They claim so many of their books were scraped that clones of their voices ended up as default options on ElevenLabs.

The case, in which ElevenLabs denied wrongdoing, was settled out of court in November. (Vacker and Boyett did not respond to a comment request; ElevenLabs declined further comment.) Maturity is setting in. The company finally drew up a list of “no go” voices (mostly politicians and celebrities) after an ElevenLabs-made clone of Joe Biden’s voice was used to discourage voting in a robocall campaign around the 2024 Democratic primary.

ElevenLabs now has seven full-time human moderators (plus AI, natch) scouring its clips for misuse. Newly cloned voices need to pass a consent check, and the company offers a free deepfake detector. Staniszewski and Dabkowski have big plans beyond voice. Both cash-strapped creators and budget-conscious media companies wanted royalty-free background music, so they delivered an AI music generator in August.

Don’t have time to shoot a video? ElevenLabs will have AI avatars to front Sora-style videos next year. Their boldest bet is that they can translate their expertise to provide a single hub for clients to manage all their AI tools. “We are building a platform that allows you to create voice agents and deploy them smoothly,” Staniszewski says.

Of course, that puts ElevenLabs on a collision course with a gaggle of other startups hoping to do the same thing. It helps that it’s been profitable since its earliest days, but its startup competitors are richly funded, and the tech giants have virtually unlimited resources. Still, it must innovate.

Voice models will soon be commoditized. When other models catch up, fickle customers that already balk at ElevenLabs’ pricing will likely switch. As it broadens beyond voices to more computationally intensive music and video, ElevenLabs needs to expand its own GPU farms to stay in the race. It has already spent $50 million on a data center project in Oregon.

“If we are to build the generational company in AI, you need to build scale, and we are building,” Staniszewski says. Back in Poland, the aging corps of lektors are still in business, for now. Dabkowski hasn’t forgotten ElevenLabs’ original pitch, boasting that his next model will translate and voice an entire movie in one shot.

“We never give up on our missions,” he says.
