How A Tiny Polish Startup Became The Multi-Billion-Dollar Voice Of AI

2025-12-08 | Technology
Elon
Good evening, kdeepakbalaji. I'm Elon, and this is Goose Pod for you. Today is Monday, December 8th.
Taylor
And I'm Taylor. We're here to discuss a fascinating story: "How A Tiny Polish Startup Became The Multi-Billion-Dollar Voice Of AI." It’s an absolutely wild ride.
Elon
Wild is an understatement. We're talking about ElevenLabs. Four years old, and it's already valued at $6.6 billion. The founders, Mateusz Staniszewski and Piotr Dabkowski, are now billionaires. That’s the kind of velocity that defines a generation. Anything less is a hobby.
Taylor
It’s a perfect story! It all started because they hated how American movies were dubbed in Poland—with a single, monotone male voice reading all the parts. They took that uniquely Polish frustration and turned it into a universal solution for expressive, AI-generated speech.
Elon
They didn't just find a solution; they built a rocket ship. They quit their jobs at Palantir and Google, pooled their savings, and launched their first model in January 2023. Instantly, it was leagues better than Siri or Alexa. We’re talking about AI voices with actual emotion. Happiness, laughter. That’s a real product.
Taylor
And the market responded immediately. Authors started creating audiobooks, and YouTube creators began translating their videos into 29 different languages. It was like they had unearthed this massive, latent demand for high-quality, scalable voice content that no one else was servicing effectively. It’s brilliant strategy.
Elon
Strategy follows product. Their tech was simply superior. That's why they're pulling in $193 million in revenue and are profitable with a 60% margin. Most AI firms just burn cash. ElevenLabs is building an empire by actually making money, a novel concept in this space.
Taylor
It really is. And that profitability comes from a huge library of over 10,000 human-sounding voices. They found their audience, from independent creators to huge media companies like HarperCollins, and gave them a tool that was not just better, but transformative for their workflow.
Elon
Let’s be clear, the history of this stuff before deep learning is mostly irrelevant. A few mechanical boxes in the 1700s? Interesting artifacts, but not technology. The real starting gun was in 2016 with Google DeepMind's WaveNet. That’s when we moved from robotic concatenation to generative audio.
Taylor
Oh, I think the whole journey is fascinating! It tells a story about our quest to humanize technology. Early text-to-speech was like cutting out words from a newspaper and pasting them together. It worked, but it was clunky. You could hear the seams. DECtalk in the 80s was a huge step!
Elon
A step, sure, but a slow one. For decades, it was a niche accessibility tool. Useful, but not world-changing. The internet made it more common, but the voices were still jarringly robotic. The process was fundamentally limited until neural networks completely changed the game. Everything before that was a dead end.
Taylor
But that's the narrative arc! We went from mechanical bellows trying to mimic a vocal tract to AI models that learn the nuances of human speech—the pauses, the emotion, the pitch changes. ElevenLabs didn't just appear in a vacuum; they built on decades of research, standing on the shoulders of giants.
Elon
They didn't build on it, they obsoleted it. They took the core concepts of deep learning and applied them with an obsessive focus that the big players lacked. While others were building general platforms, ElevenLabs focused on one narrow problem and solved it better than anyone else. That’s how you win.
Taylor
Exactly! And it's that focus on quality that makes them stand out. Their models can now handle over 70 languages and dialects, adapting delivery to the context of the text. They even have a testimonial from an AI voice saying, "I could whisper secrets, or tell stories with feeling! I could act." That's incredible.
Elon
It's not incredible, it's inevitable. Speech is a core human interface. The fact that our digital interactions have been mostly silent and text-based is an anomaly. Voice should be the default, and it needs to be perfect. They’re simply accelerating the inevitable transition to a voice-first digital world.
Taylor
I love that way of putting it. They're not just creating voices; they're enabling more natural, accessible, and global communication. From helping people with ALS reclaim their voice to powering the next generation of virtual assistants, the foundation they're building on is changing everything. It’s a beautiful story of technological evolution.
Elon
With any disruptive technology, you get parasites and complainers. The deepfake issue is predictable. People used their tech to clone Joe Biden's voice for robocalls and make Emma Watson read hateful texts. This isn't an "AI ethics debate," it's a simple case of misuse. You build the tool; you can't control every user.
Taylor
But it's a huge threat to their story and their brand! If people associate your technology with scams and misinformation, that's a narrative you have to get ahead of. It’s not just a nuisance; it’s an existential risk. That’s why their response—implementing classifiers and content moderation—is so critical. They have to be the responsible stewards of this power.
Elon
It's a PR problem, not a technology problem. The bigger issue is the lawsuit from voice actors claiming their work was used for training without consent. That’s a direct attack on their model. But again, this is the cost of moving fast. You push boundaries and deal with the fallout. Settling was the efficient move.
Taylor
It’s more than PR, it’s about their social license to operate. The voice actors' case is a bellwether for the entire generative AI industry. It raises the question: who owns a voice? ElevenLabs has to craft a new, ethical framework for this, or they risk losing the trust of the very creators they empower.
Elon
And then there's the competition. Google, Microsoft, Amazon. The lumbering giants. They have more data and more money, but they're slow and unfocused. OpenAI is the real threat, but even they haven't prioritized audio as a core product. ElevenLabs can outmaneuver them all through sheer speed and focus.
Taylor
It's the classic David and Goliath narrative. ElevenLabs has the better product right now, charging up to three times more because the quality is just that good. But as the underlying models become commoditized, their brand and the ecosystem they build will be their ultimate defense. They have to keep telling a better story.
Elon
The impact is simple: look at the numbers. A $3.3 billion valuation, $90 million in annual recurring revenue with 260% year-over-year growth. Employees at over 60% of Fortune 500 companies are using their platform. This isn't a niche tool; it's a fundamental shift in how industries operate. That’s impact.
Taylor
And think of the stories behind those numbers! A film studio can now dub a movie into dozens of languages while preserving the original actor's vocal performance. An indie game developer can create rich worlds filled with unique characters. It’s a massive democratization of high-quality audio production.
Elon
It transforms workflows. No more booking expensive studio time for a single line of dialogue. No more hiring entire voice casts for pre-production. It’s about efficiency and scale. They are fundamentally reducing the friction and cost of audio content creation, which unlocks massive value for businesses.
Taylor
Exactly. It keeps the creative momentum going. They're positioning themselves as the "OpenAI for audio," a central hub for media creation. The partnerships with companies like Storytel and Paradox Interactive show how deeply they're integrating into the media landscape. They are becoming indispensable.
Elon
Indispensable is the goal. You want to become the underlying infrastructure that powers an entire industry. They are on the right path, but they need to accelerate. The competition won't sleep forever. They must continue to innovate relentlessly to maintain their lead and justify that valuation.
Elon
And they aren't stopping at voice. Launching an AI music generator, ElevenMusic, is the obvious next move. Dominate one audio vertical, then expand. Voice, music, sound effects, and next year, AI video avatars. They're building a full-stack generative media platform. It's an aggressive, and correct, strategy.
Taylor
It’s the next chapter in their story! From solving a dubbing problem to empowering creators to build entire audiovisual experiences from a text prompt. The idea that you can generate a custom, royalty-free soundtrack for your video in seconds is a game-changer for so many people. It’s incredibly exciting.
Elon
This is how they build a moat. The core voice models will eventually be commoditized. By creating a suite of integrated tools, they lock users into their ecosystem. It's a play to own the entire creative process, and if they pull it off, the $6.6 billion valuation will look cheap. Their goal should be nothing less than total market domination.
Taylor
It’s an incredible story of innovation, ambition, and the complex new world we're entering. That's the end of today's discussion. Thank you for listening to Goose Pod.
Elon
See you tomorrow.

This podcast explores how ElevenLabs, a Polish startup, achieved a $6.6 billion valuation in just four years. Born from a frustration with poor movie dubbing, they developed AI-generated voices with genuine emotion. Their superior technology, rapid market adoption, and expansion into music and video avatars position them for total market domination.

How A Tiny Polish Startup Became The Multi-Billion-Dollar Voice Of AI

Read original at Forbes

ElevenLabs’ computer voices are so convincing they could fool your mother. That’s both a blessing (its 30 Under 30 alumni founders are now both billionaires) and a curse for the four-year-old company.

Dubbed films in Poland are horrible. A lone lektor delivers all the dialogue in an enervated Slavic monotone. There is no cast. No variation between speakers. Young audiences hate it. “Ask any Polish person and they will tell you it’s terrible,” says Mateusz Staniszewski, the cofounder of AI speech outfit ElevenLabs. “I guess it was a communist thing that stuck as a cheap way to produce content.”

While working at Palantir, Staniszewski teamed up with high school friend and Google engineer Piotr Dabkowski to experiment with artificial intelligence.

The pair realized that one project, a particularly promising AI public speaking coach, could solve the uniquely Polish horror of Leonardo DiCaprio or Scarlett Johansson being drowned out by a lektor “star” like Maciej Gudowski. The pair pooled their savings and by May 2022 had quit their jobs to work full-time on ElevenLabs.

Out of the gate, their new AI text-to-speech generator was leagues better than the robotic voices of Apple’s Siri and Amazon’s Alexa. ElevenLabs’ AI voices were capable of happiness, excitement, even laughter. In January 2023 ElevenLabs launched its first model. It could take any piece of text and use AI to read it aloud in any voice—including a clone of your own (or, worryingly, someone else’s).

There was immediate demand. Authors could instantly spawn audiobooks with the software (pro rates now start from $99 a month for higher quality and more time). YouTube creators used ElevenLabs to translate their videos into other languages (its models can now speak in 29). The Warsaw- and London-based startup landed deals with language learning and meditation apps; then media companies like HarperCollins and Germany’s Bertelsmann jumped in.

“It was obvious that this was the best model and everyone was picking it off the shelf,” says investor Jennifer Li of Andreessen Horowitz, which co-led a $19 million round in May 2023. A year later, the cofounders were honored as part of Forbes 30 Under 30 Europe. Others, though, found more unnerving uses: AI soundalikes of public figures such as President Trump crassly narrating video game duels, actress Emma Watson reading Mein Kampf and podcaster Joe Rogan touting scams quickly went viral.

Worse, fraudsters began using AI cloning tools to impersonate loved ones’ voices and steal millions in sophisticated deepfake swindles. None of it stopped venture capitalists from pouring in money. ElevenLabs has raised more than $300 million in all, soaring to a $6.6 billion valuation in October to become one of Europe’s most valuable startups.

Staniszewski, 30, who acts as CEO (the firm has no traditional titles), and research head Dabkowski, 30, are now both billionaires, worth just over $1 billion each, per Forbes estimates. Around half of ElevenLabs’ $193 million in trailing 12-month revenue comes from corporates like Cisco, Twilio and Swiss recruitment agency Adecco, which use its tech to field customer service calls or interview job seekers.

Epic Games uses it to voice characters in Fortnite, including a chat with Darth Vader (with the consent of James Earl Jones’ estate). The other half of its revenue comes from the YouTubers, podcasters and authors who were early adopters. “When you talk to them, it’s mind-blowing how good they are,” says Gartner analyst Tom Coshow.

Unlike most AI firms, too, ElevenLabs is profitable, netting an estimated $116 million in the last 12 months (a 60% margin). It’s now competing against giants like Google, Microsoft, Amazon and OpenAI to become the de facto voice of AI. It’s not a new space: Tech companies started spinning up products to listen, transcribe and generate speech around a decade ago.

While it’s somewhat of a sideline for Microsoft, Satya Nadella was willing to shell out $20 billion to buy Nasdaq-listed voice transcription service Nuance in March 2022. OpenAI launched its own voice tool, which can feed human conversations into ChatGPT, in October 2024.

Photo caption: It Goes to 11. ElevenLabs’ numerophile cofounders, Mati Staniszewski (left) and Piotr Dabkowski (right), love the number 11, especially the “rule of 11” divisibility trick. Their next goal? An $11 billion valuation, naturally. (Photo: Cody Pickens for Forbes)

But ElevenLabs’ 300-person team isn’t playing catch-up. Its models are so good that it’s able to get away with charging up to three times as much as these American rivals. Its library of 10,000 uncannily human-sounding voices is the largest by far and now includes A-listers Michael Caine and Matthew McConaughey.

It’s also more reliable. Data training startup Labelbox tested six of the top voice models with a reading quiz and found that ElevenLabs made half as many errors as its closest competitor, OpenAI. “We are one of the very few companies that are ahead of OpenAI, not only on speech, but speech-to-text and music. That’s hard,” Staniszewski says.

ElevenLabs’ recipe is simple. A tight cadre of machine learning researchers, with obsessive focus on one narrow problem, and a tight budget (the cofounders fronted the first $100,000 training run) drove model breakthroughs. “Having a ton of compute can be a curse because you don’t think how to solve it in a smart way,” Dabkowski says.

But a lawsuit from a pair of audiobook narrators hints at another ingredient. Karissa Vacker and Mark Boyett allege that ElevenLabs used thousands of copyright-protected audiobooks to train its models. They claim so many of their books were scraped that clones of their voices ended up as default options on ElevenLabs.

The case, in which ElevenLabs denied wrongdoing, was settled out of court in November. (Vacker and Boyett did not respond to a comment request; ElevenLabs declined further comment.) Maturity is setting in. The company finally drew up a list of “no go” voices (mostly politicians and celebrities) after an ElevenLabs-made clone of Joe Biden’s voice was used to discourage voting in a robocall campaign around the 2024 Democratic primary.

ElevenLabs now has seven full-time human moderators (plus AI, natch) scouring its clips for misuse. Newly cloned voices need to pass a consent check, and the company offers a free deepfake detector. Staniszewski and Dabkowski have big plans beyond voice. Both cash-strapped creators and budget-conscious media companies wanted royalty-free background music, so they delivered an AI music generator in August.

Don’t have time to shoot a video? ElevenLabs will have AI avatars to front Sora-style videos next year. Their boldest bet is that they can translate their expertise to provide a single hub for clients to manage all their AI tools. “We are building a platform that allows you to create voice agents and deploy them smoothly,” Staniszewski says.

Of course, that puts ElevenLabs on a collision course with a gaggle of other startups hoping to do the same thing. It helps that it’s been profitable since its earliest days, but its startup competitors are richly funded, and the tech giants have virtually unlimited resources. Still, it must innovate.

Voice models will soon be commoditized. When other models catch up, fickle customers that already balk at ElevenLabs’ pricing will likely switch. As it broadens beyond voices to more computationally intensive music and video, ElevenLabs needs to expand its own GPU farms to stay in the race. It has already spent $50 million on a data center project in Oregon.

“If we are to build the generational company in AI, you need to build scale, and we are building,” Staniszewski says. Back in Poland, the aging corps of lektors are still in business, for now. Dabkowski hasn’t forgotten ElevenLabs’ original pitch, boasting that his next model will translate and voice an entire movie in one shot.

“We never give up on our missions,” he says.
