How A Tiny Polish Startup Became The Multi-Billion-Dollar Voice Of AI

2025-12-09 | Technology
Morgan
Good morning, kizzlah. It is a pleasure to be with you. I am Morgan, and this is Goose Pod. Today is Wednesday, December 10th. As the year winds down, I’ve often found that we tend to look back at the voices that defined our time. Today, we are looking at a literal voice. Or rather, ten thousand of them.
Meryl
And I am Meryl. It is lovely to be here with you, kizzlah. We are discussing something that sits right at the intersection of art and algorithm. It is the story of how a tiny Polish startup, born out of frustration with bad movie dubbing, became the multi-billion-dollar voice of artificial intelligence. It is quite the drama.
Morgan
It certainly is. We are talking about ElevenLabs. If you have been on the internet in the last year, kizzlah, you have heard them. You just might not know it. This company has skyrocketed to a valuation of 6.6 billion dollars as of October. It is a figure that carries a certain weight, doesn't it?
Meryl
It is staggering. And to think, the founders, Mateusz Staniszewski and Piotr Dabkowski, are just thirty years old. Freshly minted billionaires. It is the kind of rapid ascent that usually happens in fairytales, or perhaps in Silicon Valley, though this story starts in Warsaw.
Morgan
Indeed. And the catalyst for this 6.6 billion dollar empire was not a desire for wealth, initially. It was annoyance. I have always believed that frustration is the mother of invention. You see, in Poland, they have a unique way of watching foreign films. They do not use a full cast of voice actors.
Meryl
Oh, the infamous *lektor*. I have heard about this. It is an absolute horror for anyone who loves cinema. Imagine, kizzlah, you are watching a passionate scene with Scarlett Johansson or Leonardo DiCaprio, full of emotion and nuance. And then, over it all, there is just one man. One monotonous, Slavic man reading every single line.
Morgan
Precisely. It is a relic of the communist era, a cheap way to produce content that just stuck. Staniszewski called it a "Polish horror." He and Dabkowski, who were high school friends, realized that AI could solve this. They wanted to replace that monotone narrator with voices that actually had soul. Voices that could laugh, whisper, and scream.
Meryl
And they succeeded wildly. They launched their first model in January 2023. It wasn't just better than the *lektor*; it was leagues ahead of Siri or Alexa. Those older assistants sound like they are reading a grocery list. ElevenLabs voices sound like they are reading poetry. They have distinct personalities.
Morgan
That quality difference is key. They now have a library of ten thousand uncannily human-sounding voices. They support twenty-nine languages. It is not just about translation anymore; it is about emotional translation. And the market responded. They have generated roughly 193 million dollars in revenue over the past twelve months. They are profitable, which is a rarity in the AI world these days.
Meryl
It is the "Rule of 11," isn't it? The founders love the number eleven. They even want an eleven billion dollar valuation next. But beyond the numbers, kizzlah, think about the artistic implication. Authors can now spawn audiobooks instantly. YouTubers can dub their videos into Spanish or German with a click. It is democratizing, but it is also terrifyingly fast.
Morgan
Speed is the currency of our time. But what strikes me is the sheer scale of adoption. They have half their revenue coming from massive corporates like Cisco and Epic Games. They are voicing characters in Fortnite. They even facilitated a chat with Darth Vader, with the blessing of James Earl Jones' estate. That connects the past to the future in a profound way.
Meryl
It does. But it also opens Pandora's box, which we will get to. But for now, just picture two young men in Poland, annoyed that they couldn't hear the real emotion in a movie, deciding to build a brain that understands human speech better than any computer before it. It is a marvelous beginning.
Morgan
To truly understand why ElevenLabs is such a breakthrough, kizzlah, we must look back. I've often found that technology is a long conversation across centuries. The quest to make machines speak did not start with computers. It started with bellows and reeds. We have to go back to 1791.
Meryl
Wolfgang von Kempelen. The Acoustic-Mechanical Speech Machine. It sounds like something from a steampunk novel. It was a contraption of wood and leather, wasn't it? You had to squeeze bellows to pump air, like a set of artificial lungs.
Morgan
Exactly. It mimicked the human vocal tract. It could produce crude words, but it was mechanical. It had no soul, only physics. Then we moved to the electronic era. In 1939, Bell Labs unveiled the VODER at the World's Fair. A woman had to play it like a keyboard to make it speak. It was incredibly difficult to master.
Meryl
It sounds exhausting. Imagine having to perform every syllable like a piano concerto. But then came the digital age. We got those robotic voices we all know. The famous "Stephen Hawking" voice came from the same family of synthesizers as DECtalk, which arrived in 1984. It was intelligible, but it was cold. It was clearly a machine.
Morgan
That coldness came from the method. Systems like DECtalk built speech from hand-tuned acoustic rules, and the generation that followed used concatenative synthesis. Essentially, they recorded a human reading the dictionary, chopped the recordings into tiny sounds, and glued them back together. It is like trying to assemble a photograph of a sunset by cutting up magazines. You get the shapes, but the lighting is all wrong.
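To make the cut-and-paste idea concrete, here is a toy sketch of concatenative synthesis: prerecorded units are looked up and spliced end to end with no model of prosody, which is exactly why the joins sound flat. The unit names and placeholder tones below are invented for illustration; a real system would draw on thousands of recorded diphones.

```python
import numpy as np

SAMPLE_RATE = 16_000

def _tone(freq_hz: float, seconds: float = 0.1) -> np.ndarray:
    """Placeholder waveform standing in for a recorded speech unit."""
    t = np.linspace(0.0, seconds, int(SAMPLE_RATE * seconds), endpoint=False)
    return 0.3 * np.sin(2 * np.pi * freq_hz * t)

# Toy "unit inventory" keyed by invented diphone names.
UNIT_INVENTORY = {
    "h-e": _tone(220.0),
    "e-l": _tone(247.0),
    "l-o": _tone(262.0),
}

def concatenate_units(unit_sequence: list[str]) -> np.ndarray:
    """Glue prerecorded units end to end; no model of emotion or pacing."""
    return np.concatenate([UNIT_INVENTORY[u] for u in unit_sequence])

audio = concatenate_units(["h-e", "e-l", "l-o"])
print(audio.shape)  # (4800,) -- three 0.1 s units at 16 kHz
```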
Meryl
That is a wonderful analogy. It lacks flow. It lacks the "glue" of emotion. ElevenLabs does something entirely different. They use neural networks. They aren't gluing sounds together; the AI is dreaming the sound. It understands context. It knows that "read" in the past tense sounds different from "read" in the present tense.
Morgan
Precisely. This shift really began with Google's WaveNet in 2016, which showed that deep learning could generate raw audio waveforms. But ElevenLabs took that concept and refined it with an obsessive focus. They didn't just want accuracy; they wanted performance. They introduced "audio tags" for emotional control. You can tell the AI to be "happy" or "excited."
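For a concrete picture of what those audio tags look like in a request, here is a minimal sketch against ElevenLabs' text-to-speech REST endpoint, with tags embedded inline in the text. The voice ID placeholder, the model identifier, and the specific tag names are illustrative assumptions rather than verified values; the current API documentation is the authority on all of them.

```python
import requests

API_KEY = "YOUR_ELEVENLABS_API_KEY"   # placeholder credential
VOICE_ID = "YOUR_VOICE_ID"            # placeholder voice identifier

# Inline audio tags (e.g. [whispers], [excited]) steer delivery in the newer
# models; the exact tag vocabulary and model name are assumptions here.
payload = {
    "text": "[whispers] I have a secret. [excited] And you will not believe it!",
    "model_id": "eleven_v3",
}

response = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
    json=payload,
    timeout=60,
)
response.raise_for_status()

# The endpoint returns raw audio bytes (MP3 by default).
with open("whisper_demo.mp3", "wb") as f:
    f.write(response.content)
```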
Meryl
And that is where the art comes in. Kizzlah, think about the nuance of a whisper. A machine that just concatenates sounds cannot whisper effectively because it doesn't understand the *intent* of a whisper. ElevenLabs' v3 model understands that a whisper implies secrecy or intimacy. It adjusts the breath, the pacing, the tone.
Morgan
This capability has exploded the market. We are seeing a projection where the AI voice generator market could reach over 55 billion dollars by 2033. It is no longer just for accessibility, though that is vital. ElevenLabs has an Impact Program that helps people with ALS regain their voice. That is a beautiful use of technology.
Meryl
It is. They are giving identity back to those who lost it. But they are also expanding globally. Their multilingual models support twenty-nine languages. You can speak English, and the AI can translate your voice—your actual timbre and pitch—into Polish, German, or Hindi. It breaks down the Tower of Babel.
Morgan
And they are doing this with a relatively small team. Just around 300 people. Compare that to the legions at Google or Microsoft. It reminds me that focus often beats resources. They didn't try to build everything at once. They just wanted to fix that terrible Polish dubbing.
Meryl
It is the classic artist's obsession. Perfect the detail, and the whole picture comes into focus. But, of course, when you create a tool that can mimic anyone, anywhere, instantly... you are also creating a weapon. And that is where the story gets complicated.
Morgan
Indeed. The technology is neutral, but human nature is not. As they say, when you invent the ship, you also invent the shipwreck. The rise of ElevenLabs has not been without its storms.
Meryl
The storms have been quite severe, actually. We must talk about the dark side of this brilliance. Kizzlah, imagine waking up and hearing your own voice saying things you never said. Horrible things. That is exactly what happened almost immediately after ElevenLabs launched.
Morgan
It was chaotic. The internet, being what it is, immediately tested the boundaries. Users created AI clones of Emma Watson reading *Mein Kampf*. They had Joe Rogan promoting scams. They even simulated President Biden discouraging people from voting in the New Hampshire primary. It was a wake-up call for democracy, frankly.
Meryl
It is chilling. It strips the artist, or the public figure, of their agency. But it is not just the famous who are affected. There is a heartbreaking case involving voice actors. People like Karissa Vacker and Mark Boyett, who have dedicated their lives to the craft of audiobook narration.
Morgan
Yes, the lawsuit. Vacker and Boyett, both audiobook narrators, sued ElevenLabs. They alleged that the company used thousands of copyright-protected audiobooks to train its models. They claimed their voices—their livelihoods—were scraped and turned into default options called "Bella" and "Adam".
Meryl
Imagine that. You spend years refining your vocal instrument, your pacing, your character work, only to have a machine digest it and spit out a clone that competes with you for ninety-nine dollars a month. It is an existential crisis for the profession. They settled out of court recently, but the wound remains.
Morgan
It raises a profound question about ownership. Does a human own the *style* of their speech? The courts are still wrestling with this. ElevenLabs has tried to pivot to safety. They introduced a "no-go" list of celebrities and politicians you cannot clone. They have human moderators now, scouring clips for misuse.
Meryl
They are trying to put the genie back in the bottle, or at least put a leash on it. They offer a deepfake detector now. But the damage to trust is hard to undo. And then there is the competition. They are not just fighting lawsuits; they are fighting giants. OpenAI, Google, Microsoft.
Morgan
It is a David and Goliath situation, though David is now worth six billion dollars. OpenAI has vast resources. They have their own voice tools now. Microsoft spent twenty billion dollars on Nuance. ElevenLabs has to stay ahead on quality because they cannot win on raw compute power.
Meryl
And the open-source community is catching up too. There are hundreds of models on Hugging Face. Voice is becoming a commodity. ElevenLabs has to be the luxury brand, the "Hermès" of voice, if you will. If they drop in quality, the cheaper options will eat them alive.
Morgan
Staniszewski knows this. He said, "If we are to build the generational company in AI, you need to build scale." They are building data centers in Oregon. They are spending millions on GPUs. It is an arms race. But the ethical cloud hangs heavy. Every time they improve the realism, they also improve the potential for fraud.
Meryl
It is a tightrope walk. They want to be the "Universal Translator," the voice of the world, but they risk being the voice of disinformation. It is a dramatic tension that I don't think will resolve anytime soon. The artists are angry, the regulators are watching, and yet, the valuation keeps climbing.
Morgan
Money often flows where the ethics are murkiest. But we cannot deny the utility. The same tool that scares the actors is being used by major corporations to revolutionize how they work. The impact is already here, and it is reshaping industries.
Morgan
Let's look at that impact, kizzlah. Beyond the headlines and the scandals, the economic shift is undeniable. ElevenLabs has essentially created a new layer in the media stack. They are working with 60% of the Fortune 500. That is not just experimentation; that is integration.
Meryl
It is efficiency, ruthless and brilliant. Think about publishing. Before, turning a book into an audiobook cost thousands of dollars and took weeks of studio time. Now, an author can do it for a fraction of the cost in an afternoon. HarperCollins and Bertelsmann are already on board. It opens the door for indie authors, certainly.
Morgan
It does. It democratizes the format. I've often found that removing barriers to entry creates a flood of creativity. But it also disrupts the labor market. What happens to the mid-tier voice actor? The one who makes a living doing corporate training videos or local radio spots? Those jobs are evaporating.
Meryl
They are going the way of the painted portrait after the camera was invented. The high art will remain, I think. People will still want Meryl Streep to read a novel. But for the utility work? The AI is simply too good and too cheap. And look at the localization aspect. This is where my heart flutters a bit.
Morgan
The ability to take a video in English and have it speak Spanish, German, or Japanese, while retaining the original speaker's voice? It is profound. It means a YouTuber in Nebraska can have a massive audience in Tokyo. It means knowledge becomes borderless. ElevenLabs is positioning itself as the "OpenAI for audio."
Meryl
It is the end of the subtitle, perhaps. Or at least the end of that terrible Polish *lektor*. We have come full circle. But the impact goes deeper into the fabric of reality. When we can no longer trust our ears, we lose a primary sense. We have to verify everything. "Is this really my grandson calling for bail money?" It changes how we interact with the world.
Morgan
That is the societal tax we pay for this convenience. We are trading trust for capability. And financially, the company is reaping the rewards. They are generating roughly 193 million dollars in trailing twelve-month revenue and still growing at a furious pace. Investors are betting that the utility outweighs the risk. They see a future where every app, every game, every website talks to you.
Meryl
And talks to you *well*. Not like a robot, but like a friend. That is the seductive part. We are lonely creatures, kizzlah. An AI that speaks with empathy, even simulated empathy, is a powerful drug. ElevenLabs has bottled that. They are selling connection, really.
Morgan
A synthetic connection. But perhaps for many, that is enough. The impact is that silence is becoming expensive. The world is about to get very, very noisy, in every language imaginable.
Meryl
So where does this noise lead us? The future isn't just about speaking anymore. ElevenLabs is moving into music. They have launched ElevenMusic. You can type in a prompt—"punk rock song about a coffee cup"—and it generates a studio-quality track. I have heard some of it. It is shockingly competent.
Morgan
It is the next logical step. If you can model speech, you can model melody. They are building a "universal engine" for sound. They are also looking at video. They plan to introduce AI avatars next year to front videos. So you won't just hear the voice; you will see a face moving in sync with it.
Meryl
That puts them in direct competition with companies like HeyGen. But if they can integrate it all—voice, sound effects, music, and visuals—into one platform, they become the ultimate studio. A single creator could make a blockbuster movie from their bedroom. It is the dream of the *auteur*, realized by machine.
Morgan
"The Studio in a Box." It is a compelling vision. But they face a hurdle: commoditization. As models get cheaper and open-source gets better, can they maintain that premium status? They are betting on their "generational company" status. They are building massive infrastructure to stay ahead of the curve.
Meryl
They have to run fast. The tech giants are waking up. But I am fascinated by their "Universal Translator" goal. Dabkowski hasn't forgotten that original mission. He wants to translate and voice an entire movie in one shot. No more dubbing actors, no more subtitles. Just pure, translated understanding.
Morgan
It would be a new Tower of Babel, but this time, everyone understands each other. Whether that leads to harmony or just more noise remains to be seen. But one thing is certain: the era of the robotic voice is over. The machines have learned to feel, or at least, to sound like they do.
Morgan
We have covered a great deal today, kizzlah. From the frustration of Polish movie nights to a 6.6 billion dollar valuation. ElevenLabs has changed how we hear the world. I am Morgan, and I hope you found some wisdom in the noise.
Meryl
And I am Meryl. It has been a delight to explore this with you. The line between human and machine is blurring, but at least it sounds beautiful. That is the end of today's discussion. Thank you for listening to Goose Pod. See you tomorrow.

A Polish startup, ElevenLabs, born from frustration with bad movie dubbing, revolutionized AI voices. Their neural network technology creates uncannily human-sounding speech, achieving a $6.6 billion valuation. While democratizing content creation and aiding accessibility, the technology also raises ethical concerns about deepfakes and job displacement.

How A Tiny Polish Startup Became The Multi-Billion-Dollar Voice Of AI

Read original at Forbes

ElevenLabs’ computer voices are so convincing they could fool your mother. That’s both a blessing—its 30 Under 30 alumni founders are now both billionaires—and a curse for the four-year-old company.

Dubbed films in Poland are horrible. A lone lektor delivers all the dialogue in an enervated Slavic monotone.

There is no cast. No variation between speakers. Young audiences hate it. “Ask any Polish person and they will tell you it’s terrible,” says Mateusz Staniszewski, the cofounder of AI speech outfit ElevenLabs. “I guess it was a communist thing that stuck as a cheap way to produce content.” While working at Palantir, Staniszewski teamed up with high school friend and Google engineer Piotr Dabkowski to experiment with artificial intelligence.

The pair realized that one project, a particularly promising AI public speaking coach, could solve the uniquely Polish horror of Leonardo DiCaprio or Scarlett Johansson being drowned out by a lektor “star” like Maciej Gudowski.

The pair pooled their savings and by May 2022 had quit their jobs to work full-time on ElevenLabs.

Out of the gate, their new AI text-to-speech generator was leagues better than the robotic voices of Apple’s Siri and Amazon’s Alexa. ElevenLabs’ AI voices were capable of happiness, excitement, even laughter. In January 2023 ElevenLabs launched its first model. It could take any piece of text and use AI to read it aloud in any voice—including a clone of your own (or, worryingly, someone else’s).

There was immediate demand. Authors could instantly spawn audiobooks with the software (pro rates now start from $99 a month for higher quality and more time). YouTube creators used ElevenLabs to translate their videos into other languages (its models can now speak in 29). The Warsaw- and London-based startup landed deals with language learning and meditation apps; then media companies like HarperCollins and Germany’s Bertelsmann jumped in.

“It was obvious that this was the best model and everyone was picking it off the shelf,” says investor Jennifer Li of Andreessen Horowitz, which co-led a $19 million round in May 2023. A year later, the cofounders were honored as part of Forbes 30 Under 30 Europe. Others, though, found more unnerving uses: AI soundalikes of public figures such as President Trump crassly narrating video game duels, actress Emma Watson reading Mein Kampf and podcaster Joe Rogan touting scams quickly went viral.

Worse, fraudsters began using AI cloning tools to impersonate loved ones’ voices and steal millions in sophisticated deepfake swindles. None of it stopped venture capitalists from pouring in money. ElevenLabs has raised more than $300 million in all, soaring to a $6.6 billion valuation in October to become one of Europe’s most valuable startups.

Staniszewski, 30, who acts as CEO (the firm has no traditional titles), and research head Dabkowski, 30, are now both billionaires, worth just over $1 billion each, per Forbes estimates. Around half of ElevenLabs’ $193 million in trailing 12-month revenue comes from corporates like Cisco, Twilio and Swiss recruitment agency Adecco, which use its tech to field customer service calls or interview job seekers.

Epic Games uses it to voice characters in Fortnite, including a chat with Darth Vader (with the consent of James Earl Jones’ estate). The other half of its revenue comes from the YouTubers, podcasters and authors who were early adopters. “When you talk to them, it’s mind-blowing how good they are,” says Gartner analyst Tom Coshow.

Unlike most AI firms, too, ElevenLabs is profitable, netting an estimated $116 million in the last 12 months (a 60% margin). It’s now competing against giants like Google, Microsoft, Amazon and OpenAI to become the de facto voice of AI. It’s not a new space: Tech companies started spinning up products to listen, transcribe and generate speech around a decade ago.

While it’s somewhat of a sideline for Microsoft, Satya Nadella was willing to shell out $20 billion to buy Nasdaq-listed voice transcription service Nuance in March 2022. OpenAI launched its own voice tool, which can feed human conversations into ChatGPT, in October 2024.

It Goes to 11: ElevenLabs’ numerophile cofounders, Mati Staniszewski and Piotr Dabkowski, love the number 11, especially the “rule of 11” divisibility trick. Their next goal? An $11 billion valuation, naturally.

But ElevenLabs’ 300-person team isn’t playing catch-up. Its models are so good that it’s able to get away with charging up to three times as much as these American rivals. Its library of 10,000 uncannily human-sounding voices is the largest by far and now includes A-listers Michael Caine and Matthew McConaughey.

It’s also more reliable. Data training startup Labelbox tested six of the top voice models with a reading quiz and found that ElevenLabs made half as many errors as its closest competitor, OpenAI. “We are one of the very few companies that are ahead of OpenAI—not only on speech, but speech-to-text and music. That’s hard,” Staniszewski says.

ElevenLabs’ recipe is simple. A tight cadre of machine learning researchers, with obsessive focus on one narrow problem, and a tight budget (the cofounders fronted the first $100,000 training run) drove model breakthroughs. “Having a ton of compute can be a curse because you don’t think how to solve it in a smart way,” Dabkowski says.

But a lawsuit from a pair of audiobook narrators hints at another ingredient. Karissa Vacker and Mark Boyett allege that ElevenLabs used thousands of copyright-protected audiobooks to train its models. They claim so many of their books were scraped that clones of their voices ended up as default options on ElevenLabs.

The case, in which ElevenLabs denied wrongdoing, was settled out of court in November. (Vacker and Boyett did not respond to a comment request; ElevenLabs declined further comment.) Maturity is setting in. The company finally drew up a list of “no go” voices (mostly politicians and celebrities) after an ElevenLabs-made clone of Joe Biden’s voice was used to discourage voting in a robocall campaign around the 2024 Democratic primary.

ElevenLabs now has seven full-time human moderators (plus AI, natch) scouring its clips for misuse. Newly cloned voices need to pass a consent check, and the company offers a free deepfake detector. Staniszewski and Dabkowski have big plans beyond voice. Both cash-strapped creators and budget-conscious media companies wanted royalty-free background music, so they delivered an AI music generator in August.

Don’t have time to shoot a video? ElevenLabs will have AI avatars to front Sora-style videos next year. Their boldest bet is that they can translate their expertise to provide a single hub for clients to manage all their AI tools. “We are building a platform that allows you to create voice agents and deploy them smoothly,” Staniszewski says.

Of course, that puts ElevenLabs on a collision course with a gaggle of other startups hoping to do the same thing. It helps that it’s been profitable since its earliest days, but its startup competitors are richly funded, and the tech giants have virtually unlimited resources. Still, it must innovate.

Voice models will soon be commoditized. When other models catch up, fickle customers that already balk at ElevenLabs’ pricing will likely switch. As it broadens beyond voices to more computationally intensive music and video, ElevenLabs needs to expand its own GPU farms to stay in the race. It has already spent $50 million on a data center project in Oregon.

“If we are to build the generational company in AI, you need to build scale, and we are building,” Staniszewski says. Back in Poland, the aging corps of lektors are still in business, for now. Dabkowski hasn’t forgotten ElevenLabs’ original pitch, boasting that his next model will translate and voice an entire movie in one shot.

“We never give up on our missions,” he says.
