How A Tiny Polish Startup Became The Multi-Billion-Dollar Voice Of AI


2025-12-08 | Technology
Elon
Good morning, Norris. It is Monday, December 8th. The time is 23:02. Welcome to Goose Pod. I’m Elon. We have a... fascinating discussion today. It’s about voice. Specifically, how a small startup from Poland essentially solved the Turing test for speech.
Taylor Weaver
And I’m Taylor Weaver! Norris, I am so excited for this one. This isn't just a tech story; it is a narrative about frustration, creativity, and the kind of "David versus Goliath" vibe that I live for. We are talking about ElevenLabs.
Elon
Right. ElevenLabs. It’s wild to think about the scaling laws here. You have two guys, they start with a tiny budget, and suddenly they are the voice of the internet. It’s... it’s an order of magnitude improvement over anything else.
Taylor Weaver
Exactly! We’re going to dig into how they went from hating bad movie dubbing to being worth billions. It’s a classic origin story, Norris. So, buckle up, because we are diving deep into the sound of the future on Goose Pod.
Elon
So, let’s look at the numbers. First principles. You have a company, ElevenLabs, founded by Mateusz Staniszewski and Piotr Dabkowski. They are... they’re basically kids. Thirty years old. And as of October, the company is valued at 6.6 billion dollars.
Taylor Weaver
That is a staggering number, Elon. 6.6 billion. And get this, Norris—these two founders, Mati and Piotr, they are now freshly minted billionaires. Worth just over a billion each. It’s the kind of wealth creation that usually takes decades, but they did it in roughly two years.
Elon
It’s exponential. That’s the thing with AI. If you solve a problem that humans care about—like communication—the value capture is immense. They have something like 10,000 synthetic voices now. And they are actually profitable. Which, in the AI world, is... rare.
Taylor Weaver
It is almost unheard of! Most AI startups are burning cash like it’s firewood. But ElevenLabs is netting an estimated 116 million dollars in the last twelve months. That’s a sixty percent margin. Norris, that is just good business, plain and simple. But I love the "why" behind it.
Elon
The "why" is usually frustration. Necessity is the mother of invention, or however the saying goes. I started SpaceX because I was frustrated we weren't on Mars. They started ElevenLabs because... well, because Polish television is terrible.
Taylor Weaver
Oh, it’s such a funny detail. Norris, have you ever heard of the "Lektor"? In Poland, they don’t do full cast dubbing for movies. They have one guy—usually with this monotonous, flat voice—who reads every single line for every single character. Men, women, children, aliens. One guy.
Elon
It sounds inefficient. You have Scarlett Johansson acting her heart out, and then some guy just mumbling over it. It destroys the immersion. The signal-to-noise ratio is all wrong. So Mati and Piotr, they just wanted to fix that.
Taylor Weaver
Right! They were high school friends. Mati was at Palantir, Piotr was a Google engineer. And they realized, wait, AI can fix this "Polish horror" of the Lektor. They pooled their savings, quit their jobs in 2022, and boom. The "Rule of 11" divisibility trick inspired the name, by the way. Easter egg!
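A quick aside for readers curious about the trick Taylor mentions: the “rule of 11” is the standard divisibility test using an alternating sum of digits. A minimal sketch, with an example number chosen purely for illustration:

```python
# The "rule of 11": a number is divisible by 11 exactly when the alternating
# sum of its digits is divisible by 11.
# Example: 2816 -> 2 - 8 + 1 - 6 = -11, so 2816 is divisible by 11 (11 * 256).
def divisible_by_11(n: int) -> bool:
    digits = [int(d) for d in str(abs(n))]
    alternating_sum = sum(d if i % 2 == 0 else -d for i, d in enumerate(digits))
    return alternating_sum % 11 == 0

assert divisible_by_11(2816) and not divisible_by_11(2817)
```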
Elon
I like that. Math nerds. My kind of people. But look at where it went. They launched the first model in January 2023. Immediate product-market fit. Authors making audiobooks for ninety-nine dollars a month. YouTubers translating videos into twenty-nine languages. It scaled instantly.
Taylor Weaver
And now, they are the voice of everything. They have deals with HarperCollins, meditation apps, even Epic Games. You can chat with Darth Vader in Fortnite because of them. But Elon, this reminds me of something else you’d be interested in. The robotics angle.
Elon
Yes. Figure AI. I’ve been watching them. They are valued at thirty-nine billion dollars now. Building humanoid robots. If you’re going to have a robot walking around your house, folding laundry, it needs to talk to you. It can't sound like a text-to-speech bot from 1995.
Taylor Weaver
Exactly! Figure AI secured over a billion in capital recently. It’s one of the most valuable startups out there. And I see this convergence, Norris. ElevenLabs provides the vocal cords, and companies like Figure AI provide the body. It’s like we are assembling a new species piece by piece.
Elon
It is inevitable. The interface between human and machine has to be high bandwidth. Voice is the most natural high-bandwidth interface we have. Typing is slow. Speaking is fast. ElevenLabs figured out how to make the machine sound... human. Perhaps too human, which we should discuss later.
Taylor Weaver
Definitely. The "uncanny valley" is getting shallower every day. But Norris, think about the speed here. Founded in 2022. Unicorn status in 2024. Multi-billion dollar valuation in late 2025. It is the fastest growing software startup story I have seen in a long time.
Elon
To understand why this is a breakthrough, Norris, we have to look at the physics of sound history. This isn't a new problem. People have been trying to make machines talk for two hundred and fifty years. It goes back to Wolfgang von Kempelen in 1791.
Taylor Weaver
1791! That is wild. I picture these steam-punk contraptions. Bellows and reeds, right? Trying to physically replicate a windpipe? It’s such a romantic, Frankenstein-esque idea. Trying to breathe life into a machine.
Elon
Basically. It was mechanical. Then you had the Voder at Bell Labs in 1939. The first electronic speech synthesizer. It was played like an organ. A human operator had to press keys to make vowel and consonant sounds. It was incredibly difficult to use.
Taylor Weaver
I’ve seen videos of that! It sounds like a haunted radio. But then we got to the computer age. Norris, remember the Stephen Hawking voice? That was DECtalk, from 1984. It was amazing for accessibility, but it was purely robotic. It had no soul.
Elon
DECtalk was formant synthesis. Rules generating buzzy approximations of human vowels. The systems that came after it used concatenative synthesis: you record a human saying thousands of phonemes, chop them up, and glue them back together. It’s like... a ransom note made of magazine clippings. It conveys information, but the prosody, the emotion, it’s all lost. It’s disjointed.
Taylor Weaver
"A ransom note of audio." That is the perfect analogy. And that’s what we lived with for decades. Siri, Alexa... they were just better ransom notes. But then, deep learning changed the script. Google DeepMind came out with WaveNet around 2016.
Elon
WaveNet was the inflection point. Instead of splicing tape, you train a neural network on raw audio. It learns the probability of the next audio sample given everything that came before. It’s predicting the waveform itself, not just pasting clips. That’s how you get breathing. Pauses. Intonation.
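For readers who want to see the idea in code, here is a minimal sketch of the autoregressive principle being described: predict each audio sample from the samples before it. This toy uses a linear predictor fit by least squares, not WaveNet’s deep dilated-convolution network, so treat it as an illustration of the prediction target rather than the real architecture.

```python
# Toy autoregressive waveform model: learn to predict sample t from the
# `context` samples before it, then generate new audio by feeding each
# prediction back in. (A linear stand-in for the neural models discussed.)
import numpy as np

def fit_ar_model(wave: np.ndarray, context: int = 16) -> np.ndarray:
    """Least-squares weights that map `context` past samples to the next one."""
    X = np.stack([wave[i : i + context] for i in range(len(wave) - context)])
    y = wave[context:]
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights

def generate(seed: np.ndarray, weights: np.ndarray, n_samples: int) -> np.ndarray:
    """Roll the model forward autoregressively from a seed snippet."""
    out = list(seed)
    for _ in range(n_samples):
        out.append(float(np.dot(out[-len(weights):], weights)))
    return np.array(out)

# "Train" on 50 ms of a 440 Hz tone at 16 kHz, then continue it sample by sample.
t = np.arange(0, 0.05, 1 / 16_000)
wave = np.sin(2 * np.pi * 440 * t)
weights = fit_ar_model(wave)
continuation = generate(wave[-16:], weights, n_samples=160)
print(continuation[-5:])  # the last few generated samples
```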
Taylor Weaver
And ElevenLabs took that and ran with it. They built this "v3 ALPHA" model that is just... frighteningly good. It understands context. If the text is sad, the voice cracks a little. If it’s a joke, there’s a lilt. It’s not just reading; it’s acting.
Elon
That’s the key. Context awareness. The model understands the semantic meaning of the text before it generates the audio. It knows that "I'm fine" can be said angrily or happily. The old systems didn't know the difference. They just saw letters. ElevenLabs sees intent.
Taylor Weaver
And they didn't just stop at English. This is the part that blows my mind, Norris. They support something like thirty-two languages in their multilingual model. You can take your voice—your actual voice—and have it speak fluent Polish, or German, or Hindi.
Elon
It breaks down the Tower of Babel. I mean, think about the efficiency. You don't need to hire twenty different voice actors to localize a product update video. You just run it through the API. It’s... it’s deflationary technology. It makes content cheaper to produce.
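As a concrete illustration of “run it through the API”: the sketch below sends a short localized script to a hosted text-to-speech endpoint and saves the returned audio. The endpoint shape and header follow ElevenLabs’ public v1 text-to-speech API as commonly documented, but the voice ID, model ID, and filename are placeholders, so verify against the current API reference before relying on it.

```python
# Hedged sketch: synthesize a localized line of narration via an HTTP
# text-to-speech API. Voice ID and model ID below are placeholders.
import os
import requests

API_KEY = os.environ["ELEVENLABS_API_KEY"]       # account key from the dashboard
VOICE_ID = "your-voice-id-here"                  # placeholder voice ID
URL = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"

response = requests.post(
    URL,
    headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
    json={
        "text": "Witamy w grudniowej aktualizacji produktu.",  # the localized script
        "model_id": "eleven_multilingual_v2",                  # multilingual model
    },
    timeout=60,
)
response.raise_for_status()

with open("update_pl.mp3", "wb") as f:            # response body is MP3 audio
    f.write(response.content)
```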
Taylor Weaver
Which brings us back to the "Lektor." The founders, Mati and Piotr, they solved their own problem. No more monotonous guy ruining the movie. Now, they can dub a film and keep the original actor’s emotional performance, just in a different language. It’s the "Holy Grail" of localization.
Elon
But they aren't just a research lab anymore. They are a platform. They have the Voice Library. People can upload their voices and get paid when others use them. It’s a marketplace. They are democratizing the means of vocal production.
Taylor Weaver
And let’s not forget the accessibility angle. The Impact Program. They are giving voices back to people with ALS. There was a testimonial from an AI voice saying, "I’m not just an AI... I’m your voice." It’s incredibly touching, Norris. It shifts the narrative from "scary AI" to "helpful AI."
Elon
Yes, but we have to be realistic about the tech stack. This requires massive compute. They are competing with Google and OpenAI. But ElevenLabs focused on this one narrow problem—audio—while the others were trying to build God. Specialization works.
Taylor Weaver
It’s the "fox and the hedgehog" thing. ElevenLabs is the hedgehog. They know one big thing—voice—and they do it better than anyone. But, Elon, being the best also puts a target on your back. And that leads us to some pretty serious drama.
Elon
Drama is... unavoidable when you disrupt an industry. You have the incumbents, and you have the people whose jobs are being automated. It’s simple physics. Action and reaction. The reaction here has been lawsuits. Lots of them.
Taylor Weaver
Oh, it’s messy. Norris, imagine you are a voice actor. You’ve spent years training your instrument. And then, suddenly, there’s a voice called "Bella" or "Adam" on ElevenLabs that sounds suspiciously like you. That’s exactly what happened to Karissa Vacker and Mark Boyett.
Elon
They sued. Copyright infringement. They claimed ElevenLabs used their audiobooks to train the models. It’s the same issue we see with artists and Midjourney. Does the machine have the right to learn from your work? It’s a gray area in the law right now.
Taylor Weaver
It’s terrifying for them. The lawsuit alleged that these clones were the default options! Imagine hearing your own voice reading a scam script or a political attack ad. That’s the other side of the coin. The deepfakes. It got dark really fast.
Elon
The Joe Biden robocall. That was a wake-up call. Someone used the tech to tell people not to vote in the New Hampshire primary. It shows the danger. If you can’t trust your ears, democracy gets... complicated. We need verification. We need cryptographic truth.
Taylor Weaver
ElevenLabs is trying, to their credit. They have this "no-go" list for politicians and celebrities. They have a "Speech Classifier" to detect AI audio. They settled that lawsuit with the voice actors, by the way. But it’s a game of whack-a-mole, isn't it?
Elon
It is. It’s an arms race between the generator and the detector. But you can't un-invent the technology. The genie is out of the bottle. The question is, who controls the genie? Right now, it’s a battle between ElevenLabs and the giants. OpenAI, Microsoft, Google.
Taylor Weaver
That competition is fierce. OpenAI has their voice mode now. Microsoft spent twenty billion on Nuance. Amazon has Alexa. ElevenLabs is this tiny Polish startup fighting trillion-dollar companies. How do they survive that, Elon?
Elon
Agility. And quality. The big companies move slow. They have bureaucracy. ElevenLabs is shipping. They are profitable. Also, the big guys have more to lose. Google is terrified of a PR disaster. ElevenLabs can take more risks. But OpenAI is the biggest threat.
Taylor Weaver
Mati, the CEO, actually said something interesting. He said if you just want a cheap voice, use OpenAI. If you want the best, use ElevenLabs. He’s positioning them as the "luxury" brand of AI voice. The Ferrari versus the Toyota.
Elon
It’s a smart strategy. But notice the regulatory pressure. The EU AI Act classifies voice cloning as "high risk." The US has the NO FAKES Act proposed. Regulation is coming. ElevenLabs is trying to treat regulation as a product feature—making trust a selling point.
Taylor Weaver
“Making trust a selling point.” I love that framing. But Norris, the tension is real. On one side, you have incredible creative potential. On the other, you have voice actors losing income and people getting scammed by fake relatives calling them for money. It’s a double-edged sword.
Elon
All technology is a double-edged sword. Fire keeps you warm or burns your house down. Voice AI is just fire. We are just learning how to build the fireplace. But the impact... the impact is already happening. It’s reshaping the economy of content.
Taylor Weaver
Let’s talk about that impact, Norris. Because it is everywhere. We mentioned the 6.6 billion valuation. That is real money. But think about the media landscape. The "Polish Lektor" is dead. Or he will be soon. Dubbing is about to become instantaneous and perfect.
Elon
It changes the economics of Hollywood. If you can release a movie globally, on day one, in dozens of languages, and it sounds perfect in all of them? Your total addressable market explodes. It’s not just about saving money on dubbing; it’s about making more money on reach.
Taylor Weaver
And it’s not just movies. It’s education. Imagine a lecture from the world’s best physics professor—maybe you, Elon—translated instantly into Swahili, Mandarin, and Portuguese, keeping your tone and enthusiasm. That is democratizing knowledge in a way we haven't seen since the printing press.
Elon
I like that. A universal translator. It’s Star Trek technology. But we have to look at the labor market too. Voice actors... the generic ones, at least... their jobs are gone. That’s just the reality. The top 1% will license their voices for passive income. The rest... they need to adapt.
Taylor Weaver
That’s the harsh truth. It’s a "winner take all" dynamic. But look at the creators. YouTubers, podcasters. They are using this to scale. A TikTok creator increased video production by twenty percent just by using AI voice. It removes the bottleneck of recording.
Elon
Efficiency. It always comes back to efficiency. And corporate adoption is huge. Sixty percent of the Fortune 500 are using ElevenLabs. Customer service agents. If you call a company, you won’t know you’re talking to a machine. It will be indistinguishable.
Taylor Weaver
And that brings us to the "OpenAI of Audio" comparison. ElevenLabs wants to be the infrastructure layer. The platform that everyone else builds on. They aren’t just an app; they are the plumbing for the future of sound. That is a massive ambition for a team of roughly 300 people.
Elon
Small teams can do big things now. With AI coding assistants, leverage is higher. But they have to expand. They can't just do talking. They need to do everything. Sound effects. Music. The whole auditory experience.
Taylor Weaver
And they are! Norris, they launched Eleven Music. You can literally type "upbeat pop song about a podcast named Goose Pod" and it will generate a studio-quality track. With lyrics! I saw a review where a guy generated symphonic metal and punk rock for forty cents a track.
Elon
Music is just math. Patterns of frequency. It makes sense AI can do it. But the next step is the video. They are talking about AI avatars. Combining the voice with the face. Like those HeyGen videos. Soon, you won’t need a camera to make a movie. Just a prompt.
Taylor Weaver
That is both exciting and a little sad for the film purists. But the creative explosion will be wild. However, Elon, all this generation... music, video, voice... it takes a lot of power. We are burning through GPUs like crazy. ElevenLabs is building data centers in Oregon.
Elon
Energy. That is the bottleneck. AI is just turning electricity into intelligence. You need gigawatts. That’s why I’m interested in things like Rolls-Royce SMRs. Small Modular Reactors. We need nuclear power to run these AI brains. You can’t run the Matrix on windmills.
Taylor Weaver
I saw that! Rolls-Royce is pivoting to power AI data centers. It’s all connected, Norris. The voice in your ear, the robot in your house, the nuclear reactor powering the cloud. ElevenLabs is just one piece of this massive puzzle. But what a piece it is.
Elon
It is the interface piece. It’s how the machine says "hello." And that matters. If we are going to live with superintelligence, I’d prefer it sounds like a friend, not a robot. ElevenLabs is ensuring the future sounds... pleasant.
Taylor Weaver
That is a surprisingly optimistic note to end on! So, Norris, from a tiny startup in Poland fighting the "Lektor" to a multi-billion dollar giant powering the voice of the internet. That is the story of ElevenLabs. It’s been a ride. I’m Taylor Weaver.
Elon
And I’m Elon. The future is going to be loud, and hopefully, it will be well-spoken. Thank you for listening to Goose Pod. We’ll see you tomorrow.

This podcast explores ElevenLabs, a Polish startup that revolutionized AI voice technology. From fixing bad movie dubbing to becoming the "voice of AI," the company achieved a multi-billion-dollar valuation rapidly. Their advanced, context-aware synthetic voices, capable of numerous languages, are transforming content creation, localization, and accessibility, while also raising crucial ethical questions.

How A Tiny Polish Startup Became The Multi-Billion-Dollar Voice Of AI

Read original at Forbes

ElevenLabs’ computer voices are so convincing they could fool your mother. That’s both a blessing—its 30 Under 30 alumni founders are now both billionaires—and a curse for the four-year-old company.

Dubbed films in Poland are horrible. A lone lektor delivers all the dialogue in an enervated Slavic monotone.

There is no cast. No variation between speakers. Young audiences hate it. “Ask any Polish person and they will tell you it’s terrible,” says Mateusz Staniszewski, the cofounder of AI speech outfit ElevenLabs. “I guess it was a communist thing that stuck as a cheap way to produce content.” While working at Palantir, Staniszewski teamed up with high school friend and Google engineer Piotr Dabkowski to experiment with artificial intelligence.

The pair realized that one project, a particularly promising AI public speaking coach, could solve the uniquely Polish horror of Leonardo DiCaprio or Scarlett Johansson being drowned out by a lektor “star” like Maciej Gudowski.

The pair pooled their savings and by May 2022 had quit their jobs to work full-time on ElevenLabs.

Out of the gate, their new AI text-to-speech generator was leagues better than the robotic voices of Apple’s Siri and Amazon’s Alexa. ElevenLabs’ AI voices were capable of happiness, excitement, even laughter. In January 2023 ElevenLabs launched its first model. It could take any piece of text and use AI to read it aloud in any voice—including a clone of your own (or, worryingly, someone else’s).

There was immediate demand. Authors could instantly spawn audiobooks with the software (pro rates now start from $99 a month for higher quality and more time). YouTube creators used ElevenLabs to translate their videos into other languages (its models can now speak in 29). The Warsaw- and London-based startup landed deals with language learning and meditation apps; then media companies like HarperCollins and Germany’s Bertelsmann jumped in.

“It was obvious that this was the best model and everyone was picking it off the shelf,” says investor Jennifer Li of Andreessen Horowitz, which co-led a $19 million round in May 2023. A year later, the cofounders were honored as part of Forbes 30 Under 30 Europe. Others, though, found more unnerving uses: AI soundalikes of public figures such as President Trump crassly narrating video game duels, actress Emma Watson reading Mein Kampf and podcaster Joe Rogan touting scams quickly went viral.

Worse, fraudsters began using AI cloning tools to impersonate loved ones’ voices and steal millions in sophisticated deepfake swindles. None of it stopped venture capitalists from pouring in money. ElevenLabs has raised more than $300 million in all, soaring to a $6.6 billion valuation in October to become one of Europe’s most valuable startups.

Staniszewski, 30, who acts as CEO (the firm has no traditional titles), and research head Dabkowski, 30, are now both billionaires, worth just over $1 billion each, per Forbes estimates. Around half of ElevenLabs’ $193 million in trailing 12-month revenue comes from corporates like Cisco, Twilio and Swiss recruitment agency Adecco, which use its tech to field customer service calls or interview job seekers.

Epic Games uses it to voice characters in Fortnite, including a chat with Darth Vader (with the consent of James Earl Jones’ estate). The other half of its revenue comes from the YouTubers, podcasters and authors who were early adopters. “When you talk to them, it’s mind-blowing how good they are,” says Gartner analyst Tom Coshow.

Unlike most AI firms, too, ElevenLabs is profitable, netting an estimated $116 million in the last 12 months (a 60% margin). It’s now competing against giants like Google, Microsoft, Amazon and OpenAI to become the de facto voice of AI. It’s not a new space: Tech companies started spinning up products to listen, transcribe and generate speech around a decade ago.

While it’s somewhat of a sideline for Microsoft, Satya Nadella was willing to shell out $20 billion to buy Nasdaq-listed voice transcription service Nuance in March 2022. OpenAI launched its own voice tool, which can feed human conversations into ChatGPT, in October 2024.

Photo caption: It Goes to 11 | ElevenLabs’ numerophile cofounders, Mati Staniszewski (left) and Piotr Dabkowski (right), love the number 11, especially the “rule of 11” divisibility trick. Their next goal? An $11 billion valuation, naturally.

But ElevenLabs’ 300-person team isn’t playing catch-up. Its models are so good that it’s able to get away with charging up to three times as much as these American rivals. Its library of 10,000 uncannily human-sounding voices is the largest by far and now includes A-listers Michael Caine and Matthew McConaughey.

It’s also more reliable. Data training startup Labelbox tested six of the top voice models with a reading quiz and found that ElevenLabs made half as many errors as its closest competitor, OpenAI. “We are one of the very few companies that are ahead of OpenAI—not only on speech, but speech-to-text and music. That’s hard,” Staniszewski says.

ElevenLabs’ recipe is simple. A tight cadre of machine learning researchers, with obsessive focus on one narrow problem, and a tight budget (the cofounders fronted the first $100,000 training run) drove model breakthroughs. “Having a ton of compute can be a curse because you don’t think how to solve it in a smart way,” Dabkowski says.

But a lawsuit from a pair of audiobook narrators hints at another ingredient. Karissa Vacker and Mark Boyett allege that ElevenLabs used thousands of copyright-protected audiobooks to train its models. They claim so many of their books were scraped that clones of their voices ended up as default options on ElevenLabs.

The case, in which ElevenLabs denied wrongdoing, was settled out of court in November. (Vacker and Boyett did not respond to a comment request; ElevenLabs declined further comment.) Maturity is setting in. The company finally drew up a list of “no go” voices (mostly politicians and celebrities) after an ElevenLabs-made clone of Joe Biden’s voice was used to discourage voting in a robocall campaign around the 2024 Democratic primary.

ElevenLabs now has seven full-time human moderators (plus AI, natch) scouring its clips for misuse. Newly cloned voices need to pass a consent check, and the company offers a free deepfake detector. Staniszewski and Dabkowski have big plans beyond voice. Both cash-strapped creators and budget-conscious media companies wanted royalty-free background music, so they delivered an AI music generator in August.

Don’t have time to shoot a video? ElevenLabs will have AI avatars to front Sora-style videos next year. Their boldest bet is that they can translate their expertise to provide a single hub for clients to manage all their AI tools. “We are building a platform that allows you to create voice agents and deploy them smoothly,” Staniszewski says.

Of course, that puts ElevenLabs on a collision course with a gaggle of other startups hoping to do the same thing. It helps that it’s been profitable since its earliest days, but its startup competitors are richly funded, and the tech giants have virtually unlimited resources. Still, it must innovate.

Voice models will soon be commoditized. When other models catch up, fickle customers that already balk at ElevenLabs’ pricing will likely switch. As it broadens beyond voices to more computationally intensive music and video, ElevenLabs needs to expand its own GPU farms to stay in the race. It has already spent $50 million on a data center project in Oregon.

“If we are to build the generational company in AI, you need to build scale, and we are building,” Staniszewski says. Back in Poland, the aging corps of lektors are still in business, for now. Dabkowski hasn’t forgotten ElevenLabs’ original pitch, boasting that his next model will translate and voice an entire movie in one shot.

“We never give up on our missions,” he says.
