Note: We’re hosting an event in our San Francisco office for the most ambitious founders, scientists, and investors interested in building the voice revolution. If that’s you, register to join here.
Every time we imagine a computer in science fiction, it has a voice. Jarvis. C-3PO. Samantha from Her. Humanoid or not, our idea of a futuristic machine is one that speaks to us.
Voice is how humans naturally communicate. It is how we prefer to share (and overshare). In other words, voice has always been poised to become the dominant way we interact with computers. Late 2024 marked a fundamental shift—the infrastructure needed to build compelling voice experiences finally became accessible to startups.
After years of fragmented, expensive voice tech, we now have off-the-shelf solutions. This is enabling startups to use voice as a modality to solve new customer challenges.
So, which applications will usher in the new voice-first paradigm? Here’s what we’re seeing.
Both the market and the technology are now ready for Voice AI.
The market has been ready since the caveman days, ever since we first began sharing information in informal speech.
But to unlock speech as a usable interface, we needed systems that could understand not just words, but the full range of qualitative signals humans embed in conversation—emotion, intent, tone, context.
Incumbents spent decades laying this foundation. They poured billions into text-to-speech and speech recognition, advancing the field to the point where machines could recognize spoken words and respond—if only in stiff, canned ways.
That foundational R&D set the stage. But what’s changed—and what made late 2024 a turning point—is the convergence of three breakthroughs:
The conversation experience has crossed a critical threshold. Sub-300ms round-trip times and interruptible speech make conversations feel genuinely human. Anything over 2-3 seconds now feels “too slow,” a clear signal that user expectations have fundamentally shifted. We expect voices to sound and feel human.
Startups can now access these capabilities through cloud APIs rather than building them from scratch.
Unlike early voice assistants that required hand-crafting responses, modern voice AI systems leverage existing LLMs like GPT-4, Claude, or Llama. Startups don’t need to build reasoning capabilities from scratch—they can combine voice APIs with existing language models to better understand unstructured input and generate responses.
The economics of voice AI have transformed entirely. What once required massive infrastructure investments can now be implemented at a fraction of the cost. API pricing has plummeted, with quality voice services now accessible at pennies per minute.
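The second breakthrough above, pairing voice APIs with an existing language model, can be sketched as a single conversational turn. This is a minimal illustration, not any vendor's actual API: `transcribe`, `generate_reply`, and `synthesize` are hypothetical stand-ins for a speech-to-text service, an LLM call, and a text-to-speech service. The 300 ms budget is taken from the round-trip figure mentioned earlier.

```python
import time
from typing import NamedTuple


class TurnResult(NamedTuple):
    speech: bytes        # synthesized audio to play back to the caller
    elapsed_ms: float    # measured round-trip time for this turn
    within_budget: bool  # True if the turn met the latency budget


# Hypothetical stubs: in a real system each would call a cloud API.
def transcribe(audio: bytes) -> str:
    """Stand-in for a speech-to-text service."""
    return audio.decode("utf-8")


def generate_reply(transcript: str) -> str:
    """Stand-in for an LLM call that reasons over the transcript."""
    return f"You said: {transcript}"


def synthesize(text: str) -> bytes:
    """Stand-in for a text-to-speech service."""
    return text.encode("utf-8")


def handle_turn(audio: bytes, budget_ms: float = 300.0) -> TurnResult:
    """Run one speech -> text -> LLM -> speech turn and time it."""
    start = time.perf_counter()
    transcript = transcribe(audio)
    reply = generate_reply(transcript)
    speech = synthesize(reply)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return TurnResult(speech, elapsed_ms, elapsed_ms <= budget_ms)


if __name__ == "__main__":
    result = handle_turn(b"book me a table for two")
    print(result.speech.decode("utf-8"), result.within_budget)
```

The point of the sketch is the shape of the system: the startup writes the glue and the latency check, while the three hard components are rented rather than built.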
Like the early days of mobile apps or web applications, we’re about to see an explosion of experimentation. Builders will begin to investigate what applications truly work outside the lab, in the market.
Voice will create a generation of companies built on vocal interaction in robust natural language. These companies will use voice as data, voice as a means for interacting with software, voice as a wedge into previously untapped markets, and voice as a catalyst for creating entirely new markets.
We’re seeing several on-ramps for companies focusing on voice. There are great companies building in each of these sections, but it’s still really early days:
In our April 2024 essay on the AI Workforce, we described how “AI is turning software into labor.” Software doesn’t just enable human workers, but is now doing the work itself. Voice AI is yet another step towards that “AI as labor” future.
B2B voice AI applications are an extension of this thesis. We are seeing them grow incredibly fast because they provide high and measurable ROI.
Many current voice AI companies show heavy concentration in call center optimization and sales automation—areas where businesses immediately understand the value proposition, and can easily quantify it. Customer service, sales calls, and internal training represent low-hanging fruit where AI voice replacement is already happening – bringing costs down, while improving outcomes.
Companies like Smith.ai exemplify this approach, providing AI-powered customer service agents specifically for SMBs. Their platform can respond to customer calls and messages 24/7, with high accuracy at a fraction of the cost.
Numeo AI is another interesting example. Numeo AI’s voice agent is already outperforming humans in the logistics domain. Their AI carrier agents negotiate freight rates more effectively because they make decisions faster, have access to more data points, and can operate without emotional biases.
B2B buyers understand unit economics. When you can replace a call center agent who costs $40,000 annually with software that costs $4,000 annually and works 24/7, the math is simple.
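The arithmetic behind that comparison can be made explicit. The sketch below uses the two annual figures from the paragraph above; the 40-hour human work week is our own illustrative assumption, added only to show how 24/7 availability widens the gap on a per-hour basis.

```python
HUMAN_ANNUAL_COST = 40_000     # from the text: call center agent, USD per year
SOFTWARE_ANNUAL_COST = 4_000   # from the text: voice AI software, USD per year

WEEKS_PER_YEAR = 52
HUMAN_HOURS_PER_WEEK = 40      # assumption: a standard full-time week
SOFTWARE_HOURS_PER_WEEK = 24 * 7  # "works 24/7"


def annual_savings(human: float, software: float) -> float:
    """Absolute yearly savings from swapping one agent for software."""
    return human - software


def cost_ratio(human: float, software: float) -> float:
    """How many times cheaper the software is per year."""
    return human / software


def cost_per_available_hour(annual_cost: float, hours_per_week: float) -> float:
    """Yearly cost spread over the hours the agent can actually take calls."""
    return annual_cost / (hours_per_week * WEEKS_PER_YEAR)


if __name__ == "__main__":
    print(annual_savings(HUMAN_ANNUAL_COST, SOFTWARE_ANNUAL_COST))  # 36000
    print(cost_ratio(HUMAN_ANNUAL_COST, SOFTWARE_ANNUAL_COST))      # 10.0
    print(round(cost_per_available_hour(HUMAN_ANNUAL_COST, HUMAN_HOURS_PER_WEEK), 2))
    print(round(cost_per_available_hour(SOFTWARE_ANNUAL_COST, SOFTWARE_HOURS_PER_WEEK), 2))
```

On annual cost alone the software is 10x cheaper. Under the 40-hour assumption, the cost per hour of availability drops from roughly $19 to under $0.50.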
Companies here will gain market share fast. But there will be heavy competition. Companies building here need to be moving fast right now.
In some cases, voice is actually the perfect wedge to enter markets that may be hard to crack. As we’ve discussed in our Stackable Business Models essay, the initial voice solution is often just the first step in a much larger platform play.
The wedge strategy works like this:
That said, this approach requires a clear “act two” from day one. Founders must have a master plan for expansion beyond their initial voice solution to build defensibility once competitors inevitably arrive. The most successful companies in this space may use voice as the entry point to become workflow automation companies, data analytics providers, full-stack customer engagement platforms, or all of the above.
Back to Numeo AI as an example. They begin by automating routine trucking dispatch calls with their VoiceFlow product – an intelligent voice agent that handles broker communications, rate negotiations, and load status updates. This voice functionality serves as their wedge into the logistics back office.
Voice works as a Trojan horse because phone calls remain at the center of many trucking logistics operations. The medium fits the message.
The prosumer segment represents the bridge between B2B efficiency and consumer delight. Here, voice becomes an interface that unlocks better, more nuanced experiences, rather than just replacing labor. And, in some cases, it democratizes access to services that were previously considered inaccessible or premium in the pre-AI days.
These may be voice-native experiences that already existed, like career counseling, for example, but can now be created at scale (and economically) using AI.
An interesting example in this space is Boardy AI, which acts as a super career counselor – using its network to help people find new opportunities. All you have to do is hop on a call so Boardy can understand your context in detail. This is a super-powered headhunter, at scale. The human equivalent simply doesn’t exist at this price point.
This on-ramp heavily leverages the multi-layered data gathering aspects of voice AI to create better core services. People are far more willing to provide detailed or personal information in conversations, especially private conversations, than in text. There are good reasons for this:
(Real-time data is uniquely valuable. In fact, we believe it is the only type of data we’ve seen that is capable of sustaining a network effect. More on that thinking here, and we will be writing more about this in the coming weeks.)
TL;DR: voice enables new types of qualitative data collection we’ve never unlocked at scale. This creates opportunities for AI applications that use conversational interfaces to gather better information in real time and open up new markets.
The third on-ramp is only for the real visionaries.
The most transformative applications will emerge when voice AI develops what we call “soul”: the ability to be not just efficient but delightful. This software understands and adapts to users fluidly, anticipating emotions and needs.
Companies that make this “soul” their core value proposition are poised to deliver completely new user experiences we can’t even conceive of right now.
An early example of this is our portco Autograph, a company building the “digital after-life platform”—capturing life stories through effortless weekly phone calls, then turning those recordings into a conversational, voice-cloned legacy that families can access forever.
Pre-voice AI, we also saw early glimpses of this theme with Character AI, which achieved massive adoption not through utility but through emotional connection. But there are far more paradigms to unlock here, especially if you add on voice as a capability.
Voice AI with soul will excel in areas requiring emotional intelligence, like therapy, education, and companionship.
Foundational technologies are already emerging—companies like Hume AI are building empathic language models that understand tone, voice modulation, and emotional subtleties. We’ve yet to see many app layer companies truly leveraging emotionally intelligent, voice-first technology to solve problems and create value.
Based on our analysis of the current voice AI landscape and market gaps, here are the key areas where we see the most promising opportunities:
Most voice AI companies are still building horizontal solutions. They’re more focused on getting vocal AI up and running across broad use cases.
Horizontal rollout has been a dominant theme for AI in general, one that we believe is beginning to shatter – see our thinking in our essay, The Verticalization of Everything. Voice AI may follow similar trends. While today’s big winners are ElevenLabs or other horizontal voice agents, there is still so much white space to build vertical-specific voice solutions.
When evaluating vertical opportunities for voice AI, we look for these characteristics:
The sweet spot for voice AI applications right now remains high-volume, repetitive tasks where 80% accuracy is acceptable. Appointment scheduling, initial sales qualification, and basic customer support queries represent enormous markets where voice AI can deliver 10x better unit economics than human alternatives.
We expect AI to expand into more complex use cases very soon. But this is an area where Voice AI is essentially plug-and-play.
This is still perhaps the biggest gap in the AI voice space to date. Whether it’s voice-guided fitness instruction, personalized meditation, or interactive storytelling, there’s significant white space for voice AI that prioritizes experience over efficiency.
Companies that truly understand how to create novel experiences using voice as the primary modality stand to create entirely new markets.
As voice AI becomes more prevalent, we’re entering uncharted security territory. “Vishing” (voice phishing) is already emerging as a major threat, with AI-generated voices becoming indistinguishable from real humans. The companies built to defend against these threats represent a massive, underexplored market.
We’re tracking several companies in this space, including startups working on voice authentication and fraud detection. This represents both a significant risk and an enormous opportunity—as voice interfaces proliferate, the market for securing them will grow exponentially.
The real competitive advantage in voice AI will be understanding customer problems and designing solutions where a conversation feels like a magical uplift. There are a few key advantages founders can leverage right now:
Domain Expertise: The most successful voice AI companies will be built by people who deeply understand their customers’ pain points and can design better experiences. You have to understand why the voice solution is 100x better than existing software and build accordingly.
Obsessive Focus on PMF: Because underlying voice technologies improve at the speed of foundational models, founders must be relentless about finding and perfecting PMF before competitors catch up. You have to keep adding value to your service beyond technical performance.
Vision for Voice-First Experiences: We’re interested in founders who aren’t just adding voice to existing products but reimagining what’s possible when voice becomes the primary interface.
Understanding of When Voice is Product vs. Wedge: Is voice your core product and primary value proposition? Or is it your wedge to gain initial traction before expanding into a broader platform?
This distinction shapes everything from go-to-market strategy to future product roadmaps. Both can work, but they require different strategies from Day 1.
Voice-as-wedge requires a clear “Act Two” plan. What will you use voice to unlock, and how will you build from there?
Voice-as-product demands sustainable differentiation beyond the voice technology itself. What new service, experience, or data are you leveraging using voice, and how will you continue iterating?
The voice AI revolution is just beginning. The infrastructure is ready, the models are capable, and the first wave of applications is proving that voice can deliver genuine value.
For founders willing to dig deep into specific problems and build with relentless focus on PMF, this is a crucible moment.
If you’re building the Jarvis for XYZ, or imagining what it would be like to spill your soul to an AI companion, we see you. We have built a community of people who are invigorated by this moment, and ready to build.
If that’s you, join us at our SF event by registering here.