Voice Is the New OS: Getting Ready for the AI-First World - Part 2

September 7, 2021

PART-II: What the Future Holds

Advanced voice technology will soon be ubiquitous, as natural and intelligent UI integrates seamlessly into our daily lives.

Within the next four years, 50% of all searches will be made through either images or speech.

The human voice will be a primary interface for the smart and connected home, providing a natural means to communicate with kitchen appliances, alarm systems, sound systems, lights, and more. More and more car manufacturers will adopt intelligent, voice-driven systems for entertainment and location-based search, keeping drivers’ and passengers’ eyes and hands free. Audio and video entertainment systems will respond to naturally spoken requests for content discovery. Voice-controlled devices will also dominate workplaces that require hands-free mobility, such as hospitals, warehouses, laboratories, and production plants.

After 40-odd years of development, voice recognition has nearly reached its zenith. Intelligent assistants (IAs) can now recognize speech effectively. With groundbreaking advances in artificial intelligence, we are finally overcoming challenges that once seemed insurmountable by training and adapting systems through machine learning.

It’s all about Natural Language Understanding (NLU)

Natural language processing, the field of computing and AI concerned with making computers understand and speak the natural languages of humans, has come a long way in recent years.

Voice interfaces were once the stuff of science fiction. Recent advances in language understanding, machine learning, computer vision, and speech recognition have made them far more practical, making it easier to communicate with the devices around us.

I wrote earlier about UI trends and how quickly we are moving toward a Zero UI environment.

Anatomy of a Voice Interaction and NLU

For every application, building a voice interaction and NLU model is a structured process that involves:

Creating a custom knowledge graph from the database or website
Training NLU models to understand and interpret queries
Optimizing machine learning algorithms to identify and perform the correct action
Integrating into client applications on all major platforms
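
The four steps above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the catalog entries, intent names, and hand-written patterns are all made up, and the regular expressions stand in for what would normally be a trained NLU model.

```python
import re

# Step 1: a toy "knowledge graph" built from an imaginary product catalog.
# Every entity and attribute here is made up for illustration.
KNOWLEDGE_GRAPH = {
    "pizza": {"type": "menu_item", "price": 12.50},
    "salad": {"type": "menu_item", "price": 8.00},
}

# Step 2: stand-in for a trained NLU model. Real systems learn these
# patterns from labeled utterances; here they are hand-written regexes.
INTENT_PATTERNS = {
    "order_item": re.compile(r"\b(order|buy|get me)\b"),
    "ask_price": re.compile(r"\b(how much|price|cost)\b"),
}


def understand(utterance):
    """Return (intent, entity) for an utterance; either may be None."""
    text = utterance.lower()
    intent = next(
        (name for name, pattern in INTENT_PATTERNS.items() if pattern.search(text)),
        None,
    )
    entity = next((name for name in KNOWLEDGE_GRAPH if name in text), None)
    return intent, entity


# Step 3: map each recognized intent to an action.
def perform(intent, entity):
    if intent is None or entity is None:
        return "Sorry, I did not understand that."
    if intent == "ask_price":
        return f"A {entity} costs ${KNOWLEDGE_GRAPH[entity]['price']:.2f}."
    return f"Ordering one {entity}."


# Step 4: a client application would call this single entry point.
def handle(utterance):
    return perform(*understand(utterance))
```

A client on any platform would pass the transcribed speech to `handle` and speak or display the returned string.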

The breadth of data and context also makes it challenging to deliver high-level natural language understanding. For instance, “I saw the man on the hill with the telescope” can be read in several ways, depending on who is on the hill and who has the telescope.
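
One way to make the ambiguity concrete is to enumerate which word each prepositional phrase could modify. The sketch below is an assumption-laden illustration, not a parser: the word positions and candidate heads are entered by hand, and it applies the common rule that attachment arcs drawn above the sentence may not cross.

```python
from itertools import product

# Word positions in: I(0) saw(1) the(2) man(3) on(4) the(5) hill(6)
#                    with(7) the(8) telescope(9)
POSITION = {"saw": 1, "man": 3, "on": 4, "hill": 6, "with": 7}

# Hand-picked candidate heads for each prepositional phrase.
ON_HEADS = ["saw", "man"]            # "on the hill"
WITH_HEADS = ["saw", "man", "hill"]  # "with the telescope"


def crosses(arc_a, arc_b):
    """True if two arcs cross when drawn above the sentence."""
    (i, j), (k, l) = sorted([tuple(sorted(arc_a)), tuple(sorted(arc_b))])
    return i < k < j < l


def readings():
    """List every (head of 'on', head of 'with') pair without crossing arcs."""
    result = []
    for on_head, with_head in product(ON_HEADS, WITH_HEADS):
        on_arc = (POSITION[on_head], POSITION["on"])
        with_arc = (POSITION[with_head], POSITION["with"])
        if not crosses(on_arc, with_arc):
            result.append((on_head, with_head))
    return result


for on_head, with_head in readings():
    print(f"'on the hill' modifies {on_head!r}; "
          f"'with the telescope' modifies {with_head!r}")
```

Under these assumptions the sketch lists five attachments. The only combination it rejects is the one where "on the hill" modifies "saw" while "with the telescope" modifies "man", because those two arcs would cross.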

Every knowledge domain requires that NLU systems recognize new and specialized terminology and understand how the meanings of words shift in a new context. When done well, this leads to a far better user experience.

Voice-First World and The Age of the Machine Man

To paraphrase William Gibson, the future is already here and, thanks to intelligent assistants (IA), it is more evenly distributed than ever before. Based on our spoken input, assistants like Alexa, Siri, and Google Now answer questions, navigate routes and organize meetings. Alexa can also order pizza, hail an Uber, or complete an order from Amazon.

Machines are learning to take over our digital lives using our voice. We’re also finding that we are more comfortable communicating with devices through the spoken word.

“The ultimate destination for the voice interface will be an autonomous humanoid robotic system that, very much like a science fiction movie, will fundamentally interact with us via voice.” – Brian Roemmele

Voice Commerce for the Cognitive Era

Voice commerce is still in the late novelty stage as far as implementation is concerned. However, it is poised to bring about a paradigm shift for customers and for every industry that depends on advertising.

Amazon’s vision here is the most ambitious: “to embed voice services in every possible device, thereby reducing the importance of the device, OS, and application layers.” It’s no coincidence that those are also the layers in which Amazon is the weakest. The market potential is huge, and it is no surprise that all the big tech companies are investing heavily in voice and AI.

Echo as Commerce Device

For Amazon, continuing to expand its dominance in e-commerce matters more than being the hub of the smart home. In March 2015, Amazon rolled out Dash Buttons, which let consumers reorder consumer packaged goods (CPG) with the push of a button, and it very recently enabled Alexa to shop for Amazon Prime products.

Alexa now features more than 1,400 third-party skills.

Voice Banking & Payments: Facilitating Frictionless Commerce

Payment is the most important element in a commerce transaction. It follows that voice payment is a key capability in every voice assistant’s core skill set.

Capital One recently announced the rollout of a new skill that allows consumers to do their banking by voice, including checking balances, reviewing transactions, making payments, and more. For example, Capital One customers could ask Alexa questions like, “Alexa, what is my Quicksilver Card balance?” or “Alexa, ask Capital One to pay my credit card bill.” Alexa uses pre-linked funds to pay the bill and can pull up account information to answer other questions. Alexa can already pay for your Uber or order pizza.
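
To show how a request such as “Alexa, ask Capital One to pay my credit card bill” might be broken down, here is a minimal sketch. It is not Capital One’s skill or Amazon’s API: the regular expression, the keyword table, and the intent names are all hypothetical, and it only handles the “ask ... to ...” phrasing from the second example above.

```python
import re

# Matches "<wake word>, ask <skill name> to <request>".
INVOCATION = re.compile(
    r"^(?P<wake_word>\w+),\s+ask\s+(?P<skill>.+?)\s+to\s+(?P<request>.+?)[.?!]?$",
    re.IGNORECASE,
)

# Hypothetical keyword-to-intent table for a banking skill.
BANKING_INTENTS = {
    "pay": "PayBill",
    "balance": "CheckBalance",
    "transactions": "ListTransactions",
}


def parse(utterance):
    """Split an utterance into wake word, skill, request, and a guessed intent."""
    match = INVOCATION.match(utterance.strip())
    if match is None:
        return None  # phrasing this sketch does not cover
    request = match.group("request").lower()
    intent = next(
        (name for keyword, name in BANKING_INTENTS.items() if keyword in request),
        "Unknown",
    )
    return {
        "wake_word": match.group("wake_word"),
        "skill": match.group("skill"),
        "request": request,
        "intent": intent,
    }
```

Calling `parse("Alexa, ask Capital One to pay my credit card bill.")` yields the skill name `Capital One` and the guessed intent `PayBill`; a production assistant would rely on a trained language model rather than keyword lookup.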

Earlier this year, Microsoft Cortana was integrated with Paytm Wallet. Paytm customers can now pay their utility bills and make mobile transactions using a Cortana-enabled smartphone.

Recently, Apple’s Siri also got some impressive upgrades, including the ability to make payments using Square Cash, Venmo, and Number26.

Voice payments and commerce seem to be the next logical step for everyone. However, some players, such as Amazon, are a better fit for the task than others.

Voice: Reinventing Home Entertainment

The way we interact with our devices at work and at home is constantly changing. In the not-too-distant future, voice will be the primary means of interacting with technology in the home. Voice solves the problem of increasing device functionality without adding cumbersome buttons and displays to streamlined designs.

For example, in the connected home of the future, an efficient voice interface will eliminate light switches, appliance buttons, and remote controls, along with any task that requires you to grab your phone and perform a quick search. As consumers see the capabilities, market demand for these products will skyrocket.

Viewers can ask their TV, remote control, or set-top box to search for programs, movie titles, actors and actresses, favorite genres, particular sports, and virtually any other category of preferred content.

Furthermore, voice biometrics can personalize services: different household members can be identified by their voices and instantly get access to their own custom home screens, frequently searched content, recently viewed programs, and personal web applications such as social media feeds.
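
As a rough illustration of how identifying a household member by voice could work, the sketch below compares a new voiceprint against enrolled ones using cosine similarity. Everything in it is an assumption for demonstration: the three-number voiceprints, the names, the threshold, and the profile strings are invented, and real systems derive much longer vectors from recorded speech.

```python
import math

# Hypothetical enrolled voiceprints: one short vector per household member.
ENROLLED = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.2, 0.8, 0.5],
}

# Hypothetical personalized home screens.
PROFILES = {"alice": "Alice's home screen", "bob": "Bob's home screen"}


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(voiceprint, threshold=0.9):
    """Return the best-matching enrolled speaker, or None if below threshold."""
    best_name, best_score = None, -1.0
    for name, enrolled in ENROLLED.items():
        score = cosine_similarity(voiceprint, enrolled)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


def home_screen_for(voiceprint):
    """Pick a personalized screen, falling back to a guest screen."""
    return PROFILES.get(identify(voiceprint), "Guest home screen")
```

The threshold keeps an unfamiliar voice from being matched to the nearest family member; anyone who scores below it gets the guest screen.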

All the major tech companies – Facebook, Google, Microsoft, Apple, Yahoo, Baidu, and Amazon – are centering their strategy around natural-language user interactions and believe that this is where the future of human-computer interaction is heading.

The next big step will be for the very concept of the device to fade away.

Most high-end smartphones now have a voice assistant built in. The technology has evolved from pure voice recognition to artificial intelligence (AI), which can understand human language, analyze the content, and respond accordingly. It is reasonable to expect that all smartphones, TVs, PCs, tablets, GPS units, and game consoles will come with a voice assistant within five years. A fair percentage of cars, high-end appliances, and toys will also have this technology built in.

“The next big step will be for the very concept of the ‘device’ to fade away. Over time, the computer itself – whatever its form factor – will be an intelligent assistant helping you through your day. We will move from mobile-first to an AI-first world.” – Google CEO Sundar Pichai

Voice might seem mostly a novelty today, but in technology, the next big thing often starts out looking that way.

As Brian Roemmele writes, “The computer as we know it has been shrinking and, in many ways, will disappear and become a nexus connecting us via speech. There will still be touch screens and VR headsets, perhaps even ephemeral holographic displays in the next ten years. However, voice interfaces will continue to grow and supplement these experiences.”

Most AI experts agree that applications demonstrating a human-like understanding of language are possible today for the first time.

“AI interfaces – which in most cases will mean voice interfaces – could become the master routers of the internet economic loop, rendering many of the other layers interchangeable…” – Chris Dixon

Keyboard-less smart home devices, which rely on voice interfaces for all user interactions (i.e., no more typing, swiping, or searching), are rapidly growing in popularity.

The user interface of future devices and appliances will be one that you can’t see or touch. Voice recognition, voice commands, and audio engagement will be the de facto way to interact with technology. And when all that happens, we might be the last generation to type a keyword into a search engine.
