Automatic Speech Recognition: A Sound Innovation

Tracing the History of Automatic Speech Recognition

1952 was an interesting year. American scientist Jonas E. Salk developed the first experimentally safe polio vaccine, a medical breakthrough that would go on to save thousands of lives. Also in medicine, 1952 was the year that a mechanical heart was first used in a human patient. That same year, the United States Army formed the Special Forces.

1952 was also the year that speech recognition was first developed as a technology. Throughout the latter decades of the 20th century, speech recognition was steadily improved. This led to growth in both popularity and usage, particularly in fields such as the military. At this point, it’s safe to say that automatic speech recognition has stood the test of time. Moreover, speech recognition has secured its importance in today’s digital age, and will very likely be essential in the industries of tomorrow.

With all that said, what exactly is automatic speech recognition?

According to Matthew Zajechowski of Usability Geek, Automatic Speech Recognition, or ASR, is “the technology that allows human beings to use their voices to speak with a computer interface in a way that, in its most sophisticated variations, resembles normal human conversation.”

Tracking the Technology Through Time

The journey of ASR began in 1952 when scientists at Bell Labs created a system called “Audrey” that could recognize spoken digits. Other organizations improved on this invention in subsequent years, with IBM introducing “Shoebox” a decade after the creation of Audrey. Shoebox could understand 16 English words.

In 1971, the Defense Advanced Research Projects Agency (DARPA) funded speech recognition research aimed at a system that could comprehend a minimum of 1,000 words.

Momentum for ASR picked up through the 80s, when ASR vocabulary grew from a few hundred words to several thousand. This gave rise to practical speech recognition products such as Dragon Dictate in 1990. AT&T Inc. used the technology in its Recognition Processing Service to route calls without the use of an operator. Big data and faster computers would later enable a new ASR method: end-to-end deep learning ASR. This method allowed for “learning” and “training” that improved accuracy as more information was fed into networks. Its key benefits were speed, accuracy, and scalability without added costs.

Progress in the New Millennium

By the year 2000, speech recognition technology had advanced to almost 80% accuracy, a marked improvement on the decades that came before. Later that decade, Google launched its Voice Search app, which improved on existing ASR technology and offloaded its processing to the company’s data centers. The vast amount of data gathered from user searches created a great opportunity for Google: the creation of predictive speech. During this time, the company’s English Voice Search System included approximately 230 billion words from user searches.

With the rise of smart technology, corporations raced to integrate ASR into their products. In 2011, Apple launched Siri, a virtual assistant that uses voice queries, focus tracking, and a natural-language user interface. Amazon launched its own virtual assistant, Alexa, in 2014, and Google launched Google Home in 2016.

These voice assistants allowed users to link and control digital devices more conveniently through voice commands. Samsung’s Bixby and Microsoft’s Cortana (which Microsoft has since discontinued) are other notable examples of voice assistants that use ASR.

Echo through Industries: Automatic Speech Recognition in Use

Education – Through ASR, students can learn proper pronunciation, helping them develop fluency as they speak. It’s also a great tool for learning new languages, as well as enhancing study techniques.

Home Automation – By integrating ASR into home devices, users are able to connect and manage multiple devices such as televisions, sound systems, fridges, ovens, and washing machines, leading to more efficient processes, better time management, and smarter decision making.

Telephone Service – One of the earlier uses of ASR, telephony continues to benefit from advancements in the technology to this day. Call centers use ASR by integrating it with Interactive Voice Response (IVR) systems.

Car Systems – Drivers can control their smart devices without taking their hands off the steering wheel or their eyes off the road, making for a safer driving experience for themselves as well as other road users.

Health Care – ASR is an essential tool in both the front-end and back-end of the medical documentation process. Health workers dictate into a speech-recognition engine, and their words are displayed as they are spoken. The user can then edit and approve the generated document for further processing.

Military Operations – ASR is seen as a valuable and integral part of military operations and machines. For example, military personnel have used speech recognizers in fighter aircraft for setting radio frequencies, commanding autopilot systems, setting coordinates, managing displays, and even controlling weapons release parameters.

People with Disabilities – For individuals who are Deaf or Hard of Hearing, ASR programs are widely used to automatically create closed captions for discussions that take place in boardrooms, classrooms, and even on apps such as TikTok. People who are blind or have low vision can use ASR to control devices and dictate text by voice, often alongside text-to-speech tools that read written text aloud. ASR is also a great tool for people who have difficulty using their hands as well as those with conditions like dyslexia, giving them the power to write without conventional input devices and, paired with text-to-speech, listen to text instead of reading it.

Benefits of Automatic Speech Recognition

Thanks to advancements in ASR, capturing speech has become much faster than conventional input processes such as typing. The technology also enables users to convert speech to text in real time. In addition, it has the capacity to spell as well as any other writing tool. ASR can help streamline a multitude of processes and increase productivity in your business. It is also an essential tool for diverse and inclusive workplaces because it supports individuals living with disabilities.

Make the Right Call with Automatic Speech Recognition

The history of ASR is truly a fascinating one. It will be exciting to see what the future holds for this technology. The rise of artificial intelligence and ASR technology that is more powerful and less expensive means that voice may become the next dominant interface.

The future is here

The future of digital will reward those who can take advantage of it. To this end, we are constantly learning and staying abreast of advancements and updates to keep our clients ahead of the game online. We’re obsessed with helping you achieve your objectives even in a changing landscape. Reach out to us for digital marketing solutions that can help develop tailored speech recognition systems that work for you.
