As technology evolves, leaders are constantly looking to evaluate and adopt solutions that drive business value. In many functional areas, new technologies add incremental advances in efficiency. Periodically, a technology comes along that creates a tectonic shift in the landscape and markedly changes the nature of competition. Bill Gates famously said, “People overestimate the technological change that will happen in two years and underestimate the change over ten.” A skeptical eye toward innovation is prudent, but burying your head in the sand is a recipe for obsolescence.
Most industries have a plan for data. There is no shortage of dashboards, metrics, OKRs, and the like that measure every movement inside and outside a business. Artificial Intelligence and Machine Learning promise to dramatically increase the availability of this data and its usefulness in commercial applications. Reduced to its simplest form, data is a historical record. What happened, how much of it happened, and when it happened are all questions that can be answered with mountains of data and turned into useful information.
Measuring and quantifying events, actions, interactions, etc. is what consumes most of the attention within industries. With connected devices streaming activities in real time, our ability to track activities of daily life is accelerating rapidly. The promise of all this measurement is the ability to make predictions about the future, based on historical events. These events may have taken place years ago, or within the last millisecond. As an organization, the degree to which business units can harness this ever-expanding universe of measurement has become the measure of success. Most businesses have a strategy for data. If they don’t, they are certain to slowly slide into obscurity.
There is, however, an untapped reservoir of information that most businesses have not begun to exploit: the human voice. Typing on a keyboard, whether on a computer or a mobile device, is actually a secondary form of communication. We use these input devices primarily because the ability to use our voice and “speak” our intentions to a computer has developed much more slowly and is much more complicated than using our hands. There are many reasons for this, but the simplest explanation is that it’s hard.
The amount of information captured in the human voice is astonishing. Researchers like Dr. Rita Singh at Carnegie Mellon are able to make accurate predictions from the voice about everything from the speaker’s physical characteristics – height, weight, facial structure and age, for example – to their socioeconomic background, level of income and even the state of their physical and mental health.
This may sound like science fiction (a phrase which itself is becoming less useful!) but it is being used today by researchers to address significant problems. One example is “swatting,” the practice of placing a hoax distress call to governmental or emergency services. This “prank” can cause serious harm in some cases and, at the very least, wastes tremendous resources. Dr. Singh’s work has allowed those responsible for these events to be profiled and identified.
In addition to the scientific dissection of the voice signal, recent advances in Natural Language Processing have catapulted the field of speech recognition and understanding forward. In 2018, Google engineers released a pretrained language model known as BERT. The model was released as open source, allowing researchers around the world to build their own layers on top of it and advance the field much faster than was possible with prior models.
Anyone who has used Amazon Alexa, Google Assistant, or Apple’s Siri has experienced the current limitations of natural language understanding, or NLU. Understanding speech and converting it into usable text (or representative measurements) is the first hurdle to a more human-like interaction with a computer. Accounting for dialects, accents, idioms, and the like is far more complex, and the more data we collect, the closer these systems come to human levels of language understanding.
Getting a computer program to understand context has also been incredibly complicated. It’s one thing to simply transcribe the spoken word into text. Placing that text in the context in which it was intended is even more challenging. There are literally billions of combinations that make up our conversational lexicon. Researchers like those at Google have been instrumental in moving us closer to mapping out these combinations, but it takes significant time to refine that work and make it usable. If machines make predictions about your intentions based on your speech and do so accurately 80% of the time, is that success?
Like most answers to AI questions, it depends. If your model is ordering socks on an ecommerce website, that level of accuracy might be just fine. However, if your application is piloting a vehicle with human lives at risk, you would want a near-perfect degree of accuracy. To approach human levels of conversation, a computer must get it right more than 90% of the time. The gap between 80% and 90% accuracy is actually the difference between usable and practically unintelligible.
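One way to see why a ten-point gap matters so much is to look at how errors compound across a conversation. The following is a minimal sketch, not a model drawn from any of the research mentioned here. It assumes every turn of a conversation is understood independently with the same accuracy, which is a deliberate simplification, and the function name `conversation_success_rate` is hypothetical.

```python
def conversation_success_rate(per_turn_accuracy: float, turns: int) -> float:
    """Probability that every turn in a conversation is understood correctly.

    Assumption (for illustration only): each turn is independent and has the
    same chance of being understood, so the per-turn accuracies multiply.
    """
    if not 0.0 <= per_turn_accuracy <= 1.0:
        raise ValueError("per_turn_accuracy must be between 0 and 1")
    if turns < 0:
        raise ValueError("turns must be non-negative")
    return per_turn_accuracy ** turns


if __name__ == "__main__":
    # Compare three per-turn accuracy levels over a short five-turn exchange.
    for accuracy in (0.80, 0.90, 0.99):
        rate = conversation_success_rate(accuracy, turns=5)
        print(f"{accuracy:.0%} per turn: {rate:.1%} of 5-turn conversations are error-free")
```

Under these assumptions, a five-turn exchange completes without a single misunderstanding about 33% of the time at 80% accuracy, about 59% of the time at 90%, and about 95% of the time at 99%. Real systems can recover from some errors, so these figures only illustrate the direction of the effect.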
The commercial potential for voice technology is not limited to identification and interpreting commands. Ping An Life Insurance Company of China is spending billions of dollars each year perfecting voice technologies for applications in financial services and healthcare. The company claims 80% accuracy in detecting deception among applicants for financial products. Expanding on the work of researchers, Ping An hopes to develop voice processing systems that can diagnose health problems simply by listening to speech and analyzing the signal. Ethical and privacy implications aside, automating human services for the largest single population on the planet holds tremendous potential benefits.
At AC Global Risk, we have also discovered something amazing buried in the voice: we are able to identify risk. The original use case was in the military, where the goal was to save lives by making sure that soldiers hired into the allied forces were unlikely to carry out a green-on-blue attack. Today the technology is used as an alert by organizations in many ways: distinguishing real from synthetic identities (a massive problem in financial services), hiring rangers instead of poachers, or making sure that the person you’ve spent 3 months interviewing isn’t a foreign agent or hasn’t previously been fired for misconduct, data that a background check will seldom reveal. Other use cases include healthcare, loan applications, insurance fraud, and many others. It’s about using the voice to create trust and clear people as quickly as possible so that experts can focus on the few who potentially present high risk to the organization.
The human voice has just begun to be explored for commercial applications. Whether it is used for diagnosis, profiling, security, or accuracy, there are almost infinite uses for voice technology across the economy. As in other areas of technology, corporations will be eager to reap the benefits of research and technological advancements. In addition to a strategy for data, the more forward-looking executives will also incorporate a strategic vision for voice analysis, computer vision, and other sensory and perception technologies that emerge.