29 Aug Microsoft advancing speech recognition, deep learning capabilities
Organizations use apps for virtually everything. There are programs for dictation, for answering questions and for providing critical insights that benefit business operations. However, there are still a number of gaps between what we need software to do and the functionality we can actually build into it. Machines cannot think contextually or freely the way humans can, which leaves critical holes in overall understanding and accuracy.
As our reliance on applications and technology continues to increase, efforts must be made to improve their capabilities. Microsoft is rising to the challenge of delivering better techniques that will improve speech recognition, deep learning and virtual assistant technology into the future.
Hitting recognition records
If you’ve ever used dictation software, you’re likely familiar with the frustration these programs cause. They often take down words inaccurately and force you to enunciate every word painstakingly. Even then, the software can miss the mark, making it difficult to refer to your notes and understand what you were saying. In industries like health care or law, where verbal notes are common and accuracy is critical, subpar speech recognition software isn’t acceptable.
Microsoft has made considerable progress in improving its own program. The organization announced that its speech recognition software hit a 5.1 percent word error rate, putting its accuracy on par with professional human transcribers who are able to listen to recordings several times. TechCrunch noted that this was accomplished by improving the neural net-based acoustic and language models in the system. The researchers also enabled the speech software to use entire conversations, allowing it to adapt its transcriptions to the context and predict which phrases were likely to come next. This technology is on a different level from the clunky programs that have traditionally been used, and it serves as an example for future AI and deep learning efforts.
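For readers curious how an error rate like this is typically measured, here is a minimal Python sketch of word error rate: the number of word substitutions, deletions and insertions needed to turn the software’s transcript into a reference transcript, divided by the number of words in the reference. The function name and the sample sentences are illustrative assumptions, not Microsoft’s evaluation code, and real benchmarks apply additional text normalization before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative word error rate: word-level edit distance / reference length.

    Hypothetical helper for this article; not taken from any Microsoft tool.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # matching i reference words against nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the transcript
            insertion = curr[j - 1] + 1  # extra word added by the transcript
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# Made-up example: the transcript drops one word out of six.
reference = "the patient has a mild fever"
hypothesis = "the patient has mild fever"
print(f"{word_error_rate(reference, hypothesis):.3f}")  # prints 0.167
```

By this measure, a 5.1 percent rate means roughly five word-level mistakes for every 100 words in the reference transcript.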
Capitalizing on Cortana
Virtual assistants came into widespread adoption when Apple introduced Siri on the iPhone 4S in 2011. Since then, multiple brands have tried their hand at creating the perfect helper within your hardware. People already rely on these tools: according to a survey by Creative Strategies, 50 percent of people use voice assistants in the car, and 39 percent use them at home. However, the technology is still young, and it can’t answer more complex or context-driven questions. Microsoft’s efforts in speech recognition and AI may be just what virtual assistants need to become true powerhouses.
Microsoft’s own assistant, Cortana, runs on Windows systems, on Xbox consoles and as a cross-platform mobile app, making it one of the more versatile AIs. The company is looking to improve all assistants by making a new dataset available to the public, Quartz reported. The dataset consists of 22 pairs of humans talking to each other, asking questions and trying to come up with good responses. The main goal of this release is to let future AI analyze how humans handle the same tasks that virtual assistants perform, particularly in the realm of information retrieval. AIs can learn how a natural exchange sounds, why context matters and how a human knows when they’ve been given a good answer. The dataset differs from past attempts in that it also includes information about the participants’ levels of emotion, engagement, stress and satisfaction.
Diving into deep learning
Microsoft’s efforts to improve speech recognition and virtual assistants by advancing deep learning and AI capabilities aren’t surprising. The organization has made numerous moves in the past to capitalize on these technologies and take them to the next level. Microsoft recently announced Brainwave, an FPGA-based system for deep learning in the cloud with ultra-low latency, TechCrunch reported. The solution is capable of sustaining 39.5 teraflops on a large gated recurrent unit without any batching. Microsoft synthesized deep neural net processing units into its FPGAs to help adapt infrastructure faster and offer near real-time processing power.
As AI and deep learning become more popular, organizations will be looking to take advantage of the technology and improve upon it. Microsoft is doing that by focusing on the areas where people use these systems the most – speech recognition and virtual assistants. To keep up with Microsoft’s capabilities, partner with a certified reseller like Pinnacle Business Systems.