Back in 2003, NTT DoCoMo generated a lot of news on this topic. Their research paper “Unvoiced speech recognition using EMG – mime speech recognition” was the first step toward letting you speak silently while the other party still hears your voice. It is probably the most cited paper on this topic.
NASA was working in this area around the same time, referring to the approach as ‘Subvocal Speech’. While it was originally intended for astronauts’ suits, the hope was that it could also find other commercial uses. Notably, NASA’s work covered only a limited number of words (picture source).
For both of the approaches above, there isn’t much recent information. While recognizing certain characters has proven easy, handling full continuous speech takes a lot more effort. It’s also a challenge to reproduce your own voice, rather than a robotic one, for the other party.
To appreciate how big a challenge this is, look at YouTube videos with automatically generated captions. Even when you can easily understand what the person is saying, it’s still a struggle for the machine. You can read more about the challenge here.
Many researchers have achieved reasonable success using cameras for lip reading (see here and here), but researchers from Google’s AI division DeepMind and the University of Oxford have used artificial intelligence to create the most accurate lip-reading software yet.
For camera-based speech recognition on smartphones, the challenges will be high-speed data connectivity and the ability to see lip movement clearly. Indoors, this can be solved with Wi-Fi connectivity and looking at the camera; it gets trickier outdoors, or when you can’t look at the camera while driving. Who knows, this may turn out to be a killer use case for 5G.