Smart mobile devices have become ubiquitous in our lives and they are also seen as a solution in non-tech business areas. Any business that needs data input has also begun considering mobile devices.
We have been observing an emergence of new use-cases for mobile devices: usage for data input in environments with low computer skills personnel, like plumbers, construction workers, or in environments where traditional interaction with such mobile devices is uncommon. I am thinking about environments where people cannot use their hands to interact with a keyboard or touch screen, or where the noise would prevent them from using voice commands.
The most common alternatives to touch screens, voice recognition and gesture control, are evolving fast, and new solutions are becoming available very quickly. These alternative mobile data input methods allow for the usage of computer technology in some new areas where, due to technical limitations, it was not possible before.
As mobile devices evolve at the benefit of lower costs, better performance, and affordable internet connectivity, and new technology emerge for interaction, we see a potential to address new markets by offering an alternate solution to input data in new ways into already available mobile devices. But how far are these technologies from production usage?
While there are several solutions targeted for gesture control (such as Microsoft Kinect, Google SOLI, MYO Gesture Control Armband, and Leapmotion), none of them are usable directly from a mobile phone yet. There is quite accurate object recognition when using the mobile phone camera, but the gesture recognition is not production-ready.
Se we looked to test voice recognition and solutions based only on the capabilities of mobile phones without using any other external device. We created a prototype for one common use-case: consider a mobile application (Android and iOS) where you can navigate through menu items and introduce alpha-numeric data. All of the operations should be possible by both mobile touchscreen and voice commands, with a preference to a solution that is able to work in an offline mode.
In order to avoid continuously listening for voice commands, we will use face detection to start interpreting voice commands. When the app detects a human face, it will launch the startup screen, where you can see several cards and select from them using a voice command. Each card will take you to a screen where you can either pick an option from a drop-down menu or introduce text in different text boxes (like name, email, etc.). Then you should be able to save the introduced data and navigate back to the startup screen. Each action should be able to be fulfilled using the mobile touchscreen or voice commands.
In order to reach the main screen, you can either touch the screen or sit in front of the camera.
Any option can be selected by touching the corresponding area or saying the corresponding number aloud as a voice command.
You can select the fields and dictate content with voice commands.
Select options from the spinner and navigate between the screens using voice commands.
For implementing this proof of concept, we tested voice commands and gesture and face recognition on several software libraries that can enhance your user experience on your mobile device.
Below is what we tested, and the conclusions we reached.
Image, Gesture, and Face Recognition
Voice Assistants on Android
Voice Assistants on iOS
While the alternative voice commands were appreciated by our recipients, the touchscreen should still be available and will remain as the main interface for mobile applications for a while. Even if the voice or gesture controls are not yet at that production-ready maturity, we will see that in the next couple of years, future mobile apps incorporating alternative ways to interact with the user, as well as allowing a more complex experience and reaching new usability areas.