Voice Commands in VR – Modbox

As a distraction from a large number of Modbox bugs and from adding online support, I spent a weekend adding a voice command system to Modbox.

Commands are:
– Open *Tool name*
– Go To *Menu Option*
– Spawn *Entity Name*

Then for a variety of other actions it’s ‘Modbox *Action*’, such as toggling play mode on/off, opening the load creation menu, selecting mods, etc.

First thing I had to do to develop this was pick a good speech recognition library. Based on reading this Unity Labs blog post I tried out Windows dictation recognition, Google Cloud Speech, and IBM Watson.

Google Cloud Speech appeared to work the best – but by far the easiest to integrate was the Windows Speech library, since it’s already built into Unity (you just need to include UnityEngine.Windows.Speech), and there is a lot of great documentation behind it (since it’s used for HoloLens Unity apps). The biggest restriction is that it requires the user to have Windows 10 – so it not only restricts Modbox to Windows, but to Windows 10 specifically. If I eventually get Modbox on another platform I can switch then, but for now high end VR is entirely Windows dominated, so I can’t see that being needed for years.

The first thing I found was that speech recognition is a LOT more reliable when it’s just checking for specific commands (like a list of 30 of them), rather than going directly from speech to text. I plan to eventually use direct speech to text for the user entering text (like when they are naming their creations in Modbox) – but for now, based on the context, it just generates a list of possible commands. When in edit mode it goes through all the Entities the user can spawn and generates a ‘Spawn *Name*’ command for each. If in a menu (one of the large floating menu systems) it generates a voice command for each button (just based on the text on the button). Rather than manually creating hundreds of possible voice commands, it was easy to just generate them based on context.
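To make the context-driven generation above concrete, here is a rough sketch of the idea. It is written in Python purely for illustration and is not the actual Modbox code, which is Unity C#. The `Context` class, `build_commands`, `handle_phrase`, and the returned strings are all hypothetical names made up for this example, and pairing menu buttons with the ‘Go To’ prefix is my assumption. In Unity, a phrase list like this would presumably be handed to a keyword-based recognizer such as `KeywordRecognizer` from `UnityEngine.Windows.Speech`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Context:
    """Hypothetical snapshot of editor state; not Modbox's real data model."""
    edit_mode: bool = False
    spawnable_entities: List[str] = field(default_factory=list)
    menu_buttons: List[str] = field(default_factory=list)  # empty if no menu is open


def build_commands(ctx: Context) -> Dict[str, Callable[[], str]]:
    """Map every phrase worth listening for right now to an action."""
    commands: Dict[str, Callable[[], str]] = {}
    if ctx.edit_mode:
        for name in ctx.spawnable_entities:
            # The default argument binds the current name to this lambda.
            commands[f"spawn {name}".lower()] = lambda n=name: f"spawned {n}"
    for label in ctx.menu_buttons:
        # Assumption: menu buttons are reached with the 'Go To' prefix.
        commands[f"go to {label}".lower()] = lambda b=label: f"pressed {b}"
    # Global 'Modbox *Action*' commands would be registered here as well.
    commands["modbox play"] = lambda: "toggled play mode"
    return commands


def handle_phrase(commands: Dict[str, Callable[[], str]], phrase: str) -> Optional[str]:
    """Run the action for a recognized phrase; return None if it is unknown."""
    action = commands.get(phrase.strip().lower())
    return action() if action else None


ctx = Context(edit_mode=True, spawnable_entities=["Dynamite", "Crate"])
commands = build_commands(ctx)
print(handle_phrase(commands, "Spawn Dynamite"))  # prints "spawned Dynamite"
```

The point of the design is that the phrase list is rebuilt whenever the context changes (entering edit mode, opening a menu, enabling a mod), so the recognizer only ever has to choose between the handful of phrases that make sense at that moment.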

I was surprised to find voice commands actually useful! I expected this to just be a novelty addition for some users to try out – but now I think it could be an important part of the editor workflow. In many cases it’s more intuitive and quicker than going into the menu system.

For some commands, like switching to play mode, it’s definitely just as easy to push the menu button and select ‘Play’ – it takes about the same time and effort as saying ‘Modbox Play’. But for more complex actions, like spawning a specific entity, voice commands were massively faster. Rather than going through a menu system to find the ‘Dynamite’ entity among 100 or so entities (if you have a lot of mods active) you can just say ‘Spawn Dynamite’. I think for this use case, where you’re trying to select from hundreds of different options and you know what you’re looking for, voice commands win out over any other option.

The problem with using a voice command system in a game is reliability. If your game depends on the user being able to use voice commands, and they only work 95% of the time, then that can be incredibly frustrating. Not working 5% of the time means it can’t be depended on for important gameplay – there is nothing more frustrating than dealing with unreliable controls in a challenging game. For a creation tool, however, it’s a very useful alternative to a menu system – especially in VR, where navigating menus can be complex.

Voice commands should be live in the next Modbox update.