The rise of touch-based interfaces has revolutionized computing, and may eventually go as far as displacing the dominant Windows-Icons-Menus-Pointer (WIMP) interface that has been the main way people have interacted with computers for almost three decades. But what’s next? Is it voice? Is it gestures? Let’s analyze the landscape.
Kinect and the gesture-based world
Last year, Microsoft unveiled the Kinect, a revolutionary new device that allowed Xbox 360 users to use their bodies to interact with computer games. With the Kinect, Microsoft solved a substantial interface problem that had been slowing down any computing-intensive effort on the TV screen: how does one increase functionality on a large screen without requiring ever more complex remote controls?
It was already a given that computer keyboards and mice were not the way to interface with a large screen: in a world where users were already annoyed by the proliferation of TV remotes, the complexity those devices added to a living room setup made for a more frustrating experience than a simple TV remote.
By introducing a mode of interaction that eliminated all contact with the screen, whether through a remote control or by touching it directly (a mode that works for smaller screens like phones and tablets but becomes impractical when the screen is larger and thus farther from the user), Microsoft created a whole new interaction language for computing devices, one that can grow substantially beyond the traditional ways of interfacing with a computer.
The Kinect interface was predicted as far back as the early 1990s, when AT&T ran its now-famous “You Will” TV ad campaign in the United States, and it was later popularized by Steven Spielberg’s 2002 movie Minority Report. The idea of gesturing in the air took substantial hold in the collective consciousness, and when the iPhone and iPad were introduced, people looked at their multitouch model as the answer.
But larger screens have long posed a challenge. It is difficult to imagine users stepping up to a 42-inch (or larger) TV and touching the surface of the screen to get what they want. So the next challenge became getting rid of the last piece that stood between users and the interface. Nintendo first advanced the idea by putting motion sensors in the Wii’s controller.
But the Wii still required a physical component to interface with the system. The real genius of the Microsoft solution was that it did away with every intermediary component. By returning to a basic approach that uses only our bodies, Microsoft created a model that will continue to influence computing for decades to come.
Siri and the voice-driven world
Meanwhile, Apple had also been considering how to eliminate direct touching of a device. Its research led it to an innovative company doing interesting work in voice-driven controls: Siri, Inc.
Apple promptly acquired the company, realizing that Siri’s approach to voice-enabled interfaces was one of the next components of the future of computing.
There have been many other efforts at voice-controlled interfaces, dating back to the 1990s, and the mainstream has long assumed that voice control was where computing was headed. In Stanley Kubrick’s “2001: A Space Odyssey”, we all met HAL, a computer that could hold conversations with human beings, with disastrous effects for the humans. With the possible exception of Star Trek, a computer’s ability to carry on a conversation with a human being has generally been presented as a disaster for humanity: a long line of movies has shown that when a computer or robot (basically a humanoid computer, as far as movies are concerned) learns to converse with a human being, it is taking the first step toward rebelling against its owner. Whether it is HAL in 2001 (1968), Delos Control in Westworld (1973), the Master Control Program in Tron (1982), Edgar in Electric Dreams (1984), or Skynet in the Terminator franchise (1984, 1991, 2003), humanity always seems to be endangered by talking computers.
Technological fears represented in media mark the farthest a society’s imagination can go when picturing the peak of what technology makes possible. So computers that can carry on conversations have long been a holy grail of the computing world… and now Apple owns something that may point to the right approach.
Whether this type of technology becomes more than just a feature of Apple products will turn not on technical merits but on legal questions, as Apple may have locked up most of the potential roads forward by patenting substantial portions of the space. Unless Apple decides to make the technology available to others, it may remain locked into the Apple ecosystem, with no way out, for a long time.
However, Apple’s competitors, including Google and Microsoft, will probably consider this space important enough to go after it with all their might. As voice-driven interfaces have long been seen as a high-water mark of technological progress, those companies will not want to be left out of what may be the next big thing.
Lessons in hidden complexity
What Siri and the Kinect have in common is that they package a great deal of complexity into what appears to be a very simple solution.
The Kinect packs infrared projectors, an infrared camera, a video camera, motion sensors, several microphones, motors to track individuals, and enough processing power to handle audio and video recognition into a sleek black bar. That bar sprays infrared light across a room, records where the light does not show, and from that makes guesses about where bodies are located, all while tracking audio and video cues to direct whatever program is being used with it.
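The core trick behind that infrared projection is triangulation: a projected dot appears shifted in the camera’s image, and the shift shrinks as objects get farther away. The sketch below illustrates that principle in Python; the numbers are made up for illustration and are not actual Kinect specifications.

```python
# Illustrative sketch of depth-from-disparity, the principle behind
# structured-light sensors like the Kinect (numbers are hypothetical).

def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Triangulate distance from the shift of a projected IR dot.

    A dot cast by the IR emitter appears shifted (the "disparity") in
    the IR camera's image; depth is inversely proportional to that
    shift, so near objects shift a lot and far objects barely move.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px

# A dot shifted 58 pixels, seen through a 580-pixel focal length with a
# 7.5 cm emitter-to-camera baseline, triangulates to 0.75 m:
print(depth_from_disparity(58, 580, 0.075))  # -> 0.75
```

Repeating this for thousands of dots yields a rough depth map of the room, which is what lets the software guess where bodies are.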
Siri records what its user says, sends the recording to a server that turns it into text, translates that text into an actionable request, identifies the right database to retrieve information from, gets the necessary information, packages it into a file the device will read back, and sends that file back to the phone, all in a matter of a few seconds.
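That round trip is easier to see as a pipeline of stages. Here is a minimal sketch of the flow just described; every function name is a hypothetical stand-in for illustration, not Apple’s actual Siri internals, and the toy implementations exist only so the sketch runs end to end.

```python
# Hypothetical sketch of a Siri-style round trip: audio in, audio out.

def handle_utterance(audio_bytes):
    text = speech_to_text(audio_bytes)    # 1. recording -> text on the server
    intent = parse_intent(text)           # 2. text -> actionable request
    answer = query_source(intent)         # 3. pick a source, fetch the answer
    return text_to_speech(answer)         # 4. package a spoken reply for the phone

# Toy stand-ins (real systems use speech recognizers and data services):
def speech_to_text(audio):
    return audio.decode("utf-8")          # pretend recognition is perfect

def parse_intent(text):
    return {"ask": text.lower().rstrip("?")}

def query_source(intent):
    facts = {"what time is it": "It is noon."}
    return facts.get(intent["ask"], "I don't know.")

def text_to_speech(answer):
    return answer.encode("utf-8")         # the "audio file" sent back

print(handle_utterance(b"What time is it?"))  # -> b'It is noon.'
```

The point of the structure is that every stage hides its own complexity behind a one-line call, which is exactly the design lesson the next paragraph draws out.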
The big secret of these offerings’ success is that they do not scream about all the innovation and complexity inside them. In fact, they go out of their way to hide that complexity, and there is a lesson here for most startups: the fact that a problem is complex does not mean the solution should look complex. Quite the contrary: new companies should tackle complex problems and go as far as possible to shield users from that complexity.
The other important thing to notice here is the increasing disappearance of the interface. In the early days of computing, the keyboard was the way to enter information into a machine (I know some will gripe that there were punch cards before that, but I’m talking about the modern era of computing); it was eventually augmented by other devices like the mouse and touchpad, which remained the dominant form of interaction with computers for almost three decades; more recently, we have seen more direct interaction with screens as touch interfaces have gained prominence on new form factors like mobile phones and tablets. At each step, the friction between a user’s input and the computer’s response has slowly worn away, with less and less standing between the user and the device.
But in each case, the interaction was between the user and the computing device itself.
With whole-body and voice interfaces like the Kinect and Siri, that last bit of friction is disappearing, and we may be entering a new era of ubiquitous computing, where the computing device itself just disappears. I think the current direction of these technologies points to a potential future iPhone nano: a screen-less version of the iPhone that would take the form factor of a simple pair of headphones with no extra wires. All interactions would be handled via voice command, with hardly any need for a screen.
Meanwhile, in the living room, your TV will be both voice- and motion-aware, turning itself off when no one is in the room and being controlled entirely without a physical remote (something that will guarantee wide acceptance, as the proliferation of remote controls in the living room has become a modern annoyance).
That future is not too far away. As an eternal optimist, I would put it somewhere around five years from now, but as a realist, I must also warn that it could take as much as a decade to reach the mainstream. So get ready for new interfaces, and if you want to beta-test the future, grab a Kinect or a new iPhone 4S: they are the first of many such devices that will populate our world within the next decade.