Beyond touch interfaces

The rise of touch-based interfaces has revolutionized computing and will shape it for decades to come. It may eventually go as far as displacing the Windows-Icons-Mouse-Pointer (WIMP) interface that has been the main way people have interacted with computers for almost 3 decades. But what’s next? Is it voice? Is it gestures? Let’s analyze the landscape.

Kinect and the gesture-based world

Kinect, via Werner Wattenberg on Flickr

Last year, Microsoft unveiled the Kinect, a revolutionary new device that allows Xbox 360 users to use their bodies to interact with computer games. With the Kinect, Microsoft solved a substantial interface problem that had been slowing down the success of any computing-intensive efforts on the TV screen: how does one increase functionality on a large screen without requiring more complex remote controls?

It was already a given that computer keyboards and mice were not the way to interface with a large screen: the complexity they added to a living room setup made for a more annoying user experience than simple TV remotes, in a world where users were already annoyed by the proliferation of those remotes.

Remotes, via paulm on Flickr

By introducing a mode of interaction that removes the need for contact, whether through a remote control or by directly touching the screen (a mode that works for smaller screens like phones or tablets but gets more complex when the screen is larger and thus farther away from the user), Microsoft created a whole new interaction language for computing devices and made it possible to grow substantially beyond the traditional modes of interfacing with a computer.

The Kinect interface is something that was predicted as far back as the early 1990s, when AT&T ran its now famous “You Will” TV ad campaign in the United States, and was later popularized by Steven Spielberg’s 2002 movie Minority Report. The idea of gesturing in the air took substantial hold in the collective consciousness, and when the iPhone and iPad were introduced, people looked at their multitouch model as the answer.

But there has long been a challenge as to how to deal with larger screens. It is difficult to imagine users stepping up to a 42-inch (or larger) TV screen and touching its surface in order to get what they want. So the next challenge became how to get rid of that last piece that stood between users and the interface. Nintendo first advanced the idea by putting a sensor in the Wii’s controller.

Wiimote

But the Wii still required some component to interface with the system. The real genius of the Microsoft solution was that it did away with every interaction component. In returning to a basic approach using only our bodies, Microsoft created a model that will continue to impact computing for decades to come.

Siri and the voice-driven world

Siri

Meanwhile, Apple had also been considering how to get rid of direct touching of a device. Its research led it to an innovative company that was doing interesting things in voice-related controls: Siri, Inc.

Upon see­ing the com­pany, Apple promptly acquired it, real­iz­ing that the approach Siri was tak­ing to voice-enabled inter­faces was one of the next com­po­nents of the future of computing.

There have been many other efforts at voice-controlled interfaces, dating back to the 1990s, and the mainstream has long assumed that voice-controlled interfaces were the way computing was headed. In Stanley Kubrick’s “2001: A Space Odyssey”, we all met HAL, a computer that could handle conversations with human beings, with some disastrous effects for the humans. With the possible exception of Star Trek, the ability of a computer to carry on a conversation with a human being has generally been presented as a disastrous thing for humanity: a long strand of movies about computers and robots (basically humanoid computers, as far as movies are concerned) has shown that the ability to carry on a conversation with a human being is the first step to a computer deciding to rebel against its owner. Whether it is HAL in 2001 (1968), Delos Control in Westworld (1973), Master Control Program in Tron (1982), Edgar in Electric Dreams (1984), or Skynet in the Terminator franchise (1984, 1991, 2003), humanity always seems to be endangered by talking computers.

When technological fears are represented in media, they reflect the farthest a society’s imagination can go when it comes to picturing the peak of what is possible with technology. So computers being able to carry on conversations have long been a holy grail of the computing world… and now Apple owns something that may point to the right approach.

Whether this type of technology will become more than just something found in Apple products is going to turn not on technical merits but on legal points, as Apple may have locked up most of the potential roads forward by patenting substantial portions of the space. Unless Apple decides to make the technology available to others, it may remain locked into the Apple ecosystem for a long period of time.

However, Apple’s competitors, including Google and Microsoft, will probably consider this an important enough space to go after it with all their might. As voice-driven interfaces have long been seen as a high-water mark of technological advancement, those companies will not want to be left out of what may be the next big thing.

Lessons in hid­den complexity

What Siri and the Kinect have in com­mon is that they have man­aged to pack­age up a lot of com­plex­ity into what appears to be a very sim­ple solution.

The Kinect packs infrared projectors, an infrared camera, a video camera, motion sensors, several microphones, motors to track individuals, and enough processing power to handle audio and video recognition into a sleek black bar. It sends infrared light all over a room, records where the light is not showing and, based on that, makes some guesses as to where bodies are located, while at the same time tracking audio and video cues to direct whatever program is being used with it.
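To make the idea of guessing where a body is from missing infrared light a bit more concrete, here is a toy sketch in Python. It only illustrates the simplified description above and is not how the Kinect actually works: the grid of characters, the `locate_body` function and the sample `room` are all made up for this example.

```python
def locate_body(ir_grid):
    """Return (top, left, bottom, right) of the area where the infrared
    pattern is missing, or None if the pattern shows up everywhere.

    Hypothetical input format: a list of equal-length strings where
    '#' means the projected light was seen and '.' means it was not.
    """
    missing = [
        (row_index, col_index)
        for row_index, row in enumerate(ir_grid)
        for col_index, cell in enumerate(row)
        if cell == "."
    ]
    if not missing:
        return None
    rows = [r for r, _ in missing]
    cols = [c for _, c in missing]
    # The bounding box of the "dark" cells is our guess at the body.
    return (min(rows), min(cols), max(rows), max(cols))


# An invented 5x6 "room" with a roughly person-shaped gap in the light.
room = [
    "######",
    "##..##",
    "##..##",
    "#....#",
    "######",
]
```

Calling `locate_body(room)` returns a bounding box around the gap. A real sensor has to do far more work to turn depth readings into tracked people, which is exactly the complexity the device hides.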

Siri records what its user says to it and sends the recording to a server, which turns that recording into text and translates it into actionable material. The server then identifies the right information database to retrieve information from, gets the necessary information, packages it into a file that will be read back by the device, and sends that file back to the phone (all this in a matter of a few seconds).
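The chain of steps described above can be sketched in a few lines of Python. This is only an illustration of the flow, under heavy assumptions: every name here (`speech_to_text`, `interpret`, `SOURCES`, `handle_request`) is invented for the example, the "recording" is just text in disguise, and none of it reflects Apple’s actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Intent:
    domain: str  # which information source to consult
    query: str   # what to look up there


def speech_to_text(recording: bytes) -> str:
    # Stand-in for real speech recognition: we pretend the
    # recording already contains UTF-8 text.
    return recording.decode("utf-8")


def interpret(text: str) -> Intent:
    # Turn the text into something actionable with a naive keyword check.
    lowered = text.lower()
    if "weather" in lowered:
        return Intent("weather", lowered)
    return Intent("search", lowered)


# Hypothetical information sources, keyed by domain, with canned answers.
SOURCES = {
    "weather": lambda query: "Sunny, 18 degrees",
    "search": lambda query: f"Top result for '{query}'",
}


def handle_request(recording: bytes) -> str:
    text = speech_to_text(recording)                  # recording -> text
    intent = interpret(text)                          # text -> actionable intent
    answer = SOURCES[intent.domain](intent.query)     # pick a source and query it
    return f"Here is what I found: {answer}"          # package the reply for the phone
```

From the phone’s point of view, the whole thing is one call such as `handle_request(b"What is the weather today?")`; everything in between stays out of sight.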

The big secret in the success of these offerings is that they do not scream about all the innovation and complexity. In fact, they go out of their way to hide that complexity, and there is a lesson here for most startups: just because things are complex does not mean you should display them as such. Quite the contrary: new companies should address complex problems and go as far as possible to shield users from that complexity.

Dis­ap­pear­ing devices

The other important thing to pay attention to here is that we are seeing the increasing disappearance of the interface. In the early days of computing, the keyboard was the way to enter information into a machine (I know some will gripe that there were punch cards before that but I’m talking about the modern era of computing); that was eventually augmented by other devices like the mouse and touchpad, which remained the dominant form of interaction with computers for almost 3 decades; more recently, we have seen a more direct interaction with screens as touch interfaces have gained prominence on new form factors like mobile phones and tablets. Each step of the way, we are seeing the friction between a user’s input and the computer’s response slowly frittering away, with less and less standing between the user and the device.

But in each case, the inter­ac­tion was between the user and the com­put­ing device itself.

With whole-body and voice interfaces like the Kinect and Siri, that last amount of friction is disappearing and we may be entering a new era of ubiquitous computers, where the computing device just disappears. I think the current direction of those technologies points to a potential future where we could see the iPhone nano, a new screen-less version of the iPhone that would take the form factor of a simple pair of headphones with no extra wires. All interactions would be handled via voice command with hardly any need for a screen.

Mean­while, in the liv­ing room, your TV will be both voice and motion aware, turn­ing itself off when no one is in the room and com­pletely con­trolled with­out requir­ing any phys­i­cal remote con­trol (some­thing that will guar­an­tee high accep­tance as the pro­lif­er­a­tion of remote con­trols in the liv­ing room has become a mod­ern annoyance).

That future is not too far away. As an eter­nal opti­mist, I would put its promise at some­where around 5 years from now but as a real­ist, I also need to warn that it is likely this could take as much as a decade to reach the main­stream. So get ready for new inter­faces and if you want to beta test the future, grab a Kinect or a new iPhone 4S: they are the first of many such devices that will pop­u­late our world within the next decade.

Originally published on November 27, 2011 in Technology.