Multitouch interaction: some background

Since we published Ikbel's video and demos of our work on multitouch support for Linux, we have seen a number of questions and some confusion about multitouch, MPX, gestures, devices, and so on. This text is an attempt to bring some clarification. There are more complete sources, such as Bill Buxton's web page.

Multitouch interaction is a special case of multimodal interaction, that is, interaction that involves several input means at the same time. Here, the input means are usually the users' fingers or other parts of their hands. Multitouch interaction was imagined and even studied decades ago, but only recent hardware and algorithmic developments have made it a reality for the rest of us.

Multitouch interaction styles: as with single-touch interaction (even more so, actually), there is a whole continuum of possible interaction styles with multitouch devices. However, most demos published in recent years fit into one of the three following categories, or are simple combinations of them:

  • multitouch gestures: you perform gestures with one, two or more fingers, and the gestures are mapped onto commands (low-level ones such as click or double-click, or high-level ones such as zoom or pan; see the sketch after this list). A number of patents have been filed involving the use of such gestures.
  • multipointer interaction: just as if you had several mice, you can point at several objects at the same time, drag several windows, or resize a window by grabbing two corners at once.
  • physical interaction: the contact surfaces are used to interact with the contents of the screen. You can scoop objects between your two hands, for instance.
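
To make the gesture category more concrete, here is a minimal sketch in C of a two-finger pinch-to-zoom computation. It assumes that lower layers already deliver two tracked contact positions per frame; the structure and names are illustrative, not taken from any existing toolkit.

    #include <math.h>
    #include <stdio.h>

    struct contact { double x, y; };

    /* Distance between two finger positions. */
    static double pinch_distance(struct contact a, struct contact b)
    {
        return hypot(b.x - a.x, b.y - a.y);
    }

    int main(void)
    {
        /* Two successive frames: the fingers move apart, so we zoom in. */
        struct contact prev[2] = { { 100, 200 }, { 300, 200 } };
        struct contact curr[2] = { {  80, 200 }, { 340, 200 } };

        double scale = pinch_distance(curr[0], curr[1]) /
                       pinch_distance(prev[0], prev[1]);

        printf("zoom factor: %.2f\n", scale);   /* greater than 1 means zoom in */
        return 0;
    }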

Multitouch devices and algorithms. There are mainly three categories of devices on the market today:

  • large surfaces that contain one or more video cameras and perform computer vision and tracking. This includes FTIR tables such as Jeff Han's, IntuiLab's, and many DIY tables, as well as direct illumination tables such as Multitouch Oy's and the Microsoft Surface. Direct illumination also makes it possible to detect objects with visual tags. The quality of these optical devices depends on the quality of the tracking algorithms, but most are able to track multiple fingers individually and some provide for physical interaction as well.
  • multitouch displays. These are made of layers that use various electromagnetic, optical or sound effects to detect contacts, and that are combined with LCD displays. Some, like Stantum's transparent resistive layers (starting with their JazzMutant Lemur a few years ago), work with any physical contact but do not distinguish between fingers and other objects. Others, like N-Trig's dual transparent layer, distinguish special pens from fingers.
  • multitouch trackpads. Many trackpads on notebooks have multitouch capabilities. A special case in this category is CircleTwelve's DiamondTouch, a very large white trackpad designed so that one can project images on it from above; up to four users can use the DiamondTouch at a time, and in a way it behaves like four multitouch trackpads.
With vision-based systems, all algorithmic work is done in the computer. With electromagnetic-based devices, algorithms may be in the firmware, in the driver, or in user space (servers, applications, libraries). There are three levels of algorithmic processing: making (X, Y) pairs from hardware measurements, tracking them over time so as to make (id, X, Y) triples, and analysing movements so as to recognise gestures.
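
As an illustration of the second level, here is a minimal sketch in C of turning anonymous (X, Y) points from one frame into (id, X, Y) triples, by matching each new point to the nearest contact tracked in the previous frame. Real trackers also have to handle appearing and disappearing contacts and ambiguous matches; this only shows the core idea.

    #include <float.h>
    #include <math.h>
    #include <stdio.h>

    struct tracked { int id; double x, y; };

    /* Give each new point the id of the closest contact from the previous frame. */
    static void track(const struct tracked *prev, int nprev,
                      struct tracked *curr, int ncurr)
    {
        for (int i = 0; i < ncurr; i++) {
            double best = DBL_MAX;
            for (int j = 0; j < nprev; j++) {
                double d = hypot(curr[i].x - prev[j].x, curr[i].y - prev[j].y);
                if (d < best) {
                    best = d;
                    curr[i].id = prev[j].id;
                }
            }
        }
    }

    int main(void)
    {
        struct tracked prev[] = { { 1, 100, 100 }, { 2, 300, 120 } };
        struct tracked curr[] = { { 0, 305, 118 }, { 0, 102, 101 } };

        track(prev, 2, curr, 2);
        for (int i = 0; i < 2; i++)
            printf("contact %d at (%.0f, %.0f)\n", curr[i].id, curr[i].x, curr[i].y);
        return 0;
    }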

We have started a list of available multitouch devices with their known features. Do not hesitate to contact me to correct mistakes or add devices to the list.

Multitouch or multidevice? With the devices above, there are several interaction streams for a single device. This is actually why the Linux input system had to be modified by Henrik Rydberg to accommodate multitouch. But when used for multipointer interaction, this is equivalent to having several pointing devices. In other words, the "device" concept is reaching its limits. See my Tabletop 2007 paper referenced below for a discussion of this.

Linux and multitouch. There are several layers involved:

  • since 2.6.30, the Linux input system has had an interface for multitouch events.
  • a few kernel-space drivers are available or under development, some by our group: for the Stantum products, the N-Trig layer, the DiamondTouch, and Broadcom touchpads. These feed the input system with multitouch events.
  • as in our demos, you can use the multitouch events directly (see the sketch after this list).
  • but eventually, you'll probably want X.org to take these events and make something useful out of them. The main MPX code is not exactly made for this, because it is focused on multi-pointer interaction. The MPX blobs branch is more suited. Some have the idea of adding gesture recognition as X.org modules. XInput2 will probably contain some of the results of this work.
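
For those who want to try the direct route, here is a minimal sketch of an evdev reader in C that prints the multitouch events produced by the kernel interface mentioned above. The device node is just an example and depends on your hardware; run it with sufficient privileges.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <linux/input.h>

    int main(void)
    {
        /* Example device node; pick the one matching your multitouch device. */
        int fd = open("/dev/input/event5", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct input_event ev;
        while (read(fd, &ev, sizeof ev) == sizeof ev) {
            if (ev.type == EV_ABS && ev.code == ABS_MT_POSITION_X)
                printf("x = %d\n", ev.value);
            else if (ev.type == EV_ABS && ev.code == ABS_MT_POSITION_Y)
                printf("y = %d\n", ev.value);
            else if (ev.type == EV_SYN && ev.code == SYN_MT_REPORT)
                printf("-- end of contact --\n");
            else if (ev.type == EV_SYN && ev.code == SYN_REPORT)
                printf("== end of frame ==\n");
        }
        close(fd);
        return 0;
    }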

Some self-promotion:

  • Stephane Chatty, Alexandre Lemort, Stephane Vales, Multiple input support in a model-based interaction framework, Proceedings of the Second Annual IEEE International Workshop on Horizontal Interactive Human-Computer Systems (Tabletop 2007), IEEE Computer Society, 2007.
  • Stephane Chatty, Extending a graphical toolkit for two-handed interaction, Proceedings of the ACM UIST'94 Conference, ACM Press, 1994.

Contact: chatty at enac.fr