DreamSpace: natural interaction




Introduction

Humans discover and understand their world through visual and conversational interactions. Computers (information/communication systems in general) can be designed and built to allow humans to interact in natural ways, using the common skills of speaking, gesturing, glancing, moving around, reaching out.

pointing

Description

Our DreamSpace allows users to collaborate in a shared space. The system "hears" users' voice commands and "sees" their gestures and body positions. Interactions are natural, more like human-to-human interactions. The "computer" understands the user, and -- just as important -- other users understand. Users are free to focus on virtual objects, information, understanding, and thinking, with minimal constraints or distractions by "the computer", which is present only as wall-sized 3D images and sounds (but no keyboard, mouse, wires, wands, etc.). As shown in the schematic below, this intuitive human-like interaction is made possible by emerging interface technologies:

schematic of DreamSpace

The DreamSpace (also called "Visualization Space") is a networked workspace where the computing system adapts to the human to optimize ease of use, enjoyment, and the organization and understanding of information. The DreamSpace paradigm of computing is ideal for many applications:

education and entertainment

In location-based entertainment (e.g., a virtual theme park), where interface hardware is often damaged, our hands-off gadget-free interface allows robust unobtrusive interactivity. Virtual adventures and walk-throughs are also more fun and memorable. A user might take another user on a tour of a historic site, a virtual factory, or a flying tour of the continents.

scientific visualization

The interactive communication of complex visual concepts is now as easy as pointing and speaking. A high-speed network and a supercomputer (the IBM RS/6000 SP) provide computational power and the ability to handle enormous databases, e.g., geoseismic data. Users can collaborate remotely with users at other workspaces and workstations. The DreamSpace began as a system for scientific visualization, called the "Visualization Space".

video teleconferencing

DreamSpace sees and hears persons who interact with it. The audio and video inputs can be streamed to a remote location for video conferencing. And DreamSpace knows where a person is, and can send just those pixels. Or, it can send just the upper body or head, and perhaps show the hands only when there are significant movements. The following triptych illustrates the automatic tracking feature.
3 pictures of Athicha moving around the room
As researcher Athicha Muthitacharoen walks around the room, DreamSpace tracks her outline, and chunks of pixels are cropped and sent to a remote location.
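The "send just those pixels" idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the DreamSpace vision code: it assumes the tracker has already produced a binary mask marking the person, and the names `bounding_box` and `crop_to_person`, the list-of-lists frame format, and the `margin` parameter are all hypothetical.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]   # 1 where the tracker believes the person is, else 0
Frame = List[List[int]]  # one grayscale value per pixel (hypothetical format)
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def bounding_box(mask: Mask) -> Optional[Box]:
    """Return the smallest box containing every foreground pixel, or None."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for row in mask for c, value in enumerate(row) if value]
    return rows[0], min(cols), rows[-1], max(cols)


def crop_to_person(frame: Frame, mask: Mask, margin: int = 0) -> Frame:
    """Keep only the pixels inside the person's bounding box, plus a margin."""
    box = bounding_box(mask)
    if box is None:
        return []  # nobody in view, so there is nothing to send
    top, left, bottom, right = box
    height, width = len(frame), len(frame[0])
    # Grow the box by the margin, clamped to the frame edges.
    top, left = max(0, top - margin), max(0, left - margin)
    bottom = min(height - 1, bottom + margin)
    right = min(width - 1, right + margin)
    return [row[left:right + 1] for row in frame[top:bottom + 1]]


# Example: a 4x5 frame in which the person occupies a small central region.
frame = [[r * 10 + c for c in range(5)] for r in range(4)]
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
cropped = crop_to_person(frame, mask)
```

Sending only `cropped` instead of `frame` is what reduces the bandwidth; restricting the mask to the head or upper body would shrink the region further.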

Interaction example

The DreamSpace (early 1997) is pictured here, showing Mark Lucente in a typical interaction. To move one of the virtual objects - the earth - displayed on the large display, Mark simply points at the object and asks the system to "put that there":

picture of interaction, before: "Put that..."
picture of interaction, after: "there."
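One way to picture how speech and gesture combine in a "put that there" command is to pair each spoken word with whatever the user was pointing at when the word was said. The sketch below is a simplified illustration under assumptions of my own, not the actual DreamSpace integration software: it assumes the speech recognizer supplies a timestamp per word and the vision system supplies timestamped pointing positions in display coordinates. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # normalized display coordinates (assumption)


@dataclass
class Word:
    text: str
    time: float  # seconds at which the word was spoken


@dataclass
class PointingSample:
    time: float
    position: Point  # where on the display the user is pointing


def pointing_at(samples: List[PointingSample], time: float) -> Point:
    """Return the pointing position sampled closest in time to `time`."""
    return min(samples, key=lambda s: abs(s.time - time)).position


def nearest_object(objects: Dict[str, Point], target: Point) -> str:
    """Return the name of the object whose position is closest to `target`."""
    def squared_distance(name: str) -> float:
        x, y = objects[name]
        return (x - target[0]) ** 2 + (y - target[1]) ** 2
    return min(objects, key=squared_distance)


def put_that_there(
    words: List[Word],
    samples: List[PointingSample],
    objects: Dict[str, Point],
) -> Optional[Tuple[str, Point]]:
    """Resolve a spoken 'put that there' into (object name, destination)."""
    times = {w.text.lower(): w.time for w in words}
    if not samples or not objects:
        return None
    if not all(key in times for key in ("put", "that", "there")):
        return None
    # "that" selects the object being pointed at when the word was spoken;
    # "there" selects the destination pointed at when that word was spoken.
    selected = nearest_object(objects, pointing_at(samples, times["that"]))
    destination = pointing_at(samples, times["there"])
    return selected, destination


objects = {"earth": (0.2, 0.5), "moon": (0.8, 0.5)}
words = [Word("put", 0.0), Word("that", 0.45), Word("there", 1.25)]
samples = [
    PointingSample(0.1, (0.5, 0.9)),
    PointingSample(0.5, (0.22, 0.48)),
    PointingSample(1.2, (0.7, 0.3)),
]
command = put_that_there(words, samples, objects)
```

Here `command` names the earth as the object to move and the later pointing position as its destination. A real system would also need to handle recognition errors, pointing jitter, and words spoken while the user is not pointing at all.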

DreamSpace interaction movies

MPEG clips showing interaction, plus verbal explanations


Frequently-asked questions

You ask....
...and I (Mark Lucente) answer. Send questions to Mark Lucente

What is DreamSpace running on?
The DreamSpace runs on an IBM PC. Both the IBM ViaVoice(tm) speech recognition and the vision input operate on an IBM IntelliStation Z-Pro with two processors, running Windows NT OS. Interface integration, communication, and application software all share this same PC. Computer graphics rendering power is provided by a standard graphics accelerator card. An ATM network link to an SP allows for additional application processing power.

Why are you using such a high-end PC?
Initially, we needed lots of computing power for the interface modalities (voice and vision), which are processed in the main CPU(s) -- not in specialized hardware. ViaVoice(tm) represents an advance in speech recognition technology that frees up CPU cycles, and the machine-vision system has been optimized and runs on one processor without significantly interfering with the rest of the system. Simpler PCs have also been used.

How much does it cost?
The IBM PC, sound card, video digitizer, camera, microphone, and graphics accelerator card currently in use cost about $8,000. This is more power and bandwidth than is required (or utilized). The cost of the display depends on size: our rear-projection display is big (over 2 meters wide) and bright and costs about $30,000. A new display technology from IBM will deliver better performance for a fraction of the price. And smaller displays cost much less.

What else can it run on?
The DreamSpace has been implemented on a variety of platforms during the past three years: an IBM Netfinity 7000; an IBM PC 704; IBM AIX (Unix), distributed across the network; a single-processor IBM IntelliStation; a single IBM ThinkPad.

What do all these terms mean?: "natural computing"? "natural interface"? "DreamSpace"?
I believe that computers can be designed from the ground up to be as easy and natural to use as, say, talking to your best friend, your mom, your cat. Making computers more natural to use ("natural computing") requires a new kind of "natural interface" -- one that allows humans to communicate the way they naturally communicate with each other: speaking, gesturing, moving around, etc. The DreamSpace (and its predecessor, the Visualization Space) uses a particular kind of natural interface. Users interact with and control the images displayed in the DreamSpace simply by speaking, gesturing, etc. Other systems that use natural interaction are in the works: desks, tables, cars, kitchens, living rooms -- natural objects and environments that also happen to be "smart" and interactive. A simplified version of the DreamSpace (developed at IBM's T.J.Watson Research Lab in Yorktown, New York) was set up and demonstrated at the Comdex 1997 computer exhibition in Las Vegas in November 1997.

When will natural computing be available?
2002 to 2007. Initially, it will be used in specialized applications, and eventually as part of most information technologies. Bits and pieces of this technology are already seeping into common use. IBM ViaVoice(tm) speech recognition software, for example, is already being used by over 3 million people. Before 2002, you will see uses of machine vision for interface, e.g., in arcade games, amusement parks, etc. By 2005-2007, there will be many specialized applications using the full multimodal interface for medical imaging, education, amusement, research, and (this is the best part) things that we haven't thought of yet. General use will pick up around this time.

What are you working on now? What breakthroughs do you need to make this practical and commonplace?
Now that we have begun to give computers senses, the next step is to move a level higher and to give them the ability to understand -- or appear to understand -- each user. Adding intelligence to the interface is the goal, and a great deal of work is being done. When computers begin to appear to understand each user, the act of using a computer will seem simple, and (we hope) fun.

What is your goal for natural interaction?
The real goal is to allow humans to do what they do best: communicate. Computers must allow humans to express themselves and communicate ideas as simply and intuitively as they communicate with each other, face-to-face. I will reach my goal when nobody ever again says "I don't know how to use a computer." Of course, I would prefer that people don't even know what a "computer" is. Humans should be able to focus on ideas, messages, meaning, and understanding when performing a task. The "computer" should be invisible (in some cases) or present as a collaborator or facilitator, exhibiting just enough personality to help the user feel comfortable.

Where can I buy your machine vision system?
It's not for sale, yet. Our machine vision software is based on algorithms developed at the MIT Media Lab, where it was used for a number of wonderful demonstrations. During the past three years, IBM researchers (Gert-Jan Zwart and myself) have re-engineered this vision system for use in multimodal interfaces -- first on computers operated using Unix (IBM AIX) and then using the NT OS. Machine vision for input is advancing rapidly at IBM Research and elsewhere worldwide, and will likely be available in various forms during the next five years.

Can the vision input track more than one person?
Yes. The Comdex version of the DreamSpace could track a single person, and peripheral persons were usually ignored. We have developed a two-person vision system, and more sophisticated software for vision input is in the works.

Are you wearing special gloves or holding a wand or something?
No. The DreamSpace uses a deviceless multimodal user interface. At present, a wireless microphone is worn to help reject ambient noise. Future systems will make use of embedded microphones to hear the users.

Can ViaVoice really tell what you are saying, without wearing one of those microphones on your face?
Yes.

Can I have this for my desktop?
Yes, soon, perhaps in 5 to 7 years. We are developing natural computing systems for many applications, involving displays of different sizes. Perhaps you will no longer need a desktop at all! Deviceless natural interfaces are useful (and fun) for a wide range of systems: embedded into your car; embedded into a desktop or workbench; embedded into your kitchen or living room; or embedded into the tabletop at your favorite cafe or diner.

Does your system talk back to you, like HAL?
Yes. We use simple text-to-speech (built into every ViaVoice(tm) product) to allow DreamSpace to prompt, warn, encourage, inform or instruct the user. However, it's not an intelligent system... yet.


papers and presentations


Stories from the news media

We thank all media who maintain online stories and allow us to link to them!

Related resources on the Net

Component technologies

Related work

Related conferences

Other interesting work


IBM-internal info on DreamSpace and natural interaction
last update: 1998 Nov.
Mark Lucente


This content was originally hosted at IBM Research.