Introduction
Humans discover and understand their world through visual
and conversational interactions.
Computers (information/communication systems in general)
can be designed and built to allow humans to interact in natural ways,
using the common skills of speaking, gesturing, glancing, moving
around, reaching out.
Description
Our DreamSpace allows users to collaborate in a shared space. The
system "hears" users' voice commands and "sees" their gestures and
body positions. Interactions are natural, more like human-to-human
interactions. The "computer" understands the user, and -- just as
important -- other users understand. Users are free to focus on
virtual objects and information and understanding and thinking, with
minimal constraints or distractions by "the computer", which is
present only as wall-sized 3D images and sounds (but no keyboard,
mouse, wires, wands, etc.). As shown in the schematic below, this
intuitive human-like interaction is made possible by
emerging interface technologies:
- voice input: user-independent, continuous speech (IBM ViaVoice(tm));
- vision input of gesture and body: camera and machine vision algorithm;
- wall-sized stereoscopic "3D" display;
- high-bandwidth networks.
schematic of DreamSpace
The DreamSpace (also called "Visualization Space") is a networked workspace where the computing system
adapts to the human to optimize ease of use, enjoyment, and the
organization and understanding of information. The DreamSpace
paradigm of computing is ideal for many applications:
education and entertainment
In location-based entertainment (e.g., a virtual
theme park), where interface hardware is often damaged, our
hands-off gadget-free interface allows robust unobtrusive
interactivity. Virtual adventures and walk-throughs are also more fun
and memorable. A user might take another user on a tour of a historic
site, a virtual factory, or a flying tour of the continents.
scientific visualization
The interactive communication of complex visual concepts is now as
easy as pointing and speaking. A high-speed network and a
supercomputer (the IBM RS6000 SP) provide computational power and the
ability to handle enormous databases, e.g., geoseismic data. Users
can collaborate remotely with users at other workspaces and
workstations. The DreamSpace began as a system for scientific visualization, called the
"Visualization Space".
video teleconferencing
DreamSpace sees and hears persons who interact with it. The audio
and video inputs can be streamed to a remote location for video conferencing.
And DreamSpace knows where a person is, and can send just those
pixels. Or, it can send just the upper body or head, and perhaps show
the hands only when there are significant movements. The following triptych
illustrates the automatic tracking feature.
As researcher Athicha Muthitacharoen walks around the room, DreamSpace
tracks her outline, and chunks of pixels are cropped and sent to a
remote location.
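The cropping step described above can be sketched in a few lines of Python. This is only an illustration, not the DreamSpace code: it assumes the vision system has already produced a per-pixel silhouette mask, and the names `person_bounding_box` and `crop_to_person` are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]   # grayscale pixels, row-major (assumed format)
Mask = List[List[bool]]   # True where the tracker believes a person is


def person_bounding_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right), inclusive, or None for an empty mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for row in mask for c, is_person in enumerate(row) if is_person]
    return rows[0], min(cols), rows[-1], max(cols)


def crop_to_person(frame: Frame, mask: Mask, margin: int = 0) -> Frame:
    """Keep only the rectangle of pixels around the tracked person."""
    box = person_bounding_box(mask)
    if box is None:
        return []  # nobody in view, so there is nothing to send
    top, left, bottom, right = box
    height, width = len(frame), len(frame[0])
    # Grow the box by the margin, clamped to the frame edges.
    top = max(0, top - margin)
    left = max(0, left - margin)
    bottom = min(height - 1, bottom + margin)
    right = min(width - 1, right + margin)
    return [row[left:right + 1] for row in frame[top:bottom + 1]]
```

Sending only the cropped rectangle, rather than the whole frame, is what would save bandwidth on the link to the remote location.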
Interaction example
The DreamSpace (early 1997) is pictured here, showing Mark Lucente
in a typical interaction.
To move one of the virtual objects - the earth - displayed on the
large display, Mark simply points at the object and asks the
system to "put that there":
"Put that... "
"there."
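One way to picture how speech and gesture might be combined for a "put that there" command is to pair each spoken "that" or "there" with whatever the user was pointing at when the word was spoken. The Python sketch below does this by nearest timestamp. It is an assumption for illustration only: this page does not describe how DreamSpace fuses the two inputs, and the names `Word`, `PointSample`, and `resolve_deictics` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Word:
    text: str
    time: float  # seconds at which the word was recognized


@dataclass
class PointSample:
    time: float                  # seconds at which the sample was taken
    target: Tuple[float, float]  # display position the user points at

DEICTIC = {"that", "there"}  # words whose meaning depends on the gesture


def resolve_deictics(words: List[Word],
                     samples: List[PointSample]) -> Dict[str, Tuple[float, float]]:
    """Pair each deictic word with the pointing sample closest to it in time."""
    resolved: Dict[str, Tuple[float, float]] = {}
    for word in words:
        key = word.text.lower().strip('.,!?"')
        if key in DEICTIC and samples:
            nearest = min(samples, key=lambda s: abs(s.time - word.time))
            resolved[key] = nearest.target
    return resolved


words = [Word("Put", 0.0), Word("that", 0.4), Word("there.", 1.6)]
samples = [PointSample(0.5, (0.2, 0.3)), PointSample(1.5, (0.8, 0.6))]
command = resolve_deictics(words, samples)
# command["that"] is the object position; command["there"] is the destination
```

An application would then look up which virtual object lies at the "that" position and move it to the "there" position.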
DreamSpace interaction movies
MPEG clips showing interaction, plus verbal explanations
Frequently-asked questions
You ask....
...and I (Mark Lucente) answer. Send questions to Mark Lucente
- What is DreamSpace running on?
- The DreamSpace runs on an IBM PC. Both the IBM ViaVoice(tm) speech
recognition and the vision input operate on an IBM IntelliStation
Z-Pro with two processors, running the Windows NT OS. Interface
integration, communication, and application software all share this
same PC. Computer graphics rendering power is provided by a standard
graphics accelerator card. An ATM network link to an SP allows for
additional application processing power.
- Why are you using such a high-end PC?
- Initially, we needed lots of computing power for the interface
modalities (voice and vision), which are processed in the main CPU(s)
-- not in specialized hardware.
ViaVoice(tm) represents an advance in speech recognition technology
that frees up CPU cycles, and the machine-vision system has been
optimized and runs on one processor
without significantly interfering with the rest of the system.
Simpler PCs have also been used.
- How much does it cost?
- The IBM PC, sound card, video digitizer, camera,
microphone, and graphics accelerator card currently in use cost about
$8,000. This is more power and bandwidth than is required (or
utilized). The cost of the display depends on size: our
rear-projection display is big (over 2 meters wide) and bright and costs
about $30,000. A new display technology from IBM will deliver better
performance for a fraction of the price. And smaller displays cost
much less.
- What else can it run on?
- The DreamSpace has been implemented on a variety of platforms during
the past three years: an IBM Netfinity 7000; an IBM PC 704;
IBM AIX (Unix), distributed across the network; a single-processor
IBM IntelliStation; a single IBM Thinkpad.
- What do all these terms mean: "natural computing"? "natural
interface"? "DreamSpace"?
- I believe that computers can be designed from the ground up to be
as easy and natural to use as, say, talking to your best friend, your
mom, your cat. Making computers more natural to use ("natural
computing") requires a new kind of "natural interface" -- one that
allows humans to communicate the way they naturally communicate with
each other: speaking, gesturing, moving around, etc. The
DreamSpace (and its predecessor, the Visualization Space) uses a
particular kind of natural interface. Users interact with and control
the images displayed in the DreamSpace simply by speaking, gesturing,
etc. Other systems that use natural interaction are in the works: desks,
tables, cars, kitchens, living rooms -- natural objects and
environments that also happen to be "smart" and interactive.
A simplified version of the DreamSpace (developed at IBM's T.J.Watson
Research Lab in Yorktown, New York) was set up and demonstrated at the
Comdex 1997 computer exhibition in Las Vegas in November 1997.
- When will natural computing be available?
- 2002 to 2007. Initially, it will be used in
specialized applications, and eventually as part of most information
technologies. Bits and pieces of this technology are already seeping
into common use. IBM ViaVoice(tm) speech recognition software, for
example, is already being used by over 3 million people. Before 2002,
you will see uses of machine vision for interface, e.g., in
arcade games, amusement parks, etc. By 2005-2007, there will be
many specialized applications using the full multimodal
interface for medical imaging, education,
amusement, research, and (this is the best part) things that we
haven't thought of yet. General use will pick up around this time.
- What are you working on now? What breakthroughs do you need to make
this practical and commonplace?
- Now that we have begun to give computers senses, the next step is
to move a level higher and to give them the ability to understand --
or appear to understand -- each user. Adding intelligence to the
interface is the goal, and a great deal of work is being done. When
computers begin to appear to understand each user, the act of using a
computer will seem simple, and (we hope) fun.
- What is your goal for natural interaction?
- The real goal is to allow humans to do what they do best:
communicate. Computers must allow humans to express themselves and
communicate ideas as simply and intuitively as they communicate with each
other, face-to-face. I will reach my goal when nobody ever again says
"I don't know how to use a computer." Of course, I would prefer
that people don't even know what a "computer" is. Humans should be able
to focus on ideas, messages, meaning, and understanding when
performing a task. The "computer" should be invisible (in some cases)
or present as a collaborator or facilitator, exhibiting just
enough personality to help the user feel comfortable.
- Where can I buy your machine vision system?
- It's not for sale, yet. Our machine vision software is based on
algorithms developed at the MIT Media Lab, where it was used for a
number of wonderful demonstrations. During the past three years, IBM
researchers (Gert-Jan Zwart and myself) have re-engineered this vision
system for use in multimodal interfaces -- first on
computers operated using Unix (IBM AIX) and then using the NT OS.
Machine vision for input is advancing rapidly at IBM Research and
elsewhere worldwide, and will likely be available in various forms
during the next five years.
- Can the vision input track more than one person?
- Yes. The Comdex version of the DreamSpace could track a single
person, and peripheral persons were usually ignored. We have developed
a two-person vision system, and more sophisticated software for vision
input is in the works.
- Are you wearing special gloves or holding a wand or
something?
- No. The DreamSpace uses a deviceless multimodal
user interface. At present, a wireless microphone is worn to help
reject ambient noise. Future systems will make use of embedded
microphones to hear the users.
- Can ViaVoice really tell what you are saying, without wearing
one of those microphones on your face?
- Yes.
- Can I have this for my desktop?
- Yes, soon, perhaps in 5 to 7 years. We are developing natural
computing systems for many applications, involving different size
displays. Perhaps you will no longer need a desktop at all!
Deviceless natural interfaces are useful (and fun) for a wide range of
systems: embedded into your car; embedded into a desktop or workbench;
embedded into your kitchen or living room; or embedded into the
tabletop at your favorite cafe or diner.
- Does your system talk back to you, like HAL?
- Yes. We use simple text-to-speech (built into every ViaVoice(tm)
product) to allow DreamSpace to prompt, warn, encourage, inform or
instruct the user. However, it's not an intelligent system... yet.
papers and presentations
Stories from the news media
We thank all media who maintain online stories and
allow us to link to them!
Related resources on the Net
Component technologies
Related work
Related conferences
Other interesting work
IBM-internal info on DreamSpace and natural interaction
last update: 1998 Nov.
Mark Lucente