[From special issue of SIGGRAPH's ``Computer Graphics'' publication on ``Current, New, and Emerging Display Systems'', May 1997.]


Interactive three-dimensional holographic displays: seeing the future in depth

Mark Lucente

IBM Research Division
Thomas J. Watson Research Center
"lucente" at "alum.mit.edu"

Introduction

Computer graphics is confined chiefly to flat images. Images may look three-dimensional (3-D), and sometimes create the illusion of 3-D when displayed, for example, on a stereoscopic display [1-3]. Nevertheless, when viewing an image on most display systems, the human visual system (HVS) sees a flat plane of pixels. Volumetric displays can create a 3-D computer graphics image, but fail to provide many visual depth cues (e.g., shading, texture gradients) and cannot provide the powerful depth cue of overlap (occlusion). Discrete parallax displays (such as lenticular displays) promise to create 3-D images with all of the depth cues, but are limited by achievable resolution. Only a real-time electronic holographic (``holovideo'') display [4-12] can create a truly 3-D computer graphics image with all of the depth cues (motion parallax, ocular accommodation, occlusion, etc.) and resolution sufficient to provide extreme realism [2]. Holovideo displays promise to enhance numerous applications in the creation and manipulation of information, including telepresence, education, medical imaging, interactive design, and scientific visualization.

The technology of electronic interactive three-dimensional holographic displays is in its first decade. Though holographic displays have long been fancied in popular science fiction, only recently have researchers created the first real holovideo systems, by confronting the two basic requirements of electronic holography: (1) computational speed, and (2) high-bandwidth modulation of visible light. This article describes the approaches used to address these problems, as well as emerging technologies and techniques that provide firm footing for the development of practical holovideo. [See Glossary at end for a list of terms and abbreviations.]

Electroholography Basics

Optical holography, used to create 3-D images, begins by using coherent light to record an interference pattern [13]. Illumination light is modulated by the recorded holographic fringe pattern (called a ``fringe''), subsequently diffracting to form a 3-D image. As illustrated in Figure 1, a fringe region that contains a low spatial frequency component diffracts light by a small angle. A region that contains a high spatial frequency component diffracts light by a large angle. In general, a region of a fringe contains a variety of spatial frequency components and therefore diffracts light in a variety of directions.
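The grating equation relates spatial frequency to diffraction angle. For light at normal incidence it is sin θ = λf. The minimal Python sketch below applies it. The 0.633-micron (helium-neon red) wavelength and the sample frequencies are illustrative assumptions, not values from this article. For 0.1, 0.5, and 1.0 cycles per micron it gives roughly 3.6, 18.5, and 39.3 degrees.

```python
import math

def diffraction_angle_deg(freq_cycles_per_um, wavelength_um):
    """First-order diffraction angle (degrees) of one fringe component at normal incidence.

    Grating equation: sin(theta) = wavelength * spatial_frequency.
    Returns None if the component cannot propagate (|sin(theta)| > 1).
    """
    s = wavelength_um * freq_cycles_per_um
    if abs(s) > 1.0:
        return None
    return math.degrees(math.asin(s))

# Illustrative components of a fringe region, lit with assumed 0.633-micron red light.
for f in (0.1, 0.5, 1.0):  # cycles per micron: low to high spatial frequency
    print(f, round(diffraction_angle_deg(f, 0.633), 1))
```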

An electroholographic display generates a 3-D holographic image from a 3-D description of a scene. This process involves many steps, grouped into two main processes: (1) computational, in which the 3-D description is converted into a holographic fringe, and (2) optical, in which light is modulated by the fringe. Figure 2 shows a map of the many techniques used in these two processes.

The difficulties in both fringe computation and optical modulation result from the enormous amount of information (or ``bandwidth'') required by holography. Instead of treating an image as a pixel array with a sample spacing of approximately 100 microns, as is common in a two-dimensional (2-D) display, a holographic display must compute a holographic fringe with a sample spacing of approximately 0.5 micron to cause modulated light to diffract and form a 3-D image.
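The 0.5-micron figure follows from sampling. A real-valued fringe must be sampled at least twice per cycle of its highest spatial frequency. By the grating equation, that frequency is sin θmax/λ. The sketch below assumes Nyquist sampling and an illustrative red wavelength, and the angles are hypothetical. A maximum diffraction angle near 40 degrees calls for a pitch of roughly 0.5 micron. Even a 10-degree angle still needs a pitch under 2 microns.

```python
import math

def max_sample_pitch_um(wavelength_um, max_angle_deg):
    """Largest sample spacing that can still represent diffraction out to max_angle_deg.

    Highest spatial frequency needed: f_max = sin(max_angle) / wavelength (grating equation).
    Nyquist sampling of a real-valued fringe: pitch <= 1 / (2 * f_max).
    """
    f_max = math.sin(math.radians(max_angle_deg)) / wavelength_um
    return 1.0 / (2.0 * f_max)

for angle in (10, 20, 30, 40):  # hypothetical maximum diffraction angles, degrees
    print(angle, round(max_sample_pitch_um(0.633, angle), 3))
```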

A typical palm-sized full-parallax (light diffracts vertically as well as horizontally) hologram has a sample count (i.e., ``space-bandwidth product'' or simply ``bandwidth'') of over 100 gigasamples. Horizontal-parallax-only (HPO) imaging eliminates vertical parallax, resulting in a bandwidth savings of over 100 times without greatly compromising display performance [8]. Holovideo therefore demands about 40,000 times the bandwidth of a 2-D display, or about 400 times for an HPO system. The first holovideo display created small (50 ml) images that required minutes of computation for each update [9]. New approaches, such as holographic bandwidth compression and faster digital hardware, enable computation at interactive rates and promise to continue to increase the speed and complexity of displayed holovideo images [5]. At present, the largest holovideo system creates an image that is as large as a human hand (about one liter) [11]. Figure 3 shows typical images displayed on the MIT holovideo system.
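These bandwidth figures follow from simple sample counting. The sketch below reproduces the 40,000 ratio from the two sample spacings given in the text. Two inputs are illustrative assumptions: the 160 mm square ``palm-sized'' hologram, and the choice to sample an HPO fringe vertically at display-pixel pitch.

```python
DISPLAY_PITCH_UM = 100.0  # typical 2-D display pixel spacing (from the text)
FRINGE_PITCH_UM = 0.5     # typical holographic fringe sample spacing (from the text)

def sample_count(width_mm, height_mm, pitch_x_um, pitch_y_um):
    """Number of samples covering a width x height area at the given pitches."""
    return (width_mm * 1000.0 / pitch_x_um) * (height_mm * 1000.0 / pitch_y_um)

# Per-area ratio between fringe sampling and 2-D pixel sampling: (100 / 0.5)^2.
ratio_full = (DISPLAY_PITCH_UM / FRINGE_PITCH_UM) ** 2

# Hypothetical 160 mm x 160 mm full-parallax hologram: fine sampling in both directions.
full = sample_count(160, 160, FRINGE_PITCH_UM, FRINGE_PITCH_UM)

# Assumed HPO variant: fine sampling horizontally, display-pixel pitch vertically.
hpo = sample_count(160, 160, FRINGE_PITCH_UM, DISPLAY_PITCH_UM)

print(f"ratio {ratio_full:,.0f}  full {full:.3g}  HPO {hpo:.3g}  savings {full / hpo:.0f}x")
```

Under this assumption the HPO savings come to 200 times, which is consistent with the ``over 100 times'' quoted above.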


Holographic Fringe Computation

The computational process in electroholography converts a 3-D description of an object or scene into a fringe pattern. Holovideo computation comprises two stages: (1) a computer graphics rendering-like stage, and (2) a holographic fringe generation stage in which 3-D image information is encoded in terms of the physics of optical diffraction. (See Figure 2.)

The computer graphics stage often involves spatially transforming polygons (or other primitives), lighting, occlusion processing, shading, and (in some cases) rendering to 2-D images. In some applications, this stage may be trivial. For example, MRI data may already exist as 3-D voxels, each with a color or other characteristic.

The fringe generation stage uses the results of the computer graphics stage to compute a huge 2-D holographic fringe. This stage is generally more computationally intensive, and often dictates the functions performed in the computer graphics stage. Furthermore, linking these two computing stages has prompted a variety of techniques. Holovideo computation can be classed into two basic approaches: interference-based and diffraction-specific.

The Interference-Based Approach

The conventional approach to computing fringes is to simulate optical interference, the physical process used to record optical holograms [13]. Typically, the computer graphics stage is a 3-D filling operation which generates a list of 3-D points (or other primitives), including information about color, lighting, shading, and occlusion.

Following basic laws of optical propagation, complex wavefronts from object elements are summed with a reference wavefront to calculate the interference fringe [8]. This summation must be carried out at each of the many millions of fringe samples for every image point, resulting in billions of computational steps even for small, simple holographic images. Furthermore, these are complex arithmetic operations involving trigonometric functions and square roots, necessitating expensive floating-point calculations. Researchers using the interference approach generally employ supercomputers and use simple images to achieve interactive display [8]. This approach produces an image with resolution finer than the human visual system can utilize.
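The interference-based computation can be sketched for a single horizontal line of an HPO fringe. This minimal Python version makes three simplifying assumptions. It uses point-source primitives, a unit-amplitude plane reference wave tilted by an assumed angle, and no amplitude falloff with distance. The function and parameter names are hypothetical. The nested loop makes the cost plain: every image point is visited at every fringe sample, and each visit needs a square root and a complex exponential.

```python
import cmath
import math

def interference_fringe(points, n_samples, pitch_um, wavelength_um, ref_angle_deg):
    """One horizontal line of a fringe computed by simulating optical interference.

    points: (x_um, z_um, amplitude) point sources, with z_um > 0 the distance from the
    hologram plane. Returns the intensity |object + reference|^2 at each sample.
    """
    k = 2.0 * math.pi / wavelength_um                   # wavenumber, radians per micron
    ref_kx = k * math.sin(math.radians(ref_angle_deg))  # reference-wave tilt
    fringe = []
    for i in range(n_samples):                          # millions of samples in practice
        x = i * pitch_um
        obj = 0j
        for px, pz, amp in points:                      # every image point, every sample
            r = math.hypot(x - px, pz)                  # a square root per term
            obj += amp * cmath.exp(1j * k * r)          # spherical-wave phase (1/r falloff ignored)
        ref = cmath.exp(1j * ref_kx * x)                # unit-amplitude plane wave
        fringe.append(abs(obj + ref) ** 2)              # recorded intensity
    return fringe

# Hypothetical usage: two points 50 to 60 mm from the hologram plane, 1024-sample line.
line = interference_fringe([(100.0, 50000.0, 1.0), (300.0, 60000.0, 0.5)],
                           1024, 0.5, 0.633, 15.0)
```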

Stereograms: A stereogram is a type of hologram that is composed of a series of discrete 2-D perspective views of the object scene [4]. An HPO stereogram produces a view-dependent image that presents in each horizontally displaced direction the corresponding perspective view of the object scene, much like a lenticular display or a parallax barrier display [1-3]. The computer graphics stage first generates a sequence of view images by moving the camera laterally in steps. These images are combined to generate a fringe for display.

The stereogram approach allows for computation at nearly interactive rates when implemented on specialized hardware [4]. One disadvantage of the stereogram approach is the need for a large number of perspective views to create a high-quality image free from sampling artifacts, limiting the computation speed. New techniques may improve image quality and computational ease of stereograms [14].

The Diffraction-Specific Approach

The diffraction-specific approach breaks from the traditional simulation of optical holographic interference by working backwards from the 3-D image [5-7]. The fringe is treated as being subsampled spatially (into functional holographic elements or ``hogels'') and spectrally (into an array of ``hogel vectors''). One way to generate a hogel-vector array begins by rendering a series of orthographic projections, each corresponding to a spectral sample of the hogels. The orthographic projections provide a discrete sampling of space (pixels) and spectrum (projection direction). They are easily converted into a hogel-vector array [5]. A usable fringe is recovered from the hogel-vector representation during a decoding step employing a set of precomputed ``basis fringes.''
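The encoding and decoding steps can be sketched in a few lines of Python. This is a simplified illustration, not the published algorithm. The projection-to-hogel-vector conversion is shown as a plain rearrangement, although the text notes that filtering is used in practice. The precomputed basis fringes are replaced by simple cosine gratings, one per projection direction, spaced evenly in sin θ. All function names are hypothetical.

```python
import math

def projections_to_hogel_vectors(projections):
    """Rearrange M orthographic projections (rows of H pixels) into H hogel vectors of length M.

    Pixel i of projection k becomes component k of hogel vector i. This plain rearrangement
    stands in for the filtering step used in practice.
    """
    n_hogels = len(projections[0])
    return [[proj[i] for proj in projections] for i in range(n_hogels)]

def cosine_basis_fringes(n_components, hogel_width, pitch_um, wavelength_um, max_angle_deg):
    """Hypothetical stand-in basis fringes: one cosine grating per projection direction."""
    s_max = math.sin(math.radians(max_angle_deg))
    basis = []
    for k in range(n_components):
        s = s_max * (k + 0.5) / n_components  # sin(angle) for direction k, evenly spaced
        f = s / wavelength_um                 # grating equation: cycles per micron
        basis.append([math.cos(2.0 * math.pi * f * j * pitch_um) for j in range(hogel_width)])
    return basis

def decode(hogel_vectors, basis):
    """Each hogel's fringe is the hogel-vector-weighted sum of the basis fringes."""
    fringe = []
    for hv in hogel_vectors:
        for j in range(len(basis[0])):
            fringe.append(sum(w * b[j] for w, b in zip(hv, basis)))  # one MAC per component
    return fringe

# Hypothetical usage: 3 projection directions, 4 hogels, 12 samples per hogel (CR = 12/3 = 4).
projections = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
hogel_vectors = projections_to_hogel_vectors(projections)
basis = cosine_basis_fringes(3, 12, 0.5, 0.633, 30.0)
fringe = decode(hogel_vectors, basis)
```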

The multiple-projection technique employs standard 3-D computer graphics rendering (similar to the stereogram approach). The diffraction-specific approach increases overall computation speed and achieves bandwidth compression. A reduction in bandwidth is accompanied by a loss in image sharpness -- an added blur that can be matched to the acuity of the HVS simply by choosing an appropriate compression ratio and sampling parameters. For a compression ratio (CR, the ratio between the size of the fringe and the hogel-vector array) of 8:1 or lower, the added blur is invisible to the HVS. For CR of 16:1 or 32:1, good images are still achieved, with acceptable image degradation [5].
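The compression ratio follows directly from the hogel dimensions. In the sketch below, the 1024-sample hogel width and 32-component hogel vector are hypothetical values chosen to give CR=32:1. The quality labels only restate the observations reported above.

```python
def compression_ratio(hogel_width_samples, hogel_vector_components):
    """CR = fringe size / hogel-vector array size, which per hogel is width / components."""
    return hogel_width_samples / hogel_vector_components

def reported_quality(cr):
    """Restates the image-quality observations reported in the text [5]."""
    if cr <= 8:
        return "added blur invisible to the HVS"
    if 16 <= cr <= 32:
        return "good images with acceptable degradation"
    return "not characterized in the text"

cr = compression_ratio(1024, 32)  # hypothetical hogel dimensions
array_bytes = 6 * 1048576 / cr    # hogel-vector array size for a 6-MB fringe
print(cr, reported_quality(cr), array_bytes)
```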

Specialized Hardware: Diffraction-specific fringe computation is fast enough for interactive holographic displays. Decoding is the slower step, requiring many multiplication-accumulation calculations (MACs). Specialized hardware can be utilized for these simple and regular calculations, resulting in tremendous speed improvements. Researchers using a small digital signal processing (DSP) card achieved a computation time of about one second for a 6-MB fringe with CR=32:1 [15]. In another demonstration, the decoding MACs are performed on the same Silicon Graphics RealityEngine2 (RE2) used to render the series of orthographic projections [5]. The orthographic projections rendered on the RE2 are converted into a hogel-vector array using filtering. The array is then decoded on the RE2, as shown in Figure 4. The texture-mapping function rapidly multiplies a component from each hogel vector by a replicated array of a single basis fringe. This operation is repeated several times, once for each hogel-vector component, accumulating the result in the accumulation buffer. A computation time of 0.9 seconds was achieved for fringes of 6-MB with CR=32:1 [5].
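The texture-mapping implementation reorders the decoding loop. Instead of finishing one hogel at a time, it makes one pass per hogel-vector component over the whole fringe, which suits graphics hardware. A minimal Python sketch of that ordering follows. The list standing in for the accumulation buffer and the function names are illustrative assumptions. Either ordering costs one MAC per fringe sample per hogel-vector component.

```python
def decode_by_accumulation(hogel_vectors, basis):
    """Component-at-a-time decoding, mirroring the texture-map plus accumulation-buffer scheme.

    For each component k: replicate basis fringe k across all hogels, scale each copy by
    that hogel's k-th component, and add the result into the accumulation buffer.
    """
    n_hogels, width = len(hogel_vectors), len(basis[0])
    accum = [0.0] * (n_hogels * width)  # stands in for the accumulation buffer
    for k, b in enumerate(basis):       # one rendering pass per component
        for h, hv in enumerate(hogel_vectors):
            w = hv[k]                   # this hogel's k-th component
            for j in range(width):
                accum[h * width + j] += w * b[j]  # one MAC
    return accum

def mac_count(fringe_samples, n_components):
    """Total MACs needed to decode a fringe: one per sample per component."""
    return fringe_samples * n_components

# Tiny example: two hogels, two components, three samples per hogel.
result = decode_by_accumulation([[1, 2], [0, 1]], [[1, 0, 1], [0, 1, 1]])
```

With a hypothetical 32-component hogel vector and one sample per byte, a 6-MB fringe needs 6 × 1048576 × 32, or about 2 × 10^8, MACs.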

Fringelets: Fringelet bandwidth compression (Figure 2) further subsamples in the spatial domain [6]. Each hogel is encoded as a spatially smaller ``fringelet.'' Using a simple sample-replication decoding scheme, fringelets provide the fastest method (to date) of fringe computation. Complex images have been generated in under one second for 6-MB fringes [6]. Furthermore, a ``fringelet display'' can optically decode fringelets to produce a CR-times greater image volume without increased electronic bandwidth.
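Fringelet decoding needs no multiplications at all, only copies, which is why it is so fast. The sketch below assumes that ``sample replication'' means tiling each fringelet end to end until it fills its hogel, so that the hogel is CR times wider than its fringelet. This reading and the function name are assumptions. One property of tiling is worth noting: the decoded hogel is periodic with the fringelet's width, so its spectrum contains only a discrete set of spatial frequencies, and hence a discrete set of diffraction directions.

```python
def decode_fringelets(fringelets, cr):
    """Expand fringelets into a full fringe by sample replication (assumed here: tiling).

    Each fringelet is repeated cr times end to end, so every hogel is cr times wider
    than its fringelet. Only copies are performed; there are no MACs.
    """
    fringe = []
    for fringelet in fringelets:
        fringe.extend(fringelet * cr)  # list repetition tiles the fringelet
    return fringe

# Hypothetical usage: two 4-sample fringelets decoded with CR = 8.
fringe = decode_fringelets([[0.1, 0.9, 0.4, 0.6], [0.5, 0.2, 0.8, 0.3]], 8)
```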

Optical Modulation and Processing

The second process of a holographic display is optical modulation and processing. Information about the desired 3-D scene passes from electronic bits to photons by modulating light with a computed holographic fringe using spatial light modulators (SLMs). The challenge in a holographic display arises from the many millions of samples in a fringe. Successful approaches to holographic optical modulation exploit parallelism and/or the time-response of the HVS.

Liquid-Crystal and Related SLMs

A liquid crystal display (LCD) is a common electro-optic SLM used to modulate light for projection of 2-D images. A typical LCD contains about one million elements (``pixels''). A one-million-sample fringe can produce only a small flat image. A magneto-optic SLM, which uses the magneto-optic effect to electronically modulate light, often contains fewer than one million elements [16]. Early researchers using LCD SLMs or magneto-optic SLMs created small planar images [16-19]. The low pixel count of typical LCDs can be overcome by tiling together several such modulators [12].

For any modulation technique, several issues must be addressed. First, modulation elements are too big: typically 50 microns wide (in an LCD), compared to the fringe sampling pitch of about 0.5 micron. Demagnification is employed to reduce the effective sample size, with the necessary but unattractive effect of proportionally reducing the lateral dimensions of the image (see Figure 5). Second, holographic imaging may employ either amplitude or phase modulation. LCDs are basically phase modulators when used without polarizing optics. Phase modulation can be more optically efficient, and so is most often used. Finally, it is desirable to employ modulators possessing many levels of modulation, i.e., grayscale. Common LCDs have nominally 256 grayscale levels, sufficient for producing reasonably complex images.
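The cost of demagnification is easy to quantify. Bringing a 50-micron element down to the 0.5-micron fringe pitch requires 100x demagnification, which shrinks the image width by the same factor. The 1000-element row in this Python sketch is a hypothetical example.

```python
def demagnification(element_pitch_um, target_pitch_um, n_elements):
    """Return (demagnification factor, modulator width in mm, image width in mm)."""
    factor = element_pitch_um / target_pitch_um
    modulator_width_mm = n_elements * element_pitch_um / 1000.0
    return factor, modulator_width_mm, modulator_width_mm / factor

# Hypothetical 1000-element LCD row with 50-micron elements, reduced to a 0.5-micron pitch.
print(demagnification(50.0, 0.5, 1000))
```

A 50 mm row of modulator elements therefore yields an image only 0.5 mm wide.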

Deformable micro-mirror devices (DMDs) are micromechanical SLMs fabricated on a semiconductor chip as an electronically addressed array of tiny mirror elements. Electrostatically depressing or tilting each element modulates the phase or amplitude of a reflected beam of light. A phase-modulating device was used to create a small flat holographic image [20], and a binary amplitude-modulating DMD was used to create a small interactive 3-D holographic image [21].

Scanned Acousto-Optic Modulator (AOM)

The time-multiplexing of a very fast AOM SLM has been used in holovideo. A wide-aperture AOM phase-modulates only about 1000 samples at any one instant in time, using a rapidly propagating acoustic wave within a crystal. By scanning the image of modulated light with a rapidly moving mirror, a much larger apparent fringe can be modulated. The latency of the HVS is typically 20 ms, and the eye time-integrates to see the entire fringe displayed during this time interval. This technique was invented and exploited by researchers at the MIT Media Laboratory to produce the world's first real-time 3-D holographic display in 1989 [9,10]. A generalized schematic of this approach is shown in Figure 6. After RF processing, computed fringes traverse the aperture of an AOM (as acoustic waves), which phase-modulate a beam of laser light. Two lenses image and demagnify the diffracted light at a plane in front of the viewer. The horizontal scanning system angularly multiplexes the image of the modulated light. A vertical scanning mirror reflects diffracted light to the correct vertical position in the hologram plane.
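A back-of-envelope estimate shows why the scanned AOM needs such fast electronics. The sketch assumes one byte per fringe sample and that the entire fringe must be redisplayed within each 20 ms HVS integration time. It computes the per-channel sample rate and how many times the roughly 1000-sample aperture must be refilled per frame. These simplifying assumptions are illustrative, and the single channel assumed for the first system is not stated in the text.

```python
HVS_LATENCY_S = 0.020        # eye integration time (from the text)
AOM_APERTURE_SAMPLES = 1000  # samples in the AOM at any instant (from the text)
MB = 1048576                 # bytes, per the glossary

def aom_requirements(fringe_mb, channels, bytes_per_sample=1):
    """Per-channel sample rate (samples/s) and aperture refills per 20 ms frame.

    Assumes the whole fringe is redisplayed once per HVS latency period.
    """
    samples_per_channel = fringe_mb * MB / bytes_per_sample / channels
    rate = samples_per_channel / HVS_LATENCY_S
    refills = samples_per_channel / AOM_APERTURE_SAMPLES
    return rate, refills

print(aom_requirements(2, 1))    # 2-MB fringe, one channel assumed
print(aom_requirements(36, 18))  # 36-MB fringe on 18 parallel channels
```

Under these assumptions, both configurations need about 10^8 samples per second per channel. Adding channels enlarged the fringe without raising the per-channel rate.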

One advantage of the scanned-AOM system is that it can be scaled up to produce larger images. The first images produced in this way were 50 ml, generated from 2-MB fringes [9]. More recently, by building a scanned-AOM system with 18 parallel modulation channels, images created from a 36-MB fringe occupy a volume greater than one liter [11].

One disadvantage of the scanned-AOM approach is the need to convert digitally computed fringes into high-frequency analog signals. The 18-channel synchronized high-speed framebuffer system used at MIT was built specifically for this application, and the need for such custom hardware was a major practical obstacle in this approach [15]. LCDs, DMDs, and other SLMs are more readily interfaced to digital electronics. Indeed, LCD SLMs are commonly constructed to plug directly into a digital computer, or are built on an integrated circuit chip [22]. Another disadvantage of the scanned-AOM approach is the need for optical processing. Typical LCD-based holographic displays require only demagnification and the optical concatenation of multiple devices. The time-multiplexing of the scanned-AOM system requires state-of-the-art scanning mirrors that must be synchronized to the fringe data stream. Despite these obstacles, the scanned-AOM approach has produced the largest holovideo images [6].

Other Techniques

Color: Full-color holovideo images are produced by computing three separate fringes. Each represents one of the additive primary colors (red, green, and blue) taking into account the three different wavelengths used in a color holovideo display. The three fringes are used to modulate three separate beams of light (one for each primary color) [10].
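The grating equation ties diffraction angle to wavelength, so a fringe computed for one color would send the other colors in the wrong directions. Each color's fringe must therefore use a spatial frequency scaled to its own wavelength. The sketch assumes typical red, green, and blue laser lines (0.633, 0.532, and 0.442 micron). These values are not from the text.

```python
import math

# Assumed laser lines for a color display (microns); not specified in the text.
WAVELENGTHS_UM = {"red": 0.633, "green": 0.532, "blue": 0.442}

def frequency_for_angle(angle_deg, wavelength_um):
    """Spatial frequency (cycles per micron) that diffracts wavelength_um to angle_deg."""
    return math.sin(math.radians(angle_deg)) / wavelength_um

# To send all three colors toward the same 15-degree direction, each fringe needs its own frequency.
for color, lam in WAVELENGTHS_UM.items():
    print(color, round(frequency_for_angle(15.0, lam), 3))
```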

SAW AOM: Recently, researchers have used an AOM device with multiple ultrasonic transducers [23]. These multiple electrodes are fed a complex pattern and launch surface acoustic waves (SAWs) across the device aperture. Diffracted light forms a holographic image. Preliminary results show that this approach may eliminate the need for time multiplexing and consequently scanning mirrors. However, the large number of electrodes may be prohibitively expensive. Also, the array of SAW electrodes necessitates an additional numerical inversion transformation, making rapid computation difficult.

Discussion

Real-time 3-D holographic displays are expensive, new, and rare. Although they alone among 3-D display technologies provide extremely realistic imagery, their cost must be justified. Each specific computer graphics application dictates whether holovideo is a necessity or an extravagant expense.

Applications

I divide interactive computer-graphics applications into two extreme modes of interaction: the ``arm's reach'' mode, and the ``far away'' mode. An arm's-reach application involves interacting with scenes in a space directly in front of the user, where the user constantly interacts, moving around it to gain understanding. In this mode, all of the visual depth cues are employed, particularly motion parallax, binocular disparity, convergence, and ocular accommodation. These applications warrant the expense of holovideo and the extreme realism and three-dimensionality of its images: computer-aided design, multi-dimensional data visualization, virtual surgery, teleoperation, training and education (e.g., holographic virtual textbooks on anatomy, molecules, or engines).

At the other extreme, a far-away application involves scenes that are beyond arm's reach and are generally larger. The imagery of such applications (e.g., flight simulation, virtual walk-throughs) makes adequate use of the kinetic depth cue, pictorial depth cues, and other depth cues associated with flat display systems. A high-resolution 2-D display may be a more cost-effective solution for far-away applications.

Present... and Future

Currently there are no off-the-shelf holographic displays. Holographic display technology is in a research stage, analogous to the state of 2-D display technology in the 1920s. What, then, does the future hold? The future promises exactly what holovideo needs: more computing power, higher-bandwidth optical modulation, and improvements in holographic information processing.

Computing power continues to increase. A doubling of computing power at constant cost, a trend that has recurred roughly every 18 months, effectively doubles the interactive image volume of a holographic display [5]. Inexpensive computation (around $100 per gigaMAC) is the most crucial enabling technology for practical holovideo, and should be available in 2002.
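The growth projection is simple compounding. If interactive image volume scales with computing power, as stated above, an 18-month doubling period compounds to roughly a tenfold increase over the five years from 1997 to 2002. A minimal sketch, with the doubling period as a parameter:

```python
def growth_factor(years, doubling_period_months=18.0):
    """Compounded growth when capacity doubles every doubling_period_months."""
    return 2.0 ** (years * 12.0 / doubling_period_months)

print(round(growth_factor(5), 1))  # five years, 1997 to 2002
```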

Although optical modulation has borrowed from existing technologies (e.g., transmissive LCDs, AOMs), new technologies will fuel the development of larger, more practical holovideo displays. Because bandwidth is most important, I use as a figure of merit the number of bits that can be modulated in the latency time of the HVS (typically 20 ms). An AOM can modulate about 16 Mb in this time interval, at a cost of about $2000 (or $120 per Mb), including the associated electronics. The DMD, a new technology for high-end 2-D video projection, delivers approximately 100 Mb in 20 ms, for a cost of about $3000, or $30 per Mb. Future mass production could reduce the cost further. Reflective LCDs are another possible technology. Several research groups have created small reflective LCDs directly on a semiconductor chip using VLSI technology [22].
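The figure of merit and the cost per megabit can be reproduced from the numbers above. The sketch also converts bits per 20 ms into a sustained rate. Note that $2000 / 16 Mb works out to $125 per Mb, which the text rounds to about $120.

```python
HVS_LATENCY_S = 0.020  # HVS latency used as the time window (from the text)

def modulator_merit(mbits_per_latency, cost_usd):
    """Return (sustained rate in Mb/s, dollars per Mb modulated per HVS latency)."""
    rate_mbps = mbits_per_latency / HVS_LATENCY_S
    dollars_per_mb = cost_usd / mbits_per_latency
    return rate_mbps, dollars_per_mb

aom = modulator_merit(16, 2000)   # AOM figures quoted in the text
dmd = modulator_merit(100, 3000)  # DMD figures quoted in the text
print(f"AOM: {aom[0]:.0f} Mb/s, ${aom[1]:.0f}/Mb")
print(f"DMD: {dmd[0]:.0f} Mb/s, ${dmd[1]:.0f}/Mb")
```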

The bandwidths of computation and modulation are likely to increase steadily. Improvements in holographic information processing will likely provide occasional dramatic improvements in both of these areas. Already, holographic bandwidth compression increases fringe computation speed by 3000 times for same-hardware implementation [6]. Standard MPEG algorithms can be used to encode and decode computed fringes [24]. Nonuniformly sampled fringes provide lossless bandwidth compression and promise further advances [25].

User demand may be the one additional key to the development of holovideo. As other types of 3-D display technologies (e.g., autostereoscopic displays) acquaint users with the advantages of spatial imaging, these users will grow hungry for holovideo, a display technology that can produce truly 3-D images that look as good as, or better than, actual 3-D objects and scenes.

References

The RealityEngine2 graphics framebuffer system is manufactured by Silicon Graphics, Inc., Mountain View, CA.

Glossary

AOM - acousto-optic modulator. A type of high-bandwidth SLM.

CR - compression ratio.

HPO - horizontal-parallax-only. A type of hologram that exhibits motion parallax horizontally but not vertically.

basis fringe - an elemental fringe precomputed to diffract light in a specific manner.

DMD - deformable micro-mirror device. A micromechanical SLM.

fringe - the holographic pattern that is either recorded optically or generated computationally and used to diffract light to form an image.

HVS - human visual system.

hogel - holographic element. A small piece of hologram that has homogeneous diffraction properties.

LCD - liquid crystal display. An electro-optic SLM.

MB - 1048576 bytes.

MAC - multiplication accumulation. A numerical calculation consisting of one multiplication and one addition.

SLM - spatial light modulator. A device that modulates a beam of light.


Last Modified 1997 March