Mark Lucente
These e-commerce shopping agents are knowledgeable, patient,
and affable during their conversational interaction with shoppers
- while helping generate sales for site owners.
Consumers spend billions online, though most are unhappy with their shopping experiences [1]. My colleagues and I at Soliloquy, Inc., are creating conversational natural language (NL) interfaces for e-commerce - software that appears on a web site as a sales "Expert" - conversing with the shopper, answering questions, and helping to find the perfect product to purchase. The metaphor is natural conversation - just like talking with a (human) salesperson. The shopper and the Expert co-produce knowledge and understanding through conversation - a shopping experience that is more effective, fun, and human than point-click-download, etc. By constraining the knowledge domain to a particular product line (e.g., PCs, or restaurants, or mutual funds), software Experts can be more robust and helpful. Domain constraint facilitates several features:
Our online conversational Experts combine several technologies that are ripe for the picking: speech recognition, speech synthesis, natural language understanding, and the World Wide Web. Early use of our software Experts has yielded many insights from this confluence of technologies and has raised several important questions about the evolving online shopping experience. Particularly important are the multimodal nature of the interaction and the human's natural perception of the Expert's personality.
Shopping Expert
Our shopping Experts appear to the user (i.e., the shopper) as a small conversation window on an e-commerce webpage. (See Figure 1.) The human-like Expert (which sometimes includes a cartoon face) appears eager to help the shopper understand what is for sale and find the perfect items to purchase. The Expert begins the conversation by greeting the shopper and prompting for questions. The shopper speaks to the Expert or types into the window. The Expert responds in several ways:
- a spoken response (if audio output is enabled);
- a text response;
- a hyperlinked multimedia response including images (such as some products for sale).
The Expert's main goal in a conversation is to help the shopper find the best item to buy. The Expert's response is generated to satisfy several sub-goals:
- answer the shopper's questions;
- proffer items that the shopper may wish to purchase;
- ask for clarification;
- prompt the shopper for information that allows the Expert to refine the search.
The Expert uses a proprietary natural language understanding system built on top of a structured knowledge base. NL subsystems convert the incoming text (from the keyboard or from a speech recognizer) into conceptual representations, add these concepts to the conversational context, determine sub-goals, and finally motivate actions, e.g., constructing an answer, searching the product database, navigating the webpage, or prompting the shopper for more information. A conversation manager orchestrates the input and output while updating the context representation. The structured knowledge base contains several components:
Figure 1. Expert-based conversation window on an e-commerce Web page.
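The turn-by-turn flow described in the Shopping Expert section (text to concepts, concepts to context, context to sub-goal, sub-goal to action) can be illustrated with a toy sketch. The Expert's actual NL system is proprietary and not described here, so everything below is an assumption made for illustration: the catalog, the `Context` class, the regular-expression "understanding", and the function names are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical toy catalog; names and prices are invented for illustration.
CATALOG = [
    {"name": "Notebook A", "price": 1499},
    {"name": "Notebook B", "price": 1999},
    {"name": "Notebook C", "price": 2599},
]


@dataclass
class Context:
    """Conversational context that accumulates concepts across turns."""
    max_price: Optional[int] = None
    turns: list = field(default_factory=list)


def extract_concepts(text):
    """Convert raw text into a tiny conceptual representation (a dict).

    A real system would use full NL understanding; this stand-in only
    recognizes a budget phrased as "under $N".
    """
    concepts = {}
    match = re.search(r"under \$?(\d[\d,]*)", text.lower())
    if match:
        concepts["max_price"] = int(match.group(1).replace(",", ""))
    return concepts


def choose_subgoal(context):
    """Pick a sub-goal from the current context."""
    if context.max_price is None:
        return "prompt_for_information"
    return "proffer_items"


def respond(text, context):
    """One turn: text -> concepts -> context update -> sub-goal -> action."""
    concepts = extract_concepts(text)
    context.turns.append(text)
    if "max_price" in concepts:
        context.max_price = concepts["max_price"]

    subgoal = choose_subgoal(context)
    if subgoal == "prompt_for_information":
        return subgoal, "How much would you like to spend?"

    # Action: search the product database with the accumulated constraints.
    matches = [p["name"] for p in CATALOG if p["price"] < context.max_price]
    if not matches:
        return "ask_for_clarification", "Nothing fits that budget. Could you raise it?"
    return subgoal, "You might like: " + ", ".join(matches)
```

Because the context object persists between calls to `respond`, a budget stated in one turn keeps constraining the search in later turns, which is the role the article assigns to the conversation manager's context representation.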
Inherently Multimodal
The interaction between a human user and our shopping Expert differs from human-human conversation in important ways that we have begun to understand and exploit. Because the Expert appears on a webpage, the shopper-Expert interaction is inherently multimodal. The Expert can sense the shopper's mouse and keyboard activity. The user sees images and hypertext generated on the fly. Furthermore, the Expert can offer graphical interface elements (e.g., buttons or hypertext words) to augment the conversation - analogous to a human-human conversation involving physical objects that are gestured toward or used iconically.
The multimodal interaction allows for rich conversation and contextual development that often exceeds that of (human-human) speech-only conversation. Although the Expert's NL understanding capabilities are not as advanced as those of a typical human, the presentation of multimedia responses (e.g., pictures of items for sale) facilitates complex conversation and delights users. For example, if a user asks to spend under $2000, this constraint is illustrated in a small summary table (see Figure 1), visible at a quick glance to the absent-minded user, which reduces redundant or divergent exchanges. (This simple illustration is, of course, not possible in a speech-only conversation, e.g., over the telephone.) Sensing user selections (i.e., mouse position) allows for a multimodal interaction that is often much faster than speech-only interaction. For example, when the Expert responds with three pictures of products that match the user's requirements, the user can point to one picture while saying "How much does this one cost?"
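The pointing example can be made concrete with a small sketch of how a mouse position might resolve the word "this" to one of the displayed pictures. This is not the Expert's actual mechanism, which the article does not describe; the `ProductCard` class, the bounding-box coordinates, and the function names are hypothetical, and the word matching is deliberately naive.

```python
from dataclasses import dataclass


@dataclass
class ProductCard:
    """A product picture shown in the response, with its on-page bounding box."""
    name: str
    price: int
    x: int
    y: int
    width: int
    height: int

    def contains(self, px, py):
        """True if the point (px, py) lies inside this card's bounding box."""
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


# Words treated as pointing at something on screen (an illustrative choice).
DEICTIC_WORDS = {"this", "that"}


def resolve_referent(utterance, pointer, cards):
    """Return the card under the pointer if the utterance uses a deictic word."""
    words = {w.strip("?.,!\"'").lower() for w in utterance.split()}
    if not words & DEICTIC_WORDS:
        return None
    px, py = pointer
    for card in cards:
        if card.contains(px, py):
            return card
    return None


def answer_price(utterance, pointer, cards):
    """Answer a price question, falling back to a clarification request."""
    card = resolve_referent(utterance, pointer, cards)
    if card is None:
        return "Which one do you mean?"
    return f"The {card.name} costs ${card.price}."
```

When the pointer is not over any picture, the sketch falls back to asking for clarification, which corresponds to one of the Expert's stated sub-goals.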
Affable Personality
User testing shows that the shopper perceives the Expert as having a personality [3]. Verbal output contributes strongly to this personality, as in human-human interaction (see Boyce's "Natural Spoken Dialogue Systems for Telephony Applications" in this section). (The prosody of the speech output contributes too, but this topic will be addressed in future publications.) We give the Experts personalities that are suited to their shopping function. We teach the Expert to use an adaptive blend of vernacular, humor, and down-to-business tone, creating a personality that is pleasant to most users but that also projects authority - rather like an affable college professor. The Expert's personality adjusts dynamically, and can also be adjusted by the site owner.
Consider the Notebook Expert, an Expert that sits on a notebook PC website and conversationally helps the shopper find the best notebook to buy. Notebook computers are expensive and personal, and shopping for one is fraught with anxiety, indecision, soul-searching, doubt, and confusion over specialized jargon and features. (No wonder so few shoppers become buyers on traditional click-and-download sites!) The perceived personality of the Expert helps the shopper through this process, increasing the rate at which shoppers become buyers. The Expert exhibits patience by remaining attentive, no matter how long the conversation lasts. The Expert is perceived to be honest and forthcoming, as it shows all the information requested and answers questions quickly and precisely. And the Expert adds a personal touch - remembering the shopper's name and particular requirements and engaging the shopper on topics that are central to the shopper's needs. All of this personality (and perceived intelligence) adds up to an improved shopping experience and increased sales.
On one notebook PC shopping website, 30% of shoppers who converse with the Expert go on to purchase - compared with under 2% for shoppers who do not use the Expert. A final and fascinating observation: our constant goal is to educate the Experts and extend their personality and intelligence. We believe that this is a wonderful trend - that of educating as well as engineering - as it allows teachers, psychologists, linguists, consultants and (human) experts from many fields to contribute to the creation of useful software that is a pleasure to engage.
[1] Richard E. Cullingford. Natural Language Processing: A Knowledge-Engineering Approach. Rowman & Littlefield, Totowa, NJ, 1986.
[3] Clifford Nass and Kwan Min Lee. Does computer-generated speech manifest personality? An experimental test of similarity-attraction. Proceedings of the CHI 2000 Conference on Human Factors in Computing Systems, pp. 329-336, April 1-6, 2000, The Hague, The Netherlands.