HyperSoap: Object-Based Media Group


Jon Dakss, Stefan Agamanolis
Edmond Chalom, V. Michael Bove, Jr.

Project Participants
Kevin Brooks, Paul Nemirovsky, Alex Westner

HyperSoap is a short soap opera program in which a viewer can "click" with an enhanced remote control on clothing, furniture, and other items to see information about how they can be purchased. It is an example of hyperlinked video, or video in which specific objects are made selectable by some form of user interface, and the user's interactions with these objects modify the presentation of the video.

Produced by the MIT Media Lab in association with lab sponsor JCPenney, HyperSoap points toward the possibility of interactive product placement in a broadcast setting, or toward video catalogs on CD-ROM or DVD that tell a story rather than just presenting illustrations and product descriptions. It was created with a new system developed at the Media Lab for authoring hyperlinked video.

Users of the World Wide Web are familiar with the concept of hyperlinks, in which "clicking" on specially tagged words or graphics in a document retrieves other documents or modifies the current one. Applying the same kind of interaction to video programs has often been discussed as a desirable possibility: consider, for instance, a fashion program in which clicking on an article of clothing provides information about it, or a nature documentary in which children click on plants and animals in the scene to learn more about them. Playback of such material is well within the capabilities of typical digital television decoders that support graphical overlays. Creating it, however, has posed a challenge, because identifying and tracking the selectable regions in every frame is difficult whether done manually or automatically.

We have developed a method of tracking and segmenting video objects that simplifies the process of creating hyperlinked video. The author of the video uses a computer mouse to scribble roughly on each desired object in a frame of video, and the system generates full segmentation masks for that frame and for the following and preceding frames until there is a scene change or new objects enter. These masks label every pixel in every frame of the video as belonging to one of the regions roughly sketched out by the author at the beginning of the process. The author may then associate each region with a particular action (e.g., displaying a graphical overlay, switching to a different video data stream, or transmitting data on a back channel). During playback, the viewer can select objects with a mouse or an analogous device, such as an enhanced TV remote control with point-and-click capability. In our demonstrations, we use a video projector that can identify the location of a laser pointer aimed at its screen.
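Because the masks assign every pixel to a region, a click during playback reduces to two lookups: the pointer position indexes the current frame's mask to find a region ID, and the region ID indexes the actions the author attached. The Python sketch below illustrates only that idea. The `HyperlinkedVideo` class, the list-of-lists mask format, and the string-returning actions are hypothetical stand-ins, not the data structures used by Isis or the HyperSoap player.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical mask format: mask[y][x] holds the region ID of that pixel.
Mask = List[List[int]]


@dataclass
class HyperlinkedVideo:
    masks: List[Mask]                       # one full segmentation mask per frame
    actions: Dict[int, Callable[[], str]]   # region ID -> action chosen by the author

    def region_at(self, frame: int, x: int, y: int) -> Optional[int]:
        """Return the region under the pointer, or None if the click is off-screen."""
        mask = self.masks[frame]
        if 0 <= y < len(mask) and 0 <= x < len(mask[0]):
            return mask[y][x]
        return None

    def select(self, frame: int, x: int, y: int) -> Optional[str]:
        """Run the action linked to the clicked region; unlinked regions do nothing."""
        region = self.region_at(frame, x, y)
        if region is None or region not in self.actions:
            return None
        return self.actions[region]()


# Toy 2x3 frame: region 0 = background (no action), 1 = jacket, 2 = lamp.
video = HyperlinkedVideo(
    masks=[[[0, 1, 1],
            [0, 1, 2]]],
    actions={1: lambda: "show jacket info", 2: lambda: "show lamp info"},
)
print(video.select(0, x=2, y=1))  # show lamp info
```

Since the lookup costs the same regardless of object shape, this style of playback fits the observation that decoders with overlay support can handle hyperlinked video; the expensive work is producing the masks.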

We apply a novel method that uses color, texture, motion, and position to segment and track video objects. Our system combines these attributes to build multi-modal statistical models of each region the author has roughly marked. It then creates the segmentation masks by finding areas that are statistically similar to those models and tracking them throughout a video scene. The authoring tool and the playback system are supported by Isis, a programming language specially tailored for object-based media.
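The page does not specify the form of the statistical models. As one plausible illustration only, the sketch below fits a diagonal Gaussian to the feature vectors of each region's scribbled pixels and labels every pixel with the region whose model gives it the highest log-likelihood. The feature layout, the `eps` variance floor, and all function names are assumptions, and this is a simplified stand-in rather than the authors' method; in particular, a real tracker would need to update the models from frame to frame as objects move.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical per-pixel feature vector, e.g. (color..., texture..., motion..., x, y).
Feature = Sequence[float]
Model = Tuple[List[float], List[float]]  # (per-dimension mean, per-dimension variance)


def fit_region_models(scribbles: Dict[int, List[Feature]], eps: float = 1e-3) -> Dict[int, Model]:
    """Fit a diagonal Gaussian to the scribbled pixels of each region.

    `eps` is an assumed variance floor that keeps a dimension with no spread
    from producing a zero variance.
    """
    models: Dict[int, Model] = {}
    for region, samples in scribbles.items():
        n = len(samples)
        dims = len(samples[0])
        mean = [sum(s[d] for s in samples) / n for d in range(dims)]
        var = [sum((s[d] - mean[d]) ** 2 for s in samples) / n + eps for d in range(dims)]
        models[region] = (mean, var)
    return models


def log_likelihood(feature: Feature, model: Model) -> float:
    """Log density of a feature vector under a diagonal Gaussian."""
    mean, var = model
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (f - m) ** 2 / v for f, m, v in zip(feature, mean, var)
    )


def classify(feature: Feature, models: Dict[int, Model]) -> int:
    """Label a pixel with the region whose model explains it best."""
    return max(models, key=lambda region: log_likelihood(feature, models[region]))


def segment(frame: List[List[Feature]], models: Dict[int, Model]) -> List[List[int]]:
    """Produce a full mask: every pixel is assigned to exactly one region."""
    return [[classify(f, models) for f in row] for row in frame]


# Toy example with 2-D features (redness, horizontal position):
# region 1 = a red jacket on the left, region 2 = a blue wall on the right.
scribbles = {
    1: [(0.90, 0.10), (0.85, 0.20), (0.95, 0.15)],
    2: [(0.10, 0.80), (0.15, 0.90), (0.05, 0.85)],
}
models = fit_region_models(scribbles)
print(segment([[(0.88, 0.18), (0.12, 0.82)]], models))  # [[1, 2]]
```

Even though the author marks only a few sample pixels, every pixel ends up with a label, which is what makes the output a full mask. Including position among the features lets two regions of similar color be told apart by where they lie in the frame.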

We used this system to create HyperSoap, a hyperlinked video program resembling a television serial drama (a "soap opera"), in which the viewer can select props, clothing, and scenery to see purchasing information such as the item's price and retailer. We produced the program entirely from scratch, rather than starting with pre-made video material, in order to learn how the production (scripting, shooting, editing) of hyperlinked video differs from that of traditional television programming. We also learned a great deal about how people interact with hyperlinked video, and we based the design of several modes of user interaction on what we learned.

Depending on the mode of playback and the preferences of the viewer, the playback system displays information about the selected object in one of several ways. In one mode, the system waits for an appropriate point to interrupt the video, typically when an actor has finished speaking a line, and then displays a separate screen containing a detailed still image of the selected product along with a text box giving the product's brand name, description, and price. In another mode, suited to a broadcast scenario or to viewers who want more immediate feedback, an abbreviated information box appears immediately, without pausing the video, and fades away after a few seconds. If the viewer requests it, a summary of all the selected products is shown at the end of the video. We also created a musical soundtrack whose individual pieces, each composed to match the mood of a particular part of the scene, can be seamlessly looped and cross-faded. If the video is paused to display product information, the music continues to play in the background, lessening the impact of the interruption on the continuity of the video.
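The two presentation modes differ mainly in timing. The sketch below assumes the end time of each spoken line is known in advance (for example, from script or editing timings; the page does not say how the player finds these points), and it uses assumed hold and fade durations for the abbreviated information box. The helper names are hypothetical.

```python
import bisect
from typing import List


def next_pause_point(click_time: float, line_end_times: List[float], video_end: float) -> float:
    """Earliest time at or after the click when an actor has just finished a line.

    `line_end_times` must be sorted in ascending order; if no line ends after
    the click, fall back to the end of the video.
    """
    i = bisect.bisect_left(line_end_times, click_time)
    return line_end_times[i] if i < len(line_end_times) else video_end


def overlay_opacity(elapsed: float, hold: float = 3.0, fade: float = 1.0) -> float:
    """Opacity of the abbreviated info box shown without pausing.

    Fully visible for `hold` seconds, then fades linearly to 0 over `fade`
    seconds (both durations are assumed values).
    """
    if elapsed < hold:
        return 1.0
    return max(0.0, 1.0 - (elapsed - hold) / fade)


def product_summary(selections: List[str]) -> List[str]:
    """Distinct products in the order first selected, for the end-of-video summary."""
    return list(dict.fromkeys(selections))


line_ends = [4.2, 9.8, 15.0]  # hypothetical end times (seconds) of spoken lines
print(next_pause_point(5.0, line_ends, video_end=60.0))   # 9.8
print(next_pause_point(16.0, line_ends, video_end=60.0))  # 60.0
print(overlay_opacity(3.5))                               # 0.5
print(product_summary(["lamp", "jacket", "lamp"]))        # ['lamp', 'jacket']
```

Using `bisect` keeps the pause lookup logarithmic in the number of lines, and deduplicating with `dict.fromkeys` preserves the order in which the viewer first chose each product.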

The technology used to create HyperSoap is being licensed to WatchPoint Media, a Cambridge-based company that provides hyperlinked video as a service for interactive entertainment and as a portal to electronic commerce. Its goals closely resemble those of HyperSoap: enabling consumers with digital TV or broadband Internet access to browse and buy the items that appear in television programs by clicking on the items that interest them with a remote control or mouse. Visit www.watchpointmedia.com for more information.

For more information about this project, please contact V. Michael Bove, Jr.

Related Publications

Jonathan Dakss, Stefan Agamanolis, Edmond Chalom, V. Michael Bove, Jr., "Hyperlinked Video," Proc. SPIE Multimedia Systems and Applications, v. 3528, (in press), 1998.

V. Michael Bove, Jr., Jonathan Dakss, Stefan Agamanolis, Edmond Chalom, "Adding Hyperlinks to Digital Television," Proc. SMPTE 140th Technical Conference, 1998.

Stefan Agamanolis and V. Michael Bove, Jr., "Multi-Level Scripting for Responsive Multimedia," IEEE MultiMedia, Oct-Dec 1997.