Inquiry with Imagery: Historical Archive Retrieval with Digital Cameras

Brian K Smith, Erik Blankinship, Alfred Ashford III, Michael Baker & Timothy Hirzel

MIT Media Laboratory
20 Ames Street
Cambridge, MA 02139 USA
+1 617 253 6537
{bsmith, erikb, coltrane, mbaker, hirzel}


This paper describes an integration of geographic information systems (GIS) and multimedia technologies to transform the ways K-12 students learn about their local communities. We have augmented a digital camera with a global positioning system (GPS) and a digital compass to record its position and orientation when pictures are taken. The metadata are used to retrieve and present historical images of the photographed locations to students. Another set of tools allows them to annotate and compare these historical images to develop explanations of how and why their communities have changed over time. We describe the camera architecture and learning outcomes that we expect to see in classroom use.



In most K-12 classrooms, students are exposed to historical issues through the writings and narrative accounts of others. In general, they lack primary data sources to complement these writings and allow them to form their own interpretations of the past. We see opportunities for students to generate their own explanations of historical trends with archival photographs. Rather than just relying on captions and narratives to explain content, we are providing tools for students to annotate and compare historical images and to detect and explain patterns and relations over time. In this way, we hope to help them become better observers and critics of the real world by using imagery as data.

We are developing new ways for students to investigate the histories of their communities by combining geographic information systems (GIS) and multimedia technologies. Historical photographs provide a glimpse at the architectural, fashion, transport, and cultural trends of a period. When these images are arranged spatially on maps, students can begin looking for patterns and relations that may vary geographically. While innovations in multimedia and GIS learning environments have been documented [e.g., 4, 6], the fusion of the two technologies has not been fully explored.

In this paper, we describe tools for K-12 students to investigate and explain how and why their communities have evolved over time. To facilitate student inquiry, we have augmented a digital camera with a global positioning system (GPS) and a digital compass to record position and orientation metadata when pictures are taken. When the camera's images are downloaded, each augmented picture is used to retrieve historical pictures of the photographed location using image and GIS databases. By integrating GIS data with multimedia objects [3, 9], student photographs can be geo-referenced to provide data for theory construction. By linking students' images of the present with those of the past, we create a starting point for inquiry into community change.

Retrieving historical images

To give a sense of the types of activities that we hope to see, we begin with a hypothetical use scenario: a group of students exploring their local community. These students use our camera to take pictures of buildings and settings in their communities that they like and dislike. After doing so, they return to their classroom and download their images into our software (Figure 1). The thumbnails on the right side of the display show students' photographs. When one of these thumbnails is clicked, its enlarged image appears at right center, and a set of historical thumbnails matching the location of the selected image is displayed at the top; clicking one of these expands its image at the left center. Figure 1 shows how a photograph of Harvard Square in 1999 retrieves nine images of the same location between 1860 and 1980.

Figure 1: The current retrieval interface. Thumbnails on the right are images taken by students. Choosing one of these displays its larger image and an array of historical thumbnails across the top. The left image is the historical photo chosen from the retrieved collection.

The students now need to explain why they liked or disliked the objects that they photographed. They do this by creating descriptive ontologies and labeling objects in the images with these features. For instance, Figure 2 shows a list of features that students might develop (e.g., transport types, commercial buildings, road types). The historical photos are tagged with these labels, and students can begin comparing images over time to see similarities and differences. As they mark up more photographs, they can begin to retrieve images using their ontological features and describe urban planning patterns [1] that have varied or remained consistent throughout history.

Figure 2: Annotating images. Students develop ontologies to characterize interesting features of images. Objects in the photographs are labeled with these features and used to develop explanations of community change.
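As a minimal sketch of how labeled photographs could later be retrieved by feature, the following Python fragment indexes images by their ontology labels and reports the earliest year in which a feature appears. It is an illustration under stated assumptions, not the annotation tool itself: the class `AnnotatedImage`, the helper names, and the sample labels and years are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class AnnotatedImage:
    """A photograph plus the ontology labels students attached to it."""
    name: str
    year: int
    labels: set = field(default_factory=set)


def index_by_label(images):
    """Build a label -> images lookup so photos can be retrieved by feature."""
    index = defaultdict(list)
    for image in images:
        for label in image.labels:
            index[label].append(image)
    return index


def first_year_with(index, label):
    """Earliest year in which a labeled feature appears, or None if never."""
    years = [image.year for image in index.get(label, [])]
    return min(years) if years else None


# Hypothetical annotations; names, labels, and years are invented.
photos = [
    AnnotatedImage("square_1910", 1910, {"streetcar", "commercial building"}),
    AnnotatedImage("square_1955", 1955,
                   {"automobile", "crosswalk", "commercial building"}),
    AnnotatedImage("square_1999", 1999, {"automobile", "crosswalk"}),
]
index = index_by_label(photos)
print(first_year_with(index, "crosswalk"))
```

A lookup of this kind would let students ask when a feature such as a crosswalk first shows up in their annotated collection and then compare that date against other labeled features.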

When students are taught to explore their outdoor surroundings, they can become more aware of the intricacies of man-made environments [10]. We assist this process by giving access to historical images that might otherwise go unseen by students. We claim that doing "field work" with our camera, obtaining a record of local history, and working to explain the various changes in the community can lead to new insights about historical, architectural, and social change.

What can you learn from image data?

Rather than providing students with textbook explanations of history, we adopt a learner-centered approach [e.g., 8] to engage students in constructing and reflecting on their own explanations of image data. Previous work [5, 7, 12] has discussed the use of video as data in learning and coordinating complex tasks. We build on these projects by allowing students to acquire their own data in the form of photographs, and the annotation tools allow them to construct theories around issues in urban planning and cultural change.

In the above scenario, there are a number of ways that students can learn with historical images provided by the camera. We are currently working to understand how such learning opportunities can aid the following:

  1. Observation and interpretation. Rather than viewing images as "visual aids" to accompany textual explanations, students are responsible for drawing conclusions from image data. Comparing images across time periods can also provide insights into community change.
  2. Reasoning about urban planning. We want students to develop hypotheses about the function of architectural structures. For instance, pedestrian crosswalks appeared rather recently in history. Students can pinpoint the time when they appeared and develop theories about why they may have been necessary. For instance, evidence of increased commercial buildings in the historical images may be correlated with the emergence of crosswalks (i.e., more commerce leads to more pedestrians).
  3. Reasoning about culture. Images can provide important clues about community culture. For instance, a picture containing a "Buy War Bonds" advertisement is the beginning of a story about America during World War II. We hope to have students explore the meanings behind cultural artifacts found in images, possibly by collaborating with older adults to discover what it was like to live during the 1940s.
  4. Inquiry as an iterative process. Although students could browse historical images without the camera, we feel that it is important for them to do "field work", to visit locations while constructing explanations of community change. During annotation, students may observe image features that require further investigation in the field (e.g., they may want to rethink traffic flows after seeing how roads changed over a period of time). By returning to the field to generate further observations and questions, we hope they will better understand the iterative nature of inquiry.

Accessing historical images

A Kodak DC260 digital camera has been augmented with a Trimble Lassen-SK8 GPS and a Precision Navigation TCM2-80 digital compass. The camera uses Flashpoint Technology's Digita operating environment [2], allowing it to be scripted to send commands to the sensors through its serial port and to embed received data into JPEG images (Figure 3). In this way, the camera's origin and orientation are recorded when pictures are taken.

Figure 3: An "out of the box" view of the camera hardware. A Kodak DC260 digital camera is attached to a Trimble Lassen-SK8 GPS and a Precision Navigation TCM2-80 digital compass. This hardware configuration allows recording of position and orientation information into a JPEG image. A portion of the camera script that sends and receives data from the sensors and embeds it into the image file is also shown.
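To make the recorded metadata concrete, the sketch below models one picture's position and orientation as a small record and parses it from a text tag. The paper does not specify how the data are laid out inside the JPEG, so the `key=value;...` tag format, the field names, and the sample values are assumptions made purely for illustration; the sketch is in Python and is not the Digita camera script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShotMetadata:
    latitude: float    # decimal degrees, from the GPS
    longitude: float   # decimal degrees, from the GPS
    heading: float     # compass bearing, degrees clockwise from north
    tilt: float        # camera pitch in degrees, from the compass


def parse_shot_metadata(tag: str) -> ShotMetadata:
    """Parse a hypothetical 'key=value;...' tag into a ShotMetadata record."""
    fields = dict(item.split("=", 1) for item in tag.split(";") if item)
    return ShotMetadata(
        latitude=float(fields["lat"]),
        longitude=float(fields["lon"]),
        heading=float(fields["heading"]) % 360.0,  # normalize to [0, 360)
        tilt=float(fields["tilt"]),
    )


# Invented sample values; not taken from a real photograph.
shot = parse_shot_metadata("lat=42.3734;lon=-71.1189;heading=495.0;tilt=3.5")
print(shot)
```

Normalizing the heading on input keeps later angle comparisons simple, whatever range the compass happens to report.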

Our Java application parses the GPS and compass metadata from downloaded images and uses them to access a spatial map of Cambridge, Massachusetts stored in Esri Incorporated's ArcView GIS. We start at the camera's origin and trace the orientation vector until we intersect a building or other landmark [11]. This ray-tracing routine approximates line of sight to return the name of the landmark nearest the camera lens (Figure 4).

Figure 4: A segment of the ArcView GIS map for Cambridge, Massachusetts. The large dot shows the current camera position at a GPS coordinate. Orientation is used to trace a vector from the camera origin along its line of sight. The current algorithm simply returns the first building that intersects the line of sight vector.
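The line-of-sight test can be sketched as a two-dimensional ray cast against building footprints. The Python fragment below is an illustration under simplifying assumptions, not the ArcView-based routine: positions are taken to be already projected into local planar coordinates in meters (x east, y north), each footprint is a plain list of polygon vertices, and the building names and shapes are invented.

```python
import math


def ray_segment_distance(origin, direction, a, b):
    """Distance along the ray to wall segment a-b, or None if it misses."""
    ox, oy = origin
    dx, dy = direction
    ex, ey = b[0] - a[0], b[1] - a[1]
    denom = dx * ey - dy * ex
    if abs(denom) < 1e-12:            # ray is parallel to the wall
        return None
    wx, wy = a[0] - ox, a[1] - oy
    t = (wx * ey - wy * ex) / denom   # distance along the ray
    u = (wx * dy - wy * dx) / denom   # position along the wall, 0..1
    return t if t >= 0.0 and 0.0 <= u <= 1.0 else None


def first_building_hit(origin, heading_deg, buildings):
    """Name of the nearest footprint crossed by the line of sight, or None."""
    heading = math.radians(heading_deg)
    # Compass bearings run clockwise from north, so north is +y.
    direction = (math.sin(heading), math.cos(heading))
    best_name, best_dist = None, math.inf
    for name, footprint in buildings.items():
        for i in range(len(footprint)):
            a, b = footprint[i], footprint[(i + 1) % len(footprint)]
            dist = ray_segment_distance(origin, direction, a, b)
            if dist is not None and dist < best_dist:
                best_name, best_dist = name, dist
    return best_name


# Hypothetical footprints in meters relative to the camera at (0, 0).
buildings = {
    "near hall": [(-5, 20), (5, 20), (5, 30), (-5, 30)],
    "far tower": [(-5, 50), (5, 50), (5, 60), (-5, 60)],
    "east shop": [(40, -5), (50, -5), (50, 5), (40, 5)],
}
print(first_building_hit((0.0, 0.0), 0.0, buildings))
```

Looking due north, the ray crosses both "near hall" and "far tower", and the sketch keeps the closer one, which mirrors the first-intersection behavior described in Figure 4.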

A separate Perl database associates each building name with a set of historical photographs. Each of these images has been hand-indexed with the position and orientation from which it was taken and the year when it was taken. The retrieval engine selects and displays images that closely match the view of the target image. If we cannot find images with similar shot distances and/or orientations, we relax the constraints and return any photographs of the location. We currently test our retrieval algorithms with 1000+ hand-indexed images between Harvard Square and MIT.
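The match-then-relax behavior can be sketched as follows. This Python fragment is a hedged illustration rather than the Perl database or the actual retrieval engine: the record type `HistoricalImage`, the tolerance values of 25 meters and 30 degrees, and the sample archive are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoricalImage:
    building: str
    year: int
    distance_m: float   # hand-indexed shot distance to the building
    heading: float      # hand-indexed compass bearing, degrees


def heading_gap(a, b):
    """Smallest angle between two bearings, in degrees (0..180)."""
    gap = abs(a - b) % 360.0
    return min(gap, 360.0 - gap)


def retrieve(archive, building, distance_m, heading,
             max_distance_gap=25.0, max_heading_gap=30.0):
    """Return similar views of a building, or any views if none are similar."""
    candidates = [img for img in archive if img.building == building]
    close = [img for img in candidates
             if abs(img.distance_m - distance_m) <= max_distance_gap
             and heading_gap(img.heading, heading) <= max_heading_gap]
    chosen = close if close else candidates   # relax the constraints
    return sorted(chosen, key=lambda img: img.year)


# Hypothetical archive; buildings, years, and views are invented.
archive = [
    HistoricalImage("Example Hall", 1900, 40.0, 350.0),
    HistoricalImage("Example Hall", 1935, 120.0, 90.0),
    HistoricalImage("Example Hall", 1972, 55.0, 10.0),
    HistoricalImage("Other Block", 1950, 40.0, 0.0),
]
similar = retrieve(archive, "Example Hall", 50.0, 5.0)
relaxed = retrieve(archive, "Example Hall", 300.0, 200.0)
print([img.year for img in similar], [img.year for img in relaxed])
```

Comparing bearings with a wrap-around gap matters here: a historical shot indexed at 350 degrees is only 15 degrees away from a student photograph taken at 5 degrees.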

Future work

We are expanding our image database to provide students with richer data sources. The algorithm used to retrieve images is still rather simple, and we are developing a more sophisticated engine. For instance, the camera currently records tilt information, and we can use that data to disambiguate target buildings (e.g., photographs of tall buildings with smaller ones in the foreground). We will also automatically index student photographs into the image database to create records of the present that can be used in future classrooms.

We are working towards a new class of visualization and modeling applications that use imagery as a primary data source for inquiry. Rather than simply looking at photographs or watching videos, we want to see students arguing and debating over differences in image data. While most scientific visualization tools map quantitative data into visual representations, our students work directly with observational image data, constructing qualitative models that can be used to predict future outcomes and events. The work described here is a first step towards fusing GIS and multimedia systems to produce new learning experiences through imagery.

Although we have tested the camera ourselves, our first deployment with children (14-16 years old) begins in August 1999. This initial deployment will inform the iterative design of the camera and software tools for constructing explanations about community change. We will also attempt to understand the types of supports that teachers need to provide for this activity to successfully engage students in new ways of thinking.


We would like to thank the Cambridge Historical Commission for their gracious donation of 100+ years of historical images. This work is supported by the MIT Media Laboratory's News in the Future consortium and kind donations from Eastman Kodak.


  1. Alexander, C., Ishikawa, S., & Silverstein, M. (1977). A Pattern Language: Towns, Buildings, Construction. Oxford: Oxford University Press.
  2. Flashpoint Technology. (1998). Digita Operating System: Script Reference. San Jose, CA: Flashpoint Technology.
  3. Kraak, M.-J. (1996). Integrating multimedia in geographical information systems. IEEE Multimedia, 3(2): 59-65.
  4. McWilliams, H. & Rooney, P. (1997). Mapping our city: Learning to use spatial data in the middle school science classroom. Paper presented at the Annual Meeting of the American Educational Research Association. Chicago, IL.
  5. Nardi, B.A., Kuchinsky, A., Whittaker, S., Leichner, R., & Schwarz, H. (1996). Video-as-data: Technical and social aspects of a collaborative multimedia application. Computer Supported Cooperative Work, 4: 73-100.
  6. Pea, R.D. (1991). Learning through multimedia. IEEE Computer Graphics & Applications, 11(4): 58-66.
  7. Smith, B.K. & Reiser, B.J. (1997). What should a wildebeest say? Interactive nature films for high school classrooms. In ACM Multimedia 97 Proceedings (pp. 193-201). New York: ACM Press.
  8. Soloway, E., Guzdial, M., & Hay, K.E. (1994). Learner-centered design: The challenge for HCI in the 21st century. interactions, 1(2): 36-48.
  9. Spohrer, J. (1998). Worldboard: What comes after the WWW? Available:
  10. Stilgoe, J.R. (1998). Outside Lies Magic: Regaining History and Awareness in Everyday Places. New York: Walker and Company.
  11. Tsui, C. (1998). Multimedia Data Integration and Retrieval in Planning Support Systems. M.S. thesis. Department of Urban Studies and Planning, Massachusetts Institute of Technology.
  12. Whittaker, S. & O'Conaill, B. (1997). The role of vision in face-to-face and mediated communication. In K.E. Finn, A.J. Sellen, & S.B. Wilbur (Eds.), Video-Mediated Communication (pp. 23-49). Hillsdale, NJ: Lawrence Erlbaum Associates.

Copyright 1999 ACM.