PREVIEW Foam Magazine Issue #29 What's Next?

Page 191

How can we efficiently explore massive digital image collections to ask interesting questions? Examples of such collections include the 167,000 images in the Art Now Flickr gallery and the 176,000 Farm Security Administration/Office of War Information photographs taken between 1935 and 1944 and digitized by the Library of Congress. How can we work with such image sets? The basic method used by media researchers when the amount of media was relatively small – see all the images or video, notice patterns, and interpret them – no longer works. Given the size of typical contemporary digital media collections, simply seeing what’s inside them is impossible, even before we begin formulating questions and hypotheses and selecting samples for closer analysis. Although it may appear that the reasons for this are the limitations of human vision and human information processing, I think that it is actually the fault of current interface designs. Popular interfaces for accessing digital media collections, such as the list, the image gallery, and the image strip, do not allow us to see the contents of a whole collection. These interfaces usually display only a few items at a time, regardless of whether you are in browsing mode or in search mode. Because we are not able to see a collection as a whole, we can’t compare sets of images or videos to each other, notice patterns of change over time, or understand parts of the collection in relation to the whole.

Against Search: How to Look without Knowing What You Want to Find?

The popular media access technologies of the 19th and 20th century – slide lanterns, film projectors, Moviola and Steenbeck, record players, audio and video tape recorders, VCR, DVD players, etc. – were designed to access single media items at a time at a limited range of speeds. This went hand in hand with the organization of media distribution: record and video stores, libraries, and television and radio broadcasters all made only a few items available at a time. At the same time, the hierarchical classification systems used in library catalogues and rooms encouraged users to access a collection in ways defined by classification schemes, as opposed to browsing at random. When you looked through a card catalogue, or physically walked from shelf to shelf, you were following a classification based on subjects, with books organized by author names inside each subject category. Thus, although a single book itself supported random access, the larger structures in which books and other media objects were organized did not. Together, these distribution and classification systems encouraged 20th century media researchers to decide beforehand what media items to study. A researcher usually started with a particular person (a filmmaker, a photographer, etc.) or a particular subject category (for example, ‘1960s experimental American films’). In doing that, a researcher could be said to move down the hierarchy of information in a catalogue and then select a particular level as the subject of her project: cinema > American cinema > American experimental film > American experimental film of the 1960s. The more adventurous would add new branches to the categorical tree; most were satisfied with contributing individual leaves (articles and books).

Unfortunately, the current standard in media access – computer search – does not take us out of this paradigm. A search interface is an empty box waiting for you to type something. Before you click on the search button you have to decide what keywords and phrases to search for. So while search brings a dramatic increase in speed of access, its deep assumption (which we may be able to trace back to its origins in 1950s ‘information retrieval’) is that you know beforehand something about the collection worth exploring further. To put this another way: search assumes that you want to find a needle in a haystack of information. It does not allow you to see the shape of the haystack. If you could see it, it would give you ideas of what else is worth seeking besides the needle you originally had in mind. Search also does not reveal where all the different needles in the haystack are situated, i.e. it does not show how particular data objects or subsets are related to the complete data. Using search is like looking at a pointillist painting at close range and only seeing colour dots, without being able to zoom out to see the shapes. The hypertext paradigm which defines the World Wide Web is also limited: it allows navigation around the web of pages according to the links defined by others, as opposed to moving in any direction. This is consistent with the original vision of hypertext as articulated by Vannevar Bush in 1945: a way for a researcher to create


how to see 1,000,000 images?

Early 21st century humanities scholars, critics and curators have access to unprecedented amounts of visual media – more than they can possibly study, let alone simply watch, or even search. A number of interconnected developments which took place between 1990 and 2010 – digitization of many analogue media collections, the rise of user-generated content and social media, the adoption of the web as a media distribution platform, and globalization, which increased the number of agents and institutions producing media around the world – led to an exponential increase in the quantity of media while simultaneously making it much easier to find, share, teach with, and research. Millions of hours of television programs already digitized by various national libraries and media museums, four million digitized U.S. newspaper pages from 1836 to 1922 (chroniclingamerica.loc.gov), 150 billion snapshots of web pages captured since 1996 (www.archive.org), and trillions of videos on YouTube and photographs on Facebook and numerous other media sources are waiting to be ‘digged’ into.

↗ Frames taken from the Japanese videogame Kingdom Hearts II every 6 seconds from the sequence of gameplay sessions, which constitute a full traversal of the game from beginning to end. This visualization represents 62.5 hours of gameplay (22,4999 frames). Game recording and visualization: William Huber (with Lev Manovich)
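As a rough illustration of the kind of layout the caption describes, the sketch below (Python, standard library only) samples a recording at a fixed interval and works out where each frame would sit in a montage read left to right, top to bottom. The function names, the one-hour duration, the 30-column grid and the 40 x 30 pixel thumbnails are assumptions chosen for illustration; they are not taken from the workflow Huber and Manovich actually used.

```python
import math


def sample_times(duration_s, interval_s):
    """Return the timestamps (in seconds) at which a frame is grabbed."""
    count = int(duration_s // interval_s)
    return [i * interval_s for i in range(count)]


def grid_rows(n_frames, columns):
    """Number of rows needed to hold n_frames in a grid of fixed width."""
    return math.ceil(n_frames / columns)


def cell_position(index, columns, thumb_w, thumb_h):
    """Top-left pixel of the index-th thumbnail, in row-major (reading) order."""
    row, col = divmod(index, columns)
    return col * thumb_w, row * thumb_h


# Hypothetical example: a one-hour recording, one frame every 6 seconds.
times = sample_times(3600, 6)
rows = grid_rows(len(times), columns=30)
print(len(times), "frames in a grid of 30 x", rows)
print("frame 31 is drawn at", cell_position(31, 30, thumb_w=40, thumb_h=30))
```

Because the frames are placed in time order, position in the grid stands in for position in the recording, which is what lets a viewer read patterns of change across the whole traversal at a glance.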

How to Work with Massive Image Collections?

