Navigating large bodies of text

IBM Systems Journal
Vol. 35, No. 3&4, 1996 - MIT Media Lab

Feature article
0018-8670/96/$5.00 © 1996 IBM

by D. Small
Reprint Order No. G321-5621.
The display of information by computers does not often fulfill the promise of the computer as a visual information appliance. A design experiment is described in this paper in which a large body of text, such as the complete plays of William Shakespeare, is made visible. The typography is designed to handle display at a variety of scales so that the user can move smoothly between detailed views and overviews of as many as one million words. Visual filtering techniques are described that aid in the analyses of text at a wide range of scales. Also, the use of three-dimensional space to organize complex relationships among different information elements is described. New interaction paradigms are explored that aid in the navigation of complex, three-dimensional information spaces.

Initially, new media emulate the media that they replace, before creating any new paradigms. This can certainly be said of the way in which electronic media have handled text. The glowing glass screen of the computer is seen as a flat surface on which are pasted images that resemble sheets of paper. Window-like systems have advanced this emulation only to the extent that they allow for many rectangular planes of infinitely thin virtual paper to be stacked haphazardly around the glass surface of the computer display. The graphic power of computer workstations has now advanced to the point where we can begin to explore new ways of treating text and the computer display. By allowing the computer to do what it can do well, such as compute three-dimensional graphics and display moving images, we can develop a truly new design language for the medium.
It is night and you are dreaming. The sky is dark and you are floating lightly above the earth. As you cast your thoughts about, you flit and fly above the landscape from place to place and then zoom out into space. As soon as a constellation appears ahead, you skim instantly to that place, exploring, moving as fast as thought.
Navigation of information might be just as I described--smooth, simple, and as fast as your thoughts. Professor Muriel Cooper, founder of the Visible Language Workshop within the MIT Media Laboratory, coined the phrase information landscape [1] to describe this sort of space, where information "hangs" like constellations and the reader "flies" from place to place, exploring yet maintaining context while moving so that the journey itself can be as meaningful as the final destination (see Figure 1).
Figure 1
A landscape, whether real or virtual, provides an experience in which context is continuous and meaningful. It is through context that we can understand new information and can relate it to what is already known. Drs. Stephen and Rachel Kaplan wrote about the experience of mystery in landscapes in their book Cognition and Environment: Functioning in an Uncertain World. [2]

In the case of mystery, the new information is not present; it is only suggested or implied. Rather than being sudden, there is a strong element of continuity: the bend in the road, the brightly lighted field seen through a screen of foliage--these settings imply that new information will be continuous with, and related to, that which has gone before. Given this continuity one can usually think of several alternative hypotheses as to what one might discover.

By escaping the confines of the flat sheet of paper, we can arrange information into meaningful landscapes that exhibit qualities of mystery, continuity, and visual delight.
The plays of William Shakespeare are used to explore the design of an electronic information space that maintains the qualities of a meaningful landscape. A large image from the system is shown in the background on the title page of this paper. Each character in the play A Midsummer Night's Dream is marked in a different color, and different typefaces are used to distinguish stage direction, names, dialog, and commentary. Each scene is laid out in a single column of text. These scenes are aligned at their top edge and separated horizontally by a small gutter. The gutter, or space between columns of text, is increased between acts so that each act forms a distinct visual chunk. The five acts of each play are arrayed from left to right, and finally, the plays are arranged one above the other.
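The layout rule just described can be sketched in a few lines of Python. This is only an illustration, not the project's code: the function name, the `Column` record, and every dimension below are assumptions, since the article gives no measurements.

```python
from dataclasses import dataclass

# Hypothetical layout constants in arbitrary units; the article gives no numbers.
COLUMN_WIDTH = 40.0   # width of one scene column
SCENE_GUTTER = 4.0    # small gutter between scenes of the same act
ACT_GUTTER = 16.0     # wider gutter between acts
PLAY_HEIGHT = 500.0   # vertical space reserved for each play


@dataclass
class Column:
    play: int
    act: int
    scene: int
    x: float  # left edge of the column
    y: float  # top edge, shared by every scene of the play


def layout_play(play_index, scenes_per_act):
    """Place one column per scene, left to right, with top edges aligned."""
    columns = []
    x = 0.0
    y = play_index * PLAY_HEIGHT  # plays are stacked one above the other
    for act, scene_count in enumerate(scenes_per_act, start=1):
        for scene in range(1, scene_count + 1):
            columns.append(Column(play_index, act, scene, x, y))
            x += COLUMN_WIDTH + SCENE_GUTTER
        # Widen the gutter after the last scene of the act.
        x += ACT_GUTTER - SCENE_GUTTER
    return columns
```

Calling `layout_play(0, [2, 2, 1])` with these illustrative scene counts yields columns whose left edges are 0, 44, 100, 144, and 200, so the wider gaps fall between acts.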

Information landscapes

Before delving into the details of the Virtual Shakespeare Project, it will be useful to review some previous work in the design of information landscapes by the students at the Visible Language Workshop. The purpose of these projects was to help define the design issues associated with three-dimensional typography.
In this early work, we developed a method of displaying typographic forms at any size, position, and orientation in three-dimensional (3D) space. A virtual camera is then moved through and about the space, exploring the information, both text and images, which inhabits the space. By adapting and expanding on techniques developed for two-dimensional graphic design to the mostly unexplored realm of three-dimensional design, a number of visual experiments were produced. These interactive sketches addressed a number of design issues, including the use of perspective, scale, space, and interaction to create simple and flexible visualizations of information.
First, we must remember that letters were designed to be viewed directly on a flat two-dimensional (2D) surface and, by allowing arbitrary viewpoints, perspective distortion is created. Although this is correct for 3D perception, it is less than ideal for reading. Since it is not always possible to guarantee the angle of the view relative to the angle of the text, one cannot be certain of maintaining the integrity of the letterform. Each new angle will result in a differently shaped letter and at extreme angles the image can be reduced to a line. Furthermore, when the camera moves behind the text, it looks reversed as though seen in a mirror. While certain word shapes can still be recognized in less than ideal circumstances, in general there are few views from which text holds its legibility. One can solve this problem by constantly rotating all of the text objects so that they face the viewer, but this has the problem of destroying the overall structure of a complex three-dimensional space. Another solution is to constrain the movement of the camera to maintain a minimum legibility of text in the scene, but such constraints are not always acceptable.
In Figure 2 a map of North America is labeled with the country names Canada and the United States (partially seen), which are easily read when the map is viewed from the common orientation with north at the top. Still, nothing prevents the viewer from moving to the North Pole, from where the text will appear reversed.
Figure 2
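One way to detect the reversed and edge-on views described above is to compare the direction the text faces with the direction to the camera. The sketch below is an assumption about how such a check might be written, not a description of the Workshop's software; the function names and the edge-on threshold are hypothetical.

```python
import math


def facing(text_normal, text_position, camera_position):
    """Cosine of the angle between the text's front normal and the direction to the camera."""
    to_camera = [c - p for c, p in zip(camera_position, text_position)]
    dot = sum(n * t for n, t in zip(text_normal, to_camera))
    length = math.sqrt(sum(n * n for n in text_normal)) * math.sqrt(
        sum(t * t for t in to_camera)
    )
    if length == 0.0:
        raise ValueError("zero-length normal, or camera located on the text")
    return dot / length


def classify_view(cosine, edge_on_threshold=0.2):
    """Label a view; the threshold is an illustrative guess, not a measured value."""
    if abs(cosine) < edge_on_threshold:
        return "edge-on"    # letters collapse toward a line
    if cosine < 0.0:
        return "reversed"   # camera is behind the text, so it reads mirrored
    return "readable"
```

A camera constraint of the kind mentioned above could reject any move for which `classify_view` does not return "readable".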
A graphic designer can use size differences to visually distinguish certain elements in a text, such as the headline of a newspaper story or the fine print on a contract. In a three-dimensional space, you cannot always resolve the relative size of two objects. If one object appears smaller in the picture plane, it could actually be smaller, or it could be the same size and farther away, or it could even be much larger and very far away. So, in the design of an information space, one must be careful about using size as a differentiating variable. One interesting advantage of designing an information space is that the designer can use an almost unlimited range of scale to represent information. In print, the difference in size between the largest and smallest element is limited by the resolution of the printer and the physical size of the paper, but in a virtual space, typography can have almost unlimited variations in scale. This was explored in some detail in the Virtual Shakespeare design that will be discussed later.
We can also examine the use of space and how that differs in the design of digital media. Traditional graphic design has always been concerned with the disposition of pictorial space. There is a constant tension between the desire to include as much content as possible and using white space to create a harmonious and uncluttered image. In two dimensions, the designer is constrained by the limits of physical space and the static nature of the medium. In three dimensions, despite the easing of those constraints, the problem remains to create clear, legible relationships. Because of the ease with which one can create content at different scales and orientations, it is possible in an electronic landscape to present massive amounts of information while giving an impression of low visual density. This capability can work against the design as easily as it can help. Vast amounts of empty space can lead to an environment with very low legibility. A legible landscape is one that is meaningful, rich, and clear. Kevin Lynch wrote in his book The Image of the City: [3]

By this [legibility] we mean the ease with which its parts can be recognized and can be organized into a coherent pattern. Just as this printed page, if it is legible, can be visually grasped as a related pattern of recognizable symbols, so a legible city would be one whose districts or landmarks or pathways are easily identifiable and are easily grouped into an over-all pattern.

Because typographic elements can appear at any scale, an information landscape can create a good sense of overview and context while losing a clear understanding of the density of the content. In a paper book, we can understand at a glance the amount of text by the size of the book, the width of its spine, and so forth. As we dynamically shift the scale of an electronic text, we may not be able to have a constant yardstick or scale against which to understand the size of a text. This can be seen in the figure on the title page, where an entire Shakespearian play is visible. It is difficult to get a sense of how many words are in the play or how long it will take to read.
Finally, we can examine how we can use either increased spatial resolution or time varying images to increase the density of information displays. The constant dimensions of the computer screen and its low resolution (100 dots per inch, or dpi) when compared to print (over 600 dpi) greatly limit the amount of text that can be simultaneously presented to the user. One solution to this problem is simply to increase the resolution of the display (see Figure 3). The Visible Language Workshop has built an extremely high-resolution display that enables the simultaneous display of large amounts of information. [4]
Figure 3
This approach, however, makes extreme demands on display technology, compute power, and data bandwidth. Instead we can use a standard size display and dynamically shift our viewpoint around a larger virtual information space. Although the resolution at any one moment in time is still limited, we can smoothly move from an overview to a detailed view in a manner that helps to maintain the all-important context of the larger body of information. Animated typography can also be used to enhance the emotional content of a message. Yin Yin Wong has explored the use of dynamic forms to create expressive visual narratives (see Figure 4). [5] By using moving typography, she was able to overcome the resolution limits of the display to create type that appears to be speaking to the reader with an incredible density of meaning. Her work hints that the computer can not only faithfully reproduce text, as in a book, but can also express the meaning of the text, as an actor performing a role.
Figure 4

Related work

I would like to briefly discuss two examples that inspired this work. One particularly interesting piece is Vannevar Bush's 1945 essay "As We May Think," in which the idea of hypertext is first proposed. [6] I was particularly struck by his description of the physical device that would be used to access information, which he termed the Memex.

If the user wishes to consult a certain book, he taps its code on the keyboard, and the title page of the book promptly appears before him, projected onto one of his viewing positions. Frequently-used codes are mnemonic, so that he seldom consults his code book; but when he does, a single tap of a key projects it for his use. Moreover, he has supplemental levers. On deflecting one of these levers to the right he runs through the book before him, each page in turn being projected at a speed which just allows a recognizing glance at each. If he deflects it further to the right, he steps through the book 10 pages at a time; still further at 100 pages at a time. Deflection to the left gives him the same control backwards. A special button transfers him immediately to the first page of the index. Any given book of his library can thus be called up and consulted with far greater facility than if it were taken from a shelf. As he has several projection positions, he can leave one item in position while he calls up another. He can add marginal notes and comments, taking advantage of one possible type of dry photography, and it could even be arranged so that he can do this by a stylus scheme, such as is now employed in the telautograph seen in railroad waiting rooms, just as though he had the physical page before him.

Note that there are several ways of traversing the information. One can jump directly to any point in the corpus. In addition, we can travel through the work at speeds of increasing orders of magnitude. Technology has now advanced to the point where building a Memex machine as Bush describes is perfectly feasible, and some of his ideas, such as using order of magnitude jumps in navigation, were incorporated into the design of Virtual Shakespeare.
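The order-of-magnitude stepping that Bush describes can be sketched as a mapping from lever deflection to page step. The deflection range, the dead zone, and the thresholds below are assumptions made for illustration; Bush's text specifies only the steps of 1, 10, and 100 pages and the symmetry between left and right.

```python
def page_step(deflection):
    """Map a lever deflection in [-1.0, 1.0] to a signed page step."""
    if not -1.0 <= deflection <= 1.0:
        raise ValueError("deflection must lie in [-1.0, 1.0]")
    magnitude = abs(deflection)
    if magnitude < 0.1:      # hypothetical dead zone around the rest position
        step = 0
    elif magnitude < 0.4:    # slight deflection: one page at a time
        step = 1
    elif magnitude < 0.7:    # further: 10 pages at a time
        step = 10
    else:                    # still further: 100 pages at a time
        step = 100
    return step if deflection > 0 else -step


def turn(page, deflection, last_page):
    """Advance from `page`, keeping the result within the book."""
    return max(1, min(last_page, page + page_step(deflection)))
```

For example, `turn(5, -0.9, 300)` steps back 100 pages and stops at page 1, the front of the book.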
While Bush's Memex and other descriptions of "cyberspace" [7] are useful thought experiments, another example of an existing artifact that addresses the issues of macro/micro readings is Maya Ying Lin's Vietnam Veterans Memorial in Washington, D.C. The 58,000 war casualties are displayed so that one can start by looking at individual names and slowly include more and more names until the enormity of 58,000 is made visible. Approaching from either side, you first read a few names. As you continue forward, the wall grows taller beside you, each panel containing more and more names. Each name is clearly legible as you pass it, but the entire list of names becomes a statistical blur in the distance. A consistent scale helps the visitor to measure the information from the level of an individual up to the war as a whole. Because the observer can use his or her own body to measure the text, it is possible to avoid the problem of indeterminate scale seen in the title page figure. This suggests that a stronger feeling of embodiment in an information landscape will aid in its comprehension.

Virtual Shakespeare

The purpose of the Virtual Shakespeare Project was to explore the design of a large body of textual information. I chose to visualize the complete plays of William Shakespeare. The amount of text is on the order of one million words and the work itself has many structures that can be made visible: speeches, scenes, acts, and so forth. A rendering model was developed that is optimized for rapid navigation and changes in scale. If your viewpoint is close to the text it will be fully rendered. If it is farther away, and therefore smaller on the display, a simplified texture is used in place of each line of text. This technique, called greeking, maintains the overall shape of each line, although individual words are lost. As distance increases to the point where each line of text blurs into the next, each block of text is drawn as a simple rectangle of the same size and overall density. Breaks between the dialog of different characters are used as the delineator for the larger text blocks. This means that even at a great distance, the reader can still follow who was speaking and how much was said. The final stage comes when the dialogs become so small as to merge together. At this point each scene is rendered as a simple rectangle. These different views are shown in Figure 5. As we move back to include ever larger amounts of information in our view, the display of the information becomes more abstract while maintaining visual continuity.
Figure 5
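The four stages of this rendering model can be sketched as a choice driven by how tall one line of text appears on the screen. This is a simplified illustration under stated assumptions: the pinhole projection, the focal length, and the pixel thresholds are hypothetical, and the final stage is triggered here by line height rather than by the point at which whole speeches merge.

```python
from enum import Enum


class Detail(Enum):
    FULL_TEXT = 1      # every glyph is drawn
    GREEKED_LINES = 2  # one simplified texture per line
    SPEECH_BLOCKS = 3  # one rectangle per character's speech
    SCENE_BLOCK = 4    # one rectangle per scene


def projected_height(world_height, distance, focal_length_px=800.0):
    """Approximate on-screen height in pixels under a simple pinhole model."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return world_height * focal_length_px / distance


def choose_detail(line_height_px):
    """Pick a level of detail; all thresholds are illustrative guesses."""
    if line_height_px >= 6.0:   # glyphs are large enough to read
        return Detail.FULL_TEXT
    if line_height_px >= 1.5:   # line shapes are visible, words are not
        return Detail.GREEKED_LINES
    if line_height_px >= 0.1:   # lines blur together, speeches remain distinct
        return Detail.SPEECH_BLOCKS
    return Detail.SCENE_BLOCK
```

With these numbers, a line of height 1.0 seen from distances of 100, 400, 2000, and 20000 units passes through all four levels in turn.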
It is important that all transitions from one level of detail to another be as smooth and inconspicuous as possible. The reader should believe that all the information is there on the screen. A simple cross-fade is used to blur the transition from one state to another. This works quite well; however, there are still some problems associated with color and the typography itself. As typographic elements change size, it is not always possible to maintain a consistent perceived color for the text. As an object becomes smaller in the visual field, its surroundings have a greater effect on its perceived color. In the case of the rendering engine used in the Virtual Shakespeare Project, the text becomes darker as it gets smaller. This becomes a problem when color, or even brightness, is used to distinguish one object from another. Figure 6 shows how highlighting can work effectively at two different distances. Because it is easy to see Titania's dialog while viewing the entire play, we can readily explore her thread through the narrative.
Figure 6
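A cross-fade between two levels of detail can be sketched as a pair of opacities that trade off across a narrow band around the switching point. The linear ramp, the parameter names, and the choice of on-screen line height as the driving quantity are assumptions for illustration; the article says only that a simple cross-fade is used.

```python
def crossfade_weights(line_height_px, threshold_px, band_px):
    """Return (coarse_alpha, fine_alpha) for two adjacent levels of detail.

    Below the band only the coarse rendering is visible, above it only the
    fine one, and inside the band the two are blended linearly.
    """
    if band_px <= 0:
        raise ValueError("band_px must be positive")
    band_start = threshold_px - band_px / 2.0
    t = (line_height_px - band_start) / band_px
    t = max(0.0, min(1.0, t))  # clamp outside the transition band
    return 1.0 - t, t
```

Because the two opacities always sum to one, the overall density of the text block stays roughly constant while the reader zooms through the transition.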
In addition to perceived shifts in color, the typographic forms themselves appear somewhat unstable. That is, the thickness of the stems appears to change, the serifs wriggle and fade, and the counters tend to clog up when the letterforms shrink. Typefaces are generally designed to be used in only a small range of sizes and always so that they are flat to the page. One problem that needed to be overcome was the fact that letters were being used that were much larger or smaller than had ever been intended and that some view angles created such perspective distortion as to render the typeface illegible. Through experimentation, we have found that some typefaces are sturdier in this respect than others; however, no single typeface is adequate for all situations. What will be required is to design a new kind of typeface that can dynamically adjust its form to its environment. For example, in the days of lead typefaces, each size was designed independently. Designers knew that a letterform that looked clean and elegant at 12 points would be tall and spindly looking at 6 points, so letterforms became squatter and thicker as they grew smaller. New work in multimaster typefaces by Adobe Systems Inc. allows the generation of a range of faces from a single master; [8] however, they have not been used in a dynamic display. Future work will explore the generation of variable typefaces that can adapt to suit their environment.
Despite these problems, it is possible to use cues such as color or change in typeface to visually highlight portions of the text. For example, one may be interested in seeing all of the dialog for a specific character. Whatever visual technique is used, it should clearly distinguish the selected text at a wide range of scales. If a change in typeface, such as boldface or italic, is used, it can be difficult to see when viewing an entire scene or act. Dynamic highlighting, such as blinking, can be effective when the selected text is small and could be swamped by other information; however, it can also render illegible just that information that one wishes to make visible. To avoid these problems, I used brightness to cue the dialog of a character. The contrast between the selected and unselected text was continuously adjusted to account for changes in scale. As the distance from the text increases, bright objects become surrounded by more and more black space and must be made brighter to maintain a consistent perceived distinction from the unselected text. The ability to visually filter out some portions of the text enables the reader to see patterns and structure that were impossible to find in the traditional book format. For example, when Titania's dialog is highlighted, you can immediately see her role in the narrative structure. She is introduced in the second act, has a rather long soliloquy, and then comes and goes a few times during the rest of the play. By allowing us to read within a meaningful context, the computer can fundamentally change the kinds of understandings we can glean from a text.
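The scale-dependent brightness cue might be sketched as follows. The article does not give the adjustment curve, so the logarithmic ramp, the distance range, and the brightness levels here are all assumptions chosen only to show the idea that selected text is drawn brighter as the viewpoint recedes.

```python
import math


def highlight_brightness(distance, near=10.0, far=10000.0,
                         near_level=0.7, far_level=1.0):
    """Brightness (0..1) for selected text; every default is a hypothetical value."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    # Interpolate in log distance so each tenfold zoom adds the same increment.
    t = (math.log(distance) - math.log(near)) / (math.log(far) - math.log(near))
    t = max(0.0, min(1.0, t))
    return near_level + t * (far_level - near_level)
```

Unselected text would keep a fixed, dimmer level, so the contrast between the two grows with distance instead of being washed out by the surrounding black space.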
The use of space in an information landscape is fundamentally different from that of traditional design. One example of this can be seen in the presentation of footnotes or supplementary material. In traditional book design there are few options for visually treating such related materials. Footnotes can be placed at the bottom of the page or in the margin and referenced by number or asterisk. The length of the footnote is quite limited, unless it appears in an extremely small and barely legible typeface. Tschichold carefully enumerates the many typesetting problems associated with footnotes in The Form of the Book. [9] Hypertext systems, such as those used to access the Internet World Wide Web, allow the designer to tag a text with footnotes of arbitrary size. However, when the reader selects a link the footnote appears and completely obliterates the original text. The use of 3D space gives the designer new possible solutions to this problem, two of which are shown in Figure 7.
Figure 7
One obvious solution is to take advantage of the ability to rapidly change scale and place the footnote next to the referring text, but much smaller, as shown in the first image. Since size is arbitrary, you can even put a footnote in the dot of an i or in the period at the end of a sentence. The problem with this solution, which this extreme example makes clear, is that it is difficult to see both the footnote and the referring text at the same time. In the second image, the footnote is shown at the same size as the main text, but at ninety degrees to it. When looking directly at the text, the footnotes, being infinitely thin, disappear, but with a quick twist they can be read. This solution has the advantage of providing quick, yet unobtrusive access and allowing the simultaneous display of both texts.
Although these new rendering techniques allow many different views of a large-scale text, the visualization is only useful if the user can easily navigate about the text. A number of new methods of navigation were developed to address this problem. Most current interface paradigms (windows, buttons, mice) were based on a two-dimensional screen. A three-dimensional model requires new kinds of controls that allow for easy manipulation in space. Three different approaches were developed that make use of LEGO** brick technology and a magnetic field sensor that is used to determine an exact position and orientation of an object in space. [10]
The first approach was to use one position/orientation sensor to act as a handle for the text and another as a virtual camera. By holding the first sensor in one hand it is possible to easily control the position and orientation of the text. A graphical representation of the "handle" is displayed along with the text to help orient the user. The other sensor is built into a small LEGO helicopter that can easily fit into the other hand. A helicopter was used because it has an implicit sense of pointing and an implicit orientation (rotor above, tail behind, windshield in front). The helicopter pilot "looks" at the handle and determines the position of the virtual camera in the information space (see Figure 8). In addition to these controls, some simple LEGO machines were constructed that allow the user to zoom in and out, and to position the text horizontally and vertically relative to the virtual handle.
Figure 8
The second approach consisted of a small LEGO stage. The positioning gears were built into the structure of the stage in two orthogonal orientations (see Figure 9). To move the text left and right, the wheel above the stage is turned left and right. To move up and down, the wheel on the side of the stage is used. The angle of the text is controlled by rotating the entire stage assembly. To select different characters in the play, the corresponding LEGO actor is placed on the stage. A resistor is soldered onto the feet of each figure and the stage can then recognize each actor when he or she is snapped in place. The footlights come on and the text for the character lights up.
Figure 9
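Recognizing an actor from the resistor in its feet amounts to matching a measured resistance against a table of known values. The sketch below assumes a resistance reading is already available in ohms; the resistor values, the tolerance, and the assignment of values to characters are hypothetical, since the article does not list them.

```python
# Hypothetical nominal resistances in ohms, one per LEGO actor.
ACTORS = {
    1000.0: "Titania",
    2200.0: "Oberon",
    4700.0: "Puck",
    10000.0: "Bottom",
}


def identify_actor(measured_ohms, tolerance=0.10):
    """Return the actor whose resistor is nearest the reading, or None.

    A reading further than `tolerance` (as a fraction of the nominal value)
    from every known resistor is treated as an empty or unrecognized stage.
    """
    nearest = min(ACTORS, key=lambda nominal: abs(measured_ohms - nominal))
    if abs(measured_ohms - nearest) <= tolerance * nearest:
        return ACTORS[nearest]
    return None
```

Spacing the nominal values widely, as in this table, leaves room for component tolerance and contact resistance without confusing one actor for another.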
The last approach was to merge the display with the navigation controls (see Figure 10). This was done by building a handheld display prototype of balsa wood and LEGO bricks. There is a single button to the right of the display that acts as a clutch. When the button is engaged, moving the display moves the virtual camera about the information landscape. Tipping the display forward moves the camera down and scrolls the text up. Pushing the display away from your body zooms out, and pulling the display close to your body zooms in. Tilting the display to the right or left causes the text to slide horizontally in that direction. The amount of camera motion increases exponentially with the displacement of the display, which gives the user both fine control over small motions and very fast, large motions. When the clutch button is not pressed, the view is locked and moving the display has no effect.
Figure 10
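The clutch and the exponential response can be sketched for a single axis as follows. The gain and growth constants and the function name are assumptions for illustration; the article states only that motion grows exponentially with the movement of the display and that nothing moves unless the button is held.

```python
import math


def camera_velocity(displacement, clutch_pressed, gain=1.0, growth=3.0):
    """Map a display displacement on one axis to a signed camera velocity.

    `gain` and `growth` are hypothetical tuning constants.
    """
    if not clutch_pressed:
        return 0.0  # clutch released: the view stays locked
    # expm1(x) = e**x - 1, so zero displacement gives exactly zero motion.
    speed = gain * math.expm1(growth * abs(displacement))
    return math.copysign(speed, displacement)
```

With these constants a displacement of 0.1 produces a velocity of about 0.35, while a displacement of 1.0 produces about 19, which is the combination of fine control and fast travel described above. The same function could be applied separately to tilt, tip, and push/pull.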
All of these methods increase the ease with which the reader can navigate the text and greatly increase the utility of the system. For example, giving the user a physical handle on the text makes it easy to quickly reorient it to see footnotes, which may be placed at right angles to the main text. The most important lesson learned from these experiments was that it is impossible to separate the visual design from the design of the interface. Subtle interactions between the visual design and the physical controls may facilitate many actions but make others more difficult.

Conclusion

The current paradigm of a computer with a fixed, heavy display, a keyboard, and a mouse is not nearly as comfortable to use as a simple book. Nonetheless, the ability of the computer to gather, analyze, and filter vast amounts of data makes it indispensable in today's world. Any attempt to improve the design of information in electronic media should address not only the visual display of the information, but also the design of the computer itself and how one interacts with it in the context of the real world.
In making information accessible to people, it is necessary for designers to rethink current design paradigms. The computer screen is not a piece of paper and should not be treated as such. By taking advantage of the ability of the computer to display dynamic, flexible, and adaptive typography, we can invent new ways for people to read, interact with, and assimilate the written word. Like a garden, well-designed information should be legible, inviting, and comfortable, and its exploration should and can be a true delight.
**Trademark or registered trademark of LEGO Systems, Inc.

Cited references

Accepted for publication March 25, 1996.


©1998 IBM Corporation