Milestones in the History of Data Visualization: A Case Study in Statistical Historiography


Michael Friendly

Psychology Department and Statistical Consulting Service

York University

4700 Keele Street, Toronto, ON, Canada M3J 1P3

in: Classification: The Ubiquitous Challenge.



M. Friendly (2005). Milestones in the History of Data Visualization: A Case Study in Statistical Historiography. In C. Weihs and W. Gaul (eds.), Classification: The Ubiquitous Challenge, pp. 34--52. New York: Springer. http://www.math.yorku.ca/SCS/Papers/gfkl.pdf

© 2005 Springer-Verlag.

friendly@yorku.ca

Abstract. The Milestones Project is a comprehensive attempt to collect, document, illustrate, and interpret the historical developments leading to modern data visualization and visual thinking. This paper provides an overview and brief tour of the milestones content, with a few illustrations of significant contributions to the history of data visualization. This forms one basis for exploring interesting questions and problems in the use of statistical and graphical methods to explore this history, a topic that can be called “statistical historiography.”

1 Introduction

The only new thing in the world is the history you don’t know. (Harry S Truman)

The graphic portrayal of quantitative information has deep roots. These roots reach into the histories of the earliest map-making and visual depiction, and later into thematic cartography, statistics and statistical graphics, medicine, and other fields, which are intertwined with each other. They also connect with the rise of statistical thinking and widespread data collection for planning and commerce up through the 19th century. Along the way, a variety of advancements contributed to the widespread use of data visualization today. These include technologies for drawing and reproducing images, advances in mathematics and statistics, and new developments in data collection, empirical observation and recording.

From above ground, we can see the current fruit; we must look below to understand its germination. Yet the great variety of roots and nutrients across these domains, which gave rise to the many branches we see today, is often not well known, and has never been assembled in a single garden, to be studied or admired.

The Milestones Project is designed to provide a broadly comprehensive and representative catalog of important developments in all fields related to the history of data visualization. Toward this end, a large collection of images, bibliographical references, cross-references and web links to commentaries on these innovations has been assembled.

This is a useful contribution in its own right, but is a step towards larger goals as well. First, we see this not as a static collection, but rather a dynamic database that will grow over time as additional sources and historical contributions are uncovered or suggested to us. Second, we envisage this project as providing a tool to enable researchers to work with or study this history, finding themes, antecedents, influences, patterns, trends, and so forth. Finally, as implied by our title, work on this project has suggested several interesting questions subsumed under the self-referential term “statistical historiography.”

1.1 The Milestones Project

The past only exists insofar as it is present in the records of today. And what those records are is determined by what questions we ask. (Wheeler, 1982, p. 24)

There are many historical accounts of developments within the fields of probability (Hald, 1990), statistics (Pearson, 1978, Porter, 1986, Stigler, 1986), astronomy (Riddell, 1980), and cartography (Wallis and Robinson, 1987), which relate to, inter alia, some of the important developments contributing to modern data visualization. There are other, more specialized accounts, which focus on the early history of graphic recording (Hoff and Geddes, 1959, 1962), statistical graphs (Funkhouser, 1936, 1937, Royston, 1970, Tilling, 1975), fitting equations to empirical data (Farebrother, 1999), cartography (Friis, 1974, Kruskal, 1977) and thematic mapping (Palsky, 1996, Robinson, 1982), and so forth. Robinson (1982, Ch. 2) presents an excellent overview of some of the important scientific, intellectual, and technical developments of the 15th–18th centuries leading to thematic cartography and statistical thinking.

But there are no accounts that span the entire development of visual thinking and the visual representation of data, and that collate the contributions of disparate disciplines. Inasmuch as their histories are intertwined, so too should be any telling of the development of data visualization. Another reason for interweaving these accounts is that practitioners in these fields today tend to be highly specialized, often unaware of related developments in areas outside their domain, much less their history. Extending Wheeler (1982), the records of history also exist insofar as they are collected, illustrated, and made coherent.

The initial step in portraying the history of data visualization was a simple chronological listing of milestone items with capsule descriptions, bibliographic references, markers for date, person, place, and links to portraits, images, related sources or more detailed commentaries. Its current public and visible form is that of hyper-linked, interactive documents available on the web and in PDF form (http://www.math.yorku.ca/SCS/Gallery/milestone/). We started with the developments listed by Beniger and Robyn (1978) and incorporated additional listings from Hankins (1999), Tufte (1983, 1990, 1997), Heiser (2000), and others.

With assistance from Les Chevaliers, many other contributions, original sources, and images have been added. As explained below, our current goal is to turn this into a true multi-media database, which can be searched in flexible ways and can be treated as data for analysis.

2 Milestones Tour

–  –  –

picture, recounting the history of data visualization, each milestone item has a story to be told: What motivated this development? What was the communication goal? How does it relate to other developments? What were the precursors? What makes it a milestone? To illustrate, we present just a few exemplars from a few of these periods. For brevity, we exclude the earliest period (pre-17th century) and the most recent period (1975–present) in this description.

2.1 1600-1699: Measurement and theory

Among the most important problems of the 17th century were those concerned with physical measurement (of time, distance, and space) for astronomy, surveying, map making, navigation and territorial expansion. This century also saw great new growth in theory and the dawn of practice: the rise of analytic geometry, theories of errors of measurement and estimation, the birth of probability theory, and the beginnings of demographic statistics and “political arithmetic.”

As an example, Figure 1 shows a 1644 graphic by Michael Florent van Langren, a Flemish astronomer to the court of Spain, believed to be the first visual representation of statistical data (Tufte, 1997, p. 15). At that time, lack of a reliable means to determine longitude at sea hindered navigation and exploration.¹ This 1D line graph shows all 12 known estimates of the difference in longitude between Toledo and Rome, and the name of the astronomer (Mercator, Tycho Brahe, Ptolemy, etc.) who provided each observation.

Fig. 1. Langren’s 1644 graph of determinations of the distance, in longitude, from Toledo to Rome. The correct distance is 16° 30′. Source: Tufte (1997, p. 15)

What is notable is that van Langren could have presented this information in various tables, ordered by author to show provenance, by date to show priority, or by distance. However, only a graph shows the wide variation in the estimates; note that the range of values covers nearly half the length of the scale. Van Langren took as his overall summary the center of the range, where there happened to be a large enough gap for him to inscribe “ROMA.” Unfortunately, all of the estimates were biased upwards; the true distance (16° 30′) is shown by the arrow. Van Langren’s graph is also a milestone as the earliest-known exemplar of the principle of “effect ordering for data display” (Friendly and Kwan, 2003).

¹ For navigation, latitude could be fixed from star inclinations, but longitude required accurate measurement of time at sea, an unsolved problem until 1765.

2.2 1700-1799: New graphic forms

The 18th century witnessed, and participated in, the initial germination of the seeds of visualization that had been planted earlier. Map-makers began to try to show more than just geographical position on a map. As a result, new graphic forms (isolines and contours) were invented, and thematic mapping of physical quantities took root.

Towards the end of this century, we see the first attempts at the thematic mapping of geologic, economic, and medical data.


graphs, and graphs of functions were introduced, along with the early beginnings of statistical theory (measurement error) and systematic collection of empirical data. As other (economic and political) data began to be collected, some novel visual forms were invented to portray them, so the data could “speak to the eyes.” As well, several technological innovations provided necessary nutrients. These facilitated the reproduction of data images (color printing, lithography), while other developments eased the task of creating them. Yet, most of these new graphic forms appeared in publications with limited circulation, unlikely to attract wide attention.

William Playfair (1759–1823) is widely considered the inventor of most of the graphical forms in common use today: first the line graph and bar chart (Playfair, 1786), later the pie chart and circle graph (Playfair, 1801). A somewhat later graph (Playfair, 1821), shown in Figure 2, exemplifies the best that Playfair had to offer with these graphic forms. Playfair used three parallel time series to show the price of wheat, weekly wages, and reigning monarch over a span of roughly 250 years, from 1565 to 1820, and used this graph to argue that workers had become better off in the most recent years.

2.3 1800-1850: Beginnings of modern graphics

With the fertilization provided by the previous innovations of design and technique, the first half of the 19th century witnessed explosive growth in statistical graphics and thematic mapping, at a rate which would not be equalled until modern times.

In statistical graphics, all of the modern forms of data display were invented: bar and pie charts, histograms, line graphs and time-series plots, contour plots, scatterplots, and so forth. In thematic cartography, mapping progressed from single maps to comprehensive atlases, depicting data on a wide variety of topics (economic, social, moral, medical, physical, etc.), and introduced a wide range of novel forms of symbolism.

To illustrate this period, we choose an 1844 “tableau-graphique” (Figure 3) by Charles Joseph Minard, an early progenitor of the modern mosaic plot (Friendly, 1994). On the surface, mosaic plots descend from bar charts, but Minard introduced two simultaneous innovations: the use of divided and proportional-width bars so that area had a concrete visual interpretation. The graph shows the transportation of commercial goods along one canal route in France by variable-width, divided bars (Minard, 1844). In this display the width of each vertical bar shows distance along this route; the divided bar segments have height proportional to the amount of goods of various types (shown by shading), so the area of each rectangular segment is proportional to cost of transport. Minard, a true visual engineer (Friendly, 2000), developed such diagrams to argue visually for setting differential price rates for partial vs. complete runs. Playfair had tried to make data “speak to the eyes,” but Minard wished to make them “calculer par l’oeil” as well.

Fig. 2. William Playfair’s 1821 time series graph of prices, wages, and ruling monarch over a 250 year period. Source: Playfair (1821), image from Tufte (1983, p. 34)
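The area encoding described for Minard's divided bars can be sketched in a few lines of Python. This is only an illustration of the geometry (width times height), not a reconstruction of Minard's chart: the leg names, goods types, and all numbers below are hypothetical and do not come from Minard (1844).

```python
# Hypothetical illustration values; none of these numbers come from Minard (1844).
LEG_DISTANCE = {"leg 1": 10.0, "leg 2": 25.0}  # bar width: distance along the route
LEG_AMOUNTS = {                                 # segment height: amount of each type of goods
    "leg 1": {"coal": 4.0, "grain": 2.0},
    "leg 2": {"coal": 3.0, "grain": 1.0},
}


def segment_areas(distances, amounts):
    """Return {(leg, goods): width * height} for every divided-bar segment."""
    return {
        (leg, goods): distances[leg] * amount
        for leg, by_goods in amounts.items()
        for goods, amount in by_goods.items()
    }


def total_by_goods(areas):
    """Sum segment areas over all legs, one total per type of goods."""
    totals = {}
    for (_, goods), area in areas.items():
        totals[goods] = totals.get(goods, 0.0) + area
    return totals


areas = segment_areas(LEG_DISTANCE, LEG_AMOUNTS)
totals = total_by_goods(areas)
```

Because each area is a product of distance and amount, a reader comparing rectangles is in effect comparing quantities proportional to transport cost, which is the sense in which such a display lets the eye "calculate".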

2.4 1850-1900: The Golden Age of statistical graphics

By the mid-1800s, all the conditions for the rapid growth of visualization were in place. Official state statistical offices were established throughout Europe, in recognition of the growing importance of numerical information for social planning, industrialization, commerce, and transportation. Statistical theory, initiated by Gauss and Laplace, and extended to the social realm by Quetelet, provided the means to make sense of large bodies of data.

What started as the Age of Enthusiasm (Palsky, 1996) for graphics may also be called the Golden Age, with unparalleled beauty and many innovations in graphics and thematic cartography.

2.5 1900-1950: The modern dark ages

If the late 1800s were the “golden age” of statistical graphics and thematic cartography, the early 1900s could be called the “modern dark ages” of visualization (Friendly and Denis, 2000).

