A network framework of cultural history


Science  01 Aug 2014:
Vol. 345, Issue 6196, pp. 558-562
DOI: 10.1126/science.1240064

A macroscopic view of cultural history

Sociologists and anthropologists study the growth and evolution of human culture, but it is hard to measure cultural interactions on a historical time scale. Schich et al. developed a tool for extracting information about cultural history from simple but large sets of birth and death records. A network of cultural centers connected via the birth and death of more than 150,000 notable individuals revealed human mobility patterns and cultural attraction dynamics. Patterns of city growth over a period of 2000 years differed between countries, but the distribution of birth-to-death distances remained unchanged over more than eight centuries.

Science, this issue p. 558


The emergent processes driving cultural history are a product of complex interactions among large numbers of individuals, determined by difficult-to-quantify historical conditions. To characterize these processes, we have reconstructed aggregate intellectual mobility over two millennia through the birth and death locations of more than 150,000 notable individuals. The tools of network and complexity theory were then used to identify characteristic statistical patterns and determine the cultural and historical relevance of deviations. The resulting network of locations provides a macroscopic perspective of cultural history, which helps us to retrace cultural narratives of Europe and North America using large-scale visualization and quantitative dynamical tools and to derive historical trends of cultural centers beyond the scope of specific events or narrow time intervals.

Quantifying historical developments is crucial to understanding a large variety of complex processes from population dynamics to disease spreading, conflicts, and urban evolution. However, in historical research there is an inherent tension (1, 2) between qualitative analyses of individual historical accounts and quantitative approaches aiming to measure and model more general patterns. We believe that these approaches are complementary: We need quantitative methods to identify statistical regularities, as well as qualitative approaches to explain the impact of local deviations from the uncovered general patterns. We have therefore developed a data-driven macroscopic perspective that offers a combination of both approaches.

We collected data from Freebase.com (FB) (3), the General Artist Lexicon (AKL) (4–6), and the Getty Union List of Artist Names (ULAN) (7), representing spatiotemporal birth and death information of notable individuals, spanning a time period of more than two millennia. The data sets are included in the supplementary materials (SM), accompanied by an explanation of their nature and data preparation (8) (tables S1 and S2). Potential sources of bias are addressed in the SM, including biographical, temporal, and spatial coverage; curated versus crowd-sourced data; increasing numbers of individuals who are still alive; place aggregation; location name changes and spelling variants; and effects of data set language. Most important, compared with contemporary worldwide migration flux (9), our data sets focus on birth-to-death migration within and out of Europe and North America (see fig. S1). Notability of individuals, simply defined as the curatorial decision of inclusion in the respective data set, differs slightly between the more current, partly crowd-sourced FB and the expert-curated AKL and ULAN.

There was sufficient data density for historical studies: In each data set, the number of notable individuals with birth and death locations provides substantially more data points over time than the commonly used estimates of the world population before the 20th century (Fig. 1A and fig. S2). Even though death locations are underreported (e.g., 153,000 out of 1.1 million in AKL), the data density was sufficient to construct heat maps or Lexis surfaces (10), as used in demography, to reveal death age (ir)regularities during more than five centuries, which enables us to highlight the impact of wars and varying longevity (compare Fig. 1B and fig. S3 for details).
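A Lexis-style surface of the kind described above is, at its core, a two-dimensional histogram of deaths indexed by calendar time and by age at death. The following minimal sketch builds such a table from (birth year, death year) pairs. The input format, the function name `lexis_counts`, and the 10-year and 5-year bin widths are illustrative assumptions, not the settings used for Fig. 1B.

```python
from collections import Counter

def lexis_counts(records, year_bin=10, age_bin=5):
    """Count deaths per (death-year bin, age-at-death bin) cell.

    records: iterable of (birth_year, death_year) pairs (hypothetical format).
    Bin widths are illustrative assumptions. Keys are the lower bin edges,
    e.g. (1510, 65) means deaths in 1510-1519 at ages 65-69.
    """
    counts = Counter()
    for birth, death in records:
        age = death - birth
        if age < 0:
            continue  # inconsistent record: death before birth
        cell = (death // year_bin * year_bin, age // age_bin * age_bin)
        counts[cell] += 1
    return counts

# Toy data, not taken from any of the data sets; the last record is inconsistent.
records = [(1450, 1519), (1452, 1518), (1483, 1520), (1564, 1616), (1600, 1590)]
table = lexis_counts(records)
print(table[(1510, 65)])  # two deaths in the 1510s at ages 65-69
```

Drawing these cell counts as a color map gives a heat map of the type shown in Fig. 1B; normalizing each cell, for instance by the number of individuals at risk, would be a further choice left to the analyst.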

Fig. 1 Birth and death data of notable individuals reveal interactions between culturally relevant locations over two millennia.

(A) Notable individuals with birth and death locations, alive in a given year from 1 to 2012 CE, for the FB, AKL, and ULAN databases shown together with the estimated world population (in millions, i.e., divided by 10^6 to compare the slope, compare fig. S2). As the data sets grow by orders of magnitude, fluctuations smooth out, allowing for quantification to complement qualitative inquiry. AKL and ULAN grow exponentially with the emancipation of artists around 1200. The decrease after the gray line is due to the fact that we only record individuals with known birth and death dates, and at recent times, more individuals are not yet dead or recorded (details on known biases are in the SM). (B) Demographic life table for FB indicating death age frequency from 1500 to 2012 CE (compare fig. S3 for detail). (C) Birth-death scatter plot for locations in FB, cumulated over all time with outliers colored as birth sources (blue) and death attractors (red) (see figs. S4 and S13 for dynamics, significance, and further data sets). (D) Illustration of birth-death flows of antiquarians in the 18th century, based on the Winckelmann Corpus (11), using the color scheme of the scatter plot above. (E) Migration in Europe based on FB, with node size corresponding to PageRank (compare figs. S5 to S7 for detail, further regions, and data sets).

We next added a spatial dimension by plotting the number of deaths versus births in each location (Fig. 1C and fig. S4). The plot distinguishes locations where notable people tended to be born (birth sources) from locations where they tended to die (death attractors). Both long-lived and short-lived death locations were observed, with the short-lived locations representing plane crash sites, battlefields, or concentration camps. We found outliers, where the imbalance of births and deaths results in significant deviations from the diagonal (as defined in the SM under Birth-Death Imbalance). Indeed, highly significant outliers, like Hollywood, had more than 10 times as many deaths as births.
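Classifying a location as a birth source or a death attractor requires a yardstick for how far its counts sit from the diagonal D = B. The exact criterion is defined in the SM and is not reproduced here. The sketch below substitutes an assumed Poisson-style rule: it measures D − B in units of √(B + D) and flags locations that lie more than k of those units away. The function names, the input format, and the threshold k = 3 are all hypothetical.

```python
import math
from collections import Counter

def birth_death_counts(people):
    """Count births and deaths per location.

    people: iterable of (birth_place, death_place) pairs (hypothetical format).
    """
    births, deaths = Counter(), Counter()
    for birth_place, death_place in people:
        births[birth_place] += 1
        deaths[death_place] += 1
    return births, deaths

def classify(births, deaths, k=3.0):
    """Label locations by their distance from the diagonal D = B.

    ASSUMPTION: D - B is measured in units of sqrt(B + D), the standard deviation
    of the difference of two independent Poisson counts with equal means.
    This stands in for the SM definition, which is not reproduced here.
    """
    labels = {}
    for place in set(births) | set(deaths):
        b, d = births[place], deaths[place]
        z = (d - b) / math.sqrt(b + d)  # b + d >= 1 for every observed place
        if z > k:
            labels[place] = "death attractor"
        elif z < -k:
            labels[place] = "birth source"
        else:
            labels[place] = "balanced"
    return labels

# Toy data: 20 people move from one town to a city, plus a few local lives.
people = ([("source_town", "attractor_city")] * 20
          + [("attractor_city", "attractor_city")]
          + [("steady_town", "steady_town")] * 2)
labels = classify(*birth_death_counts(people))
print(labels["attractor_city"], "|", labels["source_town"])
```

A plain ratio D/B (above 10 for Hollywood) is a simpler summary, but on its own it ignores how few events a small location has, which is why a deviation scaled by the counts is used here.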

When individual birth and death locations are connected, the resulting network reveals a consistent pattern of cultural attraction and interaction in space. For example, several hundred antiquarians in the 18th century (with data derived from the Winckelmann Corpus) (11) died in a number of relevant cultural centers such as Rome, Paris, or Dresden, even though they had been born all over Europe (Fig. 1D; see SM).

We also constructed a worldwide historical migration network, connecting 37,062 locations via the birth-death data of 120,211 individuals in the FB data set from King David in 1069 BCE to Poppy Barlow in 2012 CE (see fig. S5). On a map of Europe (Fig. 1E), the distribution of colors reveals a differentiated landscape of sources (blue, more births) and attractors (red, more deaths). The sizes of nodes represent their importance, estimated by their PageRank, calculated from the underlying migration network (12). We chose PageRank, one of the most popular centrality measures, as it offers clear advantages over other centrality measures (compare SM under PageRank versus Eigenvector Centrality), as well as a simple analogy, where every death counts as a vote for the target location, in the same way that hyperlinks are considered as a vote for their target Web site. We find that the PageRank hierarchy intuitively reflects the hierarchy of urban population size (13). Yet, although PageRank correlated reasonably well with the number of births in locations (r = 0.74), and even better with the number of deaths (r = 0.97), it did not predict the imbalance of births and deaths (r = 0.34): Large attractive locations, such as London, Paris, or Rome were complemented by many small attractors, e.g., at the French Riviera or both sides of the Alps. Other highly ranked locations, such as Edinburgh or Dublin, were more fertile than deadly, as was most of rural Europe. Additional regions and data sets with similar conclusions are presented in figs. S5 to S7.
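The voting analogy translates directly into a weighted directed graph: each individual adds one unit of weight to the edge from birth place to death place, and PageRank is computed on that graph. The power-iteration sketch below is a generic implementation under stated assumptions (damping factor 0.85, self-loops kept for people who died where they were born, uniform redistribution of rank from locations with no outgoing edges). The paper's exact settings are given in the SM and may differ.

```python
from collections import defaultdict

def pagerank(edges, damping=0.85, tol=1e-12, max_iter=1000):
    """Weighted PageRank of a birth -> death migration network by power iteration.

    edges: iterable of (birth_place, death_place) pairs, one per individual.
    Assumptions: self-loops are kept; rank held by dangling nodes (places that
    are never a birth place) is spread uniformly over all nodes.
    """
    weights = defaultdict(lambda: defaultdict(float))
    nodes = set()
    for birth, death in edges:
        weights[birth][death] += 1.0  # each death is one "vote" for its location
        nodes.update((birth, death))
    n = len(nodes)
    out_total = {v: sum(weights[v].values()) if v in weights else 0.0 for v in nodes}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if out_total[v] == 0.0)
        base = (1.0 - damping) / n + damping * dangling / n
        new = {v: base for v in nodes}
        for src, targets in weights.items():
            for dst, w in targets.items():
                new[dst] += damping * rank[src] * w / out_total[src]
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank

# Toy network: three villages send their notable people to one capital.
edges = ([("village_a", "capital")] * 3 + [("village_b", "capital")] * 2
         + [("capital", "capital")] + [("village_c", "village_b")])
ranks = pagerank(edges)
print(max(ranks, key=ranks.get))  # capital
```

For networks with tens of thousands of locations, a library routine such as `networkx.pagerank` on a weighted `DiGraph` (with `alpha=0.85`) is the more practical choice; the hand-written loop above only makes the voting mechanism explicit.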

The numbers of notable individuals N(t) and locations S(t) grew exponentially over time (Fig. 2A and fig. S8). Yet, the difference in growth rates for individuals (r) and locations (s) implies an underlying Heaps’ law (14) S(t) ≈ N(t)^α, where α = s/r ≈ 0.9. The sublinear exponent α < 1 indicates that, in the long run, the growth of already existing attractive locations for notable individuals dominates over the emergence of new attractive locations.
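The step from two exponential growth rates to Heaps’ law is short. If N(t) = N_0 e^(rt) and S(t) = S_0 e^(st), then t = ln(N/N_0)/r, and substituting gives S(t) = S_0 (N(t)/N_0)^(s/r), so α = s/r. The sketch below estimates r and s by least-squares fits of log N(t) and log S(t) against t and returns their ratio. The function names and the synthetic series are illustrative, and the paper's fitting procedure may differ.

```python
import math

def growth_rate(ts, ys):
    """Least-squares slope of log(y) against t, i.e. g in y(t) ~ exp(g * t)."""
    logs = [math.log(y) for y in ys]
    n = len(ts)
    mean_t, mean_l = sum(ts) / n, sum(logs) / n
    cov = sum((t - mean_t) * (l - mean_l) for t, l in zip(ts, logs))
    var = sum((t - mean_t) ** 2 for t in ts)
    return cov / var

def heaps_exponent(ts, individuals, locations):
    """alpha = s / r from the exponential growth rates of S(t) and N(t)."""
    r = growth_rate(ts, individuals)
    s = growth_rate(ts, locations)
    return s / r

# Synthetic series with r = 0.02 and s = 0.018 per year, so alpha should be 0.9.
ts = list(range(1500, 2001, 50))
n_t = [100 * math.exp(0.02 * (t - 1500)) for t in ts]
s_t = [50 * math.exp(0.018 * (t - 1500)) for t in ts]
alpha = heaps_exponent(ts, n_t, s_t)
print(round(alpha, 3))  # 0.9
```

Regressing log S(t) directly on log N(t) gives the same exponent when both series are exactly exponential; with noisy data the two estimates can differ slightly.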

Fig. 2 Birth-death networks provide historical evidence for global patterns and local instabilities in human mobility dynamics.

(A) The number N(t) of individuals as a function of the number S(t) of locations, where α = 0.9 (compare fig. S8 for other data sets). (B) Cumulative probability distribution slopes for birth and death frequency in FB locations from before 1300 to 2012 CE. The shaded area indicates the uncertainty of the slope (18) (see fig. S10, G to J, for detail and other data sets). (C) The fat-tailed distribution of birth-to-death distances Δr in FB exhibits little change over time from before 1300 to 2012 CE (compare fig. S11). (D) The relative death share and, consequently, rank of major FB locations over centuries from before 1300 to 2012 CE (compare fig. S12).

The probability distributions of birth locations ƒ_B and death locations ƒ_D, or birth-to-death paths ƒ_{B→D}, follow Zipf’s law (Fig. 2B and figs. S9 and S10) (15). The nature of the frequency distributions was highly consistent over several centuries, whereas the slopes for birth and death changed gradually over time (Fig. 2B and fig. S10, G to I). To our surprise, the slopes for births and deaths started to differ significantly from the 19th century onward in FB and even earlier for artists in AKL. The difference indicates that larger cultural centers attract a greater proportion of notable individuals, in line with recently discovered urban scaling laws (16, 17). We used an established method to fit a power law to the data to obtain the scaling exponents (18). We further confirmed the significant difference between ƒ_B and ƒ_D, using a two-sample Kolmogorov-Smirnov (KS) test comparing birth and death distributions directly (see fig. S10J).
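Both statistical steps in this paragraph can be sketched in a few lines. The exponent estimator below is the widely used approximate maximum-likelihood formula for discrete power laws from Clauset, Shalizi, and Newman, α ≈ 1 + n [Σ ln(x_i / (x_min − 1/2))]^(−1); whether it matches the method of ref. 18 in every detail is an assumption, and choosing x_min (fixed here by hand) is a separate step. The second function computes the two-sample KS statistic D, the largest vertical gap between two empirical cumulative distributions. The inputs would be, for example, per-location birth and death counts; all names and numbers are hypothetical.

```python
import bisect
import math

def powerlaw_alpha(counts, x_min=1):
    """Approximate MLE of alpha for a discrete power law p(x) ~ x**(-alpha), x >= x_min.

    Uses the x_min - 1/2 approximation of Clauset, Shalizi & Newman (2009).
    x_min is supplied by hand here; selecting it is a separate step.
    """
    tail = [x for x in counts if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / (x_min - 0.5)) for x in tail)

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|.

    Both empirical CDFs are step functions that only change at observed values,
    so evaluating them at every observed value finds the maximum gap.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:
        f_a = bisect.bisect_right(a, x) / len(a)
        f_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(f_a - f_b))
    return gap

# Toy per-location counts (hypothetical): deaths are more concentrated than births.
births_per_place = [1, 1, 2, 2, 3, 3, 4, 5]
deaths_per_place = [1, 1, 1, 1, 2, 8, 15, 40]
print(powerlaw_alpha(births_per_place), powerlaw_alpha(deaths_per_place))
print(ks_statistic(births_per_place, deaths_per_place))
```

In practice, `scipy.stats.ks_2samp` returns the same statistic together with a p-value, which is what a significance claim like the one above requires.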

The distribution of birth-to-death distances Δr changed very little during more than eight centuries (Fig. 2C and fig. S11). The median distance from birth to death has not even doubled between the 14th and the 21st centuries (214 km and 382 km, respectively), with a minimum of 135 km in the 17th century (see vertical lines in Fig. 2C). Only long-range mobility, captured by the tail of the probability distribution P(Δr), changed because of the gradual colonization of the world and increasing traffic between the U.S. coasts. As such, these results are consistent with Ravenstein's laws of migration (19, 20), formulated in the late 19th century, and other empirical observations of human mobility in geography, demography, and sociology, from Zipf's intercity movement of persons (21) to modern census statistics (22) and measurements based on tracking dollar bills or mobile phones (23, 24). Our findings are nevertheless relevant, as (i) we can determine these patterns from a relatively small fraction of birth and death location pairs, and (ii) we demonstrate that the patterns hold for more than eight centuries on an international scale that is not divided by country boundaries.
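Birth-to-death distances Δr of this kind are great-circle distances between geocoded locations. The sketch below computes them with the haversine formula (mean Earth radius 6,371 km) and reports the median per century. Grouping by century of death, the input format, and the function names are assumptions made for illustration.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))  # clamp rounding error

def median_distance_by_century(moves):
    """Median birth-to-death distance per century of death.

    moves: iterable of (death_year, (lat_b, lon_b), (lat_d, lon_d)), a hypothetical format.
    Centuries are numbered as year // 100 + 1 (so 1350 -> 14).
    """
    by_century = {}
    for year, (lat_b, lon_b), (lat_d, lon_d) in moves:
        by_century.setdefault(year // 100 + 1, []).append(
            haversine_km(lat_b, lon_b, lat_d, lon_d))
    return {century: median(ds) for century, ds in sorted(by_century.items())}

# Toy moves along the equator, 1, 3 and 2 degrees of longitude apart.
moves = [(1350, (0, 0), (0, 1)), (1360, (0, 0), (0, 3)), (1370, (0, 0), (0, 2))]
result = median_distance_by_century(moves)
print(result)  # the median is the 2-degree move, about 222 km
```

The per-century lists of distances, before taking the median, are also what a histogram of P(Δr) would be built from.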

Aside from these global patterns, we find considerable instabilities on a local level over the order of centuries. The death share, or the relative fraction of notable deaths in specific locations, was highly unstable over centuries (Fig. 2D and fig. S12). This local instability confirms recent expectations regarding the rise and fall of populations in top-ranked cities (13, 16, 25) but also points to substantial amounts of noise in the system (26).
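The death share of a location in a given century is its number of notable deaths divided by all notable deaths recorded in that century. The sketch below computes these shares and the top-ranked location per century from (death year, place) pairs. The century convention (year // 100 + 1), the input format, and the function names are illustrative assumptions.

```python
from collections import Counter, defaultdict

def death_shares(deaths):
    """Share of all notable deaths per century that falls on each location.

    deaths: iterable of (death_year, place) pairs (hypothetical format).
    Returns {century: {place: share}}, centuries numbered as year // 100 + 1.
    """
    per_century = defaultdict(Counter)
    for year, place in deaths:
        per_century[year // 100 + 1][place] += 1
    shares = {}
    for century, counts in per_century.items():
        total = sum(counts.values())
        shares[century] = {place: n / total for place, n in counts.items()}
    return shares

def top_location(shares):
    """Location with the largest death share in each century."""
    return {century: max(s, key=s.get) for century, s in sorted(shares.items())}

# Toy data: the leading location changes from A (18th c.) to B (19th c.).
deaths = [(1710, "A"), (1720, "A"), (1730, "B"),
          (1810, "B"), (1820, "B"), (1830, "A"), (1840, "C")]
shares = death_shares(deaths)
print(top_location(shares))  # {18: 'A', 19: 'B'}
```

Restricting the input to deaths within one country gives death-share curves of the kind compared for France and Germany in Fig. 3, B and C.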

Adding another aspect of local instability, the dynamics of birth-death imbalance for individual locations over centuries are tracked in fig. S13, measured in multiples of the square-root deviation e from the perfectly balanced diagonal in Fig. 1C and fig. S4. In fact, individual locations fluctuate substantially in this respect, as in the case of New York City, which is now a clear death attractor but gave birth to more notable individuals than it attracted around 1920.

Next, we illustrate the qualitative relevance of our macroscopic perspective by delineating the meta-narratives of European and North American cultural history, based on birth-death data without additional source material (movies S1 and S2, Fig. 3A, and fig. S14). The sequence of images in Fig. 3A exemplifies the cultural narrative of Europe from 0 to 2012 CE, as presented in movie S1 based on FB: In the beginning, a pan-European elite defined Rome as the center of its empire via massive long-range interactions, followed by increasing point-to-point migration throughout Europe, where Rome remained a hub along with rising subcenters, such as Cordova and Paris. Starting in the 16th century, data density in Europe becomes sufficient to reveal regional clusters. In fact, it becomes evident that Europe is characterized by two radically different cultural regimes: A winner-takes-all regime, with massive centralization toward centers such as Paris, and a fit-gets-richer regime, where many subcenters compete with each other in federal clusters throughout Central Europe and Northern Italy (27) (see Fig. 3, B and C, and fig. S15).

Fig. 3 The visualization of birth-death network dynamics offers a meta-narrative of cultural history.

(A) A sequence of frames, based on movie S1, exemplifies the FB narrative for Europe from Roman times to the present. The dynamically applied color scheme (with black and white inverted in print) denotes birth-death imbalance (blue to red) (compare Fig. 1C). In the supplementary movie, individuals appear as particles, indicating collective directions of flow as they move toward their death locations. Throughout the movie, local cohesive dynamics emerge regionally in addition to the massive long-range interactions, first from and to Rome and eventually to emergent country capitals and economic centers, including those in the East. The final network state for locations in 2012—within what is now France and Germany—is the result of massive centralization toward Paris versus multicentric competition in Germany. (B and C) Death-share plots for locations from before 1300 to 2012 CE confirm that France is characterized by a winner-takes-all regime, where Paris takes in a substantial and almost constant share of notable individuals (27). Germany, in contrast, is characterized by a subcritical fit-gets-richer regime, where no center surpasses 19% in any given century.

After demonstrating the global quantitative and qualitative relevance of our macroscopic approach, we now focus on the dynamics of individual cultural centers, defined as locations with substantial amounts of notable deaths. We examined notable events identified from the Google Ngram English data set (28), a procedure that can and should be complemented with data sets in other languages to allow for comparison and eventually worldwide coverage (known biases are discussed in the SM). Recording the frequency of words and word combinations in an estimated 5% of all books ever published, the Google Ngram data were originally used to plot the pattern frequency against book publication dates (29). Here, instead, we obtained events by searching for the pattern “{location} in {year},” which allows us to map the “expression” of cultural centers over longer time periods, similar to a gene expression plot (30) (Fig. 4A). Particularly after 1750, dark spikes in the trajectory reveal outstanding historical events. Web searches even allow us to semiautomatically add event labels to these spikes. The resulting Ngram trajectories can be examined relative to total death rate trajectories (Fig. 4B and fig. S16), tracking deviations of locations from their nearly constant fitness η_i^D(t) (compare fig. S17 and our model in the SM), and even relative to births and deaths within professional genres in FB, AKL, and ULAN (Fig. 4, C and D). By revealing such correlated changes and continuities, our approach allows for cross-fertilization of domain knowledge into other domains, periods, and geographic areas.
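The Ngram step amounts to generating one query string per year and looking for years whose frequency stands far above that of neighboring years. The sketch below builds the “{location} in {year}” patterns and flags spikes with a simple heuristic: a year counts as a spike if its frequency exceeds `factor` times the median of the surrounding ±`window` years. The heuristic, its parameters, and the assumption that frequencies have already been retrieved into a {year: frequency} dictionary are illustrative, not the paper's procedure; no Ngram service is called.

```python
from statistics import median

def ngram_patterns(location, years):
    """Query strings of the form '{location} in {year}', one per year."""
    return [f"{location} in {year}" for year in years]

def find_spikes(series, window=10, factor=3.0):
    """Years whose frequency exceeds `factor` times the median of the
    surrounding +/- `window` years (the year itself excluded).

    series: {year: relative frequency}, assumed to be retrieved beforehand.
    This threshold rule is an illustrative heuristic, not the paper's method.
    """
    years = sorted(series)
    spikes = []
    for i, year in enumerate(years):
        nearby = [series[y] for y in years[max(0, i - window): i + window + 1] if y != year]
        if nearby and series[year] > factor * median(nearby):
            spikes.append(year)
    return spikes

# Synthetic series: flat background with one outstanding year.
series = {year: 1.0 for year in range(1750, 1800)}
series[1763] = 10.0
print(ngram_patterns("Paris", [1763]))  # ['Paris in 1763']
print(find_spikes(series))              # [1763]
```

Each flagged year could then be labeled by a Web search for the corresponding pattern, as with “Paris in 1763” in Fig. 4A.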

Fig. 4 Temporal death rate patterns in cultural centers reveal midterm trends that are hard to extract from other sources.

(A) English Google Ngram trajectory for the pattern “Paris in {year}” from 1500 to 1995 CE. Dark spikes point to outstanding historical events in the city, labeled semiautomatically using Web searches, such as “Paris in 1763” returning “Treaty of Paris.” (B) Paris death rate trajectories for FB total and AKL total indicate deviations from the nearly constant fitness η_i^D(t) (compare fig. S16 and our model in the SM). Color indicates periods of accelerated (bright) versus slower growth (dark). The numbers at the ends of the trajectories indicate the respective number of individuals. (C) Trajectories for FB governance and AKL architecture positively correlate around the French Revolution from 1785 to 1805 (r = 0.89), whereas FB governance and artists in AKL fine arts slightly negatively correlate (r = –0.34). (D) Trajectories for AKL applied arts, AKL fine arts, and FB performing arts.

Supplementary Materials

Materials and Methods

Figs. S1 to S17

Tables S1 and S2

References (31–61)

Movies S1 and S2

External Databases S1 to S4

References and Notes

  8. Materials and methods are available as supplementary materials on Science Online.

Acknowledgments: We are grateful to Verlag Walther De Gruyter (AKL), The Getty Research Institute (ULAN), and Biering and Brinkmann (WCEN) for making data available to us and for allowing all data needed to replicate the conclusions of the paper to be available as SM. We furthermore thank our collaborators at BarabásiLab and ETH SOMS for discussions and comments on the manuscript. The work of M.S. was partially supported by German Research Foundation (DFG) grant (no. SCHI 1065/2-1) and The University of Texas at Dallas Arts and Technology (ATEC) Fellowship no. 1. D.H. is grateful for partial support by the European Research Council Advanced Investigator Grant “Momentum” (grant no. 324247).
