Felix Stalder on Mon, 20 Aug 2012 14:20:45 +0200 (CEST)
<nettime> Computing and Visualizing the 19th-Century Literary Genome.
[Another quantitative study of cultural history, like Moretti's Graphs, Maps, Trees (2003) or Lev Manovich's work in cultural analytics. Fascinating stuff, if I only knew what to make of it. The figures, for example, are really beautiful, though, for me, entirely incomprehensible. Ah, the joys of visualization. Felix]

Jockers, Matthew, Stanford University, USA, [email protected]

http://tinyurl.com/9khetrl

Overview

In literary studies, we have no shortage of anecdotal wisdom regarding the role of influence on creativity. Consider just a few of the most prominent voices:

- 'Talents imitate, geniuses steal' - Oscar Wilde (1854-1900?).[1]
- 'All ideas are second hand, consciously and unconsciously drawn from a million outside sources' - Mark Twain (1903).
- 'The historical sense compels a man to write not merely with his own generation in his bones, but with a feeling that the whole of the literature - has a simultaneous existence.' - T. S. Eliot (1920).
- 'The elements of which the artwork is created are external to the author and independent of him.' - Osip Brik (1929).
- Anxiety of Influence - Harold Bloom (1973).

Whether consciously influenced by a predecessor or not, it might be argued that every book is in some sense a necessary descendant of, or necessarily 'connected to', those before it. Influence may be direct, as when a writer models his or her writing on another writer,[2] or influence may be indirect in the form of unconscious borrowing. Influence may even be 'oppositional', as in the case of a writer who wishes to make his or her writing intentionally different from that of a predecessor.

The aforementioned thinkers offer informed but anecdotal evidence in support of their claims of influence. My research brings a complementary quantitative and macroanalytic dimension to the discussion of influence.
For this, I employ the tools and techniques of stylometry, corpus linguistics, machine learning, and network analysis to measure influence in a corpus of late 18th- and 19th-century novels.

Method

The 3,592 books in my corpus span from 1780 to 1900 and were written by authors from Britain, Ireland, and America; the corpus is almost even in terms of gender representation. From each of these books, I extracted stylistic information using techniques similar to those employed in authorship attribution analysis: the relative frequencies of every word and mark of punctuation are calculated and the resulting data winnowed so as to exclude features not meeting a preset relative frequency threshold.[3] From each book I also extracted thematic (or 'topical') information using Latent Dirichlet Allocation (Blei, Ng et al. 2003; Blei, Griffiths et al. 2004; Chang, Boyd-Graber et al. 2009). The thematic data includes information about the percentages of each theme/topic found in each text.[4]

I combine these two categories of data - stylistic and thematic - to create 'book signals' composed of 592 unique feature measurements. The Euclidean metric is then used to calculate every book's distance from every other book in the corpus. The result is a distance matrix of dimension 3,592 x 3,592.[5] While measuring and tracking 'actual' or 'true' influence - conscious or unconscious - is impossible, it is possible to use the stylistic-thematic distance/similarity measurements as a proxy for influence.[6] Network visualization software can then be used as a way to organize, visualize, and study the presence of influence among the books in my corpus.[7]

To prepare the data for use in a network environment, I converted the distance matrix into a long-form table with 12,902,464 rows and three columns in which each row captures a distance relationship between two books. The first cell contains a 'source' book, the second cell a 'target' book, and the third cell the measured distance between the two.
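The pipeline described above (relative frequencies, feature winnowing, Euclidean distances, long-form table) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the code used in the study: the three toy "books", the function names, the whitespace tokenizer, and the winnowing rule (mean relative frequency across the corpus at or above a threshold) are all hypothetical, and the LDA topic proportions that also feed the real 592-feature signals are left out.

```python
import math
from collections import Counter
from itertools import product

# Toy stand-ins for the corpus; titles and texts are invented for illustration.
books = {
    "book_a": "the sea , the ship , and the storm .",
    "book_b": "the ship and the sea .",
    "book_c": "a garden , a letter , a marriage .",
}

def relative_frequencies(text):
    """Relative frequency of each whitespace-separated token (words and punctuation)."""
    tokens = text.split()
    total = len(tokens)
    return {tok: n / total for tok, n in Counter(tokens).items()}

def build_signals(corpus, threshold):
    """Build one feature vector per book.

    ASSUMPTION: a feature is kept when its mean relative frequency across
    the corpus is >= threshold; the abstract does not spell out the rule.
    """
    freqs = {title: relative_frequencies(text) for title, text in corpus.items()}
    vocab = sorted({tok for f in freqs.values() for tok in f})
    kept = [
        tok for tok in vocab
        if sum(f.get(tok, 0.0) for f in freqs.values()) / len(freqs) >= threshold
    ]
    signals = {title: [f.get(tok, 0.0) for tok in kept] for title, f in freqs.items()}
    return signals, kept

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def long_form(signals):
    """One (source, target, distance) row for every ordered pair of books."""
    titles = sorted(signals)
    return [(s, t, euclidean(signals[s], signals[t]))
            for s, t in product(titles, repeat=2)]

signals, features = build_signals(books, threshold=0.05)
table = long_form(signals)
```

Because every ordered pair is listed, including each book paired with itself, n books yield n x n rows; for n = 3,592 that is the 12,902,464 rows reported above.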
After removing all of the records in which the target book was published before, or in the same year as, the source book,[8] the data was reduced from 12,902,464 records to 6,447,640. This data and a separate table of metadata were then imported into the open source network analysis software package Gephi (2009) for analysis and visualization.

Networks are constructed out of nodes (books) and edges (distances). When plotted, nodes with less similarity (i.e. larger distances between them) will spread out further in the network. Figure 1 offers a simplified example of three imaginary books.

Figure 1 (http://tinyurl.com/9khetrl): A sample network with edge numbers representing measured distances between nodes

While it is not possible to show the details of the entire network here, it is possible to display several of the most obvious macro-structures. Figure 2, for example, presents a zoomed-out view of the network with book nodes colored according to dates of publication.[9]

Figure 2 (http://tinyurl.com/9khetrl): The 19th-century novel network colored according to publication date

The shading of nodes and edges according to publication date reveals the inherently chronological nature of stylistic and thematic change. The progressive darkening of the nodes from east to west allows us to see, at the macro-scale, how style and theme are changing and evolving over time.[10] Also seen in this image is a 'satellite' of books in the northwest. This satellite represents a 'community' of novels that are highly self-similar but at the same time markedly different from the books in the main network cluster.[11]

When the network is recolored according to gender (Figure 3), a new axis can be seen splitting the network into northern and southern sectors along gender lines.
Figure 3 (http://tinyurl.com/9khetrl): The 19th-century novel network colored according to author-gender

This visualization (Figure 3) reveals that works by female authors (colored light gray) and male authors (black) are more stylistically and thematically homogeneous within their respective gender classes. As a result of this similarity in 'signals,' female-authored books cluster together on the south side of the main network, while male-authored books are drawn together in the north.[12]

These two 'views' of the network allow us to begin imagining the larger macro-history of thematic-stylistic change and influence in the 19th-century novel. What is not obvious in this macro-view, however, is that a great many of the individual books we have traditionally studied are in fact 'mutations' or outliers from the general trends. Harriet Beecher Stowe's Uncle Tom's Cabin, for example, clusters closer to the works of male authors, and Maria Edgeworth's Belinda has a signal that does not become dominant for forty years after the date of Belinda's publication.

Also absent from the macro-view are the individual thematic-stylistic 'legacies'. Using three measures of network significance (weighted in-degree, weighted out-degree and PageRank),[13] I will end my presentation with the argument that Jane Austen and Walter Scott are at once the least influenced (i.e. most original) of the early writers in the network and, at the same time, the most influential in terms of the longevity, or 'fitness,' of their thematic-stylistic signals. The signals introduced by Austen and Scott position them at the beginning of a stylistic-thematic genealogy; they are, in this sense, the literary equivalent of Homo erectus or, if you prefer, Adam and Eve.

-- 
--- http://felix.openflows.com ------------------------ books out now:
*|Vergessene Zukunft. Radikale Netzkulturen in Europa. transcript 2012
*|Deep Search. The Politics of Searching Beyond Google. Studienv. 2009
*|Mediale Kunst/Media Arts Zurich.13 Positions. Scheidegger&Spiess2008
*|Manuel Castells and the Theory of the Network Society.Polity P. 2006
*|Open Cultures and the Nature of Networks. Ed Futura / Revolver, 2005