N-Grams and the History of Computing, Part 2
I was fiddling around some more with n-grams, and I came across a surprising result. So surprising, in fact, that I am deeply suspicious of it. As you can see from the graph, I searched for "radio," "television," and "computer" from 1920 to 2000. The oddity is the powerful surge of "computer" in the 1950s and 60s: it passes "television" in about 1962 and "radio" in 1966. If n-grams are supposed to be a tool for the quantitative study of culture, surely there is something badly off here.
In that period the majority of Anglo-American households had at least one television and multiple radios (including those built into their cars). People would have spent a substantial portion of the typical day watching or listening to each, and an additional chunk discussing programs with friends and colleagues. Computers, on the other hand, were in approximately zero homes (perhaps a few hundred if you count home consoles and simple electronic calculators). The total number of computer users might perhaps have reached the low hundreds of thousands by the end of this period. It's fair to say that the vast majority of people didn't know much or think much about computers, other than while reading the occasional newspaper article.
So what exactly am I measuring here? I suspect two factors are biasing the results:
- The collection bias of university libraries: There were many new books on computing in the 1960s targeted at elite audiences in academia, business, and industry. The computer was the hot new tool for researchers in fields from bio-medicine to history, and in industries from manufacturing to the law. TV, on the other hand, was mostly looked down upon by the intellectual classes as a "vast wasteland". (As an aside along these same lines, it certainly seems salient that the incidence of "TV" is minuscule compared to the more formal "television", although the former was surely spoken and perhaps printed more often than the latter.)
- The bias towards low-print-run editions inherent in the Google Ngram Viewer's algorithm: The Ngram Viewer just counts words in the books it has scanned. Printing volume counts for nothing. Even to the extent that Google Books does incorporate TV Guide, for example, the many millions of copies of each of its words that were actually printed get no more weight than the words of that mass-market sensation, The Computer: An Effective Management Tool for Manufacturing Control.
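To make the second point concrete, here is a minimal sketch in Python of the difference between counting each title once and weighting each title by its print run. Everything in it is an assumption for illustration: the two "books", their texts, the print-run figures, and the count_words helper are invented, and this is not how Google actually builds its counts, only a toy version of the effect described above.

```python
from collections import Counter

# Hypothetical toy corpus: titles, texts, and print runs are all invented.
corpus = [
    {
        "title": "mass-market TV listings magazine",
        "text": "television television radio",
        "print_run": 1_000_000,
    },
    {
        "title": "specialist computing monograph",
        "text": "computer computer computer computer",
        "print_run": 2_000,
    },
]


def count_words(books, weight_by_print_run=False):
    """Count words per title, optionally multiplying by copies printed."""
    totals = Counter()
    for book in books:
        # Unweighted: every title counts once, however many copies existed.
        weight = book["print_run"] if weight_by_print_run else 1
        for word in book["text"].split():
            totals[word] += weight
    return totals


unweighted = count_words(corpus)
weighted = count_words(corpus, weight_by_print_run=True)

for word in ("radio", "television", "computer"):
    print(word, unweighted[word], weighted[word])
```

In this toy corpus "computer" leads when every title counts once, and falls far behind "television" and "radio" as soon as copies printed are taken into account.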
Perhaps the Google Ngram Viewer has more to tell us about the nature and history of its source base than it does about the history of culture writ large.
- cfmcdonald's blog