On the exciting occasion of the release of Alex Wright’s Cataloging the World: Paul Otlet and the Birth of the Information Age I decided to post another vault piece. I didn’t write too much about Otlet, but it covers a lot of the same ground, if not with great competence.
The point of departure for the present paper was Clifford Lynch’s 2005 article in D-Lib Magazine, “Where Do We Go From Here? The Next Decade for Digital Libraries.” I was inspired by the broad parameters Lynch gave to the field, and the theoretical possibilities such a vision opens up. The field, as practiced today, is overwhelmingly defined in an ex post facto manner, from a survey of existing collections, technologies, and administrative priorities. The relatively limited number of practicable options before us at any given moment bounds the parameters of discussion; these options are implicitly assumed as given.
Computer science is not generally a discipline, like those in literature and the arts, which busies itself with excavating the past to take up possibilities previous generations had discarded. But I believe the perpetual storm of digital innovation masks an underlying inertia and even amnesia; certainly it encourages speculation on the future to proceed entirely from extrapolation of the present. Perhaps a reassessment of early theoretical computer science, in this case digital library science, will allow for more radical futurological speculation than the current milieu, mired in what Sigmund Freud called “the narcissism of minor differences,” may permit.
I became interested in the prehistory of the present moment: the series of intellectual leaps that made conceiving a digital library possible, and, perhaps more interestingly, the speculations that have been discarded, or forgotten in the flow of technological advances that are often naively assumed to have been inevitable. I am interested, then, in the murky period where the “digital library” existed only as thought experiment in the minds (and writings) of some of the twentieth century’s most visionary theorists of communication and information.
Clifford Lynch opens his piece thusly: “The field of digital libraries has always been poorly-defined, a ‘discipline’ of amorphous borders and crossroads, but also of atavistic resonance and unreasonable inspiration.” This amorphousness, rightly or wrongly, is often assumed to be a barrier and a nuisance, and to the extent that it produces redundancies, inefficiencies, and failures of interoperability, it is. But in a way, it is highly auspicious that the field remains subject to “atavistic resonances” and “unreasonable inspiration”; that is, it still has an essential vitality, a basic radicalism. The atavistic resonances – the echoes of the work of theorists like Paul Otlet, Vannevar Bush, and J.C.R. Licklider – can, however, be faint. My intention is to amplify some of these fading tones by paying close attention to these ideas of the past, in order to suggest, if obliquely, the range of possibilities for the future.
Though the present paper deals only with the twentieth century, a bolder path could in fact be taken. Literary speculation on access to the totality of knowledge is ancient; the earliest digital library is, perhaps, the mind of God. The encyclopédistes of the French Enlightenment were among the first to perceive knowledge as a totality, and one that can theoretically be collected and organized as such. The Faust myth provides another precedent: finding the existing store of literature insufficient to his ambitions, Faust bargains with the devil for unlimited knowledge. Even in the twentieth century, utopian and millenarian undertones – Lynch’s “unreasonable inspiration” – are audible in the discourse of the future-library, as people look into the future and see fulfilled some of the grandest, and hitherto most unreasonable, desires of humankind.
The earliest thinkers on the pre-historical timeline Lynch briefly outlines are H.G. Wells and Paul Otlet. Wells, who of course is better known for his science fiction, was also a passionate advocate for the democratization of knowledge, and for the use of modern technology to organize that knowledge in hitherto inconceivable ways. He called his idea the “world brain,” explaining that:
[b]oth the assembling and the distribution of knowledge in the world at present are extremely ineffective, and thinkers of the forward-looking type whose ideas we are now considering, are beginning to realize that the most hopeful line for the development of our racial intelligence lies rather in the direction of creating a new world organ for the collection, indexing, summarizing and release of knowledge, than in any further tinkering with the highly conservative and resistant university system, local, national and traditional in texture, which already exists. These innovators, who may be dreamers today, but who hope to become very active organizers tomorrow, project a unified, if not a centralized, world organ to ‘pull the mind of the world together’, which will be not so much a rival to the universities, as a supplementary and co-ordinating addition to their educational activities – on a planetary scale. (Wells 1937)
For my purposes, it is extremely interesting that it was a science fiction writer who, in the previous century, first speculated upon the basic architecture of modern library and information technology. Later speculations are alternately more prescient and more bizarre in that they anchor their vision to specific arrays of technology.
Another crucial milestone was Vannevar Bush’s 1945 paper in the Atlantic Monthly, “As We May Think.” Though the ideas had been germinating for over a decade, and had been discussed in a 1939 piece for Fortune, this article was their first proper introduction to the general public (Waldrop 27). Bush’s article was written as the Second World War was winding down, and the scientific community – so integral to the war effort – was thinking about its way forward in peacetime. It was clear that the “mass of research” was accumulating faster than our ability to process, interpret, or even store it. The situation struck Bush as not only unfortunate but absurd: “The summation of human experience is being expanded at a prodigious rate, and the means we use for threading through the consequent maze to the momentarily important item is the same as was used in the days of square-rigged ships.”
Bush looked to the “new and powerful instrumentalities” of the mid-twentieth century to overcome this impasse. To us, the technologies he describes seem almost comically antiquated; M. Mitchell Waldrop, in his study of the early history of the personal computer, remarks that “his desk library was still very much an analog device, grounded in the microfilm and photocell technologies of the 1930s.” His speculations on what could be done with these technologies, however, were revolutionary. His “memex” was to be a device, usable by non-experts, which would not only rapidly speed up research by condensing the bulk of research into one easily workable device, but which would actually improve our “processes of thought.”
It would do so by making the organization of knowledge mimic the associative networks of the human mind. “With one item in its grasp,” Bush explains, “[the human mind] snaps instantly to the next that is suggested by the association of thoughts, in accordance with some intricate web of trails carried by the cells of the brain.” Likewise, on the memex, “when one of these items is in view, the other can be instantly recalled merely by tapping a button…It is exactly as though the physical item had been gathered together from widely separated sources and bound together to form a new book.” The “trails” thus formed could then be copied and transmitted. Today, we know such documents as hypertext.
Bush’s breakthrough was to conceive of a deterritorialized library, one that, more radically still, was organized not as a hierarchy but as what the French theorist Gilles Deleuze would call a rhizome. This entirely new architecture of information is now familiar, but can still be pursued to more radical ends, as people in the worlds of linked data and the semantic web are currently doing. Theorists at the cutting edge of information technology might be willing to disagree with Bush’s assertion that “for mature thought there is no mechanical substitute.”
There is another concept buried in Bush’s essay that is pertinent to the present moment. He was concerned not only with the storing and networking of information, but with the future of knowledge production. This would be done by a union of the vocoder and the stenotype machine – themselves still new technologies when the essay was written – to process speech into text, a technology that in 2010 is still fledgling. The idea of using the “memex” to actually produce knowledge makes it more like a computer than a mere microfilm reader; his idea is an ancestor of the contemporary notion of “born digital” material. And his phrase “the original copy,” be it an unfortunate malapropism or a daring solecism, is prescient: he can conceive of information as always-already a copy of itself, a notion which began with the printing press but has hypertrophied in the “information age.”
Another man, J.C.R. Licklider, took Bush’s ideas even further. Licklider was a man of immense talents, with important contributions to psychology, psycho-acoustics, computer science, and other fields. According to the subtitle of M. Mitchell Waldrop’s biography of him, he also presided over “the revolution that made computing personal.” His only book – written on commission from the Council on Library Resources – was a fairly slim volume called Libraries of the Future, which for Clifford Lynch marks “one of the transition points between pre-history and the actual history of digital libraries.” M. Mitchell Waldrop calls it “one of the founding documents of what is now called digital library research” (Waldrop 185).
It is difficult to appreciate the foresight of Licklider’s imaginary computer network, as we have become so deeply accustomed to the existing Internet that it takes some imagination to consider any other way of networking, publishing and transmitting information electronically. (Lynch reminds us that “very substantial digital library systems were developed prior to the World Wide Web.”) Even if the specifics of his model are often no longer relevant, his project is worth studying precisely because the horizon of his imagination is not constrained by an existing precedent. Written in that brief epoch when the future dominance of the computer over all spheres of life could be predicted but only dimly imagined, the book envisions the eclipse of the book and the traditional library, yet can speculate on the future only by extrapolating from technological trends and potentialities. It indeed holds a unique, transitional position in the history of library science.
Though his model of “the computer” was a bulky, punch-card-operated machine (a set-up he would later help abolish), Licklider rightly predicted that within a few decades the storage and processing capacities of computers would grow exponentially:
Thus in the present century, we may be technically capable of processing the entire body of knowledge in almost any way we can describe; possibly in ten years and probably within twenty, we shall be able to command machines to ‘mull over’ separate subfields of the corpus and organize them for our use… (Licklider 7)
This reads much like Bush, as does his “neurophysiological” approach to organization:
…complex arrangements of neuronal elements and processes accept diverse stimuli, including spoken and printed sentences, and somehow process and store them in ways that support inferences and the answering of questions… (Licklider 24)
The final clause contains hints that Licklider’s thought differs substantially from Bush’s. Licklider was, above all, a theorist of “human-machine symbiosis” (Waldrop 4). For him, the neural metaphor did not describe only a particular kind of information architecture, but a direct analogy with the process of thought itself. He imagined documents doing something akin to reading themselves (Licklider 6). This will sound familiar to people who have heard of Linked Data, or Tim Berners-Lee’s idea of the Semantic Web, where all information on the web is encoded with rich layers of universally-readable metadata, making the internet a true “web” of semantically meaningful information rather than inert text and files requiring laborious processing by human beings. Likewise, Licklider’s network would be capable of “detecting apparent duplications and complementations in related fields, and noting similarities of form or structure in models or other information structures employed in substantively different areas.” This functionality is also present in Bush’s memex, but here, the process happens automatically.
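The core of the Linked Data idea described above – statements encoded in a machine-readable form so that software, rather than a human reader, can traverse and query them – can be illustrated with a toy example. The sketch below is my own construction, not Licklider’s or Berners-Lee’s notation: it stores facts as subject–predicate–object triples (the basic shape of RDF) and retrieves them mechanically by pattern.

```python
# A miniature triple store: each fact is a (subject, predicate, object)
# tuple, the basic statement shape underlying RDF and Linked Data.
facts = {
    ("memex", "proposed_by", "Bush"),
    ("memex", "described_in", "As We May Think"),
    ("Libraries of the Future", "written_by", "Licklider"),
}

def query(s=None, p=None, o=None):
    """Return every triple matching the pattern; None acts as a wildcard."""
    return {(fs, fp, fo) for fs, fp, fo in facts
            if s in (None, fs) and p in (None, fp) and o in (None, fo)}

# Ask the store for everything it "knows" about the memex.
print(query(s="memex"))
```

Because the statements are structured rather than inert prose, operations like the duplicate-detection Licklider describes reduce to mechanical pattern-matching over triples – no human reading required.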
Licklider envisions information being organized in an entirely new way. Early on, he offers a radical critique of the book, one that still has a charge, since even in 2010 we struggle to suppress our naïve fixation with it. The fixation extends beyond the simple nostalgia for the printed page in an era of e-readers and online magazines, to the way digital libraries conceive of information. Licklider asks us to forget the “schema” of a library based on “books on shelves,” and laments the “passiveness of the printed page” (Licklider 4ff).
Forty-five years later, most digital libraries still use the individual text as the basic unit, with the discrete file or package of files replacing the page or the binding of a book. We have not attempted to actively critique the dominant taxonomy of the archive, which organizes texts, considered as wholes, by their “authors.” Licklider understands that with computers, we have the power to transcend this humanist paradigm and develop entirely novel methods of organizing and interpreting information at much more granular levels. In other words, we can catalog not just books but knowledge itself.
In some ways, this is hopelessly utopian, and immediately invites serious challenges. What is the basic unit of knowledge? Licklider suggests the “idea,” conceding that this is a “discouragingly nebulous” term. If the idea is an algorithm, this is perfectly reasonable, but his ambitions are much greater. He wants to, somehow, distill the ideas inside a text from the “words and sentences themselves,” something that I, coming from a literature background, find entirely absurd. But I concede that simple declarative sentences, at least, can be partitioned into component parts, and approached heuristically in ways that minimize ambiguity. Licklider takes the idea, presumably from Noam Chomsky, a colleague at MIT, that languages can be generated from a simple system of rules. Licklider intends to reverse engineer this, condensing natural prose into something he calls “unambiguous English”: an oxymoron if ever there was one. This is something like lossy compression on the level of language.
Though I feel that these problems should not be ignored, I admit that Licklider is perfectly aware of them:
…on the other hand, no one seems likely to design or invent a formal system capable of automating sophisticated language behavior. The best approach, therefore, seems to us to be somewhere between the extremes – to call for a formal base plus an overlay of experience gained in interaction with the cooperative verbal community.
These pragmatic tradeoffs are endemic to computer science as a discipline, and the fact that he seems to anticipate the rise of wiki systems and crowdsourcing in his last comment is remarkable.
His book contains another dichotomy worth talking about, that between “syntactic” and “semantic” understanding. His misgivings about the latter have been shown. But what if syntactic analysis could itself yield meaningful results? Can data mining be complemented by concept mining?
In a talk the literary critic Peter de Bolla gave in New York several years ago, he discussed the evolution of the concept of human rights in the Eighteenth Century based entirely on an ingenious, automated trek through Eighteenth Century Collections Online (ECCO), a principal online database for scholars of the eighteenth century. De Bolla did searches for words occurring in proximity, whether in the same sentence, clause, or within the distance of a certain number of words. For instance, he compared the frequency, decade by decade, with which the word “rights” appears in proximity to “personal,” as opposed to “property.” Over the course of his talk, this kind of archival research seemed more and more convincing, if incomplete. The audience, mostly students and faculty accustomed to laboring over single words and the fine details of individual sentences, responded with a mixture of horror and fascination.
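De Bolla’s actual methodology is not specified here; as a rough illustration of the kind of proximity search described above, one might count how often neighbor words fall within a fixed window of a target word. The function name, window size, and sample sentence below are my own assumptions, not his procedure.

```python
import re
from collections import Counter

def proximity_counts(text, target, neighbors, window=8):
    """For each occurrence of `target`, check which of the `neighbors`
    appear within `window` words of it, and tally the hits."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, w in enumerate(words):
        if w != target:
            continue
        # Context: up to `window` words on either side of the target.
        lo, hi = max(0, i - window), i + window + 1
        context = words[lo:i] + words[i + 1:hi]
        for n in neighbors:
            if n in context:
                counts[n] += 1
    return counts

sample = ("The rights of property are sacred; yet personal rights, "
          "the rights of the person, precede the rights of property.")
print(proximity_counts(sample, "rights", ["personal", "property"]))
```

Run decade by decade over a corpus like ECCO, such tallies would yield exactly the sort of frequency comparison described above – a syntactic operation, blind to meaning, that nonetheless traces the drift of a concept.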
Another profound consequence of Licklider’s paradigm would be to undermine the distinction between “library” and “nature” research, by forming a collection in which both are a kind of “acquisition.” This would be as close to a total compendium of knowledge as I am capable of conceiving. By eliminating a distinction between the “lab” and the “library,” Licklider’s habit of speaking of the “store of knowledge,” “the corpus,” rather than “the store of books” would be justified. His vision is universal; most digital libraries are specialist. Even the grandest projects, like Google Books, retain the book, the printed page, simply in a different presentation. Licklider’s model is more like the Internet than a delineated archive; “perhaps it will be best to call it simply a ‘network,’” he says. This was hardly a dead metaphor at the time of writing, as it is now: the Oxford English Dictionary’s first recorded use of the term in this sense is from 1962, while Licklider’s research for this study took place between 1961 and 1963. Certain universities are requiring their researchers to submit and store raw data, but as of yet nothing like what Licklider proposed has been, or could be, realized. The current publishing models are simply too entrenched. But I do not think this requires us to stop being “unreasonably inspired,” in Lynch’s phrase.
There is much in Licklider’s book that does not differ in any meaningful sense from science fiction, in that its technologies are purely hypothetical. General knowledge of computing at the time was so rudimentary that he needed to provide a footnote when he used the word “software,” for instance – even for a quite advanced audience. As in much science fiction, those predictions which have come to pass seem uncanny in their apparently prophetic brilliance, while those that haven’t seem even more risibly absurd and far-fetched than they must have to their original audiences (e.g., in an example on retrieving paper copies of digital documents: “Unfortunately, my office is not located near a pneumatic-tube station”). This reaction is due in part to our instinctively teleological view of history, which maintains that events unfold in a manner that is deterministic if unpredictable.
From this perspective, theory and speculation exist as spectators, as gamblers at the craps table of history. In fact, developments in technology and elsewhere are shaped profoundly by the intellectual milieu in which they arise. The visions of Licklider et al. are thus partially constitutive of later trends in information technology, and this is why a critical understanding of the theory of digital libraries extends back before their actual emergence. And only by walking the road back, and examining paths not taken, can we understand the full scope of what lay ahead of us.
Bush, Vannevar. “As We May Think.” The Atlantic Monthly. July 1945: 101-108. Retrieved from <http://www.theatlantic.com/magazine/archive/1969/12/as-we-may-think/3881/>.
Licklider, J.C.R. Libraries of the Future. Cambridge, MA: The M.I.T. Press, 1965.
Lynch, Clifford. “Where Do We Go From Here? The Next Decade for Digital Libraries.” D-Lib Magazine. July/August 2005.
Waldrop, M. Mitchell. The Dream Machine: J.C.R. Licklider and the Revolution That Made Computing Personal. New York: Viking, 2001.
Wells, H.G. “World Brain: The Idea of a Permanent World Encyclopaedia.” <https://sherlock.ischool.berkeley.edu/wells/world_brain.html>.
Many thanks to Clifford Lynch for helping direct me via e-mail to some of these and other resources in the early stages of this project.
 I refer here to the incessant arguments concerning, for example, metadata standards, file formats, et cetera, which continue to bedevil digital librarians. These are far from unimportant, but here, as the cliché goes, I am more concerned with the forest than with the trees.
 Cf. Michel Foucault, “What is an Author?”