Florian Cramer on Wed, 19 Dec 2007 03:24:50 +0100 (CET)
<nettime> Critique of the "Semantic Web"
[This is a lecture manuscript written for the "Quaero Forum" on the politics and culture of search engines at the Jan van Eyck Academy Maastricht, 9/2007. It is still a bit rough; thanks to Felix Stalder for useful corrections and for his suggestion to post it here. -F]

Animals that Belong to the Emperor
Failing universal classification schemes from Aristotle to the Semantic Web

Quaero Forum, Maastricht
Florian Cramer
29/9/2007

The weapon with which state-subsidized European search technology projects allegedly intend to beat Google is semantic information processing: pattern recognition in media files in the French Quaero project, Semantic Web technology in Theseus, its German offspring. Originally, Quaero was a French-German collaboration funded by both governments, until the German Theseus project split off from Quaero to pursue its own vision of future Web search. This vision is twofold and involves two classic holy grails of computer science:

1. to provide search on the basis of Semantic Web meta tags,
2. to have software recognize the contents of web pages in order to apply those tags automatically.

While the second goal is utopian enough, something that Artificial Intelligence research has failed to achieve for decades, even the first, the universal nomenclature of semantic tagging known as the Semantic Web, is doomed to fail by any critical standard of cultural reflection. The reasons why the Theseus project nevertheless receives generous public funding are economic and political; given its stated goals, they are hardly related to anything resembling a working web search engine.
Founded and pursued by Tim Berners-Lee, the original architect of the World Wide Web, the "Semantic Web" is a term and project that is not only prone to major confusion, but also emblematic of how the alienation between engineering and the humanities goes both ways: shockingly naive and simplistic understandings of cultural concepts among the former, and a complete misunderstanding of the "Semantic Web" among the latter, because its terminology of "semantics" and "ontologies" is plainly weird or mystifying outside computer science. In 2004, prior to Quaero and Theseus, the German federal government subsidized research on the Semantic Web with 13.7 million euros, reasoning that as a "semantic technology" it would allow people to phrase search terms as ordinary questions, thus giving computer illiterates easier access to the Internet. But the Semantic Web is not about this at all; the funding was, in other words, a 13.7 million euro misunderstanding.{1} Natural language question parsing is indeed another holy grail of Artificial Intelligence research, parodied by Weizenbaum's "Eliza" and attempted by Web search engines from "Ask Jeeves" (which renamed itself Ask.com after de-emphasizing its original concept) to "Powerset", recently brought up by Geert Lovink on the Nettime mailing list.{2} Full semantic understanding of natural language falls into the second category mentioned above: the nut that "hard" A.I. research has claimed for decades to have almost, but not quite, cracked, while critical A.I. researchers like Luc Steels argue that it cannot be cracked with current computer architectures regardless of their speed. In search engine reality, natural language search systems boil down to nothing more than inefficient interface wrappers around Boolean search expressions with their logical AND, OR and NOT operators. The Semantic Web does not fall into this trap because it does not involve any automatic interpretation of meaning.
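To illustrate the claim about Boolean wrappers, here is a minimal sketch in Python under deliberately simplified assumptions: the toy corpus, the stop-word list and all function names are hypothetical, and no actual engine such as Ask.com or Powerset is claimed to work this way. The "natural language" layer merely discards stop words and hands the remaining keywords to a plain Boolean AND over an inverted index.

```python
import re

# Hypothetical toy corpus: document id -> text.
DOCS = {
    1: "clown at a children's birthday party",
    2: "birthday cake recipe",
    3: "clown makeup tutorial",
}

# Hypothetical stop-word list; real systems use much larger ones.
STOP_WORDS = {"where", "can", "i", "find", "a", "an", "the", "of", "at", "pictures", "with"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index: each word maps to the set of documents containing it."""
    index: dict[str, set[int]] = {}
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index.setdefault(word, set()).add(doc_id)
    return index


def question_to_boolean(question: str) -> list[str]:
    """The 'natural language' layer: drop stop words, keep the rest as AND terms."""
    return [w for w in tokenize(question) if w not in STOP_WORDS]


def boolean_and(index: dict[str, set[int]], terms: list[str]) -> set[int]:
    """Plain Boolean AND: the documents that contain every term."""
    result = None
    for term in terms:
        postings = index.get(term, set())
        result = postings if result is None else result & postings
    return result or set()


if __name__ == "__main__":
    index = build_index(DOCS)
    terms = question_to_boolean("Where can I find pictures of a clown at a birthday?")
    print(terms, boolean_and(index, terms))  # prints ['clown', 'birthday'] {1}
```

Rephrasing the question changes nothing as long as the same keywords survive the filter; in this sketch, whatever "understanding" there is lies in the stop-word list, not in any interpretation of meaning.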
Berners-Lee himself insists that his project "does not imply some magical artificial intelligence which allows machines to comprehend human mumblings"{3}, in sharp contradiction to the stated goal of the Theseus project. Instead, he conceives of the Semantic Web as a universal, unified markup or "meta tagging" system: "Instead of asking machines to understand people's language, it involves asking people to make the extra effort". This effort, semantic tagging, is a well-established and popular device on sites like the photo sharing platform flickr.com, the news aggregator digg.com and the bookmarking site del.icio.us. It simply means that users attach keywords to texts, images and other resources, making the information searchable by keywords or particular keyword combinations. On Flickr, for example, the search keyword combination "birthday", "children" and "clown" results in a list of pictures of clowns appearing at children's birthday parties - not because of any Quaero-style computer recognition of the image contents or Theseus-style automatic keyword mapping, but because Flickr users had manually assigned these keywords to the images. While such manual tagging also lies at the heart of the Semantic Web, systems like those of Flickr, Digg and del.icio.us are nevertheless flawed from its perspective because they involve no unified standard or nomenclature for tagging. If, for example, a user tagged an image with the word "kids" instead of "children", it would not turn up in the search result. On top of that, the tags lack abstraction and universality: children, for example, could be classified as a subset of humans, humans as a subset of mammals, birthdays as a subset of celebrations, etc. With such a classification, pictures marked up with "birthday" and "children" could also be found in a more general search for pictures of human celebrations.
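The difference between flat folksonomy tags and the hierarchical classification described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the photos, the tags and the BROADER table of "broader terms" are hypothetical, and plain dictionaries stand in for actual Semantic Web formats such as RDF or OWL, which are not modeled here.

```python
# Hypothetical photos with user-assigned (folksonomic) tags.
PHOTOS = {
    "photo1": {"birthday", "children", "clown"},
    "photo2": {"birthday", "kids", "clown"},  # "kids" instead of "children"
    "photo3": {"wedding", "adults"},
}

# Hypothetical controlled vocabulary: each term points to its broader term.
BROADER = {
    "children": "humans",
    "adults": "humans",
    "humans": "mammals",
    "birthday": "celebration",
    "wedding": "celebration",
}


def tag_search(photos: dict[str, set[str]], *keywords: str) -> set[str]:
    """Folksonomy-style search: every keyword must literally be one of the tags."""
    wanted = set(keywords)
    return {pid for pid, tags in photos.items() if wanted <= tags}


def expand(tag: str) -> set[str]:
    """Return the tag plus all of its broader terms, e.g. children -> humans -> mammals."""
    terms = {tag}
    while tag in BROADER:
        tag = BROADER[tag]
        terms.add(tag)
    return terms


def hierarchical_search(photos: dict[str, set[str]], *keywords: str) -> set[str]:
    """A keyword matches if it is a tag or a broader term of one of the tags."""
    wanted = set(keywords)
    results = set()
    for pid, tags in photos.items():
        expanded = set().union(*(expand(t) for t in tags))
        if wanted <= expanded:
            results.add(pid)
    return results


if __name__ == "__main__":
    # "kids" is not "children", so the literal search misses photo2.
    print(sorted(tag_search(PHOTOS, "birthday", "children", "clown")))  # ['photo1']
    # A general search for human celebrations now finds the birthday photo,
    # but still misses photo2, whose "kids" tag is outside the vocabulary.
    print(sorted(hierarchical_search(PHOTOS, "celebration", "humans")))  # ['photo1', 'photo3']
```

The hierarchy only helps where users already agree on the vocabulary: the photo tagged "kids" stays invisible, which is exactly the standardization problem the Semantic Web sets out to solve.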
For this reason, unsystematic, ad-hoc, user-generated and site-specific tagging systems like those on Flickr are referred to as "folksonomies".{4} The Semantic Web promises to overcome folksonomies with one unified and standardized keyword tagging system that can be applied to anything. In other words, it is a universal classificatory description system and grand unified hierarchical meta tag tree. In line with computer science terminology, but sounding mysterious and idiosyncratic to anyone else, Berners-Lee calls this classificatory system an "ontology". This makes the project particularly confusing for people with backgrounds in philosophy and the humanities, because what he and computer science call "ontology" is, outside such jargon and in more common language, not an ontology but a cosmology. Just as cosmologies are by no means new, neither are universal classification and tagging systems of all things in the world. In his essay and short story "The Analytical Language of John Wilkins", Jorge Luis Borges writes about the seventeenth-century English scholar that "He divided the universe in forty categories or classes, these being further subdivided into differences, which was then subdivided into species. He assigned to each class a monosyllable of two letters; to each difference, a consonant; to each species, a vowel. For example: de, which means an element; deb, the first of the elements, fire; deba, a part of the element fire, a flame." [...] Similar classification schemes were designed throughout the Middle Ages and the Renaissance by, among others, Ramon Llull, Giordano Bruno, the encyclopedist Johann Heinrich Alsted and the theosophist Jan Amos Comenius, scholars in whose tradition Wilkins, a founding member of the "Invisible College", worked and thought.
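The principle Borges describes, in which each added letter narrows the previous code so that a word spells out its own path through the classification tree, can be sketched as follows. Only the three codes quoted by Borges are included; Wilkins's actual tables are not reproduced, and the name decode is hypothetical.

```python
# The three codes quoted by Borges; Wilkins's full tables are not reproduced here.
WILKINS = {
    "de": "element",
    "deb": "fire (the first of the elements)",
    "deba": "flame (a part of the element fire)",
}


def decode(word: str) -> list[str]:
    """Read a Wilkins-style word as a path from its class down to its species.

    Assumes the layout Borges describes: a two-letter class, then one
    consonant for the difference, then one vowel for the species, so that
    every prefix of a word names one of its ancestors in the tree.
    """
    path = []
    for end in range(2, len(word) + 1):
        prefix = word[:end]
        if prefix in WILKINS:
            path.append(f"{prefix}: {WILKINS[prefix]}")
    return path


if __name__ == "__main__":
    for step in decode("deba"):
        print(step)
```

Decoding "deba" lists the element, fire and flame in that order: in such a scheme the meaning of a word is entirely a function of its position in the tree.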
Before Diderot and d'Alembert's revolutionary, heretical device of arbitrarily structuring human knowledge by the alphabet, encyclopedias had developed increasingly complex tree-like classification systems of all the things in the world they described.{5} The cosmology-called-ontology of the Semantic Web is not only similar, but precisely the same. Medieval and Renaissance classificatory cosmologies could only work on the basis of a stable assumption of what the world is and how it is structured: for example, by the four directions, the four seasons, the four temperaments, the seven virtues and seven vices, etc. They were, in other words, still embedded in the paradigm of medieval scholastic science, which in turn had been derived from Aristotle's system of categories and its classification of beings into genera and species. The Semantic Web is, bluntly put, nothing but technocratic neo-scholasticism based on a naive if not dangerous belief that the world can be described according to a single and universally valid viewpoint; in other words, a blatant example of cybernetic control ideology and of engineering blindness to ambiguity and cultural issues. Although no Semantic Web existed in the 1940s, Borges' essay hits the nail on the head. One is tempted to replace the name John Wilkins with Tim Berners-Lee when Borges reviews the former's categories and finds that stones, for example, are absurdly classified as common, modic, precious, transparent or insoluble, or that beauty is assigned to a "living brood fish". He concludes that "These ambiguities, redundancies and deficiencies remind us of those which doctor Franz Kuhn attributes to a certain Chinese encyclopaedia entitled 'Celestial Empire of benevolent Knowledge'.
In its remote pages it is written that the animals are divided into: (a) belonging to the emperor, (b) embalmed, (c) tame, (d) sucking pigs, (e) sirens, (f) fabulous, (g) stray dogs, (h) included in the present classification, (i) frenzied, (j) innumerable, (k) drawn with a very fine camelhair brush, (l) et cetera, (m) having just broken the water pitcher, (n) that from a long way off look like flies." Although this list is Borges' own fiction, it nevertheless reveals the arbitrariness of categories and classifications. It also had a profound impact as a philosophical critique. Michel Foucault's "The Order of Things" begins with a discussion of the above list of animals, which, as he admits in the book's preface, "shattered all the familiar landmarks" of his thought, opening his eyes to how the order of knowledge is culturally constructed and could be conceived differently. To understand Foucault's discourse theory, it practically suffices to read Borges' "Ficciones". The order of things and unified classification schemes do not only break down in fiction. Sticking to the example of animals, it is obvious how Aristotelian philosophy continues to exist today in the notions of genus and species, and even more questionably in the categorization of humans into biological races. But such classification does not even work in biology itself. The platypus, an Australian animal that is a milk-producing mammal but lays eggs, lives in the water and has a beak like a bird, famously defies the classifications that historically go back to Aristotle's "Zoology". If the platypus breaks genus-and-species classification, where would it fit into the Semantic Web? In his book "Kant and the Platypus", Umberto Eco points out how the animal marks the difference between scholastic and empirical science.{6} Somewhat confusingly, he distinguishes "cultural cases", that is, categorically defined phenomena, from "empirical cases", i.e. phenomena that are observed instead of predefined.
"To be recognized as such," Eco states, cultural cases "need reference to a framework of cultural norms" (Eco 1997, p. 139). For Eco as a semiotician, this means that Being, or existence, is the frontier that systematic science cannot conquer - and this is what ontology means in a philosophical sense. The innovation of modern science since Galileo, Newton and Descartes is that it operates without reference to such norms. When Diderot and d'Alembert abandoned the old classificatory order of knowledge in encyclopedias and replaced it with a non-classificatory, non-systematic alphabetical order, they precisely followed the empirical paradigm, taking phenomena as they occurred and not as they fit. In order to become thoroughly critical investigation and to abandon preconceptions, science gave up "Semantic Web"-like schemes. Returning to Internet folksonomies, a better example than the platypus came up in a Web forum of the German computer news site heise.de. Discussing the Semantic Web and its classification scheme, an anonymous poster offered the hypothetical statement "A Muslim is a potential terrorist" in order to show that a unified semantic "ontology"/cosmology cannot be built. This example only scratches the surface of the pending cultural problems, since it is not empirical cases like the platypus but cultural ones that carry the real dynamite. It sheds a dubious light on the computational linguists involved in the project that they do not even seem to have done their homework on Saussure and the arbitrariness, i.e. the cultural dynamics, of the signifier in relation to the signified. The Semantic Web, and any search engine or database built upon it, rests on the illusion that an unambiguous assessment of the world is even theoretically possible. Beyond being cosmology falsely named ontology, it is metaphysics disguised as physics.
On a more practical (but nonetheless cultural) level, the Semantic Web relies on the clean-room illusion of a culture in which semantic tags would not simply be used for spamming and search engine manipulation, practices already common enough that Google and other search engines ignore meta tags embedded in web pages. And while Berners-Lee is realist enough to state that meta tagging cannot be done by bots like those dreamed up by the Theseus project, his Semantic Web implies a complexity nightmare of meta information overtaking information, with each piece of information requiring at least twice as much work for its semantic markup as for its actual creation, comparable to a library whose catalogs outnumber the books they reference. "Semantics" and "ontology" are useful terms because they refer to what computers, as purely syntactical machines, cannot process, and what cannot be mapped into computer data structures except in subjective, diverse, culturally controversial and folksonomic ways. The creators of the so-called "Semantic Web" and of "next-generation" search engines might learn from Borges, who concludes: "I have registered the arbitrarities of Wilkins, [and] of the unknown (or false) Chinese encyclopaedia writer [...]; it is clear that there is no classification of the Universe not being arbitrary and full of conjectures. The reason for this is very simple: we do not know what thing the universe is."

__________________________________________________________________

Footnotes:

{1} User comment on heise.de (translated from German): "I somehow have the impression that our Federal Ministry of Research is under the mistaken assumption that 13 million euros will create software that enables every computer illiterate, entirely without the 'extra effort', his 'PISA failure' [...] market and as a highly innovative rescue of Germany as a center of knowledge and business (if you believe that ...)"
{2} Geert Lovink, "search engines on the move", 19/9/2007, http://www.nettime.org/Lists-Archives/nettime-l-0709/msg00028.html
{3} Quoted from: An interview with Tim Berners-Lee, http://www.simple-talk.com/content/print.aspx?article=321
{4} "Folksonomy (also known as collaborative tagging, social classification, social indexing, social tagging, and other names) is the practice and method of collaboratively creating and managing tags to annotate and categorize content. In contrast to traditional subject indexing, metadata is not only generated by experts but also by creators and consumers of the content. Usually, freely chosen keywords are used instead of a controlled vocabulary", Wikipedia definition as of 18/12/2007, http://en.wikipedia.org/w/index.php?title=Folksonomy
{5} As a remnant of this tradition, the Diderot/d'Alembert encyclopedia still contains such a knowledge tree.
{6} Eco, Kant and the Platypus, 1997, p. 68

--
http://cramer.plaintext.cc:70
gopher://cramer.plaintext.cc

# distributed via <nettime>: no commercial use without permission
# <nettime> is a moderated mailing list for net criticism,
# collaborative text filtering and cultural politics of the nets
# more info: http://mail.kein.org/mailman/listinfo/nettime-l
# archive: http://www.nettime.org