Patrice Riemens on Wed, 11 Mar 2009 14:01:39 -0400 (EDT)
<nettime> Ippolita Collective: The Dark Side of Google (Chapter 1, first part)
Hi again Nettimers,

A few additions to my previous mail. This is very much a translation in progress, in need of further clearing up. To start with the title: since I translate in the first instance from the French version (published 2008 by Payot, Paris, transl. Maxime Rovere) I got carried away by 'La Face cachee ...', but Ippolita's own English translation of the title is 'The Dark Side ...'. As in Dark Side of the Moon, of course, though I prefer the Italian title "Ombre e Luci ..." - Lights and Shadows (vv) of Google ...

Further cabalistic signs:
(...) between brackets: as in text
{...} not in text, my addition (suggested)
/.../ in text, but deletion suggested
[...] comments, contentious points, or notes
-TR yours truly (translator)

Enjoy!
patrizio and Diiiinooos!

NB Notes will come later. Most are references, but some are quite substantive.

.........................................................................

NB this book and translation are published under a Creative Commons 2.0 license (Attribution, Non Commercial, Share Alike). Commercial distribution requires the authorisation of the copyright holders: Ippolita Collective and Feltrinelli Editore, Milano (.it)

Ippolita Collective

The Dark Side of Google (continued)

Chapter 1. The History of Search (Engines)

On searches and engines ...

Search engines today present themselves as websites enabling one to identify and retrieve information. The mistaken idea that the Web and the Internet are one and the same thing is harboured by the majority of users, because the web represents for them the simplest {, easiest} and most immediate access to the Internet. But the Net is in reality far more complex, heterogeneous, and diverse than the Web: it includes chat functions, newsgroups, e-mail, and all the other features individuals wish 'to put on-line', no matter what format this information takes. To put it differently, the Net is not a static but a dynamic whole; resources and their mutual interconnections change constantly, in a kind of birth, transformation and death cycle. The physical connectivity vectors to these resources are also undergoing constant change: once upon a time there was the modem connected by copper phone wires, and today we live in a world of broadband and optical fiber. And the individuals who shape the Net by projecting their digital alter ego onto it are also perpetually mutating, at least as long as they stay {physically} alive. The Net is hence not the Web; it is a co-evolutionary dynamic built up out of the complex interactions between various types of engines ['machines'?]: mechanical machines ({personal} computers, 'pipes', servers, /modems/, etc. {aka 'hardware'}), biological machines (human individuals {aka 'wetware'}), and signifying machines (shared resources {aka 'software'}).

As we sift through the dark mass of information that is the Net, we need to realise something fundamental, yet uncanny at the same time: the history of search engines is much older than the history of the Web. The Web as we know it is the brainchild of Tim Berners-Lee, Robert Cailliau (*N1), and other European and US scientists. They created the first 'browser' between 1989 and 1991 at the CERN laboratory in Geneva, together with the 'http' protocol {Hyper Text Transfer Protocol} and the 'HTML' language {Hyper Text Mark-up Language} for writing and visualising hyper-textual documents, that is, documents including 'links' (internal to the document itself, or external, linking to other, separate documents).
This new technology represented a marked advance within the Internet, itself a merger between various US academic and military projects. While the web was still being incubated in a number of laboratories and universities across the world, search engines had already been in existence for many years, as indexation and retrieval services for the information extant on the Internet. Obviously, the first search engines could not be consulted on the {not yet existing} Web: they were rough and straightforward programmes one had to configure and install on one's own computer. These instruments indexed resources by way of protocols such as 'FTP' {File Transfer Protocol} for file-sharing and 'Gopher' (an early rival of the emergent 'http'), and other systems which have since gone out of use.

1994 saw 'Webcrawler' (*N2) come into operation, a search engine devised solely for the Web. It was an experimental product developed at the University of Washington. The innovations this search engine brought along were truly extraordinary. Besides functioning as a website and making it possible to do 'full text' searches (*N3), it also included a tool, the 'spider', that catalogued web pages automatically. The 'spider' is a software programme fulfilling two functions: it memorises the information found on the web pages it encounters as it navigates through the Web, and it makes this information accessible to the users of the search engine. (More about this will be discussed in detail in the next chapters.) As unbelievably innovative as it was in its time, Webcrawler was only able to return simple lists of web addresses as search results, together with the mere headline titles of the web pages it listed.

In the last months of 1994, a new search engine, Lycos, came up that was able to index, in a very short time, 90% of the pages then extant on the World Wide Web (ca. 10 million in all). Lycos' principal innovation was to do without 'full text' systems and analyse only the first 20 lines of the pages it indexed. This allowed Lycos to give, as a search result, a short synopsis of each page, abstracted from those first 20 lines.

It was with Excite, arriving in December 1995, that search results for the first time ranked web pages in accordance with their importance. Introducing an evaluation system that assigned a 'weight' to a web page constituted a first, rudimentary step towards a thematic catalogue: it would at last put an end to interminable lists of disorderly search results. It made a kind of 'initial checking' possible for a 'directory' of web sites, comparable to a classic library system, with an indexation according to subject, language, etc. - but now for web resources. Apart from that, Excite entered history for another reason: it was the first search engine equipped with tools explicitly geared towards commercial activity. After having acquired Webcrawler, Excite offered its users personalisation facilities and free mail-boxes, becoming in less than two years (by 1997) one of the Web's most popular portals. Yet Excite dropped its original business model not long after that, and chose to use other firms' search technologies, Google being foremost among them today (*N4).

This bird's eye view of Google's precursors would not be complete without mentioning what by 1997 had become the best and most popular search engine of all: AltaVista.
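[To make the spider's two functions described above a bit more concrete - following links and memorising what it finds, then making that store searchable - here is a minimal, purely illustrative sketch in Python. It is my own toy example, not Webcrawler's actual design; the starting URL, the word-based index and the helper names are all assumptions. -TR]

    # Minimal 'spider' sketch: fetch pages, memorise their words in an
    # inverted index (word -> pages containing it), and follow the links
    # found on each page. Purely illustrative.
    import re
    import urllib.request
    from collections import defaultdict

    def crawl(start_urls, max_pages=10):
        index = defaultdict(set)              # word -> set of URLs containing it
        to_visit, seen = list(start_urls), set()
        while to_visit and len(seen) < max_pages:
            url = to_visit.pop(0)
            if url in seen:
                continue
            seen.add(url)
            try:
                html = urllib.request.urlopen(url, timeout=5).read().decode("utf-8", "ignore")
            except Exception:
                continue                      # unreachable page: skip it
            for word in re.findall(r"[a-z]{3,}", html.lower()):
                index[word].add(url)          # function 1: memorise the page's contents
            for link in re.findall(r'href="(http[^"]+)"', html):
                to_visit.append(link)         # keep navigating through the Web
        return index

    def search(index, term):
        # function 2: make the stored information accessible to users
        return sorted(index.get(term.lower(), set()))

    # Example (hypothetical starting point):
    # idx = crawl(["http://example.org/"])
    # print(search(idx, "information"))

[In a real engine the index would of course live in a purpose-built data structure on disk rather than in memory, but the two-step cycle - crawl and store, then answer queries from the store - is the same. -TR]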
AltaVista ('the view from above') was based on the findings of a research group at DEC (Digital Equipment Corporation) in Palo Alto, California, which in 1995 had succeeded in storing all the words of a {random} HTML page on the Internet, in a way precise enough to make very refined searches possible. DEC had entrusted AltaVista with the further development of the first database that could be consulted directly from the World Wide Web. AltaVista's in-house genius was Louis Monier (*N5). Monier clustered rows of computers together, made use of the latest hardware, and worked with the best technologists on the market to transform his baby into the most common and best-loved search engine of its day. AltaVista was also the Net's first multi-lingual search engine, and the first with a technology able to include texts in Chinese, Japanese, and Korean in its searches. It also introduced the 'Babel Fish' automatic translation system, which is still in use today. By the time of its collapse in 1997 (*N6), AltaVista served 25 million queries a day and received sponsor funding to the tune of US$ 50 million a year. It provided a search facility to the users of Yahoo!'s portal, still today the principal competitor of Google in the Web sphere.

The Birth of Google. Once upon a time there was a garage, then came the University ...

The name Google stems from "googol", a mathematical term for a 1 followed by 100 noughts. According to the legend, this was the number of web pages Larry Page and Sergey Brin dreamed of indexing with their new search engine. The two met in 1995 at Stanford, when Larry Page, then aged 24 and a graduate of the University of Michigan, came to Stanford intent on enrolling for a doctorate in computer science. Sergey Brin was one of the students assigned to guide newcomers around the campus. Stanford was (and is) renowned as the place to develop highly innovative technological projects. This Californian university is not only a household name for its cutting-edge research laboratories; it also enjoys near-organic links with companies in the information technology (IT) sector, and with keen-eyed venture capitalists ready to sink substantial amounts of cash into the most promising university research. Brin and Page both turned out to be fascinated by the mind-bogglingly fast growth of the Web, and by the concomitant problems of search and information management. They jointly went for the 'Backrub' project, which got its name from the 'back links' it was meant to detect and map on a given web site. Backrub was renamed Google when it got its own web page in 1997.

The fundamental innovation Google introduced in the search process was to reverse the page indexation procedure: it no longer showed sites according to their degree of 'proximity' with regard to the query, but showed them in a 'correct' order, that is, one conforming to the user's expectations. The first link provided should then correspond to the 'exact' answer to the question asked, the following ones slowly receding from the core of the search question (*N7). It is in this perspective that the famous "I'm Feeling Lucky" option came about: clicking it opens the very first link in the Google results, presented as the indisputably 'right' one.
The algorithm that calculates the importance of a web page, known as PageRank[TM] and allegedly 'invented' by Larry Page, is actually based on statistics from the beginning of the 19th century, and especially on the mathematical works of Andrej Andrejevich Markov, who calculated the relative weight of nodes within a network (*N8).

At its beginnings Google was only an academic scientific project, where the weight evaluation system was mostly dependent upon the judgments of 'referees' operating within the format of 'peer review'. In theory, the method presenting the best guarantees of objectivity is called 'double blind' reading, as habitually applied to articles before they are accepted for publication in a scientific journal. A contribution is submitted to two readers who are reputed scholars in their field; they are not to know the identity of the article's author (so as not to influence their judgment). The second 'blind' moment occurs when the article is reviewed for publication: the author is deemed not to know who the two referees have been. To sum up, the more positively a scientific article has been received by fellow scientists (who are supposed to be of an independent mind), the more the article is deemed to be important and worth consideration. Page adopted this approach in his research domain, applying the theory that the number of links to a web page is a way to evaluate the value of that page and, in a certain sense, its quality. We will later go into detail as to how this passage from the 'quantity' of returned information correlates with the 'quality' of the results expected by the user (*N9).

But this criterion is not sufficient in itself to establish quality, since links are not all equal and do not represent the same value; or to be more precise: the static value of a link needs to be correlated with the dynamic value of its trajectory, since the Web is an environment (mathematically speaking, a graph) where not all trajectories have the same value: there are varying 'trajectory values' depending upon the 'weight' of the various nodes. And indeed, to pursue the metaphor of the scientific/academic review process further, not all reviews carry the same weight. Positive advice from less prestigious reviewers, or worse, from reviewers not much liked within the scientific community, can be detrimental to the article being submitted, as too many insufficiently authoritative reviews undermine the credibility of a publication. Hence, sites that are linked to by sites that are themselves extensively referred to are, according to Page, more important than others {more weakly referenced ones}. In this way, a trajectory (i.e. a link) that originates from a very popular site carries much more weight than one coming from a relatively unknown page. A link from page A to a page B is thus interpreted as a scientific referral whose weight is directly proportional to the reputation of the reviewer furnishing that link (it should be noted, however, that Brin & Page explicitly talk in terms of 'vote' and 'democracy' in this regard). The authority of the scientific reviewer becomes the measure of a site's reputation. Google's evaluation of web pages, known as PageRank[TM], is thus elaborated on the basis of a 'public' referral system which is {allegedly} the equivalent of the way the 'scientific community' (*N10) operates, only not limited to scientists but including all the surfers of the World Wide Web.
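[For readers who like to see the idea in code: below is a minimal sketch, in Python, of the kind of iterative 'node weight' calculation described above - each page's weight is divided among its outgoing links and passed on, so that links from 'heavy' pages count for more. It illustrates the general principle in the spirit of PageRank, not Google's actual algorithm; the damping factor of 0.85 and the toy three-page graph are my own assumptions. -TR]

    # Iterative link-weight calculation in the spirit of PageRank (sketch).
    # 'links' maps each page to the pages it links to.

    def rank(links, damping=0.85, iterations=50):
        pages = list(links)
        n = len(pages)
        weight = {p: 1.0 / n for p in pages}            # start with equal weights
        for _ in range(iterations):
            new = {p: (1.0 - damping) / n for p in pages}
            for page, outgoing in links.items():
                if outgoing:
                    share = damping * weight[page] / len(outgoing)
                    for target in outgoing:              # a link is a 'vote'...
                        new[target] += share             # ...weighted by the voter's own weight
                else:                                    # page with no outgoing links:
                    for p in pages:                      # spread its weight evenly
                        new[p] += damping * weight[page] / n
            weight = new
        return weight

    # Toy graph: A is linked to by both B and C, so it ends up 'heavier',
    # and a link coming from A is in turn worth more than one coming from C.
    print(rank({"A": ["B"], "B": ["A"], "C": ["A"]}))

[The toy numbers only illustrate the principle the text describes: the 'static' presence of a link matters less than the weight of the node it comes from. -TR]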
Today, the organisational workings of the 'scientific community' and the issue of data referencing in general have become crucial problems: in a context of 'information overflow' (*N11), especially on the Web, it has become increasingly difficult to estimate not only the importance of information but also its trustworthiness, the more so since the very principle of peer review has meanwhile been questioned by scientists themselves (*N12). Amongst the more interesting alternative options are rankings based on the number of publications [???], networks of publications available under copyleft, and 'open access' projects, which also include research in the domain of the humanities, like Hyperjournal (*N13).

This was the background against which Page launched his 'spider' web-exploring programme in March 1996, in order to test the page ranking algorithm he had developed. The spider-based search engine of the two talented Stanford students became an instant hit amongst their peers and {more senior} researchers alike, gaining a wider, extraordinary popularity in the process. However, the bandwidth usage generated by the search engine quickly became a headache for Stanford's system administrators. Also, owners of indexed sites had some qualms about the intellectual property rights pertaining to their content, and were besides not best pleased by the fact that Google's ranking system ran roughshod over more established evaluation systems, such as prizes and honorary mentions, in favour of the number and quality of links (i.e. the popularity) a page was able to garner around it: Google considers only the relational economy {of sites} expressed in terms of links, and nothing else. "The spider couldn't care less about the content of a page." Hence, the value of a search result must be based on the weight of the links between two pages, and not on some arbitrary classification enforced by the terms of the search. This breakthrough turned out to be the key to Google's success in the years to come: search results would no longer be fixed once and for all, but would vary dynamically in accordance with the page's position within the Web as a whole.

Google.com, or how ads (discreetly) entered the pages ...

Page and Brin went on developing and testing Google for eighteen months, making use of free tools provided by the Free and Open Source Software (F/OSS) community (*N14) and of the GNU/Linux operating system. This enabled them to build up a system that is both modular and scalable to an extremely large extent, and which can be augmented and tweaked even while fully in use. This modular structure constitutes today the basis of Google's data center, the 'Googleplex' (*N15), and makes possible maintenance, upgrades, changes and the addition of features and software without ever needing to interrupt the service. By the middle of 1998, Google was attending to something like 10,000 queries a day, and the in-house array of servers Page and Brin had piled up in their rented room was on the verge of collapse. Finding funds vastly in excess of what is usually allocated to academic research therefore became a matter of some urgency. The story has it that Google's exit from the university was due to a chance encounter with Andy Bechtolsheim, one of the founders of Sun Microsystems and a talented old hand in the realm of IT.
He became Google's maiden investor, to the tune of one lakh US Dollars (100,000 in Indian English -TR ;-). The birth of Google as a commercial enterprise went together with its first hires, needed for the further development and maintenance of the data center. Among them was Craig Silverstein, now the CTO. Right from the beginning, Google's data center took the shape of a starkly redundant system, where data are copied and stored in several places so as to minimize any risk of data loss (a move that amounts to printing the currency of search). Its most important feature is the possibility to add or remove modules at any given time so as to boost the efficiency of the system. Another major trump card, as befits university hackers, was Brin and Page's habit of recycling and tweaking second-hand hardware and making extensive use of F/OSS. Their limited financial resources led them to evolve what would become the core of their business model: nimble modularity at all levels. The Google-ranger[???]'s modularity means that it can scale up and down according to need and availability. There is no need to reprogram the system when new resources, whether hard-, wet- or software, are added: the highly dynamic structure integrates the new modules, even if they are stand-alone.

Google formally opened its offices on September 7, 1998, in Menlo Park, California. As the story goes, Larry Page opened the doors with a remote control, since the offices were located in a garage a friend of theirs had sublet to the firm. A Spartan office-cum-garage then, but one featuring some not-to-be-spurned comforts: a washing machine, a dryer, and a spa. Right from the start, Google's company philosophy has been about making employees' lives very cushy indeed. By January 1999, Google had left the Stanford campus for good. The official statement read: "The Google research project has now become Google Inc. Our aim is to give the world searches of a far higher quality than what exists today, and going for a private company appears to be the best avenue to achieve this ambition. We have started to hire people and to configure more servers in order to make our system scalable (we are ordering servers 21 pieces at a time!). We have also started to launch our spider more frequently, and our results are now not only just as fast, they are also much more up to date. We employ the most talented people, and through them we obtain the latest and best-performing Web technologies." Brin and Page then went on for a few more lines about the ten best reasons to come work for Google, quoting tech features, stock options, free drinks and snacks, and the satisfaction coming from millions of people "going to use and enjoy your software".

The years 1998 and 1999 saw all search engines and other popular sites world-wide in the grip of the 'portal syndrome', a narrow obsession with developing sites that would attract and retain visitors at all costs by providing ever more services, ads, and personalisation gizmos. Google, by contrast, remained the only web instrument without ads and additional features: a search engine pure and simple, but for that also the best, the fastest, and the one without any commercial tie-ups whatsoever. Yet the firm could not survive indefinitely on the money given by Bechtolsheim without generating any substantial profit, while at the same time pursuing its research into identifying and organising information.
Displaying a remarkable aptitude for talking the language of high finance while constantly emphasising their commitment to research, Brin and Page then managed to reach an agreement with California's two topmost venture capital firms, which astonishingly assented to co-financing one and the same company - a totally unique occurrence, two giant venture capital institutions agreeing to share the risks and profits of a single business proposition. On June 7, 1999, Google was able to announce that Sequoia Capital and Kleiner Perkins Caufield & Byers had granted it US$ 2.5 crore in finance capital (*N16) (1 crore = 10 million, in Indian English -TR ;-).

While one PhD thesis after another saw the light at Google Inc., its two researcher-CEOs were looking for avenues to commercialise, one way or another, the mass of indexed data. As a start they tried to sell their search service to portals, profiling themselves as an OEM (Original Equipment Manufacturer) (*N17) [I leave the original authors' explanation out, since it's rather confused - in the French version at least -TR], but this was not very successful. The business model that appeared more suited to the new firm was one of direct advertising integrated within the search engine itself, working by counting the number of visitors who access sites through commercial advertising links. This business model, called CPM, Cost per Thousand Impressions (*N18), has a structure that is as little intrusive as possible for the user. It is not based on flashy advertising banners, but relies on discreet, yet very carefully selected, links that appear above the search results. As these links are in a different font and colour than the search results proper, they tend not to be perceived as too disturbing to the user's search activities.

Self-Service Ads, or Beyond the Dot Com Crash ...

A business model based on simple sponsored links appearing alongside search results does not make very much sense in terms of profit generation: at this stage, Google's long-term commercial strategy was in need of a qualitative jump. So the two presidents went in search of commercially more promising solutions and came across GoTo, a company founded by Bill Gross (*N19), now owned by Overture/Yahoo!. GoTo's business model was based on mixing real search results with sponsored returns, and billing advertisers only if users actually clicked on their web address - a format known in the trade as CPC, Cost per Click. Compared to previous methods, this was particularly innovative: sponsored links would only appear if they were relevant to the user's actual search query, thus maximising the likelihood of a transaction taking place, in the form of a click-through to the commercial site. Google tried to reach an agreement with GoTo, but its CEO's refusal forced it to seek a similar solution in-house. At that time, portals (think Excite, Lycos, Infoseek, AltaVista and Yahoo!) were all using the CPM format, and CPC was something of a repressed wish. This tends to show that if you're not able to buy up a superior, mission-critical technology from someone else, you'll have to develop it yourself /in an autonomous fashion/.

March 2000 saw the implosion of the NASDAQ bubble, sinking in its wake all the pipe-dreams of the 'Dot Com' sphere. With them went the CPM model, with its illusion of an unlimited cash flow through myriads of ad banners "with millions of eyeballs" each.
However, these banners were most of the time totally out of context, placed on sites that had nothing to do with the advertiser's line of business. Google at that stage faced the dire need to look very closely at its cost/earnings accounts, and to urgently find a way to make its search technology acquire financial value. The response came with AdWords, which saw the light in October 1999. AdWords functions as a sort of advertisement self-service, where commercial parties can choose the search keywords most likely to be associated with their own sites. AdWords was Google's way of putting GoTo's 'keyword-based advertisement' into effect. Google hence not only survived the Dot Com bust; being a not-yet-publicly-traded private company, it was also able to make good use of the opportunity to fish right and left for talent beached by all the other 'dot coms' gone belly up. By mid-2000, Google was answering 18 million queries a day and its document index contained 1 billion unique items. Six months later, queries had reached the 60 million mark.

(to be continued)

--------------------------

Translated by Patrice Riemens

This translation project is supported and facilitated by:
The Center for Internet and Society, Bangalore (http://cis-india.org)
The Tactical Technology Collective, Bangalore Office (http://www.tacticaltech.org)
Visthar, Dodda Gubbi post, Kothanyur-Bangalore (http://www.visthar.org)

#  distributed via <nettime>: no commercial use without permission
#  <nettime> is a moderated mailing list for net criticism,
#  collaborative text filtering and cultural politics of the nets
#  more info: http://mail.kein.org/mailman/listinfo/nettime-l
#  archive: http://www.nettime.org contact: [email protected]