Patrice Riemens on Tue, 31 Mar 2009 22:27:59 -0400 (EDT)
<nettime> Ippolita Collective: The Dark Side of Google (Chapter 5, first part)
There was a week-end interruption as I had gone to the Union Territory of Puducherry (Pondicherry, ex-Inde Française). Cordial, patrizio and Diiiinooos!

.........................................................................

NB: this book and its translation are published under a Creative Commons 2.0 license (Attribution, Non Commercial, Share Alike). Commercial distribution requires the authorisation of the copyright holders: Ippolita Collective and Feltrinelli Editore, Milano (.it)

Ippolita Collective
The Dark Side of Google (continued)

Chapter 5. As a bonus: other funky functionalities

Filtered algorithms: ready-made data banks and control of the users

Graph Theory [*N1] is the mathematical basis of all network algorithms, PageRank[TM] among them. This branch of mathematics studies methods to create, manage, and navigate different classes of networks, to describe them with graphs, and to {rank them} according to their size. With the introduction of electronic computers, Graph Theory took a huge flight from the mid-1950s {of the previous century} onwards. In geometric terms, one can picture a graph as a collection of points in space and of continuous curves connecting pairs of those points without crossing each other. In Graph Theory, a graph (not to be confused with a graphic) is a figure made up of points, called vertices or nodes, and of the lines connecting them, called arcs, edges or arrows [cf. Wikipedia, 'graph' & associated entries] [*N2]. A network is a particular type of graph, in which it is possible to assign a different value, or weight, to each arc. This makes it possible to assign different values to different routes {between nodes}. The Internet is a graph, and the same can be said of all web pages taken together. Google's search system is based on this principle.

One of the {most} fundamental aspects of network algorithms is the relationship between the time factor and the number of nodes examined. The 'travel time' of a search, for instance, that is, the time it takes to connect one node to another, depends on the number of elements in the network, and always lies between a minimum and a maximum value, which can vary widely according to the type of routing algorithm used. In the network of web pages, every page is a node in the graph, and every link is an arc.

Taking the time factor as starting point, it clearly appears that the {search} returns generated by Google in answer to a question (technically, the returns of a query on its databases) cannot possibly be based on an examination of the 'entire' Internet. Google's spider is constantly busy copying the Internet into its databases: not an easy task. Still, it is not plausible that the search engine browses through its complete database every time in order to retrieve the most important results. The key factor enabling Google to return results almost immediately lies in hidden sequences that narrow the general selection {of data}, meaning, concretely, in the application of specific filtering devices. Starting from the query itself, the filter makes sure the final result is promptly arrived at by way of successive side-steps and choices, developed with the explicit aim of limiting the range of the blocks {of data} that are likely to be analysed {for that particular query}. This is how Google can return results for queries in an astonishingly short time.
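By way of illustration only - this is not Google's code, and the pages, links and weights are invented for the example - the weighted graph and its 'travel time' described above might be sketched in Python like this:

import heapq

# A tiny 'web' modelled as a weighted directed graph: each key is a page
# (node), each value maps a linked page (arc) to a weight for that route.
web_graph = {
    "home":    {"news": 1, "blog": 2},
    "news":    {"archive": 4},
    "blog":    {"archive": 1, "home": 2},
    "archive": {},
}

def travel_time(graph, start, goal):
    """Dijkstra's algorithm: cheapest route between two nodes. The number
    of nodes examined, and hence the running time, grows with the size of
    the network."""
    queue = [(0, start)]
    visited = set()
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            return cost
        if node in visited:
            continue
        visited.add(node)
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (cost + weight, neighbour))
    return None  # no route between the two nodes

print(travel_time(web_graph, "home", "archive"))  # -> 3 (home -> blog -> archive)

Even on such a toy network the routine has to visit more and more nodes as the graph grows, which is why restricting the pool of candidate nodes in advance pays off so handsomely.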
However, this makes the search {process} just as opaque as it is fast; in other words, the search shows no coherence with the body of data extant in the indexed part of the network. Results for a search are returned very quickly not only thanks to the {massive} computing power available, but also, and foremost, because filters limit the extent of the data pool that will be searched. The filter's difficult task is to make a drastic selection among the network nodes {to be looked at}, in order to exclude some and valorise others and their associated linkages. This method aims to exclude or include whole blocks of data amidst those that would generate results [French text not really clear]. All this is made possible by the existence of pre-set, ready-to-use search databases, returning standard answers to standard questions, but also by tweaking the returns through individual user profiling. A user's profile is made up from her/his search history, language, geographic position {(IP address)}, etc. If a user habitually conducts searches only in French, for instance, not the whole of Google's database will be queried, but only its French-language part, obviously saving a lot of time in the process.

Given the humongous amount of data, it is simply unthinkable to use transparent algorithms, meaning ones that would hit _all_ the network's nodes. It is therefore unavoidable that some manipulations, simplifications, and {deliberate} limitations of the number of possible analyses take place, both for technical reasons of computability in the strict sense and for evident economic reasons. And one can, without falling into unjustified vilification, easily conceive that within a system already biased by the approximations caused by filtering, further filters could be added to insert, or manoeuvre into a better position of visibility, those returns that go with paying advertisements, or which carry some doctrinal message [?].

However, seen from Google's point of view, it must be noted that filters are not directly linked to an economic motive, since they are not meant to sell something. They are linked to the habits of the user, and to her/his personal interests. Google sells ads, not products (or only in a very limited way, like the Google Mini hardware, or indexation systems for companies {and organisations}). Google's prime consideration is therefore to obtain data generating parameters which can be used to target advertisement campaigns accurately. The personalisation of results according to their recipients is made possible by the information Google /furnishes and/ gathers in the most discreet way possible. E-mail, blogs, 'cloud computing' (or 'virtual hard disks') and other services function as so many databases, in a way far more suitable for profiling users than these could or would ever fathom. Hence, the additional services Google offers over and above search are very useful to the firm for experimenting with new avenues {of business}, but also, and foremost, because they play a key role as 'aggregators of personal information' about users. A prime example is the electronic mail service GMail, a virtual hard disk of sorts (2GB for the moment, and counting...), which [in its beta phase, when the book was written -TR] is made available through a distribution system based on PageRank[TM].
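Purely as a toy sketch of the kind of pre-filtering by user profile described above - the index partition, profile fields and data are all invented, and this is emphatically not Google's actual mechanism - one could picture it like this:

# A toy index split into pre-set 'blocks' by language; a user profile
# (language, rough location, past searches) decides which block is
# consulted at all. Every name here is illustrative only.
index = {
    "fr": {"recette tarte": ["fr-page-1", "fr-page-2"]},
    "en": {"apple pie recipe": ["en-page-1"]},
}

user_profile = {"language": "fr", "ip_country": "FR", "history": ["météo paris"]}

def filtered_search(query, profile):
    # Only the block matching the profile's language is ever queried;
    # everything outside it is excluded before any ranking starts.
    block = index.get(profile["language"], {})
    return block.get(query, [])

print(filtered_search("recette tarte", user_profile))  # ['fr-page-1', 'fr-page-2']

The point is simply that the blocks outside the profile are never consulted at all, which is both why the answer comes back so fast and why it bears no necessary relation to the whole of the indexed data.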
Put simply, each (user) node of the Google network has a certain weight (an allowed number of invitations to join) and can use it to offer the service (via a link) to her/his acquaintances. This method enables control over the usage made of the service, and at the same time the user discloses to Google key intelligence about her/his {own network of} friends and acquaintances. In a second stage, this mechanism spreads out among the invited individuals, who may extend new invitations themselves: in this way, a graph of /human/ relationships between the users is created, one of enormous strategic value for 'personalised' ad targeting. If one considers all the information that can be gathered from e-mail traffic (to whom, why, in which language, in which formats, with which key words, which attachments, etc.), one can surmise the existence, in Google's data vaults, not only of a partial - but significant - double of the Internet, but also of an equally partial and equally significant copy of the personal, professional, and affective relationships of the service's users.

In theory, filters merely serve to make the query process faster and better attuned to individual requests. They are even necessary, technically speaking. Their usage, however, shows to what extent it is easy for a party in an actual position of dominance in search services to profit in a commercial sense from the data at its disposal, without much consideration for the privacy of its users. To sum up, Google's database today is able, based on what it knows about this or that user, to marshal, with the help of a few key words, a query in a manner that varies according to the type of user. Far from being 'objective', search returns are actually {pre-set and} fine-tuned, and using the search service enables Google to 'recognise' an individual better and better, and to present her/him with 'appropriate' results. Use of each Google service goes with acceptance of {a whole set of} rules and liability disclaimers by the users. Google, from its side, promises it will not reveal personal information {to third parties}. Yet it is easy to presume that Google is able to exploit and commercialise users' data to its own ends [French text: different ends]. And then we need not even consider the possibility (or rather: the probability) that all sorts of intelligence and police services can access this information for whatever reason of 'national security' they may care to invoke. [The addition of {more} search filters in order to further personalise results is the most likely outcome. - unclear sentence in French text]

Google's cookies: stuff that leaves traces...

User profiles are always based on a system of search and selection [*N3]. Two types of profiling are prevalent on the Internet: one is explicit, the other implicit. Explicit profiling requires registration, whereby the user fills in a form disclosing personal details. The information sent is archived in a database, to be analysed with the help of a string of parameters that partition the registered users into homogeneous groups (according to age, sex, occupation, interests, etc.). Conversely, implicit profiling is arrived at by tracking anonymous users during their visit to a site, through their IP address or through cookies. Cookies are little text files used by web sites to leave some data behind in the user's computer. Every time the user comes back to that site, the browser resends the data stored in the cookie.
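A minimal sketch of that mechanism, using Python's standard http.cookies module; the cookie name, its value and the 2038 expiry date (the far end of the 32-bit POSIX time range, discussed below) are illustrative, not Google's actual values:

from http.cookies import SimpleCookie

# Server side: leave a small piece of data behind in the visitor's browser.
outgoing = SimpleCookie()
outgoing["uid"] = "opaque-visitor-identifier"
outgoing["uid"]["expires"] = "Tue, 19-Jan-2038 03:14:07 GMT"
outgoing["uid"]["path"] = "/"
print(outgoing.output())      # the Set-Cookie header sent with the response

# Browser side, on the next visit: the stored pair is sent back verbatim,
# which is all a site needs to 'reunite' the visitor with earlier visits.
incoming = SimpleCookie()
incoming.load("uid=opaque-visitor-identifier")
print(incoming["uid"].value)  # -> opaque-visitor-identifier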
The aim is to automate login authentication, to refresh /eventual/ running operations, but mostly to 'reunite' the user with the data of her/his previous visits. Most Internet sites offering online services use cookies, and Google is surely no exception [*N4]. The combination of cookies and filters acting on an algorithm makes it possible to track an individual's navigation, and hence to accumulate information on her or his 'fingerprint'.

Let's take an example: individual 'X' has a mobile phone number in her name, and uses her mobile to call her family, friends and a few work colleagues. After some time, she decides to do away with this phone and to take another one, not in her name, for the sake of her privacy. With her new phone, she reconstructs her circle of acquaintances by contacting family, friends, and colleagues at work. This sequence of 'social links' /(family, friends, colleagues)/ is, within {the totality of} all the world's phone calls, a unique one, which cannot be dissociated from the individual in question. So it is not impossible to formalise such a sequence as a graph representing the nodes and arcs of a network, the values of which (the respective 'weights' of the links between the different nodes) could be determined by assigning an 'affinity value' to 'proximity', starting from the departure point of the analysis, in our case individual 'X'.

Getting rid of cookies is an excellent privacy protection practice, but [as?] one can easily extrapolate from the preceding example to search engines. With cookies, just by looking at some specific search themes, it becomes possible to identify groups, or even single individuals, according to the 'fingerprints' they leave behind on the Web. The unique trace which identifies our movements, our social contacts, our telephone calls is just as unique as our preferences, our tastes, our idiosyncrasies, and our passions, which make each one of us different. The passions would in this case be the sites we visit, and {for Google} more specifically the searches we launch during our navigation. This mass of information we give to a search engine makes the compiling of our 'fingerprint' possible [*N5].

Like all cookies, the ones on the Internet have a 'sell-by' date: Internet sites sending cookies to our browsers must give a date after which the browser may delete the information contained in the cookie. A smart use of cookies is not something often encountered; the fact that Google was able to exploit to its own advantage a technicality known to POSIX developers is interesting in this regard (POSIX is the international standard that permits interoperability between Unix and Unix-like OSs, such as GNU/Linux). The expiration date of Google's cookies is somewhere in 2038 - more or less the maximum possible. This means that, for all practical purposes, the browsers in our respective OSs will 'never' delete these cookies and the information contained therein [*N5].

Techno-masturbation: create! search! consume! ... your own contents!

It is next to impossible to follow the rapid evolutions Google goes through on a permanent basis. New services are launched in a quasi-convulsive way, and it is difficult to understand which ones are actually meant to have an impact on our lives, and which ones are likely to be discarded within the next few months or even weeks.
And anyway, it does not make very much sense, in view of the fast rate of innovation and information 'burn' on the Internet, to lose oneself in complicated descriptions and exhaustive classifications which would inevitably contain errors and omissions. The natural dynamics and fluidity of the networks should dissuade anyone from attempting to attain complete knowledge of them - in case someone were tempted to do so. One would get lost even before having started on such an ill-advised endeavour. This being said, one can, albeit from a subjective and fragmentary viewpoint, try to formulate a general critique of Google, without going into technical details and even less engaging in uncertain forecasts.

As far as personalisation is concerned, the increasing prevalence of the concept of the 'prosumer' [*N7] is probably the most worth considering. Google is well known for its propensity to launch 'beta' versions of its services while these are still provisional and in testing mode. This dynamic, as we have seen in the previous chapter, is directly inspired by {the modus operandi of} the Free Software development communities. Users contribute significantly to the development of the new service through their feedback, impressions, and suggestions regarding its usability; they are at the same time users and producers of the service - 'prosumers' being the name given to this hybrid breed. In its bid to become the global mediator of web content, Google sells technology and search results (through advertisements) to users who, on the other hand, tend {more and more} to be the creators of net content, and its consumers, through Google's ever more personalised services. Two examples, which would seem at first glance to have very little to do with each other, should make the point regarding this closed cycle of content production and consumption: the Google Web Toolkit (GWT) [*N8] and the alliance between Google's Gtalk and Nokia [*N9]. (nice cliff hanger, huh?)

(to be continued)

--------------------------
Translated by Patrice Riemens

This translation project is supported and facilitated by:
The Center for Internet and Society, Bangalore (http://cis-india.org)
The Tactical Technology Collective, Bangalore Office (http://www.tacticaltech.org)
Visthar, Dodda Gubbi post, Kothanyur-Bangalore (http://www.visthar.org)

# distributed via <nettime>: no commercial use without permission
# <nettime> is a moderated mailing list for net criticism,
# collaborative text filtering and cultural politics of the nets
# more info: http://mail.kein.org/mailman/listinfo/nettime-l
# archive: http://www.nettime.org contact: [email protected]