Patrice Riemens on Tue, 7 Apr 2009 22:12:49 +0200 (CEST)



<nettime> Ippolita Collective: The Dark Side of Google, Chapter 6 (part 1)


NB this book and translation are published under Creative Commons license
2.0 (Attribution, Non Commercial, Share Alike).
Commercial distribution requires the authorisation of the copyright
holders: Ippolita Collective and Feltrinelli Editore, Milano (.it)


Ippolita Collective

The Dark Side of Google (continued)


Chapter 6.  Quality, Quantity, Relation (part 1)


The Rise of Information

The information society is heterogeneous in the extreme: it makes use of
network communication systems such as telephony, of digitalised versions
of broadcast media [*N1] and of traditional pre-Web media such as dailies,
radio or television, and of Internet-born systems such as e-mail or P2P
exchange platforms, all with gay abandon and without a second thought. A
closer look, however, reveals that all these systems rest on one single
resource: information. Within the specific domain of search engines, and
thus of information retrieval, one can say that information consists of
the sum total of all extant web pages [*N2].

The quantitative and qualitative growth of these pages and of their
content has been inordinate and continues to be so, simply because it has
become so unbelievably easy today to put content up on the Web.
But contents are not isolated islands: they take shape within a
multiplicity of relationships and links that bind together web pages,
websites, topics, documents, and ultimately the contents themselves.

Direct and unmediated access to this mass of information is well-nigh
impossible, even as a mere thought experiment: it would amount to
'browsing the Web by hand'. This is why search engines exist: to filter
the Web's complexity and to serve as an interface between the information
and ourselves, returning search results we are happy with.

In the preceding chapters we reviewed the principal working tools of a
search engine, that is, the instruments Google and other search companies
have put in place to scan web pages, to analyse and order them with the
help of ranking algorithms, to archive them on appropriate hardware, and
finally to return results to users according to their search queries.
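
A toy sketch in Python can make this scan-index-return cycle concrete.
Everything in it (the page list, the function names, the tiny index) is
invented purely for illustration and says nothing about how Google or any
real engine is actually built:

# A purely illustrative sketch of the pipeline described above:
# harvest pages, build an inverted index, and answer a query.
from collections import defaultdict

# A toy 'Web': page URL -> page text (stands in for the crawler's harvest).
PAGES = {
    "http://example.org/a": "information retrieval and search engines",
    "http://example.org/b": "search engines rank pages by relevance",
    "http://example.org/c": "a page about something else entirely",
}

def build_index(pages):
    """Scan the harvested pages and build an inverted index: word -> URLs."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = index.get(words[0], set()).copy()
    for word in words[1:]:
        results &= index.get(word, set())
    return results

if __name__ == "__main__":
    index = build_index(PAGES)               # the 'archiving' step, in miniature
    print(search(index, "search engines"))   # the 'returning a result' step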

The quantity of web pages held in memory is thus crucial for estimating
the technical and economic potency of a search engine. The larger its
'capital' of searchable web pages, the higher a search engine will score
on the reliability and completeness of its returns, though obviously only
within the limits of the given context.

Yet however enormous a search engine's 'capital' of pages may be, it never
will, and never could, be entirely complete and exhaustive, and no amount
of time, money or technology invested in it could change that. It is
absurd to think it possible to know, or, at a more down-to-earth level,
simply to copy and catalogue the whole of the Internet. It would be like
pretending to know the totality of the living world, including its
constant mutations.

The information storage devices used by search engines like Google are
like vessels: let us imagine we had to fill an enormous vessel with
diminutive droplets (think of all the pages that constitute the Web's
information). Assuming that our vessel is able to contain them all, our
task would be to capture and identify them, one by one, in a systematic
and repetitive manner.

If, on the other hand, we think there are more droplets than our vessel
can contain, or that we cannot devise an algorithm to capture them all, or
that capturing them is possible but too slow, or even that the whole task
is hopelessly ... endless, then we need to switch tactics. All the more so
since our data-droplets change over time: pages get modified, and
resources jump from one address to another...

At this stage, we might decide to go only for the larger droplets, or to
concentrate our efforts on those places where most droplets fall, or we
could choose to collect only the droplets that interest us most and then
try to link them together in the way we find most relevant.

While search engine companies continue to chase the holy grail of
cataloguing 'everything' on the Net, it might be better to take a more
localised approach to the Web, or to accept that for any given 'search
intention' many answers may be possible, and that among these answers some
may be 'better' because they meet specific demands regarding speed and
completeness. One should always keep in mind that the quality of results
depends on our subjective perception of whether a search return satisfies
us. And in order to accept or reject a search return, it is essential to
apply our critical faculties and to remain conscious of the subjectivity
of our own viewpoint. To establish the trajectory one is really interested
in, one must assume the existence of a closed and delimited network, a
kind of world bounded only by one's own personal requirements, while
always realising that this is a subjective localisation, neither absolute
nor constant in time.

From an analytical point of view, charting a network means being able to
partition it into sub-networks for examination, which amounts to creating
small, localised and temporary worlds (Localised Closed Worlds, LCWs),
each containing at least one answer to the search that has been launched.
Without this, many searches would go on with no end in sight, especially
since the amount of data to be analysed goes far beyond what any human
being could take in: they would be non-starters. Conversely, altering and
specifying the query, and refining one's vantage point, generates a
trajectory more concordant with the search's point of departure. By
looking at the Web as a closed and localised world we also accept that the
very dynamic of the birth, growth and networked distribution of
information (which may continue even after that information has become
invalid) is a phenomenon of 'emergence', neither fortuitous nor without a
cause.
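
One way to picture such a 'localised closed world' is as a small
sub-network cut out of the link graph around the pages that answer a
query. The sketch below is only an illustration of that idea; the toy
graph, the depth limit and the function name are all invented here:

# A minimal sketch of 'partitioning the network into sub-networks':
# starting from seed pages that answer a query, follow links only up to a
# small depth, yielding a bounded, temporary sub-network.
from collections import deque

# Toy link graph: page -> pages it links to.
LINKS = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": ["A"],
    "D": ["E"],
    "E": [],
}

def localised_closed_world(seeds, links, max_depth=2):
    """Breadth-first walk from the seed pages, stopping at max_depth."""
    world = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        page, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for target in links.get(page, []):
            if target not in world:
                world.add(target)
                queue.append((target, depth + 1))
    return world

print(localised_closed_world(["A"], LINKS, max_depth=1))  # {'A', 'B', 'C'}

The point of the depth limit is exactly the one made above: it closes the
world around the search, at the price of leaving other possible worlds
outside it.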

Emergence [*N3] is an occurrence that can be described in mathematical
terms as an unexpected and unpredictable outburst of complexity. But it is
above all an event that generates situations which cannot be exhaustively
described. Analysing and navigating an 'emerging universe' like the Web
demands a permanent repositioning of oneself. This not only delimits a
'closed and localised world' of abilities and expectations, but also opens
up new avenues of exploration (other worlds are always possible, outside
one's own closed one), and thus brings the realisation that results can
only ever be fragmentary and incomplete.


Quantity and quality

Indexation by accumulation of pages is a quantitative phenomenon that does
not in itself determine the quality of information on the Web: here the
prime objective is to capture all pages, not to make a selection. The
relationships between pages give rise to emergence because they are
generated on the basis of a simple criterion, the links existing between
them. The quality of information springs from the typology of these links,
and is determined by their ability to trace trajectories, without any need
to capture 'all' the information available. Quality therefore depends
mostly on making a vantage point explicit through a particular search
trajectory: basically, it is the surfers, the pirates, the users of the
Web who determine, and also increase, the quality of information by
establishing links between pages. The accumulating power of Google's
algorithms is useful for this, but is insufficient in itself.
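
To give a rough idea of how links between pages can be turned into a
measure of 'quality', here is a simplified, link-based ranking computation
in the spirit of PageRank. The graph, the damping factor and the number of
iterations are arbitrary choices made for this example only, not Google's
actual settings:

# Illustrative only: pages gain 'quality' from the links other pages make
# to them, and pass a share of their own score along their outgoing links.

# Toy link graph: page -> pages it links to.
LINKS = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def simple_pagerank(links, damping=0.85, iterations=50):
    """Iteratively redistribute each page's score along its outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its score over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

for page, score in sorted(simple_pagerank(LINKS).items(), key=lambda x: -x[1]):
    print(page, round(score, 3))

Such a computation only registers the links that users have already
created; it accumulates and weighs them, which is precisely why it cannot
by itself produce the quality it measures.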

The evaluation of the pages' content has been outsourced to algorithms, or
rather to the companies that control them. The whole Google phenomenon
rests on our habit of trusting an entity of apparently unlimited power,
able to offer us the opportunity to find 'something' interesting and
useful within its own 'capital' of resources, which is itself peddled as
'the whole Web'. The limits of this allegedly miraculous offer, however,
are kept hidden: not a word about what is absent from that 'capital', or
present only in part, and especially not about what has been excised from
it.

The thorny ethical and political problem attendant on the management and
control of information still refuses to go away: who is there to guarantee
the trustworthiness of an enterprise whose prime motive is profit, however
'good' it may be?

Even though considerable economic resources and an outstanding
technological infrastructure are put to the task of constantly improving
the storage and retrieval of data, the political question posed by the
accumulation of data in the hands of a single actor cannot and should not
be sidestepped. Google represents an unheard-of concentration of private
data, a source of immense power, yet devoid of any transparency. It is
obvious that no privacy law can address and remedy this situation, still
less the creation of ad hoc national or international bodies for the
control of personal and sensitive data. The answer to the issue of the
confidentiality of data can only lie in greater awareness and
responsibility on the part of the individuals who create the Web as it is,
through a process of self-information. Even if this is no easy road, it is
the only one likely to be worth pursuing in the end.

(to be continued)

--------------------------
Translated by Patrice Riemens
This translation project is supported and facilitated by:

The Center for Internet and Society, Bangalore
(http://cis-india.org)
The Tactical Technology Collective, Bangalore Office
(http://www.tacticaltech.org)
Visthar, Dodda Gubbi post, Kothanyur-Bangalore (till March 31st, 2009)
(http://www.visthar.org)
The Meyberg-Acosta Household, Pune (from April 2, 2009)





#  distributed via <nettime>: no commercial use without permission
#  <nettime>  is a moderated mailing list for net criticism,
#  collaborative text filtering and cultural politics of the nets
#  more info: http://mail.kein.org/mailman/listinfo/nettime-l
#  archive: http://www.nettime.org contact: [email protected]