Research exercise in Google’s search algorithms


How does personalisation of search engine results work? Which signals does the algorithm take into consideration when it favours one result over another? What weight do these different signals carry? Curious to find out, I did a small research exercise.

Crowdsourcing search results for two queries, [parkland survivors] and [holiday], I gathered 91 distinct query results from participants in 8 countries. Participants were asked to provide information on the search engine used, the browser used, whether the global or a local domain was accessed, whether they searched from desktop or mobile (and which app in particular), whether they were logged in or logged out, whether ad blockers were used, and which other privacy extensions were added to the browser (Ghostery, Privacy Badger and similar), in order to investigate the relation between these factors and the diversity of search query outputs.
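To make the collected parameters concrete, one response could be modelled as a record like the following. This is a minimal sketch; the field names and example values are illustrative, not the actual survey schema.

```python
# Hypothetical record format for one crowdsourced query result.
# Field names are illustrative, not the actual survey schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class QueryResult:
    query: str                  # e.g. "parkland survivors" or "holiday"
    search_engine: str          # e.g. "Google"
    browser: str                # e.g. "Firefox"
    domain: str                 # global ("google.com") or local ("google.de")
    device: str                 # "desktop" or "mobile"
    logged_in: bool             # logged in or logged out
    ad_blocker: bool            # ad blocker used or not
    privacy_extensions: List[str] = field(default_factory=list)
    results: List[str] = field(default_factory=list)  # ranked source domains

# One illustrative (made-up) response:
example = QueryResult(
    query="holiday",
    search_engine="Google",
    browser="Chrome",
    domain="google.nl",
    device="desktop",
    logged_in=False,
    ad_blocker=True,
    privacy_extensions=["Privacy Badger"],
    results=["wikipedia.org", "booking.com"],
)
```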

A few search engines and browsers, in both desktop and mobile versions, were used as control search points (DuckDuckGo, Tor, Brave, Onion Browser, etc.).

The implications of personalisation are not yet fully grasped, as Google’s algorithm is opaque and its workings obfuscated and constantly iterated upon, escaping thorough and objective analysis. This makes the question of method highly debatable, and the investigation of personalisation and its effects complex. The exercise is part of an essay that outlines the importance of studying the societal implications of search engine ranking, discusses some of the methods for studying it, presents the findings from the research exercise, and reflects further on the theoretical outlook and methodological approaches.

The full text can be accessed here, but some of the findings I find interesting are the following:

Dominance of sources is evident in the first and second positions, with more diversity towards the bottom. This is more visible with the [parkland survivors] query (a major news event), which shows the dominance of traditional news sources (e.g. The Independent). With the [holiday] query, the locality of the user (IP address) seems more influential, resulting in more locally relevant results, possibly also influenced by the greater weight ascribed to the user’s query history and browsing.

[Figure: search engine, query and medium, by researcher]

There is lower diversity in the top two positions when users were logged in (especially for [parkland survivors]), which runs counter to the logic of personalisation, according to which the query should yield different outputs for different users based on their previous search history and behaviour. However, the “public ratification” Gillespie mentions, as well as Google’s tendency to link only to well-established sources (a “popularity” device), recreating the structure of offline media, brings to light the contradiction between personalisation and authority.

 

[Figure: Parkland first & second result, by logged in/out]

The diversity of returned outputs is greater for the logged-out queries. In the top position there are a total of six distinct sources, and in the second position eleven. For the logged-in queries there are four distinct sources in the top position, one of which is dominant by frequency, and nine in the second position, again with one dominant source. This contradicts the logic of personalisation, according to which, if we all get personalised results, the diversity of outputs for the logged-in parameter should be greater than for the logged-out one.

One of the questions that remains open is the method of research. There is no established method or approach for studying personalisation. This is due to (1) the opacity of the algorithms, (2) their multi-factor, complex inner workings, and (3) the perpetual beta state of the algorithm. One of the predominant proposals has been the call for “algorithmic transparency” and public oversight of search engines (Introna and Nissenbaum, 2000). While full disclosure of how indexing, relevance assignment and ranking are done would certainly help in investigating their scope and effect, much needs to be done on the researchers’ side as well.

Personalisation and the inner workings of algorithms are not easy to investigate, so this research exercise is a small contribution towards the ongoing research and the persistent open questions.
