On Search Engine Evaluation Metrics


Inaugural dissertation for the attainment of the degree of Doctor of Philosophy (Dr. Phil.)

at the Faculty of Philosophy of

Heinrich-Heine-Universität Düsseldorf

Submitted by Pavel Sirotkin

from Düsseldorf

Supervisor: Prof. Wolfgang G. Stock

Düsseldorf, April 2012

- Oh my God, a mistake!

- It’s not our mistake!

- Isn’t it? Whose is it?

- Information Retrieval.


Acknowledgements

One man deserves the credit, one man deserves the blame…
TOM LEHRER, “LOBACHEVSKY”

I would like to thank my supervisor, Wolfgang Stock, who provided me with patience, support and the occasional much-needed prod to my derrière. He gave me the possibility to write a part of this thesis as part of my research at the Department of Information Science at Düsseldorf University; and it was also he who arranged for undergraduate students to act as raters for the study described in this thesis.

I would like to thank my co-supervisor, Wiebke Petersen, who bravely delved into a topic not directly connected to her research, and took the thesis on sightseeing tours in India and to winter beaches in Spain. Wiebke did not spare me any mathematical rod, and many a faulty formula has been spotted thanks to her.

I would like to thank Dirk Lewandowski, in whose undergraduate seminar I first encountered the topic of web search evaluation, and who provided me with encouragement and education on the topic. I am also indebted to him for valuable comments on a draft of this thesis.

I would like to thank the aforementioned undergraduates from the Department of Information Science for their time and effort in providing the data on which this thesis’ practical part stands.

Last, but definitely not least, I thank my wife Alexandra, to whom I am indebted for far more than I can express. She even tried to read my thesis, which just serves to show.

As is the custom, I happily credit those acknowledged with all the good that I have derived from their help, while offering to blame myself for any errors they might have induced.

Contents

1 Introduction

1.1 What It Is All About

1.2 Web Search and Search Engines

1.3 Web Search Evaluation

Part I: Search Engine Evaluation Measures

2 Search Engines and Their Users

2.1 Search Engines in a Nutshell

2.2 Search Engine Usage

3 Evaluation and What It Is About

4 Explicit Metrics

4.1 Recall, Precision and Their Direct Descendants

4.2 Other System-based Metrics

4.3 User-based Metrics

4.4 General Problems of Explicit Metrics

5 Implicit Metrics

6 Implicit and Explicit Metrics

Part II: Meta-Evaluation

7 The Issue of Relevance

8 A Framework for Web Search Meta-Evaluation

8.1 Evaluation Criteria

8.2 Evaluation Methods

8.2.1 The Preference Identification Ratio

8.2.2 PIR Graphs

9 Proof of Concept: A Study

9.1 Gathering the Data

9.2 The Queries

9.3 User Behavior

9.4 Ranking Algorithm Comparison

10 Explicit Metrics

10.1 (N)DCG


10.3 (Mean) Average Precision

10.4 Other Metrics

10.5 Inter-metric Comparison

10.6 Preference Judgments and Extrinsic Single-result Ratings

10.7 PIR and Relevance Scales

10.7.1 Binary Relevance

10.7.2 Three-point Relevance

11 Implicit Metrics

11.1 Session Duration Evaluation

11.2 Click-based Evaluations

11.2.1 Click Count

11.2.2 Click Rank

12 Results: A Discussion

12.1 Search Engines and Users

12.2 Parameters and Metrics

12.2.1 Discount Functions

12.2.2 Thresholds Detailed Preference Identification

12.2.3 Rating Sources

12.2.4 Relevance Scales

12.2.5 Cut-off Ranks

12.2.6 Metric Performance

12.3 The Methodology and Its Potential

12.4 Further Research Possibilities

Executive Summary


Appendix: Metrics Evaluated in Part II

1 Introduction

1.1 What It Is All About

The present work deals with certain aspects of the evaluation of web search engines. This does not sound too exciting; but for some people, the author included, it really is a question that can induce one to spend months trying to figure out some seemingly obscure property of a cryptic acronym like MAP or NDCG. As I assume the reader longs to become part of this select circle, I will, in this section, try to introduce him¹,² to some of the basic concepts we will be concerned with; the prominent ones are web search and web search engines, followed by the main ideas and troubles of the queen of sciences that is search engine evaluation.³ These sections, especially the last one, will also provide the rationale and justification for this work. The well-disposed specialist, on the other hand, might conceivably skip the next sections of the introduction, since it is unlikely that he needs persuading that the field is important, or reminding of what the field actually is.

After the introduction, there will be two major parts. Part I is, broadly speaking, a critical review of the literature on web search evaluation. It will explain the real-life properties of search engine usage (Chapter 2), followed by a general and widely used evaluation framework (Chapter 3). Chapters 4 and 5 introduce the two main types of evaluation metrics, explicit and implicit ones, together with detailed discussions of previous studies attempting to evaluate those metrics, and lots of nit-picking comments. Part I concludes with a general discussion of the relationship between explicit and implicit metrics as well as their common problems (Chapter 6).

Part II is where this thesis stops criticizing others and gets creative. In Chapter 7, the concept of relevance, so very central for evaluation, is discussed. After that, I present a framework for web search meta-evaluation, that is, an evaluation of the evaluation metrics themselves, in Chapter 8; this is also where a meta-evaluation measure, the Preference Identification Ratio (PIR), is introduced. Though it is relatively short, I regard this as the pivotal section of this work, as it attempts to capture the idea of user-based evaluation described in the first part, and to apply it to a wide range of metrics and metric parameters. Chapter 9 then describes the layout and general properties of a study conducted on these principles. The study is rather small and its results will not always be significant. It is meant only in part as an evaluation of the issues under consideration; at least equally important, it is a proof of concept, showing what can be done with and within the framework. The findings of the study are presented in Chapter 10, which deals with different metrics, as well as with cut-off values, discount functions, and other parameters of explicit metrics. A shorter evaluation is given for some basic implicit measures (Chapter 11). Finally, Chapter 12 sums up the research, first in regard to the results of the study, and then exploring the prospects of the evaluation framework itself.

¹ “To avoid clumsy constructions resulting from trying to refer to two sexes simultaneously, I have used male pronouns exclusively. However, my information seekers are as likely to be female as male, and I mean no disrespect to women by my usage.” (Harter 1992, p. 603)

² Throughout this work, I have been quite liberal with footnotes. However, they are never crucial for the understanding of a matter; rather, they provide additional information or caveats which, I felt, would have unnecessarily disrupted the flow of the argument, however little of that there is. Therefore, the footnotes may be ignored if the gentle reader is not especially interested in the question under consideration.

³ I am slightly exaggerating.

For the very busy or the very lazy, a one-page Executive Summary can be found at the very end of this thesis.

The most important part of this work, its raison d’être, is the relatively short Chapter 8. It summarizes the problem that tends to plague most search engine evaluations, namely, the lack of a clear question to be answered, or a clear definition of what is to be measured. A metric might be a reflection of a user’s probable satisfaction; or of the likelihood he will use the search engine again; or of his ability to find all the documents he desired. It might measure all of those, or none; or even nothing interesting at all. The point is that, though it is routinely done, we cannot assume a real-life meaning for an evaluation metric until we have looked at whether it reflects a particular aspect of that real life. The meta-evaluation metric I propose (the Preference Identification Ratio, or PIR) is meant to answer the question of whether a metric can pick out a user’s preference between two result lists. This means that user preferences will be elicited for pairs of result lists, and compared to metric scores constructed from individual result ratings in the usual way (see Chapter 4 for the usual way, and Section 8.2.1 for details on PIR). This is not the question; it is a question, though I consider it to be a useful one. But it is a question of a kind that should be asked, and answered.
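To make the idea of comparing user preferences with metric scores a little more tangible, here is a minimal sketch in Python. It is not the definition of PIR, which is given in Section 8.2.1; it only computes the share of list pairs for which a metric favours the same list as the user did. The data layout, the names `precision_at_k` and `preference_agreement`, and the decision to count a tie in metric scores as half a hit are all assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical data layout (not taken from the thesis): per-result ratings for
# two result lists, plus the list the user said he preferred ("A" or "B").
Comparison = Tuple[Sequence[float], Sequence[float], str]


def precision_at_k(ratings: Sequence[float], k: int = 3) -> float:
    """Stand-in metric: mean rating of the top-k results (binary ratings assumed)."""
    top = ratings[:k]
    return sum(top) / len(top) if top else 0.0


def preference_agreement(
    comparisons: Sequence[Comparison],
    metric: Callable[[Sequence[float]], float],
) -> float:
    """Share of comparisons in which the metric favours the user's preferred list.

    Assumption: equal metric scores count as half a hit, i.e. as a coin toss.
    """
    if not comparisons:
        raise ValueError("need at least one comparison")
    hits = 0.0
    for ratings_a, ratings_b, preferred in comparisons:
        score_a, score_b = metric(ratings_a), metric(ratings_b)
        if score_a == score_b:
            hits += 0.5
        elif (score_a > score_b) == (preferred == "A"):
            hits += 1.0
    return hits / len(comparisons)


comparisons = [
    ([1, 1, 0], [0, 0, 1], "A"),  # metric and user agree
    ([0, 0, 0], [1, 0, 0], "B"),  # metric and user agree
    ([1, 0, 0], [0, 1, 0], "A"),  # metric cannot tell the lists apart
]
agreement = preference_agreement(comparisons, precision_at_k)
print(f"{agreement:.3f}")
```

Any other metric with the same signature can be passed in place of `precision_at_k`, which is what makes a comparison between metrics possible on the same set of preference judgments.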

Furthermore, I think that many studies could yield more results than are usually described, considering the amount of data that is gathered. The PIR framework allows the researcher to vary a number of parameters,⁴ and to provide multiple evaluations of metrics for certain (or any) combinations thereof. Chapters 9 to 12 describe a study that was performed using this method. However, one very important aspect has to be stated here (and occasionally restated later). This study is, unfortunately, quite small-scale. The main reason for that sad state of things is the limited resources available; what it means is that the study should be regarded more as a demonstration of what evaluations can be done within one study. Most of the study’s conclusions should be regarded, at best, as preliminary evidence, or just as hints of areas for future research.
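What varying such parameters might look like in practice can be sketched as follows. The example scores one rated result list under every combination of a few discount functions and cut-off ranks. The three discount functions, the name `sweep`, and the ratings are illustrative assumptions, not the parameter set used in the study; the score is a generic discounted cumulative gain, where each rating is divided by the discount for its rank.

```python
import math
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical discount functions: each maps a 1-based rank to a divisor.
DISCOUNTS: Dict[str, Callable[[int], float]] = {
    "none": lambda rank: 1.0,
    "log2": lambda rank: math.log2(rank + 1),
    "rank": lambda rank: float(rank),
}


def discounted_gain(
    ratings: Sequence[float], cutoff: int, discount: Callable[[int], float]
) -> float:
    """Sum of ratings up to the cut-off rank, each divided by its rank discount."""
    return sum(
        rating / discount(rank)
        for rank, rating in enumerate(ratings[:cutoff], start=1)
    )


def sweep(
    ratings: Sequence[float], cutoffs: Sequence[int]
) -> Dict[Tuple[str, int], float]:
    """Score one result list under every (discount, cut-off) combination."""
    return {
        (name, cutoff): discounted_gain(ratings, cutoff, fn)
        for (name, fn), cutoff in product(DISCOUNTS.items(), cutoffs)
    }


scores = sweep([3, 2, 0, 1], cutoffs=[1, 3])
for (name, cutoff), score in sorted(scores.items()):
    print(f"discount={name:<4} cutoff={cutoff} score={score:.3f}")
```

Each entry of the resulting dictionary is one metric variant; every variant could then be compared against user preferences separately, which is how a single set of ratings can support many evaluations.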

⁴ Such as the cut-off values, significance thresholds, discount functions, and more.

1.2 Web Search and Search Engines

Web search is as old as the web itself, and in some cases as important. Of course, its importance is not intrinsic; rather, it stems from the sheer size of the web. The estimates vary wildly, but most place the number of pages indexed by major search engines at over 15 billion (de Kunder 2010). The size of the indexable web is significantly larger; as far back (for web timescales) as 2008, Google announced that its crawler had encountered over a trillion unique web pages, that is, pages with distinct URLs which were not exact copies of one another (Alpert and Hajaj 2008). Of course, not all of those are fit for a search engine’s index, but one still has to find and examine them, even if a vast majority will be omitted from the actual index. To find any specific piece of information is, then, a task which is hardly manageable unless one knows the URL of the desired web page, or at least that of the web site. Directories, which are built and maintained by hand, have long ceased to be the influence they once were; Yahoo, long the most prominent one, has by now completely removed its directory from the home page (cf. Figure 1.1). Though there were a few blog posts in mid-… remarking on the taking down of Yahoo’s French, German, Italian and Spanish directories, this is slightly tarnished by the fact that the closure occurred over half a year before, “and it seems no one noticed” (McGee 2010). The other large directory, ODP/DMOZ, has stagnated at around 4.7 million pages since 2006 (archive.org 2006; Open Directory Project 2010b). It states that “link rot is setting in and [other services] can’t keep pace with the growth of the Internet” (Open Directory Project 2010a); but then, the “About” section that asserts this has not been updated since 2002. There are, of course, specialized directories that only cover web sites on a certain topic, and some of those are highly successful in their field.

But still, the user needs to find those directories, and the method of choice for this task seems to be web search. Even when the site a user seeks is very popular, search engines play an important role. According to the information service Alexa,⁵ the global web site of Coca-Cola, judged to be the best-known brand in the world (Interbrand 2010), receives a quarter of its visitors via search engines. Amazon, surely one of the world’s best-known web sites, gets about 18% of its users from search engines; and, amazingly, even Google seems to be reached through search engines by around 3% of its visitors.⁶ For many sites, whether popular or not, the numbers are much higher. The University of Düsseldorf is at 27%, price comparison service idealo.de at 34%, and Wikipedia at 50%.

Web search is done with the help of web search engines. It is hard to tell how many search engines there are on the web. First, one has to properly define a search engine, and there the troubles start already. The major general-purpose web search engines have a certain structure; they tend to build their own collection of web data (the index) and develop a method for returning some of them to the user in a certain order, depending on the query the user entered.

But most search engine overviews and ratings include providers like AOL (with a reported volume of 1%-2% of all web searches (Hitwise 2010; Nielsen 2010)), which rely on results provided by others (in this case, Google). Other reports list, with a volume of around 1% of worldwide searches each, sites like eBay and Facebook (comScore 2010). And it is hard to make a case against including those, as they undoubtedly do search for information on the web, and they do use databases and ranking methods of their own. Their database is limited to a particular type of information and a certain range of web sites; but so is, though to a lesser extent, that of other search engines, which exclude results they consider not sought for by their users (e.g. spam pages) or too hard to get at (e.g. Flash animations). And even if we, now

⁵ www.alexa.com. All values are for the four weeks up to October 5th, 2010.
