
«On Search Engine Evaluation Metrics Inaugural-Dissertation zur Erlangung des Doktorgrades der Philosophie (Dr. Phil.) durch die Philosophische ...»


9.2 The Queries

The raters contributed a total of 42 queries. There were no restrictions placed on the kind of query to be entered, since the intention was to gather queries that were as close as possible to those employed for real-life information needs. In particular, this means no query type was predetermined. However, the detailed information need statements allowed for an unequivocal identification of the original intent behind the queries. 31 queries were informational, 6 were transactional and 2 were navigational, according to Broder’s widely

–  –  –

Two queries are similar to what has been described as “closed directed informational queries” (Rose and Levinson 2004), and commonly called “factual”, as they are generally geared towards finding a specific fact (Spink and Ozmutlu 2002). This indicates a query aimed at finding not as much information as possible on a certain topic, but rather a (presumably definitive and singular) answer to a well-defined question. Another example would be a query like “How long is the Nile”. I would like to further narrow down the definition of such queries in order to distinguish them from informational ones and to highlight the special features particular to them. These features are:

• The query can easily be reformulated as a question starting with “who”, “when”, or “where”. [74]

• The originator of the query knows exactly what kind of information he is looking for.

• The answer is expected to be short and concise.

• The answer can be found in a snippet, eliminating the need to examine the actual results.

• One authoritative result is enough to cover the information need.

• There are many pages providing the needed information.

The last three properties can be used to derive some interesting predictions for this type of query. If all the needed information can (and, ideally, would) be found on the result page of the search engine, the sessions with the highest user satisfaction will tend to have low durations and no clicks, features also typical of “abandoned sessions”.

In other words, sessions without any clicks are typically assumed to have been worthless for the user (Radlinski, Kurup and Joachims 2008), while for factual queries, the opposite might be true: the absence of a click can indicate a top result list. Furthermore, the last two points indicate that, while factual queries resemble navigational ones in that a single result will satisfy the user’s information need, they are unlike navigational queries in that there are many possible results which might provide the fact the user is looking for.

[Footnote] Here and further, information needs are translated from German where needed.

[74] “What” and “how” are more complex cases. Obviously, my own example (“How long is the Nile”) starts with “how”. Mostly, questions aiming for a concise answer (“how long”, “how big”, “how old” etc.) are quite typical for factual queries, whereas open-ended, general questions (“how do you...”) are not. A similar division holds for “what”, with “What is the capital of Madagascar” being a prototypical factual query, but not “What is the meaning of life”. In general, the factuality of a query is more meaningfully defined by the next two points of expected preciseness and brevity of the answer, with the question word being, at best, a shortcut.

Another type of information need not covered by traditional classifications is represented by what I will call “meta-queries”. The single example is provided by a query intended to check the position of a certain page in the result list. The exceptional property of these queries is that a result list cannot be “good” or “bad”. Whether the result sought in that example is in first position, somewhere in the tail or not in the result list at all – the information need is satisfied equally well. Another example would be the searches done for the comic shown in Figure 9.8.

Here, the user was looking for the number of Google hits for a particular query. The number of returned results and the results themselves do not have any effect on user satisfaction; more than that, it is unclear how any single result can be relevant or non-relevant to that query. The information the user is looking for is not found on any web page except for the search engine’s result page; it is either exclusively a result of the search algorithm (as in the case of the query aiming to find out the rank of a certain page in Google), or else a service provided by the search engine which can be described as “added value” compared to the information found on the web itself (as with submitting a query to determine the popularity of a certain phrase). The search, in short, is not a web search about real life; it is a search engine search about the web. [75]

Figure 9.8. The queries submitted to research this xkcd comic (Munroe 2008) fall in the meta-query category.

There is only one meta-query in the present study, so it will not play a major role. However, it would be interesting to determine how frequent this type of query is in real life, and particularly whether it occurs often enough to warrant special consideration in further studies.

[75] It is also interesting to note that the web comic shown in Figure 9.8 explicitly states that the relevant information is the number of Google hits, not the number of web pages it is supposed to represent, and not the popularity of the phrases implied by the number of web pages, and also not the general attitudes of the population to be derived from the popularity. This is in no way to criticize the xkcd comic, which is generally very much in tune with the general attitudes of the population (at least, to judge from what can be seen online), but rather to point out the extent to which this population is relying on the notion that Google is an accurate reflection of the web, or even of a large part of the modern world.





–  –  –

Table 9.2 shows the average query length for the different kinds of queries. Leaving aside the factual, navigational and meta-queries, which are too few to be statistically meaningful, we see that the numbers are well in accord with the two to three term average found in other studies (Jansen and Spink 2006; Yandex 2008). Also in accord with other studies (White and Morris 2007), no operators were used, which was to be expected for a relatively small number of queries.
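The per-type averages reported in Table 9.2 can be illustrated with a minimal sketch. It assumes that query length is counted in whitespace-separated terms; the sample queries and the helper name `average_length_by_type` are hypothetical and are not taken from the study's data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (query type, query text) pairs; NOT the queries from the study.
queries = [
    ("informational", "search engine evaluation metrics"),
    ("informational", "climate change"),
    ("transactional", "download firefox"),
    ("navigational", "wikipedia"),
]


def average_length_by_type(pairs):
    """Return the mean number of whitespace-separated terms per query type."""
    lengths = defaultdict(list)
    for query_type, text in pairs:
        # Assumption: one term per whitespace-separated token.
        lengths[query_type].append(len(text.split()))
    return {query_type: mean(values) for query_type, values in lengths.items()}


averages = average_length_by_type(queries)
for query_type, value in averages.items():
    print(f"{query_type}: {value} terms on average")
```

With so few queries in some categories, a per-type mean of this kind says little on its own, which is why the comparison in the text rests on the larger groups.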

There are surely not enough factual, navigational and meta-queries to allow for their meaningful evaluation on their own, and probably also not enough transactional queries. [76] For this reason, in considering the study’s results, I will evaluate two sets: the informational queries on their own, and all the queries taken together.

9.3 User Behavior

Another set of data concerns user behavior during the search sessions. An interesting question is how long it takes for the raters to conduct a session in a single result list or side-by-side condition, and decide upon their satisfaction or preference judgment. Figure 9.9 shows the session length for single result list evaluation. Almost half of all sessions were concluded in less than 30 seconds, and around 80% took up to 90 seconds. Fewer than 5% of queries took more than 5 minutes. The situation is somewhat different in the side-by-side evaluation (Figure 9.10); here, about 50% of all sessions were completed in up to a minute, and about 75% in up to three and a half minutes. Here, 10% of sessions took more than 6 minutes; it has to be noted, however, that a number of sessions have extremely long durations of up to 24 hours. These are cases where the raters started a session, but seem to have temporarily abandoned the task. These sessions are included in the evaluation, as all of them were later correctly finished with a satisfaction or preference judgment. [77] Also, this might well be in line with actual user behavior, since a user can start a search session but be distracted and come back to an open result list page on the next day. However, these outliers greatly distort the average session time, so that median numbers are more meaningful in these cases. Those are 32 seconds for single result lists and 52 seconds for side-by-side result lists. While the raters, understandably, needed more time to conduct a query with two result lists, the session duration does not seem so high as to indicate an excessive cognitive burden.
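The point about outliers can be made concrete with a small sketch. The durations below are hypothetical values chosen for illustration, not the study's logs; the last one mimics a session that was left open for 24 hours before being finished.

```python
from statistics import mean, median

# Hypothetical session durations in seconds; NOT the study's data.
# The last value mimics a session left open for 24 hours.
durations = [20, 25, 32, 45, 90, 24 * 60 * 60]

mean_duration = mean(durations)      # pulled far upward by the single outlier
median_duration = median(durations)  # stays close to the typical session

print(f"mean:   {mean_duration:.1f} s")
print(f"median: {median_duration:.1f} s")
```

A single abandoned-and-resumed session moves the mean to several hours, while the median remains under a minute, which is why the median is the more meaningful summary here.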

[76] Although studies have been published with fewer than six different queries.

[77] These might be considered to consist of multiple sub-sessions, with significantly lower overall durations. However, since in this study there was no clear way of determining the end of the first sub-sessions and the start of the last ones (and since these sessions were rare), I opted to keep them with their full duration.

–  –  –

The first notable result is the large number of sessions where no results are clicked on. As Figure 9.11 indicates, in both single and side-by-side evaluations, almost half of all sessions end without any result being selected. Of the sessions that did have clicks, most had a very small number, and the sessions with one to three clicks made up 75% and 63% respectively (of those with any clicks at all). The graph for non-zero click sessions can be seen in Figure 9.12. Side-by-side evaluations generally had more clicks than single ones, with the average being 2.4 and 1.9 clicks, respectively. With usual click numbers per session given in the literature ranging from 0.3-1.5 (Dupret and Liao 2010) to 2-3 (Spink and Jansen 2004), these results seem to fall within the expected intervals.

–  –  –

Figure 9.12. Clicks per session. No session had more than 20 clicks.

These numbers indicate that, while raters click more often in side-by-side settings, the click frequency is not so high as to suggest a total change of search habits. With more results to choose from, it is only to be expected that more results are chosen for closer inspection; but since the users have been instructed to abandon the session whenever they felt probable improvements weren’t worth the extra effort, they had more chances of seeing good results and satisfying their information needs early on, causing the increase in clicks to be a moderate 0.5 per session.
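The click statistics discussed in this section (share of sessions without clicks, share of one-to-three-click sessions among those with any clicks, and average clicks per session) can be sketched as follows. The click counts and the function name `click_summary` are hypothetical, and the sketch assumes that the average is taken over all sessions, including those without clicks; the text does not state which base was used.

```python
from statistics import mean


def click_summary(clicks_per_session):
    """Summarize a list of per-session click counts."""
    total = len(clicks_per_session)
    with_clicks = [c for c in clicks_per_session if c > 0]
    one_to_three = [c for c in with_clicks if c <= 3]
    return {
        # Share of sessions that ended without any result being clicked.
        "zero_click_share": (total - len(with_clicks)) / total,
        # Share of 1-3 click sessions among sessions with at least one click.
        "one_to_three_share": (
            len(one_to_three) / len(with_clicks) if with_clicks else 0.0
        ),
        # Assumption: mean over ALL sessions, zero-click ones included.
        "mean_clicks": mean(clicks_per_session),
    }


# Hypothetical click counts for eight sessions; NOT the study's data.
single = [0, 0, 0, 0, 1, 2, 3, 6]
summary = click_summary(single)
print(summary)
```

Computing the same summary separately for single and side-by-side sessions and subtracting the two means would give the kind of per-session difference quoted above.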

–  –  –

Figure 9.14. Sample click trajectories (left: randomized; right: original). The arrow coming from the left shows the first click, further arrows indicate later clicks.

9.4 Ranking Algorithm Comparison

Apart from evaluating user behavior, it seems interesting to evaluate the performance of the two result list types. Obviously, the original ranking would be expected to perform much better than the randomized list in preference as well as satisfaction. When I submitted a preliminary paper on this topic (Sirotkin 2011) to a conference, an anonymous reviewer raised a logical issue: “It seems obvious that users always prefer the ranked lists. I would be interested in knowing whether this was indeed the case, and if so, whether this ‘easy guess’ could have an influence on the evaluation conducted.” We can approach the matter of rankings coming from the log statistics; namely, with an evaluation of clicks depending on the

–  –  –

Figure 9.15. Click ranks for original and randomized rankings.

Obviously, clicks are not the best indication of the raters’ evaluation of the two result list types – especially if we actually asked them to explicitly state their satisfaction and preference levels. Figure 9.16 shows the user preference for the original and randomized result lists; and the results are not what was expected by the reviewer (and myself). Of course, the original result list is preferred most of the time; but in over a quarter of all cases, the lists were deemed to be of comparable quality and in almost 15% of judgments, the randomized result list was actually preferred to the original. The reasons for this are not quite easy to fathom. One simple explanation would be that the quality of the original list was just not good enough, so that a randomization might create a better sequence of results about one in six times.
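A preference distribution of the kind shown in Figure 9.16 can be derived from individual judgments as in the following sketch. The judgment labels, the sample data and the function name `preference_shares` are assumptions made for illustration; they do not reproduce the study's data or its exact rating scale.

```python
from collections import Counter

# Assumed labels for a three-way preference judgment.
LABELS = ("original", "equal", "randomized")


def preference_shares(judgments):
    """Return the share of judgments falling on each preference label."""
    counts = Counter(judgments)
    total = sum(counts[label] for label in LABELS)
    return {label: counts[label] / total for label in LABELS}


# Hypothetical judgments; NOT the study's data.
judgments = ["original"] * 6 + ["equal"] * 3 + ["randomized"] * 1
shares = preference_shares(judgments)
for label, share in shares.items():
    print(f"{label}: {share:.0%}")
```

Reporting all three shares, rather than only how often the original list wins, is what makes the "comparable quality" and "randomized preferred" cases visible at all.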

However, a detailed examination of queries where the randomized result list is preferred at least by some users shows another pattern. Almost all of those queries are informational or transactional and stated quite broadly, so that the number of possibly interesting hits can go into the thousands or tens of thousands; some examples are shown in Table 9.3. The case of the query “korrellation”, though not representative statistically, may nevertheless be typical. It is a factual query, for which any web site with an authoritative feel mentioning the word will probably be not only highly relevant, but also sufficient. The result list preference might be

[Footnote] Not that this notion needs much support, mind you; it is universally accepted.

–  –  –

The difference between result list types becomes smaller still if we switch from preference to satisfaction (shown in Figure 9.17). While the original result lists had a higher proportion of queries which satisfied most raters (an average satisfaction of over 0.5 was reached for 80% versus 64% of queries), the randomized option had a higher proportion of average-quality result lists. Result lists which entirely failed to satisfy the users were equally uncommon (17% and 18%, respectively). This can be taken to mean that, while the original result list is indeed more satisfactory, many (or,

[Footnote] In case you wonder: it’s spelled “Korrelation”.


