«On Search Engine Evaluation Metrics Inaugural-Dissertation zur Erlangung des Doktorgrades der Philosophie (Dr. Phil.) durch die Philosophische ...»

including all sessions, not just the ones where a user preference exists, and providing more details, it allows for a finer evaluation and the consideration of more precise goals. In Section 12.2.3, I mentioned personalization efforts; if such mechanisms could be implemented and tested, they might provide an important bridge between the relatively low results of different-user evaluation and same-user evaluation. In Section 12.2.4, I suggested a potentially useful study which would provide user groups with different relevance scales and different instructions. It could show whether an intuitive six-point relevance scale is really better than a binary or three-point one, and which subcategory of the latter scales produces the most accurate preference predictions.

Another interesting topic is log-based, and in particular click-based, metrics. In Chapter 11, I explained that the study layout was not suited for most click metrics proposed and used in recent years. They do not provide absolute scores for a result or result list; rather, given a result list and click data, they construct a result list that is postulated to be of better quality. Thus, we cannot take two result lists and compare them using, say, the “click skip above” model (see Chapter 5); instead, we have to take one result list, collect the log data, then construct a second result list, and obtain a user preference between the two lists.
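The difference between an absolute score and a relative preference can be illustrated with a minimal sketch. It assumes the common reading of the "click > skip above" strategy, in which a clicked result is taken to be preferred over every unclicked result ranked above it; the function name and the toy data are hypothetical, and Chapter 5 remains the reference for the model as used in this thesis.

```python
from typing import List, Set, Tuple


def skip_above_preferences(ranked: List[str], clicked: Set[str]) -> List[Tuple[str, str]]:
    """Return (preferred, less_preferred) pairs derived from clicks.

    Assumption: a clicked result is preferred over each unclicked
    result that was ranked above it (skipped by the user).
    """
    preferences = []
    for position, doc in enumerate(ranked):
        if doc in clicked:
            for above in ranked[:position]:
                if above not in clicked:
                    preferences.append((doc, above))
    return preferences


# Hypothetical result list and click log for one query.
ranked = ["a", "b", "c", "d"]
prefs = skip_above_preferences(ranked, {"c"})
no_prefs = skip_above_preferences(ranked, {"a"})
```

The output is a set of pairwise preferences within one list, not a score for that list, which is why two independently given result lists cannot be compared this way.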

Obviously, there are also possible research topics apart from those already mentioned in this study. Perhaps the most intriguing of them concerns result snippets. As mentioned earlier, one of the ratings required from the evaluators in the present study was whether the result descriptions they encountered in the result lists were “good”, that is, whether the user would click on this result for this query. I have not (yet) performed this evaluation, as it is a large, separate task with many possibilities of its own.

Snippets have been the object of some research lately; however, it has mostly focused on snippet creation (e.g. Turpin et al. 2007; Teevan et al. 2009). There has also been research on snippet-based evaluation (Lewandowski 2008; Höchstötter and Lewandowski 2009), which mostly focused on evaluating snippets on their own and on comparing ratings for snippets and documents. But there is more to be done. One possibility for which the PIR framework seems well suited is combining snippet and document judgments.

It is well established that the user only examines a small subset of available results, even within a cut-off rank of, say, ten. One method for determining which results will be clicked on is evaluating the snippets. It stands to reason that if a user does not regard a snippet as relevant, he will not click on it; he will not see the document itself; and he will not gain anything from the result. [113] Thus, the results with unattractive snippets will be assigned a relevance score of zero. This is an approach that has received attention (Turpin et al. 2009), although not as much as its possible impact would warrant.

[113] Except for the case where the snippet actually contains the sought-after information, as may be the case with factual queries. This case can be accounted for by using three snippet relevance categories: “unpromising”, “promising” and “useful on its own”.

In the next step, there are at least two variants of dealing with unattractive snippets. We can consider them as having no influence on the user at all, and discard all documents which will not be seen; in this case, the ranks of the following documents will move up. Or we can assume that the results do not provide any benefit to the user, and furthermore distract him from the possibly more useful results; then, we would just set the relevance scores of the documents to zero. PIR should be able to show which of these models (or perhaps some other) better predicts user preferences, and whether any of them performs better than the current model, which does not consider snippets at all.
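The two variants can be sketched as transformations of a rated result list. This is only an illustration under stated assumptions: each result is represented by a hypothetical pair of a document relevance score and a binary snippet judgment, and the function names are invented for the example.

```python
from typing import List, Tuple

# (document relevance, snippet judged attractive?) -- hypothetical representation
Result = Tuple[float, bool]


def discard_unseen(results: List[Result]) -> List[float]:
    """Variant 1: drop results with unattractive snippets.

    The following documents move up in rank.
    """
    return [relevance for relevance, attractive in results if attractive]


def zero_unseen(results: List[Result]) -> List[float]:
    """Variant 2: keep every rank position, but set the relevance of
    results with unattractive snippets to zero."""
    return [relevance if attractive else 0.0 for relevance, attractive in results]


# Toy result list: the second result has an unattractive snippet.
results = [(1.0, True), (0.8, False), (0.6, True)]
discarded = discard_unseen(results)
zeroed = zero_unseen(results)
```

Either transformed list could then be passed to any rank-based metric; under the first variant the third document is scored at rank 2, under the second it stays at rank 3.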

There are, of course, many more open questions out there, such as different result types (e.g. image or news search). I think that the framework introduced in this thesis, combining direct, session-based user opinion as a reference point and variation of as much data as possible to assess not only one metric, but as many parameters as needed, can help answer some of them.

–  –  –

In this thesis, I describe...

• an overview of the metrics used in search engine evaluation;

• the theoretical issues which can be raised about them;

• previous studies which evaluate evaluation metrics.

I introduce...

• a meta-evaluation measure, the Preference Identification Ratio (PIR), which captures a metric’s ability to correctly recognize explicitly stated user preferences;

• an evaluation method which varies metrics and parameters so that one set of data can be used to run dozens or hundreds of different evaluations;

• a new category of query, called “meta-query”, which requires information on the search engine itself and cannot produce a “good” or “bad” result list.
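The idea behind PIR can be illustrated with a simplified sketch. It is not the formal definition given earlier in the thesis; it assumes that PIR is the share of sessions with a stated user preference in which the metric scores the preferred list higher, and that a tie counts as half a hit (a coin flip). All names and the toy data are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# (ratings of the list the user preferred, ratings of the other list)
Session = Tuple[Sequence[float], Sequence[float]]


def preference_identification_ratio(
    sessions: List[Session],
    metric: Callable[[Sequence[float]], float],
) -> float:
    """Share of user preferences that the metric identifies correctly.

    Assumption: a tie between the two metric scores counts as 0.5.
    """
    if not sessions:
        raise ValueError("at least one session with a user preference is needed")
    hits = 0.0
    for preferred, other in sessions:
        difference = metric(preferred) - metric(other)
        if difference > 0:
            hits += 1.0
        elif difference == 0:
            hits += 0.5
    return hits / len(sessions)


def mean_relevance(ratings: Sequence[float]) -> float:
    """A deliberately simple stand-in metric: average rating of the list."""
    return sum(ratings) / len(ratings)


sessions = [
    ([1, 1, 0], [0, 0, 0]),  # metric agrees with the user
    ([0, 1], [1, 1]),        # metric disagrees
    ([1, 0], [0, 1]),        # metric cannot decide
]
pir = preference_identification_ratio(sessions, mean_relevance)
```

Because the metric is passed in as a function, the same preference data can be reused to evaluate many metrics and parameter settings, which is the point of the evaluation method listed above.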

I find that...

• randomizing the top 50 results of a leading search engine (Yahoo) leads to result lists that are regarded as equal or superior to the original ones for over 40% of queries;

• after the first five ranks, further results of a leading search engine (Yahoo) are on average no better than the average of the first 50 results;

• a cut-off rank slightly smaller than 10 not only reduces the data-gathering effort but in most cases actually improves the evaluation accuracy;

• the widely-used Mean Average Precision metric is in most cases a poor predictor of user preference, worse than Discounted Cumulated Gain and not better than Precision;

• the kind of discount function employed in a metric can be crucial for its ability to predict user preference;

• a six-point relevance scale intuitively understandable to raters produces better results than a binary or three-point scale;

• session duration and click count on their own are not useful in predicting user preference.

I find preliminary evidence that...

• (normalized) Discounted Cumulative Gain and a variety of Estimated Search Length are, with appropriate discount functions, the best predictors of user preference;

• if a binary or three-point relevance scale is used, certain rater instructions as to what the single ratings mean can significantly influence user preference prediction quality;

• depending on the metric and other evaluation parameters, cut-off ranks as low as 4 may provide the best effort-quality ratio.
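The role of the discount function and the cut-off rank can be made concrete with a short sketch. It uses one common formulation of Discounted Cumulative Gain (each rating multiplied by a rank-dependent weight and summed up to the cut-off); the three discount functions and their names are illustrative assumptions and not necessarily the ones evaluated in this thesis.

```python
import math
from typing import Callable, Sequence


def log2_discount(rank: int) -> float:
    """Common logarithmic discount: 1 / log2(rank + 1)."""
    return 1.0 / math.log2(rank + 1)


def reciprocal_discount(rank: int) -> float:
    """Steeper discount: 1 / rank."""
    return 1.0 / rank


def no_discount(rank: int) -> float:
    """No discount: every rank within the cut-off counts fully."""
    return 1.0


def dcg(
    ratings: Sequence[float],
    discount: Callable[[int], float] = log2_discount,
    cutoff: int = 10,
) -> float:
    """Sum of discounted ratings for ranks 1..cutoff."""
    return sum(
        rating * discount(rank)
        for rank, rating in enumerate(ratings[:cutoff], start=1)
    )


# The same toy ratings scored with different discounts and cut-offs.
ratings = [3, 2, 0, 1, 2]
scores = {
    "log2@5": dcg(ratings, log2_discount, cutoff=5),
    "reciprocal@5": dcg(ratings, reciprocal_discount, cutoff=5),
    "none@4": dcg(ratings, no_discount, cutoff=4),
}
```

Varying `discount` and `cutoff` over one set of ratings is a small-scale version of the parameter variation described above; which combination best predicts user preference is an empirical question.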

–  –  –

Al-Maskari, A., M. Sanderson and P. Clough (2007). The Relationship Between IR Effectiveness Measures and User Satisfaction. Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Amsterdam, The Netherlands, ACM: 773-774.

Al-Maskari, A., M. Sanderson, P. Clough and E. Airio (2008). The Good and the Bad System: Does the Test Collection Predict Users' Effectiveness? Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Singapore, Singapore. New York, ACM: 59-66.

Ali, K., C.-C. Chang and Y. Juan (2005). Exploring Cost-Effective Approaches to Human Evaluation of Search Engine Relevance. Advances in Information Retrieval. Berlin; Heidelberg, Springer: 360-374.

Allan, J., B. Carterette and J. Lewis (2005). When Will Information Retrieval Be "Good Enough"? Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Salvador, Brazil. New York, ACM: 433-440.

Alpert, J. and N. Hajaj (2008). We Knew the Web Was Big... Retrieved 2010-10-04, from http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html.

archive.org (2006). Wayback Machine - dmoz.org 2005-06-22. Retrieved 2010-10-05, from http://web.archive.org/web/20050622050456/http://www.dmoz.org/.

Asano, Y., Y. Tezuka and T. Nishizeki (2008). Improvements of HITS Algorithms for Spam Links. Proceedings of the joint 9th Asia-Pacific web and 8th international conference on web-age information management conference on Advances in data and web management, Huang Shan, China, Springer-Verlag: 200-208.

Baeza-Yates, R. (2004). Query Usage Mining in Search Engines. Web Mining: Applications and Techniques. A. Scime (ed.). Hershey, PA, IGI Publishing: 307-321.

Baeza-Yates, R., C. Hurtado and M. Mendoza (2005). Query Recommendation Using Query Logs in Search Engines. Current Trends in Database Technology - EDBT 2004 Workshops. W. Lindner, M. Mesiti, C. Türker, Y. Tzitzikas and A. Vakali (ed.). New York, Springer: 395-397.

Bailey, P., N. Craswell, I. Soboroff, P. Thomas, A. P. d. Vries and E. Yilmaz (2008). Relevance Assessment: Are Judges Exchangeable and Does It Matter. Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Singapore, Singapore, ACM: 667-674.

Barbaro, M. and T. Zeller Jr. (2006). A Face Is Exposed for AOL Searcher No. 4417749. Retrieved 31.10.2011.

Barboza, D. (2010). Baidu’s Gain from Departure Could Be China’s Loss. The New York Times. New York: B1.

Berman DeValerio (2010). Cases: AOL Privacy. Retrieved 31.10.2011.

Breithut, J. (2011). Drei gegen Google. Retrieved 30.10.2011.

Brin, S. and L. Page (1998). The Anatomy of a Large-scale Hypertextual Web Search Engine. Computer Networks and ISDN Systems 30(1-7): 107-117.

Broder, A. (2002). A Taxonomy of Web Search. SIGIR Forum 36(2): 3-10.

Buckley, C. and E. M. Voorhees (2000). Evaluating Evaluation Measure Stability. Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Athens, Greece, ACM: 33-40.

Buckley, C. and E. M. Voorhees (2004). Retrieval Evaluation with Incomplete Information. Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Sheffield, United Kingdom, ACM: 25-32.

Carbonell, J. and J. Goldstein (1998). The Use of MMR, Diversity-based Reranking for Reordering Documents and Producing Summaries. Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Melbourne, Australia, ACM: 335-336.

Chapelle, O., D. Metzler, Y. Zhang and P. Grinspan (2009). Expected Reciprocal Rank for Graded Relevance. Proceeding of the 18th ACM Conference on Information and Knowledge Management. Hong Kong, China, ACM: 621-630.

Chu, H. (2011). Factors Affecting Relevance Judgment: A Report from TREC Legal Track. Journal of Documentation 67(2): 264-278.

Clarke, C. L. A., N. Craswell and I. Soboroff (2009). Overview of the TREC 2009 Web Track. TREC 2009, Gaithersburg, Maryland, NIST.

Clarke, C. L. A., M. Kolla, G. V. Cormack, O. Vechtomova, A. Ashkan, S. Büttcher and I. MacKinnon (2008). Novelty and Diversity in Information Retrieval Evaluation. Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Singapore, Singapore, ACM: 659-666.

Clarke, S. J. and P. Willett (1999). Estimating the Recall Performance of Web Search Engines. Aslib Proceedings 49(7): 184-189.

Cleverdon, C. W. and M. Keen (1968). Factors Determining the Performance of Indexing Systems. Cranfield, England, Aslib Cranfield Research Project.

CNET News (2006). Yahoo's Steady Home Page Transformation. Retrieved 2010-10-05, from http://news.cnet.com/2300-1032_3-6072801.html.

comScore (2009). comScore Releases June 2009 U.S. Search Engine Rankings. Retrieved 23.10.2009, from http://www.comscore.com/Press_Events/Press_releases/2009/7/comScore_Releases_June_2009_U.S._Search_Engine_Rankings.

comScore (2010). Global Search Market Draws More than 100 Billion Searches per Month. Retrieved 2010-10-05, from http://www.comscore.com/Press_Events/Press_Releases/2009/8/Global_Search_Market_Draws_More_than_100_Billion_Searches_per_Month.

Cooper, W. S. (1968). Expected Search Length: A Single Measure of Retrieval Effectiveness Based on the Weak Ordering Action of Retrieval Systems. American Documentation 19(1): 30-41.

Craswell, N., O. Zoeter, M. Taylor and B. Ramsey (2008). An Experimental Comparison of Click Position-bias Models. Proceedings of the International Conference on Web Search and Web Data Mining, Palo Alto, California, USA, ACM.

Croft, B., D. Metzler and T. Strohman (2010). Search Engines: Information Retrieval in Practice, Addison-Wesley Publishing Company.

Dang, H. T., J. Lin and D. Kelly (2006). Overview of the TREC 2006 Question Answering Track. Proceedings of the 15th Text REtrieval Conference, Gaithersburg, Maryland.

Davison, B. D., A. Gerasoulis, K. Kleisouris, Y. Lu, H.-j. Seo, W. Wang and B. Wu (1999). DiscoWeb: Applying Link Analysis to Web Search. Poster Proceedings of the Eighth International World Wide Web Conference, Elsevier: 148-149.

De Beer, J. and M.-F. Moens (2006). Rpref: A Generalization of Bpref Towards Graded Relevance Judgments. Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Seattle, Washington, USA, ACM: 637-638.

de Kunder, M. (2010). The Size of the World Wide Web. Retrieved 2010-10-04, from http://www.worldwidewebsize.com/.

Della Mea, V., G. Demartini, L. Di Gaspero and S. Mizzaro (2006). Measuring Retrieval Effectiveness with Average Distance Measure (ADM). Information: Wissenschaft und Praxis 57(8): 433-443.

Della Mea, V., L. Di Gaspero and S. Mizzaro (2004). Evaluating ADM on a Four-level Relevance Scale Document Set from NTCIR. Proceedings of NTCIR Workshop 4 Meeting - Supplement 2: 30

Diaz, A. (2008). Through the Google Goggles: Sociopolitical Bias in Search Engine Design. Web Search. A. Spink and M. Zimmer (ed.). Berlin, Heidelberg, Springer. 14: 11-34.

Dou, Z., R. Song, X. Yuan and J.-R. Wen (2008). Are Click-through Data Adequate for Learning Web Search Rankings? Proceeding of the 17th ACM Conference on Information and Knowledge Management. Napa Valley, California, USA, ACM: 73-82.
