FREE ELECTRONIC LIBRARY - Dissertations, online materials

Pages:     | 1 |   ...   | 9 | 10 || 12 | 13 |   ...   | 19 |

«Learning Implicit User Interest Hierarchy for Web Personalization by Hyoung-rae Kim A dissertation submitted to Florida Institute of Technology in ...»

-- [ Page 11 ] --

To understand the difficulty of improving Google’s ranking, we calculate the number of multiples needed to achieve one SD from the average. Formally, the number of multiples is defined as (Avg.+SD)/Avg. The larger the number of multiples indicates more difficulty in beating Google with statistically significance. The number of multiples for Top 1 is 3.35, Top 5 is 2.79, Top 10 is 2.72, Top 15 is 2.71, and Top 20 is 2.72. In information retrieval doubling or tripling the precision for a large variance like Google’s is rare. From our calculated number of multiples, we need to at least double or triple the precision to achieve statistically significant improvement over Google’s ranking.

To further demonstrate the difficulty, we applied the same t-Test to precision values from Google and Random (P=0.134, t=2.079 for Top 1; P=0.179, t=2.079 for Top 5; P=0.062, t=2.079 for Top 10; P=0.035, t=2.079 for Top 15; P=0.024, t=2.079 for Top 20). We found that, though Google’s improvement over random is statistically significant for Top 15 and 20, it is not statistically significant for Top 1, 5, and 10. Precision/Recall Analysis Precision/recall analysis visualizes the performance of each method in graphs as shown in Figure 23. The x-axis is recall and y-axis is precision. The line closer to the upper-right corner has higher performance. WS and US are closer to the upper-right corner than Google except with recall values lower than.15 (after Top 5). In general, WS outperforms US and Random. Varying Personal Weight The performance of WS may depend on how much we weigh the personal page score over the public page score. The parameter c in Section 6.2.4 represents the weight for the personal page score. For example, c=0.9 means the page is scored by 90% of personal page score and 10% of public page score. We experimented with c={0.9, 0.7, 0.5, 0.3, and 0.1} and measured the corresponding precision and recall. The results are plotted in Figure 24 and each line represents a different c value. The line closer to the upper right corner indicates higher performance. c=0.9 has lowest precision. c=0.1 achieved the second lowest precision except for recall values lower than 0.2. Figure 25 enlarges the scale of recall between 0 through 0.2 in Figure 24. It is still not clear which one is higher than the others except the line with c=0.9; however, c=0.5 in general seems to show the highest

–  –  –

potentially interesting web page data set and present the results in Table 21. The values in each cell are the average of 22 search terms’ precision values. The ways of reading this table and the table for interesting web pages are similar.

WS showed higher performances than Google in all five Top links. All five precisions achieved by WS are the highest values as well. The percentages of improvements are between 3% and 17%. Random showed the lowest in all five Top links.

The reason for the improvement of WS is, we predict, because the UIH that was derived from a user’s bookmarks supported the user’s potential interest. It might be difficult for Google that used the global/public interest to predict individual user’s broad potential interests.

We also counted what search terms yielded higher precision with WS than with Google. WS achieved higher performance for 12 search terms (55%), Google made for 8 search terms (36%), and they were even for 2 search terms (9%) out of 22 search terms.

The reason for the low performance of some search terms might be because there is no relation between his/her bookmarks and the search terms. Statistical Significance The t-Test between the two groups of 22 individual search terms with WS and Google showed no statistically significant difference with 95% confidence for any Top link (P=0.665 and t=2.079 for Top 1; P=0.115 and t=2079 for Top 5; P=0.466 and t=2.079 for Top 10; P=0.580 and t=2.079 for Top 15; P=0.347 and t=2.079 for Top 20).

We analyze the variance in the precision values to understand why the improvements are not statistically significant. The graph in Figure 26 illustrates the average and the standard deviation (SD) of 22 search terms’ precisions with Google to the five Top links. Variance was large for Top 1 and decreased as more links were considered.

In order to estimate how difficult it is to achieve statistically significant improvements over Google, we calculate the number of multiples, which is (Avg.+SD)/Avg. The number of multiples for Top 1 is 2.77, for Top 5 is 2.53, for Top 10 is 2.59, for Top 15 is 2.58, for Top 20 is 2.60. From our calculated number of multiples, we need to at least double the precision for achieving statistically significant improvement on potentially interesting web pages. Precision/Recall Analysis The results from precision/recall graph for potentially interesting web pages in Figure 27 and the Top link analysis in Table 21 are similar. WS was closer to the upperright corner than Google, US, and Random over all. WS outperformed other methods on

–  –  –

In order to see the improvement of WC’s performance with different personal weights, we varied the parameter for the weight: c = {0.9, 0.7, 0.5, 0.3, and 0.1}. The results are plotted in Figure 28. c=0.9 draw the lowest line; c=0.7 looks the second lowest line. Figure 29 enlarges the scale of recall in Figure 28. It looks clear that c=0.5 in general seems to show the highest performance. Therefore, we chose c=0.5 for the weight of personal page score. The weights for personal page score with both data sets are the same.

5.5. Summary The purpose of this research is to devise a new method of ranking web search results to serve each individual user’s interests. A user profile called UIH is learned from his/her bookmarks. For scoring a term in a web page that matches a term in the UIH, we identified four characteristics: the depth of tree node in the UIH that contains the term, the length of the term, the frequency of the term in the web page, and the html formatting used for emphasis. Our approach uses the terms filtered though stop list in web pages (Rasmussen, 1992). This approach removes the process of selecting important/significant terms unlike other information retrieval techniques (Pazzani and Billsus, 1997). Therefore, we can handle smaller data set and reduce the danger of eliminating new important terms.

We evaluated methods based on how many interesting web pages or potentially interesting web pages each algorithm found within certain number of top links (Bharat and Mihaila, 2001). Traditional precision/recall graphs (van Rijsbergen, 1979) were also used for evaluation. We counted which search term showed higher performances with WS than with Google as well.

We compared four ranking methods: Google, Random, US, and WS. Google is the most popular search engine and posts the best ordering results currently. Random method was chosen to see the improved performance of Google and our new methods. We used two data sets: interesting web pages that are relevant to the user search term and potentially interesting web pages that could be relevant in the future. On interesting web pages, the Top link analysis indicated WS achieved at least 13% higher precision than Google for Top 10, 15 and 20 links on average. WS outperformed US and Random in general also.

The precision/recall analysis showed that WS outperformed Google except with recall values lower than.15. Out of 22 search terms, WS achieved higher precision than Google for 12 search terms (55%). On potentially interesting web pages, WS achieved the highest performance in all five Top links with improvement over Google between 3% and 17%. It also outperformed the other methods in the precision/recall graph. The analysis of individual search terms yielded the same results as on interesting web pages. A weight of

0.5 for the personal ranking seemed to show the highest performance on both data sets.

Therefore, these results conclude that WS can provide more accurate ranking than Google on average. The improvement of WS was not statistically significant because the precision values of Google had large variance. The reason for the low performance of some search terms might be because there is no relation between his/her bookmarks and the search terms. We may be able to relieve this problem by incorporating interesting web pages based on implicit interest indicators such as mouse movements (Kim and Chan, 2005) in addition to bookmarking.

During the experiment, we observed that users do not tend to measure index pages as "Good". It is because index pages usually contain long lists of hyperlinks with little description for a user to find interesting. To identify index pages automatically, we count the number of "outside words" (the text outside anchor tags), which usually provide the subject content. However, our approach of penalizing the index pages did not make much improvement in our initial experiments. We will examine this approach further in the future.

Measuring the precision with clustered search results like the results from Vivisimo (2005) may show different performance from Google’s. In a clustered search engine, a link that does not belong to the top 10 in whole can belong to the top 10 in some sub clusters. The clustered search results provide users easier access to the interesting links after Top 10 or 20. Since WS showed higher performance for those links than Google as shown in Section, we assume that our method may get higher performance with clustered search engines. We may be able to make a search engine more interactive using a UIH. For example when a user’s query resides in an intermediate node in his/her UIH, we can ask a user to choose more specific interests providing the terms in the child nodes, or

–  –  –

Implicit Indicators for Interesting Web Pages A user’s interest in a web page can be estimated by unobtrusively (implicitly) observing his or her behaviour rather than asking for feedback directly (explicitly). Implicit methods are naturally less accurate than explicit methods, but they do not waste a user’s time or effort. Implicit indicators of a user’s interests can also be used to create models that change with a user’s interests over time. Research has shown that a user’s behaviour is related to his/her interest in a web page. We evaluate previously studied implicit indicators and examine the time spent on a page in more detail. For example, we observe whether a user is really looking at the monitor when we measure the time spent on a web page. Our results indicate that the duration is related to a user’s interest of a web page regardless a user’s attention to the web page. The thicker features in Figure 30 describe the position of

–  –  –

Figure 30. Implicit indicator for interesting web pages

6.1. Implicit Interest Indicators The time spent on a web page is one of the most intuitive candidates for user interest indicators. This paper thoroughly examines whether duration is related to a user’s interest. This section describes duration, as well as other user interest indicators that will be examined. The reason why each indicator is chosen is explained and how each indicator is measured is described.

6.1.1. Complete Duration A user may tend to spend more time on pages that he or she finds interesting, so we record the duration spent on a web page. The complete duration is defined as the time interval between the time a user opens and leaves a web page. Some web pages contain many images that delay the downloading time, so we start measuring the duration after the entire page is loaded. Thus, the complete duration won’t be affected by the connection speed, the amount of Internet traffic, or the CPU speed. The complete duration for a web page can be calculated by subtracting the time of finishing downloading the current web page from the time of leaving the web page. The complete duration is different from the duration used by Jung (2001). His duration includes the downloading time of a web page.

6.1.2. Active Window Duration Most modern operating systems allow a user to multitask, or run several applications at the same time. A user may write a report or chat while browsing a web page. Those other applications can be unrelated to the contents of a web page. If a user spent one hour writing a homework paper with a web browser minimized, the complete duration of the web page could be one hour. This is very likely to provide erroneous indications of user interest. In order to avoid being affected by this problem, we determine whether a web browser is active or not. The time that a web browser is inactive is subtracted from the complete duration. We call this duration active window duration since we count the time only when a web browser is active.

6.1.3. Look At It Duration Users are not always reading a web page when the web browser is active. They can easily be talking to friends or having a coffee break, while the web browser is active. The active window duration can easily be more than 30 minutes if a user leaves the browser active and goes for a coffee break. We may be able to detect the user’s absence by detecting the action of mouse movement. However, a better solution is to use a camera that detects a user’s face orientation. A camera can even check if a user is looking at the web browser or if his attention is diverted. This duration will be more accurate than the active window duration in terms of checking user’s attention to a web page. Since this duration counts the time that a user is looking at the web browser, we call it look at it duration. The look at it duration can be calculated by subtracting the time when a user does not look at the browser from active window duration.

6.1.4. Distance of Mouse Movement Many people move their mouse while reading the contents of a web page. Mouse movement can occur while looking at an interesting image, or when pointing at interesting objects. We hypothesize that the more distance a mouse moves, the more a user be interested in the web page. This indicator was also examined by Jung (2001). Our distance is a little bit different from his in a sense of detecting overall mouse movement. He counted on the mouse movement only when the mouse point is inside the active browser. The distance of mouse movement is detected by its x and y coordinates on a monitor every 100 milliseconds. The formula is

–  –  –

Pages:     | 1 |   ...   | 9 | 10 || 12 | 13 |   ...   | 19 |

Similar works:

«Voluntary Involuntary Disclosure By Randall J. Hucks A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Business Administration) in the University of Michigan 2015 Doctoral Committee: Professor Raffi J. Indjejikian, Chair Associate Professor Venkatesh K. Nagar Professor Amiyatosh K. Purnanandram Associate Professor Catherine Shakespeare Professor J. Frank Yates Acknowledgements I thank the members of my dissertation committee: Raffi...»

«Masterarbeit zur Erlangung des akademischen Grades Master of Arts der Philosophischen Fakultät der Universität Zürich Die Hausnamen der Schaffhauser Altstadt Verfasserin: Linda Hatt Matrikel-Nr.: 08-716-896 Referentin: Prof. Dr. Elvira Glaser Betreuer: Dr. Martin Graf Deutsches Seminar Abgabedatum: 15.12.2014 Masterarbeit Universität Zürich Hausnamen der Schaffhauser Altstadt Deutsches Seminar Linda Hatt Prof. Dr. E. Glaser / Dr. M. Graf Inhaltsverzeichnis 1. EINLEITUNG 1.1. VORWORT...»

«Techniques for Model Inference and Bug Finding by Chia Yuan Cho A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate Division of the University of California, Berkeley Committee in charge: Professor Dawn Xiaodong Song, Chair Professor Sanjit A. Seshia Professor John Chuang Fall 2013 Techniques for Model Inference and Bug Finding Copyright 2013 by Chia Yuan Cho 1 Abstract Techniques for Model Inference and...»

«Analyzing Challenges and Opportunities of the Implementation of e-Government Initiatives for Development through the Lens of the Capability Approach: Case Studies from Mozambique By: Gertrudes Adolfo Macueve Submitted as partial fulfilment of the requirements for the degree of Doctor of Philosophy (Ph. D) Faculty of Mathematics and Natural Sciences Department of Informatics University of Oslo Norway August 14, 2008 Dedication To my parents, Deolinda Noé and Adolfo A. Macueve TABLE OF CONTENTS...»

«Ghrelin The Defender of Fat in the Face of Stress Implications in Obesity Treatment Zachary R. Patterson A thesis submitted to the Faculty of Graduate & Postdoctoral Affairs in partial fulfillment of the requirements of the degree of Doctor of Philosophy in Neuroscience Carleton University Ottawa, ON Copyright © 2013 Zachary R. Patterson ii 0.1 Abstract Chronic activation of the stress response can lead to a number of metabolic disturbances such as obesity, metabolic syndrome, Type II...»

«EXPLORING THE DIVERSITY OF GENTRIFICATION AND THE ROLE OF GENDER IN HONG KONG, 1986 TO 2006 By Minting Ye A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Geography –Doctor of Philosophy 2014 ABSTRACT EXPLORING THE DIVERSITY OF GENTRIFICATION AND THE ROLE OF GENDER IN HONG KONG, 1986 TO 2006 By Minting Ye Gentrification is restructuring the geography of cities all over the world (Clark, 2005; Hackworth and Smith, 2001; Lees,...»

«DEVELOPMENT OF LASER SPECTROSCOPY FOR ELEMENTAL AND MOLECULAR ANALYSIS by YUAN LIU B.S. Information Engineering, Zhejiang University, 2006 M.S. Optics, University of Central Florida, 2010 A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the College of Optics and Photonics at the University of Central Florida Orlando, Florida Spring Term 2013 Major Professor: Martin C. Richardson © 2013 Yuan Liu ii ABSTRACT Laser-Induced Breakdown...»

«Clear and Distinct Perception in Descartes's Philosophy by Shoshana Rose Smith B.A. University of California Los Angeles, 1995 A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Philosophy in the GRADUATE DIVISION of the UNIVERSITY OF CALIFORNIA, BERKELEY Committee in charge: Professor Janet Broughton, Chair Professor Hannah Ginsborg Professor Anthony A. Long Spring 2005 Clear and Distinct Perception in Descartes's Philosophy Copyright...»

«DYNAMIC RESOURCE MANAGEMENT IN RESOURCE-OVERBOOKED CLOUD DATA CENTERS By Faruk Caglar Dissertation Submitted to the Faculty of the Graduate School of Vanderbilt University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY in Computer Science August, 2015 Nashville, Tennessee Approved: Dr. Aniruddha S. Gokhale Dr. Douglas C. Schmidt Dr. Gautam Biswas Dr. Christopher J. White Dr. Akos Ledeczi To my beloved wife Fatma for her patience, encouragement, and support...»

«Top ten blogs November 2015 by Bart Nooteboom 19. Beyond nihilism: Imperfection on the move published 10-8-2012 The philosopher Nietzsche dealt a death blow to belief in old absolutes, raising the spectre of nihilism. God is dead, and truth, morality and beauty have become subjective, relative and evanescent. There is weak nihilism: regretful loss of belief, and strong nihilism: no longer seeing such belief as desirable. Could we not see the loss of old absolutes as a discarding of shackles, an...»

«Terraforming: An Investigation of the Boundaries Between Science and Hard Science Fiction A DISSERTATION SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY Peter Allon Schmidt, Jr.IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY Robert Seidel, Advisor May 2010 © Peter Allon Schmidt, Jr. 2010 i Acknowledgements Outside of numerous friends, family and colleagues who have helped me along in the completion of this project, three in...»

«University of Alberta Stone Bodies in the City: Unmapping Monuments, Memory and Belonging in Ottawa by Tonya Katherine Davidson A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree of Doctor of Philosophy Sociology ©Tonya Katherine Davidson Fall 2012 Edmonton, Alberta Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private,...»

<<  HOME   |    CONTACTS
2016 www.dissertation.xlibx.info - Dissertations, online materials

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.