# Learning Implicit User Interest Hierarchy for Web Personalization

Hyoung-rae Kim. A dissertation submitted to Florida Institute of Technology ...

To understand the difficulty of improving on Google’s ranking, we calculate the number of multiples needed to reach one SD above the average. Formally, the number of multiples is defined as (Avg. + SD)/Avg. A larger number of multiples indicates greater difficulty in beating Google with statistical significance. The number of multiples is 3.35 for Top 1, 2.79 for Top 5, 2.72 for Top 10, 2.71 for Top 15, and 2.72 for Top 20. In information retrieval, doubling or tripling the precision against a variance as large as Google’s is rare. Hence, by our calculated numbers of multiples, the precision would need to at least double or triple to achieve a statistically significant improvement over Google’s ranking.
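The number-of-multiples calculation above can be sketched directly; the precision values below are illustrative only, not the experiment’s data:

```python
from statistics import mean, stdev

def number_of_multiples(precisions):
    """(Avg. + SD) / Avg. -- how many times Google's average precision a
    competing method would need to reach one SD above Google's mean."""
    avg = mean(precisions)
    sd = stdev(precisions)  # sample standard deviation
    return (avg + sd) / avg

# Illustrative per-search-term precision values for one Top-N cutoff.
sample = [0.2, 0.8, 0.0, 0.4, 1.0, 0.1, 0.6]
number_of_multiples(sample)
```

A large ratio means the average precision must be multiplied by that factor before the improvement clears one standard deviation of Google’s scores.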

To further demonstrate the difficulty, we applied the same t-Test to the precision values from Google and Random (P=0.134, t=2.079 for Top 1; P=0.179, t=2.079 for Top 5; P=0.062, t=2.079 for Top 10; P=0.035, t=2.079 for Top 15; P=0.024, t=2.079 for Top 20). We found that, though Google’s improvement over Random is statistically significant for Top 15 and 20, it is not for Top 1, 5, and 10.
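With 22 search terms the critical value t = 2.079 corresponds to df = 21 at 95% confidence, which suggests a per-term comparison. A minimal paired-samples version is sketched below with hypothetical data; the dissertation does not spell out the exact t-Test variant here, so the paired form is our assumption:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(a, b):
    """Paired-samples t statistic over two equal-length lists of
    per-search-term precision values."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n))

# Hypothetical per-term precisions for two ranking methods.
google = [0.4, 0.2, 0.6, 0.8, 0.0, 0.5]
random_ = [0.1, 0.2, 0.3, 0.4, 0.0, 0.2]
t = paired_t(google, random_)
# Compare |t| against the critical value for df = n - 1 at 95% confidence.
```

If |t| exceeds the critical value (2.079 for df = 21), the difference between the two methods is statistically significant at the 95% level.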

5.4.1.3. Precision/Recall Analysis
Precision/recall analysis visualizes the performance of each method in graphs, as shown in Figure 23. The x-axis is recall and the y-axis is precision; a line closer to the upper-right corner indicates higher performance. WS and US are closer to the upper-right corner than Google, except for recall values lower than 0.15 (after Top 5). In general, WS outperforms US and Random.

5.4.1.4. Varying Personal Weight
The performance of WS may depend on how much weight the personal page score receives relative to the public page score. The parameter c in Section 6.2.4 represents the weight of the personal page score; for example, c=0.9 means a page is scored by 90% personal page score and 10% public page score. We experimented with c = {0.9, 0.7, 0.5, 0.3, 0.1} and measured the corresponding precision and recall. The results are plotted in Figure 24, where each line represents a different c value; a line closer to the upper-right corner indicates higher performance. c=0.9 has the lowest precision, and c=0.1 has the second lowest except for recall values lower than 0.2. Figure 25 enlarges the recall scale between 0 and 0.2 in Figure 24. Apart from the line for c=0.9, it is still not clear which line is higher than the others; however, c=0.5 in general seems to show the highest performance.
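The blend controlled by c can be written as a simple convex combination. The function below is a sketch with our own variable names, not the dissertation’s notation:

```python
def combined_score(personal, public, c=0.5):
    """Blend personal and public page scores; c is the personal weight.
    c=0.9 means 90% personal page score and 10% public page score."""
    return c * personal + (1 - c) * public

# e.g. personal score 0.8, public (search-engine) score 0.4, equal weighting:
combined_score(0.8, 0.4, c=0.5)
```

Varying c from 0.9 down to 0.1 shifts the ranking from almost purely personal to almost purely public, which is exactly the sweep plotted in Figure 24.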

potentially interesting web page data set and present the results in Table 21. Each cell is the average of the 22 search terms’ precision values. This table is read in the same way as the table for interesting web pages.

WS showed higher performance than Google for all five Top links, and all five precision values achieved by WS are the highest. The improvements range from 3% to 17%. Random showed the lowest precision for all five Top links.

We conjecture that WS improves because the UIH derived from a user’s bookmarks captures the user’s potential interests. It may be difficult for Google, which relies on global/public interest, to predict an individual user’s broad potential interests.

We also counted which search terms yielded higher precision with WS than with Google. Out of 22 search terms, WS achieved higher precision for 12 (55%), Google for 8 (36%), and the two tied on 2 (9%).

The low performance on some search terms might be because there is no relation between the user’s bookmarks and those search terms.

5.4.2.2. Statistical Significance
The t-Test between the two groups of 22 individual search terms with WS and Google showed no statistically significant difference at 95% confidence for any Top link (P=0.665 and t=2.079 for Top 1; P=0.115 and t=2.079 for Top 5; P=0.466 and t=2.079 for Top 10; P=0.580 and t=2.079 for Top 15; P=0.347 and t=2.079 for Top 20).

We analyzed the variance in the precision values to understand why the improvements are not statistically significant. Figure 26 illustrates the average and standard deviation (SD) of the 22 search terms’ precisions with Google for the five Top links. The variance was largest for Top 1 and decreased as more links were considered.

To estimate how difficult it is to achieve a statistically significant improvement over Google, we again calculate the number of multiples, (Avg. + SD)/Avg. The number of multiples is 2.77 for Top 1, 2.53 for Top 5, 2.59 for Top 10, 2.58 for Top 15, and 2.60 for Top 20. By these numbers, the precision would need to at least double to achieve a statistically significant improvement on potentially interesting web pages.

5.4.2.3. Precision/Recall Analysis
The precision/recall graph for potentially interesting web pages in Figure 27 agrees with the Top link analysis in Table 21: WS was closer to the upper-right corner than Google, US, and Random overall, outperforming the other methods.

To see how WS’s performance varies with different personal weights, we varied the weight parameter c = {0.9, 0.7, 0.5, 0.3, 0.1}. The results are plotted in Figure 28: c=0.9 draws the lowest line, and c=0.7 appears to be the second lowest. Figure 29 enlarges the recall scale of Figure 28; there, c=0.5 in general shows the highest performance. Therefore, we chose c=0.5 as the weight of the personal page score. The weights for the personal page score are the same for both data sets.

5.5. Summary
The purpose of this research is to devise a new method of ranking web search results to serve each individual user’s interests. A user profile called the UIH is learned from the user’s bookmarks. For scoring a term in a web page that matches a term in the UIH, we identified four characteristics: the depth of the UIH tree node that contains the term, the length of the term, the frequency of the term in the web page, and the HTML formatting used for emphasis. Our approach uses the terms in web pages filtered through a stop list (Rasmussen, 1992). This removes the step of selecting important/significant terms required by other information retrieval techniques (Pazzani and Billsus, 1997); therefore, we can handle smaller data sets and reduce the danger of eliminating new important terms.
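The stop-list filtering step can be sketched as below; the stop list is a toy example (real systems use much larger lists), and the dissertation’s own formula for combining the four scoring characteristics is not reproduced here:

```python
# Toy stop list; production lists contain hundreds of function words.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}

def filter_terms(text):
    """Keep every term not in the stop list, avoiding a separate
    'significant term' selection step."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

filter_terms("The depth of a node in the UIH")  # -> ['depth', 'node', 'uih']
```

Because every non-stop-list term survives, newly appearing terms are never discarded by a significance threshold, which is the advantage the summary describes.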

We evaluated methods by how many interesting or potentially interesting web pages each algorithm found within a certain number of top links (Bharat and Mihaila, 2001). Traditional precision/recall graphs (van Rijsbergen, 1979) were also used for evaluation, and we counted which search terms showed higher performance with WS than with Google.

We compared four ranking methods: Google, Random, US, and WS. Google is the most popular search engine and currently produces the best ordering of results; Random was included as a baseline against which to judge the improvement of Google and our new methods. We used two data sets: interesting web pages, which are relevant to the user’s search term, and potentially interesting web pages, which could become relevant in the future. On interesting web pages, the Top link analysis indicated that WS achieved at least 13% higher precision than Google for the Top 10, 15, and 20 links on average. WS also generally outperformed US and Random.

The precision/recall analysis showed that WS outperformed Google except for recall values lower than 0.15. Out of 22 search terms, WS achieved higher precision than Google for 12 (55%). On potentially interesting web pages, WS achieved the highest performance for all five Top links, with improvements over Google between 3% and 17%. It also outperformed the other methods in the precision/recall graph, and the analysis of individual search terms yielded the same results as on interesting web pages. A weight of 0.5 for the personal page score seemed to show the highest performance on both data sets.

These results suggest that, on average, WS can provide more accurate ranking than Google, although the improvement was not statistically significant because the precision values of Google had large variance. The low performance on some search terms might be due to a lack of relation between the user’s bookmarks and those search terms. We may be able to alleviate this problem by incorporating interesting web pages identified through implicit interest indicators, such as mouse movements (Kim and Chan, 2005), in addition to bookmarking.

During the experiments, we observed that users tend not to rate index pages as "Good", because index pages usually contain long lists of hyperlinks with little description for a user to find interesting. To identify index pages automatically, we count the number of "outside words" (the text outside anchor tags), which usually carries the subject content. However, penalizing index pages did not yield much improvement in our initial experiments; we will examine this approach further in the future.
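A rough version of the outside-words count can be built on Python’s standard `html.parser`; the class and function names here are ours, and the dissertation does not specify a threshold for declaring a page an index page:

```python
from html.parser import HTMLParser

class OutsideWordCounter(HTMLParser):
    """Count words in text that is NOT inside <a>...</a> anchors."""
    def __init__(self):
        super().__init__()
        self.in_anchor = 0
        self.outside_words = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_anchor += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.in_anchor > 0:
            self.in_anchor -= 1

    def handle_data(self, data):
        if self.in_anchor == 0:
            self.outside_words += len(data.split())

def count_outside_words(html):
    parser = OutsideWordCounter()
    parser.feed(html)
    return parser.outside_words

# An index-like page: many link words, almost no outside words.
count_outside_words('<a href="#">link one</a> intro <a href="#">link two</a>')
```

A page whose outside-word count is very low relative to its link text would be a candidate index page under this heuristic.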

Measuring precision with clustered search results, such as those from Vivisimo (2005), may show different performance from Google’s. In a clustered search engine, a link that does not belong to the overall top 10 can belong to the top 10 of some sub-cluster, so clustered results give users easier access to interesting links beyond the Top 10 or 20. Since WS showed higher performance than Google for those links, as shown in Section 6.4.1.3, we expect that our method may perform even better with clustered search engines. We may also be able to make a search engine more interactive using a UIH: for example, when a user’s query resides in an intermediate node of his/her UIH, we can ask the user to choose more specific interests by presenting the terms in the child nodes, or

Implicit Indicators for Interesting Web Pages
A user’s interest in a web page can be estimated by unobtrusively (implicitly) observing his or her behaviour rather than asking for feedback directly (explicitly). Implicit methods are naturally less accurate than explicit methods, but they do not consume a user’s time or effort, and implicit indicators can be used to build models that track a user’s interests as they change over time. Research has shown that a user’s behaviour is related to his/her interest in a web page. We evaluate previously studied implicit indicators and examine the time spent on a page in more detail; for example, we observe whether a user is actually looking at the monitor while we measure the time spent on a web page. Our results indicate that duration is related to a user’s interest in a web page regardless of the user’s attention to it. The thicker features in Figure 30 describe the position of

Figure 30. Implicit indicator for interesting web pages

6.1. Implicit Interest Indicators
The time spent on a web page is one of the most intuitive candidates for a user interest indicator, and this chapter thoroughly examines whether duration is related to a user’s interest. This section describes duration as well as the other user interest indicators to be examined, explaining why each indicator was chosen and how each is measured.

6.1.1. Complete Duration
A user may tend to spend more time on pages that he or she finds interesting, so we record the duration spent on a web page. The complete duration is defined as the time interval between when a user opens a web page and when the user leaves it. Some web pages contain many images that delay downloading, so we start measuring the duration after the entire page is loaded; thus the complete duration is not affected by connection speed, Internet traffic, or CPU speed. The complete duration is calculated by subtracting the time the current web page finished downloading from the time the user left the page. It differs from the duration used by Jung (2001), which includes the downloading time of the web page.

6.1.2. Active Window Duration
Most modern operating systems allow a user to multitask, running several applications at the same time; a user may write a report or chat while browsing a web page, and those other applications can be unrelated to the page’s contents. If a user spends one hour writing a homework paper with the web browser minimized, the complete duration of the web page could be one hour, which is very likely to give an erroneous indication of interest. To avoid this problem, we determine whether the web browser is active and subtract the time the browser is inactive from the complete duration. We call this the active window duration, since we count only the time the web browser is active.

6.1.3. Look At It Duration
Users are not always reading a web page while the browser is active; they can easily be talking to friends or taking a coffee break, and the active window duration can exceed 30 minutes if a user leaves the browser active during the break. We might detect a user’s absence from the lack of mouse movement, but a better solution is a camera that detects the user’s face orientation: a camera can even check whether the user is looking at the web browser or whether his attention is diverted. This duration is more accurate than the active window duration in terms of capturing the user’s attention to a web page. Since it counts only the time that a user is looking at the web browser, we call it look at it duration. It is calculated by subtracting the time the user does not look at the browser from the active window duration.
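The three durations defined so far nest inside one another (complete ≥ active window ≥ look at it) and reduce to simple subtractions. The sketch below assumes times in seconds and uses variable names of our own choosing:

```python
def complete_duration(load_finished, page_left):
    """Time from the page being fully loaded to the user leaving it,
    so the measurement is independent of connection and CPU speed."""
    return page_left - load_finished

def active_window_duration(complete, inactive_time):
    """Complete duration minus the time the browser window was inactive."""
    return complete - inactive_time

def look_at_it_duration(active, not_looking_time):
    """Active window duration minus the time the user was not facing the
    browser (e.g. as reported by a face-orientation camera)."""
    return active - not_looking_time

complete = complete_duration(load_finished=10.0, page_left=310.0)  # 300 s
active = active_window_duration(complete, inactive_time=60.0)      # 240 s
look = look_at_it_duration(active, not_looking_time=40.0)          # 200 s
```

Each refinement strips away one source of noise: download delays, background applications, and finally inattention while the browser is focused.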

6.1.4. Distance of Mouse Movement
Many people move their mouse while reading the contents of a web page; mouse movement can occur while looking at an interesting image or when pointing at interesting objects. We hypothesize that the farther a mouse moves, the more interested the user is in the web page. This indicator was also examined by Jung (2001), but our distance differs slightly from his in that it captures overall mouse movement: he counted mouse movement only when the pointer was inside the active browser. The distance of mouse movement is detected from the pointer’s x and y coordinates on the monitor every 100 milliseconds. The formula is
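One plausible form of this distance, assuming it sums the Euclidean distance between consecutive 100 ms pointer samples (our assumption, not a quotation of the dissertation’s formula):

```python
from math import hypot

def mouse_distance(points):
    """Total path length of the pointer, given (x, y) coordinate samples
    taken every 100 milliseconds; assumes Euclidean distance per step."""
    return sum(hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

mouse_distance([(0, 0), (3, 4), (3, 4)])  # -> 5.0
```

Because the sum runs over every consecutive pair of samples, a stationary pointer contributes nothing, while back-and-forth movement accumulates distance even if the pointer ends where it began.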