FREE ELECTRONIC LIBRARY - Dissertations, online materials

Pages:     | 1 |   ...   | 7 | 8 || 10 | 11 |   ...   | 19 |

«Learning Implicit User Interest Hierarchy for Web Personalization by Hyoung-rae Kim A dissertation submitted to Florida Institute of Technology in ...»

-- [ Page 9 ] --

We summarized the results in a Contingency matrix in Figure 17. The numbers outside the box were the sum of rows and columns. Out of 32 total correlation functions tested, 20 correlation functions satisfied old properties (P1-P3); 19 satisfied our new properties (P4, P5); 15 satisfied both desirable properties (P1-P5); 5 functions satisfied old properties only, and 4 functions satisfied new properties only. Since our properties could distinguish 5 correlation functions out of 20 functions that satisfied old properties (5/20=25%), it determined that our properties were independent from previous 3 desirable properties.

–  –  –

Figure 17. Contingency matrix of desirable properties over all functions 4.

3.2. Comparison Based upon Property 1 If a correlation function can detect the statistical independence of a diagram, it satisfies P1. The correlation functions that satisfy old properties and new properties were counted under the condition of satisfying P1. We created contingency matrix of the correlation functions that satisfied P1 in Figure 18. Twenty correlation functions satisfied P1. Interestingly, the correlation functions that satisfied P1 also satisfied P2 and P3. The correlation functions that satisfied P2 and P3 were dependent on the correlation functions that satisfied P1. This means P1 alone can explain P2 and P3. In other words, P1 can subsume P2 and P3, which reveals the nonessential nature of P2 and P3. When verifying correlation functions, P1 may be the only necessary reference property.

–  –  –

Figure 18. Contingency matrix of desirable properties over property 1 4.

3.3. Comparison Based upon Property 6 In Table 19, an “o” is placed in each column, for P6 if the correlation function can distinguish positive and negative relations. The correlation functions that satisfied old properties or new properties was counted and shown in the contingency matrix (Figure 19).

Of the 12 functions satisfying P6, 8 satisfied P1-P5; only 4 satisfied old properties (P1-P3) without satisfying new properties (P4 and P5).

All correlation functions satisfying P6 also satisfied old properties. It might be unnecessary to test a function for P1-P3 when P6 encompasses all three. Our properties (P4 and P5) tested an alternative aspect of the correlation functions, which is unrelated to P6.

In the contingency matrix, we compared 12 correlation functions, which satisfied old properties to new properties (Table 19). Of the 12 functions, only 8 satisfied our new properties. Clearly our properties were testing an alternative aspect of correlation functions.

–  –  –

Figure 19. Contingency matrix of desirable properties over property 6 4.

3.4. Normalized Results – Property 7 Normalization helps users analyze or understand the data. The P7 is not a highly desirable property because normalization is unrealistic for many correlation functions. We included the results of P7 testing only to help those users who prefer normalized results.

A comparison of P7 to both old properties and new properties may prove illustrative. Correlation functions that produced normalized results were assigned an “o” in Table 19. Out of 32 correlation functions, 22 functions satisfied P7. Ten functions satisfied both old properties and P7; fourteen functions satisfied both new properties and P7. The results also indicated that the characteristics of correlation functions were unrelated to function normality.

4.4. Summary In order to select the right correlation functions for an application out of a set of well-known correlation functions, the characteristics of each function must be compared.

Piatetsky-Shapiro (1991) opened new research areas by proposing three basic desirable properties for functions (P1-P3). Tan et al. (2002) compared correlation functions using the three established desirable properties and other existing function properties. Our research on two-variable Venn diagrams led us to identify two more desirable properties of correlation functions (P4 and P5). Property 4: If P(A,B) increases when P(A) and P(B) remain unchanged, then F monotonically decreases. Property 5: If P(A) (or P(B)) and P(A,B) increase at the same ratio, then F monotonically increases.

Consider the cases where A1=5.0, A2=10.0, B=10.0, A1∩B=3.0, and A2∩B=6.0 (Table 17). The correlations of the two cases, F(A1,B,A1∩B) and F(A2,B,A2∩B), may have different values. The log CPR values, which assess the significance of the correlation between A and B, are 4.2 and 5.0 respectively (Rosenfeld, 1994). But a deficiency exists.

For example, the function Inte satisfies P1-P3, but returns the same value of 6 in both cases. Our proposed property 5 can tell the difference between the two cases, and is critical for measuring the characteristics of correlation functions.

In addition, the definition of P1 used by Piatetsky-Shapiro (1991) lacks some descriptive power. For example, some correlation functions (e.g., Odds, Conv, Inte, and Coll in Appendix 1) produced a score of 1, even when two events were statistically independent. We revised the definition of P1 to improve its descriptive power.

P1: F can distinguish statistical independence of A and B.

In order to see whether those correlation functions that satisfy our two new desirable properties also satisfy previous three desirable properties, we collected 32 correlation functions as shown in Appendix 1 and used systematically generated test cases for testing each property. The results indicated that our two new desirable properties were more descriptive than the previous three old desirable properties. It was because all correlation functions that satisfy P1 also satisfied P2 and P3, and all correlation functions that satisfied P6 also satisfied P1-P3; but our properties were independent from P1 and P6.

We insist that these two properties are important in terms of understanding the correlations of two variables and characterizing an appropriate correlation function. The summarized results not only of P1-P5 but also of P6 that measured negative/positive correlation and the result of P7 that checked which correlation function returned a normalized return value will help reader compare characteristics of correlations functions.

Our experiment was limited to positive correlations for our web personalization since many applications depend on positive correlation. We will extend our analysis to

–  –  –

Personalized Ranking of Search Results with Implicitly Learned User Interest Hierarchies Web search engines are usually designed to serve all users, without considering the interests of individual users. Personalized web search incorporates an individual user's interests when deciding relevant results to return. We propose to learn a user profile, called a user interest hierarchy (UIH), from web pages that are of interest to the user. The user’s interest in web pages will be determined implicitly, without directly asking the user. Using the implicitly learned UIH, we study methods that (re)rank the results from a search engine. Experimental results indicate that our personalized ranking methods, when used with a popular search engine, can yield more relevant web pages for individual users. This

–  –  –

Figure 20. Devising a scoring function in the framework of web personalization

5.1. Personalized Results Personalization of web search involves adjusting search results for each user based on his or her unique interests. Our approach orders the pages returned by a search engine depending on a user’s interests. Instead of creating our own web search engine, we retrieved results from Google (Google, 2005). Since the purpose of this Chapter is to achieve a personalized ordering of search engine results, we can score a page based on the user profile and the results returned by a search engine as shown in the dashed box in Figure 21.

–  –  –

pages in his/her bookmarks (Li et al., 1999; Maarek and Ben-Shaul, 1996) and the Divisive Hierarchy Clustering (DHC) algorithm (Kim and Chan, 2003). A UIH organizes a user’s interests from general to specific. Near the root of a UIH, general interests are represented by larger clusters of terms while towards the leaves, more specific interests are represented by smaller clusters of terms. The root node contains all distinct terms in the bookmarked web page. The leaf nodes contain more specifically interesting terms. The relations between terms are calculated based on the co-occurrence in the same web page.

An example of a UIH is shown in Figure 3. Each node (cluster) contains a set of words. The root node contains all words that exist in a set of web pages. Each node can represent a conceptual relationship if those terms occur together at the same web page frequently, for example ‘perceptron’ and ‘ann’ (in italics) can be categorized as belonging to neural network algorithms, whereas ‘id3’ and ‘c4.5’ (in bold) in another node cannot.

Words in these two nodes are mutually related to some other words such as ‘machine’ and ‘learning’. This set of mutual words, ‘machine’ and ‘learning’, performs the role of connecting italicized and bold words in sibling nodes and forms the parent node. We illustrate this notion in the dashed box.

This Chapter focuses on devising a scoring method that receives two inputs (UIH and retrieved results) and one output (personalized ranking).

5.2. Approach In order to provide personalized, reordered search results to a user, we need to score each page depending on personal interests. Therefore, the goal is to assign higher scores to web pages that a user finds more interesting. This section explains how to score a retrieved web page using a user’s UIH. First, we explain the basic characteristics for each matching term. Second, based on the characteristics, we propose functions to score a term.

These functions determine how interesting a term is to a user. Third, based on the score and the number of the matching terms, we calculate an overall score for the page. Last, since the search engine provides a score/ranking for a web page, we incorporate this ranking into our final score of the web page.

5.2.1. Four Characteristics of a Matching Term Given a web page and a UIH, we identify matching terms (words/phrases) that reside both in the web page and in the UIH. The number of matching terms is defined m, which is less than the number of total distinct terms in the web page, nt, and the number of total distinct terms in the UIH, nu.

Each matching term, ti, is analyzed according to four characteristics: the level of a node where a term belongs to (Dti), the length of a term such as how many words are in the term (Lti), the frequency of a term (Fti), and the emphasis of a term (Eti). D and L can be calculated while building a UIH from the web pages in a user’s bookmark. Different web page has different values for F and E characteristics. We estimate the probability of these four characteristics and based on these probabilities, we approximate the significance of each matching term. Level/Depth of a UIH Node A UIH represents general interests in large clusters of terms near the root of the UIH, while more specific interests are represented by smaller clusters of terms near the leaves. The root node contains all distinct terms and the leaf nodes contain small groups of terms that represent more specific interests. Therefore, terms in more specific interests are harder to match, and the level (depth) where the term matches indicates significance. For example, a document that contains terms in leaf nodes will be more related to the user’s interests than a document that contains the terms in a root node only. If a term in a node also appears in several of its ancestors, we use the level (depth) closest to the leaves.

There is research that indicates user-defined query scores can be used effectively (Salton and Waldstein, 1978; Harper, 1980; Croft and Das, 1989). From the acquisition point of view, it is not clear how many levels of importance users can specify if we ask a user directly. In I3R (Croft and Thompson, 1987), they used only two levels: important or default. Harper (1980) used 5 levels of importance, and Croft and Das (1989) used 4 levels.

We calculate the scores of terms using the level (depth) of a node in the UIH instead of explicitly asking the user.

The significance of a term match can be measured by estimating the probability, P(Dti), of matching term ti at depth (level) Dti in the UIH. P(Dti) is the probability of a level in a UIH. A term that matches more specific interests (deeper in the UIH) has a lower P(Dti) of occurring. Lower probability indicates the matching term, ti, is more significant. The

probability is estimated by:

–  –  –

Longer terms (phrases) are more specific than shorter ones. If a web page contains a long search term typed in by a user, the web page is more likely what the user was looking for.

In general, there are fewer long terms than short terms. To measure the significance of a term match, the probability, P(Lti), of matching term ti of length Lti in the UIH is calculated. Lti is defined as MIN (10, the length of a term). We group the longer (greater than 10) phrases into one bin because they are rare. Longer terms has a smaller probability, P(Lti), of occurring, which indicates a more significant match. The probability

is estimated by:

–  –  – Frequency of a Term More frequent terms are more significant/important than less frequent terms.

Frequent terms are often used for document clustering or information retrieval (van Rijsbergen, 1979). A document that contains a search term many times will be more related to a user’s interest than a document that has the term only once.

–  –  –

page to measure the significance of the term. However, in general, frequent terms have a lower probability of occurring. For example, in a web page most of the terms (without the terms in a stop list (Rasmussen, 1992)) will occur once, some terms happen twice, and

–  –  – Emphasis of a Term Some terms have different formatting (HTML tags) such as title, bold, or italic.

Pages:     | 1 |   ...   | 7 | 8 || 10 | 11 |   ...   | 19 |

Similar works:


«The effect of contamination on selected physical and chemical characteristics of Mineral Trioxide Aggregate Thesis submitted in fulfilment of the degree of Doctor of Philosophy Mohammad Hossein Nekoofar School of Dentistry Cardiff University (April 2006-August 2011) DECLARATION This work has not previously been accepted in substance for any degree and is not concurrently submitted in candidature for any degree. Thursday, 23 June 2011 STATEMENT 1 This thesis is being submitted in partial...»

«Noise Mitigation in Low Leakage MTCMOS Circuits by Hailong JIAO A Thesis Submitted to The Hong Kong University of Science and Technology in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in the Department of Electronic and Computer Engineering December 2012, Hong Kong Authorization I hereby declare that I am the sole author of the thesis. I authorize the Hong Kong University of Science and Technology to lend this thesis to other institutions or individuals for...»

«An Appraisal of Naturalism in Contemporary Meta-Ethics David Christopher Lahti B.S., Gordon College Dissertation submitted for the degree of Ph.D. in philosophy (ethics) on 28 February 1998. The Whitefield Institute, Oxford.Co-supervisors: E. David Cook, Fellow, Green College, Oxford University R. J. Berry, Emeritus Professor of Genetics, FIBiol, FRSE, University College London Reformatted and repaginated for Letter sized paper, 2000. None of the following has been submitted previously for any...»

«Philosophy, Culture, and Traditions Vol. 4 2007 ISSN 1609-2392 _ Table of Contents Theme: Culture, Diversity, and Shared Values On Being Invisible: Socratic Asceticism and the Philosophical Life D.C. Schindler 7 The Ethical Imagination in Bachelard’s reading of Nietzsche Kuan-Min Huang 19 The Negation of Public Values in Neoliberalism: Market Versus State or State Versus Citizenship? Dionysios G. Drosos 31 Feminist Epistemology and Human Values in African Culture B.A. Lanre-Abass 57 Tempels...»


«K-12 Education as a Hermeneutic Adventurous Endeavor: Theorizing the Instrumental Approach, Education as a Sovereign Agent for Humanity, and the Relationships with Philosophy by Doron Yosef-Hassidim A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy Department of Leadership, Higher and Adult Education Ontario Institute for Studies in Education University of Toronto © Copyright by Doron Yosef-Hassidim 2015 K-12 Education as a Hermeneutic Adventurous...»

«Extinct No More: Discourses on Tasmanian Aboriginal Heritage By William Price Submitted to the graduate degree program in Geography and the Graduate Faculty of the University of Kansas in partial fulfillment of the requirements for the degree of Doctor of Philosophy. Chairperson Jay T. Johnson Richard Howitt Barney Warf Stephen Egbert Bartholomew Dean Date Defended: December 10, 2013 The Dissertation Committee for William Price certifies that this is the approved version of the following...»

«PRESERVATION OF TRANSPORT PROPERTIES TRENDS: COMPUTATIONAL ROCK PHYSICS APPROACH A DISSERTATION SUBMITTED TO THE DEPARTMENT OF GEOPHYSICS AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY Richa August 2010 Abstract In recent years, computational rock physics has gained reliability in predicting the properties of rocks, mostly due to the use of advanced imaging and computing techniques. We believe that...»

«New Arrivals Catalog July 2015 Windows Booksellers 199 West 8th Ave., Suite 1 Eugene, OR 97401 USA Phone: (800) 779-1701 or (541) 485-0014 * Fax: (541) 465-9694 Email: katrina@theologybooks.com Website: http://www.windowsbooks.com Monday Friday: 10:00 AM to 5:00 PM, Pacific time (phone & in-store); Saturday: Noon to 3:00 PM, Pacific time (in-store onlysorry, no phone). Our specialty is used and out-of-print academic books in the areas of theology, church history, biblical studies, and western...»

«3-n mil THE BLURRED BOUNDARIES BETWEEN FILM AND FICTION IN SALMAN RUSHDIE'S MIDNIGHT'S CHILDREN, THE SATANIC VERSES, AND OTHER SELECTED WORKS DISSERTATION Presented to the Graduate Council of the University of North Texas in Partial Fulfillment of the Requirements For the Degree of DOCTOR OF PHILOSOPHY By Moumin Manzoor Quazi, B.F.A., M.A., M.A. Denton, Texas August, 1999 $+9 Quazi, Moumin Manzoor, The Blurred Boundaries Between Film and Fiction in Salman Rushdie's Midnight's Children, The...»

«Of Women, Faith, and Nation: American Protestantism and the Kyrias School For Girls, Albania by Nevila Pahumi A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (History) in the University of Michigan 2016 Doctoral Committee: Professor Pamela Ballinger, Co-Chair Professor John V.A. Fine, Co-Chair Professor Fatma Müge Göçek Professor Mary Kelley Professor Rudi Lindner Barbara Reeves-Ellington, University of Oxford © Nevila Pahumi 2016...»

<<  HOME   |    CONTACTS
2016 www.dissertation.xlibx.info - Dissertations, online materials

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.