FREE ELECTRONIC LIBRARY - Dissertations, online materials

Pages:     | 1 |   ...   | 15 | 16 || 18 | 19 |

«Learning Implicit User Interest Hierarchy for Web Personalization by Hyoung-rae Kim A dissertation submitted to Florida Institute of Technology in ...»

-- [ Page 17 ] --

1989) is an extension of COBWEB for incremental clustering of continuous data set. It stores a mean or distribution for each individual attribute in each node and uses a modified similarity/dissimilarity measure that is an integral over continuous attributes. AutoClass (Cheeseman and Stutz, 1996) allows probabilistic membership of objects in clusters and hence clusters can overlap (an object belonging to multiple clusters). Also, it is a flat (nonhierarchical) clustering algorithm.

7.3.4. Correlation Functions Correlation functions are used to measure the similarity/distance between objects within a clustering function or other machine learning algorithms.

Tan et al. (2002) demonstrated that not all functions are equally good at capturing the dependencies among variables and there is no single function that is consistently better than the others in all applications. They compared various existing correlation functions based on the key properties that Piatetsky-Shapiro (1991) presented and other properties of a correlation function. We propose some additional desirable properties for correlation functions.

Huang et al. (2002) applied a number of correlation functions in their application to identify interesting rules and sequential patterns from the Livelink log files. They presented a comparison of these measures based on the feedback from domain experts.

Some of the interestingness measures were found to be better than others. Their experiment also supported that no single function is consistently better than the others. However, they did not analyze the desirable properties of correlation functions.

Piatetsky-Shapiro (1991) opened a new research area of information retrieval by presenting three desirable properties of a correlation function. However, his properties depend on cross product ratio (CPR) (Rosenfeld, 1994), in that all correlation function those satisfy CPR satisfies Piatetsky-Shapiro’s properties. We propose two other desirable properties, which independent from CPR.

Jaroszewicz and Simovici (2004) presented a method of computing interestingness of item sets and attribute sets with respect to background knowledge encoded as a Bayesian network. Their algorithm found interesting, unexpected patterns. Their work can be expanded to calculating all possible interestingness such as correlation or support between variables. Their work used the interestingness of dependency but did not focus on

–  –  –

system. This consisted of 5 areas. First, to create a context for personalization, we proposed to establish a user interest hierarchy (UIH) that can represent a continuum of general to specific interests from a set of web pages interesting to a user. This approach was non-intrusive and allowed web pages to be associated with multiple clusters/topics.

We evaluated the learned UIH based on data obtained from 13 users on our web server.

Second, variable length phrase-finding algorithm found meaningful phrases. In order to choose the best method, we compared the number of matching phrases chosen by a method to those phrases chosen by 10 human subjects. Matched-pair design (Robertson, 1981), comparing the top 10 best measures, was used for evaluation. Third, we proposed two properties as desirable properties for correlation functions. In order to compare our 2 new desirable properties and previous 3 desirable properties, we collected 32 correlation functions and examined which correlation function satisfied which desirable properties.

Fourth, we devised a new method of ranking web search results to serve each individual user’s interests using UIH. We evaluated methods based on how many interesting web pages or potentially interesting web pages each algorithm found within certain number of top links (Bharat and Mihaila, 2001). Traditional precision/recall graphs (van Rijsbergen,

1979) were also used for evaluation. We counted which search term showed higher performances with our weighted scoring method (WS) than with Google as well. Fifth, this we identified several implicit indicators that can be used to determine a user’s interest in a web page. This paper evaluates both previously studied implicit indicators and several new implicit indicators. All indicators examined were complete duration, active window duration, look at it duration, distance of mouse movement, number of mouse clicks, distance of scrollbar movement, number of scrollbar clicks, number of key up and down, and size of highlighting text. The data was 11 users’ implicit indicator data and a 1-5 interest rating of each page. During our experiment volunteers were encouraged to behave normally. Two evaluation criteria were used: (1) how accurately an indicator can predict users’ interests and (2) how many users’ interests an indicator can predict.

8.1. Summary of Contributions The following is a summary of our contributions.

In Chapter 2, Learning Implicit User Interest Hierarchy for Context in Personalization, we represented user interest hierarchy (UIH) at different abstraction levels (general to specific), which could be learned implicitly from the contents (words/phrases) in a set of web pages bookmarked by a user. The higher-level interests were more general, while the lower-level ones were more specific. In order to build a UIH, we devised a divisive graph-based hierarchical clustering algorithm (DHC), which constructed a UIH by grouping words (topics) into a hierarchy instead of flat cluster used by STC (Zamir and Etzioni, 1998). Between the root and the leaves, "internal" tree nodes represented different levels of generality and duration of interest. Towards the root of a UIH, more general (passive) interests were represented by larger clusters of words while towards the leaves, more specific (active) interests were represented by smaller clusters of words. DHC automatically found the threshold for clusters of terms (words and phrases) where as STC needed to specify the threshold. Furthermore, we used a more sophisticated correlation function, AEMI, than STC’s conditional probability. We also observed that DHC with an AEMI correlation function and MaxChildren threshold-finding method made a more meaningful UIH with 59% of meaningful clusters than the other combinations. We added phrases to the words as feature. Our experimental results indicated that 64% of UIH were interpretable by human.

In Chapter 3, Identifying Variable-Length Meaningful Phrases with Correlation Functions, we proposed a variable-length phrase-finding algorithm (VPF), which found more meaningful phrases – VPF – than older methods – Ahonen’s algorithm and Chan’s algorithms. They regenerated sequences recursively with the words selected in the previous stage and searched for increased length of phrases in time O(nw), where nw was the page size. This algorithm does not need any user-specified parameters, since our algorithm used average as a threshold and stopped when the length of phrases did not increase. The algorithm achieved further improved performance by pruning less meaningful phrases. The With-pruning was statistically significantly better than the Without-pruning with 95% confidence interval (P=0.004) for the VPF with F25. More meaningful phrases than previous methods were found by VPF with F25 and the improvement in performance is statistically significant than Ahonen’s original method. We suspected the filtering stage of Ahonen’s algorithm filtered many meaningful phrases out or their weighting scheme using the length of a phrase and tightness (Ahonen et al., 1998) distracted the correlation value of a phrase.

In Chapter 4, Analysis of Desirable Properties of Correlation Functions between Two Events, we identified 2 new desirable properties for a correlation functions in general.

They were:

• if P(A,B) increases when P(A) and P(B) remain unchanged, then F monotonically decreases;

• if P(A) (or P(B)) and P(A,B) increase at the same ratio, then F monotonically increases.

In addition, we revise and improve the first desirable property proposed by PiatetskyShapiro (1991) to make it more accurately descriptive. Some functions (e.g., Odds, Conv, Inte, and Coll in Appendix 1) produced a score of 1, even when two events were statistically independent.

• F can distinguish statistically independence of A and B.

Our empirical results indicated that our two new desirable properties were more descriptive than the previous three desirable properties provided in Piatetsky-Shapiro (1991). It was because all correlation functions that satisfy P1 also satisfied P2 and P3, and all correlation functions that satisfied P6 also satisfied P1-P3; but our properties were independent from P1 and P6. We tested our 2 new desirable properties over many different correlation functions and summarized their results with respect to each property. The summarized results included P1-P5, P6 that measured negative/positive correlation, and P7 that checked which correlation function returned a normalized return value. The associated table will help user compare the characteristics of correlations functions.

In Chapter 5, Personalized Ranking of Search Results with Implicitly Learned User Interest Hierarchies, we compared four ranking methods: Google, Random, US, and WS.

We used two data sets: interesting web pages that are relevant to the user search term and potentially interesting web pages that could be relevant in the future. We introduced two personalized ranking methods (WS and US) that utilize an implicitly learned user profile (UIH). We built different UIHs for users depending on their interests. We identified four characteristics for terms that match the user profile and provide a probabilistic measure for each characteristic. The four characteristics were the level of a node where a term belongs to (D), the length of a term (L), the frequency of a term (F), and the emphasis of a term (E).

D and L were calculated while building a UIH from the web pages in a user’s bookmark.

Different web pages had different values for F and E characteristics. Our experimental results indicate that WS method could achieve higher precision than Google for Top 10, 15 and 20 web pages that are relevant to the user search query. On interesting web pages, the Top link analysis showed WS achieved at least 13% higher precision than Google for Top 10, 15 and 20 links on average. WS outperformed US and Random in general also. On potentially interesting web pages, WS achieved the highest performance in all five Top links (Top 1, 5, 10, 15, and 20) as well. When incorporating the (public) ranking from the search engine, we found that equal weights for the public and personalized ranking can result in higher precision. A weight of 0.5 for the personal ranking seemed to show the highest performance on both data sets.

The precision/recall analysis visualizes the performance of each method in graphs.

WS and US are closer to the upper-right corner than Google except with recall values lower than.15 (after Top 5). In general, WS outperforms US and Random for interesting web pages. The results from precision/recall graph for potentially interesting web pages are similar. WS was closer to the upper-right corner than Google, US, and Random over all.

These results conclude that WS could provide more accurate ranking than Google on average.

In Chapter 6, Implicit Indicators for Interesting web Pages, our experiments indicate that complete duration, active window duration, look at it duration, and distance of mouse movement are reliable indicators for more users than other indicators – 8 users out of 11. Over the data set of “all visits”, the indicators were able to predict the most number of users’ interests as well – 7 users out of 11. The distance of mouse movement was as accurate as indicators based on duration, and it can be the most practical indicator since it is simple to detect and is more robust than active window duration against the case of user’s absence. If a user leaves a web page open and leaves the room, the distance of mouse movement will not be affected. For the bookmark, save, print, and memo indicators, more than 95% of the pages were correctly scored as “interested”. When we divided the data set less than “interested” and more than or equal to “interested”, “95% of the bookmarked web pages, 98% of the saved web pages, 100% of the printed web pages, and 98% of the memoed web pages belonged to the score of more than or equal to “interested”.

Our results also indicate that there was no indicator that was valid for all users. Depending on the user, an indicator may or may not be valid.

8.2. Ethical Issues in User Modeling 8.2.1. Privacy An individual’s right to privacy has always been an issue in user modeling. This is because the fact that the consequences for victims of privacy intrusions can be serious problems. Although the Internet is widely used nowadays, many users remain unfamiliar and skeptical about the level of security and privacy on the Internet. Great numbers of users are uncomfortable with “user profiling,” a practice in which users’ online movements are recorded. Much of this user anxiety is caused by the fact that users have no clear understanding regarding the rules that govern this practice, how extensive it is, what is recorded, and even how the information is used (Chiu, 2000). If the process of user profiling is explained, then user modeling will be more readily accepted.

This personalization method should be provided as an option of a web browser to a user along with full descriptions. Before building a user profile, the browser must accept the user’s permission whether s/he wants to build the user profile from their bookmarks or the web pages collected by implicit interest indicator. This is a permission-based personalization tool. Furthermore, the profile can be stored in a client. If both the web log file of the user’s behavior and the user profile remain in the user’s computer, then s/he will feel their privacy is protected. The browser should provide complete controls over the profile to the user.

Pages:     | 1 |   ...   | 15 | 16 || 18 | 19 |

Similar works:

«ABSTRACT MEN WRITING WOMEN: “THE WOMAN Title of Dissertation: QUESTION” AND MALE DISCOURSE OF IRANIAN MODERNITY Sahar Allamezade, Doctor of Philosophy, 2016 Dissertation directed by: Professor Ahmad Karimi-Hakkak School of Languages, Literatures, and Cultures In this dissertation I explore “The Woman Question” in the discourse of Iranian male authors. A pro-modernity group, they placed women’s issues at the heart of their discourse. This dissertation follows the trajectory of the...»

«Digital Struggles: Fostering Student Interaction in Online Writing Courses A DISSERTATION SUBMITTED TO THE FACULTY OF UNIVERSITY OF MINNESOTA BY Andrew Virtue IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY Laura Gurak, Advisor July 2013 © Andrew Virtue 2013 Acknowledgements I would like to thank my advisor, Dr. Laura Gurak, for her patience, guidance, and encouragement throughout the dissertation process. I am so grateful for having the experience to work...»

«A TYPOLOGY OF SAMPLING IN HIP-HOP Amanda Sewell Submitted to the faculty of the University Graduate School in partial fulfillment of the requirements for the degree Doctor of Philosophy in the Musicology Department of the Jacobs School of Music Indiana University May 2013 Accepted by the Graduate Faculty, Indiana University, in partial fulfillment of the requirements for the degree of Doctor of Philosophy. Doctoral Committee _ J. Peter Burkholder, Ph.D. _ Phil Ford, Ph.D. _ Daniel R. Melamed,...»

«Fallāḥīn on Trial in Colonial Egypt: Apprehending the Peasantry through Orality, Writing, and Performance (1884-1914) by Anne Marie Clément A thesis submitted in conformity with the requirements for the degree of doctor of philosophy Department of Near and Middle Eastern Civilizations University of Toronto © Copyright by Anne Marie Clément 2012 Fallāḥīn on Trial in Colonial Egypt: Apprehending the Peasantry through Orality, Writing, and Performance (1884-1914) Anne Marie Clément...»

«Lord, Jenny (2014) Investigating the role of CLU PICALM and CR1 in Alzheimer's disease. PhD thesis, University of Nottingham.Access from the University of Nottingham repository: http://eprints.nottingham.ac.uk/14273/1/JennyLord_Thesis.pdf Copyright and reuse: The Nottingham ePrints service makes this work by researchers of the University of Nottingham available open access under the following conditions. This article is made available under the University of Nottingham End User licence and may...»

«Ghent University Faculty of Arts and Philosophy Trauma and Post-9/11 novels: Foer, McEwan and McInerney “This is the nature of these journeys – the steps, the sequences are not logical” Paper submitted in partial Supervisor: fulfillment of the requirements Prof. Dr. Philippe Codde for the degree of “Master in de Taalen Letterkunde: Nederlands – Engels” by Ellen Verbestel August, 2010 Acknowledgements First and foremost, I would like to thank my supervisor, Dr. Codde, for helping me...»

«RECYCLED BRICK MASONRY AGGREGATE CONCRETE: USE OF RECYCLED AGGREGATES FROM DEMOLISHED BRICK MASONRY CONSTRUCTION IN STRUCTURAL AND PAVEMENT GRADE PORTLAND CEMENT CONCRETE by Tara Lani Cavalline A dissertation submitted to the faculty of The University of North Carolina at Charlotte in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Infrastructure and Environmental Systems Charlotte Approved by: _ Dr. David C. Weggel _ Dr. Helene A. Hilger _ Dr. Vincent O....»

«CHOICE THEORY: AN INVESTIGATION OF THE TREATMENT EFFECTS OF A CHOICE THEORY PROTOCOL ON STUDENTS IDENTIFIED AS HAVING A BEHAVIORAL OR EMOTIONAL DISABILITY ON MEASURES OF ANXIETY, DEPRESSION, LOCUS OF CONTROL AND SELF-ESTEEM by Scott D. Reeder A dissertation submitted to the faculty of The University of North Carolina at Charlotte in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Counseling Charlotte 2011 Approved by: Dr. John R. Culbreth Dr. Edward A....»

«Structured Probabilistic Models of Proteins across Spatial and Fitness Landscapes Hetunandan Kamichetty CMU-CS-11-116 March 2011 School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 Thesis Committee: Chris J. Langmead(co-chair) Eric P. Xing (co-chair) Jaime Carbonell Chris Bailey-Kellogg Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy. Copyright c 2011 Hetunandan Kamichetty The views and conclusions contained in this document...»

«i Bentonite-Polymer Composites for Containment Applications By Joseph Scalia IV A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Geological Engineering) at the UNIVERSITY OF WISCONSIN-MADISON Date of final oral examination: 8/9/2012 The dissertation is approved by the following members of the Final Oral Committee: Craig H. Benson, Professor, Geological Engineering Tuncer B. Edil, Professor, Geological Engineering Samuel Kung, Professor,...»

«Learning Distance Functions for Exemplar-Based Object Recognition by Andrea Lynn Frome B.S. (Mary Washington College) 1996 A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the GRADUATE DIVISION of the UNIVERSITY OF CALIFORNIA, BERKELEY Committee in charge: Professor Jitendra Malik, Chair Professor Michael I. Jordan Professor Bruno Olshausen Fall 2007 The dissertation of Andrea Lynn Frome is approved: Professor...»

«GROWTH MODEL, SYNTHESIS OF CARBON NANOSTRUCTURES AND ALTERATION OF SURFACE PROPERTIES USING THEM Sayangdev Naha PhD Dissertation Submitted to the Faculty of Virginia Polytechnic Institute and State University in partial fulfillment of the requirements of the degree of Doctor of Philosophy in Engineering Mechanics Committee Members: Dr. Ishwar K. Puri, Chair Dr. Mark S. Cramer Dr. Muhammad R. Hajj Dr. John J. Lesko Dr. Roop L. Mahajan Dr. Mayuresh J. Patil 07/25/2008 Blacksburg, Virginia...»

<<  HOME   |    CONTACTS
2016 www.dissertation.xlibx.info - Dissertations, online materials

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.