Learning Implicit User Interest Hierarchy for Web Personalization by Hyoung-rae Kim. A dissertation submitted to Florida Institute of Technology in ...
8.2.2. Confidence on the Results
This section briefly discusses the extent to which we can trust the profile generated by the DHC algorithm. For example, if the profile indicates that a user is interested in “laptop computer”, how much can we trust that result? The user profile is built from a set of web pages that interest the user. As previously explained, either bookmarks or web pages detected by an implicit interest indicator can serve as this set. Since the set of interesting web pages can change over time, the user profile can change as well, so profiles should be rebuilt periodically. This raises a question: should interests that appear in the profiles of several consecutive periods be trusted more than interests that appear only once? The answer may also depend on how reliable the input data sets are, so answering it may require a way to measure the reliability of the set of interesting web pages. These questions are not easy to answer at present and are left for future work.
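The question about consecutive periods can be made concrete with a simple measure. The sketch below is a hypothetical illustration, not part of the DHC algorithm: it assumes each periodically rebuilt profile is reduced to a plain set of interest terms (for example, the words labeling UIH clusters) and scores each interest by the fraction of periods in which it appears. Whether such a persistence score actually tracks trustworthiness is precisely the open question raised in this section.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def interest_persistence(profiles: Iterable[Set[str]]) -> Dict[str, float]:
    """Fraction of periodic profiles in which each interest term appears.

    Hypothetical illustration: each profile is a set of interest terms,
    one set per rebuild period (e.g., one per month).
    """
    profiles = list(profiles)
    if not profiles:
        return {}
    # Sets hold each term at most once, so this counts periods, not occurrences.
    counts = Counter(term for profile in profiles for term in profile)
    return {term: n / len(profiles) for term, n in counts.items()}


if __name__ == "__main__":
    periods = [
        {"laptop computer", "machine learning"},
        {"laptop computer", "travel"},
        {"laptop computer", "machine learning"},
    ]
    for term, score in sorted(interest_persistence(periods).items()):
        print(f"{term}: {score:.2f}")
```

A term present in every rebuilt profile scores 1.0, while one seen in a single period scores 1/n; any threshold for "confident" interests would itself be an assumption to validate.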
8.3. Limitation and Future Work
Our system has several limitations.
• We did not analyze differences among the UIHs obtained from different users because of the large number of web pages used in our experiments.
• The performance of the DHC algorithm varied depending on the articles selected. We believe this is due to intrinsic characteristics of the documents.
• The performance of VPF varied depending on the articles selected. We do not yet understand the reason for this variance across articles. We assume it is due to intrinsic characteristics of the articles, because the human subjects’ results also varied across articles.
• Our experiment on the desirable properties of a correlation function was limited to positive correlation, which suits our web personalization setting, since many applications depend on positive correlation. We will extend our analysis to negative correlation as well.
• The improvement of WS over Google was not statistically significant because Google’s precision values had large variance.
• The low performance for some search terms might be because those terms are unrelated to the user’s bookmarks. We may be able to alleviate this problem by also incorporating web pages identified as interesting by implicit interest indicators.
• Our approach of penalizing index pages did not yield much improvement in our initial experiments. We will examine this approach further in the future.
• Since WS outperformed Google on links ranked after the top 5, we expect that our method may perform better with clustered search engines.
• A longer evaluation would give more accurate results for the LookAtIt indicator, since users would behave more naturally after surfing for more than one or two hours.
• In the future, this indicator could be combined with an application for personalizing web search results; the interesting web pages collected for a user could then be used to build a user interest hierarchy.
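The limitation on positive correlation can be illustrated with a standard measure whose sign separates the two cases. The sketch below uses the phi coefficient over word presence in documents; it is a swapped-in textbook measure chosen for illustration, not the correlation function analyzed in this dissertation, and the documents and words are hypothetical. Positive values mean two words co-occur more often than independence would predict; negative values mean they tend to exclude each other.

```python
import math
from typing import Sequence, Set


def phi_coefficient(docs: Sequence[Set[str]], a: str, b: str) -> float:
    """Phi coefficient between the presence of words a and b across documents.

    +1 is perfect positive association, -1 perfect negative association,
    and 0 means the presence of a and b is uncorrelated.
    """
    n11 = n10 = n01 = n00 = 0  # 2x2 contingency counts
    for words in docs:
        has_a, has_b = a in words, b in words
        if has_a and has_b:
            n11 += 1
        elif has_a:
            n10 += 1
        elif has_b:
            n01 += 1
        else:
            n00 += 1
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # One word appears in every document or in none; phi is undefined.
        return 0.0
    return (n11 * n00 - n10 * n01) / denom


if __name__ == "__main__":
    docs = [
        {"laptop", "computer"},
        {"laptop", "computer", "battery"},
        {"travel", "hotel"},
        {"travel", "beach"},
    ]
    print(phi_coefficient(docs, "laptop", "computer"))  # 1.0 (positive)
    print(phi_coefficient(docs, "laptop", "travel"))    # -1.0 (negative)
```

A function restricted to positive correlation would treat the second pair like an unrelated pair; a signed measure keeps that information available.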
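The point about Google’s variance can be seen in a paired t statistic. The sketch below assumes a paired comparison of per-query precision values for WS and Google over the same queries (the exact test used in the experiments is not stated in this section), and all precision numbers are made up. With identical WS scores and the same Google mean, a high-variance Google sample yields a much smaller t value, so the same average improvement no longer reaches significance.

```python
import math
import statistics
from typing import Sequence


def paired_t_statistic(ours: Sequence[float], baseline: Sequence[float]) -> float:
    """t statistic for the mean of paired differences (ours - baseline)."""
    if len(ours) != len(baseline) or len(ours) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [x - y for x, y in zip(ours, baseline)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


if __name__ == "__main__":
    # Made-up per-query precision values, for illustration only.
    ws = [0.6, 0.6, 0.6, 0.6]
    google_low_var = [0.52, 0.48, 0.51, 0.49]  # mean 0.5, small spread
    google_high_var = [0.1, 0.9, 0.2, 0.8]     # mean 0.5, large spread
    # With 4 queries (3 degrees of freedom), |t| must exceed about 3.18
    # for significance at the 5% level (two-sided).
    print(paired_t_statistic(ws, google_low_var))
    print(paired_t_statistic(ws, google_high_var))
```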
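Comparing engines on links “after the top 5” amounts to measuring precision over a window of ranks rather than over the top k only. The sketch below assumes a hypothetical list of binary relevance judgments, one per ranked link, and computes precision for an inclusive, 1-based rank range; the function name and data are illustrative, not taken from the experiments.

```python
from typing import Sequence


def precision_in_range(relevant: Sequence[bool], start: int, end: int) -> float:
    """Precision over 1-based ranks start..end (inclusive) of a ranked result list.

    If the list is shorter than `end`, only the available links are counted.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    window = relevant[start - 1:end]
    if not window:
        raise ValueError("the result list has no links in this rank range")
    return sum(window) / len(window)  # True counts as 1, False as 0


if __name__ == "__main__":
    # Hypothetical relevance judgments for the top 10 links of one query.
    judged = [True, False, True, True, False, True, True, False, True, True]
    print("top 5:", precision_in_range(judged, 1, 5))
    print("ranks 6-10:", precision_in_range(judged, 6, 10))
```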
References
Note: Internet references current as of March 2005.
Agrawal, R. and R. Srikant (1994), “Fast Algorithms for Mining Association Rules”, Proc. 20th Very Large Data Bases Conference, 487-499.
Ahonen, H., Heinonen, O., Klemettinen, M. and A.I. Verkamo (1998), “Applying Data Mining Techniques for Descriptive Phrase Extraction in Digital Document Collections”, Proc. Advances in Digital Libraries Conference, 2-11.
Albanese, M., Picariello, A., Sansone, C., and L. Sansone (2004), “Web Personalization Based on Static Information and Dynamic User Behavior”, Proc. 6th annual ACM international workshop on Web information and data management, 80-87.
Anderson, C.R. (2002), A Machine Learning Approach to Web Personalization, Ph.D. thesis, University of Washington, Department of Computer Science and Engineering. http://www.the4cs.com/~corin/research/pubs/thesis.pdf
Ardissono, L., Console, L., and I. Torre (1999), “Exploiting User Models for Personalizing News Presentations”, Proc. 2nd Workshop on Adaptive Systems and User Modeling on the WWW.
Barzilay, R., McKeown, K., and M. Elhadad (1999), “Information Fusion in the Context of Multi-Document Summarization”, Proc. 37th Annual Meeting of the Association for Computational Linguistics. http://www.cs.mu.oz.au/acl/P/P99/P99-1071.pdf
Bellegarda, J.R. (1998), “Exploiting Both Local and Global Constraints for Multi-Span Statistical Language Modeling”, Proc. Intl. Conf. On Acoustics, Speech, and Signal Processing, IEEE press, 2, 677-680.
Bharat, K. and G.A. Mihaila (2001), “When Experts Agree: Using Non-Affiliated Experts to Rank Popular Topics”, Proc. 10th Intl. World Wide Web Conference.
Billsus, D. and M.J. Pazzani (1999), “A Hybrid User Model for News Story Classification”, Proc. 7th International Conference on User Modeling, Wien-New York: Springer-Verlag, 99-108.
Boyle, C. and A.O. Encarnacion (1994), “MetaDoc: An Adaptive Hypertext Reading System”, User Modeling and User-Adapted Interaction 4, 1 (Jan.), 1-19.
Brin, S., Motwani, R., Page, L., and T. Winograd (1998), “What Can You Do with a Web in Your Pocket”, In Bulletin of the IEEE Computer Society Technical Committee on Data Engineering.
Brusilovsky, P. and M.T. Maybury (2002), “From Adaptive Hypermedia to Adaptive Web”, In P. Brusilovsky and M. T. Maybury (eds.), Communications of the ACM 45 (5), Special Issue on the Adaptive Web, 31-33.
Brusilovsky, P. and L. Pesin (1998), “Adaptive Navigation Support in Educational Hypermedia: An Evaluation of the ISIS Tutor”, Journal of Computing and Information Technology 6, 1, 27-38.
Brusilovsky, P. (2001), “Adaptive hypermedia”, User Modeling and User Adapted Interaction, Ten Year Anniversary Issue (Alfred Kobsa, ed.) 11 (1/2), 87-110. http://umuai.informatik.uni-essen.de/anniversary.html
Cadez, I., Heckerman, D., Meek, C., Smyth, P., and S. White (2000), “Visualization of Navigation Patterns of a Web Site Using Model-Based Clustering”, Proc. 6th International Conference on Knowledge Discovery and Data Mining.
Chan, P.K. (1999), “A Non-Invasive Learning Approach to Building Web User Profiles”, In KDD-99 Workshop on Web Usage Analysis and User Profiling, 7-12.
Cheeseman, P. and J. Stutz (1996), “Bayesian Classification (AutoClass): Theory and Results”, Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, Menlo Park, Calif., 153-180.
Chen, L. and K. Sycara (1998), “Webmate: A Personal Agent for Browsing and Searching”, Proc. 2nd International Conference on Autonomous Agents, pp. 132
Chiu, A.S. (2000), “The Ethics of Internet Privacy”, http://web.tepper.cmu.edu/files/PDF_Document/c3f42085bfea4608914d62539d4f5579.pdf
Claypool, M., Le, P., Wased, M., and D. Brown (2001), “Implicit Interest Indicators”, Proc. 6th International Conference on Intelligent User Interfaces, 33-40.
Cohen, W.W. (1995), “Fast Effective Rule Induction”, Proc. Twelfth International Conference.
Cohen, W.W. (1998), “Joins that Generalize: Text Classification Using WHIRL”, Proc. 4th International Conference on Knowledge Discovery and Data Mining (KDD-98).
Croft, W.B. and R. Das (1989), “Experiments with Query Acquisition and Use in Document Retrieval Systems”, Proc. 13th ACM SIGIR.
Croft, W.B. and R.T. Thompson (1987), “I3R: A New Approach to the Design of Document Retrieval Systems”, Journal of the American Society for Information Science, 38: 389-404.
Croft, W.B., Turtle, H.R., and D.D. Lewis (1991), “The Use of Phrases and Structured Queries in Information Retrieval”, ACM SIGIR Conference on Research and Development in Information Retrieval, 32-45.
Croft, W.B. (2000), (editor) Advances in Information Retrieval: Recent Research from the Center for Intelligent Information Retrieval, Massachusetts, Kluwer Academic Publishers, 243.
Cutting, D.R., Karger, D.R., Pedersen, J.O., and J.W. Tukey (1992), “Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections”, Proc. 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
Delaney, K.J. (2004), “Study Questions Whether Google Really is Better”, Wall Street Journal (Eastern edition), New York, May 25, B.1. http://proquest.umi.com/pqdweb?RQT=309&VInst=PROD&VName=PQD&VType=PQD&sid=5&index=45&SrchMode=1&Fmt=3&did=000000641646571&clientId=15106
Deshpande, M. and G. Karypis (2001), “Selective Markov Models for Predicting Web-Page Accesses”, First SIAM International Conference on Data Mining.
Eirinaki, M., Lampos, C., Paulakis, S., and M. Vazirgiannis (2004a), “Web Personalization Integrating Content Semantics and Navigational Patterns”, Workshop on Web Information and Data Management, 72-79.
Fagan, J.L. (1987), “Automatic Phrase Indexing for Document Retrieval”, Proc. 10th Annual ACM SIGIR Conference on Research & Development in Information Retrieval, 91-101.
Fink, J., Kobsa, A., and A. Nill (1996), “User-oriented Adaptivity and Adaptability in the AVANTI project”, Proc. Designing for the Web: Empirical Studies, Microsoft Usability Group, Redmond (WA).
Fisher, D.H. (1987), “Knowledge Acquisition via Incremental Conceptual Clustering”, Machine Learning 2, 139-172.
Frakes, W.B. and R. Baeza-Yates (1992), Information Retrieval: Data Structures and Algorithms, Prentice-Hall.
Fu, X., Budzik, J., and K.J. Hammond (2000), “Mining Navigation History for Recommendation”, Proc. 2000 Conference on Intelligent User Interfaces.
Gennari, J.H., Langley, P., and D. Fisher (1989), “Models of Incremental Concept Formation”, Artificial Intelligence, 40, 11-61.
Goecks, J. and J. W. Shavlik (2000), “Learning Users’ Interests by Unobtrusively Observing Their Normal Behavior”, Proc. ACM Intelligent User Interfaces Conference (IUI), Jan 2000.
Gokcay, D. and E. Gokcay (1995), “Generating Titles for Paragraphs Using Statistically Extracted Keywords and Phrases”, Proc. IEEE International Conference on Systems, Man and Cybernetics: Intelligent Systems for the 21st Century, Vol. 4, 22 Oct.
Google co. (2004), Google. http://www.google.com/
Granka, L. A., Joachims, T., and G. Gay (2004), “Eye-tracking Analysis of User Behavior in WWW Search”, Proc. 27th annual international conference on Research and development in information retrieval.
Gravano, L., Garcia-Molina, H., and A. Tomasic (1999), “Gloss: Text-source Discovery over the Internet”, ACM Transactions on Database Systems, 24(2):229-264, June.
Grossman, D., Frieder, O., Holmes, D., and D. Roberts (1997), “Integrating Structured Data and Text: A Relational Approach”, Journal of the American Society for Information Science, 48(2), February.
Guha, S., Rastogi, R., and K. Shim (1998), “CURE: An Efficient Clustering Algorithm for Large Databases”, Proc. ACM-SIGMOD Int. Conf. Management of Data (SIGMOD’98), 73–84.
Guha, S., Rastogi, R., and K. Shim (1999), “ROCK: A Robust Clustering Algorithm for Categorical Attributes”, Proc. 15th Int’l Conf. on Data Eng.
Han, J. (2001), eds., Data Mining: Concepts and Techniques, San Francisco: Morgan Kaufmann Publishers, pp.338.
Harper, D.J. (1980), Relevance Feedback in Document Retrieval Systems: An Evaluation of Probabilistic Strategies, Ph.D. Thesis, Computer Laboratory, University of Cambridge.
Hartigan, J. (1975), Clustering Algorithms, John Wiley.
Haveliwala, T.H. (1999), “Efficient Computation of PageRank”, Technical Report, Stanford University Database Group. http://dbpubs.stanford.edu/pub/1999-31
Haveliwala, T.H. (2002), “Topic-sensitive PageRank”, Proc. 11th Intl. World Wide Web Conference, Honolulu, Hawaii, May.
Herlocker, J., Konstan, J., Borchers, A., and J. Riedl (1999), “An Algorithmic Framework for Performing Collaborative Filtering”, Proc. 1999 Conference on Research and Development in Information Retrieval.
Hilderman, R. and H. Hamilton (2001), “Evaluation of Interestingness Measures for Ranking Discovered Knowledge”, Proc. 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining.
Hoerding, T. (1999), “A Temporary User Modeling Approach for Adaptive Shopping on the Web”, Proc. 2nd Workshop on Adaptive Systems and User Modeling on the WWW.
Huang, S., An, A., and N. Cercone (2002), “Comparison of Interestingness Functions for Learning Web Usage Patterns”, Proc. eleventh international conference on Information and knowledge management, 617-620.
Hull, D.A. (1994), Information Retrieval Using Statistical Classification, PhD thesis, Stanford University, Statistics.
Intel Inc. (2001), “Open Source Computer Vision Library”, Reference Manual, Intel Corporation, 2-1~2-2.
Jaroszewicz, S. and D.A. Simovici (2004), “Interestingness of Frequent Itemsets Using Bayesian Networks as Background Knowledge”, Proc. 2004 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 178-186.
Jeh, G. and J. Widom (2003), “Scaling Personalized Web Search”, Proc. 12th Intl. Conference on World Wide Web, Budapest, Hungary, 20-24 May.
Joachims, T., Freitag, D., and T. Mitchell (1997), “WebWatcher: A Tour Guide for the World Wide Web”, Proc. 15th International Joint Conference on Artificial Intelligence, 770-775.
Johansson, C. (1996), “Good Bigrams”, Proc. COLING-96, 592-597.
Jung, K. (2001) Modeling Web User Interest with Implicit Indicators, Master Thesis, Florida Institute of Technology.
Kamber, M. and R. Shinghal (1996), “Evaluating the Interestingness of Characteristic Rules”, Proc. 2nd International Conference on Knowledge Discovery and Data Mining, 263-266, Portland, Oregon.
Kaplan, C., Fenwick, J., and J. Chen (1993), “Adaptive Hypertext Navigation Based on User Goals and Context”, User Modeling and User-Adapted Interaction 3, 3, 193
Karypis, G., Han, E., and V. Kumar (1999), “CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling”, IEEE Computer.
Kaufman, L. and P.J. Rousseeuw (1990), Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, New York.