WWW.DISSERTATION.XLIBX.INFO
FREE ELECTRONIC LIBRARY - Dissertations, online materials
 

Learning Implicit User Interest Hierarchy for Web Personalization, by Hyoung-rae Kim. A dissertation submitted to Florida Institute of Technology in ...


6.4. Results and Analysis

This section analyzes the data collected from the users who participated in our experiment. There are two data sets: “visits with maximum duration” and “all visits”. When a user visited a web page more than once, the score might be the same for each visit, but all other information (the durations, the number of mouse clicks, etc.) may differ. For each page, the “visits with maximum duration” data set contains only the page view in which the user stayed the longest. The maximum duration is determined using complete duration, which is described in Section 3.1. The “all visits” data set contains all page views collected in our experiment. We believe that the “visits with maximum duration” data set is more useful than “all visits”, because users tend not to read a web page again once they already know it (Billsus and Pazzani, 1999). On average, users had 182 visits in the “visits with maximum duration” data set and 291 visits in the “all visits” data set. Jung (2001) used only the “all visits” data set.
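The construction of the “visits with maximum duration” data set can be sketched as follows. This is a minimal illustration. It assumes each page view is a record holding a URL, a complete duration, and an interest score. The `PageView` type and its field names are hypothetical and do not come from the experiment's logging software.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PageView:
    # Hypothetical record of one page view; field names are illustrative only.
    url: str
    complete_duration: float  # complete duration as in Section 3.1 (unit assumed: seconds)
    score: int                # the user's 1-5 interest rating for the page


def visits_with_maximum_duration(views: list[PageView]) -> list[PageView]:
    """For each URL, keep only the page view with the longest complete duration."""
    longest: dict[str, PageView] = {}
    for view in views:
        current = longest.get(view.url)
        # On ties, the earliest view is kept.
        if current is None or view.complete_duration > current.complete_duration:
            longest[view.url] = view
    return list(longest.values())
```

Suppose the input has three views, two of which are visits to the same page. The function keeps two views: the longer visit to the repeated page and the single visit to the other page. The “all visits” data set corresponds to the unfiltered list.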

6.4.1. Visits with Maximum Duration

Table 22 shows the experimental results with the “visits with maximum duration” data set and summarizes which indicator is reliable for which volunteer. The columns are as follows:
- the first column lists the users;
- the second is complete duration (Complete);
- the third is active window duration (Active);
- the remaining columns are look at it duration (LookAtIt), distance of mouse movement (MousMove), number of mouse clicks (MousClk#), distance of scrollbar movement (ScrolMov), number of scrollbar clicks (ScrolCk#), number of key up and key down events (KeyUpDn#), and size of highlighted text (Highligh).

These are the implicit indicators examined. A “√” mark means that the hypothesis for the indicator is statistically significant for that user, and an “x” means that it is not. A “?” mark means that statistical methods could not be applied to the data, for reasons such as limited data. The last row gives the number of “√” marks in each column, that is, how many users’ interests can be predicted by that indicator.
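Table 22 is labeled as an ANOVA test, but the hypothesis being tested is not restated in this section. The sketch below therefore rests on an assumption: each cell is treated as a one-way ANOVA of one indicator's values, grouped by the interest score the user gave each page. The F statistic is then compared against a critical value that the caller supplies from an F table for the chosen significance level. The function names are hypothetical.

```python
from __future__ import annotations

import math


def one_way_anova_f(groups: list[list[float]]) -> float:
    """One-way ANOVA F statistic for observations split into groups."""
    groups = [g for g in groups if g]  # skip interest scores with no pages
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        # Too little data to test: comparable to a "?" cell in the table.
        raise ValueError("need at least two non-empty groups and more observations than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    if ms_within == 0:
        return math.inf if ms_between > 0 else 0.0
    return ms_between / ms_within


def is_significant(groups: list[list[float]], f_critical: float) -> bool:
    """A check-mark cell: F exceeds a critical value supplied by the caller."""
    return one_way_anova_f(groups) > f_critical
```

For example, take two score groups `[1, 2, 3]` and `[4, 5, 6]`. They give F = 13.5 with (1, 4) degrees of freedom, which exceeds the tabulated 5% critical value of about 7.71.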

The indicators Complete, Active, LookAtIt, and MousMove were able to classify the interests of 8 users in web pages (8 of 11 users, or 73%). MousClk# was the next best indicator; Jung (2001) had found it to be the best. KeyUpDn# and Highligh distinguished the interests of the fewest users: KeyUpDn# was significant for only 1 user and Highligh for only 3 users. No indicator could predict User 5’s interest. Highligh could predict User 7’s interest, but no other indicator could. Likewise, ScrolMov was valid only for User 4. These results indicate that no indicator was valid for all of the users. Depending on the user, an indicator may or may not be valid.
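The bottom row of Table 22 and the percentages quoted above are simple counts over the table. The sketch below assumes a hypothetical layout in which the table is stored as a nested dictionary of marks. It divides each count by all users in the table, which matches the report of 8 of 11 users as 73% and 7 of 11 as 64%.

```python
from __future__ import annotations

CHECK = "\u221a"  # the "√" mark used in Tables 22 and 23


def tally_valid_users(table: dict[str, dict[str, str]]) -> dict[str, tuple[int, int]]:
    """Return {indicator: (number of users marked valid, rounded percent of all users)}.

    `table` maps each user to {indicator: mark}, where mark is "√", "x", or "?".
    Only "√" counts as valid; "x" and "?" both count as not valid.
    """
    counts: dict[str, int] = {}
    for marks in table.values():
        for indicator, mark in marks.items():
            counts[indicator] = counts.get(indicator, 0) + (1 if mark == CHECK else 0)
    n_users = len(table)
    return {ind: (c, round(100 * c / n_users)) for ind, c in counts.items()}
```

Consider a made-up 11-user table in which MousMove is marked “√” for 8 users. The function returns `(8, 73)` for that column.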

We expected LookAtIt to be the most accurate indicator, but the results did not bear this out. We suspect that this was because volunteers did not move around much and looked at the monitor most of the time while browsing. In practice, users may browse for much longer periods than they did in our experiment.

6.4.2. All Visits

Table 23 shows the experimental results with the “all visits” data set and summarizes which indicator is reliable for which volunteer. The implicit interest indicators Complete, Active, LookAtIt, and MousMove were able to predict the interests of 7 of the users who participated in the study (64%). With the “visits with maximum duration” data set, the same indicators predicted the interests of one more user (8 users). This result suggests that the “visits with maximum duration” data set is more useful than the “all visits” data set for predicting users’ interests.

MousClk# was the next best indicator and was able to predict the interests of 6 users. MousClk# performed better on the “all visits” data set than on the “visits with maximum duration” data set, but it still predicted fewer users’ interests than the four indicators above. This result is similar to the findings of Jung (2001), who also used the “all visits” data set and found MousClk# to be the best indicator. No indicator could predict User 5’s interest. User 4’s interest could be predicted only by ScrolCk#, and User 7’s interest only by Highligh. These results also indicate that different indicators work for different people.

Table 22. ANOVA test with “visits with maximum duration” data set

–  –  –

The implicit interest indicators bookmark, save, print, and memo were used far less often than the indicators above. Users bookmarked or printed only a few web pages while surfing the web. Because users did not bookmark every interesting web page, these indicators cannot be used alone to identify all of the pages that a user finds interesting. However, these indicators are highly accurate when they are used, and they can be combined with other, more frequently used indicators.
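A simple rule can combine these rare but precise indicators with a frequent one. Any bookmark, save, print, or memo counts as a positive signal, and the rule otherwise falls back on MousMove. The sketch below is hypothetical: the rule, the `PageSignals` fields, and the per-user threshold are assumptions for illustration, not a method evaluated in this study.

```python
from dataclasses import dataclass


@dataclass
class PageSignals:
    # Hypothetical per-page record; field names are illustrative only.
    bookmarked: bool = False
    saved: bool = False
    printed: bool = False
    memoed: bool = False
    mouse_distance: float = 0.0  # MousMove value (unit assumed: pixels)


def predict_interested(page: PageSignals, mouse_threshold: float) -> bool:
    """Rare, high-precision actions decide when present; otherwise use MousMove."""
    if page.bookmarked or page.saved or page.printed or page.memoed:
        return True
    # Fall back on a frequently available indicator with a per-user threshold.
    return page.mouse_distance >= mouse_threshold
```

Because the explicit actions are checked first, a bookmarked page is predicted interesting even if the mouse barely moved. Pages without such actions, which are the majority, are still covered by the MousMove threshold.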

The results for the bookmark, save, print, and memo indicators are listed in Table 24. The columns are as follows:
- the first column is the indicator;
- the second is the score (1 = “not interested”, 3 = “interested”, and 5 = “very interested”);
- the third is the total number of uses of the indicator across the 11 volunteers;
- the remaining columns give the uses for each user.

The value in each cell is the number of times that the indicator was used. The number of times each indicator was used varied greatly between individuals. For instance, bookmark was a clearer indicator than the others for user 5, while save was a clearer indicator for user 10.





Of the web pages that were bookmarked, 95% were scored “interested” (3) or higher. The totals across the 11 volunteers show that users rarely bookmarked uninteresting web pages: no bookmarked web page was scored as “not interested”. Users 1 and 5 tended to bookmark more web pages as the pages became more interesting. These results indicate that bookmark is a good indicator.
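Figures such as the 95% above come from one indicator's row of usage counts per score. The sketch below assumes those counts are given as a dictionary from score to number of uses. The counts in the example are made up, since Table 24 itself is not reproduced here.

```python
def share_at_least_interested(uses_by_score: dict[int, int], threshold: int = 3) -> float:
    """Fraction of an indicator's uses on pages scored at or above `threshold`."""
    total = sum(uses_by_score.values())
    if total == 0:
        raise ValueError("indicator was never used")
    at_least = sum(count for score, count in uses_by_score.items() if score >= threshold)
    return at_least / total
```

For example, `share_at_least_interested({1: 0, 2: 1, 3: 5, 4: 6, 5: 8})` returns 0.95, because 19 of the 20 uses fall on pages scored 3 or higher.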

Saved web pages were scored “interested” or higher 98% of the time. This means that users rarely saved uninteresting web pages. Saved web pages were never scored as “not interested”. All users except user 8 saved only pages that they found interesting. Users 3, 6, and 10 tended to save more web pages as the pages became more interesting. These results indicate that save is a good implicit indicator.

All of the printed web pages were scored “interested” or higher, which tells us that users did not print uninteresting web pages. Users 2, 3, 6, and 10 tended to print more web pages as the pages became more interesting. These results indicate that print is a good indicator.

Nearly all (98%) of the memoed web pages were scored “interested” or higher, and no memoed web page was scored as “not interested”. Only user 9 wrote memos on web pages that he rated below “interested”. User 1 did not use the memo feature, but users 3, 5, and 10 tended to write more memos as the pages became more interesting. These results also indicate that memo is a good indicator.

Table 24. Results of bookmark, save, print, memo indicators

–  –  –

user’s interest in a web page. This paper evaluates both previously studied implicit indicators and several new implicit indicators. The indicators examined were complete duration, active window duration, look at it duration, distance of mouse movement, number of mouse clicks, distance of scrollbar movement, number of scrollbar clicks, number of key up and key down events, and size of highlighted text. The data consisted of implicit indicator data from 11 users and each user’s 1-5 interest rating of each page. During the experiment, volunteers were encouraged to browse as they normally would.

Two evaluation criteria were used: (1) how accurately an indicator can predict users’ interests and (2) how many users’ interests an indicator can predict. We used two data sets: “visits with maximum duration” and “all visits”. We believe that “visits with maximum duration” is more useful for prediction than “all visits”, because users tend not to read a web page again once they have read it (Billsus and Pazzani, 1999). With the “visits with maximum duration” data set, the implicit interest indicators Complete, Active, LookAtIt, and MousMove were able to predict 8 users’ interests in web pages. With the “all visits” data set, they were able to predict only 7 users’ interests. This again suggests that the “visits with maximum duration” data set is more useful than the “all visits” data set for predicting users’ interests.

The experimental results suggest that MousMove may be the most practical indicator, because mouse movement is simple to detect and carries less risk of error than Active. If a user leaves a web page open and leaves the room, the active window duration keeps growing, but the MousMove indicator is not affected. MousClk# was the next best indicator; Jung (2001) had found it to be the best. Our results indicate that no indicator was valid for all users. Depending on the user, an indicator may or may not be valid.
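The difference between Active and MousMove for an idle user can be sketched with a simple event log. The sketch assumes hypothetical “focus”, “blur”, and “mouse” events. Active window duration is summed between focus and blur, and MousMove is the path length between successive pointer positions. The event names and fields are illustrative, not the logging format used in the experiment.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical browser event; kinds and fields are illustrative only.
    t: float        # seconds since the page was opened
    kind: str       # "focus", "blur", or "mouse"
    x: float = 0.0  # pointer position for "mouse" events
    y: float = 0.0


def active_window_duration(events: list[Event]) -> float:
    """Sum the time between each focus event and the following blur event."""
    total, focused_since = 0.0, None
    for e in events:
        if e.kind == "focus" and focused_since is None:
            focused_since = e.t
        elif e.kind == "blur" and focused_since is not None:
            total += e.t - focused_since
            focused_since = None
    return total


def mouse_distance(events: list[Event]) -> float:
    """Total straight-line distance between consecutive mouse positions."""
    points = [(e.x, e.y) for e in events if e.kind == "mouse"]
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))
```

Consider a user who moves the mouse 5 pixels and then leaves the room for an hour before the window loses focus. That user gets an Active value of 3600 seconds but a MousMove value of 5. The MousMove value is the same as for a user who leaves the page after 10 seconds.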

We also evaluated less frequently used indicators of user interest: bookmark, save, print, and memo. We divided the data into pages scored below “interested” and pages scored “interested” or higher. Of the pages that used each indicator, the share scored “interested” or higher was:
- bookmarked web pages: 95%;
- saved web pages: 98%;
- printed web pages: 100%;
- memoed web pages: 98%.

We expected the LookAtIt indicator to be more accurate than the Complete and Active indicators, but the results for all three were similar. We believe that this was because volunteers did not move around much and looked at the monitor most of the time while browsing. A longer evaluation might give more accurate results for the LookAtIt indicator, since users would act more naturally after more than 1 or 2 hours of surfing. In the future, this indicator could be incorporated into an application that personalizes web search results. The collected interesting web pages for a user can be used for building

–  –  –

The adaptive web is a relatively young research area that began in the early 1990s. It now attracts researchers from many different communities: machine learning, information retrieval, user modeling, and web-based education (Brusilovsky and Maybury, 2002). Our goal is to build user interest models implicitly and incorporate them into personalized web search. Thus, we review web information retrieval, user modeling, and machine learning.

We discuss each of these categories in turn.

7.1. Web Information Retrieval

Web information retrieval (WIR) systems gather information from web pages or from the users who are using them. In this section we give an overview of the basic steps of a WIR system.

We also give an overview of adaptive web systems that do not include personalized user modeling, such as recommendation systems (collaborative filtering systems), which rely on the similarity between a user’s preferences and those of other people. The six subsections

–  –  –

Figure 34. Diagram of web information retrieval

7.1.1. Basics of a WIR System

Many WIR systems use a model based on word frequency information to identify relevant documents (Zamir and Etzioni, 1999; Cutting et al., 1992). It is important to describe the process by which the computer converts text into a form that can be processed.

For most WIR systems, the most basic unit of text analysis is the word, even though phrases, sentences, or paragraphs may be more meaningful.
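A word-frequency model of the kind mentioned above can be illustrated with term-frequency vectors compared by cosine similarity. This is a common choice, assumed here for illustration; the cited systems may weight and compare terms differently.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def term_frequencies(text: str) -> Counter[str]:
    """Count how often each lower-cased word occurs in the text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter[str], b: Counter[str]) -> float:
    """Cosine of the angle between two term-frequency vectors (0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

Compare the query “web search” with the document “Web search engines search the web”. The shared terms web (1 × 2) and search (1 × 2) give a dot product of 4. The norms are √2 and √10, so the similarity is 4/√20 ≈ 0.894.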

7.1.1.1. Lexical Analysis

Text processing converts the text into a stream of tokens, including numbers, abbreviations, and alphanumeric sequences. There is a large class of words, called stop words, that have no inherent meaning when taken out of context (e.g., “a”, “the”, “are”, or “to”); a list of such words is called a stop list. By removing stop words from web documents, a web information retrieval system can significantly increase its efficiency, for example by reducing the time and memory required to run the system.
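The two steps described here, tokenization and stop-word removal, can be sketched as below. The stop list contains only the four example words from the text. The regular-expression tokenizer is a simplification; for instance, it splits abbreviations such as “U.S.” into separate letters. Both choices are assumptions for illustration.

```python
import re

# Illustrative stop list made of the example words in the text; real lists are far larger.
STOP_LIST = frozenset({"a", "the", "are", "to"})


def tokenize(text: str) -> list[str]:
    """Lower-case the text and return runs of letters and digits (words, numbers, alphanumerics)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens: list[str], stop_list: frozenset = STOP_LIST) -> list[str]:
    """Drop every token that appears in the stop list."""
    return [t for t in tokens if t not in stop_list]
```

For example, `remove_stop_words(tokenize("The pages are linked to a Web3 site"))` yields `["pages", "linked", "web3", "site"]`.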

Opinions vary as to the optimal size of a stop list, although larger lists are generally preferred. The size and content may be domain dependent (Hull, 1994). A stop list should not be selected solely on the basis of frequency, because some frequent words still carry important semantic meaning in a document. In some studies (Croft, 1991; Zamir and Etzioni, 1998), both very frequent and very rare words were removed. Retrieval experiments have found that using a stop list of between 8 and 500 words does not reduce the accuracy of search algorithms in identifying relevant documents (Frakes and Baeza-Yates, 1992).
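The studies cited above removed both very frequent and very rare words. A minimal sketch of that idea prunes terms by document frequency. The thresholds `min_df` and `max_df_ratio` are hypothetical parameters, not values from those studies. As the text warns, a purely frequency-based cut like this can also remove frequent words that carry meaning.

```python
from __future__ import annotations

from collections import Counter


def prune_by_document_frequency(
    docs: list[list[str]], min_df: int, max_df_ratio: float
) -> list[list[str]]:
    """Drop terms that appear in fewer than `min_df` documents (too rare)
    or in more than `max_df_ratio` of all documents (too frequent)."""
    n = len(docs)
    # Document frequency: the number of documents containing each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    keep = {t for t, c in df.items() if c >= min_df and c / n <= max_df_ratio}
    return [[t for t in doc if t in keep] for doc in docs]
```

Consider four token lists where “the” occurs in every document and “model” occurs in only one. With `min_df=2` and `max_df_ratio=0.8`, both words are removed, while mid-frequency terms such as “web” are kept.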

Another common strategy for reducing text size and potentially improving WIR systems is to apply a stemming algorithm to word tokens. A stemming algorithm is a linguistic tool that builds word equivalence classes by removing or modifying prefixes and suffixes to identify the root form of a word. This idea is based on the assumption that common morphological variants of a word have similar meanings. For example, a user who is interested in the word “computer” may also be interested in the words “computing”, “computerized”, “compute”, etc. Search engines can reduce the size of their index, often by as much as 20-50%, by applying a stemming method (Hull, 1994). Frakes and Baeza-Yates (1992) conducted a large number of experiments to test the performance of stemming algorithms. In general, it appears that stemmers do not degrade retrieval performance, and the specific choice of stemmer does not seem to be important.
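The toy suffix stripper below makes the idea of word equivalence classes concrete by mapping the four example words to one stem. It is deliberately crude. It is not the Porter stemmer or any algorithm evaluated by Frakes and Baeza-Yates. The suffix list and minimum stem length are assumptions chosen for this example.

```python
# Hypothetical suffix list, checked longest first; real stemmers use many more rules.
SUFFIXES = ("erized", "ing", "ers", "er", "ed", "es", "e", "s")


def crude_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, provided at least `min_stem` letters remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word
```

Here “computer”, “computing”, “computerized”, and “compute” all map to the stem “comput”, so four index terms collapse into one. Short words such as “the” are left alone by the minimum stem length.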


