WWW.DISSERTATION.XLIBX.INFO
FREE ELECTRONIC LIBRARY - Dissertations, online materials
 
<< HOME
CONTACTS



Pages:     | 1 |   ...   | 8 | 9 || 11 | 12 |

«by MICHAŁ WRÓBLEWSKI Supervisor JERZY STEFANOWSKI, Assistant Professor Referee ROBERT SUSMAGA, Assistant Professor MASTER THESIS Submitted in ...»

-- [ Page 10 ] --

g – total number of clusters in the results gu – total number of useful clusters s – total number of snippets in the results sa – total number of snippet assignments in the results sw – total number of snippets well matching their clusters sm – total number of snippets moderately matching their clusters sn – total number of snippets not matching their clusters so – total number of snippets in the "Other Topics" group

On the basis of these statistics, the following quality measures have been defined:

• cluster label quality It is the proportion of number of useful and number of all groups (without the "Other

Topics" group:

gu cluster label quality = g

–  –  –

6.1.3. AHC-SPECIFIC QUALITY MEASURES Presence of hierarchy in the clustering results makes their evaluation even more difficult.

Groups included in the results of AHC may contain not only documents, but also other groups, which may also contain other groups and so on. In this connection a number of

problems arise, for instance:

• should hierarchical dependencies between groups influence assessment of their quality, e.g. can a subcluster of some useless cluster be useful?

It seems that top levels of the hierarchy are the most important ones, as their descriptions give to the user the "first impression" of their contents. So even if a useless top-level cluster contains some interesting subclusters, discouraged user may disregard the whole branch of the hierarchy without looking further down into its structure.

• should the redundancy of groups be punished?

It happens that some levels of hierarchy introduce no distinct subtopics when compared to topic of their parent clusters. In such case, these subclusters should be assessed as useless.

• should the descriptions of parent groups be taken into consideration when evaluating their subgroups?

It seems that when assessing subclusters descriptions one must take into consideration also their parent clusters description. Consider for instance again results of clustering for a query "king". It may contain a top-level cluster described as "King County" with a subcluster "Government". The latter group contains of course documents about government of King County, not about any government (in which case it would be too general). On the other hand, during our evaluation experiment we have come into an example hierarchy for a query "volleyball" containing a cluster "women's volleyball", which in turn contained a subcluster "men's volleyball". Although the latter one is useful itself, its position in hierarchy makes it rather useless.

In order to at least partially deal with these problems, to evaluate AHC results we have

introduced few improvements and additional metrics:

–  –  –

• calculating cluster label quality and snippet assignment distribution separately for the top level groups Apart from calculating these coefficients for all clusters, when evaluating hierarchical clustering results one may calculate them only for the top level clusters (i.e. ones that have no parent groups). This should be done due to the mentioned earlier fact that top groups are the ones most important to the user.

• average hierarchy depth This metric does not measure directly the quality of results itself, but has rather an informative purpose. However, we intuitively feel that the constructed hierarchy has certain optimal depth and shouldn't be neither too shallow nor too deep. In [Maarek et al.

00] dependence of results quality on hierarchy depth for a sample documents collection is shown, but authors of this paper don't give any general conclusions. Optimal hierarchy depth may heavily depend on structure of the input data.

6.2. USER EVALUATION OF THE AHC ALGORITHM

6.2.1. THE EXPERIMENT Our user evaluation experiment has been conducted on 6 users, who were asked to fill in survey forms. Such a small number of persons taking part in this process may of course seem insufficient in order to recognize its results as statistically significant. Unfortunately, considering our situation, and especially lack of resources and time, it was hard to find a larger number of people that could help us in carrying out the evaluation.

We should also mention that all persons engaged in this process are experienced Internet users, professionally connected with computing science. Moreover, most of them are also experts in the domain of clustering algorithms. This fact had also to influence obtained results.

For the purpose of evaluation we have used four queries, two in both Polish and English languages in order to examine the difference in behavior of AHC for different languages. As we also wanted to study the influence of breadth of topics scope in the input documents set on results quality, two of the queries used in the experiment were general, while another two

–  –  –

Snippets used as input data for the experiment were downloaded from the Google search service. Results handled to the persons involved in the evaluation were obtained using the Carrot2 AHC module with its default parameter values (mentioned in Figure 5-2-4 and Figure 5-2-2).





6.2.2. RESULTS Here we would like to present the most important results of the user evaluation. As it has turned out values of quality measures calculated only for top level clusters differ on the maximum only a few percent from their equivalents calculated for the whole hierarchy (a fact somehow unexpected), we present here only values of the latter ones on one, collective chart.

–  –  –

One may notice that algorithm gives result of quite solid quality. Percentage of well and moderately assigned snippets is high and varies about 90 %, a result that may be considered as satisfying. Unfortunately, AHC deals much worse with cluster descriptions – only about 60-70% of groups have been considered as useful. On the other hand, almost all snippets are

- 74 assigned to groups, with value of assignment coverage factor amounting usually to about 90%.

It is easy to see that AHC deals better with results of queries given in the English than Polish language (probably due to less complex issues of stemming and phrases extraction in the case of the first one). Yet influence of query scope on clustering quality is less obvious. It seems that for specific questions snippet assignment coverage is higher and its distribution is better, but when comparing cluster labels quality no general trend can be observed. We must be aware of the fact that our conclusions may lack solid basis, as the experiment was performed on small number of both users and queries.

It is very interesting to compare values of quality measures assessed by different evaluators.

As it turns out, user judgements are very subjective, especially in case of cluster label qualities. This simple comparison shows, how difficult is the evaluation of clustering results, in particular in case of hierarchy (see Figure 6-2-2-2 below).

–  –  –

Finally, we would like to compare values of our metrics for results of AHC and LINGO (for detailed description of LINGO evaluation refer to [Osiński 03]). In the below table we have set aggregated results of one algorithm's evaluation against results of another one.

–  –  –

Fig. 6-2-2-3 Comparison of aggregated evaluation results of AHC and LINGO

- 75 This juxtaposition shows that while snippet assignment distribution in case of both algorithms is actually the same, LINGO creates significantly better descriptions, at the cost of assigning a lower number of snippets to clusters. It may seem interesting that difference between values of two latter measures for both methods are almost equal (about 15%). So it may seem that this fraction of the input collection is made up of documents that are difficult to place in any cluster, yet the AHC algorithm tries to do just so.

6.3. EXPERT EVALUATION OF THE AHC ALGORITHM INFLUENCE OF AHC PARAMETERS AND INPUT DATA

ON THE RESULTS

Major goal of experiments performed in this subchapter is presentation of the influence of changes in algorithm parameters and input data on its results. Due to already mentioned variety of parameters that may be used in our implementation of the AHC (compare Figure 5-2-4 and Figure 5-2-2), it was impossible to repeat user evaluation similar to one presented in the former subchapter for each studied aspect of the algorithm. We have therefore decided to present the results of these experiments (unfortunately, for large hierarchies we were forced to present only the biggest and topmost groups), with simple verbal description pointing out the most important differences.

During expert evaluation we have used default values of the algorithm parameters from Figure 5-2-4 and Figure 5-2-2 (unless it is mentioned differently in the experiment's description). All queries were formulated in the English language, and we have used top 100 snippets returned by the Google search service as the input data.

6.3.1. INFLUENCE OF QUERY BREADTH Purpose of this experiment was to compare hierarchy created by the algorithm for two different queries, one general and another more specific, covering a small subset of documents corresponding to the first one. We have chosen two queries: "clinton" (it is a widely used query in examples of results clustering as it covers a lot of completely different topics – see also [Zamir 99], [Weiss 01] and [Osiński 03]) and "president bill clinton".

- 76 Fig. 6-3-1-1 Comparison of AHC results for queries: "clinton" (general – on the left) and "president bill clinton" (much more specific – on the right).

Result of this comparison shows that the algorithm performs better with more general queries. Main reason for this fact is of course that it is much more difficult to find distinct topics in a set of documents relevant to a specific query, and therefore create good, heterogeneous clusters. In particular, group descriptions are much better when clustering results of general queries.

6.3.2. INFLUENCE OF THE INPUT SNIPPETS NUMBER Natural influence of increase of input data size should be increase of number of clusters in the results and also increase of their size. It should be also expected that it will improve clusters labels quality, as small documents set may contain proportionally many overspecialized phrases that will be chosen as clusters labels. On the other hand, in case of larger input collection we may expect that these phrases will be replaced by ones a little bit more general, but instead much more informative to the user (while their subclusters may have specific labels). Another important, yet not considered here aspect of increase of the input data size is its influence on the algorithm's processing time. This issue is discussed in more details in Chapter 5-3-2.

- 77 Fig. 6-3-2-1 Comparison of AHC results for a query "unix" using 100 (on the left) and 300 (on the right) input snippets.

The most important difference caused by increase of input snippets number are increase of number of created top-level clusters and growth of hierarchy, which has become much more complex (unfortunately, due to its size we were not able to present here the whole tree). It is clearly visible that while when using 300 snippets a dozen or so top-level groups have their own subclusters, in case of 100 snippets the structure is more flat, with hierarchy present only in few biggest clusters. However, increase of the groups number was much smaller than of the documents number due to repeating topics in the input set.

It turns out that our assumption about influence of increase in input data size on cluster labels quality is only partially fulfilled. Although some of the groups in the right tree have shorter descriptions, it is hard to say, whether their quality has really improved. It results most probably from the fact that problem with overspecialized labels in case of clusters created by the AHC algorithm is rather rare (compare LINGO). On the other hand, it is clearly visible that in hierarchy constructed by clustering a larger number of documents first clusters (i.e. the biggest ones) correspond better to the most important subtopics (due to the fact that algorithm is performed on a more representative sample).

6.3.3. INFLUENCE OF PHRASES USE Purpose of this experiment was to show the profits of phrases use in the AHC algorithm and the vector space model in general. According to our hypothesis formulated in Chapter 3-5, use of phrases shouldn't have strong influence on structure of the hierarchy itself, but rather on quality of cluster labels.

- 78 Fig. 6-3-3-1 Comparison of AHC results for a query "king" for maximum length of extracted phrases equal to 1 (i.e. no phrases use – on the left) and 5 (on the right).

After closer examination of the tree structure it is easy to see that in spite of phrases use its structure has changed only slightly. This is due to the fact that if a group of documents shares a phrase, it also shares all of words that are included in it, so introduction of phrases should have no great impact on inter-document similarities. On the other hand, the clusters labels have become much more coherent. When we use only single terms, often their sequence in descriptions is different than that in documents, and this fact may confuse the user. Moreover, terms equal to the user's query are removed during terms weigthing from the term-document matrix (see Chapter 3-2-2) and therefore they cannot appear in labels, which become often too short and not fully informative. However, when using phrases the same terms may appear as their elements and therefore also in descriptions. This problem is illustrated on the above figure where labels of clusters such as "King County" or "Burger King" become "County" or "Burger" when we use single terms only.



Pages:     | 1 |   ...   | 8 | 9 || 11 | 12 |


Similar works:

«HARD PANNING Hard pan coated confections are interesting products since they are, like all panned goods, identified by a process and not by the candy type, such as hard candy, cremes, fondants and caramel. Hard panned confections are divided into two types: 1. Hard sugar coated items like jaw breakers, coated chewing gum, Jordan Almonds and chocolate lintels.2. Hot panned items like Boston Baked Beans and French Burnt peanuts. The variety of products produced using these methods is as large as...»

«UNITED STATES SECURITIES AND EXCHANGE COMMISSION Washington, DC 20549 FORM 10-Q ☒ QUARTERLY REPORT PURSUANT TO SECTION 13 OR 15(D) OF THE SECURITIES EXCHANGE ACT OF 1934 For the quarterly period ended June 30, 2016.☐ TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(D) OF THE SECURITIES EXCHANGE ACT OF 1934 for the transition period from _ to _. Commission File Number 0-50481 AEOLUS PHARMACEUTICALS, INC. (Exact Name of Registrant as Specified in Its Charter) Delaware 56-1953785 (State or Other...»

«www.markswatson.com The Curse See, I have set before thee this day life and good, and death and evil; In that I command thee this day to love the LORD thy God, to walk in his ways, and to keep his commandments and his statutes and his judgments, that thou mayest live and multiply: and the LORD thy God shall bless thee in the land whither thou goest to possess it. But if thine heart turn away, so that thou wilt not hear, but shalt be drawn away, and worship other gods, and serve them; I denounce...»

«Framework contract for Multi-country thematic and regional/countrylevel strategy evaluation studies and synthesis in the area of external co-operation Ref.: EuropeAid/122888/C/SER/Multi PARTICIP GmbH Germany LOT 5: Evaluation of EC main policies and strategies in the areas of external cooperation Aide à la Décision Économique Belgium EVA 2007 – Lot 5 Request nr: Version 1 Deutsches Institut für Entwicklungspolitik Germany Evaluation of Visibility of EU external action Development...»

«Cardiac Department Cardiac Catheterisation Information for parents and carers You have recently been informed that your child requires a cardiac catheter. The information in this booklet aims to answer any questions you may have about the procedure and will cover the following main areas: What is a cardiac catheterisation?  Where will the catheterisation be performed?  Admission for catheterisation  Going home after cardiac catheterisation  Should you have any questions about any of...»

«What Does Classifying More Than 10,000 Image Categories Tell Us? Jia Deng1,3, Alexander C. Berg2, Kai Li1, and Li Fei-Fei3 1 Princeton University 2 Columbia University 3 Stanford University Abstract. Image classification is a critical task for both humans and computers. One of the challenges lies in the large scale of the semantic space. In particular, humans can recognize tens of thousands of object classes and scenes. No computer vision algorithm today has been tested at this scale. This...»

«CREATIVE RAY BO NAME Des Moines Y Camp SPECIAL EDITION 1192166th Drive, Boone, IA 50036 January 2013 Special Edition http://www.y-camp.org/ SPECIAL EDITION TO THE YEA BO NEWSLETTER IN HONOR OF DR. RAYMOND HILL PUGH IT’S ALL ABOUT THE KIDS! Ray PughOne of a Kind Ray Pugh first came down the hill to Y Camp in 1946, when he was 17 years old, to be a Y Camp leader for Leon Smith. He came back for six more summers, working in many departments, leading songs and stage shows and coordinating many...»

«Chandoli National Park and Resettlement: Impacts on Local Communities in Maharashtra, India How has the Establishment of Chandoli National Park Impacted Local Communities in the Warana Basin? Elisa Trepp Master Thesis Environment and Resource Management Vrije Universiteit, Amsterdam Student number: 1935240 August 25, 2010 1st Supervisor: J. Bouma 2nd supervisor: D. Huitema External supervisor: K.J. Joy Preface This master thesis is part of the EU sponsored LiveDiverse project and was supported...»

«Anatomy of the Korean Vowels Dick Grune dick@dickgrune.com DRAFT April 6, 2016; Abstract The observed characteristics of the Korean vowels are explained on the basis of a simple descriptive model. 1 Introduction When one reads the descriptions of the Korean vowels in educational material and on the Internet, one is struck by the apparent “shiftiness” of their sounds. (u in the official transcription) is described as /u/ and /U/1 in the The International Phonetic Alphabet (IPA); the (o) as...»

«Minister as Jesus Ministered: Transformed Service Book Four in the Five Book Series Experience the Life: Making the Jesus Way a Habit Bill Hull and Paul Mascarella Trademark and Copyright Information Minister as Jesus Ministered:Transformed Service Copyright © 2007. Bill Hull. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—without prior written permission of the publisher. Unless otherwise...»

«Proceedings on applied botany, genetics and breeding, volume 176, issue 2 M. A. Vishnyakova. Do not let laurels carry you away, they are cheap stuff. (Vavilov’s role in the formation of G. D. Karpechenko as a leader of genetic research at VIR). Proceedings on applied botany, genetics and breeding. Vol. 176. I. 2. 2015. рр. 131–145. Background: The article is devoted to the 90th anniversary of genetic investigations in the Vavilov Institute. In April 1925, a young scientist G. D....»

«Dear Friend, Thank you for requesting additional information regarding LEGACY ALLIANCE Ministerial Fellowship. Enclosed are our PLAN and VISION STATEMENTS, and an application for membership in the LEGACY ALLIANCE. Should you desire to become involved with LEGACY ALLIANCE, please complete the application in its entirety, and return with the application fee to: LEGACY ALLIANCE 555 Markham Woods Road Longwood, FL 32779 If we can be of further assistance, please e-mail us at info@markchironna.com...»





 
<<  HOME   |    CONTACTS
2016 www.dissertation.xlibx.info - Dissertations, online materials

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.