
«Chapter 6. Classification Chapter author: Jess Hemerly jhemerly Table of Contents    6.1 Overview  ...»


6.5.3 Clustering

A special method of computational classification, clustering aims to identify the things that are most alike, whether keywords or documents, and group them into appropriate classes. Scott Spangler and Jeffrey Kreulen (2008) define clustering as “an algorithmic attempt to automatically group documents into thematic categories” (p. 13). It is a fully automatable form of unsupervised machine learning that works on any collection of text.
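To make the idea of unsupervised grouping concrete, here is a minimal sketch. It is not the method Spangler and Kreulen describe. It assumes a simple greedy approach in which each document joins the first cluster whose seed document shares enough words with it. The function names, the Jaccard similarity measure, the 0.3 threshold, and the sample documents are all assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of words."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_documents(docs, threshold=0.3):
    """Greedy single-pass clustering: each document joins the first cluster
    whose seed document is similar enough, or starts a new cluster."""
    clusters = []  # each cluster is a list of document indices; the first is the seed
    word_sets = [set(doc.lower().split()) for doc in docs]
    for i, words in enumerate(word_sets):
        for cluster in clusters:
            seed = word_sets[cluster[0]]
            if jaccard(words, seed) >= threshold:
                cluster.append(i)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([i])
    return clusters


# Hypothetical photo descriptions, invented for this example.
docs = [
    "lava flows into the ocean in hawaii",
    "hawaii volcano lava meets the ocean",
    "iceland landscape with old lava fields",
    "lava fields shape the iceland landscape",
]
print(cluster_documents(docs))  # [[0, 1], [2, 3]]
```

Note that no categories were supplied in advance. The groups emerge from word overlap alone, and the output says nothing about what either group means.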

Clustering relies on the cluster hypothesis: “closely associated documents tend to be relevant to the same requests” (van Rijsbergen, 1979, p. 30). On Flickr, keyword clustering works hand in hand with social tagging to group similar photo tags applied by users. The three most frequently used tags in each cluster, as grouped by the system, serve as the cluster’s name. For example, clusters related to the keyword “lava” include “Hawaii, volcano, ocean,” “Iceland, landscape, nature,” and “Etna, Sicily, Sicilia.”

Clustering analysis is an effective way to group similar documents into thematic categories, and to exclude documents that do not belong in a grouping, but it cannot tell you what those documents actually mean. To build a taxonomy around meaning, and not just around keywords or similarity within a set of clustered documents, people need to review the analysis results and develop a classification that fits the documents. In the end, while clustering is a form of unsupervised machine learning, it relies on the data analyst or information scientist to make sense of the clusters and impose a relevant classification scheme.

The Flickr clusters above are examples of polythetic clustering: a set of terms defines each cluster.
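The naming rule described for Flickr, in which the most frequently used tags become the cluster's label, can be sketched in a few lines. This is an illustration only, not Flickr's implementation. The function name, the sample photos, and the decision to skip the searched keyword itself are assumptions.

```python
from collections import Counter


def name_cluster(photos, keyword, top_n=3):
    """Label a cluster with its most frequently used tags, skipping the
    keyword that was searched for."""
    counts = Counter(
        tag for tags in photos for tag in tags if tag != keyword
    )
    return ", ".join(tag for tag, _ in counts.most_common(top_n))


# Hypothetical tag lists for four photos in one cluster.
cluster = [
    ["lava", "hawaii", "volcano", "ocean"],
    ["lava", "hawaii", "volcano"],
    ["lava", "hawaii", "sunset"],
    ["lava", "hawaii", "volcano", "ocean", "steam"],
]
print(name_cluster(cluster, "lava"))  # hawaii, volcano, ocean
```

The label is a set of terms and not a single defining feature, which is what makes the cluster polythetic.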

Chapter 6: Classification. Last revised: September 17, 2010.

Thinking back to chapter 5, we see a similarity: polythetic categories consist of multiple membership features. By contrast, membership in a monothetic category is defined by one and only one feature. Thus, another approach to clustering is monothetic clustering, where a single feature defines cluster membership. Mark Sanderson and Bruce Croft (1999) found that, when paired with other sources, monothetic clustering can be used to build hierarchical classifications. They used the WordNet ontology to determine hierarchical relationships between terms extracted from item descriptions, as well as hypernym/hyponym relationships signaled by key phrases like “such as,” “and others,” and “is part of.” This estimation of relationships between words and concepts automatically mined from documents is known as subsumption.
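As an illustration of how a key phrase can signal a hypernym/hyponym relationship, the sketch below looks only for “such as.” It is a deliberately simple, hypothetical pattern and not the procedure used in the research described above. It assumes the list of narrower terms runs to the end of the clause, and it treats only the single word before “such as” as the broader term.

```python
import re

# Hypothetical pattern: "<broader term> such as <item>, <item> and <item>"
SUCH_AS = re.compile(r"(\w+) such as ([\w ,]+)")

# Separators inside the list: ", and", " and", ", or", " or", or a plain comma.
SEPARATORS = re.compile(r",?\s+and\s+|,?\s+or\s+|,\s*")


def hyponym_pairs(sentence):
    """Return (broader term, narrower term) pairs signalled by "such as"."""
    pairs = []
    for match in SUCH_AS.finditer(sentence.lower()):
        parent = match.group(1)
        children = SEPARATORS.split(match.group(2))
        pairs.extend(
            (parent, child.strip()) for child in children if child.strip()
        )
    return pairs


print(hyponym_pairs("Lava is produced by volcanoes such as Etna, Kilauea and Hekla."))
# [('volcanoes', 'etna'), ('volcanoes', 'kilauea'), ('volcanoes', 'hekla')]
```

A real system would need many more patterns and some handling of multi-word terms, but even this toy version shows how parent and child terms can be mined from running text.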

Sanderson and Croft (1999, p. 207) used five principles of design for their experiment:

1. Terms for the hierarchy were to be extracted from the documents and had to best reflect the topics covered within them;

2. Their organization would be such that a parent term would refer to a more general concept than its child, in other words, the parent’s concept subsumes the child’s;

3. The child would cover a related sub topic of the parent;

4. Forming a strict hierarchy, where every child had only one parent, was not considered to be important, therefore, the structure could be more like a directed acyclic graph;

5. And finally, ambiguous terms would be expected to have separate entries in the hierarchy, one for each sense appearing in the documents.

The researchers chose a set of 500 documents and extracted words and phrases from them. They then compared every term to every other term in order to find subsumption relationships. Once approximately 200 subsumption pairs had been identified, the pairs were automatically organized into a concept hierarchy (Sanderson and Croft, 1999, p. 209).
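The all-pairs comparison can be sketched as follows. The passage does not spell out the test used to decide that one term subsumes another, so this sketch assumes a common co-occurrence formulation: a parent term subsumes a child term when the parent appears in most of the documents that contain the child, but the child does not appear in all of the parent's documents. The 0.8 threshold, the function name, and the toy documents are assumptions for illustration.

```python
from itertools import permutations


def subsumption_pairs(doc_terms, threshold=0.8):
    """Compare every term with every other term and return (parent, child)
    pairs where the parent appears in at least `threshold` of the child's
    documents, while the child does not cover all of the parent's documents."""
    # Map each term to the set of document indices that contain it.
    postings = {}
    for doc_id, terms in enumerate(doc_terms):
        for term in terms:
            postings.setdefault(term, set()).add(doc_id)

    pairs = []
    for parent, child in permutations(sorted(postings), 2):
        both = postings[parent] & postings[child]
        p_parent_given_child = len(both) / len(postings[child])
        p_child_given_parent = len(both) / len(postings[parent])
        if p_parent_given_child >= threshold and p_child_given_parent < 1:
            pairs.append((parent, child))
    return pairs


# Hypothetical documents, each reduced to its set of extracted terms.
doc_terms = [
    {"volcano", "lava", "etna"},
    {"volcano", "lava"},
    {"volcano", "etna"},
    {"volcano", "iceland"},
    {"volcano"},
]
print(subsumption_pairs(doc_terms))
# [('volcano', 'etna'), ('volcano', 'iceland'), ('volcano', 'lava')]
```

Each returned pair is an edge from a more general term to a more specific one. Because a child may end up with more than one parent, the result is better thought of as a directed acyclic graph than as a strict tree, which matches the fourth design principle.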

6.5.4 Discriminant Approaches

With discriminant approaches, instead of having the machine do the work of identifying groups and categories, we first impose upon the machine a list of categories and have it match documents or entities to those categories. The use of computational classification with Library of Congress headings in section 6.5 is an example of a discriminant approach. The computer is fed certain parameters that are then matched against the documents in order to classify them. Here, the knowledge worker’s task is to create the classification, and the computer fits documents into it automatically.
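The contrast with clustering is easiest to see in code. In the minimal sketch below, the categories and their keyword lists are supplied up front by a person, and the program only fits documents into them. The category names, keywords, and overlap-counting rule are assumptions for illustration and do not reflect how any particular library system works.

```python
def classify(document, categories):
    """Assign a document to the predefined category whose keyword list
    shares the most words with it; return None when nothing matches."""
    words = set(document.lower().split())
    best_category, best_overlap = None, 0
    for name, keywords in categories.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best_category, best_overlap = name, overlap
    return best_category


# Hypothetical classification scheme created by a knowledge worker.
categories = {
    "Geology": {"lava", "volcano", "eruption", "rock"},
    "Music": {"album", "track", "song", "artist"},
}
print(classify("the volcano eruption sent lava downhill", categories))  # Geology
print(classify("a new album with twelve tracks", categories))           # Music
print(classify("quarterly sales report", categories))                   # None
```

A document that matches no category is left unclassified, which is a reminder that a discriminant approach can only be as good as the scheme it is given.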

Gracenote’s CDDB and MusicID are examples of discriminant approaches to classification. CDDB contains metadata about millions of CDs and tracks. It identifies an album by matching the lengths of the tracks on a CD against the track lengths stored in the database. In this case, the list of categories can be thought of as the information in the database, and the track information submitted by the user is the input to be matched. CDDB then fills in the metadata, classifying the tracks as part of an album. You’ll notice that CDDB does not work when you insert a mix CD made by a friend. This is because the combination of track lengths on a mix CD does not correspond to any whole album in the database.
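The track-length lookup can be illustrated with a toy version. This is not Gracenote's algorithm. The database contents, the function name, and the two-second tolerance are hypothetical, and the sketch simply compares lengths track by track.

```python
def identify_album(track_lengths, database, tolerance=2):
    """Return the album whose track lengths (in seconds) match the disc's,
    track for track, within `tolerance` seconds; None if no album matches."""
    for album, lengths in database.items():
        if len(lengths) != len(track_lengths):
            continue
        if all(abs(a - b) <= tolerance for a, b in zip(lengths, track_lengths)):
            return album
    return None


# Hypothetical database: album title -> track lengths in seconds.
database = {
    "Album A": [215, 187, 243, 301],
    "Album B": [198, 224, 176],
}
print(identify_album([216, 187, 242, 300], database))  # Album A
print(identify_album([215, 224, 301], database))       # None
```

The second call plays the role of the mix CD: every track comes from a known album, but the combination of lengths matches no single album, so nothing is returned.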

But CDDB also contains waveform fingerprint information, which powers Gracenote’s MusicID iPhone application. When you hold your phone up to a speaker through which a song is playing, MusicID analyzes the track, matches its waveform to tracks in the database, and returns possible matches to help you identify the song. CDDB would not work without a pre-populated database to check incoming information against. The popular smartphone application Shazam works the same way but uses a different database of fingerprints.

Music recommendation systems work in a similar fashion, and each service has its own algorithm for matching a user’s listening habits to categories built from the listening habits of others. Last.fm combines listening data with community tags in order to build stations around given artists or tags. Users can mark tracks as loved or click a button to ban a track from ever being played for them again. In this way, the music is classified and the set of documents—in this case, songs—is pared down to fit a user’s preferences. This is not based on any acoustic fingerprinting; instead, recommendation relies on analysis and comparison of the tags users apply to music in Last.fm.
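A tag-driven station with a ban list can be sketched as below. This is not Last.fm's algorithm, which the text does not describe in detail. The function name, the track library, and the ranking by number of shared tags are assumptions for illustration.

```python
def build_station(seed_tags, library, banned=frozenset()):
    """Rank tracks by how many tags they share with the seed tags,
    dropping tracks the user has banned or that share no tags."""
    scored = []
    for track, tags in library.items():
        if track in banned:
            continue
        shared = len(seed_tags & tags)
        if shared:
            scored.append((shared, track))
    # Highest overlap first; ties broken alphabetically by track name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [track for _, track in scored]


# Hypothetical library: track name -> community tags.
library = {
    "Track 1": {"shoegaze", "dream pop", "90s"},
    "Track 2": {"shoegaze", "noise"},
    "Track 3": {"hip hop", "90s"},
    "Track 4": {"shoegaze", "dream pop"},
    "Track 5": {"classical"},
}
station = build_station({"shoegaze", "dream pop"}, library, banned={"Track 4"})
print(station)  # ['Track 1', 'Track 2']
```

Nothing about the audio itself is examined here. The set of songs is pared down purely by comparing tags and honoring the user's bans.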


• Batley, Sue. Classification in Theory and Practice. Oxford, UK: Chandos Publishing, 2005.

• Dougherty, Janet W.D. and Charles M. Keller. Taskonomy: A Practical Approach to Knowledge Structures. American Ethnologist, 9(4), pp. 763-774.

• Getty Vocabulary Program. Art & Architecture Thesaurus (AAT). Los Angeles: J. Paul Getty Trust, Vocabulary Program, 1988 (http://www.getty.edu/research/conducting_research/vocabularies/aat/about.html)

• Gruenberg, Louise. Faceted Classification for the Web…

• Jacob, Elin. (2004). Classification and Categorization: A Difference that Makes a Difference. Library Trends, 52(3), pp. 515-540.

• Lavallee, Andrew (2007). Discord Over Dewey. Wall Street Journal Online, July 20, 2007.

• Murphy, J. (2003, July 22). NASA Team Dismissed Foam Strike. CBS News. Retrieved from http://www.cbsnews.com/stories/2003/07/10/tech/main562542.shtml

• Nunberg, G. (2009). Google's Book Search: A Disaster for Scholars. The Chronicle of Higher Education. Retrieved from http://chronicle.com/article/Googles-Book-SearchA/48245/

• Ranganathan, S. R. (1967). Hidden Roots of Classification. Information Storage and Retrieval, 3, pp. 399-410.

• OCLC.org. (2010). Dewey Services. Accessed via http://www.oclc.org/dewey/ on April 4, 2010.

• Spangler, S. and Jeffrey Kreulen. (2008). Mining the talk : unlocking the business value in unstructured information. Upper Saddle River NJ: IBM Press/Pearson plc.

• Svenonius, Elaine. (2000). The Intellectual Foundations of Information Organization. Cambridge, MA: MIT Press.

• Van Rijsbergen, C.J. Information Retrieval. Newton, MA: Butterworth-Heinemann, 1979.
