
«Chapter 6. Classification Chapter author: Jess Hemerly jhemerly Table of Contents    6.1 Overview  ...»


In more recent iterations of the census, we have seen increasingly fine granularity in classifying Asian populations. The term “Hispanic” has also changed over the years, and in the 2010 census actually functioned to exclude a number of nationalities from the descriptor.

Politically, we see tension between the terms “Hispanic” and “Latino.”

Finally, we must decide how we want to build our classification. If we want to use a top-down hierarchical structure, with superordinate and subordinate classes nested in a structure, our classification will be enumerative; that is, it will list all possible entities and the relationships between them according to literary or subject-specific warrant. It can be highly enumerative or less so, but either way it will be a top-down structure consisting of mutually exclusive classes.

Enumerative classifications are limited by their nested structure in expressing relationships: items are collocated by a hierarchical order of classes and subclasses, and every entity will belong to only one subdivision.

Faceted classifications are an alternative to the strict hierarchies of enumerative systems and, as mentioned above, are especially useful in web user interfaces for things like online shopping, where users may want to consider multi-dimensional characteristics and where it is unreasonable to assume a strict hierarchical ordering of the dimensions. For example, if we have a collection of shirts in various styles, colors, brands, and prices, it makes sense to sort them using these dimensions in any order. Items are grouped orthogonally—that is, they are mutually independent—according to certain characteristics and users can select these characteristics to narrow down a selection of entities that meet a number of criteria. A faceted classification essentially enumerates all possible classes into which a set of concepts or entities can be sorted, but those classes only exist if there is a need to sort entities by a specific characteristic. A faceted classification is like a controlled vocabulary of “concepts and their associated labels that can be used, in association with a notation and a prescribed citation order, to synthesize the classes that will populate the classification scheme” (Jacob, 2004, p. 525).
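The shirt example can be made concrete with a short sketch. The catalog, the facet values, and the function names `narrow` and `facet_counts` are all hypothetical, chosen only to illustrate that independent facets can be applied in any order with the same result.

```python
from typing import Dict, List

# Hypothetical catalog: each shirt is described along four independent facets.
SHIRTS: List[Dict[str, object]] = [
    {"id": 1, "style": "polo", "color": "blue", "brand": "Acme", "price": 25},
    {"id": 2, "style": "tee", "color": "blue", "brand": "Zenith", "price": 12},
    {"id": 3, "style": "polo", "color": "red", "brand": "Acme", "price": 30},
    {"id": 4, "style": "tee", "color": "red", "brand": "Acme", "price": 15},
]


def narrow(items, **selected):
    """Keep only the items whose values match every selected facet."""
    return [
        item for item in items
        if all(item[facet] == value for facet, value in selected.items())
    ]


def facet_counts(items, facet):
    """Count how many items fall under each value of one facet."""
    counts = {}
    for item in items:
        counts[item[facet]] = counts.get(item[facet], 0) + 1
    return counts


# Because the facets are orthogonal, the order of selection does not matter.
by_color_then_style = narrow(narrow(SHIRTS, color="blue"), style="polo")
by_style_then_color = narrow(narrow(SHIRTS, style="polo"), color="blue")
```

A class such as "blue polo shirts" is never stored anywhere in this sketch; it is synthesized only when a user selects those two facet values.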

Chapter 6: Classification. Last revised: September 17, 2010

The history of faceted classification is rooted in the colon classification theory of S.R. Ranganathan, an Indian mathematician working as a librarian. Ranganathan sought to organize all the world’s ideas for the purpose of library cataloging using a single classification and notation. He established a set of five, and only five, facets applied to all knowledge: Personality, the type of thing; Matter, the constitutional matter of the thing; Energy, the action or activity of the thing; Space, where the thing occurs; and Time, when the thing occurs. In the notation, the facets are separated by colons, with values that represent different characteristics pulled from a table and shown in the P:M:E:S:T format (Ranganathan, 1967, pp. 5-7). Today, the types of facets include enumerative (mutually exclusive); Boolean (yes or no); hierarchical or taxonomic (logical containment); and spectrum (a range of numerical values). The selection criteria include orthogonality, semantic balance, coverage, scalability, concreteness, and normativity. We will cover these in more depth later in this chapter, using examples from current web user interfaces.
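The P:M:E:S:T format described above can be sketched as a five-field record joined by colons. This is a simplified illustration that follows the description in this chapter only; the class name `ColonNotation`, the function `parse`, and the example values are assumptions, and the notation of the published scheme is more elaborate than a plain colon-separated string.

```python
from typing import NamedTuple


class ColonNotation(NamedTuple):
    """One value per facet, in the fixed citation order P, M, E, S, T."""
    personality: str
    matter: str
    energy: str
    space: str
    time: str

    def render(self) -> str:
        # Join the five facet values with colons, in citation order.
        return ":".join(self)


def parse(notation: str) -> ColonNotation:
    """Split a P:M:E:S:T string back into its five facet values."""
    parts = notation.split(":")
    if len(parts) != 5:
        raise ValueError("expected exactly five colon-separated facets")
    return ColonNotation(*parts)


# Hypothetical values; a real scheme would draw codes from its tables.
example = ColonNotation("book", "paper", "lending", "India", "1967")
```

Fixing the order of the facets is what lets two catalogers produce the same string for the same subject.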

So far, we’ve discussed systems of classification established by people or organizations with some institutional authority to create them. But a classification can be a highly personal form of information management too. Think again about your kitchen. You very likely have rules for where things go and why, and this system of organization allows you to more easily find things when you need to use them. Your principled system may be arbitrary, constrained by the size of your apartment or by the space granted to you in a shared cupboard. You may even organize based on activities for which you use things, like baking, snacking, and serving. But here we are talking about management of physical things that can only be in one place at a time.

It is useful to contrast this case with the management of “information things” that can be classified in many “places” at once, like a digital picture, music file, or a digital document. How might we manage and classify bits?

With the rise of social media sites and tools in recent years, a new form of classification has emerged: social and distributed classification, the best-known form of which is tagging. Tagging is generally not a principled practice. Users tend to apply terms to photographs, news clips, or other entities, both textual and multimedia, that help them find and share things with others. Tagging usually falls short of classification due to a lack of vocabulary control and a tendency for users to tag intuitively. Pictures of trees on Flickr can appear tagged with “tree” or “trees” depending on the user’s whim. And, if we remember the vocabulary problem (section 2.5.2), one photographer’s “tree” is another’s “oak.” This disparity in the descriptors people use to categorize similar things makes many systems that depend on tagging for IR little more than tag soup. In an unstructured free-for-all tagging system, tagging is not classification; it is simply categorization or even description.

Thomas Vander Wal coined the term “folksonomy”—combining “folk” and “taxonomy”—to describe a collection of descriptors, often listed by popularity—use frequency—on the home page of a social tagging site such as Delicious. Folksonomies are often displayed in the form of a tag cloud, where the frequency with which the tag is used throughout the site determines the size of the text in the tag cloud. Similar tags are clustered, but folksonomies are not principled; they are emergent, created through bottom-up aggregation of user tags.
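As a rough sketch of how a tag cloud might map use frequency to text size, the function below scales each tag linearly between a minimum and maximum font size. The linear scaling, the point sizes, and the function name `tag_cloud_sizes` are assumptions for illustration; real sites may use logarithmic or bucketed scaling instead.

```python
from collections import Counter


def tag_cloud_sizes(tags, min_pt=10, max_pt=32):
    """Map each tag to a font size proportional to how often it is used."""
    counts = Counter(tags)
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    sizes = {}
    for tag, n in counts.items():
        if high == low:
            # Every tag is equally frequent, so use the smallest size.
            sizes[tag] = min_pt
        else:
            share = (n - low) / (high - low)
            sizes[tag] = round(min_pt + share * (max_pt - min_pt))
    return sizes


# Hypothetical tag stream: note that "tree" and "trees" are counted separately,
# because nothing here controls the vocabulary.
sizes = tag_cloud_sizes(["tree"] * 5 + ["oak"] * 3 + ["trees"])
```

The sketch aggregates whatever users typed, which is why the resulting structure is emergent rather than principled.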

Users and communities can generate a set of principles to govern their tagging practices in order to harness distributed and social tagging to develop a useful classification system. Such a system of distributed or personal classification through the use of tagging is a tagsonomy, a principled evolution of the folksonomy. Tagsonomies can overcome the strict limitations of hierarchical classifications and users can adopt conventions to encode hierarchical and derivational relationships. Looking back at the kitchen example in the beginning of this chapter, the way you may (or may not) label items in your fridge is a basic example of a tagsonomy.

Social media systems can also be designed to push users toward tags that align with popular usage, systematically encouraging principles and thus classification. Social media systems can also include functionality that “bundles” tags, essentially building their own classification of user tags in order to enhance information retrieval. We’ll explore this in greater detail later in the chapter.

Classification need not be performed directly by humans. Automatic indexing derives keywords from a document and provides access to all of those words. More complex systems take indexing a step further and build controlled vocabularies based on the keywords in the documents. Building on these controlled vocabularies, automatic classification aims to group similar documents using either a fully automatic clustering method or a predetermined classification scheme and documents already indexed according to that scheme.
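The simplest form of automatic indexing mentioned above can be sketched as an inverted index that maps each keyword to the documents containing it. The stopword list, the tokenization rule, and the name `build_index` are assumptions for illustration, not a description of any particular system.

```python
import re
from collections import defaultdict

# A tiny, hypothetical stopword list; real systems use much longer ones.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}


def build_index(documents):
    """Map every non-stopword keyword to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOPWORDS:
                index[word].add(doc_id)
    return index


docs = {"d1": "The oak is a tree", "d2": "Trees in the park"}
index = build_index(docs)
```

Because the index has no vocabulary control, "tree" and "trees" lead to different documents, which is the gap a controlled vocabulary is meant to close.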

Clustering allows us to perform automatic classification based on predetermined rules and guidelines that the machine will execute during document analysis. Computer scientist C.J. Van Rijsbergen’s cluster hypothesis states, “closely associated documents tend to be relevant to the same requests” (Van Rijsbergen, 1979, p. 30).
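One way to picture "closely associated documents" is to group documents whose keyword sets overlap. The sketch below uses Jaccard similarity and a single-pass grouping rule with a fixed threshold; both are assumptions chosen for brevity, not the method of any system named in this chapter, and the names `jaccard` and `single_pass_clusters` are hypothetical.

```python
def jaccard(a, b):
    """Share of keywords two documents have in common, from 0.0 to 1.0."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def single_pass_clusters(docs, threshold=0.3):
    """Place each document in the most similar existing cluster, or start a new one."""
    clusters = []  # each cluster is a list of document ids
    for doc_id, keywords in docs.items():
        best, best_sim = None, 0.0
        for cluster in clusters:
            # Similarity to a cluster is the similarity to its closest member.
            sim = max(jaccard(keywords, docs[other]) for other in cluster)
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best.append(doc_id)
        else:
            clusters.append([doc_id])
    return clusters


# Hypothetical keyword sets for four documents.
keyword_sets = {
    "a": {"oak", "tree", "leaf"},
    "b": {"tree", "leaf", "forest"},
    "c": {"census", "population"},
    "d": {"census", "population", "survey"},
}
clusters = single_pass_clusters(keyword_sets)
```

The threshold plays the role of a predetermined rule: the machine applies it uniformly, and changing it changes which classes emerge.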

Classes can be structured in one of two ways. First, a class can be intellectually formulated. That is, it’s structured through manual assignment, as in library classification, or automatic assignment, as in the Library of Congress’s CHESHIRE for which UC Berkeley professor Ray Larson developed entry vocabulary modules for clustering classification. Second, a class can be derived automatically from a collection of things in one of three ways: hierarchic clustering, agglomerative clustering, and hybrid methods, like query clustering.

For the purposes of automatic classification, data consists of objects with a set of four types of descriptors. These descriptors are similar in dimension to Ranganathan’s facets: multistate attributes (e.g., color); binary-state attributes (e.g., keywords); numerical attributes (e.g., hardness scale, or weighted keywords); or, when the objects are themselves classes, probability distributions.

To summarize, while classification and categorization are closely linked, they are not synonymous. Classification is, in a sense, the formalization and implementation of categorization. A classification can be hierarchical, faceted, social, or automatic, but in order for it to truly be a classification, there must be predetermined principles that serve as authority control for the organization of entities.

The following sections of the chapter will dive into each of these highlighted areas in more detail, including examples and applications of ontologies, faceted classifications, tagsonomies, and computational classifications.

6.2 Classification Theory

6.2.1 What is Classification?

As Louise Gruenberg wrote in “Faceted Classification, Facet Analysis, and the Web”:

“Classification is a higher order thinking skill requiring the fusion of the naturalist’s eye for relationships…with the logician’s desire for structured order…the mathematician’s compulsion to achieve consistent, predictable results…and the linguist’s interest in explicit and tacit expressions of meaning” (Gruenberg, 2002, para. 1).

As mentioned in section 6.1, a classification is a system of categories, called classes, ordered using a predetermined set of principles. The act of placing items into these classes is classification, and can be performed by people or, thanks to advances in fields like natural language processing, data mining, and the semantic web, machines. Classifications can be applied to a narrow set of concepts or entities, like kitchen supplies or beer, or to much broader sets of concepts and entities, such as Aristotle’s attempt to classify all beings or Dewey’s system to classify all knowledge for the purpose of finding it in a library. Classifications can also be applied to documents and data as well as concepts and actions—i.e. an entity’s placement in a class requires certain action be taken.

6.2.2 Purpose of a Classification

A classification serves as a reference model, a semantic roadmap, to individual domains and the relationships therein. This roadmap then enables us to better understand concepts and relationships between entities in a given domain. It also allows us to organize things in a way that will make them easier to locate. A classification in a home kitchen allows us to find what we need when cooking or baking quickly and easily. The way things are classified in a department store helps us to find specific domains of objects among many. A specialty store’s classification helps us to find specific objects within subclasses among many others. And the classification used in an online store allows Internet shoppers to locate and narrow down sets of matching items.

The four different forms of kitchen-related classification all relate to a single, specific domain of objects, but classifications can also be designed to classify all knowledge. In 1873, Melvil Dewey invented the Dewey Decimal Classification (DDC) as a scheme for classifying works in a general collection containing diverse subjects—essentially, collections of general knowledge. The first edition of DDC appeared in print in 1876, and it is currently the most widely used library classification scheme in the world’s public libraries, modified on a regular basis to “continually keep up with recorded knowledge” (OCLC, p. 1).

In contrast, Herbert Putnam created the Library of Congress Classification (LCC) in 1897. It was meant not to catalog all the world’s knowledge but to provide a practical way to organize and later locate items within the Library of Congress’s collection. It has since been adopted by research and academic libraries, particularly in the United States, but most public and smaller libraries tend toward the Dewey Decimal Classification (DDC). Subject divisions in LCC are broad; for example, A contains “General Works,” M “Music,” and K “Law.”

Contrast the widely used DDC and LCC systems with a collection of United States government documents distributed through the Federal Depository Library Program. New York University’s Bobst Library is a “selective depository,” receiving 55% of all documents distributed from the government to participating libraries. These materials come in various forms: pamphlets, books, booklets, newsletters, CD-ROMs, and microfilm. Because the content is highly specialized, and because new subjects and pieces are added to the existing collection regularly, these documents, housed on the sixth floor of the NYU library, are organized by their own classification and hand-numbered as they come into the library. The documents are included in the library’s general catalog, BobCat, but users visiting the sixth floor are met with special indexes and librarians who know the collection quite well.

On the other hand, searching for information online with search engines like Google has drastically changed the way we expect to see search results returned. Keyword search returns links whose relevance (remember the concepts of recall and precision) depends on the algorithm powering the search engine. But search results are returned as links to the actual digital documents, not numbers that point to locations on shelves. Furthermore, unlike books on library shelves, documents online can exist in many different places. Because users are becoming so used to the methods of online searching, some believe that systems like DDC and LCC are in danger of being abandoned for less rigorous Google-like organization of libraries using a classification called BISAC.
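Since the relevance of keyword search results is usually discussed in terms of recall and precision, a small worked example may help. The document identifiers and the function name `precision_recall` are hypothetical; the definitions used are the standard ones, with precision as the share of retrieved documents that are relevant and recall as the share of relevant documents that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query's result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: four documents returned, three documents actually relevant.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d5"],
)
```

Here two of the four returned documents are relevant, and two of the three relevant documents were found, so the search trades some precision for partial recall.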
