«Chapter 6. Classification Chapter author: Jess Hemerly jhemerly Table of Contents 6.1 Overview ...»
Like Aristotle before him, Indian mathematician and librarian S.R. Ranganathan sought to organize all the world’s knowledge. Facing the limitations of Dewey’s system, in which an item’s essence first had to be identified and the item then assigned a label and placement based on that essence, Ranganathan proposed that items could be organized, and a notation designed, around a variety of aspects.
Faceted classification, with colon notation as imagined by Ranganathan, still exists in the context of special library collections. Even though Ranganathan sought to classify all the world’s knowledge, single content areas lend themselves nicely to faceted classification. Think of the difference between a library with a large general collection and a narrow, specialized collection as the difference between kitchen supplies in a department store versus a kitchen supply store (section 6.1). The specificity of items can be captured with a general classification, like Dewey Decimal or LCC, but the notations can be lengthy and new knowledge is much harder to add to a general classification than to a classification created specifically for a single subject area. A specialized classification can also help in organization of physical multimedia items, such as pictures.
6.3.2 Faceted Classification as a Controlled Vocabulary
According to Elaine Svenonius, “Facets are groupings of terms obtained by the first division of a subject discipline into homogeneous or semantically cohesive categories” (Svenonius, 2000, p. 140). The relationships between these facets result in a controlled vocabulary (recall section 2.5.4) governing the entities we are organizing. From this controlled vocabulary we can generate many descriptions that are complex but formally structured, which enables us to describe things for which terms don’t yet exist. It is the combination of facets into compound terms that really exemplifies the role of a faceted classification as a controlled vocabulary.
For example, Netflix classifies movies along a number of interesting spectra. There are the broad classes, or genres, of Horror, Mystery, and Documentary, but the site also organizes movies by date of release, moods (campy, quirky, mind-bending), qualities (cult, IMAX), storylines (future dystopias, opposites attract, rogue cops), and more. Many of these facets fit Ranganathan’s PMEST system of order, with mood as personality, release date as time, and storyline as energy. Users can select preferred characteristics from the taste menu and Netflix will recommend movies described by compound facets, like “Mind-bending 1980s cult future dystopia” movies. Netflix also provides examples of movies within the class so that users can get a better sense of what the term means. By combining examples and classes and not allowing users to provide their own terms for classification, Netflix maximizes the use of facets to develop a highly complex system of movie description.
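The combination of single facet values into compound terms can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how Netflix works: the facet names, the fixed combination order, and the helpers `compound_term` and `all_compound_terms` are hypothetical, and only the facet values come from the example above.

```python
from itertools import product

# Hypothetical controlled vocabulary. The values come from the example in the
# text; the facet names and their grouping are assumptions for illustration.
FACETS = {
    "mood": ["mind-bending", "campy", "quirky"],
    "era": ["1980s"],
    "quality": ["cult", "IMAX"],
    "storyline": ["future dystopia", "opposites attract", "rogue cops"],
}

# The "grammar": the fixed order in which facet values are combined.
FACET_ORDER = ["mood", "era", "quality", "storyline"]


def compound_term(selection):
    """Build a compound term from at most one chosen value per facet.

    Facets missing from `selection` are skipped. Values that are not in the
    controlled vocabulary are rejected, so users cannot add their own terms.
    """
    parts = []
    for facet in FACET_ORDER:
        value = selection.get(facet)
        if value is None:
            continue
        if value not in FACETS[facet]:
            raise ValueError(f"{value!r} is not a term in facet {facet!r}")
        parts.append(value)
    return " ".join(parts)


def all_compound_terms():
    """Enumerate every full combination the vocabulary can express."""
    combos = product(*(FACETS[facet] for facet in FACET_ORDER))
    return [" ".join(combo) for combo in combos]
```

Even this tiny vocabulary of nine values yields 3 × 1 × 2 × 3 = 18 full compound terms, which suggests how a modest set of facets can describe things for which no single term exists.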
Getty’s Art & Architecture Thesaurus (AAT) is a robust and widely used controlled vocabulary consisting of generic terms used to describe artifacts, objects, places and concepts in the domains of “art, architecture, and material culture” (Getty, para. 2). It was developed in the mid-1980s and released as a book until 1997, when those maintaining it realized that the vocabulary was so large and changed so frequently that users would be better served by a dynamic online version that could change easily than by a book bound to a publishing cycle.
The terms within the AAT are arranged hierarchically within facets. The AAT’s facets are “conceptually organized in a scheme that proceeds from abstract concepts to concrete, physical artifacts” (Getty, para. 19):
• Associated Concepts: concepts, philosophical and critical theory, and phenomena, such as “love” and “nihilism.”
• Physical Attributes: material characteristics that can be measured and perceived, like “height” and “flexibility.”
• Styles and Periods: artistic and architectural eras and stylistic groupings, such as “Renaissance” and “Dada.”
• Agents: basically, people and the various groups and organizations with which they identify, whether based on physical, mental, socio-economic, or political characteristics—e.g., “stonemasons” or “socialists.”
• Activities: actions, processes, and occurrences, such as “body painting” and “drawing.” These are different from the “Objects” facet, which may also contain “body painting” but there in terms of the actual work itself, not the process of creation.
Within each facet is a strict hierarchical structure drilling down from a broad term to a very specific instance. For example, let’s look at where we’d find “patent leather” in the hierarchy:1
Figure 6.2: “Patent Leather” in the Art & Architecture Thesaurus
We can see from this example how a particular instance may be described on a number of dimensions for the purpose of organizing the item and retrieving information about it. And by using a standard controlled vocabulary, catalogers and indexers make it easier for users to understand and adapt to the way things are organized for the purpose of finding them.
6.3.3 Facets in Information Retrieval
Return to our kitchen scenario in section 6.1, specifically the online shopping example, and remember the classes (price, brand, and type) that you would use to narrow down your choices. The shopping site we have described uses facets in its organization and its user interface. These facets improve your ability to browse, discover, and find items you may want to purchase.
As discussed in the introduction to this chapter, there are four major types of facets. First, they can be enumerative, meaning that they are a set of mutually exclusive possible values. As discussed above, under silverware we might see spoon, fork, and knife. These are examples of mutually exclusive values; something cannot be a fork and a knife, nor can we have a spoon and a knife. One difficult utensil in the silverware category would be a spork, which is the combination of spoon and fork. Because the combination of spoon and fork results in a new utensil, spork would be semantically similar enough to add to our list under silverware.
1 http://www.getty.edu/vow/AATFullDisplay?find=leather&logic=AND&note=&english=N&prev_page=1&subjectid=300193362
Chapter 6: Classification Last revised: September 17, 2010
Second, facets can be Boolean, meaning yes (true) or no (false) along some dimension. On a sportswear website, a class “Waterproof” could exist, with yes or no as the facet options. Boolean facets are binary and thus useful when expressing whether a feature or characteristic exists or doesn’t exist among a set of entities.
Third, facets can be hierarchical or taxonomic, which are used to organize our instances, concepts, or entities by logical containment. At Williams-Sonoma’s website, the major classes (parent facets) include Cookware, Cooks’ Tools, and Cutlery. Within each of these categories there are two ways to view subcategories: Shop by Category, which for cookware includes “Saucepans and Sauciers” and “Roasters”; and Shop by Brand, which lists five different cookware brands to choose from. Each of the subcategories contains its own subcategories, enabling the user to further drill down in terms of specificity to find products. For example, “Saucepans and Sauciers” contains subcategories “Saucepans” and “Sauciers.” Each facet is a logical container for the objects and categories therein.
Finally, facets can be a spectrum of numerical attributes along a range with a defined minimum and maximum. Price would be an example of a spectrum facet, as we could organize entities based on range of cost. Our facets could be something like “$0 - $49,” “$50 - $99,” “$100 - $149,” and so on. Date would also be an effective numerical facet, as with the Netflix release date example.
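The four facet types described above can be illustrated with a small Python sketch of faceted filtering. Everything in it is hypothetical: the `Product` record, the sample catalog with its names, category paths, and prices, and the `dishwasher_safe` flag, which stands in for a Boolean facet such as “Waterproof.” It is a minimal sketch, not how any particular shopping site is built.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    kind: str               # enumerative: one value from a fixed, exclusive set
    dishwasher_safe: bool   # Boolean: a feature is present or absent
    category: tuple         # hierarchical: path from parent class to subclass
    price: float            # spectrum: a number within a range


# Enumerative values are mutually exclusive; "spork" is added as a new value.
KINDS = {"spoon", "fork", "knife", "spork"}

# Invented sample records, for illustration only.
CATALOG = [
    Product("soup spoon", "spoon", True, ("Cutlery", "Flatware"), 12.0),
    Product("steak knife", "knife", False, ("Cutlery", "Knives"), 55.0),
    Product("salad fork", "fork", True, ("Cutlery", "Flatware"), 9.0),
    Product("camping spork", "spork", True, ("Cooks' Tools", "Outdoor"), 7.5),
]


def by_kind(items, kind):
    """Enumerative facet: keep items with exactly this value."""
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    return [p for p in items if p.kind == kind]


def by_flag(items, dishwasher_safe):
    """Boolean facet: keep items where the feature matches yes or no."""
    return [p for p in items if p.dishwasher_safe == dishwasher_safe]


def by_category(items, prefix):
    """Hierarchical facet: keep items whose path starts with `prefix`."""
    prefix = tuple(prefix)
    return [p for p in items if p.category[:len(prefix)] == prefix]


def by_price(items, low, high):
    """Spectrum facet: keep items with low <= price <= high."""
    return [p for p in items if low <= p.price <= high]
```

Because each function takes and returns a list of items, facets can be applied one after another, for example `by_price(by_flag(by_category(CATALOG, ["Cutlery"]), True), 0, 49)`, which mirrors how a shopper narrows results by clicking several facets in turn.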
6.3.4 Designing a Faceted Classification for the Web
We follow a series of seven steps when creating a faceted classification. The first step is to collect the set of concepts or entities that need to be classified. Next, we analyze these concepts or entities in order to identify some good candidates for facets. Once we have some ideas of what facets might be appropriate for our set, we order the foci within the facets. At this point, we begin to look at the relationships between facets themselves and establish the grammar we will use to order and combine facets and subfacets, and create new facets and subfacets as needed. Finally, we test the classification on new examples outside of our existing set in order to ensure that the classification is sufficient to account for new concepts or entities that could be added to our set in the future. Throughout the entire process, we iterate and refine in order to make sure the classification can best fulfill its purpose.
When choosing facets, following a set of fairly general criteria helps to create the best possible system for a given purpose.
• Orthogonality: facets are independent dimensions; assigning an entity to one subfacet excludes it from being assigned to another subfacet under the same main facet heading. For example, one facet could be “Type of Tool” and include cookware, glassware, dishware, and utensils; another could be “Brand.” A spoon could theoretically be placed in cookware (spoons you cook with), dishware (spoons you eat with), and utensils (all spoons could go here), but only one may be chosen. However, we would also assign the same spoon a brand, and that orthogonal classification is possible because “Brand” and “Type of Tool” are independent dimensions.
• Semantic Balance: Top-level facets are the most important semantic dimensions, e.g., Cookware in the Williams-Sonoma example. Subfacets within a parent should be on an equal level semantically. Subfacets of Cookware like “Saucepans and Sauciers” and “Roasters and Braisers” are semantically equivalent as they are named and grouped by cooking activity.
• Coverage: Facets should classify all instances currently in the set. This is pretty straightforward, since if the classification you’re designing doesn’t even sufficiently organize the existing objects in the set, it will inevitably fail as new entities or objects come along, which is addressed by the next criterion, scalability.
• Scalability: In addition to adequately covering existing entities, we have to consider the need for the facets to accommodate future additions to the set of instances. That is, we want our classification to be scalable as our set of entities grows. While we can’t plan for every possible new instance, taking the time to think about the facets’ applicability to hypothetical instances means we don’t have to redesign a classification every time a new entity comes along.
• Objectivity: Although every classification has a perspective due to the conditions of its development either by people or by machine, facets should come as close to objectively classifying instances as possible. This is also called concreteness.
• Normativity: In order to make a faceted classification useful to as many people as possible, the terms we use to name our facets and categories therein should not be idiosyncratic or metaphorical. Facet semantics should be mainstream and straightforward.
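Two of the criteria above, coverage and orthogonality, lend themselves to a mechanical check. The following Python sketch audits a single facet under stated assumptions: the function `audit_facet`, its report keys, and the sample items are hypothetical, and the “Type of Tool” values are taken from the orthogonality example. The other criteria, such as objectivity and normativity, call for human judgment and are not covered.

```python
def audit_facet(items, assignments, subfacets):
    """Audit one facet for coverage and orthogonality problems.

    items:       list of item names to classify
    assignments: dict mapping an item to the set of subfacets it was given
    subfacets:   set of subfacet names allowed under this facet

    Returns a dict with three lists:
      uncovered: items with no subfacet (coverage failure)
      multiple:  items with more than one subfacet (orthogonality failure)
      unknown:   items given a subfacet outside the allowed vocabulary
    """
    report = {"uncovered": [], "multiple": [], "unknown": []}
    for item in items:
        assigned = assignments.get(item, set())
        if not assigned:
            report["uncovered"].append(item)
        elif len(assigned) > 1:
            report["multiple"].append(item)
        if assigned - subfacets:
            report["unknown"].append(item)
    return report


# Hypothetical sample data for the "Type of Tool" facet.
TYPE_OF_TOOL = {"cookware", "glassware", "dishware", "utensils"}
ITEMS = ["wooden spoon", "wine glass", "spork", "apron"]
ASSIGNMENTS = {
    "wooden spoon": {"cookware", "utensils"},  # placed twice: not orthogonal
    "wine glass": {"glassware"},               # exactly one subfacet: fine
    "apron": {"textiles"},                     # subfacet not in the vocabulary
    # "spork" has no assignment at all: not covered
}

REPORT = audit_facet(ITEMS, ASSIGNMENTS, TYPE_OF_TOOL)
```

Running the same audit on items held back from the design process is one simple way to approach the testing step and the scalability criterion: new items that land in `uncovered` or `unknown` suggest the facet needs another subfacet.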
Hierarchical organizations within a facet can be maintained—facets can contain classes and subclasses—but we also need to determine how to order our facets in the user interface. A few principles exist to guide facet ordering.
• Simple to Complex
• Frequent/Popular to Infrequent/Less Popular
• Spatial, Geographical, or Geometric
• By size
• Chronological
As we’ll see in section 6.5, humans don’t always manually create and sort items into facets. Information and data scientists have made efforts to break down descriptions of items in order to identify common terms and dimensions that become the basis of a faceted classification, or to automatically assign items to relevant facets. In some cases, the computer runs an algorithm and works without human intervention. In other cases, people and machines work together to build a classification and assign items to classes.
6.4 Social/Distributed Classification
6.4.1 What is Tagging?
Ranganathan’s PMEST dimensions have found a natural use outside of regimented faceted classifications. These dimensions also unconsciously guide the tags people use to label their personal information or information shared online, particularly on social media sites. Tags are descriptive labels, in the form of words or phrases (WorldCup2010, for example), that build a metadata description of an item, whether a photograph, a song, or a restaurant on a shared review site.
With the rise of digital media, both kept in personal collections and shared online, the practice of tagging has emerged as a way to apply labels to content in order to describe and identify it. Collections of tags become incredibly useful both in categorizing information online to share with others and in managing one’s own media collection. Users of Last.fm tag music with labels that help describe its nature, era, mood, or genre, and Last.fm uses these tags to generate radio stations that play music associated with a given tag and with related tags.
But tagging has a downside. We tend to see a long tail for tags online, meaning that a few common tags are used by many people, while beyond those a very large number of different tags are each used only rarely.
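The long tail can be made concrete by counting how often each tag is used. The Python sketch below uses invented sample data: only the tag WorldCup2010 appears in the text, and the other tags, the helper names, and the `max_uses` cutoff are assumptions for illustration.

```python
from collections import Counter


def tag_distribution(tagged_items):
    """Count how often each tag is used across all items."""
    counts = Counter()
    for tags in tagged_items:
        counts.update(tags)
    return counts


def long_tail(counts, max_uses=1):
    """Return, in alphabetical order, the tags used `max_uses` times or fewer."""
    return sorted(tag for tag, n in counts.items() if n <= max_uses)


# Hypothetical tags applied by four different users to similar photos.
PHOTOS = [
    ["WorldCup2010", "soccer"],
    ["WorldCup2010", "football"],
    ["WorldCup2010", "futbol", "vuvuzela"],
    ["WorldCup2010", "soccer"],
]

COUNTS = tag_distribution(PHOTOS)
HEAD = COUNTS.most_common(1)   # the single most frequent tag and its count
TAIL = long_tail(COUNTS)       # tags that were used only once
```

In this small sample one tag is shared by every user, while three of the five distinct tags appear only once, including two that are near-synonyms of a more common tag.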