«Chapter 6. Classification Chapter author: Jess Hemerly jhemerly Table of Contents 6.1 Overview ...»
Chapter 6: Classification. Last revised: September 17, 2010.
The Book Industry Study Group (BISG) is a non-profit organization that, in its own words, "develops, maintains and promotes standards and best practices that enable the book industry to conduct business more efficiently." While the BISG deals with many aspects of book sales, one of its standards is the Book Industry Standards and Communications (BISAC) classification scheme. BISAC is a classification system primarily meant to suggest to booksellers how a book should be placed on a bookstore's shelves or categorized in an online bookseller's database.
The LCC and BISAC generally have little need to come into contact, as they are used for completely different purposes. But this changed in 2004, when Google announced an ambitious project to digitize the majority of the world's books. While both for-profit and nonprofit organizations have attempted this in the past, currently no competitor comes anywhere close to Google's digitization capacity. What might become the world's largest index of digitized books could therefore be of great use to the general public, and scholars, too, stand to derive a great deal of benefit from the Google Books project. However, much to the dismay of many people in the academic community, Google initially chose to classify books using the less scholarly BISAC standard rather than LCC.
BISAC, naturally, is meant to help casual bookstore customers browse for books. BISAC subject categories are very broad, and too general for someone interested in rigorous academic research. For example, a top-level category in the BISAC 2009 classification is “Cooking.” This is possibly a result of the success of the lucrative cookbook industry (Publishers Weekly estimated that Americans purchased 530 million cooking and wine books in 2000 alone) or perhaps an attempt to extract the art of cooking from the Dewey heading of “Home economics & family living.” In contrast, a scholar looking for books about the history of a particular agricultural commodity might not find the BISAC categories as useful. BISAC offers a "Cooking/History" category, but an academic may need a finer granularity of subject matter.
In 2007, the then-new Perry Branch Library in Gilbert, Arizona eschewed Dewey Decimal for a BISAC-meets-Google-Books-inspired classification. Instead of classifying and labeling books with Dewey notation, the Perry library opted to label things with plain-English subject titles like “history” and “weddings” and to create a computer catalog that allows users to look up books and find their location among the subject-headed stacks (Lavallee, p. 1).
The classification combines the browsability of a large bookstore and the search capabilities of Google in an attempt to make the library friendlier to users. Many librarians increasingly recognize a need for electronic catalog search to take cues from other digital search engines in order to help people find relevant things with the least amount of effort.
However, critics argue that BISAC-based subjects aren’t sufficient to organize the Google Books collection and that Google, as a for-profit organization, has ambitions to provide access through various electronic bookstores. In the words of UC Berkeley Professor Geoffrey Nunberg, “In short, Google has taken a group of the world's great research collections and returned them in the form of a suburban-mall bookstore” (Nunberg, 2009, para. 16). The Dewey system has been translated and adopted worldwide, so things can be classified in the same way and found in the same place regardless of the prevailing language, although Universal Decimal Classification (UDC) is more widely used outside the United States. But with subject headings, things are not nearly as standardized, and it’s more likely that each region (or worse, each library) would arrange books in a slightly different way. Thus, we see that another principle of designing a classification lies in the balance between recognized systems and modern trends in information demand.
In enterprise terms, a chosen classification heavily influences how a consumer interacts with the business. Thus, classification has implications for the business-to-customer relationship.
An online store with too few or too many facets can make it unnecessarily difficult for the customer to find things, prompting the user to look for what he or she needs elsewhere. In a store, we often become frustrated when an object isn’t where we might expect it to be located, and that frustration can lead a customer to be dissatisfied with a store experience and shop elsewhere. But arrangement can also encourage customers to purchase items they otherwise might not have thought they needed, both online and off. A customer may not think to buy napkins that match a set of dishes if those napkins weren’t located near the dishes and if the store hadn’t laid out example place settings. An online store recommendation based on an item in the user’s shopping cart may lead a user to buy additional accessories.
6.2.3 Classification is Principled
Three characteristics make a classification principled.
Principled, in this case, doesn’t necessarily mean “good”; it simply means that rules and guidelines dictate practice. All classifications, whether created by machine (through document analysis or data mining) or by people (one person or a group), have a perspective due to the decisions that have to be made in designing the classification.
First, we must decide if we’re trying to classify all knowledge or, as discussed in section 6.1, a single collection or specialized set of entities within a target domain. Warrant, then, is the principle that determines what will dictate our organization. In our kitchen example, we might see object warrant, where similar objects are put together, but more likely the dictating principle will be one of use warrant, where things are organized based on how they are used.
Once warrant has been identified, we need to consider how flexible the system will be to accommodate new entities. As of April 2010, the DDC is in its 22nd Edition. While the system has not been drastically modified—large-scale changes would require all libraries using the system to reorganize—it has evolved to accommodate new knowledge. We also need to determine whether we plan to classify subjects broadly or precisely and how enumerative our scheme should be, a concept we’ll explore in section 6.2.4. The decision to classify broadly or precisely depends largely on how broad a set of entities the system of categories has been designed to organize. Because of the diversity of objects in a department store, a broad classification is necessary to accommodate everything in the store. But in a special kitchen supply store, we can classify quite precisely because of the nature of the objects and the nature of those who want to buy things there.
In order to explore this issue of perspective in systems, we can contrast the way DDC and LCC handle religion and its various subclasses. Religion in DDC is the 200 class, with nine subclasses. Six of these nine classes are topics with “Christian” in the name; one class is for the Bible alone; and another class is entitled “Natural theology.” Anything else related to the world’s many religions has been lumped under 290, “Other religions.” Whether or not Dewey intended his system to be biased toward Christianity, the way religion has been classified makes it look as though there is an inherent bias.
How a new entity is classified can also reflect political bias, particularly in the kind of classification where an entity’s assignment to a class results in prescribed action. In 2003, the Columbia space shuttle disintegrated upon re-entry into the earth’s atmosphere. The accident was traced to a piece of foam that broke away from the shuttle’s external tank during launch and struck the left wing. Foam strike was not an unfamiliar concept, and NASA’s debris assessment team used a classification in order to identify risk and necessary action relating to these kinds of incidents. The key classes here are “in-family,” a regular, recognized event, and “out-of-family,” an anomaly. A manager at United Space Alliance, a NASA contractor, recognized the foam strike as “out-of-family” due to its severity, but the NASA mission management team resisted this classification and called it “in-family” instead. Obviously there were larger organizational issues at play than simply classification, but the misclassification of an entity meant that certain protocols that could have prevented the disaster upon re-entry were not initiated (Murphy, 2003).
According to the uniqueness principle, each class can be further divided into subclasses, and each level of hierarchy divided according to some feature or characteristic. On each level, the features and characteristics should be semantically similar. For example, silverware could be divided into sporks, spoons, forks, and knives. But dividing silverware into the subdivisions teaspoons, butcher knives, forks, and serving forks would not be semantically appropriate, because those subdivisions mix different levels of specificity.
As discussed in the chapter introduction, a classification is lawful, systematic and arbitrary. It is lawful by nature, since its structure and the relationships between its categories are governed by a set of principles. The principles are developed based on analysis of a set of entities or concepts, performed either by machine or by people.
6.2.4 Spectrum of Classification
Looking back at our kitchen examples, we can see that there exists a range in the structure of our classifications, made most clear in the distinction between how we organize physical objects on shelves or in a display versus how we can organize the bits representing these objects online.
A physical object can be in one and only one place on the store shelf. We could in theory put a few blenders in different locations, but it makes much more sense to group them together in a single spot. We are limited by the physicality of the objects.
A classification ranges along a spectrum from enumerative to faceted (Batley, 2005, p. 4). Enumerative here means that all possible classes or elements within a set are accounted for in a single comprehensive listing. Where enumerative classifications list all possible subjects and prescribe notation to classified entities, faceted classifications allow notation to be built from the combination of orthogonal characteristics. We’ll explore faceted classification in depth in section 6.3.
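The two ends of this spectrum can be sketched in a few lines of Python. This is only an illustration: the class names, facet names, and notation codes below are hypothetical and are not drawn from LCC, DDC, or any other real scheme. The enumerative scheme is modeled as a fixed listing that prescribes one notation per class, while the faceted scheme builds notation by combining one code from each independent facet.

```python
from itertools import product

# Hypothetical enumerative schedule: every class is listed in advance,
# each with its own prescribed notation.
ENUMERATIVE_SCHEDULE = {
    "Blenders": "K10",
    "Blenders, stainless steel": "K11",
    "Toasters": "K20",
}

# Hypothetical faceted scheme: three orthogonal facets, each mapping a
# value to a short code. Notation is built by combining the codes.
FACETS = {
    "appliance": {"blender": "B", "toaster": "T"},
    "material": {"stainless steel": "S", "plastic": "P"},
    "color": {"red": "R", "white": "W"},
}


def enumerative_notation(class_name):
    """Return the prescribed notation, or fail if the class was never listed."""
    if class_name not in ENUMERATIVE_SCHEDULE:
        raise KeyError(f"no class listed for {class_name!r}")
    return ENUMERATIVE_SCHEDULE[class_name]


def faceted_notation(**values):
    """Build notation by picking one code from every facet, in facet order."""
    return ".".join(FACETS[facet][values[facet]] for facet in FACETS)


def all_faceted_notations():
    """Every notation the facets can express, none of it listed in advance."""
    code_sets = [codes.values() for codes in FACETS.values()]
    return [".".join(combo) for combo in product(*code_sets)]
```

In this sketch, a red plastic toaster has no place in the enumerative schedule until someone adds a class for it, whereas `faceted_notation(appliance="toaster", material="plastic", color="red")` simply assembles the notation `T.P.R` from existing codes.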
The best example of a highly enumerative classification is the Library of Congress Classification (LCC). All possible topics are listed according to the practical purpose of organizing books in the Library of Congress collection. The DDC is a highly enumerative classification system but reflects faceted properties as well. The DDC allows notation to link between subjects and objects by combining notation between conceptual areas. Universal Decimal Classification (UDC) is closer still toward the middle of the spectrum, allowing librarians to note special aspects of a work and relationships between it and other works or subjects. Like Dewey, UDC is an all-knowledge sort of system, but it has great applicability in specialized libraries as well.
Enumerative classification systems tend toward the “one and only one place” rule. Since libraries tend to have single copies of books, a book can exist in one and only one location within the library. Thus, the librarian must make decisions regarding the content of the book and follow guidelines to determine where the book should sit on the shelves. These guidelines include “Works dealing with multiple subjects are classed with subject being acted on” and “Class a work on multiple subjects with the one receiving fuller treatment” (OCLC).
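The "fuller treatment" guideline can be read as a simple decision rule. The Python sketch below is an assumption-laden illustration, not OCLC's procedure: it supposes that a librarian has already estimated what share of each work is devoted to each subject (the `coverage` figures and titles are hypothetical), and it assigns every work to exactly one shelf.

```python
def shelf_subject(coverage):
    """Pick the single subject that receives the fullest treatment.

    `coverage` maps each subject to the estimated share of the work
    devoted to it (a hypothetical measure). Ties go to the subject
    listed first.
    """
    if not coverage:
        raise ValueError("a work must treat at least one subject")
    return max(coverage, key=coverage.get)


def shelve(collection):
    """Place every title under one and only one subject heading."""
    shelves = {}
    for title, coverage in collection.items():
        shelves.setdefault(shelf_subject(coverage), []).append(title)
    return shelves


# Hypothetical collection: titles and coverage shares are made up.
COLLECTION = {
    "A History of Salt": {"cooking": 0.3, "history": 0.7},
    "Weeknight Dinners": {"cooking": 1.0},
}
```

Calling `shelve(COLLECTION)` puts the first title under "history" even though it also treats cooking, so a reader browsing the cooking shelf would never see it. That loss is the cost of the "one and only one place" rule.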
6.2.5 Enumerative Classification versus Categorization
The distinction between categorization and classification may seem superfluous, but in the context of information organization it is important to understand. Categorization can be thought of as a natural cognitive activity with fuzzy boundaries, while classification, in its most enumerative form, relies on predetermined mutually exclusive classes with fixed boundaries.
Category membership depends on perceived similarity; class membership depends on necessary and sufficient characteristics. Category membership is flexible and based on immediate context; class membership is true or false based on the “intension of a class” (Jacob, 2004, p. 528). With categories, some members may be more or less representative of that particular category. In a classification, class members are equally representative of the class.
Hierarchies may emerge from groupings of categories, but in a classification, there is a hierarchical structure of fixed classes (Jacob, 2004, p. 528).
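One way to make this contrast concrete is to model class membership as a true-or-false test and category membership as a graded score. The Python sketch below is illustrative only: the feature names, the definition of "fork," and the use of set overlap (Jaccard similarity) as a stand-in for perceived similarity are all assumptions made for this example and do not come from Jacob (2004).

```python
# Class: membership is decided by necessary and sufficient characteristics.
# An entity is a member if and only if it has every defining feature.
FORK_DEFINITION = {"has_handle", "has_tines"}


def is_class_member(features, definition=FORK_DEFINITION):
    """True or false; every member is equally a member."""
    return definition <= set(features)


# Category: membership is graded by similarity to a typical example.
FORK_PROTOTYPE = {"has_handle", "has_tines", "metal", "four_tines"}


def category_typicality(features, prototype=FORK_PROTOTYPE):
    """Score between 0 and 1: shared features divided by all features."""
    features = set(features)
    return len(features & prototype) / len(features | prototype)


# Hypothetical entities described by made-up feature sets.
DINNER_FORK = {"has_handle", "has_tines", "metal", "four_tines"}
PLASTIC_SPORK = {"has_handle", "has_tines", "has_bowl", "plastic"}
```

Under these assumptions, both the dinner fork and the plastic spork pass the class test equally, but as category members the dinner fork is fully typical while the spork scores much lower. The class has a fixed boundary; the category has better and worse examples.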
Aristotle’s biological classification differs significantly from the biological classification we use today, which organizes living things by eight taxonomic ranks (domain, kingdom, phylum, class, order, family, genus, and species) based on a combination of Carolus Linnaeus’s characteristic-based grouping and Darwin’s principle of common descent. In contrast, Aristotle’s classification started with “animals with blood” and “animals without blood” and split animals into subclasses using the guiding principle of complex to less complex. Our present-day biological classification has strict rules for naming, while Aristotle’s hierarchical system was based largely on grouping according to perceived similarity.
6.3 Faceted Classification
6.3.1 What are Facets?
As we discussed in 6.2.5, enumerative classification imposes a number of constraints on how a classification is designed to accommodate classes. Highly hierarchical classifications may work to classify atoms—e.g., physical books—but when we turn to bits, hierarchical classification falters. Card sorting, a process where users are brought into a room and asked to sort cards with labels and entity names on them in order to help develop a system of organization, also results in limiting, hierarchical systems. Thus, we often turn to faceted classification in order to overcome enumerative classification’s limitations, especially on the web.