
«Chapter 6. Classification Chapter author: Jess Hemerly jhemerly Table of Contents    6.1 Overview  ...»


As we learned in Chapter 2 (“Identity and Identification”) and Chapter 3 (“Describing Instances”), people use different names for the same things and the same names for different things. So, too, do people apply different tags, and thus tagging can be distracting and deficient in information retrieval. Things may be tagged insufficiently or even with terms that don’t actually describe anything about the item at all. Thus, tagging suffers from the vocabulary problem.

Tagging is rather subjective, as it’s usually more of a quick and dirty task than a structured one. However, social media sites are beginning to build in mechanisms that add some structure to the tagging task. For example, on the social networking site Facebook, users can indicate that a specific person is in an uploaded picture by clicking on the faces of people in photographs, typing the person’s name, and then selecting the person from a list of Facebook friends. Because the system offers the user his or her full list of friends to choose from, the names are formatted the way they appear on a user’s profile, thereby creating a structured way to identify, describe, and connect people to photographs.

6.4.2 Folksonomy versus Tagsonomy

Tagging by nature is an organizational free-for-all because it’s a highly subjective practice and is frequently done on the fly. A personal photo collection on Flickr may have pictures of trees tagged with the terms “woods,” “trees,” “forest,” and “forests,” or just one of these terms, or any combination of them. When retrieving photos from the collection, the variation in tags may make it difficult to find all pictures of trees. Without some rules, tagging is hardly a classification and more closely resembles categorization. This is what Thomas Vander Wal first referred to as “folksonomy” in 2004. It is simply a collection of tags users assign as descriptors, not a principled set of classes. But if a user were to choose one and only one term to use for all pictures of trees, as well as to decide whether the plural or singular form would be used, we’d see the beginning of a tagsonomy. We can thus overcome the vocabulary problem in tagging by creating a controlled vocabulary for the tags that will be used for various descriptive purposes.
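As a minimal sketch of such a controlled vocabulary, the Python fragment below maps each variant tag to a single preferred term before photos are retrieved. The mapping, the choice of “tree” as the preferred term, and the photo names are all hypothetical, chosen only to mirror the tree example above.

```python
# Hypothetical controlled vocabulary: every variant maps to one preferred tag.
PREFERRED_TAG = {
    "woods": "tree",
    "trees": "tree",
    "tree": "tree",
    "forest": "tree",
    "forests": "tree",
}

def normalize_tag(tag):
    """Return the preferred form of a tag, or the cleaned tag if it is not in the vocabulary."""
    cleaned = tag.strip().lower()
    return PREFERRED_TAG.get(cleaned, cleaned)

def normalize_tags(tags):
    """Normalize a list of tags, dropping duplicates while keeping first-seen order."""
    result = []
    for tag in tags:
        preferred = normalize_tag(tag)
        if preferred not in result:
            result.append(preferred)
    return result

# Hypothetical photo collection tagged on the fly, with inconsistent terms.
photos = {
    "IMG_001": ["Woods", "hike"],
    "IMG_002": ["trees", "forests"],
    "IMG_003": ["beach"],
}

normalized = {name: normalize_tags(tags) for name, tags in photos.items()}
tree_photos = [name for name, tags in normalized.items() if "tree" in tags]
print(tree_photos)  # ['IMG_001', 'IMG_002']
```

With the raw tags, a search for any single term would have missed at least one of the two tree photos; after normalization, one term finds both.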

Because tags are so much like facets, a tagsonomy can classify examples along multiple dimensions. We can tag a photo with the location where it was taken or with the event where it was taken, either by a general name, by our own personal label (HawaiiVacation2010, for example), or by its public name, like SXSW. But just as categorization isn’t classification without the creation and application of guiding principles, tagging isn’t a tagsonomy without rules to dictate the dimensions along which tagging occurs, the granularity with which things are tagged, and the naming conventions that help form a controlled vocabulary for a set of entities. Will we use all plural, all singular, or a mix of plural and singular forms of nouns? And will we include spaces between multiple-word tags if the site allows them? Making these decisions and applying them in the tagging process constitutes the difference between tags and a tagsonomy.
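One way to picture such rules is as a small checker that every tag must pass before it is applied. The sketch below is only an illustration: the “facet:value” notation, the three allowed facets, and the lowercase, no-spaces convention are assumptions made for this example, not rules that any particular site imposes.

```python
import re

# Hypothetical tagging rules for one person's tagsonomy.
ALLOWED_FACETS = {"place", "event", "subject"}   # dimensions along which we tag
VALUE_PATTERN = re.compile(r"^[a-z0-9]+$")        # lowercase letters or digits, no spaces

def check_tag(tag):
    """Return a list of rule violations for one 'facet:value' tag (empty if it conforms)."""
    facet, separator, value = tag.partition(":")
    if not separator:
        return ["missing facet prefix"]
    problems = []
    if facet not in ALLOWED_FACETS:
        problems.append("unknown facet '" + facet + "'")
    if not VALUE_PATTERN.match(value):
        problems.append("value must be lowercase letters or digits with no spaces")
    return problems

for candidate in ["event:sxsw", "event:Hawaii Vacation 2010", "forest", "mood:happy"]:
    print(candidate, "->", check_tag(candidate) or "ok")
```

Here only the first candidate conforms. The others break a naming convention, omit the dimension, or use a dimension the rules do not include.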

6.4.3 Tagsonomies and Personal Information Management

As we mentioned in 6.4.1, a tagsonomy can be useful in both information organization and retrieval. But tagsonomies are especially useful for personal collections of entities and objects, or even of tasks and activities. A tagsonomy allows users to classify new entities as they are added, assigning them to classes based on a principled system of tagging.

As with the main IO/IR tradeoff between the up-front costs of information organization and the long-term benefits for information retrieval, a tagsonomy that’s applied consistently to all of one’s music or one’s photos on Flickr takes some effort up front but makes it much easier to find things later.

Let’s return to Flickr as our example here. There are two main levels of metadata at Flickr. First, as files are uploaded, machine metadata like exposure and date taken is added as well. Second, users are able to add tags to their photographs that allow them to describe the subject, the context, and more. Flickr displays these tags on the photo page and automatically generates an index of all tags a user has applied to uploaded photographs. For people, Flickr allows a user to tag the photo with the name of another Flickr user if that person appears in a photo, but since not everyone uses Flickr, a user may want to create another system to keep track of who appears in which photographs. This can happen on multiple levels. First, a user could use broad tags of “family” and “friend” to group more generally. A user could then tag family photos with specific identifiers like “mom,” “dad,” “cousin,” and “brother” and friend photos with tags like “best friend” and “girlfriend.” In order to maximize one’s ability to find all pictures of specific people within a collection, a user could come up with a system for tagging individual names, like “joeb” or full names. Each level of granularity defines a different class or subclass, and principles that dictate how we tag enable us to find things more easily later on.
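The levels of granularity described above can be sketched as a small hierarchy in which each specific tag implies a broader one. The hierarchy, file names, and helper functions below are hypothetical, and nothing here uses Flickr’s actual interface. The point is only that a search on the broad tag also finds photos that carry just the specific tag.

```python
# Hypothetical hierarchy: each specific tag points to its broader tag.
BROADER = {
    "mom": "family",
    "dad": "family",
    "cousin": "family",
    "brother": "family",
    "bestfriend": "friend",
    "girlfriend": "friend",
}

def expand(tags):
    """Return the tags plus the broader tag implied by each specific tag."""
    expanded = set(tags)
    for tag in tags:
        if tag in BROADER:
            expanded.add(BROADER[tag])
    return expanded

# Hypothetical personal collection; "joeb" is an individual-name tag.
photos = {
    "picnic.jpg": ["mom", "joeb"],
    "concert.jpg": ["bestfriend"],
    "sunset.jpg": ["beach"],
}

def find(tag):
    """Return the names of all photos whose expanded tags include the given tag."""
    return sorted(name for name, tags in photos.items() if tag in expand(tags))

print(find("family"))  # ['picnic.jpg']
print(find("joeb"))    # ['picnic.jpg']
```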

Task-based tagging also aids personal information organization, allowing us to build relationships between activities and domains. We can call this a taxonomy of tasks, or a taskonomy. Where a taxonomy organizes entities based on similarity of content or composition, a taskonomy organizes based on “activity structure” (Dougherty and Keller, 1982, pp. 763-774).

Some people tend to organize their work areas in externally objective ways, such as by subject or topic; think of books going from a library shelf to a desk. They may then shift to more task-oriented organization as they complete a task, like writing a paper or creating a class presentation. Entities are no longer piled together objectively but are grouped according to resources necessary for a given task. These entities may retain some semblance of the objective organization, but the intention has shifted to something highly subjective: getting work done.

When a task is completed—a paper has been finished and turned in—the entities are then reorganized according to the original externally objective plan—i.e., books returned to their spots on library shelves. Of course, some people never organize at all, and everything ends up in the purgatory that is a “potential project” pile.

Taskonomies are interesting to think about when the work being done is not knowledge work but a skilled trade. Think about how you organize your own kitchen. Many people keep baking sheets and other items for oven use in a drawer under the oven, with pots and pans for stovetop use hung near the stove. This simple arrangement is a common example of a taskonomy. A cook’s taskonomy might look something like this:

[Figure 6.2: A Cook’s Taskonomy]

Looking at the relationship between tasks and tools in this way can help a cook determine the best way to organize tools in a kitchen. Cutting items would necessarily be kept together near a prep area; having to run across the kitchen to another area where a poultry knife is kept with, say, chicken broth would be detrimental to the cook’s workflow. It would make far more sense to have all of the items for the task of cutting in a single area.

At an even more specific task-based level, think about the way you might prepare to make dinner. If following a recipe, many cooks like to pull all ingredients from their storage places and keep them close by in the prep area. This is similar to the idea of an activity-based “pile” mentioned above. After the meal has been prepared, items are returned to their original places, or “filed.” This piling and filing is an effective way to arrange items for a task at hand.

As with tagging content for metadata purposes, tagging tasks is a helpful way to build structure into something as simple as a to-do list. Items that are necessary for a given task can be tagged with a predetermined tag for that project or task in order to better organize all of the related items. A legal secretary could organize documents for an upcoming hearing by applying tags developed through a taskonomy so that all of the requisite electronic documents are easier to find. For collaborative work, assigning tags to tasks allows all collaborators to get a high-level view of the work to be done and who is best suited to perform a task. A taskonomy is also a useful way to achieve a high-level summary of what people do with certain items for user research and can then lead to more efficiency in design. Taskonomies, then, are excellent tools for helping to understand how things are done.
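A minimal sketch of task tagging follows: each item carries one task tag, and grouping by that tag gives collaborators a high-level view of what belongs to each task and who is handling it. The items, tag names, and people are invented for illustration.

```python
from collections import defaultdict

# Hypothetical items: (description, task tag, person assigned).
items = [
    ("Exhibit A scan", "hearing-prep", "Ana"),
    ("Witness list", "hearing-prep", "Ben"),
    ("Invoice draft", "billing", "Ana"),
]

def group_by_task(records):
    """Group (description, person) pairs under the task tag they were given."""
    grouped = defaultdict(list)
    for description, task, person in records:
        grouped[task].append((description, person))
    return dict(grouped)

overview = group_by_task(items)
for task, entries in overview.items():
    print(task)
    for description, person in entries:
        print("  ", description, "-", person)
```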

  ‐ 19 ‐  Chapter 6: Classification    Last revised: September 17, 2010 

6.5 Computational Classification

6.5.1 What is Computational Classification?

As we’ve seen, people can usually assign things to existing categories or create a new system of categories to design a classification. Knowledge experts have historically performed the task of classification, and these knowledge experts developed the major classifications we still use today, including Library of Congress Classification and Dewey Decimal. Even the scientific taxonomy was developed and refined by knowledge experts over time.

But it can be too costly in terms of time or effort to perform this manual assignment, especially when approaching a new domain or set of specialized documents. The cost increases when a set of documents changes or grows regularly. And when the value of the classification depends on its being done in a timely manner, as with filtering news or email messages or clustering search results, manual classification isn’t an option, because the classification is useful only if things are classified immediately, even instantly.

Sometimes computational classification is fully automated, performed entirely by machines. Other times, people (sometimes data scientists, sometimes crowd workers recruited through services like Amazon’s Mechanical Turk) assist the machines by refining results or preorganizing the material. Text analysis programs can index documents to help determine their similarity and, thus, which documents belong in a set. Banks automatically classify us when determining credit risk based on our credit scores.

Advances in natural language processing—where machines and computers can use human language instead of only machine language as inputs and outputs—coupled with the expansion of the field of data science have allowed us to empower machines with the capability to classify objects, entities, and, specifically, documents through text classification. In the case of library science, for example, assigning the terms from a controlled vocabulary like Library of Congress subject headings is text classification because each term can be thought of as a category. But this work can be aided with automatic text classification, allowing new entities to be matched to the appropriate headings based on analysis of the document text or document metadata. This doesn’t remove the librarian entirely; it simply aids the librarian in the work of classifying new materials, especially digital ones.

Of course, text classification is not 100% perfect, but we don’t compare automated approaches to “perfect” classification, only to that which can be done by people. Because text classification processes can be applied to an incredible variety of domains, and because of the increasing number of documents in digital form, text classification is a growing and important field.

6.5.2 Machine Learning

Machine learning is a process by which a computer, usually through the use of a complex algorithm, builds a text classification from a set of documents by “learning” the general categories that groups of documents share. Machine learning happens in two major ways: supervised and unsupervised. With supervised machine learning, we give the machine the categories we expect a set of items to fit into, and the machine learns to give us the output we desire based on the input we provide.

A familiar example of supervised machine learning is filtering, a form of automated text classification. Text classification assumes a system of categories and labeled instances so that we can train a system to assign new entities or occurrences to the appropriate classes. Take, for example, your email inbox. At the simplest level, incoming messages are classified by your mail server or program as SPAM or NOT SPAM. Those messages which are SPAM are filtered to a SPAM folder, while those that are NOT SPAM head to your inbox. The spam filter looks for different characteristics within emails, such as nonsensical phrases, odd URLs and email addresses in the sender field, or key terms like “pharmaceutical” or “beneficiary.” The machine performs these tasks without any human help. Sometimes, however, things we want to receive end up in the spam folder and we have to go and look for them there. We may miss an important message because the computer mistook it for spam. We then mark the message NOT SPAM, teaching the computer that messages like this one are meant for the inbox, not the spam folder. Likewise, spam may sneak past the filter and end up in the inbox. We then have to mark the item as spam, teaching the computer that items like this one are meant for the spam folder.
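The train-and-correct loop described above can be sketched in a few lines of Python. The chapter does not say which algorithm a mail program uses; this sketch assumes a naive Bayes classifier over word counts with add-one smoothing, which is one common choice. The class name, labels, and training messages are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Split a message into lowercase words (a deliberately crude tokenizer)."""
    return text.lower().split()

class NaiveBayesFilter:
    """A toy two-class text classifier: 'spam' versus 'not_spam'."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "not_spam": Counter()}
        self.message_counts = {"spam": 0, "not_spam": 0}

    def train(self, text, label):
        """Learn from one labeled message (also used when a user corrects a mistake)."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def classify(self, text):
        """Return the label with the higher smoothed log-probability."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["not_spam"])
        total_messages = sum(self.message_counts.values())
        scores = {}
        for label in ("spam", "not_spam"):
            # Log prior for the class, with add-one smoothing.
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word]
                # Add-one smoothing; the extra 1 leaves room for unseen words.
                score += math.log((count + 1) / (total_words + len(vocabulary) + 1))
            scores[label] = score
        return max(scores, key=scores.get)

spam_filter = NaiveBayesFilter()
spam_filter.train("claim your beneficiary funds now", "spam")
spam_filter.train("cheap pharmaceutical offer now", "spam")
spam_filter.train("meeting agenda for monday", "not_spam")
spam_filter.train("lunch on monday", "not_spam")

before = spam_filter.classify("claim your funds")   # the filter's first guess
spam_filter.train("claim your funds", "not_spam")   # the user marks it NOT SPAM
after = spam_filter.classify("claim your funds")    # the guess after the correction
print(before, "->", after)  # spam -> not_spam
```

Marking a message NOT SPAM is just one more labeled training example, which is why the filter’s answer for similar messages changes afterward.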

We can also further filter the items coming into our inboxes into specific categories depending on things like sender email address or subject line. While you could tag each message once it has already reached your inbox, defining filters saves you the work and allows you to automatically organize your email inbox. Here, we provide the machine with a set of parameters and the machine then does the work of classifying our messages based on the message text.
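User-defined filters of this kind are rules rather than learned behavior. A minimal sketch, assuming each rule names a message field, a piece of text to look for, and a destination folder, might look like the following. The field names, addresses, and folder names are hypothetical.

```python
# Hypothetical filter rules: (message field, text to look for, destination folder).
RULES = [
    ("sender", "@university.example", "School"),
    ("subject", "invoice", "Bills"),
]

def file_message(message, rules, default="Inbox"):
    """Return the folder of the first rule that matches, or the default folder."""
    for field, needle, folder in rules:
        if needle.lower() in message.get(field, "").lower():
            return folder
    return default

messages = [
    {"sender": "prof@university.example", "subject": "Syllabus update"},
    {"sender": "billing@shop.example", "subject": "Your invoice for March"},
    {"sender": "friend@mail.example", "subject": "Dinner on Friday?"},
]

for message in messages:
    print(message["subject"], "->", file_message(message, RULES))
```

Rules are checked in order, so if two rules could match the same message, the earlier one decides where it goes.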

With unsupervised machine learning, the machine receives input but does not receive categories. Instead, the machine finds patterns in the data of which we must make sense and to which we must attach meaning. The goal is to build representations of the input that can later be turned into a useful and reusable classification. We’ll explore this in greater depth in 6.5.3.
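To make the contrast with supervised learning concrete, the sketch below groups short documents without being told any categories. It assumes a very simple technique, greedy grouping by word overlap (Jaccard similarity), chosen for brevity rather than because the chapter prescribes it. The threshold and sample documents are hypothetical. The groups it returns have no names; attaching meaning to them is left to us.

```python
def jaccard(a, b):
    """Share of words two sets have in common (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(documents, threshold=0.2):
    """Put each document in the first group holding a similar-enough member, else start a new group."""
    groups = []  # each group is a list of (text, word set) pairs
    for text in documents:
        words = set(text.lower().split())
        for group in groups:
            if any(jaccard(words, other) >= threshold for _, other in group):
                group.append((text, words))
                break
        else:
            groups.append([(text, words)])
    return [[text for text, _ in group] for group in groups]

documents = [
    "roast chicken recipe",
    "bank credit score",
    "grilled chicken recipe",
    "credit risk score",
]

for group in cluster(documents):
    print(group)
# ['roast chicken recipe', 'grilled chicken recipe']
# ['bank credit score', 'credit risk score']
```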
