WWW.DISSERTATION.XLIBX.INFO FREE ELECTRONIC LIBRARY - Dissertations, online materials

<< HOME
CONTACTS

Pages:     | 1 | 2 || 4 | 5 |   ...   | 11 |

# «THE UNIVERSITY OF CHICAGO GRAMMATICAL METHODS IN COMPUTER VISION A DISSERTATION SUBMITTED TO THE FACULTY OF THE DIVISION OF THE PHYSICAL SCIENCES IN ...»

-- [ Page 3 ] --

1 (Context-freeness is an independence assumption). Grammatical methods in general are characterized by 1) a method for decomposing novel data in a hierarchical fashion, and 2) an explanation for each level of the hierarchy in terms of previously seen data. For example, in a standard CFG, we will have both a parse tree over a sentence, and a set of labels for the nodes of the parse tree. If the CFG has been automatically generated (i.e., no human has assigned semantic meaning to grammar elements), then the label’s meaning will be derived solely from which parts of the training data have received that label.

For generic grammatical methods, we can deﬁne context-freeness by specifying that the contents of a node N (all nodes below N in the hierarchy) are independent of the context in which N appears (all nodes not below N in the hierarchy) given the data stored at N (its label and any attributes).

Remark 1.4.

2 (Independence assumptions yield more eﬀective data). Independence assumptions let you eﬀectively multiply the amount of data you have. Consider the following simple problem: try to classify 0-1 vectors into two classes given a bunch of labeled examples.

Consider the two sort of extreme things we can do: if we assume that each coordinate of the vector is independent, we get a Naive Bayes classiﬁer, where we basically learn what the distribution is over a given coordinate for each class. This can be done with a small number of training examples.

The other extreme is assuming that there is total dependence, that the vectors are just arbitrary elements of a set, and the individual coordinate values do not mean anything. Then the maximum likelihood classiﬁer (Paranoid Bayes?) is given by seeing how many times a speciﬁc vector showed up in each class, and picking the class where it showed up more often.

We would have to guess on any novel vector.

If the independence assumption is valid for the data, then the Naive Bayes classiﬁer acts like the Paranoid Bayes classiﬁer trained on a data set that is exponentially larger.

(Speciﬁcally, for each class, we generate all possible vectors that can be made by taking the ﬁrst coordinate of a random example from the class, then taking the second coordinate from an independently chosen random example from the class, etc.) Even when the independence assumption does not apply, context-free grammars are useful. This can be seen in examining the classic nonsense sentence ”Colorless green ideas sleep furiously”. The supposed weakness of context-free grammars, that the context-free assumption is unrealistic, is actually a strength, because it allows us to parse and react to very unrealistic data.

1.5 Grammatical Vision is Diﬃcult

Little concrete progress has been made with visual grammars. The general frameworks have few practical results, and people with practical results don’t seem to have a general framework.

We are also hindered because the closest equivalent to linguistics is the cognitive science of vision, which is not as well-stocked with intermediate hypotheses and sanity checks. For example, in linguistics, utterances can be decomposed into phonemes. There is no agreement as to how a visual signal can be decomposed into canonical parts.

1.6 Grammar induction is diﬃcult

Grammatical methods in vision are inspired by grammatical models like probabilistic contextfree grammars in linguistics. In computation linguistics, the problem of learning a grammar from unlabeled examples (grammar induction) is very diﬃcult.

There are many negative theoretical results (Lee [1996]), the most fundamental of them

being Gold’s Theorem:

Theorem 1.6.

1 (Gold’s Theorem). Consider an algorithm A trying to learn a language L ∈ Σ∗. A is given an inﬁnite sequence of words w1, w2, · · · ∈ L which contains every word in L at least once. After each wi, A outputs some Li. A is said to learn L in the limit if, for some n, L = Ln = Ln+1 = Ln+2 =....

Then, if a class of languages L contains all languages of ﬁnitely many strings, and at least one inﬁnite language (in particular, if L is the set of context-free languages), no algorithm can learn all languages in L in the limit.

Note that this result applies to all algorithms, regardless of their complexity.

This is a fundamental theoretical barrier to grammar induction. There is some hope that we can defeat it by adopting a Bayesian approach, and only hoping to learn suﬃciently simple grammars. This has been attempted several times, but no solution is known for the associated Bayesian learning problem, and a heuristic search strategy must be used (Cook [1976], Stolcke [1994], Nevill-Manning and Witten [1997]).

It is clear that the problem of learning context-free grammars applies directly to curve grammars, since part of our curve model actually is a PCFG (see Chapter 2). The learning problem might be easier if visual grammars can be simpler in structure than string grammars for natural languages. We don’t know if this is the case.

1.7 Visual Grammars are hard for additional reasons

–  –  –

The continuous nature of visual signals means that we will rarely if ever see exact agreement between two visual objects at the pixel level. Compare a well-deﬁned linguistic object (such as a word) to a visual part such as an eye; an eye will look slightly diﬀerent in every image because of natural variation and photometric eﬀects.

Additionally, because language has well deﬁned symbols, in the case of written language, or agreed-upon phonemes, in the case of spoken language, it is possible in computational linguistics to check whether two letters or two words are the same. Working with visual grammars is thus analogous to trying to solve the problems of computational linguistics simultaneously with the problems of speech recognition.

Additionally, hand-labeled training data is more readily available in computational linguistics. For example, there are datasets of sentences which come with ground-truth grammatical parses. Such information is generally unavailable in vision.

Without certain knowledge of correspondence between samples, generalization is diﬃcult.

We are required to infer correspondences between samples in an unsupervised manner. This generally leads to a unsupervised learning problem, such as clustering, or a correspondence problem.

1.7.2 Dealing with Scale

Computer vision usually tries to be scale invariant, since a small object closer to the camera naturally looks like a large object further from the camera. A doubled copy of an image is hopefully semantically the same as the original image.

We thus have two problems: when an object is too small, its internal structure may be lost. Generally, ﬁne internal structural elements ﬁrst become texture (discernible en masse but not individually) and then disappear completely. The larger object may still, however, be recognizable by shape.

When an object is too large, we are faced with the task of parsing internal structure of which we were previously unaware.

These diﬃculties can probably be dealt with. The ﬁrst demands that we come up with a simple appearance model for every object, so that we can detect it without detecting any of its subparts. Decreased recall is probably OK, as smaller things are actually more diﬃcult to see.

The second demands that every object be detectable at large scales, even those which have no sub-parts in our grammar. We can model this with simple inﬁnitely recursive grammars;

suﬃciently simple ones can hopefully model almost anything, while basically preventing their

being bias for either one of:

• A “larger” model with more internal structure

• A “smaller” model with less internal structure 1.7.3 Plane grammars do not naturally admit eﬃcient parsing Eﬃcient algorithms for parsing strings with context-free grammars rely on dynamic programming, and ultimately on the fact that a string has only quadratically many contiguous substrings. The elements of visual grammars naturally live in the image plane, and thus do not have a linear order. A visual object need not even occupy a single contiguous region, in the case of occlusion. Therefore, a parsing algorithm might in principle have to consider any subset of the image elements as a node in the parse, leading to an exponential runtime.

1.8 Contributions

Our ﬁrst contribution is the formulation of a class of shape grammars, a probabilistic model of curve formation that allows for both continuous variation and structural variation. We describe shape grammars in Chapter 2 and derive an EM-based training algorithm for shape grammars in Chapter 3. We demonstrate the eﬀectiveness of shape grammars for modeling human silhouettes, and also demonstrate their eﬀectiveness in classifying curves by shape.

Our second contribution is a general method for heuristically speeding up a large class of dynamic programming algorithms, given in Chapter 4. We provide a general framework for discussing coarse-to-ﬁne search strategies, and provide proofs of correctness. Our method can also be used with inadmissible heuristics.

Our third contribution is local inference, a method for correcting parses with unintended reuse, given in Chapter 4. We deﬁne an energy minimization problem over low-level assertions that allows us to detect shapes in cluttered images.

Our fourth contribution is an algorithm for doing approximate context-free parsing of long strings in linear time, given in Chapter 6. We deﬁne a notion of approximate parsing in terms of restricted families of decompositions, and construct small families which can approximate arbitrary parses.

Chapters 2, 3, and 6 are based on joint work with my advisor, Pedro Felzenszwalb.

–  –  –

We are building probabilistic models of shape. For now, we consider only models of shape that deal with the location in the plane of a ﬁxed number of (ordered) landmarks z1,..., zn, which in our case will deﬁne a curve outlining some shape. Some existing approaches to this problem are: Markov models (Basri et al. [1998], Grenander et al. [1991]), the Procrustes distance and its probabilistic counterpart, the Watson distribution (Dryden and Mardia [1998]), and active shape models (Cootes et al. [1995]).

We would like to prove that our models do as well or better in explaining the geometric variability of various shape classes. The easiest way to compare two generative probabilistic models is to examine samples from them, and subjectively assess their similarity to real data.

This is inherently a very qualitative evaluation.

Our main goal in building shape models is to allow us to do inference on new curves.

We can calculate a likelihood that a shape class generated a particular curve. We start by considering models deﬁned by perturbations of a single input curve.

When comparing to other probabilistic models, we can evaluate them in a more quantitative way using the log-likelihood, which is deﬁned as

–  –  –

If X = {x1,..., xn } is unseen data, and a large enough sample to be statistically useful, then H(X, q) is a good measure of how well a model explains unseen data. Smaller values of H(X, q) suggest that q is a better model. One problematic aspect of the log-likelihood is that Figure 2.1: The problem of drift makes a Markov model unappealing. Random samples from this model are too similar locally and too dissimilar globally. These shapes were generated by changing each length l by a multiplicative factor of 1 + N (0, σ), and changing each angle θ by adding π · N (0, σ). Here σ is the value listed underneath the shape.

Figure 2.2: The shape on the left is the original, and the other curves have been produced by sampling from the hierarchical curve models of Felzenszwalb and Schwartz [2007].

The model produces curves which have more perceptual similarity than the Markov model.

it is only meaningful if q is a correctly normalized probability distribution. Often it is easier to compute q(x) = Z ·q(x) for some unknown but ﬁxed constant Z. Due to the unsatisfactory nature of the competing models we consider, we defer quantitative measurement to Chapter 3.

The most straightforward model for deﬁning a distribution over curve shapes is a Markovtype model. A curve is a sequence of line segments 1,..., k. We can describe a curve template by giving the length li of each i, and the angle θi between i and i+1. If we perturb each li and θi slightly, we get a curve which diﬀers from the original. Some samples from this are shown in Figure 2.1.

The major weakness of the Markov model is drift: the small perturbations will accumulate, and the overall shape of the curve will vary greatly. A straight line has some probability of curling into a tight spiral. Consider a shape like a hand: a hand has ﬁngers that protrude.

This means that there are two points (namely the two points where a ﬁnger meets the rest of the hand) far away in the curve that we always expect to be very close together. A Markov perturbation of the shape is likely to pull these points far apart.

To defeat this problem, hierarchical curve models were introduced in Felzenszwalb and

Schwartz [2007]. There, the following curve model is given:

• Given a model curve C, decompose C hierarchically by repeatedly cutting it in half, in a balanced but otherwise arbitrary fashion.

• Suppose our ﬁrst decomposition is C = DE. We perturb C into C by ﬁrst perturbing the midpoint of C slightly. We then rotate and scale the curves D and E to align the end of D and the beginning of E with the perturbed midpoint.

• We then recursively apply the same process to each subcurve D and E.

It is easy to see that this defeats the problem of drift, because the overall shape of the curve is determined in a constant number of perturbations of the template curve. If our perturbations are smaller for larger curves, then we will leave the overall shape very similar while allowing signiﬁcant local variation. Some samples from this model are shown in Figure 2.2.

Pages:     | 1 | 2 || 4 | 5 |   ...   | 11 |

Similar works:

«201 0 ANNUAL REPORT ANNUAL MEETING OF SHAREHOLDERS May 25, 2011 Shareholders Letters LETTERS TO SHAREHOLDERS WILLIAMS-SONOMA, INC. 2010 ANNUAL REPORT [THIS PAGE INTENTIONALLY LEFT BLANK] To Our Shareholders, Shareholders Letters Howard Lester, our former Chairman of the Board of Directors and Chief Executive Officer, retired last year in May after 32 years at the helm of our company. In November, we mourned Howard’s passing. WilliamsSonoma, Inc. is what it is today because of Howard’s...»

«A Dialogue Between Graham Harman and Tristan Garcia Moderated by Rik Peters April 6th, 2013 at Wijsgerig Festival Drift, in the OT301 in Amsterdam, NL Wijsgerig Festival Drift is an annual student-organized philosophy festival in Amsterdam, with close ties to the student association of the philosophy department at the University of Amsterdam (UvA). The programme consists of lectures by philosophers in two or three different halls; live music; poetry. The combination of location (an old film...»

«THE RECYCLING OF FILTH: TRANSCULTURAL DISCOURSES IN THE FILMS OF PEDRO ALMODÓVAR AND JOHN WATERS by BRETT JESSIE DRUMMOND IGNACIO RODEÑO, CHAIR CONSTANCE JANIGA-PERKINS ALICIA CIPRIA ÁLVARO BAQUERO-PECINO JEREMY BUTLER A DISSERTATION Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Modern Languages & Classics in the Graduate School of The University of Alabama TUSCALOOSA, ALABAMA Copyright Brett Jessie Drummond 2013 ALL RIGHTS...»

«UNIVERSITY OF CALIFORNIA, IRVINE Enhancing Architecture-Implementation Conformance with Change Management and Support for Behavioral Mapping DISSERTATION submitted in partial satisfaction of the requirements for the degree of DOCTOR OF PHILOSOPHY in Information and Computer Science by Yongjie Zheng Dissertation Committee: Professor Richard N. Taylor, Chair Professor André van der Hoek Assistant Professor James A. Jones © 2012 Yongjie Zheng DEDICATION To Mom and Dad. ii TABLE OF CONTENTS Page...»

«Copyright 2003 by the American Society of Clinical Hypnosis American Journal of Clinical Hypnosis 46:2, October 2003 Eastern Meditative Techniques and Hypnosis: A New Synthesis Akira Otani University of Maryland Counseling Center In this article major ancient Buddhist meditation techniques, samatha, vipassana, Zen, and ton-len, will be described in reference to contemporary clinical hypnosis. In so doing, the Eastern healing framework out of which these techniques emerged is examined in...»

«Accidents The Dewey lectures are supposed to be autobiographical and also reflective about our profession hence, I guess the hope is, somewhat instructive. I don=t think my own history very instructive, but it did have some amusing twists, and I do have some concerns about the current state of the profession that I would like to share. So I thank the Dewey Society very much indeed for inviting me to give this lecture. It is an unexpected and a very pleasant honor! Current research has it that...»

«GABRIEL VACARIU    EPISTEMOLOGICALLY DIFFERENT WORLDS  To Ilie Parvu Gabriel Vacariu                   EPISTEMOLOGICALLY DIFFERENT  WORLDS    Referenţi ştiinţifici: Prof. univ. dr. ILIE PÂRVU Conf. univ. dr. CONSTANTIN STOENESCU On the first cover: “Anaïs in the hyperverse” by Magda Vacariu Şos. Panduri, 90–92, Bucureşti – 050663; Telefon/Fax: 410.23.84 E-mail: editura@unibuc.ro Internet: www.editura.unibuc.ro Tehnoredactare computerizată: Victoria Iacob...»

«Philosophy and Phenomenological Research Philosophy and Phenomenological Research Vol. LXXX No. 2, March 2010 Ó 2010 Philosophy and Phenomenological Research, LLC Is Evidence Knowledge? juan comesana ˜ University of Arizona holly kantin University of Wisconsin, Madison 1. Introduction In chapter 9 of Knowledge and its Limits (Williamson 2000), Timothy Williamson argues for the thesis that the evidence that a subject has is constituted by propositions known by the subject (a thesis that he...»

«Ó Springer 2006 Law and Philosophy (2006) 25: 31–79 DOI 10.1007/s10982-004-8196-4 RE’EM SEGEV JUSTIFICATION, RATIONALITY AND MISTAKE: MISTAKE OF LAW IS NO EXCUSE? IT MIGHT BE A JUSTIFICATON! (Accepted 5 October 2004) I. INTRODUCTION According to a famous maxim, ignorance or mistake of law is no excuse. This maxim is supposed to represent both the standard and the proper rule of law. In both respects, this is one of the ﬁrmest legal maxims, recognized and accepted by laypersons and...»

«Male Education and Son Preference in India by Rebha Sabharwal A Dissertation Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy Approved July 2013 by the Graduate Supervisory Committee: Sarah Hayford, Chair Victor Agadjanian Scott Yabiku ARIZONA STATE UNIVERSITY August 2013 ABSTRACT The study of son preference in India has been the focus of research for a few decades. The desire for sons leads to unfavorable consequences for daughters such as unequal access...»

«ENHANCEMENT OF ROLL MANEUVERABILITY USING POST-REVERSAL DESIGN A Thesis Presented to The Academic Faculty by Wei-En Li In Partial Fulfillment of the Requirement for the Degree Doctor of Philosophy in the School of Aerospace Engineering Georgia Institute of Technology August 2009 ENHANCEMENT OF ROLL MANEUVERABILITY USING POST-REVERSAL DESIGN Approved by: Professor Dewey H. Hodges, Advisor Professor J. V. R. Prasad Committee Chair School of Aerospace Engineering School of Aerospace Engineering...»

«Szab-Ch09.qxd 09/07/04 21:17 Page 356 Naming and Asserting Scott Soames Many essays in semantics and the philosophy of language seem to proceed on the assumption that—special circumstances involving ironic, metaphorical, or other non-literal uses of language aside—the proposition asserted by an utterance of a sentence in a context is the proposition semantically expressed by the sentence in that context. At some level, of course, we all know that this is a ﬁction, since sometimes a single...»

<<  HOME   |    CONTACTS
2016 www.dissertation.xlibx.info - Dissertations, online materials

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.