«THE UNIVERSITY OF CHICAGO GRAMMATICAL METHODS IN COMPUTER VISION A DISSERTATION SUBMITTED TO THE FACULTY OF THE DIVISION OF THE PHYSICAL SCIENCES IN ...»


One approach to solving this problem, used in Jin and Geman [2006], is to fix various low-level assertions that seem likely, and eliminate incompatible low-level assertions from consideration. In our case, this approach does not yield good results.

4.4 State space

We assume that there is a single object of interest against a background, and that its boundary is the curve we wish to detect. Let P be the set of pixels in the input image, and let E ⊂ P be the pixels marked as edges by some edge detector, such as the Canny edge detector (Canny [1986]). We wish to assign every non-edge pixel to be either figure or ground, i.e., inside the object of interest or not inside the object of interest. This will be formulated as an energy minimization problem. We will represent the labeling as a function L(p), where L(p) = 1 when p is inside the object, and L(p) = 0 when p is outside of the object.

Because we are trying to prevent edges from being used multiple times, we would in particular like to prevent or penalize the use of an edge pixel in opposite directions. We thus add variables to our energy minimization problem to represent the orientation of each edge pixel. We constrain this orientation to be orthogonal to the gradient at the edge pixel, so each edge pixel has two choices of orientation, represented as a unit vector D(p). We want to assign orientations so that the object of interest has its boundary traced out clockwise. We also want the orientation assignment to be consistent with the output of our global parsing algorithm.
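As a minimal sketch of the orientation variable, assume the gradient at an edge pixel is available as a 2-vector (gx, gy). The two admissible values of D(p) are then the two unit vectors orthogonal to that gradient. The function name `orientation_candidates` is illustrative and does not come from the original text.

```python
import math

def orientation_candidates(gx, gy):
    """Return the two unit vectors orthogonal to the gradient (gx, gy).

    Hypothetical helper: these are the two possible values of D(p)
    at an edge pixel p whose image gradient is (gx, gy).
    """
    norm = math.hypot(gx, gy)
    if norm == 0.0:
        raise ValueError("gradient must be non-zero at an edge pixel")
    # Rotating the normalized gradient by 90 degrees gives one tangent;
    # the other candidate is its negation.
    t = (-gy / norm, gx / norm)
    return t, (-t[0], -t[1])
```

Which of the two candidates is chosen at each pixel is what the energy minimization decides.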

–  –  –

We use the output of our local interpretation procedure to modify global inference in two ways:

• A line segment that goes over an edge pixel with the wrong orientation is assigned the negative of its normal cost (corresponding to the assumption that it will be used twice in the parse with the correct orientation, and once with the reverse orientation). The first part of Figure 4.2 depicts a problematic parse that will be penalized appropriately by this heuristic.

• A line segment is disallowed if too high a fraction of its pixels are classified as interior, or if too high a fraction are classified as exterior. (Here interior and exterior refer to the output of our local inference procedure.) This is meant to eliminate short-circuiting, in which the optimization procedure double-counts an edge by taking a round-about way from its head to its tail. Unless the curve encompasses the whole object, this test generally prevents short-circuiting. The second part of Figure 4.2 shows an example of short-circuiting.
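The two modifications above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the experiments: the threshold `max_fraction`, the label encoding (1 = interior, 0 = exterior, `None` = edge pixel), and both function names are hypothetical, and how a wrong-orientation crossing is detected is left to the caller.

```python
def adjusted_cost(normal_cost, crosses_wrong_orientation):
    """First rule: a segment crossing an edge pixel with the wrong
    orientation receives the negative of its normal cost."""
    return -normal_cost if crosses_wrong_orientation else normal_cost

def segment_allowed(labels, max_fraction=0.5):
    """Second rule: disallow a segment if too many of its pixels are
    classified as interior, or too many as exterior.

    labels: local-inference output for each pixel along the segment
            (1 = interior, 0 = exterior, None = edge pixel).
    max_fraction: hypothetical threshold; the source does not give a value.
    """
    if not labels:
        return True
    n = len(labels)
    interior = sum(1 for v in labels if v == 1) / n
    exterior = sum(1 for v in labels if v == 0) / n
    return interior <= max_fraction and exterior <= max_fraction
```

A segment that mostly follows detected edge pixels passes the test, while one that cuts straight through the interior or the exterior of the object is rejected.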

–  –  –

We formulate our energy minimization problem as the sum of four terms:

$$U(L, D) = U_{\mathrm{seg}}(L) + U_{\mathrm{or}}(D) + U_{\mathrm{int}}(L, D) + U_{\mathrm{yd}}(D),$$

where $U_{\mathrm{seg}}$ enforces coherence in the segmentation, $U_{\mathrm{or}}$ enforces coherence in the orientation assignment, $U_{\mathrm{int}}$ enforces consistency between the segmentation and the orientation assignment, and $U_{\mathrm{yd}}$ enforces consistency between the orientation assignment and the previous curve selected by our global parsing algorithm. We now discuss each term in turn.

Figure 4.2: Motivation for local constraints. Grey regions represent areas where there is evidence for edges.

–  –  –

This energy function is very simple: it is a sum over all pairs of adjacent non-edge pixels, where the cost is −α if the two pixels have the same label and α if they have different labels, for some constant α > 0. This is a standard model from statistical physics called the Ising model. It is commonly used in computer vision to denoise images (Besag [1993], Geman and Geman [1984]). Here we use it to penalize non-edge boundaries between figure and ground, which should be rare. When they are necessary, this energy term will push non-edge boundaries to be short and simple.
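The segmentation term described above can be sketched in a few lines. This is a minimal illustration that assumes 4-connectivity for "adjacent" (the text does not say which neighbourhood is used) and represents the labeling and the edge map as nested lists. The name `ising_energy` is hypothetical.

```python
def ising_energy(labels, is_edge, alpha=1.0):
    """Sum over adjacent pairs of non-edge pixels:
    -alpha if the two labels agree, +alpha if they differ.

    labels:  2D list of 0/1 figure/ground labels L(p)
    is_edge: 2D list of booleans, True where p is in the edge set E
    Assumes 4-connectivity (right and down neighbours), an assumption
    not stated in the source.
    """
    rows, cols = len(labels), len(labels[0])
    energy = 0.0
    for r in range(rows):
        for c in range(cols):
            if is_edge[r][c]:
                continue
            # Visit only right and down neighbours so each pair counts once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols and not is_edge[rr][cc]:
                    same = labels[r][c] == labels[rr][cc]
                    energy += -alpha if same else alpha
    return energy
```

Pairs that involve an edge pixel contribute nothing, so a figure/ground boundary that runs along detected edges is free, while a boundary through non-edge pixels pays α per crossing pair.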

–  –  –

where M(p) is the gradient magnitude at pixel p and · is the dot product between vectors.

This energy function pushes our orientation assignment to vary smoothly, as nearby edge pixels are penalized if their orientations do not point in the same direction. We weight this penalty by the gradient magnitude, so that stronger edges are given more influence. The function $w_{p,q}$ is a weighting that specifies how influence dies off over the distance between p and q. We have used $w_{p,q} = e^{-(x^2 + y^2)/4w}(1 - \beta|y|)$, where x is the component of q − p perpendicular to the gradient at p, and y is the component of q − p parallel to the gradient at p. Note that $w_{p,q} = w_{q,p}$. The $(1 - \beta|y|)$ factor captures the fact that parallel edges sufficiently far apart are more likely to go in opposite directions, because they are likely to be the two sides of a part of the object.
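The weighting can be sketched as below. This is an illustration only: the argument `denom` stands in for the 4w term in the denominator of the exponent, whose exact form is hard to read in this copy, and the function name and argument layout are hypothetical.

```python
import math

def weight(p, q, grad_p, denom, beta):
    """Sketch of w_{p,q} = exp(-(x^2 + y^2) / denom) * (1 - beta * |y|).

    p, q:    pixel coordinates as (x, y) pairs
    grad_p:  image gradient at p, assumed non-zero
    denom:   stands in for the denominator of the exponent (assumption)
    beta:    strength of the parallel-edge correction
    """
    gx, gy = grad_p
    norm = math.hypot(gx, gy)
    nx, ny = gx / norm, gy / norm      # unit vector along the gradient at p
    dx, dy = q[0] - p[0], q[1] - p[1]
    y = dx * nx + dy * ny              # component parallel to the gradient
    x = -dx * ny + dy * nx             # component perpendicular to the gradient
    return math.exp(-(x * x + y * y) / denom) * (1.0 - beta * abs(y))
```

Once β|y| exceeds 1 the weight becomes negative, so two parallel edges that lie far apart along the gradient direction are rewarded for pointing in opposite directions, which matches the motivation given for the (1 − β|y|) factor.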

–  –  –

where M(q) is the gradient magnitude at q, and θ(u, v) is the angle between vectors u and v.

This energy function is motivated by the observation that the angle between q − p and D(q) tends to be close to 90 degrees when p is in the interior of the object and D(q) is going clockwise around the object. When p is outside the object and D(q) is going clockwise around the object, the angle between q − p and D(q) tends to be close to 270 degrees. This is shown in Figure 4.3. Multiplying the sine of this angle by (1 − 2L(p)), which is −1 in the interior and 1 in the exterior, yields a quantity that is negative if L(p) and D(q) are consistent with one another, and thus consistency is rewarded when we minimize the energy.
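The sign argument above can be checked numerically. The sketch below covers only the factor sin θ(q − p, D(q)) · (1 − 2L(p)); it leaves out the gradient magnitude M(q) and any distance weighting. It assumes image coordinates (x to the right, y downward), "clockwise" as seen on screen, and a signed angle computed with `atan2`. These conventions and the function names are assumptions made for illustration.

```python
import math

def signed_angle(u, v):
    """Signed angle from u to v, in radians, via atan2(cross, dot)."""
    cross = u[0] * v[1] - u[1] * v[0]
    dot = u[0] * v[0] + u[1] * v[1]
    return math.atan2(cross, dot)

def consistency_factor(p, q, d_q, label_p):
    """sin(theta(q - p, D(q))) * (1 - 2 L(p)).

    Negative when the label of p agrees with the orientation at q,
    under the coordinate conventions stated above (an assumption).
    """
    u = (q[0] - p[0], q[1] - p[1])
    return math.sin(signed_angle(u, d_q)) * (1 - 2 * label_p)

# Boundary pixel q on the right side of an object centred at the origin;
# traversing clockwise on screen, the orientation at q points down (+y).
q, d_q = (1.0, 0.0), (0.0, 1.0)
inside_ok = consistency_factor((0.0, 0.0), q, d_q, label_p=1)   # interior p, labeled figure
outside_ok = consistency_factor((3.0, 0.0), q, d_q, label_p=0)  # exterior p, labeled ground
inside_bad = consistency_factor((0.0, 0.0), q, d_q, label_p=0)  # interior p, mislabeled
```

In this configuration both consistent cases give a negative factor and the mislabeled case gives a positive one, so minimizing the energy favours the consistent labeling.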





–  –  –

where · is the dot product of vectors.

This energy function simply rewards orientation assignments which go in the same direction as the line segments that make up our initial parse.

–  –  –

In our experimental results, we show the final parse chosen. Recall that we parse first, do local inference, and then reparse with modified data costs. We also show the segmentation into figure and ground, and the orientation assignment selected by our local inference procedure. The orientation assignment is shown in four different images corresponding to four different pairs of opposite orientations. One orientation is shown in black, and its opposite is shown in white, while other orientations are shown in grey. The orientation depicted is shown by the squares in the top left of each image.

Figure 4.3: The angle between q − p and D(q) should be close to 90 degrees for interior p and close to 270 degrees for exterior p.

Figure 4.4: Output of detection algorithm. Final detection shown top left, segmentation shown top right, orientation assignments shown in four bottom images. The top left of the orientation images gives a key to the orientation labels.

These results demonstrate that our local inference procedure, combined with our global parsing procedure, allows us to locate human silhouettes in clutter-free images.

Figures 4.5 to 4.13: Output of detection algorithm. Final detection shown top left, segmentation shown top right, orientation assignments shown in four bottom images. The top left of the orientation images gives a key to the orientation labels.

–  –  –

Natural object categories such as cars and people exhibit two kinds of variation: continuous or “plastic” deformations and discontinuous structural variations. Therefore, object models which allow for both will make vision algorithms much more powerful. The potential for a satisfying account of large structural variation is one of the most intriguing possibilities of grammatical methods.

One of the simplest structural variations is occlusion: part of an object may not be visible, usually because something between the object and the camera is occluding it. Occlusion has been well understood in computer vision for a long time, and models can be made robust to it, e.g., the Hausdorff distance in Huttenlocher et al. [1993].

Another common way that objects exhibit structural variation is by having optional parts: a dog may or may not have a tail, a person may or may not have a hat. Occlusion models are capable of recognizing such objects with or without their optional parts, but they do not accurately model optional parts. An optional part is a particular subset of the object that is likely to not appear, while occlusion allows any not-too-large subset of the model to disappear.

The usefulness of more general structural variation can be seen in Figure 5.1. Here, the human eye notices a large similarity between the two shapes A1 and A2, but many curve models would see very little similarity.

–  –  –

Figure 5.1: If A1 is the original curve, which other curve is most similar to it? Figure adapted from Basri et al. [1998].

We might intuitively describe the second shape, A2, as

A2 = “Take A1, snap off the right appendage, and reattach it beneath the left appendage.” (5.1.1)

This highlights several important points:

• Description 5.1.1 of A2 is very short in English, and might be even shorter in a specialized curve model encoding. Description length is a good proxy for the conditional probability of observing A2 given that it is a distortion of A1 (Bienenstock et al. [1997]).

• Structural variation is a fundamental problem in modeling visual objects. In the absence of a practical model of structural variation, we must model variation as continuous deformation. Then, any model that declares A1 and A2 to be similar will think that A3 or A4 is even more similar to A1.

• Structural variation cannot be modeled without a semantically meaningful decomposition of the original curve, like that seen in Figure 5.2. Description 5.1.1 crucially relies on “the right appendage” making sense to the listener. Thus, perceptually simple structural variation must respect the perceived structure of the original curve. Contrast Figure 5.1 with Figure 5.3, where a similar transformation has been applied with no regard to the perceived structure of the original curve.

Figure 5.2: The original shape from Figure 5.1, decomposed into semantically meaningful parts. We argue that this decomposition explains why the variation in Figure 5.1 is less semantically different than the variation in Figure 5.3. Adapted from Basri et al. [1998].

Figure 5.3: Two shapes which are not perceptually very similar, although they are related by a transformation as simple as that in Figure 5.1. The problem is that the transformation does not respect the perceived structure of the original. Adapted from Basri et al. [1998].

Mixture models are a class of models that do not suffer from the continuous deformation problem of Figure 5.1. However, if there are multiple independent structural variations possible, it is unlikely that we will see every combination of each form. Consider the shape grammar Gn that generates shapes that have n arms, each of which can take either of two forms:

–  –  –

where A is pointy and B is rectangular. We show four shapes possible under this grammar in Figure 5.4. A classic mixture model will not be able to generalize in this scenario without exponentially many training examples, since there are $2^n$ possible shapes. If we instead have mixture models at the level of individual structural variations, then our model is a grammatical model in the style of Section 1.1.
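The counting argument can be made concrete with a short sketch. The grammar rules themselves are not reproduced in this copy, so the code assumes only what the text states: n arms, each independently of form A or form B. The string encoding of a shape and the name `enumerate_shapes` are illustrative choices.

```python
from itertools import product

def enumerate_shapes(n):
    """All shapes with n arms, each arm either 'A' (pointy) or 'B' (rectangular).

    A shape is encoded as a string of n letters (an illustrative encoding).
    """
    return ["".join(arms) for arms in product("AB", repeat=n)]

shapes = enumerate_shapes(3)
num_whole_shape_components = len(shapes)  # one mixture component per shape
num_per_arm_choices = 3                   # one binary choice per arm
```

A mixture over whole shapes needs one component for each of the $2^n$ strings, whereas a model that places a two-way choice at each arm needs only n such choices and still covers every combination.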

–  –  –

We want to learn a grammar that explains our training data well, but we also want a simple grammar, because simpler models exhibit better generalization. We can reward simple grammars by using a prior based on the Minimum Description Length, described in the next section. We can then attempt to maximize the posterior probability of G given the training data and this prior.
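One generic way to write this objective is given below. It is a sketch under assumptions rather than the formulation developed in the next section: T denotes the training data and ℓ(G) the description length of the grammar in bits, both symbols introduced here for illustration, and the prior is assumed to take the form P(G) ∝ 2^{−ℓ(G)}.

```latex
\hat{G} \;=\; \arg\max_{G} P(G \mid T)
        \;=\; \arg\max_{G} P(T \mid G)\, P(G)
        \;=\; \arg\min_{G} \bigl[\, -\log_2 P(T \mid G) \;+\; \ell(G) \,\bigr]
```

Under this assumed prior, maximizing the posterior is the same as minimizing the total number of bits needed to describe the grammar and then the data given the grammar, so a more complex grammar is accepted only if it shortens the encoding of the training data by more than it costs to describe.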


