FREE ELECTRONIC LIBRARY - Dissertations, online materials

Pages:     | 1 |   ...   | 4 | 5 || 7 | 8 |   ...   | 11 |


-- [ Page 6 ] --

The Watson distribution can be thought of as a probabilistic version of the Procrustes distance, which can be defined as

–  –  –

We have also experimented with the nonparametric distribution given by the Parzen windows method (Parzen [1962]). We want our grammar to be invariant to simultaneous translation, scale, and rotation of the points p, q, r. Therefore, we translate, scale, and rotate R2 with a map φ such that φ(p) = (0, 0) and φ(r) = (1, 0), and represent q via the coordinates q = φ(q).

–  –  –

In the context of kernel density estimators, the parameter h is called the bandwidth.

It specifies how much smoothing to apply to the density estimation. Selecting a suitable bandwidth is often problematic.

It is worth noting that this nonparametric distribution is just a mixture of Gaussians, and thus there is no solid distinction between a mixture of multiple copies of the rule X → Y Z with Gaussian midpoint distributions, and a single copy with a mixture of Gaussians midpoint distribution. We will prefer the second, since the resulting grammar has a simpler structure, as discussed in Chapter 5.

2.9 Dealing with Variation in Length

As mentioned before, our grammar models longer curves with rules of the form X → XX, and shorter curves with rules of the form X →. When building our initial models, it is very important to assign reasonable probabilities to these rules, because we want to use these rules in parsing, but prevent pathological parses. For example, we don’t want to parse an input curve in such a way that many high-level nonterminals X use the rule X → and some low-level nonterminal Y uses the rule Y → Y Y many times. Since the balance between different rules is governed by a multinomial distribution, we need a reasonable prior on multinomial distributions that prevents pathological parses.

Let X be a nonterminal. When there is no prior reason to believe that any ρX (X → λ) is larger than the others, it is natural to use a Dirichlet prior with each αi the same. For shape grammars, this is not entirely the case; since rules of the form [X → Y Z] and [X → ] serve different roles in the grammar, we expect their probabilities to take on different values.

In particular, if the nonterminal X is meant to model long curves, it is unlikely that a long curve will have only two sample points, especially if we are parsing a curve that has many sample points. Therefore, we would expect the probability of [X → ] to be low.

Let C be a curve. We define the scale of a subcurve C to be s(C ) = |C |/|C|. We can then specify an ideal scale sX for the nonterminal X. Let k = |R(X)|. Our prior distribution on ρX is then Dir(α1,..., αk ), where α1 = k · αdir · e−ωsX 1 − e−ωsX αi = k · αdir · 2 ≤ i ≤ k.

k−1 Here α1 corresponds to the rule [X → ]. Suitable values of αdir are probably significantly less than one, in light of Johnson et al. [2007]. A suitable value for ω is more complicated, and depends on the value of other model parameters.

This prior is suitable, because it produces grammars biased towards balanced parses. To see this, consider a grammar G whose parameters are set according to this prior. (This will be true of our initial grammar, as we describe in Section 2.7.) Let C be a curve of length n, and T be a G-parse of C. T will then specify nonterminals X1,..., Xn to parse each line segment of C. Thus, the likelihood P (C, T | G) will contain factors ρXi ([Xi → ]) for i = 1,..., n. If ρXi ([Xi → ]) is e−ωsX, then these factors contribute −ω i sXi e in total. Since the Xi combine to parse C, the scales sXi will hopefully sum to approximately one. In this case the probability will be highest when i sXi is lowest, which occurs when all the sXi are equal. The stated Dirichlet prior thus pushes G to favor balanced parse trees, Given a grammar model and a curve, we want to calculate how likely the curve is under the model, and we would like to extract the most likely correspondence between the points of the curve and the parts of the model. We call this process parsing, because it is extremely similar to CKY parsing with standard PCFG’s. We defer the technical details of parsing until a later chapter.

We examine parsing in this section by looking at human silhouettes from a dataset generated by Romer Rosales. We have two versions of this dataset: one where a silhouette has been automatically extracted by subtracting the background, and the curve simplified slightly by the discretization method discussed in Section 2.3; and a hand-annotated version, where we have marked by hand the position of 27 corresponding points on the body in each image.

Throughout this section, we examine the task of building a grammatical model from one curve, and using it to parse another curve. We demonstrate the accuracy of these parses by showing the correspondence between the points of the two curves induced by the most likely parse.

The easiest parsing task is to recover a one-to-one correspondence between two curves that have the same number of points, which we demonstrate in Figure 2.15. Because there are no missing or extra points, this is straightforward. The two curves are both from the handannotated Romer dataset. The grammar is built using a hand-picked set of constituents.

A somewhat harder task is to parse a curve that is longer than the curve on which our model is based. We wish to recover a reasonable correspondence between the points of the coarse model curve and a subset of the points of the finer input curve. We do this by building a grammar model from the coarse curve and using it to parse the finer curve. The grammar must have lengthening rules, but doesn’t need shortening rules. This demonstrates that we can model longer curves than were used to build the grammar. This is shown in Figure 2.16, where a model built from a hand-annotated curve is used to parse a curve from the Figure 2.15: Recovering a One-to-one Correspondence ground-truth Romer dataset. We successfully recover a very reasonable correspondence.

We also wish to recover a reasonable correspondence by building a grammar from a longer (i.e., finer) model curve and parsing a shorter (i.e., coarser) input curve. This demonstrates that we can model shorter curves than were used to build the grammar. This task is generally harder than the prior task.

Here we build a grammar from a ground-truth Romer curve, and try to parse one of the (much shorter) hand-annotated Romer curves. We can safely assume that every point in the parsed curve has a corresponding one in the example curve, which is the reverse of the previous experiments. In order to do this successfully, the grammar needs shortening rules, but not lengthening rules.

In Figure 2.17, we show the results of parsing a shorter curve with very little geometric variation. The fine details are not quite right, but overall the correspondence is reasonable.

In Figure 2.18, we show the results of parsing a shorter curve with significant geometric Figure 2.16: Parsing a Longer Curve Figure 2.17: Parsing a shorter curve with very little geometric variation.

variation. In this case, the correspondence found is completely incorrect.

Our models are very sensitive to the initial decomposition specified for the model curve.

The two experiments above used a balanced but otherwise arbitrary decomposition of the model curve. By using a more carefully chosen decomposition, we get much better parses, which can be seen in Figures 2.19 and 2.20. In Figure 2.19, the fine details are now exactly what we would expect, in contrast to Figure 2.17. In Figure 2.20, even though we have the same significant geometric variation as in Figure 2.18, we now find a very good correspondence.

Figure 2.18: Parsing a shorter curve with significant geometric variation.

Figure 2.19: Parsing a shorter curve with very little geometric variation.

Figure 2.20: Parsing a shorter curve with significant geometric variation.

Thus, even on fairly straightforward parsing tasks, our models are extremely sensitive to the particular hierarchical structure that we impose on the curve. The shortening rules only allow the parser to chop off constituents. Therefore, it is important to have good constituents. We address this in Chapter 3 by using multiple decompositions, and learn which constituents allow us to best explain our data.

–  –  –


In this chapter, we describe a mathematically rigorous way to tune our model parameters using the EM algorithm. We give the details of the algorithm in Section 3.1. In the remainder of the chapter, we show samples from various grammatical models after training, in order to demonstrate that we have learned a correct and interesting model. Finally, in Section 3.8, we show that samples from standard models fail to model the dataset we are using.

3.1 Learning Grammar Parameters with the EM Algorithm Let C1,..., Cn be independent samples from a grammar G, for which we know the structure, but not the parameters. We would like to set the parameters of G such that the posterior P (G | C1,..., Cn ) is maximized. We can write the posterior as

–  –  –

where this last sum is taken over all collections of one parse tree for each curve.

Unfortunately, it is not known how to maximize this posterior, or even the likelihood.

The likelihood is known to have many local maxima in the case of context-free grammars for natural languages (Charniak [1996]). Therefore, we are forced to use approximation techniques. We use the Expectation-Maximization algorithm (Dempster et al. [1977]), which is a standard approach to finding maximum likelihood or maximum a posteriori estimates in the case where important variables are unobserved. In our case, the unobserved variables are the parse trees {Ti }.

Let C = (C1,..., Cn ), and let T be the latent variable (T1,..., Tn ).

We then iteratively find better and better grammars G (t) :

–  –  –

We want to find the grammar parameters that maximize Q(t) (G). We have two classes of parameters: the multinomial distributions ρX, and the midpoint distributions µX→Y Z. We will consider each in turn.

–  –  –

A multinomial distribution is a distribution over a finite alphabet a1,..., ak, where P (ai ) = pi. Learning a multinomial distribution from independent samples x1,..., xn is commonly done by maximizing the posterior probability

–  –  –

Note that results still hold when the counts ci are real rather than integer. This is because the proofs rely solely on the log-likelihood having the form ci log pi. In this case, the ci are referred to as “soft counts”.

When the αi are greater than one, this prior smooths the estimated distribution p towards the uniform distribution. When the αi are less than one, this prior makes the distribution sparser than the maximum likelihood estimate.

When estimating the production probabilities in a grammar, a sparsifying prior is generally recommended. One intuition for this is that we are trying to recover a relatively sparse set of rules from the set of all possible rules; in order to avoid missing a rule, we must give some weight to (and thus generate observations of) rules that will ultimately be discarded.

In Johnson et al. [2007], a Dirichlet prior with αi = 1000 was found to work well for training context-free grammars for natural languages, although αi between 100 and 10,000 were found to work about as well.

In our case, ρX ([X → λ]) is a multinomial distribution, and we have soft counts cλ = Counta ( Xij → λ ).


–  –  –

First, we choose priors. Let X → Y Z be a rule of G. We set µX→Y Z equal to a Watson distribution with random mode µ and random concentration κ, where

–  –  –

Thus, the Count values are the soft counts we use to weight the za,ijk. For simplicity, we will simply say that we have observations z1,..., zn with corresponding weights c1,..., cn.

Let γ(κ; σ, θ) = log Gamma(κ; σ, θ), where

–  –  –

Examining this solution, we see that our estimate of the midpoint is similar to the maximum likelihood estimate of a Watson distribution, but is adapted to contain αmp artificial observations at the prior mode.

The posterior estimate of the concentration will be at a maxmimum when all zi are identical and equal to µ0. In this case, λmax will be m + αmp, and κ = κ σ+n. Thus, our ¯σ concentration will increase essentially linearly with the number of examples, but will never be infinite.

In general, the zi and µ0 will be more spread out, causing A to have a lower dominant eigenvalue, and lowering the concentration.

3.1.6 Examples of Learning Watson distributions

In figures 3.1 through 3.5, we demonstrate that we can learn the mode and concentration parameters of Watson distributions from training data. In each experiment, we select a random triangle (by using a Gaussian in Bookstein coordinates). We then draw 20 samples from Figure 3.1: Experimenting with the Watson distribution, part 1. In the first row, the original triangle T on the left, and the mode of the estimated Watson distribution on the right. Subsequent rows are samples from Watson(T, 30.00). The estimated concentration was 63.48.

the Watson distribution centered at this triangle (using 30 for the concentration parameter of the Watson). We then reestimate the Watson distribution from the samples. This is a less noisy version of the learning task that EM faces when refitting the midpoint distributions of a grammar from 20 samples.

We also show the results of fitting a Watson distribution with differing numbers of samples in Table 3.6. This demonstrates that the estimated concentration approaches the correct value as the number of samples increases.

Figure 3.2: Experimenting with the Watson distribution, part 2.

Pages:     | 1 |   ...   | 4 | 5 || 7 | 8 |   ...   | 11 |

Similar works:

«Transcription as an Initiator of Noticing Language Form and Use: an EFL Longitudinal Study by Margaret M. C. So A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy Department of Curriculum, Teaching and Learning Ontario Institute for Studies in Education University of Toronto © Copyright by Margaret M.C. So (2015) Transcription as an Initiator of Noticing Language Form and Use: an EFL Longitudinal Study Doctor of Philosophy (2015) Margaret M. C. So...»

«ABSTRACT Title of dissertation: DOSE RANGING STUDY OF LUTEIN SUPLEMENTATION IN ELDERLY WITH AND WITHOUT AGE RELATED MACULAR DEGENERATION Fabiana Fonseca de Moura, Doctor of Philosophy, 2004 Dissertation directed by: Professor Frederick Khachik Adjunct Professor, Department of Chemistry and Biochemistry and the Department of Nutrition and Food Science Age-related macular degeneration (AMD) is the leading cause of blindness among people over the age of 65. Epidemiological studies have indicated...»

«The New Heretics: Popular Theology, Progressive Christianity and Protestant Language Ideologies by Rebekka King A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy Department for the Study of Religion University of Toronto © Copyright by Rebekka King 2012 The New Heretics: Popular Theology, Progressive Christianity and Protestant Language Ideologies Rebekka King Doctor of Philosophy Department for the Study of Religion University of Toronto 2012...»

«Ideas and considerations for detailed design and naming for Ōtautahi North Western Cluster of Schools A Ngāi Tūāhuriri Perspective An Example of Modern Māori Learning Environments and associated Cultural Identifiers Te Whare Maahunui Tuahiwi Marae Home of Ngāi Tūāhuriri Mana Whenua 1 Wednesday, 14 October 2015 Table of Contents Whāinga / Aim Kaupapa rapunga whakaaro / Philosophy Mana Whenua / Te Ngāi Tūāhuriri Te Ngāi Tūāhuriri/ Mana Whenua Ngāi Tahu Whānui and Te Rūnanga o...»

«Chapter from Achin Vanaik (editor), Selling US Wars (Interlink Publishing, 2007) The Empire of Fear Zia Mian Whatever they fear from you, you’ll be threatened with. —Seneca (Roman philosopher and statesman, 4 BCE–65 CE) In late February 2001, a year after President George W. Bush took office, his secretary of state, Colin Powell, spoke on the subject of Iraq and its military capabilities. Ten years had passed since the 1991 Gulf War, a decade marked by international inspections aimed at...»

«COLLABORATIVE AND INDEPENDENT WRITING: JAPANESE UNIVERSITY ENGLISH LEARNERS’ PROCESSES, TEXTS AND OPINIONS by Yuko Watanabe A thesis submission in conformity with the requirements for the degree of Doctor of Philosophy Graduate Department of Curriculum, Teaching and Learning Ontario Institute for Studies in Education University of Toronto © Copyright by Yuko Watanabe (2014) Collaborative and Independent Writing: Japanese University English Learners’ Processes, Texts and Opinions Yuko...»

«A Sociolinguistic Investigation of Compliments and Compliment Responses among Young Japanese Chie Adachi Thesis submitted for the degree of Doctor of Philosophy Linguistics and English Language The University of Edinburgh 2011 i Abstract This dissertation is a sociolinguistic investigation into the system of the speech act of complimenting among young Japanese. Sociolinguistic studies on complimenting have been rather extensively carried out in Western academic discourse since the 1980s. The...»

«Miller on Art, Altruism and Sexual Selection 1 The Bowerbirds and the Bees: Miller on Art, Altruism and Sexual Selection Catherine Driscoll*† Dept. of Philosophy North Carolina State University *Many thanks to Stephen Stich and Walter Sinnott-Armstrong for many comments on earlier drafts of this paper; and to Iris Oved and the participants in the Rutgers graduate seminar on the Evolution of Cognition for their feedback on some earlier versions of ideas I discuss here. †Address for...»

«Fruits and frugivory in neotropical primates and in Amazonian flooded and unflooded forests Joseph E. Hawes Thesis submitted for the degree of Doctor of Philosophy School of Environmental Sciences University of East Anglia, UK September 2012 © This copy of the thesis has been supplied on condition that anyone who consults it is understood to recognise that its copyright rests with the author and that no quotation from the thesis, nor any information derived therefrom, may be published without...»

«The QABALISTIC TAROT A TEXTBOOK OF MYSTICAL PHILOSOPHY Robert Wang SAMUEL WEISER, INC. York Beach, Maine The Tarot Symbols on the Tree of Life. CONTENTS PREFACE, xv INTRODUCTION, Modem Tarot Studies: A Nineteenth Century Legacy 1 The Search For Truth.......................................... 5 The Golden Dawn 10 The Golden Dawn Tarot 12 The RiderWaite Deck 13 Aleister Crowley's Thoth Tarot 14 Book T..............................»

«Turning the tide for gas permeable contact lenses Felicity Rosemary Gill Doctor of Philosophy School of Optometry & Vision Sciences Cardiff University May 2010 UMI Number: U585B69 All rights reserved INFORMATION TO ALL U SERS The quality of this reproduction is d ep en d e n t upon the quality of the copy subm itted. In the unlikely event th at the author did not sen d a com plete m anuscript and there are missing p ag es, th e se will be noted. Also, if material had to be removed, a note will...»

«THE INFLUENCE OF UNETHICAL PEER BEHAVIOR ON OBSERVERS’ UNETHICAL BEHAVIOR: A SOCIAL COGNITIVE PERSPECTIVE By MICHAEL JAMES O’FALLON A dissertation submitted in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY WASHINGTON STATE UNIVERSITY Department of Management and Operations DECEMBER 2007 To the Faculty of Washington State University: The members of the Committee appointed to examine the dissertation of MICHAEL JAMES O’FALLON find it satisfactory and...»

<<  HOME   |    CONTACTS
2016 www.dissertation.xlibx.info - Dissertations, online materials

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.