# Grammatical Methods in Computer Vision

A dissertation submitted to the faculty of the Division of the Physical Sciences, The University of Chicago.

Definition 6.2.2. A decomposition family F = (I, D) is θ-flexible if, for every interval [i, j) ∈ I and every k such that i < k < j, there is a decomposition [i, j) → [i, k′)[k′, j) ∈ D with |k − k′| ≤ θ(j − i).

A θ-flexible decomposition family is desirable because, for any binary parse tree, there is a similar parse tree that is allowable under the decomposition family. Here, similar means that, as we recursively subdivide a curve into subcurves, we can always choose a midpoint whose index differs from a given midpoint by at most a fixed percentage (e.g., 10% when θ = 1/10, or 5% when θ = 1/20) of the length of the subcurve, as shown in Figure 6.1. A θ-flexible decomposition family thus approximates the complete decomposition family, in some sense.
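As a concrete check of this definition, the sketch below tests whether a given family is θ-flexible. The encoding (intervals as `(i, j)` tuples, decompositions as `((i, j), k)` pairs) and the helper name `is_theta_flexible` are our own illustration, not anything from the text, and θ is taken as the fractional slack (θ = 1/10 for 10%).

```python
from fractions import Fraction

def is_theta_flexible(I, D, theta):
    """Check Definition 6.2.2: for every interval [i, j) in I and every
    interior index k, some decomposition (i, j) -> (i, kp)(kp, j) in D
    has |k - kp| <= theta * (j - i)."""
    midpoints = {}
    for (i, j), kp in D:
        midpoints.setdefault((i, j), []).append(kp)
    for (i, j) in I:
        allowed = midpoints.get((i, j), [])
        for k in range(i + 1, j):
            if not any(abs(k - kp) <= theta * (j - i) for kp in allowed):
                return False
    return True

# The complete family on a length-4 string allows every split,
# so every desired midpoint is available exactly.
n = 4
I = {(i, j) for i in range(n) for j in range(i + 1, n + 1)}
D = {((i, j), k) for (i, j) in I for k in range(i + 1, j)}
print(is_theta_flexible(I, D, Fraction(1, 10)))  # True
```

By contrast, a family that only ever splits an interval at its middle index fails this check for small θ, since a desired midpoint near an endpoint has no nearby allowed midpoint.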

**Another notion of approximation is θ-completeness:**

Definition 6.2.3. Let s be a string of length n. Let F = (I, D) be a decomposition family for s. An interval [i, j) is called reachable if there is a sequence of intervals x_0, x_1, . . . , x_k such that x_0 = [0, n), x_k = [i, j), and, for every t between 0 and k − 1, there is a decomposition in D with x_t on the left-hand side and x_{t+1} on the right-hand side.
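The set of reachable intervals can be computed by walking decompositions downward from [0, n); the sketch below does this with the same hypothetical `(i, j)` / `((i, j), k)` encoding as before.

```python
def reachable_intervals(n, D):
    """All intervals reachable from [0, n) (Definition 6.2.3): an interval
    is reachable if some chain of decompositions leads to it, each step
    passing from a parent interval to one of its two pieces."""
    children = {}
    for (i, j), k in D:
        children.setdefault((i, j), []).extend([(i, k), (k, j)])
    reached, stack = set(), [(0, n)]
    while stack:
        iv = stack.pop()
        if iv not in reached:
            reached.add(iv)
            stack.extend(children.get(iv, []))
    return reached

# The complete family on a length-3 string reaches every interval.
D = {((i, j), k) for i in range(3) for j in range(i + 1, 4) for k in range(i + 1, j)}
print(sorted(reachable_intervals(3, D)))  # all six subintervals of [0, 3)
```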

Definition 6.2.4. A decomposition family F = (I, D) is θ-complete if, for every interval [i, j), 0 ≤ i < j ≤ n, there is a reachable interval [i′, j′) ∈ I such that |i − i′| ≤ θ(j − i) and |j − j′| ≤ θ(j − i).

A θ-flexible decomposition family is 2θ-complete:

Proof. Let [i, j) be an interval. Starting with the interval [0, n), we build a sequence of intervals by picking midpoints so that [i, j) is wholly contained in one side or the other each time. At each step, we then take the interval containing [i, j) as the next interval in our sequence.

When we can no longer do this, we have an interval [k, ℓ) such that k ≤ i < j ≤ ℓ. We claim that i − k ≤ 2θ(j − i) and ℓ − j ≤ 2θ(j − i).

Decomposition families are interesting because they allow us to speed up parsing by searching over a restricted set of parses.

**Theorem 6.3.1.** Let G be a context-free grammar with k rules. Let s be a string of length n. Let F = (I, D) be a decomposition family for s. Then there is a dynamic programming algorithm (Algorithm 8) to approximately (relative to F) parse s in time O(|D|k).

Traditional parsing corresponds to the complete decomposition family, which has size Θ(n³). Algorithm 8 with the complete decomposition family is equivalent to Algorithm 7.

Fortunately, we can construct sparse decomposition families of size O(n), which leads to a linear time approximate parsing algorithm.

Algorithm 8 Parsing with a decomposition family

```
Input: string s, decomposition family (I, D), context-free grammar (X, R)
for X ∈ X do
    for i = 0 to n − 1 do
        COST[X, [i, i + 1)] ← w(X → s[i])
    end for
end for
for [i, j) ∈ I, ordered by increasing length do
    for X ∈ X do
        COST[X, [i, j)] ← ∞
    end for
    for k such that [i, j) → [i, k)[k, j) ∈ D do
        for X → Y Z ∈ R do
            COST[X, [i, j)] ← min{COST[X, [i, j)], w(X → Y Z) + COST[Y, [i, k)] + COST[Z, [k, j)]}
        end for
    end for
end for
```

Figure 6.2: Illustrating the construction from Theorem 6.4.1, with k = 4. Rectangles denote portions of the string between members of the index set. A selection of intervals that live at each level is shown.
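Algorithm 8 translates almost line for line into Python. The sketch below uses a hypothetical grammar encoding (binary rules as weighted `(w, Y, Z)` triples keyed by the left-hand nonterminal, lexical weights keyed by `(X, terminal)`, decompositions as `((i, j), k)` pairs); it is a minimal illustration, not the thesis's implementation.

```python
import math

def parse_cost(s, I, D, rules, lexical, start):
    """Algorithm 8: dynamic program over the intervals of I, trying only
    the splits licensed by D, so the work is O(|D| * #rules)."""
    n = len(s)
    nonterminals = set(rules) | {X for (X, _) in lexical}
    COST = {}
    for X in nonterminals:
        for i in range(n):
            COST[X, (i, i + 1)] = lexical.get((X, s[i]), math.inf)
    splits = {}
    for (i, j), k in D:
        splits.setdefault((i, j), []).append(k)
    # Process intervals by increasing length so subinterval costs exist.
    for (i, j) in sorted(I, key=lambda iv: iv[1] - iv[0]):
        if j - i < 2:
            continue
        for X in nonterminals:
            COST[X, (i, j)] = math.inf
        for k in splits.get((i, j), []):
            for X, binary in rules.items():
                for w, Y, Z in binary:
                    cand = w + COST[Y, (i, k)] + COST[Z, (k, j)]
                    if cand < COST[X, (i, j)]:
                        COST[X, (i, j)] = cand
    return COST[start, (0, n)]

# One-rule grammar S -> S S with weight 1: every binary parse of "aaaa"
# has 3 internal nodes, so the best cost under the complete family is 3.
I = {(i, j) for i in range(4) for j in range(i + 1, 5)}
D = {((i, j), k) for (i, j) in I for k in range(i + 1, j)}
print(parse_cost("aaaa", I, D, {"S": [(1, "S", "S")]}, {("S", "a"): 0}, "S"))  # 3
```

With the complete family this is ordinary weighted CKY; passing a sparse D restricts the search exactly as the theorem describes.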

**Theorem 6.4.1.** Let s be a string of length n, and let k be an integer. Then there is a 1/k-flexible decomposition family (I, D) for s that has at most 2nk² decompositions.

Proof. Let C be the set of integers from 0 to n, inclusive. We will call this the index set.

We recursively define a sequence of subsampled index sets C^(i) as follows: C^(0) = C, and C^(i+1) is obtained from C^(i) by taking every other index. The number of indices in all the C^(i) combined is at most 2n.

We will talk about the length of an interval relative to an index set. An interval [a, b) has length ℓ in index set C^(i) if there are ℓ − 1 indices strictly between a and b in C^(i).

An interval [a, b) is allowable if there is some C^(i) which contains both a and b, and [a, b) has length at most k in C^(i), i.e., there are at most k − 1 other indices between them in C^(i).

The number of allowable intervals is then at most 2nk.

We will say that an interval [a, b) lives in level i (for i > 0) if a and b are both in C^(i), and if the length of the interval relative to C^(i) is strictly greater than k/2 and at most k.

When i = 0, we do not impose the minimum length requirement, so all intervals of C of length at most k live in level 0. It is straightforward to see that each allowable interval lives in a single level.

If [a, b) is an interval that lives in level i, then we can decompose it as [a, b) → [a, c)[c, b) by picking midpoints c from among the indices in C^(i). Some of these choices will lead to intervals that live in level i − 1.

This decomposition family is 1/k-flexible. When picking midpoints at level i, our interval has length at least k/2 in C^(i), so we have at least k/2 evenly spaced midpoint choices. We can thus choose a midpoint that is within c points of any desired midpoint, where c is at most 1/k times the length of the interval we are splitting.

There are at most k midpoints per allowable interval, so the number of decomposition rules is at most 2nk².
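Our reading of this construction can be sketched in a few lines. The level bookkeeping and the `(i, j)` / `((i, j), k)` encoding below are illustrative assumptions, and the final check only spot-tests the 2nk² bound on one input rather than proving it.

```python
def sparse_family(n, k):
    """Sketch of the Theorem 6.4.1 construction.  Levels C^(0), C^(1), ...
    keep every other index; an interval is allowable when both endpoints
    lie in some level with at most k - 1 level indices strictly between
    them, and it may be split at any level index it contains."""
    levels, C = [], list(range(n + 1))
    while True:
        levels.append(C)
        if len(C) <= 2:
            break
        C = C[::2]  # subsample: every other index
    I, D = set(), set()
    for C in levels:
        for p in range(len(C) - 1):
            # q - p is the interval's length relative to this level; cap at k.
            for q in range(p + 1, min(p + k, len(C) - 1) + 1):
                a, b = C[p], C[q]
                I.add((a, b))
                for r in range(p + 1, q):
                    D.add(((a, b), C[r]))
    return I, D

n, k = 32, 4
I, D = sparse_family(n, k)
print((0, n) in I, len(D) <= 2 * n * k * k)  # True True
```

Both halves of every decomposition are themselves allowable, so the family is well-formed, and the whole string [0, n) is allowable at the coarsest level.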

This demonstrates the existence of linear-size θ-flexible decomposition families, which yields a linear-time approximate parsing algorithm. This in turn means that we can use context-free grammars to parse long strings, which expands the range of domains where we can use context-free grammar models.

APPENDIX A

## MODELS OF LEAVES

Figure A.1: Training examples from class #1.

Figure A.2: Samples from learned model for class #1.

Figure A.3: Training examples from class #2.

Figure A.4: Samples from learned model for class #2.

Figure A.5: Training examples from class #3.

Figure A.6: Samples from learned model for class #3.

Figure A.7: Training examples from class #4.

Figure A.8: Samples from learned model for class #4.

Figure A.9: Training examples from class #5.

Figure A.10: Samples from learned model for class #5.

Figure A.11: Training examples from class #6.

Figure A.12: Samples from learned model for class #6.

Figure A.13: Training examples from class #7.

Figure A.14: Samples from learned model for class #7.

Figure A.15: Training examples from class #8.

Figure A.16: Samples from learned model for class #8.

Figure A.17: Training examples from class #9.

Figure A.18: Samples from learned model for class #9.

Figure A.19: Training examples from class #10.

Figure A.20: Samples from learned model for class #10.

Figure A.21: Training examples from class #11.

Figure A.22: Samples from learned model for class #11.

Figure A.23: Training examples from class #12.

Figure A.24: Samples from learned model for class #12.

Figure A.25: Training examples from class #13.

Figure A.26: Samples from learned model for class #13.

Figure A.27: Training examples from class #14.

Figure A.28: Samples from learned model for class #14.

Figure A.29: Training examples from class #15.

Figure A.30: Samples from learned model for class #15.
