# «THE UNIVERSITY OF CHICAGO GRAMMATICAL METHODS IN COMPUTER VISION A DISSERTATION SUBMITTED TO THE FACULTY OF THE DIVISION OF THE PHYSICAL SCIENCES IN ...»

Hierarchical curve models have room for improvement, as can be seen in Figure 2.2. The variations do not respect the perceptual structure of the original curve; in particular, we do not see articulated parts being articulated.

In this chapter, we describe a grammatical reformulation of the work of Felzenszwalb and Schwartz [2007], which we hope will improve upon it in two ways. Firstly, we hope to allow for structural variation, which we argue is an important goal for computer vision in Section

5.1. Secondly, we give a generative probabilistic model of curves, which allows us to retrain the parameters of a grammar in a mathematically sound way, rather than optimizing many parameters in an ad-hoc manner (which will tend to be expensive and fragile).

The rest of this chapter is structured as follows: in Section 2.2, we give an informal discussion of L-systems to motivate our models. In Section 2.3, we describe a method for sub-sampling curves in order to process them more eﬃciently. In Section 2.4, we lay out our grammar formalism. In Section 2.5, we show samples from several example grammars in order to demonstrate the expressiveness of our models. In Section 2.6, we describe how to use our models to parse novel curves. In Section 2.7, we describe our method of building models from a single example curve. In Section 2.8, we give models of triangle deformation, which are the building blocks of our models of shape. In Section 2.9, we give more details on how to set initial parameters of our models. Finally, in Section 2.10, we show output from our parsing algorithm.

** 2.2 Motivating Example: L-Systems**

To motivate our discussion of curve grammars, we will ﬁrst give an amusing example of such a grammar, which is inspired by L-systems. A Lindenmayer system, or L-system, is a parallel string rewriting system. L-systems were formulated to model the shape of plants and other organisms. They are deﬁned on strings, which are then transformed into pictures using a “turtle language” akin to Logo. They are a simple and intuitive way to generate fractal images. Since our formalisms are diﬀerent from L-systems, we will not discuss their details, which can be found in Prusinkiewicz [1990].

We will informally discuss curve rewriting systems, where we iteratively replace straight lines with curves composed of straight lines. We will represent our curves like this:, so that we remember which endpoint is which.

**Consider the following system1 :**

→.

→ →.

If we start from, we successively generate,, and.

Suprisingly, after a large number of iterations, a pattern arises that may be familiar: Sierpinski’s triangle! After eight iterations, we have (ﬁlling in the dashed lines for simplicity) the curve shown in Figure 2.3.

These curves were generated by Inkscape’s L-system function, which has a randomizing option. It is interesting to randomize the last example. In Figure 2.4, we see a randomized version C obtained with fairly little noise (10% randomization in the length of line segments, and 5% randomization in the angles. These parameters only make sense in a more standard L-system framework). This has no global similarity at all to the other curve! Inkscape is using a Markov-like source of randomness, and little errors at the local level add up to huge changes at the global level. (This is exactly the issue shape-tree addresses in contrast to other deformable models.) If we are going to have a statistical model for perceptual similarity of curves, we will have to ﬁnd a way to introduce long-range dependencies.

This brings us to our next section: probabilistic context-free grammars on curves. As a

1. generated by A = A + A + A − − with angle 60◦ in Inkscape

2. generated by A = B + A + B; B = A − B − A with angle 60◦ in Inkscape Figure 2.3: Sierpinski’s triangle emerging from an L-system.

preview, we show several random perturbation of Sierpinski’s triangle in Figure 2.5 that do retain overall similarity.

** Figure 2.4: The curve from Figure 2.**

3, deformed randomly using a local model.

** Figure 2.5: The curve from Figure 2.**

3, deformed randomly using our curve models.

A continuous plane curve is a continuous function C : [0, 1] → R2. C is closed if C(0) = C(1). In order to handle curves computationally, we approximate them with discrete curves (referred to as “curves” hereafter). We create a curve by choosing a ﬁnite number of sample points p0,..., pn ∈ R2, where pi = C(ti ) for some 0 ≤ t0 t1... tn ≤ 1. A curve is closed if p0 = pn.

We can concatenate open curves C and D if the last point of C is the ﬁrst point of D.

We will denote this curve by CD.

We will denote an oriented line segment going from point p to point q by p,q. A curve will then have the form

** p0,p1 p1,p2 · · · pn−1,pn,**

and p0 will be equal to pn iﬀ the curve is closed. For a closed curve, this sequence is circular, and we consider any cyclic permutation of the line segments to be the same curve.

We will denote the length of a curve in segments by |C|. A curve C will have |C| + 1 sample points (counting p0 = pn doubly if C is closed). We will denote the i-th sample point of C by C[i], where i ranges from 0 to |C|.

** 2.3.1 Discretizing for Eﬃciency**

Our inference and learning algorithms run in time that depends on the number of sample points in a curve. In particular, parsing takes time that is cubic in the number of sample points. We can therefore vastly speed up inference and learning by working with subsampled curves. In this section, we describe an algorithm that approximates the shape of a curve with another, coarser curve.

Let C be a curve with N points. We wish to produce a curve C such that (a) C approximates the shape of C, (b) |C | ≈ L, and (c) C is sampled relatively uniformly from C. We do this by minimizing the objective function (2.1). The ﬁrst term measures the total deviation between C and C, and the second term rewards relatively uniform sampling at the correct rate.

where λi is the length of the i-th segment and λ = N/L is the “ideal” segment length. Here the ni are the indices of the points selected to appear in the downsampled version of the curve, and the sum in the ﬁrst term compares each point of the original curve to what we would interpolate it to be using the downsampled curve. If C is closed, then pni +j wraps around: pni +j = pni +j mod N.

This minimization can be done with a straightforward dynamic program, where we compute the minimum cost of approximating each subcurve of the curve. Note that we do not ﬁx the number of sample points in advance, but instead use the number minimizing the cost.

The results of the algorithm can be seen in Figure 2.6. Note that, while we lose ﬁne detail, the resulting curves closely approximate the original curves.

2.4 Stochastic Shape Grammars: A Generative Model for Curves In this section, we deﬁne Probabilistic Context-Free Shape Grammars, a probabilistic model that allows us to generate random curves, parse curves to ﬁnd their likelihood, and ultimately learn distributions over a class of curves.

Analogously to nonterminals in a traditional context-free grammar, we introduce placed symbols. A placed symbol is of the form Xp,q, where X is a member of a ﬁnite alphabet N or the special symbol, and p, q are points. Xp,q represents an oriented curve of type X going from p to q. The path between the endpoints is unspeciﬁed.

We specify the type X so that diﬀerent nonterminals can be related by the grammar.

Recall and from the section on L-systems, which represented diﬀerent kinds of curves because they were on the left-hand side of diﬀerent productions. Therefore, X should be thought of as specifying a particular class of paths between any two endpoints. By itself, X is an abstract symbol that can be instantiated as a placed symbol between any p and q.

The special symbol denotes a line segment. This takes the place of terminals in traditional context-free grammars.

where α ∈ N ∪ { }. As with curves, p0 will be equal to pn iﬀ the curvilinear form is closed, and two closed curvilinear forms will be considered equivalent if they diﬀer by a cyclic permutation.

An abstract curvilinear form is a sequence of abstract symbols (with no associated geometric information) α(0) α(1) · · · α(n−1).

We will specify whether these are open or closed, since there is no other way to tell. Again, closed abstract curvilinear forms will be considered equivalent if they diﬀer by a cyclic permutation.

We next introduce substitutions and substitution rules, which allow us to transform curvilinear forms, ultimately producing curves. We will perform substitutions of the form

Since Xp,q represents an unspeciﬁed path from p to q, substitution simply gives a more speciﬁc route, in terms of which points we will visit in between (the pi, which we will call the midpoint if there is only one of them, and control points otherwise) and what sort of path we will follow between these points (the Y (i) ).

In order to give a substitution rule for performing substitutions, we need to give

• An abstract substitution rule X → Y (1) · · · Y (k).

• a rule for determining the pi in terms of p and q.

In practice, applying a substitution rule is problematic because the pi live in an inﬁnite domain (R2 ), but we want to deal with curves that (1) live in a ﬁnite domain (the pixels of the image plane) and (2) are expected to exhibit a good deal of variation. Thus, we give a distribution µX→Y (1) ···Y (k) (p1,..., pk−1 ; p, q) over the pi called the control point distribution. When there is only a single control point, we will call µX→Y Z (p1 ; p, q) the midpoint distribution.

Deﬁnition 2.4.2. A probabilistic context-free shape grammar (PCFSG) is a tuple G = (N, R, S,, M, X ), where

• N is a set of abstract symbol types, which we will call nonterminals

• R is a set of abstract substitution rules, with R(X) being the set of rules in R with X on the left-hand side. The rules are of the form X → Y1... Yk, with the Yi in N ∪ { }.

• S is the starting set of abstract curvilinear forms

• is a special curve type representing a line segment

• X = {ρX | X ∈ N } ∪ {ρS }, where ρX is a probability distribution over R(X), and ρS is a distribution over S

• M = {µX→Y (1) ···Y (k) | (X → Y (1) · · · Y (k) ) ∈ R} is a set of control-point distributions.

It is worth noting that G has a corresponding abstract grammar Gabs = (N, R, S, { }, X ) which is just a traditional PCFG. Gabs is an odd PCFG because it only generates strings of the symbol ; nevertheless, many properties of G are actually properties of Gabs.

For a grammar that generates open curves, we can assume that S has a single curvilinear form S, and we will call S the start symbol. We sample from such a PCFSG by starting with the curvilinear form Sp,q for arbitrary p and q. While our curvilinear form contains a placed symbol Xp,q such that X =, we pick a random substitution rule X → Y (1)... Y (k) according to ρX, and pick random control points p1,..., pk−1 according to µX→Y (1)...Y (k) (p1,..., pk−1 ; p, q ). We then replace the placed symbol Xp,q with the (1) (k) curvilinear form Yp,p... Yp,q. We will disallow substitutions in which any two control 1 k−1 points pi, pj are equal, or in which any control point pi is equal to either of the endpoints p or q. This prevents us from having degenerate placed symbols Xp,p.

There is a slight diﬃculty here in that we usually cannot deﬁne the control-point distribution if p and q are equal. This is important, since we are mainly interested in closed curves, so we would like to start with Sp,p for some arbitrary p. In this case, our set S of starting forms will contain abstract curvilinear forms of length two, which are understood to be closed. If XY ∈ S is our starting form, we start with the curvilinear form Xp,q Yq,p for arbitrary p, q. We choose a random such curvilinear form according to ρS, and then continue as before.

** 2.4.1 Restricted Classes of Shape Grammars**

In this section, we deﬁne some restricted classes of PCFSG’s. Our restrictions are only on the abstract part of the grammar, so we are just specifying restricted classes of traditional grammars.

For simplicity and eﬃciency, we will usually restrict our abstract grammars to be in

**Chomsky Normal Form, in which abstract substitution rules can only be of two forms:**

• Binary rules of the form X → Y Z.

• Lexical rules of the form X →.

From now on, we will assume that PCFSG’s are in Chomsky Normal Form. Accordingly, we will speak of midpoints, rather than control points. It is important to note that PCFSG’s diﬀer from standard PCFG’s in that not all PCFSG’s are equivalent to a PCFSG in Chomsky Normal Form. This is because not all control point distributions can be realized as a product of midpoint distributions. For instance, suppose we have a rule A → BCD, which in the PCFG setting could be replaced with rules A → BX, X → CD. We could have a control point distribution µA→BCD (p1, p2 ; p, q) in which p1, p2 were on a random circle going through p and q. If we try to replicate such a control point distribution with midpoint distributions µA→BX (p1 ; p, q), µX→CD (p2 ; p1, q), we cannot, because the distribution µX→CD (p2 ; p1, q) can only depend on the two points p1 and q, and that is not suﬃcient to specify a unique circle.

** 2.5 Example Grammars**

Here we demonstrate the expressiveness of grammatical curve models with some simple examples. We have built three grammars by hand, by taking the curves shown in Figure 2.8, and specifying decompositions of them. We show the rules of one of the grammars in Figure

2.7. We show samples from the grammars in Figures 2.9, 2.10, and 2.11. In particular, we