# Grammatical Methods in Computer Vision

A dissertation submitted to the faculty of the Division of the Physical Sciences, The University of Chicago.

Here the antecedents Ai and the conclusion C are statements in S, v is the cost of using the rule in a derivation, and wi is the cost of deriving Ai. We will also denote such a rule by C =|v A1,..., Ak. We write the conclusion on the left to maintain consistency with the conventional notation for context-free grammars. Note that the antecedents Ai are unordered, so that C =|v A1, A2 and C =|v A2, A1 would be considered equivalent.

A derivation of C is a ﬁnite tree whose root is labelled with a rule C =|v A1,..., Ak, where the subtree rooted at the i-th child is a derivation of Ai. The leaves of this tree are rules with no antecedents. The cost of a tree is the sum of the weights of the rules at its nodes. We will denote the cost of a tree T by wt(T ).
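The definitions above can be made concrete with a short Python sketch (the tuple encoding is my own illustration, not the thesis's notation): a rule C =|v A1,..., Ak is a triple (C, v, (A1,..., Ak)), and a derivation is a rule paired with one child subtree per antecedent.

```python
def wt(tree):
    """Cost of a derivation: the sum of rule costs over all of its nodes."""
    (conclusion, v, antecedents), children = tree
    assert len(children) == len(antecedents)
    return v + sum(wt(child) for child in children)

# Leaves are rules with no antecedents.
leaf_a = (("A", 1.0, ()), [])
leaf_b = (("B", 2.0, ()), [])
# A derivation of C using the rule C =|0.5 A, B.
deriv_c = (("C", 0.5, ("A", "B")), [leaf_a, leaf_b])
print(wt(deriv_c))  # 3.5
```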

Usually, we will work with lightest derivation problems that are in Chomsky Normal Form, i.e., inference rules will either have two antecedents, in which case they will be called binary rules, or no antecedents, in which case they will be called lexical rules.

We will be interested only in lightest derivation problems that are acyclic. A lightest derivation problem (S, R) is acyclic if there is an ordering ≤ of the statements in S such that, for each rule C =|v A1,..., Ak ∈ R, Ai ≤ C for i = 1,..., k.

In most cases, the lightest derivation problem will be specified implicitly by listing abstract statements and rules which contain free variables. The lightest derivation problem is then obtained by substituting all possible values for each free variable in the abstract statements and rules to create concrete statements and rules. In such cases, listing all the inference rules may be prohibitively slow.

As an example, we can consider CKY parsing with PCFG's as a lightest derivation problem. Suppose our grammar is in Chomsky Normal Form, and we are parsing a string s1,..., sn. The lightest derivation problem will have statements X(i, j) for every nonterminal X and every pair of integers i, j with 1 ≤ i ≤ j ≤ n. X(i, j) will represent the statement that nonterminal X has the yield si,..., sj. The rules will be, for every binary rule X → Y Z in the grammar, and all i, j, k with 1 ≤ i ≤ j < k ≤ n,

X(i, k) =|_{w(X→Y Z)} Y(i, j), Z(j + 1, k),

and, for every lexical rule X → si, the rule X(i, i) =|_{w(X→si)} with no antecedents.

The cost w(X → λ) will be the negative log of the transition probability specified by the PCFG. The goal statement ⊥ is in this case S(1, n), where S is the start symbol of the grammar.

In this case, listing all the concrete inference rules requires O(mn³) time, where m is the number of productions in the PCFG. This is the running time of the CKY algorithm, which is too slow to use in many contexts.
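As a sanity check on this count, the construction can be sketched directly; the grammar below is a hypothetical toy, not one from the text, and the (C, v, antecedents) tuple encoding is my own:

```python
# One concrete rule X(i,k) =| Y(i,j), Z(j+1,k) per production and per
# triple 1 <= i <= j < k <= n.
import math

binary = {("S", ("NP", "VP")): 1.0, ("VP", ("V", "NP")): 0.5}  # prod -> prob
n = 4  # length of the string being parsed

rules = []
for (X, (Y, Z)), p in binary.items():
    v = -math.log(p)  # rule cost: negative log probability
    for i in range(1, n + 1):
        for j in range(i, n):
            for k in range(j + 1, n + 1):
                rules.append(((X, i, k), v, ((Y, i, j), (Z, j + 1, k))))

# O(n^3) concrete rules per production, O(m n^3) in total.
print(len(rules))  # 20
```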

**4.1.2 Solving Lightest Derivation Problems via Dynamic Programming**

Let ∆_{S,R}(C) be the set of derivations of C. We define the cost of C to be

cost_{S,R}(C) = min_{T ∈ ∆_{S,R}(C)} wt(T).

We can use Observation 4.1.2 in Algorithm 1, which assumes that (S, R) is an acyclic lightest derivation problem in Chomsky Normal Form. Note that this is a standard use of dynamic programming.
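The bottom-up dynamic program can be sketched as follows; the (C, v, antecedents) rule encoding and the toy problem are my own illustration, not the thesis's notation:

```python
# Visit statements in an acyclic <=-order and minimize over all rules
# concluding each statement; by acyclicity, the costs of all antecedents
# are final before they are used.
INF = float("inf")

def inside(statements, rules):
    """statements must be listed in <= order; rules are (C, v, antecedents)."""
    cost = {X: INF for X in statements}
    for X in statements:
        for (C, v, ants) in rules:
            if C == X:
                cost[X] = min(cost[X], v + sum(cost[A] for A in ants))
    return cost

# Tiny acyclic problem: lexical rules for A and B, then C =|0.5 A, B.
toy_rules = [("A", 1.0, ()), ("B", 2.0, ()), ("C", 0.5, ("A", "B"))]
cost = inside(["A", "B", "C"], toy_rules)
print(cost["C"])  # 3.5
```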

In this section, we define "contexts", which are a sort of dual concept to derivations. A B-context for C is a derivation of B that is missing a derivation of C. One node of a B-context for C will be labeled with hole(C) to denote where a derivation of C could be attached. See also Felzenszwalb and McAllester [2007].

ocost is analogous to the outside cost in the Inside-Outside Algorithm for PCFG’s, but here we are trying to ﬁnd a lightest derivation rather than sum over all derivations.

**Observation 4.1.4.** Let (S, R) be an acyclic lightest derivation problem. We can construct the set Γ_{S,R}(C) recursively as follows:

This is implemented by Algorithm 2, which assumes that our lightest derivation problem is in Chomsky Normal Form.

**Algorithm 2** outside(S, R, COST)

    Input: S, R, COST
    OCOST ← ∅
    for X ∈ S do
        OCOST[X] ← ∞
    end for
    OCOST[⊥] ← 0
    for X ∈ S (in reverse ≤ order) do
        for X =|v Y Z ∈ R do
            OCOST[Y] ← min{OCOST[Y], v + OCOST[X] + COST[Z]}
            OCOST[Z] ← min{OCOST[Z], v + OCOST[X] + COST[Y]}
        end for
    end for
    return OCOST
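Algorithm 2 transcribes almost line for line into Python; the (C, v, antecedents) rule encoding, the explicit goal argument, and the toy problem are my own illustration, not the thesis's:

```python
# cost plays the role of COST computed by the inside pass; the reverse
# <=-order traversal guarantees ocost[X] is final before X's rules are used.
def outside(statements, rules, cost, goal):
    INF = float("inf")
    ocost = {X: INF for X in statements}
    ocost[goal] = 0.0  # the goal has the empty context
    for X in reversed(statements):  # reverse <= order
        for (C, v, ants) in rules:
            if C == X and len(ants) == 2:
                Y, Z = ants
                ocost[Y] = min(ocost[Y], v + ocost[X] + cost[Z])
                ocost[Z] = min(ocost[Z], v + ocost[X] + cost[Y])
    return ocost

toy_rules = [("A", 1.0, ()), ("B", 2.0, ()), ("C", 0.5, ("A", "B"))]
cost = {"A": 1.0, "B": 2.0, "C": 3.5}
ocost = outside(["A", "B", "C"], toy_rules, cost, goal="C")
print(ocost["A"], ocost["B"])  # 2.5 1.5
```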

**Definition 4.1.6 (Homomorphism).** Let (S, R) and (S′, R′) be lightest derivation problems. A homomorphism of lightest derivation problems is a function Φ : S → S′ such that, for every C =|v A1,..., Ak ∈ R, there exists a rule Φ(C) =|w Φ(A1),..., Φ(Ak) ∈ R′, with w ≤ v. We also require that Φ(⊥) = ⊥′.

Homomorphisms are called “abstractions” in Felzenszwalb and McAllester [2007]. We deﬁne the speciﬁc homomorphisms used in Section 4.2.1.

**Lemma 4.1.7.** Let Φ : S → S′ be a homomorphism of lightest derivation problems. Then cost_{S,R}(X) ≥ cost_{S′,R′}(Φ(X)). Also, ocost_{S,R}(X) ≥ ocost_{S′,R′}(Φ(X)).

Proof. Let T be the derivation achieving cost_{S,R}(X). We can replace each rule C =|v A1,..., Ak used in the derivation with a rule Φ(C) =|w Φ(A1),..., Φ(Ak), thereby creating a valid derivation of Φ(X) whose total cost is at most that of T. Thus the lightest derivation of Φ(X) also has cost at most that of T.

A similar argument holds for contexts and ocostS,R.

**Definition 4.1.8.** Let (S, R) be a lightest derivation problem. A table C mapping statements to weights is a valid lower bound for (S, R) if, for every X ∈ S, cost_{S,R}(X) ≥ C[X].

A table C mapping statements to weights is a valid lower bound on contexts for (S, R) if, for every X ∈ S, ocost_{S,R}(X) ≥ C[X].

**Lemma 4.1.9.** If Φ : S → S′ is a homomorphism, and C′ is a valid lower bound for (S′, R′), and we define C[X] := C′[Φ(X)], then C is a valid lower bound for (S, R). C is called the pullback of C′ under Φ. The same is true of valid lower bounds on contexts.

Proof. Let X =| A1,..., Ak be the rule that achieves X’s lightest derivation. Then

where T1 ∈ Γ_X and T2 ∈ ∆_X. Therefore, if cost(X) = min_{T∈∆_X} wt(T) and ocost(X) = min_{T∈Γ_X} wt(T), then the lightest derivation of ⊥ using X has cost cost(X) + ocost(X).
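The pullback of Lemma 4.1.9 is literally a composition of tables, which can be sketched in a few lines; the mapping below (fine statements (X, i) collapsing to a coarse symbol X) is a hypothetical illustration:

```python
# Composing a coarse lower bound with the homomorphism gives a fine-level
# lower bound: C[X] = C'[Phi(X)].
def pullback(coarse_bound, phi):
    return lambda X: coarse_bound[phi(X)]

phi = lambda stmt: stmt[0]           # forget the position, keep the symbol
coarse_bound = {"A": 1.0, "C": 3.5}  # a valid lower bound at the coarse level
fine_bound = pullback(coarse_bound, phi)
print(fine_bound(("A", 7)))  # 1.0
```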

**Lemma 4.1.11.** If T is a lightest derivation of ⊥, and X does not appear in T, then the value of (S, R) is the same as the value of (S \ {X}, R′), where R′ is R minus any rules involving X.

**Theorem 4.1.12.** Let (S, R) be a lightest derivation problem, and let X be a statement.

Suppose that COST is a valid lower bound for (S, R), and OCOST is a valid lower bound on contexts for (S, R). Then the lightest derivation that uses X has cost at least COST [X]+ OCOST [X].

Consequently, if there is a derivation T of ⊥ with cost wt(T) < COST[X] + OCOST[X], then no lightest derivation of ⊥ uses X.

We can use this theorem to perform an admissible coarse-to-ﬁne search strategy. Consider a sequence of LDP’s (S1, R1 ),..., (Sk, Rk ), with homomorphisms Φi : Si → Si+1 for i = 1,..., k − 1. We can solve the problems in reverse order, using the bound of Theorem 4.1.12 to eliminate statements. This is implemented by Algorithms 3, 4, 5, and 6. Algorithms 3 and 4 are very similar to Algorithms 1 and 2, but they do not consider solutions that violate the lower bound of Theorem 4.1.12. Algorithm 6 shows how to use the inside pass and the outside pass to organize our search.

We now give an informal description of the algorithm. We start by solving the coarsest problem (Sk, Rk). We then (using the function lift) lift the solution found from the coarsest level to the finest level. This is a more standard use of coarse-to-fine search. This gives us a derivation of ⊥ in (S1, R1), and the cost of that derivation is an upper bound on the cost of the optimal derivation. Throughout the running of the algorithm, the variable u will be the lowest cost seen for a derivation of ⊥ in (S1, R1).

Because of Theorem 4.1.12, we can delete any statement from (Si, Ri) if we can prove that it does not participate in any derivation of cost u or lower. This is not only guaranteed not to change the minimum cost of a derivation in (Si, Ri); it is guaranteed not to delete any statement X such that

X = Φ_{i−1}(Φ_{i−2}(··· Φ_1(X′) ···)),

for any statement X′ participating in an optimal derivation of ⊥ in (S1, R1). Thus, whenever we delete a statement X in (Si, Ri), we can also delete all statements in finer problems that map to X.

We alternate between an inside pass, where we compute a valid lower bound for (Si, Ri ), and an outside pass, where we compute a valid lower bound on contexts for (Si−1, Ri−1 ).

These are implemented by Algorithms 3 and 4, respectively. Algorithm 3 takes a lower bound on contexts as input, and uses this to delete any statements that do not pass the test of Theorem 4.1.12. This can be done because, in line 12 of Algorithm 3, when we consider X, we have already considered all rules with X on the left-hand side, and thus COST [X] is already a lower bound on the cost of any derivation of X that does not use any deleted statements.

Similarly, Algorithm 4 takes a lower bound for (Si−1, Ri−1 ) (derived in Algorithm 6 by taking the pullback of a valid lower bound for (Si, Ri )) as input, and uses this to delete any statements that do not pass the test of Theorem 4.1.12. This can be done because, in line 3 of Algorithm 4, when we consider X, we have already considered all rules that use X on the right-hand side, and thus OCOST [X] is at that point a lower bound on the cost of a context for X that does not use any deleted statements.
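The deletion test that both passes apply can be sketched in isolation; the tables below are hypothetical, and COST, OCOST, and u are assumed to satisfy the hypotheses of Theorem 4.1.12:

```python
# With COST a valid lower bound, OCOST a valid lower bound on contexts, and
# u the cost of the best derivation of the goal found so far, any statement
# X with COST[X] + OCOST[X] > u cannot appear in a lightest derivation.
def prune(statements, COST, OCOST, u):
    return [X for X in statements if COST[X] + OCOST[X] <= u]

COST = {"A": 1.0, "B": 2.0, "D": 9.0}
OCOST = {"A": 2.5, "B": 1.5, "D": 4.0}
kept = prune(["A", "B", "D"], COST, OCOST, u=3.5)
print(kept)  # ['A', 'B']
```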

**We prove the correctness of Algorithm 6 with two lemmas:**

**Lemma 4.1.13.** If OCOST is a valid lower bound on contexts for (Si, Ri), and u is an upper bound on the optimal cost of a derivation of ⊥ in (Si, Ri), then after running inside (Algorithm 3), COST is a valid lower bound for (Si, Ri). Note that inside deletes some statements and rules from (Si, Ri).

**Lemma 4.1.14.** If COST is a valid lower bound for (Si−1, Ri−1), and u is an upper bound on the optimal cost of a derivation of ⊥ in (Si−1, Ri−1), then after running outside (Algorithm 4), OCOST is a valid lower bound on contexts for (Si−1, Ri−1).

We can apply this algorithm even when our homomorphisms do not satisfy the required lower bounds. In this case, we are not guaranteed to ﬁnd an optimal solution. Designing inadmissible homomorphisms that yield good results in practice is an important area of future work.

For the problem of detecting an object with a grammar in a cluttered image, we have the following lightest derivation problem: for every pair of points p, q, and every nonterminal X, we have the statement X(p, q).

For every rule X → Y Z in our grammar, and every triple of points p, q, r, we have the rule X(p, q) =|_{v̄} Y(p, r), Z(r, q), where v̄ is the minimum, over p′ ∈ P, q′ ∈ Q, r′ ∈ R, of the cost of the corresponding fine-level rule, and P, Q, R are the sets of grid points in our original lightest derivation problem that map to p, q, r in the coarsened problem.

We also need a lower bound on the cost of rules mapping to lexical rules X(p, q) =|v. We do this by simply lifting all costs from the finest level.

This correctly implements the data cost when there is no reuse of the same pixel by diﬀerent line segments. We discuss this in more detail in the next section.

When doing scene parsing, we will modify this formula slightly. (This idea was first introduced in Amit and Trouvé [2007].) Pixels that are in already selected objects will be labelled foreground, and other pixels will initially be labelled background. Let F be the set of pixels already marked foreground.

In this section, we discuss local inference, a method we have used to clean up parses that exhibit unintended reuse. Our method is as follows: we ﬁrst parse the image to ﬁnd the highest-scoring curve, with or without unintended reuse. We then perform our local inference procedure, which we formulate as an energy minimization problem. We then reparse the image, using some constraints derived from the output of the local inference procedure.

Unintended reuse is a serious and fundamental problem in the interpretation of images.

Here, we deﬁne the problem and argue that it is not possible to avoid this problem using standard context-free grammatical models.

We can consider an interpretation of an image to consist of various assertions about the contents of the image. We can characterize these assertions by asking what region of the image they refer to, and in particular how large that region is.

Compositional models are those models in which the quality or plausibility of a high-level assertion (i.e., one that refers to a large image region) is defined recursively in terms of the quality of lower-level assertions. In order to make these problems amenable to dynamic programming, we must assume context-freeness. It is thus a general problem of compositional models that we can only efficiently and exactly compute the quality of high-level assertions by assuming context-freeness and allowing them to depend on low-level assertions that are contradictory. This can then result in incoherent interpretations of the image. In our case, straightforward maximization of the likelihood under the statistical model described above yields curves that have many self-intersections; curves in which long contours are used multiple times, often in opposite directions; and curves which do not enclose any area. A parse with unintended reuse is shown in Figure 4.1.

Figure 4.1: A parse exhibiting unintended reuse.

In general, we cannot sacrifice efficiency beyond a certain point, and so we must search for a coherent interpretation which may not score the highest, even among coherent interpretations.