«MULTI-CAMERA SIMULTANEOUS LOCALIZATION AND MAPPING Brian Sanderson Clipp A dissertation submitted to the faculty of the University of North Carolina ...»
In contrast, with a single camera absolute scale cannot be determined. Omnidirectional cameras based on parabolic mirrors provide wide-angle scene coverage, but at the expense of uneven sampling of the visual sphere, and they also cannot recover absolute scale. Some omnidirectional cameras are essentially small clusters of standard perspective cameras (ImmersiveMedia, Imove, Ladybug2), but without a large baseline between the cameras they cannot be used to measure absolute, scaled ego-motion. This is a signal-to-noise problem: the baseline between the cameras is small relative to the scene depth, so the scale constraint from the camera geometry is weak. Another approach to scaled motion estimation from video was presented by Scaramuzza et al. (2009). They use the fact that the camera undergoes nonholonomic motion in a plane to calculate the scaled camera motion.

Figure 3.1: Example of a multi-camera system on a vehicle with one point correspondence.
With a single camera or an omnidirectional camera, it can be difficult to avoid losing part of the field of view to occlusion, which may require placing the camera cluster high up on a boom. Alternatively, when the system is mounted on a vehicle, it can be split into two clusters, one on each side of the vehicle. This minimizes occlusion problems while providing a large baseline for scale estimation. In this chapter we show that with a system of two camera clusters, each consisting of one or more cameras and separated by a known transformation, all six degrees of freedom (DOF) of the camera system motion, including scale, can be recovered.
An example of a multi-camera system for the capture of ground-based video is shown in Figure 3.1. It consists of two camera clusters, one on each side of a vehicle. The cameras are attached tightly to the vehicle and can be considered a rigid object. This system is used for the experimental evaluation of our approach.
Computing the scale, structure and camera motion from video of a general scene is
et al. investigated the properties of visual odometry for single-camera and stereo-camera systems. Their analysis showed that a single-camera system is not capable of maintaining a consistent scale over time. Their stereo system is able to maintain absolute scale over extended periods of time by using a known baseline and cameras with overlapping fields of view.

Figure 3.2: (a) Overlapping stereo camera pair, (b) Non-overlapping multi-camera system.
Our approach eliminates the requirement for overlapping fields of view while still maintaining absolute scale over time.
In Section 3.2 we introduce our novel solution for finding the 6DOF motion of a two-camera system with non-overlapping views. In Section 3.3 we derive the mathematical basis for our technique and give a geometric interpretation of the scale constraint.
The algorithm used to solve for the scaled motion is described in Section 3.4. Section 3.5 discusses the evaluation of the technique on synthetic data and on real imagery.
3.2 6DOF Multi-camera Motion

The proposed approach addresses 6DOF motion estimation for multi-camera systems with non-overlapping fields of view. Most previous approaches to 6DOF motion estimation have used camera configurations with overlapping fields of view, which allow correspondences to be triangulated simultaneously across multiple views with a known, rigid baseline. Our approach instead uses a temporal baseline, where each point is visible in only one camera at a given time. The difference between the two approaches is illustrated in Figure 3.2.
Our technique assumes we can establish at least five temporal correspondences in one of the cameras and one temporal correspondence in any additional camera. In practice, this assumption is not a limitation. Because of noise, reliable estimation of camera motion requires multiple correspondences from each camera anyway.
The essential matrix which defines the epipolar geometry of a single freely moving cal
geometry. The ambiguity can be eliminated with additional points. With oriented geometry, the rotation and the translation up to scale of the camera can be extracted from the essential matrix. Consequently, a single camera provides 5DOF of the camera motion; the remaining degree of freedom is the scale of the translation. Given these 5DOF of the multi-camera system motion (rotation and translation direction), we can compensate for the rotation of the system. Our approach is based on the following observation. Given the temporal epipolar geometry of one of the cameras, the position of the epipole in each of the other cameras of the multi-camera system is restricted to a line in the image. Hence the scale, as the remaining degree of freedom of the camera motion, describes a one-dimensional linear subspace.
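The following minimal NumPy sketch makes the 5DOF extraction concrete by recovering the rotation and the translation direction from an essential matrix. It illustrates the idea under stated assumptions and is not the implementation used in this work. The assumptions are:

- calibrated cameras with normalized image points (u, v, 1);
- the convention P = [I | 0], P' = [R | t] with E = [t]× R;
- the standard SVD-based decomposition;
- a cheirality (oriented geometry) check on a single correspondence to choose among the four candidates.

The function names are hypothetical, and the five-point solver that would produce E is not shown.

```python
import numpy as np


def skew(a):
    """Return [a]x, the matrix such that skew(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])


def decompose_essential(E):
    """Return the four (R, t) candidates with E ~ [t]x R and unit-norm t."""
    U, _, Vt = np.linalg.svd(E)
    # Flipping U or Vt only changes the sign of E (defined up to scale anyway)
    # and guarantees that every candidate R has determinant +1.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    t = U[:, 2]  # translation direction = left null vector of E
    Ra, Rb = U @ W @ Vt, U @ W.T @ Vt
    return [(Ra, t), (Ra, -t), (Rb, t), (Rb, -t)]


def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation from normalized homogeneous points (u, v, 1)."""
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    X = np.linalg.svd(A)[2][-1]
    return X[:3] / X[3]


def select_motion(E, x1, x2):
    """Oriented geometry: keep the candidate that puts the point in front of both views."""
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    for R, t in decompose_essential(E):
        P2 = np.hstack([R, t[:, None]])
        X = triangulate(P1, P2, x1, x2)
        if X[2] > 0 and (R @ X + t)[2] > 0:
            return R, t
    raise ValueError("no candidate satisfies the cheirality constraint")
```

Only the direction of t is recovered, returned with unit norm. The missing magnitude is exactly the sixth degree of freedom that the second camera is used to fix.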
In the next section, we derive the mathematical basis of our approach to motion recovery.
We consider a system involving two cameras, rigidly coupled with respect to each other.
The cameras are assumed to be calibrated. Figure 3.3 shows the configuration of the two-camera system. The cameras are denoted by C1 and C2 at the starting position, and by C1' and C2' after a rigid motion.
We will consider the motion of the camera pair to a new position. Our purpose is to determine this motion from image measurements. Standard techniques can compute the motion of the camera pair up to scale by determining the motion of just one of the cameras from that camera's point correspondences. From one camera, however, the direction of the translation can be determined, but not its magnitude. In this chapter we demonstrate that a single correspondence from the second camera is sufficient to determine the scale of the motion, that is, the magnitude of the translation. This result is summarized in the following theorem.
Theorem 1. Let a two-camera system have initial configuration determined by camera matrices P1 = [I | 0] and P2 = [R2 | −R2 C2].
Suppose it moves rigidly to a new position for which the first camera is specified by P1' = [R1 | −λR1 C1]. Then the scale of the translation, λ, is determined by a single point correspondence x ↔ x' seen in the second camera according to the formula

$$x'^\top A\, x + \lambda\, x'^\top B\, x = 0,$$

where

$$A = R_2 R_1 \left[(R_1^\top - I)\, C_2\right]_\times R_2^\top, \qquad B = R_2 R_1 \left[C_1\right]_\times R_2^\top.$$

In this chapter, [a]× denotes the skew-symmetric matrix inducing the cross product, so that [a]× b = a × b.
Figure 3.3: Motion of a multi-camera system consisting of two rigidly coupled conventional cameras.
To simplify the derivation, we assume the coordinate system is centered on the initial position of the first camera, so that P1 = [I | 0]. Any other coordinate system is easily transformed to this one by a Euclidean change of coordinates.
Observe also that after the motion, the first camera has moved to a new position with camera center at λC1. The scale is unknown at this point because the first step of our method determines the motion of the cameras by computing the essential matrix of the first camera over time, which yields the motion only up to scale. Thus, the scale λ remains unknown. We now proceed to derive Theorem 1. Our immediate goal is to determine the camera matrix of the second camera after the motion. First note that the camera P1' may be written as

$$P_1' = [R_1 \mid -\lambda R_1 C_1] = [I \mid 0] \begin{bmatrix} R_1 & -\lambda R_1 C_1 \\ 0^\top & 1 \end{bmatrix} = P_1 T,$$

where the matrix T is the Euclidean transformation induced by the motion of the camera pair. Since the second camera undergoes the same Euclidean motion, we can compute the camera P2' to be

$$P_2' = P_2 T = [R_2 \mid -R_2 C_2] \begin{bmatrix} R_1 & -\lambda R_1 C_1 \\ 0^\top & 1 \end{bmatrix} = [R_2 R_1 \mid -\lambda R_2 R_1 C_1 - R_2 C_2]. \tag{3.2}$$
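Now, given a single point correspondence x ↔ x' as seen in the second camera, we may determine the value of λ, the scale of the camera translation. Up to an irrelevant overall scale, the essential matrix of the second camera between its two positions is

$$E_2 = R_2 R_1 \left[(R_1^\top - I)\, C_2 + \lambda C_1\right]_\times R_2^\top = A + \lambda B.$$

The epipolar equation x'ᵀ E2 x = 0 therefore yields x'ᵀ A x + λ x'ᵀ B x = 0, and hence

$$\lambda = -\frac{x'^\top A\, x}{x'^\top B\, x}.$$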
So each correspondence in the second camera provides a measure for the scale. In the next section, we give a geometric interpretation for this constraint.
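The closed-form scale of Theorem 1 can be checked numerically. The sketch below assumes NumPy and the conventions of the theorem: P2 = [R2 | −R2 C2] and P1' = [R1 | −λR1 C1], with λ measured in units of the length of C1. The names `skew`, `scale_matrices` and `scale_from_correspondence` are hypothetical. The sketch works on a single noise-free correspondence. With real data, each correspondence yields its own estimate of λ, and these would need to be combined robustly.

```python
import numpy as np


def skew(a):
    """Return [a]x, the matrix such that skew(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])


def scale_matrices(R1, C1, R2, C2):
    """A and B of Theorem 1."""
    A = R2 @ R1 @ skew((R1.T - np.eye(3)) @ C2) @ R2.T
    B = R2 @ R1 @ skew(C1) @ R2.T
    return A, B


def scale_from_correspondence(R1, C1, R2, C2, x, xp):
    """Solve x'^T A x + lam * x'^T B x = 0 for lam from one correspondence x <-> x'.

    x and xp are homogeneous image points of the second camera before and after
    the motion; the equation is homogeneous in both, so no normalization is needed.
    """
    A, B = scale_matrices(R1, C1, R2, C2)
    denom = xp @ B @ x
    if abs(denom) < 1e-12:  # arbitrary tolerance for this illustration
        raise ValueError("x'^T B x is ~0: this correspondence does not constrain the scale")
    return -(xp @ A @ x) / denom
```

To try it, take a 3D point X, form x = R2 (X − C2) and x' = R2 R1 (X − C2'), where C2' = R1ᵀ C2 + λC1 is the center of the camera in (3.2). Then check that the function returns the λ used to generate x'.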
3.3.1 Geometric Interpretation

The situation may also be understood through a geometric interpretation, shown in Figure 3.4. We note from (3.2) that the second camera moves to a new position C2'(λ) = R1ᵀ C2 + λC1. The locus of this point for varying values of λ is a straight line with direction vector C1, passing through the point R1ᵀ C2. From its new position, the camera observes a point at position x' in its image plane. This image point corresponds to a ray v' along which the 3D point X must lie. If we think of the camera as moving along the line C2'(λ) (the locus of possible final positions of the second camera center), then this ray traces out a plane Π; the 3D point X must lie on this plane.
On the other hand, the point X is also seen (as image point x) from the initial position of the second camera, and hence lies along a ray v through C2. The point where this ray meets the plane Π must be the position of the point X. In turn, this determines the scale factor λ.
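The ray-plane construction above can also be written out directly. This sketch assumes NumPy, the same conventions as Theorem 1, and homogeneous image points of the second camera; the function name is hypothetical. It proceeds in three steps:

1. Build the plane Π from the line C2'(λ) = R1ᵀ C2 + λC1 and the direction of the ray v'.
2. Intersect Π with the ray v through C2 to obtain X.
3. Read λ off the recovered point X.

For noise-free data the result agrees with the closed-form expression for λ.

```python
import numpy as np


def scale_by_ray_plane(R1, C1, R2, C2, x, xp):
    """Locate X as the intersection of ray v with plane Pi, then read off lam.

    x, xp: homogeneous image points of the second camera before/after the motion.
    Returns (lam, X) in the frame of the first camera's initial pose.
    """
    d = R2.T @ x                 # direction of ray v through C2
    dp = R1.T @ R2.T @ xp        # direction of ray v' from C2'(lam)
    P0 = R1.T @ C2               # point on the line C2'(lam) = R1^T C2 + lam * C1
    n = np.cross(C1, dp)         # normal of Pi, which is spanned by C1 and v'
    denom = n @ d
    if abs(denom) <= 1e-12 * np.linalg.norm(n) * np.linalg.norm(d):
        raise ValueError("ray v is parallel to plane Pi: the scale is not determined")
    s = n @ (P0 - C2) / denom
    X = C2 + s * d               # intersection of v with Pi
    # Express X - P0 = lam * C1 + mu * dp and solve the 3x2 system for (lam, mu).
    coeffs, *_ = np.linalg.lstsq(np.column_stack([C1, dp]), X - P0, rcond=None)
    return coeffs[0], X
```

The error branch corresponds to the degenerate situation in which the ray v lies in (or is parallel to) the plane Π.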
3.3.2 Critical configurations

This geometric interpretation allows us to identify critical configurations in which the scale factor λ cannot be determined. As shown in Figure 3.4, the 3D point X is the intersection of the plane Π with a ray v through the camera center C2. If the plane does not pass through C2, then the point X can be located as the intersection of plane and ray. Thus, a critical configuration can only occur when the plane Π passes through the second camera center, C2.

Figure 3.4: The 3D point X must lie on the plane traced out by the ray corresponding to x' for different values of the scale λ. It also lies on the ray corresponding to x through the initial camera center C2.
According to the construction, the line C2'(λ) lies on the plane Π. For different 3D points X and corresponding image measurements x', the plane will vary, but it always contains the line C2'(λ). Thus, the planes Π corresponding to different points X form a pencil of planes hinged around the axis line C2'(λ). Unless this line actually passes through C2, there will be at least one point X for which C2 does not lie on the plane Π. The correspondence of this point can be used to determine X, and hence the scale.
Finally, if the line C2'(λ) passes through the point C2, then the method will fail. In this case, the ray corresponding to any point X will lie within the plane Π, and a unique point of intersection cannot be found.
In summary, if the line C2'(λ) does not pass through the initial camera center C2, almost any point correspondence x ↔ x' may be used to determine the point X and the translation scale λ. The exceptions are:

- correspondences of points X that lie in the plane defined by the camera center C2 and the line C2'(λ);
- distant points, for which Π and v are almost parallel.

An alternative analysis of the critical configurations for scale estimation in non-overlapping multi-camera systems is given in (Kim and Chung, 2006).
If, on the other hand, the line C2'(λ) passes through the center C2, then the method will always fail. Most importantly, this occurs if there is no camera rotation, namely R1 = I. In this case C2'(λ) = C2 + λC1, which passes through C2. It is easy to give an algebraic condition for this critical configuration. Since C1 is the direction vector of the line, the point C2 lies on the line precisely when the vector R1ᵀ C2 − C2 is in the direction C1. This gives the condition for singularity (R1ᵀ C2 − C2) × C1 = 0. We now rearrange this expression and observe that the vector C2 × C1 is perpendicular to the plane of the three camera centers C2, C1' and C1 (the last of these being the coordinate origin). We may then state:
Theorem 2. The critical condition for singularity for scale determination is

$$(R_1^\top C_2) \times C_1 = C_2 \times C_1.$$

In particular, the motion is not critical unless the axis of rotation is perpendicular to the plane determined by the three camera centers C2, C1' and C1.
Intuitively, critical motions occur when the rotation-induced translation R1ᵀ C2 − C2 is aligned with the translation direction C1. The most common critical motion is one in which the camera system translates but does not rotate. Another common, but less obvious, critical motion occurs when both camera paths move along concentric circles. This configuration is illustrated in Figure 3.5. A vehicle-borne multi-camera system turning at a constant rate undergoes critical motion, but its motion is not critical while it enters or exits a turn.
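Theorem 2 gives a direct numerical test for critical motion. The sketch below uses NumPy, hypothetical function names and an arbitrarily chosen relative tolerance. It evaluates (R1ᵀ C2 − C2) × C1 and also constructs the motion of a rig turning about a vertical axis. For simplicity it takes z as the vertical axis and puts the first camera at the origin.

In the example, both camera centers are assumed to lie on a line through the turn center, for instance cameras mounted on opposite sides of a vehicle at the same position along its length. The two camera paths are then concentric circles about the turn center.

```python
import numpy as np


def rot_z(theta):
    """Rotation by theta about the z axis (taken here as the vertical axis)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def is_critical(R1, C1, C2, rel_tol=1e-9):
    """Theorem 2 test: the scale is unobservable when (R1^T C2 - C2) x C1 = 0.

    C1 may be given at any nonzero length; only its direction matters.
    """
    r = R1.T @ C2 - C2  # rotation-induced translation of the second camera
    cross = np.cross(r, C1)
    ref = max(1.0, np.linalg.norm(r) * np.linalg.norm(C1))
    return bool(np.linalg.norm(cross) <= rel_tol * ref)


def turn_motion(center, theta):
    """Motion of a rig turning by theta about a vertical axis through `center`.

    The first camera starts at the origin. Returns (R1, t1), where R1 is the
    rotation in P1' = [R1 | -R1 t1] and t1 = lambda * C1 is its new center.
    """
    Rw = rot_z(theta)            # rotation applied to the rig in the world frame
    t1 = center - Rw @ center    # where the turn carries the origin (first camera)
    return Rw.T, t1


if __name__ == "__main__":
    R1, t1 = turn_motion(np.array([0.0, 5.0, 0.0]), 0.1)
    print(is_critical(R1, t1, np.array([0.0, -2.0, 0.0])))  # True
```

With this collinear layout, `is_critical(R1, t1, C2)` holds for any turn angle. A pure translation (R1 = I) is critical for any C1 and C2.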
Detecting critical motions is important for determining when the scale estimates are reliable. One way to determine the criticality of a given motion is to use the approach of (Frahm and Pollefeys, 2006), which determines the dimension of the space that contains our estimate of the scale. To do this, we double the scale λ and compare two inlier fractions:

- the fraction of inliers to the essential matrix of our initial estimate;
- the fraction of inliers to the essential matrix with doubled scale.

If the scale is observable, a deviation from the estimated scale value causes correspondences to violate the epipolar constraint, so they become outliers to the constraint for the doubled scale. When the scale is ambiguous, doubling it does not cause correspondences to be classified as outliers. Hence, if doubling the scale does not cause a large proportion of the inliers to be lost, the scale is not observable from the data. In practice, this method worked on real data sets.
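The inlier-doubling test can be sketched in the same style. This is a minimal illustration, not the implementation of (Frahm and Pollefeys, 2006) or of this work. Its assumptions are:

- NumPy, and the second camera's essential matrix E2 = A + λB from Theorem 1;
- squared Sampson distances on normalized image points as the inlier measure;
- a default threshold of 1e-6, roughly one pixel for a focal length near 1000 pixels;
- a hypothetical rule that the scale is observable when doubling λ leaves fewer than half of the original inliers.

All names and thresholds are illustrative.

```python
import numpy as np


def skew(a):
    """Return [a]x, the matrix such that skew(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])


def essential_second_camera(R1, C1, lam, R2, C2):
    """E2 = A + lam * B from Theorem 1 (defined up to scale)."""
    A = R2 @ R1 @ skew((R1.T - np.eye(3)) @ C2) @ R2.T
    B = R2 @ R1 @ skew(C1) @ R2.T
    return A + lam * B


def sampson_errors(E, xs, xps):
    """Squared Sampson distances for N x 3 arrays of homogeneous points x and x'."""
    xs = xs / xs[:, 2:3]      # normalize to (u, v, 1)
    xps = xps / xps[:, 2:3]
    Ex = xs @ E.T             # row i is E @ x_i
    Etxp = xps @ E            # row i is E^T @ x'_i
    num = np.sum(xps * Ex, axis=1) ** 2
    den = Ex[:, 0] ** 2 + Ex[:, 1] ** 2 + Etxp[:, 0] ** 2 + Etxp[:, 1] ** 2
    return num / den


def scale_is_observable(R1, C1, lam, R2, C2, xs, xps,
                        thresh=1e-6, keep_ratio=0.5):
    """Hypothetical inlier-doubling test comparing inliers for lam and 2 * lam."""
    def inlier_fraction(scale):
        E = essential_second_camera(R1, C1, scale, R2, C2)
        return np.mean(sampson_errors(E, xs, xps) < thresh)

    f_est = inlier_fraction(lam)
    if f_est == 0:
        return False
    f_doubled = inlier_fraction(2.0 * lam)
    # Observable only if doubling lam loses a large share of the inliers.
    return bool(f_doubled < keep_ratio * f_est)
```

Consider noise-free synthetic data with R1 = I. There, E2 at λ and at 2λ differ only by an overall factor, so the Sampson errors are identical. The test therefore reports the scale as unobservable, as expected for pure translation.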
3.4 Algorithm

Figure 3.6: Algorithm for estimating 6DOF motion of a multi-camera system with non-overlapping fields of view.

Figure 3.6 shows an algorithm that solves for the relative motion of two generalized cameras from 6 rays with two centers, where 5 rays meet one center and a sixth ray meets the other center. First, we use 5 correspondences in one ordinary camera to estimate an essential matrix between two frames in time. The algorithm used to estimate the essential matrix