An Autonomous Vision-Guided Helicopter
Omead Amidi
August 1996
Department of Electrical and Computer Engineering
Carnegie Mellon University
The modules configure cameras using the NTSC format in a non-interlaced mode with a special setting that provides the same image field updated at 60 Hz. The sync generator also incorporates counters that count each horizontal image line. Other modules use these line numbers to trigger and tag incoming attitude data from the gyroscopes. In effect, this tagging mechanism synchronizes attitude data acquisition with each camera shutter opening and image video line.
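As an illustration, the tagging can be modeled as stamping each incoming sensor sample with the line counter value active at its arrival time. The sketch below assumes the 64-microsecond NTSC line period discussed in Section 3.4.1; the function name is illustrative, not part of the machine's firmware.

```python
LINE_PERIOD_US = 64  # NTSC video line duration (see Section 3.4.1)

def line_tag(sample_time_us, field_start_us):
    """Return the horizontal line counter value active when a gyro
    sample arrives, mirroring the sync generator's line counter,
    which restarts at each camera shutter opening (field start)."""
    return int((sample_time_us - field_start_us) // LINE_PERIOD_US)

# A sample arriving 1 ms after the shutter opens lands on line 15:
assert line_tag(1000, 0) == 15
```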
3.3.5 Sensor Bridge Module
Triggered by the sync generator, a sensor bridge module tags camera images with sensor data for the visual odometer machine. The sensor bridge can capture data from a variety of sources, including four independent A/D converters, ten quadrature decoder circuits, and digital input lines. In addition, the sensor bridge can output data through four D/A converters and a number of digital output lines.
The block diagram of this module is shown in Figure 3-4.
A distinguishing feature of this module is its configurable state machine, which can arbitrarily define the type and sequence of data to be acquired or output. By encoding all input/output data into packets, this module reduces system I/O complexity and provides a simple interface to a large number of sensors and external control outputs through two communication ports.
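The packet encoding can be sketched as follows. The word layout (channel identifier in the upper bits, a 16-bit sample in the lower bits) is a hypothetical format chosen for illustration; the actual sensor bridge packet format is not reproduced here.

```python
def encode_packet(readings):
    """Pack (channel_id, value) pairs into 32-bit words: the channel
    identifier in bits 16-23 and a 16-bit sample in bits 0-15.
    This layout is illustrative only."""
    return [((ch & 0xFF) << 16) | (val & 0xFFFF) for ch, val in readings]

def decode_packet(words):
    """Recover (channel_id, value) pairs from packet words."""
    return [((w >> 16) & 0xFF, w & 0xFFFF) for w in words]

packet = encode_packet([(1, 500), (2, 1023), (3, 0)])
assert decode_packet(packet) == [(1, 500), (2, 1023), (3, 0)]
```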
3.3.6 Image Display
Image display is not critical for visual odometer machine operation, but it is necessary for viewing processed images. The visual odometer machine employs a D/A module for image display. The module is similar to the A/D module in several respects. As shown in Figure 3-5, the module supports the same variable clocking and blanking control circuitry as the A/D. These components enable image display on a range of monitors with different horizontal frequencies and resolutions. For proper image synchronization on the monitor, an image synchronization handshaking procedure similar to that of the A/D module is performed at the input ports.
To help eliminate redundant image transfers for simple display purposes, the display module has separate input datapaths for image and overlay. For example, the image received from the A/D could be displayed while the overlay plane is updated by external processing elements.
Chapter 3. A Real-Time and Low Latency Visual Odometer Machine

The D/A module is implemented with a Brooktree D/A chip set and clock generator circuits.
The module supports triple LUTs for 256-entry pseudo-color display and 16 overlay colors. It receives data asynchronously through its comm-ports and refreshes an NTSC monitor in real time solely from that input data. Image-transmitting modules need only send image data at predefined sizes; the module automatically produces the proper synchronization for display, dramatically reducing image display system complexity.
3.4 Data Flow and Synchronization
The tracking algorithm, implemented by the visual odometer machine, requires two main execution threads. The primary thread estimates helicopter position, while the secondary thread estimates helicopter velocity and prepares new potential templates for future tracking. Both threads estimate template range by stereo image processing and transform the current template position and image center pixel velocity to the ground coordinate frame using the helicopter attitude, which is measured by onboard angular sensors.
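The attitude transformation can be illustrated in simplified form: an image-plane displacement is scaled by range over focal length and rotated into the ground frame. The sketch below compensates heading (yaw) only and uses an assumed 600-pixel focal length; the machine also compensates roll and pitch.

```python
import math

def image_to_ground(dx_pix, dy_pix, range_m, focal_pix, yaw_rad):
    """Map an image-plane displacement to a ground-frame translation:
    scale by range over focal length, then rotate by heading.
    Roll and pitch compensation are omitted for brevity."""
    dx = dx_pix * range_m / focal_pix
    dy = dy_pix * range_m / focal_pix
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return (c * dx - s * dy, s * dx + c * dy)

# 10-pixel shift at 5 m range with an assumed 600-pixel focal length:
east, north = image_to_ground(10, 0, 5.0, 600, 0.0)
assert abs(east - 10 * 5.0 / 600) < 1e-9 and abs(north) < 1e-9
```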
The visual odometer machine implements the above two execution threads of the algorithm using four C40 DSP engines. To minimize latency and optimize processor utilization, the machine schedules as many image processing operations as possible as early as possible. The machine allocates two C40s to estimate current template position and image center pixel velocity, another to measure range by stereo vision, and a fourth to integrate the image processing results and compensate for helicopter attitude changes. An external real-time controller, communicating via a bridge module, accesses external data storage for system initialization and logging tasks. Similarly, gyroscope attitude data is acquired through another external bridge interface.
Figure 3-6 shows the processing event scheduling of the vision machine by a data flow and synchronization time-line. The horizontal rectangles represent data transfer and processing events and the vertical lines represent the camera shutter openings. The following sections provide detailed descriptions of the vision machine data flow and synchronization.
3.4.1 Image Acquisition and Preprocessing
The visual odometer machine acquires images from two NTSC cameras. The cameras are synchronized by one central sync generator and operate in a special non-interlaced mode which provides the same image field after each camera shutter opening. This mode guarantees a fresh image field every sixtieth of a second.
The machine’s A/D module samples the camera signals at 6 MHz, which provides a maximum of 360 pixels per NTSC video line; each line is 64 microseconds in duration (60 for the video signal plus 4 for the horizontal blanking interval). The central 268 pixels are chosen as the line area of interest using A/D blanking offsets. Similarly, 236 of the available 260 video field lines are centered and chosen to compose the vertical image area of interest.
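The arithmetic behind these figures can be checked directly:

```python
SAMPLE_RATE_HZ = 6_000_000   # A/D pixel clock
ACTIVE_VIDEO_US = 60         # visible portion of the 64 us line
LINES_PER_FIELD = 260        # available NTSC field lines

max_pixels = SAMPLE_RATE_HZ * ACTIVE_VIDEO_US // 1_000_000
assert max_pixels == 360     # maximum pixels per video line

# Centering the (268x236) area of interest implies these blanking offsets:
h_offset = (max_pixels - 268) // 2
v_offset = (LINES_PER_FIELD - 236) // 2
assert (h_offset, v_offset) == (46, 12)
```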
The (268x236) field images are preprocessed by the convolution module with an (8x8) Gaussian convolution mask. For the implemented machine, the latency of the digitization and preprocessing operations was close to 4 microseconds and was considered negligible.
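A separable binomial kernel is a common stand-in for a small Gaussian mask; the sketch below is illustrative and does not reproduce the machine's actual (8x8) coefficients.

```python
def binomial_kernel(taps=8):
    """Normalized binomial coefficients (row taps-1 of Pascal's
    triangle), a common approximation to a small Gaussian. The
    machine's actual mask coefficients are not reproduced here."""
    row = [1]
    for _ in range(taps - 1):
        row = [a + b for a, b in zip([0] + row, row + [0])]
    total = sum(row)
    return [v / total for v in row]

def smooth_line(line, kernel):
    """1-D convolution over the valid region; applied along rows and
    then columns, it realizes a separable (8x8) smoothing."""
    n = len(kernel)
    return [sum(line[i + j] * kernel[j] for j in range(n))
            for i in range(len(line) - n + 1)]

kernel = binomial_kernel(8)
assert abs(sum(kernel) - 1.0) < 1e-12
# A constant signal is unchanged by a normalized smoothing kernel:
assert all(abs(v - 5.0) < 1e-9 for v in smooth_line([5.0] * 12, kernel))
```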
3.4.2 Image Transfer and Storage
Rows 1, 3, and 8 of Figure 3-6 show the image data transfer to the C40s. They consist of periods of activity during the valid image window and inactivity during blanking intervals, where no transfers are performed.
C40 DMA coprocessors transfer and store images in high-speed static memory, freeing the main CPU to perform only image processing. As relevant image areas arrive, the DMA coprocessors signal the main CPU to commence processing. Since most transfers are not simple periodic operations, complex instructions using C40 link pointers are loaded during system initialization so that the DMAs can be controlled without main processor intervention. In some cases, the data packets themselves are tagged with DMA instructions for variable-length transfers. The DMA channels operate in the higher speed “split mode,” which allows direct connection to the normally memory-mapped comm-ports, thereby reducing the memory bus data traffic of the C40s.
Since the C40 has a 32-bit data bus structure, the incoming 8-bit image data from the comm-ports can only be stored four pixels per 32-bit data element using conventional C40 hardware implementations. The data must be unpacked before image processing can start. Unpacking an entire (268x236) image requires 3 milliseconds on the main C40 processor.[1] Similarly, outgoing data for display must be packed, which also consumes valuable computational power.
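The packing and unpacking can be sketched as follows, assuming little-endian byte order within each 32-bit word (the C40 implementation performs this in assembly):

```python
def unpack_pixels(words):
    """Expand 8-bit pixels stored four per 32-bit word into a flat list
    (little-endian byte order within each word is assumed)."""
    return [(w >> shift) & 0xFF for w in words for shift in (0, 8, 16, 24)]

def pack_pixels(pixels):
    """Inverse operation, needed before sending images out for display.
    The pixel count is assumed to be a multiple of four."""
    return [sum((pixels[i + j] & 0xFF) << (8 * j) for j in range(4))
            for i in range(0, len(pixels), 4)]

pixels = [10, 20, 30, 40, 50, 60, 70, 80]
assert unpack_pixels(pack_pixels(pixels)) == pixels
```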
3.4.3 Target Template Position Estimation Processing
C40-1 tracks the two target templates of the visual odometer. The processing events of C40-1 are shown by row 2 of Figure 3-6. Processing begins after a predetermined central image region is transferred to local memory by a DMA channel. This region encompasses two (40x40) partially overlapping position templates and a 20-pixel border, as shown in Figure 3-7. In case the currently tracked templates leave the camera view, image pixels in this region are captured at each cycle for possible initialization of new templates in the next cycle. The template pixels are stored along with their borders to provide the extra image area needed for rotating and scaling templates as the helicopter moves. The pixels are unpacked and prepared by the main processor in parallel while the rest of the image is arriving.
The main processor begins searching for the two target templates upon image transfer completion. The search area for each template encompasses the last matched template surrounded by a 16-pixel-wide border. Immediately following the coarse-to-fine template match process described in Chapter 2, the match locations are transmitted to C40-2 via a different DMA channel, and the processor starts template preparation for the next cycle.
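The search step can be illustrated with a brute-force sum-of-absolute-differences (SAD) match over the bordered search area. This is a simplification: the machine uses the coarse-to-fine matcher of Chapter 2, and the similarity measure shown here is an assumption.

```python
import random

def sad(patch, template):
    """Sum of absolute differences between two equal-size patches."""
    return sum(abs(p - t) for pr, tr in zip(patch, template)
               for p, t in zip(pr, tr))

def match_template(image, template, cx, cy, radius=16):
    """Exhaustively search for `template` within +/- `radius` pixels of
    its last match position (cx, cy); return the best (x, y)."""
    th, tw = len(template), len(template[0])
    best_score, best_pos = None, (cx, cy)
    for y in range(max(0, cy - radius), min(len(image) - th, cy + radius) + 1):
        for x in range(max(0, cx - radius), min(len(image[0]) - tw, cx + radius) + 1):
            patch = [row[x:x + tw] for row in image[y:y + th]]
            score = sad(patch, template)
            if best_score is None or score < best_score:
                best_score, best_pos = score, (x, y)
    return best_pos

# A template cut from a random image is found where it was taken from:
random.seed(1)
image = [[random.randrange(256) for _ in range(60)] for _ in range(60)]
template = [row[20:30] for row in image[20:30]]
assert match_template(image, template, 18, 18, radius=8) == (20, 20)
```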
If the templates must be updated, the processing is terminated and previous template locations are simply recorded for integration. More frequently, the templates must be rotated, scaled, and adjusted for image intensity variations for the next cycle. The processor updates the templates while the next image is being transferred to one of its memory banks.
[1] For efficient image packing/unpacking for the visual odometer machine, the C40s are programmed in assembly language to take advantage of parallel load/store instructions and all independent global, local, and on-chip data storage to keep the processing pipeline as full as possible. In addition, costly extra static RAM is incorporated in each C40 module to store packed and unpacked images. Intelligent hardware image packing and unpacking circuits were designed but not implemented.
3.4.4 Stereo Processing
C40-2 matches templates for stereo range measurement. Row 6 of Figure 3-6 shows its main processor activity. Image segments from both cameras are transferred to local memory by the DMA coprocessors. The stereo matching locates the (40x40) center template of the front camera image, received from C40-3, in a portion of the rear camera image, acquired from the A/D module. To reduce latency, the main processor unpacks the front camera image template while the search area in the rear camera image is being received. The cameras are accurately calibrated to limit the search area to a horizontal rectangle in the rear camera image. To minimize the search area, off-line stereo matching is used to align the cameras.
The matching process starts with a slow match to initially locate the template in a (72x100) search area and continues with fast matches searching for the template displaced by +/-16 pixels from the last successful match. The range measurement result is combined with the position estimates from C40-1 and transferred to C40-3 by a DMA channel.
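Given a calibrated stereo pair, the range follows from the matched horizontal disparity by triangulation. The baseline and focal length below are illustrative values, not those of the helicopter's cameras.

```python
def stereo_range(disparity_pix, baseline_m, focal_pix):
    """Triangulated range from the horizontal disparity between the two
    camera images: range = baseline * focal_length / disparity."""
    return baseline_m * focal_pix / disparity_pix

# A 30-pixel disparity with an assumed 0.5 m baseline and 600-pixel
# focal length corresponds to a 10 m range:
assert abs(stereo_range(30, 0.5, 600) - 10.0) < 1e-9
```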
3.4.5 Pixel Velocity Processing
C40-3 matches one template for velocity estimation based on image center pixel optical flow. Its processing events are shown by Row 10 of Figure 3-6. Filtered main camera images are transferred through the broadcast module by a DMA coprocessor. A predetermined central image region, encompassing a (40x40) template and a 16-pixel-wide search border, is used as the search area. Template matching begins immediately after this portion of the image is received by a DMA channel. While the matching process is being performed by the main processor, a DMA channel transfers the center template to C40-2 for stereo matching. The template transfer time is quite short in comparison with the A/D transfers since the C40 comm-ports have significantly higher (16-20 MB/s) transfer rates. Position, range, and velocity template matching results are combined in one packet and sent to C40-4 by a DMA channel.
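With a fresh field every sixtieth of a second, the image-center pixel velocity follows directly from the inter-field template displacement:

```python
FIELD_RATE_HZ = 60  # one fresh image field per camera shutter opening

def pixel_velocity(dx_pix, dy_pix):
    """Image-center pixel velocity, in pixels per second, from the
    template displacement between two consecutive fields."""
    return (dx_pix * FIELD_RATE_HZ, dy_pix * FIELD_RATE_HZ)

# A 2-pixel horizontal, 1-pixel vertical shift per field:
assert pixel_velocity(2, -1) == (120, -60)
```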
3.4.6 Attitude Compensation
C40-4 compensates for the effects of helicopter attitude variations on image displacement. Row 10 of Figure 3-6 shows C40-4’s processor activity. C40-4 receives sensor data sampled and tagged with each video line (~15 kHz) by the sensor bridge. The main processor filters the sensor data, with a latency of 100 video lines or 6.4 milliseconds, before compensating for the attitude variations.
To compensate for this filter latency, sensor data acquisition begins ahead of each shutter opening so that a precise attitude measurement is available at the moment the shutter opens.
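The timing budget can be illustrated with a moving-average stand-in for the sensor filter; the actual filter is not specified here, but its 100-line latency is, and it fixes how far acquisition must lead the shutter.

```python
LINE_PERIOD_US = 64   # NTSC video line duration
FILTER_LINES = 100    # filter latency in video lines (Section 3.4.6)

def moving_average(samples, n=FILTER_LINES):
    """Crude stand-in for the sensor filter; like any causal filter of
    this length, its output lags the newest samples."""
    window = samples[-n:]
    return sum(window) / len(window)

# The filter delay equals 100 lines x 64 us = 6.4 ms, so acquisition
# must lead each shutter opening by at least this much:
assert FILTER_LINES * LINE_PERIOD_US == 6400
assert moving_average([1.0] * 150) == 1.0
```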
3.5 Summary and Discussion
This chapter presented the second contribution of this dissertation: a configurable vision machine architecture for real-time, low-latency image processing. Its versatile communication scheme and modular design are instrumental in its configurability to different applications. The architecture can be tuned for low latency or high throughput to best meet application requirements at optimal system size and cost. Yet this configurability has some drawbacks. Machine event scheduling and programming can be difficult. The system designer must keep track of each module’s capabilities and limitations to arrange an optimal execution order and
“just-in-time” arrival of various data at different modules. The designer may have to experiment with many different configurations until the optimal system is realized.
The flexibility and power of the architecture are demonstrated by the visual odometer machine, which integrates image acquisition and display, DSP processing elements, central synchronization, image convolution, and external sensing. The visual odometer machine successfully maintained helicopter position and velocity estimates at a 60 Hz update rate with 26-millisecond latency.
The design and implementation of the visual odometer for helicopter positioning were the first steps in building an autonomous vision-guided helicopter. Indoor flight tests using the six degree-of-freedom testbed demonstrated promising results in vision-based helicopter positioning and opened the path to more ambitious outdoor free flight experiments.