An Autonomous Vision-Guided Helicopter
Omead Amidi
August 1996
Department of Electrical and Computer Engineering
Carnegie Mellon University
To provide this level of sensor integration for the visual odometer, a helicopter system including vision, navigational sensors, and control must be equipped with accurate event synchronization. Most critical is a tagging system to label incoming images with synchronized sensor data, and flexible external interfaces to capture data from different sensing devices.
Chapter 3. A Real-Time and Low Latency Visual Odometer Machine

Physical compactness: An on-board implementation of the visual odometer must be compact and lightweight, and must use power efficiently, to be carried on-board a small helicopter. Computational power and data throughput resources must be tailored to the needs of the odometer to optimize the use of available on-board space, payload, and power.
3.2 Visual Odometer Architecture
The lack of commercially available vision systems capable of meeting the requirements presented earlier motivated the development of a new architecture for real-time and low latency vision to implement the visual odometer machine. This section describes this architecture by focusing on its two distinguishing features: a decentralized communication scheme and modular structure.
3.2.1 Decentralized Communication
The architecture uses a network of decentralized, high-speed, asynchronous communication links which serve as the system’s arteries instead of a shared global bus. The same links carry system control packets for initial boot-strapping, monitoring, and diagnostics.
There are a number of advantages to this approach. Communication rates among system modules are consistently predictable since the links are independent and can operate without interruptions.
Furthermore, module additions or deletions do not affect the communication bandwidth of other system modules. In fact, different modules can be tested individually or bypassed in the processing pipeline to pinpoint trouble spots.
The communication scheme also reduces latency by eliminating large synchronous frame stores typically present in vision systems. Images can flow to all processing elements which internally store and process only relevant image segments as early as possible. This feature is critical to reducing processing latency for matching operations. For instance, as images are digitized line by line, a processor need not wait for the arrival of the entire image before locating a template which was previously observed near the top of the image.
Processing incoming images without a frame store requires carefully balancing incoming image traffic with module processing capabilities. Modules must asynchronously keep up with the large volume of data which is continuously sampled from the synchronous camera image signal. The communication scheme of the architecture addresses this issue by employing intelligent communication port (comm-port) interfaces and data broadcasters.
Each module supports a communication interface at each connection site or port. These interfaces support small queues to eliminate the effects of uneven input/output data rates of the asynchronous links which must continuously accept synchronous image data. The queues provide temporary storage to even out data transfer surges, thus allowing modules to receive data at constant predictable rates. The size of these port queues depends on the data transfer variation and must be carefully selected by the system designer.
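The queue-sizing consideration can be sketched as a small behavioral model. The rates, queue depth, and class names below are illustrative assumptions, not figures from the thesis hardware:

```python
from collections import deque

class PortQueue:
    """Small FIFO at a comm-port: absorbs bursts from the synchronous
    camera side so the receiving module can read at a steady rate."""
    def __init__(self, depth):
        self.depth = depth
        self.fifo = deque()
        self.peak = 0        # deepest fill level seen (guides sizing)
        self.dropped = 0     # overflow count (must stay 0 in operation)

    def push(self, word):
        if len(self.fifo) < self.depth:
            self.fifo.append(word)
            self.peak = max(self.peak, len(self.fifo))
        else:
            self.dropped += 1

    def pop(self):
        return self.fifo.popleft() if self.fifo else None

# Toy timing: during an active video line the camera delivers 2 words
# per tick while the processor drains 1; during blanking, 0 arrive.
q = PortQueue(depth=64)
for line in range(10):
    for _ in range(40):            # active interval: net +1 word/tick
        q.push("px"); q.push("px"); q.pop()
    for _ in range(50):            # blanking interval: queue drains
        q.pop()

print("peak fill:", q.peak, "dropped:", q.dropped)
```

The peak fill level, roughly the surplus accumulated over one active line, is the minimum safe queue depth; sizing below it would drop pixels during the active interval.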
The communication scheme also supports a data broadcasting capability to cope with applications demanding higher communication bandwidth and/or computational power than any single module can provide. Data broadcasters transfer multiple copies of the same data from one comm-port in parallel to multiple processors to minimize processing latency. The broadcasters can support their own port interfaces with data queues to even out transfers to each receiving module. The prototype vision machine shown in Figure 3-1 employs one broadcaster to divide the velocity and position estimation tasks between two DSP modules.
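A minimal sketch of the broadcaster's fan-out behavior (the class and port names are invented for illustration; the real broadcasters are hardware):

```python
from collections import deque

class Broadcaster:
    """One input comm-port copied in parallel to several output ports,
    each with its own small queue to even out transfers."""
    def __init__(self, n_outputs, depth=16):
        self.outputs = [deque(maxlen=depth) for _ in range(n_outputs)]

    def broadcast(self, word):
        for port in self.outputs:      # same word to every receiver
            port.append(word)

# As in the prototype: one image stream duplicated to two DSP modules,
# one estimating velocity and the other position.
b = Broadcaster(n_outputs=2)
for px in range(8):
    b.broadcast(px)

assert list(b.outputs[0]) == list(b.outputs[1]) == list(range(8))
```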
3.2.2 Modular Architecture
The decentralized communication scheme works hand-in-hand with a decentralized modular processing architecture. Interconnected via high-speed links, system modules incorporate local intelligence to perform complex tasks orchestrated by one external real-time controller. Each module is treated as a raw source of data or computation with timing, data flow, and synchronization predetermined before machine operation begins. System supervision by one central controller reduces the complexity of individual modules and allows compact and low cost implementations of most system modules. The
controller handles complicated non-vision tasks, such as external communication and user interfaces, which are typical sources of processing uncertainty.
The system architecture relies on predictable vision processing latency and timing of each module. Each module is rated for its computational power and bandwidth to perform a specific vision task. Following this rating system, existing modules of varying throughput and computational power can be employed in the system, or new modules can be developed to optimize systems for different applications.
Using all available modules as their tool-box, system designers can build systems with varying throughput and latency by expanding the processing flow vertically or horizontally as shown in Figure 3-2. If latency is not important, high throughput can be achieved by a long horizontal chain of modules connected as a pipeline, with each stage performing an image processing step. On the other hand, if latency is critical to the application, modules can be arranged vertically to operate in parallel.
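The trade-off between the two expansion directions can be made concrete with a back-of-the-envelope model; the stage times below are illustrative, not measured:

```python
def pipeline(stage_times):
    """Horizontal chain: a new frame enters every max(stage) seconds,
    but each frame traverses every stage before its result emerges."""
    return max(stage_times), sum(stage_times)   # (period, latency)

def parallel(total_time, workers):
    """Vertical arrangement: the same work split evenly across
    workers (assuming it divides cleanly), shrinking latency."""
    t = total_time / workers
    return t, t                                  # (period, latency)

# 40 ms of processing per frame, arranged four deep or four wide:
p_period, p_latency = pipeline([0.010] * 4)
v_period, v_latency = parallel(0.040, 4)
# Both sustain the same frame period, but the parallel arrangement
# delivers each result in a quarter of the pipeline's latency.
```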
The vision machine supports four types of modules which include: processing modules, interface or bridge modules, synchronization or timing modules, and broadcast modules. Processor modules provide raw computation for image processing. Interface or bridge modules connect the machine to external sources such as cameras or sensors. In addition, bridge modules allow communication with global busses or networks for standard communication with commercially available systems. Synchronization or timing modules generate timing signals for machine event scheduling. Finally, using the decentralized communication scheme, the broadcast modules carry out the data communication fan-out described earlier.
3.3 Components of the Visual Odometer Machine
The visual odometer machine is composed of a number of modules, including image A/D and D/A converters, image convolvers, powerful digital signal processing (DSP) elements, an image tagging and synchronization module, and external communication bridge modules. Figure 3-2 shows how these modules are interconnected to realize a prototype visual odometer machine. This section presents the underlying structure and the implementation details of each of these modules.
3.3.1 Image Acquisition
Image acquisition is fundamental to the operation of vision machines. The visual odometer machine acquires images from two cameras through two independent A/D converter modules. The modules sample the analog camera signals and output images digitally through their output comm-ports. The structure of the A/D module is shown in Figure 3-3.
The A/D module provides a generic image digitization facility with a few non-standard features:
programmable image sampling and synchronization, real-time configurable image blanking, and high-speed communication ports. These features were found to be extremely useful for high-speed image processing.
The module supports custom-designed circuitry to generate sampling clocks and control image blanking, in addition to an A/D converter and comm-port interfaces. The clock generation circuitry provides programmable image sampling and synchronization frequencies. Programmable image sampling can dramatically reduce image data traffic by matching the sampling frequency to the camera's CCD array resolution. This captures virtually the same image content in significantly smaller images. Conversely, all available pixels can be used when digitizing video signals with longer rows, as with high-resolution line cameras, or when capturing images from non-standard video sources such as variable-frequency cameras. These are important considerations, as image capture synchronized with rotor blade revolutions is a potential outdoor requirement of the system. The A/D module also supports a configurable image blanking controller circuit that allows the processing elements to select regions of interest in the image in real time, further reducing image data traffic.
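A small calculation illustrates the traffic saving from frequency matching. The line time, clock rates, and CCD resolution below are illustrative figures, not the thesis hardware's:

```python
# Active portion of one video line, in microseconds (illustrative).
line_time_us = 52.0

def samples_per_line(sample_mhz):
    """Pixels digitized per line at a given sampling clock."""
    return round(line_time_us * sample_mhz)

# Oversampling a 512-element CCD line at 14.3 MHz produces ~744
# samples per line; matching the clock to the CCD (~9.85 MHz)
# captures essentially the same content in 512.
wide = samples_per_line(14.3)
matched = samples_per_line(512 / line_time_us)
saving = 1 - matched / wide          # ~31% less image data traffic
```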
The output comm-port interface incorporates a small storage queue to even out data output traffic. The status of this queue is used as a means of image synchronization. A full queue indicates that the receiving module is not capturing images and data is simply thrown away. If the receiving module commences reading data, the A/D module blocks transfers until the start of the next valid image field to properly synchronize image transfers without explicit hardware connections.
It is assumed that the receiving module can keep up with the image data rate and the queues will never overflow during machine operation. The size of the output queue is chosen carefully to equalize the variable image traffic and processor input data rates during valid pixel and blanking intervals.
The implemented A/D module design incorporated an 8-bit BrookTree A/D converter supporting a built-in image look-up table (LUT), clock generator chips, and custom-designed state machines implementing the comm-port interface.
3.3.2 Image Convolution

Fast convolution is essential for image processing. In addition to edge detection and smoothing, matching and feature extraction can be performed using special convolution masks. As previously presented, the visual odometer relies on fast image smoothing to subsample images for efficient template matching. In addition, image smoothing by convolution reduces the significant noise from the helicopter power plant and electronics which corrupts the camera signals.
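A minimal sketch of the smooth-then-subsample step; a 2x2 box average stands in here for whatever mask the convolver is actually loaded with:

```python
def smooth_and_subsample(img):
    """Average each non-overlapping 2x2 block into one output pixel:
    halves resolution in both axes and suppresses pixel-level noise."""
    h, w = len(img), len(img[0])
    return [[(img[r][c] + img[r][c + 1] +
              img[r + 1][c] + img[r + 1][c + 1]) // 4
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]

img = [[10, 10, 80, 80],
       [10, 10, 80, 80],
       [30, 30, 50, 50],
       [30, 30, 50, 50]]
assert smooth_and_subsample(img) == [[10, 80], [30, 50]]
```

Applying the step repeatedly yields the coarser image levels used for efficient template matching.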
The visual odometer machine filters images from the A/D modules using a real-time image convolver module. An application-specific integrated circuit (ASIC) is employed for low latency image convolution. The convolver ASIC, a GEC Plessey 16488, can perform 8x8 convolutions on input image data arriving at 10 MHz which, in effect, delivers 640 MOPS. The convolution ASIC internally stores the 8x8 convolution mask and provides dedicated external expansion signals for increasing the mask size.
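The quoted throughput follows directly from the mask size and pixel clock:

```python
# An 8x8 mask requires 64 multiply-accumulates per output pixel;
# at a 10 MHz input pixel rate that is 640 million operations/s.
mask_ops = 8 * 8
pixel_rate_hz = 10_000_000
mops = mask_ops * pixel_rate_hz / 1e6
assert mops == 640.0
```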
For compact implementation, the visual odometer machine simply includes the convolution ASIC within the A/D module. To provide valid data near image borders, raw digitized images are transmitted to the convolver before image window blanking. The image convolution latency is only 22 pixel clocks: the convolver chip is internally pipelined, and its pipe stages are filled by the image lines above the region of interest, so there is no need to wait for the first 8 lines of the region itself.
3.3.3 DSP Processing Module

High speed processing of images with low latency requires fast computing capable of acquiring and processing images at high frequencies. There are a number of compact CPU platforms with such capabilities, including the SGS-Thomson Inmos T9000 Transputer, the Intel i860, and the Texas Instruments TMS320C40 Digital Signal Processor (C40).
The C40 is an ideal platform for image processing and is extensively used to implement the visual odometer machine. It is a powerful image processor for several reasons. The most significant is its high communication bandwidth through versatile communication ports (comm-ports) well-suited for high-speed image transfers. The C40 supports six asynchronous comm-ports, each rated at a 20 MBytes per second (MB/s) transfer rate; rates of 14-16 MB/s have been observed to be more typical.
(See  for a detailed analysis of the C40 comm-ports.) The C40 supports six DMA channels for high-speed data transfer. Each DMA channel has its own dedicated comm-port connection for high speed external data transfer. These dedicated connections help reduce the data traffic on the two main 100 Mb/s external 32-bit memory interfaces of the C40. The DMA channels can perform non-stop complex data transfers using their own set of programmable instructions stored as “link pointers.” Images can be split into pieces and transferred to other C40s without any CPU intervention.
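The “link pointer” mechanism can be sketched behaviorally. The descriptor fields below are invented for illustration; the real C40 DMA autoinitializes itself from control blocks in memory:

```python
class DmaDescriptor:
    """One block transfer plus a link to the next descriptor."""
    def __init__(self, src, count, link=None):
        self.src, self.count, self.link = src, count, link

def run_chain(desc, memory, out):
    """Walk the chain, copying each block; the 'CPU' is involved
    only in setting up the first descriptor."""
    while desc is not None:
        out.extend(memory[desc.src:desc.src + desc.count])
        desc = desc.link

# One 'image' split into two pieces sent back to back to another C40.
memory = list(range(100, 120))
second = DmaDescriptor(src=10, count=10)
first = DmaDescriptor(src=0, count=10, link=second)
received = []
run_chain(first, memory, received)
assert received == memory
```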
The processor also supports on-board high-speed memory for instruction cache and critical data storage. With careful resource management, incoming images from comm-ports can be stored in independent SRAM banks, allowing the processor uninterrupted access to the image during image processing operations. For fast processing, the DMA channels can store image portions of interest in zero-wait-state SRAMs (20 ns access time) instead of the more traditional slow VRAM-based frame buffers.
Most CPU floating point operations, such as 32-bit multiplication, are single-cycle instructions (one instruction cycle spanning two clock periods). Current C40s are clocked at 60 MHz, with 80 MHz versions planned by the end of 1996. The C40 is rated at 275 MOPS and 320 MB/s data throughput.
3.3.4 Module Synchronization

Real-time image processing requires accurate event synchronization. The visual odometer machine relies on accurate synchronization to schedule image processing operations and to coordinate image acquisition with helicopter attitude measurement. The machine supports a central synchronization generator (sync generator) module to govern the processing and data acquisition operations with each camera image capture.
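The sync generator's role in pairing images with attitude samples can be sketched as follows; the interfaces and values are invented for illustration:

```python
# Each sync pulse triggers an image capture and stamps it with the
# attitude sample taken at the same instant, so the odometer can
# later compensate the image for helicopter rotation.
attitude_log = {}    # timestamp -> (roll, pitch, yaw)
image_log = {}       # timestamp -> image identifier

def on_sync_pulse(t, attitude, image_id):
    attitude_log[t] = attitude
    image_log[t] = image_id

on_sync_pulse(0, (0.00, 0.10, 1.50), "img0")
on_sync_pulse(33, (0.12, 0.09, 1.52), "img1")

# Any image can now be matched to the attitude at its capture time:
assert attitude_log[33] == (0.12, 0.09, 1.52)
assert image_log[33] == "img1"
```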