EXTRACTION OF CONTEXTUAL KNOWLEDGE AND AMBIGUITY HANDLING FOR ONTOLOGY IN VIRTUAL ENVIRONMENT

A Dissertation by HYUN SOO LEE
Using fuzzy color-based over-segmentation, a 2D input image is segmented into over-segmented regions and the initial Metaearth architecture is constructed (Figure 34). As there is no context yet, each vertex is linked to a single core, "Unknown". The following sections describe how the generated Metaearth architecture is intensified through the object merging process and the semantic merging process.
The process described in Section 4.2 can be considered the pre-processing stage for scene understanding, while the method suggested in this section constitutes its main stage.
As discussed in Section 2.2, general scene understanding techniques consist of two parts: over-segmentation and segment merging. For example, Gould et al. use a scan-line algorithm: after generating over-segmented regions, each region is compared with the regions on the same horizontal line using a pre-trained detector.
The main limitation of their approach and similar methods lies in the reliance on pre-trained learning mechanisms and in the form of the over-segmentation output. Since that output is still a set of segmented regions, comparing and merging the segments requires preprocessing such as a scan-line algorithm and takes considerable time.
Since Metaearth architecture is a type of graph, the merging process can be executed over the graph structure in a short time. Our objective is to merge connected regions of similar type into one. For example, consider segmenting and merging regions of three types (blue-filled, yellow-filled and white-filled) into three objects; Figure 35 contrasts the steps needed by the existing methods and by the suggested method.
Figure 35. Comparison of existing merging techniques and the suggested method.
In the suggested algorithm, this process is executed in a much shorter time because the Metaearth architecture is a graph structure with an adjacency matrix whose elements are BIs, as described in Definition 10. In this sense, Metaearth architecture is a well-suited data structure for object merging.
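The adjacency-matrix merging described above can be sketched as a union-find pass over region pairs. This is a minimal illustration with hypothetical region data, not the dissertation's exact BI-valued matrix from Definition 10; the `similar` predicate stands in for the merging conditions developed below.

```python
# A minimal sketch of graph-based object merging over an adjacency
# matrix. The region data and the `similar` predicate are illustrative.

def merge_regions(adjacency, similar):
    """Union-find merge: regions i, j collapse when they are adjacent
    (adjacency[i][j] == 1) and judged similar by `similar(i, j)`."""
    n = len(adjacency)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):          # O(M^2) pass over the matrix
            if adjacency[i][j] and similar(i, j):
                parent[find(i)] = find(j)

    return {find(i) for i in range(n)}     # surviving region labels

# Toy example: 4 regions; 0-1 adjacent & similar, 2-3 adjacent & similar.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
sim = lambda i, j: {i, j} in ({0, 1}, {2, 3})
print(len(merge_regions(adj, sim)))  # 4 regions merge into 2
```

Because the adjacency matrix is already in hand, no scan-line preprocessing is needed before pairs can be compared.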
Our remaining task is to determine which regions are of similar type. This determination uses the information stored in the vertices and edges shown in Figure 32. As other scene understanding techniques do, the suggested approach compares the color, edges and shapes of objects; this information is already measured and contained in the generated initial Metaearth architecture. In this process, three types of merging conditions are checked for each region.
From this data, the merging conditions are generated as:
In equation (1), the difference between the i-th region's R (k = 1), G (k = 2) and B (k = 3) fuzzy color intensities and the j-th region's R, G, B fuzzy color intensities is computed. If this difference is less than a threshold value, the color-based merging condition is met. Similarly, the edge-related condition and the shape-related condition are checked using the Metaearth architecture.
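The color-based condition can be sketched as a per-channel comparison against a threshold. This is one plausible reading of equation (1), using the threshold value 20 reported later in this section; the dissertation's exact aggregation may differ.

```python
# A sketch of the color-based merging condition in the spirit of
# equation (1): each R/G/B (k = 1, 2, 3) fuzzy-color intensity
# difference between region i and region j is compared against a
# threshold. The exact form in the dissertation may differ.

def color_merge_ok(rgb_i, rgb_j, theta=20):
    """True when every per-channel intensity difference stays below theta."""
    return all(abs(a - b) < theta for a, b in zip(rgb_i, rgb_j))

print(color_merge_ok((120, 64, 200), (110, 70, 190)))  # True
print(color_merge_ok((120, 64, 200), (10, 70, 190)))   # False: R differs by 110
```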
The shape-related condition is checked using the width ratio (equation (3)), the height ratio (equation (4)) and the co-linearity condition (equation (5)). Figure 36 shows the height ratio test and the co-linearity condition.
Figure 36. Height ratio test and co-linearity condition for object merging.
The width, height and angle are calculated from a region's principal component analysis (PCA)-based axes. Finally, these conditions are unified to determine whether to merge; equation (6) shows the parameters of the unified merging equation.
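The PCA-based measurements can be sketched directly from a region's pixel coordinates: the 2x2 pixel covariance yields the principal axes in closed form, and the shape conditions compare the resulting widths, heights and axis angles. The tolerance names here are illustrative, not the dissertation's parameters.

```python
import math

# A sketch of the PCA-based shape measurements behind equations (3)-(5):
# width, height and axis angle come from the eigen-decomposition of a
# region's 2x2 pixel covariance matrix. Tolerances are illustrative.

def pca_shape(pixels):
    n = len(pixels)
    mx = sum(x for x, _ in pixels) / n
    my = sum(y for _, y in pixels) / n
    sxx = sum((x - mx) ** 2 for x, _ in pixels) / n
    syy = sum((y - my) ** 2 for _, y in pixels) / n
    sxy = sum((x - mx) * (y - my) for x, y in pixels) / n
    # Closed-form eigenvalues of the 2x2 covariance matrix.
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
    l1, l2 = tr / 2 + disc, tr / 2 - disc
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)  # major-axis angle
    # 2*sqrt(eigenvalue) approximates the extent along each axis.
    return 2 * math.sqrt(l1), 2 * math.sqrt(max(l2, 0.0)), angle

def shape_merge_ok(a, b, ratio_tol=0.5, angle_tol=0.2):
    wa, ha, ta = pca_shape(a)
    wb, hb, tb = pca_shape(b)
    return (abs(wa / wb - 1) < ratio_tol       # width ratio, eq. (3)
            and abs(ha / hb - 1) < ratio_tol   # height ratio, eq. (4)
            and abs(ta - tb) < angle_tol)      # co-linearity, eq. (5)

# Two identical horizontal strips pass; a 90-degree-rotated strip fails
# the co-linearity test even though its width/height ratios match.
a = [(x, y) for x in range(10) for y in range(2)]
b = [(x + 20, y) for x in range(10) for y in range(2)]
c = [(y, x) for x in range(10) for y in range(2)]
print(shape_merge_ok(a, b), shape_merge_ok(a, c))  # True False
```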
However, the shape-related merging condition does not work well in a general image. Figure 37 shows a counter-example in which two regions with quite different shapes make it difficult to apply shape-based merging conditions.
In this particular example, the threshold values θ₁,ₖ = 20 (k = 1, 2, 3) and θ₂ = 100 are used. Figure 38 shows one merging scenario using equation (6).
Finally, 78 regions are merged into 34. Figure 39 shows that the vertices without any edge and the absorbed regions have been merged. The surviving regions are considered objects in the Metaearth architecture. Since our aim was to decrease the number of vertices in the virtual space layer and to simplify it, we call this result the "intensification process of the virtual space layer".
In summary, the “intensification process of virtual space layer” shows that Metaearth architecture has been refined and prepared for the semantic-merging process.
Unlike existing scene understanding techniques, the suggested method uses another merging process called the “semantic merging process”. This process is related to the generation of the ontology and mapping layers in Metaearth architecture, i.e. it represents the key process for generating virtual ontology. We execute it using the graph structure of Metaearth architecture.
Figure 40 shows how the semantic merging process handles non-adjacent regions. In Metaearth architecture, regions with similar fuzzy color that are not adjacent are extracted and the shape-merging conditions are checked using equations (3), (4) and (5). If these conditions are met, the regions are concluded to have the same context.
Figure 41 shows an example of semantic merging. Two parts representing sky are not adjacent. After detecting the same fuzzy color and checking the merging condition, two regions are combined into one ontology core.
Figure 42 shows the finalized Metaearth architecture generated from the original 2D image. As a result of semantic merging, 18 ontology cores are extracted; these coincide exactly with the 18 contexts of the original image.
Figure 42. Metaearth architecture after semantic merging.
The next task is to map these contexts to the extracted ontology cores. The following section discusses the complexity analysis, the performance of the suggested approach and a case study.
The suggested approach extracts virtual ontology from a 2D scene, with Metaearth architecture applied as the data structure. Figure 43 illustrates the overall procedure of the suggested method, which we now describe.
The suggested algorithm is executed in polynomial time O(M²), where M is the number of vertices in the virtual space layer. The algorithm's complexity is summarized in Table 7.
The fuzzy segmentation process is the most time-consuming of the subprocesses. Compared with the O(M²) complexity of the method of Gould et al. (one of the better scene understanding algorithms), the suggested algorithm also provides good performance.
The quality of the generated Metaearth architecture depends on parameters such as the size of noise regions and several parameters of the merging conditions (see Sections 4.3 and 4.4). To check the influence of each parameter, we conduct a sensitivity analysis using different parameter combinations. We use the image of Figure 16(a) as the input image; Table 8 compares the results with the ground truth.
Table 8. Ground truth and average color information of Figure 16(a).
Table 9 shows the degree of coincidence between the ground truth and the suggested approach under different combinations of parameters. Combinations of the threshold values for determining noise regions (Figure 25) and for the color-based object-merging condition (equation (1)) are used, followed by an analysis of variance (ANOVA).

Figure 44(a) shows the result of the two-way ANOVA test; the columns correspond to the size-related threshold value and the rows to the R/G/B color-based threshold values. The result rejects both null hypotheses ("color-related threshold values do not affect the performance of the suggested algorithm" and "size-related threshold values do not affect the performance of the suggested algorithm"). This means that the combination of threshold values is crucial for obtaining a well-structured virtual ontology. To compare the significance of the two types of threshold values (color-related and size-related), the response surface method is applied, as shown in Figure 44(b); it indicates that the color-related threshold values are more significant than the size-related threshold value.
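The F statistics behind such a two-way ANOVA (without replication) can be sketched directly from a coincidence grid: rows are size thresholds, columns are color thresholds, and the row and column sums of squares are each tested against the residual. The 3x3 grid below is an illustrative subset, not Table 9's full data, and real hypothesis testing would compare these F values against an F distribution's critical values.

```python
# A sketch of two-way ANOVA without replication, as used in the
# sensitivity analysis: F statistics for the row (size-threshold) and
# column (color-threshold) effects. The grid is illustrative only.

def two_way_anova(table):
    r, c = len(table), len(table[0])
    grand = sum(map(sum, table)) / (r * c)
    row_m = [sum(row) / c for row in table]
    col_m = [sum(table[i][j] for i in range(r)) / r for j in range(c)]
    ss_row = c * sum((m - grand) ** 2 for m in row_m)
    ss_col = r * sum((m - grand) ** 2 for m in col_m)
    ss_tot = sum((table[i][j] - grand) ** 2
                 for i in range(r) for j in range(c))
    ss_err = ss_tot - ss_row - ss_col          # residual (interaction) term
    df_err = (r - 1) * (c - 1)
    f_row = (ss_row / (r - 1)) / (ss_err / df_err)
    f_col = (ss_col / (c - 1)) / (ss_err / df_err)
    return f_row, f_col

# Illustrative coincidence percentages (rows: size thresholds,
# columns: color thresholds).
table = [[75.3, 79.8, 68.5],
         [67.3, 77.8, 66.8],
         [47.3, 50.0, 42.1]]
f_row, f_col = two_way_anova(table)
print(round(f_row, 1), round(f_col, 1))
```

Large F values lead to rejecting the corresponding "no effect" hypothesis, matching the conclusion drawn from Figure 44(a).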
Table 9. Coincidence degree (%) between ground truth and the suggested approach using different parameters.
Size threshold | Coincidence (%) across the twelve color-related threshold combinations
300  | 75.3 79.8 81.0 81.1 77.3 68.5 93.2 86.2 81.2 77.9 76.5 73.2
500  | 67.3 77.8 77.9 78.1 71.0 66.8 82.3 80.1 80.3 78.2 80.7 81.5
1200 | 47.3 50.0 55.6 55.7 44.4 42.1 61.3 60.8 58.4 55.6 56.3 55.6
1500 | 45.0 50.3 50.0 48.3 46.0 46.1 52.3 52.1 51.3 50.7 51.9 51.2
2000 | 22.2 38.9 38.9 38.9 34.0 27.7 42.2 40.3 39.6 38.9 39.5 39.5
5000 | 11.5 22.2 22.2 22.2 22.2 11.1 22.2 22.2 22.2 22.2 22.2 22.2
With this analysis, optimal combinations can be acquired and then used to generate Metaearth architectures for other input images. Figure 45 shows Metaearth generation using the Tsukuba image, which is commonly used for generating 3D disparity maps from left and right image pairs. Below, the left image of the Tsukuba pair demonstrates the effectiveness of the suggested approach.
Figure 45. Metaearth architecture from the Tsukuba image: (c) determination of K (K = 16); (d) initial fuzzy color-based over-segmentation; (e) initial Metaearth architecture; (f) image after object merging; (g) finalized Metaearth architecture after semantic merging.
A limitation of the suggested method is that the generated architecture has no Z depth, because only a single 2D image is used as input. When stereo or multiview images are used, the generated architecture will have Z depth and can be used in virtual interaction analysis. The following section describes the generation of virtual ontology with Z depth.
5. CONSTRUCTION OF VIRTUAL ONTOLOGY USING MULTIVIEW SCENES

This section discusses the generation of a model with virtual ontology. Existing 3D reconstruction techniques and their limitations are reviewed, and then a more effective method is suggested.
In general, Z-depth information is obtained using computer vision techniques such as stereovision/multiview approaches. The stereovision method generates a virtual model from two stereo images. Corresponding points are selected manually or automatically. Typically, the images must be adjusted to eliminate lens distortion and other forms of noise. The rectified images are then used to calculate the epipolar constraints and the fundamental matrix (in cases where the camera matrix is unknown).
From the fundamental matrix, two camera matrices with intrinsic and extrinsic parameters are obtained, and 3D depths are generated by triangulation.
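For a rectified stereo pair, the triangulation step reduces to the familiar depth-from-disparity relation Z = fB/d. The sketch below illustrates only this special case with made-up camera values; the general pipeline triangulates from the two recovered camera matrices.

```python
# A sketch of depth-from-disparity triangulation for a rectified
# stereo pair: Z = f * B / d, where f is the focal length (pixels),
# B the baseline, and d the horizontal disparity. The numbers below
# are illustrative, not from the dissertation's experiments.

def triangulate_depth(x_left, x_right, focal, baseline):
    """Depth of a corresponding point pair in a rectified stereo rig."""
    d = x_left - x_right              # disparity in pixels
    if d <= 0:
        raise ValueError("non-positive disparity: bad correspondence")
    return focal * baseline / d

# f = 700 px, B = 0.1 m, disparity = 14 px -> Z = 5.0 m
print(triangulate_depth(350.0, 336.0, focal=700.0, baseline=0.1))  # 5.0
```

The inverse relationship between disparity and depth is why matching errors of a pixel or two translate into large depth errors for distant objects.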
Figure 46 shows the procedure. As an example, we use the general 3D computer vision technique to reconstruct the image as shown in Figure 47.
Figure 46. Stereo vision process.
Figure 47. Reconstruction process: (a) left and right image pair; (b) calculation of the fundamental matrix and two epipolar constraints.
As shown in Figure 47(c), the regions are reconstructed using RANSAC, image rectification and scaling up to an affine transformation. However, the reconstructed regions are not satisfactory due to inaccurate reconstruction and distortions. In general, traditional computer vision-based 3D reconstruction approaches have several problems; Table 10 summarizes the issues.
Due to these issues, ambiguities are encountered in the 3D reconstruction process. There are several ambiguities that general stereovision/multiview approaches face. One is the ambiguity in distorted images, caused by lens distortion and various forms of image noise.
Figure 48 gives examples of the ambiguities in 3D reconstruction. Figures 48(a) and (b) show an example of the "ambiguity in distorted image": due to the camera lens's radial and axial distortion, the image pairs are distorted and two epipolar constraints of poor quality are generated. In Figures 48(c) and (d), the brightness of the left and right images differs; if the reconstruction algorithm is based on color-based correspondence matching, it may fail to find accurate corresponding points.
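The radial component of this distortion is commonly modeled with even powers of the distance from the image center. The sketch below shows that standard model with illustrative coefficients; real coefficients come from camera calibration, and the dissertation's experiments may use a different distortion model.

```python
# A sketch of the standard radial lens-distortion model behind the
# "ambiguity in distorted image": a normalized point (x, y) is scaled
# by 1 + k1*r^2 + k2*r^4. Coefficients here are illustrative.

def radial_distort(x, y, k1, k2):
    r2 = x * x + y * y
    s = 1 + k1 * r2 + k2 * r2 * r2   # distortion scale factor
    return x * s, y * s

# Barrel distortion (k1 < 0) pulls an off-center point inward.
print(radial_distort(0.5, 0.0, k1=-0.2, k2=0.05))
```

Points near the image edge (large r) are displaced most, which is why epipolar geometry estimated from unrectified wide-angle images degrades toward the borders.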
Figure 48. Stereo images with occluded/non-occluded regions: (a) image #1 taken from active vision; (b) image #2 taken from active vision.
Occlusion is another problem. Occluded regions can degrade the performance of correspondence matching methods to the point that the reconstructed regions cannot be identified as their original shapes. Even though there are many occlusion-handling algorithms, such as dynamic programming, existing approaches do not work efficiently in the presence of these ambiguities. The right-hand side of Figure 48(e) clearly shows that the vehicles are occluded.
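The dynamic-programming style of occlusion handling mentioned above can be sketched on a single scanline: each left-image pixel is either matched to a right-image pixel at a color-difference cost or marked occluded at a fixed penalty. The cost values and toy scanlines are illustrative, not from any specific published algorithm's parameters.

```python
# A sketch of scanline dynamic programming for occlusion-aware stereo
# matching: pixels may be matched or skipped (occluded) at a penalty.
# Costs and the toy scanlines below are illustrative.

def dp_match(left, right, occ=2.0):
    n, m = len(left), len(right)
    INF = float("inf")
    # cost[i][j]: best cost of aligning left[:i] with right[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            if i < n and j < m:   # match left[i] with right[j]
                c = abs(left[i] - right[j])
                cost[i + 1][j + 1] = min(cost[i + 1][j + 1], cost[i][j] + c)
            if i < n:             # left pixel occluded in the right view
                cost[i + 1][j] = min(cost[i + 1][j], cost[i][j] + occ)
            if j < m:             # right pixel occluded in the left view
                cost[i][j + 1] = min(cost[i][j + 1], cost[i][j] + occ)
    return cost[n][m]

left = [10, 10, 50, 90, 90]
right = [10, 50, 90, 90]        # one left pixel has no right-image match
print(dp_match(left, right))    # 2.0: a single occlusion, no mismatch
```

The minimum-cost alignment explains the occluded pixel explicitly instead of forcing a bad match, which is exactly the failure mode of plain correspondence matching described above.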
Given these ambiguities, traditional computer vision techniques have one major disadvantage: the generation of an atomic model. Due to the absence of virtual ontology, the generated virtual model can be used only for visualization.