Shape Based Image Retrieval Using Fused Features

In content-based image retrieval, shape is one of the most important discriminative features. In many applications, shape captures most of the perceptual information about the objects in an image, while colour and texture can often be omitted without affecting retrieval performance. Unfortunately, shapes may undergo significant changes, such as deformation, scaling, changes in orientation, noise, and partial occlusion. Accurate shape description therefore remains a challenging technical problem. This study analyses the problem experimentally on two standard datasets, MPEG-7 and KIMIA-99. Fourier Descriptors, Moment-Based Features, Hierarchical Centroids and Histogram of Oriented Gradients are used to extract features from the images in these datasets. The features are fused using Discriminant Correlation Analysis and Direct Concatenation, and the fused features give better results than the individual features, with an accuracy of approximately 90%.


Introduction
Shape is one of the essential elements of an image and plays a very important role in content-based search methods. Compared to texture or colour, shape can represent the entire object, but it requires a number of parameters to be represented explicitly. A good shape representation should be compact and retain the most important features of an object; the main requirements are compact features, high retrieval accuracy, general applicability, low computational time and robustness to noise. A well-designed shape descriptor stores an understandable representation that preserves the major transition information of the shape and can be compared efficiently. Consequently, finding efficient and meaningful shape descriptors is a challenge in shape recognition and retrieval. To address these issues, we propose shape-based image retrieval that uses more than one feature extraction method to improve accuracy without significantly increasing the computational cost.
The study shows that fusing several descriptors increases performance, using two fusion strategies: direct concatenation of features and discriminant correlation analysis. The following techniques are fused. Fourier descriptors: Fourier descriptors are one way of representing the boundary of a shape [33]. Hierarchical centroids: the original image is divided into sub-images by finding the centre of mass; the first step in shape-based image retrieval is to find the centre of mass of the object in question [30]. Histogram of Oriented Gradients: the gradients of an image are useful because the gradient magnitude is large around corners and edges, which are very important for the shape of any object [32]. Moment-based features: the basic moments can be the simple weighted average of the image intensities, the standard deviation or the variance [35].
Maria Abro et al.

Related work
In the area of shape representation, there are two major categories: contour-based approaches and region-based approaches. Region-based methods [1] depend on global features obtained from the shape; every pixel inside the shape is used to obtain the representation, for example moment invariants [2] [3] and geometric invariants, multi-scale Fourier description [4], Legendre moments [5] and Zernike moments [6]. Region descriptors are robust to noise, but they cannot capture many of the critical shape details that are significant for recognizing similar objects with minor deformation or occlusion. Contour-based methods describe an object's shape by its boundary information. Among them are boundary shape methods used in object representation, including splines [7] [8], chain code [9], polygon approximation [10] and curvature scale space [11]. More recent approaches include multiscale techniques [12] [13], shape context [14] [15], Fourier descriptors [16] [17] [18], contour flexibility [19], and others [20] [21]. We currently focus on contour descriptors, which differ from descriptors of the shape interior [22]. Shape descriptors can be combined to make them more effective, for example spectral [23] [24] and moment-based [25] descriptors. Shapes can be smoothed gradually with the method of Mokhtarian and Mackworth [26], and contour-based multi-scale description techniques include CSS (curvature scale space) [27] and MCC (multi-scale convexity concavity) [22]. Because performance depends on the quality of the shape decomposition, convex shape decomposition [28] [29] can be used.

Proposed Methodology
In this work, we evaluate the performance of several features, i.e. Fourier descriptors [16], Hierarchical Centroids [30], Moment-based descriptors [31] and Shape Context Descriptors [32]. Besides the individual evaluations, we show that fusing several descriptors increases performance; for this we tested two fusion strategies: direct concatenation of features and discriminant correlation analysis. In this section, we briefly explain the features and fusion strategies.

Fourier Descriptors
Fourier descriptors are one way of representing the boundary of a shape. Every boundary point (pixel) $(x, y)$ is converted to a complex number $x + jy$, and the Fourier transform is applied to the resulting sequence. Applying the inverse Fourier transform to the resulting coefficients recovers the original shape of the object. The computation of Fourier descriptors can be broken down into the following steps:
I. Compute the edges of the shape and find the edge coordinates.
II. Iterate over the edge coordinates clockwise and convert each coordinate $(x_n, y_n)$ to a complex value $z_n = x_n + j y_n$.
III. Compute the discrete Fourier transform of the complex-valued vector using the equation below:
$$F_k = \sum_{n=0}^{N-1} z_n \, e^{-j 2 \pi k n / N}, \qquad k = 0, 1, \dots, N-1 \qquad (1)$$
where $z_n$ is the complex value of the $n$-th edge pixel and $N$ is the total number of edge pixels.
The resulting Fourier descriptor, once its coefficients are suitably normalized, inherits several properties of the Fourier transform, i.e. translation, scaling and rotation invariance. Translation invariance means that the descriptor remains the same no matter where the edge coordinates lie in the image. Scale invariance means that the descriptor remains the same no matter how large or small the object is; for example, an apple can appear large or small depending on the camera location from which the image is captured. Similarly, rotation invariance means that the descriptor remains the same no matter how the object is oriented; for example, if one image shows a chair standing upright and another shows the chair fallen on the ground, the Fourier descriptor remains the same because the shape of the chair does not change after it has fallen through 90°. More details on Fourier descriptors can be found in [33] [16].
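The steps above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function name `fourier_descriptor`, the parameter `n_coeffs` and the normalization (dropping $F_0$, keeping magnitudes only, dividing by the largest magnitude) are assumptions, since the text does not say how the invariances are obtained. The contour is assumed to be already extracted and ordered.

```python
import cmath
import math


def fourier_descriptor(points, n_coeffs=4):
    """Normalized Fourier descriptor of an ordered, closed contour.

    points: list of (x, y) boundary coordinates in traversal order.
    Returns 2 * n_coeffs magnitudes (lowest positive and negative frequencies).
    """
    # Step II: z_n = x_n + j * y_n
    z = [complex(x, y) for x, y in points]
    n = len(z)
    # Step III: plain O(N^2) discrete Fourier transform, as in Equation (1).
    coeffs = [
        sum(z[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]
    # Assumed normalization (not specified in the text):
    #   skip F_0                        -> translation invariance
    #   keep magnitudes only            -> rotation / start-point invariance
    #   divide by the largest magnitude -> scale invariance
    mags = [abs(c) for c in coeffs[1:]]
    largest = max(mags)
    mags = [m / largest for m in mags]
    return mags[:n_coeffs] + mags[-n_coeffs:]
```

With this normalization, a contour and a translated, scaled and rotated copy of it should give the same descriptor up to rounding error.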

Hierarchical Centroids
The hierarchical centroids approach divides the original image into sub-images by finding the centre of mass. In shape-based image retrieval, the first step is to find the centre of mass of the given object. The hierarchical centroids algorithm [30] takes an image as input and computes the x-coordinate of the centre of the object. Based on this x-coordinate, it divides the image into sub-images, and the process continues recursively until the desired level of centroids is reached. The resulting centroids are stored together to form a descriptor. These features play a vital role in identifying the shape of an object, because each object may have a unique shape. To make the descriptor more robust, the original image is rotated by several angles and a descriptor is computed separately for each transformed image.
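The recursion can be illustrated with a short, dependency-free Python sketch. It is a simplified reading of the description above, not the algorithm of [30] itself: the names `hierarchical_centroids` and `_split` are hypothetical, the image is assumed to be a binary mask stored as a list of rows, and transposing the halves so that successive levels alternate between the two axes is an assumption.

```python
def hierarchical_centroids(image, depth):
    """Collect normalized split positions from recursive centre-of-mass splits.

    image: list of rows with non-negative pixel values (e.g. a binary mask).
    depth: number of recursion levels; the result has 2**depth - 1 entries.
    """
    features = []
    _split(image, depth, features)
    return features


def _transpose(image):
    return [list(col) for col in zip(*image)]


def _split(image, depth, features):
    if depth == 0:
        return
    width = len(image[0]) if image else 0
    col_mass = [sum(row[c] for row in image) for c in range(width)]
    total = sum(col_mass)
    if total > 0:
        # x-coordinate of the centre of mass of this region
        cx = sum(c * m for c, m in enumerate(col_mass)) / total
    else:
        cx = width / 2  # empty region: fall back to the geometric middle
    features.append(cx / width if width else 0.0)
    cut = int(cx + 0.5)  # column index where the region is divided
    left = [row[:cut] for row in image]
    right = [row[cut:] for row in image]
    # Assumption: transpose so the next level splits along the other axis.
    _split(_transpose(left), depth - 1, features)
    _split(_transpose(right), depth - 1, features)
```

For a centred 4 x 4 square inside an 8 x 8 mask, the first entry is 3.5 / 8 = 0.4375, the normalized x-coordinate of the centre of mass.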

Moment based Features
The use of image moments to extract useful image properties from the image intensities has been studied extensively [34] [31] [35] [36]. The basic moments, often referred to as raw moments, can be the simple weighted average of the image intensities, the standard deviation, the variance, etc. In this research, we employ the seven invariant moments, which are invariant to scale, rotation and translation.
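As an illustration of how such invariants are built from raw moments, the sketch below computes central moments, normalizes them, and forms the first two of the seven invariants. It assumes that the seven invariants are Hu's moment invariants, which the text does not state explicitly; the function names are hypothetical, and only two invariants are shown to keep the example short.

```python
def central_moment(image, p, q):
    """Central moment mu_pq of a grey-level image given as a list of rows."""
    m00 = sum(sum(row) for row in image)
    x_bar = sum(x * v for row in image for x, v in enumerate(row)) / m00
    y_bar = sum(y * v for y, row in enumerate(image) for v in row) / m00
    # Measuring coordinates from the centroid gives translation invariance.
    return sum(
        (x - x_bar) ** p * (y - y_bar) ** q * v
        for y, row in enumerate(image)
        for x, v in enumerate(row)
    )


def first_two_invariants(image):
    """First two moment invariants (assumed here to be Hu's phi_1 and phi_2)."""
    mu00 = central_moment(image, 0, 0)

    def eta(p, q):
        # Normalized central moment: divides out the dependence on scale.
        return central_moment(image, p, q) / mu00 ** (1 + (p + q) / 2)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2
```

Shifting a shape inside the image, or rotating the image by 90°, leaves both values unchanged; scale invariance holds only approximately on a discrete pixel grid.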

Shape Context based Features
The basic shape features can be the edges and contours of the object in an image, but in this work we exploit a more powerful descriptor that has been popular over the last decade, known as Histogram of Oriented Gradients (HoG) [32]. The image gradients (first-order derivatives in the x and y directions) are useful because the gradient magnitude is large around corners and edges, and edges and corners are very important for the shape of any object. HoG computes histograms of these oriented gradients over small windows and accumulates them to form the descriptor.
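The idea can be sketched as follows. This is a simplified HoG-style descriptor, not the full method of [32]: it omits block normalization and bin interpolation, the function name `hog_like` is hypothetical, and the cell size of 4 pixels and the 9 unsigned orientation bins are assumed values, since the window size is not given in the text.

```python
import math


def hog_like(image, cell=4, bins=9):
    """Simplified histogram-of-oriented-gradients descriptor.

    image: list of rows of grey values. cell and bins are assumed defaults.
    """
    rows, cols = len(image), len(image[0])

    def px(r, c):
        # Clamp coordinates so border pixels reuse their nearest neighbour.
        return image[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    descriptor = []
    for r0 in range(0, rows - cell + 1, cell):
        for c0 in range(0, cols - cell + 1, cell):
            hist = [0.0] * bins
            for r in range(r0, r0 + cell):
                for c in range(c0, c0 + cell):
                    gx = px(r, c + 1) - px(r, c - 1)  # derivative in x
                    gy = px(r + 1, c) - px(r - 1, c)  # derivative in y
                    magnitude = math.hypot(gx, gy)
                    # Unsigned orientation in [0, 180) degrees.
                    angle = math.degrees(math.atan2(gy, gx)) % 180.0
                    index = min(int(angle / (180.0 / bins)), bins - 1)
                    hist[index] += magnitude  # vote weighted by magnitude
            descriptor.extend(hist)
    # L2-normalize the accumulated histograms.
    norm = math.sqrt(sum(v * v for v in descriptor)) or 1.0
    return [v / norm for v in descriptor]
```

For an image containing a single vertical edge, all votes fall into the first orientation bin of each cell, because the gradient there points along the x axis.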

Fusion Strategies
In this work, we compare two fusion strategies and show the importance of fusing complementary features by achieving state-of-the-art accuracy on two standard datasets, MPEG-7 [37] and Kimia-99 [38]. The first fusion strategy is direct concatenation of features, in which all four types of descriptors discussed earlier are directly concatenated and the Euclidean distance is used to evaluate the performance. In the second strategy, Discriminant Correlation Analysis (DCA) is employed [39] [40].

a. Direct Concatenation of Features
One of the basic fusion strategies is to combine all the features without any processing. In many cases this kind of fusion works better than the individual features, and our results also show that combining features gives better performance than individual features.
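A minimal sketch of this strategy is given below: the descriptors of one image are joined end to end, and retrieval ranks the database by Euclidean distance to the fused query vector. The function names `concatenate` and `retrieve` and the parameter `top_k` are hypothetical and used only for illustration.

```python
import math


def concatenate(*descriptors):
    """Join several descriptors of one image into a single feature vector."""
    fused = []
    for d in descriptors:
        fused.extend(float(v) for v in d)
    return fused


def retrieve(query, database, top_k=5):
    """Indices of the top_k database vectors closest to the query."""
    ranked = sorted(
        range(len(database)),
        key=lambda i: math.dist(query, database[i]),  # Euclidean distance
    )
    return ranked[:top_k]
```

Because the descriptors are joined without any processing, a descriptor with a larger numeric range contributes more to the distance; rescaling each descriptor first would be one possible refinement, although it is not part of the strategy described here.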

b. Discriminant Correlation Analysis
DCA is an effective feature fusion strategy that exploits the class structure of the features when correlating two feature sets. The method maximizes the correlation between corresponding features of the two sets and, at the same time, removes the correlation between different classes, so that the correlations are restricted to within each class. The benefit of this method is twofold: it takes the class structure into account, and its computational complexity is very low, which is required in practical applications such as shape-based image retrieval.
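The following NumPy sketch illustrates a DCA-style fusion of two feature sets in two stages: each set is first projected so that its between-class scatter becomes the identity, and the two projected sets are then transformed so that their cross-covariance becomes the identity. It is a simplified sketch under stated assumptions and not necessarily the exact procedure of [39] [40]: the function names, the mean-centring of the data and the final concatenation of the two transformed sets are assumptions, and `r` must not exceed the number of classes minus one.

```python
import numpy as np


def _whiten_between_class(X, labels, r):
    """Return W (p x r) with W.T @ S_b @ W = I, where S_b is the
    between-class scatter of X (p features x n samples)."""
    overall_mean = X.mean(axis=1, keepdims=True)
    columns = []
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        class_mean = Xc.mean(axis=1, keepdims=True)
        columns.append(np.sqrt(Xc.shape[1]) * (class_mean - overall_mean))
    Phi = np.hstack(columns)                    # S_b = Phi @ Phi.T
    evals, evecs = np.linalg.eigh(Phi.T @ Phi)  # small problem, ascending order
    evals, evecs = evals[-r:], evecs[:, -r:]    # keep the r largest
    return (Phi @ evecs) / evals                # divide column j by eigenvalue j


def dca_fuse(X, Y, labels, r):
    """Fuse feature sets X (p x n) and Y (q x n); returns a (2r x n) array."""
    labels = np.asarray(labels)
    # Assumption: centre both sets before projecting.
    X = X - X.mean(axis=1, keepdims=True)
    Y = Y - Y.mean(axis=1, keepdims=True)
    Xp = _whiten_between_class(X, labels, r).T @ X   # r x n
    Yp = _whiten_between_class(Y, labels, r).T @ Y   # r x n
    # Diagonalize the cross-covariance of the two projected sets.
    U, s, Vt = np.linalg.svd(Xp @ Yp.T)
    Xs = (U / np.sqrt(s)).T @ Xp
    Ys = (Vt.T / np.sqrt(s)).T @ Yp
    return np.vstack([Xs, Ys])                       # fusion by concatenation
```

After the second stage, the product of the two transformed sets, `Xs @ Ys.T`, is the identity matrix up to rounding error, so each fused feature of one set is correlated only with its counterpart in the other set.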

Experiments and Results
In this section, we explain all the experiments, followed by the results and findings. We assess the performance of the individual features and of all possible feature combinations fused with the two strategies explained in Section 3.5. To start with, the datasets used in these experiments are discussed below.

Datasets
To analyse and highlight the importance of feature fusion, two different datasets are used: MPEG-7 [37] and Kimia-99 [38].

Results and Discussion
We present the results of four different features, i.e. Fourier descriptors (FD), Hierarchical Centroids (HC), Moment-based features (MF) and Shape Context Features (SCF), and also evaluate two feature fusion strategies: Direct Concatenation (DC) of features and Discriminant Correlation Analysis (DCA).

a. Dataset MPEG-7
In this step, image1, image2, image3 and image5 show the (initial) query image of a 'Dog', which is used to retrieve the most similar images from the MPEG-7 dataset with FD, HC, MF and SCF. The other five images are the retrieved images. After obtaining the results of the previous step, as shown in the images, we fuse the features of all the techniques to achieve better accuracy. As above, the query image is then used to retrieve the most similar images from the MPEG-7 dataset with the fused techniques.

b. Dataset KIMIA-99
In the same way, image1, image2, image3 and image5 show the (initial) query image of a 'Cat', which is used to retrieve the most similar images from the KIMIA-99 dataset with FD, HC, MF and SCF. The other five images are the retrieved images. Again, after obtaining the results of the previous step, as shown in the images, we fuse the features of all the techniques to achieve better accuracy. As above, the query image is then used to retrieve the most similar images from the KIMIA-99 dataset with the fused techniques.

Evaluation Metrics
For the problem of shape-based image retrieval, the bull's eye score is the standard evaluation metric. To compute it, every shape in the dataset is compared with all the other shapes, and the number of shapes from the same class among the top 40 retrievals is counted. The final bull's eye retrieval rate is computed using the equation below:
$$\text{Bull's eye score} = \frac{\text{correctly retrieved}}{\text{highest possible number of correct retrievals}} \times 100\% \qquad (2)$$
where "correctly retrieved" is the total number of retrieved shapes that belong to the same class as the query, and the denominator is the highest possible value of this number. Alongside the bull's eye score, we also compute and report the Nearest Neighbour (NN) accuracy. Specifically, we report the top 1 and top 5 nearest-neighbour retrieval accuracies.
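The computation of the bull's eye score can be sketched as below. The function name `bulls_eye_score` is hypothetical, descriptors are compared with the Euclidean distance, and the query is assumed to remain in the ranked list, so that the best possible count per query is the size of its class (capped at `top`); the text does not state whether the query itself is counted.

```python
import math


def bulls_eye_score(features, labels, top=40):
    """Bull's eye score in percent.

    features: list of descriptor vectors; labels: list of class labels.
    """
    hits = 0
    best_possible = 0
    for q, query in enumerate(features):
        # Rank the whole dataset (query included, by assumption) by distance.
        ranked = sorted(
            range(len(features)),
            key=lambda i: math.dist(query, features[i]),
        )
        hits += sum(labels[i] == labels[q] for i in ranked[:top])
        # Highest possible number of same-class shapes among the top results.
        best_possible += min(top, labels.count(labels[q]))
    return 100.0 * hits / best_possible
```

On a toy dataset with two well-separated classes of three shapes each and `top=3`, every query retrieves its whole class and the score is 100%.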
In Table 1, the bull's eye score of the individual features is reported. The Shape Context features clearly outperform all others, and the performance of the moment-based features is very low. This behaviour was expected, because moment-based features are not very good at capturing the shape of an object. The results of fusing different combinations of features are provided in Table 2. The results of the DC fusion strategy correlate directly with the results of the individual features: as can be observed from the fusion of FD and HC, the bull's eye score of the DC approach is slightly higher than the individual scores, and a similar trend can be observed for all the other combinations of features. This trend does not hold for the DCA strategy. When two features capture two different properties of the images, the bull's eye score is very low. For example, FD captures the boundary of the shape and HC focuses on the centres of mass of the object; because they compute two different, complementary properties of the shape, the correlation between them is very low, and hence the DCA performance is also low. In contrast, the performance of the DCA strategy is higher where the features are correlated with each other. As can be observed from the results, when SCF is combined with itself and DCA is applied, the bull's eye score jumps from 65.62% to 80.55%.
Besides the bull's eye score, the top 1 and top 5 nearest-neighbour accuracies are reported in Table 3 and Table 4, respectively. The top 1 retrieval rate for all the features is 100%, which shows that the features evaluated in this study are well suited to the problem of shape-based image retrieval. However, the top 5 retrieval rate suggests that these features alone are not sufficient for a robust image retrieval system. The top 5 nearest-neighbour accuracy on the MPEG-7 dataset for both fusion strategies is tabulated in Table 5. All possible combinations of features are evaluated with both fusion strategies. Unlike the bull's eye score, the top 5 retrieval accuracies are higher. The best result achieved is 91.96% for the fusion of SCF with itself, and all the accuracies are higher than the results obtained with individual features, which highlights the importance of and need for fusing different features. As stated earlier, different features capture different properties of the shape of an object in an image, so fusing these properties increases the overall performance of the system in these experiments.
In Figure 3, qualitative results on a few images of the MPEG-7 dataset are illustrated. The top 5 retrievals of the DCA fusion of SCF features with itself are presented. The results suggest that the fusion of Shape Context features with itself is a good choice for the problem of shape-based image retrieval. DCA finds the correlation between the features and projects them to a new space where they are more separable.
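The text does not define how the top 1 and top 5 nearest-neighbour accuracies are computed. One plausible reading, sketched below, is the percentage of the k nearest shapes that share the class of the query, averaged over all queries. The function name `top_k_accuracy` and the flag `include_query` are hypothetical; whether the query itself stays in the database is left as a switch because the source does not say.

```python
import math


def top_k_accuracy(features, labels, k, include_query=True):
    """Percentage of the k nearest shapes that share the query's class."""
    hits = 0
    for q, query in enumerate(features):
        candidates = [
            i for i in range(len(features)) if include_query or i != q
        ]
        candidates.sort(key=lambda i: math.dist(query, features[i]))
        hits += sum(labels[i] == labels[q] for i in candidates[:k])
    return 100.0 * hits / (k * len(features))
```

In this sketch, with `include_query=True` and no duplicate descriptors, the nearest match of every query is the query itself, so the top 1 value is 100% by construction; setting `include_query=False` measures only matches to other shapes.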

Conclusion
In this study, we evaluate the performance of four different descriptors for the task of shape-based image retrieval: Fourier descriptors, Hierarchical Centroids, Moment-based features and Shape Context features. Moreover, two fusion strategies, Direct Concatenation of features and Discriminant Correlation Analysis, are evaluated for different combinations of features. The results show that the performance of separate features is lower than that of fused features, and that separate features are not enough to build a robust and accurate image retrieval system. Because each feature captures a different property of the objects in the images, combining these complementary features increases the overall performance of the system.

Figure 1. The MPEG-7 dataset consists of 70 classes with 20 images in each class. The Kimia-99 dataset contains only 9 classes with 11 images in each class. Both datasets have been widely used to measure the performance of shape-based image retrieval. Example shapes from the MPEG-7 and Kimia-99 datasets are shown in Figure 1 and Figure 2, respectively.

Figure 14. Qualitative results of DCA fusion of SCF with itself.

Table 1. Bull's eye score on the MPEG-7 dataset for each descriptor separately, where FD, HC, MF and SCF denote Fourier Descriptors, Hierarchical Centroids, Moment Features and Shape Context Features, respectively.
Shape Based Image Retrieval Using Fused Features EAI Endorsed Transactions on Internet of Things 10 2018 -01 2019 | Volume 5 | Issue 17 | e1

Table 2. Bull's eye score on the MPEG-7 dataset for both fusion strategies, DC (Direct Concatenation) and DCA (Discriminant Correlation Analysis), for different combinations of features.

Table 3. Top 1 Nearest Neighbour (NN) accuracy for all the features on the MPEG-7 dataset.

Table 4. Top 5 Nearest Neighbour (NN) accuracy for all the features on the MPEG-7 dataset.

Table 5. Top 5 Nearest Neighbour (NN) accuracy on the MPEG-7 dataset for both fusion strategies, DC (Direct Concatenation) and DCA (Discriminant Correlation Analysis), for different combinations of features.