An Intelligent Multi-resolution and Co-occurring local pattern generator for Image Retrieval

Content-based image retrieval (CBIR) is a methodology used to search for similar images across large repositories. Texture, color and shape are among the most prominent features of any CBIR system. Two texture descriptors, namely the Gray level co-occurrence matrix (GLCM) and the Discrete wavelet transform (DWT), have been utilized here to form a hybrid texture descriptor, denoted Co-DGLCM. To enhance the retrieval accuracy of the proposed system, a framework combining an Extreme learning machine (ELM) with Relevance feedback (RF) has also been used. The technique simultaneously provides the spatial relationships and the frequency information of co-occurring local patterns in an image. Two benchmark texture databases, namely Brodatz and MIT-Vistex, have been tested, and results are reported in terms of accuracy, total average recall and total average precision; the total average precision is 96.35% and 97.34% on the two databases, respectively.


Introduction
This paper presents an efficient hybrid texture descriptor, combined DWT-GLCM (Co-DGLCM), which combines the Gray level co-occurrence matrix (GLCM) and the Discrete wavelet transform (DWT). To improve the classification accuracy of the proposed system, an Extreme learning machine (ELM) has been used. Finally, Relevance feedback (RF) has been used to refine the inputs of the ELM classifier based on the user's feedback.

Motivation
The evolution of cyberspace technology and of digital repository techniques has led to a wide expansion of commercial and research applications in the field of digital images. Additionally, due to the enormous increase in multimedia content such as text, images, audio and video, storage requirements are a great concern. Therefore, many multimedia compression techniques and formats such as JPEG, MP3, H.263 and TIFF have been developed to address this issue, and compressed video coding standards such as H.264 and H.265 [1] can be used to conserve bandwidth. As multimedia content keeps increasing, the demand for data storage and retrieval techniques grows accordingly. To remove the bottleneck in image retrieval, an accurate and fast method is needed to precisely retrieve data from these vast multimedia storehouses, whose content can be images or text [2].
* Corresponding author. Email: shikpank@yahoo.com
Based on the database content, image retrieval can be categorized into two main types: (i) content-based and (ii) text-based. Text-based retrieval is the conventional method, but it suffers from several pitfalls. It is a clumsy process, because images have to be annotated with textual descriptors, and describing multimedia content with text is difficult; for example, complex images with intricate appearance are hard to describe textually. It is also prone to spelling errors and to ambiguities caused by homonyms. Therefore, content-based image recovery or retrieval (CBIR), based on the perceptible basic attributes of a given image, is key to this problem.
This work concentrates specifically on texture feature extraction techniques and the formation of an intelligent hybrid texture system. Image classification is also an important task, and many types of classifiers can be deployed for it. Here, a neural-network-based classifier has been implemented, whose results compare favorably with those of conventional classifiers such as the support vector machine (SVM) and the Bayesian classifier. Finally, Relevance feedback has been implemented to update the training images of the ELM classifier. A variety of distance metrics can also be employed to measure the similarity between a query image and the repository images.

Main Contributions
An effective technique that integrates the Gray level co-occurrence matrix (GLCM) and the Discrete wavelet transform (DWT), designated combined DWT-GLCM (Co-DGLCM), has been implemented in this paper to extract texture features. To make the proposed system intelligent and accurate, an ELM classifier has been used. Lastly, if the obtained results are below the desired threshold, Relevance Feedback is applied to refine the input of the classifier based on the user's feedback, and optimized results are obtained. The proposed technique has been analysed on the standard benchmark texture datasets Brodatz and MIT-Vistex, and the overall precision, recall and accuracy have been calculated. The rest of the paper is organized as follows. Section 1 discusses the need for an optimal storage and retrieval technique and the motivation behind this work. Related work is presented in Section 2. Feature extraction techniques are described in Section 3. Section 4 gives the architectural framework, with the distance metrics used for similarity calculation. Experimental analysis and outcomes are presented in Section 5. Finally, the conclusion and future trends are described in Section 6.

Related work
Images are an integral part of multimedia, and image retrieval has been an active research area for decades. Texture describes the discernible patterns in an image and their spatial arrangement; it can be extracted from the whole image or from a specific part of it. Anu Bala et al. proposed a system based on Local Texton XOR patterns (LTxXORP), in which texton patterns are used for shape analysis and an XOR operation is performed between a center pixel and its adjacent neighbors [3]. However, this technique does not include any machine-learning model for reducing the semantic gap.
Prithaj et al. [4] described the relationship between the intensity values of a center pixel and its neighboring pixels using Local neighborhood Intensity Patterns (LNIP), and compared it with Local Binary Pattern (LBP) [5][6], Local derivative pattern (LDP) and Local ternary pattern (LTP). Texture images have also been analysed using Local tri-directional weber patterns (LTriWPs) [7]. LBP produces very long histograms, which slows down recognition, and the outputs of LTP vary under gray-scale transformations of the intensity values. Therefore, the second-order statistical texture extraction technique GLCM is utilized in the proposed work.
Gabor wavelets have also been used to extract features from a facial dataset, with Fisher's discriminant-based indexing then used to reduce the dimensionality of the resulting feature vectors [8]. Ekta Walia et al. [9] reported the results of several local texture extraction methods applied to Log Gabor filter responses, and found the Log Gabor filter to perform better than the Gabor filter. Since Gabor wavelets produce high-dimensional feature vectors, another wavelet-domain technique, DWT, is utilized here, as it has lower memory requirements and a smaller feature vector.
Wei Zhang et al. [10] proposed a texture representation called the Normalized difference vector (NDV), used together with a Bag of Visual Words model. Two improved versions of the Local binary pattern (LBP) combined with the wavelet transform, namely the wavelet domain local statistical binary pattern (WLSBP) and the directional wavelet domain local statistical binary pattern (dWLSBP), have been used for texture classification [11]. However, these variants inherit the drawbacks of LBP noted above, which limits the efficiency of a retrieval system built on them.
A texture feature extraction technique called the Local quantized edge binary pattern (LQEBP) has been implemented in the HSV color space, where quantized patterns are used to extract local information [12]. Joydeb Kumar et al. [13] reported the extraction of spectral texture features using the power law transform (PLT). Das et al. [14] proposed an image retrieval system based on the Rotated wavelet transform (RWT), Discrete wavelet transform (DWT), Dual-tree complex wavelet transform (DT-CWT) and Dual-tree rotated complex wavelet filter (DT-RCWF), and their combinations. These structures are derived from the DWT by changing some of its attributes, whereas the proposed technique combines DWT with GLCM to produce an effective hybrid texture descriptor.
A Gabor-wavelet-based system for fetching texture data for browsing has been described by Manjunath et al. The application of machine learning to image retrieval has increased significantly in recent years, image classification has become a popular research area, and a variety of machine-learning classifiers have been utilized for image retrieval. Among them, the Extreme learning machine (ELM) is a feed-forward neural network with only a single hidden layer, yet it has been reported to give better results than other machine-learning classifiers [17]. Kaya et al. [18] proposed a texture-based retrieval system using LBP and GLCM independently, and deployed ELM classifiers to classify butterfly images. However, that system lacks Relevance feedback, which is added to the ELM model in the proposed work to retrain the ELM. Support vector machines (SVM) have also been used for classification, and Convolutional neural networks (CNN) have been used for feature extraction [19].
Refinement of the user's query is an important tool for improving the accuracy of a CBIR system. Therefore, Relevance feedback has played a prominent role in improving the query of a system based on the feedback obtained from the user.
Relevance feedback can also be combined with meta-heuristics; for example, the Sequential forward selector (SFS) meta-heuristic has been combined with relevance feedback using a single iteration, and many distance metrics have been tested and analysed in that work [20]. However, relevance feedback alone, without a machine-learning model, does not yield the desired results, so both concepts are utilized in the implemented technique.
Texture descriptors in the literature do not provide frequency information about the co-occurring local patterns of an image. The proposed texture descriptor, in contrast, is an intelligent and normalized fusion descriptor that yields an effective and accurate system with multiple resolutions and strong inter-pixel relationships.

Feature Extraction Techniques
To extract texture features, two different techniques, the Gray level co-occurrence matrix (GLCM) and the Discrete wavelet transform (DWT), have been utilized here. To improve the performance of the CBIR system, an efficient integration of the two techniques (Co-DGLCM) is developed, and improved results are obtained. To improve accuracy further, an Extreme learning machine (ELM) is employed as a classifier. Lastly, if the attained results are below the desired value, Relevance feedback is used to refine the training of the ELM, and enhanced results are obtained.

Discrete Wavelet Transform
For analysing signals in the frequency domain, Fourier-based transforms such as the Discrete Fourier transform (DFT) [21] and the Discrete cosine transform (DCT) [22] were used initially. However, these traditional methods are deficient, as they do not convey the local information of an image. Also, images reconstructed by these techniques are of poor quality, especially at the edges, because of the presence of high-frequency components [23]. In the wavelet domain, both the Gabor wavelet and the Discrete wavelet transform are highly prominent, but because of its large feature vectors the Gabor wavelet takes more time in image analysis than DWT. The Discrete wavelet transform is therefore one of the techniques applied here, as it offers: (1) lower memory requirements due to a small feature vector dimension, (2) multi-resolution analysis and (3) precise feature extraction.
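A one-dimensional DWT decomposes an input f(x) ∈ L²(ℝ) with respect to a wavelet function ψ(x) and a scaling function φ(x) [24], and is written as

$$ f(x) = \frac{1}{\sqrt{M}} \sum_{k} W_{\varphi}(j_0,k)\,\varphi_{j_0,k}(x) + \frac{1}{\sqrt{M}} \sum_{j=j_0}^{\infty} \sum_{k} W_{\psi}(j,k)\,\psi_{j,k}(x) $$

and

$$ W_{\varphi}(j_0,k) = \frac{1}{\sqrt{M}} \sum_{x} f(x)\,\varphi_{j_0,k}(x), \qquad W_{\psi}(j,k) = \frac{1}{\sqrt{M}} \sum_{x} f(x)\,\psi_{j,k}(x) $$

where M is the number of samples of f(x), φ_{j,k}(x) = 2^{j/2} φ(2^j x − k) and ψ_{j,k}(x) = 2^{j/2} ψ(2^j x − k). Here, W_φ(j0,k) and W_ψ(j,k) denote the scaling and wavelet coefficients, respectively.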
A two-dimensional wavelet transform can be computed by applying a 1-D filter bank to each column of an image and then applying the same 1-D filter bank to each row of the resulting coefficients; this row-and-column filtering is repeated until the desired wavelet transform is obtained. After the first stage of decomposition, one low-pass sub-image (LL1) and three high-pass sub-images (HL1, LH1 and HH1) are obtained [25]. The three high-pass sub-images are a high-low frequency vertical band (90º) (HL1), a low-high frequency horizontal band (0º) (LH1) and a high-high frequency diagonal band (45º) (HH1). To achieve a higher level of decomposition, the procedure is repeated on each successive low-pass sub-image.
The low-pass sub-image is a lower-resolution approximation of the original input, while the high-pass sub-images represent changes in brightness in the vertical, horizontal and diagonal orientations. The three-level wavelet decomposition of an input image Z = I(x,y) of size m x n pixels is shown in the corresponding figure.
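As an illustration of the column-then-row filtering described above, the following NumPy sketch implements one and several levels of a 2-D DWT using the Haar wavelet. The choice of the Haar wavelet, the function names, the hypothetical `dwt_features` summary (mean and standard deviation of each sub-band) and the LH/HL labelling (conventions differ between texts) are all assumptions; the text does not state which wavelet or sub-band statistics are used, and a library such as PyWavelets would normally be used in practice.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_dwt2(img):
    """One level of an orthonormal 2-D Haar DWT (assumed wavelet).

    Columns are filtered first (pairs of adjacent rows), then rows
    (pairs of adjacent columns). Returns LL and the three detail sub-bands.
    """
    x = np.asarray(img, dtype=float)
    if x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("image height and width must be even")
    # 1-D low-pass / high-pass Haar filtering down each column.
    low = (x[0::2, :] + x[1::2, :]) / SQRT2
    high = (x[0::2, :] - x[1::2, :]) / SQRT2
    # The same filter pair applied along each row of both results.
    ll = (low[:, 0::2] + low[:, 1::2]) / SQRT2
    lh = (low[:, 0::2] - low[:, 1::2]) / SQRT2  # naming conventions vary
    hl = (high[:, 0::2] + high[:, 1::2]) / SQRT2
    hh = (high[:, 0::2] - high[:, 1::2]) / SQRT2
    return ll, (lh, hl, hh)


def haar_wavedec2(img, levels):
    """Multi-level decomposition: repeat haar_dwt2 on each new LL band."""
    ll = np.asarray(img, dtype=float)
    details = []
    for _ in range(levels):
        ll, d = haar_dwt2(ll)
        details.append(d)
    return ll, details


def dwt_features(img, levels=3):
    """Hypothetical DWT feature vector: mean and std of |coefficients| per sub-band."""
    ll, details = haar_wavedec2(img, levels)
    bands = [ll] + [b for d in details for b in d]
    return np.array([f(np.abs(b)) for b in bands for f in (np.mean, np.std)])
```

Because the Haar filters used here are orthonormal, the total energy of the four sub-bands equals that of the input, which is a convenient sanity check; a 3-level decomposition yields 1 + 3x3 = 10 sub-bands and hence a 20-element feature vector in this sketch.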

Gray Level Co-occurrence Matrix
The prominent second-order statistical methods for extracting texture features are the Gray level co-occurrence matrix (GLCM), the Gray level difference matrix (GLDM) and the Gray level run length matrix (GLRLM) [26]. Among these, GLDM is very simple to compute and is based on the differences between the gray levels of pixel pairs, but its main drawback is that it varies with changes in gray-level variance. GLRLM [27], on the other hand, counts runs of consecutive pixels having the same gray level; it represents image patterns poorly and its computational cost is high. GLCM is therefore preferred here as the second-order statistical texture extraction technique, owing to its attributes: (1) rotation invariance, (2) diverse applications, (3) simple implementation and fast performance and (4) numerous resultant (Haralick) parameters.
GLCM is a traditional analytical technique for extracting texture features for the retrieval and classification of images [28]. It also computes the spatial correlation between pairs of pixels in a given image. An entry P_dis(m,n) of a co-occurrence matrix refers to pairs of pixels in which the first pixel has gray level m and the second pixel, separated from the first by a distance dis along a chosen angle, has gray level n. The resulting matrices describe the spatial frequencies of gray levels and thus the relationships among neighboring pixels at different distances [29].
The GLCM is therefore defined as

$$ P(i, j \mid \mathbf{d}, \theta) = P\big\{\, I(p_1) = i,\; I(p_2) = j \;\big|\; p_2 = p_1 + \mathbf{d} \,\big\}, \qquad \mathbf{d} = (\delta\cos\theta,\ \delta\sin\theta),\quad \theta \in \Theta \tag{4} $$

In equation (4), p1 and p2 denote positions in the gray-scale image I, P denotes probability, and Θ = {0º, 45º, 90º, 135º} is the set of angle directions. A GLCM is thus specified by the displacement vector d, with radius δ and orientation θ. A generalized GLCM is represented in Fig. 2.

Figure 2. A generalized GLCM
Thus, for a given test image with gray-tone values, the GLCM formed is its spatial gray-tone co-occurrence dependence matrix.
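A direct implementation of the co-occurrence count in equation (4) is sketched below with NumPy. It assumes the image has already been quantized to `levels` integer gray levels and uses the (row, column) steps in the hypothetical `ANGLE_OFFSETS` table for θ = 0º, 45º, 90º and 135º at distance δ (rows grow downwards, so 45º points up and to the right). Optional symmetric counting and normalization to probabilities are common choices, not details given in the paper.

```python
import numpy as np

# (row step, column step) per unit distance; image rows increase downwards.
ANGLE_OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(img, distance=1, angle=0, levels=8, symmetric=False, normed=True):
    """Gray level co-occurrence matrix P(i, j | d, theta) for one offset."""
    img = np.asarray(img, dtype=int)
    if img.min() < 0 or img.max() >= levels:
        raise ValueError("pixel values must lie in [0, levels)")
    dr, dc = (distance * s for s in ANGLE_OFFSETS[angle])
    rows, cols = img.shape
    P = np.zeros((levels, levels), dtype=float)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                # First pixel has gray level i, displaced second pixel level j.
                P[img[r, c], img[r2, c2]] += 1
    if symmetric:
        P = P + P.T  # count (i, j) and (j, i) together
    if normed and P.sum() > 0:
        P = P / P.sum()  # counts -> joint probabilities
    return P
```

For the 4x4 test image [[0,0,1,1],[0,0,1,1],[0,2,2,2],[2,2,3,3]] with δ = 1 and θ = 0º, the unnormalized matrix is [[2,2,1,0],[0,2,0,0],[0,0,3,1],[0,0,0,1]]. Computing the matrix for all four angles and averaging the resulting features is one common way to reduce direction dependence.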
Four prominent GLCM feature parameters are used to extract the texture content of an image [30]. Contrast measures the difference in intensity between a pixel and its neighbor over the whole image.
Correlation measures how strongly a reference pixel is related to its neighboring pixels over the whole image. It is computed from the means and standard deviations of the rows and columns of the matrix.
Homogeneity measures the closeness of the gray levels of neighboring pixels in the spatial domain, i.e. how concentrated the GLCM is around its diagonal.
Energy measures the uniformity of the gray-level distribution in an image.
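The four parameters above have standard closed forms in terms of the normalized GLCM P(i,j): contrast Σ P(i,j)(i−j)², correlation Σ P(i,j)(i−μ_i)(j−μ_j)/(σ_i σ_j), homogeneity Σ P(i,j)/(1+(i−j)²) and energy Σ P(i,j)² (the angular second moment). The NumPy sketch below computes them; it follows the common Haralick-style definitions (also used by scikit-image's `graycoprops`), except that energy is returned as the angular second moment, whereas scikit-image's `energy` is its square root. The exact variants used by the authors are not stated, so these definitions are assumptions.

```python
import numpy as np


def glcm_features(P):
    """Contrast, correlation, homogeneity and energy of a GLCM (assumed definitions)."""
    P = np.asarray(P, dtype=float)
    P = P / P.sum()  # ensure joint probabilities
    i, j = np.indices(P.shape)
    contrast = np.sum(P * (i - j) ** 2)
    homogeneity = np.sum(P / (1.0 + (i - j) ** 2))
    energy = np.sum(P ** 2)  # angular second moment
    # Means and standard deviations of the row (i) and column (j) marginals.
    mu_i, mu_j = np.sum(i * P), np.sum(j * P)
    sd_i = np.sqrt(np.sum(P * (i - mu_i) ** 2))
    sd_j = np.sqrt(np.sum(P * (j - mu_j) ** 2))
    if sd_i == 0 or sd_j == 0:
        correlation = 1.0  # degenerate (constant) case, a common convention
    else:
        correlation = np.sum(P * (i - mu_i) * (j - mu_j)) / (sd_i * sd_j)
    return np.array([contrast, correlation, homogeneity, energy])
```

A purely diagonal GLCM (every pixel pair has equal gray levels) gives contrast 0, correlation 1 and homogeneity 1, while an anti-diagonal 2x2 GLCM gives contrast 1 and correlation −1.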

Extreme Learning Machine
The Extreme learning machine (ELM) is a single-hidden-layer feedforward neural network (SLFN) designed for classification and regression applications. It was introduced by Huang et al. [17]. ELM has been successfully applied to many research problems such as pattern recognition, classification and fault identification. In ELM, the input weights are assigned randomly, and the output weights are then determined analytically. The algorithm uses a single hidden layer, and the number of hidden nodes is an important parameter to be chosen. ELM has many advantages over traditional state-of-the-art algorithms, such as minimal human intervention [31], fast learning, universal approximation capability, ease of use and support for various kernel functions [32]. The basic diagram of ELM is shown in Fig. 3.
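For N distinct training samples (x_j, t_j), where x_j = [x_j1, x_j2, …, x_jn]^T and t_j = [t_j1, t_j2, …, t_jm]^T, an SLFN with Ñ hidden nodes and activation function g(·) is modelled as

$$ \sum_{i=1}^{\tilde{N}} \boldsymbol{\beta}_i \, g(\mathbf{u}_i \cdot \mathbf{x}_j + v_i) = \mathbf{t}_j, \qquad j = 1, \dots, N \tag{9} $$

In the above equation, u_i = [u_i1, u_i2, …, u_in]^T is the weight vector connecting the input nodes to the i-th hidden node, and β_i = [β_i1, β_i2, …, β_im]^T is the weight vector connecting the i-th hidden node to the output nodes. The threshold (bias) of the i-th hidden node is v_i, and u_i · x_j denotes the inner product of u_i and x_j.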
The N equations above can be written concisely as

$$ \mathbf{H}\boldsymbol{\beta} = \mathbf{T}, \quad \mathbf{H} = \begin{bmatrix} g(\mathbf{u}_1\cdot\mathbf{x}_1 + v_1) & \cdots & g(\mathbf{u}_{\tilde{N}}\cdot\mathbf{x}_1 + v_{\tilde{N}}) \\ \vdots & \ddots & \vdots \\ g(\mathbf{u}_1\cdot\mathbf{x}_N + v_1) & \cdots & g(\mathbf{u}_{\tilde{N}}\cdot\mathbf{x}_N + v_{\tilde{N}}) \end{bmatrix}_{N\times\tilde{N}}, \quad \boldsymbol{\beta} = \begin{bmatrix} \boldsymbol{\beta}_1^T \\ \vdots \\ \boldsymbol{\beta}_{\tilde{N}}^T \end{bmatrix}, \quad \mathbf{T} = \begin{bmatrix} \mathbf{t}_1^T \\ \vdots \\ \mathbf{t}_N^T \end{bmatrix} \tag{10} $$

Here, H is the hidden-layer output matrix of the ELM. H is square, and generally invertible, when the number of hidden nodes Ñ equals the number of training samples N. Because the hidden-node parameters u_i and v_i can be assigned randomly, independently of the training data, the output weights β of this linear system are obtained by the least-squares method:

$$ \hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger}\mathbf{T} \tag{11} $$

where H† is the Moore-Penrose generalized inverse of H. Equation (11) shows that the output weights are computed in a single step, avoiding lengthy iterative training. The ELM algorithm can thus be summarized in three steps: (1) randomly assign the hidden-node parameters u_i and v_i, i = 1, 2, …, Ñ; (2) compute the hidden-layer output matrix H; (3) compute the output weights β = H†T. ELM thereby transforms a complex training problem into a simple linear one, and its speed and accuracy make it attractive compared with many other conventional methods.
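The three ELM steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: input weights and biases are drawn from U(−1, 1), class labels are one-hot encoded as T, `np.linalg.pinv` supplies H†, and the optional `"rbf"` mode (Gaussian nodes centred on randomly chosen training samples, with a hypothetical width parameter `gamma`) stands in for the RBF activation mentioned in Phase 3.

```python
import numpy as np


class ELM:
    """Minimal ELM classifier sketch: random hidden layer, least-squares output weights."""

    def __init__(self, n_hidden=100, activation="sigmoid", gamma=1.0, seed=0):
        self.n_hidden = n_hidden
        self.activation = activation
        self.gamma = gamma
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        if self.activation == "sigmoid":
            # Additive nodes: g(u_i . x_j + v_i)
            return 1.0 / (1.0 + np.exp(-(X @ self.U + self.v)))
        # RBF nodes: exp(-gamma * ||x_j - c_i||^2)
        d2 = ((X[:, None, :] - self.C[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-self.gamma * d2)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        n, d = X.shape
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot targets
        # Step 1: random hidden-node parameters (never trained).
        self.U = self.rng.uniform(-1.0, 1.0, size=(d, self.n_hidden))
        self.v = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        self.C = X[self.rng.choice(n, size=self.n_hidden, replace=self.n_hidden > n)]
        # Step 2: hidden-layer output matrix H (N x n_hidden).
        H = self._hidden(X)
        # Step 3: output weights beta = pinv(H) @ T, as in equation (11).
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        scores = self._hidden(np.asarray(X, dtype=float)) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]
```

In this design the only "training" is one pseudo-inverse, which is why ELM fits quickly; the number of hidden nodes (`n_hidden`) and, for RBF nodes, the width remain the main parameters to tune.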

Relevance Feedback
There have been many advances in content-based image retrieval, but the semantic gap still affects such systems. The semantic gap is the difference between high-level human perception and low-level machine understanding. To overcome this gap, many intelligent techniques have been combined with CBIR systems to retrieve accurate and precise results, and Relevance Feedback (RF) is one of the prominent techniques discussed in the literature [33].
RF is a strategy for refining a query based on the feedback obtained from the user. To search a system with a massive image database, text, images, or images annotated with text can be used as a query. A set of relevant images is then retrieved for the query. The user analyses the retrieved images, and the query is refined using Relevance feedback, which selects the best-matched images based on common features. This process is repeated until the desired results are obtained or the user is satisfied. RF can also be combined with other techniques such as support vector machines (SVM), neural networks, machine learning and deep learning [34]. Query input given by the user can be broadly classified into three types. In the first, the query consists only of text typed on a keyboard; this suffers from limitations such as polysemy, synonymy and homonymy, so finding the images that match the user's intention is a major concern. In the second, the query is given as an image; this removes many ambiguities of query by text and has become very popular because of its numerous applications in image processing. The third is relevance feedback itself, in which the user's query is refined iteratively. The three basic ways of refining a query are as follows:
(1) Extension of Query: Images neighboring the actual query image are also included in the query, based on the feedback obtained from the user, so the original query is effectively expanded.
(2) Query Re-Weighting: This method enhances the weights of some prominent attributes of an image and simultaneously reduces the weights of some unimportant attributes. In this way, a query becomes more refined.
(3) Movement of Query: The query is moved closer to the required images by adjusting the attributes used by the distance function.
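The three refinement strategies can be illustrated on feature vectors with the NumPy sketch below. The formulas are well-known textbook choices and are assumptions here, since the paper does not give them: query movement uses the Rocchio update q' = αq + β·mean(relevant) − γ·mean(non-relevant) (the α, β, γ values are conventional defaults), re-weighting gives each feature a weight inversely proportional to its spread across the relevant images, and query extension scores a database image by its distance to the nearest of several query points. All function names are hypothetical.

```python
import numpy as np


def move_query(q, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Query movement (Rocchio update): towards relevant, away from non-relevant."""
    new_q = alpha * np.asarray(q, dtype=float)
    if len(relevant):
        new_q = new_q + beta * np.mean(relevant, axis=0)
    if len(nonrelevant):
        new_q = new_q - gamma * np.mean(nonrelevant, axis=0)
    return new_q


def reweight(relevant, eps=1e-6):
    """Query re-weighting: features consistent across relevant images weigh more."""
    w = 1.0 / (np.std(np.asarray(relevant, dtype=float), axis=0) + eps)
    return w / w.sum()


def weighted_distance(a, b, w):
    """Weighted Euclidean distance using the re-weighted features."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum(w * d ** 2)))


def expanded_query_distance(queries, x):
    """Query extension: distance from x to the nearest of several query points."""
    diffs = np.asarray(queries, dtype=float) - np.asarray(x, dtype=float)
    return float(np.min(np.linalg.norm(diffs, axis=1)))
```

For example, moving the query [0, 0] with relevant images [2, 0] and [2, 2] and non-relevant image [−1, 0] gives [1.65, 0.75] with the default coefficients.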
In this paper, Relevance Feedback is utilized to make the hybrid texture system more effective. It operates on the relevant images obtained after classification by the ELM: if the obtained result is below a defined threshold, relevance feedback is applied, and the refinement continues until satisfactory results are obtained.

Architectural Framework
In the proposed system, a technique denoted Co-DGLCM, which combines DWT and GLCM, is described. The two techniques are complementary, and the strengths of each can be seen from their main characteristics. DWT belongs to the wavelet family, which provides localized frequency information. DWT has good temporal resolution and precisely captures both location and frequency information from an image. It is also fast to compute, since it is implemented with a cascade of filters. Similarly, GLCM gives accurate results with very little computation time. The proposed approach is therefore well founded, as it combines a hybrid texture descriptor with a machine-learning model and human feedback in the form of Relevance feedback. The approach can be divided into five major phases, as shown in Fig. 4.

Phase 1: Pre-processing
In the first phase, a pre-processing step is applied. The Brodatz database consists of 13 categories, each image being of size 512x512. Every image is split into 16 non-overlapping sub-images of size 128x128. The final database therefore contains 1456 (7x16x13) images of size 128x128: 13 categories, each at 7 rotation angles (0º, 30º, 60º, 90º, 120º, 150º and 200º).
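The splitting step can be sketched with a NumPy reshape: a 512x512 image viewed as a 4x4 grid of 128x128 blocks yields 16 non-overlapping sub-images, and 13 textures x 7 rotations x 16 sub-images gives the 1456 Brodatz images stated above. The function name `split_into_tiles` is hypothetical.

```python
import numpy as np


def split_into_tiles(img, tile=128):
    """Split a 2-D image into non-overlapping tile x tile blocks (row-major order)."""
    img = np.asarray(img)
    h, w = img.shape
    if h % tile or w % tile:
        raise ValueError("image size must be a multiple of the tile size")
    return (img.reshape(h // tile, tile, w // tile, tile)
               .swapaxes(1, 2)
               .reshape(-1, tile, tile))


n_categories, n_rotations = 13, 7
tiles_per_image = split_into_tiles(np.zeros((512, 512))).shape[0]  # 16
total_images = n_categories * n_rotations * tiles_per_image  # 1456
```

The same function with 40 source images gives the 640 (40x16) Vistex images described next.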
The Vistex dataset consists of 40 initial images, each of size 512x512. A database of 640 (40x16) images is created by partitioning each 512x512 image into sixteen non-overlapping 128x128 images.
In the second phase, the DWT and GLCM feature vectors of each image are extracted and normalized using Min-Max normalization:

$$ y' = \frac{y - \text{minimum}}{\text{maximum} - \text{minimum}} $$

Here, y is the value of a particular feature, minimum is the lowest value of that feature, maximum is its highest value, and y' is the normalized value. Normalization brings the feature dimensions into a common range. After normalization, the two feature vectors are fused by concatenation to form the final hybrid feature vector. These feature extraction and normalization steps are applied to all database images as well. Z-score normalization and decimal scaling normalization are also prominent normalization methods. However, the main disadvantage of Z-score normalization is that it implicitly assumes normally distributed data; if this condition is not met, poor results are produced.
In decimal scaling, normalization is achieved by moving the decimal point of the input values. However, this technique depends on the extreme values of the data, which are difficult to determine in some cases, and if these values have to be assumed, the results may be invalid. On the basis of these considerations, Min-Max normalization is preferred, because it preserves the relationships among all the data values without introducing any bias [35].
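A minimal sketch of the normalization and fusion step, assuming each image has already been reduced to a DWT feature vector and a GLCM feature vector (rows of two NumPy arrays): each feature column is Min-Max scaled using its minimum and maximum over the database, and the two scaled blocks are concatenated into the hybrid feature vector. The function names are hypothetical; constant features are mapped to 0 to avoid division by zero, and a query image would be scaled with the database's stored minimum and maximum, so its values may fall slightly outside [0, 1].

```python
import numpy as np


def fit_min_max(F):
    """Per-feature minimum and maximum over the database (rows = images)."""
    F = np.asarray(F, dtype=float)
    return F.min(axis=0), F.max(axis=0)


def min_max(F, fmin, fmax):
    """y' = (y - minimum) / (maximum - minimum), applied column-wise."""
    span = np.where(fmax > fmin, fmax - fmin, 1.0)  # constant feature -> 0
    return (np.asarray(F, dtype=float) - fmin) / span


def hybrid_features(dwt_feats, glcm_feats):
    """Normalize each descriptor separately, then concatenate (Co-DGLCM style)."""
    dwt_n = min_max(dwt_feats, *fit_min_max(dwt_feats))
    glcm_n = min_max(glcm_feats, *fit_min_max(glcm_feats))
    return np.hstack([dwt_n, glcm_n])
```

Normalizing each descriptor before concatenation keeps one descriptor from dominating a distance computation simply because its raw values are larger.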

Phase 3: Classification
The third phase of the proposed system is classification. The hybrid GLCM-DWT features are given as input to the ELM, which is initially trained with the complete database. The output of this step is the set of texture images grouped into their categories, and correct categorization improves the performance and accuracy of the proposed system. The behaviour of the ELM network depends on the activation function and on the total number of hidden-layer neurons. Various working parameters are given in Table 1.
Here, Radial basis function (RBF) is used as the activation function.

Phase 4: Similarity matching
After classification by the ELM network, the whole training dataset is divided into categories. Similarity matching is then performed between the given query image and the images of the category to which it belongs, as determined by the ELM classification. Once the similarity values are computed, the resultant images are arranged in increasing order of the distance metric used; a distance of zero indicates an exact match between two images. Many distance metrics can be used to calculate similarity; some prominent ones are:

$$ d_{\text{Euclidean}} = \sqrt{\sum_{i} (I_i - D_i)^2}, \qquad d_{\text{Manhattan}} = \sum_{i} \lvert I_i - D_i \rvert, \qquad d_{\text{Minkowski}} = \Big( \sum_{i} \lvert I_i - D_i \rvert^{p} \Big)^{1/p} $$

$$ d_{\text{Cosine}} = 1 - \frac{\sum_{i} X_i Y_i}{\sqrt{\sum_{i} X_i^2}\,\sqrt{\sum_{i} Y_i^2}}, \qquad d_{\text{Spearman}} = 1 - \frac{\sum_{i} (r_i - \bar{r})(s_i - \bar{s})}{\sqrt{\sum_{i} (r_i - \bar{r})^2}\,\sqrt{\sum_{i} (s_i - \bar{s})^2}} $$

Here, I_i denotes the i-th feature of the input query image and D_i the i-th feature of a database image in the Euclidean, Manhattan and Minkowski distances, whereas X_i and Y_i denote the i-th values of sequences X and Y in the Spearman and Cosine distances; r_i and s_i are the ranks of X_i and Y_i, and p ≥ 1 is the Minkowski order.
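The five metrics and the category-restricted ranking of this phase can be sketched in NumPy as below. The helper names, the default Minkowski order p = 3, and the rank computation without tie handling in the Spearman distance are assumptions made for illustration.

```python
import numpy as np


def _pair(a, b):
    return np.asarray(a, dtype=float), np.asarray(b, dtype=float)


def euclidean(a, b):
    a, b = _pair(a, b)
    return float(np.sqrt(np.sum((a - b) ** 2)))


def manhattan(a, b):
    a, b = _pair(a, b)
    return float(np.sum(np.abs(a - b)))


def minkowski(a, b, p=3):
    a, b = _pair(a, b)
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))


def cosine(a, b):
    a, b = _pair(a, b)
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def spearman(a, b):
    # Ranks via double argsort (assumes no ties), then 1 - correlation of ranks.
    ra, rb = np.argsort(np.argsort(a)), np.argsort(np.argsort(b))
    return float(1.0 - np.corrcoef(ra, rb)[0, 1])


def rank_in_category(query, db_feats, db_pred, query_pred, metric=euclidean, top_n=10):
    """Rank only the database images the ELM placed in the query's predicted category."""
    db_feats = np.asarray(db_feats, dtype=float)
    candidates = np.flatnonzero(np.asarray(db_pred) == query_pred)
    dists = np.array([metric(query, db_feats[k]) for k in candidates])
    order = np.argsort(dists, kind="stable")  # increasing distance = most similar first
    return candidates[order][:top_n].tolist()
```

Restricting the search to the predicted category shrinks the candidate set, so classification errors by the ELM directly limit which images can be retrieved; this is one reason the later relevance-feedback retraining matters.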

Phase 5: Relevance Feedback
In the last phase of the proposed system, relevance feedback is used. The result obtained after ELM classification is compared with a threshold value, which is set at 95%. If the obtained result is greater than or equal to 95%, the results are retained as they are. If it is below the threshold, two iterations of relevance feedback are applied: the non-relevant images are rejected based on the user's feedback, the ELM training data are updated and the ELM is retrained. Based on these refined images, the final top N images are retrieved. This process significantly improves the accuracy of the proposed system.
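The control flow of this phase can be sketched as follows. It is a hedged illustration, not the authors' code: `retrieve`, `is_relevant` and `retrain` are hypothetical callbacks standing for top-N retrieval, the user's relevance judgement, and updating and retraining the ELM without the rejected images. The 95% threshold and the two feedback iterations come from the text, while measuring the result as the precision of the current top-N list is an assumption.

```python
from typing import Callable, Hashable, List


def relevance_feedback_loop(
    retrieve: Callable[[int], List[Hashable]],
    is_relevant: Callable[[Hashable], bool],
    retrain: Callable[[List[Hashable]], None],
    top_n: int,
    threshold: float = 0.95,
    max_iterations: int = 2,
) -> List[Hashable]:
    """Apply up to `max_iterations` feedback rounds while precision < threshold."""
    results = retrieve(top_n)
    for _ in range(max_iterations):
        precision = sum(is_relevant(r) for r in results) / len(results)
        if precision >= threshold:
            break  # result already meets the threshold: keep it
        rejected = [r for r in results if not is_relevant(r)]
        retrain(rejected)  # e.g. drop rejected images from ELM training data, retrain
        results = retrieve(top_n)
    return results
```

With a toy ranking in which half of the initial top four results are non-relevant, one feedback round removes them and the second check already meets the threshold, so the loop stops early.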

Evaluation metrics
In CBIR systems, after similar images are retrieved from a large image repository, the capability of a system can be assessed with many evaluation parameters [36][37]. Precision and recall are the best-known evaluation metrics and are defined as

$$ P(I_q) = \frac{\text{number of relevant images retrieved}}{\text{total number of images retrieved}}, \qquad R(I_q) = \frac{\text{number of relevant images retrieved}}{\text{total number of relevant images in the database}} $$

In the proposed work, every database image is used as a query image, and precision and recall are calculated for each. The total (overall) average precision and total (overall) average recall over the entire image set are

$$ \text{TAP} = \frac{1}{DB} \sum_{q=1}^{DB} P(I_q), \qquad \text{TAR} = \frac{1}{DB} \sum_{q=1}^{DB} R(I_q) $$

In the above equations, DB represents the total number of images in the database.
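The evaluation protocol can be sketched as below, assuming a NumPy feature matrix, integer class labels, Euclidean ranking over the whole database, and that the query image itself is counted among the retrieved images (a common convention in this literature that the paper does not state explicitly). The function name is hypothetical.

```python
import numpy as np


def precision_recall_at_n(features, labels, n):
    """Use every image as a query; return total average precision and recall."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(labels)
    precisions, recalls = [], []
    for q in range(len(X)):
        dists = np.linalg.norm(X - X[q], axis=1)
        top = np.argsort(dists, kind="stable")[:n]  # includes the query itself
        relevant_retrieved = np.sum(y[top] == y[q])
        precisions.append(relevant_retrieved / n)
        recalls.append(relevant_retrieved / np.sum(y == y[q]))
    # TAP and TAR: averages over all DB query images.
    return float(np.mean(precisions)), float(np.mean(recalls))
```

Increasing the number of retrieved images n typically raises recall while lowering precision, which is the trade-off shown by the precision and recall curves in the experiments.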

Experimental Results and Analysis
Two texture datasets, Brodatz and MIT-Vistex, are used in the experiments. In the initial assessment, the Brodatz texture dataset is evaluated.
In this experiment, each image is divided into 16 sub-images, forming 1456 (7x16x13) images of size 128x128, where 7 is the number of angle orientations and 13 the number of categories. All dataset images are used as query images, and the number of retrieved images is varied as 25, 35, … up to 65. Graphs of total average precision and total average recall are shown in Fig. 6(a) and 6(b).

Figure 6(b). Total Average Recall (%) Brodatz
The comparative analysis shows that the proposed technique outperforms the other compared techniques, with a total average precision of 96.35% when 25 images are retrieved. The proposed texture descriptor also benefits from the incorporation of the ELM classifier and relevance feedback. Therefore, its precision and recall are considerably higher than those of several commonly used texture extraction techniques. The retrieval result for a particular query image on the Brodatz database is shown in Fig. 7. From Fig. 7, it can be seen that the images retrieved for the query are correctly classified and belong to the same category as the query image.
The second experimental analysis is carried out on the MIT-Vistex [41] dataset. This dataset consists of 40 initial images, and a database of 640 (40x16) images is created by partitioning each 512x512 image into sixteen non-overlapping 128x128 images.
The total average precision and total average recall on the Vistex dataset are shown in Fig. 8(a) and 8(b); the total average precision for this database is 97.34% when 16 images are retrieved. Again, in this second experiment, all dataset images are used as query images, and the number of retrieved images is varied as 16, 32, … up to 80.

Figure 8(a). Total Average Precision (%) Vistex
The precision values obtained show that the proposed technique performs considerably better than many basic texture feature extraction techniques.

Figure 8(b). Total Average Recall (%) Vistex
A specific query image and its retrieval results for the Vistex dataset are shown in Fig. 9. The retrieved results again show that images belonging to the same category as the query image are retrieved, confirming the accuracy of the system.

Proposed technique using different distance metrics and state-of-the-art comparison
Different similarity measures [42] can be used to calculate similarity. Here, five distance metrics, namely Euclidean, Manhattan, Cosine, Spearman and Minkowski, are tested on the proposed method, and the results show that the Euclidean distance metric outperforms the other four [43]. The performance of all distance metrics is shown in Table 2. Therefore, the Euclidean distance metric is used here.
The comparison of the proposed system with state-of-the-art techniques in terms of average precision on the Brodatz dataset is shown in Table 3.
Table 3. Comparison of the proposed system with the prominent state-of-the-art techniques
The average precision of the proposed system during the various phases of its functioning is shown in Table 4. Similarly, the comparison of the proposed technique with state-of-the-art techniques on the MIT-Vistex dataset is given in Table 5.
Table 5. Comparison of the proposed system with the prominent state-of-the-art techniques
Tables 3 and 5 show that the average precision obtained on the Brodatz and Vistex datasets is considerably higher than that of many prominent state-of-the-art texture analysis techniques. The proposed system also outperforms conventional texture descriptors in terms of accuracy, which is given in Table 6. The confusion matrices of the ELM classification are given in Tables 7 and 8 for Brodatz and MIT-Vistex, respectively. Table 6 shows that the accuracy obtained with a machine-learning model, specifically the Extreme learning machine (ELM), is high, and retraining this ELM model with the user's feedback through Relevance feedback improves it further.
Since a high accuracy is obtained, it can be concluded that the machine learning model classifies the images accurately. This classification performance can be studied through a confusion matrix, i.e., a table that describes the performance of a classification model on a set of test data for which the true labels are known. The confusion matrices obtained for both datasets are given in Table 7 and Table 8. In the confusion matrix given in Table 7, the diagonal elements represent the number of correctly retrieved images per category. The Brodatz dataset has 13 categories, each containing 112 images, so each diagonal element is a count out of 112: for example, 108 images are retrieved from the 1st category, 109 from the 2nd category, and so on. Thus, a large fraction of the 112 images in each category is correctly retrieved, which reflects the high retrieval accuracy of the proposed system.
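The per-category reading of Table 7 (a diagonal count divided by the 112 images of a Brodatz category, e.g. 108/112, about 96.43% for the 1st category) and the overall accuracy (the trace divided by the total count) can be computed as below. The sketch assumes rows are true categories and columns are predicted categories; the 3x3 matrix is made up purely to show the arithmetic and is not data from Table 7 or Table 8.

```python
def per_class_accuracy(cm):
    """Fraction of each true category that lands on the diagonal.

    cm[i][j] = number of images whose true category is i and whose
    predicted category is j (rows are true labels, an assumed layout).
    """
    return [row[i] / sum(row) for i, row in enumerate(cm)]


def overall_accuracy(cm):
    """Trace of the confusion matrix divided by the total number of images."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


# Hypothetical 3-category matrix, used only to illustrate the arithmetic.
toy_cm = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
print(per_class_accuracy(toy_cm))  # [0.8, 0.9, 1.0]
print(overall_accuracy(toy_cm))    # 0.9
```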

Method        Brodatz total average precision (%)
WLSBP [11]    89.85
PLT [13]      92.5

Again, from the confusion matrix of the Vistex dataset, it can be concluded that an accuracy of 99.21% is achieved over 80 images from 8 categories. Hence, the results obtained on both renowned texture datasets, Brodatz and MIT-Vistex, show that the implemented technique is significantly superior to many existing texture feature descriptors.
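Total average precision and recall in CBIR are commonly computed per query and then averaged over all queries: precision is the fraction of the retrieved images that share the query's category, and recall is the fraction of that category's images that were retrieved. The paper does not spell out its exact protocol (how many images are retrieved per query, or whether the query itself is counted), so the sketch below assumes this common convention; the function names are illustrative.

```python
def precision_recall(retrieved_labels, query_label, category_size):
    """Precision and recall for one query.

    retrieved_labels: category labels of the top-N retrieved images.
    category_size: number of database images in the query's category
    (e.g. 112 per Brodatz category as described above).
    """
    relevant = sum(1 for label in retrieved_labels if label == query_label)
    return relevant / len(retrieved_labels), relevant / category_size


def total_average(queries, category_size):
    """Mean precision and mean recall over all queries.

    queries: list of (retrieved_labels, query_label) pairs.
    """
    pairs = [precision_recall(r, q, category_size) for r, q in queries]
    n = len(pairs)
    return sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n
```

Multiplying either value by 100 gives a percentage comparable in form to the table entries above.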

Conclusion and Future Trends
This paper presents an innovative technique for texture feature extraction based on an efficient combination of the Gray level co-occurrence matrix (GLCM) and the Discrete wavelet transform (DWT). These techniques are used to form a hybrid texture descriptor, namely combined DWT-GLCM (Co-DGLCM). Normalization is used to create a hybrid feature vector (HFV) from the two independent feature vectors obtained. This HFV is applied as input to an Extreme learning machine (ELM), which works as a classifier. Then, based on a threshold condition, Relevance feedback is applied over several iterations driven by the user's feedback. This ELM-based Relevance feedback framework helps in developing an intelligent, adaptive system for learning and classification. GLCM captures precise inter-pixel and inter-pattern relationships, and DWT offers multi-resolution analysis that captures both location and frequency information from an image. To validate these claims, simulation results are provided and compared with many recent texture extraction techniques. Our future work will focus on developing a hybrid CBIR system that extracts color, texture and shape features of an image. Moreover, feature extraction will incorporate deep learning techniques such as auto-encoders and deep belief networks. There are many real-life multimedia applications, such as medical diagnosis, crime detection (e.g., fingerprint extraction), face recognition and pattern recognition, where this texture-based hybrid descriptor can be utilized. Finally, the concept of the Internet of things (IoT) can be used for the online transfer of desired images.
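One plausible way to build a hybrid feature vector of this kind is sketched below. It computes a few GLCM statistics (contrast, energy, homogeneity) for a horizontal offset and the energies of the four subbands of a one-level 2D Haar DWT. Each of the two vectors is then L2-normalized, and the results are concatenated. The paper does not state its GLCM offsets or statistics, wavelet family, decomposition depth or normalization scheme, and Co-DGLCM may instead compute GLCM on the DWT subbands. All of these choices, and the function names, are therefore assumptions. Pure Python keeps the sketch self-contained; a real system would normally use library implementations of GLCM and DWT.

```python
import math


def glcm_features(img, levels=8, dx=1, dy=0):
    """Contrast, energy and homogeneity of a normalized GLCM.

    img: 2D list of ints already quantized to 0..levels-1.
    (dx, dy): assumed non-negative offset; (1, 0) pairs each pixel with its right neighbour.
    """
    h, w = len(img), len(img[0])
    P = [[0] * levels for _ in range(levels)]
    for r in range(h - dy):
        for c in range(w - dx):
            P[img[r][c]][img[r + dy][c + dx]] += 1
    count = (h - dy) * (w - dx)
    contrast = energy = homogeneity = 0.0
    for i in range(levels):
        for j in range(levels):
            p = P[i][j] / count
            contrast += (i - j) ** 2 * p
            energy += p * p
            homogeneity += p / (1 + abs(i - j))
    return [contrast, energy, homogeneity]


def haar_dwt2(img):
    """One-level orthonormal 2D Haar DWT; image sides must be even.

    Returns (LL, LH, HL, HH); the LH/HL naming convention varies between libraries.
    """
    bands = ([], [], [], [])
    for r in range(0, len(img), 2):
        rows = ([], [], [], [])
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            rows[0].append((a + b + d + e) / 2)  # approximation (local average)
            rows[1].append((a + b - d - e) / 2)  # top-minus-bottom difference
            rows[2].append((a - b + d - e) / 2)  # left-minus-right difference
            rows[3].append((a - b - d + e) / 2)  # diagonal detail
        for band, row in zip(bands, rows):
            band.append(row)
    return bands


def dwt_features(img):
    """Mean energy (mean of squared coefficients) of each Haar subband."""
    feats = []
    for band in haar_dwt2(img):
        coeffs = [x for row in band for x in row]
        feats.append(sum(x * x for x in coeffs) / len(coeffs))
    return feats


def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def hybrid_feature_vector(img, levels=8):
    """Concatenate the separately normalized GLCM and DWT feature vectors."""
    return l2_normalize(glcm_features(img, levels)) + l2_normalize(dwt_features(img))


# Hypothetical 4x4 image with gray levels already quantized to 0..7.
img = [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]]
print(len(hybrid_feature_vector(img)))  # 7 (3 GLCM + 4 DWT features)
```

Normalizing each vector separately before concatenation keeps the GLCM part and the DWT part on a comparable scale, so neither dominates a Euclidean distance simply because of its units.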