Context-aware hand poses classifying on images and video-sequences using a combination of wavelet transforms, PCA and neural networks

In this paper we propose novel context-aware algorithms for classifying hand poses in images and video-sequences. The proposed algorithm for images is based on the Viola-Jones method, the wavelet transform, PCA and neural networks. In the first step, the Viola-Jones method is used to locate the hand pose in the image. In the second step, the features of the hand pose are extracted using a combination of the wavelet transform and PCA. In the last step, the extracted features are classified by multi-layer feed-forward neural networks. The proposed algorithm for video-sequences combines the CAMShift algorithm with the proposed algorithm for images. The experimental results show that the proposed algorithms classify hand poses effectively under different light contrast conditions and are competitive with state-of-the-art algorithms.


Introduction
Hand gesture recognition is one of the most difficult and most in-demand tasks in image processing and computer vision. Hand gesture recognition systems classify specific human hand gestures in order to transfer information or to control devices such as computers and televisions. This paper considers hand pose classification in images and video-sequences, which is the main subtask of hand gesture recognition.
Hand poses in images can be classified in the following steps: 1. detecting the location of the hand pose in the image; 2. extracting the features of the detected hand pose; 3. classifying the hand pose using the extracted features. Because of its high processing speed and effectiveness, the Viola-Jones method has become one of the most widely used object detection methods, so we use it to detect the location of the hand pose in images. The method relies on three ingredients for fast and accurate object detection: the integral image for feature computation, AdaBoost for feature selection, and an attentional cascade for efficient allocation of computational resources. Together, these ingredients allow the method to perform object detection in real time [1][2][3][4].
The next step is extracting the features of the detected hand pose. The wavelet transform is one of the most effective methods for extracting image features: it captures the necessary information about the image and can be computed very quickly. The experimental results of the image classification algorithms in [5][6][7][8][9][10] showed that images whose features were extracted with the wavelet transform were classified with accuracy rates of 76-99.7%.
In addition, the experimental results of the algorithms in [4][16][17][18][19][20] showed that combining the wavelet transform, PCA and neural networks improved object recognition performance. In these algorithms, neural networks recognized objects from features extracted by a combination of the wavelet transform and PCA.
Thus, combining the Viola-Jones method, the wavelet transform, PCA and neural networks is a promising approach to developing a novel context-aware algorithm for classifying hand poses in images, and this paper proposes such an algorithm. Here, the context is any information about an image, such as its lighting conditions, contours and noise.
Hand poses in video-sequences can be classified in the following steps: 1. detecting the location of the hand pose in a video frame; 2. tracking the hand pose in the current frame, which is used when the hand pose was detected in the previous frame; 3. extracting the features of the detected (tracked) hand pose; 4. classifying the hand pose using the extracted features. In 1998, Gary Bradski created the CAMShift (Continuously Adaptive Mean Shift) algorithm [26], which uses color information to track objects effectively in real time. In this paper, we therefore propose an algorithm for classifying hand poses in video-sequences that combines the CAMShift algorithm with the proposed algorithm for images.
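The four video steps listed above can be sketched as a simple per-frame loop. This is only an illustrative outline: the function name `classify_video` and the four callables `detect`, `track`, `extract` and `classify` are hypothetical placeholders (standing in for a Viola-Jones detector, a CAMShift tracker, the wavelet/PCA feature extractor and the neural-network classifier), and the paper does not state how a lost track is handled, so the fallback below is an assumption.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

Box = Tuple[int, int, int, int]  # hypothetical convention: (x, y, width, height)


def classify_video(
    frames: Iterable[object],
    detect: Callable[[object], Optional[Box]],
    track: Callable[[object, Box], Optional[Box]],
    extract: Callable[[object, Box], list],
    classify: Callable[[list], int],
) -> Iterator[Optional[int]]:
    """Yield one class label per frame, or None when no hand is located."""
    box: Optional[Box] = None
    for frame in frames:
        # Step 2 (tracking) is used when the previous frame produced a box;
        # otherwise step 1 (detection) searches the whole frame.
        box = track(frame, box) if box is not None else detect(frame)
        if box is None:
            # Assumption: if the hand is not located, this frame gets no
            # label and the next frame starts again from detection.
            yield None
            continue
        features = extract(frame, box)  # step 3: feature extraction
        yield classify(features)        # step 4: classification
```

Passing the stages in as callables keeps the control flow independent of any particular detector or tracker implementation.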

Proposed hand pose classifying algorithm on images
The proposed algorithm for classifying hand poses in images consists of the following main steps: 1. finding the location of the hand pose in the image with the Viola-Jones method (Fig. 1); 2. extracting the hand pose features using the wavelet transform and PCA; 3. classifying the hand pose with multi-layer feed-forward neural networks. The Viola-Jones method was proposed in 2001 by Paul Viola and Michael Jones and remains effective for detecting objects in digital images and videos in real time [1,2]. Its main idea is to use a cascade of simple classifiers, which act as feature detectors, instead of one complex classifier. This makes it possible to construct a detector that works in real time.

Integral image
In the Viola-Jones method, the integral image is used to compute rectangle features rapidly. The integral image is also widely used in other methods, such as wavelet transforms, SURF and Haar filtering [21]. The value of the integral image at location (x, y) is the sum of the pixels above and to the left of (x, y), inclusive, and is computed by formula (1). In the detection phase of the Viola-Jones object detection framework, a window of the target size is moved over the input image, and the Haar-like feature is calculated for each subsection of the image. The resulting difference is then compared to a learned threshold that separates non-objects from objects. Because such a Haar-like feature is only a weak learner or classifier (its detection quality is only slightly better than random guessing), a large number of Haar-like features is needed to describe an object with sufficient accuracy. Examples of Haar-like features are presented in Fig. 4.
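To illustrate why the integral image makes rectangle features cheap, the following minimal sketch builds an integral image and evaluates a two-rectangle Haar-like feature with a handful of lookups. The function names and the nested-list image representation are assumptions made for this example, not part of the original method description.

```python
def integral_image(img):
    """Return ii where ii[y][x] is the sum of img[0..y][0..x], inclusive."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0  # running sum of the current row
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii


def rect_sum(ii, x0, y0, x1, y1):
    """Sum of original pixels in the inclusive rectangle [x0..x1] x [y0..y1]."""
    total = ii[y1][x1]
    if x0 > 0:
        total -= ii[y1][x0 - 1]
    if y0 > 0:
        total -= ii[y0 - 1][x1]
    if x0 > 0 and y0 > 0:
        total += ii[y0 - 1][x0 - 1]
    return total


def two_rect_feature(ii, x, y, w, h):
    """Haar-like feature: left (w x h) region minus the adjacent right one."""
    left = rect_sum(ii, x, y, x + w - 1, y + h - 1)
    right = rect_sum(ii, x + w, y, x + 2 * w - 1, y + h - 1)
    return left - right
```

Each rectangle sum costs at most four lookups regardless of the rectangle size, which is what allows many features to be evaluated per window.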

Learning classification using Adaboost
Boosting is a machine learning meta-algorithm for supervised learning. It is based on the question posed by Kearns [23]: can a set of weak learners create a single strong learner? A weak learner is defined as a classifier that is only slightly correlated with the true classification (it labels examples better than random guessing). In contrast, a strong learner is a classifier that is arbitrarily well correlated with the true classification.
Schapire's affirmative answer to Kearns' question has had significant ramifications in machine learning and statistics, most notably leading to the development of boosting [24].
For each feature, the weak learner determines the optimal threshold classification function, such that the minimum number of examples is misclassified. A weak classifier $h_j(x)$ thus consists of a feature $f_j$, a threshold $\theta_j$ and a parity $p_j$ indicating the direction of the inequality sign (formula 3):

$$h_j(x) = \begin{cases} 1, & p_j f_j(x) < p_j \theta_j, \\ 0, & \text{otherwise,} \end{cases} \tag{3}$$

where x is a 24×24 pixel sub-window of an image. This approach was developed further into a more refined family of boosting algorithms, AdaBoost. AdaBoost, short for Adaptive Boosting, is a machine learning algorithm formulated by Yoav Freund and Robert Schapire. It is a meta-algorithm and can be used in conjunction with many other learning algorithms to improve their performance. AdaBoost is adaptive in the sense that subsequent classifiers are tweaked in favour of the instances misclassified by previous classifiers.
Increasingly complex classifiers are then combined in a "cascade", which allows background regions of the image to be quickly discarded while more computation is spent on promising object-like regions.
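The threshold-and-parity weak classifier described above can be sketched as a weighted decision stump over one feature. The function names, the choice of candidate thresholds (midpoints between distinct feature values, plus one value beyond each end) and the exhaustive search are assumptions made for illustration; practical implementations usually use a faster sorted sweep.

```python
def weak_classify(f, theta, p):
    """h = 1 if p * f < p * theta, else 0 (p is +1 or -1)."""
    return 1 if p * f < p * theta else 0


def train_stump(values, labels, weights):
    """Return (theta, p, error) with the lowest weighted error.

    values  - feature value f_j(x) for each training example
    labels  - 1 for object, 0 for non-object
    weights - non-negative example weights (as maintained by AdaBoost)
    """
    distinct = sorted(set(values))
    # Candidate thresholds: below the minimum, between neighbours, above the maximum.
    candidates = [distinct[0] - 1.0]
    candidates += [(a + b) / 2.0 for a, b in zip(distinct, distinct[1:])]
    candidates.append(distinct[-1] + 1.0)

    best = None
    for theta in candidates:
        for p in (1, -1):
            err = sum(
                w
                for v, y, w in zip(values, labels, weights)
                if weak_classify(v, theta, p) != y
            )
            if best is None or err < best[2]:
                best = (theta, p, err)
    return best
```

In an AdaBoost round, this search would be repeated for every feature, the feature with the lowest weighted error would be kept, and the weights of misclassified examples would be increased before the next round.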
In this paper, after the hand pose location in the image has been found with the Viola-Jones method, the Haar and Daubechies wavelet transforms are used to extract the hand pose image features. The extraction works as follows. First, the hand pose image is resized to 64×64 pixels. Then we apply the wavelet transform to the resized image and extract the low-frequency wavelet coefficients. As a result, we obtain a matrix of 32×32 = 1024 low-frequency wavelet coefficients (Fig. 5).
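As a rough illustration of how a 64×64 image yields 32×32 = 1024 low-frequency coefficients, the sketch below computes the low-frequency (LL) subband of a one-level 2D Haar transform. It assumes the orthonormal Haar filter and an input that has already been resized; the function names are hypothetical, and the Daubechies case (which needs longer filters and boundary handling) is not covered.

```python
def haar_ll(image):
    """Low-frequency (LL) subband of a one-level 2D Haar transform.

    With the orthonormal Haar filter, each LL coefficient equals the sum
    of one 2x2 block of pixels divided by 2.
    """
    h, w = len(image), len(image[0])
    if h % 2 or w % 2:
        raise ValueError("image sides must be even")
    return [
        [
            (image[y][x] + image[y][x + 1] + image[y + 1][x] + image[y + 1][x + 1]) / 2.0
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


def feature_vector(image64):
    """Flatten the LL subband of a 64x64 image into a 1024-element vector."""
    ll = haar_ll(image64)
    return [c for row in ll for c in row]
```

Keeping only the LL subband halves each image dimension, so the feature vector has a quarter as many elements as the resized image has pixels.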

Extracting hand pose features using Wavelet transforms
Before classification by the neural networks, the dimension of the hand pose feature vector is reduced using PCA. First, an eigenspace for hand poses (eigenhandpose) is created from M images of hand poses, as follows.
In the first step, the feature extraction process is applied to each of the M images, which gives a set of feature vectors $I_1, \dots, I_M$. We then form the mean vector $\Psi$, each element of which is calculated by formula (4):

$$\Psi = \frac{1}{M} \sum_{i=1}^{M} I_i. \tag{4}$$

In the second step, the mean vector is subtracted from each of the M feature vectors using formula (5):

$$\Phi_i = I_i - \Psi, \quad i = 1, \dots, M. \tag{5}$$

In the third step, an eigenspace consisting of K eigenvectors of the covariance matrix C (6) is created. These eigenvectors best describe the distribution of the M feature vectors (K < M):

$$C = \frac{1}{M} \sum_{i=1}^{M} \Phi_i \Phi_i^{T}, \tag{6}$$

where the k-th vector $u_k$ maximizes formula (7):

$$\lambda_k = \frac{1}{M} \sum_{i=1}^{M} \left( u_k^{T} \Phi_i \right)^2, \tag{7}$$

subject to the orthogonality condition (8):

$$u_l^{T} u_k = \begin{cases} 1, & l = k, \\ 0, & \text{otherwise.} \end{cases} \tag{8}$$

The vectors $u_k$ and values $\lambda_k$ are the eigenvectors and eigenvalues of the covariance matrix C. To create the eigenspace, we first calculate M eigenvectors $u_l$ of C from the eigenvectors $v_l$ of the smaller M×M matrix (9):

$$L = A^{T} A, \quad A = [\Phi_1, \dots, \Phi_M], \quad u_l = A v_l. \tag{9}$$

We then select, from the M eigenvectors obtained, the K eigenvectors with the largest eigenvalues. The eigenspace is the set of these K eigenvectors (Fig. 6).
Once the hand pose eigenspace has been created, the dimension of an input hand pose feature vector $I$ is reduced as follows, using the decomposition coefficients of formula (10):

$$\omega_i = u_i^{T} (I - \Psi), \quad i = 1, \dots, K. \tag{10}$$

Then we form a new hand pose feature vector using formula (11):

$$\Omega = \{\omega_1, \dots, \omega_K\}. \tag{11}$$

This vector describes the contribution of each eigenvector to the representation of the hand pose feature vector.
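The step of obtaining eigenvectors of the covariance matrix from the eigenvectors of another, smaller matrix can be justified with a short derivation. The notation here is an assumption made for this sketch: $\Phi_i$ denotes the mean-subtracted feature vectors, $A$ the 1024×M matrix whose columns are the $\Phi_i$, and $v_l$, $\mu_l$ an eigenvector and eigenvalue of the M×M matrix $A^{T} A$.

```latex
\begin{aligned}
A &= [\Phi_1, \dots, \Phi_M] \in \mathbb{R}^{1024 \times M},
  \qquad C = \frac{1}{M} A A^{T} \\
(A^{T} A)\, v_l &= \mu_l\, v_l \\
\Rightarrow\quad A A^{T} (A v_l) &= \mu_l\, (A v_l) \\
\Rightarrow\quad C\, u_l &= \frac{\mu_l}{M}\, u_l,
  \qquad u_l = \frac{A v_l}{\lVert A v_l \rVert}
\end{aligned}
```

The second implication follows by multiplying the eigenvalue equation on the left by $A$. It only yields a valid eigenvector when $\mu_l \neq 0$, and because the mean-subtracted vectors sum to zero, at most M - 1 of the eigenvalues are non-zero. When M is much smaller than 1024, this replaces a 1024×1024 eigenproblem with an M×M one.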
The new hand pose feature vector $\Omega$ consists of K elements, where K is much smaller than 1024 (Fig. 7). This vector $\Omega$ (11) is the input of the neural networks.
Each neural network returns a value from 0 to 1, which indicates whether the input hand pose is the hand pose the network was trained on.
The neural networks classify an input hand pose as follows. First, the feature vector of the input hand pose is extracted. Then the dimension of this vector is reduced. Finally, the resulting feature vector is submitted to the inputs of all trained neural networks. The input hand pose is assigned to the training-set hand pose whose neural network returns the largest value (Fig. 8).
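The decision rule above, where the hand pose whose network returns the largest value wins, can be sketched as follows. The network layout (one hidden layer, sigmoid activations, a single output), the dictionary keys `W1`, `b1`, `W2`, `b2` and the function names are all assumptions for illustration; the paper does not specify the architecture beyond multi-layer feed-forward networks.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(net, x):
    """Forward pass of a one-hidden-layer network with one sigmoid output.

    net["W1"] - list of hidden-unit weight rows, net["b1"] - hidden biases,
    net["W2"] - output weights,                  net["b2"] - output bias.
    The result lies strictly between 0 and 1.
    """
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(net["W1"], net["b1"])
    ]
    return sigmoid(sum(w * h for w, h in zip(net["W2"], hidden)) + net["b2"])


def classify(nets, feature_vec):
    """Return (label, scores): the label whose network scores highest."""
    scores = {label: forward(net, feature_vec) for label, net in nets.items()}
    return max(scores, key=scores.get), scores
```

For example, with two hand-set toy networks (hypothetical weights, not trained values):

```python
nets = {
    "a": {"W1": [[5.0, 0.0]], "b1": [0.0], "W2": [4.0], "b2": -2.0},
    "b": {"W1": [[0.0, 5.0]], "b1": [0.0], "W2": [4.0], "b2": -2.0},
}
label, scores = classify(nets, [1.0, -1.0])
```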

Experimental results
All experiments were performed on a laptop with an Intel Core Duo P7350 2.0 GHz processor and 2.0 GB of RAM.

Classifying hand poses on images
The proposed algorithm for classifying hand poses in images was tested on a part of the Cambridge Gesture database [25]. This hand pose database consists of 5 different parts, which contain images under various light contrast conditions (Fig. 10). The experimental results are presented in Table 1, where column P1 gives the classification results for dataset part 1, and so on. The results show that the proposed algorithm, which is based on a combination of the wavelet transform, PCA and neural networks, classified hand poses more accurately than the algorithm in [20]. The highest classification accuracy was obtained for dataset part 1, in which the light comes from directly in front of the hand pose. For the other parts, the classification accuracies are comparable with one another. The results also show that, in this case, the Haar wavelet gave more accurate classification than the Daubechies wavelet.

Classifying hand poses on video-sequences
The proposed algorithm for classifying hand poses in video-sequences was tested on a data set that we created, consisting of 6 classes of hand poses. Each hand pose represents a number from zero to 5 (Fig. 12).

Conclusions
In this paper we developed novel algorithms for classifying hand poses in images and video-sequences based on the wavelet transform, PCA and neural networks. The developed algorithms classify hand poses effectively under different light contrast conditions.
The algorithm for classifying hand poses in images achieved its highest accuracy rate, 96.75%, on dataset part 1, in which the light comes from directly in front of the hand pose. The experimental results also showed that the Haar wavelet gave a higher classification accuracy than the Daubechies wavelet.
The algorithm for classifying hand poses in video-sequences ran at real-time processing speed with an accuracy rate of about 93%.

$$I(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y'), \tag{1}$$

where I(x, y) is the value of the integral image at pixel (x, y) and i(x, y) is the intensity of the original image at pixel (x, y). Each pixel value of the integral image I(x, y) is the sum of the original pixels from i(0, 0) to i(x, y). The time needed to compute the integral image depends on the number of pixels in the original image. The value of each pixel of the integral image can be computed by formula (2), with terms outside the image taken as zero:

$$I(x, y) = i(x, y) + I(x - 1, y) + I(x, y - 1) - I(x - 1, y - 1). \tag{2}$$

Haar-like features are image features used in object recognition. Viola and Jones adapted the idea of Papageorgiou et al. [22] of using an alternate feature set based on Haar wavelets instead of the usual image intensities, and developed the new features called Haar-like features. A Haar-like feature considers adjacent rectangular regions at a specific location in a detection window, sums up the pixel intensities in each region and calculates the difference between these sums.

EAI Endorsed Transactions on Context-aware Systems and Applications, 03 2017 - 07 2017 | Volume 4 | Issue 12 | e2

Figure 6. Creation of hand pose eigenspace
First, we decompose the hand pose feature vector over the K eigenvectors $u_i$ and calculate the corresponding decomposition coefficients by formula (10).

Figure 7. Reducing the dimension of the hand pose feature vector

Figure 10. Examples of hand pose images from the 5 different parts
In part 1 (Fig. 10a), the light comes from directly in front of the hand pose. The light comes from the bottom right corner of the hand pose in part 2 (Fig. 10b), the top right corner in part 3 (Fig. 10c), the top left corner in part 4 (Fig. 10d) and the bottom left corner in part 5 (Fig. 10e). In these experiments, hand poses are divided into 12 classes, presented in Fig. 11. For each part, we created one testing dataset, which contains 2400 hand pose images (20 images of each class), and one training dataset, which contains 1200 hand pose images (10 images of each class).

Figure 11. Examples of images of the 12 hand pose classes in dataset part 1

Figure 12. Examples of the 6 classes used for hand pose classification in video-sequences
The experimental results showed that the proposed algorithm classifies hand poses in video-sequences effectively, with an accuracy rate of about 93% and a real-time processing speed of 30 frames per second. Examples of hand pose classification in video-sequences are presented in Fig. 13.

Figure 13. Examples of hand pose classification in video-sequences

Table 1. Accuracy rate of hand pose classification