Infrared dynamic hand gesture recognition based on Gabor feature and sparse representation

Dynamic hand gesture recognition has been investigated almost since terminals were first used to interact with computers. We present a method for dynamic hand gesture recognition from images acquired by a single infrared (IR) camera. First, the hand images are captured by one infrared camera, so they are independent of lighting and not limited by skin color. Second, because each dynamic hand gesture consists of many frames, the data dimension is very large, while adjacent frames differ only slightly. We therefore apply a twofold selection to choose key frames from each gesture image set. Gabor features are used to describe local variations, and sparse representation based classification (SRC) is used for recognition. Experimental results on dynamic hand gesture images with variations in rotation and translation demonstrate the good performance of our method.


INTRODUCTION
Human-computer interaction is one of the most visible and challenging research topics in computer vision and machine learning [1], [2]. Hand gestures are among the most commonly used modes of communication, and hand gesture recognition has become a focus of study in computer vision and pattern recognition. In past decades, many methods have been proposed for recognizing various gestures [3].
The most common approach to segmenting the hand from the image relies on skin color [2], [4]. Compared with traditional image capture devices [5] such as color cameras, instrumented gloves and Kinect sensors, infrared imaging acquires the intrinsic temperature information of the skin, which is insensitive to illumination conditions and disguises. In this paper we use infrared imaging, which makes segmenting the hand from the background more robust than methods based on skin color information.
For hand recognition, different features have been proposed, such as texture information, skin color, contour information [2] and shape parameters [3]. For infrared images, a robust feature extraction method is essential. Because Gabor filters [6] extract spatial frequency information within local regions at multiple scales and orientations, they achieve good results in this field. Gabor features have therefore been widely used for recognition, for example in the Gabor feature based Fisher discriminant [7]. However, the classical Eigen and Fisher algorithms consider only the global scatter of the training samples. Moreover, a Gabor feature image is formed by convolving the hand gesture image with dozens of Gabor filters, so its feature space has a very high dimension.
Although dynamic hand gesture images are high-dimensional, they usually lie on a lower-dimensional subspace or sub-manifold. The success of manifold learning implies that high-dimensional hand images can be sparsely represented or coded by representative samples on the manifold. Wright et al. [8] reported an interesting work in which the sparse representation (SR) technique is employed for robust face recognition. In [8], the training data are used as the dictionary to find the sparse linear representation of the test image via ℓ1-norm minimization. SR based classification (SRC) then assigns the test image to the class of training samples that yields the minimal reconstruction error with the sparse coding coefficients. The dimensionality of the training and test samples should be reduced beforehand to make ℓ1-norm sparse coding computationally feasible.
In this paper, we use Gabor feature based sparse representation classification (GSRC) [9] to realize dynamic hand gesture recognition.

RELATED WORK

Gabor wavelets
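The Gabor wavelets, whose kernels are similar to the 2D receptive fields of cortical simple cells, exhibit the desirable characteristics of spatial locality and orientation selectivity, and are optimally localized in the space and frequency domains. The Gabor wavelet with orientation µ and scale ν is defined as

$$\psi_{\mu,\nu}(z) = \frac{\|k_{\mu,\nu}\|^2}{\sigma^2}\, e^{-\|k_{\mu,\nu}\|^2 \|z\|^2 / (2\sigma^2)} \left[ e^{i k_{\mu,\nu} \cdot z} - e^{-\sigma^2/2} \right] \tag{1}$$

where $z = (x, y)$ denotes the pixel and $\|\cdot\|$ denotes the norm operator. The wave vector $k_{\mu,\nu}$ is defined as follows:

$$k_{\mu,\nu} = k_\nu e^{i\phi_\mu}, \qquad k_\nu = \frac{k_{\max}}{f^{\nu}}, \qquad \phi_\mu = \frac{\pi\mu}{8} \tag{2}$$

where $k_{\max}$ is the maximum frequency and $f$ is the spacing factor between kernels in the frequency domain. In addition, σ determines the ratio of the Gaussian window width to the wavelength.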
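To make the kernel and wave-vector definitions above concrete, the following is a minimal pure-Python sketch that samples one complex Gabor kernel on a square grid and builds a bank of 5 scales × 8 orientations. The parameter values $k_{\max} = \pi/2$, $f = \sqrt{2}$, $\sigma = 2\pi$ and the 15 × 15 window are assumptions (commonly used values); the paper does not state them.

```python
import cmath
import math


def gabor_kernel(mu, nu, size=15, kmax=math.pi / 2, f=math.sqrt(2), sigma=2 * math.pi):
    """Sample psi_{mu,nu}(z) on a size x size grid centred at z = (0, 0).

    Assumed defaults (not given in the paper): kmax = pi/2, f = sqrt(2),
    sigma = 2*pi and a 15 x 15 window. Returns a list of rows of complex values.
    """
    k_nu = kmax / f ** nu                      # k_nu = kmax / f^nu
    phi_mu = math.pi * mu / 8                  # phi_mu = pi * mu / 8
    kx, ky = k_nu * math.cos(phi_mu), k_nu * math.sin(phi_mu)  # k_{mu,nu} = k_nu e^{i phi_mu}
    k2 = k_nu ** 2                             # ||k_{mu,nu}||^2
    dc = math.exp(-sigma ** 2 / 2)             # e^{-sigma^2/2} removes the DC response
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            envelope = (k2 / sigma ** 2) * math.exp(-k2 * (x * x + y * y) / (2 * sigma ** 2))
            carrier = cmath.exp(1j * (kx * x + ky * y)) - dc
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


# 5 scales (nu = 0..4) x 8 orientations (mu = 0..7) = 40 kernels.
bank = {(mu, nu): gabor_kernel(mu, nu) for nu in range(5) for mu in range(8)}
print(len(bank))  # 40
```

With these defaults the centre value of $\psi_{0,0}$ is $\|k_{0,0}\|^2/\sigma^2 = 1/16$, and every kernel satisfies $\psi(-z) = \overline{\psi(z)}$, since the envelope is even and the carrier is conjugated under $z \to -z$.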
The Gabor feature representation of an image is the convolution of the image with the Gabor kernels. Let $I(z)$ denote the gray-level image; the output of the Gabor kernel is

$$G_{\mu,\nu}(z) = I(z) * \psi_{\mu,\nu}(z) \tag{3}$$

where $*$ denotes the convolution operator and $\psi_{\mu,\nu}(z)$ is the Gabor kernel. The Gabor filtering coefficient $G_{\mu,\nu}(z)$ is a complex number, which can be written as $G_{\mu,\nu}(z) = M_{\mu,\nu}(z)\, e^{i\theta_{\mu,\nu}(z)}$, with $M_{\mu,\nu}(z)$ being the magnitude and $\theta_{\mu,\nu}(z)$ the phase. It is known that the magnitude information captures the variation of local energy in the image. In [9], the Gabor feature vector χ is defined via uniform down-sampling, normalization and concatenation of the Gabor filtering coefficients:

$$\chi = \left( a_{0,0}^{t} \; a_{0,1}^{t} \; \cdots \; a_{7,4}^{t} \right)^{t} \tag{4}$$

where $t$ is the transpose operator and $a_{\mu,\nu}$ is the column vector obtained by down-sampling the magnitude matrix $M_{\mu,\nu}(z)$ by a factor of ρ, normalizing it, and concatenating its columns.
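The construction of χ can be sketched as follows under stated assumptions: plain zero-padded "same" convolution, down-sampling that keeps every ρ-th pixel, column-by-column stacking, and zero-mean/unit-variance normalization of each $a_{\mu,\nu}$ (the text only says "normalization"). The helper names are hypothetical, and the two toy 3 × 3 kernels in the usage stand in for the 40 Gabor kernels.

```python
import math


def convolve2d_same(image, kernel):
    """Zero-padded 2D convolution returning an output the size of `image`."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oy, ox = kh // 2, kw // 2
    out = [[0j] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0j
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + oy - i, c + ox - j   # flipped index: true convolution
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += image[rr][cc] * kernel[i][j]
            out[r][c] = acc
    return out


def gabor_feature_vector(image, kernels, rho):
    """chi: for each kernel take |G| = |I * psi|, keep every rho-th pixel,
    normalize to zero mean / unit variance (assumed), and concatenate."""
    chi = []
    for kernel in kernels:                     # fixed (mu, nu) order
        g = convolve2d_same(image, kernel)     # G_{mu,nu}(z) = I(z) * psi_{mu,nu}(z)
        # Down-sampled magnitude M_{mu,nu}, stacked column by column -> a_{mu,nu}
        a = [abs(g[r][c]) for c in range(0, len(g[0]), rho) for r in range(0, len(g), rho)]
        mean = sum(a) / len(a)
        std = math.sqrt(sum((v - mean) ** 2 for v in a) / len(a)) or 1.0
        chi.extend((v - mean) / std for v in a)
    return chi


# Toy usage: an 8 x 8 image and two 3 x 3 stand-in kernels instead of the 40 Gabor kernels.
image = [[float((r * 3 + c * 5) % 7) for c in range(8)] for r in range(8)]
delta = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
edge = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
chi = gabor_feature_vector(image, [delta, edge], rho=2)
print(len(chi))  # 32 = 2 kernels x (4 x 4 samples)
```

Each kernel contributes $\lceil H/\rho \rceil \cdot \lceil W/\rho \rceil$ entries, so ρ directly controls the trade-off between the length of χ and the spatial detail it keeps.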

Gabor-feature based SRC
The augmented Gabor hand feature vector χ, which is a local feature descriptor, not only enhances the hand features but also tolerates local image deformations to some extent. We therefore use the Gabor feature instead of holistic features in the SRC framework, and the Gabor-feature based SR is

$$\chi(y_0) = X(A)\,\alpha, \qquad X(A) = \left[\chi(s_{1,1}), \chi(s_{1,2}), \ldots, \chi(s_{K,n_K})\right]$$

That is, by replacing $y_0$ and $A$ with $\chi(y_0)$ and $X(A)$, the Gabor-feature based SRC (GSRC) is obtained.
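The GSRC decision can be sketched as below. As an assumption, the ℓ1 problem is solved in its Lagrangian form, $\min_\alpha \tfrac{1}{2}\|\chi(y_0) - X(A)\alpha\|_2^2 + \lambda\|\alpha\|_1$, with plain ISTA (iterative shrinkage-thresholding) rather than whatever solver the authors used; λ and the iteration count are illustrative. The columns of $X(A)$ are normalized to unit ℓ2 norm, and the test sample is assigned to the class whose coefficients give the smallest reconstruction residual. The short toy vectors stand in for (PCA-reduced) Gabor features.

```python
import math


def build_dictionary(features, labels):
    """X(A): unit-l2-normalized feature vectors as columns (returned as rows)."""
    cols = []
    for v in features:
        n = math.sqrt(sum(x * x for x in v)) or 1.0
        cols.append([x / n for x in v])
    return [list(r) for r in zip(*cols)], list(labels)


def matvec(X, v):
    """X v for X stored as a list of rows."""
    return [sum(x * vj for x, vj in zip(row, v)) for row in X]


def rmatvec(X, u):
    """X^T u for X stored as a list of rows."""
    return [sum(X[i][j] * u[i] for i in range(len(X))) for j in range(len(X[0]))]


def ista(X, y, lam=0.01, iters=500):
    """Minimize 0.5*||y - X a||^2 + lam*||a||_1 by iterative shrinkage-thresholding."""
    L = sum(x * x for row in X for x in row)   # ||X||_F^2 >= largest eigenvalue of X^T X
    alpha = [0.0] * len(X[0])
    for _ in range(iters):
        r = [yi - xi for yi, xi in zip(y, matvec(X, alpha))]    # residual y - X alpha
        z = [a + g / L for a, g in zip(alpha, rmatvec(X, r))]   # gradient step
        alpha = [math.copysign(max(abs(v) - lam / L, 0.0), v) for v in z]  # soft threshold
    return alpha


def src_classify(X, labels, y, lam=0.01):
    """Assign y to the class whose coefficients reconstruct it with the least residual."""
    alpha = ista(X, y, lam)
    best, best_res = None, math.inf
    for c in sorted(set(labels)):
        kept = [a if lab == c else 0.0 for a, lab in zip(alpha, labels)]  # delta_c(alpha)
        res = math.dist(y, matvec(X, kept))
        if res < best_res:
            best, best_res = c, res
    return best


X, labels = build_dictionary(
    [[1, 0, 0, 0], [0.6, 0.8, 0, 0], [0, 0, 1, 0], [0, 0, 0.6, 0.8]], ["a", "a", "b", "b"])
print(src_classify(X, labels, [0.9, 0.3, 0.05, 0.0]))  # a
```

The step size uses the squared Frobenius norm because it is a cheap upper bound on the Lipschitz constant; a tighter bound would converge faster but is not needed for a sketch.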

DYNAMIC HAND GESTURE RECOGNITION
The pipeline of our dynamic hand gesture recognition method is shown in Figure 1.

Hand segmentation
The images are captured with an infrared camera. Compared with gray-level and color imaging, infrared imaging can acquire the intrinsic temperature information of the skin, which is robust to the effects of illumination and disguises.
The first step is hand segmentation. Each raw frame is converted to a gray-level image, which is then turned into a binary image by thresholding. Because each hand gesture may be recorded under different illumination, the threshold is set adaptively: it depends on the mean gray value of the first image of the dynamic hand gesture sequence. After graying and smoothing the original frames, binary difference frames are created by computing differences between consecutive frames. To discard frames with only small changes, a second threshold determines whether a difference frame is significant enough to use. The first frame, together with the K − 1 frames corresponding to the K − 1 largest binary difference frames, is selected as the K key frames.
The image set therefore undergoes a twofold selection. The first threshold determines whether a pixel belongs to the hand region or to the background. The second selection addresses timing: each recorded video contains only the gesture itself, but the videos are not aligned in time. For instance, in one video of gesture '1' the finger may start moving at frame 10, whereas in another it starts at frame 15. The second selection therefore discards frames with small variation.
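The twofold selection described above can be sketched as follows. Several details are assumptions not fixed by the text: the hand is brighter (warmer) than the background in the IR image, the adaptive threshold is exactly the mean gray value of the first frame, smoothing is omitted, the size of a binary difference frame is its number of changed pixels, and the difference between frames t − 1 and t is attributed to frame t. `select_key_frames` and `min_change` are hypothetical names.

```python
def binarize(frame, threshold):
    """First selection: 1 = hand (warmer, assumed brighter), 0 = background."""
    return [[1 if v > threshold else 0 for v in row] for row in frame]


def select_key_frames(frames, k, min_change=1):
    """Hypothetical helper: indices of the K key frames (frame 0 plus the K - 1
    frames with the largest binary difference from their predecessor)."""
    first = frames[0]
    # Adaptive threshold: mean gray value of the first frame of the sequence (assumed rule).
    threshold = sum(map(sum, first)) / (len(first) * len(first[0]))
    binary = [binarize(f, threshold) for f in frames]
    scores = []
    for t in range(1, len(binary)):
        # Size of the binary difference frame = number of pixels that changed.
        changed = sum(a != b for ra, rb in zip(binary[t - 1], binary[t]) for a, b in zip(ra, rb))
        if changed >= min_change:              # second selection: drop near-static frames
            scores.append((changed, t))
    scores.sort(key=lambda s: (-s[0], s[1]))   # largest change first, earlier frame on ties
    chosen = [0] + [t for _, t in scores[: k - 1]]
    return sorted(chosen)                      # keep chronological order


def make_frame(lit):
    """Synthetic 4 x 4 IR frame: hand pixels at 200, background at 10."""
    return [[200 if (r, c) in lit else 10 for c in range(4)] for r in range(4)]


frames = [make_frame(s) for s in (
    {(0, 0), (1, 0)},            # frame 0
    {(0, 0), (1, 0)},            # frame 1: no change
    {(0, 1), (1, 1)},            # frame 2: 4 pixels change
    {(0, 1), (1, 1), (2, 1)},    # frame 3: 1 pixel changes
    {(3, 3)},                    # frame 4: 4 pixels change
    {(3, 3)},                    # frame 5: no change
)]
print(select_key_frames(frames, 3))  # [0, 2, 4]
```

Because selection is driven by the amount of change rather than by frame index, two videos of the same gesture whose motion starts at different times still yield key frames centred on the motion.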

Gabor feature extraction
In this paper, Gabor filters with 5 different scales and 8 different orientations are used (Figure 4), so the total number of filters is 40. In our experiment, 15 key frames are chosen from each hand gesture image set (Figure 3). Each key frame is reshaped into a column vector, and the columns are arranged in chronological order, starting with the first key frame, so that the image set is converted into a matrix.
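A minimal sketch of the frame-to-matrix conversion, assuming each key frame is vectorized column by column (MATLAB order; the text does not specify the order). The last lines only compute the sizes implied by the paper's numbers: 64 × 80 frames give 5,120-dimensional columns, and 40 filters multiply this to 204,800 magnitude values per frame before down-sampling.

```python
def frames_to_matrix(key_frames):
    """Vectorize each key frame column by column (assumed, as in MATLAB) and place
    the vectors as matrix columns in chronological order; returns a list of rows."""
    columns = [[f[r][c] for c in range(len(f[0])) for r in range(len(f))] for f in key_frames]
    return [list(row) for row in zip(*columns)]


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8, 9], [10, 11, 12]]
print(frames_to_matrix([a, b]))  # [[1, 7], [4, 10], [2, 8], [5, 11], [3, 9], [6, 12]]

# Sizes implied by the paper's numbers: 64 x 80 frames, 15 key frames, 40 Gabor filters.
rows = 64 * 80
print(rows, 15, 40 * rows)  # 5120 15 204800
```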

Gabor feature based SRC
Passing the hand gesture matrix through the Gabor filters yields the Gabor feature matrix. Because the augmented Gabor feature is very high-dimensional, Principal Component Analysis (PCA) [10] is then applied to reduce its dimension before classification with SRC.
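The PCA step can be sketched with power iteration and deflation on the covariance matrix. This is an illustration on toy data only: for real Gabor features, whose dimension far exceeds the number of samples, one would instead use an SVD or the smaller n × n Gram matrix. The helper names are hypothetical.

```python
import math
import random


def pca_fit(samples, d, iters=200, seed=0):
    """Mean and top-d principal directions via power iteration with deflation."""
    n, m = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(m)]
    centered = [[s[j] - mean[j] for j in range(m)] for s in samples]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(m)] for i in range(m)]
    rng = random.Random(seed)
    components = []
    for _ in range(d):
        v = [rng.random() + 0.1 for _ in range(m)]          # positive random start
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(m) for j in range(m))  # v^T C v
        components.append(v)
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]  # deflate
    return mean, components


def pca_transform(x, mean, components):
    """Project a feature vector onto the principal directions."""
    return [sum(c[j] * (x[j] - mean[j]) for j in range(len(x))) for c in components]


samples = [[-2, -2], [-1, -1], [0, 0], [1, 1], [2, 2]]   # toy data on the line x1 = x2
mean, comps = pca_fit(samples, d=1)
print(pca_transform([3, 3], mean, comps))  # ~[4.2426], i.e. 3 * sqrt(2)
```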

EXPERIMENTAL RESULTS
The infrared database used in this paper was captured by an infrared camera (Point Grey FFMV-03M2C-CS) against a black board background. The camera has a resolution of 1024 × 1280, and 30 frames were recorded per gesture. The database contains 8 different gestures, performed with different horizontal rotating movements, wrist rotations, and slight illumination changes. Illustrations of each of these hand gestures are shown in Figure 5. Five different persons were each shown every gesture in Figure 5 and asked to perform it fourteen times in front of the infrared camera, with different horizontal rotating movements and wrist rotations, to check that the GSRC method is largely invariant to different hand shapes. Figure 6 shows some gestures with different horizontal rotating movements and wrist rotations. The horizontal rotation angle varied from −45° to 45°, and the wrist angle varied approximately from −30° to 30°. All experiments were run on a PC with an Intel Core i5-4570 3.20 GHz CPU and 8 GB of RAM, running Windows 7 and MATLAB R2012b.
Altogether, there are 5 persons × 8 gestures × 14 attempts = 560 samples, of which 50% are used for training and 50% for testing. After pre-processing, normalization is performed on the frames of the hand gesture […]. Here, σ = 0.001. The proposed method is compared with PCA dimensionality reduction followed by a k-nearest neighbor (k-NN) classifier.
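The sample count and a 50/50 split can be checked directly; the sketch also includes a plain k-NN classifier of the kind used in the PCA+KNN baseline. The split rule (the first 7 of the 14 attempts of each person and gesture go to training) and the value of k are assumptions; the text does not say how the halves were drawn.

```python
import math
from collections import Counter

PERSONS, GESTURES, ATTEMPTS = 5, 8, 14
total = PERSONS * GESTURES * ATTEMPTS      # 560 samples


def split_half(sample_ids):
    """Hypothetical 50/50 split: for each (person, gesture), the first half of the
    attempts go to training and the rest to testing."""
    train, test = [], []
    for p, g, a in sample_ids:
        (train if a < ATTEMPTS // 2 else test).append((p, g, a))
    return train, test


def knn_predict(train_x, train_y, x, k=1):
    """k-nearest-neighbor majority vote with Euclidean distance (baseline classifier)."""
    dists = sorted((math.dist(t, x), y) for t, y in zip(train_x, train_y))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]


ids = [(p, g, a) for p in range(PERSONS) for g in range(GESTURES) for a in range(ATTEMPTS)]
train, test = split_half(ids)
print(total, len(train), len(test))  # 560 280 280
```

Splitting within each person and gesture keeps all 8 classes and all 5 hand shapes represented in both halves.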

CONCLUSIONS
In this paper, a Gabor feature based sparse representation classification method is proposed for dynamic hand gesture recognition. All hand gesture images in the evaluation database are captured by an infrared camera. The experimental results demonstrate that our method achieves good performance, with a correct recognition rate of about 93.75%.
Let $A_i = [s_{i,1}, s_{i,2}, \ldots, s_{i,n_i}] \in \mathbb{R}^{m \times n_i}$ be the set of training samples of the $i$th object class, where $s_{i,j}$, $j = 1, 2, \ldots, n_i$, is an $m$-dimensional vector stretched from the $j$th sample of the $i$th class. For a test sample $y_0 \in \mathbb{R}^m$ from this class, $y_0$ can intuitively be well approximated by a linear combination of the samples within $A_i$, i.e.

$$y_0 = \sum_{j=1}^{n_i} \alpha_{i,j}\, s_{i,j} = A_i \alpha_i \tag{5}$$

where $\alpha_i = [\alpha_{i,1}, \alpha_{i,2}, \ldots, \alpha_{i,n_i}]^T$ are the coefficients. Suppose we have $K$ object classes, and let $A = [A_1, A_2, \ldots, A_K]$ be the concatenation of the $n$ training samples from all $K$ classes, where $n = n_1 + n_2 + \cdots + n_K$. Then the linear representation of $y_0$ can be written in terms of all training samples as

$$y_0 = A\alpha, \qquad \alpha = [0, \ldots, 0, \alpha_i^T, 0, \ldots, 0]^T \tag{6}$$
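For reference, the standard SRC decision of Wright et al. [8], which the introduction describes in words, can be written in this notation as follows. The error tolerance ε is a generic parameter, and $\delta_i(\hat{\alpha})$ keeps only the entries of $\hat{\alpha}$ associated with class $i$ and sets the rest to zero. GSRC applies the same rule with $y_0$ and $A$ replaced by $\chi(y_0)$ and $X(A)$.

$$\hat{\alpha} = \arg\min_{\alpha} \|\alpha\|_1 \quad \text{subject to} \quad \|y_0 - A\alpha\|_2 \le \varepsilon$$

$$r_i(y_0) = \|y_0 - A\,\delta_i(\hat{\alpha})\|_2, \qquad \operatorname{identity}(y_0) = \arg\min_{i} r_i(y_0)$$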

Figure 1: The flowchart of our dynamic hand gesture recognition.

Figure 3: 15 key frames of the image set shown in Figure 2.

Figure 4: Gabor filters at 5 scales and 8 orientations

Figure 6: Some samples with different horizontal movement and rotation.

Figure 7: Recognition rate of GSRC and PCA+KNN versus feature dimension.

Table 1: Recognition results on the key frames and on the original image set without twofold selection (feature dimensionality is 80)
The image size is reduced from 1024 × 1280 to 64 × 80. After twofold selection of the image set, 15 key frames are chosen from the original gesture image set. PCA is then applied to reduce the dimension of the Gabor filter outputs, and different dimensions are tested for feature extraction. Figure 7 shows the recognition rate of SRC versus the PCA dimension. Here, the consumed time refers to the feature extraction and recognition time.
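The text does not specify how frames are resized. One simple assumption is averaging non-overlapping 16 × 16 blocks, since 1024 / 16 = 64 and 1280 / 16 = 80; the helper below is a hypothetical sketch of that choice, not necessarily the authors' method.

```python
def block_mean_downsample(image, factor):
    """Average non-overlapping factor x factor blocks (assumed resizing method)."""
    h, w = len(image), len(image[0])
    if h % factor or w % factor:
        raise ValueError("image size must be a multiple of factor")
    return [[sum(image[rr][cc] for rr in range(r, r + factor) for cc in range(c, c + factor))
             / factor ** 2
             for c in range(0, w, factor)]
            for r in range(0, h, factor)]


print(1024 // 16, 1280 // 16)  # 64 80
tiny = [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
print(block_mean_downsample(tiny, 2))  # [[1.0, 2.0], [3.0, 4.0]]
```

Block averaging also acts as a mild low-pass filter, which reduces aliasing compared with simply keeping every 16th pixel.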