Neuro-Fuzzy Hybridization using Modified S Membership Function and Kernel Extreme Learning Machine for Robust Face Recognition under Varying Illuminations

The multifaceted, varying-light environment severely degrades the performance of person recognition using facial images. Here, the authors present a novel person identification method based on hybridization of artificial neural network (ANN) and fuzzy logic concepts. An efficient illumination normalization method is presented with the help of a new modified S membership function. The proposed illumination normalization method retains the large scale facial features while suppressing the variations caused by changes in lighting. Kernel extreme learning machine (KELM), a non-linear and non-iterative learning algorithm of ANN, is used for classification. Various kernel types and parameters are experimented with to find the best choice for robust classification. To assess the performance of the proposed hybridization, the Yale and extended Yale B face databases have been used. Very promising results have been achieved, which establish the worth of the proposed method.


Introduction
Person identification using face images has been one of the most active research domains in pattern recognition for the last many decades. Pose, expression and illumination variations severely affect the performance of classification systems [1]. Illumination variations on faces arise because light is projected from different directions and at different angles onto a person's face, generating images which appear very different from one another and which make face recognition very difficult [2].

Related Work
Many researchers have proposed a variety of methods for illumination normalization and compensation. It all started in the year 1997 with Belhumeur et al., who gave the FisherFace technique for face image classification under variations of illumination [3]. The technique was based on linear discriminant analysis (LDA), with the help of which illumination normalization is performed, and achieved good results when compared to another similar method known as the Eigenface technique [4,5]. This research work was taken further by Toth et al. in 2000 [6].
For illumination normalization, Chen et al. introduced a method in which the low frequency band of the discrete cosine transform (DCT) of the logarithmically transformed face image was discarded. This technique gave quite promising results on the Yale B and CMU PIE databases [7]. However, discarding the low frequency band from the DCT coefficients might result in the loss of considerable information from the face images. Therefore, Vishwakarma et al. proposed a technique in 2010 in which re-scaling of the low frequency DCT coefficients was performed, and excellent results were attained on the Yale B face database compared with Chen et al.'s method of discarding the low frequency band [8], [9], [10].
In 2015, the author introduced an approach in which, instead of discarding or re-scaling the DCT coefficients, the low frequency band was modified with the help of a fuzzy filter for illumination normalization, achieving a very good improvement in recognition accuracy for face images under varying lighting conditions [11]. In 2018, homomorphic filtering and reflectance ratio based illumination normalization with selective feature extraction was performed in orthogonal DWT and integral wavelet transform (HFRIN-SFDWT and RR-CS IWT) domains by J. Yadav et al., and very promising results were achieved [12], [13]. Hybridization of two techniques, namely DCT and discrete wavelet transform (DWT), was utilized by Vishwakarma and Goel in 2019, and illumination normalization was performed over various standard illumination-varying face databases with the help of principal component analysis (PCA) with k-NNC and back-propagation (BP) classifiers [14]. Some more techniques which have contributed to illumination normalization are: 9PL [15], fractal analysis + log function [16], tetrolet transform [17], edge orientation [18], etc.
The existing face recognition systems try to improve their performance by approximate modeling of illumination variations, which results in inappropriate illumination normalization. Also, the classification algorithms used are either linear, or iterative when non-linear. Face recognition performance can be improved by combining an efficient illumination normalization method with a non-linear, non-iterative classification algorithm. This is achieved in the proposed method with modified S membership function based illumination normalization and classification using KELM.
The remaining parts of the manuscript are organized as follows: in Section 2, the preliminary concepts used in the proposed approach are briefly described. Section 3 explains in detail the proposed method of illumination normalization and classification of face images under varying illuminations. The empirical results and analyses have been penned down in Section 4, and conclusions are drawn in Section 5.

Adaptive Histogram Equalization and Logarithmic Transform
Since face images under varying illuminations are of quite low contrast, adaptive histogram equalization followed by logarithmic transform (AHELT) is utilized as a preprocessing technique for contrast stretching of the face images captured under varying illuminations [10]. The logarithmic transform compensates for the increased number of low intensity pixels in the outcome of adaptive histogram equalization.
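As an illustration, the following numpy sketch shows the idea of this preprocessing. It uses a simple global histogram equalization as a stand-in for the adaptive variant (the paper's AHE operates on local image regions), followed by the logarithmic transform; the function names are ours, not the authors'.

```python
import numpy as np

def hist_equalize(img):
    """Global histogram equalization via the intensity CDF
    (a simplified stand-in for adaptive histogram equalization)."""
    hist, _ = np.histogram(img.flatten(), bins=256, range=(0, 256))
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())  # normalize to [0, 1]
    return (cdf[img.astype(np.uint8)] * 255.0)

def log_transform(img):
    """Logarithmic transform: compresses high intensities and lifts
    the many low-intensity pixels produced by equalization."""
    return 255.0 * np.log1p(img) / np.log1p(255.0)

def ahelt(img):
    """Equalization followed by the log transform (AHELT-style pipeline)."""
    return log_transform(hist_equalize(img))
```

Output values stay in the [0, 255] range, so the result can be fed directly to the DCT stage.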
The inverse discrete cosine transform (IDCT) can now be defined [19] as:

f(x, y) = \sum_{u=0}^{X-1} \sum_{v=0}^{Y-1} \gamma(u)\,\gamma(v)\, C(u, v) \cos\left[\frac{(2x+1)u\pi}{2X}\right] \cos\left[\frac{(2y+1)v\pi}{2Y}\right]

for x = 0, 1, 2, 3, ..., X-1 and y = 0, 1, 2, 3, ..., Y-1, where

\gamma(u) = \sqrt{1/X} for u = 0, and \gamma(u) = \sqrt{2/X} for u = 1, ..., X-1

with \gamma(v) defined analogously in terms of Y. Here, f represents the face image matrix, x and y represent the row and column indices of the face image matrix respectively, and u and v denote the same for the DCT matrix C's rows and columns respectively. The DCT is utilized to transform the data from the spatial domain to the frequency domain so that the low frequency data can be easily obtained from the DCT coefficient matrix.
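A compact numpy sketch of this transform pair is given below. It builds the orthonormal DCT-II basis matrix, so the inverse transform is simply the transpose; this is an illustration under standard conventions, not the paper's implementation.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n
    (row u, column x holds gamma(u) * cos((2x+1)u*pi / 2n))."""
    k = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)  # gamma(0) scaling for the DC row
    return C

def dct2(f):
    """2-D DCT of image f via separable matrix products."""
    Cx = dct_matrix(f.shape[0])
    Cy = dct_matrix(f.shape[1])
    return Cx @ f @ Cy.T

def idct2(F):
    """2-D inverse DCT; the basis matrix is orthogonal, so inverse = transpose."""
    Cx = dct_matrix(F.shape[0])
    Cy = dct_matrix(F.shape[1])
    return Cx.T @ F @ Cy
```

Because the basis is orthogonal, idct2(dct2(f)) recovers the image up to floating-point error.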

Kernel Extreme Learning Machine
Extreme learning machine (ELM) is a non-iterative feedforward neural network with only one hidden layer [20]. In the hidden layer, weights and biases are assigned random numbers, and as a result, different values of network output (recognition or error rate) are obtained on every execution. For R neurons in the hidden layer with activation function β, the output of the hidden layer can be expressed as:

H(x) = [\beta(a_1, b_1, x), ..., \beta(a_R, b_R, x)]

where a_r and b_r are the input weight and bias of the r-th hidden layer neuron respectively (assigned random numbers), and H(x) is the output vector of the hidden layer with respect to the input x. For this single hidden layer feed-forward neural network, the output weight τ, connecting the output neurons with the hidden nodes, is obtained analytically as:

\tau = H^T \left( \frac{I}{C} + H H^T \right)^{-1} Q

where Q is the target class matrix and C is the regularization coefficient, an application dependent parameter. ELM was introduced to minimize both the training error and the norm of the output weights, i.e. ||\tau||^2. Gradient-descent methods or global search methods take a much longer time than ELM [21]. ELM also reduces the computational time required for optimizing the parameters [22]. However, due to the random weights and biases, ELM suffers from nondeterministic performance [23,24]. To overcome this limitation of ELM, another non-iterative algorithm was proposed in terms of a kernel matrix. The kernel matrix in KELM is not related to the target; it depends only on the training samples [24]. The kernel matrix φ can be expressed using Mercer's condition as:

\varphi = H H^T, \qquad \varphi_{i,j} = h(x_i, x_j)

where x_i \in R^N (i = 1, ..., t) denotes the training data, t represents the number of training samples used, and h is the kernel function used to obtain the kernel matrix.
Therefore, the output of KELM can be formulated as:

f(x) = [h(x, x_1), ..., h(x, x_t)] \left( \frac{I}{C} + \varphi \right)^{-1} Q

This technique is termed KELM and is exploited here for multi-class classification of illumination normalized face images captured under varying lighting conditions [20].
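The closed-form training and prediction steps above can be sketched in a few lines of numpy. This is a schematic implementation under our own naming, using a polynomial kernel; one-hot targets play the role of Q.

```python
import numpy as np

def poly_kernel(A, B, c0=1.0, degree=2):
    """Polynomial kernel h(a, b) = (a.b + c0)^degree."""
    return (A @ B.T + c0) ** degree

class KELM:
    """Kernel extreme learning machine for multi-class classification.

    fit() solves alpha = (phi + I/C)^{-1} Q in closed form;
    predict() evaluates f(x) = [h(x, x_1), ..., h(x, x_t)] alpha.
    """
    def __init__(self, C=1.0):
        self.C = C  # regularization coefficient

    def fit(self, X, y):
        self.X = X
        t = X.shape[0]
        Q = np.eye(int(y.max()) + 1)[y]   # one-hot target matrix
        phi = poly_kernel(X, X)           # kernel matrix on training data
        self.alpha = np.linalg.solve(phi + np.eye(t) / self.C, Q)
        return self

    def predict(self, Xnew):
        return np.argmax(poly_kernel(Xnew, self.X) @ self.alpha, axis=1)
```

No iteration is involved: training is a single linear solve, which is the source of KELM's speed advantage over back-propagation.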

Proposed Method of Illumination Normalization and Classification
In this section, a new method of face recognition under illumination variations is presented with the help of hybridization of the fuzzy S membership function and KELM. An image comprises large and small scale components; the small scale components are mainly related to sharp details such as edges and corners, and these are not affected by illumination variations. The large scale components, which contain the low frequency contents of the input image, are mainly altered by variations in illumination. As depicted in the block diagram drawn in Fig. 2, all the database images are preprocessed using AHELT, followed by illumination normalization using the modified S membership function and then classification by KELM.

Preprocessing
As shown in Fig. 2, all the database images are first preprocessed using AHELT, that is, AHE followed by LT. DCT is applied on the output of AHELT, and the DCT coefficients are read in zigzag manner [11]. Fig. 3(a) shows some of the database images under high illumination variations (taken from subsets 4 and 5 of the extended Yale B face database). As these images are of quite low contrast, AHE followed by LT is applied to improve their contrast. These are simple preprocessing methods, and their outcomes are shown in Fig. 3(b) and (c) respectively. The correlation coefficients of these images with the same subject's face image under uniform lighting have been calculated and are listed below each image. For calculating the correlation coefficient, the same processing is applied to both images, the uniform lighting image and the image shown in Fig. 3(a). It is found that the values of the correlation coefficients are quite low (as listed below each image in Fig. 3).
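The zigzag reading of the DCT coefficients can be sketched as follows. This is a generic anti-diagonal traversal with alternating direction; index conventions may differ slightly from the implementation in [11].

```python
import numpy as np

def zigzag_indices(rows, cols):
    """(row, col) index pairs in zigzag order: walk the anti-diagonals,
    alternating direction on each one."""
    order = []
    for s in range(rows + cols - 1):
        diag = [(i, s - i) for i in range(rows) if 0 <= s - i < cols]
        order.extend(diag if s % 2 else diag[::-1])
    return order

def zigzag(F):
    """Flatten a 2-D coefficient matrix into a 1-D zigzag-ordered vector."""
    return np.array([F[i, j] for i, j in zigzag_indices(*F.shape)])

def unzigzag(v, rows, cols):
    """Inverse operation: scatter the vector back into matrix form."""
    F = np.empty((rows, cols))
    for val, (i, j) in zip(v, zigzag_indices(rows, cols)):
        F[i, j] = val
    return F
```

The DC coefficient lands at position 0 of the vector and the large scale (low frequency) components occupy the leading entries, which is exactly what the illumination normalization stage operates on.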

Illumination Normalization using modified S membership function
There is a strong need for handling the vagueness and uncertainty present in computer vision and pattern recognition applications. This vagueness and uncertainty can be better dealt with using fuzzy logic concepts. Here, fuzzy logic based processing is investigated to normalize and compensate the effect of varying illumination. Fuzzy sets and fuzzy logic generalize classical set theory and logic with the help of the membership function (MF). In a classical set, the MF assigns either zero or one to an element for its nonmembership or membership in the set respectively. In fuzzy sets, this value varies gradually in the [0, 1] interval, corresponding to the element's varying degree of membership in the set; a higher value means a higher degree of membership. The mapping MF for a fuzzy set ℱ can be articulated as:

\mu_{ℱ}: 𝒵 \to [0, 1]

where 𝒵 is the set of large scale DCT components, which can be expressed as:

𝒵 = \{\delta_1, \delta_2, ..., \delta_\beta\}

Here, δ_m corresponds to the m-th large scale DCT component, and β represents the total number of large scale DCT components considered for processing the illumination variations.
The fuzzy set ℱ can be modeled as a set represented by the modified S MF. Let us first understand the set represented by the simple S MF [25]. It comprises a parameter named the fuzzifier (m), which can be tuned as per the requirement of the problem and thus provides more flexibility and generalization capability for classification. As shown in Fig. 4(a), the membership variation of this function is S-shaped; by varying the value of the fuzzifier m, the steepness of the MF can be controlled. The function can be expressed mathematically as:

S(z; \alpha, \beta) = 0, for z \le \alpha
S(z; \alpha, \beta) = 2^{m-1} \left( \frac{z - \alpha}{\beta - \alpha} \right)^m, for \alpha < z \le c
S(z; \alpha, \beta) = 1 - 2^{m-1} \left( \frac{\beta - z}{\beta - \alpha} \right)^m, for c < z < \beta
S(z; \alpha, \beta) = 1, for z \ge \beta

where c = (\alpha + \beta)/2 is the crossover point, and α and β represent the minimum and maximum values of a given data set for a particular observation. This MF provides a 0.5 membership grade at the crossover point c, as shown in Fig. 4(a). The value of m is selected as 2 in the present investigation.
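The S MF above can be sketched in numpy as follows (our reconstruction; with m = 2 it reduces to the classic quadratic S-function, and the 2^{m-1} factor keeps the grade at exactly 0.5 at the crossover for any m):

```python
import numpy as np

def s_mf(z, alpha, beta, m=2.0):
    """S membership function with crossover c = (alpha + beta) / 2 and
    fuzzifier m controlling the steepness (m = 2 in the paper)."""
    z = np.asarray(z, dtype=float)
    c = (alpha + beta) / 2.0
    mu = np.zeros_like(z)                       # z <= alpha -> 0
    rising = (z > alpha) & (z <= c)
    falling = (z > c) & (z < beta)
    mu[rising] = 2.0 ** (m - 1) * ((z[rising] - alpha) / (beta - alpha)) ** m
    mu[falling] = 1.0 - 2.0 ** (m - 1) * ((beta - z[falling]) / (beta - alpha)) ** m
    mu[z >= beta] = 1.0                         # z >= beta -> 1
    return mu
```

Both branches meet at 0.5 when z = c, so the curve is continuous and monotonically non-decreasing from 0 at α to 1 at β.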
The modified S MF is shown in Fig. 4(b) and represents a shifted-origin version of the simple S MF. The membership grades assigned by this modified S MF start at a positive value d, as compared to zero in the case of the simple S MF. This modification is taken to suppress the initial large scale DCT components rather than discard them: they are kept with positive membership grades of at least d instead of being completely discarded (zero membership grade), as they would be if processed by the simple S MF. This is done in order to retain the facial features related to the initial large scale DCT components (positive membership grade, at least d), while suppressing the effect of illumination variations in them by reducing the membership grade (less than one). The modified S MF can be expressed through the simple S MF with the lower limit α taken negative:

\mu(z) = S(z; \alpha, \beta), \quad \alpha < 0 < \beta

where d, c and β are positive variables. The value of d is controlled by the negative value of α in the above equation: a more negative value of α generates a higher value of d and vice versa. The maximum value of d is limited to 0.5, corresponding to z equal to c, where c is the crossover point, expressed as c = (α + β)/2. The membership grades generated for a given set of large scale DCT components are arranged in vector form as:

𝒢 = [g_1, g_2, ..., g_\beta]

where g_i is the membership grade for the i-th large scale DCT component. g_1 represents the membership grade for the first large scale DCT component (the first AC DCT coefficient), which equals d as shown in Fig. 4(b). Also, g_β is one.
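Following this description, a numpy sketch of the grade vector 𝒢 is given below. It evaluates the same S-shaped curve at the component indices 1…β with a negative α, so the first grade is a positive d < 0.5 and the last grade is one; this is our reconstruction, not the authors' exact formulation.

```python
import numpy as np

def modified_s_grades(alpha, beta, m=2.0):
    """Membership grades g_1..g_beta of the modified S MF: the S-shaped
    curve evaluated at component indices 1..beta with alpha < 0, so the
    first grade is a positive d rather than zero."""
    assert alpha < 0 < beta
    z = np.arange(1, beta + 1, dtype=float)   # indices of large scale AC coefficients
    c = (alpha + beta) / 2.0                  # crossover point, grade 0.5
    return np.where(
        z <= c,
        2.0 ** (m - 1) * ((z - alpha) / (beta - alpha)) ** m,
        1.0 - 2.0 ** (m - 1) * ((beta - z) / (beta - alpha)) ** m,
    )
```

For example, with α = -160 and β = 465 (one of the parameter pairs reported for the extended Yale B experiments), the first grade d is a small positive value and the grades rise monotonically to 1.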
If ℋ represents the set of large scale DCT components after processing the set of original large scale DCT components (given by Eq. 11), ℋ can be expressed as:

ℋ = 𝒢 .× 𝒵

where .× represents the element-by-element multiplication of its argument sets.
After replacing 𝒵 by ℋ in the set of all DCT components, the illumination normalized set is obtained, on which the unzigzag operation followed by the inverse DCT is applied to obtain the illumination normalized face image. The outcomes of the above processing for different values of α and β are shown in Fig. 3(d) to 3(g), along with the correlation coefficient values.
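The element-by-element modification ℋ = 𝒢 .× 𝒵 amounts to one vector multiply on the zigzag-ordered coefficients. A small sketch (our naming) is given below; the DC coefficient at index 0 is left untouched, since the grades apply from the first AC coefficient onward.

```python
import numpy as np

def normalize_large_scale(dct_vec, grades):
    """Multiply the first len(grades) AC coefficients of the zigzag-ordered
    DCT vector by the membership grades (H = G .x Z); the DC coefficient
    at index 0 is left unchanged."""
    out = np.asarray(dct_vec, dtype=float).copy()
    b = len(grades)
    out[1:1 + b] *= grades
    return out
```

The resulting vector is then unzigzagged and passed through the inverse DCT to yield the illumination normalized image.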

Classification of Illumination Normalized Images using KELM
After illumination normalization of all the database images, KELM is used to classify the processed images. The complex variations of pose, expression and illumination in individual face images make face recognition a highly non-linear and, in terms of convergence, nonconvex problem. Linear methods such as PCA and LDA map the high dimensional image space linearly to a low dimensional feature subspace. As a result, linear classification methods are unable to preserve the nonconvex variations of face manifolds that are useful for discriminating among different categories. This limits the ability of linear methods to obtain better recognition performance. For high dimensional data such as face images, iterative artificial neural network classifiers (based on the BP algorithm) are not appropriate due to their high training time. KELM is a non-iterative classifier which learns high dimensional training data very fast. It maps the image space using a non-linear kernel function into a feature space (hidden layer), which helps in the classification of non-linearly varying input data. The kernel function used in the proposed investigation is the polynomial kernel with kernel parameters 1 and 2 [20]. The value of the regularization coefficient C used in Equation (6) is 1.

Results and Discussion
Thorough experimental analyses have been conducted on the Yale and extended Yale B face databases. The Yale face database comprises 15 subjects, each with 11 gray scale images [26]. The original resolution of the images is 320 × 243 pixels. To include only the internal details of the face, these are manually cropped to a size of 220 × 175 pixels, followed by sub-sampling by a factor of 1.6 to obtain a size of 138 × 110 pixels. This database comprises different facial expressions, viz. happy, sad, surprised, sleepy, wink and normal; variations in illumination; occlusion variations with/without glasses; as well as misalignment.
The performance metric used in the present investigation is the percentage error rate. This is given as the number of mismatches as a percentage of the total size of the test dataset. That is, if 5 images are misclassified out of 200 test images, the percentage error rate is 2.5%.
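The metric reduces to a one-liner, shown here with the 5-out-of-200 example from the text:

```python
def percentage_error_rate(mismatches, total):
    """Number of misclassified test images as a percentage of the test set size."""
    return 100.0 * mismatches / total

# 5 misclassified out of 200 test images -> 2.5 percent error rate
rate = percentage_error_rate(5, 200)
```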
For the analysis on the Yale face database, the variation in percentage error rate has been analyzed for different numbers of training images per subject (NTIPS). When only one image per person is used for training, the remaining images are used for testing, resulting in mutually exclusive training and test datasets of 15 and 150 images respectively. Similarly, 2, 3, and up to 8 images per subject have been analyzed for training purposes. Selection of the images is sequential for forming the training and test datasets. For all these eight combinations, the variations in percentage error rate have been computed for different values of α and β as well. For 3 and 4 NTIPS, the percentage error rates are listed in Tables 1(a) and 1(b) respectively, with respect to some values of α and β. The best results obtained are compared in Table 2, which also comprises the percentage error rates of other state-of-the-art methods on the Yale face database. It is found that the proposed method of illumination normalization provides significantly better results on this database.
The second database used for performance evaluation is the extended Yale B face database [15]. It comprises face images of 38 persons under different illumination variations. These images are categorized into 5 subsets on the basis of the illumination variation for each subject. Images with 0°-12° illumination variation lie in subset 1, which contains 7 images per person. Similarly, face images with 12°-25°, 26°-50°, 51°-77° and above 77° illumination variation lie in subsets 2, 3, 4 and 5 respectively. The number of images per person in subsets 2, 3, 4 and 5 is 12, 12, 14 and 19 respectively; in this way, the total numbers of images are 266, 456, 456, 532 and 722 in subsets 1, 2, 3, 4 and 5 respectively. The original size of these images is 640 × 480 pixels. To include the facial details only (forehead together with the other features of the face), these images are cropped to 192 × 168 and further reduced to a resolution of 120 × 104 pixels. Since the images of subset 1 have very little illumination variation, they are taken for training the KELM after illumination normalization using the proposed method. The images of the remaining subsets, viz. subsets 2, 3, 4 and 5, are used as test data to evaluate the performance of the proposed method. The percentage error rates obtained on subsets 4 and 5 for different values of α and β are listed in Tables 3(a) and 3(b) respectively. The error rate decreases as α and β are increased, but for larger values of these parameters, the error rate increases again. The minimum percentage error rates are 0.38 and 0.83 on subsets 4 and 5 respectively. As evident from Table 3, from the many combinations of α and β, -160 and 465 are chosen for subset 4, and -220 and 666 for subset 5, as the values of α and β respectively.
The performance comparison of the proposed method with existing state-of-the-art approaches of illumination normalization on the extended Yale B face database is given in Table 4. It is clearly evident that the results achieved using the proposed method are significantly better. As KELM is a non-iterative learning algorithm of ANN, the proposed method is also faster in terms of computation time than the other existing contemporary methods of illumination normalization. The time complexity of computing the DCT is O(M log M), where M is the number of pixels in the input face image [27]. The fixed number of large scale DCT coefficients is processed by the modified S MF. The elements of the modified S MF can be calculated in advance using Equation (13) for given values of α and β. The processing is performed as a vector operation having time complexity O(β). Thus, the time complexity of the illumination normalization component is the same as that of the DCT. KELM, a non-iterative algorithm, has been used for classification. A kernel function is utilized for mapping the input into the feature space (hidden layer) [28,30]. The mapping operation depends upon the inner product of the input vectors, which have a size of M elements. With the help of the generalized pseudo-inverse method, the feature space of KELM, i.e. the hidden layer, is mapped to a particular class. The number of operations performed in the generalized pseudo-inverse method is O(M²), considering the number of classes C << M. This mapping is done in a non-iterative manner; hence, KELM performs extremely fast [20].

Conclusion
A novel illumination normalization method has been drawn from the concepts of fuzzy logic theory and non-iterative non-linear classification. The proposed method can be used in a wide domain of biometric and surveillance applications based on human face images, in which the variation of light incident upon a person's face is not fixed. The proposed modified S MF based modification has been applied on some large scale DCT components for robust illumination normalization of all the database images. These illumination normalized images are classified using a non-iterative non-linear classifier: KELM. The percentage error rates have been computed on benchmark face databases: the Yale and extended Yale B face databases. Very promising results on the Yale face database, accurate recognition on subset 3, and 0.38 and 0.83 percent error rates on subsets 4 and 5, respectively, of the extended Yale B face database have been achieved. Comparison of the percentage error rates with those of existing state-of-the-art methods of illumination normalization on these databases demonstrates the worth of the proposed method. Also, the proposed method is faster in terms of computation time, as KELM is a non-iterative learning algorithm of ANN.
In future work, the compensation of illumination variations in the presence of other variations such as pose will be explored. The effect of variations of the KELM parameters on the performance of the face recognition system under illumination variations may also be explored. It will be worthwhile to examine rough set concepts in integration with fuzzy sets for illumination normalization and compensation.