Tea category classification via 5-layer customized convolutional neural network

INTRODUCTION: Green tea, oolong, and black tea are the three most popular teas in the world. Classifying tea manually not only takes a lot of time but is also affected by other factors, such as smell, vision, and emotion. OBJECTIVES: Other methods of tea category classification have the shortcomings of low classification accuracy and weak robustness. To solve these problems, we proposed a deep learning method. METHODS: This paper proposed a 5-layer customized convolutional neural network for classifying 3 tea categories. RESULTS: The experimental results show that the method classifies tea quickly and accurately, with an accuracy of 97.96%. CONCLUSION: Our method has better performance than six state-of-the-art methods.


Introduction
Tea can promote digestion, inhibit arteriosclerosis, and reduce the incidence of cardiovascular and cerebrovascular diseases [1]. Therefore, more and more people have begun to drink tea every day. According to the kind of tea plant and the tea production process, tea can be divided into six categories: green tea, black tea, oolong, white tea, dark tea, and yellow tea. Among them, green tea [2], oolong [3], and black tea [4] are the three most popular teas in the world. If we classify tea manually, it will not only take a lot of time but also be affected by other factors, such as smell, vision, and emotion.
These factors increase the uncertainty of the classification results. The authors of [9] proposed using a fuzzy support vector machine (FSVM) to classify tea leaves. Ren et al. [10] used near-infrared (NIR) technology to assess black tea tenderness and rankings. They proposed a multi-variable selection strategy based on optimizing the variable space from big to small.

EAI Endorsed Transactions on e-Learning
The experimental results show that the model with a radial basis function achieves the best predictive results, with a correct discriminant rate of 95.28%. Wu [11] presented a weighted k-nearest neighbors (WKNN) algorithm for tea category identification via a 3-CCD optical camera. Cattani and Rao [12] used the Jaya algorithm to identify tea categories. Liu [13] used a generalized eigenvalue proximal support vector machine (GEPSVM) for tea category identification. Chen and Chen [14] employed the gray-level co-occurrence matrix (GLCM) to identify five categories of tea leaves.
Analysis of the methods in the above literature shows that these methods have some shortcomings, such as complex network structure, weak robustness, and redundant feature extraction operations. To solve the first problem, we proposed a model with a simple network structure. To achieve fast and accurate tea classification, this paper proposed a 5-layer customized convolutional neural network for classifying green tea, oolong tea, and black tea. The experimental results of 10-fold cross validation show that our model has better robustness and better performance than state-of-the-art methods.
The structure of this paper is as follows: Section 2 introduces the dataset of our experiments and describes the characteristics of each tea category. Section 3 introduces the structure of our proposed model, the functions of each network component, and the measures used in the experiments. Section 4 shows the experimental results and compares our method with state-of-the-art methods. Section 5 concludes this paper and points out directions for our future study.

Dataset
The dataset of this paper consists of three tea categories: green tea, oolong, and black tea. Each category has 300 images, for 900 images in all. As shown in Figure 1, the shape of green tea is long and thin, the color is green mixed with yellow, and the strips are slightly curved.
Oolong is a compact round shape, which is green mixed with black. And for black tea, the leaves are thicker and the color is dark brown.

Methodology
In this section, we will introduce the method of our study.
We used a convolutional neural network to classify tea categories. To make this paper easy to follow, Table 15 shows all variables used in our study, and Table 16 gives the abbreviations and their full names. Table 15 and Table 16 are in the appendix at the end of the paper.

Convolution
Convolution is the result of multiplying two variables together over a range [15]. In our method, we used the convolution operation to extract the features of the input information [16].
The information in the receptive field of the convolution kernel is processed by the convolution operators to obtain the extracted features. The receptive field of the convolution kernel scans the information matrix continuously and performs the convolution operation repeatedly [17][18][19], until all the extracted features are obtained. Figure 2 shows a convolution kernel performing feature extraction.
The output size of the convolution operation is

$$O = \frac{I - K + 2P}{S} + 1 \qquad (1)$$

Here, $O$ represents the output size of the convolution operation [20], $I$ represents the input information size of the convolution operation, $P$ represents the size of padding, $K$ represents the size of the convolution kernel, and $S$ represents the stride size of the convolution kernel. As shown in Figure 3, "Input" represents the input information of the convolutional neural network (CNN).

Convolutional operation in Convolutional Neural Network
"Kernel" represents the convolution kernel. "Conv-in-Run" represents the process of the convolutional operation [21].
"Conv" represents the convolutional layer. "Stack" represents consecutive convolutional layers. "Output" represents the output feature of the model. Here, the input information is scanned by the convolution kernel [22], and the convolution kernel operators perform the convolutional operation. The specific process is as follows: each convolution kernel operator multiplies the corresponding input information to get a partial output, and the output of a convolution kernel is the sum of the outputs of all its operators. CNN uses this network structure to quickly capture the features of the input information.
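As a concrete illustration, the scanning-and-summing process described above can be sketched in pure Python. The `conv2d` helper, the 4x4 input, and the 2x2 kernel below are illustrative assumptions, not the paper's actual implementation:

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide the kernel over the image; each step is a sum of
    element-wise products (a simplified, pure-Python sketch)."""
    if padding:  # zero-pad the input on all four sides
        w = len(image[0])
        pad_row = [0] * (w + 2 * padding)
        image = ([pad_row[:] for _ in range(padding)]
                 + [[0] * padding + row + [0] * padding for row in image]
                 + [pad_row[:] for _ in range(padding)])
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1  # O = (I - K + 2P) / S + 1
    out = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            acc = 0
            for i in range(k):
                for j in range(k):
                    acc += image[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out

# A 4x4 input and a 2x2 kernel give a 3x3 feature map (stride 1, no padding).
feature_map = conv2d([[1, 2, 3, 4],
                      [5, 6, 7, 8],
                      [9, 10, 11, 12],
                      [13, 14, 15, 16]], [[1, 0], [0, 1]])
```

Each entry of `feature_map` is one position of the kernel's receptive field, matching the scanning process shown in Figure 2.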
The output of a neuron is calculated as

$$z = \sum_{i=1}^{n} w_i x_i + b \qquad (2)$$

$$a = \sigma(z) \qquad (3)$$

Here, $z$ represents the output of a neuron without the activation function, $n$ represents the number of neurons in the previous layer connected to it, $i$ represents the index of neurons and starts from 1, $w_i$ represents the weight of the $i$-th neuron, $x_i$ represents the input of the $i$-th neuron, $b$ represents the bias of the neuron, $a$ represents the output of a neuron with the activation function, and $\sigma$ represents the activation function [23].
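The neuron computation described above (a weighted sum plus bias, followed by an activation function) can be sketched as follows; the weights, inputs, bias, and the choice of ReLU as the activation are illustrative only:

```python
def neuron_output(weights, inputs, bias, activation):
    # z = sum_i w_i * x_i + b, then a = activation(z)
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

relu = lambda z: max(0.0, z)  # one common choice for the activation

# Hypothetical weights and inputs for a neuron with three connections.
a = neuron_output([0.5, -1.0, 2.0], [1.0, 2.0, 0.5], 1.0, relu)
```

Here `z = 0.5 - 2.0 + 1.0 + 1.0 = 0.5`, and since it is positive, the activation passes it through unchanged.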

Pooling
The function of pooling is to reduce feature dimensions, prevent overfitting, and enhance the robustness of the model [24]. There are two common pooling methods: max-pooling and average pooling [25]. The difference between them is that the output value of max-pooling is the maximum number in the receptive field of the pooling kernel [26], while the output value of average pooling is the average of all the numbers in the receptive field of the pooling kernel. Figure 4 shows the two common pooling methods.

The output size of the pooling operation is

$$O_p = \frac{I_p - K_p}{S_p} + 1 \qquad (4)$$

Here, $O_p$ represents the output size of the pooling operation, $I_p$ represents the input size of the pooling operation, $K_p$ represents the size of the pooling kernel, and $S_p$ represents the stride size of the pooling kernel. The pooling layer implements dimensionality reduction and reduces overfitting.
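Both pooling methods can be sketched with one helper; the 4x4 feature map and the 2x2 window with stride 2 below are illustrative values, not the model's actual settings:

```python
def pool2d(feature, k, stride, mode="max"):
    """Apply max- or average pooling with a k x k window."""
    out_size = (len(feature) - k) // stride + 1  # O = (I - K) / S + 1
    out = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            window = [feature[r * stride + i][c * stride + j]
                      for i in range(k) for j in range(k)]
            # max-pooling keeps the largest value; average pooling the mean
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out

f = [[1, 3, 2, 4],
     [5, 7, 6, 8],
     [9, 2, 1, 3],
     [4, 6, 5, 7]]
```

With `k=2` and `stride=2`, the 4x4 input shrinks to 2x2, showing the dimensionality reduction described above.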

Batch Normalization
Batch normalization (BN) works to speed up model training [27], improve the generalization ability of the network, and improve model accuracy [28]. The calculation process of BN is as follows. First, calculate the mean of the mini-batch data. Second, calculate the variance of the mini-batch data. Third, normalize the mini-batch data. Finally, output the data after performing a scale and shift on the result of the above operations [29]. Formulas (5) to (8) show how the output of BN is calculated.

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i \qquad (5)$$

$$\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}(x_i - \mu_B)^2 \qquad (6)$$

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \qquad (7)$$

$$y_i = \gamma \hat{x}_i + \beta = \mathrm{BN}_{\gamma,\beta}(x_i) \qquad (8)$$

Here, $x_i$ represents the $i$-th data in the mini-batch, $i$ represents the index of data in the mini-batch, $m$ represents the number of mini-batch data, $\mu_B$ represents the mean value of the mini-batch data, $\sigma_B^2$ represents the variance of the mini-batch data, $\epsilon$ is a constant that avoids an invalid calculation when $\sigma_B^2$ is 0, $\hat{x}_i$ represents the output of normalization, $\gamma$ and $\beta$ represent the parameters to be learned in BN, and $\mathrm{BN}_{\gamma,\beta}(x_i)$ represents the output of BN when the input data is $x_i$. The Rectified Linear Unit (ReLU) is a common activation function in convolutional neural networks, which works to alleviate gradient disappearance and improve the training speed of the model [30]. The ReLU curve is shown in Figure 5: when $x > 0$, the value of $y$ is equal to the value of $x$; when $x \le 0$, the value of $y$ is equal to 0. The output of ReLU is calculated by (9).

Rectified Linear Unit
$$y = \max(0, x) \qquad (9)$$

Here, $x$ represents the input of ReLU (the weighted sum computed from the weight transpose matrix $w^T$ and the bias $b$ of a neuron [31]), and $y$ represents the output of ReLU.
After the ReLU calculation, features with positive input are output unchanged, while features with negative input are filtered out [32]. Therefore, ReLU can effectively improve the learning speed of the model parameters in back propagation [33].
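The BN steps of formulas (5) to (8) and the ReLU of formula (9) can be sketched together; the mini-batch values and the choice of gamma = 1, beta = 0 are illustrative only:

```python
def batch_norm(batch, gamma, beta, eps=1e-5):
    """Formulas (5)-(8): mean, variance, normalize, then scale and shift."""
    m = len(batch)
    mu = sum(batch) / m                           # (5) mini-batch mean
    var = sum((x - mu) ** 2 for x in batch) / m   # (6) mini-batch variance
    # (7) normalize, then (8) scale by gamma and shift by beta
    return [gamma * (x - mu) / (var + eps) ** 0.5 + beta for x in batch]

def relu(x):
    """Formula (9): positive inputs pass through, negative inputs become 0."""
    return x if x > 0 else 0.0

normalized = batch_norm([1.0, 2.0, 3.0, 4.0], gamma=1.0, beta=0.0)
activated = [relu(v) for v in normalized]
```

With `gamma=1` and `beta=0`, the normalized batch has (near) zero mean and unit variance, and ReLU then zeroes out the negative half of it.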

Structure of customized CNN
The structure of our model is made up of three convolutional layers and two fully-connected layers. Figure 6 shows the structure more vividly. Each convolutional layer is followed by BN and ReLU. Table 1 shows the value settings of the hyperparameters in our model.
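The feature-map sizes flowing through such a structure can be checked with the output-size formulas from the previous sections. The input resolution, kernel size, padding, and the assumption that each convolutional layer is followed by 2x2 pooling are hypothetical here; the actual hyperparameter values are those given in Table 1:

```python
def conv_out(i, k, p, s):
    # O = (I - K + 2P) / S + 1
    return (i - k + 2 * p) // s + 1

def pool_out(i, k, s):
    # O = (I - K) / S + 1
    return (i - k) // s + 1

size = 256  # hypothetical input resolution (real value is in Table 1)
for layer in range(3):  # three conv layers, each followed by 2x2 pooling here
    size = conv_out(size, k=3, p=1, s=1)  # 3x3 kernel with padding 1 keeps size
    size = pool_out(size, k=2, s=2)       # 2x2 pooling with stride 2 halves it
```

Under these assumed settings, each of the three blocks halves the spatial size (256 to 128 to 64 to 32) before the fully-connected layers classify the flattened features.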

Measures
In order to better present the detection performance of our model, the confusion matrix is introduced, as shown in Table 2. Here, TP means the actual class is positive and the predicted class is also positive. FP means the actual class is negative but the predicted class is positive [34]. FN means the actual class is positive but the predicted class is negative. And TN means the actual class is negative and the predicted class is also negative. Besides, we defined five confusion matrix metrics: Sensitivity, Specificity, Precision, F1, and Micro-averaged F1 (MF1).
The respective calculation methods are as follows:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (10)$$

$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (11)$$

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (12)$$

$$F1 = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \qquad (13)$$

$$MF1 = \frac{2\sum_{c=1}^{C} TP_c}{2\sum_{c=1}^{C} TP_c + \sum_{c=1}^{C} FP_c + \sum_{c=1}^{C} FN_c} \qquad (14)$$

Here, $c$ represents the index of categories, starting from 1, and $C$ represents the number of categories.
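Since these metrics follow the standard confusion matrix definitions, they can be sketched directly; the TP/FP/FN/TN counts below are illustrative, not the paper's results:

```python
def metrics(tp, fp, fn, tn):
    """Per-class Sensitivity, Specificity, Precision, and F1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, precision, f1

def micro_f1(per_class):
    """Micro-averaged F1: pool TP/FP/FN over all classes, then compute F1."""
    tp = sum(c[0] for c in per_class)
    fp = sum(c[1] for c in per_class)
    fn = sum(c[2] for c in per_class)
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return 2 * p * r / (p + r)

# Hypothetical counts for one tea category out of 300 test images.
sen, spe, pre, f1 = metrics(tp=90, fp=5, fn=10, tn=195)
```

Note that with a single class, micro-averaged F1 reduces to the ordinary F1, which is a quick sanity check on the pooling step.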

Cross Validation
To better test the robustness of our model, 10-fold cross validation is also introduced. We divide the dataset into 10 equal subsets and number them, select subset 1 as the test set and the other subsets as the training set, and rotate this choice through all 10 subsets [35]. We performed 10 runs of 10-fold cross validation in the following experiments. An ideal 10-fold cross validation confusion matrix and an ideal confusion matrix for 10 runs of 10-fold cross validation are shown in Table 3. Table 3. Two ideal confusion matrices.
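A minimal sketch of generating the 10 folds, assuming a simple sequential split of the 900 images (in practice the subsets would be shuffled and the split repeated for each of the 10 runs):

```python
def k_fold_splits(n_samples, k=10):
    """Yield (test_indices, train_indices) for each of the k folds."""
    indices = list(range(n_samples))
    fold = n_samples // k
    for i in range(k):
        test = indices[i * fold:(i + 1) * fold]          # fold i is the test set
        train = indices[:i * fold] + indices[(i + 1) * fold:]  # the rest trains
        yield test, train

# 900 images split into 10 folds of 90 images each.
splits = list(k_fold_splits(900, k=10))
```

Every image appears in exactly one test fold, so each sample is predicted once per run of the cross validation.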


Statistical Analysis
In this part, we show the confusion matrix of our method and the five confusion matrix metrics. All data are shown in Table 4 and Table 5. Table 5 shows that the F1 results of green tea and oolong exceed the micro-averaged F1, and even though black tea achieved good results in specificity and precision, its overall performance is not as good as the other teas. We believe the reason for this result is that green tea and oolong have more obvious characteristic information than black tea.
Because black tea leaves are relatively large and darker in color than the others, it is difficult for the model to detect the characteristics of black tea.

Optimal number of Convolutional layers
In order to prove the rationality of our proposed model structure, we conducted an ablation study. We set the number of convolutional layers to 2 and 4 and observed what happened to the results. Here, Table 5 and Table 6 show the confusion matrix results when our model has two convolutional layers, and Table 7 and Table 8 show the confusion matrix results when our model has four convolutional layers. Comparing Table 7 with Table 5, all metric results in Table 7 show a clear downward trend. We believe the reason is that the three-convolutional-layer structure has more powerful detection ability than the two-convolutional-layer structure. Therefore, the performance of each metric in Table 5 is better than in Table 7, and the performance of each metric in Table 9 is better than in Table 7.
Here, due to the overfitting of the four-convolutional-layer structure, its performance does not exceed that of the three-convolutional-layer structure. The impact of overfitting on detection performance is more serious than the limitation that the two-convolutional-layer structure places on model performance. In conclusion, keeping the values of the other parameters in the model the same, the structure with three convolutional layers achieves the best detection performance for tea categories. This also proves the rationality of our proposed model structure. Figure 8 shows this conclusion more vividly than the tables. Here, Sen represents sensitivity, Spe represents specificity, and Pre represents precision.

Optimal number of fully-connected layers
In this part, we compared the results of different numbers of fully-connected layers. Table 10 and Table 11 show the data for 1 fully-connected layer, and Table 12 and Table 13 show the data for 3 fully-connected layers. Here, we first show the results of the model with three convolutional layers and 1 fully-connected layer. Comparing Table 11 with Table 5, the MF1 of Table 11 is 97.37%, which is 0.59% lower than that of Table 5. We then show the results of the model with three convolutional layers and 3 fully-connected layers. Comparing Table 13 with Table 5, the MF1 of Table 13 is 96.83%, which is 1.13% lower than that of Table 5, and also 0.54% lower than that of Table 11. This proves that the performance of the model with 3 fully-connected layers is the worst. We believe that the cause of this result is overfitting: the model with 3 fully-connected layers overfits, which leads to a decrease in performance. From the above, it can be seen that in both the feature extraction part and the classification part, overfitting causes a greater negative impact on model performance. Figure 9 shows this conclusion more vividly than the tables. Here, Sen represents sensitivity, Spe represents specificity, and Pre represents precision.

Comparison with State-of-the-art Approaches
All the algorithms were run via 10 runs of 10-fold cross validation on our dataset. The MF1 results are shown in Table 14. Here, we compared our method with six state-of-the-art methods. As shown in Table 14, the MF1 of our method is 97.96%.
The Jaya method obtained the best performance among the state-of-the-art methods, with an MF1 of 97.30%. As an obvious contrast, the MF1 of our method is 0.66% higher than that of Jaya. Figure 10 shows this point more vividly: the brown histogram is the Jaya method and the green histogram is our method, and the distance between the brown dotted line and the green dotted line reflects the superiority of our algorithm. Therefore, Table 14 proves the effectiveness of our proposed method.

Conclusion
In this paper, we proposed a method using a 5-layer customized convolutional neural network to perform tea category classification. The experimental results show our method obtained better performance on MF1 than the state-of-the-art methods.
In future studies, we will continue working on computer vision and designing convolutional neural networks for solving more complex classification problems.
Meanwhile, we will also try to use swarm intelligence optimization algorithms to optimize the values of the hyperparameters in the network. Some advanced CNN techniques [38], such as advanced pooling [39] and attention neural networks [40], will be tested.

Appendix

Table 15 Variables used in our study.

$i$: Index of neurons; it starts from 1.
$\mathrm{BN}_{\gamma,\beta}(x_i)$: Output of BN when the input data is $x_i$.
$b$: Bias of a neuron.
$c$: Index of categories; it starts from 1.
$K_p$: Size of the pooling kernel.
$C$: Number of categories.
$K$: Size of the convolution kernel.
$k_1, \dots, k_9$: Convolution operators.
$x_i$: Input of the $i$-th neuron.
$f_1, \dots, f_{16}$: Features obtained by the convolution operation.
$I$: Input information size of the convolution operation.
$I_p$: Input size of the pooling operation.
$i$: Index of data in the mini-batch.
$\mu_B$: Mean value of the mini-batch data.
$n$: Number of neurons in the previous layer connected to a neuron.
$m$: Number of mini-batch data.
$O$: Output size of the convolution operation.
$z$: Output of a neuron without the activation function.
$O_p$: Output size of the pooling operation.
$a$: Output of a neuron with the activation function.
$P$: Size of padding.
$x_i$: The $i$-th data in the mini-batch.
$\hat{x}_i$: Output of normalization.
$y$: Output of ReLU.
$S$: Stride size of the convolution kernel.
$S_p$: Stride size of the pooling kernel.
$\sigma_B^2$: Variance of the mini-batch data.
$w_i$: Weight of the $i$-th neuron.
$w^T$: Weight transpose matrix of a neuron.
$x$: Input of ReLU.
$\gamma$: A parameter to be learned in BN.
$\beta$: A parameter to be learned in BN.
$\epsilon$: A constant that avoids an invalid calculation when $\sigma_B^2$ is 0.