An Efficient Pest Classification In Smart Agriculture Using Transfer Learning

To this day, agriculture remains very important and plays a considerable role in supporting daily life and the economy in most countries. It is not only the source of the food supply but also provides raw materials for other industries, e.g. plastic and fuel. Currently, farmers face the challenge of producing sufficient crops for an expanding human population and a growing economy while maintaining the quality of agricultural products. Pest invasions, however, are a major threat to growing crops, causing crop loss and economic damage. If left untreated, even in a small area, they can quickly spread to other healthy areas or neighbouring countries. Pest control is therefore crucial to reduce crop loss. In this paper, we introduce an efficient method based on deep learning to classify pests in images captured from crops. The proposed method is implemented on various EfficientNet models. It achieves considerably high accuracy on a complex dataset while requiring only a few training iterations.
Received on 04 December 2020; accepted on 19 January 2021; published on 26 January 2021


Introduction
Human civilisation has changed dramatically. Agriculture, however, remains very important for every country. Indeed, many raw materials, such as cotton, sugar, wood and palm oil, come from agriculture. These materials are essential to major industries such as the manufacturing of pharmaceuticals, diesel fuel and plastics. Most importantly, agriculture is the source of the world's food supply.
The United Nations has predicted that the human population will reach 10 billion by 2050. The amount of food we produce now therefore needs to be doubled to accommodate this growth. Increasing the quantity while also maintaining the quality is a big challenge for the next generations of farmers.
One of the main causes of damage to growing crops is pest invasion. Pests can eat leaves and burrow holes in stems, fruit or roots. Moreover, they can transmit bacterial, viral or fungal infections to a crop. Effective insect control on farms and agricultural premises is therefore essential. Regular monitoring of agricultural land can detect invasions early, and applying chemical sprays such as insecticides at the larval stage can indirectly reduce the amount of insecticide needed. Controlling outbreaks early is essential. To do so, farmers need to scout their maize crop daily to detect pests early. Walking through large fields and manually identifying pests may take many hours.
Over the centuries, technologies have been applied to gain better yields. Intelligent robots and drones are expected to revolutionise the efficiency of farms and change the agricultural landscape. These technologies can significantly reduce the time farmers spend on monitoring and free them from manually inspecting their crops.
One of the main tasks for these technologies is to identify the presence of pests in crops. Compared with traditional machine learning methods, disease and pest classification methods based on deep learning can take the original image directly as input. Instead of tedious separate steps such as image pre-processing, feature extraction and feature classification, they adopt an end-to-end structure. This simplifies the recognition process. It also avoids a key weakness of manually designed feature extractors: they rarely capture the feature representation closest to the natural attributes of the object. Applying deep learning object detection can not only save time and effort but also enable real-time judgement. This can greatly reduce the heavy losses caused by pests, which gives the approach important research value and significance.
However, pest classification on field crops such as rice and soybean is more challenging than generic object classification because of the inherent complexity of the image background. Moreover, the morphological characteristics of an insect vary greatly across its developmental stages (nymph or larva, and adult). To this end, we propose an efficient method to detect pests in images captured in the field. The proposed method achieves high accuracy in pest detection while requiring less training data and time.
The rest of this paper is organised as follows. Section 2 reviews related work on pest classification using conventional machine learning methods and deep learning approaches. Section 3 describes the proposed method, and Section 4 presents the experiments. Finally, Section 5 concludes the paper by summarising its main contributions and proposing future work.

Object Classification
Object classification is one of the most popular and fast-growing research topics in computer vision. Its task is to assign objects to one of several predefined categories. It is also useful for object detection and tracking in images and videos. In general, there are two approaches to object classification, Conventional Machine Learning (ML) and Deep Learning (DL), as visualised in Figure 1.
The conventional approach uses well-established computer vision techniques to extract features such as the Scale-Invariant Feature Transform (SIFT) [1], Speeded-Up Robust Features (SURF) [2] and BRIEF [3]. These features are fed into a conventional ML classifier, e.g. a Support Vector Machine (SVM) [4], Decision Tree (DT), Random Forest (RF) or Logistic Regression (LR), whose output is a specific class. Choosing suitable features is a crucial task in these conventional approaches. However, manually extracting features and running multiple experiments to select appropriate ones is time consuming.
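To make the conventional pipeline concrete, the sketch below pairs a deliberately simple handcrafted feature with logistic regression trained by gradient descent in NumPy. The feature is a normalised grey-level histogram, standing in for descriptors such as SIFT or SURF. The function names, the 8-bin histogram and the synthetic dark/bright images are illustrative assumptions, not part of any cited method.

```python
import numpy as np

def histogram_features(image: np.ndarray, bins: int = 8) -> np.ndarray:
    """Handcrafted feature: normalised grey-level histogram of an image in [0, 1]."""
    counts, _ = np.histogram(image, bins=bins, range=(0.0, 1.0))
    return counts / counts.sum()

def train_logistic_regression(X, y, lr=1.0, steps=500):
    """Binary logistic regression fitted with batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability of class 1
        grad = p - y                             # cross-entropy gradient w.r.t. logits
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def predict(X, w, b):
    return (X @ w + b > 0).astype(int)

# Synthetic data: class 0 images are dark, class 1 images are bright.
rng = np.random.default_rng(0)
images = [rng.uniform(0.0, 0.4, (32, 32)) for _ in range(20)] + \
         [rng.uniform(0.6, 1.0, (32, 32)) for _ in range(20)]
labels = np.array([0] * 20 + [1] * 20)

X = np.stack([histogram_features(img) for img in images])  # manual feature extraction
w, b = train_logistic_regression(X, labels)                # conventional ML classifier
accuracy = (predict(X, w, b) == labels).mean()
print(f"training accuracy: {accuracy:.2f}")
```

The separation of concerns is the point here. If the chosen feature does not capture what distinguishes the classes, no classifier downstream can recover it, which is why feature selection dominates the effort in this approach.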
In contrast, deep learning (DL) is based on artificial neural networks (ANNs) consisting of an input layer, hidden layers and an output layer. For classification, a DL model takes as input an image dataset in which each image is labelled with a specific object class. The model is trained on that data, automatically discovers underlying patterns and generates feature maps for each class. This approach eliminates the need for handcrafted features.
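As a minimal illustration of the input, hidden and output layer structure, the sketch below runs a forward pass of a tiny fully connected network in NumPy. The layer sizes, the ReLU activation and the random weights are assumptions for illustration. A real image classifier such as EfficientNet uses convolutional layers and learns these weights during training.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical sizes: 64 input features, 32 hidden units, 5 output classes.
n_in, n_hidden, n_out = 64, 32, 5
W1, b1 = rng.normal(0, 0.1, (n_in, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(0, 0.1, (n_hidden, n_out)), np.zeros(n_out)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(x):
    """Input layer -> hidden layer (ReLU) -> output layer (class probabilities)."""
    h = np.maximum(0.0, x @ W1 + b1)  # hidden-layer features
    return softmax(h @ W2 + b2)       # one probability per class

batch = rng.normal(size=(4, n_in))    # 4 flattened "images"
probs = forward(batch)
print(probs.shape, probs.argmax(axis=1))
```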
However, these architectures are computationally expensive. EfficientNet [10], a fast object classification architecture, was published to address this issue. In general, the authors of EfficientNet observed that a CNN is typically scaled up to achieve better accuracy. This is done by increasing the number of layers (depth), making each layer wider (width), feeding it higher-resolution input images, or combining these factors. Figure 2 visualises these cases. However, exploring all of these possibilities is tedious and time consuming.
Neural Architecture Search (NAS) was used to develop a baseline architecture called EfficientNet-B0, described in Figure 3. Using a scaling search, EfficientNet-B0 is scaled up to EfficientNet-B1 and onwards through EfficientNet-B7. Repeating the search for every model size would be excessively expensive, so the scaling coefficients found for the first scale-up are saved and reused for all subsequent scalings. EfficientNet has been compared with other popular object classification networks on ImageNet, as shown in Figure 4. The EfficientNet models achieve both higher accuracy and better efficiency than existing DL networks. Another important aspect is that they significantly reduce the number of network parameters.
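The compound scaling rule from the EfficientNet paper [10] can be written down directly. Depth, width and input resolution are multiplied by α^φ, β^φ and γ^φ respectively. The values α = 1.2, β = 1.1 and γ = 1.15 were found by a small grid search on EfficientNet-B0 under the constraint α·β²·γ² ≈ 2, so that FLOPs grow by roughly 2^φ. The helper below is an illustrative sketch of that arithmetic only. The released B1 to B7 configurations use their own rounded coefficients, so its output need not match them exactly.

```python
from dataclasses import dataclass

# Coefficients reported in the EfficientNet paper for scaling up from B0.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases
BASE_RESOLUTION = 224                # EfficientNet-B0 input size

@dataclass
class ScaledConfig:
    depth_mult: float    # multiplier on the number of layers
    width_mult: float    # multiplier on the number of channels
    resolution: int      # input image side length in pixels
    flops_factor: float  # approximate FLOPs relative to B0

def compound_scale(phi: float) -> ScaledConfig:
    """Scale depth, width and resolution together with one coefficient phi."""
    d = ALPHA ** phi
    w = BETA ** phi
    r = GAMMA ** phi
    # FLOPs of a conv net grow roughly linearly with depth and
    # quadratically with width and resolution.
    flops = d * w ** 2 * r ** 2
    return ScaledConfig(d, w, round(BASE_RESOLUTION * r), flops)

for phi in range(4):
    cfg = compound_scale(phi)
    print(f"phi={phi}: depth x{cfg.depth_mult:.2f}, width x{cfg.width_mult:.2f}, "
          f"resolution {cfg.resolution}, FLOPs x{cfg.flops_factor:.2f}")
```

Because only φ changes between model sizes, scaling up costs no further search, which is the saving described above.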

Object Classification in Agriculture
Insects cause major damage to growing crops. For example, many insects eat leaves or burrow into stems, fruit or roots, while others transmit bacterial or fungal infections to crops. An infestation in a small area can easily spread to a larger one. At that stage, treatment is more difficult and more costly, and it may come too late. Early detection of insects is therefore critical so that control techniques can be applied at the right time, because proper timing is one of the best ways to control pests. To achieve this, farmers need to scout their maize crop daily, which makes crop inspection a vital part of farming. However, walking through large fields and manually identifying pests may take many hours.
Several works have been developed to identify insects in images quickly, but their accuracy is limited by various difficulties:
• Images captured from crops can contain a complex background.
• Insects can look similar to other objects in the crops.
• Insects can be covered by leaves.
• The same insect species can appear in different forms.
Xie et al. [11] developed a multi-task sparse representation of insect objects. It combined a sparse-coding technique with a multiple kernel learning technique, as described in Figure 5. However, as mentioned above, this approach requires object segmentation and image denoising, which are time consuming.
EAI Endorsed Transactions on Industrial Networks and Intelligent Systems 01 2021 - 04 2021 | Volume 8 | Issue 26 | e1
Ding and Taylor [12] proposed a DL-based method to identify and count codling moths in images captured inside field traps. The DL architecture was adapted from LeNet-5 [5] to classify whether each local image patch contains a moth. This architecture worked well in a controlled setting but required image preprocessing.
The next section discusses our method to classify multiple types of insects in outdoor images, i.e. images with more complex backgrounds in which insects vary in appearance, size, angle and position.

Proposed Method
Conventional machine learning methods are usually designed to tackle a single problem in isolation. In deep learning, however, transfer learning [13] is an approach in which a model first trained on one dataset is reused for another. This is similar to how humans apply their knowledge across domains. In transfer learning, several layers of the trained model are used in a new model, which is then trained on another dataset. The main benefit of transfer learning is a shorter training time for a DL model. The weights of the pre-trained layers serve as the initial weights for the training process and are adapted during the new training. Figure 6 shows how the proposed DL architecture is built from a pre-trained EfficientNet. Transfer learning with fine-tuning is described by the following steps:

Dataset
The insect dataset in this work contains 4,449 images downloaded from Kaggle. It contains 5 categories: Ladybird, Mosquito, Grasshopper, Butterfly and Dragonfly. Table 1 shows the total number of images in each category and Figure 7 shows some example insect images. It is a challenging dataset because the images vary in size and contain complex backgrounds. The insects also appear with different appearances, positions and poses.
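With five target categories, the transfer-learning step described above amounts to two changes. Every pre-trained layer is reused, and only the final classification layer is replaced with a new 5-way layer. In PyTorch, this is commonly done by replacing the `classifier` module of a torchvision EfficientNet. To keep the example dependency-free, the sketch below mimics the same idea with plain dictionaries of NumPy arrays standing in for a model's state dict. The layer names, shapes and the `transfer_weights` helper are hypothetical.

```python
import numpy as np

def transfer_weights(pretrained: dict, target: dict, head_prefix: str = "classifier.") -> list:
    """Copy pre-trained tensors into `target` wherever name and shape match,
    skipping the classification head. Returns the names that were copied."""
    copied = []
    for name, value in pretrained.items():
        if name.startswith(head_prefix):
            continue  # the head is re-initialised for the new classes
        if name in target and target[name].shape == value.shape:
            target[name] = value.copy()  # becomes the initial weight for fine-tuning
            copied.append(name)
    return copied

rng = np.random.default_rng(0)
# Hypothetical backbone trained on ImageNet (1000 classes).
pretrained = {
    "features.conv.weight": rng.normal(size=(32, 3, 3, 3)),
    "features.conv.bias": rng.normal(size=32),
    "classifier.weight": rng.normal(size=(1000, 1280)),
    "classifier.bias": rng.normal(size=1000),
}
# New model for the 5 insect classes, freshly initialised.
NUM_CLASSES = 5
new_model = {
    "features.conv.weight": np.zeros((32, 3, 3, 3)),
    "features.conv.bias": np.zeros(32),
    "classifier.weight": rng.normal(0, 0.01, size=(NUM_CLASSES, 1280)),
    "classifier.bias": np.zeros(NUM_CLASSES),
}
copied = transfer_weights(pretrained, new_model)
print("copied:", copied)
```

During fine-tuning, all of these weights would then be updated, not only the new head, since the pre-trained weights act only as a starting point.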

Data preparation & augmentation
Each category in the dataset is split into 90% training data and 10% testing data, and 20% of the testing data is used for validation during training. The model is trained on the training set and tuned using the validation set.
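The per-category split described above can be sketched as follows. Taking the validation set as 20% of the held-out 10% follows the text literally. The text does not state whether those validation images are excluded from the final test set, so removing them here is an assumption, as are the function names and the fixed random seed.

```python
import random

def split_category(items, train_frac=0.9, val_frac_of_test=0.2, seed=0):
    """Split one category into train / validation / test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train_frac)
    train, held_out = items[:n_train], items[n_train:]
    n_val = round(len(held_out) * val_frac_of_test)
    val, test = held_out[:n_val], held_out[n_val:]
    return train, val, test

def split_dataset(dataset):
    """Apply the split to every category so class proportions are preserved."""
    splits = {"train": [], "val": [], "test": []}
    for label, files in dataset.items():
        train, val, test = split_category(files)
        splits["train"] += [(f, label) for f in train]
        splits["val"] += [(f, label) for f in val]
        splits["test"] += [(f, label) for f in test]
    return splits

# Hypothetical file lists; the real per-class counts are given in Table 1.
dataset = {"Ladybird": [f"lb_{i}.jpg" for i in range(100)],
           "Mosquito": [f"mq_{i}.jpg" for i in range(50)]}
splits = split_dataset(dataset)
print({k: len(v) for k, v in splits.items()})
```

Splitting inside each category, rather than over the pooled dataset, keeps the class balance of Table 1 in every subset.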
Next, data preparation involves techniques such as normalisation to rescale the input data before training a DL model. Input data with widely varying ranges may produce large error-gradient values and make learning slow or unstable. It is therefore good practice to normalise the input data. In our work, each image is resized to 224 x 224 before normalisation. This dimension is used in popular DL networks such as AlexNet and VGG16.
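A minimal NumPy sketch of this preparation step is shown below. It uses nearest-neighbour resizing to keep the example dependency-free. It normalises each channel with the ImageNet mean and standard deviation commonly used with torchvision's pre-trained models. The text does not state which normalisation constants were used, so these values are an assumption.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])  # assumed per-channel mean
IMAGENET_STD = np.array([0.229, 0.224, 0.225])   # assumed per-channel std

def resize_nearest(image: np.ndarray, size: int = 224) -> np.ndarray:
    """Resize an H x W x C image to size x size by nearest-neighbour sampling."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return image[rows[:, None], cols[None, :]]

def prepare(image_uint8: np.ndarray) -> np.ndarray:
    """Resize to 224 x 224, scale to [0, 1], then standardise each channel."""
    resized = resize_nearest(image_uint8).astype(np.float32) / 255.0
    return ((resized - IMAGENET_MEAN) / IMAGENET_STD).astype(np.float32)

photo = np.random.default_rng(0).integers(0, 256, size=(480, 640, 3), dtype=np.uint8)
x = prepare(photo)
print(x.shape, x.dtype)
```

Standardising with the statistics the backbone was pre-trained with keeps the inputs in the range its early layers expect.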
A large amount of data is needed to train a good DL model. Collecting data is, however, difficult and time consuming. Data augmentation addresses this by artificially generating new training data from existing training data. Another benefit of data augmentation is that it reduces overfitting. In this work, we randomly crop 224 x 224 patches from the original images. The generated images can be randomly rotated by up to 10 degrees and flipped around the horizontal axis.
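The three augmentations can be sketched in NumPy as follows:
• a random 224 x 224 crop;
• a rotation by a random angle of at most 10 degrees, using nearest-neighbour sampling with the empty corners filled with zeros;
• a random flip about the horizontal axis, which turns the image upside down.
Several details are assumptions: the angle range [-10, 10] degrees, the 50% flip probability, the zero fill and the function names. If a left-right mirror flip was meant instead, `np.fliplr` would replace `np.flipud`. In PyTorch, torchvision's `RandomCrop`, `RandomRotation` and `RandomVerticalFlip` transforms provide comparable operations.

```python
import numpy as np

def random_crop(image, size=224, rng=None):
    """Cut a random size x size window out of an H x W x C image (H, W >= size)."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return image[top:top + size, left:left + size]

def random_rotate(image, max_deg=10.0, rng=None):
    """Rotate about the centre by a random angle in [-max_deg, max_deg] degrees."""
    theta = np.deg2rad(rng.uniform(-max_deg, max_deg))
    h, w = image.shape[:2]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    yy, xx = np.mgrid[0:h, 0:w]
    # Inverse mapping: for each output pixel, find its source pixel.
    src_y = np.cos(theta) * (yy - cy) + np.sin(theta) * (xx - cx) + cy
    src_x = -np.sin(theta) * (yy - cy) + np.cos(theta) * (xx - cx) + cx
    src_y, src_x = np.rint(src_y).astype(int), np.rint(src_x).astype(int)
    valid = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)
    out = np.zeros_like(image)  # pixels with no source stay black
    out[valid] = image[src_y[valid], src_x[valid]]
    return out

def augment(image, rng):
    out = random_crop(image, 224, rng)
    out = random_rotate(out, 10.0, rng)
    if rng.random() < 0.5:
        out = np.flipud(out)  # flip about the horizontal axis (upside down)
    return out

rng = np.random.default_rng(0)
photo = rng.integers(0, 256, size=(300, 260, 3), dtype=np.uint8)
sample = augment(photo, rng)
print(sample.shape)
```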

Experiment Setup
Our proposed DL network is implemented in PyTorch. It is trained with a batch size of 8 for a total of 10 epochs. The learning rate is set to 0.000744 and scheduled to decrease by 20% each epoch during training to allow more fine-grained weight updates. Sections 4.5.2.1 and 4.5.2.2 show how the starting learning rate and the total number of epochs were selected.
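The schedule described above starts at 0.000744 and shrinks by 20% after every epoch. This is an exponential decay with factor 0.8, i.e. lr_e = 0.000744 × 0.8^e for epoch e, counting from 0. In PyTorch, the same behaviour is available from `torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.8)` or `ExponentialLR(optimizer, gamma=0.8)`. The plain-Python sketch below only computes the resulting values. Exactly when the authors' scheduler steps is an assumption.

```python
BASE_LR = 0.000744  # starting learning rate reported in the text
DECAY = 0.8         # "decrease 20% for each epoch"
EPOCHS = 10

def lr_at_epoch(epoch: int, base_lr: float = BASE_LR, decay: float = DECAY) -> float:
    """Learning rate used during `epoch` (0-based) under per-epoch exponential decay."""
    return base_lr * decay ** epoch

schedule = [lr_at_epoch(e) for e in range(EPOCHS)]
for e, lr in enumerate(schedule):
    print(f"epoch {e + 1:2d}: lr = {lr:.6f}")
```

By the tenth epoch, the rate has fallen to about 1.0 × 10⁻⁴, roughly 13% of its starting value.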
The network is then trained on a Tesla P100 GPU with CUDA 10.2 and cuDNN as back-ends. During training, we apply early stopping to avoid overfitting: training stops if the accuracy does not improve for 3 consecutive epochs.
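Early stopping with a patience of three epochs can be sketched as a small tracker. It remembers the best accuracy seen so far and signals a stop once three consecutive epochs fail to beat it. The class name, the `min_delta` option and the reading of "not improved after 3 times" as three consecutive non-improving epochs are assumptions.

```python
class EarlyStopping:
    """Stop training when a monitored accuracy stops improving."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience    # epochs to wait without improvement
        self.min_delta = min_delta  # smallest change counted as improvement
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, accuracy: float) -> bool:
        """Record one epoch's accuracy; return True if training should stop."""
        if accuracy > self.best + self.min_delta:
            self.best = accuracy
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Hypothetical validation accuracies, one per epoch.
history = [0.81, 0.88, 0.92, 0.91, 0.92, 0.90, 0.95]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, acc in enumerate(history, start=1):
    if stopper.step(acc):
        stopped_at = epoch
        break
print(f"stopped after epoch {stopped_at}; best accuracy {stopper.best:.2f}")
```

In this made-up run, epochs 4 to 6 never exceed the 0.92 reached in epoch 3, so training stops after epoch 6 and the later 0.95 is never seen. This trade-off is inherent to any patience value.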

Evaluation metrics
The most common way to evaluate object classification performance is to calculate Precision (P, Eq. 1), Recall (R, Eq. 2), Accuracy (A, Eq. 3) and F-measure (F, Eq. 4). These metrics require a confusion matrix (Figure 8), from which the True Positives (TP), False Negatives (FN), False Positives (FP) and True Negatives (TN) are obtained for each category.
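For reference, the standard definitions are P = TP / (TP + FP), R = TP / (TP + FN), A = (TP + TN) / (TP + TN + FP + FN) and F = 2PR / (P + R). The sketch below derives the four counts for each category from a multi-class confusion matrix and then applies these formulas. Rows as true classes and columns as predicted classes is an assumed convention, and the matrix values are made up for illustration.

```python
import numpy as np

def per_class_metrics(cm: np.ndarray) -> dict:
    """Compute P, R, A and F for each class of a confusion matrix.

    cm[i, j] counts samples whose true class is i and predicted class is j.
    """
    total = cm.sum()
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as the class but belonging elsewhere
    fn = cm.sum(axis=1) - tp  # belonging to the class but predicted elsewhere
    tn = total - tp - fp - fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / total
    f_measure = 2 * precision * recall / (precision + recall)
    return {"P": precision, "R": recall, "A": accuracy, "F": f_measure}

# Made-up 3-class example (not results from this paper).
cm = np.array([[8, 1, 1],
               [0, 9, 1],
               [2, 0, 8]])
metrics = per_class_metrics(cm)
for name, values in metrics.items():
    print(name, np.round(values, 3))
```

A class that is never predicted would make TP + FP zero and the precision undefined, so production code should guard that division.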

Results
We implemented eight EfficientNet models to evaluate their performance. Table 2 summarises the performance of each model; the numbers in bold are the highest values.
4.5.2.1. Effects of Learning Rate
In our implementation, we used the Adam optimiser, a stochastic gradient-based optimisation method that requires a learning rate. When the learning rate is low, training is reliable but optimisation takes more time. Conversely, if the learning rate is high, training may fail to converge or may even diverge. We adopted the method explained in [14] to estimate an optimal learning rate that can serve as the starting point. Figure 9 shows that the recommended learning rate is 0.000744.
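The text does not spell out the procedure from [14]. One widely used way to pick a starting learning rate, assumed here for illustration, is a learning-rate range test. The model is trained briefly while the learning rate increases exponentially after every mini-batch, the loss is recorded, and the rate where the loss falls most steeply is chosen. The toy sketch below applies this idea to linear regression with plain mini-batch gradient descent rather than Adam. The model, data, sweep bounds, divergence cut-off and selection rule are all assumptions, and its suggestion is unrelated to the 0.000744 obtained for EfficientNet.

```python
import numpy as np

def lr_range_test(X, y, lr_min=1e-5, lr_max=1.0, steps=100, batch=16, seed=0):
    """Increase the learning rate exponentially per mini-batch and record the loss."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    tried, losses = [], []
    for lr in np.geomspace(lr_min, lr_max, steps):
        loss = np.mean((X @ w - y) ** 2)       # full-data MSE before this step
        if losses and loss > 4 * min(losses):  # stop once the loss diverges
            break
        tried.append(lr)
        losses.append(loss)
        idx = rng.choice(len(y), size=batch, replace=False)
        grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / batch
        w -= lr * grad                         # one mini-batch gradient step
    return np.array(tried), np.array(losses)

def suggest_lr(lrs, losses):
    """Rate at which the loss falls most steeply with respect to log(lr)."""
    slopes = np.gradient(losses, np.log(lrs))
    return lrs[np.argmin(slopes)]

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 3.0, -1.0]) + 0.1 * rng.normal(size=200)

lrs, losses = lr_range_test(X, y)
print(f"suggested starting learning rate: {suggest_lr(lrs, losses):.2e}")
```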

4.5.2.2. Effects of Number of Epochs
In deep learning, one epoch is one complete pass through the whole training dataset. Because of the large amount of data, each epoch is divided into several small batches. Training a deep learning model usually requires more than one epoch, i.e. passing the whole dataset through the model several times. There is no absolute rule for choosing the right number of epochs. We therefore applied early stopping during training, triggered when the loss no longer improves. We observed that our proposed method needs at most 10 epochs, as displayed in Figure 10, which shows no significant improvement after 10 epochs.
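The relationship between epochs, batches and weight updates is simple arithmetic. With N training images and batch size B, one epoch takes ⌈N / B⌉ iterations. As a rough worked example, about 4,004 of the 4,449 images fall in the 90% training split. This figure ignores per-category rounding, so it is an approximation. With the batch size of 8, that gives 501 iterations per epoch, so 10 epochs correspond to roughly 5,010 weight updates.

```python
import math

def iterations_per_epoch(num_samples: int, batch_size: int) -> int:
    """Mini-batches needed to pass every sample once (last batch may be smaller)."""
    return math.ceil(num_samples / batch_size)

total_images = 4449
train_images = int(total_images * 0.9)  # approximate: ignores per-category rounding
per_epoch = iterations_per_epoch(train_images, batch_size=8)
print(train_images, per_epoch, per_epoch * 10)
```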

Conclusions
Pest classification is an important aspect of smart agriculture. In this paper, we have proposed an efficient method based on EfficientNet to classify pests in images with complex backgrounds and pests in various forms. Specifically, we have evaluated 8 EfficientNet architectures. All models were trained within 10 epochs, yet they achieved high accuracy, greater than 95%, on all criteria: Precision, Recall, Accuracy and F1-measure. In our experiment, EfficientNet-B3 achieved the best performance. In future work, the proposed method can be tested on embedded systems for autonomous machines used in agriculture, such as drones or robots.