Machine Learning in Computer Vision: A Review

INTRODUCTION: Advances in Artificial Intelligence (AI) have made it possible to tackle a wide range of machine intelligence problems. Machine learning (ML) has become a prominent topic because machines can now be trained directly, with less human interaction. Manual feeding of the machine is giving way to automatic learning. Supervised and unsupervised ML techniques serve distinct purposes such as feature extraction, pattern recognition, object detection, and classification. OBJECTIVES: In Computer Vision (CV), ML plays a significant role in extracting crucial information from images. CV contributes to multiple domains, including surveillance systems, optical character recognition, robotics, suspect detection, and many more. CV research is moving toward the healthcare realm, where medical imaging (MI) is an emerging technology that plays a vital role in enhancing image quality and recognizing critical features of binary medical images, converting the original image into grayscale, and setting threshold values for segmentation. CONTRIBUTION: This paper addresses the importance of machine learning, the state of the art, and how ML is utilized in computer vision and image processing. The survey provides details about the types of tools and applications, datasets, and techniques. Limitations of previous work and challenges for future work are also discussed. Further, we identify and discuss a set of open issues yet to be addressed for efficiently applying ML in computer vision and image processing. METHODS, RESULTS, AND CONCLUSION: In this review paper, we discuss the techniques and various types of supervised and unsupervised ML algorithms, give a general overview of image processing and the results based on its impact, and cover neural-network-enabled models, limitations, and tools and applications of CV; moreover, we highlight the critical open research areas of ML in CV.


Introduction
Artificial intelligence is an emerging field throughout the world and a vast area with diverse definitions: some researchers describe AI as "a system that thinks like humans" [1], while others describe it as "a system that acts like humans" [2]. There is no single formal definition of AI; it varies with the environment in which it is applied [3]. AI plays a significant role in many aspects of computing technology, and a distinction is drawn between weak and strong AI. According to Keng Siau et al., machine intelligence can be measured along two main aspects of AI: "Weak only focus on the narrow task and strong targets more than one area" [4]. Several applications run live in various environments; Google is a popular example of an AI-driven crawler and content-based search mechanism. In gaming, an artificial program (the Deep Blue machine) defeated the world chess champion Garry Kasparov in a six-game match [5]. Similarly, machines can speak and understand what humans write, retrieve crucial information by perceiving things through a camera, control highly reactive radio frequencies, assist in medicine (the MYCIN expert system), drive cars, and operate as humanoids.
Machine learning is a subfield of AI [6]. A machine can learn automatically (train) from a given dataset and is then able to take a proper decision (test) by itself [7]. In ML, the dataset is the key element and may contain millions of unique records [8]. Data fall into two main categories: qualitative and quantitative. Qualitative data are collected through interviews, observation, and written documents [9], while quantitative data consist of numerical records that can be measured for statistical analysis [10]. In the qualitative sense, a machine can recognize the shape of different objects, their color, size, weight, models, etc.
ML can be classified into three main learning techniques: (i) supervised learning (SL), (ii) unsupervised learning (USL), and (iii) semi-supervised learning (SSL). SL fits a function to labelled training data and is used for prediction. Classification and regression come under supervised learning; popular algorithms are the Support Vector Machine (SVM), k-nearest neighbours (KNN), logistic regression, decision trees, and random forests [11]. USL, on the other hand, works with unlabelled training data and is used to extract features from the input data [12].
Clustering means grouping similar data or records, and it is part of unsupervised learning. The best-known clustering algorithm is K-means, but there are others, such as hierarchical clustering and DBSCAN. Finally, SSL combines SL and USL: it uses a small amount of labelled data and a large amount of unlabelled data, and it is mainly used in information retrieval, image processing, and bioinformatics [12]. In this process, clustering is first applied to the unlabelled dataset to group similar data, and the grouped data are then labelled with the help of classification. SSL methods commonly rely on the continuity, cluster, and manifold assumptions.
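As an illustration of the clustering step described above, the following is a minimal K-means sketch in plain Python. The toy points, the initial centroids, and the function names `assign` and `kmeans` are assumptions made for this example and are not taken from any cited work.

```python
import math

def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [min(range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]))
            for p in points]

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and moving centroids to cluster means."""
    centroids = list(centroids)
    for _ in range(iterations):
        labels = assign(points, centroids)
        for i in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assign(points, centroids)

# Hypothetical unlabelled 2-D records forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids, labels)
```

In a semi-supervised setting, the cluster indices in `labels` could then be mapped to class names using the few labelled records available.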
The journey of computer vision (CV) started in the 1960s, when Larry Roberts, regarded as the father of computer vision, proposed extracting 3D geometrical information from 2D perspective views of polyhedra in his Ph.D. thesis at MIT [13]. In its initial stage, CV meant extracting crucial information from digital images using computational models. CV has dual goals: from an engineering point of view, vision serves autonomous systems that perform visual tasks as a human would, while from a biological point of view, computational models are applied to the human body to detect symptoms of disease [14].
From an engineering perspective, CV has been successfully applied in educational environments, where attendance monitoring systems record students' attendance automatically by face detection and recognition through a camera [15]. The Unmanned Aerial Vehicle (UAV) is a vision-based agent used for surveillance of unwanted situations without a human pilot; Kanellakis et al. wrote a survey article on current developments and the latest applications of UAV systems [16]. In a biological context, medical imaging (MI) is an emerging field, and the volume of research is increasing day by day for the betterment of society.
The purpose of MI is to extract essential information from medical images, but poor image quality can create a further challenge for researchers. Kesner et al. highlight the concepts of digitizing and applying MI data in the innovation age [17]. MI is a costly process; labelled images can be extracted automatically from radiology reports with NLP to reduce the need for expert annotation [18]. In a recent report from McKinsey and Company, New York, USA, Alan Alexander et al. set out the future direction of MI in 2021, in which technology cluster growth rate, cutting-edge practice, and implications are the key aspects of future blockchain innovations [19]. At the same time, the technology has limitations in distinct areas, which are discussed below:
• From an agricultural viewpoint, robots using CV and ML techniques are applied to farming to increase productivity and quality, but they encounter numerous agricultural constraints: object color, size, texture, and reflectance properties [20].
• Several applications and studies are available on handwritten and document text recognition, but detecting text in real-time images (e.g., street view images) and properly recognizing each character in the image is still a growing problem for machine learning [21].
• A small number of image datasets is one of the limitations in ML, where training and testing can only be performed accurately when a huge amount of labelled or unlabelled data is provided.
This paper addresses the importance of machine learning and deep learning, the state of the art, and how ML is utilized in computer vision and image processing. The survey provides details about the types of tools and applications, datasets, and techniques. Limitations of previous work and challenges for future work are also discussed. Further, we identify and discuss a set of open issues yet to be addressed for efficiently applying ML in computer vision and image processing.
The paper is organized in 8 sections. Section 2 covers machine learning techniques for image processing. Section 3 provides details of neural network models, section 4 covers tools and applications, and section 5 provides information about datasets for computer vision and image processing. Section 6 discusses the challenges and limitations of machine learning research, and section 7 presents open research issues for future work. Finally, section 8 concludes this review paper.

Machine Learning Techniques for Image Processing
Image processing (IP) means processing an image with a digital computer. Algorithms are used to enhance the quality of an image, and on that basis the system can extract crucial information from the image [22,23]. There are two main types: analogue image processing and digital image processing (DIP) [24,25]. In IP, the system takes an image as input and, after processing, returns an output in the form of an image. The three key steps of IP are:
• Start with the image acquisition
• Manipulating and analyzing the image
• Output in the form of an image, which is based on the analysis
In a digital computer, everything is stored in binary form. An image is stored as pixels (raster), and each pixel holds a numeric value in a two-dimensional array (i.e., x means row and y means column). In analogue IP, processing is applied to printouts, photographs, or any other hard copies, using fundamental visual techniques to analyze the image. DIP, on the other hand, manipulates digital images by following three basic steps: (i) pre-processing, (ii) enhancement, and (iii) extraction of the information, displayed through a digital computer [26].
The phases of IP start with image acquisition, the first step, in which the image is loaded and scaled (for example, converted from RGB to gray-scale) [27,28]. Image enhancement techniques help to extract hidden detail from the input image. Image restoration deals with degradation on the basis of a mathematical or probabilistic model. Color image processing is another phase of IP, which handles the color models (pseudo-color and full color). Wavelets and multi-resolution processing are used to examine images at various degrees of resolution [29]. After rotation, the size and resolution of the image are reduced; this is known as compression.
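The gray-scale conversion mentioned in the acquisition phase, followed by a simple threshold for segmentation, can be sketched as follows. The luma weights (0.299, 0.587, 0.114) are the commonly used ITU-R BT.601 coefficients and are an assumption here, as are the threshold of 128 and the tiny 2x2 test image; the text does not prescribe specific values.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) pixels to gray levels (assumed BT.601 weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]

def threshold(gray_image, t=128):
    """Produce a binary image: 1 where the gray level is at least t, else 0."""
    return [[1 if value >= t else 0 for value in row] for row in gray_image]

# Hypothetical 2x2 RGB image: white, black, pure red, pure green.
rgb = [[(255, 255, 255), (0, 0, 0)],
       [(255, 0, 0), (0, 255, 0)]]
gray = to_grayscale(rgb)
binary = threshold(gray)
print(gray)
print(binary)
```

The example also illustrates the three image types handled in DIP: the input is a color image, `gray` is a gray-scale image, and `binary` is a binary image.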
Morphological processing is one of the pivotal steps of IP. It extracts information about image components and passes it to the segmentation phase, where the image is partitioned into objects (segmenting an image autonomously is the difficult task) [30,102]. These objects represent the processed data transformed from the solution space of segmentation [30,103]. Finally, object detection and recognition assign labels to those objects for use in later processing. DIP digitizes an analogue image or video signal. Digitization is the process of converting a continuous image into a digital one, and it requires the image function over (x, y) to be digitized both spatially and in amplitude. Sampling refers to taking samples and determines the spatial resolution of an image; it has two main forms, (i) up-sampling and (ii) down-sampling [31]. Noise causes random variation in the signal, and in sampling, noise is reduced by taking more samples [32]. Quantization complements sampling: it digitizes the amplitudes, and the number of gray levels in the scale determines the level of quantization [31]. There are three main types of images that are processed digitally: (i) binary images, (ii) gray-scale images, and (iii) color images, which are explained further in the next paragraphs.
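Sampling and quantization can be illustrated with a short sketch: down-sampling reduces the spatial resolution, while quantization reduces the number of gray levels. The function names, the 4x4 test image, and the choice of uniform bins are assumptions made for this example.

```python
def downsample(image, factor=2):
    """Reduce spatial resolution by keeping every `factor`-th pixel on both axes."""
    return [row[::factor] for row in image[::factor]]

def quantize(image, levels=4, max_value=255):
    """Reduce amplitude resolution: map each value to the floor of its uniform bin."""
    step = (max_value + 1) // levels
    return [[(value // step) * step for value in row] for row in image]

# Hypothetical 4x4 gray-scale image with values in 0..255.
image = [[0, 40, 80, 120],
         [60, 100, 140, 180],
         [120, 160, 200, 240],
         [180, 200, 220, 255]]
small = downsample(image)            # 2x2 image
coarse = quantize(image, levels=4)   # only the gray levels 0, 64, 128, 192 remain
print(small)
print(coarse)
```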
ML is an essential part of IP: unsupervised learning is used for feature extraction, and supervised learning is used to label objects for detection and recognition [33,34]. In unsupervised ML, researchers have published a variety of work on eliciting critical features from an image. Segmentation is applied through histogram analysis followed by clustering. In clustering, color-based segmentation is performed with Fuzzy C-means, where pixels of similar color are grouped together into a cluster; similarly, texture images are segmented with the K-means algorithm [35]. In model-based segmentation, the parameters are estimated in an unsupervised manner with the Expectation-Maximization algorithm [35].
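The fuzzy clustering idea above can be sketched on one-dimensional gray levels. This is a minimal illustration of the standard fuzzy C-means update rules, not the method of any cited paper; the fuzzifier `m = 2`, the pixel values, and the initial centers are assumptions.

```python
def fuzzy_c_means(values, centers, m=2.0, iterations=20):
    """1-D fuzzy C-means: each value belongs to every cluster with a soft membership."""
    centers = list(centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        memberships = []
        for x in values:
            # A tiny floor avoids division by zero when x coincides with a center.
            dists = [max(abs(x - c), 1e-12) for c in centers]
            memberships.append(
                [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                 for d_i in dists])
        # Each center becomes the mean of all values weighted by membership^m.
        centers = [
            sum((u[j] ** m) * x for u, x in zip(memberships, values))
            / sum(u[j] ** m for u in memberships)
            for j in range(len(centers))]
    return centers, memberships

# Hypothetical gray levels: three dark pixels and three bright pixels.
pixels = [10, 12, 14, 200, 205, 210]
centers, memberships = fuzzy_c_means(pixels, centers=[0.0, 255.0])
print(centers)
```

Unlike K-means, each pixel keeps a degree of membership in every cluster; a hard segmentation can be obtained by assigning each pixel to the cluster with the largest membership.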
Several real-time IP applications run on unsupervised ML techniques. Zhong et al. built computational intelligence for optical remote sensing using IP clustering [36]. Ghosh et al. describe unsupervised change detection in remote sensing images, comparing two images of the same geographical area using a fuzzy clustering algorithm [37,38]. Unsupervised learning is also widely used in medical imaging, where it helps the system diagnose disease; manual assessment for diagnosing malaria is a lengthy process and prone to human error. Purwa et al. developed an unsupervised automatic detection technique for sensitive malaria screening [39].
Unsupervised learning does not stop here; it covers further areas with real-time applications. Supervised learning trains and tests machines using a labelled dataset, where a label means data already tagged for classification purposes. In supervised learning, the Support Vector Machine (SVM) is widely used in NLP for recognizing handwritten and scanned text images [40]. Supervised classification is also used in cell biology to recognize phenotypes in medical imaging [41]. In robotics, supervised machine learning plays a vital role in the visual perception of forest trails for mobile robots [42]. Erickson et al. use the labels "benign" and "malignant" to detect brain tumours in medical images [43]. Tuia et al. emphasize supervised learning in remote sensing image classification [44].

Neural Network (NN) Based Models
Deep learning (DL) is often presented as the newest innovation, but DL is in fact a subfield of ML; it is essentially a name for neural network models [45,104]. The idea of the NN was inspired by biological neurons. Warren McCulloch and Walter Pitts first modelled human neurons as artificial neurons in 1943 [46]. The notion of the perceptron, a probabilistic model for information storage, was introduced by Frank Rosenblatt; the work was funded by the US Navy and implemented at the Cornell Aeronautical Laboratory in 1958 [47]. An NN is a computational model whose basic building blocks work in parallel with no centralized control unit. Weights are the primary means of long-term information storage in neurons, and amending the weights updates the information stored in the NN. The architecture of an NN is based on three main parts: (i) the number of neurons, (ii) the number of layers, and (iii) the connections between layers [48].
Feedforward neural networks (FFNN) are acyclic networks in which input values move forward from the input nodes to the hidden neurons and then from the hidden to the output neurons. Feedforward networks include single-layer and multilayer perceptrons. The single-layer feedforward perceptron (SFFP), or linear threshold unit, is typically based on a single output layer [49]. It aggregates the input values multiplied by their weights, and the activation function of the artificial neuron fires only when this aggregate reaches the threshold value, commonly zero (0). A parameter known as the bias is added to stabilize the output; without it, a neuron whose aggregate stays below the threshold remains deactivated. The bias is a constant intercept (output = sum(input × weight) + bias) [1] added to the aggregate of weighted inputs, which effectively sets the threshold of the activation function. In a multilayer feedforward perceptron (MFFP), the nodes of each layer are fully connected to those of the next layer to form a network; that is, there is an input layer, one or more hidden layers, and a single output layer [50].
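The single-layer perceptron described above can be sketched in a few lines. The example uses the formula output = sum(input × weight) + bias with a step activation at zero and trains the unit on the logical AND function with the classic perceptron update rule. The AND task, the integer learning rate of 1, and the number of epochs are assumptions chosen to keep the arithmetic exact.

```python
def perceptron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus bias, passed through a step activation at zero."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total >= 0 else 0

def train_perceptron(samples, learning_rate=1, epochs=20):
    """Perceptron rule: nudge weights and bias by the error on each sample."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - perceptron_output(inputs, weights, bias)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

# Hypothetical training set: the logical AND of two binary inputs.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
predictions = [perceptron_output(x, weights, bias) for x, _ in and_samples]
print(weights, bias, predictions)
```

The learned bias is negative, so the neuron stays inactive unless both inputs are on; this is the sense in which the bias sets the effective threshold.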
Each layer may have a different number of neurons, and the layers are connected as an acyclic graph. The essential task of the backpropagation learning algorithm is to reduce the error rate by tuning the weights of the NN [25]. When the feedforward pass produces an error, backpropagation propagates it backward through the network and adjusts the connection weights to minimize the error value. Among optimization algorithms, gradient descent is the iterative process that minimizes the loss function by updating the parameters of the model (the parameters are the weights of the neural net) [51]. In gradient descent, the learning rate is the size of the steps taken to reduce the error. With a high learning rate, each step covers more ground, so the desired result may be reached sooner, but the risk of overshooting the minimum increases. With a low learning rate, the risk is low, but the overall process is time-consuming because more steps are needed to move toward the goal, and the search may settle in a local minimum [1]. The cost (also known as loss) function tells how well, or how accurately, the model performs. When plotted, it has its own curve and gradient. This completes the concise overview of NNs. A major advance in the field of artificial NNs is the convolutional neural network (CNN) in DL [52,53]. The main agenda behind the concept is to build strong AI that acts and thinks like a human [54]. A CNN is a learning algorithm [1] whose objective is to recognize the visual features of an input image and adjust weights and biases (if needed) to differentiate one object from another [55]. Moreover, researchers have applied CNNs mostly to visual datasets, to retrieve hidden information and make systems knowledgeable enough to automatically recognize different objects from different facets of an image.
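The effect of the learning rate in gradient descent can be seen on a toy problem. The sketch below minimizes the loss L(w) = (w - 3)^2, whose gradient is 2(w - 3); the loss function, the starting point, and the three learning rates are assumptions chosen for illustration only.

```python
def gradient_descent(learning_rate, start=0.0, steps=25):
    """Minimise the toy loss L(w) = (w - 3)^2 using its gradient 2 * (w - 3)."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * (w - 3.0)
        w -= learning_rate * gradient  # step against the gradient
    return w

slow = gradient_descent(0.01)      # small steps: still far from the minimum at w = 3
good = gradient_descent(0.3)       # moderate steps: converges close to w = 3
unstable = gradient_descent(1.1)   # steps too large: overshoots and diverges
print(slow, good, unstable)
```

With the same budget of 25 steps, the low rate is safe but has not yet arrived, the moderate rate reaches the minimum, and the high rate overshoots further on every step.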
Compared with conventional classification algorithms, a CNN requires little pre-processing but devotes more effort to learning its filters, given enough training (the ability to learn more characteristics); conventional classification algorithms are the opposite. The contribution of CNNs is not limited to CV; most of their features are also used for classifying sentences in Natural Language Processing (NLP) [56]. The architecture of the CNN is inspired by the human visual cortex and the connectivity patterns of brain neurons [57]. In this architecture, each neuron has a receptive field: an individual neuron responds only to a restricted region of the visual field [58,105]. In this context, the basic difference between DL and neural nets is that an NN model with more than two hidden layers is generally called DL. This section has highlighted the main aspects of gradient descent, the learning rate, the loss function, convolutional NNs, and the architecture of the CNN.
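The receptive-field idea can be made concrete with a small sketch of the sliding-window operation at the core of a convolutional layer. Each output value depends only on a 3x3 region of the input. The hand-written vertical-edge kernel and the toy image are assumptions; in a trained CNN the kernel values would be learned rather than fixed.

```python
def convolve2d(image, kernel):
    """Valid (no padding) sliding-window cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

# Hypothetical 4x5 image: dark on the left, bright on the right.
image = [[0, 0, 0, 1, 1],
         [0, 0, 0, 1, 1],
         [0, 0, 0, 1, 1],
         [0, 0, 0, 1, 1]]
# Hand-written filter that responds to dark-to-bright vertical edges.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]
feature_map = convolve2d(image, vertical_edge)
print(feature_map)
```

The feature map is zero over the uniform dark region and positive where the window covers the edge, which is how a filter differentiates one visual feature from another.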

Tools and Applications
In this review paper, the eleven most popular machine learning tools are described in Table 1, together with their features, platform capability, commercial and non-commercial products, and supported programming languages.

Online Available Dataset for Computer Vision/Image Processing
Many datasets have been developed for research purposes. They vary in the number of images, the categories, and the types of images, such as medical, hyperspectral, satellite, video stream, grayscale, and RGB natural images. In this paper, we list the most popular image datasets used for image processing and computer vision.
Labelled Faces in the Wild: this dataset is available on the website of the University of Massachusetts [77]. It contains 13,000 labelled images of human faces, for use in developing applications that involve facial recognition.
Stanford Dogs Dataset: this dataset is provided by the computer vision research group of Stanford University. It contains 20,580 images in 120 dog breed categories, with about 150 images per class [78].
Places: this dataset was developed by the MIT Computer Science and Artificial Intelligence Laboratory and is available on its website. It is a scene-centric database with 205 scene categories and 2.5 million images with category labels [79].
CelebFaces: this dataset was designed by the Multimedia Laboratory of The Chinese University of Hong Kong. It contains more than 200,000 celebrity face images, each with 40 attribute annotations [80].
Plant Image Analysis: this dataset is a collection of over 1 million plant images, covering 11 species of plants to choose from [81].
Indoor Scene Recognition: this dataset covers 67 indoor categories and contains 15,620 images. It is freely available for research purposes; all images are in JPG format, and every category has at least 100 images [82].
MedPix: this dataset consists of medical images and is freely available for research purposes. It contains 59,000 medical images [83].

Challenges, Issues, and Limitations
In computer vision, limitations emerge from the way a machine is trained on a dataset. This gives rise to the concepts of underfitting and overfitting, meaning that a system should be trained neither too loosely nor too closely to the data it is given. This paper discusses five main challenges in the domain of image processing below:

Algorithms Require a Huge Amount of Data for Better Performance:
A machine requires a large amount of data for training, and without it the machine cannot perform as well as expected [84,85]. Underfitting occurs when the algorithm does not fit the data; bias increases and variance decreases. Overfitting, by contrast, occurs in the training phase, where the model fits the training data too closely; bias decreases and variance increases [86]. Collecting valid data that is useful for further processing is another critical issue, because data are gathered from different origins while still having to belong to a particular domain. A problem arises when questionnaires are conducted: people do not respond exactly as expected, the results depend on how the questionnaires are filled in, and training and testing a model on such vague data introduces errors [87].
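Underfitting and overfitting can be demonstrated on a toy problem. The sketch below uses a k-nearest-neighbour classifier purely as a stand-in model (an assumption of this example): with k = 1 it memorizes the training set, including one deliberately noisy label, while with k equal to the size of the training set it ignores the input entirely. The data points are hypothetical.

```python
def knn_predict(train, x, k):
    """Majority vote among the k training samples nearest to x."""
    neighbours = sorted(train, key=lambda sample: abs(sample[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if 2 * votes > len(neighbours) else 0

def accuracy(train, data, k):
    """Fraction of samples in `data` classified correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)

# Hypothetical 1-D data: class 1 for x >= 5, plus one noisy label at x = 2.
train = [(0, 0), (1, 0), (2, 1), (3, 0), (4, 0),
         (5, 1), (6, 1), (7, 1), (8, 1), (9, 1)]
test = [(1.6, 0), (2.4, 0), (4.4, 0), (5.6, 1), (8.5, 1)]

for k in (1, 3, 10):
    print(k, accuracy(train, train, k), accuracy(train, test, k))
```

With k = 1 the training accuracy is perfect but the noisy label misleads the model on unseen points near x = 2 (overfitting, high variance). With k = 10 the model predicts the majority class everywhere (underfitting, high bias). The intermediate k = 3 generalizes best on this toy data.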

Require Lengthy Offline Labelling of Training Data:
AI currently faces difficulty in labelling data: almost 80% of the work on real-world data consists of gathering, organizing, and labelling it [1]. Data labelling requires a great deal of effort in order to train, validate, and tune a model, and it is not an easy task [88]. There is a vast gap between the growth of data and the recognition of data. The reasons for lengthy offline labelling are low quality, scale, inefficiency or cost, and quality assurance. The estimated steps to convert unlabelled data into labelled training data are: (i) data-identification, (ii) data-aggregation, (iii) data-cleansing, (iv) data-augmentation, and (v) data-labelling. After that, the algorithm is prepared for further processing through (vi) ML-optimization, (vii) ML-model tuning, (viii) ML-model training, and (ix) ML-algorithm development. These are the main steps in labelling data in order to execute an ML algorithm.

High Processing Power:
This is another challenging task for researchers in the field. ML requires considerable computational power to process an input image dataset [89]. Validating a massive amount of data is time-consuming, and if an error is found, a further cycle is needed to check whether the code runs correctly. In short, the overall process consumes a great deal of time and requires substantial processing power to compute a large number of records.

Bogus Data:
As algorithmically generated data emerges, synthetic or fake data is becoming part of the real world [90,91]. Sources of bogus data include online websites and small companies, particularly start-ups, that create imitation datasets for model training, datasets crafted to look perfect for research publication, offline survey collections, and nonexistent data. Identifying genuine data from different sources and using the collected datasets to find a unique outcome is one of today's challenging tasks [92].

ML Models/Algorithms Do Not Collaborate:
Algorithms are categorized in two ways: by learning style and by similarity. Learning-style algorithms are divided into supervised, unsupervised, and semi-supervised learning [93,94]. Similarity groups algorithms of the same kind together; instance-based, regression, regularization, clustering, decision-tree, and Bayesian algorithms are the main examples. The limitation arises in hybrid systems, where two algorithms must collaborate so that both functionalities are used and the machine becomes more powerful than other run-time systems.

Open Research Areas
A lot of work has been done on computer vision using machine learning, but open issues remain in the field.
Object/Vehicle Detection: Vehicle and object detection still requires efficient algorithms that detect the required object or vehicle, for example in war situations where a quick reaction is needed; further development is therefore required for fast processing that detects threats in time to react and prevent damage [95,105]. Compressed and large-scale images in datasets make accurate object detection difficult; solving this would make it easier to detect vehicles and monitor their activities [96]. Poor visibility and low resolution of satellite images are also issues in object/vehicle detection, but the development of advanced ML algorithms and models will help, for example by using Maximally Stable Extremal Regions for vehicle detection in complex situations such as low lighting conditions or shadowed regions [97,98].
Activity Recognition: Activity recognition in images and videos still requires efficient ML-based algorithms and models, evaluated in terms of indexing, recognition rate, accuracy, robustness, efficiency, and recognition of single activities and of multiple types and levels of activity [99].
Human pose estimation: This is also a major research area of computer vision, based on analyzing the pose of humans in an image [100]. It is the process of calculating the actual position of human joints in a video or image. Computer vision research still faces the problem of locating human joints in images [104]. It is also important to note that pose estimation has various sub-tasks, such as single pose estimation, estimating poses in an image with many people, estimating poses in crowded places, and estimating poses in videos [101].

Conclusion
In this paper, we highlighted the importance of ML in the image processing (IP) domain and addressed the processing of digital images and how difficult it is to feed them to a computer system. In these processes, acquisition is the first step of IP, where the image is loaded and prepared for further processing (briefly elaborated in the image processing section). In the end, labelled image objects are obtained in the detection and recognition phases. The Neural Network (NN) is one of the ML algorithms; it aims to optimize the solution of a given problem and to predict outputs from given input values. Well-known companies have adopted NNs for their applications: Amazon uses NNs to power its recommendation engine, Microsoft uses them for translation, Facebook for facial detection and recognition, and Google for the Gmail spam filter, among others. We also discussed the concept of deep learning, an emerging technology of AI and a subfield of ML, which has grabbed the attention of researchers. We conclude that machine learning has spread to all corners of computer vision and performs well in those areas, but there are still some open research areas, listed in the open research areas section, on which researchers need to focus.

• No associated data available related to this paper.
Competing interests
• The authors did not have any conflict of interest.
Funding
• There is no funding available for this research.
Authors' contributions
• The first author proposed and wrote the draft, and the second author supervised and revised the paper. The third author reviewed the paper for language and typing issues.

References
EAI Endorsed Transactions on Scalable Information Systems Online First

Abdullah Ayub Khan, Asif Ali Laghari, and Shafique Ahmed Awan