IOTA Based Anomaly Detection Machine learning in Mobile Sensing

In the proposed method, iMCS detects and prevents fake sensing activities by mobile users using machine learning techniques. Our iMCS solution applies behavioral analysis based on participants' reliability scores to detect variation in user behavior, and introduces a new role in the distributed mobile crowd sensing (MCS) architecture to validate the collected data. To evaluate the incentive based on each participant's sensory data and data quality, and to distribute profit fairly among the participants, we employ the Shapley Value approach. The evaluation results demonstrate that our method is effective in both quality estimation and incentive sharing.


Introduction
IOTA is a ground-breaking computing model that has evolved rapidly in practically every technology field during the previous decade, including smart anomaly detection and smart security systems, smart banking systems, cryptocurrencies, sensor use, smart cities, and satellites (Wang et al. 2020). It is made up of a range of IOTA mobile devices (Things) with sensors, actuators, storage, processing, and networking capabilities for collecting and sharing data over the internet (K. Singh et al. 2020). The data collected and processed by an IOTA network is sensitive and must be safeguarded against possible attacks (Hao et al. 2019). Firewalls, authentication systems, various types of encryption, antivirus software, and other security measures currently form the first line of defence, protecting sensitive data on vulnerable mobile devices from security threats such as the distributed denial-of-service (DDoS) attack (Alrashdi et al. 2019). IOTA can also be combined with other networking and computing technologies, such as software-defined networking (SDN), future network structures, named data networking (NDN), cloud network computing, VoIP, fibre optics, Worldwide Interoperability for Microwave Access (WiMAX), deep learning (DL), artificial intelligence (AI), and machine learning. Because of the large amount of data involved, numerous new anomalies (both unique ones and mutations of old ones) are produced on a regular basis (Shafiq et al. 2021). As a result, an intrusion detection system (IDS) can act as a second line of defence and offer additional security protection for an IOTA network (Shafiq et al. 2020). IDSs can be classified by their method of deployment and by their method of detection. By deployment, an IDS is either host-based or network-based; by detection technique, it is signature-based, specification-based, or hybrid. The goal of this research is to use the network-based IDS (NIDS) detection technique to provide IOTA security at the entrance points.
Current IDSs have a basic flaw: when zero-day anomalies are encountered, the False Alarm Rate (FAR) increases (Bhuvaneswari and S. 2020). Machine Learning (ML) and Deep Learning (DL) techniques have recently been investigated as ways to improve detection accuracy and minimise the FAR for NIDS. Research has shown both ML and DL techniques to be effective at extracting meaningful patterns from network data in order to classify flows as anomalous or benign. Thanks to its deep architecture, which requires no human intervention, DL learns valuable characteristics from raw data quickly, and this has underlined the importance of integrating IOTA networks with NIDS (Kuang et al. 2020).

Related Work
Throughout the last decade, researchers have investigated artificial intelligence technologies such as machine learning and deep learning in order to provide effective NIDS solutions (Gupta et al. 2020; Stoyanova et al. 2020). According to current NIDS trends, DL methods have been favoured over ML algorithms for the last three years, because advances in graphics processing unit (GPU) technology have met the fast computation needs of DL algorithms. This has inspired researchers to apply DL algorithms in IOTA networks to develop effective security solutions that process large volumes of raw data (Al Zamil et al. 2017; Zielonka et al. 2021). [8] Because of its deep structure, DL can learn complex patterns and aid in the classification of benign and anomalous traffic. Researchers in the field of NIDS also commonly use machine learning techniques. Ali et al., for example, proposed an IDS based on the Particle Swarm algorithm that uses a fast-learning network. Although the model predicted most attacks efficiently, its detection performance on minority class labels was not encouraging. Shen et al. developed an ensemble approach that applies the BAT optimization algorithm during the ensemble pruning step. In another noteworthy work, Yao et al. describe a multi-level semi-supervised machine learning model that combines clustering with the Random Forest approach (Chaterji et al. 2021; Unal 2020). Their methodology was successful in detecting multi-level attack classes. [9] Researchers are also combining ML and DL methodologies in a variety of hybrid strategies to produce successful NIDS solutions. These methods use DL algorithms for feature and complexity reduction, followed by a machine learning predictor (Mohamad Noor and Hassan 2019; Taneja et al. 2020).

Figure 1. IOTA Blockchain Methods
For example, Shone et al. use an advanced method that integrates an autoencoder (AE) with Random Forest (RF), using only the encoder part of the AE. [10] Their non-symmetric solution detected the anomalies successfully, with the exception of some labels for which few instances were available.

IOTA Framework
The IOTA Framework resides on the caller's side. [14] It includes the sensor platform, the sensor data interface, and the SIP client, as shown in Fig. 1. The calling side can also be fitted with other instruments (e.g. computers, tablets, televisions, microphones, and speakers) and with sensors for fake and anomaly detection. The sensor data interface follows a user-server model to facilitate communication between the sensor platform and the user. The interface accepts the critical data in XML format, with Sensor Model Language chosen as the standard, unified data representation model. This file is also used as a parameter in the Deep Learning Model.
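As an illustration of how XML sensor data might be turned into model input, the following minimal Python sketch parses a small XML document into a fixed-order feature list. The element names, attribute names, and values are hypothetical and deliberately simplified; they are not taken from the Sensor Model Language schema or from the interface used in this study.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified sensor document. The element and attribute names
# are illustrative only and are NOT taken from the SensorML schema.
SAMPLE_XML = """
<sensorReport device="phone-17">
  <reading name="temperature" unit="Cel">21.5</reading>
  <reading name="accel_x" unit="m/s2">0.02</reading>
  <reading name="accel_y" unit="m/s2">-0.10</reading>
</sensorReport>
"""


def parse_readings(xml_text):
    """Return (device id, {reading name: float value}) from the XML text."""
    root = ET.fromstring(xml_text.strip())
    readings = {}
    for node in root.findall("reading"):
        readings[node.get("name")] = float(node.text)
    return root.get("device"), readings


def to_feature_vector(readings, feature_order):
    """Order the readings into a fixed-length list; missing values become 0.0."""
    return [readings.get(name, 0.0) for name in feature_order]


device, readings = parse_readings(SAMPLE_XML)
features = to_feature_vector(readings, ["temperature", "accel_x", "accel_y"])
print(device, features)
```

A fixed feature order matters because a neural network expects each input neuron to receive the same quantity on every call.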

Deep Learning Model
The DNN belongs to the family of supervised learning algorithms and trains a model with several layers. The DNN employed in this study is a feed-forward artificial neural network with numerous hidden layers to enhance its abstraction capabilities. The input layer of the DNN structure has 64, 32, 16, or 8 neurons. It is followed by four dense layers with 2^10, 2^9, 2^8, and 2^7 neurons, and then by a sigmoid classification layer with two outputs that separates anomalous from benign traffic. [15] For the experiment, only five neurons, fed with numerical and categorical features, are used in the input layer. They are followed by two dense layers with 2^8 and 2^7 neurons and an output layer with a sigmoid activation function, which determines whether mobile activity is benign or anomalous.
EAI Endorsed Transactions on Creative Technologies 12 2021 -03 2022 | Volume 9 | Issue 30 | e1
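The reduced network described above (five inputs, two dense hidden layers, one sigmoid output) can be sketched as a plain forward pass. This is a minimal, untrained illustration rather than the implementation used in the study. It assumes the hidden widths are 2^8 = 256 and 2^7 = 128, ReLU in the hidden layers, a single sigmoid output unit, random initial weights, and a 0.5 decision threshold; all function names are hypothetical.

```python
import math
import random


def relu(x):
    return x if x > 0.0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one dense layer (untrained)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, layer, activation):
    """One fully connected layer: activation(W x + b) for every output neuron."""
    weights, biases = layer
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def build_network(sizes, seed=0):
    rng = random.Random(seed)
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def forward(network, features):
    """ReLU on the hidden layers, sigmoid on the final layer."""
    activations = features
    for layer in network[:-1]:
        activations = dense(activations, layer, relu)
    return dense(activations, network[-1], sigmoid)[0]


# Assumed layer widths: 5 inputs, 256 and 128 hidden neurons, 1 output.
LAYER_SIZES = [5, 256, 128, 1]
network = build_network(LAYER_SIZES)

# Hypothetical, already-normalised feature values for a single flow.
score = forward(network, [0.3, 1.0, 0.0, 0.7, 0.1])
label = "anomalous" if score >= 0.5 else "benign"
print(round(score, 4), label)
```

Because the weights are random, the printed label carries no meaning; the sketch only shows how the layers connect and how the sigmoid output becomes a binary decision.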

Figure 5. Deep Learning based IOTA Architecture
The study investigated a two-stage IDS, based on Deep Learning, to safeguard the IOTA network against potential attacks, as illustrated in Figure 3. The two phases of the model are (1) data collection and preparation and (2) detection by deep neural networks. The many procedures taken to implement and assess DL models include.

Shapley Value
We describe the user data's Shapley value when every coalition is given its rational value as the sum of two Shapley values of the user data. In the first, the worth of a coalition is its restricted (bounded) rational value: the total coalition consumption given the optimal load optimization method. In the second, every coalition's value is its rationality difference (i.e., the difference between its rational and bounded rational values). [18] Importantly, while the bounded rational coalition values can be calculated, the rational values and the rationality differences cannot, because of the limited processing resources. [16] As a result, we can only calculate the Shapley value of the user data under bounded rationality, using the coalition values that are available. We refer to this as the limited (bounded) rational Shapley value [19]. We believe that this pay-off method is fair, since it generates the same pay-off distribution as the procedure described below (which is itself not feasible in view of the restricted resources available):
Step 1: Divide the Grand Coalition's rational value among the agents "equitably," according to Shapley's axioms. Intuitively, each agent's share may be considered a reward for making a rational contribution.
Step 2: Divide the rationality difference of the Grand Coalition (again, unknown) among the agents according to Shapley's axioms. Each share can be thought of as a monetary penalty for not determining the rational value in a reasonable length of time. For example, if an agent's presence [12] in a coalition regularly generates rationality differences (for example, because the agent's severe constraints increase the time required to calculate the rational value), that agent will be penalised. If coalition values represent a cost, the penalty could be negative.
Step 3: Assign each agent its fair reward less its fair penalty.
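The three steps above can be checked on a toy example. The sketch below computes exact Shapley values by averaging marginal contributions over all orderings of three agents, once for the rational values (reward), once for the rationality differences (penalty), and once for the bounded rational values. All coalition values are invented for illustration, and in the setting described above the rational values would not actually be computable. By the additivity of the Shapley value, reward minus penalty coincides with the Shapley value of the bounded rational values.

```python
from itertools import permutations

AGENTS = ("a", "b", "c")


def shapley(agents, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {agent: 0.0 for agent in agents}
    orderings = list(permutations(agents))
    for order in orderings:
        coalition = frozenset()
        for agent in order:
            with_agent = coalition | {agent}
            totals[agent] += value(with_agent) - value(coalition)
            coalition = with_agent
    return {agent: total / len(orderings) for agent, total in totals.items()}


# Hypothetical rational coalition values (unknown in practice).
RATIONAL = {
    frozenset(): 0.0,
    frozenset("a"): 10.0, frozenset("b"): 20.0, frozenset("c"): 30.0,
    frozenset("ab"): 40.0, frozenset("ac"): 50.0, frozenset("bc"): 60.0,
    frozenset("abc"): 90.0,
}
# Hypothetical bounded rational values (what can actually be computed).
BOUNDED = {
    frozenset(): 0.0,
    frozenset("a"): 10.0, frozenset("b"): 20.0, frozenset("c"): 30.0,
    frozenset("ab"): 36.0, frozenset("ac"): 50.0, frozenset("bc"): 54.0,
    frozenset("abc"): 78.0,
}


def rational(coalition):
    return RATIONAL[coalition]


def bounded(coalition):
    return BOUNDED[coalition]


def difference(coalition):
    """Rationality difference: rational value minus bounded rational value."""
    return RATIONAL[coalition] - BOUNDED[coalition]


reward = shapley(AGENTS, rational)       # Step 1
penalty = shapley(AGENTS, difference)    # Step 2
payoff = {agent: reward[agent] - penalty[agent] for agent in AGENTS}  # Step 3
bounded_shapley = shapley(AGENTS, bounded)

for agent in AGENTS:
    print(agent, round(payoff[agent], 3), round(bounded_shapley[agent], 3))
```

Enumerating all orderings grows factorially with the number of agents, so this exact form is only practical for small coalitions.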
Given this division mechanism, we offer two greedy algorithms that optimize the cooling plan for apartments individually and collectively. These methods find a good (but not always optimal) solution in reasonable time, and, as we will see later, their useful parts considerably assist us in optimizing the coalition load. More precisely, the first algorithm finds the times in a given day at which, when the air conditioner is turned on, the gap between the householder's preferences and the expected temperature during the comfort period is smallest. The second algorithm relieves stress on a group of apartments by reassigning a significant number of occupants to apartments with more flexible preferences (subject to a specified temperature threshold and individual temperature preferences). It exploits the fact that the more flexible an apartment is, the easier it is to accommodate its preferences. [17] These two algorithms can be used to find a sub-optimal load coordination (which may yield a smaller discount saving than the optimal one) while fulfilling the household temperature preferences. Our limited rationality proposal then shows that a fair distribution of the discount can be obtained using the Shapley value.

We used the publicly accessible IoT-Botnet 2020 dataset to evaluate the performance of the DL techniques explored in this study. The dataset is provided in CSV format and is derived from the BoT-IOTA pcap files by generating additional network and flow-based attributes. The original dataset contains samples of many sorts of attacks, such as denial of service, distributed denial of service, reconnaissance, and information theft. We took the benign samples from the original dataset, while random samples from [23] every anomaly class were used for fair model assessment. For the anomaly class.
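The sampling strategy described above (keep the benign samples, draw random samples from every anomaly class) can be sketched as follows. The column name `category`, the label `Benign`, the per-class sample size, and the toy rows are all assumptions made for illustration; the actual sample sizes used in the study are not stated here.

```python
import random
from collections import Counter, defaultdict


def build_evaluation_set(rows, per_class, seed=0,
                         label_key="category", benign_label="Benign"):
    """Keep every benign row and draw up to `per_class` random rows from
    each anomaly class. `label_key` and `benign_label` are assumed names."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)

    selected = list(by_class.pop(benign_label, []))
    for label in sorted(by_class):  # sorted so the result is reproducible
        members = by_class[label]
        selected.extend(rng.sample(members, min(per_class, len(members))))
    return selected


# Toy rows standing in for dataset records (hypothetical values).
rows = (
    [{"category": "Benign", "flow": i} for i in range(4)]
    + [{"category": "DoS", "flow": i} for i in range(10)]
    + [{"category": "DDoS", "flow": i} for i in range(10)]
    + [{"category": "Theft", "flow": i} for i in range(3)]
)

evaluation_set = build_evaluation_set(rows, per_class=5)
class_counts = Counter(row["category"] for row in evaluation_set)
print(dict(class_counts))
```

Fixing the random seed keeps the drawn subset identical between runs, which makes model comparisons repeatable.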

Figure 6. Feature Correlation Matrix
The initial data set included attacks such as denial of service, distributed denial of service, reconnaissance, and theft. The benign samples were taken from the original data set, whereas random samples from each anomaly class were used to ensure a fair model evaluation. In information theory, [21] mutual information (MI) is a key concept that measures how much knowing one random variable reduces the uncertainty about another. The MI can be calculated as follows:
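For two discrete random variables X and Y, the standard definition of mutual information is I(X; Y) = sum over x and y of p(x, y) log[ p(x, y) / (p(x) p(y)) ]. The sketch below estimates this quantity from sample counts and ranks features by their MI with the label. It is a minimal illustration under stated assumptions: the feature names and toy data are hypothetical, the features are treated as discrete, and whether the study used exactly this estimator is not confirmed by the text.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X; Y) in nats from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_features(columns, labels):
    """Return feature names sorted from most to least informative."""
    scores = {name: mutual_information(values, labels)
              for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: 0 = benign, 1 = anomalous (hypothetical values).
labels = [0, 0, 1, 1, 0, 1, 1, 0]
columns = {
    "proto": [0, 0, 1, 1, 0, 1, 1, 0],  # identical to the label
    "flag":  [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the label
    "state": [0, 0, 1, 1, 0, 1, 0, 0],  # partially informative
}

ranking = rank_features(columns, labels)
print(ranking)
```

Keeping only the top-ranked features is one way to shrink the input layer, which is the role MI plays in selecting the feature count for the model.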

Results
To build the various DL based IDS techniques, we employed a batch size of 27, [21] a learning rate of 0.01, the Adam optimizer, and a binary cross-entropy loss function. ReLU and sigmoid activation functions were used for the DL methods.
All techniques showed a very high percentage of anomaly flow detection, with the top DNN score of 99.5%. On the other hand, the rates for detecting benign traffic fell marginally, by 3.87-10.99 percent, and the DNN, at 96.085 percent, still performed better than the other methods. [22] We also noticed that the LSTM model was inadequate at detecting benign flows, with a deterioration of nearly 11%. We estimate that the imbalanced character of the data set, in which anomaly records are nearly 3.2 times more numerous than benign records, has helped to degrade the detection rate for the benign label. Increasing the data for benign labels could therefore enhance their detection rate.
We considered a real-world scenario of mobile crowd sensing with fake users in a demand response programme, in which the users' aggregate must not exceed a specified threshold. In return, the network receives a discount for their quality. A coalition of network users must therefore optimize its users' use of the mobile network.
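The detection rates and the false alarm rate discussed in this section all derive from a binary confusion matrix. The sketch below shows the usual definitions, taking the anomalous class as positive. The counts are invented to mimic a data set with 3.2 times more anomalous than benign records; they are not the results of this study.

```python
def detection_metrics(tp, fp, tn, fn):
    """Metrics from confusion-matrix counts, with 'anomalous' as the
    positive class. Assumes every denominator is non-zero."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # anomaly detection rate
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "benign_detection_rate": tn / (tn + fp),
        "false_alarm_rate": fp / (fp + tn),  # benign flows flagged as anomalous
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts: 3200 anomalous flows and 1000 benign flows.
metrics = detection_metrics(tp=3180, fn=20, tn=960, fp=40)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these definitions the false alarm rate and the benign detection rate always sum to one, so a drop in benign detection is the same event as a rise in false alarms. On an imbalanced set, overall accuracy is dominated by the larger anomalous class, which is why the per-label rates are reported separately.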

Conclusions
With the proposed method, iMCS can detect and prevent mobile users' fake sensing behaviors using machine learning techniques. The iMCS solution applies behavioral analysis based on participants' reliability scores to identify variation in user behavior, and offers a new function for validating acquired data in a distributed MCS architecture. We apply the Shapley Value technique to share the reward equitably among participants, evaluating the incentive based on each participant's sensory input and data quality. The evaluation results show that our strategy is effective in both quality estimation and incentive sharing. This research offers an effective anomaly detection system, based on a deep neural network, for the IOTA network architecture; it learns valuable complex patterns from IOTA network flows in order to classify traffic as benign or anomalous. The proposed methodology was tested on the new IoT-Botnet 2020 dataset. The experimental findings showed the model to be superior to previous DL methods, with a 99.01% detection accuracy and a false alarm rate of 3.9%, improving accuracy by 0.57-2.6% while lowering the FAR by 0.23-7.98%. The results further show that selecting the best features by MI, with a feature count in the 16-32 range, is a feasible way to reduce the complexity of the model with an almost minimal effect on performance. Moreover, incorporating the categorical features further increases detection precision while using only the top five features. In the study of each label (benign and anomalous) with regard to percentage accuracy, recall, and F1, all techniques showed a very high percentage of anomaly flow detection, with the top DNN score of 99.95%. On the other hand, the rates for detecting benign traffic fell marginally, by 3.87-10.99 percent, and the DNN, at 96.085 percent, still performed better than the other methods.
We also noticed that the LSTM model was inadequate at detecting benign flows, with a deterioration of nearly 11%. We estimate that the imbalanced character of the data set, in which anomaly records are nearly 3.2 times more numerous than benign records, has helped to degrade the detection rate for the benign label. Increasing the data for benign labels could therefore enhance their detection rate.