Energy efficient data aggregation and improved prediction in cooperative surveillance system through Machine Learning and Particle Swarm based Optimization

The present pandemic demands touchless, autonomous, and intelligent surveillance systems that reduce human involvement. Heterogeneous sensors are used to improve the effectiveness of such a system, and a cooperative approach among these sensors makes the system more efficient across varied users such as corporate offices, universities, and manufacturing industries. An effective data aggregation technique is essential because the energy utilization of the system affects its lifetime, coverage, and computational overhead. Because the system is heterogeneous and its requirements are multi-objective, applying a bio-inspired optimization technique such as Particle Swarm Optimization (PSO) to scheduling improves system performance. Similarly, applying a Support Vector Machine (SVM) as a classification and prediction algorithm to the large volume of periodically collected data makes the system more autonomous and intelligent.


Introduction
The sensors in a surveillance system form an ad hoc network of lightweight wireless nodes, known as sensor nodes, that sense environmental conditions such as pressure, temperature, fire, movement, and images. Aggregating data from such a large number of sensor nodes is a demanding task in terms of computational power and energy usage because of the heterogeneous nature of the sensors. Cooperation among sensors is important for effective data aggregation during the required time intervals. All sensors are expected to send their data to the base node periodically. Surveillance sensor systems have applications such as climate monitoring, building monitoring, physical condition monitoring, and defense monitoring. However, such a large network always suffers from limitations [1] in resources, storage capacity, memory, and computational power. Figure 1 shows the general structure of an intelligent surveillance system, in which all types of sensor nodes in the network are interconnected or communicate via intermediate sensor nodes. Sensor nodes that generate data analyze the sensed data and broadcast the packets to sink nodes based on their sensing mechanisms. Since the base station can be set up very far from the sensor nodes, this method is fundamentally direct transmission. The energy cost of data aggregation is most critical for nodes that are far from the base station, as these nodes need to broadcast data over a long distance. This problem can be addressed by classifying nodes into aggregator [2] nodes and collection nodes. Not all nodes then broadcast to the base station: only a few aggregator nodes transfer data to the base station, while the other nodes only collect data.
Since the majority of sensors used in networks are resource-constrained and battery-powered, energy consumption must be kept to a minimum to extend network lifetime. Data aggregation strategies are used to save resources, reduce energy consumption, and minimize excessive network traffic. Considerable work has been done on energy usage during data aggregation, as energy usage and network lifetime are the major factors that determine the efficiency of a network. Transmitting redundant data also wastes energy, because the redundant data are generated periodically. Efficient data collection that removes redundant data and uses less energy is therefore essential for improving the performance of the network as a whole.
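The energy argument for aggregator nodes can be illustrated with a rough calculation. The sketch below assumes the commonly used first-order radio model (a fixed per-bit electronics cost plus an amplifier cost that grows with the square of the distance); the constants, distances, and function names are illustrative assumptions and are not taken from this paper. It also assumes the aggregator fuses all collected packets into a single packet of the same size.

```python
# Assumed constants of a first-order radio model (illustrative only).
E_ELEC = 50e-9     # J/bit spent in transmitter or receiver electronics
EPS_AMP = 100e-12  # J/bit/m^2 spent in the transmit amplifier


def tx_energy(bits: int, distance_m: float) -> float:
    """Energy to transmit `bits` over `distance_m` metres."""
    return E_ELEC * bits + EPS_AMP * bits * distance_m ** 2


def rx_energy(bits: int) -> float:
    """Energy to receive `bits`."""
    return E_ELEC * bits


def direct_cost(bits: int, distances_to_base: list) -> float:
    """Every node transmits its own packet straight to the base station."""
    return sum(tx_energy(bits, d) for d in distances_to_base)


def aggregated_cost(bits: int, distances_to_aggregator: list,
                    aggregator_to_base: float) -> float:
    """Collection nodes send to a nearby aggregator, which receives each
    packet and forwards one fused packet to the base station."""
    collect = sum(tx_energy(bits, d) + rx_energy(bits)
                  for d in distances_to_aggregator)
    return collect + tx_energy(bits, aggregator_to_base)


# Hypothetical layout: five nodes 200 m from the base station,
# each 20 m from a common aggregator.
print(direct_cost(2000, [200.0] * 5))
print(aggregated_cost(2000, [20.0] * 5, 200.0))
```

Under these assumptions only one long-distance transmission remains, so the aggregated scheme spends far less energy than direct transmission.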

Figure 1. Basic architecture of intelligent surveillance system (components shown: target, sensors, cameras, base station, cloud storage, user/data analysis through ML)

The data aggregated [2] through the aggregator node reaches the base node and can be sent onward to the user end through the internet. As surveillance is a periodic task and sensing has to be done at frequent intervals, the sensing process generates a huge amount of data. This data is stored on a cloud server, and data analytics is performed on it for effective prediction.

Background Work
The existing Energy-efficient Routing Algorithm to Prolong Lifetime (ERAPL) [6] is an effective routing technique for a large number of wireless sensor nodes and is efficient in terms of packet delivery ratio even when the nodes are heterogeneous. However, this algorithm shows a poor packet delivery ratio when the network is scaled up, and it also suffers from low throughput as the network expands. Its support for data aggregation [7] is also in question, as it depends equally on the coverage and lifetime requirements of the network.
A more distributed approach such as Ant Colony Optimisation can be used for effective data aggregation, as it also improves overall system performance, but its computational overhead is a concern because the protocol demands frequent pheromone updates. The Infrastructure-based Data Gathering Protocol (IDGP) [8] is another distributed solution, although it demands considerable computational power, which also degrades system lifetime. A comparison of these two protocols shows that data aggregation always creates a trade-off between lifetime and coverage, and likewise between computational power and system performance.
Data aggregation performance is always degraded by black hole nodes and multipath [12] transmission. A black hole node prevents data from reaching the destination node and may also forward wrong data to the destination. Multipath transmission creates many duplicate packets, which cause excess power [5] usage and ultimately degrade system lifetime.
A data aggregation technique that substantially reduces the number of transmissions is the most practical choice in a WSN. Concealed data aggregation methods have been extended with homomorphic public-key encryption techniques. The data aggregation [15] and authentication protocol, called DAA, integrates false data detection with data aggregation and confidentiality. The Queen-MAC protocol is used to schedule [21] node wakeup times, reduce idle listening and traffic, and increase throughput as well as network lifetime.
In a heterogeneous environment of sensor nodes, broadcast- and multicast-based algorithms also need special attention [19], as they suffer from high power consumption, especially due to data link layer issues. The modulation and demodulation techniques used together with multiplexing at the lower layers, the type of antenna, and its power dissipation also create lifetime-related issues. Approaches with hybrid modulation and efficient encoding can address these issues and can also be used to improve security in the lower layers.
Energy utilization is a major problem in wireless sensor networks. The authors of [20] introduced an effective algorithm based on the signal coverage of effective communication. Two algorithms are derived: the first guarantees locally balanced energy consumption across the deployment using a multi-hop partition subspace clustering algorithm, and the second is a distributed locating deployment based on efficient communication coverage probability (DLD-ECCP). As a result, the DLD-ECCP protocol saves hardware resources and energy in the wireless sensor network.
Wireless sensor networks are formed by connected sensors that can each gather, process, and store environmental information and communicate with others through inter-sensor wireless communication. These characteristics allow wireless sensor networks to be used in a broad range of applications such as environmental monitoring, battlefield surveillance, and nuclear, biological, and chemical (NBC) attack detection. In such applications, critical areas and common areas must be clearly distinguished. The authors introduced an approximation algorithm for critical-square-grid coverage [21] that is used to cover the grid-based area of the entire network, and it provides a better solution for critical-square-grid coverage.
Cluster scheduling and collision avoidance are major problems in large-scale cluster-tree wireless sensor networks. The authors introduced the Time-Division Cluster Scheduling (TDCS) mechanism [22], based on the cyclic extension of Resource Constrained Project Scheduling with Temporal Constraints (RCPS/TC), for cluster-tree wireless sensor networks with bounded communication errors. The objective is to meet all end-to-end deadlines of a predefined set of time-bounded data flows while reducing the energy consumption of the nodes by setting the TDCS period as long as possible. Because each cluster is active only once during the period, the end-to-end delay of a given flow may span several periods when there are flows in the opposite direction.

Different types of data aggregation process
Data aggregation is a key performance-determining technique in a sensor network. It depends on the nature of the routing algorithm used and strongly affects the network lifetime and coverage parameters. To perform in-network aggregation, sensor nodes route packets based on the data packet content and choose the next hop accordingly. Data aggregation mechanisms are distinguished by the network layout and the routing protocol. In wireless sensor networks, four types of data aggregation methods are mainly used. Figure 2 shows the data aggregation methods that are frequently used with sensor networks. These techniques are mainly compared with respect to sensor node lifetime, coverage, and throughput or packet delivery ratio.

Tree based method
The tree-based method performs aggregation by constructing an aggregation tree. The tree is a minimum spanning tree, with the base station acting as the root and the source nodes acting as the leaves. Data flows from the leaf nodes up to the root at the base station. The key disadvantage of this approach is that it is not robust to failure: the network cannot recover when a data packet is lost.
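A minimal sketch of aggregation over such a tree is given here, assuming a sum as the aggregation function and a hypothetical six-node topology. The `failed_links` parameter is an illustrative addition that shows how, in this simple model, a lost packet on one link removes the contribution of the whole subtree below it.

```python
from typing import Dict, FrozenSet, List, Tuple


def aggregate_up(tree: Dict[str, List[str]],
                 readings: Dict[str, float],
                 node: str,
                 failed_links: FrozenSet[Tuple[str, str]] = frozenset()) -> float:
    """Return the sum of readings in the subtree rooted at `node`.

    `tree` maps each parent to its children. A (child, parent) pair in
    `failed_links` models a lost packet on that link.
    """
    total = readings.get(node, 0.0)
    for child in tree.get(node, []):
        if (child, node) in failed_links:
            continue  # everything aggregated below `child` is lost
        total += aggregate_up(tree, readings, child, failed_links)
    return total


# Hypothetical topology: the base station is the root, c, d and e are leaves.
tree = {"base": ["a", "b"], "a": ["c", "d"], "b": ["e"]}
readings = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0, "e": 5.0}

print(aggregate_up(tree, readings, "base"))
print(aggregate_up(tree, readings, "base", frozenset({("a", "base")})))
```

Each non-root node transmits exactly once, which is what keeps the traffic low, but the second call shows that a single lost packet near the root silently discards three readings.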

Cluster based method
In large-scale, energy-constrained sensor networks, it is inefficient for sensors to broadcast data directly to the base station. The cluster-based approach is a hierarchical method that divides the entire network into several clusters. Each cluster has a cluster head that is chosen by the cluster members either by priority or through an election algorithm. The cluster head is in charge of aggregating the data obtained from nearby cluster members and broadcasting the result to the sink node. The cluster heads may communicate with the sink directly using long-range communication or by multi-hopping between cluster heads.
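The cluster-based scheme can be sketched as follows. The election rule used here (the member with the highest residual energy becomes cluster head) and the use of a mean as the aggregate are assumptions made for illustration; the text only states that the head is chosen by priority or by an election algorithm. All names and values are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Node:
    node_id: str
    cluster: int
    residual_energy: float  # assumed election criterion
    reading: float


def elect_heads(nodes: List[Node]) -> Dict[int, Node]:
    """Pick, per cluster, the node with the highest residual energy."""
    heads: Dict[int, Node] = {}
    for n in nodes:
        current = heads.get(n.cluster)
        if current is None or n.residual_energy > current.residual_energy:
            heads[n.cluster] = n
    return heads


def cluster_reports(nodes: List[Node]) -> Dict[str, float]:
    """Each head averages its cluster's readings and reports one value."""
    reports: Dict[str, float] = {}
    for cluster_id, head in elect_heads(nodes).items():
        members = [n.reading for n in nodes if n.cluster == cluster_id]
        reports[head.node_id] = mean(members)
    return reports


nodes = [
    Node("n1", 0, 0.9, 20.0),
    Node("n2", 0, 0.5, 22.0),
    Node("n3", 1, 0.4, 30.0),
    Node("n4", 1, 0.8, 34.0),
]
print(cluster_reports(nodes))
```

Only two packets reach the sink in this example, one per cluster head, instead of four.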

Multi path method
The downside of the tree-based approach is its limited robustness. To overcome this limitation, a method was developed in which, instead of sending partially aggregated data to a single parent node in the aggregation tree, a node can send data over multiple paths. Every node can send data packets to one or more of its neighbours. As a result, data packets flow from the sender node to the sink node through multiple paths, with several intermediate nodes in between. This technique makes the system more robust, but it adds some overhead.
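One source of the overhead is that the sink may receive the same packet several times over different paths. The sketch below assumes each packet carries a source identifier and a sequence number (an assumption, not a detail given in the text) and shows how a sink could discard the duplicates.

```python
from typing import Iterable, List, Tuple

Packet = Tuple[str, int, float]  # (source_id, seq_no, payload), assumed layout


def deliver_unique(copies: Iterable[Packet]) -> List[Packet]:
    """Keep the first copy of every (source_id, seq_no) pair."""
    seen = set()
    unique: List[Packet] = []
    for source_id, seq_no, payload in copies:
        key = (source_id, seq_no)
        if key in seen:
            continue  # duplicate that arrived over another path
        seen.add(key)
        unique.append((source_id, seq_no, payload))
    return unique


# Hypothetical arrivals at the sink: two packets were delivered twice.
arrivals = [("a", 1, 20.5), ("a", 1, 20.5), ("b", 1, 19.0),
            ("a", 2, 21.0), ("b", 1, 19.0)]
unique = deliver_unique(arrivals)
print(len(arrivals) - len(unique), "redundant copies discarded")
```

The redundant copies still cost transmission energy along the way, which is the trade-off against the added robustness.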

Hybrid method
The hybrid method combines the tree-based, cluster-based, and multipath schemes. The data aggregation structure can be adjusted according to the specific network situation as well as a variety of performance statistics.

Data aggregation techniques with energy conservation
Contrary to the general assumption that sensing is irrelevant to energy consumption, a promising class of applications is actually sensing-constrained. In reality, the energy consumption of the sensing subsystem may not only be relevant, it may even be greater than that of the radio or of the rest of the sensor node. This can be attributed to a variety of factors, including:
1. Power-hungry transducers: some sensors need a lot of power to complete their sampling task.
2. Power-hungry A/D converters: sensors such as acoustic and seismic transducers typically need high-rate, high-resolution A/D converters, and the converters' power consumption can account for the most significant share of the sensing subsystem's power consumption.
3. Active sensors: another type of sensor uses active transducers to obtain data about the sensed phenomenon. In this case, the sensor must send out a probing signal to obtain information about the observed quantity.
4. Long acquisition time: the acquisition time may be in the range of hundreds of milliseconds or even seconds. As a result, even if the sensor's power consumption is moderate, the energy spent by the sensing subsystem can be high.
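The effect of a long acquisition time follows directly from energy = power x time. The numbers below are illustrative assumptions, not measurements from this paper: a sensor drawing 10 mW for a 500 ms acquisition is compared with a radio drawing 60 mW for a 20 ms transmission.

```python
def energy_joules(power_watts: float, active_seconds: float) -> float:
    """Energy consumed by a component that draws constant power while active."""
    return power_watts * active_seconds


# Assumed, illustrative operating points.
sensing = energy_joules(power_watts=0.010, active_seconds=0.500)
radio = energy_joules(power_watts=0.060, active_seconds=0.020)

print(f"sensing: {sensing * 1e3:.1f} mJ, radio: {radio * 1e3:.1f} mJ")
```

Under these assumptions the sensing subsystem spends more energy per sample than the radio, even though its power draw is six times lower.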

Data aggregation techniques with network lifetime
Battery-powered sensors can operate as long as they can communicate collected data to a processing node. Sensing and communication consume energy [23,24], so effective power management and scheduling can extend operational time. When ground access to the monitored area is not permitted, one alternative is to deploy the sensors remotely from an aircraft to observe a set of targets with known locations. The lack of exact sensor placement is then compensated for by a high sensor population density in the drop zone, which increases the likelihood that the target area is covered. The data collected from the sensors is sent to a central node for processing.
Figure 3 shows the overall process of the system: data from heterogeneous sensors is collected by the corresponding sensor nodes, and with the initial configuration a traditional technique may be used to improve the lifetime [25] or coverage of the sensors. The collected data is then sent to cloud storage through the internet. This large volume of data collected [16,22] from the sensors can be used for prediction through any suitable machine learning technique. Applying a support vector machine as the classification technique improves overall system performance, as support vector classification is very effective on heterogeneous data with a large number of features. It can yield high classification accuracy and precision because it maps the available features into a higher-dimensional space. The end user can then obtain useful surveillance results based on the classification and prediction. Even though the system yields quality output, system performance remains poor when sensor lifetime and coverage [9] parameters are handled poorly. Lifetime and coverage always involve a trade-off, as improving one parameter reduces the value of the other [10].
Effective scheduling [14,17,18] is therefore essential to schedule the various sensors according to their application and environment and to achieve good lifetime as well as coverage [13]. Particle Swarm Optimization (PSO) is applied as an optimization tool that can optimize both coverage and energy usage according to the scenario.

Particle swarm optimization (PSO)
To further improve coverage and network lifetime, Particle Swarm Optimization (PSO) [3] based scheduling is applied, with all the necessary parameters considered for optimization. PSO is applied among the sensor nodes of the heterogeneous network, taking the essential parameters of each individual sensor into account. Once all particles are initialized with the required values, an iterative optimization procedure is started and the sensor scheduling is optimized. The process of the PSO algorithm is shown in Figure 4.

Input: An array of the population of particles with D dimensions in a problem space
Output: Improved load balancing among sensor nodes in WSN
Step 1: Begin
Step 2: For each particle
Step 3:     Evaluate fitness function in D variables
Step 4:     Compare each particle's fitness evaluation with its 'pbest'
Step 5:     If current fitness value is better than 'pbest'
Step 6:         Save the current value as 'pbest'
Step 7:     End If
Step 8: End For
Step 9: Compare fitness evaluation with population's overall previous best
Step 10: If current value is better than 'gbest'
Step 11:     Set 'gbest' to the current particle's array index and value
Step 12: End If
Step 13: Modify velocity and position of each particle
Step 14: Repeat until stop condition is met
Step 15: End

Figure 4. Process of PSO optimization

Figure 4 illustrates the PSO-based optimization [4] used to generate a better schedule of sensor nodes and improve performance without compromising lifetime and coverage. Based on the fitness value of neighbour nodes and other relevant parameters, the PSO-based approach achieves effective scheduling for transmitting data packets from source to destination. As a result, the network's lifetime and coverage are increased.
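The steps of Figure 4 can be sketched in a few lines of Python. This is a generic, minimal PSO that minimizes a fitness function, not the authors' scheduler: the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the swarm size, and the `toy_fitness` function (which treats a particle as three sensor duty cycles with an arbitrary target of 0.3) are all assumptions chosen for illustration.

```python
import random
from typing import Callable, List, Tuple


def pso(fitness: Callable[[List[float]], float], dim: int,
        bounds: Tuple[float, float], n_particles: int = 20,
        iterations: int = 100, w: float = 0.7, c1: float = 1.5,
        c2: float = 1.5, seed: int = 0) -> Tuple[List[float], float]:
    """Minimize `fitness` over a box; returns (gbest position, gbest value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position seen per particle
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by swarm

    for _ in range(iterations):
        for i in range(n_particles):
            # Modify velocity and position of the particle.
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            # Evaluate fitness, then update pbest and gbest if improved.
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_fitness(duty_cycles: List[float]) -> float:
    """Hypothetical objective: squared distance of each duty cycle from 0.3."""
    return sum((x - 0.3) ** 2 for x in duty_cycles)


best, best_val = pso(toy_fitness, dim=3, bounds=(0.0, 1.0))
print(best, best_val)
```

A real scheduler would replace `toy_fitness` with an objective that combines coverage and energy usage for the deployment at hand; the update loop itself would stay the same.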

Support vector machines (SVM)
SVM is a reasonably good classification [5] tool for both linear and nonlinear data. The original training data is transformed into a higher dimension using a nonlinear mapping. Within this new dimension, SVM searches for the linear optimal separating hyperplane (i.e., the "decision boundary"). With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can always be separated by a hyperplane. SVM finds this hyperplane using support vectors ("essential" training tuples) and margins. Training can be slow, but accuracy is high owing to the ability of SVMs to model complex nonlinear decision boundaries. SVM can be used for both classification and numeric prediction, and it reaches a classification or regression decision based on the value of a linear combination of input features. Figure 5 shows the general philosophy of SVM, where the goal is to generate mathematical functions that map input variables to desired outputs for classification or regression type prediction problems. First, SVM uses nonlinear kernel functions to transform nonlinear relationships among the variables into linearly separable feature spaces. Then, maximum-margin hyperplanes are constructed to optimally separate the classes from each other based on the training dataset. A hyperplane is a geometric concept used to describe the separation surface between different classes. In SVM, two parallel hyperplanes are constructed, one on each side of the separating surface, with the aim of maximizing the distance between them (the margin). A kernel function in SVM uses the kernel trick, a method for using a linear classifier algorithm to solve a nonlinear problem.

Figure 5. General philosophy of SVM
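The kernel trick can be made concrete with a small check. The sketch below uses a degree-2 polynomial kernel on 2-D inputs, chosen purely for illustration (the paper does not state which kernel was used), and shows that evaluating the kernel on the original inputs gives the same value as an ordinary dot product in the explicit higher-dimensional feature space.

```python
import math
from typing import Sequence, Tuple


def poly2_kernel(x: Sequence[float], z: Sequence[float]) -> float:
    """K(x, z) = (x . z)^2, computed in the original 2-D space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def phi(x: Sequence[float]) -> Tuple[float, float, float]:
    """Explicit 3-D feature map whose dot product equals poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


x, z = (1.0, 2.0), (3.0, 4.0)
print(poly2_kernel(x, z))    # kernel evaluated without the mapping
print(dot(phi(x), phi(z)))   # same quantity via the explicit mapping
```

Because the classifier only needs these dot products, it can operate in the higher-dimensional space without ever constructing the mapped vectors.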

Metric evaluation of data aggregation in wireless sensor network
Maximum Amount Shortest Path (MASP) is a data gathering technique that increases throughput while lowering energy consumption by optimizing the assignment of sensor nodes. Compared with the Shortest Path Tree (SPT), MASP decreases energy consumption while increasing throughput. The Energy-efficient Routing Algorithm to Prolong Lifetime (ERAPL) [6] routing protocol is used to save energy and extend the network's lifetime. This protocol uses a Data Gathering Sequence (DGS) to avoid mutual transmission and loop transmission between nodes.
Answering continuous queries through data aggregation over dynamic data is a low-cost [7], scalable technique. A query cost model can be used to estimate the number of messages needed to satisfy the incoherency limits set by the source. The data collection route to the sink node uses an Infrastructure-based Data Gathering Protocol (IDGP) [8]. A K-hop relay method is used to route data to a mobile sink node with the fewest possible hops. This data gathering protocol outperformed the others, with fewer hops and a shorter data gathering route to the sink node.
A data distribution mechanism avoids, with high probability, the black holes created by black hole attacks. Randomized multipath routes are generated using this mechanism [12]. The generated routes are also highly dispersive and energy efficient, allowing them to avoid black hole attacks. The mechanism is optimized to reduce energy consumption while providing security control.
To mitigate the effects of interference, scheduling is combined with transmission power control. Power control [5] helps shorten the schedule length under single-frequency scheduling, and transmitting on multiple frequencies is more efficient.

Energy and lifetime parameter in sensor network
The combine-skip-substitute (CSS) method is used to find a solution within a small range of the lower bound. The CSS scheme is designed to achieve both efficiency and correctness. Data Routing for In-Network Aggregation (DRINA) [11] is used to reduce the number of messages needed to build a routing tree. This method maximizes the number of overlapping routes and provides a high aggregation rate and reliable data aggregation and transmission.
An adaptive quorum-based MAC protocol [21] is used to schedule node wakeup times, minimize idle listening and traffic, and improve latency, delivery ratio, and network lifetime. Optimal routing and data aggregation schemes are used to reduce the data traffic level and extend the lifetime of the network.
The performance of the network is measured using a simulation tool, where the network is prototyped with various types of nodes and heterogeneous parameters. Figure 6 shows the performance of the network in terms of packet delivery ratio for a varying number of nodes. Performance is measured for scheduling with Particle Swarm Optimization (S-PSO) and scheduling without Particle Swarm Optimization (S-WPSO). Scheduling with PSO consistently offers a better packet delivery ratio than scheduling without PSO. As the number of nodes varies, PSO-based scheduling remains effective and provides a stable packet delivery ratio.
Figure 7 shows the load balancing efficiency of the network. As the network is heterogeneous and large, load balancing is an important measure of overall network performance; it also indirectly affects the computational requirements of the network and, in turn, the network lifetime. It is evident from the graph that PSO-based scheduling attains better load balancing efficiency than the traditional technique, with almost 20% improvement for a varying number of heterogeneous nodes.
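For reference, packet delivery ratio and a relative improvement between two schemes can be computed as in the sketch below. The packet counts are made-up placeholders used only to exercise the functions; they are not results from this paper, and the function names are hypothetical.

```python
def packet_delivery_ratio(packets_received: int, packets_sent: int) -> float:
    """Fraction of sent packets that reached the destination."""
    if packets_sent <= 0:
        raise ValueError("packets_sent must be positive")
    return packets_received / packets_sent


def relative_improvement(baseline: float, improved: float) -> float:
    """Improvement of `improved` over `baseline`, as a fraction of baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline


# Placeholder counts, for illustration only.
pdr_without = packet_delivery_ratio(packets_received=75, packets_sent=100)
pdr_with = packet_delivery_ratio(packets_received=90, packets_sent=100)
print(pdr_without, pdr_with, relative_improvement(pdr_without, pdr_with))
```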
Data acquired from the heterogeneous sensors is pre-processed and classified using a Support Vector Machine (SVM) for effective prediction. The performance of the classification needs to be measured, as it directly affects overall system performance and is essential for improved prediction. The performance of the classifier is measured using the Weka tool, for which the collected data is converted to Attribute-Relation File Format (ARFF). Table 3 shows the SVM-based classification summary, which enables the evaluation of parameters such as classification accuracy and overhead. It gives a detailed analysis of the classification with measures such as True Positive (TP), False Positive (FP), and precision.
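The measures named above can be derived from true and predicted labels as in the following sketch for the two-class case. The label vectors are hypothetical examples, not data from Table 3, and the function name is an assumption.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Compute TP rate, FP rate, precision and accuracy for one class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive and t != positive:
            fp += 1
        elif p != positive and t == positive:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0  # avoid division by zero

    return {
        "tp_rate": ratio(tp, tp + fn),     # share of positives found
        "fp_rate": ratio(fp, fp + tn),     # share of negatives misflagged
        "precision": ratio(tp, tp + fp),   # share of flagged items correct
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


# Hypothetical labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(binary_metrics(y_true, y_pred))
```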

Conclusion
The autonomous nature of a cooperative surveillance system is essential in the current COVID-19 pandemic situation. A surveillance system with heterogeneous sensors performs poorly when it lacks an efficient data aggregation technique. Data aggregation through scheduling of heterogeneous sensors with Particle Swarm Optimization achieved excellent performance in terms of packet delivery ratio and network throughput. The performance measures also show that efficient scheduling achieved improved load balancing, which reduced the computational complexity of the system and ultimately improved its lifetime. An autonomous surveillance system demands prediction based on the large volume of data collected, and a classification technique such as the support vector machine offers improved prediction based on the available data features. The dimensional expansion of data features improved classification accuracy, even though it slightly increased the classification overhead. The application of PSO-based scheduling on heterogeneous sensors made the surveillance system cooperative, and the application of SVM-based classification improved the system's learning approach, making the system truly autonomous.