FedADMP: A Joint Anomaly Detection and Mobility Prediction Framework via Federated Learning

With the proliferation of mobile devices and smart cameras, detecting anomalies and predicting their mobility are critical for enhancing safety in ubiquitous computing systems. Due to data privacy regulations and limited communication bandwidth, it is infeasible to collect, transmit, and store all data from mobile devices at a central location. To overcome this challenge, we propose FedADMP, a federated learning based joint Anomaly Detection and Mobility Prediction framework. FedADMP adaptively splits the training process between the server and clients to reduce computation loads on clients. To protect the privacy of user data, clients in FedADMP upload only intermediate model parameters to the cloud server. We also develop a differential privacy method to prevent the cloud server and external attackers from inferring private information during the model upload procedure. Extensive experiments using real-world datasets show that FedADMP consistently outperforms existing methods.


Introduction
With the proliferation of mobile devices and smart cameras, anomaly detection and mobility prediction have become an important and emerging topic in ubiquitous computing systems, where an anomaly can be a human, a car, and so on. In many situations, especially when solving crime cases, anomaly detection and mobility prediction are indispensable. Historically, solving crime cases has been the prerogative of criminal justice and law enforcement experts. With the increasing use of computer systems to track and identify crimes, data analysts have begun to help law enforcement officers speed up the process of solving crime cases [1]. For example, accurate anomaly detection and mobility prediction can help the authorities quickly identify a potential threat (i.e., the anomaly) and its movement paths (i.e., the mobility) so as to schedule forces to meet the requirement (e.g., taking the threat into custody) with minimum delay. For smart cities, with the help of anomaly detection and mobility prediction, the government can better understand the traffic flows and propose proper policies to run the city more efficiently.
* Corresponding author. Email: lij@binghamton.edu
Given its great value in practical applications, there are a large number of studies on anomaly detection (e.g., [2][3][4][5][6]) and mobility prediction (e.g., [7][8][9][10]). While these models achieve good performance, they are in general designed for a single task, i.e., either anomaly detection or mobility prediction. This intrinsic design makes it hard to generalize these models to joint anomaly detection and mobility prediction in ubiquitous computing systems, whose services are increasingly demanding. Furthermore, there is an increasing concern about the privacy issues raised by these models due to the data sharing between client devices and the central server.
Federated learning (FL) [11][12][13][14] has recently been proposed for decentralized privacy-preserving training, which enables client devices, such as mobile phones and smart cameras, to collaboratively learn a shared (global) model while keeping all the training data on client devices. There is a central server in FL that orchestrates the whole training process. In each training round, the server collects local models from eligible client devices, which are then averaged to improve the shared model. Similar to the conventional centralized machine learning framework, the wall-clock time for training a model to reach a certain accuracy (i.e., time-to-accuracy) is a key performance objective. FL has been deployed across user devices for computer vision and natural language processing tasks [15,16], medical imaging AI creation [17], video streaming [18], and image training and AI camera testing in smart cities [19,20].
Despite the above success, FL cannot be directly applied to joint anomaly detection and mobility prediction in ubiquitous computing systems due to the following challenges. First, model training with, e.g., a deep neural network (DNN) is known to be extremely computation-intensive. Thus, it would be infeasible to train the model on client devices alone, which often have limited resources for emerging applications in ubiquitous computing systems. Second, the model uploading and aggregation processes in FL are not secure, which may result in the leakage of individuals' private information. This is because the central server and external attackers may be able to obtain users' private information by analyzing the uploaded model weights. Finally, applying privacy protection via, e.g., the differential privacy method [21] may lead to significant performance degradation in the joint anomaly detection and mobility prediction task. Adding to these challenges, FL testing is usually performed on real-life client data, some client devices may not be available for FL training and testing in ubiquitous computing systems [12,22], and the system performance of client devices often varies. As a result, a robust FL model is desired for ubiquitous computing systems where not all client devices are available.
Research contributions: In this paper, we propose FedADMP, an FL-based joint anomaly detection and mobility prediction (ADMP) framework, to address the aforementioned challenges. First, FedADMP protects the privacy of client devices. In FedADMP, the raw data is stored at the client devices and only intermediate model parameters, such as the gradients of the trained neural network, are uploaded to the server. Second, to reduce the computation load on client devices, FedADMP provides a novel joint training process in which the training task is performed collaboratively by client devices and the cloud server in each epoch. In contrast, the conventional FL framework performs the model training only on client devices. In addition, we consider a practical attack scenario and develop a lightweight group activation mechanism to protect client devices from such an attack without performance degradation. Finally, FedADMP works in situations where only a subset of client devices is available for FL training and testing.
In summary, the main contributions of the paper are:
• To the best of our knowledge, FedADMP is the first work that addresses the joint anomaly detection and mobility prediction problem using an FL-based approach. By storing users' private data on client devices and uploading only model parameters to the server, FedADMP protects the privacy of client devices.
• FedADMP provides a novel joint training process between client devices and the cloud server in each epoch to reduce the computation load on client devices, which often have limited computational capability in ubiquitous computing systems. In particular, each client device performs partial training using its raw data and then adaptively offloads the remaining training to the cloud server by uploading its intermediate results (i.e., the activation of the LSTM) to the cloud server. Our extensive experiments using real-world traces show that FedADMP not only dramatically reduces the convergence time to achieve a certain accuracy, but also reduces the energy consumption on client devices and the communication costs, which is desirable for resource-constrained client devices in ubiquitous computing systems.
• We consider a practical attack scenario and develop a lightweight group activation mechanism to protect client devices from such an attack.
• We strengthen the robustness of our proposed FedADMP framework by considering the situation in which only a subset of client devices is available for training and testing. Our experimental results show that FedADMP achieves the desired time-to-accuracy with reduced energy consumption and communication costs when only some client devices contribute to the model training.
The rest of the paper is organized as follows. We formulate the joint anomaly detection and mobility prediction problem in Section 2. Section 3 presents the design of our FedADMP system. The experimental results of FedADMP on five real-world datasets are given in Section 4. Section 5 discusses the related work and Section 6 concludes the paper. Additional experimental results are provided in the appendix.

Preliminaries
In this section, we provide a brief overview of the joint ADMP problem in ubiquitous computing systems. With the advance of location-based applications and widely deployed smart cameras, location-based services have played a significant role in the safety of smart cities [8,23,24]. Below, we give a general definition of the ADMP data trajectory.
Figure 1. Workflow comparison between conventional machine learning systems and FedADMP. "S", "T", and "U" represent cloud servers, the transfer environment, and end users, respectively.
EAI Endorsed Transactions on Security and Safety Online First

Definition 1. (Data trajectory)
Let $q$ represent a data record, which is a tuple of four elements: image pixels $m$, location $l$, timestamp $t$, and user identity $u$ (i.e., $q = (m, l, t, u)$). An ADMP trajectory $S_u$ of target $u$ is defined as a time-ordered sequence of records $\{q_1, \cdots, q_n\}$ of that target (i.e., $S_u = \{(m_1, l_1, t_1), \cdots, (m_n, l_n, t_n)\}$, where the identity $u$ is the same for all records and hence omitted). We use $S$ to denote $\{S_u\}_{u \in U}$, where $U$ is the set of all targets.
Note that the above definition is applicable to general application domains [23,25,26,27]. For simplicity, we map each location to a unique ID and quantize the time into fixed-length intervals, following the recent practice [10,24]. In our paper, we choose 250 seconds as the default time interval, considering the anomaly mobility and the general location frequency in smart city services. However, the spatial and temporal resolution can be easily adjusted based on the requirements of different services.

Definition 2. (ADMP)
Given an image sequence $S$, the goal of ADMP is to detect a particular target $u$ and predict the possible location $l_{n+1}$ of target $u$ at the next time step $t_{n+1}$.
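The two definitions above can be illustrated with a minimal sketch of the data layout. The names `Record` and `build_trajectories` are hypothetical and do not come from the paper; the sketch only assumes that locations are mapped to consecutive integer IDs and that timestamps are bucketed into fixed slots (250 seconds by default, as stated above).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One observation q = (m, l, t, u) from Definition 1 (hypothetical type)."""
    pixels: tuple     # m: flattened image pixels
    location: str     # l: raw location label
    timestamp: float  # t: seconds since some reference time
    user: str         # u: identity of the observed target


def build_trajectories(records, interval=250.0):
    """Return ({u: [(m, location_id, time_slot), ...]}, location_ids).

    Records are ordered by time, locations are replaced by integer IDs, and
    timestamps are quantized into slots of `interval` seconds.
    """
    location_ids = {}
    trajectories = defaultdict(list)
    for r in sorted(records, key=lambda rec: rec.timestamp):
        # Assign the next free integer ID the first time a location is seen.
        loc_id = location_ids.setdefault(r.location, len(location_ids))
        slot = int(r.timestamp // interval)
        trajectories[r.user].append((r.pixels, loc_id, slot))
    return dict(trajectories), location_ids
```

Under these assumptions, the ADMP task of Definition 2 takes the list stored for a target $u$ as input and predicts the location ID of the next slot.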

FedADMP System Design
In this section, we first provide an overview of FedADMP and then describe each core component of FedADMP. Similar to many other conventional federated learning frameworks, we assume that the raw data is collected by each client, rather than being collected at a central location. Figure 1 compares the workflow of conventional machine learning systems and FedADMP on anomaly detection and mobility prediction. As illustrated in Figure 1 (a), conventional machine learning systems perform the following four steps to solve the joint anomaly detection and mobility prediction problem. First, the raw data is collected from client devices and transmitted to a central server, which usually resides in a remote data center. Next, a joint model is trained on the server using the raw data. The trained model is then distributed to the client devices through communication channels. Finally, the client devices infer the anomaly and the corresponding mobility behavior. Although such a centralized model works well on conventional systems where client devices have limited computation resources, it fails to protect users' private data, as raw user data is uploaded to the cloud server. Recently, FL was proposed to address this issue by performing all computations (e.g., model training) on client devices so that no raw user data is shared with the cloud server. However, client devices usually have limited computation resources and storage, particularly for emerging applications in ubiquitous computing systems, which may significantly degrade the system performance.
To address the above challenges, we propose FedADMP, a novel FL-based model for joint anomaly detection and mobility prediction in ubiquitous computing systems. The workflow of FedADMP is given in Figure 1 (b). In FedADMP, client devices and the cloud server collaboratively train the model in each epoch. We assume that all clients use the Long Short-Term Memory (LSTM) model [28] for training. Each client device first partially trains the model and then adaptively offloads the model training to the cloud server while keeping the sensitive user data on the device. Specifically, during the forward propagation process, each client device uploads only the activation (i.e., the intermediate output of the first layer of the LSTM) to the cloud server, which continues the forward propagation based on the activation received from each client. Then the cloud server computes the gradient of the loss function in the backward propagation, which is used to update the model with respect to each client. Once the server completes processing all clients in each epoch, a global aggregation is performed to compute the updated global model, which is then sent to all clients. As each client device performs only partial training, FedADMP significantly reduces the computation load on client devices. Furthermore, in contrast to conventional split learning methods (e.g., [29,30]), the client devices in FedADMP upload only the activation, rather than the whole local model, to the cloud server. This significantly reduces the communication cost, as the activation is much smaller than the whole model. In addition, we have developed a differential privacy based method to prevent external attackers and the cloud server from extracting clients' personal information from the uploaded activations.
Figure 2 gives the architecture of FedADMP, which consists of three components: an input module with multi-modal embedding, an LSTM-based sequential module, and an output module.
Input module with multi-modal embedding: In FedADMP, we convert the raw data into a sequence $S$ based on Definition 1, where $S_u$ represents a sequence of images captured by client devices (e.g., smart cameras, mobile devices, etc.). As there are thousands of locations in the dataset, the dimension of one location can be up to thousands. To address this issue, we use embedding methods to reduce the feature dimension and learn the dense representation of discrete location, time, and pixel values. Similar to conventional embedding methods such as [24,31,32], the embedding table is only a lookup table with dense representations of different indices, which significantly reduces the dimension. To improve the performance, the embedding model is trained and optimized together with the whole network. Finally, we concatenate the location, the time, and the pixel vectors to obtain a representation of a spatial-temporal point.
Sequential modeling unit: Given the tremendous success of recurrent networks, in particular the LSTM, in sequential modeling, we develop an LSTM-based modeling unit for sequential transition relationships. Our modeling unit combines the last output of the neural network and the current input so that the network learns to capture the relationship between sequential inputs. LSTM is formulated as follows:
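For reference, the standard LSTM recurrence that matches the symbols used in this section ($x_t$, $h_t$, $c_t$, the gates $i_t$, $f_t$, $o_t$, and the candidate $g_t$) can be written as below. The weight matrices $W_\ast$, $U_\ast$ and the biases $b_\ast$ are assumed names for illustration; the exact parameterization used in FedADMP may differ.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
g_t &= \tanh(W_g x_t + U_g h_{t-1} + b_g) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
c_t &= f_t \bullet c_{t-1} + i_t \bullet g_t \\
h_t &= o_t \bullet \tanh(c_t)
\end{aligned}
```

Here $\sigma$ is the logistic sigmoid, applied element-wise.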

Anomaly Detection and Mobility Prediction
where • denotes the Hadamard product, $x_t$ is the input, $h_t$ is the hidden state, $c_t$ is the cell state, $i_t$, $f_t$, $o_t$ are the input, forget, and output gates, and $g_t$ is the useful information from the input. The computation flow of such an LSTM is illustrated in Figure 3.
Output module: Following the above sequential module, we consider a projection-based method for the output module (shown in Figure 4), which directly uses the hidden state vector to calculate the correlation between the hidden state and the dense location embedding representation. Given the correlation results, we use the sigmoid function to obtain the final hybrid output, which contains the identification probability distribution concatenated with the prediction probability distribution. We use a linear layer to directly project the hidden state onto high-dimensional identification and position vectors. The sigmoid function is then applied to the projection output to obtain a probability distribution of the identified task and the predicted position. The output module is formulated as $y = \mathrm{sigmoid}(w h + b)$, where $h$ is the hidden state of the previous sequential unit, and $w$ and $b$ represent the learnable parameters of the projection linear layer. Given the above hybrid output, which contains the concatenated result of task identification and mobility prediction, we can simply separate them through slicing to obtain the results for the task identification and the mobility prediction. We further determine whether the identified target is an anomaly. Let $y_d$ be the detection output. We compute the normal score as $s = (y_d - y_l)^2$, where $y_l$ is the one-hot encoding of the target's label. Once we obtain the score $s$, we compare it with a decision-making threshold δ. If $s$ is less than δ, then an anomaly is considered to be detected.
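The output module and the threshold test described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the identification scores are assumed to occupy the first `num_ids` entries of the hybrid output, and the score $s$ is taken to be the squared Euclidean distance between $y_d$ and $y_l$.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def output_module(h, w, b, num_ids):
    """Project hidden state h with a linear layer, apply the sigmoid, and slice.

    w is a list of weight rows (one per output unit) and b a list of biases.
    Assumption: the first num_ids outputs are identification scores and the
    remaining outputs are location scores.
    """
    y = [sigmoid(sum(wi * hi for wi, hi in zip(row, h)) + bi)
         for row, bi in zip(w, b)]
    return y[:num_ids], y[num_ids:]


def is_anomaly(y_d, y_l, delta):
    """Return (detected, s), where s is the squared distance between the
    detection output y_d and the one-hot label y_l of the target.

    Following the text, the anomaly is considered detected when s < delta.
    """
    s = sum((a - c) ** 2 for a, c in zip(y_d, y_l))
    return s < delta, s
```

For example, a detection output that exactly matches the target's one-hot label gives $s = 0$ and is flagged for any positive δ, whereas an output that puts all its mass on a different identity gives $s = 2$.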

Privacy-Preserving Mechanism
Although FedADMP does not upload raw data to the cloud server, sensitive information collected by client devices may still be divulged by analyzing the differences between the activation and the shared global model transmitted between client devices and the cloud in each epoch [33,34].
A real attack example [23]: Although it is much harder to extract private information from the activation and the shared global model than from the raw data due to the complexity and the implicit nature of the model, advanced techniques were recently developed to attack the model upload procedure and extract private information with some prior knowledge [35]. Since client data is not independent and identically distributed in practice, the contribution of each client device varies. Therefore, by comparing the differences between the global model $M^{t-1}$ and the local model $M^t$, the attacker can deduce which client is involved in the training process. A simple attack of the above procedure is presented in Figure 5.
Activation optimization with differential privacy: To mitigate the potential privacy risk, we propose a privacy-preserving local activation technique on client devices based on the differential privacy (DP) method [21]. We introduce DP into the local activation optimization to obtain the controlled embedding table for model sharing with privacy guarantees. Figure 6 compares our proposed activation optimization method against the conventional DP methods. In the conventional FL-based framework, the noise is added either to the raw data (Figure 6 (a)) or to the entire local model (Figure 6 (b)). However, this is usually computationally expensive due to the sheer size of the raw data and the local model, particularly for client devices in the ubiquitous computing systems considered in this paper.
In departure from these methods, FedADMP adds artificial noise only to the activation uploaded to the cloud server (Figure 6 (c)). The server then continues the training process as described in Figure 1. Upon the completion of the training process in each epoch, the server aggregates results from all client devices. Here we choose the widely used Adaptive Moment Estimation (Adam [36]) algorithm to update the model in FedADMP. Assume that there are $K$ client devices involved in a task. Client $k$ sends its activation $A_k^t$ to the server after completing the client-side training, and the server uses $A_k^t$ to complete the rest of the forward propagation. Then the server updates the local model $m_k^{t+1}$.
Algorithm 1
...
7: complete the rest of training
8: end for
...
10: $M^{t+1} = \sum_{k=1}^{K} m_k^{t+1} / K$
11: end for
12: Client:
13: $A_k^t \leftarrow$ forward propagation // construct noise with DP ϵ
14: $A_k^t \leftarrow A_k^t + \text{noise}$ // apply DP
15: upload $A_k^t$ to the server
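The client-side noise step can be sketched as below. The paper does not state which noise distribution or sensitivity bound is used, so this sketch assumes the Laplace mechanism: the activation is clipped to an L1 norm of at most `clip`, which bounds the L1 distance between any two clipped activations by `2 * clip`, and Laplace noise of scale `2 * clip / epsilon` is added to each entry. The function names and the `clip` parameter are hypothetical.

```python
import random


def clip_l1(vec, clip):
    """Scale vec down so that its L1 norm is at most `clip`."""
    norm = sum(abs(v) for v in vec)
    if norm <= clip or norm == 0.0:
        return list(vec)
    return [v * clip / norm for v in vec]


def privatize_activation(activation, epsilon, clip=1.0, rng=None):
    """Return a noisy copy of the activation before it is uploaded.

    Assumption: Laplace mechanism with L1 sensitivity 2 * clip, so the noise
    scale is 2 * clip / epsilon (a larger epsilon means less noise).
    """
    rng = rng or random.Random()
    scale = 2.0 * clip / epsilon
    clipped = clip_l1(activation, clip)
    # The difference of two i.i.d. exponentials with mean `scale` is a
    # Laplace(0, scale) sample.
    return [a + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
            for a in clipped]
```

Because only the activation vector is perturbed, the cost of this step grows with the activation size rather than with the size of the raw data or the full local model, which is the motivation given above.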

Training Procedure
Algorithm 1 summarizes our training process. We use the cross-entropy loss function for local training on client devices and the basic SGD algorithm for global optimization. Lines 3-11 in the algorithm describe the training process performed on the server side. First, the server randomly generates an initial parameter $M^0$ (line 3). The server then processes the activation received from client devices (line 6) and completes the rest of the training process (line 7). Next, the server updates the local model (line 8). After all local models are updated, the server computes the global model and sends the global model to all client devices to start a new epoch (line 9). On the client side, each client performs local training on its raw data, adds noise to the activation to protect privacy, and then uploads the activation to the server (lines 12 and 13).
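The overall flow of one training run can be sketched with a deliberately small stand-in model. Everything below is an assumption made for illustration: the client layer is a fixed `tanh` projection instead of the first LSTM layer, the server part is a single sigmoid unit trained with the cross-entropy gradient, each client contributes one labeled sample per epoch, and the noise is Laplace. Only the structure (noisy activation upload, per-client server update, global averaging) follows the description above.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def laplace(rng, scale):
    """Laplace(0, scale) sample as the difference of two exponentials."""
    if scale == 0.0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def client_activation(x, client_weights, noise_scale, rng):
    """Client side: first-layer forward pass, then additive noise before upload."""
    act = [math.tanh(sum(w * xi for w, xi in zip(row, x)))
           for row in client_weights]
    return [a + laplace(rng, noise_scale) for a in act]


def server_update(global_model, activation, label, lr):
    """Server side: finish the forward pass and take one SGD step on a copy of
    the global model, which gives the per-client model m_k^{t+1}."""
    p = sigmoid(sum(w * a for w, a in zip(global_model, activation)))
    # Gradient of the cross-entropy loss for a sigmoid output unit.
    grad = [(p - label) * a for a in activation]
    return [w - lr * g for w, g in zip(global_model, grad)]


def aggregate(local_models):
    """Global aggregation: M^{t+1} = (1/K) * sum over k of m_k^{t+1}."""
    k = len(local_models)
    return [sum(ws) / k for ws in zip(*local_models)]


def train(clients, client_weights, dim, epochs=50, lr=0.5,
          noise_scale=0.01, seed=0):
    """clients is a list of (x, label) pairs, one pair per client."""
    rng = random.Random(seed)
    model = [rng.uniform(-0.1, 0.1) for _ in range(dim)]  # random initial M^0
    for _ in range(epochs):
        local_models = []
        for x, label in clients:
            act = client_activation(x, client_weights, noise_scale, rng)
            local_models.append(server_update(model, act, label, lr))
        model = aggregate(local_models)  # new global model, sent to all clients
    return model
```

In this toy setting the clients never send `x` itself, only the noisy activation, and the server never needs the client-side weights to complete its part of the update.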

Discussion: Adversarial Attacks on FedADMP
Several adversarial attacks have been proposed against federated learning models, including the data poisoning attack [37], in which malicious participants aim to poison the global model by sending model updates derived from mislabeled data; the backdoor attack [38], where the goal of the adversary is to reduce the performance of the model on targeted tasks; the inference attack [39], which reconstructs the training samples by comparing generative deep neural networks with discriminative deep neural networks to generate samples that appear to come from the training set; and the server-side attack [40], which uses a GAN with a multi-task discriminator to recover private data. FedADMP is not resistant to the above attacks. We can apply existing defense mechanisms, such as the ones proposed in [41], to defend against adversarial machine learning attacks.

Experimental Results
In this section, we evaluate the performance of FedADMP using real-world datasets. All experiments were conducted on a machine with a 2.6 GHz 6-core Intel(R) Core i7-9750H CPU, a GTX 1660 Ti GPU, and 16 GB of memory.

Datasets
We evaluate the performance of FedADMP using five data trajectories from the following three datasets. To better understand the characteristics of the data, we analyze the data and plot their distributions in Figure 7. We observe that the time interval in most of the datasets is 5 seconds. We also observe that the visiting frequency of all locations follows a similar long-tail distribution, with the Beijing trajectory having a slightly higher visiting frequency.
Our work focuses on identifying suspicious objects and predicting their future locations in ubiquitous computing systems. However, the above mobility datasets do not contain data for object recognition and hence cannot be directly used for our task. To address this issue, we perform several pre-processing steps on the above mobility datasets together with the insights from two object recognition databases: the MNIST dataset [42] and the Fashion-MNIST dataset [43]. In particular, we reconstruct the mobility datasets so that each record contains the four components described in Section 2: image, time, location, and ID. After the reconstruction, each object in the recognition database corresponds to a specific ID in the mobility database. Since the trajectory data of the Geo-life database is expressed in latitude and longitude, to make the experiment practical, we follow the recent practice [10,24]

Baselines and Metrics
We compare FedADMP with state-of-the-art methods in terms of convergence, prediction accuracy, communication costs, and energy consumption on client devices. To achieve high precision, we focus on evaluating the convergence of different models in terms of wall-clock time instead of the number of iterations. This is because the model training and task execution can be time consuming, and achieving a high precision in a timely manner is of significant importance for ADMP to maintain safety in ubiquitous computing systems. For completeness, we also present the results in terms of the number of iterations to show the advantage of FedADMP. We compare FedADMP against the following four state-of-the-art methods:
• PMF [23] is an FL-based privacy-preserving mobility prediction framework with all computations done on client devices.
• LSTM is the simplified version of the PMF model, which contains only LSTM layers.
• FedMA [44] constructs the shared global model in a layer-wise manner by matching and averaging hidden elements with similar feature extraction signatures.
• FedAvg [11] is a state-of-the-art method for distributed training with privacy guarantees.

Parameters
The

Performance results
In this section, we compare the performance of FedADMP against the above four state-of-the-art methods on five data trajectories. We observe that FedADMP dramatically reduces the time taken to achieve a certain accuracy level with significantly reduced energy consumption on client devices and no additional communication costs.
Convergence and accuracy: Figures 8 and 9 compare the convergence and model accuracy of FedADMP against the above four state-of-the-art methods using time-to-accuracy and epoch-to-accuracy metrics, respectively, where the MNIST dataset is used as the object recognition database. In particular, when an anomaly appears in the system, FedADMP first identifies the anomaly and then predicts its mobility. The accuracy presented in the paper is the joint accuracy of anomaly detection and mobility prediction. Note that in most cases, the ground truth of the anomaly is not available and hence is hard to quantify directly. However, once the targeted anomaly appears in the time series data, FedADMP can learn to predict its next location and detect/identify the anomaly accordingly. Figures 8 and 9 show that FedADMP consistently outperforms the other four state-of-the-art methods, although the accuracy improvement over PMF is limited in some traces. More importantly, FedADMP dramatically reduces the convergence time to achieve a particular accuracy. For example, FedADMP achieves 60% accuracy within 10 seconds in the Seattle trajectory, while the best-performing baseline (i.e., PMF) achieves only 15% accuracy (as shown in Figure 10). Similar observations are made in other trajectories. This property is highly desirable for the joint anomaly detection and mobility prediction task in ubiquitous computing systems, since a timely decision is critical in many emerging applications, as motivated in the introduction.
Similar results are observed when the Fashion-MNIST dataset is used as the object recognition database; the corresponding results are provided in Appendix A for ease of exposition.
Energy consumption and communication costs: As client devices in ubiquitous computing systems are usually battery-powered and resource-constrained, it is important to keep the energy consumption and communication costs low so as not to quickly drain the battery of client devices. We compare the energy consumed by client devices in different models for achieving the same accuracy. In our experiments, we assume an ideal cloud server (e.g., in a data center) that always has enough computation resources. Figure 11 (Left) compares the energy consumed by client devices in different models when achieving 40% accuracy. The figure shows that FedADMP consumes the least energy. This is because FedADMP offloads part of the training tasks to the server, as described in Section 3. We also measure the size of the data transmitted between the server and clients. In our experiments, all clients are executed in parallel on the same machine. As a result, the communication environment is ideal and the communication time between the server and clients is negligible. When the server and clients execute on different machines, the communication time depends on the size of the data transmitted between the server and clients. Figure 11 (Right) presents the size of the data transmitted between the server and clients. As client devices in FedADMP send only the activation rather than the entire model parameters to the server, the size of the data transmitted between the server and clients is small. Similar observations can be made for other accuracy levels and hence are omitted here.
Impact of the number of clients: Although collecting information from a large number of clients helps improve the accuracy of anomaly detection, it also increases the search space for mobility prediction.
Figure 12 (a) shows that within the same amount of training time, when the number of clients increases, the model accuracy reduces slightly. This is because, when the number of clients increases, the number of training epochs increases. On the other hand, for a certain accuracy (e.g., 40%), it takes less time to achieve such an accuracy when the number of clients increases, as more data is used for training in each epoch, as shown in Figure 12 (b).
Impact of differential privacy parameters: We also investigate the impact of the differential privacy parameter ϵ on the performance of FedADMP. Figure 13 shows that the accuracy of FedADMP increases when ϵ increases (i.e., when the differential privacy requirement is less strict). We also observe that, when ϵ is small (i.e., less than 40), the accuracy of FedADMP increases dramatically when ϵ increases slightly, especially for the SF trajectory. This is because the SF trajectory has more location IDs than the others. As the relation between locations is much more diverse with more features for the trajectory data, the larger the number of location IDs, the less ambiguous the trajectory is. When ϵ is greater than 40, the accuracy of FedADMP is less sensitive to the DP, i.e., the marginal improvement from a change of ϵ is smaller.
We further study the security of our model based on our differential privacy (DP) method and the attack model described in Section 3.2. In particular, we define the attack risk as $|ls_{attack} \cap ls_{truth}| / |ls_{truth}|$, where $ls$ stands for the location set, $ls_{truth}$ is the true location set, and $ls_{attack}$ is the estimated location set based on differences between the downloaded model and the uploaded model. Table 1 gives the attack risk of FedADMP for different values of the DP parameter ϵ. Furthermore, Figure 13 and Table 1 together show that the more noise is added, the better the privacy protection, at the cost of model accuracy. Thus, there exists a tradeoff between the privacy and the accuracy for specific conditions, which provides us with a control knob based on different application requirements.
The impact of model size: Since the client devices in ubiquitous computing systems are usually limited in storage, energy, and computational resources, we hope to design a model that has a small number of parameters. However, models with fewer parameters usually have worse performance. As a result, it is important to achieve a desirable tradeoff between the number of parameters and the performance in ubiquitous computing systems. We conduct experiments to evaluate the impact of the model size on the accuracy of the model. In particular, we focus on the impact of hidden layers. For simplicity, we set the dimension $h_r$ of the hidden state in the LSTM to be the same as the dimension $h_l$ of the input and output state, which denotes the size of the hidden state (i.e., the hidden size). Figure 14 shows that the model accuracy varies with the hidden size in all datasets. In particular, the smaller the hidden size, the worse the performance of the model. Therefore, we choose 128 as the default value of the hidden size for all datasets.
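The attack-risk metric used for Table 1 is a set-overlap ratio. A minimal sketch, with a hypothetical function name and locations represented by their integer IDs, is:

```python
def attack_risk(ls_attack, ls_truth):
    """Fraction of the true location set that the attacker recovers.

    ls_attack: locations estimated by the attacker from model differences.
    ls_truth:  locations actually visited by the client.
    """
    truth = set(ls_truth)
    if not truth:
        raise ValueError("ls_truth must be non-empty")
    return len(set(ls_attack) & truth) / len(truth)
```

A value of 1.0 means that every true location was recovered and 0.0 means that none was. Locations guessed by the attacker that are not in the true set do not change the value, so the metric behaves like a recall rather than a precision.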

Joint model vs. individual model: A natural question is what advantage FedADMP, a joint model for anomaly detection and mobility prediction, has over two separate models (one for each task). Below, we address this question by comparing FedADMP against executing the two tasks separately in terms of communication costs, time consumption, and energy cost. As shown in Figure 15, FedADMP achieves better performance in all metrics. There are two major reasons. First, FedADMP reduces the training time, which results in less energy consumption on client devices. At the same time, the communication cost is reduced since fewer epochs are needed to achieve the same accuracy. Second, FedADMP processes the two tasks simultaneously, whereas having a separate model for each task requires processing the two models one after the other, which incurs additional overhead.
Robustness of FedADMP: Unlike the servers in well-managed data centers that run traditional machine learning models, client devices in ubiquitous computing systems often have varying amounts of computing power. This makes it challenging for the coordinator to efficiently identify and manage valuable participants. Furthermore, client devices often vary in system performance and may slow down or drop out of the network. To that end, we evaluate the robustness of FedADMP in a scenario where not all client devices are available for training and testing, and characterize the performance tradeoff in terms of model accuracy, energy consumption, and communication costs.
We consider a scenario with 100 client devices, and randomly select some client devices for training and testing under the assumption that the remaining client devices are not available. For ease of exposition, we only present results for three cases: (i) FedADMP-40: 40 client devices are used for training and testing; (ii) FedADMP-70: 70 client devices participate in training and testing; and (iii) FedADMP-100: all client devices participate in training and testing. Figure 16 presents the results of FedADMP in terms of convergence. The figure shows that when the number of devices involved in the training is small (e.g., FedADMP-40), it takes more epochs but less time to converge, and the accuracy is lower. This is because, when fewer client devices are involved in training, less data is used for training. Hence each epoch is shorter, and it takes more epochs to converge, at the cost of a lower accuracy. More importantly, we observe that when a relatively large number of client devices participate in the training (e.g., FedADMP-70), FedADMP achieves almost the same accuracy as FedADMP-100 with a relatively small convergence time. One insight is that some client devices may not provide enough meaningful information for the whole system model, and the information provided by different client devices may overlap; hence, once enough client devices are involved, we obtain desirable results. Our observations coincide with Figure 17, where FedADMP-40 consumes the least energy (as it takes the least time to converge), but has the highest communication costs (as it takes more epochs to converge). Similar trends are observed in other trajectories, and hence are relegated to Appendix A.
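The partial-participation setting above can be sketched as follows. This is a minimal illustration, not our implementation: the helper names `select_available_clients` and `aggregate` are hypothetical, the client updates are toy vectors, and the sample-weighted average is a FedAvg-style aggregation rule assumed here for concreteness.

```python
import random


def select_available_clients(num_clients, num_available, seed=0):
    """Randomly pick the available clients; the rest are treated as unavailable."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_clients), num_available))


def aggregate(updates, num_samples):
    """Sample-weighted average of client updates (FedAvg-style; an assumption)."""
    total = sum(num_samples)
    dim = len(updates[0])
    return [
        sum(n * u[i] for u, n in zip(updates, num_samples)) / total
        for i in range(dim)
    ]


# The three cases reported in the text: 40, 70, and 100 of 100 clients.
for k in (40, 70, 100):
    available = select_available_clients(100, k)
    print(f"FedADMP-{k}: {len(available)} clients participate")

# Toy aggregation over two hypothetical clients holding 1 and 3 samples.
toy_updates = [[1.0, 2.0], [3.0, 4.0]]
toy_samples = [1, 3]
print(aggregate(toy_updates, toy_samples))  # [2.5, 3.5]
```

Only the selected clients contribute updates in a round, so a smaller selection means less data per epoch, which is consistent with the shorter epochs observed for FedADMP-40.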

Related work
Anomaly detection and mobility prediction is an emerging topic in ubiquitous computing systems, which has recently been considered by a large number of researchers. Markov models are widely used in this area to cluster the location information from the trajectories [9]. Algorithms with performance guarantees have been proposed to capture the tradeoff between detection latency and accuracy [45][46][47]. To deal with the issue of data sparsity, matrix factorization has been introduced and applied (e.g., [48]). With the tremendous success of deep neural networks in various domains, deep learning based models have been proposed to solve this problem (e.g., [7,8,10,24,49]). While these state-of-the-art methods achieve reasonable performance in domain-specific applications, none of them protects the privacy of the client data. Furthermore, existing works on target recognition have focused on synthesizing photos from sketches. For example, Tang et al. [50] proposed techniques to automatically match hand-drawn sketches of human faces with photos. Kumar et al. [51] were inspired by the characteristics displayed by attribute-based representations in other pattern recognition problems and by the desire to perform semantic search on face images, and designed a solution that achieves significant recognition accuracy. Klare et al. [52] developed an algorithm to perform automated extraction. The proposed algorithm operates by performing facial component positioning and alignment, followed by texture descriptor encoding and support vector regression. For target tracking, Brooks et al. [53] described a prediction-based sensor collaboration scheme that uses estimates of target velocities to activate regions of sensors. Estrin et al. [54] developed the directed diffusion approach to move sensor data in a network, which seeks to minimize the communication distance between data sources and data sinks.
Privacy-preserving methods such as differential privacy [21] and k-anonymity [55] were proposed to address the increasing demands of privacy regulations. However, these methods protect data at the cost of destroying the data structure, which impacts the performance of ADMP. Federated learning (FL) was recently proposed to protect the privacy of mobile devices. Many optimizations (e.g., [56,57]) have been developed to reduce the communication and computation costs of FL. However, existing FL-based models perform training heavily on client devices, which becomes a critical bottleneck for resource-constrained client devices in ubiquitous computing environments. Some other works (e.g., [58]) augmented FL with privacy through differential privacy techniques by adding artificial noise to the whole model in the aggregation stage, which imposes overhead on the model. Another line of work focuses on split learning (SL) [29,30,59,60,61,62], a collaborative deep learning technique that splits a deep learning network into two parts: a client-side network and a server-side network. The training of the network is done in a sequential manner, where the server trains with one client and then moves to another client. However, such sequential training engages only one client at a time and hence is not efficient. Vertical federated learning (VFL) enables multiple parties that own different attributes (e.g., features and labels) of the same data entity (e.g., a person) to jointly train a model [63][64][65]. In contrast to VFL, where clients transfer the whole model to the server, FedADMP transfers only the activation from clients to the server. The work closest to ours is PMF [23], an FL-based method for mobility prediction. However, PMF is inefficient for ADMP, with relatively high computation and communication costs on client devices. In contrast to the above works, FedADMP performs anomaly detection and mobility prediction simultaneously with privacy guarantees.
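The idea of splitting a network into a client-side part and a server-side part, with only the activation crossing the boundary, can be sketched as follows. This is a toy forward pass under stated assumptions and not the FedADMP architecture: the layer shapes, the weights, and the function names `client_forward` and `server_forward` are hypothetical, and no differential privacy noise is applied.

```python
def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def client_forward(raw_input, client_w, client_b):
    """Client side: the raw data stays local; only the activation is returned."""
    return relu(linear(raw_input, client_w, client_b))


def server_forward(activation, server_w, server_b):
    """Server side: completes the forward pass from the uploaded activation."""
    return linear(activation, server_w, server_b)


# Toy weights, chosen only for illustration.
client_w = [[1.0, -1.0], [0.5, 0.5]]
client_b = [0.0, 0.0]
server_w = [[1.0, 2.0]]
server_b = [0.5]

x = [2.0, 1.0]                                      # private client data
activation = client_forward(x, client_w, client_b)  # uploaded to the server
output = server_forward(activation, server_w, server_b)
print(activation, output)  # [1.0, 1.5] [4.5]
```

In this sketch the server sees `activation` but never `x` or the client-side weights, which is the property that distinguishes activation upload from uploading raw data or the whole local model.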

Conclusion
In this paper, we considered the joint anomaly detection and mobility prediction (ADMP) problem in ubiquitous computing systems. We proposed FedADMP, a federated learning-based framework for ADMP. To reduce the computation loads on resource-constrained client devices, the client devices and the cloud server work collaboratively on the training process. To protect the privacy of user data, each client device uploads only the activation, instead of the raw data or the whole local model, to the cloud server. We also developed a differential privacy method to further protect privacy, and strengthened the robustness of FedADMP when only a subset of client devices is available for training, a situation that often occurs in emerging applications in ubiquitous computing systems. Our experimental results show that FedADMP consistently outperforms state-of-the-art methods in terms of model accuracy, with dramatically reduced energy consumption and computation costs at client devices.