Device Authentication Codes based on RF Fingerprinting using Deep Learning

In this paper, we propose the Device Authentication Code (DAC), a novel method for authenticating IoT devices with wireless interfaces by exploiting their radio frequency (RF) signatures. The proposed DAC combines RF fingerprinting, an information-theoretic method, feature learning, and the discriminatory power of deep learning. Specifically, an autoencoder is used to automatically extract features from the RF traces, and the reconstruction error is used as the DAC, which is unique to the device and the particular message of interest. A Kolmogorov-Smirnov (K-S) test is then used to match the distribution of the reconstruction error generated by the autoencoder and the received message, and the result determines whether the device of interest belongs to an authorized user. We validate this concept on two experimentally collected sets of RF traces from six ZigBee and five universal software radio peripheral (USRP) devices, respectively. The traces span a range of signal-to-noise ratios (SNR), obtained by varying the locations and mobility of the devices as well as channel interference and noise, to ensure robustness of the model. Experimental results demonstrate that DAC is able to prevent device impersonation by extracting salient features that are unique to any wireless device of interest and can be used to identify RF devices. Furthermore, the proposed method does not need RF traces of the intruder during model training, yet it is able to identify devices not seen during training, which makes it practical.


I. INTRODUCTION
We introduce a novel method, called the Device Authentication Code (DAC), for authenticating IoT devices with wireless interfaces based on their radio frequency (RF) signatures. DAC exploits the potential of RF fingerprinting and the feature learning power of autoencoders. DAC is similar to the message authentication code (MAC) approach to message authentication in cryptographic network security applications. With the DAC, any IoT device with a wireless interface that transmits a wireless signal can be authenticated and the integrity of the transmitted signal verified, because the method exploits features inherent to the device alone that uniquely distinguish that device from any other device.
The process of device authentication using the DAC scheme is depicted in Figure 1. An autoencoder (AE) is trained to reconstruct the inputs given to it by minimizing the reconstruction error. This reconstruction error is used as the device authentication code (DAC). A device passes the signal to be transmitted through a pretrained AE-based model. The signal is concatenated with the reconstruction error (DAC) and transmitted. At the receiver, the received signal is decoupled into the original signal and the DAC. The signal is then passed through the AE model deployed at the receiver to generate another DAC. Device authentication is done by comparing the two DACs using the Kolmogorov-Smirnov statistic. An exact match between the two DACs authenticates the transmitting device. If confidentiality is required, the encoded version of the signal could be transmitted instead of the raw signal.
The identification of IoT devices based on the physical characteristics of their built-in components is applicable not just for authentication but also for tracking both the device and its user [1]. The fingerprints are typically created during the manufacturing process as the base materials of the components are created. The creation of fingerprints is usually accidental or inherent to the process. However, it is possible to generate and insert them on purpose. In either case, the RF fingerprints are a result of minute variations in the electronic components [1]. When appropriately analyzed, RF fingerprints can be exploited to identify and distinguish one device from another, even devices of the same model from the same manufacturer [1]. Unlike RF features, identifiers at other layers such as MAC addresses and International Mobile Subscriber Identity (IMSI) are relatively easy to impersonate [2], [3].
In our previous work [4], we proposed an intrusion detection model pipeline based on RF fingerprinting that combines deep learning, dimension reduction and clustering models. After training, the model pipeline groups the RF traces from authorized devices into separate clusters, one for each authorized device. When RF traces from an unauthorized device go through the pipeline, they form a new cluster (the intruder). This framework can then detect the presence of an intruder by determining the number of unique clusters (devices) in the perimeter of interest. However, this model cannot ascertain which RF trace belongs to an intruder. The DAC is a more holistic and robust approach that mitigates this limitation.
The rest of the paper is structured in the following manner: In Section II, we present background on the components from which the inspiration for our work is drawn. An explanation of the proposed approach is given in Section III. In Section IV, information on the experiments, including data collection and analysis of results, is presented. Related work is discussed in Section V, and Section VI concludes the paper.

II. BACKGROUND
Device identification via RF fingerprinting typically involves data collection, processing, feature extraction and device identification. In this work we apply deep learning to automate these tasks because of its success in feature learning and its classification accuracy across multiple domains such as computer vision, speech, natural language and signal processing. However, in supervised deep learning, samples and labels from all classes of interest must be present during training. During inference, a test sample from a class not observed during training will be classified as one of the already seen classes. For this reason we adopt unsupervised learning.
Message authentication comprises methods used to verify the identity of the sender and/or the integrity of the message (meaning it has not been modified, deleted or is being replayed). Traditionally, the main methods for performing authentication are: message encryption, the use of message authentication codes (MAC) and hash functions. In this section we briefly introduce the MAC approach.

A. Message Authentication Code (MAC)
Consider sending a message $M$ between two parties A and B that share a secret key $K$. To transmit a message to B, A uses $K$ to create a message authentication code $MAC_S$, a fixed-size cryptographic checksum computed as a function $MAC = C(K, M)$ of the message and the shared key. The MAC is appended to the message and transmitted. B applies the MAC function to the received message using the secret key and generates a new code $MAC_R$. The newly generated $MAC_R$ is compared with the received $MAC_S$. If $MAC_S = MAC_R$, then: 1) B is assured that $M$ has not been altered, because if an intruder modifies $M$ without modifying $MAC_S$, then $MAC_R$ will not match $MAC_S$. Also, the intruder cannot modify $MAC_S$ to reflect changes in the message because the intruder does not have $K$. 2) B is also assured that the message came from A, because only A has the key $K$ required to generate a message with the correct MAC. The above scheme describes the approach to authentication. There is no confidentiality because anyone can have access to $M$. Traditionally, confidentiality is introduced by encrypting $M$ either before or after the MAC procedure. The latter is the popular choice. However, both parties then need two sets of keys.
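The MAC exchange described above can be illustrated with a short sketch. It assumes HMAC-SHA256 from the Python standard library as one concrete choice for the function $C(K, M)$; the key and message values are hypothetical and chosen only for illustration.

```python
import hashlib
import hmac


def make_mac(key: bytes, message: bytes) -> bytes:
    """Sender side: compute MAC_S = C(K, M), here instantiated as HMAC-SHA256."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_mac(key: bytes, message: bytes, received_mac: bytes) -> bool:
    """Receiver side: recompute MAC_R and compare it with the received MAC_S."""
    mac_r = make_mac(key, message)
    # compare_digest performs a constant-time comparison of the two codes.
    return hmac.compare_digest(mac_r, received_mac)


shared_key = b"example-shared-key"    # hypothetical secret key K
message = b"sensor reading: 21.5 C"   # hypothetical message M
mac_s = make_mac(shared_key, message)

# Unmodified message with the right key: the codes match.
print(verify_mac(shared_key, message, mac_s))
# Altered message: MAC_R no longer matches MAC_S.
print(verify_mac(shared_key, b"sensor reading: 99.9 C", mac_s))
# Receiver without the correct key: the codes do not match either.
print(verify_mac(b"wrong-key", message, mac_s))
```

As in the description, the message itself travels in the clear, so this sketch provides authentication and integrity but no confidentiality.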

B. Autoencoders
Autoencoders (AE) are neural networks with the objective of reconstructing the data input into them (Figure 2). Mathematically, the autoencoder attempts to learn an approximation of the identity function, so that its output $\hat{X}$ is close to its input $X$, by minimizing the "reconstruction error" between the input and its reconstruction. First the AE learns an "encoded" representation of the data by extracting the inherent structure in the data [5]. Learning the encoded representation can be achieved by restricting the number of nodes in the encoding layers, as in undercomplete autoencoders [6]. Overcomplete autoencoders learn structure by imposing other regularization constraints on the encoding layer, such as sparsity in sparse autoencoders [7] or the addition of noise in denoising autoencoders [8].
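For reference, one common way to write the autoencoder objective is sketched below. The encoder $f_\theta$, the decoder $g_\phi$, the number of training samples $N$, and the choice of a squared-error loss are notation assumed here for illustration only.

```latex
\hat{X} = g_\phi\bigl(f_\theta(X)\bigr) \approx X,
\qquad
\mathcal{L}(\theta, \phi) = \frac{1}{N} \sum_{i=1}^{N}
  \bigl\lVert X_i - g_\phi\bigl(f_\theta(X_i)\bigr) \bigr\rVert_2^2 .
```

Under this reading, an undercomplete autoencoder is one in which the output of $f_\theta$ has fewer dimensions than $X$.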
Convolutional autoencoders (CAE) exploit spatial relationships in data by weight sharing [9]. AEs can be extended to

C. Kolmogorov-Smirnov (K-S) test
The K-S test is a non-parametric test used to ascertain whether a sample comes from a population whose distribution is known, or whether the distributions of two populations are the same. In the one-sample test, the empirical distribution of a one-dimensional sample is compared to a reference probability distribution. In the two-sample test, two samples from two distributions are compared. The empirical distribution function (EDF) $F_n$ for $n$ independent and identically distributed (i.i.d.) ordered observations $X_i$ is defined as
$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I_{[-\infty, x]}(X_i),$$
where $I_{[-\infty, x]}(X_i)$ is an indicator function that equals 1 when $X_i \le x$ and 0 otherwise. The K-S statistic for a given cumulative distribution function $F(x)$ is then
$$D_n = \sup_x \left| F_n(x) - F(x) \right|,$$
where $\sup_x$ is the supremum of the set of distances. The K-S statistic converges to 0 as $n$ goes to infinity if the sample is from the distribution $F(x)$. Analogously, for the two-sample test, given two empirical distributions $F_{1,n}$ and $F_{2,m}$ with sample sizes $n$ and $m$, respectively, the K-S statistic for the first and second sample is
$$D_{n,m} = \sup_x \left| F_{1,n}(x) - F_{2,m}(x) \right|.$$
Given a specified level $\alpha$, the null hypothesis can be rejected for large sample sizes if
$$D_{n,m} > c(\alpha) \sqrt{\frac{n + m}{n m}},$$
where in general
$$c(\alpha) = \sqrt{-\frac{1}{2} \ln\frac{\alpha}{2}}.$$
It is possible to set confidence limits on $F(x)$ such that, for the test statistic $D_\alpha$, if $P(D_n > D_\alpha) = \alpha$, then $F(x)$ will be contained within a band of width $\pm D_\alpha$ around $F_n(x)$ with a probability of $1 - \alpha$. The null hypothesis is that both samples are drawn from the same distribution, and the p-value is a measure of similarity. If the p-value is "small", the null hypothesis should be rejected. The K-S test measures the distance between the empirical distribution functions of both samples without any assumptions about the distribution of the data. Unlike the t-test, the K-S test is robust to scale changes and is not restricted to identifying changes only in the mean.
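The two-sample statistic and the large-sample rejection threshold described above can be sketched in a few lines of standard-library Python. The function names are chosen here for illustration; in practice a library routine such as `scipy.stats.ks_2samp` would normally be used instead.

```python
import math


def ks_two_sample(sample_1, sample_2):
    """Two-sample K-S statistic: the largest gap between the two EDFs."""
    a, b = sorted(sample_1), sorted(sample_2)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # The EDFs only change at observed values, so it is enough to step
    # through the pooled observations in increasing order.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample threshold c(alpha) * sqrt((n + m) / (n * m))."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha * math.sqrt((n + m) / (n * m))


same = ks_two_sample([1, 2, 3], [1, 2, 3])      # identical samples
disjoint = ks_two_sample([1, 2], [3, 4])        # non-overlapping samples
print(same, disjoint, ks_critical_value(100, 100))
```

The null hypothesis is rejected at level alpha when the statistic returned by `ks_two_sample` exceeds the value returned by `ks_critical_value`.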

III. PROPOSED APPROACH

A. Problem Statement
We herein restate the problem for the purpose of emphasis. There are n RF devices with wireless interfaces. All devices are of the same make and model and are considered identical (figure 4). All devices are made to transmit the same information. This procedure is repeated but at different noise levels. One of the devices is considered an intruder. Another constraint is that the RF traces from the intruder device are not available for the training of the model. The objective is to authenticate any device of interest and identify the intruder.

B. Mathematical formulation
We represent the output of the manufacturing process for a batch of RF devices, such as sensors, by a signal component $S$ made up of two parts, $S_o$ and $\mu_M$. $S_o$ represents the features common to every device in the batch and is required for any device to pass quality control. $\mu_M$ accounts for minor differences and uncertainties due to the imperfection of the manufacturing process. An autoencoder is a mapping that encodes an input $X$ to a latent representation $Z$, and decodes $Z$ to recover a reconstruction $\hat{X}$ of $X$. Since there is no explicit formula for $\hat{X}$, the reconstruction error between $X$ and $\hat{X}$ is instead minimized during training using data. In wireless communications there are also environmental factors $\varepsilon$, such as channel fading, thermal noise, and effects of device mobility, that are superimposed on the received signal. Therefore the training data $X$ can be regarded as the output of a mapping, generated by the underlying stochastic process, of $S$, which contains the manufacturing uncertainty $\mu_M$, and $\varepsilon$, which contains the environmental uncertainty. Based on the premise of RF fingerprinting, $\mu_M$ is unique to every RF device. Hence, for a batch of devices with components $S_i$, it is possible to identify $S_i$ for all $i \in N$ using a method that is robust to the environmental uncertainty $\varepsilon$ that affects the data $X$ obtained from the batch of devices. We show experimentally that this can be achieved using an autoencoder and a two-sided K-S test.

C. Intrusion Detection with Autoencoder and K-S test
Autoencoder-based models have been used for anomaly and novelty detection in other domains [11], [12]. Autoencoders were used to detect abnormalities in machines by detecting abnormal operation sounds [13], and to detect anomalies in video frames [14]. The idea is based on the fact that a trained autoencoder will produce a low reconstruction error for data from the same or a similar distribution as the training data, but a high reconstruction error otherwise. Hence the reconstruction error is thresholded and used to identify anomalies.
However, for our specific problem of interest, the devices are of the same make and model, and may transmit identical signals (see Figure 6). In this case, the threshold approach does not work. Figure 3 shows the reconstruction error during inference for a CAE trained on five out of six devices, with RF traces from one device left out. It is clear that a single threshold will not suffice to identify a novel device (say, any one of the six devices not used in training). Instead we require a metric that can capture and differentiate between distributions. Figure 5 depicts the DAC process. An autoencoder is trained on the RF traces from authorized devices. During inference, the distribution of the mean square error (MSE) between signals and their reconstructions will be unique to each device. This holds true even for devices of the same make and model that transmit identical signals. The MSE values are analogous to a fingerprint, and we use them as the DAC in this study.
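The idea of using the per-signal MSE values as the DAC can be sketched as follows. The `toy_autoencoder` function is a hypothetical stand-in for a trained CAE (a three-tap moving average that reconstructs its input imperfectly), and the frames are synthetic; neither comes from the experiments reported here.

```python
import math
import random


def frame_mse(frame, reconstruction):
    """Mean squared error between one frame and its reconstruction."""
    return sum((x - y) ** 2 for x, y in zip(frame, reconstruction)) / len(frame)


def compute_dac(frames, autoencoder):
    """DAC: the collection of per-frame reconstruction errors."""
    return [frame_mse(frame, autoencoder(frame)) for frame in frames]


def toy_autoencoder(frame):
    """Hypothetical stand-in for a trained CAE: a lossy 3-tap moving average."""
    n = len(frame)
    out = []
    for k in range(n):
        window = frame[max(0, k - 1):min(n, k + 2)]
        out.append(sum(window) / len(window))
    return out


# Synthetic frames (assumption): a sinusoid plus small Gaussian noise.
rng = random.Random(0)
frames = [[math.sin(0.3 * t) + rng.gauss(0.0, 0.05) for t in range(64)]
          for _ in range(20)]

dac = compute_dac(frames, toy_autoencoder)
print(len(dac), min(dac), max(dac))
```

The resulting list of MSE values, rather than any single value, is what would be compared between sender and receiver with a distribution-level test such as the K-S test.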
For an application such as device registration on a network, the weights of the trained AE are deployed in each authorized device as the secret key. Every device will be required to transmit a predetermined signal upon start-up. The start-up signal is concatenated with its DAC before transmission. A similar setup applies for message authentication for intrusion detection, except that any signal can be transmitted by a registered device. The receiving device decodes the signal and reconstructs the original signal using its own key. The resulting DAC is compared to the DAC received from the sending device. A match (i.e., a K-S statistic of 0 and a p-value of 1) means that the device is an authorized (pre-registered) device; otherwise the device is flagged as a new device (possibly an intruder).
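The registration exchange described above can be sketched end to end. Everything concrete in this sketch is an assumption made for illustration: the "key" is a hypothetical three-weight filter standing in for the trained AE weights, the start-up signal is synthetic, and the channel is taken to be noiseless so that an authorized device reproduces the DAC exactly.

```python
import math


def make_toy_autoencoder(weights):
    """Hypothetical stand-in for an AE; its weights play the role of the key."""
    w_prev, w_mid, w_next = weights

    def reconstruct(frame):
        last = len(frame) - 1
        return [w_prev * frame[max(k - 1, 0)] + w_mid * frame[k]
                + w_next * frame[min(k + 1, last)] for k in range(len(frame))]

    return reconstruct


def compute_dac(frames, autoencoder):
    """Per-frame MSE between each frame and its reconstruction."""
    dac = []
    for frame in frames:
        recon = autoencoder(frame)
        dac.append(sum((x - y) ** 2 for x, y in zip(frame, recon)) / len(frame))
    return dac


def ks_statistic(sample_1, sample_2):
    """Two-sample K-S statistic between two lists of values."""
    a, b = sorted(sample_1), sorted(sample_2)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def send(frames, key_weights):
    """Sender: transmit the signal together with its DAC."""
    return frames, compute_dac(frames, make_toy_autoencoder(key_weights))


def authenticate(received_frames, received_dac, key_weights):
    """Receiver: recompute the DAC with its own key and require an exact match."""
    local_dac = compute_dac(received_frames, make_toy_autoencoder(key_weights))
    return ks_statistic(received_dac, local_dac) == 0.0


AUTHORIZED_KEY = (0.25, 0.5, 0.25)   # hypothetical deployed weights
INTRUDER_KEY = (0.2, 0.6, 0.2)       # hypothetical guess by an intruder

# Synthetic start-up signal (assumption): 20 frames of phase-shifted sinusoids.
startup = [[math.sin(0.3 * t + 0.1 * f) for t in range(64)] for f in range(20)]

print(authenticate(*send(startup, AUTHORIZED_KEY), AUTHORIZED_KEY))
print(authenticate(*send(startup, INTRUDER_KEY), AUTHORIZED_KEY))
```

In this sketch the sender with the deployed weights reproduces the receiver's DAC exactly, while a sender with different weights produces a different error distribution and is flagged.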
In the device registration scenario, even if an intruder knows the signal being transmitted and attempts to use a device of the same make and model to transmit an identical signal, the intruder's DAC will not match at the receiver because the intruder does not have the exact same AE weights (key). Furthermore, if the message to be transmitted requires confidentiality, the encoded form of the message obtained from the encoder of the autoencoder can be transmitted instead of the original signal. This adds confidentiality and another layer of security. This approach is very secure because the probability of obtaining the exact set of network weights installed in the device is very low.

IV. EXPERIMENTAL RESULTS
An NI USRP-293x configured in receiver mode using LabVIEW was used to capture the RF traces. All ZigBee devices were configured to transmit at 0, -1, -5, -10, and -15 dBm SNR, and the USRPs to transmit at [-10 dB, 10 dB] in steps of 2 dB. This is done to simulate a real-life scenario in which RF traces from one of the devices are not used for training the model (Figure 4). The RF traces consist of In-phase (I) and Quadrature (Q) vectors. A sample of the identical RF traces obtained from the ZigBee devices is shown in Figure 6.
2) Training and Authentication: The raw RF traces from the authorized devices are used to train the AE-based model. The model performs automated feature extraction and dimension reduction on the data and, during inference, a K-S test is performed on any RF trace to ascertain whether it is from an authorized device.
The performance of the proposed approach is evaluated on the generated datasets. As previously stated, one device is considered the intruder and left out of training. 90% of the IQ samples from the other, authorized devices at varying SNR levels are used for training, half of the remaining 10% are used for validation, and the remaining 5% of samples are mixed with IQ samples from the intruder class for testing. It is worth mentioning again that the data are collected for each device at different SNR levels and combined in order to mimic multipath effects, variation in channel conditions, and noise. Table I shows the K-S statistic and p-values when comparing the DAC of the raw RF traces of all devices. We can consider the rows and columns to represent the sending and receiving devices, respectively. For example, in row one, the DAC for RF traces from device 0 is compared with the DAC computed at the receiver on the RF traces obtained from every device, including device 0 itself. The same is done for every RF device of interest. The first and second elements of the tuple in every cell of the table are the K-S statistic and the p-value of the K-S test, respectively. As previously mentioned, the null hypothesis is that both samples are from the same distribution.
Fig. 6: Sample of RF I and Q data captured from all ZigBee devices
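The 90% / 5% / 5% partition described above can be sketched as follows. The shuffling step, the seed, the labels, and the function names are assumptions made for illustration; the integers used in the usage example are placeholders for IQ samples.

```python
import random


def split_authorized(samples, seed=0):
    """Split authorized-device samples into 90% train, 5% validation, 5% held out."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)       # shuffling is an assumption
    n_train = (9 * len(shuffled)) // 10
    n_val = (len(shuffled) - n_train) // 2
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    held_out = shuffled[n_train + n_val:]
    return train, val, held_out


def build_test_set(held_out, intruder_samples, seed=0):
    """Mix the held-out authorized samples with samples from the intruder device."""
    labelled = [(s, "authorized") for s in held_out]
    labelled += [(s, "intruder") for s in intruder_samples]
    random.Random(seed).shuffle(labelled)
    return labelled


authorized = list(range(1000))          # placeholder for authorized IQ samples
intruder = list(range(1000, 1050))      # placeholder for intruder IQ samples

train, val, held_out = split_authorized(authorized)
test_set = build_test_set(held_out, intruder)
print(len(train), len(val), len(held_out), len(test_set))
```

The intruder samples appear only in the test set, which mirrors the constraint that no intruder traces are available during training.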

B. Results and Analysis
The null hypothesis cannot be rejected if either the statistic is very low or the p-value is high. Typical values considered as low are in the range [0, 0.1]. On the other hand, the null hypothesis should be rejected if the K-S statistic is high and the p-value is very low. In other words, a K-S statistic in the range [0, 0.1] and a p-value in the range [0.9, 1.0] indicate that the device of interest is an authorized device. Conversely, a K-S statistic greater than 0.1 and a p-value less than 0.9 indicate that the DAC is from a different distribution and the device of interest is not authorized.
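The decision rule above can be sketched as a small function using the stated thresholds of 0.1 and 0.9. The p-value helper is an assumption: it uses the standard asymptotic Kolmogorov series, which is only an approximation for small samples, and the function names are chosen here for illustration.

```python
import math


def ks_p_value(d, n, m, terms=100):
    """Asymptotic two-sample K-S p-value (large-sample approximation)."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        # The alternating series converges slowly for small arguments, where
        # the p-value differs from 1 by a negligible amount.
        return 1.0
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def is_authorized(statistic, p_value, max_statistic=0.1, min_p_value=0.9):
    """Accept when the statistic is in [0, 0.1] and the p-value is in [0.9, 1]."""
    return statistic <= max_statistic and p_value >= min_p_value


# Identical DAC distributions: statistic 0, p-value 1, so the device is accepted.
print(is_authorized(0.0, ks_p_value(0.0, 500, 500)))
# A statistic of 0.2 with 500 samples per side gives a very small p-value.
print(is_authorized(0.2, ks_p_value(0.2, 500, 500)))
```

The two thresholds are exposed as parameters because the appropriate operating point may depend on the sample sizes being compared.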
The values of interest in Table I are highlighted in bold. The first thing to observe is that every cell on the diagonal of the table contains (0.00, 1.00). This indicates a perfect match, which is intuitive since the RF trace is from the authorized device. Secondly, for every other cell, the first element of the tuple is a very small value. This shows that, according to the K-S test, the raw RF traces from all the devices are considered almost identical. Hence the performance of the DAC approach will be judged by how far from zero the K-S statistic is, or how far from one the p-value is, for unauthorized devices. Only an authorized device will have the values (0.00, 1.00), signifying a DAC match. Table II shows a comparison similar to that in Table I, but on the DAC obtained for every device using our approach. It can be observed that the values of the K-S statistic here are much higher than in Table I for the raw RF traces. In fact, only one comparison has a K-S statistic of approximately 0.2. If we consider 0.1 to be the threshold below which we cannot reject the null hypothesis, then the CAE model performs quite well and is able to produce very discriminatory features, even though the original data (RF traces) are almost identical.
Tables III and IV show the performance of the CAE model trained on ZigBee and USRP RF traces, respectively, collected at single noise levels as well as on RF traces from all noise levels mixed together. The devices of interest are device 5 for the ZigBee devices and device 2 for the USRP devices. As expected, the model performs better when tested at single noise levels, especially for the ZigBee devices. However, the model still performs very well on data containing RF traces from all noise levels. This is important because in real-life scenarios the wireless signals will seldom be at one noise level, due to environmental factors. Hence it is important that the model is robust to these different phenomena.
In summary, it is evident that the DAC approach is successful at exploiting device-inherent features. In an authentication scenario, if the K-S statistic and the p-value between the DAC of the data received from a sending device and that of the reconstruction error produced by the receiving device are 0 and 1, respectively, then that device is an authorized device. Otherwise that device can be flagged as an intruder. Furthermore, it has been shown that the model is robust to varying SNR levels in the transmitted signal.

V. DISCUSSION
According to Baldini in [1], the major requirements for fingerprinting, as adopted from the biometric domain, are: 1) universality, meaning that every device must be identifiable by the characteristics of its built-in electrical components; 2) uniqueness, meaning that no two devices should have the same fingerprint signatures or physical characteristics; 3) permanence, which requires that the features must be invariant to time and environmental conditions; 4) collectability, which requires that the characteristics must be quantitatively measurable. All requirements may not be satisfied simultaneously or to the same extent for all components in an IoT device. Furthermore, in current state-of-the-art identification approaches, the features may be time varying, dependent on the environment, or in some cases may not be adequately discriminatory for the identification of the device [1]. Some methods adopt a statistical approach to feature extraction; other methods formulate RF fingerprinting as a classification problem and apply supervised machine learning and deep learning techniques to the statistical features. There are methods that propose automatic feature extraction from the raw RF signals using deep learning, and others adopt unsupervised learning methods.
In [15] the authors apply a statistical approach called "Radio Frequency Distinct Native Attribute (RF-DNA)" for passive interrogation of microwave devices. RF-DNA is a method developed by the Air Force Institute of Technology in 2006. RF-DNA extracts the variance, skewness and kurtosis features from three properties of the received signal, namely its instantaneous amplitude, phase and frequency. The statistical features are then concatenated to form the fingerprint or "DNA". Multiple discriminant analysis (MDA) and a maximum likelihood classifier (MLC) were applied for dimension reduction and classification, respectively. The authors also applied non-parametric random forest and AdaBoost classifiers using RF-DNA features in [16] and obtained better results than with the MLC. Extending this work, Bihl et al. [17] applied multiple-discriminant analysis to reduce    [18] adopt the same RF-DNA method to address physical layer security for cognitive radio networks.
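The kind of statistical fingerprint described above can be illustrated with a simplified sketch that computes the variance, skewness and kurtosis of the instantaneous amplitude, phase and frequency of an I/Q trace and concatenates them. This is not the implementation from [15]; details such as how the signal is segmented are omitted, and the function names are assumptions.

```python
import math


def central_moment(values, order):
    """Central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def var_skew_kurt(values):
    """Variance, skewness and kurtosis of a sequence of values."""
    var = central_moment(values, 2)
    if var == 0.0:
        return [0.0, 0.0, 0.0]   # degenerate case: constant sequence
    skew = central_moment(values, 3) / var ** 1.5
    kurt = central_moment(values, 4) / var ** 2
    return [var, skew, kurt]


def statistical_fingerprint(i_samples, q_samples):
    """Concatenate the statistics of amplitude, phase and frequency (9 values)."""
    amplitude = [math.hypot(i, q) for i, q in zip(i_samples, q_samples)]
    phase = [math.atan2(q, i) for i, q in zip(i_samples, q_samples)]
    frequency = []
    for prev, cur in zip(phase, phase[1:]):
        # Phase difference wrapped to [-pi, pi), a discrete frequency estimate.
        step = (cur - prev + math.pi) % (2 * math.pi) - math.pi
        frequency.append(step)
    return var_skew_kurt(amplitude) + var_skew_kurt(phase) + var_skew_kurt(frequency)


# Synthetic test tone (assumption): constant amplitude and constant frequency.
i_tone = [math.cos(0.2 * t) for t in range(200)]
q_tone = [math.sin(0.2 * t) for t in range(200)]
features = statistical_fingerprint(i_tone, q_tone)
print(len(features))
```

A vector of this kind would then be passed to a dimension-reduction step and a classifier, as described for MDA and the MLC.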
The authors in [19] employed a symbol-based statistical RF fingerprinting approach for the identification of a fake base station (FBS) in a cellular network. Their work leverages the fact that the amplitude and phase errors introduced in the transmitted signal will be larger for an FBS than for a real base station (RBS). The non-linearity introduced by the power amplifier of several software defined radios in the FBS is measured at the user equipment (UE) using a second-order symbol-based error vector magnitude (EVM) approach. After this, the structure of the noise in the signal the UE receives is determined using the kurtosis, specifically the second and fourth order moments. The authors assert that the kurtosis of the magnitude of the noise structure is an effective indicator that can be used to identify an FBS.
Patel in [20] proposed the use of non-parametric methods for feature generation, such as mean, median, mode and trend, rather than parametric methods such as variance, standard deviation, skewness and kurtosis. By testing this using a random forest classifier, he asserts that non-parametric features result in improved classification. In [21] the authors tackle the problem of identifying a node by its fingerprint. In their work they use features extracted from the complex amplitude and phase angle of WiFi signals. The Hilbert transform is applied as data preprocessing. However, they discard the phase profiles and use the amplitude profiles for feature extraction. Principal component analysis (PCA) was used to reduce the dimension, and the reduced data is fed into a neural network for classification.
There are many other works that follow the approach of statistically generating RF fingerprints and feeding them into a classification model, such as [22], [23], [24]. Similar to this work, some works such as [25] perform classification directly on the raw RF data. These works do not consider that training data of some classes may be unavailable. Moreover, in most of the works, the proposed approaches are tested on data obtained at a single noise level, which does not always reflect real-life scenarios.
An anomaly detection technique based on a deep predictive coding neural network for analyzing RF spectrum in wireless systems was proposed in [26]. In their work, frequency-domain data obtained from time-domain data were stored as sequential 2D images. The image sequences were then fed into a deep learning video predictor which attempts to predict the next frame from previous frames. Anomaly detection is triggered when there is a deviation between the actual and predicted spectrum behavior. In [27], an anomaly identification method for temporal-spectral data was proposed. The authors generate models from historical data and compare the historical data with real-time data for intrusion detection. These approaches do not require RF data from all classes. However, there is a challenge of specifying what normal system behavior means, as well as defining an appropriate threshold.
There have been recent works that apply generative models. The work by Roy et al. in [28] proposes the use of GANs for the detection of rogue RF transmitters. They use the generator model of the GAN to learn the sample space of the IQ values of the authorized transmitters. They then generate fake signals that mimic the transmissions of the authorized transmitters from the learned representation. The authors test their model with fake data generated by the model, as opposed to actual data from an unauthorized device. Furthermore, data was collected at one SNR level of 45 dB, which implies a very strong signal. More recently, the authors in [29] introduced a method called spectrum anomaly detector with interpretable features (SAIFE), an anomaly detection framework for wireless spectrum based on adversarial autoencoders. In their work, the authors make use of power spectral density (PSD) data. They showed that their model can achieve a constant false alarm rate in an unsupervised setting, whereas in a semi-supervised setting the model is capable of learning intuitive features such as class center frequency and signal bandwidth. Their approach achieves very impressive results while being exposed to just 20% of labeled data samples.
To the best of our knowledge, most of the existing methods in the literature do not consider the unavailability of data from some classes (e.g., the intruder) during training. For the more recent works that use generative models and consider identifying novel classes, the models are trained and tested with data of only one SNR level, which may not be representative of certain real-life scenarios. Furthermore, to the best of our knowledge, this is the first work that combines deep learning with an information-theoretic approach to authentication in the RF domain. Our approach has been verified on data that contains different SNR levels to represent the real-life effects of varying channel conditions, device mobility and noise.

VI. CONCLUSIONS
In this work, we propose a novel framework for intrusion detection based on RF fingerprinting using deep learning. Specifically, the problem of identifying an authorized device or an intruder from a set of devices of the same make, model and manufacturer sending the exact same information is considered, and a novel concept of Device Authentication Code (DAC) is proposed. In the proposed framework, an autoencoder is used to automatically extract features from the RF traces, and the reconstruction error is used as the DAC, which is unique to each device and the particular message of interest. A Kolmogorov-Smirnov (K-S) test is then used to match the distribution of the reconstruction error generated by the autoencoder and the received message, and the result determines whether the device of interest belongs to an authorized user. We validate this concept on two experimentally collected sets of RF traces from six ZigBee and five universal software radio peripheral (USRP) devices, respectively. Experimental results demonstrate that DAC is able to prevent device impersonation by extracting salient features that are unique to any wireless device of interest and can be used to identify RF devices. Furthermore, we show that our method is robust to changes in channel conditions, mobility and varying signal strength. It is worth noting that the proposed method does not need RF traces of the intruder during model training, yet it is able to identify devices not seen during training, which makes it practical.