deMSF: a Method for Detecting Malicious Server Flocks for Same Campaign

Nowadays, cybercriminals tend to leverage dynamic malicious infrastructures with multiple servers to conduct attacks, such as malware distribution and control. Compared with a single server, employing multiple servers makes attacks more efficient and stealthy. Because infrastructure plays such an essential role, many approaches have been proposed to detect malicious servers. However, many existing methods target only individual servers and therefore fail to reveal the inter-server connections of an attack campaign. In this paper, we propose a complementary system, deMSF, to identify server flocks, which are formed by the infrastructure involved in the same malicious campaign. Our solution first acquires server flocks by mining the relations of servers from both spatial and temporal dimensions. We then extract semantic vectors of servers based on word2vec and build a textCNN-based flock classifier to recognize malicious flocks. We evaluate deMSF with real-world traffic collected from an ISP network. The results show that it achieves a high precision of 99% with 90% recall.


Introduction
Malicious web activity is still a major threat to the Internet. Nowadays, cybercriminals build malicious web infrastructures to support their crimes, which makes attacks complicated and diversified. Dark infrastructures today contain multiple servers (e.g., exploit servers, command & control servers, redirect servers, payment servers). Adversaries increasingly combine these servers into a platform to spread malicious content, launch attacks and monetize their crimes. A typical example is malicious redirection, which leverages exploit servers to redirect visitors to another website.
Many approaches have been proposed to identify malicious servers. Most detection systems detect malicious websites by analyzing web content [5,11,16,24], identify malicious servers by building a reputation system for individual servers [3,4,9], or target popular techniques that adversaries use to evade detection [18,25,30,32,33]. Unfortunately, these works focus only on a single server, so they lack a panoramic view of attacks. In addition, some servers are not easily discovered by analyzing a single server in isolation. Some works [2,15,19,27,34] consider the relations of servers by identifying malicious redirections. Zhang et al. [34] indicate that malicious servers tend to be invisible and propose a method that analyzes redirections from visible to invisible servers. Li et al. [15] leverage redirect chains to build the topology of dark infrastructures and further recognize dedicated malicious hosts. However, collecting redirections is not easy, and the relations among malicious servers are diverse and not limited to redirection.
Different from existing works, we focus on relations of servers, given that attacks are increasingly conducted with multiple servers. We aim to identify servers involved in the same malicious activity, which we call a server flock, without relying on redirections. In particular, we find two features of server flocks that help distinguish flocks in DNS traffic. First, completing an attack requires a victim to access multiple servers consecutively; in other words, servers of a flock tend to coexist in the user's access list within an interval. Second, a flock used for crimes serves only certain victims, which means that servers of a flock probably have stable clients.

EAI Endorsed Transactions on Security and Safety 07 2020 -10 2020 | Volume 7 | Issue 26 | e1
Research Article
Yixin Li et al.
Based on the above observations, in this paper, we propose a mechanism, deMSF, to detect malicious server flocks on a local network with only three fields: timestamp, client and server. We generate server flocks in two steps: (a) we cluster servers within a client access list according to the timeline; (b) we extract final server flocks based on the similarity of clients. Inspired by the hypothesis that servers that occur in the same contexts tend to be similar, we extract semantic vectors as features of servers based on word2vec and further design a convolutional neural network based on textCNN to classify malicious flocks.
It should be noted that deMSF is a complementary approach to existing works. We believe that it can help detect servers that may be missed when only a single server is analyzed. In addition, it helps describe the relations of servers within a malicious activity.
In summary, the contributions can be described as follows:
- We present a system, deMSF, to detect malicious campaigns by recognizing malicious server flocks. Focusing on flocks rather than individual servers makes deMSF capable of revealing the relations of malicious servers.
- We design a two-step method to discover malicious flocks. In the first step, we generate flocks by clustering servers from both sequential and spatial dimensions. In the second step, we extract semantic vectors of servers and design a convolutional neural network to classify flocks based on these vectors.
- We evaluate the effectiveness of deMSF with real-world data collected from an ISP network, and the result demonstrates that deMSF performs well in discovering associated servers involved in malicious campaigns.
The rest of this paper is organized as follows: Section 2 introduces the background knowledge of our work. Section 3 describes the critical components of deMSF. Then we evaluate the effectiveness of deMSF in Section 4. Section 5 presents the related work, and Section 6 is our conclusion.

Background
Machine learning has been widely used in many fields and has shown significant advantages. Our work leverages machine learning to describe semantic features of servers and identify malicious flocks. In this section, we describe the machine learning techniques employed in our system.

word2vec
Word2vec, proposed by Mikolov [20] in 2013, is one of the most widely used techniques for learning high-quality word vectors from huge data sets with billions of words. The resulting vectors can reflect subtle semantic relationships between words, for example, vector(King)−vector(Man)+vector(Woman) results in a vector that is closest to the vector representation of the word Queen [22].

Fig.1. Two models of word2vec
Word2vec takes a text corpus as input and generates word vectors. It includes two learning models, Continuous Bag of Words (CBOW) and Skip-gram. As shown in Figure 1, both are simple neural network models with one hidden layer. The former predicts a word given its context, while the latter predicts the context given a word. Compared with one-hot encoding, word2vec generates dense vectors. Another significant advantage of word2vec is that words with similar meanings are mapped to nearby positions in the vector space.
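For reference, the two training objectives can be written in their commonly cited form. This is a sketch rather than the exact formulation of [20]: T is the number of words in the corpus, w_t is the word at position t, and a is the context window size on each side (the symbol a is borrowed from the CBOW setting used later in this paper).

```latex
% CBOW: predict the center word from its 2a context words
J_{\mathrm{CBOW}} = \frac{1}{T} \sum_{t=1}^{T}
    \log p\left(w_t \mid w_{t-a}, \dots, w_{t-1}, w_{t+1}, \dots, w_{t+a}\right)

% Skip-gram: predict each context word from the center word
J_{\mathrm{Skip\text{-}gram}} = \frac{1}{T} \sum_{t=1}^{T}
    \sum_{\substack{-a \le j \le a \\ j \ne 0}} \log p\left(w_{t+j} \mid w_t\right)
```

Both objectives are maximized during training; in practice the probabilities are usually approximated with hierarchical softmax or negative sampling.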

textCNN
Convolutional neural networks (CNNs) are a specialized kind of neural network for processing data that has a known, grid-like topology [8]. CNNs were originally used in computer vision [13], while in recent years they have been found to perform well for NLP. In 2014, Kim proposed a network named textCNN [12] for sentence-level classification tasks with pre-trained word vectors. As shown in Figure 2, textCNN is a simple neural network with an input layer, a convolution layer, a max-pooling layer and an output layer. It takes texts as input and usually leverages word embeddings to improve performance. In this paper, we design a network based on textCNN for our task. This model performs well in malicious flock detection.

In this section, we describe our design of deMSF. The intuition of deMSF is that servers involved in one activity have strong relationships: (a) servers of a flock tend to co-occur within an interval in a client access list; (b) servers for the same campaign have similar clients. As shown in Figure 3, deMSF takes network traffic as input, and has four components: Preprocessing, Flocks Generating, Servers Vectorizing, Flocks Classifying. It uses only three fields (client, timestamp, server) for analysis. In this paper, we take DNS logs as raw data. After processing the raw data and extracting the related fields, we generate server flocks from both temporal and spatial dimensions. Then we vectorize servers with word2vec [21]. Finally, we build a deep learning classifier to recognize malicious flocks based on the semantic vectors of servers. In the following, we explain each component in detail.

Preprocessing
The primary goal of this step is to clean the raw data, extract valid fields and generate visit-sequences of clients. To reduce the data to be processed and improve system efficiency, we first filter records according to the following rules.
- Irregular domain. Some records in the raw data contain irregular domains (domains that do not conform to domain naming rules, for example, google,com), which are probably caused by mistyping or misconfiguration.
- Invalid domain. An invalid domain here is one whose TLD (Top Level Domain) is not in the list of registered TLDs published by IANA [10]. We filter records with these domains.
- Hyperactive clients. Some hyperactive clients issue far more queries than others; these are usually proxies forwarding requests for many users. To improve the performance of deMSF, we remove these clients because they behave significantly differently from regular clients. In detail, we remove the top H% most active clients. In this experiment, H is set to 1% empirically.
Then we formalize the data by extracting three valid fields (client, server and timestamp) to generate request sequences. The form is defined as follows: R = ∪Ci is the set of visit-sequences, where Ci = {(s1,t1),(s2,t2),...,(sn,tn)} represents the visit-sequence of client i and (sn,tn) indicates that client i queries server sn at time tn.
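The filtering rules and the construction of visit-sequences can be sketched as follows. This is a minimal illustration, not the original implementation: `VALID_TLDS` is a tiny hypothetical stand-in for the IANA list, the label pattern is a simplified reading of "domain naming rules", and ranking clients after the domain filters is an assumption about the order of the steps.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stand-in for the IANA TLD list (the real list is much longer).
VALID_TLDS = {"com", "net", "org"}

# Simplified label rule: letters, digits, hyphens; no leading/trailing hyphen.
LABEL_RE = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)")


def is_regular(domain):
    """True if every dot-separated label follows the simplified naming rule."""
    labels = domain.lower().rstrip(".").split(".")
    return len(labels) >= 2 and all(LABEL_RE.fullmatch(label) for label in labels)


def has_valid_tld(domain):
    """True if the last label is in the (stand-in) list of registered TLDs."""
    return domain.lower().rstrip(".").rsplit(".", 1)[-1] in VALID_TLDS


def build_visit_sequences(records, hyperactive_percent=1.0):
    """records: iterable of (timestamp, client, server) tuples.

    Returns {client: [(server, timestamp), ...]} ordered by timestamp,
    i.e. the visit-sequence C_i of each remaining client.
    """
    kept = [r for r in records if is_regular(r[2]) and has_valid_tld(r[2])]
    # Rank clients by query volume and drop the top H% most active ones.
    counts = Counter(client for _, client, _ in kept)
    n_drop = int(len(counts) * hyperactive_percent / 100)
    hyperactive = {client for client, _ in counts.most_common(n_drop)}
    sequences = defaultdict(list)
    for ts, client, server in sorted(kept):
        if client not in hyperactive:
            sequences[client].append((server, ts))
    return dict(sequences)
```

The returned dictionary plays the role of R: each value is one client's visit-sequence, ready for the clustering steps that follow.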

Flocks Generating
Based on the collected sequences of each client, deMSF further mines related servers that are involved in the same activity. We explore two steps to find server flocks from temporal and spatial dimensions. We give an example in Figure 4.
First, we cluster queries by query time. We analyze the DNS queries of ten clients within 900 seconds; as Figure 5 shows, each client's requests cluster clearly along the timeline. The result accords with our expectations, as many network activities require more than one domain. For example, when visiting a web page, clients usually query other domains to download images. Besides, some programs have a static domain query list and order. We implement time clustering in a simple way: for two adjacent visits (sj,tj),(sj+1,tj+1) of client Ci, if the time interval ∆T = tj+1 − tj is greater than a certain threshold τ (we set τ = 5 in this article), we divide them into different clusters. After this step, we get a time-clustered sequence of client Ci as {s1,s2,...sn}.
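The time clustering rule can be sketched in a few lines. This is an illustrative sketch under one assumption: τ is expressed in the same unit as the timestamps, since the text does not state the unit. The function name `time_clusters` is hypothetical.

```python
def time_clusters(visits, tau=5):
    """Split one client's visit-sequence [(server, timestamp), ...] into
    clusters whenever the gap between adjacent visits exceeds tau."""
    clusters = []
    prev_ts = None
    for server, ts in sorted(visits, key=lambda v: v[1]):
        if prev_ts is None or ts - prev_ts > tau:
            clusters.append([])  # gap larger than tau: start a new cluster
        clusters[-1].append(server)
        prev_ts = ts
    return clusters
```

A gap exactly equal to τ keeps both visits in the same cluster, matching the "greater than" condition in the text.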

Fig.5. Domain queries of ten clients in 900 seconds
Second, we perform clustering in terms of the client similarity of servers. This relies on the intuition that normal clients usually do not query malicious servers, while infected clients of the same malicious campaign usually query the same suspicious servers. In other words, servers sharing similar clients tend to belong to the same flock. We leverage Jaccard similarity to measure the connection between servers si and sj:

$$\mathrm{Similarity}(s_i, s_j) = \frac{|\mathrm{Clients}(s_i) \cap \mathrm{Clients}(s_j)|}{|\mathrm{Clients}(s_i) \cup \mathrm{Clients}(s_j)|}$$

where Clients(s) denotes the set of clients that query server s. Specifically, for a time-clustered sequence {s1,s2,...sn}, we calculate the client similarity of adjacent servers sj and sj+1. If Similarity(sj,sj+1) is less than a certain threshold γ (γ is set to 0.5 empirically), we divide them into different clusters.
Finally, as our goal is to find the correlation among different servers, flocks with only one server are removed. In addition, if two adjacent servers are the same, we keep only one.
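The spatial step and the final clean-up can be sketched together. This is a minimal illustration with hypothetical names (`jaccard`, `split_by_client_similarity`, `clients_of`); it removes adjacent duplicates while scanning, which is one possible reading of the order of the clean-up rules.

```python
def jaccard(clients_a, clients_b):
    """Jaccard similarity of two client sets."""
    union = clients_a | clients_b
    return len(clients_a & clients_b) / len(union) if union else 0.0


def split_by_client_similarity(cluster, clients_of, gamma=0.5):
    """cluster: time-clustered list of servers for one client.
    clients_of: {server: set of clients that query it}.

    Adjacent servers whose similarity is below gamma are separated;
    adjacent duplicates and single-server flocks are dropped.
    """
    flocks = []
    for server in cluster:
        if flocks and server == flocks[-1][-1]:
            continue  # adjacent duplicate: keep only one
        if flocks and jaccard(clients_of[flocks[-1][-1]], clients_of[server]) >= gamma:
            flocks[-1].append(server)
        else:
            flocks.append([server])
    return [flock for flock in flocks if len(flock) > 1]
```

Only adjacent servers are compared, so the cost per time cluster is linear in its length once the client sets are available.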

Servers Vectorizing
The goal of this step is to map each server to a low-dimensional feature vector while keeping as much context information as possible. We find that word embedding, a technique from natural language processing (NLP), is very helpful for learning features of servers. Word embedding is based on a hypothesis: words that occur in the same contexts tend to have similar meanings. The same applies to servers: servers that occur in the same contexts tend to be similar. Thus we regard a server as a word and a flock as a sentence, and learn features of servers in the same way as word embeddings. Based on this, we leverage word2vec to learn feature vectors of servers, which can effectively describe the relationships among different servers.
Considering both training time and effectiveness, we experiment with the CBOW model. We implement it in Python, using the Gensim † package to generate server vectors: (a) the input layer contains 2a context servers; in this article, we set a = 5. (b) the output layer contains a vector, which is the server probability predicted from the context. In the experiment, we set the vector size to 128.
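To make the role of the window parameter a concrete, the sketch below enumerates the (context, target) pairs that a CBOW model is trained on when each flock is treated as a sentence. It is an illustration only, not the training code used in the experiments; the function name `cbow_examples` is hypothetical.

```python
def cbow_examples(flock, a=5):
    """Yield (context, target) pairs for one flock, using up to `a` servers
    on each side of the target as context (at most 2a in total)."""
    for i, target in enumerate(flock):
        context = flock[max(0, i - a):i] + flock[i + 1:i + 1 + a]
        if context:
            yield context, target
```

With Gensim 4.x, the settings described in this section would roughly correspond to `Word2Vec(sentences=flocks, vector_size=128, window=5, sg=0)`. That call is our reading of the text rather than the original code, and Gensim's default `min_count=5` would discard rarely seen servers unless it is lowered.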

Flocks Classifying
As mentioned earlier, a server can be regarded as a word and a flock as a sentence. Identifying malicious server flocks can then be seen as a text classification task. Based on this view, we design a neural network based on the textCNN [12] proposed by Kim in 2014. The structure, shown in Figure 6, consists of an input layer, an embedding layer, convolution layers, max-pooling layers, a concatenate layer and an output layer. The parameter settings are shown in Table 1.
(a) Input Layer. The input layer takes flocks as input. A flock can be represented as a sequence Seqflock = {s1,s2,...sn}, where n is the length of sequence.
(b) Embedding Layer. Let xi be the k-dimensional server vector corresponding to the i-th server in the sequence. A sequence with n servers can be represented as x1:n = x1 ⊕ x2 ⊕ ... ⊕ xn. The output of the embedding layer is an n × k matrix composed of the server vectors of each sequence, where k is the length of the vectors.
(d) Max-pooling Layer. We apply a max-pooling operation over the feature map and take the maximum value ĉ to capture the most important feature of each feature map. In this step, we get 300 features from 300 filters.

$$\hat{c} = \max(c)$$

(e) Concatenate & flatten & output. All features are passed to a fully connected softmax layer whose output is the probability distribution over labels.

† https://radimrehurek.com/gensim/
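The forward pass of such a network can be sketched in plain Python. This is a toy illustration under stated assumptions: the convolution follows the general form of Kim's textCNN [12] (a filter of height h slid over h consecutive server vectors), ReLU is assumed as the activation, and all weights and filter sizes are hypothetical inputs rather than the settings of Table 1.

```python
import math


def conv_feature_map(x, w, b):
    """Slide one filter w (h rows of k weights) over the n-by-k input x and
    return the ReLU-activated feature map c of length n - h + 1."""
    h, k = len(w), len(x[0])
    c = []
    for i in range(len(x) - h + 1):
        z = b + sum(w[r][j] * x[i + r][j] for r in range(h) for j in range(k))
        c.append(max(0.0, z))
    return c


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(x, filters, dense_w, dense_b):
    """filters: list of (w, b) pairs; dense_w: one weight row per label.
    Each filter contributes one pooled feature c_hat = max(c)."""
    pooled = [max(conv_feature_map(x, w, b)) for w, b in filters]
    logits = [sum(wi * p for wi, p in zip(row, pooled)) + bias
              for row, bias in zip(dense_w, dense_b)]
    return softmax(logits)
```

The sketch assumes the flock is at least as long as the tallest filter; shorter flocks would need padding before the convolution.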

Evaluation
In this section, we evaluate the performance of deMSF using real-world DNS traffic captured from an ISP network. We first introduce the dataset used in our experiment. Then we analyze the results of server vectorizing and the effectiveness of deMSF.

DNS traffic
We obtain DNS traffic collected at the edge of an ISP network from December 20th, 2018 to December 26th, 2018. A summary of the dataset is presented in Table 2. As we filter hyperactive clients in the preprocessing step, we do not count them in Table 2.

Ground Truth
We get the ground truth from two popular online blacklists, Malware Domain Block List [7] and URLhaus [1]. In addition to these two blacklists, we also leverage a threat intelligence platform named ThreatBook [29] to scan all servers that appear in flocks and obtain their reports. ThreatBook marks a server with one of three labels: clean, suspicious or malicious. Some special clean servers are additionally marked as whitelist in ThreatBook.

Labelling
We first label servers according to the ground truth we collect, using the following steps:
(a) a server is labeled as white if it is marked as whitelist by ThreatBook.
(b) a server is labeled as malicious if it is listed in any blacklist or is marked as malicious by ThreatBook.
(c) a server is labeled as suspicious if it is marked as suspicious by ThreatBook.
(d) a server is labeled as clean if it is marked as clean by ThreatBook and not listed in any blacklist.
Then we label flocks under strict conditions. A flock is labeled as clean if all its servers are labeled as white. A flock is labeled as malicious if its threat score is greater than 3. The threat score of a flock is the average score of all its servers and is calculated by the following formula.

$$\mathrm{ThreatScore}(F) = \frac{1}{|F|} \sum_{s \in F} \mathrm{Score}(s)$$

where F is a flock and Score(s) is the score of server s.
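The labelling rules can be sketched as follows. This is an illustration under explicit assumptions: rule (a) is given precedence over rule (b) when both apply, the numeric score of each server label is not reproduced in this text and is therefore passed in by the caller as `score_of`, and flocks that are neither clean nor malicious are returned as "unlabeled". All function names are hypothetical.

```python
def label_server(threatbook_label, blacklisted):
    """threatbook_label: 'whitelist', 'clean', 'suspicious' or 'malicious'.
    blacklisted: True if the server appears in any blacklist."""
    if threatbook_label == "whitelist":
        return "white"
    if blacklisted or threatbook_label == "malicious":
        return "malicious"
    if threatbook_label == "suspicious":
        return "suspicious"
    if threatbook_label == "clean":
        return "clean"
    return None  # no usable ground truth for this server


def label_flock(server_labels, score_of, threshold=3):
    """server_labels: labels of the servers in one flock.
    score_of: {label: numeric score}, supplied by the caller."""
    if all(label == "white" for label in server_labels):
        return "clean"
    threat = sum(score_of[label] for label in server_labels) / len(server_labels)
    return "malicious" if threat > threshold else "unlabeled"
```

Keeping the score table outside the function makes the threshold rule easy to re-run with a different scoring scheme.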

Server Vectorizing Results & Analysis
We expect the semantic vectors of servers to effectively represent the internal relationships among servers, which means similar servers tend to have similar vectors. The internal relationship here indicates that servers have

It can be seen that the semantic vectors can reflect internal connections of servers. Thus it is feasible to use the semantic vectors as features of servers.

classification, we only use labeled flocks to execute the experiment. The summary of data is presented in Table 5 and the result is shown in Table 6.
It can be seen that deMSF performs well. It achieves a high precision of 99%, meaning that nearly all detected flocks are actually malicious, and an acceptable recall of 90%, meaning that only a few malicious flocks are missed. These misses may be caused by new threats that have only weak connections to the known threats in the training data, so deMSF cannot detect them. We show some examples of malicious flocks in Table 7.
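For clarity, precision and recall over flock labels can be computed as in the following sketch. The function name and the label strings are hypothetical, and no values from the experiment are reproduced here.

```python
def precision_recall(true_labels, predicted_labels, positive="malicious"):
    """Return (precision, recall) for the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Precision is the fraction of detected flocks that are truly malicious, while recall is the fraction of malicious flocks that are detected.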

Overhead
The most expensive part of deMSF is calculating the client similarity of servers, since similarity must be computed among different servers and the data may contain a large number of servers. Fortunately, techniques such as sparse matrix multiplication can significantly reduce the computational cost.
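One way to realize the sparse computation is an inverted index from clients to servers, which visits only server pairs that actually share a client. This is the element-wise equivalent of the sparse product AᵀA for a client-by-server incidence matrix A. The sketch below is illustrative and not the implementation used in the experiments; `pairwise_jaccard` and `clients_of` are hypothetical names.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_jaccard(clients_of):
    """clients_of: {server: set of clients}. Returns {(s1, s2): similarity}
    for server pairs sharing at least one client; all other pairs are 0."""
    servers_of = defaultdict(set)  # inverted index: client -> servers
    for server, clients in clients_of.items():
        for client in clients:
            servers_of[client].add(server)
    shared = defaultdict(int)  # non-zero entries of the co-occurrence matrix
    for servers in servers_of.values():
        for s1, s2 in combinations(sorted(servers), 2):
            shared[(s1, s2)] += 1
    return {
        pair: n / (len(clients_of[pair[0]]) + len(clients_of[pair[1]]) - n)
        for pair, n in shared.items()
    }
```

The cost depends on how many servers each client queries rather than on the square of the total number of servers, which is why removing hyperactive clients in preprocessing also helps here.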

Limitation
• Single malicious servers. deMSF focuses on multiple servers involved in malicious activities or evasion techniques instead of a single server. Thus, deMSF cannot detect malicious campaigns with only a single server, because there are no connections we can extract from these campaigns. However, malicious campaigns with a single server are very rare.
• Noise. deMSF is based on the query sequences of a client. Inevitably, queries triggered by background activities are mixed into the true consecutive queries. Although we leverage client similarity to decrease the noise, it cannot be eliminated. It should be noted, however, that noise is a low-probability event, and its impact becomes negligible as the data grows.
• New threat. Since deMSF leverages the interconnections of servers according to client queries, it can hardly detect completely new threats that have no connections with previously seen servers. Overcoming this may require other properties and data sources, which can be a topic for our future work.

Evasion
• Attackers can create associations between benign servers and malicious servers by mixing benign queries into malicious activities. deMSF may then divide the malicious servers of a campaign into different flocks and delete those flocks because each contains only one server. However, we can filter popular benign servers, which are unlikely to be involved in malicious campaigns, by adding a whitelist in the preprocessing step.
• Another approach attackers can use is to let different compromised clients communicate with different servers to reduce the client similarity of malicious servers. However, this may be costly for attackers, as the more bots they have, the more servers they need to register.
• One more method attackers can use is increasing the time interval between two queries. However, some attacks, such as malicious redirections and DGA, require consecutive queries. In addition, researchers can adjust the time window threshold to catch them.

Universality
• deMSF is designed to monitor traffic at the edge of a network and requires only three basic fields: client, timestamp and server. Thus it can be deployed in most enterprise or ISP networks.
• deMSF is an automatic threat discovery system. It leverages a basic hypothesis that servers that occur in the same contexts tend to have similar meanings. It then learns semantic vectors of servers to capture the internal associations between them and further distinguishes malicious flocks from normal activities. It should be noted that deMSF does not need any predefined feature rules or knowledge.
• deMSF does not need researchers to manually tune parameters. By training a sufficiently good model, deMSF can discover malware behaviors and exclude known non-malicious behaviors, and its parameters can remain stable and effective for a long time.

Studies Focusing on Individual Servers
Many approaches concentrate on individual malicious servers to mitigate malicious activities. Some works analyze web content to recognize malicious websites. Liao et al. [16] develop a semantic-based technique, which leverages Natural Language Processing (NLP) to identify the bad terms most irrelevant to an sTLD's semantics and detects webpages with malicious promotional injections. Delta [5] is a system that identifies malicious websites according to changes in the sites. It extracts change-related features between two versions of the same website and identifies an infection using signatures generated from such modifications. Saxe et al. [24] propose a deep learning approach to detecting malicious web pages that operates on a language-agnostic stream of tokens extracted directly from static HTML files with a simple regular expression.
Some works construct reputation systems for single servers to recognize malicious servers. Notos [3] is a dynamic reputation system for domains. It uses passive DNS query data to construct the network and zone features of domains and compute accurate reputation scores. EXPOSURE [4] employs large-scale, passive DNS analysis techniques to detect malicious domains. It extracts 15 features from DNS traffic to characterize different properties of domains and the ways they are queried. PREDATOR [9] uses only time-of-registration features to establish domain reputation and predict malicious domains when they are registered.
Some works concentrate on the techniques adversaries use to avoid detection. Yadav et al. [32] develop a methodology to detect domain fluxes in DNS traffic by looking for patterns inherent to domain names that are generated algorithmically, in contrast to those generated by humans. Phoenix [25] is a mechanism that uses a combination of string and IP-based features to tell DGA and non-DGA domains apart. It can find groups of DGA domains that are representative of the respective botnets, associate previously unknown DGA-generated domains with these groups, and produce novel knowledge about the evolving behavior of each tracked botnet. Woodbridge et al. [30] leverage long short-term memory (LSTM) networks to predict malicious domains and their respective families. Luo [18] leverages the query time lags of non-existent domains (NXDomain) to mitigate DGA-based malware without relying on lexical properties.

Studies Focusing on Relations of Servers
Many studies focus on malicious redirections. VisHunter [34] investigates the visibility of servers and finds that certain malicious servers tend to be invisible to normal users. It identifies malicious redirections from visible servers to invisible servers at the entryway of malicious web infrastructures. Akiyama et al. [2] develop a honeypot-based monitoring system across four years and analyze the ecosystem of malicious URL redirections. Stringhini et al. [27] aggregate the different redirection chains that lead to a specific web page and analyze the characteristics of the resulting redirection graph. Then they detect malicious web pages by looking at the redirection chains that lead to them. Mekky et al. [19] develop a methodology to identify malicious chains of HTTP redirections. They passively collect traffic and extract statistical features that capture inherent characteristics of malicious redirection cases. They further apply a supervised decision tree classifier to identify malicious chains.
Some works leverage other relations of malicious servers. Zhang et al. [35] utilize an unsupervised framework to infer malware-associated server herds by systematically mining the relationships among all servers from multiple dimensions: client similarity, IP address set similarity, whois similarity and URI file similarity. Li et al. [15] perform a study on the topological relations among hosts and find that dedicated malicious hosts are well connected to other malicious hosts and do not receive traffic from legitimate sites. They develop a graph-based approach that relies on a small set of known malicious hosts as seeds and results in an expansion rate of over 12 times in detection. Lee et al. [14] construct a domain travel graph based on the sequential correlation of DNS queries, cluster domains using the graph structure and determine malicious clusters by referring to public blacklists. Sun et al. [28] model the DNS scene as a Heterogeneous Information Network (HIN) consisting of clients, domains, IP addresses and their diverse relationships. They leverage a transductive classification method to detect malicious domains with only a small fraction of labeled samples. Liu et al. [17] analyze a new type of attack infrastructure named shadowed domains. They propose a system to detect these domains from two dimensions: the deviation from legitimate domains under the same apex and the correlation among shadowed domains under different apexes.

Studies using embedding in security
Xu et al. [31] propose a neural network-based model to generate vectors based on the control flow graph of each binary function. The cross-platform binary code similarity detection problem can then be solved efficiently by measuring the distance between vectors. Popov [23] proposes a method that applies the word2vec technique to extract vector embeddings of machine code instructions, and further builds a convolutional neural network-based classifier using the extracted vectors to detect malware. Ding et al. [6] develop a representation learning model named Asm2Vec to construct feature vectors for assembly code. It takes assembly code as input and does not require any prior knowledge such as the correct mapping between assembly functions. It can find and incorporate rich semantic relationships among tokens appearing in assembly code. Shen et al. [26] calculate the vector of an attack step by considering the entire attack sequence as a sentence, and each step as a word. They develop attack2vec to understand the emergence, the evolution, and the characteristics of attack steps in relation to the wider context in which they are exploited.

Conclusion
In this paper, we focus on the servers that are involved in the same malicious campaign. We learn the features of servers from the querying relationships among different servers and propose a novel approach to detect malicious activities using a neural network based on server semantic vectors. deMSF first mines server flocks from both temporal and spatial dimensions. It then generates server semantic vectors with techniques developed in the area of natural language processing, which can effectively model the internal connections among servers. Finally, it recognizes malicious flocks with a deep neural network based on all server vectors of a flock. The feasibility of deMSF is demonstrated with one week of logs acquired from a real-world network, and the results show that deMSF achieves a high precision of 99% with 90% recall.