SAM Centrality: a Hop-Based Centrality Measure for Ranking Users in Social Network

Social network analysis has attracted many researchers as more and more people join social networks. Among its open problems, ranking the users of a social network is gaining attention as the number of users grows. Measuring the centrality of nodes in a social graph has long been an important issue in social network analysis, and many centrality methods have been proposed. In this paper, a hop-based centrality measure called SAM is proposed. To investigate the measure, we applied it to various datasets. Evaluated with the SIR model, SAM obtains better results on all of these social graphs than the other centrality measures considered (i.e., Degree, PageRank, Betweenness and Closeness).


Introduction
A social network is defined as a set of people or groups and the connections between them [42]. It may be online, through interactions on networking sites (such as Facebook, Twitter and LinkedIn), or offline, through face-to-face contacts [40] in public places (such as universities, schools, colleges and conferences). Moreover, the family network (formed by members of a family) [35], the farmer network (formed by farmers from a village) [49], the business network (formed by businessmen) [19], the employee network (formed by employees of an organization) [48] and the player network (formed by players on the ground) [32] are other examples of offline social networks. Nowadays, most people use social apps on smartphones to connect with people of similar interests. Lately, popular online social networks such as Facebook, Twitter and LinkedIn have gained immense popularity in daily life. Colleagues, individuals and groups are all connected with each other on social networks. A social network is therefore a venue for spreading news [21], marketing products [44], targeting groups [4], etc.
To carry out these tasks in a social network, identifying the central, influential or significant nodes is a challenging task. Centrality specifies the most influential, central or significant nodes in the social network. Measuring the centrality of nodes, that is, detecting the influential nodes that are more central than others, has been a basic issue in social network analysis. Suppose there are 75 participants in a meeting: the chairperson of the meeting is assumed to be central. Likewise, the vice chancellor of a university is assumed to be influential among the professors, the class monitor is assumed to be an influential student, and the main branch of a bank is assumed to be the central branch among the other branches in a country. The detection of nodes with high centrality, which are considered significant, is valuable in numerous fields, such as detecting vital proteins and candidates for drug targets [13], preventing catastrophic outages in the internet or power grids [34] [33] [2], identifying individuals or groups for advertising e-commerce items [26] [31], controlling the outbreak of epidemics [38] [12], and so on.
As time is an important factor [23], many centrality measures have been introduced over the years, including Closeness [5], Degree [16], Betweenness [15] [8], Coreness [24], PageRank [9], H-Index [30], DPRank [28], LeadRank [29], etc. Lu et al. [6] have presented a comprehensive survey on the identification of influential vertices in complex networks. The significance of a node is largely determined by the topological structure of the social network it belongs to. Indeed, the majority of centrality measures use structural information to identify the vital nodes in networks. In general, a centrality measure calculates and assigns a ranking score to every node in the network, and these scores are expected to rank the nodes according to their significance. According to Lu et al. [6], centrality measures are categorized, on the basis of structural information, into path-based and neighborhood-based centralities. Neighborhood-based centrality measures indicate whether two vertices are connected directly or indirectly by any path. In other words, centrality measures in this category capture the capability of a node to get in touch with all other nodes. Likewise, path-based centrality measures try to find the shortest path from the starting node to the ending node.
In reality, huge amounts of data occasionally arise in a network, and the resulting high time complexity makes centrality difficult to measure. Centrality metrics also play an important role in detecting the most powerful and vital elements in large amounts of data. Indeed, every centrality measure identifies influential nodes from a different perspective. Time complexity is another important component of social network analysis. Based on different concepts of the influence of nodes or edges, a variety of centrality measures have been proposed and applied in suitable areas. In this paper, we investigate the influence of a node via its hop distance from other nodes. We argue that as the hop distance of node u from node v increases, the influence of node u on v decreases. The key contribution of our paper is the proposal of a new centrality measure, SAM, based on the hop distance of nodes. The SAM centrality measure is proposed to rank the users in a social network.
The remainder of this paper is structured as follows: state-of-the-art centrality measures from the literature are discussed in Section 2; the definition and methodology of SAM centrality are presented in Section 3; the results of applying the centrality measures to real datasets are shown in Section 4; finally, we conclude the paper and outline future directions in Section 5.

Literature Review
A well-known task in finding influential nodes in a social network is to measure the centrality of every node. According to some centrality measures, the nodes with maximum centrality have the greatest capability to influence the other nodes in the network. Likewise, minimum centrality indicates that a node has little power to influence other nodes in the network. For the accurate prediction of vital nodes in a social network, it is desirable to use centrality measures to discover the most influential nodes. Over the last few decades, many centrality measures have been proposed, as reviewed below.
In 1950, during the study of communication networks, the initial work on a centrality measure and its application was presented by Bavelas [6]. In 1953, Shimbel proposed stress centrality to determine the amount of communication that passes along the shortest paths [45]. Katz [20] presented Katz centrality to determine the power of a node in the network. To determine the influence of a node, Katz centrality considers all possible paths in the network. Moreover, Katz stated that the shortest paths play the most important role in finding influential nodes. In 1965, Beauchamp [7] revisited the meaning of Bavelas's [6] centrality measure, identified its limitations, and proposed an improved centrality measure to extend its utility. Shortly afterwards, in 1966, Sabidussi [39] examined Beauchamp's [7] improved centrality measure and gave a new definition of centrality. Moreover, the new centrality measure was tested against the constraints of that definition. Nieminen [37] altered some postulates of Sabidussi's [39] centrality index and proposed another centrality measure, based on the degrees of nodes, for undirected network graphs. In 1978, Freeman [17] revived research on finding influential nodes by proposing three centrality measures. The first measure is the total effect, which indicates the total effect of a node on the other nodes in the network. The second measure is the immediate effect, which indicates how quickly the node's total effect is perceived. The third measure is the mediative effect, which represents the degree of centralization of the whole network. Stephenson et al. [46] introduced information centrality, which uses the information transmitted between two nodes in a connected network to find the vital nodes in the graph. In 1991, Freeman et al. [18] presented a novel idea for finding influential nodes using network flow. This measure is similar to that of Freeman [17] but differs slightly from the original. In 1994, Borgatti [47] modified the centrality measure of Freeman [17] and extended betweenness centrality from undirected to directed network graphs. In 1999, Borgatti and Everett [14] extended the three state-of-the-art centrality measures so that they apply to groups as well as individuals. Recently, Samad et al. [41] evaluated state-of-the-art centrality measures in order to recommend paper citations.

METHODOLOGY: SAM CENTRALITY
In this section, we first propose a centrality measure named SAM. In addition, we give a brief description of real-world datasets with ground truth. As PageRank, Betweenness, Degree and Closeness are the most commonly used measures and are targeted by the majority of authors, we treat them as benchmark centrality measures to compare with our proposed centrality. We have also tested our centrality measure on other real-world datasets without ground truth.

Degree Centrality
In a social graph G(V, E), where V represents the set of nodes and E represents the set of ties between nodes, the degree of node i is defined as the number of directly connected adjacent nodes, $k_i = \sum_j a_{i,j}$. Here, $a_{i,j}$ is the entry in the i-th row and j-th column of the adjacency matrix of the social graph, where $a_{i,j} = 1$ means there is a link between nodes i and j, and $a_{i,j} = 0$ means the two nodes are not connected. Degree centrality is widely used to identify influential nodes in different network graphs: the more connected neighbors a node has, the greater its chance of being an influential node. For the purpose of finding influential nodes in different network graphs, the normalized degree centrality is calculated by Equation 1.
Where $k_i$ is the number of neighbors of node i, n = |V| is the total number of nodes in the graph, and n − 1 is the maximum possible degree of any node.
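As an illustration of the normalization described above, the following minimal sketch divides each node's degree by n − 1. The adjacency-dictionary representation and the function name `degree_centrality` are our own assumptions for the example, not part of the original experiments.

```python
def degree_centrality(adj):
    """Normalized degree centrality for an undirected graph.

    adj maps each node to the set of its neighbors (assumed representation).
    """
    n = len(adj)
    if n < 2:
        # A single node has no possible neighbors, so its score is 0.
        return {u: 0.0 for u in adj}
    return {u: len(neighbors) / (n - 1) for u, neighbors in adj.items()}


# Hypothetical 4-node star: node 0 is linked to every other node.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
degree_scores = degree_centrality(star)
```

In this toy star, the center reaches the maximum possible degree n − 1 and therefore receives the largest score, while each leaf has a single neighbor.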

Betweeness Centrality
Betweenness centrality was first introduced by Bavelas [27] in 1948. The importance of a node is determined by how often it lies on the shortest paths between pairs of nodes in the network. Usually, many shortest paths exist between a source node $v_s$ and a target node $v_t$. The influence of $v_u$ is computed by counting all shortest paths on which $v_u$ lies. Hence, the betweenness of $v_u$ is computed as in Equation 2.
Where $\sigma_{st}$ is the total number of shortest paths between node s and node t, while $\sigma_{st}(u)$ is the number of those shortest paths that pass through u. Betweenness centrality is known as a global centrality measure. The normalized betweenness centrality is defined in Equation 3.
Where n is the size of the network.
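The shortest-path counting described above can be sketched with a breadth-first search from every source node, followed by a backward accumulation step (the approach commonly attributed to Brandes). This is a sketch under stated assumptions: the graph is unweighted and undirected, it is stored as an adjacency dictionary, and the normalization factor 2 / ((n − 1)(n − 2)) is the common convention for undirected graphs, which may differ from the normalization in Equation 3.

```python
from collections import deque


def betweenness_centrality(adj, normalized=True):
    """Betweenness for an unweighted, undirected graph {node: set(neighbors)}."""
    bc = {u: 0.0 for u in adj}
    for s in adj:
        # Forward pass: BFS from s, counting shortest paths (sigma) to each node.
        sigma = {u: 0 for u in adj}
        sigma[s] = 1
        dist = {s: 0}
        preds = {u: [] for u in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward pass: accumulate the share of paths that pass through each node.
        delta = {u: 0.0 for u in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair (s, t) was visited from both ends.
    bc = {u: value / 2 for u, value in bc.items()}
    n = len(adj)
    if normalized and n > 2:
        scale = 2 / ((n - 1) * (n - 2))
        bc = {u: value * scale for u, value in bc.items()}
    return bc


# Hypothetical path graph 0 - 1 - 2 - 3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
betweenness_scores = betweenness_centrality(path)
```

On the toy path, the two inner nodes lie on shortest paths between other pairs, while the two end nodes lie on none.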

PageRank Centrality
PageRank is used by Google's search engine to rank web pages. It ranks websites through a random walk on the network formed by the links between web pages, and it determines the significance of a web page by considering the quality and quantity of the pages linking to it. At first, every node is assigned one unit as its PR value. After that, each node shares its PR value equally among its outgoing neighbors. Mathematically, the PR value of each node at step t is calculated as in Equation 4.
Where n is the total number of nodes in the network and $k_j^{out}$ is the out-degree of node j. The process stops when the PR values of all nodes reach the steady state.
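The iterative sharing of PR values can be sketched as a power iteration. The version below is the widely used damped variant, which is an assumption on our part and may differ in detail from Equation 4: it uses a damping parameter a, starts from 1/n per node rather than one unit (a difference of scale only), and spreads the value of nodes without outgoing links uniformly. The names `pagerank` and `out_links` are illustrative.

```python
def pagerank(out_links, a=0.85, tol=1e-10, max_iter=1000):
    """Damped PageRank by power iteration.

    out_links maps each node to a list of the nodes it points to; every
    target must also appear as a key (assumed representation).
    """
    nodes = list(out_links)
    n = len(nodes)
    pr = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Value held by nodes with no outgoing links is spread over all nodes.
        dangling = sum(pr[u] for u in nodes if not out_links[u])
        new = {u: (1 - a) / n + a * dangling / n for u in nodes}
        for j in nodes:
            if out_links[j]:
                share = a * pr[j] / len(out_links[j])
                for i in out_links[j]:
                    new[i] += share
        # Stop once the values have reached a steady state.
        if sum(abs(new[u] - pr[u]) for u in nodes) < tol:
            return new
        pr = new
    return pr


# Hypothetical directed 3-cycle and an undirected star stored as two-way links.
cycle_scores = pagerank({0: [1], 1: [2], 2: [0]})
star_scores = pagerank({0: [1, 2, 3], 1: [0], 2: [0], 3: [0]})
```

For an undirected network, each edge is entered in both directions, as in the star example.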

Closeness Centrality
To identify the influence of node u, closeness centrality summarizes the distance from node u to all other vertices in the network and is defined in Equation 5.
Where N = |V| is the number of nodes in the network and $d_{u,v}$ is the distance between u and v. For comparison with other centralities, closeness centrality is defined as in Equation 6, where n − 1 is the number of all other nodes in the network except u, the node for which closeness centrality is being computed. Closeness centrality is considered a better index than degree because it considers both direct and indirect links. The limitation of this measure is that it is not suitable for disconnected networks, because no finite distance exists between two nodes in different components.
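A minimal sketch of the normalized form, assuming it is (n − 1) divided by the sum of shortest-path distances from u, is given below. Distances are obtained by breadth-first search on an unweighted graph. Returning 0 for a node that cannot reach every other node is our own convention for the disconnected case discussed above, not something prescribed by the paper.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distance (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness_centrality(adj):
    """Normalized closeness, (n - 1) / sum of distances, for {node: set(neighbors)}."""
    n = len(adj)
    scores = {}
    for u in adj:
        dist = bfs_distances(adj, u)
        total = sum(dist.values())
        if len(dist) < n or total == 0:
            # Assumed convention: some node is unreachable, so no finite sum exists.
            scores[u] = 0.0
        else:
            scores[u] = (n - 1) / total
    return scores


# Hypothetical path graph 0 - 1 - 2.
closeness_scores = closeness_centrality({0: {1}, 1: {0, 2}, 2: {1}})
```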

SAM Centrality
Hop distance Analysis. Many researchers have invested effort in understanding online social networks (such as Facebook, Twitter, etc.) [43]. Social networks have many properties, such as the small-world property, degree distribution, scale-freeness [36] and clustering [27]. Separation metrics (i.e., diameter, radius, eccentricity, average path length, etc.), which are based on the shortest paths between all pairs of nodes [11], are needed to quantify these properties. These separation metrics are widely used in social networks to study and analyze the overall graph structure. Some commonly used metrics are as follows:
1. Graph Radius: The graph radius is the minimum eccentricity of any node in the graph. Considering Figure 2, the graph radius is 2, as node 4 has the minimum eccentricity of 2 in the graph.
2. Graph Diameter: The graph diameter is the maximum eccentricity of any node in the graph. Considering Figure 2, the graph diameter is 4, as nodes 1, 7 and 8 have the maximum eccentricity of 4.
3. Eccentricity: The eccentricity of a node is the length of the longest of its shortest paths to all other nodes. Considering the shortest paths from node 1 to all other nodes in Figure 2, the longest shortest path has length 4 (to nodes 7 and 8), which is the eccentricity of node 1.
4. Average Path Length: The average path length is defined as the average of the shortest path lengths over all pairs of nodes.
5. Shortest Path: Considering Figure 2, there are two paths between nodes 2 and 6: the first passes through node 4, and the second passes through nodes 1, 3 and 4. So, the shortest path between nodes 2 and 6 has length 2.
6. Hop Distance: Hop distance [25] is the number of hops (edges) on the shortest path between the source and destination nodes. Considering Figure 2, the hop distance between nodes 1 and 4 is 2.
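The metrics listed above can all be derived from breadth-first search distances, as the following sketch shows. The example graph is a hypothetical 8-node graph chosen only to be consistent with the values quoted for Figure 2 (radius 2 at node 4, diameter 4 at nodes 1, 7 and 8); it is not taken from the figure itself, and the sketch assumes a connected graph.

```python
from collections import deque


def hop_distances(adj, source):
    """Hop distance (number of edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def eccentricities(adj):
    """Eccentricity of each node: its largest hop distance to any other node."""
    return {u: max(hop_distances(adj, u).values()) for u in adj}


# Hypothetical graph consistent with the description of Figure 2.
graph = {
    1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3, 5, 6},
    5: {4}, 6: {4, 7, 8}, 7: {6}, 8: {6},
}

ecc = eccentricities(graph)
radius = min(ecc.values())      # smallest eccentricity
diameter = max(ecc.values())    # largest eccentricity

# Average shortest-path length over all ordered pairs of distinct nodes.
n = len(graph)
total = sum(sum(hop_distances(graph, u).values()) for u in graph)
average_path_length = total / (n * (n - 1))
```

Running one search per node is adequate for small graphs such as this one; for networks with hundreds of thousands of users, sampling of source nodes is a common way to approximate these metrics.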
Furthermore, these metrics were used in the analysis of the Facebook social graph by Nadeem et al. [1]. They analyzed almost 957K unique Facebook users and reported the following statistics:
• The highest degree of any node is 22.
• Of all nodes, 26.96% have a degree of only 1.
• The diameter of the network is 34.
• The average path length is almost 14.
• The average degree of the nodes is 3.051, which means that most of the nodes are only indirectly connected to each other.

Method.
In this paper, we propose the SAM centrality, which is based on hop distance in the network. It measures the influence of a user by considering the hop distance and the number of nodes connected at each hop distance. For example, consider the graph in Figure 3, where node 1 is directly connected to nodes 2 and 3, so both nodes are at hop h = 1 from node 1. Likewise, nodes 4, 5, 6 and 7 are indirectly connected to node 1 at hop distance h = 2. Here, nodes 4, 5, 6 and 7 are given less weight than nodes 2 and 3, which are directly connected. This process ends at the maximum hop h. In this way, the centrality of every node is computed as in Equation 7.
Where h is the maximum hop distance from node u, the node whose centrality is being calculated, $\sigma_i(x)$ is the number of nodes connected to node u at the i-th hop distance, and N is the total number of nodes in the graph. Moreover, the proposed centrality is normalized, i.e., a value of 1 means that the node influences 100% of the network and a value of 0 means that the node has no ability to influence the network. Put simply, in a star network all N − 1 other nodes are directly connected to the center node, so the center node gets the value 1. A node with value 0 is disconnected from the rest of the network, i.e., it has no adjacent neighbor node.
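To make the idea concrete, the sketch below computes a hop-weighted score of the kind described above. It is not a reproduction of Equation 7: the per-hop weight 1/i is a hypothetical choice made only for illustration, selected because it satisfies the properties stated in the text (nodes at larger hop distances count less, the center of a star scores 1, and an isolated node scores 0). The function names and the `weight` parameter are likewise our own.

```python
from collections import Counter, deque


def hop_counts(adj, u):
    """Count how many nodes sit at each hop distance i >= 1 from u."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return Counter(d for d in dist.values() if d > 0)


def hop_weighted_centrality(adj, weight=lambda i: 1.0 / i):
    """Hop-weighted score normalized by N - 1.

    weight(i) is the assumed contribution of one node at hop distance i;
    the default 1 / i is a placeholder, not the weighting of Equation 7.
    """
    n = len(adj)
    if n < 2:
        return {u: 0.0 for u in adj}
    return {
        u: sum(weight(i) * count for i, count in hop_counts(adj, u).items()) / (n - 1)
        for u in adj
    }


# Graph described for Figure 3. Which of nodes 2 and 3 each hop-2 node
# attaches to is an assumption.
fig3 = {1: {2, 3}, 2: {1, 4, 5}, 3: {1, 6, 7}, 4: {2}, 5: {2}, 6: {3}, 7: {3}}
hop_scores = hop_weighted_centrality(fig3)
```

Under this assumed weighting, node 1 collects full weight from its two direct neighbors and half weight from the four nodes at hop 2. A steeper weight function can be passed through the `weight` parameter to penalize distant nodes more strongly.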

Experiments and Results
In this experimental section, to demonstrate the utility of SAM centrality, we apply it to several datasets and compare the results with state-of-the-art centrality measures, namely PageRank, Degree, Closeness and Betweenness. The parameter a for PageRank is set to 0.85, as recommended in the literature by Page et al. [10]. As we consider only unweighted networks, any weights on the edges of a network are discarded. In addition, multiple edges between two nodes are simplified into a single edge. Here, we tested SAM centrality on four real networks (Zachary's Karate Club, Dolphin, Jakarta and Rhodes) with ground truth (i.e., which node is most important in the network). We observed that SAM centrality finds the correct results on these benchmark datasets.

Dataset 1: Zachary's Karate Club
Zachary's Karate Club is one of the earliest studied social networks. Wayne W. Zachary studied the Karate Club social network for three years, from 1970 to 1972. The network graph consists of 34 nodes and 78 edges. The nodes represent the members of the Karate Club, while the edges represent the relationships between members who interacted with each other outside the club. During his study, a conflict arose between the administrator of the club, "John A" (node 34), and the instructor of the club, "Mr. Hi" (node 1); as a result, the club split into two clubs. A few members formed another club along with Mr. Hi (the instructor), while the remaining members, along with John A (the administrator), found a new instructor for the club.
For the Karate Club social network, the normalized centrality scores of all centrality measures are listed in Table 2. The top individuals in the network are presented in bold. As can be seen in Table 2, the administrator (34) and the instructor (1) receive high scores from all centrality measures. Moreover, Figures 4 and 5 show the ranking and the centrality score given by each centrality measure. Here, most of the nodes receive a low ranking from the other centrality measures.

Dataset 2: Dolphin
Table 3 presents the centrality scores of all nodes from all centrality measures. Dolphin 36 receives high scores from all centrality measures. Moreover, Figures 6 and 7 show the ranking and the centrality scores of all centrality measures. In the case of ranking, there are more ups and downs, which shows that the centrality measures do not all agree with each other, as they assign both high and low ranks.

Dataset 3: Jakarta
In 2009, a terrorist attack took place in Jakarta, where two hotels were hit by a group of 28 terrorists. Nine people were killed in this attack. The Jakarta social network represents the relations between the terrorists who were directly or indirectly associated with the attack. The normalized centrality scores for the Jakarta social network are listed in Table 4. The top individual, 177, receives a high centrality score from all centrality measures. Furthermore, Figures 8 and 9 show the ranking of the nodes as well as the centrality scores.
The normalized centrality scores of the Rhodes network are listed in Table 5, where "Pavlos" and "Christodoulos" receive high scores from all centrality measures. Moreover, Figures 10 and 11 show the ranking and centrality score given by each centrality measure. Here, most of the nodes receive a low ranking from the other centrality measures. In terms of centrality score, closeness centrality is close to our proposed centrality, SAM. Besides, there is not much difference between the state-of-the-art centralities for the top few nodes.
We use the SIR (i.e., susceptible-infected-recovered) model [3] as the ranking criterion. Here, every node is initially susceptible except the first one, which is the infected node (i.e., the initial node). At every step, each infected node infects its neighbors that are still in the susceptible state with probability β. After infecting its neighbors, an infected node enters the recovered state with probability γ. Here we set γ = 1. When no infected node remains, the spreading process ends. To compare SAM centrality with the other centrality measures, we analyze Kendall's tau correlation [22].
A high τ means that the centrality measure performs better. We test the centrality measures with two commonly used infection probabilities (i.e., β = βc and β = 1.5βc), where βc is the threshold. The results are shown in 7.
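The evaluation procedure described above can be sketched as follows. This is an illustrative sketch, not the code used for the reported results: the spreading influence of a node is taken to be the number of recovered nodes when the outbreak started from that node dies out, and the correlation is the simple tau-a form of Kendall's tau, which ignores tie corrections. The function names, the `rng` parameter and the toy graph are assumptions made for the example.

```python
import random


def sir_outbreak_size(adj, seed_node, beta, gamma=1.0, rng=random):
    """Simulate one SIR run and return the final number of recovered nodes.

    adj maps each node to the set of its neighbors; gamma must be > 0,
    otherwise infected nodes never recover and the loop does not end.
    """
    infected = {seed_node}
    recovered = set()
    while infected:
        newly_infected = set()
        for v in infected:
            for w in adj[v]:
                susceptible = w not in infected and w not in recovered
                if susceptible and rng.random() < beta:
                    newly_infected.add(w)
        # After trying to infect its neighbors, each infected node
        # recovers with probability gamma.
        recovering = {v for v in infected if rng.random() < gamma}
        recovered |= recovering
        infected = (infected - recovering) | newly_infected
    return len(recovered)


def kendall_tau(x, y):
    """Kendall's tau-a between two equally long score lists."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    pairs = n * (n - 1) / 2
    return (concordant - discordant) / pairs if pairs else 0.0


# Hypothetical path graph: with beta = 1 and gamma = 1 every node is reached.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
full_outbreak = sir_outbreak_size(path, 0, beta=1.0, rng=random.Random(0))
```

In practice, the outbreak size for each seed node would be averaged over many runs, and the resulting list compared with each centrality's scores through `kendall_tau`.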

Conclusion
In this paper, we have proposed a hop-based centrality measure, SAM. We have experimented on four datasets and compared our centrality with four other state-of-the-art centrality measures. Our proposed centrality, SAM, penalizes hop distance heavily in order to rank the nodes in the network. The results show that SAM is better at finding the importance of a node in the network. In the future, we will try to use the inverse distance of nodes in order to rank the users of the network.

Figure 1. Example of Link Prediction

Figure 2. Example of Social Network

Figure 3. Example of Hop Distance Between Nodes

Figure 4. Ranking of top 15 nodes from Zachary's Karate Club, sorted by SAM centrality

Figure 5. Centrality score of top 15 nodes from Zachary's Karate Club, sorted by SAM centrality

Figure 6. Ranking of top 15 nodes from Dolphin dataset, sorted by SAM centrality

Dataset 4: Rhodes
Rhodes is a social network of the Greek terrorist group known as N17 (November 17). It was collected from reporting (Abram and Smith 2004; Irwin et al. 2002). In the social graph, a relation indicates that reporting has confirmed the link between two

Figure 8. Ranking of top 15 nodes from Jakarta dataset, sorted by SAM centrality

Figure 10. Ranking of top 15 nodes from Rhodes dataset, sorted by SAM centrality

Table 1. Notations Used in Results

Table 2. Centrality Score of Top 10 Nodes, From Zachary's Karate Club, Sorted by SAM Centrality

Table 3. Centrality Score of Top 10 Nodes, From Dolphin Dataset, Sorted by SAM Centrality

Table 4. Centrality Score of Top 10 Nodes, From Jakarta Dataset, Sorted by SAM Centrality

Table 5. Centrality Score of Top 10 Nodes, From Rhodes Dataset, Sorted by SAM Centrality

Table
Stats of Five More Datasets