8th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing

Research Article

Tweecalization: Efficient and Intelligent Location Mining in Twitter Using Semi-Supervised Learning

  • @INPROCEEDINGS{10.4108/icst.collaboratecom.2012.250520,
        author={Satyen Abrol and Latifur Khan and Bhavani Thuraisingham},
        title={Tweecalization: Efficient and Intelligent Location Mining in Twitter Using Semi-Supervised Learning},
        booktitle={8th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing},
        publisher={IEEE},
        proceedings_a={COLLABORATECOM},
        year={2012},
        month={12},
        keywords={twitter, location mining, label propagation, social computing},
        doi={10.4108/icst.collaboratecom.2012.250520}
    }
    
  • Satyen Abrol
    Latifur Khan
    Bhavani Thuraisingham
    Year: 2012
    Tweecalization: Efficient and Intelligent Location Mining in Twitter Using Semi-Supervised Learning
    COLLABORATECOM
    ICST
    DOI: 10.4108/icst.collaboratecom.2012.250520
Satyen Abrol¹,*, Latifur Khan¹, Bhavani Thuraisingham¹
  • ¹ University of Texas at Dallas
*Contact email: abrol@utdallas.edu

Abstract

Geosocial networking is a fast-growing trend, with social networks offering services that let users associate locations with their profiles. However, for privacy and security reasons, most users of social networking sites such as Twitter are unwilling to provide a location in their profiles. This creates a need for an algorithm that predicts a user's location from the implicit attributes associated with that user. In this paper, we develop a tool, Tweecalization, that predicts a user's location purely on the basis of the user's social network, using the strong theoretical framework of semi-supervised learning. In particular, we employ the label propagation algorithm. On the city locations returned by the algorithm, the system performs agglomerative clustering based on geospatial proximity and the locations' individual scores to return a cluster of locations with higher confidence. We perform extensive experiments to show the validity of our system in terms of both accuracy and running time. Experimental results show that Tweecalization outperforms the content-based geo-tagging approach and the Tweethood algorithm in both accuracy and running time.
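
To make the label propagation step concrete, here is a minimal Python sketch of the general technique on a toy friendship graph. It is not the authors' implementation: the function name `propagate_labels`, the edge weights, the user names, and the fixed iteration count are all assumptions chosen for illustration, and the abstract does not say how Tweecalization weights edges or detects convergence. Users with a known city are clamped to that city, and every other user repeatedly takes the weighted average of its neighbors' city scores.

```python
def propagate_labels(edges, seeds, iterations=50):
    """Spread known city labels over a weighted, undirected friendship graph.

    edges: {(user_a, user_b): weight}  (hypothetical positive weights)
    seeds: {user: known_city}          (users whose location is known)
    Returns {user: {city: score}}, with each user's scores summing to 1.
    """
    # Build symmetric adjacency maps from the edge list.
    neighbors = {}
    for (a, b), weight in edges.items():
        neighbors.setdefault(a, {})[b] = weight
        neighbors.setdefault(b, {})[a] = weight

    cities = sorted(set(seeds.values()))

    def clamped(user):
        # A labeled user keeps all of its mass on its known city.
        return {city: 1.0 if city == seeds[user] else 0.0 for city in cities}

    # Unlabeled users start from a uniform distribution over cities.
    uniform = {city: 1.0 / len(cities) for city in cities}
    scores = {u: clamped(u) if u in seeds else dict(uniform) for u in neighbors}

    for _ in range(iterations):
        updated = {}
        for user, nbrs in neighbors.items():
            if user in seeds:
                updated[user] = clamped(user)
                continue
            # Weighted average of the neighbors' current city scores.
            total = sum(nbrs.values())
            updated[user] = {
                city: sum(w * scores[v][city] for v, w in nbrs.items()) / total
                for city in cities
            }
        scores = updated
    return scores


# Toy graph (assumed data): "target" and "d" have no known location.
edges = {
    ("target", "a"): 2.0,
    ("target", "b"): 1.0,
    ("target", "c"): 1.0,
    ("target", "d"): 1.0,
    ("d", "c"): 1.0,
}
seeds = {"a": "Dallas", "b": "Dallas", "c": "Austin"}

scores = propagate_labels(edges, seeds)
best_city = max(scores["target"], key=scores["target"].get)
print(best_city, round(scores["target"]["Dallas"], 3))  # Dallas 0.667
```

In this toy graph the unlabeled user `d` pulls `target` slightly toward Austin, which is why the Dallas score settles at about 0.667 rather than the 0.75 that the labeled neighbors alone would give. The agglomerative clustering of the resulting city scores is not shown here.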