Innovations and Interdisciplinary Solutions for Underserved Areas. 4th EAI International Conference, InterSol 2020, Nairobi, Kenya, March 8-9, 2020, Proceedings

Research Article

Enriching Geolocalized Dataset with POIs Descriptions at Large Scale

Cite
BibTeX Plain Text
  • @INPROCEEDINGS{10.1007/978-3-030-51051-0_19,
        author={Ibrahima Gueye and Hubert Naacke and St\'{e}phane Gan\c{c}arski},
        title={Enriching Geolocalized Dataset with POIs Descriptions at Large Scale},
        booktitle={Innovations and Interdisciplinary Solutions for Underserved Areas. 4th EAI International Conference, InterSol 2020, Nairobi, Kenya, March 8-9, 2020, Proceedings},
        proceedings_a={INTERSOL},
        year={2020},
        month={8},
        keywords={YFCC large scale dataset, Distributed query processing, Spatial join, Apache Spark, POI recommendation},
        doi={10.1007/978-3-030-51051-0_19}
    }
    
  • Ibrahima Gueye
    Hubert Naacke
    Stéphane Gançarski
    Year: 2020
    Enriching Geolocalized Dataset with POIs Descriptions at Large Scale
    INTERSOL
    Springer
    DOI: 10.1007/978-3-030-51051-0_19
Ibrahima Gueye (1, *), Hubert Naacke (2), Stéphane Gançarski (2)
  • 1: Ecole Polytechnique de Thiès
  • 2: Sorbonne Université, CNRS, Laboratoire d’Informatique de Paris 6, LIP6
*Contact email: igueye@ept.sn

Abstract

We present an efficient method to enrich a geolocalized dataset with contextual descriptions of Points of Interest (POIs). We implemented our solution using two large-scale datasets: YFCC [14] and Geonames [2]. A practical problem we encountered is the size of the data involved: the YFCC geolocalized dataset contains 45 million entries, which we join with 12 million Geonames POIs. We show that using the Apache Spark cluster computing platform and the GeoSpark [18] spatial join library as-is leads to inefficient computation because of the strong skew in the data. We propose a method to distribute the data non-uniformly according to this skew, which greatly improves spatial join performance. Moreover, we propose a method to select, among a set of nearby POIs, those that are most relevant to the YFCC entries. The resulting enriched dataset will be made publicly available and should help to better validate future work on large-scale POI recommendation.
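
The idea of distributing data non-uniformly can be illustrated with a minimal sketch. This is not the authors' implementation, and it uses neither Apache Spark nor GeoSpark. It assumes a simple quadtree-style split in which a region is subdivided only while it holds too many points, so dense areas end up in many small partitions and sparse areas in a few large ones. The function name `split_adaptive`, the threshold `max_points`, and the synthetic data are hypothetical and chosen only for illustration.

```python
import random


def split_adaptive(points, bbox, max_points, max_depth=20):
    """Recursively split bbox into quadrants until each holds <= max_points.

    points: list of (lon, lat) tuples; bbox: (xmin, ymin, xmax, ymax).
    Returns a list of (bbox, points) leaf partitions.
    max_depth bounds the recursion in case many points share one location.
    """
    if len(points) <= max_points or max_depth == 0:
        return [(bbox, points)]
    xmin, ymin, xmax, ymax = bbox
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    # Keyed by (is_east, is_north); each point falls into exactly one quadrant.
    quadrants = {
        (False, False): ((xmin, ymin, xmid, ymid), []),
        (True, False): ((xmid, ymin, xmax, ymid), []),
        (False, True): ((xmin, ymid, xmid, ymax), []),
        (True, True): ((xmid, ymid, xmax, ymax), []),
    }
    for x, y in points:
        quadrants[(x >= xmid, y >= ymid)][1].append((x, y))
    leaves = []
    for child_bbox, child_points in quadrants.values():
        leaves.extend(
            split_adaptive(child_points, child_bbox, max_points, max_depth - 1)
        )
    return leaves


# Synthetic, deliberately skewed data (an assumption, not taken from YFCC):
# 90% of the points sit in one small area, 10% are spread over the globe.
rng = random.Random(0)
dense = [(rng.uniform(2.2, 2.5), rng.uniform(48.8, 48.9)) for _ in range(9000)]
sparse = [(rng.uniform(-180, 180), rng.uniform(-90, 90)) for _ in range(1000)]
points = dense + sparse

partitions = split_adaptive(points, (-180.0, -90.0, 180.0, 90.0), max_points=500)
largest = max(len(part) for _, part in partitions)
print(f"{len(partitions)} partitions, largest holds {largest} points")
```

With a uniform grid, the cell covering the dense area would receive most of the points and dominate the join time. Here every leaf is capped at `max_points`, so the work per partition stays bounded.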

Keywords
YFCC large scale dataset, Distributed query processing, Spatial join, Apache Spark, POI recommendation
Published
2020-08-06
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-030-51051-0_19