International Workshop on Collaborative Big Data

Research Article

Using a Distributed Search Engine to Identify Optimal Product Sets for Use in an Outbreak Detection System

  • @INPROCEEDINGS{10.4108/icst.collaboratecom.2012.250728,
        author={Ruhsary Rexit and Fuchiang (Rich) Tsui and Jeremy Espino and Sahawut Wesaratchakit and Ye Ye and Panos Chrysanthis},
        title={Using a Distributed Search Engine to Identify Optimal Product Sets for Use in an Outbreak Detection System},
        proceedings={International Workshop on Collaborative Big Data},
        publisher={IEEE},
        proceedings_a={C-BIG},
        year={2012},
        month={12},
        keywords={distributed search syndromic surveillance outbreak detection time series analysis},
        doi={10.4108/icst.collaboratecom.2012.250728}
    }
    
  • Ruhsary Rexit
    Fuchiang (Rich) Tsui
    Jeremy Espino
    Sahawut Wesaratchakit
    Ye Ye
    Panos Chrysanthis
    Year: 2012
    Using a Distributed Search Engine to Identify Optimal Product Sets for Use in an Outbreak Detection System
    C-BIG
    ICST
    DOI: 10.4108/icst.collaboratecom.2012.250728
Ruhsary Rexit¹, Fuchiang (Rich) Tsui¹, Jeremy Espino¹, Sahawut Wesaratchakit¹, Ye Ye¹, Panos Chrysanthis¹,*
  • 1: University of Pittsburgh
*Contact email: panos@cs.pitt.edu

Abstract

This study tests an approach for identifying sets of over-the-counter (OTC) thermometer products whose aggregate sales correlate optimally with aggregate counts of emergency department (ED) visits by patients with symptoms consistent with Constitutional syndrome, such as fever and chills. We show that by combining a distributed search engine with a brute-force search algorithm, we can quickly identify a minimum set of OTC thermometer products whose sales are optimally correlated with the ED data. We used the Pearson correlation coefficient to measure the degree of correlation between the OTC and ED time series. The optimal OTC product set, comprising 9 thermometer products found by the brute-force algorithm, has a correlation coefficient of 0.96. We believe this approach can be used to efficiently identify optimal OTC product sets for detecting other types of disease outbreaks.
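The search described in the abstract can be illustrated with a minimal single-machine sketch: enumerate product subsets, sum their sales into one aggregate series, and keep the subset whose aggregate has the highest Pearson correlation with the ED series. This is not the authors' distributed implementation. The function names (`pearson`, `best_product_set`), the input layout (a dict mapping product name to a sales series), and the tie-breaking tolerance are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant series has no defined correlation; treat it as 0 here.
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_product_set(product_sales, ed_counts):
    """Brute-force search over all non-empty product subsets.

    product_sales: dict of product name -> sales series (hypothetical layout).
    ed_counts: ED visit counts over the same time periods.
    Returns (subset, correlation). Subsets are visited from smallest to
    largest and replaced only on a clear improvement, so the smallest
    subset wins among ties.
    """
    names = sorted(product_sales)
    best_r, best_set = float("-inf"), ()
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            # Aggregate sales of the subset, period by period.
            series = [product_sales[p] for p in subset]
            aggregate = [sum(values) for values in zip(*series)]
            r = pearson(aggregate, ed_counts)
            if r > best_r + 1e-12:  # assumed tolerance for float noise
                best_r, best_set = r, subset
    return best_set, best_r


# Synthetic data for illustration only; not from the study.
ed = [10, 12, 15, 20, 18, 14]
sales = {
    "A": [1, 2, 3, 4, 5, 6],
    "B": [5, 6, 7.5, 10, 9, 7],
    "C": [3, 3, 3, 3, 3, 3],
}
chosen, r = best_product_set(sales, ed)
```

With n candidate products the loop evaluates 2^n − 1 subsets, which is why an exhaustive search of this kind benefits from being spread across many workers.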