5th International ICST Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services

Research Article

Accessing Speech Documents on Smartphones

  • @INPROCEEDINGS{10.4108/ICST.MOBIQUITOUS2008.3635,
        author={Marcel-Cătălin Rosu},
        title={Accessing Speech Documents on Smartphones},
        proceedings={5th International ICST Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services},
        publisher={ICST},
        proceedings_a={MOBIQUITOUS},
        year={2010},
        month={5},
        keywords={Speech archive search smartphone Lucene},
        doi={10.4108/ICST.MOBIQUITOUS2008.3635}
    }
    
  • Marcel-Cătălin Rosu
    Year: 2010
    Accessing Speech Documents on Smartphones
    MOBIQUITOUS
    ICST
    DOI: 10.4108/ICST.MOBIQUITOUS2008.3635
Marcel-Cătălin Rosu¹,*
  • 1: IBM T. J. Watson Research Center, 19 Skyline Dr, Hawthorne, NY 10532, 1 914 784 7242.
*Contact email: rosu@us.ibm.com

Abstract

This paper introduces BBSearch, an experimental system for exploring the challenges of ubiquitous access to recorded speech data. BBSearch applies information retrieval (IR) techniques to transcripts obtained by automatic speech recognition, and it aims to provide a uniform user experience across platforms. To provide identical search functionality and document ranking, BBSearch applications use the same IR library for indexing and retrieval, namely Apache Lucene. For Java-enabled mobile platforms, BBSearch uses our J2ME Lucene port, called LuceneME. This paper explores the resource requirements of LuceneME when used for Boolean searches and for supporting the podcast navigation GUI. On a BlackBerry smartphone, a diverse set of queries against a 70-hour corpus complete in less than 3 seconds and use less than 2 MB of memory. The results of the evaluation validate our design and warrant expanding BBSearch to less capable cellphones, larger corpora, or more complex search capabilities.