10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing

Research Article

Distribution, Correlation and Prediction of Response Times in Stack Overflow

  • @INPROCEEDINGS{10.4108/icst.collaboratecom.2014.257265,
        author={Preeti Arunapuram and Jacob Bartel and Prasun Dewan},
        title={Distribution, Correlation and Prediction of Response Times in Stack Overflow},
        booktitle={10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing},
        publisher={IEEE},
        proceedings_a={COLLABORATECOM},
        year={2014},
        month={11},
        keywords={online forums response time prediction stack overflow},
        doi={10.4108/icst.collaboratecom.2014.257265}
    }
    
  • Preeti Arunapuram
    Jacob Bartel
    Prasun Dewan
    Year: 2014
    Distribution, Correlation and Prediction of Response Times in Stack Overflow
    COLLABORATECOM
    IEEE
    DOI: 10.4108/icst.collaboratecom.2014.257265
Preeti Arunapuram1, Jacob Bartel2, Prasun Dewan2,*
  • 1: Oracle
  • 2: University of North Carolina
*Contact email: dewan@cs.unc.edu

Abstract

Sending a message raises two important questions about its response: When will the first response arrive? When will the first acceptable response arrive? These questions can be partly or completely answered by identifying distributions of response times, correlating features with response times, and/or predicting the actual response times. We address the distribution, correlation, and prediction of response times in Stack Overflow, based on an analysis of over two million question-answer threads. We found no strong correlation between response times and features studied in other messaging domains: (a) the use of various kinds of pronouns and punctuation, and (b) the time of day and day of week when messages were sent. We found that title length has a quadratic relationship with median response time and that mean response times vary with the tags used in a post. We explored a large design space of prediction algorithms based on the distributions of response times. These approaches predicted ranges of time that were determined automatically using a clustering algorithm. The best results were given by an approach that uses an index-based weighted-average algorithm, introduced here, to combine the most frequent time ranges in the distributions for the tags in a post.
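The abstract does not spell out the index-based weighted-average algorithm, so the following Python sketch is only one plausible reading of the idea, not the authors' implementation. It assumes that time ranges are ordered and identified by an index, that each tag is summarized by its most frequent range, and that a post's prediction is the average of its tags' modal indices weighted by how many threads each tag has. The range boundaries in `RANGE_EDGES`, the weighting scheme, and all function names are hypothetical; the paper derives its ranges with a clustering algorithm rather than fixed edges.

```python
from bisect import bisect_right
from collections import Counter, defaultdict

# Hypothetical range boundaries in minutes (assumption; the paper clusters instead).
# They define five ordered ranges: <15, 15-60, 60-240, 240-1440, >=1440.
RANGE_EDGES = [15, 60, 240, 1440]


def range_index(minutes):
    """Map a response time in minutes to the index of the range containing it."""
    return bisect_right(RANGE_EDGES, minutes)


def fit_tag_modes(threads):
    """Summarize each tag by its most frequent range index.

    threads: iterable of (tags, response_minutes) pairs.
    Returns {tag: (modal_range_index, number_of_threads_with_tag)}.
    """
    counts = defaultdict(Counter)
    for tags, minutes in threads:
        idx = range_index(minutes)
        for tag in tags:
            counts[tag][idx] += 1
    return {
        tag: (c.most_common(1)[0][0], sum(c.values()))
        for tag, c in counts.items()
    }


def predict_range(tags, tag_modes):
    """Combine the tags' modal indices by a thread-count-weighted average.

    The weighting by thread count is an assumption of this sketch.
    Returns the nearest range index, or None if no tag has been seen before.
    """
    known = [tag_modes[t] for t in tags if t in tag_modes]
    if not known:
        return None
    total = sum(n for _, n in known)
    average = sum(idx * n for idx, n in known) / total
    return int(average + 0.5)  # round half up to the nearest index


# Toy data for illustration only; not taken from the paper.
history = [
    (["python"], 5),
    (["python"], 10),
    (["python"], 100),
    (["regex"], 300),
    (["regex"], 500),
    (["python", "regex"], 20),
]
modes = fit_tag_modes(history)
prediction = predict_range(["python", "regex"], modes)
```

Averaging indices rather than raw times keeps the prediction on the same discrete scale as the ranges, so a post tagged with one fast-answered tag and one slow-answered tag lands in an intermediate range instead of being dominated by the slower tag's long tail.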