Research Article
Distribution, Correlation and Prediction of Response Times in Stack Overflow
@INPROCEEDINGS{10.4108/icst.collaboratecom.2014.257265,
  author={Preeti Arunapuram and Jacob Bartel and Prasun Dewan},
  title={Distribution, Correlation and Prediction of Response Times in Stack Overflow},
  proceedings={10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing},
  publisher={IEEE},
  proceedings_a={COLLABORATECOM},
  year={2014},
  month={11},
  keywords={online forums response time prediction stack overflow},
  doi={10.4108/icst.collaboratecom.2014.257265}
}
- Preeti Arunapuram
- Jacob Bartel
- Prasun Dewan
Year: 2014
COLLABORATECOM
IEEE
DOI: 10.4108/icst.collaboratecom.2014.257265
Abstract
Sending a message raises two important questions about its response: When will the first response arrive? When will the first acceptable response arrive? These questions can be partly or completely answered by identifying distributions of response times, correlating features with response times, and/or predicting the actual response times. We address the distribution, correlation, and prediction of response times in Stack Overflow, based on an analysis of over two million question-answer threads. We found no strong correlation between response times and features studied in other messaging domains: (a) the use of various kinds of pronouns and punctuation, and (b) the time of day and day of week when messages were sent. We found that title length shows a quadratic relationship with median response time and that mean response times vary according to the tags used in a post. We explored a large design space of prediction algorithms based on the distributions of response times. These approaches predicted ranges of time that were determined automatically using a clustering algorithm. The best results were given by an approach that uses an index-based weighted-average algorithm, introduced here, to combine the most frequent time ranges in the distributions for the tags in a post.
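The abstract does not spell out the index-based weighted-average algorithm, so the following Python sketch is only one possible reading of the idea, not the authors' implementation. It assumes that time ranges are numbered by index, that each tag contributes the index of its most frequent range, and that each contribution is weighted by how many past posts fell in that range. The range boundaries in `RANGE_EDGES` and the function names are hypothetical; the paper derives its ranges by clustering, which is not reproduced here.

```python
from bisect import bisect_right
from collections import Counter, defaultdict

# Hypothetical lower bounds of the time ranges, in minutes (an assumption;
# the paper obtains its ranges automatically with a clustering algorithm).
RANGE_EDGES = [0, 10, 60, 1440]


def range_index(minutes, edges=RANGE_EDGES):
    """Return the index of the time range that contains `minutes`."""
    return bisect_right(edges, minutes) - 1


def tag_distributions(history):
    """Build a per-tag histogram of time-range indices.

    `history` is an iterable of (tags, response_minutes) pairs.
    """
    dists = defaultdict(Counter)
    for tags, minutes in history:
        for tag in tags:
            dists[tag][range_index(minutes)] += 1
    return dists


def predict_range(tags, dists):
    """Combine the most frequent range index of each known tag.

    Assumed weighting: each tag's modal index is weighted by the number of
    past posts with that tag that fell in the modal range. Returns the
    rounded weighted-average index, or None if no tag has been seen before.
    """
    weighted_sum = 0
    total_weight = 0
    for tag in tags:
        if tag not in dists:
            continue
        mode_idx, mode_count = dists[tag].most_common(1)[0]
        weighted_sum += mode_idx * mode_count
        total_weight += mode_count
    if total_weight == 0:
        return None
    return round(weighted_sum / total_weight)
```

Under these assumptions, a post whose tags have different modal ranges is assigned a range between them, pulled toward the tag with more supporting posts. Other weightings (for example, by the relative frequency of the modal range) would fit the abstract's description equally well.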