10th EAI International Conference on Performance Evaluation Methodologies and Tools

Research Article

A Queueing Network Model for Performance Prediction of Apache Cassandra

  • @INPROCEEDINGS{10.4108/eai.25-10-2016.2266606,
        author={Salvatore Dipietro and Giuliano Casale and Giuseppe Serazzi},
        title={A Queueing Network Model for Performance Prediction of Apache Cassandra},
        proceedings={10th EAI International Conference on Performance Evaluation Methodologies and Tools},
        publisher={ACM},
        proceedings_a={VALUETOOLS},
        year={2017},
        month={5},
        keywords={nosql database apache cassandra queueing network model simulation},
        doi={10.4108/eai.25-10-2016.2266606}
    }
    
  • Salvatore Dipietro
    Giuliano Casale
    Giuseppe Serazzi
    Year: 2017
    A Queueing Network Model for Performance Prediction of Apache Cassandra
    VALUETOOLS
    ACM
    DOI: 10.4108/eai.25-10-2016.2266606
Salvatore Dipietro1,*, Giuliano Casale1, Giuseppe Serazzi2
  • 1: Imperial College London, UK
  • 2: Politecnico di Milano, Italy
*Contact email: s.dipietro14@imperial.ac.uk

Abstract

NoSQL databases such as Apache Cassandra have attracted considerable interest in recent years thanks to their high availability, scalability, flexibility, and low latency. Still, there is limited research on performance engineering methods for NoSQL databases, even though such methods are needed: these systems are highly distributed and can therefore incur significant cost/performance trade-offs. To address this need, we propose a novel queueing network model for the Cassandra NoSQL database aimed at supporting resource provisioning. The model explicitly represents key Cassandra configuration parameters, such as consistency level and replication factor, allowing engineers to compare alternative system setups.

Experimental results based on the YCSB benchmark indicate that, with a small amount of training to estimate its input parameters, the proposed model achieves good predictive accuracy across different loads and consistency levels. The average prediction errors of the model relative to measured results are between 6% and 10%. We also demonstrate the applicability of our model to other NoSQL databases and outline other possible uses for it.
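
The abstract does not describe the structure of the proposed model, so the following is only a generic illustration of how a closed queueing network can turn service demands into a throughput prediction and how such a prediction might be scored against a measurement. It uses textbook exact Mean Value Analysis for a single-class network, which is not claimed to be the authors' method. The function names, the station demands, and the think time are all hypothetical.

```python
from typing import List, Tuple


def mva(demands: List[float], think_time: float, n_clients: int) -> Tuple[float, float]:
    """Exact Mean Value Analysis for a closed, single-class network.

    demands    -- service demand (seconds per request) at each queueing station
                  (hypothetical values, e.g. CPU and disk of one node)
    think_time -- client think time in seconds (modelled as a delay station)
    n_clients  -- number of concurrent clients

    Returns (throughput in requests/second, response time in seconds).
    """
    queue = [0.0] * len(demands)  # mean queue length per station with 0 clients
    throughput = 0.0
    response = 0.0
    for n in range(1, n_clients + 1):
        # Arrival theorem: an arriving request sees the queue of the (n-1)-client network.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(residence)
        throughput = n / (think_time + response)
        # Little's law applied per station.
        queue = [throughput * r for r in residence]
    return throughput, response


def relative_error(predicted: float, measured: float) -> float:
    """Absolute relative error of a prediction against a measured value."""
    return abs(predicted - measured) / measured


if __name__ == "__main__":
    # Hypothetical demands for two stations; not taken from the paper.
    x, r = mva(demands=[0.005, 0.008], think_time=0.0, n_clients=50)
    print(f"predicted throughput: {x:.1f} req/s, response time: {r * 1000:.1f} ms")
```

In a sketch like this, the predicted throughput can never exceed the reciprocal of the largest demand, which is the kind of bottleneck reasoning a provisioning study relies on when comparing setups.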