1st International ICST Conference on Communication System Software and MiddleWare

Research Article

A Measurement Study of the Linux TCP/IP Stack Performance and Scalability on SMP systems

  • @INPROCEEDINGS{10.1109/COMSWA.2006.1665153,
        author={Shourya P. Bhattacharya and Varsha Apte},
        title={A Measurement Study of the Linux TCP/IP Stack Performance and Scalability on SMP systems},
        booktitle={1st International ICST Conference on Communication System Software and MiddleWare},
        series={COMSWARE},
        publisher={IEEE},
        year={2006},
        month={8},
        doi={10.1109/COMSWA.2006.1665153}
    }
    
  • Shourya P. Bhattacharya
    Varsha Apte
    Year: 2006
    A Measurement Study of the Linux TCP/IP Stack Performance and Scalability on SMP systems
    COMSWARE
    IEEE
    DOI: 10.1109/COMSWA.2006.1665153
Shourya P. Bhattacharya [1,2,*], Varsha Apte [3,2,*]
  • 1: Kanwal Rekhi School of Information Technology
  • 2: Indian Institute of Technology, Bombay
  • 3: Computer Science and Engineering Department
*Contact email: shourya@it.iitb.ac.in, varsha@cse.iitb.ac.in

Abstract

The performance of an operating system's protocol stack implementation can greatly impact the performance of the networked applications that run on it. In this paper, we present a thorough measurement study and comparison of the network stack performance of two popular Linux kernels, 2.4 and 2.6, with a special focus on their performance on SMP architectures. Our findings reveal that interrupt processing costs, device driver overheads, checksumming, and buffer copying are the dominant overheads of protocol processing. We find that although raw CPU costs are not very different between the two kernels, Linux 2.6 shows vastly improved scalability, which we attribute to better scheduling and kernel locking mechanisms. We also uncover an anomalous behavior in which Linux 2.6 performance degrades when packet processing for a single connection is distributed over multiple processors. This behavior, however, confirms the superiority of the "processor per connection" model for parallel processing.
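
The "processor per connection" model named in the abstract can be illustrated with a small user-space sketch. This is not the kernel mechanism studied in the paper and not the authors' code; it only shows the idea that every packet of a given connection is handled by the same processor. The helper names `cpu_for_connection` and `dispatch`, the dictionary packet layout, and the use of CRC32 as the hash are all assumptions made for illustration.

```python
import zlib


def cpu_for_connection(src_ip, src_port, dst_ip, dst_port, n_cpus):
    """Map a connection's 4-tuple to one CPU index (hypothetical helper)."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}".encode("ascii")
    # CRC32 is deterministic across runs, unlike Python's built-in hash() on str.
    return zlib.crc32(key) % n_cpus


def dispatch(packets, n_cpus):
    """Place packets into per-CPU queues so each connection stays on one CPU."""
    queues = [[] for _ in range(n_cpus)]
    for pkt in packets:
        cpu = cpu_for_connection(
            pkt["src_ip"], pkt["src_port"], pkt["dst_ip"], pkt["dst_port"], n_cpus
        )
        queues[cpu].append(pkt)
    return queues


if __name__ == "__main__":
    # Two made-up connections, three packets each, interleaved on arrival.
    conn_a = {"src_ip": "10.0.0.1", "src_port": 40000, "dst_ip": "10.0.0.9", "dst_port": 80}
    conn_b = {"src_ip": "10.0.0.2", "src_port": 40001, "dst_ip": "10.0.0.9", "dst_port": 80}
    arrivals = [dict(c, seq=i) for i in range(3) for c in (conn_a, conn_b)]
    for cpu, queue in enumerate(dispatch(arrivals, n_cpus=2)):
        print(cpu, [(p["src_ip"], p["seq"]) for p in queue])
```

Because the CPU index depends only on the 4-tuple, packets of one connection never migrate between processors and keep their arrival order within a queue. The alternative the abstract reports as degrading Linux 2.6 performance would correspond to spreading a single connection's packets over several queues.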