3rd International ICST Symposium on Information Assurance and Security

Research Article

Inclusion of a Montgomery Multiplier Unit into an Embedded Processor's Datapath to Speed-up Elliptic Curve Cryptography

@INPROCEEDINGS{10.1109/IAS.2007.81,
    author={S. Bartolini and G. Castagnini and E. Martinelli},
    title={Inclusion of a Montgomery Multiplier Unit into an Embedded Processor's Datapath to Speed-up Elliptic Curve Cryptography},
    proceedings={3rd International ICST Symposium on Information Assurance and Security},
    publisher={IEEE},
    proceedings_a={IAS},
    year={2007},
    month={9},
    keywords={Elliptic-curve cryptography, instruction-set extensions, performance evaluation, security in embedded systems, special-purpose unit},
    doi={10.1109/IAS.2007.81}
}
    
S. Bartolini¹,*, G. Castagnini¹, E. Martinelli¹
¹ Dipartimento di Ingegneria dell’Informazione, via Roma 56, 53100 Siena, Italy
*Contact email: bartolini@dii.unisi.it

Abstract

This paper analyzes the effects of including a full-width GF(2^m) Montgomery multiplier in the datapath of an existing embedded processor to speed up elliptic curve cryptography (ECC). The approach aims to exploit the tight coupling between the new unit and the other processor modules. It also preserves software compatibility and the flexibility to adapt to different ECC parameters and algorithms. The present work also examines how the interaction between the new unit and the rest of the processor affects performance. We show that the modified ARM processor executes the critical ECC operation, scalar multiplication (kP), 9 times faster than a pure software implementation. Using 3 units with optimized instruction scheduling raises the speedup to 14 times. Moreover, the improved processor achieves the same performance with caches one quarter the size, thanks to a reduction in memory traffic of more than 93%.
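To make the core operation concrete, here is a minimal software sketch of Montgomery multiplication in GF(2^m) with a polynomial basis. It uses the textbook bit-serial formulation, which computes c = a·b·x^(-m) mod f(x). The paper describes its unit only as a full-width GF(2^m) Montgomery multiplier and does not give its internal organization here. This loop is therefore an illustrative model, not the authors' hardware design. The function names `mulmod`, `mont_mul`, `to_mont` and `from_mont` are hypothetical. The integer encoding (bit i holds the coefficient of x^i) is also an assumption made for the example.

```python
# Hypothetical sketch: bit-serial Montgomery multiplication in GF(2^m),
# polynomial basis. Field elements are ints; bit i is the coefficient of x^i.
# f is the reduction polynomial (degree m, constant term 1).


def mulmod(a: int, b: int, f: int, m: int) -> int:
    """Reference product a*b mod f via shift-and-add (a must have degree < m)."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # add a * x^i (addition in GF(2) is XOR)
        b >>= 1
        a <<= 1              # a *= x
        if (a >> m) & 1:     # degree reached m: subtract (XOR) f
            a ^= f
    return result


def mont_mul(a: int, b: int, f: int, m: int) -> int:
    """Return a * b * x^(-m) mod f, consuming one bit of a per iteration."""
    if not f & 1:
        raise ValueError("f(0) must be 1 so that x is invertible mod f")
    c = 0
    for i in range(m):
        if (a >> i) & 1:
            c ^= b           # c += a_i * b
        if c & 1:
            c ^= f           # c += c_0 * f, which clears the x^0 term
        c >>= 1              # exact division by x
    return c


def to_mont(a: int, f: int, m: int) -> int:
    """Map a into the Montgomery domain: a * x^m mod f."""
    return mulmod(a, 1 << m, f, m)


def from_mont(c: int, f: int, m: int) -> int:
    """Map c back out of the Montgomery domain: c * x^(-m) mod f."""
    return mont_mul(c, 1, f, m)
```

The key property is that each iteration replaces a reduction step with a conditional XOR of f and a right shift. This suits a dedicated datapath unit well. In a kP computation, operands would typically be converted into Montgomery form once with `to_mont`. They would then stay in that form across all field multiplications of the point additions and doublings, and be converted back once at the end, so the conversion cost is amortized.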