Digital Forensics and Cyber Crime. Third International ICST Conference, ICDF2C 2011, Dublin, Ireland, October 26-28, 2011, Revised Selected Papers

Research Article

Performance Issues About Context-Triggered Piecewise Hashing

  • @INPROCEEDINGS{10.1007/978-3-642-35515-8_12,
        author={Frank Breitinger and Harald Baier},
        title={Performance Issues About Context-Triggered Piecewise Hashing},
        proceedings={Digital Forensics and Cyber Crime. Third International ICST Conference, ICDF2C 2011, Dublin, Ireland, October 26-28, 2011, Revised Selected Papers},
        proceedings_a={ICDF2C},
        year={2012},
        month={12},
        keywords={Digital forensics techniques and tools context-triggered piecewise hash functions fuzzy-hashing efficiency of 
                   subtleties of fuzzy-hashing},
        doi={10.1007/978-3-642-35515-8_12}
    }
    
  • Frank Breitinger
    Harald Baier
    Year: 2012
    Performance Issues About Context-Triggered Piecewise Hashing
    ICDF2C
    Springer
    DOI: 10.1007/978-3-642-35515-8_12
Frank Breitinger1,*, Harald Baier1,*
  • 1: Hochschule Darmstadt
*Contact email: frank.breitinger@cased.de, harald.baier@cased.de

Abstract

A hash function is a well-known method in computer science to map arbitrarily large data to bit strings of a fixed, short length. Computer forensics uses this property to identify known files on the basis of their hash values. Today, hash values of known files are generated in a preprocessing step and stored in a database; typically a cryptographic hash function such as MD5 or SHA-1 is used. Later, the investigator computes the hash values of the files found on a storage medium and looks them up in this database. Due to the security properties of cryptographic hash functions, however, they cannot be used to identify similar files: changing even a single bit of the input yields a completely different hash value. Therefore, Jesse Kornblum proposed a similarity-preserving hash function, context-triggered piecewise hashing, to identify similar files. This paper discusses the efficiency of Kornblum’s approach. We present some enhancements that increase the performance of his algorithm by 55% when applied to a real-life scenario. Furthermore, we discuss some characteristics of a sample Windows XP system that are relevant to the performance of Kornblum’s approach.
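The contrast between exact known-file lookup and similarity-preserving hashing can be illustrated with a small sketch. This is not Kornblum's ssdeep and not the authors' enhanced implementation. It is a simplified, hypothetical model with the following assumptions. The rolling hash is a plain sum over a 7-byte window, which is simpler than ssdeep's rolling hash. Each piece is hashed with 32-bit FNV-1a and contributes one Base64 character. The block size is fixed at 256 instead of being derived from the file size. Similarity is measured by a naive common-prefix-plus-suffix count rather than ssdeep's edit-distance score. The names `ctph_sketch` and `shared_chars`, the seed, and all parameters are illustrative only.

```python
import hashlib
import random

# Hypothetical parameters for this sketch (not ssdeep's exact algorithm).
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
WINDOW = 7                  # bytes in the rolling window
FNV_OFFSET = 0x811C9DC5     # 32-bit FNV-1a offset basis
FNV_PRIME = 0x01000193      # 32-bit FNV prime


def ctph_sketch(data: bytes, block_size: int) -> str:
    """Simplified context-triggered piecewise hash: one Base64 char per piece."""
    sig = []
    roll = 0                # plain sum of the last WINDOW bytes (simplified rolling hash)
    piece = FNV_OFFSET      # FNV-1a state of the current piece
    piece_len = 0
    for i, byte in enumerate(data):
        roll += byte
        if i >= WINDOW:
            roll -= data[i - WINDOW]        # drop the byte that left the window
        piece = ((piece ^ byte) * FNV_PRIME) & 0xFFFFFFFF
        piece_len += 1
        # Trigger point: depends only on the local window, not on the file offset.
        if roll % block_size == block_size - 1:
            sig.append(B64[piece & 0x3F])   # keep the 6 least significant bits
            piece, piece_len = FNV_OFFSET, 0
    if piece_len:                           # also emit a char for the trailing piece
        sig.append(B64[piece & 0x3F])
    return "".join(sig)


def shared_chars(a: str, b: str) -> int:
    """Common-prefix length plus common-suffix length, capped at the shorter length."""
    n = min(len(a), len(b))
    pre = 0
    while pre < n and a[pre] == b[pre]:
        pre += 1
    suf = 0
    while suf < n and a[-1 - suf] == b[-1 - suf]:
        suf += 1
    return min(pre + suf, n)


rng = random.Random(2011)
original = bytes(rng.getrandbits(8) for _ in range(16384))
changed = bytearray(original)
changed[8000] ^= 0xFF                       # flip one byte in the middle
modified = bytes(changed)

# Known-file lookup: the database stores cryptographic hashes of known files.
known_md5 = {hashlib.md5(original).hexdigest()}
# A single flipped byte changes the MD5 value, so this lookup misses.
print(hashlib.md5(modified).hexdigest() in known_md5)

sig_a = ctph_sketch(original, 256)
sig_b = ctph_sketch(modified, 256)
print(sig_a)
print(sig_b)
print(shared_chars(sig_a, sig_b), "of", min(len(sig_a), len(sig_b)), "characters agree at the ends")
```

In this sketch, a trigger decision depends only on the 7 bytes in the window. The flipped byte can therefore influence only the triggers at the 7 positions whose windows contain it, plus the piece that spans it. All pieces before and after that region hash identically. As a result, the two signatures differ in at most a handful of characters, while the MD5 lookup fails completely.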