Digital Forensics and Cyber Crime. 9th International Conference, ICDF2C 2017, Prague, Czech Republic, October 9-11, 2017, Proceedings

Research Article

Finding and Rating Personal Names on Drives for Forensic Needs

  • @INPROCEEDINGS{10.1007/978-3-319-73697-6_4,
        author={Neil Rowe},
        title={Finding and Rating Personal Names on Drives for Forensic Needs},
        proceedings={Digital Forensics and Cyber Crime. 9th International Conference, ICDF2C 2017, Prague, Czech Republic, October 9-11, 2017, Proceedings},
        proceedings_a={ICDF2C},
        year={2018},
        month={1},
        keywords={Digital forensics, Personal names, Extraction, Email addresses, Phone numbers, Rating, Filtering, Bulk Extractor, Na\"{i}ve Bayes, Cross-modality},
        doi={10.1007/978-3-319-73697-6_4}
    }
    
  • Neil Rowe
    Year: 2018
    Finding and Rating Personal Names on Drives for Forensic Needs
    ICDF2C
    Springer
    DOI: 10.1007/978-3-319-73697-6_4
Neil Rowe1,*
  • 1: U.S. Naval Postgraduate School
*Contact email: ncrowe@nps.edu

Abstract

Personal names found on drives provide forensically valuable information about users of systems. This work reports on the design and engineering of tools to mine them from disk images, bootstrapping on output of the Bulk Extractor tool. However, most potential names found are either uninteresting sales and help contacts or are not being used as names, so we developed methods to rate name-candidate value by an analysis of the clues that they and their context provide. We used an empirically based approach with statistics from a large corpus from which we extracted 303 million email addresses and 74 million phone numbers, and then found 302 million personal names. We tested three machine-learning approaches and Naïve Bayes performed the best. Cross-modal clues from nearby email addresses improved performance still further. This approach eliminated from consideration 71.3% of the addresses found in our corpus with an estimated 67.4% F-score, a potential 3.5 times reduction in the name workload of most forensic investigations.
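
The workload figure in the last sentence follows from simple arithmetic: if a fraction f of candidates is eliminated, the remaining workload is 1 - f of the original, so the reduction factor is 1 / (1 - f). The short Python sketch below checks this for the reported 71.3%. The function name `workload_reduction` is a hypothetical helper introduced here for illustration; it is not part of the paper's tooling.

```python
def workload_reduction(fraction_eliminated: float) -> float:
    """Return the factor by which the remaining workload shrinks.

    Assumption for illustration: every candidate costs the same effort to
    review, so workload is proportional to the number of candidates kept.
    """
    if not 0.0 <= fraction_eliminated < 1.0:
        raise ValueError("fraction_eliminated must be in [0, 1)")
    return 1.0 / (1.0 - fraction_eliminated)


# 71.3% eliminated leaves 28.7% of the candidates to examine.
factor = workload_reduction(0.713)
print(f"{factor:.1f}x")  # prints "3.5x"
```

This only reproduces the stated reduction factor; it says nothing about the accuracy of the filtering, which the abstract reports separately as an estimated 67.4% F-score.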