Security Level Classification of Confidential Documents Written in Turkish

Erdem Alparslan; Hayretdin Bahsi

Mining User-Generated Content for Security

Research Article


@INPROCEEDINGS{10.1007/978-3-642-12630-7_41,
    author={Erdem Alparslan and Hayretdin Bahsi},
    title={Security Level Classification of Confidential Documents Written in Turkish},
    proceedings={Mining User-Generated Content for Security},
    proceedings_a={MINUCS},
    year={2012},
    month={10},
    keywords={document classification security Turkish support vector machine na{\"i}ve bayes TF-IDF stemming data loss prevention},
    doi={10.1007/978-3-642-12630-7_41}
}

Erdem Alparslan, Hayretdin Bahsi. Security Level Classification of Confidential Documents Written in Turkish. MINUCS, Springer, 2012. DOI: 10.1007/978-3-642-12630-7_41

Erdem Alparslan¹,*, Hayretdin Bahsi¹,*

1: National Research Institute of Electronics and Cryptology-TUBITAK

*Contact email: ealparslan@uekae.tubitak.gov.tr, bahsi@uekae.tubitak.gov.tr

Abstract

This article introduces a methodology for classifying the security level of confidential documents written in Turkish. Internal documents of TUBITAK UEKAE, holding various security levels (unclassified, restricted, secret), were classified using Support Vector Machines (SVMs) [1] and naïve Bayes classifiers [3][9]. To represent term-document relations, the recommended TF-IDF metric [2] was chosen to construct a weight matrix. Compared with English, Turkic languages pose a particularly difficult natural language processing problem: stemming. The Turkish stemming tool Zemberek was used to strip suffixes and obtain the features. Experimental results and success metrics are presented at the end of the article.

Keywords: document classification, security, Turkish, support vector machine, naïve bayes, TF-IDF, stemming, data loss prevention
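The TF-IDF weight matrix mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which TF-IDF variant the authors used, so the version below (term frequency normalized by document length, natural-log inverse document frequency) is an assumption. The toy corpus is invented for illustration, and its tokens are assumed to be already stemmed, standing in for the output of a stemmer such as Zemberek; the function name `tf_idf_matrix` is hypothetical.

```python
import math
from collections import Counter


def tf_idf_matrix(docs):
    """Build a TF-IDF weight matrix from a list of token lists.

    Assumed variant (not confirmed by the paper):
      tf(t, d) = count of t in d / length of d
      idf(t)   = ln(N / number of documents containing t)
    Returns (vocabulary, matrix), where matrix[d][i] is the weight of
    vocabulary[i] in document d.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(tok for doc in docs for tok in set(doc))

    matrix = []
    for doc in docs:
        counts = Counter(doc)
        row = []
        for term in vocab:
            tf = counts[term] / len(doc) if doc else 0.0
            idf = math.log(n_docs / df[term])
            row.append(tf * idf)
        matrix.append(row)
    return vocab, matrix


# Invented toy corpus of pre-stemmed tokens (illustrative only).
docs = [
    ["gizli", "rapor", "proje"],
    ["rapor", "toplantı", "duyuru"],
    ["gizli", "anahtar", "proje", "anahtar"],
]
vocab, weights = tf_idf_matrix(docs)
```

Each row of `weights` is the feature vector of one document. In a pipeline like the one the abstract describes, such rows would be passed to a classifier (for example an SVM or a naïve Bayes model) together with the security-level labels.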

Published: 2012-10-23

DOI: http://dx.doi.org/10.1007/978-3-642-12630-7_41
