Text Steganography in Statistically Clustered Iris Image

This work discusses hiding text within the iris as a way to strengthen data protection. The iris image before and after concealment cannot be told apart by eye; the difference between the two images becomes apparent only when statistical measures such as PSNR and MSE are used to compare them. The database consists of 500 images in different formats (tif, gif, png, jpg, bmp) selected for analysis. As assessed with performance evaluation measures, the proposed method shows more accurate results, stronger image encoding, and highly efficient text protection, and it succeeds in hiding a high ratio of text. The experimental results are based on a statistical strategy in which the text is converted into two random variables, X and Y, that are exponentially distributed. The data of the random variables are then embedded in the segmented, cropped, and clustered iris image. The proposed scheme appears able to embed sufficient data in the iris image while maintaining identification accuracy.


Introduction
Data hiding is one of the key ways to protect privacy. The objective of biometric data hiding is to embed enough personal data in biometric templates while maintaining recognition performance. Current biometric data hiding methods usually embed data in a region that does not contain features essential to the biometric measurement. For example, iris template data is embedded only in the blue channel. These schemes perform reasonably well at hiding biometric data. However, how to minimize the effect of the embedded data on biometric recognition remains unanswered [1].
One of the great advantages of digital data is that it can be reproduced without loss of quality. It can also be modified easily, which concerns authorized parties that want to prevent the illegal distribution of secret documents in many contexts, such as security video and legal evidence in the form of image, audio, or video [2].
* Corresponding author. Email: irtefaa.radhi@uokufa.edu.iq

A single image is composed of a group of pixels, and these pixels fall into many categories. Pixels that belong to the same category have similar values and differ from those of other categories, so pixels of the same category are grouped into one cluster. Cluster values are then calculated based on the selected features. This method is called k-means (Lloyd's algorithm). After the cluster values are calculated, three histograms are used, one for each RGB channel, and the peak value is computed in the red, green, and blue histograms [3]. To divide the iris regions accurately, a new technique is proposed that uses statistical distribution mechanisms to compensate for the iris detection errors that result from color segmentation, and that protects the text and hides it within the iris after segmentation [4].
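The per-channel histogram peaks mentioned above can be illustrated with a short sketch. It is not the authors' implementation: the function name `channel_peaks`, the representation of pixels as `(R, G, B)` integer tuples, and the rule that ties go to the lowest intensity are all assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]  # assumed (R, G, B) layout, one integer per channel


def channel_peaks(pixels: Iterable[Pixel]) -> Tuple[int, int, int]:
    """Return the most frequent intensity in the R, G and B histograms."""
    pixel_list = list(pixels)
    peaks = []
    for channel in range(3):
        # Histogram of this channel: intensity -> number of pixels.
        histogram = Counter(p[channel] for p in pixel_list)
        # Highest count wins; ties go to the lowest intensity (assumed rule).
        peak = min(histogram, key=lambda value: (-histogram[value], value))
        peaks.append(peak)
    return peaks[0], peaks[1], peaks[2]


sample = [(255, 0, 10), (255, 5, 10), (200, 5, 20), (255, 5, 10)]
red_peak, green_peak, blue_peak = channel_peaks(sample)
```

For the four sample pixels, the peaks are the intensities that occur three times in each channel.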
The remainder of the paper is organized as follows. Section 2 covers statistical cluster analysis, Section 3 discusses the proposed statistical methodology, Section 4 presents the results, Section 5 discusses the performance analysis, and Section 6 concludes the paper with future work.

Statistical Cluster Analysis
Clustering is an unsupervised machine learning method, which means that no information about the output is available. The data consist of instances described by their features or attributes, and the clustering algorithm divides them into separate subsets. Instances in the same cluster should be similar according to some criterion, and instances in different clusters should be as different as possible [5]. Clustering is used in science, medicine, economics, astronomy, web intelligence, management, security, and other fields. Over the past decades, many clustering techniques have been developed, improved, or modified to solve a wide range of problems. Well-known clustering algorithms include k-means, DBSCAN, and hierarchical clustering. K-means is one of the simplest methods; its aim is to reduce the total distance between the instances and their centroids, each of which is the mean of all instances in the corresponding cluster [6].
It is probably impossible to enumerate all clustering algorithms, because hundreds of them can be found in the literature. One reason so many clustering algorithms exist is that a cluster cannot be defined precisely [7]. Clustering algorithms can be divided into several categories according to their cluster model and their measure of similarity and difference. Connectivity models divide the data into groups based on distance connectivity; hierarchical clustering is one example. Distribution models create groups using statistical distributions [8].
The k-means algorithm and its modifications have been used in many applications, for example, image segmentation. K-means is an iterative algorithm with two operations performed at each iteration. The algorithm begins with k randomly generated centers, i.e., the centroids. In each iteration, each object (instance) is assigned to the nearest centroid. The most commonly used distance measure in k-means is the Euclidean distance. After the assignment step, the locations of the centroids are updated [9]:

\[ c_i = \frac{1}{|S_i|} \sum_{x_j \in S_i} x_j \]

where |S_i| is the number of instances in the cluster S_i, i = 1, 2, ..., k. Quality estimation is an important part of all clustering algorithms; in the k-means algorithm, the aim is to minimize the sum of variations between the instances and their corresponding centroids [10].
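The two k-means operations described above, assignment to the nearest centroid and the centroid update by the cluster mean, can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the implementation used in the paper. The names `k_means`, `nearest`, and `mean` are hypothetical, and drawing the initial centers from the data points with a fixed seed is an assumption (the text only says the centers are generated randomly).

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point` in Euclidean distance."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def mean(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty cluster: (1/|S_i|) * sum of its points."""
    count = len(points)
    return tuple(sum(coords) / count for coords in zip(*points))


def k_means(points: Sequence[Point], k: int, iterations: int = 100,
            seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: returns the final centroids and one label per point."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct data points chosen at random.
    centroids = rng.sample(list(points), k)
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each instance goes to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: each centroid moves to the mean of its cluster.
        updated = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            # An empty cluster keeps its previous centroid.
            updated.append(mean(members) if members else centroids[i])
        if updated == centroids:  # no centroid moved: converged
            break
        centroids = updated
    return centroids, labels


# Two visually distinct groups of RGB pixels (dark and bright).
pixels = [(10, 10, 10), (12, 11, 9), (11, 12, 10),
          (200, 210, 205), (198, 205, 207), (202, 208, 206)]
centers, labels = k_means(pixels, k=2)
```

On this small example the three dark pixels end up in one cluster and the three bright pixels in the other. Which cluster receives label 0 depends on the random initialization.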

Proposed Statistical Methodology
The system was first tested on the interval database of [11], with the iris region determined before image enhancement. The segmentation rate and the text-hiding rate of the system were then measured with the determination performed after the images were enhanced. Figure 1 shows a sample of the database used in this paper [12].

Figure 1. Sample of the images
The idea is to hide the largest possible amount of text data within the smallest possible storage space. The iris is one of the most important biometric characteristics and is used to increase information security. Selecting and segmenting the iris therefore gives a region smaller than the cover image in which to hide large text information. After the important personal information is embedded within the smallest region of the iris, the iris image can be restored to its original appearance and sent with the hidden information without being noticed by anyone who intercepts the image [13]. In this paper, the text to be hidden is converted into statistical data that follows a bivariate exponential distribution [14]. The text, of length n, is divided into two parts. If n is even, the first part represents the variable X and the second part represents the variable Y. If n is odd, one extra character is first added to the text, as in the formula below:

\[ L(x, y) = \begin{cases} n/2, & n \text{ even} \\ (n+1)/2, & n \text{ odd} \end{cases} \]

where L(x, y) is the length of each of the random variables x and y, and n is the original text's length.
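The splitting rule above can be sketched as follows. The sketch covers only the division of the text into the two parts X and Y, with padding for odd lengths; it does not attempt the mapping onto the exponential distribution, which the text does not spell out. The function names, the choice of a space as padding character, and the use of Unicode code points as numeric values are assumptions made for illustration.

```python
from typing import List, Tuple


def split_text(text: str, pad_char: str = " ") -> Tuple[str, str]:
    """Split `text` into two equal-length parts X and Y.

    If the length is odd, one padding character is appended first.
    The padding character is an assumed choice, not taken from the paper.
    """
    if len(text) % 2 == 1:
        text += pad_char
    half = len(text) // 2
    return text[:half], text[half:]


def to_values(part: str) -> List[int]:
    """Map characters to integers (code points, an assumed encoding)."""
    return [ord(ch) for ch in part]


message = "The book of nature is written in the language of Mathematics"
x_part, y_part = split_text(message)
x_values, y_values = to_values(x_part), to_values(y_part)
```

The example message has 60 characters, so no padding is needed and each part has length 30. A three-character input such as `"abc"` is padded to four characters and split into `"ab"` and `"c "`.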

EAI Endorsed Transactions on Energy Web 03 2021 - 05 2021 | Volume 8 | Issue 33 | e8

After the text is converted into two random variables, these variables can be embedded as digital data inside the iris image once the image has been cropped, its edges accurately refined, and its pixels classified. The cropped iris image containing the hidden text can then be read according to the algorithm shown in Figure 2. It is worth noting that this method can be applied to any image, whatever its extension or size.
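The text does not specify how the numeric data are written into the iris pixels. As a hedged illustration only, the sketch below uses least-significant-bit (LSB) substitution, a common steganographic technique that is swapped in here and may differ from the authors' embedding step. The function names, the flat list of 8-bit pixel values, and the 8 bits per payload value are all assumptions.

```python
from typing import List


def embed_lsb(pixels: List[int], values: List[int], bits: int = 8) -> List[int]:
    """Write each value as `bits` bits into the lowest bit of successive pixels."""
    # Most significant bit first for every payload value.
    bitstream = [(v >> shift) & 1 for v in values for shift in range(bits - 1, -1, -1)]
    if len(bitstream) > len(pixels):
        raise ValueError("cover region is too small for the payload")
    stego = list(pixels)
    for i, bit in enumerate(bitstream):
        stego[i] = (stego[i] & ~1) | bit  # clear the lowest bit, then set it
    return stego


def extract_lsb(pixels: List[int], count: int, bits: int = 8) -> List[int]:
    """Read back `count` values of `bits` bits each from the pixel LSBs."""
    values = []
    for i in range(count):
        value = 0
        for pixel in pixels[i * bits:(i + 1) * bits]:
            value = (value << 1) | (pixel & 1)
        values.append(value)
    return values


cover = list(range(100, 132))            # 32 hypothetical pixel intensities
payload = [ord(ch) for ch in "Iris"]     # 4 values, 32 bits in total
stego = embed_lsb(cover, payload)
recovered = extract_lsb(stego, len(payload))
```

Because only the lowest bit of each pixel changes, no intensity moves by more than one level, which is consistent with the visually indistinguishable images the paper reports.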

Experimental Results
The flowchart procedures proposed in this paper have been applied to two types of iris images. The final stage in applying the proposed method takes the most accurate part of the iris, chosen during edge detection and identification, and embeds the text to be hidden in the selected part. The following text is used as an example. The message to hide in the cover image is:

"The book of nature is written in the language of Mathematics" (Galileo)

To determine the accuracy, reliability, and validity of the proposed method's results, the original iris image after cropping, without the hidden text, is compared with the iris image after cropping with the hidden text. In Fig. 4, the naked eye sees no difference between the two images, so the two images' data must be used to find the differences more accurately. The following statistical criteria were adopted to evaluate the performance of the proposed work: PSNR, SNR, and MSE.

Performance
MSE (mean squared error) is one way to quantify the difference between the values implied by an estimator and the true values of the estimated quantity. For an original image I_1 and an image I_2 carrying the hidden text, both of size M x N, MSE is calculated using the following equation:

\[ \mathrm{MSE} = \frac{1}{MN} \sum_{m=1}^{M} \sum_{n=1}^{N} \left[ I_1(m,n) - I_2(m,n) \right]^2 \]

PSNR (peak signal-to-noise ratio) is then obtained from the MSE:

\[ \mathrm{PSNR} = 10 \log_{10} \left( \frac{R^2}{\mathrm{MSE}} \right) \]

In the previous equation, R is the maximum fluctuation in the input image data type. For example, if the input image has a double-precision floating-point data type, R is 1; if it has an 8-bit unsigned integer data type, R is 255 [9]. After these measures were applied to assess the performance of the algorithm, we obtained the results shown in Tables 1 and 2.

When the same performance criteria are applied to iris images taken from the Internet, the performance measures are more pronounced: all the measurement values differ, which is very clear in PSNR, SNR, and MSE, respectively. The different sizes and specifications of each image affect the performance measures. Finally, the results obtained confirm the difference between the two images shown in each of the figures, such as Figure 4 and Figure 6.
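The MSE and PSNR measures can be sketched with the standard definitions: MSE as the mean of the squared pixel differences, and PSNR = 10 log10(R^2 / MSE) with R the peak value of the image data type. This is an illustrative pure-Python version, not the tool used to produce the paper's tables. The function names and the nested-list image representation are assumptions, and SNR is left out because the text does not define it.

```python
import math
from typing import Sequence

Image = Sequence[Sequence[float]]  # assumed: rows of grayscale intensities


def mse(original: Image, modified: Image) -> float:
    """Mean squared error between two images of identical size."""
    rows, cols = len(original), len(original[0])
    total = sum((original[m][n] - modified[m][n]) ** 2
                for m in range(rows) for n in range(cols))
    return total / (rows * cols)


def psnr(original: Image, modified: Image, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; `peak` is R (255 for 8-bit images)."""
    error = mse(original, modified)
    if error == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10 * math.log10(peak ** 2 / error)


cover_block = [[52, 55], [61, 59]]
stego_block = [[52, 54], [61, 60]]   # two pixels changed by one level
block_mse = mse(cover_block, stego_block)
block_psnr = psnr(cover_block, stego_block)
```

In the 2 x 2 example, two of the four pixels differ by one intensity level, so the MSE is 0.5 and the PSNR is roughly 51 dB. A higher PSNR and a lower MSE indicate that the image carrying the hidden text is closer to the original.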

Conclusions
The proposed method hides text data inside the iris image after the selected area of the iris has been accurately identified and classified, in order to reduce the effect on the identification of the iris that carries the data. The method embeds data more confidentially to protect users' data. The experimental results of this study show that the largest volume of data can be embedded within the smallest region of the iris while maintaining a high degree of secrecy, protecting the data from detection by attacks during transmission. This technique can be used to hide large data (of any size) and embed it in very small images, for example, the smallest part of a footprint or the smallest part obtained from a distorted image. In addition, the part segmented from the original image can be resized to the size of the original image and used for hiding. Future work will focus on big data, hiding large digital images within an image using wavelet theory.