Performance analysis of compression algorithms for information security: A Review

Data compression is a vital part of information security, since compressed data is more secure and convenient to handle. An effective data compression technique creates compact, secure and easily communicable data by removing redundancy. There are two types of compression algorithmic techniques: lossy and lossless. These techniques can be applied to any data format, such as text, audio, video or image files. The primary objective of this study is to analyse the data compression techniques used with information security techniques like steganography and cryptography. Four lossy and four lossless techniques are implemented and evaluated on parameters such as file size, saving percentage, time, compression ratio and speed. Detailed analysis shows that the lossy techniques perform better quantitatively whereas the lossless techniques are better qualitatively. Moreover, lossless techniques are more effective as there is no data loss during the process. Among all the algorithms, Huffman encoding performs best.


Introduction
The key purpose of image compression is to reduce those parts of the image which are largely irrelevant to the user. This reduces the size of the image by reducing the number of pixels and makes it efficient for data storage and transmission [1]. Figure 1 illustrates the image compression process: the image is sent as input to an encoder, which converts it into a stream of bits. This encoded bit stream is then sent to the decoder, which decodes it, and the final image is retrieved as the output of the decoder. Lossy and lossless image compression techniques [51] are used to reduce the number of bits required for image representation. To achieve this, the redundancies present in the image are either reduced or eliminated. Redundancies in digital images fall into three categories, i.e. coding redundancy, inter-pixel redundancy and psycho-visual redundancy [2,15]. Coding redundancy arises when the image is coded pixel by pixel with a fixed-length code, e.g. an array of 8-bit integers used to represent an image.

This study tries to identify the most relevant compression techniques, which are widely used across fields, and implement their algorithms to validate their outputs across the various parameters defined in this research. Additionally, this research addresses the performance metrics used in previous studies. Initially, there was a dearth of quantitative analysis of these compression methodologies, as such information was not easily accessible, or even available, for pursuing further research. This research has implemented and analysed the most prevalent techniques and quantified results on the most critical parameters; these results address the issues discussed in this paper.

Literature Review
The key objective that image compression fulfils is that it helps in reducing the space required for storing digital images. Digital images have a variety of uses across applications and fields, and the choice of image compression technique is driven by the requirements of the application. The idea is to minimise the number of bits used while maintaining the quality of the reconstructed image. Authors [3] have discussed various compression techniques, based on conventional and compressive sensing (CS) theories, for colour medical images. Transform techniques like DCT/DWT, and transforms based on their hybridization, are used for compression; the conventional techniques take more processing time than CS theory based techniques. The proposed approach in [4] is divided into three stages: first, convergence theory is applied to formulate the problem; then an iterative process is used to lower the computational complexity using the convergence; finally, a hybrid approach is applied for reconstruction using compressive sensing. Performance is measured in terms of peak signal to noise ratio, reconstruction error and the time taken to perform the reconstruction. Image compression has been a heavily researched area for decades because transmission and storage of images are challenging tasks. The authors [5] have discussed various image compression algorithms like JPEG, JPEG 2000, BWT, and the sparse coding method. The sparse approximation method with DCT and Gabor bases is used to train a static over-complete dictionary whose elements are trained using the OMP algorithm; the experimental results indicate that the sparse coding method shows better results. Arnold transforms in [6] are used for encrypting the original image, and compressive sensing is used to compress and re-encrypt the resultant encrypted image; the measurement matrix is created using a chaotic system. For enhanced security, the proposed algorithm uses bitwise XOR and pixel scrambling to diffuse and confuse the measurements. The experimental results show that the proposed algorithm is effective and secure. The authors [7] have used discrete cosine transforms together with Hadamard and Haar transforms for image compression and transmission enhancement in wireless sensor networks.
Compressing files using the Hadamard transform gives better results and better image quality than compressing them with DCT, and the files compressed using Hadamard take less time to reach the destination node. The authors of [8] have compared their proposed algorithm with Huffman encoding. The proposed algorithm uses static fixed-length bit codes and takes more compression time than Huffman encoding; it shows a lower compression ratio and higher saving percentage values, and also ensures more data security.
An image can be compressed due to the presence of redundancies in it. The proposed work in [9] calculates the mean and difference of the images, which are then DWT transformed to 2 levels. The method achieves 58-63% bit saving as compared to discrete cosine transforms and discrete wavelet transforms. The correlation coefficient of the proposed method is lower than DCT and higher than DWT. Acceptable root mean square error and peak signal to noise ratio values are obtained, with high bit saving and low running time. The authors [10] have evaluated a Burrows-Wheeler transform based image compression algorithm with arithmetic coding as well as Huffman coding, and observed that the hybridized scheme surpasses the single scheme. The analysis of lossless compression methods in [11] is based on compression ratio and elapsed time.
The compression algorithms considered by the authors are Huffman encoding and RLE on multimedia data. The experimental results show that for a file size of 28 KB, the compression ratio in percentage achieved by Huffman encoding is 87.16, and that of RLE is 87.5, which is slightly higher. Comparing the elapsed time, RLE takes 65.4 secs whereas Huffman takes far less time, 1.97 secs. The authors [12] have followed a two-step methodology: an n-MM method is used before entropy coding, and then run-length encoding (RLE) is applied to repeated sequences of values to get a high compression ratio. A pixel matrix of the image is created initially and then divided into blocks for each component (Red, Blue and Green). The N-MM algorithm is applied to each of the blocks, then RLE is applied on each block and the encoded block is stored on secondary storage. To get the required image, run-length decoding is applied on each encoded block to get the decoded image; all the blocks are combined to get the pixel matrix from which the image is generated. The analysed results show that the storage size of the image decreases and therefore the compression ratio increases.

Image Compression Techniques
Image compression techniques can be categorized in two ways: lossy and lossless [52,53]. The former reconstructs the image as an approximation of the original data, and thus some loss of data is incurred; it wins over the latter in terms of achieving a higher compression ratio. The lossy techniques taken under consideration on the basis of the literature are: Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), Block Truncation Coding (BTC), and Fractal Encoding. The latter technique recreates the original data exactly from the compressed form: as it uses all the information of the original image while compressing, the image obtained after decompression is identical to the original image. Four lossless techniques have been shortlisted for consideration: Walsh-Hadamard, Run-Length Encoding (RLE), Burrows-Wheeler Transform (BWT), and Huffman Encoding.

Discrete Cosine Transform
DCT [38] is widely considered the most effective and competent coding scheme; it was introduced to solve the problem of discontinuity and to achieve better performance. The transform is discrete, real and orthogonal, and compactly transcribes the image information from the spatial domain into the frequency domain [29]. The 1-D DCT of a sequence $\{f(x),\ x = 0, 1, 2, \ldots, N-1\}$ is given for $0 \le u \le N-1$ as [22]:

$$C(u) = \alpha(u) \sum_{x=0}^{N-1} f(x) \cos\left[\frac{(2x+1)u\pi}{2N}\right]$$

where $\alpha(0) = \sqrt{1/N}$ and $\alpha(u) = \sqrt{2/N}$ for $u \neq 0$. The inverse of the above equation is given as:

$$f(x) = \sum_{u=0}^{N-1} \alpha(u)\, C(u) \cos\left[\frac{(2x+1)u\pi}{2N}\right]$$

DCT being a separable transform, the 2-D DCT and its inverse are obtained in two steps through successive application of the 1-D DCT and its inverse [41].
For images, the 2-D DCT of an $N \times N$ image with pixel values $f(x,y)$ is given as [42]:

$$C(u,v) = \alpha(u)\alpha(v) \sum_{x=0}^{N-1}\sum_{y=0}^{N-1} f(x,y) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right]$$

The inverse DCT is given by:

$$f(x,y) = \sum_{u=0}^{N-1}\sum_{v=0}^{N-1} \alpha(u)\alpha(v)\, C(u,v) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right]$$

The transform matrix $T$ has entries $T_{u,x} = \alpha(u)\cos\left[\frac{(2x+1)u\pi}{2N}\right]$. The coefficient at zero frequency is called the DC coefficient; the others are called AC coefficients and reflect differences in grey-level values [27][28].
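The separable 1-D transform above can be illustrated with a short Python sketch (an illustrative example, not the MATLAB implementation evaluated in this study):

```python
import math

def alpha(u, n):
    # Normalisation factor from the DCT definition.
    return math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)

def dct_1d(f):
    # 1-D DCT: C(u) = alpha(u) * sum_x f(x) cos((2x+1)u*pi / 2N)
    n = len(f)
    return [alpha(u, n) * sum(f[x] * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              for x in range(n))
            for u in range(n)]

def idct_1d(c):
    # Inverse DCT: f(x) = sum_u alpha(u) C(u) cos((2x+1)u*pi / 2N)
    n = len(c)
    return [sum(alpha(u, n) * c[u] * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                for u in range(n))
            for x in range(n)]

signal = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]
coeffs = dct_1d(signal)
restored = idct_1d(coeffs)  # matches `signal` up to floating-point error
```

The 2-D DCT is obtained by applying `dct_1d` to every row and then to every column of the block; production code would use a fast library routine instead of this O(N²) loop.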

Discrete Wavelet Transform
The discrete wavelet transform [13,37,61] is a powerful technique in image processing: the wavelet converts the image into a series of wavelet coefficients which can be stored more efficiently than pixel blocks. For one-dimensional signals, the signal is split into two parts: high and low frequencies. The low-pass and high-pass filtering of the DWT is expressed as [40,59-62]:

$$y_{\text{low}}[n] = \sum_{k} x[k]\, g[2n-k], \qquad y_{\text{high}}[n] = \sum_{k} x[k]\, h[2n-k]$$

Both filters satisfy the orthogonality condition [21]:

$$\sum_{n} g[n]\, h[n] = 0$$

The 2D-DWT [16] is obtained by applying the 1D DWT row-wise, producing L and H sub-bands in each row, and then column-wise. A total of four sub-bands (LL1, LH1, HL1, and HH1) are acquired for level-1 decomposition [23]. By repeating the same procedure on the sub-band labelled LL1, we get LL2, LH2, HL2, and HH2, and so on, as depicted in Figure 2.

Block Truncation Coding

In BTC, the image is divided into non-overlapping blocks, where the size is assumed to be $m \times m$ for simplicity. Then two luminance values, i.e. the mean and the standard deviation, are selected to represent each pixel in a block [30][31]:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
where $x_i$ is the value of the image block's $i$-th pixel and there are $n$ pixels in total in that block. The two values $\bar{x}$ and $\sigma$ are called quantizers. Taking $\bar{x}$ as the threshold value, a 2-level bit plane is generated by comparing each $x_i$ with $\bar{x}$.

A binary block $B$ represents the pixel values, where 1 denotes a pixel whose grey-level value is greater than the threshold and 0 a pixel whose grey-level value is less.

The image block is regenerated by converting every 1 to $H$ and every 0 to $L$, expressed as:

$$L = \bar{x} - \sigma\sqrt{\frac{q}{p}}, \qquad H = \bar{x} + \sigma\sqrt{\frac{p}{q}}$$

where $p$ and $q$ are the counts of 0s and 1s, respectively, in the compressed bit plane.
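The quantizer and reconstruction steps above can be sketched for a single block as follows (a minimal illustration on a 1-D list of pixel values; the `>=` thresholding convention is an assumption of this sketch):

```python
import math

def btc_block(pixels):
    # Block Truncation Coding of one block: keep mean, std dev and a bit plane.
    n = len(pixels)
    mean = sum(pixels) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in pixels) / n)
    bits = [1 if x >= mean else 0 for x in pixels]   # 2-level bit plane
    return mean, sigma, bits

def btc_reconstruct(mean, sigma, bits):
    # Replace each 1 with H and each 0 with L.
    q = sum(bits)            # number of ones
    p = len(bits) - q        # number of zeros
    if p == 0 or q == 0:     # flat block: no variation to restore
        return [mean] * len(bits)
    low = mean - sigma * math.sqrt(q / p)
    high = mean + sigma * math.sqrt(p / q)
    return [high if b else low for b in bits]

block = [2, 9, 12, 15, 4, 1, 10, 3]
decoded = btc_reconstruct(*btc_block(block))
```

BTC preserves the first two sample moments: the reconstructed block has the same mean and standard deviation as the original.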

Fractal Encoding
Fractal encoding is one of the techniques used for compressing still images. A fractal code is made up of three integral parts: ranges, domains and transformations. The ranges $R_i$ partition the image into regions; the domains $D_i$ are an equal number of other regions of the image.
Thereafter, two transformations are required for each domain-range pair: a geometric transformation $v_i: D_i \to R_i$ which maps the domain to the range, and an affine transformation $a_i$ which fine-tunes the intensity values of the domain to match the corresponding range. An image is decomposed into $N$ non-overlapping range blocks $R_i$ and $M$ other domain blocks $D_i$, and the operator $W$ is defined as:

$$W(f) = \sum_{i=1}^{N} w_i\left(f|_{D_i}\right)$$

where $f|_{D_i}$ signifies the restriction of the image $f$ to the domain $D_i$, and $w_i$ denotes the transformation mapping domain block $D_i$ onto range block $R_i$. The shape and size of $R_i$ and $D_i$ can differ. Since the number $M$ of domain blocks can be less than, greater than or equal to $N$, the blocks in $D_i$ may overlap or be culled from parts of the image.

Walsh Hadamard Transform
WHT is considered the simplest of all transforms, requiring only additions and subtractions, and is based on the concept of square/rectangular waves with peak values of ±1. The lowest order of the transform is:

$$H_1 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

To obtain a higher-order WHT, each element of the transform matrix is replaced by the first-order matrix. The general expression is [24][25]:

$$H_n = \frac{1}{\sqrt{2}}\begin{bmatrix} H_{n-1} & H_{n-1} \\ H_{n-1} & -H_{n-1} \end{bmatrix}$$

Due to the symmetry of the transform matrix, calculating its inverse is simple [26].
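The recursive construction above can be sketched as follows (illustrative Python; real implementations use the fast Walsh-Hadamard transform rather than an explicit matrix):

```python
import math

def hadamard(order):
    # Build the normalised Walsh-Hadamard matrix of size 2^order recursively:
    # H_n = 1/sqrt(2) * [[H_{n-1}, H_{n-1}], [H_{n-1}, -H_{n-1}]]
    h = [[1.0]]
    for _ in range(order):
        scale = 1.0 / math.sqrt(2.0)
        h = ([[scale * v for v in row] + [scale * v for v in row] for row in h] +
             [[scale * v for v in row] + [-scale * v for v in row] for row in h])
    return h

def wht(signal):
    # Transform = matrix-vector product; the entries are +-1/sqrt(N), so up to
    # a scale factor the transform needs only additions and subtractions.
    n = len(signal)
    h = hadamard(int(math.log2(n)))
    return [sum(h[u][x] * signal[x] for x in range(n)) for u in range(n)]

h2 = hadamard(1)   # [[1/sqrt(2), 1/sqrt(2)], [1/sqrt(2), -1/sqrt(2)]]
```

Because the normalised matrix is symmetric and orthogonal, applying `wht` twice returns the original signal, which is why the inverse is trivial.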

Run Length Encoding
The RLE technique counts occurrences of the same data value and stores each run as a single data value and a single count [19,50]. Consider a coloured image that has many long runs of red pixels and short runs of green and blue pixels. For example, consider a single row scanned from such an image, with R representing red pixels, G representing green and B representing blue pixels:

RRRRRRGGGBBBBRRRRRRRRBBBBBGGGGGRRRRRRR
After applying run-length encoding on the above row, the resultant is 6R3G4B8R5B5G7R, where 6R means a run of 6 red pixels, 3G a run of 3 green pixels, 4B a run of 4 blue pixels, and so on. The technique encodes runs of consecutive same-colour pixels and is therefore effective when the probability of consecutive occurrences of the same colour is high [46,48].
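The encoding of the example row can be reproduced with a few lines of Python (an illustrative sketch; the textual "&lt;count&gt;&lt;symbol&gt;" output format is an assumption made for readability):

```python
import itertools

def rle_encode(row):
    # Collapse each run of identical pixels into "<count><pixel>".
    return "".join(f"{len(list(group))}{pixel}"
                   for pixel, group in itertools.groupby(row))

def rle_decode(encoded):
    # Inverse: expand "<count><pixel>" pairs (single-letter pixels; counts may
    # span several digits).
    out, count = [], ""
    for ch in encoded:
        if ch.isdigit():
            count += ch
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)

row = "RRRRRRGGGBBBBRRRRRRRRBBBBBGGGGGRRRRRRR"
encoded = rle_encode(row)   # "6R3G4B8R5B5G7R"
```

Decoding the encoded string returns the original 38-pixel row exactly, since RLE is lossless.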

Burrows Wheeler Transform
In BWT, a transformed text T′ is produced from a text T. It is assumed that T is terminated by an end marker "$". T′ is obtained from M, a conceptual matrix whose rows are the cyclic shifts of T, sorted in lexicographic order; F denotes the first column of M and L the last column [20]. Then T′ = L. The transform is reversible. For example, consider the i-th row of M: its last character L[i] precedes the first character F[i] in T, i.e., T = …L[i]F[i]…. Let L[i] = c, and let r be the number of occurrences of c in L[1, i]. Let M[j] be the r-th row of M starting with c. Then the character in the first column F corresponding to L[i] is located at F[j], because the occurrences of c appear in the same relative order in F and L. This procedure is known as LF mapping, with LF(i) = j. The transform can be reversed as follows [17,49]:

Huffman Encoding
This is one of the oldest lossless image compression techniques, developed to minimise code redundancy whilst maintaining the quality of the reconstructed image. To understand the algorithm, consider an example with seven source symbols of a digital image {S1, S2, S3, S4, S5, S6, S7} with the respective probability values {0.25, 0.25, 0.125, 0.125, 0.125, 0.0625, 0.0625}. The process of obtaining the Huffman code is represented in Figure 3: the probabilities are written in decreasing order, the two least probabilities are added, the result is written in its sorted position among the remaining probabilities, and the process continues [33][34]. Figure 4 represents the calculation of the code word length. The average code word length $L_{avg}$ is defined as:

$$L_{avg} = \sum_{i=1}^{L} l(s_i)\, P(s_i)$$

where $l(s_i)$ is the number of bits representing the $i$-th grey level $s_i$ of an L-grey-level image and $P(s_i)$ is its probability [35]. Analysing Figure 4, the average code word length in this example is $L_{avg} = 2.625$ bits. From the above example, it is evident that the average number of bits is reduced to 2.625 with variable-length coding. The algorithm is optimal if the source probability distribution is known a priori and each source symbol is encoded with an integral number of bits.
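The average code word length for the example can be verified with a short sketch (illustrative Python; it computes only the code lengths, not the actual bit strings):

```python
import heapq
import itertools

def huffman_code_lengths(probs):
    # Repeatedly merge the two least probable nodes; each merge pushes every
    # symbol under those nodes one level deeper in the code tree.
    counter = itertools.count()            # tie-breaker for equal probabilities
    heap = [(p, next(counter), [s]) for s, p in probs.items()]
    heapq.heapify(heap)
    depth = {s: 0 for s in probs}
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            depth[s] += 1
        heapq.heappush(heap, (p1 + p2, next(counter), syms1 + syms2))
    return depth

probs = {"S1": 0.25, "S2": 0.25, "S3": 0.125, "S4": 0.125,
         "S5": 0.125, "S6": 0.0625, "S7": 0.0625}
lengths = huffman_code_lengths(probs)
avg_length = sum(probs[s] * lengths[s] for s in probs)   # 2.625 bits
```

Since the probabilities are dyadic, the code lengths equal −log₂ P(sᵢ) and the average length meets the source entropy exactly.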

Performance Metrics
The performance of image compression algorithms [2] can be evaluated on the basis of parameters mentioned below:

Quality of an Image
Certain techniques are required to analyse the quality of the image after reconstruction; better image quality means less distortion introduced by the compression process. There are two ways to analyse the reconstructed image: qualitative and quantitative quality measurement. Qualitative measurement asks human observers to judge and report the quality of the image by comparing it with another image with the naked eye and choosing the better one, whereas quantitative measurement uses mathematical expressions that quantify the amount of distortion and the quality of the image.
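Two widely used quantitative expressions are the mean squared error (MSE) and the peak signal-to-noise ratio (PSNR); the sketch below (an illustration with hypothetical pixel values) computes both for grey-scale data:

```python
import math

def mse(original, reconstructed):
    # Mean squared error between two equally sized grey-scale pixel lists.
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)

def psnr(original, reconstructed, max_value=255):
    # Peak signal-to-noise ratio in dB; higher means less distortion.
    error = mse(original, reconstructed)
    if error == 0:
        return float("inf")   # identical images, as with lossless codecs
    return 10 * math.log10(max_value ** 2 / error)

# Hypothetical 8-pixel row before and after lossy compression.
orig = [52, 55, 61, 66, 70, 61, 64, 73]
recon = [50, 56, 60, 68, 69, 62, 65, 71]
quality = psnr(orig, recon)
```

For 8-bit images, PSNR values above roughly 30-40 dB are generally regarded as good reconstructions; a lossless method yields an infinite PSNR.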

Compression Ratio
Compression ratio (CR) [32] is the ratio of the size of the file after compression to the size of the original file, and is mathematically expressed in eq. (21):

$$CR = \frac{\text{compressed file size}}{\text{original file size}} \quad (21)$$

CR is used to determine the efficiency of the compression: the lower the CR, the better the compression [18].

File size after compression and reconstruction
This parameter determines the size in bytes of the image after compression, and the image size after reconstruction, in order to identify the loss of data in the compression and reconstruction process [14].

Compression Speed
The speed of compression depends on the compression technique adopted, and the results for this parameter also depend on the size of the available memory. Lossy compression techniques [46] increase computational complexity and storage requirements. Speed is measured as the ratio of the compressed output file size to the time in seconds required to compress the file [24], and is expressed mathematically in eq. (22):

$$\text{Compression speed} = \frac{\text{compressed file size}}{\text{compression time}} \quad (22)$$

Time required to compress and reconstruct
This parameter determines the amount of time required by a compression technique to compress and to reconstruct the original file [36]. It is measured in seconds.

Compression Factor
Compression factor (CF) is the ratio of the size of the original image to the size of the image after compression; it is the inverse of the compression ratio and is mathematically expressed in eq. (23). The higher the compression factor, the better the compression.

$$CF = \frac{\text{original file size}}{\text{compressed file size}} \quad (23)$$

Saving Percentage
Saving percentage calculates the percentage reduction in the size of the source file and is expressed in eq. (24):

$$SP = \frac{\text{original file size} - \text{compressed file size}}{\text{original file size}} \times 100\% \quad (24)$$
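The four size- and time-based metrics of eqs. (21)-(24) can be collected in one helper (an illustrative sketch; the compression time below is a hypothetical example value):

```python
def compression_metrics(original_bytes, compressed_bytes, compression_secs):
    # Metrics defined in eqs. (21)-(24): ratio, factor, saving %, speed.
    cr = compressed_bytes / original_bytes                               # eq. (21)
    cf = original_bytes / compressed_bytes                               # eq. (23)
    saving = (original_bytes - compressed_bytes) / original_bytes * 100  # eq. (24)
    speed = compressed_bytes / compression_secs                          # eq. (22), bytes/sec
    return {"ratio": cr, "factor": cf, "saving_percent": saving, "speed": speed}

# Sizes from the Figure 7 example; the 0.5 s timing is hypothetical.
m = compression_metrics(original_bytes=38278, compressed_bytes=5096,
                        compression_secs=0.5)
```

Note that the ratio and factor are reciprocals, so a low CR and a high CF describe the same good compression.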

Experimental Results & Analysis
Figure 5 and Figure 6 show the subjective analysis of the images. Figure 5 represents the output of the images using lossy compression techniques. In the implementation, a colour image of any format (such as JPEG, BMP or PNG) is taken, converted to a grey-scale image, and the image-based compression algorithms are then applied. Figure 6 indicates the resultant images after applying lossless compression techniques.
The implemented code is executed on 100 images using the different compression techniques, and the results achieved are shown in Table 1, which indicates the results of the lossy and lossless compression algorithms on various images. The implementation is done in MATLAB. The different image compression techniques are analysed and the results evaluated as follows. Figure 8 shows the performance analysis of these algorithms on the basis of compression ratio. Among the lossy algorithms, the best performance is shown by Fractal Encoding with a compression ratio of 0.004, whereas DCT, DWT, and BTC show compression ratio values of 0.13, 0.02 and 0.167 respectively. Among the lossless algorithms, Huffman Encoding achieves the lowest value of 0.0101, whereas Walsh Hadamard, RLE and BWT acquire compression ratio values of 0.705, 0.018, and 0.0102 respectively.

Conclusion
In this paper, distinct types of image compression techniques are evaluated on the basis of certain parameters such as compressed file size, compression ratio, compression time, decompression time, compression speed, compression factor and saving percentage. The two types of image compression techniques which are used predominantly are discussed and evaluated. Comparing the lossy and lossless techniques, the lossy techniques perform better than the lossless ones on quantitative analysis, but on qualitative analysis lossless performs better; so it depends on the requirements of the user to select a particular technique. However, lossless techniques are more effective as there is no data or image loss during compression, and the same image and data are retrieved after decompression.

Step 1: Compute an array C[1, σ] such that C[c] + 1 gives the position of the first occurrence of c in F.
Step 2: LF mapping: LF(i) = C[L[i]] + Occ(L, L[i], i), where Occ(L, c, i) is the total number of occurrences of c in L[1, i].
Step 3: Rebuild T: set s = 1 and the terminal label T[n] = $; then for i = n−1, …, 1, set T[i] ← L[s] and s ← LF(s).
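The transform and its inversion can be sketched naively as follows (an illustration using full rotation sorting and table reconstruction, not the efficient LF-mapping procedure described above):

```python
def bwt(text):
    # Forward transform: sort all cyclic rotations of the '$'-terminated text
    # and read off the last column L.
    assert text.endswith("$")
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)

def ibwt(last_column):
    # Naive inversion: repeatedly prepend L to the lexicographically sorted
    # table; the row ending with '$' is the original text.
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    return next(row for row in table if row.endswith("$"))

transformed = bwt("banana$")   # "annb$aa"
```

The transform itself compresses nothing; it groups similar characters together so that a subsequent stage such as RLE or entropy coding compresses better.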

Figure 3. Huffman encoding: bit assignment process (average word length = 2.625 bits)
Figure 4. Huffman encoding: calculation of the code word length

EAI Endorsed Transactions on Scalable Information Systems, 05 2020 - 10 2020 | Volume 7 | Issue 27 | e2

Figure 7 shows the performance analysis of the above-mentioned techniques based on compressed file size. The original size of the image is 38278 bytes; the DCT compression algorithm compresses the original image to 5096 bytes. DWT compresses the image to 777 bytes, which is better than DCT. However, BTC compresses it to 6395 bytes, which is greater than both DCT and DWT. Fractal encoding compresses the image to just 174 bytes, so out of these 4 lossy techniques the best compression is provided by Fractal encoding. Among the lossless compression algorithms, the best performance is shown by Huffman Encoding, which compresses the original image to 215 bytes, compared to Walsh Hadamard, RLE, and BWT, which compress the image to 27007, 694, and 393 bytes respectively.

Figure 9 examines the performance of the algorithms based on the compression time parameter. Among the lossy techniques, the BTC algorithm takes the least time to compress the image, i.e., 0.04 secs, whereas DCT, DWT, and Fractal Encoding take longer.

Figure 9. Performance on the basis of compression time offered by different image compression algorithms

Figure 10 represents the performance of the compression techniques on the basis of decompression time. DWT takes the minimum amount of time, i.e., 0.0021 secs, to decompress the image, whereas DCT takes 0.015 secs; BTC decompresses the image in 0.02 secs and Fractal Encoding takes 0.2 secs. Out of the lossless techniques, the minimum time required to decompress the image is 0.0007 secs, by BWT; Walsh Hadamard takes 0.11 secs, RLE takes 6.12 secs and Huffman Encoding takes 0.06 secs.

Figure 10. Performance on the basis of decompression time offered by different image compression algorithms

Figure 5. Output of images using lossy compression algorithms
Figure 6. Output of images using lossless compression algorithms

Figure 11. Performance on the basis of compression factor offered by different image compression algorithms

Figure 12 shows the performance analysis of the techniques on the basis of the saving percentage parameter. Among the lossy algorithms, the best performance is shown by DCT with 99.614%, whereas DWT, BTC and Fractal Encoding obtain saving percentage values of 97.97%, 83.29%, and 99.54% respectively. Among the lossless algorithms, Huffman Encoding achieves the highest saving percentage, i.e., 99.43%, whereas Walsh Hadamard, RLE and BWT acquire lower saving percentage values.

Figure 12. Performance on the basis of saving percentage offered by different image compression algorithms

Table 1. Comparative analysis of lossy and lossless compression algorithms

Figure 11 examines the performance of the algorithms based on the compression factor parameter. In lossy techniques, the DCT algorithm acquires the maximum value of 259.371, whereas DWT, BTC and Fractal Encoding achieve 49.26, 5.98, and 219.98 respectively. In lossless techniques, Huffman Encoding achieves a higher compression factor value of 99.009 as compared to Walsh Hadamard, RLE, and BWT, which have compression factor values of 29.44, 98.18, and 98.97 respectively.