Research Article
Design and Development of Bioinformatics Feature Based DNA Sequence Data Compression Algorithm
@ARTICLE{10.4108/eai.13-7-2018.164097,
  author={Kakoli Banerjee and Vikram Bali},
  title={Design and Development of Bioinformatics Feature Based DNA Sequence Data Compression Algorithm},
  journal={EAI Endorsed Transactions on Pervasive Health and Technology},
  volume={5},
  number={20},
  publisher={EAI},
  journal_a={PHAT},
  year={2019},
  month={11},
  keywords={Genetic Data, Genetic Data Compression, DNA, Health Care, Compression Algorithms},
  doi={10.4108/eai.13-7-2018.164097}
}
- Kakoli Banerjee
- Vikram Bali
Year: 2019
PHAT
EAI
DOI: 10.4108/eai.13-7-2018.164097
Abstract
INTRODUCTION: Genetic data play a key role in healthcare, but they are typically very large. Many studies show that the absence of DNA information at the right time is one of the major causes of error in healthcare. The more genomic information analysts obtain, the better the prospects for individual and public health. Preserving and retrieving genetic information in the right form within the given time is a major challenge in healthcare. Prenatal DNA tests already screen for developmental abnormalities. Before long, patients will have their blood sequenced to detect any nonhuman DNA that may signal an infectious disease. In the future, someone with cancer will likely be able to track the progression of the disease by having the DNA and RNA of single cells from various tissues sequenced every day. DNA sequencing of whole populations will give a more complete and accurate prediction of population health.
OBJECTIVES: Genetic data are growing exponentially, so the ever-growing genetic databases are hard to manage. The human genome in its base configuration occupies almost thirty terabytes of storage space. Computational resources are limited: not only storage, but also transmission capacity and run-time memory. Compressing data is a challenge when the volume of genetic information is growing exponentially, and it is critical to preserve the integrity of genetic information while compressing it. Hence the main objective of this paper is to develop a lossless DNA compression algorithm that not only gives better compression but also helps in retrieving information for efficient use in healthcare.
METHODS: This paper proposes a lossless genetic data compression method. The proposed algorithm works in horizontal mode and uses a reference-based substitution technique for compression. The main idea of this paper lies in the kind of similarity searched for. Existing genetic compression algorithms follow either horizontal or vertical mode, but whichever mode they use, they all search for similarities within a single chromosome, i.e. they exploit only intra-chromosomal similarities. The present study shows that the compression ratio achieved by analyzing each chromosome individually is always lower than that achieved by a method that also analyzes and compresses inter-chromosomal similarities, i.e. similarities between different chromosomes.
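The abstract does not spell out the algorithm, so the following is only a minimal sketch of reference-based substitution using exact repeats shared with other chromosomes of the same genome. The function names, the `(chromosome, offset, length)` token layout, the seed length `k`, and the greedy first-hit matching are all assumptions made for illustration, not the authors' method.

```python
from typing import Dict, List, Tuple, Union

# A token is either one literal base or a pointer (chromosome name, offset, length)
# into a reference chromosome. This layout is an illustrative assumption.
Token = Union[str, Tuple[str, int, int]]


def build_index(refs: Dict[str, str], k: int) -> Dict[str, Tuple[str, int]]:
    """Map each k-mer to the first (chromosome name, offset) where it occurs."""
    index: Dict[str, Tuple[str, int]] = {}
    for name, seq in refs.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], (name, i))
    return index


def compress(target: str, refs: Dict[str, str], k: int = 4) -> List[Token]:
    """Replace exact repeats found in the reference chromosomes with pointers."""
    index = build_index(refs, k)
    tokens: List[Token] = []
    i = 0
    while i < len(target):
        hit = index.get(target[i:i + k])
        if hit is None:
            tokens.append(target[i])  # no seed match: emit a literal base
            i += 1
            continue
        name, off = hit
        ref = refs[name]
        length = k
        # Greedily extend the exact match to the right.
        while (i + length < len(target) and off + length < len(ref)
               and target[i + length] == ref[off + length]):
            length += 1
        tokens.append((name, off, length))
        i += length
    return tokens


def decompress(tokens: List[Token], refs: Dict[str, str]) -> str:
    """Rebuild the original sequence exactly (lossless)."""
    parts: List[str] = []
    for tok in tokens:
        if isinstance(tok, str):
            parts.append(tok)
        else:
            name, off, length = tok
            parts.append(refs[name][off:off + length])
    return "".join(parts)


# Toy data, not real genomic sequences.
refs = {"chr1": "ACGTACGTTTGA", "chr2": "GGGCCCAAATTT"}
target = "ACGTTTGAGCCCAAAT"
tokens = compress(target, refs)
restored = decompress(tokens, refs)
```

Because every pointer is expanded from the same reference chromosomes during decompression, the round trip is exact. A practical implementation would need a compact binary encoding of the tokens and a longest-match search in place of this first-hit heuristic.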
RESULTS: This study shows that simply using exactly matching repeats among all the chromosomes of the same genome not only improves the compression ratio but also enables a detailed study of the similarities and differences between two genomes of the same species.
CONCLUSION: This study proposes a new compression algorithm for DNA. Along with intra-chromosomal similarities (within a chromosome), the method considers inter-chromosomal similarities (between chromosomes). The results clearly show that inter-chromosomal matches are longer and more numerous than intra-chromosomal matches, which helps achieve a better compression ratio.
Copyright © 2019 Kakoli Banerjee et al., licensed to EAI. This is an open access article distributed under the terms of the Creative Commons Attribution license, which permits unlimited use, distribution and reproduction in any medium so long as the original work is properly cited.