On the Adoption of Erasure Code for Cloud Storage by Major Distributed Storage Systems

INTRODUCTION: Traditional Cloud Systems are struggling to cope with the exponential growth of data in today's distributed application environment. The amount of data online has increased continuously, from an estimated 5 Exabytes in 2003 to 988 Exabytes in 2010, and presently an estimated 5 Exabytes of data are produced daily. To cope with such an astronomical load of data, Distributed Storage Systems such as those operated by Amazon, Google and Microsoft Azure are becoming the de-facto method for storing data. Replication is the method commonly used for providing redundancy; however, Erasure Coding is a worthy alternative. OBJECTIVES: The purpose of this paper is to assess the most widely used distributed storage systems against different evaluation criteria and to identify how erasure codes can be integrated into them. METHODS: This paper provides a survey of well-known Distributed Storage Systems using the CAP (Consistency, Availability and Partition Tolerance) Theorem. We present each solution according to the objectives set and the trade-offs acknowledged by its designers. RESULTS: A comprehensive survey is presented using five evaluation criteria (design principle, data model, failure detection and recovery, consistency and security). Adoption of erasure codes in Distributed Storage Systems is discussed and their advantages are deliberated. Several open challenges are also put forward. CONCLUSION: This paper provides researchers in the field with a comprehensive review of Distributed Storage Systems and of how the adoption of erasure codes will enhance their capabilities.


Introduction
The world is experiencing an exponential growth of data [1] as everything is being digitalized: music, pictures, videos and more. In order to reap the maximum possible benefit from distributed processing, the traditional storage model (relational databases) no longer fits the bill. Distributed Storage Systems (DSS) have been implemented to lend a helping hand by improving availability, scalability and performance to meet the requirements of today's applications. This is the underlying architecture that powers the world's major web service providers such as Google, Amazon and Microsoft Azure.
There are several reasons for distributed processing. Firstly, applications should be scalable and should reap the benefit of multiple systems as well as the multi-core CPU architectures that are readily available in commodity PCs nowadays. Secondly, web servers have to be globally distributed for low latency and failover. For example, as soon as a client logs in to Amazon, thousands of processes are triggered to present the best possible service to the customer, such as the list of previous purchases, the list of similar goods bought by other customers, the customer's preferences, the best deals according to the customer's location, and so on.
Distributed processing implies distributed data, and obtaining scalability, performance and availability from traditional relational database systems is practically impossible. Several researchers have suggested that this is the end of an architectural era [3]. However, RDBMS systems still have their place, especially in business applications.
The Cloud can be represented as a stack, as shown in Figure 1. In this paper we investigate the second layer from the top (circled in red), that is, the storage solutions that allow developers to host applications. It is of prime importance for developers to know the characteristics of the platform they are going to use. Since there are numerous distributed storage solutions available, our objective is to facilitate the task of developers in opting for the right platform to meet the requirements of their applications.

Figure 1. The new stack of Cloud Storage
In this paper, we survey six Distributed Storage Systems that have been deployed by the world's largest Cloud Service Providers, using the CAP Theorem [4]. The CAP theorem states that any distributed system can achieve only two of the three goals at the expense of the third one. The three goals of CAP are Consistency, Availability and Partition Tolerance. The DSS surveyed include Amazon's Dynamo [5,6], Facebook's Cassandra [6,7] and Haystack [8-10], Google's BigTable [6,11], Yahoo!'s PNUTS [12,13] and Microsoft's Azure [14-17].
After surveying those different DSS against the CAP theorem, the concept of Erasure Codes is introduced, the next best alternative to replication for providing redundancy against hardware failures (which many DSS lack). An Erasure Code splits the object to be stored into n data blocks and creates m parity blocks; any n of the resulting n + m blocks can be used to reconstruct the original object, so the system can tolerate up to m failures. As such, Erasure Codes provide higher storage efficiency at the expense of processing power and internal bandwidth. To assess the adoption of Erasure Codes by those DSS, an understanding of the different cloud storage information coding implementations is required. To that end, different Erasure Codes are reviewed, namely Reed-Solomon, Hierarchical, Self-Repairing, Regenerating and Local Reconstruction Codes. This paper is organized as follows: Section 2 discusses the previous surveys that have been published on DSS. In Section 3, the six different DSS are elaborated, with a focus on the objectives of each. Section 4 summarises the different characteristics of the six DSS. Erasure Code schemes are introduced in Section 5. In Section 6, issues identified during the surveying process are put forward, and Section 7 proposes solutions for the integration of Erasure Codes in DSS. Finally, Section 8 concludes this paper.

Previous Work
In this section, the previous surveys published in the area of Distributed Storage Systems are presented. Several reviews and surveys have been undertaken in this area. Four surveys have been chosen based on the different DSS, or characteristics of DSS, they reviewed.
The first review on DSS was published in an article entitled "The Evolving Field of Distributed Storage" [18] in the September-October 2001 issue of IEEE Internet Computing Magazine. The article introduced the issues that DSS have to deal with, such as shared content access, availability, survivability, interoperability, search, caching, load balancing and scalability. The article reviewed two sets of DSS. The first set (Past, Intermemory and Farsite) performed data archival through the use of replication. The second set (Napster, Gnutella, Mojo Nation and Freenet) has the characteristics of Peer-to-Peer systems. This review article provides a good description of the characteristics required by today's DSS, and explicitly points out deficiencies in the systems reviewed.
The second survey paper, "A Survey of Distributed Storage Systems" [19], was performed in the year 2004. The paper surveys the design and implementation of distributed storage systems. Again the author lays emphasis on the required characteristics of DSS, namely Local Transparency, Permanent Storage, Consistency, Availability, Performance and Security. This paper examines four Distributed File Systems (DFS), namely Sun's Network File System (NFS), the Andrew File System (AFS), the Coda File System and the Google File System (GFS). Among those, only GFS is commercially deployed at present. The paper also reviewed two DSS that operate using Distributed Hash Tables, namely Tapestry and Chord. Of all the DSS reviewed, only GFS is widely used nowadays.
Another good survey on DSS is "A Taxonomy of Distributed Storage Systems" [22], published in 2008 and revised in 2012. In this survey, the authors presented the areas where more research is needed to improve DSS. The areas identified are: System Function, Storage Architecture, Operating Environment, Usage Patterns, Consistency, Security, Autonomic Management, Federation, and Routing and Network Overlays.
Lately, the term "Distributed Storage System" has been employed less frequently and has slowly been replaced by the term "Cloud Storage" to refer to storage systems for huge amounts of data. As such, we also reviewed surveys related to Cloud Storage.
One of the first surveys on Cloud Storage, "A Survey on Cloud Storage" [23], was performed in 2011; it introduced the concepts of cloud computing and discussed GFS and HDFS as platforms for cloud storage.
A simple but enlightening survey, "Overview of Cloud Storage and Architecture" [24], completed in 2018, highlights the architecture of a cloud system and, most importantly, discusses the pros and cons of such systems.
The latest survey on Cloud Storage [34] is entitled "Issues and challenges in Cloud Storage Architecture: A Survey", whereby the authors discussed the challenges and issues in terms of data security and data management. For data security, the following issues were identified: Integrity, Confidentiality, Access, Authentication & Authorization and Breaches of Data. As for data management, the following issues were discussed: Data Dynamics, Data Segregation, Virtualization Vulnerabilities, Backup Issues, Availability and Data Locality.
Apart from reviewing some of the surveys on Distributed Storage Systems and Cloud Storage Systems, we also reviewed some other popular DSS, as mentioned in Section 1. We compared the different systems in terms of the CAP theorem, that is, in terms of Consistency, Availability and Partition Tolerance, and also provided a summary of the different characteristics of each of the surveyed DSS.
Deployment of the different Erasure Codes in DSS has not been discussed in the literature so far, and the factors acting as barriers and motivators have not been presented in a single work. There are several surveys presenting the theory behind the different erasure codes, such as [35], [36] and [37]. In this work, the different Erasure Codes are presented under different classifications.

Cloud Computing
Different definitions of Cloud Computing are provided in the literature [38]. Below are some popular examples:
• Cloud Computing is the delivery of computing services, including servers, storage, databases, networking, software, analytics and intelligence, over the Internet ("the cloud") to offer faster innovation, flexible resources, and economies of scale [39].
• The practice of using a network of remote servers hosted on the Internet to store, manage, and process data, rather than a local server or a personal computer [40].
Cloud Computing offers various advantages such as pay-as-you-use pricing, scalable services, virtualisation and content delivery networks (CDN). All these factors make cloud computing increasingly prevalent in organisations.

Distributed Storage Systems
The primary objective of this paper is to present the available solutions in such a way that developers of cloud applications can easily identify a suitable one matching the requirements of their own applications. First, we provide a brief introduction to the different types of distributed storage systems, and then we differentiate them according to their characteristics, as shown in Table 2.

Dynamo
In order to match the requirements of one of the largest e-commerce operations in the world, Amazon has developed a series of Distributed Storage Systems. One of them is Dynamo [5,6]. It is a highly available key-value store that uses a primary-key-only interface: the key is specified to access the data. This is possible as the data objects are relatively small (< 1 MB).

BigTable
Google's own data store is known as BigTable [6,11]. It is a structured distributed storage system that scales to a very large capacity (in the range of petabytes, spread across several datacentres using commodity servers).
BigTable has been deployed in several projects in Google, including web indexing, Google Earth, Google Analytics and Google Finance.

Cassandra
Cassandra [6,7] is an open-source, decentralized, structured distributed storage system implemented in Java by Facebook. It was initially designed to address Facebook's inbox search problem and has since been deployed for other applications. In fact, Cassandra can be considered a hybrid of Google's BigTable and Amazon's Dynamo. Cassandra adopted the salient features from both of those DSS, resulting in a system that can run on cheap commodity hardware and handle a high write throughput without sacrificing read efficiency.

PNUTS
Like the other major web service providers, Yahoo! has followed in the footsteps of Google and Facebook by deploying PNUTS [12,13], a distributed database system for Yahoo!'s web applications. It has been designed to provide storage for Flickr. Like Dynamo, it offers key-value lookups. It provides a hosted, centrally managed service with relaxed consistency guarantees, as most other DSS do.

Haystack
Haystack [8-10] is used as a distributed storage system for storing photos for Facebook. It replaces the previous system, which was based on network-attached storage appliances accessed over NFS. Since Haystack is primarily used for storing photos, it is said to be an object storage system.

Azure
Windows Azure Storage (WAS) [14-17] is Microsoft's cloud storage system, which has the ability to provide strong consistency, availability and partition tolerance. It has been deployed since November 2008 and is used for applications such as social networking search and serving video, music and game content.

Characteristics of Distributed Storage Systems
Traditionally, a database for web services such as MySQL [51] provides ACID guarantees for reliability, where ACID stands for Atomicity, Consistency, Isolation and Durability. However, these principles are applicable to a single-node system only. In the year 2000, Eric Brewer introduced the CAP theorem [52], which states that it is impossible for a distributed service to provide consistency, availability and partition tolerance at the same time. As such, we present the DSS in terms of these three characteristics to distinguish among them. The characteristics are presented in Table 1.
Fault-Tolerance is a de-facto characteristic of any Distributed Storage System, as failures are common in such systems. However, how those failures are overcome varies from system to system. Consistency and Availability are decisive factors to choose between. In most systems one of them is sacrificed to some extent.

Consistency

Dynamo
While designing Dynamo, specific algorithms were devised to maintain consistency. Though strict consistency was not the priority, update/conflict resolution is performed during reads in order to ensure that writes are never rejected; Dynamo targets the design space of an "always writeable" data store. As such, consistency is provided by performing object versioning [5]. Consistency among replicas during updates is preserved by a quorum-like technique. Other design principles adopted are Incremental Scalability, Symmetry (same set of responsibilities), Decentralization and Heterogeneity (work distribution according to node capabilities).

BigTable
Consistency is maintained by using a versioning mechanism: BigTable allows several versions of the same data, indexed by different timestamps. BigTable uses garbage collection to discard stale cell versions.

Cassandra
Cassandra uses features from Dynamo and BigTable to ensure consistency. The features adopted from Dynamo are consistent hashing for data partitioning, the gossip-based membership algorithm and the replication model. From BigTable, the structured data model has been adopted.

PNUTS
The consistency model of PNUTS differs from those of its predecessors. The idea is to provide "per-record timeline consistency": PNUTS totally orders the updates to a record, in such a way that an update is only applied once all previous updates have been applied. Yahoo! uses a message service known as YMB (Yahoo! Message Broker), as opposed to the gossip protocols used in other DSS. However, the consistency of PNUTS is still considered to be weak.
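To make per-record timeline consistency concrete, the following minimal sketch (hypothetical structures, not Yahoo!'s actual implementation) applies an update to a record only when its sequence number immediately follows the last applied one, buffering out-of-order updates so that every replica observes the same ordered timeline of versions:

class Record:
    """A single record with a monotonically increasing version number."""
    def __init__(self):
        self.version = 0      # sequence number of the last applied update
        self.value = None
        self.pending = {}     # out-of-order updates buffered by version

    def apply(self, version, value):
        """Apply an update only in timeline order; buffer it otherwise."""
        if version <= self.version:
            return True                     # already applied, ignore duplicate
        if version != self.version + 1:
            self.pending[version] = value   # arrived early: hold it back
            return False
        self.version, self.value = version, value
        while self.version + 1 in self.pending:   # drain buffered updates
            self.version += 1
            self.value = self.pending.pop(self.version)
        return True

record = Record()
record.apply(2, "v2")                # buffered: update 1 not applied yet
record.apply(1, "v1")                # applies 1, then the buffered 2
print(record.version, record.value)  # -> 2 v2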

Haystack
Since Haystack uses synchronous writes and append-only semantics, consistency does not arise as an issue. Therefore, Haystack can be considered to have a fair level of consistency.

Azure
Azure has several features that achieve a strong level of consistency. Firstly, Azure is the only surveyed DSS built in a layered manner, with each layer having specific responsibilities. As such, Azure has been able to meet the CAP requirements without compromising any of the three characteristics [14]. The Partition Layer provides transaction ordering, leading to strong consistency. Separate replication engines (Intra-Stamp and Inter-Stamp) provide synchronous and asynchronous replication respectively, along with an enlarged namespace, also contributing to strong consistency.

Design Principle (Availability)

Dynamo
The design principle adopted by Dynamo is based on the ACID (Atomicity, Consistency, Isolation and Durability) properties. However, Dynamo had to sacrifice the "C", consistency, to achieve higher efficiency, which is obtained using commodity hardware infrastructure and a stringent SLA (Service Level Agreement) stating latency requirements at the 99.9th percentile of the distribution. Dynamo uses a versioning mechanism throughout and application-assisted conflict resolution in order to provide an ideal platform for developers. Dynamo has also been built as a pure peer-to-peer architecture.

BigTable
BigTable has been designed to handle very large datasets, generally measuring in the petabyte range. In fact, BigTable is a combination of technologies: it uses services provided by GFS and Chubby, employs ideas from Log-Structured Merge Trees and uses a Distributed Hash Table for content location. These features provide availability to some extent.

Cassandra
The design of Cassandra has been based on the CAP theorem, as opposed to ACID in Dynamo. Cassandra aimed for Availability and Partition Tolerance; consistency, if required, can instead be achieved by using HBase.

PNUTS
PNUTS has been designed around four principles: (i) Asynchrony - the system provides high performance at large scale by using asynchrony, weak consistency and loose coupling; (ii) Automated replication and failure recovery - these have been built into the design to ensure high availability; (iii) Ease of use, operation and scaling - ease of use implies that the complexity of the system is hidden from the user and only a simple interface is provided; ease of operation refers to functions such as self-management and self-tuning, which are built into the system; ease of scaling means that the addition of extra nodes should not affect the overall performance of the system; (iv) Multiple rich access methods - these include multiple types of primary tables and secondary indexing.

Haystack
While designing Haystack, four main goals were set: (i) High throughput and low latency - this has been achieved by requiring at most one disk operation per read, made possible by keeping the metadata in main memory instead of in a traditional database (a minimal sketch of such an index follows below); (ii) Fault tolerance - Haystack uses replication as its main method of fault tolerance; as soon as a node fails, another machine is added and the data is replicated to it from geographically scattered replicas; (iii) Cost-effectiveness - the authors claimed that "each usable terabyte costs ~28% less and processes ~4x more reads per second than the previous system" [8]; (iv) Simplicity - a simple and straightforward design and implementation of the system.
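As an illustration of goal (i), the sketch below (hypothetical structures, not Facebook's actual code) keeps all photo metadata in an in-memory dictionary so that serving a read costs a single seek into a large volume file, with no disk access needed to locate the photo:

class VolumeIndex:
    """In-memory index: photo id -> (offset, size) within one large volume file."""
    def __init__(self, volume_path):
        self.volume_path = volume_path
        self.index = {}                       # photo_id -> (offset, size), kept in RAM

    def add(self, photo_id, data):
        """Append the photo to the volume file and remember where it landed."""
        with open(self.volume_path, "ab") as f:
            offset = f.tell()
            f.write(data)
        self.index[photo_id] = (offset, len(data))

    def read(self, photo_id):
        """Serve a photo with a single seek + read; the metadata lookup is in memory."""
        offset, size = self.index[photo_id]
        with open(self.volume_path, "rb") as f:
            f.seek(offset)
            return f.read(size)

store = VolumeIndex("volume_001.dat")              # hypothetical volume file
store.add("photo_42", b"\x89PNG...image bytes")
assert store.read("photo_42") == b"\x89PNG...image bytes"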

Azure
Windows Azure Storage (WAS) has the following key design goals: (i) Strong consistency - compared to all the other DSS surveyed, WAS offers the strongest consistency, through the use of a layered system applying both synchronous and asynchronous updates; (ii) Global and scalable namespace/storage - WAS uses a URL-like namespace that identifies the domain, the account name, the partition name and the object name; (iii) Disaster recovery - WAS uses replicas as well as erasure codes for redundancy; erasure codes provide the ability to distribute parts of an object with redundant information that can be used for recovery in case of disaster, and only a minimum number of those parts is required to reconstruct the original object; (iv) Multitenancy and cost of storage - slashing the cost of storage has been made possible by allowing several clients to share the same physical node to serve heterogeneous application data.

Partition Tolerance (Mechanism for Fault-Tolerance)

Dynamo
Data objects are partitioned and replicated using consistent hashing. For redundancy purposes, each key range, along with its values, is stored over N machines. When a data object is read (or written), a minimum of R (or W) machines is needed to return (or store) the object; this ensures consistency among the replicas. Failures and membership alterations are identified using a lightweight gossip-based protocol: nodes communicate with each other every 10 seconds and keep track of the membership status. If a node is unreachable, other nodes simply assume that it is down and continue communicating with the remaining nodes. Consistency among replicas during updates is achieved by applying a quorum-like technique with a decentralized synchronization protocol. Merkle trees are used so that only the "diff" of the tree is exchanged during the synchronization mechanism.
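The sketch below (simplified, with hypothetical node names) illustrates the two mechanisms just described: a consistent-hashing ring that maps each key to the N distinct nodes following its position on the ring, and the quorum condition R + W > N that keeps read and write sets overlapping on at least one replica:

import hashlib
from bisect import bisect_right

def ring_hash(value):
    """Position a node or key on the hash ring."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, n=3, r=2, w=2):
        # R + W > N guarantees that read and write quorums intersect.
        assert r + w > n, "quorum condition R + W > N must hold"
        assert len(set(nodes)) >= n, "need at least N distinct nodes"
        self.n, self.r, self.w = n, r, w
        self.points = sorted((ring_hash(node), node) for node in nodes)

    def preference_list(self, key):
        """Return the N distinct nodes encountered clockwise from the key."""
        start = bisect_right(self.points, (ring_hash(key), ""))
        nodes, i = [], start
        while len(nodes) < self.n:
            node = self.points[i % len(self.points)][1]
            if node not in nodes:
                nodes.append(node)
            i += 1
        return nodes

ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"])
print(ring.preference_list("customer:1234"))   # e.g. ['node-c', 'node-d', 'node-a']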

BigTable
Replication of data with a versioning mechanism is used by BigTable to guard against failures. Given that files are stored in chunks and metadata is stored on tablet servers, even if some files or tablet servers are lost, they can be reconstructed using the master. However, BigTable's dependency on the Chubby locking service being up and running turns out to be a single point of failure and leads to downtime if Chubby is not working.

Cassandra
Cassandra uses different replication policies such as "Rack Aware", "Rack Unaware" and "Data Center Aware". A leader is chosen among the nodes in the system using a service called ZooKeeper. Any new node that joins contacts the leader and is assigned its responsibilities. The leader also ensures an optimal load balance among all nodes in the ring. Cassandra is described as "eventually consistent" because it does not guarantee that all replicas will always have the same data [53].

PNUTS
Yahoo!'s PNUTS is a DSS that uses a centralized storage system and asynchronous replication to ensure low write latency while providing geographic replication. Updates are totally ordered across all replicas and follow per-record timeline consistency. PNUTS has multi-level redundancy applicable to data, components as well as metadata, and leverages its consistency model to guard against hardware or network failures.

Haystack
Any photo is replicated to each of the physical volumes mapped to the assigned logical volume; the Haystack Directory provides this mapping. Fault tolerance is thus achieved by replicating each photo in geographically distinct locations. It is a fairly simple method: new machines are added to compensate for failed machines and additional copies of the photos are made. The only single point of failure is the index file, though it can be reconstructed by restarting the whole system.

Azure
WAS has two replication engines, namely Intra-Stamp Replication and Inter-Stamp Replication. Intra-Stamp Replication operates in the stream layer; it provides synchronous replication and makes sure that all replicas within a stamp are in sync. Its main objective is to provide redundancy in case of rack failure. Inter-Stamp Replication operates in the partition layer; it provides asynchronous replication and makes sure that replicas in other stamps are in sync. Its objective is to provide redundancy for objects and to ease disaster recovery. Another solution for cost reduction used in WAS is the adoption of erasure codes for the archival of Blobs, using Reed-Solomon codes as the erasure coding algorithm. This mechanism has allowed the reduction of storage space from full replication to 1.3-1.5x the original data size. Erasure codes also increase the durability of the data stored.

Erasure Code
Following from Section 4, it is found that DSS would benefit enormously from a more storage-efficient form of redundancy. The latter can be provided through the use of Erasure Codes (EC).
In erasure-coded storage systems, a data object is divided into m blocks, which are encoded to generate n blocks in total (m <= n). Using the properties of erasure codes, the system is able to reconstruct the original data by collecting any m out of the n encoded blocks. The data redundancy factor r is defined as n/m, and the data can be reconstructed provided that the number of storage node failures does not exceed n - m.
Examples of such erasure codes include Reed-Solomon codes, Hierarchical codes, Regenerating codes and Self-Repairing codes.

Figure 2. Erasure Code vs Replication
From Figure 2, it is clear that erasure codes bring a storage capacity saving of about 60-70% compared to replication for the same level of redundancy. The issue with erasure codes is that, together with the benefit of reducing the storage space needed for a given amount of redundancy, they also bring additional bandwidth requirements, storage overheads and computational loads when creating the blocks and repairing them (the repair problem). Finding an erasure code that is optimal in terms of processing and bandwidth requirements is still an open problem.
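As a rough illustration of the trade-off shown in Figure 2, the short calculation below uses illustrative parameters only (the exact saving depends on the code chosen and on the replication factor it is compared against): it contrasts the raw storage consumed by 3-way replication with that of two erasure-coded configurations, one a (14,10) Reed-Solomon layout and the other the low end of the 1.3-1.5x range reported for WAS archival storage in this survey:

def saving_vs_replication(replication_factor, ec_overhead):
    """Fractional storage saving of an erasure code over n-way replication."""
    return 1 - ec_overhead / replication_factor

configs = {
    "(14,10) Reed-Solomon": 14 / 10,   # m = 10 data blocks, n = 14 total -> 1.4x
    "archival code at 1.3x": 1.3,      # low end of the 1.3-1.5x range cited for WAS
}
for name, overhead in configs.items():
    saving = saving_vs_replication(3.0, overhead)
    print(f"{name}: {overhead:.2f}x stored, {saving:.0%} saving vs 3x replication")
# -> roughly 53% and 57% respectively; savings in the 60-70% range correspond to
#    comparisons against higher replication factors or lower-redundancy codes.
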
Erasure Codes can be classified into the following categories:

Reed-Solomon Codes
A Reed-Solomon code [54] is implemented in two parts: the encoder and the decoder. The encoder takes a block of digital data and adds extra redundant symbols, computed using the Reed-Solomon code. The decoder processes each block and attempts to correct errors and recover the original data. If an error occurs during transmission or storage, it can be detected and corrected depending on the parameters of the Reed-Solomon code.
During the encoding process, data is divided into sets of k data symbols of fixed size s, and parity symbols are generated and appended to derive a codeword of size n. The number of parity symbols derived is therefore n - k, which means that up to n - k erased symbols can be recovered.
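A minimal sketch of this encode/decode cycle is given below using the third-party reedsolo Python package (assumed to be installed; other Reed-Solomon libraries expose similar interfaces). Here the k data symbols are extended with n - k = 10 parity symbols, so up to 10 erased symbols at known positions, or up to 5 corrupted symbols at unknown positions, can be recovered:

# pip install reedsolo   (third-party package, assumed available)
from reedsolo import RSCodec

rsc = RSCodec(10)                       # n - k = 10 parity symbols per codeword

data = b"distributed storage systems"   # the k data symbols
codeword = rsc.encode(data)             # k data symbols followed by 10 parity symbols

# Corrupt three symbols at unknown positions (within the 5-error capability).
corrupted = bytearray(codeword)
for pos in (0, 5, 11):
    corrupted[pos] ^= 0xFF

# decode() recovers the original message; recent reedsolo versions return a
# tuple whose first element is the decoded message.
result = rsc.decode(bytes(corrupted))
message = result[0] if isinstance(result, tuple) else result
assert bytes(message) == data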

Regenerating Codes
Regenerating codes [55] address the issue of rebuilding (also called repairing) lost encoded fragments by contacting only a subset of fragments thus reducing bandwidth requirements.
In traditional RS codes [54], each fragment is treated as a single symbol, encoded together with the other fragments and stored on a different node. Regenerating codes assume instead that each fragment consists of α symbols; that is, a node stores not just one encoded symbol but several, kept together as one fragment. During the repair of a failed node, a set of d surviving nodes (known as helper nodes) is contacted and β <= α symbols are downloaded from each. The repair bandwidth, i.e. the total amount of data downloaded, amounts to γ = dβ.
Regenerating codes are characterised by the parameters [n, k, d], where d specifies the number of nodes to be contacted during repair. With these new parameters comes the issue of finding the right trade-off between storage and repair bandwidth.
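The short calculation below illustrates the saving with example values (illustrative parameters only, using the minimum-storage operating point where β = α/(d - k + 1)): a conventional [n, k] erasure code repairs one lost fragment by downloading k whole fragments (k·α symbols), whereas a regenerating code contacts d helpers and downloads only β symbols from each, for a total of γ = dβ:

# Illustrative parameters for an object of M = 1,000,000 symbols.
M = 1_000_000
n, k, d = 14, 10, 13            # d helper nodes are contacted during repair
alpha = M // k                  # symbols stored per node: 100,000
beta = alpha // (d - k + 1)     # symbols downloaded per helper: 25,000

conventional_repair = k * alpha  # rebuild by downloading k full fragments
regenerating_repair = d * beta   # gamma = d * beta

print(f"conventional repair : {conventional_repair:,} symbols")  # 1,000,000
print(f"regenerating repair : {regenerating_repair:,} symbols")  # 325,000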

Local Reconstruction Codes
In LRC [56], the placement of blocks and rack awareness are of prime importance. LRC diminishes the amount of network traffic that needs to be transmitted when reading blocks to reconstruct an object: placing fragments closer to each other reduces transmission time, so reconstruction and repair times are also minimized. Another novelty in LRC is the use of local parities. A group of blocks is encoded to generate one local parity, and the other groups follow the same routine; finally, global parities are created by encoding all the blocks of the associated object. This is done while maintaining the same level of fault tolerance. The method is geared primarily towards repairing single failures.
LRCs are significant for deployments where not only the repair bandwidth but also the number of nodes to be contacted during repair matters. This number of nodes is often referred to as the repair locality.
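The toy sketch below uses plain XOR parities to illustrate the locality idea (production LRCs such as the one in [56] use Reed-Solomon-style arithmetic, in particular for the global parities): each group of data blocks gets one local parity, so a single lost block is repaired from its own group only, rather than from the whole stripe:

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# Four data blocks arranged in two locality groups of two blocks each.
groups = [[b"\x01\x02", b"\x03\x04"], [b"\x05\x06", b"\x07\x08"]]
local_parities = [xor_blocks(g) for g in groups]             # one parity per group
global_parity = xor_blocks([b for g in groups for b in g])   # illustrative only

# Single failure: block 1 of group 0 is lost. It is repaired from its *local*
# group (one surviving data block plus one local parity) - the repair locality
# is 2, and the other group is never read.
repaired = xor_blocks([groups[0][0], local_parities[0]])
assert repaired == b"\x03\x04"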

Hierarchical codes
Most erasure codes, such as Reed-Solomon or MBR codes, are meant to solve the computational issues of building blocks; however, they do not address the issue of limited bandwidth. Two erasure codes that aim at a flexible solution, reducing network traffic while maintaining the benefits of traditional erasure codes, are Hierarchical Codes [57] and Self-Repairing Codes [58]. However, both solutions come with an additional storage cost, which is not a major issue in P2P systems, as storage is amply available.

Self-Repairing Codes
Self-Repairing Codes are a particularly interesting class of erasure codes. In [58], they are formulated as PSRC (Projective Self-Repairing Codes), derived from a projective geometric construction.
The fundamental difference between Self-Repairing and Hierarchical Codes lies in their construction and in the cardinal properties they satisfy.
Self-Repairing Codes have two main objectives: (i) to minimize the absolute amount of data transfer needed to recreate the data lost on one node, and (ii) to minimize the number of nodes to be contacted when repairing a single node failure. As in other schemes, an object is divided into blocks and some redundant blocks are created to provide availability in case repair is required.

Issues in DSSs
Having surveyed the Distributed Storage Systems, several issues have been identified: the cost of storing replicated data, data location/migration, and security. The most pressing of them is security, needed to protect the stored data.

Dynamo, BigTable, Cassandra and Haystack
Most DSS have been designed to run in a trusted environment and to serve as a storage system for other services such as social networks or e-commerce. Also, each operation is meant to be atomic, hence providing some measure of security. DSS like Dynamo, BigTable, Cassandra and Haystack operate on those design principles. Haystack does place restrictions on the usage of cookies, providing minimal security. Encryption could have been implemented in BigTable, as its data is structured. Security in those DSS has been omitted in order to provide faster access to Big Data.

PNUTS
Security has been one of the requirements set in the design of Yahoo!'s PNUTS. A simple example of the low level of security in PNUTS is that it does not always return the most current version of the requested data, as it uses asynchronous updates. However, little information is available on the security measures actually applied.

Azure
WAS is the DSS with security implemented at the various layers of its system. Access is granted using a key generated by Azure for each user; however, if the key is hijacked, the system can be compromised. WAS also has mechanisms to guard against faults such as "disk and node errors, power failures, network issues, bit-flip and random hardware failures" [8].

Replicated Data
Apart from Azure, which uses Erasure Coding for storing archive data, the other five DSS use only replication to provide storage redundancy, primarily to cater for hardware failures. Replication consumes 300% more storage space, compared to Erasure Codes which require around 60%-80%, depending on the policies used. Another drawback of replication is the bandwidth consumed during the replication process, which is very greedy. The benefits of replication compared to Erasure Coding are that replication requires less processing and fewer data nodes are involved. With modern data centres equipped with thousands of servers, both these disadvantages of Erasure Coding are now insignificant.

Adoption of Erasure Code in DSSs
In this section, the factors that are preventing the full adoption of Erasure Code for cloud storage by major DSSs are discussed.

Security
From Table 2 and Section 6, security appears to be the predominant issue in most DSS. Though a lot has been done by cloud providers to tighten security, breaches are still prevalent. Can erasure codes be a solution to that issue? Given that the data is split, encoded and dispersed, a compromised node does not mean compromised data. In addition, if the dispersed data is further encrypted, the complexity of any breach increases. There have been several attempts, in [59], [60] and [61], to implement secure erasure codes.

Live Data
Up to now, most deployments of erasure-coded storage [35], [61] have been used mainly for archiving purposes. There are two barriers preventing the usage of erasure codes for live data. First, the time taken for decoding is still too high to be acceptable for live data. Secondly, erasure-coded data is difficult to update: a naive update to a single block implies re-encoding all the parity blocks, and an update or mutation can span more than one block, which entails additional processing. The authors of [62], [63], [64] and [65] have made significant enhancements towards an updatable erasure-coded DSS.
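To make the update problem concrete, the sketch below compares the naive cost with the optimisation commonly explored in such work for linear codes: since each parity is a linear combination of the data blocks, a single-block update can be folded into the existing parity using only the difference between the old and new block (the parity delta), instead of re-reading and re-encoding the whole stripe. The example uses a toy XOR parity; the same identity holds symbol-wise for Reed-Solomon-style codes over a finite field:

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# A stripe of three data blocks protected by one XOR parity block.
data = [b"\x10\x10", b"\x20\x20", b"\x30\x30"]
parity = b"\x00\x00"
for block in data:
    parity = xor_bytes(parity, block)

# Update the middle block in place.
old_block, new_block = data[1], b"\xAA\xBB"
data[1] = new_block

# Naive approach: re-read every data block and recompute the parity.
naive_parity = b"\x00\x00"
for block in data:
    naive_parity = xor_bytes(naive_parity, block)

# Delta approach: fold only the change into the existing parity.
delta_parity = xor_bytes(parity, xor_bytes(old_block, new_block))

assert delta_parity == naive_parity   # identical parity, far less data touched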

Support for NoSQL Databases
A large amount of the data stored in the cloud is held in NoSQL databases, including transactional data and data captured from IoT devices. Database archiving is used to reduce the amount of data kept in the primary tables; the archived data is stored alongside, to be recovered if ever needed. Replication is still used as the redundancy method for archived databases. Using erasure codes for this purpose could enhance storage capacity usage.

Application Layer Control
Up to now, implementations of erasure code systems have been limited to libraries, accessible through commands only. WebHDFS [66] offers an API for implementing a web interface to interact with Hadoop systems; however, it does not support the Erasure Coding functionality available in Hadoop version 3.0 [67].

Energy Saving Capabilities
This feature is not a barrier but rather a motivator. Erasure-coded DSS could be designed in such a way that they become energy saving, since parity blocks are only used when there is a failure. Servers storing parity blocks could be placed in stand-by mode, thus using less energy. In a similar approach, there could be a hierarchy of servers storing different levels of parity, each placed in a different energy-saving mode. The authors of [68] mention that erasure codes have energy-saving potential. Finally, in [69], an attempt has been made to design an energy-efficient erasure code.

Conclusion
The increasing amount of data and the incapacity of traditional databases to handle such quantities of data have led to a new type of data store known as Distributed Storage Systems.
In this work, we have surveyed six Distributed Storage Systems that are widely used as data storage in cloud computing settings. Most of the DSS have been built for a single purpose and have been performing relatively well, though some had to sacrifice either consistency or availability, but never performance. Only BigTable and Azure have been deployed across several services.
Erasure Codes are tipped to be the next best alternative for providing redundancy of data stored in the Cloud. To date, a lot of fundamental research has been undertaken to make erasure codes more attractive for Cloud Service Providers and better performing for users. Their adoption is gradually happening, as some of the world's leading cloud providers (Facebook, Microsoft Azure and IBM) are integrating Erasure Codes into their systems. Issues such as security, live updates, support for NoSQL and application layer control have been identified as barriers to the adoption of erasure codes by DSS.


Table 1. Comparison of DSSs

Table 2. Summary of features of DSSs