1st International Workshop on Advanced Architectures and Algorithms for Internet DElivery and Applications

Research Article

Autonomic Data Placement Strategies for Update-intensive Web applications

  • @INPROCEEDINGS{10.1109/AAA-IDEA.2005.4,
        author={Swaminathan Sivasubramanian and Guillaume Pierre and Maarten van Steen},
        title={Autonomic Data Placement Strategies for Update-intensive Web applications},
        proceedings={1st International Workshop on Advanced Architectures and Algorithms for Internet DElivery and Applications},
        keywords={web applications},
        year={2006},
        doi={10.1109/AAA-IDEA.2005.4}
    }
  • Swaminathan Sivasubramanian
    Guillaume Pierre
    Maarten van Steen
    Year: 2006
    Autonomic Data Placement Strategies for Update-intensive Web applications
    DOI: 10.1109/AAA-IDEA.2005.4
Swaminathan Sivasubramanian1,*, Guillaume Pierre1,*, Maarten van Steen1,*
  • 1: Dept. of Computer Science, Vrije Universiteit Amsterdam, The Netherlands
*Contact email: swami@cs.vu.nl, gpierre@cs.vu.nl, steen@cs.vu.nl


Edge computing infrastructures have become the leading platform for hosting Web applications. One of the key challenges in these infrastructures is the replication of application data. In our earlier research, we presented GlobeDB, a middleware for edge computing infrastructures that performs autonomic replication of application data. In this paper, we study the problem of data unit placement for update-intensive Web applications in the context of GlobeDB. Our hypothesis is that there exists a continuous spectrum of placement choices between complete partitioning of sets of data units across edge servers and full replication of data units to all servers. We propose and evaluate different families of heuristics for this replica placement problem. As we show in our experiments, a heuristic that takes into account both the individual characteristics of data units and the overall system load performs best.

[...] locality among requests) and the presence of data updates significantly reduce the effectiveness of these solutions. To handle such applications, CDNs often employ edge computing infrastructures where the application code is replicated at all edge servers. Database accesses then become the major performance bottleneck. This warrants the use of database caching solutions, which cache certain parts of the database at edge servers and keep them consistent with the central database. However, these infrastructures require the database administrator to define manually which part of the database should be placed at which edge server.

In our earlier work, we described the design and implementation of GlobeDB, an autonomic replication middleware for edge computing infrastructures. The distinctive feature of GlobeDB is that it performs autonomic placement of application data by monitoring accesses to the underlying data. Instead of replicating all data units at all edge servers, GlobeDB automatically replicates each data unit only to the edge servers that access it often.
GlobeDB provides Web-based data-intensive applications with the same advantages that CDNs offer to traditional Web sites: low latency and reduced network usage [13]. The data placement heuristics developed in this previous work assumed that the number of update requests is low relative to the number of read requests. While this assumption often holds, there exists a class of applications that receive a large number of updates. For example, a stock exchange Web site that allows its customers to bid or sell stocks in real time is likely to receive large quantities of updates (the New York Stock Exchange receives on the order of 700 update requests per second [8]). Replicating an update-intensive application while maintaining consistency among the replicas is difficult because each update to a given data unit must be applied at every server that holds a copy of it. In such settings, creating extra replicas of a data unit can have the paradoxical effect of increasing the global system's load rather than decreasing it. This may be a significant problem, as the service time to update a data unit is usually an order of magnitude higher than the time to read it.
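The trade-off described above can be illustrated with a back-of-the-envelope load model. The sketch below is not GlobeDB's placement heuristic; it is a minimal illustration under stated assumptions: a single data unit, reads spread evenly over its replicas, every update applied at every replica, and a hypothetical cost ratio of 10 between an update and a read (chosen to match the "order of magnitude" mentioned in the text). The function names and the request rates are illustrative only.

```python
def total_load(num_replicas, reads_per_sec, updates_per_sec,
               read_cost=1.0, update_cost=10.0):
    """Global service time spent per second on one data unit.

    Assumptions (hypothetical, for illustration only):
    - each read is served by exactly one replica;
    - each update must be applied at every replica;
    - costs are in arbitrary time units, with updates 10x reads.
    """
    if num_replicas < 1:
        raise ValueError("a data unit needs at least one replica")
    read_load = reads_per_sec * read_cost
    update_load = num_replicas * updates_per_sec * update_cost
    return read_load + update_load


def per_server_load(num_replicas, reads_per_sec, updates_per_sec,
                    read_cost=1.0, update_cost=10.0):
    """Average load on each replica, assuming reads are spread evenly."""
    return total_load(num_replicas, reads_per_sec, updates_per_sec,
                      read_cost, update_cost) / num_replicas


if __name__ == "__main__":
    # Update-intensive workload: extra replicas triple the global load.
    for n in (1, 4):
        print(n, total_load(n, 100, 20), per_server_load(n, 100, 20))
    # Read-dominated workload: extra replicas barely change the global load.
    for n in (1, 4):
        print(n, total_load(n, 1000, 1), per_server_load(n, 1000, 1))
```

In this model the read term is independent of the number of replicas, while the update term grows linearly with it. Replication therefore pays off only when the reads it spreads across servers outweigh the updates it multiplies, which is why a placement decision has to consider both the access pattern of each data unit and the overall system load.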