Research Article
Context-Aware Fault Tolerance in Migratory Services
@INPROCEEDINGS{10.4108/ICST.MOBIQUITOUS2008.3564,
  author={Oriana Riva and Josiane Nzouonta and Cristian Borcea},
  title={Context-Aware Fault Tolerance in Migratory Services},
  proceedings={5th International ICST Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services},
  publisher={ICST},
  proceedings_a={MOBIQUITOUS},
  year={2010},
  month={5},
  keywords={Context-aware Fault Tolerance Migratory Services Mobile Ad Hoc Networks},
  doi={10.4108/ICST.MOBIQUITOUS2008.3564}
}
Oriana Riva
Josiane Nzouonta
Cristian Borcea
Year: 2010
Conference: MOBIQUITOUS
Publisher: ICST
DOI: 10.4108/ICST.MOBIQUITOUS2008.3564
Abstract
Mobile ad hoc networks can be leveraged to provide ubiquitous services capable of acquiring, processing, and sharing real-time information from the physical world. Unlike Internet services, these services have to survive frequent and unpredictable faults such as disconnections, crashes, or users turning off their devices. This paper describes a context-aware fault tolerance mechanism for our migratory services model. In this model, a per-client service instance transparently migrates to different nodes in the network to provide a continuous and semantically correct interaction with its client. The proposed fault tolerance mechanism extends the primary-backup approach with a context-aware checkpointing process. The backup node is dynamically selected based on its distance from the client and service, the similarity of its mobility pattern to those of the client and service, the frequency of the checkpointing process, and the size of the checkpointed state. We demonstrate the feasibility of our approach through a prototype implementation tested in a small-scale ad hoc network of smartphones. Additionally, we simulate our mechanism in a realistic urban environment with 300 pedestrians, cyclists, and cars. We compare against two approaches: one where the backup node is a neighbor of the service node, and one where it is the client node itself. Our mechanism performs as much as 80% better than the former for recovery ratio and three times better than the latter for network overhead, while achieving better or similar recovery latency.