Flexible failure handling for cooperative processes in distributed systems

Artin Avanes; Johann-Christoph Freytag

5th International ICST Conference on Collaborative Computing: Networking, Applications, Worksharing

Research Article

Cite (BibTeX):

@INPROCEEDINGS{10.4108/ICST.COLLABORATECOM2009.8306,
    author={Artin Avanes and Johann-Christoph Freytag},
    title={Flexible failure handling for cooperative processes in distributed systems},
    proceedings={5th International ICST Conference on Collaborative Computing: Networking, Applications, Worksharing},
    proceedings_a={COLLABORATECOM},
    year={2009},
    month={12},
    keywords={Computer networks, Concurrent computing, Distributed computing, Handheld computers, Parallel processing, Performance gain, Protocols, Prototypes, Sensor systems, Wireless sensor networks},
    doi={10.4108/ICST.COLLABORATECOM2009.8306}
}

Artin Avanes¹,*, Johann-Christoph Freytag¹,*

1: Database and Information System Group, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany

*Contact email: avanes@informatik.hu-berlin.de, freytag@informatik.hu-berlin.de

Abstract

Distributed systems will increasingly be built on top of wireless networks, such as sensor networks or hand-held devices with advanced sensing and computational abilities. Supporting cooperative processes executed by such unreliable and dynamic system components poses a number of new technical challenges. In terms of recovery, limited resource capabilities have to be considered during re-scheduling of failed process activities. In terms of concurrency, a non-blocking protocol is required to allow a high degree of parallelism. In this paper, we introduce a flexible and resource-oriented failure handling mechanism for cooperative processes in hierarchical and distributed systems. The objective is to ensure both transactional semantics and the selection of suitable nodes with respect to available resource capabilities. Based on a nested execution model, we develop a multi-stage algorithm that uses constraint solving techniques in parallel, thus achieving a more efficient recovery. We evaluate our proposed techniques in a prototype implementation and demonstrate significant performance gains from parallel re-scheduling.
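To make the idea of resource-oriented re-scheduling concrete, the following is a minimal illustrative sketch, not the authors' multi-stage algorithm. It treats re-scheduling as a small constraint problem: each failed activity has a resource demand, each node has a remaining capacity, and a backtracking search looks for an assignment that violates no capacity. The function name `reschedule`, the single integer resource per node, and the sequential (non-parallel) search are all assumptions made for illustration.

```python
from typing import Dict, Optional


def reschedule(activities: Dict[str, int],
               capacity: Dict[str, int]) -> Optional[Dict[str, str]]:
    """Assign each failed activity to a node without exceeding node capacity.

    activities: hypothetical map of activity name -> resource demand.
    capacity:   hypothetical map of node name -> remaining resource capacity.
    Returns an activity -> node mapping, or None if no feasible assignment exists.
    """
    # Place the most demanding activities first to prune the search early.
    names = sorted(activities, key=activities.get, reverse=True)
    remaining = dict(capacity)
    assignment: Dict[str, str] = {}

    def search(i: int) -> bool:
        if i == len(names):
            return True  # every activity has been placed
        act = names[i]
        for node in sorted(remaining):
            if remaining[node] >= activities[act]:
                # Tentatively place the activity on this node.
                remaining[node] -= activities[act]
                assignment[act] = node
                if search(i + 1):
                    return True
                # Undo the placement and try the next node.
                remaining[node] += activities[act]
                del assignment[act]
        return False

    return assignment if search(0) else None


if __name__ == "__main__":
    failed = {"a1": 3, "a2": 2, "a3": 2}
    nodes = {"n1": 4, "n2": 3}
    print(reschedule(failed, nodes))
```

In this example the search first tries `a1` on `n1`, finds that `a2` and `a3` can then no longer both be placed, and backtracks to put `a1` on `n2`. A parallel variant, as the abstract describes, would presumably solve independent sub-problems concurrently; that part is not modeled here.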

Keywords: Computer networks, Concurrent computing, Distributed computing, Handheld computers, Parallel processing, Performance gain, Protocols, Prototypes, Sensor systems, Wireless sensor networks

Published: 2009-12-28

DOI: http://dx.doi.org/10.4108/ICST.COLLABORATECOM2009.8306