2nd International IEEE Conference on Communication System Software and Middleware

Research Article

Scheduling in Grid: Rescheduling MPI applications using a fault-tolerant MPI implementation

  • @INPROCEEDINGS{10.1109/COMSWA.2007.382446,
        author={Moosani Vivekananda Reddy and Sanjay Chaudhary},
        title={Scheduling in Grid: Rescheduling MPI applications using a fault-tolerant MPI implementation},
        proceedings={2nd International IEEE Conference on Communication System Software and Middleware},
        publisher={IEEE},
        proceedings_a={COMSWARE},
        year={2007},
        month={7},
        keywords={Grid Computing, MPI, Scheduling in Grid},
        doi={10.1109/COMSWA.2007.382446}
    }
    
  • Moosani Vivekananda Reddy
    Sanjay Chaudhary
    Year: 2007
    Scheduling in Grid: Rescheduling MPI applications using a fault-tolerant MPI implementation
    COMSWARE
    IEEE
    DOI: 10.1109/COMSWA.2007.382446
Moosani Vivekananda Reddy (1,*), Sanjay Chaudhary (2,*)
  • 1: Persistent Systems Pvt. Ltd, Pune, India
  • 2: Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar, India
*Contact email: vivekananda_moosani@persistent.co.in, sanjay_chaudhary@daiict.ac.in

Abstract

Advances in grid technologies allow resources spread across the globe to be accessed using standard, general-purpose protocols. Simulations and scientific experiments that were earlier restricted by the limited availability of resources are now routinely carried out in the grid. Grid environments are dynamic: the resources are heterogeneous and are not under central control, so scheduling in a grid is complex. The initial schedule obtained for an application may not remain good, because it involves selecting resources for a future time, while resource characteristics such as CPU availability, memory availability, and network bandwidth keep changing. Rescheduling becomes necessary under these conditions. This work uses the fault-tolerant functionality of MPICH-V2 to migrate MPI processes. Load-balancing modules, which decide when and where to migrate a process, are added to the MPICH-V2 system. Simulations show that process migration is a viable rescheduling technique for computationally intensive applications. The paper also gives brief descriptions of some existing fault-tolerant MPI implementations.