Estimation of Distribution Algorithm for solving the Multi-mode Resource Constrained Project Scheduling Problem

The Multi-mode Resource Constrained Project Scheduling Problem is characterized by a set of tasks, resources and an objective function. All tasks of a project must be scheduled carefully, taking into account precedence relations, the mode in which each task is performed and the availability of resources at all times. At present, around 71% of the projects related to the software industry are renegotiated or canceled, causing negative social and economic impacts. Among the root causes of these failures are deficiencies in planning processes and a lack of tools to help generate quasi-optimal project schedules. This kind of problem can be stated as an optimization problem subject to two groups of restrictions: precedence relations and resource constraints. This paper proposes a new Estimation of Distribution Algorithm for solving the Multi-mode Resource Constrained Project Scheduling Problem. In particular, the algorithm is based on the Factorized Distribution Algorithm, in which the factorization represents the precedence relations of the problem. A comprehensive computational experiment is described, performed on a set of benchmark instances of the well-known Project Scheduling Problem Library (PSPLIB) in its multi-mode variant. The results show that the proposed algorithm finds makespans similar to, and sometimes shorter than, those reported in the bibliography.


Introduction
Project management has shown a growing trend in recent years. Generally, investment processes are organized as projects with a high impact on society and the economy. In this sense, factors such as technological change, economic pressure, work in multidisciplinary teams, and limited resources and time are essential elements that must be carefully coordinated to achieve project objectives with an adequate balance between cost and time [1].
Within this context, the planning and construction of optimal or quasi-optimal project schedules is a constant concern of project managers. It is generally recognized that the Project Scheduling Problem (PSP) is a complex problem.
Gaafar Sadeq S. Mahdi et al.
This situation can be mitigated by actions aimed at improving the quality of planning, supported by the construction processes of project schedules.
Several schools try to standardize the concepts and practices associated with project management and to promote project plans adjusted to the needs of each project. Among them are the Software Engineering Institute with its CMMI proposal [11], [12], the Project Management Body of Knowledge (PMBOK) [13], and the ISO 21500 standard developed by the International Organization for Standardization (ISO) [14], [15].
These schools propose new techniques and, in their recent versions, have introduced the need for simulation techniques, data analysis and resource optimization in the construction of project schedules. However, some difficulties still persist:
• They do not propose concrete optimization techniques for the construction of project schedules.
• They explain the need to consider restrictions on the availability of resources, but do not take into account restrictions related to the competences of human resources or other specific characteristics of resources that influence the duration of the tasks of a project.
• The classic PERT and CPM techniques for project scheduling do not explicitly consider the allocation of resources to tasks, but rather constitute tools to help graph and analyze the schedules once they have been built [16].
The aim of this work is to present the Constraint-based Learning Factorization of Distribution Algorithm (CLFDA) for solving the Multi-mode Resource Constrained Project Scheduling Problem (MMRCPSP). The experiments were carried out on a set of benchmark instances of the Project Scheduling Problem Library (PSPLIB) in its multi-mode variant. The rest of the paper is structured as follows. Section 2 presents the background to the study, explains the different categories of PSP and formulates the MMRCPSP. Section 3 describes the proposed algorithm, while results and analysis are presented in Section 4. Section 5 presents the conclusions and suggests directions for future research.

Project Scheduling Problems
Four theoretical PSP variants are described below:

Resource Constrained Project Scheduling Problem (RCPSP)
It consists of establishing the sequence of a set of tasks of a single project, subject to two types of constraints: precedence relations and the number of resources available to perform the tasks at every moment. In this problem, the objective is to minimize the makespan of the project [17]. A limitation of this problem is that it does not distinguish the particular characteristics of the resources, which can significantly influence the duration of the tasks.

Resource Constrained Multi-project Scheduling Problem (RCMPSP)
It is a generalization of the RCPSP in which multiple project schedules must be developed simultaneously with limited resources [18]. Unlike the previous case, a new variable emerges: the priority among projects for the consumption of resources.

Multi-mode Resource Constrained Project Scheduling Problem (MMRCPSP)
It is an extension of the RCPSP that involves the selection of a performance mode for each task, where each mode is associated with a duration and a quantity of renewable and non-renewable resources required for the performance of that task. Moreover, this problem takes into account restrictions such as the precedence between tasks and the availability of resources. The use of modes helps to identify, for example, that similar resources, but with different characteristics, can have a significant impact on the duration of the tasks [19], [20].

Multi-mode Resource Constrained Multi-project Scheduling Problem (MMRCMPSP)
It combines the two previous concepts [21].
Considering the complexity involved in solving these problems, many authors make use of soft computing methods, especially metaheuristic techniques [2,21]. Some authors [22], [23] propose algorithms based on Particle Swarm Optimization (PSO), others propose techniques based on Tabu Search [24], [25], while the most widely used metaheuristics are those based on Genetic Algorithms (GA) [26], [27], [16], [28]. Ayodele and collaborators [29][30] apply an Estimation of Distribution Algorithm (EDA) based on static learning, where new individuals are generated by exploring the most promising areas of the search space according to the distribution of the best individuals of the previous generation. In this algorithm, a solution is coded using a mode assignment and a list of tasks.
These algorithms implement well-known mechanisms such as data pre-processing, schedule generation using a Schedule Generation Scheme (SGS), and a penalty function for the violation of restrictions associated with the availability of resources.
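The two mechanisms mentioned above can be illustrated with a minimal sketch. It assumes the common encoding of a precedence-feasible task list plus a mode assignment, decodes it with a serial schedule generation scheme over renewable resources, and measures non-renewable excess as a penalty. All names (`serial_sgs`, `nonrenewable_penalty`, the toy data) are hypothetical and are not taken from the cited works.

```python
def serial_sgs(task_list, mode_of, duration, demand, capacity, predecessors):
    """Decode a precedence-feasible task list into start and finish times.

    duration[j][m] is the duration of task j in mode m, demand[j][m] maps a
    renewable resource to the units needed per period, and capacity maps a
    resource to the units available in every period. Assumes no single task
    demands more than the capacity of a resource.
    """
    horizon = sum(duration[j][mode_of[j]] for j in task_list) + 1
    free = {r: [cap] * horizon for r, cap in capacity.items()}
    start, finish = {}, {}
    for j in task_list:
        d = duration[j][mode_of[j]]
        need = demand[j][mode_of[j]]
        # Earliest start allowed by the already scheduled predecessors.
        t = max((finish[p] for p in predecessors.get(j, [])), default=0)
        # Shift right until every period of the task has enough capacity.
        while any(free[r][q] < units
                  for r, units in need.items()
                  for q in range(t, t + d)):
            t += 1
        for r, units in need.items():
            for q in range(t, t + d):
                free[r][q] -= units
        start[j], finish[j] = t, t + d
    return start, finish


def nonrenewable_penalty(mode_of, consumption, budget):
    """Total excess over the non-renewable budgets (0 means feasible)."""
    excess = 0
    for r, cap in budget.items():
        used = sum(consumption[j][mode_of[j]].get(r, 0) for j in mode_of)
        excess += max(0, used - cap)
    return excess


# Toy project (hypothetical data): four tasks, one renewable resource "R".
DURATION = {1: [2, 1], 2: [3, 2], 3: [2], 4: [1]}
DEMAND = {1: [{"R": 1}, {"R": 2}], 2: [{"R": 1}, {"R": 2}],
          3: [{"R": 2}], 4: [{"R": 1}]}
CONSUMPTION = {1: [{"N": 2}, {"N": 4}], 2: [{"N": 1}, {"N": 1}],
               3: [{"N": 1}], 4: [{"N": 0}]}
PREDECESSORS = {2: [1], 3: [1], 4: [2, 3]}

slow_modes = {1: 0, 2: 0, 3: 0, 4: 0}
fast_first = {1: 1, 2: 0, 3: 0, 4: 0}
start, finish = serial_sgs([1, 2, 3, 4], slow_modes, DURATION, DEMAND,
                           {"R": 2}, PREDECESSORS)
```

Switching task 1 to its faster mode shortens the schedule but consumes more of the non-renewable resource, which is the trade-off that the penalty term is meant to expose.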
While these metaheuristics provide solutions to simple PSP, in the MMRCPSP case they do not handle restrictions adequately and fail especially on problems with a high level of complexity. As a result, the solutions obtained are not the best with respect to the execution time of the projects. In addition, they do not handle the differences in the competences of human resources for executing each type of task, an element that significantly influences the execution time and the cost of projects.

MMRCPSP Modelling
The Multi-mode Resource Constrained Project Scheduling Problem (MMRCPSP) is a generalized version of the RCPSP. The term multi-mode indicates that project tasks can be carried out in different ways (modes); each mode has a specific duration and requires a certain number of resources. With this approach, planners can take into account situations such as incorporating new resources into a task with the objective of reducing its duration. Consequently, the computational time required to solve the MMRCPSP is much greater than that needed for the RCPSP.
A project contains a set $J$ of tasks $j \in J$. Each task $j$ has a set $M_j = \{1, \dots, |M_j|\}$ of processing modes. A task $j$ performed in mode $m \in M_j$ takes a period of time $d_{jm}$ and needs a certain number of renewable resources $r_{jmk}$ and non-renewable resources $n_{jmk}$.
The first mathematical formulation of the MMRCPSP that took non-renewable resources into account was presented by Talbot in [31]. He proposes a linear model with binary variables defined as follows: $x_{jmt} = 1$ if task $j$ is performed in mode $m$ and is completed in period $t \in [EF_j, LF_j]$, and 0 otherwise. The parameters are summarized in Table 1; see [32] for further details on these parameters. The problem is formulated as an integer linear model in which the completion time of all tasks is minimized (minimization of the project's makespan), subject to the following restrictions. Restriction 1.2 ensures that every task is performed in one processing mode and is completed at some point during the project. Finally, restriction 2.1.5 ensures that the available amount of consumable (non-renewable) resources is not exceeded.
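For orientation, the time-indexed model described above is commonly written in the following form. This is a sketch in assumed notation, not a reproduction of the original equations or their numbering: $n$ denotes the final (sink) task, $P_j$ the predecessors of task $j$, $EF_j$ and $LF_j$ its earliest and latest finish periods, $R_k$ and $N_k$ the availabilities of renewable resource $k \in K^{\rho}$ and non-renewable resource $k \in K^{\nu}$, and $T$ the planning horizon.

```latex
\begin{align}
\min \quad & \sum_{m \in M_n} \sum_{t=EF_n}^{LF_n} t \, x_{nmt} \\
\text{s.t.} \quad
& \sum_{m \in M_j} \sum_{t=EF_j}^{LF_j} x_{jmt} = 1
  && \forall j \in J \\
& \sum_{m \in M_i} \sum_{t=EF_i}^{LF_i} t \, x_{imt}
  \le \sum_{m \in M_j} \sum_{t=EF_j}^{LF_j} (t - d_{jm}) \, x_{jmt}
  && \forall j \in J,\ i \in P_j \\
& \sum_{j \in J} \sum_{m \in M_j} r_{jmk}
  \sum_{q=\max(t,\, EF_j)}^{\min(t + d_{jm} - 1,\, LF_j)} x_{jmq} \le R_k
  && \forall k \in K^{\rho},\ t = 1, \dots, T \\
& \sum_{j \in J} \sum_{m \in M_j} n_{jmk} \sum_{t=EF_j}^{LF_j} x_{jmt} \le N_k
  && \forall k \in K^{\nu} \\
& x_{jmt} \in \{0, 1\}
  && \forall j \in J,\ m \in M_j,\ t = EF_j, \dots, LF_j
\end{align}
```

The first constraint assigns exactly one mode and one completion period to each task, the second enforces precedence, the third limits renewable usage in every period, and the fourth limits total non-renewable consumption.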

Design of EDA for the resolution of the MMRCPSP
Estimation of Distribution Algorithms (EDA) were introduced in 1996 by Muehlenbein, Mahnig and Ochoa [33]. In general, EDA constitute a family of algorithms for solving various optimization problems [34] and arose as an alternative to overcome the difficulties of Genetic Algorithms (GA). These difficulties are associated with the fact that GA, by their nature, do not explicitly express the interdependencies between the variables of the problem and do not make sufficient use of this information during the search process [35], [36]. The main characteristic of EDA is the identification of probability distributions that model the dependency relations among the variables of the problem to be solved, and the generation of new individuals from those distributions.
This section presents CLFDA, an EDA that handles the constraints inside the probabilistic model, for solving the MMRCPSP presented in the previous section. A brief analysis shows the differences between CLFDA (Algorithm 2) and an EDA in its classical form (Algorithm 1).
Algorithm 1. Pseudo-code of an EDA algorithm in its classical form
The fundamental characteristics of CLFDA can be defined as follows:
• Estimation of the probabilistic model, inspired by the FDA algorithm [37], which describes the dependency relations among variables of the selected individuals.
• Construction of the probabilistic model by considering the problem constraints.
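The classical EDA loop (select, estimate a distribution, sample) can be sketched as follows. This is not CLFDA and not the paper's Algorithm 1: it is a univariate marginal model (UMDA) on the toy OneMax problem, chosen only because it is the simplest instance of the scheme. The function name, the parameter defaults and the probability clamp are assumptions made for illustration.

```python
import random


def umda_onemax(n_vars=20, pop_size=60, select_ratio=0.3,
                generations=50, seed=0):
    """Classical EDA loop with a univariate model (UMDA) on OneMax."""
    rng = random.Random(seed)
    fitness = sum  # OneMax: the fitness is the number of ones
    population = [[rng.randint(0, 1) for _ in range(n_vars)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # 1. Selection: keep the best fraction of the population.
        population.sort(key=fitness, reverse=True)
        selected = population[:max(2, int(select_ratio * pop_size))]
        # 2. Estimation: one independent marginal probability per variable.
        probs = [sum(ind[i] for ind in selected) / len(selected)
                 for i in range(n_vars)]
        # Clamp the marginals so that no value becomes impossible to sample.
        probs = [min(max(p, 0.05), 0.95) for p in probs]
        # 3. Sampling: draw a new population from the estimated model.
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
        if fitness(best) == n_vars:
            break  # optimum found
    return best


best = umda_onemax()
```

A factorized model such as the one used by FDA replaces step 2 with conditional probabilities over groups of dependent variables, which is where the precedence relations of the scheduling problem would enter.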
Algorithm 2. The pseudo-code of the algorithm proposed in this work
A detailed explanation of the proposed model is provided below:
(i) The constraints of the optimization problem are as follows:
• Precedence restrictions, where the successor tasks of each task are known.
• Restrictions associated with the number of renewable and non-renewable resources available for the project.
• Each task mode specifies the duration and the renewable and non-renewable resource requirements.
(ii) Population definition: each population has a fixed size specified as a parameter. The initial population is randomly generated.
(iii) Design of the individual: each individual constitutes a possible solution to the scheduling problem and is formed by a sequence of tasks (see Figure 1). Each task has two relevant features: the start date of the task (s) and the mode (m) in which it is performed. In addition, each task has a set of complementary attributes that are derived from the mode and the start date. For example, the closing date of a task can be calculated from its start date and the duration of its mode.
(iv) Evaluation method and objective function: individuals are evaluated using the objective function described in equation 3.1, where:
• $i \in P$ represents the individuals of each population.
• $F_j$ represents the final day of task $j$.
• $c_{ij}$ represents the closing date of task $j$ of individual $i$.
• $k_{ij}$ represents the cost of carrying out task $j$ of individual $i$, calculated as the sum of the costs of the resources associated with that task.
(v) Definition of the selection strategy: the selection method is based on Pareto optimization [38] or ranking, considering the cost and time objectives. The selection process is carried out iteratively until a number of individuals equal to 30% of the population size has been selected.
(vi) Elitism: elitism of the best individual was applied.
(vii) Definition of the stop condition: the algorithm stops when it finds the optimum or reaches the maximum number of generations.
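The Pareto-based selection step can be sketched as below. It is one plausible reading of the description, not the authors' implementation: individuals are taken front by front (non-dominated first) until the requested fraction of the population is reached, and ties inside a front are broken by makespan and then cost. The function names and the tie-breaking rule are assumptions.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in one.

    Both objective vectors are minimized, e.g. (makespan, cost).
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_select(objectives, ratio=0.3):
    """Return the indices of the selected individuals, front by front."""
    target = max(1, int(ratio * len(objectives)))
    remaining = list(range(len(objectives)))
    selected = []
    while remaining and len(selected) < target:
        # Current front: individuals not dominated by any remaining one.
        front = [i for i in remaining
                 if not any(dominates(objectives[j], objectives[i])
                            for j in remaining if j != i)]
        # Assumed tie-break inside a front: makespan first, then cost.
        front.sort(key=lambda i: objectives[i])
        selected.extend(front[:target - len(selected)])
        remaining = [i for i in remaining if i not in front]
    return selected


# Hypothetical (makespan, cost) pairs for four individuals.
population_objectives = [(10, 5), (12, 3), (11, 6), (9, 8)]
chosen = pareto_select(population_objectives, ratio=0.5)
```

In the example, the individual with objectives (11, 6) is dominated by (10, 5) and is therefore only considered after the first front has been exhausted.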
The objective function minimizes the project's makespan and cost:
Minimize $(f_1, f_2)$ (3.1)
where $f_1$ and $f_2$ denote the makespan and the cost of the project, subject to the following restrictions.
Equation 3.4 ensures that the available amount of material (non-renewable) resources is not exceeded, where $u(r, J)$ represents the number of times resource $r \in R$ is used considering all tasks $j \in J$ during the project.
Equation 3.5 guarantees the per-period availability of human resources and equipment at time $t$ for the execution of task $j \in J$ (resource $r \in R$ is not shared by more than one task at the same time $t$), where $u(r, J, t)$ represents the number of times resource $r$ is used considering all tasks $j \in J$ at time $t$.
Equation 3.6 requires that the start of a task always precedes its completion day.
Equation 3.7 ensures that the precedence relations among tasks are not violated, where $h$ is the set of predecessor tasks of $j$.
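The evaluation of one individual against the two objectives and the four restrictions can be sketched as follows. This is an illustrative reading of the text, assuming capacity-based resource limits; the `Mode` record, the `evaluate` function and the toy data are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Mode:
    duration: int
    renewable: dict      # resource -> units needed in every period
    nonrenewable: dict   # resource -> units consumed in total
    cost: float


def evaluate(start, mode_of, modes, predecessors, renew_cap, nonrenew_cap):
    """Return (makespan, cost, violations) for one individual."""
    chosen = {j: modes[j][mode_of[j]] for j in start}
    finish = {j: start[j] + chosen[j].duration for j in start}
    violations = []
    # Non-renewable budget over the whole project (cf. restriction 3.4).
    for r, cap in nonrenew_cap.items():
        used = sum(chosen[j].nonrenewable.get(r, 0) for j in start)
        if used > cap:
            violations.append(("nonrenewable", r, used))
    # Renewable availability in every period (cf. restriction 3.5).
    for t in range(max(finish.values())):
        for r, cap in renew_cap.items():
            used = sum(chosen[j].renewable.get(r, 0)
                       for j in start if start[j] <= t < finish[j])
            if used > cap:
                violations.append(("renewable", r, t))
    # A task must start before it finishes (cf. restriction 3.6).
    for j in start:
        if finish[j] <= start[j]:
            violations.append(("duration", j))
    # Predecessors must finish before a task starts (cf. restriction 3.7).
    for j, preds in predecessors.items():
        for h in preds:
            if finish[h] > start[j]:
                violations.append(("precedence", h, j))
    makespan = max(finish.values())
    cost = sum(chosen[j].cost for j in start)
    return makespan, cost, violations


# Toy data (hypothetical): three tasks, one mode each.
MODES = {
    "A": [Mode(2, {"R1": 1}, {"N1": 2}, 10.0)],
    "B": [Mode(3, {"R1": 1}, {"N1": 1}, 8.0)],
    "C": [Mode(2, {"R1": 2}, {"N1": 1}, 5.0)],
}
PREDS = {"A": [], "B": ["A"], "C": ["A"]}
MODE_OF = {"A": 0, "B": 0, "C": 0}
feasible = evaluate({"A": 0, "B": 2, "C": 5}, MODE_OF, MODES, PREDS,
                    {"R1": 2}, {"N1": 5})
overlapping = evaluate({"A": 0, "B": 2, "C": 2}, MODE_OF, MODES, PREDS,
                       {"R1": 2}, {"N1": 5})
```

The second individual is shorter but runs B and C in parallel, which exceeds the renewable capacity in two periods; a penalty or repair step would have to deal with such individuals.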

Experimental results and discussion
This section presents the results of applying the proposed algorithm to the Multi-mode Resource Constrained Project Scheduling Problem (MMRCPSP). The main objective is to minimize the duration of the project, taking into account the precedence restrictions among tasks and the resource constraints, whether renewable or non-renewable.
The experiments were performed on the datasets "j30_17.mm", "c15_9.mm", "c15_10.mm" and "c15_12.mm" from the PSPLIB repository (Project Scheduling Problem Library) [32], [39] in its multi-mode variant. The number of tasks, resources, modes and constraints were selected to present a diverse set of problems. Each dataset consists of ten instances with two types of renewable resources and two types of non-renewable resources. The number of tasks varies between 16 and 30, and the number of modes is three for all instances.
The PSPLIB library available at http://www.omdb.wi.tum.de/psplib contains sets of instances that represent different categories of scheduling problems. These instances were generated using the ProGen generator. In addition, PSPLIB presents for each instance, the optimal solution and the best solution reached by different authors so far. Datasets can be used by researchers to evaluate their procedures for solving scheduling problems.
The general format of the file proposed by PSPLIB as a problem instance is described below:
• Number of tasks (jobs).
• Number of modes in which each task can be executed (#modes).
• Number of types of renewable resources existing in the problem (renewable).
• Maximum availability of each type of renewable resource (RESOURCEAVAILABILITIES).
• Number of types of non-renewable resources existing in the problem (nonrenewable).
• Number of successors of each activity (#successors).
• Set of successor tasks of each task (successors).
• Duration of each task (completion time).
• The minimum possible time to carry out the project (duedate), which represents the optimum makespan value to be achieved.
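As an illustration of how the successor information listed above might be read, the sketch below parses a precedence table into a dictionary. The section title, the column layout and the asterisk delimiter are an assumed layout in the style of PSPLIB files, and the sample text is hand-written rather than taken from a real instance; `parse_precedence` is a hypothetical helper.

```python
def parse_precedence(text):
    """Parse a PSPLIB-style precedence table into {job: [successors]}."""
    successors = {}
    in_table = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("PRECEDENCE RELATIONS"):
            in_table = True
            continue
        if not in_table:
            continue
        if stripped.startswith("*"):
            break  # a row of asterisks closes the section
        fields = stripped.split()
        if not fields or not fields[0].isdigit():
            continue  # skip the column-header row and blank lines
        # Columns: job number, number of modes, number of successors.
        job, _n_modes, n_succ = (int(v) for v in fields[:3])
        successors[job] = [int(v) for v in fields[3:3 + n_succ]]
    return successors


# Hand-written excerpt (assumed layout, not a real PSPLIB instance).
SAMPLE = """\
PRECEDENCE RELATIONS:
jobnr.    #modes  #successors   successors
   1        1          2           2   3
   2        3          1           4
   3        3          1           4
   4        1          0
****************************************
"""

graph = parse_precedence(SAMPLE)
```

Inverting the resulting dictionary gives the predecessor sets that the precedence restriction needs.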
In the experiments, the results of the CLFDA algorithm are compared with:
• A genetic algorithm (GA).
• The best results reported in the PSPLIB repository (Reported_Bibliography).

EAI Endorsed Transactions on Energy Web Online First
The validation stage used two optimization strategies. The first strategy, called just_time, bases the optimization process only on the time objective, so the objective function evaluates solutions only with respect to the makespan. The second strategy is based on the Pareto optimality approach (pareto) and addresses the time (makespan) and cost objectives simultaneously; in this case, plans are optimized according to equation (3.1). For a fair comparison, common parameters were established for all the algorithms. The experiments were performed by running each algorithm 20 times on every instance of the four datasets. The results of the algorithms were evaluated with respect to the following variables:
• Mean_Makespan: average makespan over the 20 runs for each dataset instance.
• StdDev_Makespan: standard deviation with respect to the optimum value over the 20 runs for each dataset instance.
• %Optimum: percentage of the 20 runs in which the algorithm finds the optimal makespan of the dataset instance.
• Execution_time: average time used by the algorithm to execute 100 iterations.
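The first three variables can be computed from the makespans of the repeated runs as in the sketch below. The phrase "standard deviation to the optimum value" is interpreted here as the root mean squared deviation from the known optimum rather than from the sample mean; that reading, like the function name `summarize_runs`, is an assumption.

```python
import math


def summarize_runs(makespans, optimum):
    """Summarize the makespans of repeated runs on one instance."""
    n = len(makespans)
    mean = sum(makespans) / n
    # Assumed reading: spread measured around the optimum, not the mean.
    deviation = math.sqrt(sum((m - optimum) ** 2 for m in makespans) / n)
    hits = sum(1 for m in makespans if m == optimum)
    return {
        "Mean_Makespan": mean,
        "StdDev_Makespan": deviation,
        "%Optimum": 100.0 * hits / n,
    }


# Hypothetical makespans of four runs on an instance whose optimum is 20.
summary = summarize_runs([20, 20, 22, 24], optimum=20)
```

Measuring the spread around the optimum rewards algorithms that are both stable and close to the best known value, whereas the ordinary standard deviation would only reward stability.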
To compare the algorithms, the authors used SPSS version 25 and the Wilcoxon non-parametric test for two related samples, with a 95% confidence level and a 0.05 significance level.
In the comparison, the algorithms were grouped according to the quality of their results, that is, "Group A results" > "Group B results" > "Group C results" > "Group D results". Algorithms in the same group did not differ significantly from one another. Regarding the variable Mean_Makespan in the j30_17 dataset, there were no significant differences between CLFDA_just_time and the results reported in the bibliography. The best results were obtained by CLFDA_just_time, whereas the worst results were obtained by GA_pareto.
As regards the variable StdDev_Makespan in the j30_17 dataset, there were no significant differences between CLFDA_just_time and the results reported in the bibliography. These reported the best results, whereas the worst results were obtained by the GA_pareto algorithm. Concerning the variable %Optimum in the j30_17 dataset, the best results were obtained by CLFDA_just_time and the worst by GA_pareto.
In this dataset, the just_time optimization strategy reports better results than the pareto strategy.
In the j30_17 dataset, with respect to the variable Execution_time, the best results were obtained by the Genetic Algorithms approach. In particular, GA_just_time was the fastest algorithm, whereas the CLFDA variants were the most time-consuming. As for the variable Mean_Makespan in the c15 dataset, comprising all 30 instances from c15_9, c15_10 and c15_12, there were no significant differences between CLFDA_just_time and the results reported in the bibliography. The best results were obtained by CLFDA_just_time and the worst by GA_pareto. The EDA algorithms (CLFDA and UMDA) using the just_time strategy reported better results than the same algorithms with the pareto strategy.
In these datasets, for the variable StdDev_Makespan, the algorithms Reported_Bibliography, CLFDA_pareto and CLFDA_just_time did not differ significantly, whereas the worst results were obtained by the GA algorithms. In this case, the differences between the just_time and pareto optimization strategies were not significant. In the c15_9, c15_10 and c15_12 datasets, in relation to the variable %Optimum, the best results were obtained by CLFDA_pareto and CLFDA_just_time, whereas the worst results were obtained by GA_pareto. For this variable, as for Mean_Makespan, there were no significant differences between the just_time and pareto strategies. The worst results were obtained by the GA approach.
In the c15_9, c15_10 and c15_12 datasets, regarding the variable Execution_time, the best results were obtained by the Genetic Algorithms approach, whereas the CLFDA approach required more time for the same stop criterion (100 generations). This is because the CLFDA algorithms spend more time detecting the dependency relations among variables; in return, they are able to find solutions that were never found by GA or UMDA. Furthermore, the objective is to minimize the makespan of the project, not the execution time of the algorithms.

Conclusion
In this paper, a new approach based on Estimation of Distribution Algorithms, with constraint handling inside the probabilistic model, was developed to solve the Multi-mode Resource Constrained Project Scheduling Problem. The proposal was applied to four datasets of PSPLIB in its multi-mode variant, which have different degrees of complexity (number of tasks, number of modes, and number of resources). A cost component was added to be optimized along with time, always seeking a balance between the two.
The results obtained show that the proposal is effective on the benchmark instances and improves on others reported in the bibliography, especially in the j30_17, c15_9 and c15_12 datasets.
Overall, CLFDA_just_time was selected as the best algorithm. It achieved the best results for the Mean_Makespan, StdDev_Makespan and %Optimum variables.
As a further improvement, the flexibility to add other components, such as project quality, to the model makes the procedure particularly useful. Moreover, further research on strategies to diversify the search process may lead to superior performance of the algorithm.