Research Article
Designing Automation for Pickup and Delivery Tasks in Modern Warehouses Using Multi Agent Path Finding (MAPF) and Multi Agent Reinforcement Learning (MARL) Based Approaches
@ARTICLE{10.4108/airo.3449,
  author={Shambhavi Mishra and Rajendra Kumar Dwivedi},
  title={Designing Automation for Pickup and Delivery Tasks in Modern Warehouses Using Multi Agent Path Finding (MAPF) and Multi Agent Reinforcement Learning (MARL) Based Approaches},
  journal={EAI Endorsed Transactions on AI and Robotics},
  volume={3},
  number={1},
  publisher={EAI},
  journal_a={AIRO},
  year={2024},
  month={3},
  keywords={Multi agent pickup and delivery problem, multi agent reinforcement learning, MARL, Multi agent path finding},
  doi={10.4108/airo.3449}
}
- Shambhavi Mishra
- Rajendra Kumar Dwivedi
Year: 2024
Designing Automation for Pickup and Delivery Tasks in Modern Warehouses Using Multi Agent Path Finding (MAPF) and Multi Agent Reinforcement Learning (MARL) Based Approaches
AIRO
EAI
DOI: 10.4108/airo.3449
Abstract
This paper solves a warehouse pickup-and-delivery problem using a multi-agent path finding (MAPF) approach, and also uses the problem to showcase the capabilities of multi-agent reinforcement learning (MARL). The warehouse pickup-and-delivery task requires an agent to pick up a requested item and deliver it to the intended location within the warehouse. The problem is addressed in two variants: single-shot and lifelong. In the single-shot variant, delivery is the final goal, so the agent stops once it reaches the delivery address; in the lifelong variant, the agent delivers the item it has picked up and then picks up a new item, repeating until all requests are satisfied. The MAPF approach constructs collision-free paths to the delivery locations, whereas the MARL approach learns the agents' decision-making tactics (policies), which are then used to choose each agent's path based on the environment state and the agent's position. The results show that lifelong conflict-based search (CBS) is the better option when the number of agents is small, since re-planning then takes relatively little time overall; when the number of agents is large, however, this re-planning can take very long to produce conflict-free paths from source to goal nodes. In that case, shared experience actor-critic (SEAC), a MARL-based approach, can be the more efficient choice, as it maps the current environment state directly to the most suitable action at that time t. In this study the agents are homogeneous: each can pick up and deliver any type of requested item. The same pickup-and-delivery problem could be addressed with heterogeneous agents that differ in their capabilities and in the types of items they can handle.
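To make the MAPF side of the abstract concrete, the sketch below shows one common way collision-free paths can be constructed on a warehouse grid: prioritized planning over a space-time reservation table, where each agent searches over (cell, time) states and avoids cells already reserved by earlier agents. This is a minimal illustrative sketch, not the authors' CBS or SEAC implementation; the grid encoding, function names, and the decision to ignore edge (swap) conflicts are all simplifying assumptions made here.

```python
from collections import deque

def plan_path(grid, start, goal, reserved, max_t=100):
    """BFS over (cell, time) states on a grid of 0 = free, 1 = obstacle.

    `reserved` is a set of ((row, col), t) pairs claimed by earlier agents;
    edge (swap) conflicts are ignored for brevity.
    """
    rows, cols = len(grid), len(grid[0])
    moves = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]  # wait + 4 directions
    frontier = deque([(start, 0, [start])])
    seen = {(start, 0)}
    while frontier:
        (r, c), t, path = frontier.popleft()
        if (r, c) == goal:
            return path
        if t >= max_t:
            continue
        for dr, dc in moves:
            nr, nc = r + dr, c + dc
            state = ((nr, nc), t + 1)
            if (0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0
                    and state not in reserved and state not in seen):
                seen.add(state)
                frontier.append(((nr, nc), t + 1, path + [(nr, nc)]))
    return None  # no conflict-free path within max_t steps

def prioritized_plan(grid, tasks):
    """Plan agents one at a time, reserving each planned path's (cell, time) pairs.

    `tasks` is a list of (start, goal) cells, one per agent, in priority order.
    """
    reserved, paths = set(), []
    for start, goal in tasks:
        path = plan_path(grid, start, goal, reserved)
        if path is None:
            return None
        for t, cell in enumerate(path):
            reserved.add((cell, t))
        paths.append(path)
    return paths
```

Because each later agent plans against the reservations of all earlier ones, the returned paths are vertex-conflict-free by construction. This per-agent re-planning is also what makes the abstract's scaling observation plausible: with few agents the searches are cheap, while with many agents the reservation table grows and repeated planning becomes expensive.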
Copyright © 2024 S. Mishra et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the original work is properly cited.