Research Article
A Fine-grained Parallel Approach for one Logical Process on Multi-core Machines
@INPROCEEDINGS{10.1145/3173519.3173541,
  author={Jiawei Fei and Yiping Yao and Jing Luan and Lufan Li and Yingqian Bao},
  title={A Fine-grained Parallel Approach for one Logical Process on Multi-core Machines},
  proceedings={10th EAI International Conference on Simulation Tools and Techniques},
  publisher={ACM},
  proceedings_a={SIMUTOOLS},
  year={2018},
  month={8},
  keywords={pdes logical process fine-grained parallel multi-core},
  doi={10.1145/3173519.3173541}
}
- Jiawei Fei
- Yiping Yao
- Jing Luan
- Lufan Li
- Yingqian Bao
Year: 2018
SIMUTOOLS
ACM
DOI: 10.1145/3173519.3173541
Abstract
Currently, the time management algorithms used in parallel discrete event simulation (PDES) engines treat the logical process (LP) as the smallest unit of parallelism; each LP corresponds to a physical process and represents a sequential simulation. Before the simulation runs, all entities are distributed among the LPs, so the parallelism of the simulation system depends on the parallelism between LPs. The performance of this approach is strongly affected by the entity distribution scheme, and a suitable scheme is difficult to find for simulations with hotspots that migrate dynamically between LPs. In addition, using as many LPs as possible to improve parallelism also brings greater communication and synchronization overhead. To address these drawbacks of current simulation engines, we propose an approach that supports fine-grained parallelism within a single LP. Our approach processes events of different entities in one LP in parallel using multiple threads. We also propose a multi-level GVT (global virtual time) management method that synchronizes efficiently at both the thread level and the LP level. Finally, we test our optimized simulation engine with star-structure and ring-structure PHOLD benchmarks. Results show that the optimized simulation engine achieves higher performance in both cases, and that the improvement is considerably larger when the load between LPs is unbalanced.
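The abstract does not spell out how the multi-level GVT method works. As a rough illustration only, the sketch below assumes a simple two-level minimum reduction: each worker thread reports the smallest timestamp it may still process, each LP takes the minimum over its threads, and the global value is the minimum over LPs. The function names and sample timestamps are hypothetical, and the sketch ignores in-transit messages, which a real GVT algorithm must account for.

```python
from typing import Dict, List


def lp_level_gvt(thread_min_timestamps: List[float]) -> float:
    """Thread level: lower bound for one LP (assumes a non-empty list).

    Each entry is the smallest timestamp a worker thread may still process.
    """
    return min(thread_min_timestamps)


def global_gvt(per_lp_thread_minima: Dict[str, List[float]]) -> float:
    """LP level: lower bound across all LPs (assumes at least one LP)."""
    return min(lp_level_gvt(ts) for ts in per_lp_thread_minima.values())


# Hypothetical sample: two LPs, with three and two worker threads.
example = {
    "lp0": [12.0, 9.5, 14.0],
    "lp1": [11.0, 10.25],
}

if __name__ == "__main__":
    for name, minima in example.items():
        print(name, lp_level_gvt(minima))
    print("global", global_gvt(example))
```

Under these assumptions, events with timestamps below the returned global value could be committed safely, since no thread in any LP can still generate an earlier event.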