Title: Optimizing Non-contiguous Accesses in MPI-IO
Abstract: This research addresses the optimization of non-contiguous accesses in MPI-IO, a parallel I/O interface. The study classifies the ways an application can express its I/O needs in MPI-IO into four levels: level 0, level 1, level 2, and level 3. It demonstrates that using level 3 requests (non-contiguous, collective) rather than level 0 requests (Unix-style) can significantly improve I/O performance. The research describes how ROMIO, a portable MPI-IO implementation, delivers high performance for non-contiguous requests through two key optimizations: data sieving, for non-contiguous requests from one process, and collective I/O, for non-contiguous requests from multiple processes. The study presents performance and portability results for three applications (DIST3D, the NAS BTIO benchmark, and UNSTRUC) on five parallel machines: HP Exemplar, IBM SP, Intel Paragon, NEC SX-4, and SGI Origin 2000.
Research Question: How can MPI-IO be optimized to improve the performance and portability of parallel applications with non-contiguous access patterns?
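Methodology: The research team classified the ways an application can express its I/O needs in MPI-IO into four levels. Level 0 uses Unix-style independent requests, each for one contiguous piece of data. Level 1 issues the same contiguous requests through collective calls. Level 2 describes the entire non-contiguous access pattern in a single independent request. Level 3 describes it in a single collective request. The team then demonstrated that level 3 requests (non-contiguous, collective) can significantly improve I/O performance over level 0 requests (Unix-style).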
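To make the difference between the levels concrete, the following is a minimal sketch in plain Python, not MPI code. It assumes a square row-major array distributed in blocks over a square process grid, and it counts how many I/O calls one process would issue at each level. The function names `local_rows` and `calls_per_process`, the array size, and the grid size are hypothetical choices made for illustration only.

```python
def local_rows(n, pgrid, rank, elem_size=8):
    """Byte extents (offset, length) of one rank's block of an n x n row-major array.

    Assumes a pgrid x pgrid process grid and that pgrid divides n evenly.
    """
    prow, pcol = divmod(rank, pgrid)
    b = n // pgrid  # edge length of each process's block
    extents = []
    for r in range(prow * b, (prow + 1) * b):
        offset = (r * n + pcol * b) * elem_size
        extents.append((offset, b * elem_size))
    return extents


def calls_per_process(extents):
    """Number of I/O calls one process issues at each level (illustrative model)."""
    return {
        # One independent contiguous read per row, e.g. MPI_File_read in a loop.
        "level 0": len(extents),
        # Same contiguous pieces, but via collective calls, e.g. MPI_File_read_all.
        "level 1": len(extents),
        # Whole pattern described once by a file view, then one independent read.
        "level 2": 1,
        # Whole pattern described once by a file view, then one collective read.
        "level 3": 1,
    }


extents = local_rows(n=8, pgrid=2, rank=1)
counts = calls_per_process(extents)
for level, calls in counts.items():
    print(level, calls)
```

The call count alone does not explain the speedup. The point of levels 2 and 3 is that a single request exposes the full access pattern, which gives the MPI-IO implementation the information it needs to optimize.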
ROMIO, a portable MPI-IO implementation, was used to deliver high performance for non-contiguous requests. It applies two key optimizations: data sieving, for non-contiguous requests from a single process, and collective I/O, for non-contiguous requests from multiple processes.
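The idea behind data sieving can be sketched as follows: instead of issuing one small read per requested piece, the process reads a large contiguous chunk that covers the pieces and extracts the wanted bytes in memory. The sketch below is an illustration under stated assumptions, not ROMIO's implementation. `CountingFile`, `read_naive`, `read_sieved`, and the `buf_size` parameter are hypothetical names, and the extents are assumed to be sorted and non-overlapping.

```python
import io


class CountingFile:
    """In-memory file that counts read calls, as a stand-in for I/O requests."""

    def __init__(self, data):
        self._f = io.BytesIO(data)
        self.reads = 0

    def read_at(self, offset, length):
        self.reads += 1
        self._f.seek(offset)
        return self._f.read(length)


def read_naive(f, extents):
    """One read call per requested (offset, length) extent."""
    return b"".join(f.read_at(off, ln) for off, ln in extents)


def read_sieved(f, extents, buf_size=4096):
    """Read the covering range in buf_size windows, then copy out the wanted bytes."""
    start = extents[0][0]
    end = extents[-1][0] + extents[-1][1]
    out = bytearray()
    for win in range(start, end, buf_size):
        win_end = min(win + buf_size, end)
        chunk = f.read_at(win, win_end - win)  # one large contiguous read
        for off, ln in extents:
            lo, hi = max(off, win), min(off + ln, win_end)
            if lo < hi:  # this extent overlaps the current window
                out += chunk[lo - win:hi - win]
    return bytes(out)


data = bytes(range(256))
extents = [(32, 32), (96, 32), (160, 32), (224, 32)]
naive_file, sieved_file = CountingFile(data), CountingFile(data)
naive = read_naive(naive_file, extents)
sieved = read_sieved(sieved_file, extents)
print(naive_file.reads, sieved_file.reads, naive == sieved)
```

The trade-off is that sieving transfers the unwanted bytes in the gaps as well, so it pays off when the cost of many small requests outweighs the cost of the extra data. The window size bounds the memory used for the temporary buffer.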
Performance and portability results were obtained for three applications (DIST3D, the NAS BTIO benchmark, and UNSTRUC) on five parallel machines: HP Exemplar, IBM SP, Intel Paragon, NEC SX-4, and SGI Origin 2000.
Results: The research demonstrated that using level 3 requests (non-contiguous, collective) rather than level 0 requests (Unix-style) can significantly improve I/O performance. ROMIO delivered high performance for non-contiguous requests through its data sieving and collective I/O optimizations.
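Collective I/O helps because the requests of several processes, each non-contiguous on its own, often cover one contiguous region of the file when taken together. ROMIO's collective I/O is based on a two-phase approach: processes first read large contiguous portions of the file, then exchange data so that each process ends up with the pieces it asked for. The sketch below simulates this in a single Python process, with list manipulation standing in for interprocess communication. The function name `two_phase_read`, the 8 x 8 byte array, and the 2 x 2 process grid are assumptions made for illustration.

```python
def two_phase_read(data, requests):
    """Simulate a two-phase collective read.

    requests[rank] is a sorted list of (offset, length) extents for that rank.
    Returns (bytes delivered to each rank, number of contiguous file reads).
    """
    nprocs = len(requests)
    lo = min(off for ext in requests for off, _ in ext)
    hi = max(off + ln for ext in requests for off, ln in ext)
    domain = -(-(hi - lo) // nprocs)  # ceiling division: bytes per file domain

    # Phase 1: each rank reads one contiguous file domain.
    domains = []
    reads = 0
    for rank in range(nprocs):
        d_lo = min(lo + rank * domain, hi)
        d_hi = min(d_lo + domain, hi)
        if d_lo < d_hi:
            reads += 1
        domains.append((d_lo, data[d_lo:d_hi]))

    # Phase 2: redistribute so each rank receives exactly what it requested
    # (done here by slicing; a real implementation would communicate).
    results = []
    for ext in requests:
        out = bytearray()
        for off, ln in ext:
            for d_lo, chunk in domains:
                a, b = max(off, d_lo), min(off + ln, d_lo + len(chunk))
                if a < b:
                    out += chunk[a - d_lo:b - d_lo]
        results.append(bytes(out))
    return results, reads


# 8 x 8 array of single bytes, block-distributed over a 2 x 2 process grid.
data = bytes(range(64))
requests = [
    [((prow * 4 + i) * 8 + pcol * 4, 4) for i in range(4)]
    for prow in range(2) for pcol in range(2)
]
results, reads = two_phase_read(data, requests)
print(reads, sum(len(ext) for ext in requests))
```

In this toy case the four processes issue four large contiguous reads in total, where independent row-by-row access would need sixteen small ones. The price is the extra data exchange in the second phase.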
Implications: The research has significant implications for the performance and portability of parallel applications with non-contiguous access patterns. Application developers who express their I/O as level 3 requests give the MPI-IO implementation the information it needs to apply data sieving and collective I/O, which can improve I/O efficiency and the overall performance of parallel applications.
Conclusion: This research shows how non-contiguous accesses in MPI-IO can be optimized. Its classification of I/O requests into four levels, together with the demonstrated performance gains from level 3 requests, has important implications for the development of high-performance parallel applications.
Link to Article: https://arxiv.org/abs/0310029v1
Authors:
arXiv ID: 0310029v1