Optimizing Non-contiguous Accesses in MPI-IO

From Simple Sci Wiki
Revision as of 14:37, 24 December 2023 by SatoshiNakamoto

Title: Optimizing Non-contiguous Accesses in MPI-IO

Abstract: This research focuses on optimizing non-contiguous accesses in MPI-IO, the parallel I/O interface defined as part of MPI. The study classifies the ways an application can express its I/O needs in MPI-IO into four levels: level0, level1, level2, and level3. It demonstrates that using level3 requests (non-contiguous, collective) rather than level0 requests (Unix-style) can significantly improve I/O performance. The research describes how ROMIO, a portable MPI-IO implementation, delivers high performance for non-contiguous requests using two key optimizations: data sieving for non-contiguous requests from one process and collective I/O for non-contiguous requests from multiple processes. The study presents performance and portability results for three applications (DIST3D, the NAS BTIO benchmark, and UNSTRUC) on five parallel machines: HP Exemplar, IBM SP, Intel Paragon, NEC SX-4, and SGI Origin 2000.

Research Question: How can MPI-IO be optimized to improve the performance and portability of parallel applications with non-contiguous access patterns?

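Methodology: The research team classified the ways an application can express its I/O needs in MPI-IO into four levels. With level0 requests (Unix-style), each process issues a separate independent call for every contiguous piece of data. Level1 issues the same contiguous requests through collective I/O functions. Level2 describes the whole non-contiguous access pattern with a derived datatype and a file view, then uses independent I/O functions. Level3 describes the non-contiguous pattern in the same way but uses collective I/O functions. The higher the level, the more of the access pattern the MPI-IO implementation can see in a single call, and the more room it has to optimize.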
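The gap between level0 and level3 can be illustrated by counting requests. The following is a minimal sketch, not code from the paper: it assumes a two-dimensional array stored row-major in a file, with columns split evenly among processes, and the helper name `row_pieces` is hypothetical. In a real MPI-IO program the level3 version would typically describe the pieces with a derived datatype (for example via `MPI_Type_create_subarray`), set a file view with `MPI_File_set_view`, and issue one `MPI_File_read_all`.

```python
def row_pieces(n_rows, n_cols, n_procs, rank, itemsize=8):
    """Return the (offset, length) byte ranges that `rank` owns in a
    row-major file when the columns of an n_rows x n_cols array are
    split evenly among n_procs processes (hypothetical helper)."""
    cols_per_proc = n_cols // n_procs
    first_col = rank * cols_per_proc
    return [((r * n_cols + first_col) * itemsize, cols_per_proc * itemsize)
            for r in range(n_rows)]

# Each row of the local block is a separate contiguous piece in the file.
pieces = row_pieces(n_rows=4, n_cols=8, n_procs=4, rank=1)

level0_calls = len(pieces)  # level0: one seek + read per contiguous piece
level3_calls = 1            # level3: one collective call describing all pieces

print(pieces)
print(level0_calls, level3_calls)
```

The number of level0 calls grows with the number of rows, while the level3 request stays a single call regardless of array size.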

ROMIO, a portable MPI-IO implementation, was used to deliver high performance for non-contiguous requests. Two key optimizations were employed: data sieving for non-contiguous requests from one process and collective I/O for non-contiguous requests from multiple processes.
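The idea behind data sieving can be sketched in a few lines. This is an illustration under simplifying assumptions, not ROMIO's implementation: it uses an in-memory file, reads the entire extent in one request, and the function names `sieved_read` and `naive_read` are hypothetical. A real implementation would bound the buffer size and process the extent in chunks, and writes would need a read-modify-write step.

```python
import io

def sieved_read(f, pieces):
    """Serve non-contiguous (offset, length) pieces with ONE contiguous read
    that spans from the first requested byte to the last, holes included."""
    start = min(off for off, _ in pieces)
    end = max(off + length for off, length in pieces)
    f.seek(start)
    buf = f.read(end - start)  # single large request
    # Pick the wanted pieces out of the buffer in memory.
    return [buf[off - start: off - start + length] for off, length in pieces]

def naive_read(f, pieces):
    """Level0-style baseline: one seek + read per piece."""
    out = []
    for off, length in pieces:
        f.seek(off)
        out.append(f.read(length))
    return out

data = bytes(range(256))
f = io.BytesIO(data)
pieces = [(16, 4), (80, 4), (144, 4)]
sieved = sieved_read(f, pieces)
naive = naive_read(f, pieces)
```

Both functions return the same bytes; the sieved version trades extra data transferred (the holes between pieces) for far fewer I/O requests.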

Performance and portability results were obtained for three applications (DIST3D, the NAS BTIO benchmark, and UNSTRUC) on five parallel machines: HP Exemplar, IBM SP, Intel Paragon, NEC SX-4, and SGI Origin 2000.

Results: Across the tested applications and machines, level3 requests (non-contiguous, collective) significantly outperformed level0 requests (Unix-style). ROMIO delivered high performance for non-contiguous requests on all five machines by applying data sieving to requests from a single process and collective I/O to requests from multiple processes.
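Collective I/O for requests from multiple processes is commonly implemented as a two-phase scheme: processes first read large contiguous regions of the file, then exchange data so that each process ends up with the pieces it asked for. The sketch below simulates that idea in a single Python process. It is an assumption-laden illustration, not ROMIO's code: the function name `two_phase_read` is hypothetical, message passing is replaced by in-memory indexing, and the file is simply divided evenly among the ranks.

```python
import io

def two_phase_read(f, requests, file_size):
    """requests[rank] is a list of (offset, length) pieces for that rank.
    Returns the bytes each rank asked for, using one contiguous file read
    per rank instead of one read per piece."""
    n = len(requests)
    domain = -(-file_size // n)  # ceiling division: bytes per file domain

    # Phase 1: each rank reads its own contiguous file domain.
    chunks = []
    for agg in range(n):
        f.seek(agg * domain)
        chunks.append(f.read(domain))

    # Phase 2: redistribute; indexing into chunks stands in for messages.
    def fetch(off, length):
        out = bytearray()
        while length > 0:
            agg, local = divmod(off, domain)
            take = min(length, domain - local)  # stop at the domain boundary
            out += chunks[agg][local:local + take]
            off += take
            length -= take
        return bytes(out)

    return [b"".join(fetch(o, l) for o, l in reqs) for reqs in requests]

data = bytes(range(64))
# Four ranks with interleaved 4-byte pieces: a typical non-contiguous pattern.
requests = [[(r * 4 + k * 16, 4) for k in range(4)] for r in range(4)]
result = two_phase_read(io.BytesIO(data), requests, len(data))
```

Here sixteen small interleaved pieces are served by four contiguous reads; the cost moves from the file system to the data exchange between processes, which is usually much cheaper.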

Implications: The findings matter for both the performance and the portability of parallel applications with non-contiguous access patterns. Application developers get the most benefit by describing their full access pattern with level3 requests, which lets an MPI-IO implementation such as ROMIO apply data sieving and collective I/O on their behalf instead of servicing many small requests.

Conclusion: The research shows how non-contiguous accesses in MPI-IO can be optimized. The four-level classification gives developers a way to reason about how they express I/O needs, and the improved performance of level3 requests shows that passing more information about the access pattern to the MPI-IO implementation pays off in high-performance parallel applications.

Link to Article: https://arxiv.org/abs/0310029v1

Authors:

arXiv ID: 0310029v1