MPD: A Scalable Process Management System for Parallel Programs

From Simple Sci Wiki

Title: MPD: A Scalable Process Management System for Parallel Programs

Abstract: MPD (Multipurpose Daemon) is a process management system for parallel programs. Its primary design goal is scalability: it should start and terminate interactive parallel jobs quickly, deliver signals to processes promptly, and manage stdin, stdout, and stderr efficiently. MPD is suited to clusters of SMPs (symmetric multiprocessors) and can also be used in more tightly integrated environments. The paper describes how MPD enables faster startup and better runtime management of parallel jobs, and how it simplifies the implementation of system utilities and a parallel debugger. It also presents a general interface between process managers and parallel libraries, which MPD implements.
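One common way to make job startup and signal delivery scale is to have daemons forward a message along a tree instead of having a single launcher contact every process in turn. The sketch below is a hypothetical illustration, not MPD's actual protocol: it assumes a binary tree in which node i forwards to nodes 2i+1 and 2i+2, and it counts the forwarding rounds needed to reach n processes. The helper names `children`, `broadcast_rounds`, and `sequential_rounds` are made up for this example.

```python
"""Hypothetical sketch: tree fan-out versus a single sequential launcher."""


def children(i: int, n: int, k: int = 2) -> list[int]:
    """Child indices of node i in a k-ary tree laid out over nodes 0..n-1."""
    return [c for c in range(k * i + 1, k * i + k + 1) if c < n]


def broadcast_rounds(n: int, k: int = 2) -> tuple[int, set[int]]:
    """Forward a message from root 0; return (rounds, set of nodes reached).

    In each round, every node that received the message in the previous
    round forwards it to all of its children in parallel.
    """
    if n <= 0:
        return 0, set()
    reached = {0}
    frontier = [0]
    rounds = 0
    while True:
        nxt = [c for i in frontier for c in children(i, n, k)]
        if not nxt:
            break
        reached.update(nxt)
        frontier = nxt
        rounds += 1
    return rounds, reached


def sequential_rounds(n: int) -> int:
    """One launcher sending to every other process, one send at a time."""
    return max(n - 1, 0)


if __name__ == "__main__":
    for n in (8, 1024, 4096):
        r, reached = broadcast_rounds(n)
        print(n, r, sequential_rounds(n), len(reached) == n)
```

With a binary tree the number of rounds grows as floor(log2 n), so 1,024 processes are reached in 10 rounds rather than 1,023 sequential sends; the trade-off is that intermediate nodes must stay alive to relay messages.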

Research Question: How can a process management system be designed and implemented to be scalable and efficiently manage parallel jobs?

Methodology: The authors designed and implemented MPD with scalability as the central goal, focusing on fast startup of parallel jobs and efficient management of jobs while they run. MPD separates the process manager from the parallel library through a general interface, so that each component can be developed independently of the other.
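To make the separation concrete, here is a minimal sketch, assuming the interface takes the form of a job-wide key-value space with put, get, and a fence (barrier) operation. This shape and all names (`ProcessManager`, `library_init`, `run_job`, the `addr-<rank>` keys and address format) are hypothetical and not taken from the paper. Threads stand in for the processes of one parallel job; the library side publishes its own address and looks up a peer's without knowing how the process manager stores the data.

```python
"""Hypothetical sketch of a process-manager / parallel-library interface."""
import threading


class ProcessManager:
    """Hypothetical process-manager side: a job-wide key-value space."""

    def __init__(self, size: int) -> None:
        self.size = size
        self._kvs: dict[str, str] = {}
        self._lock = threading.Lock()
        self._barrier = threading.Barrier(size)

    def put(self, key: str, value: str) -> None:
        with self._lock:
            self._kvs[key] = value

    def fence(self) -> None:
        # Block until every process in the job has called fence(), so all
        # puts made before the fence are visible to gets made after it.
        self._barrier.wait()

    def get(self, key: str) -> str:
        with self._lock:
            return self._kvs[key]


def library_init(pm: ProcessManager, rank: int, results: dict[int, str]) -> None:
    """Hypothetical parallel-library side: publish an address, read a peer's."""
    pm.put(f"addr-{rank}", f"host{rank}:{5000 + rank}")  # made-up address format
    pm.fence()
    peer = (rank + 1) % pm.size
    results[rank] = pm.get(f"addr-{peer}")


def run_job(size: int) -> dict[int, str]:
    """Start `size` threads standing in for the processes of one job."""
    pm = ProcessManager(size)
    results: dict[int, str] = {}
    threads = [
        threading.Thread(target=library_init, args=(pm, rank, results))
        for rank in range(size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_job(4))
```

Because `library_init` touches only `put`, `fence`, and `get`, the in-memory `ProcessManager` could in principle be replaced by one backed by networked daemons without changing the library code, which is the kind of independent development the separation is meant to allow.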

Results: According to the paper, MPD meets these goals: it starts parallel jobs quickly, delivers signals efficiently, and manages stdin, stdout, and stderr effectively. The system utilities and the parallel debugger built on MPD illustrate its flexibility and scalability.
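Managing stdout for many processes includes keeping their output readable when it is merged at one console. The sketch below shows one simple approach, assumed here for illustration rather than taken from MPD: buffer each process's output until a full line is available, then prefix that line with the process's rank, so lines from different processes never interleave mid-line. `LineLabeler` and its methods are hypothetical names.

```python
"""Hypothetical sketch: merging per-process stdout into labeled lines."""


class LineLabeler:
    def __init__(self) -> None:
        # Per-rank buffer holding the trailing, not-yet-terminated line.
        self._partial: dict[int, str] = {}

    def feed(self, rank: int, chunk: str) -> list[str]:
        """Accept an arbitrary chunk from `rank`; return complete labeled lines."""
        data = self._partial.get(rank, "") + chunk
        *complete, rest = data.split("\n")
        self._partial[rank] = rest
        return [f"{rank}: {line}" for line in complete]

    def flush(self) -> list[str]:
        """Emit leftover partial lines, e.g. once all processes have exited."""
        out = [f"{rank}: {rest}" for rank, rest in sorted(self._partial.items()) if rest]
        self._partial.clear()
        return out


if __name__ == "__main__":
    labeler = LineLabeler()
    # Chunks from two ranks arrive interleaved and split mid-line.
    for rank, chunk in [(0, "hello wo"), (1, "x=1\ny"), (0, "rld\n")]:
        for line in labeler.feed(rank, chunk):
            print(line)
    for line in labeler.flush():
        print(line)
```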

Implications: The work highlights the importance of scalability in process management for parallel programs. Because MPD's general interface decouples the process manager from the parallel library, each side can be developed and improved independently, which supports building more efficient and scalable parallel software. MPD's design and implementation offer useful lessons for future work in this area.

Link to Article: https://arxiv.org/abs/0102017v1
Authors:
arXiv ID: 0102017v1