Pai
Resource scheduling and cluster management for AI
Open Platform for AI (OpenPAI) 
After the release of v1.8.1, OpenPAI entered stable mode, with no major feature releases planned. To reduce maintenance effort, the repository has been switched to read-only mode. For collaboration, please contact the repository admin directly.
With the release of v1.0, OpenPAI is switching to a more robust, more powerful and lightweight architecture. OpenPAI is also becoming more and more modular so that the platform can be easily customized and expanded to suit new needs. OpenPAI also provides many AI user-friendly features, making it easier for end users and administrators to complete daily AI tasks.
<table> <tr> <td align="center"> <span> </span> <br/> <a href="https://github.com/microsoft/openpaimarketplace" target="_blank"> <img src="https://openpai.readthedocs.io/en/latest/images/architecture/Picture1.svg" width="610" alt="Marketplace Logo" /> </a> <br/> <img src="https://openpai.readthedocs.io/en/latest/images/architecture/Picture2.svg" width="200" alt=" Web Portal" /> <a href="https://github.com/microsoft/openpaisdk" target="_blank"> <img src="https://openpai.readthedocs.io/en/latest/images/architecture/Picture3.svg" width="200" alt="VScode" /> </a> <a href="https://github.com/microsoft/openpaivscode" target="_blank"> <img src="https://openpai.readthedocs.io/en/latest/images/architecture/Picture4.svg" width="200" alt="SDK" /> </a> <br/> <img src="https://openpai.readthedocs.io/en/latest/images/architecture/Picture5.svg" width="610" alt="API" /> <br/> <img src="https://openpai.readthedocs.io/en/latest/images/architecture/Picture18.svg" width="610" alt="Services" /> <br/> <img src="https://openpai.readthedocs.io/en/latest/images/architecture/Picture19.svg" width="304" alt="User Authentication" /> <img src="https://openpai.readthedocs.io/en/latest/images/architecture/Picture20.svg" width="304" alt="User/Group Management" /> <br/> <img src="https://openpai.readthedocs.io/en/latest/images/architecture/Picture21.svg" width="304" alt="Storage Management" /> <img src="https://openpai.readthedocs.io/en/latest/images/architecture/Picture22.svg" width="304" alt="Cluster/Job Monitoring" /> <br/> <a href="https://github.com/microsoft/frameworkcontroller" target="_blank"> <img src="https://openpai.readthedocs.io/en/latest/images/architecture/Picture23.svg" width="304" alt="Job Orchestration" /> </a> <a href="https://github.com/microsoft/hivedscheduler" target="_blank"> <img src="https://openpai.readthedocs.io/en/latest/images/architecture/Picture24.svg" width="304" alt="Job Scheduling" /> </a> <br/> <a href="https://github.com/microsoft/openpai-runtime" 
target="_blank"> <img src="https://openpai.readthedocs.io/en/latest/images/architecture/Picture25.svg" width="304" alt="Job Runtime" /> </a> <img src="https://openpai.readthedocs.io/en/latest/images/architecture/Picture26.svg" width="304" alt="Job Error Analysis" /> <br/> <img src="https://openpai.readthedocs.io/en/latest/images/architecture/Picture15.svg" width="610" alt="Kubernetes Cluster Management" /> <br/> <img src="https://openpai.readthedocs.io/en/latest/images/architecture/Picture16.svg" width="610" alt="CPU/GPU/FPGA/InfiniBand" /> </td> </tr> </table>
Table of Contents
- When to consider OpenPAI
- Why choose OpenPAI
- Get started
- Standalone Components
- Reference
- Related Projects
- Get involved
- How to contribute
When to consider OpenPAI
- When your organization needs to share powerful AI computing resources (GPU/FPGA farm, etc.) among teams.
- When your organization needs to share and reuse common AI assets like Model, Data, Environment, etc.
- When your organization needs an easy IT ops platform for AI.
- When you want to run a complete training pipeline in one place.
Why choose OpenPAI
The platform incorporates a mature design with a proven track record in Microsoft's large-scale production environment.
Support on-premises and easy to deploy
OpenPAI is a full-stack solution. It supports on-premises, hybrid, and public cloud deployments, as well as single-box deployment for trial users.
Support popular AI frameworks and heterogeneous hardware
OpenPAI provides pre-built Docker images for popular AI frameworks and makes it easy to include heterogeneous hardware. It supports distributed training, such as distributed TensorFlow.
Most complete solution and easy to extend
OpenPAI is a complete solution for deep learning: it supports virtual clusters, is compatible with the Kubernetes ecosystem, and runs a complete training pipeline on one cluster, among other features. OpenPAI is architected in a modular way, so different modules can be plugged in as appropriate. Here is the architecture of OpenPAI, highlighting the technical innovations of the platform.
Get started
OpenPAI manages computing resources and is optimized for deep learning. Through Docker technology, the computing hardware is decoupled from the software, so it is easy to run distributed jobs, switch between deep learning frameworks, or run other kinds of jobs in consistent environments.
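The decoupling described above can be illustrated with a minimal Python sketch. It is not OpenPAI's job protocol or SDK: the `JobSpec` class, its field names, the `with_framework` helper, and the image names are all hypothetical, chosen only to show that the software environment (a Docker image) can change while the hardware request stays the same.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical job description; NOT OpenPAI's actual job protocol."""
    name: str
    docker_image: str  # software environment, packaged as a Docker image
    gpus: int          # hardware request, independent of the image
    instances: int     # more than one instance means a distributed job
    command: str


def with_framework(job: JobSpec, image: str, command: str) -> JobSpec:
    """Return a copy of the job that runs another framework on the same hardware."""
    return replace(job, docker_image=image, command=command)


# Placeholder image names and commands, used only for illustration.
tf_job = JobSpec(
    name="train-demo",
    docker_image="example/tensorflow:latest",
    gpus=1,
    instances=2,
    command="python train_tf.py",
)
torch_job = with_framework(tf_job, "example/pytorch:latest", "python train_torch.py")

# Only the software side changed; the hardware request is untouched.
print(torch_job.docker_image, torch_job.gpus, torch_job.instances)
```

In this sketch, switching frameworks only replaces the image and the command. The GPU count and instance count carry over unchanged, which is the property the paragraph above attributes to Docker-based decoupling.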
As OpenPAI is a platform, there are typically two different roles:
- Cluster users are the consumers of the cluster's computing resources. Depending on the deployment scenario, cluster users could be machine learning and deep learning researchers, data scientists, lab teachers, students, and so on.
- Cluster administrators are the owners and maintainers of computing resources. The administrators are responsible for the deployment and availability of the cluster.
OpenPAI provides end-to-end manuals for both cluster users and administrators.
For cluster administrators
The admin manual is a comprehensive guide for cluster administrators. It covers, among other topics, the following:
- Installation and upgrade. The installation is based on Kubespray, and here are the system requirements. OpenPAI provides an installation guide to facilitate the installation.

  If you are considering upgrading from an older version to the latest v1.0.0, please refer to the table below for a brief comparison between v0.14.0 and v1.0.0. More details about the upgrade considerations can be found in the upgrade guide.

  |                   | v0.14.0                  | v1.0.0                  |
  | ----------------- | ------------------------ | ----------------------- |
  | Architecture      | Kubernetes + Hadoop YARN | Kubernetes              |
  | Scheduler         | YARN Scheduler           | HiveD / K8S default     |
  | Job Orchestrating | YARN Framework Launcher  | Framework Controller    |
  | RESTful API       | v1 + v2                  | pure v2                 |
  | Storage           | Team-wise storage plugin | PV/PVC stor
