From e413d786304986d9fa3891979a4797eea9fca0d5 Mon Sep 17 00:00:00 2001
From: Carlos Eduardo Arango Gutierrez
Date: Fri, 11 Mar 2022 17:57:03 -0500
Subject: [PATCH 1/3] Proposal: Donate the MPI-Operator.V2 to kubernetes-sigs

Signed-off-by: Carlos Eduardo Arango Gutierrez
---
 proposals/donate-mpi-operator.md | 39 ++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)
 create mode 100644 proposals/donate-mpi-operator.md

diff --git a/proposals/donate-mpi-operator.md b/proposals/donate-mpi-operator.md
new file mode 100644
index 000000000..113a22098
--- /dev/null
+++ b/proposals/donate-mpi-operator.md
@@ -0,0 +1,39 @@
+# Donate MPI-Operator.v2 repo to kubernetes-sigs
+ Donate the kubeflow/mpi-operator to a more generic/neutral place (e.g. K8s-sigs) where this project could be beneficial to more people, not limiting to ML-specific workloads. Especially given that now kubeflow/training-operator gets heavier, people should be given the option to only install MPI Operator for their use cases.
+
+ During kubernetes SIG-APPS (Mar-7 2022) [call]((https://github.com/kubernetes/community/tree/master/sig-apps#meetings)) the topic was proposed to the SIG chairs, and they agreed to sponsor the repo.
+
+- [Motivation](#motivation)
+- [Goals](#goals)
+- [Non-Goals](#non-goals)
+- [Process](#process)
+- [Alternatives Considered](#alternatives-considered)
+
+
+
+_Status_
+
+* 2022-03-11 - Proposed
+
+## Motivation
+Kubeflow currently is moving to an [Unified operator](https://github.com/kubeflow/training-operator)
+The motivation is to encourage non-training users (like HPC) to use and contribute to it, without having to install or learn about kubeflow's training-operator.
+With the creation of the project [k8s-sigs/Kueue](https://github.com/kubernetes-sigs/kueue), having the MPi-Operator as a k8s-sigs project will facilitate the efforts to create and maintain a custom job queueing mechasinm for mpi jobs on kubernetes.
+
+## Goals
+* Migrate repo kubeflow/mpi-operatorv2 to kubernetes-sigs/mpi-operator
+* The training operator could declare the kubernetes-sigs/mpi-operator as a dependency, leveraging new features like job queueing
+
+## Non-Goals
+* Migrate kubeflow/mpi-operator.v1
+
+## Process
+
+* Donate kubeflow/mpi-operator to kuberenetes-sigs as detailed [here](https://github.com/kubernetes/community/blob/master/github-management/kubernetes-repositories.md#rules-for-donated-repositories), being tracked [here](https://github.com/kubeflow/mpi-operator/issues/459)
+* Close https://github.com/kubeflow/training-operator/issues/1479
+* Close https://github.com/kubeflow/mpi-operator/issues/422
+* Re-org the repo to remove dependency from kubeflow/common
+* Cut a first release under k8s-sigs
+
+## Alternatives Considered
+Continue to maintain the MPI-Operator as a stand alone project unregarless of the development of the universal operator, for non AI/ML use cases.

From 503cbbd2b331ca7175be55195d017c7c6ff57734 Mon Sep 17 00:00:00 2001
From: Carlos Eduardo Arango Gutierrez
Date: Mon, 14 Mar 2022 10:34:48 -0400
Subject: [PATCH 2/3] Apply suggestions from code review

Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>
---
 proposals/donate-mpi-operator.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/proposals/donate-mpi-operator.md b/proposals/donate-mpi-operator.md
index 113a22098..efb2a226d 100644
--- a/proposals/donate-mpi-operator.md
+++ b/proposals/donate-mpi-operator.md
@@ -17,12 +17,12 @@ _Status_
 
 ## Motivation
 Kubeflow currently is moving to an [Unified operator](https://github.com/kubeflow/training-operator)
-The motivation is to encourage non-training users (like HPC) to use and contribute to it, without having to install or learn about kubeflow's training-operator.
+The motivation is to encourage non-training users (like HPC) to use and contribute to mpi-operator, without having to install or learn about kubeflow's training-operator.
 With the creation of the project [k8s-sigs/Kueue](https://github.com/kubernetes-sigs/kueue), having the MPi-Operator as a k8s-sigs project will facilitate the efforts to create and maintain a custom job queueing mechasinm for mpi jobs on kubernetes.
 
 ## Goals
-* Migrate repo kubeflow/mpi-operatorv2 to kubernetes-sigs/mpi-operator
-* The training operator could declare the kubernetes-sigs/mpi-operator as a dependency, leveraging new features like job queueing
+* Migrate repo kubeflow/mpi-operator/v2 to kubernetes-sigs/mpi-operator
+* Kubeflow declares the kubernetes-sigs/mpi-operator as a dependency
 
 ## Non-Goals
 * Migrate kubeflow/mpi-operator.v1
@@ -32,7 +32,7 @@ With the creation of the project [k8s-sigs/Kueue](https://github.com/kubernetes-
 * Donate kubeflow/mpi-operator to kuberenetes-sigs as detailed [here](https://github.com/kubernetes/community/blob/master/github-management/kubernetes-repositories.md#rules-for-donated-repositories), being tracked [here](https://github.com/kubeflow/mpi-operator/issues/459)
 * Close https://github.com/kubeflow/training-operator/issues/1479
 * Close https://github.com/kubeflow/mpi-operator/issues/422
-* Re-org the repo to remove dependency from kubeflow/common
+* Re-org the repo to remove dependency on kubeflow/common
 * Cut a first release under k8s-sigs
 
 ## Alternatives Considered

From e63eb1e9ce379b0a894316bcaae4d37c038095bf Mon Sep 17 00:00:00 2001
From: Carlos Eduardo Arango Gutierrez
Date: Mon, 14 Mar 2022 11:53:10 -0400
Subject: [PATCH 3/3] Add https://github.com/kubeflow/community/pull/557#discussion_r825989622

Signed-off-by: Carlos Eduardo Arango Gutierrez
---
 proposals/donate-mpi-operator.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proposals/donate-mpi-operator.md b/proposals/donate-mpi-operator.md
index efb2a226d..0f2a76e7b 100644
--- a/proposals/donate-mpi-operator.md
+++ b/proposals/donate-mpi-operator.md
@@ -18,11 +18,11 @@ _Status_
 ## Motivation
 Kubeflow currently is moving to an [Unified operator](https://github.com/kubeflow/training-operator)
 The motivation is to encourage non-training users (like HPC) to use and contribute to mpi-operator, without having to install or learn about kubeflow's training-operator.
-With the creation of the project [k8s-sigs/Kueue](https://github.com/kubernetes-sigs/kueue), having the MPi-Operator as a k8s-sigs project will facilitate the efforts to create and maintain a custom job queueing mechasinm for mpi jobs on kubernetes.
 
 ## Goals
 * Migrate repo kubeflow/mpi-operator/v2 to kubernetes-sigs/mpi-operator
 * Kubeflow declares the kubernetes-sigs/mpi-operator as a dependency
+* The Kubeflow Training WG continues to be involved in the mpi-operator development, proposing and implementing changes that align with the needs of AI/ML.
 
 ## Non-Goals
 * Migrate kubeflow/mpi-operator.v1