Limit GPU resource usage for a namespace

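This post builds on the following two articles:

Users and groups come from a directory service, while permissions are handled by OpenShift (ResourceQuota, ClusterRole, RoleBinding). The setup mainly follows this article: Kubernetes: Limit GPU resource usage for a namespace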

Goals

  • gpu-project-1
    • GPU resource quota: 1
  • gpu-project-2
    • GPU resource quota: 3
  • user1 in group bu-1
    • Can use 1 GPU in namespace gpu-project-1
    • Cannot use 2 GPUs in namespace gpu-project-1
    • Cannot use any GPU in namespace gpu-project-2

Create the namespaces and ResourceQuotas

oc new-project gpu-project-1
oc new-project gpu-project-2
oc apply -f gpu-projects-quota.yaml
oc describe quota -n gpu-project-1
oc describe quota -n gpu-project-2

gpu-projects-quota.yaml

apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-project-1-quota
  namespace: gpu-project-1
spec:
  hard:
    requests.nvidia.com/gpu: 1
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-project-2-quota
  namespace: gpu-project-2
spec:
  hard:
    requests.nvidia.com/gpu: 3
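
The two ResourceQuota objects differ only in their namespace and GPU count, so the manifest can also be generated by a small shell helper. The following is a minimal sketch rather than part of the original setup: the function name gpu_quota_manifest is hypothetical, and it assumes the quota is named after the namespace with a -quota suffix, as in the file above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a ResourceQuota manifest that caps the total
# nvidia.com/gpu requests in a single namespace.
# Usage: gpu_quota_manifest <namespace> <gpu-count>
gpu_quota_manifest() {
  local namespace="$1" gpus="$2"
  cat <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ${namespace}-quota
  namespace: ${namespace}
spec:
  hard:
    requests.nvidia.com/gpu: ${gpus}
EOF
}
```

For example, gpu_quota_manifest gpu-project-1 1 | oc apply -f - should produce the same quota as the first document in gpu-projects-quota.yaml.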

Create the ClusterRole

oc apply -f gpu-projects-clusterrole.yaml

gpu-projects-clusterrole.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gpu-project-deployment-and-pod-manager
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list", "create", "update", "patch", "delete"]

Create the RoleBindings

oc apply -f gpu-projects-rolebinding.yaml

gpu-projects-rolebinding.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-manager-bu1-binding
  namespace: gpu-project-1
subjects:
- kind: Group
  name: bu-1
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: gpu-project-deployment-and-pod-manager
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-manager-bu2-binding
  namespace: gpu-project-2
subjects:
- kind: Group
  name: bu-2
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: gpu-project-deployment-and-pod-manager
  apiGroup: rbac.authorization.k8s.io
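
As an alternative to the manifest, the same kind of binding can be created imperatively with oc create rolebinding. The sketch below is only an illustration: the function name bind_group_to_project is hypothetical, and it assumes the ClusterRole gpu-project-deployment-and-pod-manager already exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: bind a group to the shared ClusterRole inside one
# namespace, mirroring the RoleBinding manifests above.
# Usage: bind_group_to_project <group> <namespace> <binding-name>
bind_group_to_project() {
  local group="$1" namespace="$2" binding="$3"
  oc create rolebinding "$binding" \
    --clusterrole=gpu-project-deployment-and-pod-manager \
    --group="$group" \
    --namespace="$namespace"
}
```

For example, bind_group_to_project bu-1 gpu-project-1 pod-manager-bu1-binding corresponds to the first binding in the file. Unlike oc apply, oc create fails if the binding already exists, so the manifest is the better fit for repeated runs.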

Verification

Using oc auth

oc auth can-i create pods --as jdoe --as-group bu-1 -n gpu-project-1
# yes
oc auth can-i create pods --as jdoe --as-group bu-1 -n gpu-project-2
# no
oc auth can-i create pods --as william --as-group bu-2 -n gpu-project-1
# no
oc auth can-i create pods --as william --as-group bu-2 -n gpu-project-2
# yes
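
The four checks above can be wrapped in a helper that compares each answer with the expected one. This is a minimal sketch with a hypothetical function name, expect_can_create_pods. Note that --as and --as-group impersonate the given identity, so the check exercises the RoleBindings rather than the user's actual membership in the directory service.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the answer from `oc auth can-i` with the
# expected one ("yes" or "no") for creating pods in a namespace.
# Usage: expect_can_create_pods <user> <group> <namespace> <yes|no>
expect_can_create_pods() {
  local user="$1" group="$2" namespace="$3" expected="$4" actual
  # `oc auth can-i` may exit non-zero when the answer is "no", so ignore
  # the exit status and compare the printed answer instead.
  actual="$(oc auth can-i create pods --as "$user" --as-group "$group" \
    -n "$namespace" 2>/dev/null || true)"
  # Keep only the first word, in case a reason is appended to the answer.
  actual="${actual%% *}"
  if [ "$actual" = "$expected" ]; then
    echo "PASS: $user ($group) in $namespace -> $actual"
  else
    echo "FAIL: $user ($group) in $namespace -> got '$actual', expected '$expected'"
    return 1
  fi
}
```

For example, expect_can_create_pods jdoe bu-1 gpu-project-1 yes and expect_can_create_pods jdoe bu-1 gpu-project-2 no cover the first two checks.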

Start the container