Kubernetes Advanced Scheduling and Resource Management
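Scheduling process
Scheduling means placing each Pod on a suitable Node. A good placement must:
- Satisfy the Pod's resource requirements
- Satisfy the Pod's special relationship requirements
- Satisfy the Node's constraints
- Make reasonable use of cluster resources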
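Basic scheduling capabilities
- Resource scheduling: satisfy the Pod's resource requirements
  - Resource request/limit
    - CPU: 1 = 1000m
    - Memory: 1Gi = 1024Mi
    - Storage
    - GPU
    - FPGA
  - QoS
    - Guaranteed: strongest guarantee (high)
    - Burstable: elastic (medium)
    - BestEffort: best effort (low)
  - Resource quota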
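- Relationship scheduling: satisfy special relationship and constraint requirements between Pods and Nodes
  - Relationships between Pods
    - PodAffinity
    - PodAntiAffinity
  - The Pod chooses the Nodes that suit it
    - NodeSelector
    - NodeAffinity
  - Restricting which Pods may be scheduled onto certain Nodes
    - Taint
    - Tolerations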
Resource scheduling
apiVersion: v1
kind: Pod
metadata:
  namespace: demo-ns
  name: demo-pod
spec:
  containers:
  - image: nginx:latest
    name: demo-container
    resources:
      requests:          # what the scheduler reserves on the Node
        cpu: 2
        memory: 1Gi
      limits:            # upper bound enforced at runtime
        cpu: 2
        memory: 1Gi
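The units listed earlier (1 CPU = 1000m, 1Gi = 1024Mi) also allow fractional values. As a rough sketch, the manifest below requests half a core, 512Mi of memory, some ephemeral storage, and one GPU. The Pod name `demo-pod-units` and the `nvidia.com/gpu` resource name are assumptions for illustration; the actual GPU resource name depends on the device plugin installed in the cluster.

```yaml
apiVersion: v1
kind: Pod
metadata:
  namespace: demo-ns
  name: demo-pod-units          # hypothetical name
spec:
  containers:
  - image: nginx:latest
    name: demo-container
    resources:
      requests:
        cpu: 500m               # 0.5 CPU
        memory: 512Mi           # 0.5Gi
        ephemeral-storage: 1Gi
      limits:
        cpu: "1"                # same as 1000m
        memory: 1Gi             # same as 1024Mi
        ephemeral-storage: 2Gi
        nvidia.com/gpu: 1       # assumed extended resource; the request defaults to the limit
```

Because request and limit differ here, this Pod would not be classified as Guaranteed.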
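A QoS class cannot be set by hand. Kubernetes derives it from the requests and limits of the Pod's containers.
Guaranteed
Every container must set both CPU and memory, with request == limit.
apiVersion: v1
kind: Pod
metadata:
  namespace: demo-ns
  name: demo-pod
spec:
  containers:
  - image: nginx:latest
    name: demo-container
    resources:
      requests:
        cpu: 2
        memory: 1Gi
      limits:            # identical to requests, so the Pod is Guaranteed
        cpu: 2
        memory: 1Gi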
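Burstable
The Pod does not qualify as Guaranteed, but at least one container sets a CPU or memory request or limit. Typical cases: request and limit differ, or only requests are set.
apiVersion: v1
kind: Pod
metadata:
  namespace: demo-ns
  name: demo-pod
spec:
  containers:
  - image: nginx:latest
    name: demo-container
    resources:
      requests:          # no limits are set, so the Pod is Burstable
        cpu: 2
        memory: 1Gi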
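Setting only requests is one way to end up in Burstable. Another common shape, sketched here on the assumption that the workload should be allowed to burst to twice its reservation, sets both values with the limit above the request. The name `demo-pod-burst` is hypothetical. After the Pod is created, the derived class can be read from its `status.qosClass` field.

```yaml
apiVersion: v1
kind: Pod
metadata:
  namespace: demo-ns
  name: demo-pod-burst     # hypothetical name
spec:
  containers:
  - image: nginx:latest
    name: demo-container
    resources:
      requests:            # used by the scheduler
        cpu: 1
        memory: 512Mi
      limits:              # higher than requests, so the Pod is Burstable
        cpu: 2
        memory: 1Gi
# Filled in by Kubernetes, not written by the user:
# status:
#   qosClass: Burstable
```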
BestEffort
No container sets a request or limit for any resource.
apiVersion: v1
kind: Pod
metadata:
  namespace: demo-ns
  name: demo-pod
spec:
  containers:
  - image: nginx:latest    # no resources section at all
    name: demo-container
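How the QoS classes differ
- Scheduling behavior
  - The scheduler places Pods based on request, not limit
- Runtime behavior
  - CPU: shares are weighted by request
  - Memory: the OOM score adjustment (oom_score_adj) is assigned by QoS class
    - Guaranteed: -998 (-997 in recent releases)
    - Burstable: 2~999
    - BestEffort: 1000
  - Eviction
    - BestEffort Pods are evicted first
    - Carried out by the kubelet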
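Resource quota
A ResourceQuota limits the total resource usage of a Namespace. Once the quota is exhausted, requests that would exceed it are rejected.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: demo-quota
  namespace: demo-ns
spec:
  hard:
    cpu: 1000
    memory: 200Gi
    pods: 10
  scopeSelector:           # count only Pods that are not BestEffort
    matchExpressions:
    - operator: Exists
      scopeName: NotBestEffort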
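scope:
- Terminating/NotTerminating
- BestEffort/NotBestEffort
- PriorityClass
Pods should declare reasonable resource requirements
- CPU/Mem/EphemeralStorage/GPU
Use request and limit to choose a QoS class that fits each workload
- Guaranteed: sensitive workloads that need guarantees
- Burstable: less sensitive workloads that need elasticity
- BestEffort: workloads that tolerate disruption
Configure a ResourceQuota for each Namespace to prevent overuse and keep resources available to other users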
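The PriorityClass scope listed above can be combined with separate request and limit totals. A minimal sketch, assuming a PriorityClass named `high` exists in the cluster; the quota name `demo-quota-high` and all numbers are illustrative, not recommendations.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: demo-quota-high      # hypothetical name
  namespace: demo-ns
spec:
  hard:
    requests.cpu: "100"      # sum of CPU requests across matching Pods
    requests.memory: 200Gi
    limits.cpu: "200"        # sum of CPU limits across matching Pods
    limits.memory: 400Gi
    pods: "10"
  scopeSelector:
    matchExpressions:
    - scopeName: PriorityClass
      operator: In
      values:
      - high                 # assumed PriorityClass name
```

With this quota in place, only Pods whose priorityClassName is `high` count against these totals; other Pods in the Namespace are unaffected by it.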
Affinity scheduling
Pod – Pod
- Pod affinity scheduling: PodAffinity
  - Must be co-located with certain Pods: requiredDuringSchedulingIgnoredDuringExecution
  - Prefer to be co-located with certain Pods: preferredDuringSchedulingIgnoredDuringExecution
- Pod anti-affinity scheduling: PodAntiAffinity
  - Must not be co-located with certain Pods: requiredDuringSchedulingIgnoredDuringExecution
  - Prefer not to be co-located with certain Pods: preferredDuringSchedulingIgnoredDuringExecution
- operator
  - In
  - NotIn
  - Exists
  - DoesNotExist
apiVersion: v1
kind: Pod
metadata:
  namespace: demo-ns
  name: demo-pod
spec:
  containers:
  - image: nginx:latest
    name: demo-container
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:                 # selects the Pods to be co-located with
          matchExpressions:
          - key: k1
            operator: In
            values:
            - v1
        topologyKey: "kubernetes.io/hostname"   # "together" means the same Node
apiVersion: v1
kind: Pod
metadata:
  namespace: demo-ns
  name: demo-pod
spec:
  containers:
  - image: nginx:latest
    name: demo-container
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:                 # selects the Pods to stay away from
          matchExpressions:
          - key: k1
            operator: In
            values:
            - v1
        topologyKey: "kubernetes.io/hostname"   # never share a Node with them
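Both manifests above use the hard (required) form. The soft (preferred) form has a slightly different shape: each entry carries a weight and wraps the selector in a podAffinityTerm. A minimal sketch, where the weight of 100 and the zone-level topology key are illustrative choices rather than anything the examples above prescribe:

```yaml
apiVersion: v1
kind: Pod
metadata:
  namespace: demo-ns
  name: demo-pod
spec:
  containers:
  - image: nginx:latest
    name: demo-container
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100                    # 1-100; a higher weight is a stronger preference
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: k1
              operator: In
              values:
              - v1
          topologyKey: "topology.kubernetes.io/zone"   # spread across zones instead of Nodes
```

If no Node satisfies the preference, the Pod is still scheduled; the term only influences Node scoring.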
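Pod – Node
- NodeSelector
  - The Pod must be scheduled onto a Node that carries certain labels
  - Map[string]string
- NodeAffinity
  - Must be scheduled onto certain Nodes: requiredDuringSchedulingIgnoredDuringExecution
  - Prefer to be scheduled onto certain Nodes: preferredDuringSchedulingIgnoredDuringExecution
  - operator (In, NotIn, Exists, DoesNotExist, Gt, Lt)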
apiVersion: v1
kind: Pod
metadata:
  namespace: demo-ns
  name: demo-pod
spec:
  containers:
  - image: nginx:latest
    name: demo-container
  nodeSelector:            # the Node must carry the label k1=v1
    k1: v1
apiVersion: v1
kind: Pod
metadata:
  namespace: demo-ns
  name: demo-pod
spec:
  containers:
  - image: nginx:latest
    name: demo-container
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: k1
            operator: In
            values:
            - v1
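The required form above behaves like a more expressive nodeSelector. The preferred form can be added alongside it. In the sketch below, the label key `k2`, its value `v2`, and the weight of 50 are hypothetical and chosen only for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  namespace: demo-ns
  name: demo-pod
spec:
  containers:
  - image: nginx:latest
    name: demo-container
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: k1
            operator: Exists       # any value of k1 is acceptable
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50                 # 1-100
        preference:
          matchExpressions:
          - key: k2                # hypothetical label
            operator: In
            values:
            - v2
```

Here the Pod can only land on Nodes labeled with `k1`, and among those the scheduler favors Nodes that also carry `k2=v2`.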
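Node taints and tolerations
- Taint (Node)
  - A Node can have multiple Taints
  - Effect (what the Taint does)
    - NoSchedule: new Pods are not scheduled onto the Node
    - PreferNoSchedule: the scheduler tries to avoid the Node
    - NoExecute: Pods that do not tolerate the Taint are evicted
- Toleration (Pod)
  - A Pod can have multiple Tolerations
  - Effect may be left empty, which matches every effect
  - operator
    - Exists
    - Equal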
apiVersion: v1
kind: Node
metadata:
  name: demo-node
spec:
  taints:
  - key: k1
    value: v1
    effect: NoSchedule
apiVersion: v1
kind: Pod
metadata:
  namespace: demo-ns
  name: demo-pod
spec:
  containers:
  - image: nginx:latest
    name: demo-container
  tolerations:
  - key: k1                # matches the taint k1=v1:NoSchedule on demo-node
    operator: Equal
    value: v1
    effect: NoSchedule
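The Equal form above matches one exact key, value, and effect. The other cases from the list (the Exists operator, an omitted effect, and the NoExecute effect) might look like the following sketch. The key `k2` and the 300-second grace period are hypothetical values picked for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  namespace: demo-ns
  name: demo-pod
spec:
  containers:
  - image: nginx:latest
    name: demo-container
  tolerations:
  - key: k1
    operator: Exists          # matches any value of k1, so no "value" field is given
    effect: NoExecute
    tolerationSeconds: 300    # keep running for 300s after the taint is added, then be evicted
  - key: k2                   # hypothetical key
    operator: Exists          # effect omitted: tolerates k2 taints with any effect
```

A toleration only permits placement on a tainted Node. It does not attract the Pod to that Node, so it is often paired with a nodeSelector or NodeAffinity rule.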
Advanced Kubernetes scheduling capabilities
- Priority and preemption scheduling
  - Priority
  - Preemption
Priority scheduling configuration
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high
value: 10000
globalDefault: false
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low
value: 100
globalDefault: false
Priority values:
- Default priority: DefaultPriorityWhenNoDefaultClassExists = 0
- Highest priority a user can configure: HighestUserDefinablePriority = 1000000000
- System-level priority: SystemCriticalPriority = 2000000000
- Built-in system PriorityClasses
  - system-cluster-critical
  - system-node-critical
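A PriorityClass has no effect until a Pod refers to it by name. A minimal sketch of a Pod that uses the `high` class defined earlier:

```yaml
apiVersion: v1
kind: Pod
metadata:
  namespace: demo-ns
  name: demo-pod
spec:
  priorityClassName: high    # resolved into spec.priority (10000 here) when the Pod is admitted
  containers:
  - image: nginx:latest
    name: demo-container
```

A Pod without priorityClassName gets the value of the PriorityClass marked globalDefault: true if one exists, and otherwise the default priority of 0.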
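Priority scheduling process (Pod1 has the higher priority):
- Pod2 and Pod1 enter the scheduling queue one after the other, and neither has been scheduled yet
- When scheduling runs, the PriorityQueue dequeues the higher-priority Pod1 first and schedules it
- Once Pod1 has been scheduled, Pod2 is scheduled in the next round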
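Priority preemption process:
- Pod2 is scheduled first and, once scheduling succeeds, runs on Node1
- The higher-priority Pod1 is scheduled next. Scheduling fails because Node1 lacks resources, which triggers the preemption flow
- The preemption algorithm selects Pod2 as the victim that must yield to Pod1
- Pod2 is evicted from Node1, and Pod1 is scheduled onto Node1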
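The flow above describes the default behavior, in which a higher-priority Pod may evict lower-priority ones. When a workload should jump the queue without evicting anything, the PriorityClass can set preemptionPolicy. The sketch below is one possible configuration; the class name `high-nonpreempting` and the description text are hypothetical.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-nonpreempting    # hypothetical name
value: 10000
globalDefault: false
preemptionPolicy: Never       # the default is PreemptLowerPriority
description: "Scheduled ahead of lower-priority Pods but never evicts them."
```

With this class, Pod1 would still leave the queue before Pod2, but if Node1 were full it would wait for capacity instead of forcing Pod2 out.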