Kubelet Error: Orphaned Pod Found, but Volume Paths Are Still Present on Disk
Problem Description
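Today one of our Kubernetes compute nodes showed an abnormal status (NotReady). I first logged in to the node and checked the status of the kubelet and Docker processes; both looked fine.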
I then checked the system log (/var/log/messages) and found the following errors:
Dec 31 12:44:16 docker18 kubelet: E1231 12:44:16.634146 707301 kubelet_volumes.go:128] Orphaned pod "356a8df1-0b4e-11e9-8afe-fa163e75de2b" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
Dec 31 12:44:18 docker18 kubelet: E1231 12:44:18.629745 707301 kubelet_volumes.go:128] Orphaned pod "356a8df1-0b4e-11e9-8afe-fa163e75de2b" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
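When the log contains many such lines, it helps to reduce them to the distinct Pod UIDs involved. The following is a minimal sketch; the function name extract_orphaned_pod_uids is made up for illustration, and it assumes the log lines have the same format as the ones shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read kubelet log lines on stdin and print each
# orphaned pod UID once. Assumes the 'Orphaned pod "<uid>" found' wording
# seen in the log excerpt above.
extract_orphaned_pod_uids() {
  grep -o 'Orphaned pod "[^"]*"' | cut -d'"' -f2 | sort -u
}
```

On the node it could be called as `extract_orphaned_pod_uids < /var/log/messages`.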
Diagnosis
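From the error message we can infer that this compute node holds an orphaned Pod whose data volume is still mounted, which prevents the kubelet from reclaiming and cleaning up the orphaned Pod as it normally would.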
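Note: for the kubelet, an orphaned Pod is one whose directory still exists under the kubelet's pods directory on the node even though the kubelet no longer knows about the Pod (for example, because it has already been deleted from the cluster). This is not the same thing as a bare Pod that has no owning controller.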
A Google search for related reports confirmed this:
https://github.com/kubernetes/kubernetes/issues/60987
https://github.com/kubernetes/kubernetes/pull/68616
When the kubelet encounters an orphaned Pod, it cleans up the Pod and its directories (cleanupOrphanedPodDirs). But if those directories still contain a mount path, the cleanup is skipped.
Resolution
1. First, use the Pod ID to look up the mount information for the Pod's data volume:
mount -l | grep 356a8df1-0b4e-11e9-8afe-fa163e75de2b
ceph01,ceph02,ceph03:/kube/volumes/kubernetes-dynamic-pvc-fad0f75d-f3ab-11e8-ad67-1e1c4625dec0 on /data/kubelet/pods/356a8df1-0b4e-11e9-8afe-fa163e75de2b/volumes/kubernetes.io~cephfs/pvc-fac32543-f3ab-11e8-acec-fa163e75de2b type ceph (rw,relatime,name=kubernetes-dynamic-user-fad0f7ae-f3ab-11e8-ad67-1e1c4625dec0,secret=<hidden>,acl)
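2. To prevent data loss, unmount that mount point first (deleting the directory while the volume is still mounted would also delete the data on the volume):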
umount /data/kubelet/pods/356a8df1-0b4e-11e9-8afe-fa163e75de2b/volumes/kubernetes.io~cephfs/pvc-fac32543-f3ab-11e8-acec-fa163e75de2b
3. Delete the Pod's metadata on this compute node:
rm -r /data/kubelet/pods/356a8df1-0b4e-11e9-8afe-fa163e75de2b
4. Check whether the Kubernetes compute node is back to normal:
kubectl get nodes
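Steps 1 to 3 can be combined into a small script that refuses to delete the Pod directory while anything is still mounted beneath it. This is only a sketch under a few assumptions: the function names pod_mount_points and cleanup_orphaned_pod and the variables KUBELET_ROOT and MOUNT_TABLE are made up for illustration, the kubelet root directory is /data/kubelet as on this node (many installations use /var/lib/kubelet instead), and mount paths contain no spaces.

```shell
#!/usr/bin/env bash
# Assumption: kubelet root dir as used on the node in this article.
KUBELET_ROOT="${KUBELET_ROOT:-/data/kubelet}"
# Assumption: the kernel mount table is readable at /proc/mounts (Linux).
MOUNT_TABLE="${MOUNT_TABLE:-/proc/mounts}"

# Print every mount point located under the given pod's directory.
pod_mount_points() {
  local pod_dir="$KUBELET_ROOT/pods/$1"
  awk -v prefix="$pod_dir/" 'index($2, prefix) == 1 { print $2 }' "$MOUNT_TABLE"
}

# Unmount all volumes of an orphaned pod, then delete its directory.
cleanup_orphaned_pod() {
  local uid="$1"
  local pod_dir="$KUBELET_ROOT/pods/$uid"
  local mp

  if [ -z "$uid" ] || [ ! -d "$pod_dir" ]; then
    echo "no such pod directory: $pod_dir" >&2
    return 1
  fi

  # Unmount nested paths before their parents (reverse lexical order).
  pod_mount_points "$uid" | sort -r | while read -r mp; do
    umount "$mp"
  done

  # Safety check: never run rm -r while a volume is still mounted,
  # otherwise the data on the volume would be deleted as well.
  if [ -n "$(pod_mount_points "$uid")" ]; then
    echo "mounts still present under $pod_dir, refusing to delete" >&2
    return 1
  fi

  rm -r -- "$pod_dir"
}
```

It would be run as root on the affected node, for example `cleanup_orphaned_pod 356a8df1-0b4e-11e9-8afe-fa163e75de2b`, followed by `kubectl get nodes` to confirm the node status.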