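k8s Study Notes 3: Setting Up the k8s Metrics Server
- I. Introduction
- II. How It Works
- III. Deployment
- Downloading the yaml file
- Deploying
- a. Image problem
- b. 500 error
- IV. Verification
- V. References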
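I. Introduction
Purpose of metrics-server: metrics-server mainly serves CPU and memory metrics for the cluster's pods and nodes to other components (kube-scheduler, HorizontalPodAutoscaler, Kubernetes cluster clients, and so on) through the aggregated API. The HorizontalPodAutoscaler calls this interface to read each pod's current resource usage and scales pods up or down accordingly.
Points to note:
1. The metrics are near real time: they are the most recently collected samples, held in memory, with no database behind them.
2. The metrics are not measured by metrics-server itself. cAdvisor collects them on every node; metrics-server only requests them and converts the metric-format data into the aggregated API format.
3. Because the interface is exposed through the aggregated API, the cluster's kube-apiserver must have the aggregation layer enabled (see the official community documentation).
To deploy metrics-server you can go straight to the deployment section of this article: download the yaml file from the official repository, change two things (the image, which cannot be pulled because of network restrictions, and the setting behind the 500 error), then apply the file.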
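II. How It Works
Metrics server periodically collects metrics from the Kubelet's Summary API (something like /api/v1/nodes/nodename/stats/summary). The aggregated data is kept in memory and exposed in the form of the metrics API. Metrics server reuses the api-server library for features such as authentication and versioning. To keep its data in memory it drops the default etcd storage and introduces an in-memory store (that is, its own implementation of the Storage interface).
Because the data lives in memory, monitoring data is not persisted; persistence can be added with third-party storage. The Metrics-Server architecture works like this: measurement data is obtained from the Kubelet, cAdvisor and similar sources, and metrics-server then serves it to the Dashboard, the HPA controller and other consumers. In essence metrics-server is a data converter: it turns cAdvisor-format data into the JSON format of the Kubernetes API. It follows that the metrics-server code must contain a step that fetches everything from the metric interface and then parses it. metrics-server pulls from cAdvisor on a schedule, so when we send it a request it answers straight from its cache.
Data flow: cAdvisor (inside the Kubelet) -> metrics-server -> aggregated API -> Dashboard, HPA controller and other clients.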
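Both ends of this flow can be inspected by hand through the kube-apiserver. The sketch below assumes kubectl is already configured for the cluster and borrows the node name k8s-node1 from later in these notes; the helper names summary_path and metrics_path are made up for illustration. The first path goes through the apiserver's node proxy to the Kubelet's Summary API, and the second is the metrics.k8s.io API that only answers once metrics-server has been deployed.

```shell
# Hypothetical helpers that only build the request paths.
summary_path() {
  # Kubelet Summary API, reached through the kube-apiserver node proxy.
  echo "/api/v1/nodes/$1/proxy/stats/summary"
}

metrics_path() {
  # Aggregated API served by metrics-server; $1 is "nodes" or "pods".
  echo "/apis/metrics.k8s.io/v1beta1/$1"
}

# Only talk to a cluster when kubectl is actually installed.
if command -v kubectl >/dev/null 2>&1; then
  # Raw per-node statistics as the Kubelet reports them (first lines only).
  kubectl get --raw "$(summary_path k8s-node1)" | head -n 20
  # The same information after metrics-server has converted it.
  kubectl get --raw "$(metrics_path nodes)" \
    || echo "metrics API not available yet"
fi
```

Comparing the two JSON documents makes the "data converter" role concrete: the first is large and Kubelet-specific, the second contains only CPU and memory usage per node.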
III. Deployment
Downloading the yaml file
The official repository is https://github.com/kubernetes-sigs/metrics-server and the latest version at the time of writing is v0.6.1
wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.6.1/components.yaml
Deploying
Deploy directly with the following command
kubectl apply -f components.yaml
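The image is hosted on a registry abroad and cannot be pulled, so we first run docker search to see whether Docker Hub carries a copy, then try downloading the candidates one by one. The version must be v0.6.1
a. Image problem
1. Search Docker Hub for metrics-server images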
root@k8s-master1:~# docker search metrics-server
NAME DESCRIPTION STARS OFFICIAL AUTOMATED
mirrorgooglecontainers/metrics-server-amd64 17
bitnami/metrics-server Bitnami Docker Image for Metrics Server 8 [OK]
rancher/metrics-server 5
eipwork/metrics-server 2
cytopia/metrics-server-prom A Docker image on which Prometheus can scrap… 2
rancher/metrics-server-amd64 2
willdockerhub/metrics-server sync from k8s.gcr.io/metrics-server/metrics-… 1
kanewinter/metrics-server-prom-aws Added aws-iam-authenticator to cytopia/metri… 1
vivareal/metrics-server-exporter metrics-server exporter 1
radishgz/metrics-server-amd64 gcr.io/google_containers/metrics-server-amd6… 1 [OK]
anjia0532/metrics-server-amd64 1
htcfive/metrics-server-amd64 metrics-server-amd64:v0.3.6 0
ibmcom/metrics-server-s390x 0
ibmcom/metrics-server-ppc64le 0
dyrnq/metrics-server k8s.gcr.io/metrics-server/metrics-server 0
f3n9/metrics-server mirror of k8s.gcr.io/metrics-server-amd64:v0… 0
ifnoelse/metrics-server-amd64 metrics-server-amd64 0 [OK]
ghouscht/metrics-server-exporter Export metrics-server metrics to prometheus 0
ibmcom/metrics-server-amd64 0
roywangtj/metrics-server-amd64 metrics-server-amd64 0 [OK]
ibmcom/metrics-server 0
v5cn/metrics-server sync k8s.gcr.io/metrics-server/metrics-serve… 0
yametech/metrics-server-amd64 0
cloudnil/metrics-server-amd64 metrics-server-amd64 0 [OK]
carlziess/metrics-server-amd64-v0.2.1 metrics-server-amd64:v0.2.1 0 [OK]
root@k8s-master1:~#
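Trying the candidates one by one can be scripted. This is only a sketch: the helper first_pullable is hypothetical, the three repositories are picked from the search output above because their descriptions mention syncing from k8s.gcr.io, and whether any of them actually publishes a v0.6.1 tag is something the loop finds out rather than something assumed.

```shell
TAG="v0.6.1"

# first_pullable PULL_CMD REPO...
# Prints the first REPO:TAG that PULL_CMD manages to fetch, or fails.
first_pullable() {
  pull_cmd=$1
  shift
  for repo in "$@"; do
    ref="$repo:$TAG"
    # $pull_cmd is left unquoted on purpose so "docker pull" splits into two words.
    if $pull_cmd "$ref" >/dev/null 2>&1; then
      echo "$ref"
      return 0
    fi
  done
  return 1
}

# Only try real pulls when docker is installed.
if command -v docker >/dev/null 2>&1; then
  first_pullable "docker pull" \
    willdockerhub/metrics-server dyrnq/metrics-server v5cn/metrics-server \
    || echo "no candidate offers tag $TAG"
fi
```

The pull command is passed in as an argument so the same loop would work with another client, for example "crictl pull" on nodes without docker.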
2. Once a suitable image has been pulled (here willdockerhub/metrics-server:v0.6.1), tag it and push it to the Harbor registry set up earlier
root@k8s-master1:~# docker tag willdockerhub/metrics-server:v0.6.1 registry.harbor.com/library/metrics-server:v0.6.1
root@k8s-master1:~# docker push registry.harbor.com/library/metrics-server:v0.6.1
3. Edit the downloaded components.yaml so that it points at the image pushed to Harbor
root@k8s-master1:~# sed -i 's#k8s.gcr.io/metrics-server/metrics-server:v0.6.1#registry.harbor.com/library/metrics-server:v0.6.1#g' components.yaml
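Before redeploying it is worth confirming that the substitution really took effect. The sketch below replays the same kind of sed command on a scratch file that stands in for components.yaml (only the two relevant lines are reproduced), using the library project that the image was pushed to in step 2, and then prints the resulting image line.

```shell
# Work on a scratch copy so the real manifest is untouched.
workdir=$(mktemp -d)
manifest="$workdir/components.yaml"

# Stand-in for the lines of components.yaml that matter here.
cat > "$manifest" <<'EOF'
        image: k8s.gcr.io/metrics-server/metrics-server:v0.6.1
        imagePullPolicy: IfNotPresent
EOF

old="k8s.gcr.io/metrics-server/metrics-server:v0.6.1"
new="registry.harbor.com/library/metrics-server:v0.6.1"

# '#' is the sed delimiter because both image references contain '/'.
sed -i "s#$old#$new#g" "$manifest"

# Show the resulting image line to confirm the rewrite.
grep 'image:' "$manifest"
```

On the real file the same check is simply grep 'image:' components.yaml; the path printed there has to match what was pushed, otherwise the pod will fail to pull the image.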
4. Delete the previously deployed metrics-server and deploy it again
root@k8s-master1:~# kubectl delete -f components.yaml
serviceaccount "metrics-server" deleted
clusterrole.rbac.authorization.k8s.io "system:aggregated-metrics-reader" deleted
clusterrole.rbac.authorization.k8s.io "system:metrics-server" deleted
rolebinding.rbac.authorization.k8s.io "metrics-server-auth-reader" deleted
clusterrolebinding.rbac.authorization.k8s.io "metrics-server:system:auth-delegator" deleted
clusterrolebinding.rbac.authorization.k8s.io "system:metrics-server" deleted
service "metrics-server" deleted
deployment.apps "metrics-server" deleted
apiservice.apiregistration.k8s.io "v1beta1.metrics.k8s.io" deleted
root@k8s-master1:~# kubectl apply -f components.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
root@k8s-master1:~#
5. Check whether metrics-server starts normally. It does not become ready, and the readiness probe reports a 500 error
root@k8s-master1:~# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-74586cf9b6-476cz 1/1 Running 3 (5h32m ago) 26h
coredns-74586cf9b6-trxrm 1/1 Running 4 (5h32m ago) 26h
etcd-k8s-master1 1/1 Running 3 (5h32m ago) 26h
kube-apiserver-k8s-master1 1/1 Running 3 (5h32m ago) 26h
kube-controller-manager-k8s-master1 1/1 Running 22 (20m ago) 26h
kube-proxy-8gjdf 1/1 Running 4 (5h19m ago) 26h
kube-proxy-h5c7j 1/1 Running 4 (5h32m ago) 26h
kube-proxy-kwf78 1/1 Running 5 (5h20m ago) 26h
kube-scheduler-k8s-master1 1/1 Running 24 (20m ago) 26h
metrics-server-54f89f765b-t92qm 0/1 Running 0 36s
root@k8s-master1:~# kubectl describe pods -n kube-system metrics-server-54f89f765b-t92qm
........
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 50s default-scheduler Successfully assigned kube-system/metrics-server-54f89f765b-t92qm to k8s-node1
Normal Pulled 48s kubelet Container image "registry.harbor.com/library/metrics-server:v0.6.1" already present on machine
Normal Created 48s kubelet Created container metrics-server
Normal Started 48s kubelet Started container metrics-server
Warning Unhealthy 9s (x2 over 19s) kubelet Readiness probe failed: HTTP probe failed with statuscode: 500
root@k8s-master1:~#
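The probe failure alone does not say why the server is unhealthy, so the container log is the next place to look. The helper below is a hypothetical wrapper around two ordinary kubectl commands. In setups like this one the log often points at an x509 certificate verification failure when scraping the kubelets, which is the problem both solutions in the next part address; treat that as something to check rather than a given.

```shell
# Hypothetical helper: show recent metrics-server logs and the state of the
# aggregated API registration.
diagnose_metrics_server() {
  kubectl -n kube-system logs deployment/metrics-server --tail=20
  kubectl get apiservice v1beta1.metrics.k8s.io
}

# Only talk to a cluster when kubectl is actually installed.
if command -v kubectl >/dev/null 2>&1; then
  diagnose_metrics_server || echo "could not reach the cluster"
fi
```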
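b. 500 error
Solution 1: skip certificate verification
After the 500 error appears, first delete the metrics-server deployed from components.yaml
kubectl delete -f components.yaml
Then edit components.yaml and add the --kubelet-insecure-tls argument (metrics-server then no longer verifies the kubelets' serving certificates, so this suits test clusters)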
.....
containers:
- args:
  - --cert-dir=/tmp
  - --secure-port=4443
  - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
  - --kubelet-use-node-status-port
  - --metric-resolution=15s
  - --kubelet-insecure-tls
  image: registry.harbor.com/library/metrics-server:v0.6.1
  imagePullPolicy: IfNotPresent
  livenessProbe:
......
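The extra argument can also be added without opening an editor. The following sketch uses awk on a scratch file that mimics the args list; the indentation in the stand-in is an assumption about how the real components.yaml is laid out, and the approach relies on the --metric-resolution line being present to anchor on.

```shell
workdir=$(mktemp -d)
manifest="$workdir/components.yaml"

# Stand-in for the container section of components.yaml.
cat > "$manifest" <<'EOF'
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --metric-resolution=15s
        image: registry.harbor.com/library/metrics-server:v0.6.1
EOF

# Print every line; after the --metric-resolution line, print a copy of it
# with the flag swapped, which keeps the same indentation.
awk '
  { print }
  /- --metric-resolution=/ {
    sub(/- --metric-resolution=.*/, "- --kubelet-insecure-tls")
    print
  }
' "$manifest" > "$manifest.new"
mv "$manifest.new" "$manifest"

cat "$manifest"
```

The command is not idempotent: running it twice adds the flag twice, so check with grep before repeating it.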
After the change, apply the file again
root@k8s-master1:~# kubectl apply -f components.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
root@k8s-master1:~#
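Solution 2: on every node, edit the kubelet configuration file (vim /var/lib/kubelet/config.yaml) and add serverTLSBootstrap: true, so that each kubelet requests a serving certificate signed by the cluster CA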
.......
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s
serverTLSBootstrap: true
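Since the same line has to be added on every node, a small idempotent helper avoids duplicate entries when the step is repeated. This is a sketch on a scratch file; the function name enable_server_tls_bootstrap is made up, and on a real node the target would be /var/lib/kubelet/config.yaml, edited as root.

```shell
# Hypothetical helper: add the setting only when the key is not present yet.
enable_server_tls_bootstrap() {
  cfg=$1
  if grep -q '^serverTLSBootstrap:' "$cfg"; then
    echo "serverTLSBootstrap already present in $cfg, check its value"
  else
    echo 'serverTLSBootstrap: true' >> "$cfg"
  fi
}

# Demonstrated on a scratch file with two of the existing settings.
cfg=$(mktemp)
printf 'syncFrequency: 0s\nvolumeStatsAggPeriod: 0s\n' > "$cfg"

enable_server_tls_bootstrap "$cfg"
enable_server_tls_bootstrap "$cfg"   # second run leaves the file unchanged
cat "$cfg"
```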
Restart the service:
systemctl restart kubelet.service
List the certificate signing requests:
root@k8s-master1:~# kubectl get csr
NAME AGE SIGNERNAME REQUESTOR REQUESTEDDURATION CONDITION
csr-2z952 13m kubernetes.io/kubelet-serving system:node:k8s-master1 <none> Pending
csr-56sx7 15m kubernetes.io/kubelet-serving system:node:k8s-node2 <none> Pending
csr-ckts9 14m kubernetes.io/kubelet-serving system:node:k8s-node1 <none> Pending
Approve the requests:
root@k8s-master1:~# kubectl certificate approve csr-2z952 csr-56sx7 csr-ckts9
certificatesigningrequest.certificates.k8s.io/csr-2z952 approved
certificatesigningrequest.certificates.k8s.io/csr-56sx7 approved
certificatesigningrequest.certificates.k8s.io/csr-ckts9 approved
root@k8s-master1:~# kubectl get csr
NAME AGE SIGNERNAME REQUESTOR REQUESTEDDURATION CONDITION
csr-2z952 16m kubernetes.io/kubelet-serving system:node:k8s-master1 <none> Approved,Issued
csr-56sx7 19m kubernetes.io/kubelet-serving system:node:k8s-node2 <none> Approved,Issued
csr-ckts9 17m kubernetes.io/kubelet-serving system:node:k8s-node1 <none> Approved,Issued
root@k8s-master1:~#
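With more than a few nodes, typing the CSR names by hand gets tedious. The filter below is a sketch that picks out pending kubelet-serving requests from kubectl get csr style rows; the function name pending_serving_csrs is hypothetical, and the sample rows follow the format shown above, with the second one marked as approved to show that it is skipped.

```shell
# Reads `kubectl get csr --no-headers` style rows on stdin and prints the
# names of kubelet-serving requests that are still Pending.
pending_serving_csrs() {
  awk '$3 == "kubernetes.io/kubelet-serving" && $NF == "Pending" { print $1 }'
}

# Sample rows in the same column layout as the real output.
sample='csr-2z952 13m kubernetes.io/kubelet-serving system:node:k8s-master1 <none> Pending
csr-56sx7 15m kubernetes.io/kubelet-serving system:node:k8s-node2 <none> Approved,Issued'

printf '%s\n' "$sample" | pending_serving_csrs

# Against a live cluster (review the list before approving):
#   kubectl get csr --no-headers | pending_serving_csrs \
#     | xargs -r kubectl certificate approve
```

Approving a request hands out a certificate trusted by the cluster, so it is worth checking that the requestor column only lists nodes you expect before piping the names into the approve command.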
IV. Verification
1. Check whether the apiserver now exposes metrics.k8s.io/v1beta1
root@k8s-master1:~# kubectl api-versions|grep metrics
metrics.k8s.io/v1beta1
2. Check whether the metrics-server pod is running normally
root@k8s-master1:~# kubectl get pods -n=kube-system |grep metrics
metrics-server-654cffbc87-87g2k 1/1 Running 0 5m36s
3. Use kubectl top to view the CPU and memory usage of nodes and pods
root@k8s-master1:~# kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
k8s-master1 55m 5% 1535Mi 81%
k8s-node1 12m 1% 1233Mi 65%
k8s-node2 12m 1% 1288Mi 68%
root@k8s-master1:~# kubectl top pods -n kube-system
NAME CPU(cores) MEMORY(bytes)
coredns-74586cf9b6-476cz 2m 15Mi
coredns-74586cf9b6-trxrm 2m 17Mi
etcd-k8s-master1 17m 71Mi
kube-apiserver-k8s-master1 62m 347Mi
kube-controller-manager-k8s-master1 14m 43Mi
kube-proxy-8gjdf 1m 9Mi
kube-proxy-h5c7j 1m 22Mi
kube-proxy-kwf78 1m 9Mi
kube-scheduler-k8s-master1 4m 15Mi
metrics-server-654cffbc87-87g2k 3m 13Mi
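Once kubectl top works, its output is easy to post-process. As a small usage sketch, the hypothetical function nodes_above_memory below lists nodes whose MEMORY% exceeds a limit; it is fed the node table from the output above, and the threshold of 80 is an arbitrary value chosen for illustration.

```shell
# Reads `kubectl top nodes` output on stdin and prints the nodes whose
# MEMORY% column (5th field) is above the given limit.
nodes_above_memory() {
  awk -v limit="$1" 'NR > 1 {
    pct = $5
    sub(/%/, "", pct)          # strip the percent sign before comparing
    if (pct + 0 > limit) print $1
  }'
}

# Node table copied from the kubectl top nodes output above.
sample='NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
k8s-master1 55m 5% 1535Mi 81%
k8s-node1 12m 1% 1233Mi 65%
k8s-node2 12m 1% 1288Mi 68%'

printf '%s\n' "$sample" | nodes_above_memory 80

# Against a live cluster:
#   kubectl top nodes | nodes_above_memory 80
```

With the sample data only k8s-master1 is printed, since it is the only node above 80% memory.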
V. References
1. Kubernetes core metrics monitoring: Metrics Server https://www.cnblogs.com/zhangmingcheng/p/15770672.html
2. About RBAC https://blog.csdn.net/luanpeng825485697/article/details/88375842
https://blog.csdn.net/qq_35745940/article/details/120693490
3. Solutions for the 500 error https://blog.csdn.net/qq_41582883/article/details/114301817