ovs-cni is a k8s CNI provided by kubevirt that attaches pod interfaces to an OVS bridge. It works by creating a veth pair: one end is added to the OVS bridge as a port, and the other end is placed inside the pod. ovs-cni does not create the OVS bridge automatically; it must be created in advance. Nor does ovs-cni implement cross-host pod communication by itself; a cross-host OVS communication scheme must be planned in advance.
Environment
Multus must already be installed in the k8s cluster, because the network-attachment-definitions CRD created by Multus is used to define the OVS configuration. The k8s environment is as follows:
root@master:~# kubectl get nodes -o wide
NAME     STATUS   ROLES    AGE    VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE       KERNEL-VERSION     CONTAINER-RUNTIME
master   Ready    master   183d   v1.17.3   192.168.122.20   <none>        Ubuntu 19.10   5.3.0-62-generic   docker://19.3.2
node1    Ready    <none>   183d   v1.17.3   192.168.122.21   <none>        Ubuntu 19.10   5.3.0-62-generic   docker://19.3.2
node2    Ready    <none>   183d   v1.17.3   192.168.122.22   <none>        Ubuntu 19.10   5.3.0-62-generic   docker://19.3.2
root@master:~# kubectl get pod -n kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-5b644bc49c-4vfjx   1/1     Running   2          46d
calico-node-5gtw7                          1/1     Running   2          46d
calico-node-mqt6l                          1/1     Running   4          46d
calico-node-t4vjh                          1/1     Running   2          46d
coredns-9d85f5447-4znmx                    1/1     Running   4          42d
coredns-9d85f5447-fh667                    1/1     Running   2          42d
etcd-master                                1/1     Running   8          183d
kube-apiserver-master                      1/1     Running   0          27h
kube-controller-manager-master             1/1     Running   8          183d
kube-multus-ds-amd64-7b4fw                 1/1     Running   0          5h13m
kube-multus-ds-amd64-dq2s8                 1/1     Running   0          5h13m
kube-multus-ds-amd64-sqf8g                 1/1     Running   0          5h13m
kube-proxy-l4wn7                           1/1     Running   5          183d
kube-proxy-prhcm                           1/1     Running   5          183d
kube-proxy-psxqt                           1/1     Running   8          183d
kube-scheduler-master                      1/1     Running   8          183d
Install openvswitch on all three nodes. If cross-host pod communication is needed, the host's external NIC can be added to the bridge (see the sketch after the command below).
apt install openvswitch-switch/eoan
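For cross-host pod traffic there are two common options: attach the host's external NIC to the OVS bridge as an uplink, or connect the bridges with an overlay tunnel. A minimal sketch, assuming bridge br1 (created later in this article), an uplink NIC eth1, and a peer host at 192.168.122.22; adjust to your environment:
# option 1: add the host's external NIC to br1 as an uplink
ovs-vsctl add-port br1 eth1
# option 2: connect br1 on two hosts with a VXLAN tunnel
ovs-vsctl add-port br1 vxlan0 -- set interface vxlan0 type=vxlan options:remote_ip=192.168.122.22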
Install ovs-cni
Download the ovs-cni source code, which contains the yaml file used to install ovs-cni:
git clone https://github.com/kubevirt/ovs-cni.git
cp ovs-cni/manifests/ovs-cni.yml.in ./ovs-cni.yaml
Replace the following placeholders in ovs-cni.yaml:
# install into the kube-system namespace
NAMESPACE -> kube-system
# image path of ovs-cni-plugin
${OVS_CNI_PLUGIN_IMAGE_REPO}/${OVS_CNI_PLUGIN_IMAGE_NAME}:${OVS_CNI_PLUGIN_IMAGE_VERSION} -> quay.io/kubevirt/ovs-cni-plugin
# CNI binary path; after the ovs-cni-plugin container starts, it copies the ovs binary from the image to this path
CNI_MOUNT_PATH -> /opt/cni/bin
# image pull policy; it does not have to be Always
OVS_CNI_PLUGIN_IMAGE_PULL_POLICY -> Always
# image path of ovs-cni-marker
${OVS_CNI_MARKER_IMAGE_REPO}/${OVS_CNI_MARKER_IMAGE_NAME}:${OVS_CNI_MARKER_IMAGE_VERSION} -> quay.io/kubevirt/ovs-cni-marker
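The substitutions can be made by hand, or with a sed one-liner along these lines; a sketch that assumes GNU sed and that the tokens above appear verbatim in the template:
sed -i \
  -e 's|NAMESPACE|kube-system|g' \
  -e 's|${OVS_CNI_PLUGIN_IMAGE_REPO}/${OVS_CNI_PLUGIN_IMAGE_NAME}:${OVS_CNI_PLUGIN_IMAGE_VERSION}|quay.io/kubevirt/ovs-cni-plugin|g' \
  -e 's|CNI_MOUNT_PATH|/opt/cni/bin|g' \
  -e 's|OVS_CNI_PLUGIN_IMAGE_PULL_POLICY|Always|g' \
  -e 's|${OVS_CNI_MARKER_IMAGE_REPO}/${OVS_CNI_MARKER_IMAGE_NAME}:${OVS_CNI_MARKER_IMAGE_VERSION}|quay.io/kubevirt/ovs-cni-marker|g' \
  ovs-cni.yaml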
Install:
root@master:~/ovs/ovs-cni-master# kubectl apply -f ovs-cni.yaml
daemonset.apps/ovs-cni-amd64 created
clusterrole.rbac.authorization.k8s.io/ovs-cni-marker-cr created
clusterrolebinding.rbac.authorization.k8s.io/ovs-cni-marker-crb created
serviceaccount/ovs-cni-marker created
As shown below, the ovs-cni pods are now Running on all three nodes.
root@master:~/ovs/ovs-cni-master# kubectl get pod -n kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-5b644bc49c-4vfjx   1/1     Running   2          46d
calico-node-5gtw7                          1/1     Running   2          46d
calico-node-mqt6l                          1/1     Running   4          46d
calico-node-t4vjh                          1/1     Running   2          46d
coredns-9d85f5447-4znmx                    1/1     Running   4          42d
coredns-9d85f5447-fh667                    1/1     Running   2          42d
etcd-master                                1/1     Running   8          183d
kube-apiserver-master                      1/1     Running   0          28h
kube-controller-manager-master             1/1     Running   8          183d
kube-multus-ds-amd64-7b4fw                 1/1     Running   0          5h26m
kube-multus-ds-amd64-dq2s8                 1/1     Running   0          5h26m
kube-multus-ds-amd64-sqf8g                 1/1     Running   0          5h26m
kube-proxy-l4wn7                           1/1     Running   5          183d
kube-proxy-prhcm                           1/1     Running   5          183d
kube-proxy-psxqt                           1/1     Running   8          183d
kube-scheduler-master                      1/1     Running   8          183d
ovs-cni-amd64-2wjnx                        1/1     Running   0          4m53s
ovs-cni-amd64-dp7w5                        1/1     Running   0          4m53s
ovs-cni-amd64-l849m                        1/1     Running   0          4m53s
As the ovs-cni.yaml above shows, the ovs-cni pod is configured with two containers: ovs-cni-plugin and ovs-cni-marker. Their roles are described below.
a. ovs-cni-plugin is an initContainer whose job is to copy the ovs binary from the image to /opt/cni/bin on the host. The container exits once the copy is done, which is why READY shows 1/1 (only one container is counted) once the pod is Running. The ovs-cni-plugin related state can be seen in the pod's describe output:
Init Containers:
ovs-cni-plugin:
Container ID: docker://b74f58af95cf2e36be9c34bc168fcf57a51643b4aeaef92dbff7eae1b25951f8
Image: quay.io/kubevirt/ovs-cni-plugin
Image ID: docker-pullable://quay.io/kubevirt/ovs-cni-plugin@sha256:4101c52617efb54a45181548c257a08e3689f634b79b9dfcff42bffd8b25af53
Port: <none>
Host Port: <none>
Command:
cp
/ovs
/host/opt/cni/bin/ovs
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 16 Aug 2020 15:03:48 +0000
Finished: Sun, 16 Aug 2020 15:03:48 +0000
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/host/opt/cni/bin from cnibin (rw)
/var/run/secrets/kubernetes.io/serviceaccount from ovs-cni-marker-token-mg682 (ro)
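After the initContainer completes, the plugin binary should be present on every node. A quick check (the path is the one configured via CNI_MOUNT_PATH above):
ls -l /opt/cni/bin/ovs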
b. ovs-cni-marker reports the OVS bridges it discovers on the node to k8s, exposing them as node resources.
Create bridge br1 on all three nodes:
root@master:~# ovs-vsctl add-br br1
root@master:~# ovs-vsctl show
10e5bd4e-be5c-4f68-ba52-59d428e9dbe3
Bridge "br1"
Port "br1"
Interface "br1"
type: internal
ovs_version: "2.12.0"
The node's Capacity and Allocatable now contain an ovs-cni resource, where "1k" is the number of OVS ports available on bridge br1 (hardcoded in the plugin; there is no parameter to change it).
root@master:~/ovs/ovs-cni-master# kubectl describe node master
...
Capacity:
ovs-cni.network.kubevirt.io/br1: 1k
...
Allocatable:
ovs-cni.network.kubevirt.io/br1: 1k
...
Looking at the marker process, -ovs-socket specifies the OVS DB socket file, which is used to fetch OVS bridge and interface information:
root@master:~# ps -ef | grep marker | grep -v grep
root 23338 23319 0 15:03 ? 00:00:04 ./marker -v 3 -logtostderr -node-name master -ovs-socket /host/var/run/openvswitch/db.sock
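The marker-reported resource can also be queried directly with jsonpath; a sketch, assuming the usual escaping of dots in extended resource names:
kubectl get node master -o jsonpath='{.status.capacity.ovs-cni\.network\.kubevirt\.io/br1}'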
Using ovs-cni
First create a net-attach-def. The following parameters are available (an example of the optional ones follows the list):
name (string, required): the name of the network.
type (string, required): "ovs".
bridge (string, required): name of the bridge to use.
vlan (integer, optional): VLAN ID of attached port. Trunk port if not specified.
mtu (integer, optional): MTU.
trunk (optional): List of VLAN ID's and/or ranges of accepted VLAN ID's.
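For instance, a trunk-port definition using the optional mtu and trunk parameters might look like the sketch below; the name ovs-trunk is made up here, and the trunk syntax ({"id": N} entries and {"minID": N, "maxID": M} ranges) follows the upstream cni-plugin docs:
cat <<EOF | kubectl create -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ovs-trunk
  annotations:
    k8s.v1.cni.cncf.io/resourceName: ovs-cni.network.kubevirt.io/br1
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "ovs",
      "bridge": "br1",
      "mtu": 1450,
      "trunk": [ {"id": 100}, {"minID": 200, "maxID": 210} ]
    }'
EOF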
Create a net-attach-def named ovs-conf, with CNI type ovs, bridge br1, and VLAN id 100:
cat <<EOF | kubectl create -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ovs-conf
  annotations:
    k8s.v1.cni.cncf.io/resourceName: ovs-cni.network.kubevirt.io/br1
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "ovs",
      "bridge": "br1",
      "vlan": 100
    }'
EOF
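The created object can be inspected through the network-attachment-definitions CRD mentioned earlier:
kubectl get network-attachment-definitions ovs-conf -o yaml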
Create a pod whose annotations specify the network ovs-conf:
cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: test
  annotations:
    k8s.v1.cni.cncf.io/networks: ovs-conf
spec:
  containers:
  - name: samplepod
    command: ["/bin/sh", "-c", "sleep 99999"]
    image: alpine
    resources:  # this may be omitted if intel/network-resources-injector is present on the cluster
      limits:
        ovs-cni.network.kubevirt.io/br1: 1
EOF
Check the pod's network interfaces: lo is the default loopback interface, tunl0 is created automatically by the calico network, eth0 is the default pod interface created by calico, and net1 is the newly created interface connected to the OVS bridge (its type is veth; net1's peer is attached to the OVS bridge).
root@master:~# kubectl exec -it test ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if42: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1440 qdisc noqueue state UP
link/ether 1e:6a:39:93:ba:e8 brd ff:ff:ff:ff:ff:ff
inet 10.24.166.138/32 scope global eth0
valid_lft forever preferred_lft forever
6: net1@if5: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
link/ether 02:00:00:75:a6:3e brd ff:ff:ff:ff:ff:ff
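The @if5 suffix on net1 is the ifindex of its veth peer in the host network namespace, so the host-side end can be located by index on the node running the pod; for example (index taken from the output above):
ip -o link | grep '^5:'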
The pod was scheduled to node1. Looking at bridge br1 there, we can see a veth74ca52ee interface with VLAN id 100; its peer is net1 inside the pod.
root@node1:~# ovs-vsctl show
14b23f58-db07-4f45-acf5-a424a31eabee
Bridge "br1"
Port "veth74ca52ee"
tag: 100
Interface "veth74ca52ee"
Port "br1"
Interface "br1"
type: internal
ovs_version: "2.12.0"
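The VLAN tag can also be read directly from OVSDB (the port name is taken from the output above):
ovs-vsctl get Port veth74ca52ee tag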
Multiple interfaces can be added to a pod by specifying the same net-attach-def several times in its annotations, or by specifying several different net-attach-defs.
a. Specify the same net-attach-def ovs-conf twice:
cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: test
  annotations:
    k8s.v1.cni.cncf.io/networks: ovs-conf,ovs-conf
spec:
  containers:
  - name: samplepod
    command: ["/bin/sh", "-c", "sleep 99999"]
    image: alpine
    resources:  # this may be omitted if intel/network-resources-injector is present on the cluster
      limits:
        ovs-cni.network.kubevirt.io/br1: 1
EOF
On node1's bridge br1, two veth interfaces have been added, both tagged with VLAN 100:
root@node1:~# ovs-vsctl show
14b23f58-db07-4f45-acf5-a424a31eabee
Bridge "br1"
Port "br1"
Interface "br1"
type: internal
Port "veth2fa51154"
tag: 100
Interface "veth2fa51154"
Port "veth99fb8572"
tag: 100
Interface "veth99fb8572"
ovs_version: "2.12.0"
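To list only the ports of the bridge without the full topology:
ovs-vsctl list-ports br1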
b. Specify several different net-attach-defs. First create another net-attach-def, ovs-conf1, with VLAN id 200:
cat <<EOF | kubectl create -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ovs-conf1
  annotations:
    k8s.v1.cni.cncf.io/resourceName: ovs-cni.network.kubevirt.io/br1
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "ovs",
      "bridge": "br1",
      "vlan": 200
    }'
EOF
When creating the pod, specify both ovs-conf and ovs-conf1:
cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: test
  annotations:
    k8s.v1.cni.cncf.io/networks: ovs-conf,ovs-conf1
spec:
  containers:
  - name: samplepod
    command: ["/bin/sh", "-c", "sleep 99999"]
    image: alpine
    resources:  # this may be omitted if intel/network-resources-injector is present on the cluster
      limits:
        ovs-cni.network.kubevirt.io/br1: 1
EOF
Two veth interfaces were added to bridge br1 on node1, this time with different VLAN tags: 100 from ovs-conf and 200 from ovs-conf1.
root@node1:~# ovs-vsctl show
14b23f58-db07-4f45-acf5-a424a31eabee
Bridge "br1"
Port "veth1d98bc6f"
tag: 100
Interface "veth1d98bc6f"
Port "br1"
Interface "br1"
type: internal
Port "veth2e3c55ba"
tag: 200
Interface "veth2e3c55ba"
ovs_version: "2.12.0"
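Note that the attached interfaces come up without IP addresses, because the net-attach-defs above configure no IPAM (visible in the earlier ip a output, where net1 has no inet address). For a quick connectivity test, addresses can be assigned by hand; a sketch with example addresses, assuming the pod is granted the NET_ADMIN capability (required for ip addr add) and that 192.168.100.1 is another endpoint on the same bridge and VLAN:
kubectl exec test -- ip addr add 192.168.100.2/24 dev net1
kubectl exec test -- ping -c 1 192.168.100.1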
References
https://github.com/kubevirt/ovs-cni
https://github.com/kubevirt/ovs-cni/blob/master/docs/cni-plugin.md
https://github.com/kubevirt/ovs-cni/blob/master/docs/marker.md
See also: k8s之ovs-cni - 简书 (jianshu.com)