
Creating a Highly Available Kubernetes Cluster with an External etcd Cluster Using kubeadm

Preface

The official Kubernetes documentation is comprehensive and clearly written, with plenty of examples and explanations.
Whatever your situation, it is worth spending a few hours reading through it first, to understand the options available during configuration and the problems you may run into.
This article organizes the deployment workflow based on the "Getting started - Production environment" chapter of the official documentation.
Kubernetes Documentation | Kubernetes
Architecture


[*]OS: Debian 12
[*]CGroup Driver: systemd
[*]Container Runtime: containerd
[*]CNI: Calico
[*]Kubernetes: v1.32.0
Note
Swap must be disabled on every node server (a minimal sketch for this is included at the end of this section)


[*]Other

[*]Notes

[*]This server runs applications external to K8s, such as Nginx and Nexus
[*]All workloads on this server are managed with docker-compose
[*]In the steps that configure K8s itself, "all nodes" does not include this server

[*]Server

[*]vCPU: 2
[*]Memory: 4G

[*]Network: 192.168.1.100 2E:7E:86:3A:A5:20
[*]Port:

[*]8443/tcp: Kubernetes API server load balancing for the cluster


[*]Etcd

[*]Server

[*]vCPU: 1
[*]Memory: 1G

[*]Network

[*]Etcd-01: 192.168.1.101 2E:7E:86:3A:A5:21
[*]Etcd-02: 192.168.1.102 2E:7E:86:3A:A5:22
[*]Etcd-03: 192.168.1.103 2E:7E:86:3A:A5:23

[*]Port:

[*]2379/tcp: etcd HTTP API
[*]2380/tcp: etcd peer communication


[*]Master

[*]Server

[*]vCPU: 4
[*]Memory: 8G

[*]Network

[*]Master-01: 192.168.1.104 2E:7E:86:3A:A5:24
[*]Master-02: 192.168.1.105 2E:7E:86:3A:A5:25
[*]Master-03: 192.168.1.106 2E:7E:86:3A:A5:26

[*]Port:

[*]179/tcp: Calico BGP
[*]6443/tcp: Kubernetes APIServer
[*]10250/tcp: kubelet API


[*]Node

[*]Server

[*]vCPU: 4
[*]Memory: 8G

[*]Network

[*]Node-01: 192.168.1.107 2E:7E:86:3A:A5:27
[*]Node-02: 192.168.1.108 2E:7E:86:3A:A5:28
[*]Node-03: 192.168.1.109 2E:7E:86:3A:A5:29

[*]Port:

[*]179/tcp: Calico BGP
[*]10250/tcp: kubelet API


Configure the Base Environment

Notes
All nodes
apt update
apt upgrade
apt install curl apt-transport-https ca-certificates gnupg2 software-properties-common vim
mkdir -p /etc/apt/keyrings
curl -fsSL https://mirrors.ustc.edu.cn/kubernetes/core:/stable:/v1.32/deb/Release.key | gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
chmod 644 /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://mirrors.ustc.edu.cn/kubernetes/core:/stable:/v1.32/deb/ /" | tee /etc/apt/sources.list.d/kubernetes.list
curl -fsSL https://mirrors.ustc.edu.cn/docker-ce/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [signed-by=/etc/apt/keyrings/docker.asc] https://mirrors.ustc.edu.cn/docker-ce/linux/debian bookworm stable" | tee /etc/apt/sources.list.d/docker.list
apt update
apt install containerd.io
mkdir -p /etc/containerd
containerd config default | tee /etc/containerd/config.toml
systemctl restart containerd
apt install kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl
Enable IPv4 forwarding
Edit /etc/sysctl.conf, find the line below, and uncomment it
net.ipv4.ip_forward=1
Run sysctl -p to apply the change
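Alternatively, a minimal sketch that sets the same parameter through a sysctl.d drop-in (the file name 99-kubernetes.conf is only an example):
cat << EOF > /etc/sysctl.d/99-kubernetes.conf
net.ipv4.ip_forward = 1
EOF
sysctl --system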
Create the crictl configuration
cat << EOF > /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF
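With /etc/crictl.yaml in place, a quick sanity check that crictl can reach containerd (both commands should return without connection errors):
crictl info
crictl ps -a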
If you need to pull from container registries through a proxy server, configure the proxy for containerd
mkdir -p /etc/systemd/system/containerd.service.d
cat << EOF > /etc/systemd/system/containerd.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://username:password@proxy-server-ip:port"
Environment="HTTPS_PROXY=http://username:password@proxy-server-ip:port"
Environment="NO_PROXY=localhost,127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
EOF
systemctl daemon-reload
systemctl restart containerd.service
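To verify that the proxy drop-in was picked up, one option is to inspect the environment systemd now passes to the containerd unit:
systemctl show --property=Environment containerd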

Known Issues

When using systemd as the cgroup driver with containerd as the CRI runtime,
you need to edit /etc/containerd/config.toml and add the following setting
Related article: Configuring the systemd cgroup driver | Kubernetes
...
    SystemdCgroup = true
Run systemctl restart containerd
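If you prefer not to edit the file by hand, a minimal sketch that flips the flag in the default config generated earlier (it assumes the stock value SystemdCgroup = false) and restarts containerd:
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
systemctl restart containerd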
Or apply the workaround from the following article instead
Related article: Why does etcd fail with Debian/bullseye kernel? - General Discussions - Discuss Kubernetes
cat /etc/default/grub
# Source:
# GRUB_CMDLINE_LINUX_DEFAULT="quiet"
# Modify:
GRUB_CMDLINE_LINUX_DEFAULT="quiet systemd.unified_cgroup_hierarchy=0"
Run update-grub and reboot
Configure the etcd Nodes

Configure kubelet as the Service Manager for etcd

Notes
All etcd nodes
apt update
apt install etcd-client
mkdir -p /etc/systemd/system/kubelet.service.d
cat << EOF > /etc/systemd/system/kubelet.service.d/kubelet.conf
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
authentication:
  anonymous:
    enabled: false
  webhook:
    enabled: false
authorization:
  mode: AlwaysAllow
cgroupDriver: systemd
address: 127.0.0.1
containerRuntimeEndpoint: unix:///var/run/containerd/containerd.sock
staticPodPath: /etc/kubernetes/manifests
EOF
cat << EOF > /etc/systemd/system/kubelet.service.d/20-etcd-service-manager.conf
[Service]
Environment="KUBELET_CONFIG_ARGS=--config=/etc/systemd/system/kubelet.service.d/kubelet.conf"
ExecStart=
ExecStart=/usr/bin/kubelet \$KUBELET_CONFIG_ARGS
Restart=always
EOF
systemctl daemon-reload
systemctl restart kubelet
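After the restart it is worth confirming that kubelet is running in this standalone mode; at this point it simply waits for static Pod manifests. A quick check:
systemctl is-active kubelet
journalctl -u kubelet --no-pager -n 20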

Create Configuration Files for kubeadm

Notes
Etcd-01; this node will distribute the certificates and configuration to the other etcd nodes
It also acts as the CA
Generate the CA
kubeadm init phase certs etcd-ca
This generates the following files

[*]/etc/kubernetes/pki/etcd/ca.crt
[*]/etc/kubernetes/pki/etcd/ca.key
To make the following steps easier, first export the etcd node information as environment variables
export HOST0=192.168.1.101
export HOST1=192.168.1.102
export HOST2=192.168.1.103
export NAME0="etcd-01"
export NAME1="etcd-02"
export NAME2="etcd-03"
Generate a kubeadm configuration file for each etcd member
HOSTS=(${HOST0} ${HOST1} ${HOST2})
NAMES=(${NAME0} ${NAME1} ${NAME2})

for i in "${!HOSTS[@]}"; do
HOST=${HOSTS[$i]}
NAME=${NAMES[$i]}
mkdir -p /tmp/${HOST}
cat << EOF > /tmp/${HOST}/kubeadmcfg.yaml
---
apiVersion: "kubeadm.k8s.io/v1beta4"
kind: InitConfiguration
nodeRegistration:
  name: ${NAME}
localAPIEndpoint:
  advertiseAddress: ${HOST}
---
apiVersion: "kubeadm.k8s.io/v1beta4"
kind: ClusterConfiguration
etcd:
  local:
    serverCertSANs:
      - "${HOST}"
    peerCertSANs:
      - "${HOST}"
    extraArgs:
      - name: initial-cluster
        value: ${NAMES[0]}=https://${HOSTS[0]}:2380,${NAMES[1]}=https://${HOSTS[1]}:2380,${NAMES[2]}=https://${HOSTS[2]}:2380
      - name: initial-cluster-state
        value: new
      - name: name
        value: ${NAME}
      - name: listen-peer-urls
        value: https://${HOST}:2380
      - name: listen-client-urls
        value: https://${HOST}:2379
      - name: advertise-client-urls
        value: https://${HOST}:2379
      - name: initial-advertise-peer-urls
        value: https://${HOST}:2380
EOF
done
Create certificates for each member
kubeadm init phase certs etcd-server --config=/tmp/${HOST2}/kubeadmcfg.yaml
kubeadm init phase certs etcd-peer --config=/tmp/${HOST2}/kubeadmcfg.yaml
kubeadm init phase certs etcd-healthcheck-client --config=/tmp/${HOST2}/kubeadmcfg.yaml
kubeadm init phase certs apiserver-etcd-client --config=/tmp/${HOST2}/kubeadmcfg.yaml
cp -R /etc/kubernetes/pki /tmp/${HOST2}/
# Clear certs that should not be reused for the next member
find /etc/kubernetes/pki -not -name ca.crt -not -name ca.key -type f -delete

kubeadm init phase certs etcd-server --config=/tmp/${HOST1}/kubeadmcfg.yaml
kubeadm init phase certs etcd-peer --config=/tmp/${HOST1}/kubeadmcfg.yaml
kubeadm init phase certs etcd-healthcheck-client --config=/tmp/${HOST1}/kubeadmcfg.yaml
kubeadm init phase certs apiserver-etcd-client --config=/tmp/${HOST1}/kubeadmcfg.yaml
cp -R /etc/kubernetes/pki /tmp/${HOST1}/
find /etc/kubernetes/pki -not -name ca.crt -not -name ca.key -type f -delete

kubeadm init phase certs etcd-server --config=/tmp/${HOST0}/kubeadmcfg.yaml
kubeadm init phase certs etcd-peer --config=/tmp/${HOST0}/kubeadmcfg.yaml
kubeadm init phase certs etcd-healthcheck-client --config=/tmp/${HOST0}/kubeadmcfg.yaml
kubeadm init phase certs apiserver-etcd-client --config=/tmp/${HOST0}/kubeadmcfg.yaml

# Clear the CA key from the copies that go to the other members
find /tmp/${HOST2} -name ca.key -type f -delete
find /tmp/${HOST1} -name ca.key -type f -delete
Move the certificates to the corresponding member servers
scp -r /tmp/${HOST2}/pki root@${HOST2}:/etc/kubernetes/
scp /tmp/${HOST2}/kubeadmcfg.yaml root@${HOST2}:~/
scp -r /tmp/${HOST1}/pki root@${HOST1}:/etc/kubernetes/
scp /tmp/${HOST1}/kubeadmcfg.yaml root@${HOST1}:~/
mv /tmp/${HOST0}/kubeadmcfg.yaml ~/
rm -rf /tmp/${HOST2}
rm -rf /tmp/${HOST1}
rm -rf /tmp/${HOST0}
At this point the file layout on each of the three etcd nodes should look like this
/root
└── kubeadmcfg.yaml
---
/etc/kubernetes/pki
├── apiserver-etcd-client.crt
├── apiserver-etcd-client.key
└── etcd
    ├── ca.crt
    ├── ca.key # present only on the CA node, i.e. etcd-01
    ├── healthcheck-client.crt
    ├── healthcheck-client.key
    ├── peer.crt
    ├── peer.key
    ├── server.crt
    └── server.key
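A quick way to compare the actual layout on each node against the tree above (a sketch using find, since tree may not be installed):
ls ~/kubeadmcfg.yaml
find /etc/kubernetes/pki -type f | sort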

Create the Static Pod Manifests

Notes
All etcd nodes
kubeadm init phase etcd local --config=/root/kubeadmcfg.yaml
Check Cluster Health

Replace ${HOST0} with the IP of the node you want to check
ETCDCTL_API=3 etcdctl \
--cert /etc/kubernetes/pki/etcd/peer.crt \
--key /etc/kubernetes/pki/etcd/peer.key \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--endpoints https://${HOST0}:2379 endpoint health
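Beyond per-endpoint health, a sketch that queries all three members at once and prints leader and DB-size information with the same client certificates:
ETCDCTL_API=3 etcdctl \
--cert /etc/kubernetes/pki/etcd/peer.crt \
--key /etc/kubernetes/pki/etcd/peer.key \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--endpoints https://192.168.1.101:2379,https://192.168.1.102:2379,https://192.168.1.103:2379 \
endpoint status -w table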

Creating a Highly Available Cluster with kubeadm

Notes

If during setup you need to completely reset the control plane configuration, keep at least one node that can still reach the cluster and run the following procedure on it
kubectl delete pods,nodes,namespaces,deployments,services --all --all-namespaces --force
kubectl delete -f tigera-operator.yaml --force
kubectl delete -f custom-resources.yaml --force
kubeadm reset --cleanup-tmp-dir -f
rm -rf /etc/cni/net.d/*
rm -rf ~/.kube
systemctl restart kubelet containerd
Create a Load Balancer for kube-apiserver

Notes
This article uses Nginx as the load balancer
Nginx configuration
http {
    ...
}

stream {
    upstream apiserver {
      server 192.168.1.104:6443 weight=5 max_fails=3 fail_timeout=30s; # Master-01
      server 192.168.1.105:6443 weight=5 max_fails=3 fail_timeout=30s; # Master-02
      server 192.168.1.106:6443 weight=5 max_fails=3 fail_timeout=30s; # Master-03
    }

    server {
      listen 8443;
      proxy_pass apiserver;
    }
}
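Before relying on the load balancer, validate and reload the Nginx configuration; once the control plane is up, the listener can also be probed directly (a sketch, assuming anonymous access to /version is left at the kubeadm default):
nginx -t
systemctl reload nginx
# After the control plane has been initialized:
curl -k https://192.168.1.100:8443/version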

Configure External etcd for the Control Plane Nodes

Notes
Any one etcd node plus the primary control plane node; in this article, Etcd-01 and Master-01
Copy the following files from any etcd node in the cluster to the primary control plane node
scp /etc/kubernetes/pki/etcd/ca.crt /etc/kubernetes/pki/apiserver-etcd-client.crt /etc/kubernetes/pki/apiserver-etcd-client.key root@192.168.1.104:~
On the primary control plane node, move the files into place
mkdir -p /etc/kubernetes/pki/etcd
mv ~/ca.crt /etc/kubernetes/pki/etcd/
mv ~/apiserver-etcd-client.crt /etc/kubernetes/pki/
mv ~/apiserver-etcd-client.key /etc/kubernetes/pki/
Create kubeadm-config.yaml with the following content

[*]controlPlaneEndpoint: the load balancer address
[*]etcd

[*]external

[*]endpoints: the list of etcd nodes


[*]networking

[*]podSubnet: the Pod IP CIDR

---
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
kubernetesVersion: v1.32.0
controlPlaneEndpoint: 192.168.1.100:8443
etcd:
  external:
    endpoints:
      - https://192.168.1.101:2379
      - https://192.168.1.102:2379
      - https://192.168.1.103:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
networking:
  dnsDomain: cluster.local
  podSubnet: 10.244.0.0/24
  serviceSubnet: 10.96.0.0/16
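Before the real initialization, the file can be sanity-checked; a sketch using kubeadm's built-in validation (available in recent kubeadm releases) and a dry run:
kubeadm config validate --config kubeadm-config.yaml
kubeadm init --config kubeadm-config.yaml --dry-run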

Initialize the Primary Control Plane

Notes
The primary control plane node; in this article, Master-01


[*]--upload-certs: upload the certificates shared between control plane nodes to the kubeadm-certs Secret

[*]The kubeadm-certs Secret and its decryption key expire after two hours
[*]To re-upload the certificates and generate a new decryption key, run kubeadm init phase upload-certs --upload-certs on a control plane node that has already joined the cluster

kubeadm init --config kubeadm-config.yaml --upload-certs
When it finishes, the output should look similar to the following
...
Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of control-plane nodes running the following command on each as root:

  kubeadm join 192.168.1.100:8443 --token 7r34LU.iLiRgu2qHdAeeanS --discovery-token-ca-cert-hash sha256:9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08 --control-plane --certificate-key 03d66dd08835c1ca3f128cceacd1f31ac94163096b20f445ae84285bc0832d72

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.1.100:8443 --token 7r34LU.iLiRgu2qHdAeeanS --discovery-token-ca-cert-hash sha256:9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
Save the console output above; these commands will be used later to join the other control plane nodes and worker nodes to the cluster
As the output suggests, copy the kubeconfig so kubectl can use it
mkdir -p ~/.kube
cp /etc/kubernetes/admin.conf ~/.kube/config
Apply the CNI Plugin
Because the manifest is too large, kubectl apply fails with the error shown below; use kubectl create or kubectl replace instead
Note
Make sure the IP CIDR configured under calicoNetwork in custom-resources.yaml matches the cluster's podSubnet
# kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/refs/heads/release-v3.29/manifests/tigera-operator.yaml
# The CustomResourceDefinition "installations.operator.tigera.io" is invalid: metadata.annotations: Too long: may not be more than 262144 bytes
wget https://raw.githubusercontent.com/projectcalico/calico/refs/heads/release-v3.29/manifests/tigera-operator.yaml
wget https://raw.githubusercontent.com/projectcalico/calico/refs/heads/release-v3.29/manifests/custom-resources.yaml
kubectl create -f tigera-operator.yaml
kubectl create -f custom-resources.yaml
Run the following to watch the control plane component Pods start up
kubectl get pod -A
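Once the Tigera operator has rolled Calico out, the nodes should become Ready; a couple of quick checks (tigera-operator and calico-system are the operator's default namespaces):
kubectl get nodes -o wide
kubectl get pods -n tigera-operator
kubectl get pods -n calico-system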

Initialize the Other Control Plane Nodes

Notes
The control plane nodes other than the primary one; in this article, Master-02 and Master-03
Control plane nodes joined with the kubeadm join command will have their kubeconfig synced to /etc/kubernetes/admin.conf
Run the command from the output above on each of the other control plane nodes

[*]--control-plane: tells kubeadm join to create a new control plane
[*]--certificate-key xxx: download the control plane certificates from the cluster's kubeadm-certs Secret and decrypt them with the given key
kubeadm join 192.168.1.100:8443 --token 7r34LU.iLiRgu2qHdAeeanS --discovery-token-ca-cert-hash sha256:9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08 --control-plane --certificate-key 03d66dd08835c1ca3f128cceacd1f31ac94163096b20f445ae84285bc0832d72
As the output suggests, copy the kubeconfig so kubectl can use it
mkdir -p ~/.kube
cp /etc/kubernetes/admin.conf ~/.kube/config
Initialize the Worker Nodes

Notes
All worker nodes
Nodes joined with the kubeadm join command will have their kubeconfig synced to /etc/kubernetes/kubelet.conf
Run the command from the output above on each worker node
kubeadm join 192.168.1.100:8443 --token 7r34LU.iLiRgu2qHdAeeanS --discovery-token-ca-cert-hash sha256:9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
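After all nodes have joined, a final sanity check from any control plane node; every node should eventually report Ready and all Pods should reach Running:
kubectl get nodes -o wide
kubectl get pods -A -o wide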