Overview:
Over the years I have tried many ways to deploy Kubernetes, such as Ansible playbooks, kubeadm, Cluster API, shell scripts, and Terraform, but none of them fully met my needs, especially when it comes to maintaining the Kubernetes lifecycle and improving security and stability.
Then I came across Talos Linux, and it immediately caught my attention. It is a Linux distribution built specifically for Kubernetes: it supports cloud platforms, bare metal, and virtualization platforms, covering a wide range of environments; it does not need a management cluster the way Cluster API does (think of VMware's Supervisor Cluster); and it is extremely minimal, which both shrinks the attack surface and improves stability and performance.
It reminded me of 2008, when VMware released the first ESXi 3.5. Compared with the earlier ESX, it dropped the Red Hat Linux based service console so that the VMware agents could run directly on the VMkernel, and the system shrank from 2 GB to about 150 MB. That change laid the foundation for ESXi's reputation for security, stability, and high performance, and it proved the lasting advantage of a minimal architecture. So when I saw Talos Linux, my first impression was that this is the "Linux for Kubernetes" I had been looking for. I tested and validated it right away, and in this post I share the whole deployment process for your reference.
About Talos Linux:
- Supports cloud platforms, bare metal, and virtualization platforms
- All system management is done through an API; there is no SSH, shell, or console
- Production ready: it runs some of the largest Kubernetes clusters in the world
- An open-source project from the Sidero Labs team
Related tools:
- Talos Linux Image Factory: an online image build service that lets you add system extensions (vmtoolsd-guest-agent, NVIDIA Container Toolkit, qemu-guest-agent, and so on).
- Talos Docs: the official documentation. It is well structured and very detailed; I strongly recommend reading it once before getting hands-on.
Environment requirements:
To validate the full architecture and deployment, this guide follows a production-style layout; if you only want to get a feel for Talos, you can also run it in Docker (see the sketch below).
- One Linux VM as the management host, with talosctl and the Kubernetes client tools (kubectl) installed;
- Seven VMs / physical machines / cloud instances; to demonstrate a production deployment I use seven VMware vSphere virtual machines (3 control-plane nodes + 4 workers);
- One VyOS router, used together with Cilium BGP to provide LoadBalancer services and routable Pod IPs; this is optional.
- A network with DHCP, used only so that Talos Linux can obtain an IP address on first boot; it is no longer needed after initialization.
Note: if your environment has no or restricted internet access, refer to the official Talos Linux air-gapped deployment guide (https://www.talos.dev/v1.7/advanced/air-gapped/)
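If you only want a quick, throwaway trial without any of the infrastructure above, talosctl can spin up a local Docker-backed cluster. A minimal sketch, assuming Docker is already installed (the cluster name is arbitrary):
# create a small local cluster (one control-plane node plus one worker) as Docker containers
talosctl cluster create --name talos-sandbox --workers 1
# tear everything down when finished
talosctl cluster destroy --name talos-sandbox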
Architecture overview
This guide uses the components I personally use most often; adjust them to your own organization's standards. It covers the following:
- Cilium as the CNI, replacing kube-proxy, advertising LoadBalancer IPs / Pod IPs, and providing richer network access-control policies and observability;
- An external NFS server as the CSI storage backend;
- Ingress-Nginx as the layer-7 proxy;
- kube-prometheus-stack as the monitoring platform;
- Loki as the logging platform;
- All components are deployed with helm, with parameters passed via --set;
Building the Talos Linux image
To simulate a bare-metal environment, we build a Talos Linux ISO tailored to our needs online:
- Open the Talos Linux Image Factory website and choose the "Bare-metal Machine" type
- Select the Talos version
- Select the CPU architecture
- Select the system extensions you need; since I run on VMware, I chose vmtoolsd-guest-agent, btrfs, and drbd
- Download the resulting ISO and upload it to a shared vSphere datastore
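The Image Factory also serves the built image at a stable URL keyed by a schematic ID, which the site displays once you have picked your extensions, so the download can be scripted. A sketch, with <schematic-id> as a placeholder for the ID the factory shows you:
# <schematic-id> is the value printed by the Image Factory for your extension selection
curl -LO https://factory.talos.dev/image/<schematic-id>/v1.7.6/metal-amd64.iso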
Preparing the virtual machines
Preparing the VMs is straightforward: set the guest OS to "Other 5.x or later Linux (64-bit)", size CPU, memory, and disks according to your production needs, and mount the freshly downloaded "metal-amd64.iso" in the virtual CD drive. Power the VMs on and, from the VM console, record the IP address each one obtains from DHCP; you will need these addresses when applying the initial Talos Linux configuration in the next steps.
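While a VM is still in maintenance mode (booted from the ISO, before any configuration is applied), talosctl can already query it over the insecure maintenance API. This is a handy way to double-check a node and the disk Talos will install to; a sketch, with <dhcp-ip> standing in for the address shown on the console:
# list the disks the node sees; the install defaults to /dev/sda unless overridden in the machine config
talosctl disks --insecure --nodes <dhcp-ip>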
IP address plan
Hostname | IP address | Notes |
---|---|---|
N/A | 10.40.45.10 | Master VIP |
talos-cp01 | 10.40.45.11 | Master |
talos-cp02 | 10.40.45.12 | Master |
talos-cp03 | 10.40.45.13 | Master |
talos-w01 | 10.40.45.21 | Worker |
talos-w02 | 10.40.45.22 | Worker |
talos-w03 | 10.40.45.23 | Worker |
talos-w04 | 10.40.45.24 | Worker |
Setting up the management VM
The management VM needs tools such as talosctl, kubectl, and helm. They are all single Go binaries, so I will not cover installation in detail; one possible way is sketched below.
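For completeness, one way to install the three tools on a Linux x86-64 management host; the talosctl and helm one-liners are the projects' official install scripts, and paths and versions can of course be adjusted:
# talosctl
curl -sL https://talos.dev/install | sh
# kubectl (latest stable release)
curl -LO "https://dl.k8s.io/release/$(curl -Ls https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -m 0755 kubectl /usr/local/bin/kubectl
# helm
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash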
Preparing the machine configuration
Generate the cluster configuration with talosctl. The control-plane VIP must be specified here; you can use either an external load balancer or the built-in (layer-2) VIP feature, and this guide uses the built-in VIP. In the command below, talos-cluster01 is the cluster name and 10.40.45.10 is the Master VIP address.
$ talosctl gen config talos-cluster01 https://10.40.45.10:6443
generating PKI and tokens
created /home/liguoqiang/talos-cluster01/controlplane.yaml
created /home/liguoqiang/talos-cluster01/worker.yaml
created /home/liguoqiang/talos-cluster01/talosconfig
Preparing the patch files for the Talos Linux nodes
The controlplane.yaml and worker.yaml generated in the previous step are only a baseline. To meet the following requirements, we customize the configuration further:
- Static IP addresses
- Static hosts entries
- The control-plane VIP
- Rotating kubelet server certificates
- A separately deployed CNI (Cilium in this case)
- etcd metrics enabled
Of course, you can skip Cilium and keep the default CNI, in which case only part of this configuration is needed.
Below is an example talos-cp01.patch file; you need to prepare a patch file for each of the three control-plane nodes.
machine:
network:
hostname: talos-cp01
interfaces:
- interface: eth0
addresses:
- 10.40.45.11/24
routes:
- network: 0.0.0.0/0
gateway: 10.40.45.1
dhcp: false
vip:
ip: 10.40.45.10
nameservers:
- 10.40.45.2
extraHostEntries:
- ip: 10.40.45.11
aliases:
- talos-cp01
- ip: 10.40.45.12
aliases:
- talos-cp02
- ip: 10.40.45.13
aliases:
- talos-cp03
- ip: 10.40.45.21
aliases:
- talos-w01
- ip: 10.40.45.22
aliases:
- talos-w02
- ip: 10.40.45.23
aliases:
- talos-w03
- ip: 10.40.45.24
aliases:
- talos-w04
time:
servers:
- 10.40.45.2
kubelet:
extraArgs:
rotate-server-certificates: true
cluster:
network:
cni:
name: none
proxy:
disabled: true
extraManifests:
- https://raw.githubusercontent.com/alex1989hu/kubelet-serving-cert-approver/main/deploy/standalone-install.yaml
- https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
etcd:
extraArgs:
listen-metrics-urls: http://0.0.0.0:2381
Below is an example talos-w01.patch file; you need to prepare a patch file for each of the four workers.
machine:
network:
hostname: talos-w01
interfaces:
- interface: eth0
addresses:
- 10.40.45.21/24
routes:
- network: 0.0.0.0/0
gateway: 10.40.45.1
dhcp: false
nameservers:
- 10.40.45.2
extraHostEntries:
- ip: 10.40.45.11
aliases:
- talos-cp01
- ip: 10.40.45.12
aliases:
- talos-cp02
- ip: 10.40.45.13
aliases:
- talos-cp03
- ip: 10.40.45.21
aliases:
- talos-w01
- ip: 10.40.45.22
aliases:
- talos-w02
- ip: 10.40.45.23
aliases:
- talos-w03
- ip: 10.40.45.24
aliases:
- talos-w04
time:
servers:
- 10.40.45.2
kubelet:
extraArgs:
rotate-server-certificates: true
Generating per-node configuration files from the patches
Use the "talosctl machineconfig patch" command to render a configuration file for each node, every one with its own static IP address.
talosctl machineconfig patch controlplane.yaml --patch @talos-cp01.patch --output talos-cp01.yaml
talosctl machineconfig patch controlplane.yaml --patch @talos-cp02.patch --output talos-cp02.yaml
talosctl machineconfig patch controlplane.yaml --patch @talos-cp03.patch --output talos-cp03.yaml
talosctl machineconfig patch worker.yaml --patch @talos-w01.patch --output talos-w01.yaml
talosctl machineconfig patch worker.yaml --patch @talos-w02.patch --output talos-w02.yaml
talosctl machineconfig patch worker.yaml --patch @talos-w03.patch --output talos-w03.yaml
talosctl machineconfig patch worker.yaml --patch @talos-w04.patch --output talos-w04.yaml
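Before applying anything, the rendered files can be sanity-checked with talosctl validate (metal mode matches the bare-metal ISO built earlier); for example:
for f in talos-cp01.yaml talos-cp02.yaml talos-cp03.yaml talos-w01.yaml talos-w02.yaml talos-w03.yaml talos-w04.yaml; do
  talosctl validate --config "$f" --mode metal
done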
Applying the configuration to each node
Using the IP address each node obtained from DHCP, apply its configuration with "talosctl apply-config". The node then initializes itself, which includes installing the system to disk and switching to its static address.
talosctl apply-config --insecure --nodes 10.40.45.103 --file talos-cp01.yaml
talosctl apply-config --insecure --nodes 10.40.45.104 --file talos-cp02.yaml
talosctl apply-config --insecure --nodes 10.40.45.105 --file talos-cp03.yaml
talosctl apply-config --insecure --nodes 10.40.45.106 --file talos-w01.yaml
talosctl apply-config --insecure --nodes 10.40.45.107 --file talos-w02.yaml
talosctl apply-config --insecure --nodes 10.40.45.108 --file talos-w03.yaml
talosctl apply-config --insecure --nodes 10.40.45.109 --file talos-w04.yaml
Bootstrapping Kubernetes
Once the nodes have finished initializing, we can bootstrap the Kubernetes cluster.
talosctl --talosconfig=./talosconfig --nodes 10.40.45.11 -e 10.40.45.11 bootstrap
Viewing the node dashboard with talosctl
talosctl --talosconfig=./talosconfig config endpoint 10.40.45.11 10.40.45.12 10.40.45.13
talosctl config add talos-cluster01 --talosconfig ./talosconfig
export TALOSCONFIG=$(pwd)/talosconfig
talosctl dashboard -n 10.40.45.11
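Beyond the dashboard, talosctl health runs a set of cluster-wide checks (etcd, kubelets, API server reachability, and so on) and is a quick way to confirm the bootstrap succeeded:
talosctl --talosconfig=./talosconfig health -n 10.40.45.11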
Inspecting etcd with talosctl
List the etcd members:
$ talosctl etcd members -n 10.40.45.10
NODE ID HOSTNAME PEER URLS CLIENT URLS LEARNER
10.40.45.10 5def1d04742456ae talos-cp01 https://10.40.45.11:2380 https://10.40.45.11:2379 false
10.40.45.10 93715bd7d5e9571e talos-cp03 https://10.40.45.13:2380 https://10.40.45.13:2379 false
10.40.45.10 a99d303c4da1378d talos-cp02 https://10.40.45.12:2380 https://10.40.45.12:2379 false
View the etcd logs:
$ talosctl logs etcd -n 10.40.45.11
10.40.45.11: 426881}
10.40.45.11: {"level":"info","ts":"2024-08-18T14:59:55.159855Z","caller":"mvcc/kvstore_compaction.go:68","msg":"finished scheduled compaction","compact-revision":5426881,"took":"21.92027ms","hash":3003988622,"current-db-size-bytes":30232576,"current-db-size":"30 MB","current-db-size-in-use-bytes":8912896,"current-db-size-in-use":"8.9 MB"}
10.40.45.11: {"level":"info","ts":"2024-08-18T14:59:55.159917Z","caller":"mvcc/hash.go:137","msg":"storing new hash","hash":3003988622,"revision":5426881,"compact-revision":5425854}
10.40.45.11: {"level":"info","ts":"2024-08-18T15:04:55.148871Z","caller":"mvcc/index.go:214","msg":"compact tree index","revision":5427912}
Listing services with talosctl
$ talosctl service -n 10.40.45.11
NODE SERVICE STATE HEALTH LAST CHANGE LAST EVENT
10.40.45.11 apid Running OK 574h24m12s ago Health check successful
10.40.45.11 containerd Running OK 574h24m20s ago Health check successful
10.40.45.11 cri Running OK 574h24m15s ago Health check successful
10.40.45.11 dashboard Running ? 574h24m19s ago Process Process(["/sbin/dashboard"]) started with PID 2730
10.40.45.11 etcd Running OK 189h17m59s ago Health check successful
10.40.45.11 kubelet Running OK 574h23m55s ago Health check successful
10.40.45.11 machined Running OK 574h24m27s ago Health check successful
10.40.45.11 syslogd Running OK 574h24m26s ago Health check successful
10.40.45.11 trustd Running OK 574h24m10s ago Health check successful
10.40.45.11 udevd Running OK 574h24m21s ago Health check successful
Fetching the kubeconfig
Once the cluster is up, we can fetch the kubeconfig and manage the Kubernetes cluster with kubectl.
talosctl --talosconfig=./talosconfig kubeconfig -e 10.40.45.10 -n 10.40.45.10
kubectl get nodes
Preparing the namespaces
My environment uses NFS for container storage, kube-prometheus-stack for monitoring, ingress-nginx as the entry point, and Cilium as the CNI with BGP route advertisement towards the VyOS router, so I apply the configuration below.
Plan this according to your own environment; what follows is only an example.
kubectl create ns monitoring
kubectl label ns monitoring pod-security.kubernetes.io/enforce=privileged
kubectl create ns nfs-storage
kubectl label ns nfs-storage pod-security.kubernetes.io/enforce=privileged
kubectl create ns ingress-nginx
kubectl label ns ingress-nginx pod-security.kubernetes.io/enforce=privileged
for node in talos-w01 talos-w02 talos-w03 talos-w04; do kubectl label nodes $node node-role.kubernetes.io/worker=worker; done
Deploying nfs-provisioner
helm upgrade --install nfs-subdir-external-provisioner nfs-subdir-external-provisioner \
--repo https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner \
-n nfs-storage \
--set nfs.server=10.40.45.3 \
--set nfs.path=/volume1/k8s-nfs \
--set storageClass.name=corp-nfs-storage \
--set storageClass.archiveOnDelete=false \
--set storageClass.accessModes=ReadWriteMany \
--set replicaCount=3
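To confirm that dynamic provisioning works end to end, a throwaway PVC can be created against the corp-nfs-storage class defined above once the provisioner pods are Running (they will stay Pending until the CNI is deployed further below). The PVC name here is arbitrary:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-test
  namespace: nfs-storage
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: corp-nfs-storage
  resources:
    requests:
      storage: 1Gi
EOF
kubectl -n nfs-storage get pvc nfs-test   # should reach Bound
kubectl -n nfs-storage delete pvc nfs-test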
Deploying kube-prometheus-stack
helm upgrade --install kube-prometheus-stack kube-prometheus-stack \
--repo https://prometheus-community.github.io/helm-charts \
-n monitoring --create-namespace \
--set prometheus.enabled=true \
--set prometheus.prometheusSpec.retention=7d \
--set prometheus.service.type=ClusterIP \
--set prometheus.prometheusSpec.resources.requests.cpu=200m \
--set prometheus.prometheusSpec.resources.limits.cpu=500m \
--set prometheus.prometheusSpec.resources.requests.memory=512Mi \
--set prometheus.prometheusSpec.resources.limits.memory=1Gi \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=corp-nfs-storage \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=30Gi \
--set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false \
--set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false \
--set prometheus.prometheusSpec.probeSelectorNilUsesHelmValues=false \
--set prometheus.prometheusSpec.ruleSelectorNilUsesHelmValues=false \
--set prometheus.ingress.enabled=true \
--set prometheus.ingress.hosts='{prometheus.talos.corp.local}' \
--set prometheus.ingress.paths='{/}' \
--set prometheus.ingress.pathType=Prefix \
--set prometheus.ingress.ingressClassName=nginx \
--set alertmanager.enabled=true \
--set alertmanager.service.type=ClusterIP \
--set alertmanager.alertmanagerSpec.resources.requests.cpu=100m \
--set alertmanager.alertmanagerSpec.resources.limits.cpu=300m \
--set alertmanager.alertmanagerSpec.resources.requests.memory=256Mi \
--set alertmanager.alertmanagerSpec.resources.limits.memory=512Mi \
--set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.storageClassName=corp-nfs-storage \
--set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.resources.requests.storage=5Gi \
--set alertmanager.ingress.enabled=true \
--set alertmanager.ingress.hosts='{alertmanager.talos.corp.local}' \
--set alertmanager.ingress.paths='{/}' \
--set alertmanager.ingress.pathType=Prefix \
--set alertmanager.ingress.ingressClassName=nginx \
--set grafana.enabled=true \
--set grafana.adminPassword=VMware1! \
--set grafana.ingress.enabled=true \
--set grafana.ingress.hosts='{grafana.talos.corp.local}' \
--set grafana.ingress.paths='{/}' \
--set grafana.ingress.pathType=Prefix \
--set grafana.ingress.ingressClassName=nginx \
--set grafana.persistence.type=pvc \
--set grafana.persistence.enabled=true \
--set grafana.persistence.storageClassName=corp-nfs-storage \
--set grafana.persistence.size=2Gi \
--set kubeEtcd.enabled=true \
--set kubeEtcd.endpoints[0]=10.40.45.11 \
--set kubeEtcd.endpoints[1]=10.40.45.12 \
--set kubeEtcd.endpoints[2]=10.40.45.13 \
--set kubeEtcd.serviceMonitor.enabled=true
Deploying Cilium
helm upgrade --install \
--repo https://helm.cilium.io \
cilium \
cilium \
--version 1.16.0 \
--namespace kube-system \
--set ipam.mode=kubernetes \
--set kubeProxyReplacement=true \
--set securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
--set securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
--set cgroup.autoMount.enabled=false \
--set cgroup.hostRoot=/sys/fs/cgroup \
--set k8sServiceHost=localhost \
--set k8sServicePort=7445 \
--set egressGateway.enabled=true \
--set hubble.ui.enabled=true \
--set bgpControlPlane.enabled=true \
--set hubble.enabled=true \
--set hubble.prometheus.enabled=true \
--set hubble.metrics.enableOpenMetrics=true \
--set hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,httpV2:exemplars=true;labelsContext=source_ip\,source_namespace\,source_workload\,destination_ip\,destination_namespace\,destination_workload\,traffic_direction}" \
--set hubble.metrics.serviceMonitor.enabled=true \
--set hubble.metrics.dashboards.enabled=true \
--set hubble.metrics.dashboards.namespace=monitoring \
--set hubble.metrics.dashboards.annotations.grafana_folder=Hubble \
--set hubble.relay.enabled=true \
--set hubble.relay.rollOutPods=true \
--set hubble.relay.prometheus.enabled=true \
--set hubble.relay.prometheus.serviceMonitor.enabled=true \
--set prometheus.enabled=true \
--set prometheus.serviceMonitor.enabled=true \
--set prometheus.serviceMonitor.trustCRDsExist=true \
--set operator.prometheus.enabled=true \
--set operator.prometheus.serviceMonitor.enabled=true \
--set operator.dashboards.enabled=true \
--set operator.dashboards.namespace=monitoring \
--set operator.rollOutPods=true \
--set ipv6.enabled=false \
--set bandwidthManager.enabled=true \
--set rollOutCiliumPods=true \
--set monitor.enabled=true \
--set bpf.masquerade=true \
--set routingMode=native \
--set autoDirectNodeRoutes=true \
--set ipv4NativeRoutingCIDR="10.40.50.0/24" \
--set installNoConntrackIptablesRules=true \
--set socketLB.enabled=true \
--set loadBalancer.algorithm=maglev \
--set loadBalancer.mode=dsr \
--set dashboards.enabled=true \
--set dashboards.namespace=monitoring
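After the install, a quick rollout check confirms the agent, operator, and Hubble components are up and that the nodes report Ready now that a CNI is present (the resource names below are the chart defaults):
kubectl -n kube-system rollout status daemonset/cilium
kubectl -n kube-system rollout status deployment/cilium-operator
kubectl -n kube-system rollout status deployment/hubble-relay
kubectl get nodes -o wide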
Deploying ingress-nginx
helm upgrade --install ingress-nginx ingress-nginx \
--repo https://kubernetes.github.io/ingress-nginx \
--namespace ingress-nginx --create-namespace \
--set controller.ingressClassResource.default=true \
--set controller.service.type="LoadBalancer" \
--set controller.kind=DaemonSet \
--set controller.metrics.enabled=true \
--set-string controller.podAnnotations."prometheus\.io/scrape"="true" \
--set-string controller.podAnnotations."prometheus\.io/port"="10254" \
--set controller.metrics.serviceMonitor.enabled=true \
--set controller.metrics.serviceMonitor.additionalLabels.release="kube-prometheus-stack"
Configuring BGP on VyOS
Here is an example VyOS BGP configuration for peering with Cilium. It accepts the LoadBalancer routes advertised by Cilium and aggregates them into a single prefix.
set protocols bgp system-as '65100'
set policy prefix-list talos-k8s-lb description 'talos-k8s-loadbalancer-cidr'
set policy prefix-list talos-k8s-lb rule 1 action 'permit'
set policy prefix-list talos-k8s-lb rule 1 ge '24'
set policy prefix-list talos-k8s-lb rule 1 prefix '10.50.0.0/24'
set policy route-map talos-k8s-ipv4-in rule 1 action 'permit'
set policy route-map talos-k8s-ipv4-in rule 1 match ip address prefix-list 'talos-k8s-lb'
set policy route-map talos-k8s-ipv4-out rule 1 action 'deny'
set policy route-map talos-k8s-ipv4-out rule 1 match ip address prefix-list 'any'
set protocols bgp address-family ipv4-unicast aggregate-address 10.50.0.0/24 as-set
set protocols bgp address-family ipv4-unicast aggregate-address 10.50.0.0/24 summary-only
set protocols bgp listen range 10.40.45.0/24 peer-group 'talos-k8s-peers'
set protocols bgp peer-group talos-k8s-peers address-family ipv4-unicast allowas-in
set protocols bgp peer-group talos-k8s-peers address-family ipv4-unicast route-map export 'talos-k8s-ipv4-out'
set protocols bgp peer-group talos-k8s-peers address-family ipv4-unicast route-map import 'talos-k8s-ipv4-in'
set protocols bgp peer-group talos-k8s-peers bfd
set protocols bgp peer-group talos-k8s-peers ebgp-multihop '2'
set protocols bgp peer-group talos-k8s-peers remote-as '65320'
set protocols bgp peer-group talos-k8s-peers update-source '10.40.45.1'
Configuring Cilium BGP
For the details of Cilium's BGP configuration, please refer to the official documentation; below is just the example configuration for this environment.
cilium-bgp-cluster-config.yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPClusterConfig
metadata:
name: cilium-bgp
spec:
nodeSelector:
matchLabels:
node-role.kubernetes.io/worker: worker
bgpInstances:
- name: "instance-65320"
localASN: 65320
peers:
- name: "peer-65100"
peerASN: 65100
peerAddress: 10.40.45.1
peerConfigRef:
name: "cilium-peer"
cilium-bgp-peer-config.yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeerConfig
metadata:
name: cilium-peer
spec:
timers:
holdTimeSeconds: 9
keepAliveTimeSeconds: 3
gracefulRestart:
enabled: true
restartTimeSeconds: 15
families:
- afi: ipv4
safi: unicast
advertisements:
matchLabels:
advertise: "bgp"
cilium-bgp-advertisement.yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPAdvertisement
metadata:
name: bgp-advertisements
labels:
advertise: bgp
spec:
advertisements:
- advertisementType: "Service"
service:
addresses:
- LoadBalancerIP
selector: {}
cilium-lb-ip-pool.yaml
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
name: "talos-ip-pool"
spec:
blocks:
- cidr: "10.50.0.0/24"
kubectl apply -f cilium-lb-ip-pool.yaml
kubectl apply -f cilium-bgp-cluster-config.yaml
kubectl apply -f cilium-bgp-peer-config.yaml
kubectl apply -f cilium-bgp-advertisement.yaml
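Before checking the router, it is worth confirming that the BGP resources were accepted by the cluster; for example:
kubectl get ciliumloadbalancerippools
kubectl get ciliumbgpclusterconfigs,ciliumbgppeerconfigs,ciliumbgpadvertisements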
$ kubectl get svc -n ingress-nginx
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ingress-nginx-controller LoadBalancer 10.111.62.245 10.50.0.0 80:30568/TCP,443:32139/TCP 23d
vyos@Router-Internal:~$ sh ip bgp
BGP table version is 50455, local router ID is 10.59.254.100, vrf id 0
Default local pref 100, local AS 65100
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
s= 10.50.0.0/32 10.40.45.23 0 65320 i
s= 10.40.45.21 0 65320 i
s= 10.40.45.24 0 65320 i
s> 10.40.45.22 0 65320 i
vyos@Router-Internal:~$ sh bgp summary
IPv4 Unicast Summary (VRF default):
BGP router identifier 10.59.254.100, local AS number 65100 vrf-id 0
BGP table version 50455
RIB entries 120, using 11 KiB of memory
Peers 15, using 302 KiB of memory
Peer groups 2, using 128 bytes of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
*10.40.45.21 4 65320 29386 29385 50455 0 0 1d00h29m 1 0 N/A
*10.40.45.22 4 65320 29387 29386 50455 0 0 1d00h29m 1 0 N/A
*10.40.45.23 4 65320 29386 29385 50455 0 0 1d00h29m 1 0 N/A
*10.40.45.24 4 65320 29387 29385 50455 0 0 1d00h29m 1 0 N/A
Monitoring dashboards
Dashboards for etcd, the nodes, Cilium, and ingress-nginx.
Deploying VMware Tools
The talos-vmtoolsd agent runs in the cluster as a DaemonSet; before deploying it, we first have to provide a Talos credentials file for it to use.
talosctl --talosconfig talosconfig -n 10.40.45.10 config new vmtoolsd-secret.yaml --roles os:admin
kubectl -n kube-system create secret generic talos-vmtoolsd-config \
--from-file=talosconfig=./vmtoolsd-secret.yaml
rm vmtoolsd-secret.yaml
Deploy vmtoolsd:
kubectl apply -f https://raw.githubusercontent.com/siderolabs/talos-vmtoolsd/master/deploy/latest.yaml
Once it is deployed, vCenter will show the virtual machines' IP addresses and other guest information.
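A quick way to check that the agent is running on every node; this assumes the DaemonSet keeps the project's default name, talos-vmtoolsd:
kubectl -n kube-system get daemonset talos-vmtoolsd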
Resetting the environment
When we need to wipe the environment completely, the following command resets a node back to booting from the ISO.
Use "--graceful=false" when resetting the last remaining control-plane node to force the reset.
talosctl --nodes 10.40.45.13 -e 10.40.45.13 reset --graceful=false
Upgrading Talos Linux
Upgrading Talos is very simple: just upgrade the nodes one at a time with "talosctl upgrade".
Note: it is strongly recommended to upgrade one node at a time and confirm the upgrade has completed before moving on to the next.
$ talosctl upgrade --nodes 10.40.45.11 --image ghcr.io/siderolabs/installer:v1.7.6
watching nodes: [10.40.45.11]
* 10.40.45.11: post check passed
# wait for the upgrade to finish
talosctl upgrade --nodes 10.40.45.12 --image ghcr.io/siderolabs/installer:v1.7.6
# wait for the upgrade to finish
talosctl upgrade --nodes 10.40.45.13 --image ghcr.io/siderolabs/installer:v1.7.6
# wait for the upgrade to finish
talosctl upgrade --nodes 10.40.45.21 --image ghcr.io/siderolabs/installer:v1.7.6
# wait for the upgrade to finish
talosctl upgrade --nodes 10.40.45.22 --image ghcr.io/siderolabs/installer:v1.7.6
# wait for the upgrade to finish
talosctl upgrade --nodes 10.40.45.23 --image ghcr.io/siderolabs/installer:v1.7.6
# wait for the upgrade to finish
talosctl upgrade --nodes 10.40.45.24 --image ghcr.io/siderolabs/installer:v1.7.6
Upgrading the Kubernetes version
Adding the "--dry-run" flag lets you preview what the upgrade would change before actually running it.
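For example, a dry run against any control-plane node prints the plan without modifying anything; the actual upgrade below runs the same command without the flag:
talosctl --nodes 10.40.45.11 upgrade-k8s --to 1.30.4 --dry-run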
$ talosctl --nodes 10.40.45.11 upgrade-k8s --to 1.30.4
automatically detected the lowest Kubernetes version 1.30.1
discovered controlplane nodes ["10.40.45.11" "10.40.45.12" "10.40.45.13"]
discovered worker nodes ["10.40.45.21" "10.40.45.22" "10.40.45.23" "10.40.45.24"]
> "10.40.45.11": pre-pulling registry.k8s.io/kube-apiserver:v1.30.4
> "10.40.45.12": pre-pulling registry.k8s.io/kube-apiserver:v1.30.4
> "10.40.45.13": pre-pulling registry.k8s.io/kube-apiserver:v1.30.4
> "10.40.45.11": pre-pulling registry.k8s.io/kube-controller-manager:v1.30.4
> "10.40.45.12": pre-pulling registry.k8s.io/kube-controller-manager:v1.30.4
> "10.40.45.13": pre-pulling registry.k8s.io/kube-controller-manager:v1.30.4
> "10.40.45.11": pre-pulling registry.k8s.io/kube-scheduler:v1.30.4
> "10.40.45.12": pre-pulling registry.k8s.io/kube-scheduler:v1.30.4
> "10.40.45.13": pre-pulling registry.k8s.io/kube-scheduler:v1.30.4
> "10.40.45.11": pre-pulling ghcr.io/siderolabs/kubelet:v1.30.4
> "10.40.45.12": pre-pulling ghcr.io/siderolabs/kubelet:v1.30.4
> "10.40.45.13": pre-pulling ghcr.io/siderolabs/kubelet:v1.30.4
> "10.40.45.21": pre-pulling ghcr.io/siderolabs/kubelet:v1.30.4
> "10.40.45.22": pre-pulling ghcr.io/siderolabs/kubelet:v1.30.4
> "10.40.45.23": pre-pulling ghcr.io/siderolabs/kubelet:v1.30.4
> "10.40.45.24": pre-pulling ghcr.io/siderolabs/kubelet:v1.30.4
updating "kube-apiserver" to version "1.30.4"
> "10.40.45.11": starting update
> update kube-apiserver: v1.30.1 -> 1.30.4
> "10.40.45.11": machine configuration patched
> "10.40.45.11": waiting for kube-apiserver pod update
> "10.40.45.11": kube-apiserver: waiting, config version mismatch: got "1", expected "2"
< "10.40.45.11": successfully updated
> "10.40.45.12": starting update
> update kube-apiserver: v1.30.1 -> 1.30.4
> "10.40.45.12": machine configuration patched
> "10.40.45.12": waiting for kube-apiserver pod update
> "10.40.45.12": kube-apiserver: waiting, config version mismatch: got "1", expected "2"
< "10.40.45.12": successfully updated
> "10.40.45.13": starting update
> update kube-apiserver: v1.30.1 -> 1.30.4
> "10.40.45.13": machine configuration patched
> "10.40.45.13": waiting for kube-apiserver pod update
> "10.40.45.13": kube-apiserver: waiting, config version mismatch: got "1", expected "2"
< "10.40.45.13": successfully updated
updating "kube-controller-manager" to version "1.30.4"
> "10.40.45.11": starting update
> update kube-controller-manager: v1.30.1 -> 1.30.4
> "10.40.45.11": machine configuration patched
> "10.40.45.11": waiting for kube-controller-manager pod update
> "10.40.45.11": kube-controller-manager: waiting, config version mismatch: got "1", expected "2"
> "10.40.45.11": kube-controller-manager: pod is not ready, waiting
> "10.40.45.11": kube-controller-manager: pod is not ready, waiting
< "10.40.45.11": successfully updated
> "10.40.45.12": starting update
> update kube-controller-manager: v1.30.1 -> 1.30.4
> "10.40.45.12": machine configuration patched
> "10.40.45.12": waiting for kube-controller-manager pod update
> "10.40.45.12": kube-controller-manager: waiting, config version mismatch: got "1", expected "2"
> "10.40.45.12": kube-controller-manager: pod is not ready, waiting
> "10.40.45.12": kube-controller-manager: pod is not ready, waiting
< "10.40.45.12": successfully updated
> "10.40.45.13": starting update
> update kube-controller-manager: v1.30.1 -> 1.30.4
> "10.40.45.13": machine configuration patched
> "10.40.45.13": waiting for kube-controller-manager pod update
> "10.40.45.13": kube-controller-manager: waiting, config version mismatch: got "1", expected "2"
> "10.40.45.13": kube-controller-manager: pod is not ready, waiting
< "10.40.45.13": successfully updated
updating "kube-scheduler" to version "1.30.4"
> "10.40.45.11": starting update
> update kube-scheduler: v1.30.1 -> 1.30.4
> "10.40.45.11": machine configuration patched
> "10.40.45.11": waiting for kube-scheduler pod update
> "10.40.45.11": kube-scheduler: waiting, config version mismatch: got "1", expected "2"
> "10.40.45.11": kube-scheduler: pod is not ready, waiting
< "10.40.45.11": successfully updated
> "10.40.45.12": starting update
> update kube-scheduler: v1.30.1 -> 1.30.4
> "10.40.45.12": machine configuration patched
< "10.40.45.12": successfully updated
> "10.40.45.13": starting update
> update kube-scheduler: v1.30.1 -> 1.30.4
> "10.40.45.13": machine configuration patched
> "10.40.45.13": waiting for kube-scheduler pod update
> "10.40.45.13": kube-scheduler: waiting, config version mismatch: got "1", expected "2"
> "10.40.45.13": kube-scheduler: pod is not ready, waiting
< "10.40.45.13": successfully updated
updating kube-proxy to version "1.30.4"
> "10.40.45.11": starting update
> "10.40.45.12": starting update
> "10.40.45.13": starting update
updating kubelet to version "1.30.4"
> "10.40.45.11": starting update
> update kubelet: 1.30.1 -> 1.30.4
> "10.40.45.11": machine configuration patched
> "10.40.45.11": waiting for kubelet restart
> "10.40.45.11": waiting for node update
< "10.40.45.11": successfully updated
> "10.40.45.12": starting update
> update kubelet: 1.30.1 -> 1.30.4
> "10.40.45.12": machine configuration patched
> "10.40.45.12": waiting for kubelet restart
> "10.40.45.12": waiting for node update
< "10.40.45.12": successfully updated
> "10.40.45.13": starting update
> update kubelet: 1.30.1 -> 1.30.4
> "10.40.45.13": machine configuration patched
> "10.40.45.13": waiting for kubelet restart
> "10.40.45.13": waiting for node update
< "10.40.45.13": successfully updated
> "10.40.45.21": starting update
> update kubelet: 1.30.1 -> 1.30.4
> "10.40.45.21": machine configuration patched
> "10.40.45.21": waiting for kubelet restart
> "10.40.45.21": waiting for node update
< "10.40.45.21": successfully updated
> "10.40.45.22": starting update
> update kubelet: 1.30.1 -> 1.30.4
> "10.40.45.22": machine configuration patched
> "10.40.45.22": waiting for kubelet restart
> "10.40.45.22": waiting for node update
< "10.40.45.22": successfully updated
> "10.40.45.23": starting update
> update kubelet: 1.30.1 -> 1.30.4
> "10.40.45.23": machine configuration patched
> "10.40.45.23": waiting for kubelet restart
> "10.40.45.23": waiting for node update
< "10.40.45.23": successfully updated
> "10.40.45.24": starting update
> update kubelet: 1.30.1 -> 1.30.4
> "10.40.45.24": machine configuration patched
> "10.40.45.24": waiting for kubelet restart
> "10.40.45.24": waiting for node update
< "10.40.45.24": successfully updated
Wrapping up
That completes this example of building Kubernetes on Talos Linux. You can see first-hand what a minimal system buys you: the Talos Linux SquashFS image is only about 80 MB and the whole system ships just 12 binaries, which lowers resource consumption and improves security at the same time.