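Preface
Cloud providers run promotions every so often, and each time I end up impulse-buying another server, so I now have four cloud servers:
- One Alibaba Cloud lightweight server
- One Tencent Cloud ECS instance
- Two Tencent Cloud lightweight servers
At home I also run PVE for virtualization, hosting a software router, a NAS, and other home services. Because I can't resist high-end specs, I built a 16-core, 32-thread server, which leaves a lot of performance sitting idle. So I started wondering whether I could network it with the cloud servers and build a small cluster.
The setup I settled on uses ZeroTier to build a VPN-based private network and Docker Swarm to form the cluster. On top of that I installed a management panel, HTTPS certificates, a gateway, log collection and search, and rolling service updates. I hit a few pitfalls along the way, so I'm recording them here.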
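Setting up the private network
First, create a ZeroTier account and a network. There are plenty of tutorials online, so a quick search will do. Here are the basic commands for installing ZeroTier and joining a network.

    # Install ZeroTier
    curl -s https://install.zerotier.com | sudo bash
    # Join the network (replace xxx with your network ID)
    sudo zerotier-cli join xxx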
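After a node has joined, it helps to confirm which address ZeroTier assigned to it, since the Swarm commands later in this post use that address. The following is a minimal sketch, assuming that `zerotier-cli listnetworks` prints the assigned addresses as a comma-separated list in the last column (the layout may differ between versions). The function name `zt_ipv4_from_listnetworks` is hypothetical.

```shell
# Hypothetical helper: extract the IPv4 addresses from the output of
# `zerotier-cli listnetworks`, read on stdin.
# Assumption: the last column holds a comma-separated list of CIDR addresses.
zt_ipv4_from_listnetworks() {
  awk '$1 == "200" && $2 == "listnetworks" && $3 != "<nwid>" {
         n = split($NF, addrs, ",")
         for (i = 1; i <= n; i++) {
           # Keep only dotted-quad entries and drop the /prefix suffix
           if (addrs[i] ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\//) {
             sub(/\/.*/, "", addrs[i])
             print addrs[i]
           }
         }
       }'
}

# Usage on a joined node (requires ZeroTier, not run here):
#   sudo zerotier-cli listnetworks | zt_ipv4_from_listnetworks
```

If nothing is printed, the node has probably not been assigned an address yet.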
Installing Docker and configuring a registry mirror
Follow the Tencent Cloud documentation for this step; I won't repeat it here.
https://cloud.tencent.com/document/product/1207/45596?from=information.detail.腾讯云加速docker
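Initializing the manager node and joining workers
Initializing the cluster manager
Replace 192.168.xxx.xx with the IP shown for this machine in your ZeroTier console.

    sudo docker swarm init --advertise-addr=192.168.xxx.xx:2377 --data-path-addr=192.168.xxx.xx --data-path-port 5789

Note that I specified --data-path-addr=192.168.xxx.xx --data-path-port 5789. The cloud providers' own networks are also built on VXLAN, which occupies port 4789, Docker's default data path port. Without an explicit port the cluster still forms successfully, but containers on different nodes cannot reach each other: even when they join the same network, a container on node1 cannot ping a container on node2, which defeats the purpose of building a cluster.
This deserves special attention. I was stuck on it for a long time and only found the cause by searching. At one point I even suspected that the providers were deliberately blocking self-built clusters in order to sell their own cluster services.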
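To double-check the port choice, the sketch below validates a candidate port against the range Docker documents for `--data-path-port` (1024 to 49151) and reads the port a running Swarm actually uses from `docker info`. It assumes `docker info` prints a `Data Path Port:` line in its Swarm section, as recent Docker versions do. Both function names are hypothetical.

```shell
# Hypothetical helper: succeed if the argument is an integer within the
# range Docker accepts for --data-path-port (1024-49151).
valid_data_path_port() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty or non-numeric input
  esac
  [ "$1" -ge 1024 ] && [ "$1" -le 49151 ]
}

# Hypothetical helper: read `docker info` output on stdin and print the
# value of the "Data Path Port" line, if present.
swarm_data_path_port() {
  sed -n 's/^[[:space:]]*Data Path Port:[[:space:]]*\([0-9][0-9]*\).*$/\1/p'
}

# Usage on a manager (requires Docker, not run here):
#   valid_data_path_port 5789 && echo "5789 is in range"
#   sudo docker info | swarm_data_path_port
```

As far as I know, the data path port can only be set when the Swarm is initialized, so a cluster created with the default port has to be re-created to change it.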
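Joining workers
Run the join command on each of the other servers to add them to the cluster.

    # On the manager: print the join command for workers
    sudo docker swarm join-token worker
    # On each worker: run the printed command
    sudo docker swarm join --token xxx 192.168.xxx.xx:2377

On the manager node, run sudo docker node ls to check the status of the joined nodes.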
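When several workers need to join, copying the token by hand gets tedious. The sketch below builds the join command from a token and a manager address. It relies on the `-q` flag of `docker swarm join-token`, which prints only the token. The function name `build_join_cmd` is hypothetical and the address is a placeholder.

```shell
# Hypothetical helper: print the join command for a given token and
# manager address. Nothing is executed; the caller decides where to run it.
build_join_cmd() {
  token=$1
  manager_addr=$2
  printf 'docker swarm join --token %s %s\n' "$token" "$manager_addr"
}

# Usage on the manager (requires Docker, not run here):
#   token=$(sudo docker swarm join-token -q worker)
#   build_join_cmd "$token" 192.168.xxx.xx:2377
# The printed line can then be run with sudo on each worker.
```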
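Promoting and demoting nodes
I use all of my cloud servers as ingress and egress nodes. Traffic to the virtual machines at home is exposed through whichever cloud server the domain name points to.
I use Traefik as the gateway and as the load balancer between containers. Traefik needs to listen to Docker events, and only manager nodes have permission for that, so I promoted all of my cloud servers to managers.

    # Promote a node to manager
    sudo docker node promote swarm-node1
    # Demote a manager back to worker
    sudo docker node demote swarm-node1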
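One thing to keep in mind when promoting several nodes: Swarm managers need a majority of managers online to keep managing the cluster, so N managers tolerate the loss of (N-1)/2 of them, rounded down. The sketch below only does that arithmetic. The function name `manager_fault_tolerance` is hypothetical.

```shell
# Hypothetical helper: given the number of managers, print how many of
# them can fail while a majority remains.
manager_fault_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}

# Usage on a manager (requires Docker, not run here):
#   count=$(sudo docker node ls --filter role=manager --format '{{.Hostname}}' | wc -l)
#   manager_fault_tolerance "$count"
```

With four managers the result is the same as with three, which is why an odd number of managers is usually recommended.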
Creating the Swarm network
Every container that needs to communicate across nodes must join this network.

    sudo docker network create -d overlay --attachable proxy
Testing container connectivity across the cluster

    # Start one test container on every node
    sudo docker service create --mode global --network proxy --name web srampal/nginx-netutils:2

    # Inspect the network to find a container's IP
    sudo docker network inspect proxy
    "Containers": {
        "39a532786c2c23a1033f7899afe0973bdac9100191b2077306477129f78eafe4": {
            "Name": "nginx-netutils.1.atc36jt29aidgbtgqx95hfefu",
            "EndpointID": "8368996ff2921687ec57ce51412a987c95390b5cb9bd757c6094a74e48ca6640",
            "MacAddress": "02:42:0a:00:01:68",
            "IPv4Address": "10.0.1.104/24",
            "IPv6Address": ""
        }
    }

    # From a container on another node, ping that IP
    sudo docker exec xxxId ping 10.0.1.104
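Picking the address out of the JSON by eye works, but the same information can be printed directly with a Go template. The sketch below uses the `--format` option of `docker network inspect` and strips the `/24` suffix so the result can be passed to `ping`. The function name `strip_cidr` is hypothetical. Note that on an overlay network the `Containers` section typically lists only the containers on the node where the command is run.

```shell
# Hypothetical helper: drop the prefix length from a CIDR address,
# for example 10.0.1.104/24 becomes 10.0.1.104.
strip_cidr() {
  printf '%s\n' "${1%%/*}"
}

# Usage (requires Docker, not run here):
#   sudo docker network inspect proxy \
#     --format '{{range .Containers}}{{.Name}} {{.IPv4Address}}{{println}}{{end}}'
#   sudo docker exec xxxId ping -c 3 "$(strip_cidr 10.0.1.104/24)"
```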
Traefik gateway and load balancing
There is a lot of configuration, so I'll just paste my current setup with comments. It also exposes the Traefik dashboard.

    version: '3.4'

    services:
      proxy:
        image: traefik:v2.4
        environment:
          - TZ=Asia/Shanghai
          # Alibaba Cloud credentials used by the alidns DNS challenge
          - ALICLOUD_ACCESS_KEY=xxx
          - ALICLOUD_SECRET_KEY=xxx
        command:
          - '--providers.docker.endpoint=unix:///var/run/docker.sock'
          # Discover Swarm services
          - '--providers.docker.swarmMode=true'
          # Only expose services labelled traefik.enable=true
          - '--providers.docker.exposedbydefault=false'
          - '--providers.docker.network=proxy'
          # Entry points
          - '--entrypoints.http.address=:80'
          - '--entrypoints.https.address=:443'
          - '--entrypoints.https.http.tls=true'
          - '--entrypoints.mysql.address=:3306'
          # Dashboard / API
          - '--api'
          # Access log
          - '--accesslog=true'
          - '--accesslog.fields.names.StartUTC=drop'
          # Let's Encrypt certificates via DNS challenge
          - '--certificatesresolvers.letsencryptresolver.acme.dnschallenge.provider=alidns'
          - '--certificatesresolvers.letsencryptresolver.acme.email=xxx@gmail.com'
          - '--certificatesresolvers.letsencryptresolver.acme.storage=/www/config/acme.json'
        ports:
          - target: 80
            published: 80
            protocol: tcp
            mode: host
          - target: 443
            published: 443
            protocol: tcp
            mode: host
          - target: 3306
            published: 3306
            protocol: tcp
            mode: host
        volumes:
          - /var/run/docker.sock:/var/run/docker.sock:ro
          - letsencrypt-config:/www/config/:ro
          - /etc/timezone:/etc/timezone:ro
          - /etc/localtime:/etc/localtime:ro
        # Send container logs to Splunk
        logging:
          driver: splunk
          options:
            splunk-token: xxxxx-xxx-xxxx-xxx-xxxxx
            splunk-url: http://192.168.xxx.xx:8088/
            splunk-format: raw
        networks:
          - proxy
        deploy:
          # One instance on every manager node
          mode: global
          update_config:
            parallelism: 1
            failure_action: rollback
          restart_policy:
            condition: on-failure
            delay: 5s
            max_attempts: 3
            window: 120s
          placement:
            constraints: [node.role == manager]
          labels:
            - 'traefik.enable=true'
            # HTTP router: redirect to HTTPS
            - 'traefik.http.routers.traefik.entrypoints=http'
            - 'traefik.http.routers.traefik.rule=Host(`traefik.xxx.com`)'
            - 'traefik.http.middlewares.traefik-https-redirect.redirectscheme.scheme=https'
            - 'traefik.http.routers.traefik.middlewares=traefik-https-redirect'
            # HTTPS router: dashboard behind basic auth
            - 'traefik.http.routers.traefik-secure.rule=Host(`traefik.xxx.com`)'
            - 'traefik.http.routers.traefik-secure.entrypoints=https'
            - 'traefik.http.routers.traefik-secure.middlewares=authtraefik'
            - 'traefik.http.routers.traefik-secure.tls=true'
            - 'traefik.http.routers.traefik-secure.tls.certresolver=letsencryptresolver'
            # Wildcard certificate
            - 'traefik.http.routers.traefik-secure.tls.domains[0].main=xxx.com'
            - 'traefik.http.routers.traefik-secure.tls.domains[0].sans=*.xxx.com'
            - 'traefik.http.routers.traefik-secure.service=api@internal'
            - 'traefik.http.services.traefik-secure.loadbalancer.server.port=80'
            - 'traefik.http.middlewares.authtraefik.basicauth.users=user:&&xxxxx&&xxxx'

    networks:
      proxy:
        external: true

    volumes:
      # Certificate storage shared between nodes over SSHFS
      letsencrypt-config:
        driver: vieux/sshfs:latest
        driver_opts:
          sshcmd: 'ubuntu@192.168.xxx.xxx:/home/'
          password: 'xxxx'
Deploying a service

    version: '3.4'

    services:
      helloworld:
        image: traefik/whoami
        networks:
          - proxy
        deploy:
          labels:
            - 'traefik.enable=true'
            # HTTP router: redirect to HTTPS
            - 'traefik.http.routers.helloworld.entrypoints=http'
            - 'traefik.http.routers.helloworld.rule=Host(`helloworld.xxx.top`)'
            - 'traefik.http.middlewares.helloworld-https-redirect.redirectscheme.scheme=https'
            - 'traefik.http.routers.helloworld.middlewares=helloworld-https-redirect'
            # HTTPS router
            - 'traefik.http.routers.helloworld-secure.entrypoints=https'
            - 'traefik.http.routers.helloworld-secure.rule=Host(`helloworld.xxx.top`)'
            - 'traefik.http.routers.helloworld-secure.tls=true'
            - 'traefik.http.routers.helloworld-secure.service=helloworld'
            # Port the container listens on
            - 'traefik.http.services.helloworld.loadbalancer.server.port=80'

    networks:
      proxy:
        external: true
Rolling updates, rollback, restart policy, and resource limits

    appserver:
      image: juzisang/xxx
      networks:
        - proxy
      deploy:
        replicas: 2
        # Rolling update: update 2 tasks at a time, roll back on failure
        update_config:
          parallelism: 2
          delay: 10s
          failure_action: rollback
        # Resource limits and reservations
        resources:
          limits:
            cpus: '0.50'
            memory: 1024M
          reservations:
            cpus: '0.25'
            memory: 512M
        # Only schedule on worker nodes labelled role=node1
        placement:
          constraints:
            - 'node.role == worker'
            - 'node.labels.role==node1'
        # Restart policy
        restart_policy:
          condition: on-failure
          delay: 5s
          max_attempts: 3
          window: 120s

    # Add the label used by the placement constraint above
    sudo docker node update --label-add role=node1 swarm-node1
    # Remove the label (--label-rm takes the label key)
    sudo docker node update --label-rm role swarm-node1
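If a service stays pending after adding a placement constraint, the usual cause is that no node carries the expected label. The sketch below checks for a `key=value` label in the output of `docker node inspect`, which exposes labels under `.Spec.Labels`. The function name `has_node_label` is hypothetical.

```shell
# Hypothetical helper: read "key=value" lines on stdin and succeed if the
# given key/value pair is among them (exact, whole-line match).
has_node_label() {
  grep -qxF -- "$1=$2"
}

# Usage on a manager (requires Docker, not run here):
#   sudo docker node inspect swarm-node1 \
#     --format '{{range $k, $v := .Spec.Labels}}{{$k}}={{$v}}{{println}}{{end}}' \
#     | has_node_label role node1 && echo "label present"
```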
Installing the Swarmpit panel
Swarmpit can be used to monitor cluster state, perform rollbacks and upgrades, and view logs.
This is my configuration. It is based on the official docker-compose.yml, with the Traefik labels added.

    version: '3.3'

    services:
      app:
        image: swarmpit/swarmpit:latest
        environment:
          - TZ=Asia/Shanghai
          - SWARMPIT_DB=http://db:5984
          - SWARMPIT_INFLUXDB=http://influxdb:8086
        volumes:
          - /etc/timezone:/etc/timezone:ro
          - /etc/localtime:/etc/localtime:ro
          - /var/run/docker.sock:/var/run/docker.sock:ro
        healthcheck:
          test: ['CMD', 'curl', '-f', 'http://localhost:8080']
          interval: 60s
          timeout: 10s
          retries: 3
        networks:
          - proxy
        deploy:
          resources:
            limits:
              cpus: '0.50'
              memory: 1024M
            reservations:
              cpus: '0.25'
              memory: 512M
          placement:
            constraints:
              - node.labels.role==node2
          labels:
            - 'traefik.enable=true'
            - 'traefik.http.routers.swarmpit.entrypoints=http'
            - 'traefik.http.routers.swarmpit.rule=Host(`swarmpit.xxx.com`)'
            - 'traefik.http.middlewares.swarmpit-https-redirect.redirectscheme.scheme=https'
            - 'traefik.http.routers.swarmpit.middlewares=swarmpit-https-redirect'
            - 'traefik.http.routers.swarmpit-secure.entrypoints=https'
            - 'traefik.http.routers.swarmpit-secure.rule=Host(`swarmpit.xxx.com`)'
            - 'traefik.http.routers.swarmpit-secure.tls=true'
            - 'traefik.http.routers.swarmpit-secure.service=swarmpit'
            - 'traefik.http.services.swarmpit.loadbalancer.server.port=8080'

      db:
        image: couchdb:2.3.0
        environment:
          - TZ=Asia/Shanghai
        volumes:
          - /etc/timezone:/etc/timezone:ro
          - /etc/localtime:/etc/localtime:ro
          - db-data:/opt/couchdb/data
        networks:
          - proxy
        deploy:
          placement:
            constraints:
              - node.labels.role==node2
          resources:
            limits:
              cpus: '0.30'
              memory: 256M
            reservations:
              cpus: '0.15'
              memory: 128M

      influxdb:
        image: influxdb:1.7
        environment:
          - TZ=Asia/Shanghai
        volumes:
          - /etc/timezone:/etc/timezone:ro
          - /etc/localtime:/etc/localtime:ro
          - influx-data:/var/lib/influxdb
        networks:
          - proxy
        deploy:
          placement:
            constraints:
              - node.labels.role==node2
          resources:
            limits:
              cpus: '0.60'
              memory: 512M
            reservations:
              cpus: '0.30'
              memory: 128M

      agent:
        image: swarmpit/agent:latest
        environment:
          - TZ=Asia/Shanghai
          - DOCKER_API_VERSION=1.35
        volumes:
          - /etc/timezone:/etc/timezone:ro
          - /etc/localtime:/etc/localtime:ro
          - /var/run/docker.sock:/var/run/docker.sock:ro
        networks:
          - proxy
        deploy:
          mode: global
          labels:
            swarmpit.agent: 'true'
          resources:
            limits:
              cpus: '0.10'
              memory: 64M
            reservations:
              cpus: '0.05'
              memory: 32M

    networks:
      proxy:
        external: true

    volumes:
      db-data:
        driver: local
      influx-data:
        driver: local
Installing Splunk

    version: '3.4'

    services:
      splunk:
        image: splunk/splunk:latest
        networks:
          - proxy
        environment:
          - TZ=Asia/Shanghai
          - SPLUNK_START_ARGS=--accept-license
          - SPLUNK_PASSWORD=xxxx
        volumes:
          - /etc/timezone:/etc/timezone:ro
          - /etc/localtime:/etc/localtime:ro
          - splunk-var:/opt/splunk/var
          - splunk-etc:/opt/splunk/etc
        ports:
          # Port that receives logs from the Docker splunk logging driver
          - target: 8088
            published: 8088
            protocol: tcp
            mode: host
        deploy:
          replicas: 1
          restart_policy:
            condition: on-failure
            delay: 5s
            max_attempts: 3
            window: 120s
          placement:
            constraints:
              - node.labels.role==node3
          labels:
            - 'traefik.enable=true'
            - 'traefik.http.routers.splunk.entrypoints=http'
            - 'traefik.http.routers.splunk.rule=Host(`splunk.xxx.com`)'
            - 'traefik.http.middlewares.splunk-https-redirect.redirectscheme.scheme=https'
            - 'traefik.http.routers.splunk.middlewares=splunk-https-redirect'
            - 'traefik.http.routers.splunk-secure.entrypoints=https'
            - 'traefik.http.routers.splunk-secure.rule=Host(`splunk.xxx.com`)'
            - 'traefik.http.routers.splunk-secure.tls=true'
            - 'traefik.http.routers.splunk-secure.service=splunk'
            # Splunk web UI port
            - 'traefik.http.services.splunk.loadbalancer.server.port=8000'

    networks:
      proxy:
        external: true

    volumes:
      splunk-var:
        driver: local
      splunk-etc:
        driver: local

To send a service's logs to Splunk, add a logging section like this to that service, as in the Traefik configuration:

    logging:
      driver: splunk
      options:
        splunk-token: xxxx-xxxx-xxxx-xxxx-xxxx
        splunk-url: http://192.168.xxx.xxx:8088/
        splunk-format: raw
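Before pointing services at Splunk, it can be worth sending one test event by hand to confirm that the token and port work. The sketch below builds the HTTP Event Collector event URL from the same base URL used in `splunk-url`. The endpoint path `/services/collector/event` and the `Authorization: Splunk <token>` header are part of Splunk's HEC interface. The function name `hec_event_url` is hypothetical, and the address and token are placeholders.

```shell
# Hypothetical helper: turn a base URL (with or without a trailing slash)
# into the HTTP Event Collector event endpoint.
hec_event_url() {
  printf '%s/services/collector/event\n' "${1%/}"
}

# Usage (requires a reachable Splunk instance, not run here):
#   curl -s "$(hec_event_url http://192.168.xxx.xxx:8088/)" \
#     -H "Authorization: Splunk xxxx-xxxx-xxxx-xxxx-xxxx" \
#     -d '{"event": "hello from swarm"}'
```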
Starting the stacks

    sudo docker stack deploy -c proxy-compose.yml proxy
    sudo docker stack deploy -c splunk-compose.yml splunk
    sudo docker stack deploy -c swarmpit-compose.yml swarmpit
Addendum