ArgoCD High Availability Solution¶
ArgoCD mainly includes the following components:
-
argocd-repo-server
The argocd-repo-server is responsible for cloning Git repositories and keeping them up to date. Repositories are cloned to /tmp (or a path specified by the TMPDIR environment variable). If the Pod has too many repositories or if the repositories contain a large number of files, the Pod may run out of disk space. To avoid this problem, please install a persistent volume.
-
argocd-controller
It is used to fetch the generated Manifests and requests the Kubernetes API server to get the actual state of the argocd-application-controller from the argocd-repo-server cluster. Each controller replica uses two separate queues to handle the coordination (milliseconds) and synchronization (seconds) of applications. Manifest generation usually takes the most time during coordination, and if manifest generation takes too long, application coordination will fail and result in an error. As a solution, you can increase the deployment value --repo-server-timeout-seconds and consider expanding the scale of the argocd-repo-server deployment. If the controller manages too many clusters, it will use too much memory, and you can shard the clusters across multiple controller replicas. To enable sharding, increase the number of replicas in the argocd-application-controller StatefulSet environment variable and duplicate the number of replicas ARGOCD_CONTROLLER_REPLICAS. By default, the controller updates cluster information every 10 seconds. If there are issues with your cluster network that cause longer update times, you can try modifying the environment variable ARGO_CD_UPDATE_CLUSTER_INFO_TIMEOUT to increase the timeout (in seconds).
-
argocd-server
This is stateless and is least likely to cause issues. To ensure there is no downtime during upgrades, consider increasing the number of replicas to 3 or more, and replicate this number in the environment variable ARGOCD_API_SERVER_REPLICAS. The environment variable ARGOCD_GRPC_MAX_SIZE_MB allows you to specify the maximum size of server response messages (in megabytes). The default value is 200. For an ArgoCD instance managing over 3000 applications, you might need to increase this value.
-
argocd-dex-server
Using argocd-dex-server in-memory database, data will be inconsistent across two or more instances.
Installation¶
Below is the content of a values.yaml file for an HA installation mode, based on official documentation modified environment variables (just a necessary example, actual installation is recommended to install persistent volumes).
argo-cd:
crds:
install: false # Set to false if argo-cd was installed previously
keep: false
configs:
params:
reposerver.parallelism.limit: 4
controller.status.processors: 20 # Set as needed
controller.operation.processors: 10 # Set as needed
controller.repo.server.timeout.seconds: 60 # If managing many repositories, consider increasing this value
timeout.reconciliation: 180s # Polling time
repoServer:
replicas: 2
autoscaling:
enabled: true
minReplicas: 2
maxReplicas: 5
env:
- name: ARGOCD_GIT_ATTEMPTS_COUNT
value: "3"
- name: ARGOCD_EXEC_TIMEOUT
value: "2m30s"
extraArgs:
- --repo-cache-expiration=1h
controller:
replicas: 2 # Consider increasing deployment size if managing many repositories
env:
- name: WORKQUEUE_BUCKET_SIZE
value: "500"
- name: ARGOCD_RECONCILIATION_JITTER
value: "60"
- name: ARGO_CD_UPDATE_CLUSTER_INFO_TIMEOUT
value: "60" # In seconds
- name: ARGOCD_CLUSTER_CACHE_LIST_PAGE_SIZE
value: "500"
- name: ARGOCD_CLUSTER_CACHE_LIST_PAGE_BUFFER_SIZE
value: "5"
server:
autoscaling:
enabled: true
minReplicas: 2 # Should not be 1
maxReplicas: 5
env:
- name: ARGOCD_API_SERVER_REPLICAS
value: "2" # Recommended to be consistent with minReplicas, not less than minReplicas
- name: ARGOCD_GRPC_MAX_SIZE_MB
value: "200"
dex: # Argo's own in-memory storage component, HA mode will cause data inconsistency, can set replica count to 1
enabled: true
redis-ha:
enabled: true
Field explanations are as follows:
Field | Recommended Value | Description |
---|---|---|
argo-cd.configs.params.reposerver.parallelism.limit | 10 | Configures the number of manifests the management tool can operate on simultaneously; insufficient memory or system threads can cause git repository fetch failures |
argo-cd.configs.params.controller.status.processors | 20 | Configures the controller's queue for coordinating apps (processing time in milliseconds) default length is 20 (recommended length per 1000 apps is 50) |
argo-cd.configs.params.controller.operation.processors | 10 | Configures the controller's queue for syncing apps (processing time in seconds) default length is 10 (recommended length per 1000 apps is 25) |
argo-cd.configs.params.controller.repo.server.timeout.seconds | 60 | Configures the timeout to prevent queue overflow during manifest generation |
argo-cd.configs.params.timeout.reconciliation | 180s | Configures the polling cycle for the controller's git repository |
argo-cd.reposerver.env[0].name: ARGOCD_GIT_ATTEMPTS_COUNT | 3 | Configures the number of retries for failed git repository requests |
argo-cd.reposerver.env[1].name: ARGOCD_EXEC_TIMEOUT | 2m30s | Configures the execution timeout for the reposerver processing git or helm repositories |
argo-cd.reposerver.extraArgs[0]: --repo-cache-expiration=1h | 1h | Configures the cache expiration time for the reposerver |
argo-cd.controller.env[0].name: WORKQUEUE_BUCKET_SIZE | 500 | Configures the queue length for the controller handling concurrent events |
argo-cd.controller.env[1].name: ARGOCD_RECONCILIATION_JITTER | 60 | Configures the jitter time to prevent spikes in storage server components when an app sync times out, in seconds |
argo-cd.controller.env[2].name: ARGO_CD_UPDATE_CLUSTER_INFO_TIMEOUT | 60 | Configures the interval for the controller updating cluster information (increase this variable when cluster network issues cause long update times or when cluster updates are infrequent) |
argo-cd.controller.env[3].name: ARGOCD_CLUSTER_CACHE_LIST_PAGE_SIZE | 500 | Configures the page size for the controller retrieving cluster resources (ARGOCD_CLUSTER_CACHE_LIST_PAGE_SIZE*ARGOCD_CLUSTER_CACHE_LIST_PAGE_BUFFER_SIZE should be greater than the estimated maximum resource count in the cluster) |
argo-cd.controller.env[4].name: ARGOCD_CLUSTER_CACHE_LIST_PAGE_BUFFER_SIZE | 5 | Can increase the buffer size appropriately to prevent controller memory overflow errors (in MB) |
argo-cd.server.env[1].name: ARGOCD_GRPC_MAX_SIZE_MB | 200 | In MB, allows the maximum size of server response messages, set to a larger value if managing many projects (set to 200+ if 3000 projects) |