Configuring SecureAuth Pods Autoscaling to Match Increased/Decreased Demand
Learn how to configure autoscaling of base SecureAuth pods and worker pods.
Autoscaling in a Nutshell
Kubernetes workload resources (such as the SecureAuth platform deployment) can be automatically scaled to match demand based on custom metrics, multiple metrics, or metrics APIs. For example, increased load may result in additional pods being deployed. If the load decreases, the workload resource is scaled back down.
The SecureAuth Helm Chart provides support for the HorizontalPodAutoscaler resource. Metrics Server must be installed to expose the SecureAuth resource usage metrics that autoscaling relies on.
For more details on autoscaling, see the Horizontal Pod Autoscaler Kubernetes documentation.
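To illustrate what the chart manages on your behalf, below is a minimal sketch of a HorizontalPodAutoscaler resource. The resource and deployment names are illustrative assumptions, not values taken from the chart; the chart renders the equivalent object from the values.yaml settings described later in this article.
apiVersion: autoscaling/v2   # Kubernetes 1.23+; older clusters use autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: secureauth           # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: secureauth         # illustrative target deployment
  minReplicas: 3
  maxReplicas: 9
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70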
Prerequisites
- Kubernetes cluster v1.16+
- Kubernetes Metrics Server
- Helm v3.0+
- Resource requests specified in the Helm chart
Configure Autoscaling
Autoscaling can be enabled for base SecureAuth pods as well as worker pods. The configuration parameters are identical, although the worker pod configuration is located under the workers key in values.yaml.
For autoscaling to work properly, resource requests must be set for the SecureAuth pods.
resources:
  requests:
    cpu: 500m
    memory: 1.2Gi         
To enable autoscaling integration, set the autoscaling.enabled parameter to true.
When autoscaling is enabled, the replicaCount parameter is ignored.
- autoscaling.minReplicas parameter is used to set the minimum number of replicas.
- autoscaling.maxReplicas parameter is used to set the maximum number of replicas.
- autoscaling.targetCPUUtilizationPercentage parameter can be used to enable CPU autoscaling at a given percentage.
- autoscaling.targetMemoryUtilizationPercentage parameter can be used to enable memory autoscaling at a given percentage.
- autoscaling.behavior parameter can be used to configure detailed scaling behaviors.
autoscaling:
  ## If true, autoscaling is enabled
  ##
  enabled: true
  ## Set a minimum number of 3 replicas
  ##
  minReplicas: 3
  ## Set a maximum number of 9 replicas
  ##
  maxReplicas: 9
  ## Enable CPU autoscaling at 70% of request utilization
  ##
  targetCPUUtilizationPercentage: 70
  ## Enable memory autoscaling at 50% of request utilization
  ##
  targetMemoryUtilizationPercentage: 50
  ## Consider utilization values from last 5 minutes during scaling
  ## Scale Down one pod at a time every 180 seconds
  ##
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 1
          periodSeconds: 180         
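The same block applies to the worker pods when placed under the workers key. Below is a minimal sketch, assuming defaults for anything not listed; the replica counts are illustrative, not recommendations.
workers:
  autoscaling:
    ## Illustrative example only; tune the values to your workload
    enabled: true
    minReplicas: 2
    maxReplicas: 6
    targetCPUUtilizationPercentage: 70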
Enhancing SecureAuth Autoscaling
This section covers the more advanced aspects of SecureAuth autoscaling. The concepts outlined here may be beneficial and can be adapted to your requirements.
Kubernetes Event-Driven Autoscaling for SecureAuth workers
Kubernetes Event-Driven Autoscaling (KEDA) is an open-source project developed in collaboration between Microsoft and Red Hat. KEDA is an event-driven autoscaler that allows applications to scale based on events from various event sources, such as Prometheus, Kafka, RabbitMQ, and many more. KEDA provides a Kubernetes-native way to scale applications in response to events, rather than traditional metrics like CPU or memory usage.
In scenarios where an exceptionally high number of events is anticipated in SecureAuth, there is a risk that the handlers will not keep pace and the HPA will not scale up the SecureAuth worker pods quickly enough. This situation can lead to events being processed with a significant delay, potentially a few minutes. Should you encounter such a scenario, we suggest leveraging KEDA: it can drive the SecureAuth workers HPA to scale up based on the lag of the stream handler rather than on CPU usage.
Configuration
- Install the monitoring stack for your SecureAuth installation (docs).
- Install KEDA in your Kubernetes cluster from the official Helm chart (docs). Below is the basic configuration we recommend for the chart:

operator:
  replicaCount: 2
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - keda-operator
          topologyKey: "kubernetes.io/hostname"
metricsServer:
  replicaCount: 2
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - keda-operator-metrics-apiserver
          topologyKey: "kubernetes.io/hostname"
tolerations:
  - key: CriticalAddonsOnly
    value: "true"
    effect: NoSchedule
topologySpreadConstraints:
  operator:
    - labelSelector:
        matchLabels:
          app: keda-operator
      maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
  metricsServer:
    - labelSelector:
        matchLabels:
          app: keda-operator-metrics-apiserver
      maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway

For more parameters, refer to the official values.yaml description.
- Create a KEDA ScaledObject based on the SecureAuth acp_redis_lag_seconds_bucket metric. Below is our recommendation of how it can be configured:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: acp-workers
spec:
  scaleTargetRef:
    name: acp-workers
  minReplicaCount: 1
  maxReplicaCount: 1
  triggers:
    - type: prometheus
      metadata:
        name: prom-trigger
        serverAddress: <prometheus endpoint cluster url>
        threshold: '10'
        query: 'round(max(avg(histogram_quantile(0.9, rate(acp_redis_lag_seconds_bucket{job="acp-workers", namespace="acp"}[1m])) > 0) by (group, stream)))'
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
            - type: Pods
              value: 2
              periodSeconds: 100
        scaleUp:
          stabilizationWindowSeconds: 60
          policies:
            - type: Pods
              value: 4
              periodSeconds: 30

The PromQL query defined in the spec.triggers.0.metadata.query field calculates the average 90th percentile of Redis lag across all SecureAuth worker pods over the last 1-minute period. If the calculated value is higher than the spec.triggers.0.metadata.threshold value, KEDA starts the scale-up procedure.
- Disable the default Horizontal Pod Autoscaler for the SecureAuth workers by switching the workers.autoscaling.enabled value to false in the main SecureAuth configuration YAML:

workers:
  autoscaling:
    enabled: false

The built-in workers HPA is no longer needed, as KEDA takes over the autoscaling of the SecureAuth worker pods from this point on.
Conclusion
- Autoscaling works based on the average usage of a given metric across all SecureAuth pods.
- The value that counts as 100% utilization for a metric is defined in resources.requests.
- CPU and memory autoscaling can work at the same time; scaling follows whichever metric reports the higher desired replica count.
- The default upscale delay equals 0s (controlled by the cluster operator in kube-controller-manager).
- The default downscale delay equals 5m (controlled by the cluster operator in kube-controller-manager).
- The metrics scrape interval can be configured in Metrics Server via the metric-resolution flag (default 60s).
- The default scale-up behavior is to add 100% of currently running replicas or 4 pods (whichever is higher) every 15 seconds, based on the last metric reading.
- The default scale-down behavior is to remove up to 100% of currently running replicas every 15 seconds, based on 5 minutes of metrics.
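For reference, below is a minimal sketch of where these cluster-level defaults are controlled. The snippets are illustrative fragments only; the exact manifests depend on how your cluster and Metrics Server are deployed.
## Metrics Server container args: scrape interval (example value)
containers:
  - name: metrics-server
    args:
      - --metric-resolution=60s

## kube-controller-manager flag: downscale stabilization window (default 5m)
containers:
  - name: kube-controller-manager
    command:
      - kube-controller-manager
      - --horizontal-pod-autoscaler-downscale-stabilization=5m0s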