1. What is Knative Autoscaling?
- https://knative.dev/docs/serving/autoscaling/
✓ automatic scaling of replicas for an application to closely match incoming demand
- Supported Autoscaler types
✓ Knative Serving supports the implementation of Knative Pod Autoscaler (KPA) and Kubernetes' Horizontal Pod Autoscaler (HPA).
a. Knative Pod Autoscaler (KPA)
Part of the Knative Serving core and enabled by default once Knative Serving is installed.
Supports scale-to-zero functionality.
Does not support CPU-based autoscaling.
b. Horizontal Pod Autoscaler (HPA) : https://kubernetes.io/ko/docs/tasks/run-application/horizontal-pod-autoscale/
Not part of the Knative Serving core; it must be enabled separately after Knative Serving is installed.
Does not support scale to zero functionality.
Supports CPU-based autoscaling.
- Metric for KPA
✓ The default KPA Autoscaler supports the concurrency and rps (requests per second) metrics.
a. concurrency
▷ Concurrency determines the number of simultaneous requests that can be processed by each replica of an application at any given time.
▷ Concurrency limit types
i. Soft limit (autoscaling.knative.dev/target)
The soft limit is a targeted limit rather than a strictly enforced bound. In some situations, particularly if there is a sudden burst of requests, this value can be exceeded.
ii. Hard limit (containerConcurrency)
The hard limit is an enforced upper bound. If concurrency reaches the hard limit, surplus requests will be buffered and must wait until enough capacity is free to execute the requests.
Using a hard limit configuration is only recommended if there is a clear use case for it with your application.
Having a low hard limit specified may have a negative impact on the throughput and latency of an application, and may cause additional cold starts.
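A minimal sketch showing where each of these two limits is configured (the service name and values are illustrative; fuller examples appear in section 4 below):
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go                          # illustrative name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target: "50"   # soft limit (annotation)
    spec:
      containerConcurrency: 100                # hard limit (spec field)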
b. rps (requests per second)
▷ A target for requests-per-second per replica of an application.
autoscaling.knative.dev/metric: "rps"
autoscaling.knative.dev/target: "150"
- Metric for HPA
✓ cpu
The HPA Autoscaler supports the cpu metric.
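As a reference, a minimal sketch of a Service that selects the HPA class and the cpu metric; this assumes the optional HPA support mentioned above is installed, and the service name and target value are illustrative:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go                          # illustrative name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: "hpa.autoscaling.knative.dev"
        autoscaling.knative.dev/metric: "cpu"
        autoscaling.knative.dev/target: "70"   # target value for the cpu metric (illustrative)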
2. Algorithms about concurrency
- Ref. https://www.alibabacloud.com/help/doc-detail/186027.htm
- KPA performs auto scaling based on the average number of concurrent requests per pod.
✓ This value is specified by the concurrency target.
✓ The default concurrency target is 100.
✓ The number of pods required to handle requests is determined based on the following formula:
Number of pods = Total number of concurrent requests to the application / Concurrency target.
For example, if you set the concurrency target to 10 and send 50 concurrent requests to an application, KPA creates five pods.
- KPA provides two auto scaling modes: stable and panic.
✓ Stable
In stable mode, KPA adjusts the number of pods provisioned for a Deployment to match the specified concurrency target. The concurrency target indicates the average number of requests received by a pod within a stable window of 60 seconds.
✓ Panic
KPA calculates the average number of concurrent requests per pod within a stable window of 60 seconds. Therefore, the number of concurrent requests must remain at a specific level for 60 seconds.
KPA also calculates the number of concurrent requests per pod within a panic window of 6 seconds.
If the number of concurrent requests within the panic window reaches twice the concurrency target, KPA switches to panic mode.
In panic mode, KPA scales pods within a shorter time window than in stable mode.
Once the burst of concurrent requests has subsided for 60 seconds, KPA automatically switches back to stable mode.
                                                       |
                                  Panic Target--->  +--| 20
                                                    |  |
                                                    | <------Panic Window
                                                    |  |
       Stable Target--->  +-------------------------|--| 10   CONCURRENCY
                          |                         |  |
                          |<-----------Stable Window
                          |                         |  |
--------------------------+-------------------------+--+ 0
120                       60                           0
                     TIME
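The stable window, panic window, and panic threshold can also be tuned per revision with the annotation keys listed in section 4; a minimal sketch where the values shown are the defaults and the service name is illustrative:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: autoscale-go                                             # illustrative name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/window: "60s"                    # stable window
        autoscaling.knative.dev/panicWindowPercentage: "10.0"    # panic window = 10% of the stable window (6s)
        autoscaling.knative.dev/panicThresholdPercentage: "200.0" # enter panic mode at 200% of the target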
3. KPA configurations
- To apply Autoscaler-related parameters globally, add them to the ConfigMap ('config-autoscaler').
- Detailed information on the configurable parameters is available at the link below.
https://www.alibabacloud.com/help/doc-detail/186027.htm -> 'KPA configurations' section
- The example below adds container-concurrency-target-default.
$ kubectl -n knative-serving edit cm config-autoscaler
apiVersion: v1
data:
  container-concurrency-target-default: "100"
  _example: |
    ################################
    #                              #
    #    EXAMPLE CONFIGURATION     #
    #                              #
    ################################
    # This block is not actually functional configuration,
    # but serves to illustrate the available configuration
    # options and document them in a way that is accessible
    # to users that `kubectl edit` this config map.
    #
    # These sample configuration options may be copied out of
    # this example block and unindented to be in the data block
    # to actually change the configuration.
...
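The same key can also be set non-interactively; a usage sketch with kubectl patch (the value matches the example above):
$ kubectl -n knative-serving patch cm config-autoscaler --type merge \
    -p '{"data":{"container-concurrency-target-default":"100"}}'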
4. Autoscaling configuration
- Annotation keys for autoscaling can be set in the YAML file for a Knative Service:
$ vi service.yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: autoscale-go
  namespace: yoosung-jeon
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: "kpa.autoscaling.knative.dev"
        autoscaling.knative.dev/metric: "concurrency"
        autoscaling.knative.dev/target: "10"
        autoscaling.knative.dev/minScale: "1"
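The manifest can then be applied and checked with kubectl (a usage sketch; the namespace follows the example above):
$ kubectl apply -f service.yaml
$ kubectl -n yoosung-jeon get ksvc autoscale-go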
- Autoscaler type
Annotation key: autoscaling.knative.dev/class
Allowed values: "kpa.autoscaling.knative.dev" or "hpa.autoscaling.knative.dev"
Default: "kpa.autoscaling.knative.dev"
Global (ConfigMap/config-autoscaler) key: pod-autoscaler-class
- Metrics
Definition: the metric type the Autoscaler watches; use "concurrency" or "rps" with the KPA class and "cpu" with the HPA class.
Annotation key: autoscaling.knative.dev/metric
Allowed values: "concurrency", "rps", "cpu"
Default: "concurrency"
- Concurrency
Definition: the number of simultaneous requests each pod of the application can process at a given time; this is the target value for the chosen metric. If both the soft and hard limits are set, the smaller of the two values is used.
✓ Soft limit
Annotation key: autoscaling.knative.dev/target
Default: 100
Global (ConfigMap/config-autoscaler) key: container-concurrency-target-default
✓ Hard limit
Spec key: containerConcurrency
Default: 0 (no limit)
Global (ConfigMap/config-defaults) key: container-concurrency
The hard limit is defined in the spec, not as an annotation:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    spec:
      containerConcurrency: 50
- Other annotation keys
✓ autoscaling.knative.dev/minScale
Default: 0 if scale-to-zero is enabled and the KPA class is used, 1 otherwise
https://knative.dev/docs/serving/autoscaling/scale-bounds/#lower-bound
✓ autoscaling.knative.dev/maxScale
Default: 0, which means unlimited
https://knative.dev/docs/serving/autoscaling/scale-bounds/#upper-bound
✓ autoscaling.knative.dev/initialScale
https://knative.dev/docs/serving/autoscaling/scale-bounds/#initial-scale
✓ autoscaling.knative.dev/scaleDownDelay
https://knative.dev/docs/serving/autoscaling/scale-bounds/#scale-down-delay
✓ autoscaling.knative.dev/targetUtilizationPercentage
https://knative.dev/docs/serving/autoscaling/concurrency/#target-utilization
✓ autoscaling.knative.dev/scaleToZeroPodRetentionPeriod
https://knative.dev/docs/serving/autoscaling/scale-to-zero/#scale-to-zero-last-pod-retention-period
✓ autoscaling.knative.dev/window
https://knative.dev/docs/serving/autoscaling/kpa-specific/#stable-window
✓ autoscaling.knative.dev/panicWindowPercentage
https://knative.dev/docs/serving/autoscaling/kpa-specific/#panic-window
✓ autoscaling.knative.dev/panicThresholdPercentage
https://knative.dev/docs/serving/autoscaling/kpa-specific/#panic-mode-threshold
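A sketch that combines several of these keys in one Revision template (the service name and values are illustrative, not recommendations):
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: autoscale-go                                   # illustrative name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "1"          # never scale below 1 pod (disables scale to zero)
        autoscaling.knative.dev/maxScale: "10"         # never scale above 10 pods
        autoscaling.knative.dev/initialScale: "3"      # start each new revision with 3 pods
        autoscaling.knative.dev/scaleDownDelay: "15m"  # wait 15 minutes before scaling down
        autoscaling.knative.dev/targetUtilizationPercentage: "70"  # scale out once 70% of the target is reached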