2020.11.30
a. Problem: Elastic Observability - filebeat/metricbeat pods failing
- Environment
ECK(Elastic Cloud on Kubernetes) 1.2.1 + Elasticsearch/Kibana 7.9.1, Kubernetes 1.16.15
[iap@iap01 elastic-cloud-kubernetes]$ k get pod -n kube-system | egrep "filebeat|metricbeat"
filebeat-2w85f 0/1 CrashLoopBackOff 13 46m
filebeat-4nbwm 0/1 CrashLoopBackOff 307 25h
filebeat-59wfj 0/1 CrashLoopBackOff 307 25h
…
b. Cause analysis: the shards of the .kibana_task_manager index are UNASSIGNED, so Kibana never becomes ready and the Beats cannot connect to it
- state
UNASSIGNED: The shard is not assigned to any node.
[iap@iap01 ~]$ k logs filebeat-4nbwm -n kube-system
…
2020-12-02T05:01:50.741Z ERROR instance/beat.go:951 Exiting: error connecting to Kibana: fail to get the Kibana version: HTTP GET request to http://observer-kb-http.elastic-cluster.svc.cluster.local:5601/api/status fails: fail to execute the HTTP GET request: Get "http://observer-kb-http.elastic-cluster.svc.cluster.local:5601/api/status": dial tcp 10.99.79.112:5601: connect: connection refused. Response: .
Exiting: error connecting to Kibana: fail to get the Kibana version: HTTP GET request to http://observer-kb-http.elastic-cluster.svc.cluster.local:5601/api/status fails: fail to execute the HTTP GET request: Get "http://observer-kb-http.elastic-cluster.svc.cluster.local:5601/api/status": dial tcp 10.99.79.112:5601: connect: connection refused. Response: .
[iap@iap01 ~]$ k logs metricbeat-22m9l -n kube-system
…
2020-12-02T04:56:41.340Z ERROR instance/beat.go:951 Exiting: error connecting to Kibana: fail to get the Kibana version: HTTP GET request to http://observer-kb-http.elastic-cluster.svc.cluster.local:5601/api/status fails: fail to execute the HTTP GET request: Get "http://observer-kb-http.elastic-cluster.svc.cluster.local:5601/api/status": dial tcp 10.99.79.112:5601: connect: connection refused. Response: .
Exiting: error connecting to Kibana: fail to get the Kibana version: HTTP GET request to http://observer-kb-http.elastic-cluster.svc.cluster.local:5601/api/status fails: fail to execute the HTTP GET request: Get "http://observer-kb-http.elastic-cluster.svc.cluster.local:5601/api/status": dial tcp 10.99.79.112:5601: connect: connection refused. Response: .
[iap@iap01 ~]$
[iap@iap01 ~]$ k get pod -n elastic-cluster | grep observer-kb
observer-kb-d6d57648b-n7bf5 0/1 Running 0 19h
[iap@iap01 ~]$ k describe pod observer-kb-d6d57648b-n7bf5 -n elastic-cluster | grep Events -A10
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 112s (x6867 over 19h) kubelet, iap11 Readiness probe failed: HTTP probe failed with statuscode: 503
[iap@iap01 ~]$ k logs observer-kb-d6d57648b-n7bf5 -n elastic-cluster
…
{"type":"log","@timestamp":"2020-12-01T10:03:13Z","tags":["warning","savedobjects-service"],"pid":6,"message":"Unable to connect to Elasticsearch. Error: [search_phase_execution_exception] all shards failed"}
[iap@iap01 elastic-cloud-kubernetes]$ k exec observer-kb-d6d57648b-n7bf5 -n elastic-cluster -it -- bash
bash-4.2$ curl http://localhost:5601/
Kibana server is not ready yet
bash-4.2$ exit
[iap@iap01 ~]$
[iap@iap01 elastic-cloud-kubernetes]$ k logs observer-es-master-nodes-2 -n elastic-cluster -f
…
{"type": "server", "timestamp": "2020-12-01T10:02:25,278Z", "level": "WARN", "component": "r.suppressed", "cluster.name": "observer", "node.name": "observer-es-master-nodes-2", "message": "path: /.kibana_task_manager/_count, params: {index=.kibana_task_manager}", "cluster.uuid": "7lYtO5R_R7muGZfwY1smrg", "node.id": "6MTg_Jo9SVWgOjXkA6e8yA" ,
"stacktrace": ["org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed",
"at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:551) [elasticsearch-7.9.1.jar:7.9.1]",
"at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:309) [elasticsearch-7.9.1.jar:7.9.1]",
"at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:582) [elasticsearch-7.9.1.jar:7.9.1]",
"at org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:393) [elasticsearch-7.9.1.jar:7.9.1]",
"at org.elasticsearch.action.search.AbstractSearchAsyncAction.lambda$performPhaseOnShard$0(AbstractSearchAsyncAction.java:223) [elasticsearch-7.9.1.jar:7.9.1]",
"at org.elasticsearch.action.search.AbstractSearchAsyncAction$2.doRun(AbstractSearchAsyncAction.java:288) [elasticsearch-7.9.1.jar:7.9.1]",
"at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.9.1.jar:7.9.1]",
"at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) [elasticsearch-7.9.1.jar:7.9.1]",
"at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:710) [elasticsearch-7.9.1.jar:7.9.1]",
"at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.9.1.jar:7.9.1]",
"at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]",
"at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]",
"at java.lang.Thread.run(Thread.java:832) [?:?]"] }
[iap@iap01 elastic-cloud-kubernetes]$ curl -u elastic:E2dUu6FGq6rT9484ovh5y18c http://14.52.244.134:31313/_cat/shards/_all?pretty=true -s | grep ".kibana_task_manager"
.kibana_task_manager_1 0 p UNASSIGNED
.kibana_task_manager_1 0 r UNASSIGNED
[iap@iap01 elastic-cloud-kubernetes]$ curl -u elastic:E2dUu6FGq6rT9484ovh5y18c http://14.52.244.134:31313/_cat/shards?h=index,shard,prirep,state,unassigned.reason -s | grep .kibana_task_manager_1
.kibana_task_manager_1 0 p UNASSIGNED ALLOCATION_FAILED
.kibana_task_manager_1 0 r UNASSIGNED REPLICA_ADDED
[iap@iap01 elastic-cloud-kubernetes]$
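To see every unassigned shard instead of grepping for a single index, the `_cat/shards` output above can be filtered on its state column. The following is a minimal sketch, assuming the same column order as the `h=index,shard,prirep,state,unassigned.reason` request; the function name `unassigned_shards` is made up for illustration.

```shell
# Hypothetical helper: keep only UNASSIGNED shards from `_cat/shards` output.
# Assumes the columns are: index shard prirep state unassigned.reason
unassigned_shards() {
  awk '$4 == "UNASSIGNED" {
    # Spell out the p/r flag so the output is easier to scan.
    role = ($3 == "p") ? "primary" : "replica"
    printf "%s shard=%s %s reason=%s\n", $1, $2, role, $5
  }'
}
```

It reads from standard input, so it can be appended to the curl command used above: `curl -s -u elastic:<password> "http://<host>:<port>/_cat/shards?h=index,shard,prirep,state,unassigned.reason" | unassigned_shards`. For more detail on why a particular shard failed to allocate, the cluster allocation explain API (`GET /_cluster/allocation/explain`) is the usual next step.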
c. Solution: delete the .kibana_task_manager index
https://www.datadoghq.com/blog/elasticsearch-unassigned-shards/
[iap@iap01 elastic-cloud-kubernetes]$ curl -u elastic:E2dUu6FGq6rT9484ovh5y18c -XDELETE http://14.52.244.134:31313/.kibana_task_manager_1/ -s
{"acknowledged":true}
[iap@iap01 elastic-cloud-kubernetes]$ curl -u elastic:E2dUu6FGq6rT9484ovh5y18c http://14.52.244.134:31313/_cat/shards/.kibana_task_manager?pretty=true -s
.kibana_task_manager_1 0 p STARTED 6 78.1kb 10.244.7.208 observer-es-data-nodes-9
.kibana_task_manager_1 0 r STARTED 6 60.5kb 10.244.9.235 observer-es-data-nodes-2
[iap@iap01 elastic-cloud-kubernetes]$ k get pod -n elastic-cluster | grep observer-kb
observer-kb-d6d57648b-qnlf6 1/1 Running 1 18m
[iap@iap01 elastic-cloud-kubernetes]$ k rollout restart daemonset filebeat metricbeat -n kube-system
daemonset.apps/filebeat restarted
daemonset.apps/metricbeat restarted
[iap@iap01 elastic-cloud-kubernetes]$
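Deleting the index discards the stored task manager state; in the transcript above the index appears to have been recreated afterwards, since its shards show STARTED. When a primary is UNASSIGNED with ALLOCATION_FAILED, a less destructive first attempt may be to ask Elasticsearch to retry the failed allocation through the cluster reroute API with `retry_failed=true`. This was not part of the original fix. The sketch below assumes environment variables `ES_URL` and `ES_PASSWORD` and a `DRY_RUN` switch, all hypothetical names introduced here; whether the retry succeeds depends on why the allocation failed in the first place.

```shell
# Assumed endpoint; replace with your own Elasticsearch URL.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: ask Elasticsearch to retry shard allocations that
# previously failed. With DRY_RUN=1 it only prints the request.
retry_failed_allocation() {
  url="${ES_URL}/_cluster/reroute?retry_failed=true"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "POST ${url}"
    return 0
  fi
  curl -s -u "elastic:${ES_PASSWORD}" -XPOST "${url}"
}
```

Running it once with `DRY_RUN=1` shows the exact request before anything is sent. If the shards are still UNASSIGNED afterwards, deleting the index as shown above remains the fallback.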