Elastic Observability - filebeat/metricbeat POD errors

2020.11.30

a. Problem: Elastic Observability - filebeat/metricbeat pods stuck in CrashLoopBackOff

- Environment

ECK (Elastic Cloud on Kubernetes) 1.2.1 + Elasticsearch/Kibana 7.9.1, Kubernetes 1.16.15

[iap@iap01 elastic-cloud-kubernetes]$ k get pod -n kube-system | egrep "filebeat|metricbeat"

filebeat-2w85f 0/1 CrashLoopBackOff 13 46m

filebeat-4nbwm 0/1 CrashLoopBackOff 307 25h

filebeat-59wfj 0/1 CrashLoopBackOff 307 25h

…
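The pod listing above can be narrowed to just the failing Beats with a small filter. This is a minimal sketch, assuming the default `kubectl get pod` columns (NAME READY STATUS RESTARTS AGE); the function name `crashlooping_beats` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "<pod> <restarts>" for every filebeat/metricbeat
# pod whose STATUS column (field 3) is CrashLoopBackOff.
# Input: `kubectl get pod` output on stdin (NAME READY STATUS RESTARTS AGE).
crashlooping_beats() {
  awk '$1 ~ /^(filebeat|metricbeat)-/ && $3 == "CrashLoopBackOff" { print $1, $4 }'
}

# Example (needs cluster access, so it is left commented out):
# kubectl get pod -n kube-system --no-headers | crashlooping_beats
```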

b. Cause analysis: the shards of the .kibana_task_manager index were UNASSIGNED, so Kibana never became ready and the Beats could not connect to it

- Shard state

UNASSIGNED: The shard is not assigned to any node.

[iap@iap01 ~]$ k logs filebeat-4nbwm -n kube-system

…

2020-12-02T05:01:50.741Z ERROR instance/beat.go:951 Exiting: error connecting to Kibana: fail to get the Kibana version: HTTP GET request to http://observer-kb-http.elastic-cluster.svc.cluster.local:5601/api/status fails: fail to execute the HTTP GET request: Get "http://observer-kb-http.elastic-cluster.svc.cluster.local:5601/api/status": dial tcp 10.99.79.112:5601: connect: connection refused. Response: .

Exiting: error connecting to Kibana: fail to get the Kibana version: HTTP GET request to http://observer-kb-http.elastic-cluster.svc.cluster.local:5601/api/status fails: fail to execute the HTTP GET request: Get "http://observer-kb-http.elastic-cluster.svc.cluster.local:5601/api/status": dial tcp 10.99.79.112:5601: connect: connection refused. Response: .

[iap@iap01 ~]$ k logs metricbeat-22m9l -n kube-system

…

2020-12-02T04:56:41.340Z ERROR instance/beat.go:951 Exiting: error connecting to Kibana: fail to get the Kibana version: HTTP GET request to http://observer-kb-http.elastic-cluster.svc.cluster.local:5601/api/status fails: fail to execute the HTTP GET request: Get "http://observer-kb-http.elastic-cluster.svc.cluster.local:5601/api/status": dial tcp 10.99.79.112:5601: connect: connection refused. Response: .

[iap@iap01 ~]$
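Both Beats fail for the same reason: at startup they request Kibana's `/api/status` and exit when the connection is refused. The sketch below pulls the probed URL and the low-level error out of such log lines. The function name is hypothetical, and the sed pattern is tied to the message shape shown in the logs above (`Get "<url>": <error>. Response: ...`), so it may need adjusting for other Beats versions.

```shell
#!/bin/sh
# Hypothetical helper: from Beats logs on stdin, print "<url> -> <error>"
# once for each distinct 'error connecting to Kibana' failure.
# Assumes the message format seen above: ... Get "<url>": <error>. Response: ...
kibana_connect_errors() {
  grep 'error connecting to Kibana' |
    sed -n 's/.*Get "\([^"]*\)": \(.*\)\. Response: .*/\1 -> \2/p' |
    sort -u
}

# Example (needs cluster access):
# kubectl logs filebeat-4nbwm -n kube-system | kibana_connect_errors
```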

[iap@iap01 ~]$ k get pod -n elastic-cluster | grep observer-kb

observer-kb-d6d57648b-n7bf5 0/1 Running 0 19h

[iap@iap01 ~]$ k describe pod observer-kb-d6d57648b-n7bf5 -n elastic-cluster | grep Events -A10

Events:

Type Reason Age From Message

---- ------ ---- ---- -------

Warning Unhealthy 112s (x6867 over 19h) kubelet, iap11 Readiness probe failed: HTTP probe failed with statuscode: 503

[iap@iap01 ~]$ k logs observer-kb-d6d57648b-n7bf5 -n elastic-cluster

…

{"type":"log","@timestamp":"2020-12-01T10:03:13Z","tags":["warning","savedobjects-service"],"pid":6,"message":"Unable to connect to Elasticsearch. Error: [search_phase_execution_exception] all shards failed"}

[iap@iap01 elastic-cloud-kubernetes]$ k exec observer-kb-d6d57648b-n7bf5 -n elastic-cluster -it -- bash

bash-4.2$ curl http://localhost:5601/

Kibana server is not ready yet

bash-4.2$ exit

[iap@iap01 ~]$
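Because the Beats exit as soon as Kibana refuses the connection, it can help to wait until Kibana actually answers before restarting them. This is a minimal POSIX sh sketch, assuming the probed URL returns HTTP 200 once Kibana is ready (while "not ready yet" it answered 503, as the readiness probe events above show). The function names, attempt count and delay are illustrative, not part of the original procedure.

```shell
#!/bin/sh
# Print the HTTP status code for a URL; curl reports 000 when no HTTP
# response is received (e.g. connection refused). '|| true' keeps a
# failed connection from aborting scripts that run with 'set -e'.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1" || true
}

# Hypothetical helper: poll URL until it returns HTTP 200.
# Usage: wait_for_http_200 URL [ATTEMPTS] [DELAY_SECONDS]
# Returns 0 once ready, 1 if every attempt failed.
wait_for_http_200() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    code="$(http_status "$url")"
    if [ "$code" = "200" ]; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    echo "attempt $i: HTTP $code" >&2
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example (from a host that can resolve the Kibana service):
# wait_for_http_200 http://observer-kb-http.elastic-cluster.svc.cluster.local:5601/api/status
```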

[iap@iap01 elastic-cloud-kubernetes]$ k logs observer-es-master-nodes-2 -n elastic-cluster -f

…

{"type": "server", "timestamp": "2020-12-01T10:02:25,278Z", "level": "WARN", "component": "r.suppressed", "cluster.name": "observer", "node.name": "observer-es-master-nodes-2", "message": "path: /.kibana_task_manager/_count, params: {index=.kibana_task_manager}", "cluster.uuid": "7lYtO5R_R7muGZfwY1smrg", "node.id": "6MTg_Jo9SVWgOjXkA6e8yA" ,

"stacktrace": ["org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed",

"at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:551) [elasticsearch-7.9.1.jar:7.9.1]",

"at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:309) [elasticsearch-7.9.1.jar:7.9.1]",

"at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:582) [elasticsearch-7.9.1.jar:7.9.1]",

"at org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:393) [elasticsearch-7.9.1.jar:7.9.1]",

"at org.elasticsearch.action.search.AbstractSearchAsyncAction.lambda$performPhaseOnShard$0(AbstractSearchAsyncAction.java:223) [elasticsearch-7.9.1.jar:7.9.1]",

"at org.elasticsearch.action.search.AbstractSearchAsyncAction$2.doRun(AbstractSearchAsyncAction.java:288) [elasticsearch-7.9.1.jar:7.9.1]",

"at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.9.1.jar:7.9.1]",

"at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) [elasticsearch-7.9.1.jar:7.9.1]",

"at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:710) [elasticsearch-7.9.1.jar:7.9.1]",

"at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.9.1.jar:7.9.1]",

"at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]",

"at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]",

"at java.lang.Thread.run(Thread.java:832) [?:?]"] }

[iap@iap01 elastic-cloud-kubernetes]$ curl -u elastic:E2dUu6FGq6rT9484ovh5y18c http://14.52.244.134:31313/_cat/shards/_all?pretty=true -s | grep ".kibana_task_manager"

.kibana_task_manager_1 0 p UNASSIGNED

.kibana_task_manager_1 0 r UNASSIGNED

[iap@iap01 elastic-cloud-kubernetes]$ curl -u elastic:E2dUu6FGq6rT9484ovh5y18c http://14.52.244.134:31313/_cat/shards?h=index,shard,prirep,state,unassigned.reason -s | grep .kibana_task_manager_1

.kibana_task_manager_1 0 p UNASSIGNED ALLOCATION_FAILED

.kibana_task_manager_1 0 r UNASSIGNED REPLICA_ADDED

[iap@iap01 elastic-cloud-kubernetes]$
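The second `_cat/shards` query above can be turned into a reusable filter that lists only the unassigned shards and their reasons. A minimal sketch, assuming the same `h=index,shard,prirep,state,unassigned.reason` column list so that field 4 is the state and field 5 the reason; the function name and the `ES_URL`/`ES_PASSWORD` variables are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print "<index> <shard> <prirep> <reason>" for every
# UNASSIGNED shard. Expects `_cat/shards` output requested with
#   ?h=index,shard,prirep,state,unassigned.reason
# A missing reason is printed as "-".
unassigned_shards() {
  awk '$4 == "UNASSIGNED" { print $1, $2, $3, ($5 == "" ? "-" : $5) }'
}

# Example (ES_URL and ES_PASSWORD are placeholders):
# curl -s -u "elastic:$ES_PASSWORD" \
#   "$ES_URL/_cat/shards?h=index,shard,prirep,state,unassigned.reason" | unassigned_shards
```

In this incident the primary is the one that matters (ALLOCATION_FAILED); a replica cannot be allocated while its primary is unassigned, so the REPLICA_ADDED entry follows from the primary's failure.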

c. Solution: delete the .kibana_task_manager index

Reference: https://www.datadoghq.com/blog/elasticsearch-unassigned-shards/

[iap@iap01 elastic-cloud-kubernetes]$ curl -u elastic:E2dUu6FGq6rT9484ovh5y18c -XDELETE http://14.52.244.134:31313/.kibana_task_manager_1/ -s

{"acknowledged":true}

[iap@iap01 elastic-cloud-kubernetes]$ curl -u elastic:E2dUu6FGq6rT9484ovh5y18c http://14.52.244.134:31313/_cat/shards/.kibana_task_manager?pretty=true -s

.kibana_task_manager_1 0 p STARTED 6 78.1kb 10.244.7.208 observer-es-data-nodes-9

.kibana_task_manager_1 0 r STARTED 6 60.5kb 10.244.9.235 observer-es-data-nodes-2
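The check above (both shards of .kibana_task_manager_1 back in STARTED) can be scripted so it returns a pass/fail exit code. A minimal sketch, assuming the default `_cat/shards` column order (index shard prirep state ...); the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 only if INDEX appears in the `_cat/shards`
# output on stdin and every one of its shards is STARTED.
index_all_started() {
  awk -v idx="$1" '
    $1 == idx { seen = 1; if ($4 != "STARTED") bad = 1 }
    END { if (seen && !bad) exit 0; exit 1 }
  '
}

# Example (ES_URL and ES_PASSWORD are placeholders):
# curl -s -u "elastic:$ES_PASSWORD" "$ES_URL/_cat/shards/.kibana_task_manager" |
#   index_all_started .kibana_task_manager_1 && echo "all shards STARTED"
```

Requiring the index to be present at all (`seen`) avoids reporting success when the query returns nothing, for example right after the DELETE and before the index has been recreated.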

[iap@iap01 elastic-cloud-kubernetes]$ k get pod -n elastic-cluster | grep observer-kb

observer-kb-d6d57648b-qnlf6 1/1 Running 1 18m

[iap@iap01 elastic-cloud-kubernetes]$ k rollout restart daemonset filebeat metricbeat -n kube-system

daemonset.apps/filebeat restarted

daemonset.apps/metricbeat restarted

[iap@iap01 elastic-cloud-kubernetes]$
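After the rollout restart, one way to confirm that the Beats came back is to compare the DESIRED and READY counts of both DaemonSets. A minimal sketch, assuming the default `kubectl get daemonset --no-headers` columns (NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE-SELECTOR AGE); the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 only if every DaemonSet row on stdin has
# READY (field 4) equal to DESIRED (field 2) and DESIRED > 0.
# Rows that are not ready are reported as "<name>: <ready>/<desired> ready".
daemonsets_ready() {
  awk '
    NF > 0 {
      rows++
      if ($2 == 0 || $2 != $4) { print $1 ": " $4 "/" $2 " ready"; bad = 1 }
    }
    END { if (rows > 0 && !bad) exit 0; exit 1 }
  '
}

# Example (needs cluster access):
# kubectl get daemonset filebeat metricbeat -n kube-system --no-headers | daemonsets_ready
# Alternatively, `kubectl rollout status daemonset/filebeat -n kube-system`
# blocks until that single DaemonSet's rollout has finished.
```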
