
Elastic Observability - filebeat/metricbeat Pod Errors

by 여행을 떠나자! 2021. 9. 15.

2020.11.30

 

a. Problem: Elastic Observability - filebeat/metricbeat pods fail with CrashLoopBackOff

    - Environment

      ECK(Elastic Cloud on Kubernetes) 1.2.1 + Elasticsearch/Kibana 7.9.1, Kubernetes 1.16.15

 

  [iap@iap01 elastic-cloud-kubernetes]$ k get pod -n kube-system | egrep "filebeat|metricbeat"

  filebeat-2w85f          0/1     CrashLoopBackOff   13         46m

  filebeat-4nbwm          0/1     CrashLoopBackOff   307        25h

  filebeat-59wfj          0/1     CrashLoopBackOff   307        25h

  … 

 

 

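b. Cause analysis: the shards of the .kibana_task_manager index are UNASSIGNED

     - Chain of failures

       The Beats pods exit because they cannot connect to Kibana. Kibana is not ready (its readiness probe returns 503) because its requests to Elasticsearch fail with "all shards failed". Those requests fail because both the primary and the replica shard of .kibana_task_manager_1 are UNASSIGNED.

     - state

       UNASSIGNED: The shard is not assigned to any node.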

 

  [iap@iap01 ~]$ k logs filebeat-4nbwm -n kube-system

  …

  2020-12-02T05:01:50.741Z  ERROR   instance/beat.go:951    Exiting: error connecting to Kibana: fail to get the Kibana version: HTTP GET request to http://observer-kb-http.elastic-cluster.svc.cluster.local:5601/api/status fails: fail to execute the HTTP GET request: Get "http://observer-kb-http.elastic-cluster.svc.cluster.local:5601/api/status": dial tcp 10.99.79.112:5601: connect: connection refused. Response: .

  Exiting: error connecting to Kibana: fail to get the Kibana version: HTTP GET request to http://observer-kb-http.elastic-cluster.svc.cluster.local:5601/api/status fails: fail to execute the HTTP GET request: Get "http://observer-kb-http.elastic-cluster.svc.cluster.local:5601/api/status": dial tcp 10.99.79.112:5601: connect: connection refused. Response: .

  [iap@iap01 ~]$ k logs metricbeat-22m9l -n kube-system

  …

  2020-12-02T04:56:41.340Z  ERROR   instance/beat.go:951    Exiting: error connecting to Kibana: fail to get the Kibana version: HTTP GET request to http://observer-kb-http.elastic-cluster.svc.cluster.local:5601/api/status fails: fail to execute the HTTP GET request: Get "http://observer-kb-http.elastic-cluster.svc.cluster.local:5601/api/status": dial tcp 10.99.79.112:5601: connect: connection refused. Response: .

  Exiting: error connecting to Kibana: fail to get the Kibana version: HTTP GET request to http://observer-kb-http.elastic-cluster.svc.cluster.local:5601/api/status fails: fail to execute the HTTP GET request: Get "http://observer-kb-http.elastic-cluster.svc.cluster.local:5601/api/status": dial tcp 10.99.79.112:5601: connect: connection refused. Response: .

  [iap@iap01 ~]$

  [iap@iap01 ~]$ k get pod -n elastic-cluster | grep observer-kb

  observer-kb-d6d57648b-n7bf5   0/1     Running   0          19h

  [iap@iap01 ~]$ k describe pod observer-kb-d6d57648b-n7bf5 -n elastic-cluster | grep Events -A10

  Events:

    Type     Reason     Age                    From            Message

    ----     ------     ----                   ----            -------

    Warning  Unhealthy  112s (x6867 over 19h)  kubelet, iap11  Readiness probe failed: HTTP probe failed with statuscode: 503

  [iap@iap01 ~]$ k logs observer-kb-d6d57648b-n7bf5 -n elastic-cluster

  …

  {"type":"log","@timestamp":"2020-12-01T10:03:13Z","tags":["warning","savedobjects-service"],"pid":6,"message":"Unable to connect to Elasticsearch. Error: [search_phase_execution_exception] all shards failed"}

  [iap@iap01 elastic-cloud-kubernetes]$ k exec observer-kb-d6d57648b-n7bf5 -n elastic-cluster -it -- bash

  bash-4.2$ curl http://localhost:5601/

  Kibana server is not ready yet

  bash-4.2$ exit

  [iap@iap01 ~]$

  [iap@iap01 elastic-cloud-kubernetes]$ k logs observer-es-master-nodes-2 -n elastic-cluster -f

  …

  {"type": "server", "timestamp": "2020-12-01T10:02:25,278Z", "level": "WARN", "component": "r.suppressed", "cluster.name": "observer", "node.name": "observer-es-master-nodes-2", "message": "path: /.kibana_task_manager/_count, params: {index=.kibana_task_manager}", "cluster.uuid": "7lYtO5R_R7muGZfwY1smrg", "node.id": "6MTg_Jo9SVWgOjXkA6e8yA" ,

  "stacktrace": ["org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed",

  "at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:551) [elasticsearch-7.9.1.jar:7.9.1]",

  "at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:309) [elasticsearch-7.9.1.jar:7.9.1]",

  "at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:582) [elasticsearch-7.9.1.jar:7.9.1]",

  "at org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:393) [elasticsearch-7.9.1.jar:7.9.1]",

  "at org.elasticsearch.action.search.AbstractSearchAsyncAction.lambda$performPhaseOnShard$0(AbstractSearchAsyncAction.java:223) [elasticsearch-7.9.1.jar:7.9.1]",

  "at org.elasticsearch.action.search.AbstractSearchAsyncAction$2.doRun(AbstractSearchAsyncAction.java:288) [elasticsearch-7.9.1.jar:7.9.1]",

  "at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.9.1.jar:7.9.1]",

  "at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) [elasticsearch-7.9.1.jar:7.9.1]",

  "at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:710) [elasticsearch-7.9.1.jar:7.9.1]",

  "at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.9.1.jar:7.9.1]",

  "at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]",

  "at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]",

  "at java.lang.Thread.run(Thread.java:832) [?:?]"] }

  [iap@iap01 elastic-cloud-kubernetes]$ curl -u elastic:E2dUu6FGq6rT9484ovh5y18c http://14.52.244.134:31313/_cat/shards/_all?pretty=true -s | grep ".kibana_task_manager"

  .kibana_task_manager_1             0 p UNASSIGNED

  .kibana_task_manager_1             0 r UNASSIGNED

  [iap@iap01 elastic-cloud-kubernetes]$ curl -u elastic:E2dUu6FGq6rT9484ovh5y18c http://14.52.244.134:31313/_cat/shards?h=index,shard,prirep,state,unassigned.reason -s | grep .kibana_task_manager_1

  .kibana_task_manager_1             0 p UNASSIGNED ALLOCATION_FAILED

  .kibana_task_manager_1             0 r UNASSIGNED REPLICA_ADDED

  [iap@iap01 elastic-cloud-kubernetes]$
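
  The grep filter used above can be generalized a little. The following is a minimal sketch, not part of the original troubleshooting session: a hypothetical helper named unassigned_shards that reads _cat/shards output in the column order used above (index, shard, prirep, state, unassigned.reason) and prints only the UNASSIGNED rows together with their reason. The .kibana_1 row is made-up sample data, added only to show that healthy shards are skipped.

```shell
#!/usr/bin/env bash

# unassigned_shards (hypothetical helper, not an Elasticsearch tool):
# reads lines in the column order
#   index shard prirep state unassigned.reason
# and prints index, shard, prirep, and reason for every UNASSIGNED shard.
unassigned_shards() {
  awk '$4 == "UNASSIGNED" { print $1, $2, $3, $5 }'
}

# The first two lines are copied from the output above;
# the .kibana_1 line is invented sample data for a healthy shard.
printf '%s\n' \
  '.kibana_task_manager_1             0 p UNASSIGNED ALLOCATION_FAILED' \
  '.kibana_task_manager_1             0 r UNASSIGNED REPLICA_ADDED' \
  '.kibana_1                          0 p STARTED' \
  | unassigned_shards
```

  Against a live cluster, the printf sample would be replaced by the curl call to _cat/shards?h=index,shard,prirep,state,unassigned.reason shown above, piped into the same function.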

 

 

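c. Solution: delete the .kibana_task_manager_1 index

  After the deletion, the index appears again with both shards STARTED, the Kibana pod becomes Ready (1/1), and the filebeat and metricbeat DaemonSets are then restarted.

  Reference: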

  https://www.datadoghq.com/blog/elasticsearch-unassigned-shards/

 

  [iap@iap01 elastic-cloud-kubernetes]$ curl -u elastic:E2dUu6FGq6rT9484ovh5y18c -XDELETE http://14.52.244.134:31313/.kibana_task_manager_1/ -s

  {"acknowledged":true}

  [iap@iap01 elastic-cloud-kubernetes]$ curl -u elastic:E2dUu6FGq6rT9484ovh5y18c http://14.52.244.134:31313/_cat/shards/.kibana_task_manager?pretty=true -s

  .kibana_task_manager_1 0 p STARTED 6 78.1kb 10.244.7.208 observer-es-data-nodes-9

  .kibana_task_manager_1 0 r STARTED 6 60.5kb 10.244.9.235 observer-es-data-nodes-2

  [iap@iap01 elastic-cloud-kubernetes]$ k get pod -n elastic-cluster | grep observer-kb

  observer-kb-d6d57648b-qnlf6   1/1     Running   1          18m

  [iap@iap01 elastic-cloud-kubernetes]$ k rollout restart daemonset filebeat metricbeat -n kube-system

  daemonset.apps/filebeat restarted

  daemonset.apps/metricbeat restarted

  [iap@iap01 elastic-cloud-kubernetes]$
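
  Because the Beats pods exit whenever Kibana refuses the connection, it can help to confirm that Kibana answers before restarting the DaemonSets. The following is a sketch under stated assumptions, not something that was run in the session above: wait_until is a hypothetical retry helper, and the retry count and delay are arbitrary example values.

```shell
#!/usr/bin/env bash

# wait_until MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until it exits 0,
# giving up after MAX_TRIES failed attempts.
wait_until() {
  local max_tries=$1 delay=$2
  shift 2
  local attempt
  for ((attempt = 1; attempt <= max_tries; attempt++)); do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if ((attempt < max_tries)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Example of intended use (left commented out; the service URL from the
# logs above only resolves inside the cluster, and 30 tries with a
# 10-second delay are assumed values):
#
#   wait_until 30 10 curl -s -f -o /dev/null \
#     http://observer-kb-http.elastic-cluster.svc.cluster.local:5601/api/status \
#     && kubectl rollout restart daemonset filebeat metricbeat -n kube-system

# Trivial local demonstration with a command that always succeeds.
wait_until 3 0 true && echo "ready"
```

  curl -f makes curl exit non-zero on HTTP error responses such as the 503 seen in the readiness probe, so the loop keeps retrying until Kibana reports a successful status.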
