
Rook Ceph Setup

by 여행을 떠나자! 2021. 9. 15.

2020.05.25

1. Environment

    - Rook v1.3.6, Ceph image version 14.2.10-0 (Nautilus), cephcsi v2.1.2, Kubernetes 1.16.15, CentOS 7.8

 

 

2. Rook / Ceph?

    - Rook is an open source cloud-native storage orchestrator, providing the platform, framework, and support for a diverse set of storage solutions to natively integrate with cloud-native environments.

      https://rook.io/docs/rook/v1.3/ceph-examples.html

    - Ceph CSI (Container Storage Interface)

        Ceph CSI plugins implement an interface between CSI enabled Container Orchestrator (CO) and Ceph cluster. It allows dynamically provisioning Ceph volumes and attaching them to workloads.

     - Independent CSI plugins are provided to support RBD and CephFS backed volumes

 

    - Ceph is an open-source software storage platform that implements object storage on a single distributed computer cluster and provides 3-in-1 interfaces for object-, block-, and file-level storage.

      https://www.cloudops.com/2019/05/the-ultimate-rook-and-ceph-survival-guide/ ***

     https://docs.ceph.com/docs/master/architecture/

    - Key daemons (a quick way to see them as Rook pods is sketched below)

      ceph-mon: Cluster monitor. Checks for active/failed nodes and maintains the master copy of the Ceph storage cluster map.

      ceph-mds: Metadata server. Stores metadata for inodes and directories (directory and file names in the filesystem and their mapping to the objects stored in the RADOS cluster).

      ceph-osd: Object storage daemon. Stores the actual data and also checks the state of OSDs and reports it to the monitors.

      ceph-rgw: RESTful gateway. The interface that exposes the object storage layer externally.
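
      Since Rook runs each of these daemons as a pod, a quick way to map them onto a running cluster is to list the pods by their app labels - a minimal sketch, assuming the default rook-ceph namespace used throughout this post:

   $ kubectl -n rook-ceph get pod -l app=rook-ceph-mon     # cluster monitors
   $ kubectl -n rook-ceph get pod -l app=rook-ceph-mgr     # manager daemons
   $ kubectl -n rook-ceph get pod -l app=rook-ceph-osd     # object storage daemons
   $ kubectl -n rook-ceph get pod -l app=rook-ceph-mds     # metadata servers (present only after a CephFilesystem exists)
   $ kubectl -n rook-ceph get pod -l app=rook-ceph-rgw     # RADOS gateways (present only after a CephObjectStore exists)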

 

 

3. Ceph Prerequisites

    https://rook.io/docs/rook/v1.3/ceph-prerequisites.html

    - Ceph OSDs have a dependency on LVM in the following scenarios:

      OSDs are created on raw devices or partitions

      If encryption is enabled (encryptedDevice: true in the cluster CR)

      A metadata device is specified

   $ sudo yum install -y lvm2

    - Ceph requires a Linux kernel built with the RBD (RADOS Block Device) module. RADOS: Reliable Autonomic Distributed Object Store.

   # lsmod | grep rbd

   # modprobe rbd

    - If you will be creating volumes from a Ceph shared file system (CephFS), the recommended minimum kernel version is 4.17

      If you have a kernel version less than 4.17, the requested PVC sizes will not be enforced; storage quotas are only enforced on newer kernels (a quick check is sketched below).
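
      A quick pre-check (a sketch; 4.17 is the threshold quoted above, and the modprobe line is optional, analogous to the rbd check above):

   # uname -r                                # CephFS quota (PVC size) enforcement needs a 4.17+ kernel
   # modprobe ceph && lsmod | grep ceph      # verify the CephFS kernel client module is available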

    - NTP configuration (for on-prem environments)

   # yum install chrony -y

   # vi /etc/chrony.conf          # if the servers sit inside KT GiGATechHub, change these servers and request a firewall opening

   server time.google.com iburst

   server time.kriss.re.kr iburst

   server time.bora.net iburst

   … 

   # systemctl enable chronyd && systemctl start chronyd

   # chronyc sources

   … 

   # timedatectl set-timezone Asia/Seoul

   - When reinstalling Rook Ceph, run the following on each worker node

   # rm -rf /var/lib/rook/ /var/lib/kubelet/plugins/rook-ceph* /var/lib/kubelet/plugins_registry/rook-ceph*

   # lvscan

   … 

   # lvremove ceph-…               # remove the LVs that Ceph created

   # vgremove 

   # pvremove … 

   # wipefs -a /dev/sdc            # if a 'Device or resource busy' error occurs, reboot and run again

 

 

4. Rook Install

    - Ceph Storage Quickstart

    - https://rook.io/docs/rook/v1.3/ceph-quickstart.html

   - https://ruzickap.github.io/k8s-istio-workshop/lab-04/

 

    a. Download YAML files

   [ysjeon71_kubeflow3@master ~]$ git clone --single-branch -b release-1.3 https://github.com/rook/rook.git

   …

   [ysjeon71_kubeflow3@master ~]$ cd ~/rook/cluster/examples/kubernetes/ceph/

 

    b. Common Resources

   [ysjeon71_kubeflow3@master ~]$ kubectl apply -f common.yaml

   …

   [ysjeon71_kubeflow3@master ~]$

 

    c. Operator

   [ysjeon71_kubeflow3@master ~]$ vi operator.yaml

   …

        # Whether to start pods as privileged that mount a host path, which includes the Ceph mon and osd pods.

        # This is necessary to workaround the anyuid issues when running on OpenShift.

        # For more details see https://github.com/rook/rook/issues/1314#issuecomment-355799641

        - name: ROOK_HOSTPATH_REQUIRES_PRIVILEGED

          value: "true"    # false를 true로 변경

   …

   [ysjeon71_kubeflow3@master ~]$ kubectl apply -f operator.yaml

   configmap/rook-ceph-operator-config created

   deployment.apps/rook-ceph-operator created

   [ysjeon71_kubeflow3@master ~]$

 

      ## verify the rook-ceph-operator is in the `Running` state before proceeding

   [ysjeon71_kubeflow3@master ~]$ k get pod -o wide -n rook-ceph

   NAME                                  READY  STATUS   RESTARTS  AGE    IP           NODE      NOMINATED NODE  READINESS GATES

   rook-ceph-operator-7d99d768f4-bs8q9   1/1    Running  0         2m19s  10.46.0.1    worker-3  <none>          <none>

   rook-discover-h6s7b                   1/1    Running  0         103s   10.32.0.17   worker-1  <none>          <none>

   rook-discover-mvqgs                   1/1    Running  0         103s   10.38.0.16   worker-2  <none>          <none>

   rook-discover-t5d2h                   1/1    Running  0         103s   10.46.0.2    worker-3  <none>          <none>

   [ysjeon71_kubeflow3@master ~]$

 

    d. Cluster CRD

       - Let's create the Ceph storage cluster. Rook ships several example cluster manifests (an abridged sketch of cluster.yaml follows the list below):

    • cluster.yaml: Cluster settings for a production cluster running on bare metal

    • cluster-on-pvc.yaml: Cluster settings for a production cluster running in a dynamic cloud environment

    • cluster-test.yaml: Cluster settings for a test environment such as minikube.
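
       For reference, an abridged sketch of the cluster.yaml fields that matter most for this setup; the values reflect this post's environment (Ceph Nautilus, all nodes and devices) and the release-1.3 defaults as I recall them, so check the file you cloned before applying:

    apiVersion: ceph.rook.io/v1
    kind: CephCluster
    metadata:
      name: rook-ceph
      namespace: rook-ceph
    spec:
      cephVersion:
        image: ceph/ceph:v14.2.10
      dataDirHostPath: /var/lib/rook    # the host path cleaned up in section 3 when reinstalling
      mon:
        count: 3
        allowMultiplePerNode: false
      dashboard:
        enabled: true                   # adjusted again in section 6
      storage:
        useAllNodes: true
        useAllDevices: true             # Rook skips devices that already carry a filesystem (see section 9, Case #2)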

 

   [ysjeon71_kubeflow3@master ~]$ kubectl apply -f cluster.yaml

  …

  [ysjeon71_kubeflow3@master ~]$

  [ysjeon71_kubeflow3@master ~]$ k get pods -n rook-ceph -o wide | cut -c-115 | egrep "NAME|rook-ceph-mgr|rook-ceph-mon|ook-ceph-osd-[0-9]" 

  NAME                                  READY   STATUS      RESTARTS   AGE     IP           NODE

  rook-ceph-mgr-a-f55d5d7-zvr45         1/1     Running     1          4h45m   10.32.0.20   worker-1

  rook-ceph-mon-a-69c75c5bdd-ktrq9      1/1     Running     1          22h     10.38.0.17   worker-2

  rook-ceph-mon-d-697d6bf679-5crf2      1/1     Running     0          4h45m   10.32.0.18   worker-1

  rook-ceph-mon-e-5ccfcb6fcf-xkhx9      1/1     Running     0          4h45m   10.46.0.5    worker-3

  rook-ceph-osd-0-7bdf4b8597-xcnx4      1/1     Running     0          31m     10.32.0.21   worker-1

  rook-ceph-osd-1-7dc576d9fb-qsnss      1/1     Running     0          9m46s   10.38.0.22   worker-2

  rook-ceph-osd-2-565bbb875f-rztrb      1/1     Running     0          93s     10.46.0.7    worker-3

  [ysjeon71_kubeflow3@master ~]$ k logs rook-ceph-operator-7fc446864f-v6mgt -n rook-ceph | egrep "E \||W \|"

  2020-05-08 04:47:47.409147 W | ceph-csi: CSI Block volume expansion requires Kubernetes version >=1.16.0

  [ysjeon71_kubeflow3@master ~]$

 

 

5. Rook Toolbox

    - The Rook toolbox is a container with common tools used for rook debugging and testing.

    -  https://rook.io/docs/rook/v1.3/ceph-toolbox.html

  [ysjeon71_kubeflow3@master ceph]$ k apply -f toolbox.yaml

 deployment.apps/rook-ceph-tools created

  [ysjeon71_kubeflow3@master ceph]$ kubectl -n rook-ceph get pod -l "app=rook-ceph-tools"

 NAME                               READY   STATUS    RESTARTS   AGE

  rook-ceph-tools-58df7d6b5c-j4gfv   1/1     Running   0          32s

  [ysjeon71_kubeflow3@master ceph]$

  [ysjeon71_kubeflow3@master ceph]$ kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash

  [root@rook-ceph-tools-58df7d6b5c-j4gfv /]# ceph status

    cluster:

      id:     cb0faddd-9ceb-4ee8-87ff-e4ade239182d

      health: HEALTH_OK

    services:

      mon: 3 daemons, quorum a,d,e (age 4h)

      mgr: a(active, since 3m)

      osd: 3 osds: 3 up (since 2m), 3 in (since 2m)

    data:

      pools:   0 pools, 0 pgs

      objects: 0 objects, 0 B

      usage:   3.0 GiB used, 27 GiB / 30 GiB avail

      pgs:

  [root@rook-ceph-tools-58df7d6b5c-j4gfv /]# ceph osd status

  +----+----------+-------+-------+--------+---------+--------+---------+-----------+

  | id |   host   |  used | avail | wr ops | wr data | rd ops | rd data |   state   |

  +----+----------+-------+-------+--------+---------+--------+---------+-----------+

  | 0  | worker-1 | 1025M | 9210M |    0   |     0   |    0   |     0   | exists,up |

  | 1  | worker-2 | 1025M | 9210M |    0   |     0   |    0   |     0   | exists,up |

  | 2  | worker-3 | 1025M | 9210M |    0   |     0   |    0   |     0   | exists,up |

  +----+----------+-------+-------+--------+---------+--------+---------+-----------+

  [root@rook-ceph-tools-58df7d6b5c-j4gfv /]# ceph df

  RAW STORAGE:

    CLASS     SIZE       AVAIL      USED        RAW USED     %RAW USED

    hdd       30 GiB     27 GiB     5.4 MiB      3.0 GiB         10.02

    TOTAL     30 GiB     27 GiB     5.4 MiB      3.0 GiB         10.02

  POOLS:

    POOL     ID     STORED     OBJECTS     USED     %USED     MAX AVAIL

  [root@rook-ceph-tools-58df7d6b5c-j4gfv /]# rados df

  POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR USED COMPR UNDER COMPR

  total_objects    0

  total_used       3.0 GiB

  total_avail      27 GiB

  total_space      30 GiB

  [root@rook-ceph-tools-58df7d6b5c-j4gfv /]#

 

 

6. Ceph Dashboard

    - https://rook.io/docs/rook/v1.3/ceph-dashboard.html

  [ysjeon71_kubeflow3@master ~]$ k edit CephCluster rook-ceph -n rook-ceph 

  …

  dashboard:

    enabled: true

    ssl: false                    # changed from true to false

    urlPrefix: /ceph-dashboard    # added

  …

  [ysjeon71_kubeflow3@master ceph]$ cd ~/rook/cluster/examples/kubernetes/ceph/

  [ysjeon71_kubeflow3@master ceph]$ ls dashboard-external-http*

  dashboard-external-http.yaml  dashboard-external-https.yaml

  [ysjeon71_kubeflow3@master ceph]$ k apply -f dashboard-external-http.yaml

  service/rook-ceph-mgr-dashboard-external-http created

  [ysjeon71_kubeflow3@master ceph]$

  [ysjeon71_kubeflow3@master ceph]$ kubectl -n rook-ceph get service | egrep "NAME|rook-ceph-mgr-dashboard-external"

  NAME                                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE

  rook-ceph-mgr-dashboard-external-http   NodePort    10.110.229.161   <none>        7000:32667/TCP      3m44s

  [ysjeon71_kubeflow3@master ceph]$

 

  yoosungjeon@ysjeon-MacBook-Pro ~ % gcloud compute firewall-rules create allow-rook-ceph-mgr-dashboard-rule --allow=tcp:32667

  yoosungjeon@ysjeon-MacBook-Pro ~ % gcloud compute instances list

  NAME      ZONE        MACHINE_TYPE               PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP    STATUS

  master    us-east1-b  n1-standard-2                           10.142.0.2   34.75.83.180   RUNNING

  worker-1  us-east1-c  custom (1 vCPU, 4.75 GiB)               10.142.0.3   35.227.3.25    RUNNING

  worker-2  us-east1-d  custom (1 vCPU, 4.75 GiB)               10.142.0.4   34.75.168.175  RUNNING

  worker-3  us-east1-d  n1-standard-1                           10.142.0.5   34.75.65.90    RUNNING

  yoosungjeon@ysjeon-MacBook-Pro ~ %

 

  URL:

    http://35.227.3.25:32667/ceph-dashboard/

       Username: admin

       Password: Jq-|.Fgu"g@;I:T1*:@\

  [ysjeon71_kubeflow3@master ~]$ kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo

  Jq-|.Fgu"g@;I:T1*:@\

  [ysjeon71_kubeflow3@master ~]$

    

7. Storage

    - Ceph storage

   • Block: Create block storage to be consumed by a pod

   • Shared Filesystem: Create a filesystem to be shared across multiple pods

   • Object: Create an object store that is accessible inside or outside the Kubernetes cluster

 

    a. Block

       https://rook.io/docs/rook/v1.3/ceph-block.html

        CephBlockPool -> storageClass -> PersistentVolumeClaim -> volumes

        Each OSD must be located on a different node, because the failureDomain is set to host and the replicated.size is set to 3.

 

        - Provision Storage

     [ysjeon71_kubeflow3@master ~]$ cd rook/cluster/examples/kubernetes/ceph/

     [ysjeon71_kubeflow3@master ~]$ vi storageclass.yaml

     apiVersion: ceph.rook.io/v1

     kind: CephBlockPool

     metadata:

       name: replicapool

       namespace: rook-ceph

     spec:

       failureDomain: host

       replicated:

         size: 3

     ---

     apiVersion: storage.k8s.io/v1

     kind: StorageClass

     metadata:

        name: rook-ceph-block

     # Change "rook-ceph" provisioner prefix to match the operator namespace if needed

     provisioner: rook-ceph.rbd.csi.ceph.com

     parameters:

         # clusterID is the namespace where the rook cluster is running

         clusterID: rook-ceph

         # Ceph pool into which the RBD image shall be created

         pool: replicapool

         # RBD image format. Defaults to "2".

         imageFormat: "2"

         # RBD image features. Available for imageFormat: "2". CSI RBD currently supports only `layering` feature.

         imageFeatures: layering

         # The secrets contain Ceph admin credentials.

         csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner

         csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph

         csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node

         csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph

         # Specify the filesystem type of the volume. If not specified, csi-provisioner

         # will set default as `ext4`.

         csi.storage.k8s.io/fstype: xfs

     # Delete the rbd volume when a PVC is deleted

     reclaimPolicy: Delete

      [ysjeon71_kubeflow3@master ~]$ k apply -f storageclass.yaml
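
         Before wiring this into an application, a minimal standalone PVC against the rook-ceph-block StorageClass looks like the sketch below (the name test-rbd-pvc is illustrative):

      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: test-rbd-pvc
      spec:
        storageClassName: rook-ceph-block
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi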

 

        - Consume the storage: Wordpress sample

     [ysjeon71_kubeflow3@master ceph]$ cd ~/rook/cluster/examples/kubernetes/

     [ysjeon71_kubeflow3@master ceph]$ cat mysql.yaml

     …

     ---

     apiVersion: v1

     kind: PersistentVolumeClaim

     metadata:

       name: mysql-pv-claim

       labels:

         app: wordpress

     spec:

       storageClassName: rook-ceph-block

       accessModes:

       - ReadWriteOnce

       resources:

         requests:

           storage: 1Gi

     ---

     apiVersion: apps/v1

     kind: Deployment

     …

         spec:

           containers:

           - image: mysql:5.6

     …

             volumeMounts:

             - name: mysql-persistent-storage

               mountPath: /var/lib/mysql

           volumes:

           - name: mysql-persistent-storage

             persistentVolumeClaim:

               claimName: mysql-pv-claim

     [ysjeon71_kubeflow3@master kubernetes]$ k apply -f mysql.yaml

     service/wordpress-mysql created

     persistentvolumeclaim/mysql-pv-claim created

     deployment.apps/wordpress-mysql created

     [ysjeon71_kubeflow3@master kubernetes]$ k apply -f wordpress.yaml

     service/wordpress created

     persistentvolumeclaim/wp-pv-claim created

     deployment.apps/wordpress created

     [ysjeon71_kubeflow3@master kubernetes]$ k get pvc

     NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE

     mysql-pv-claim   Bound    pvc-17fedd75-3998-4b09-bc13-de112afd8ab7   1Gi        RWO            rook-ceph-block   16m

     wp-pv-claim      Bound    pvc-adf8d37b-8882-4568-badf-fead0f499a34   1Gi        RWO            rook-ceph-block   3m10s

     [ysjeon71_kubeflow3@master kubernetes]$

 

    b. Shared Filesystem

        https://rook.io/docs/rook/v1.3/ceph-filesystem.html

        By default only one shared filesystem can be created with Rook.

        CephFilesystem -> storageClass -> PersistentVolumeClaim -> volumes

 

        - Create the Filesystem

           Create the filesystem by specifying the desired settings for the metadata pool, data pools, and metadata server in the CephFilesystem CRD.

      [ysjeon71_kubeflow3@master ~]$ cd rook/cluster/examples/kubernetes/ceph/

     [ysjeon71_kubeflow3@master ceph]$ cat filesystem.yaml

     apiVersion: ceph.rook.io/v1

     kind: CephFilesystem

     metadata:

       name: myfs

       namespace: rook-ceph

     spec:

       metadataPool:

         replicated:

           size: 3

       dataPools:

         - replicated:

             size: 3

       preservePoolsOnDelete: true

       metadataServer:

         activeCount: 1

         activeStandby: true

     [ysjeon71_kubeflow3@master ceph]$ k apply -f filesystem.yaml

     cephfilesystem.ceph.rook.io/myfs created

     [ysjeon71_kubeflow3@master ceph]$ kubectl -n rook-ceph get pod -l app=rook-ceph-mds

     NAME                                    READY   STATUS    RESTARTS   AGE

     rook-ceph-mds-myfs-a-78d59c77c7-ndps7   1/1     Running   0          10m

     rook-ceph-mds-myfs-b-9b59d896c-q898t    1/1     Running   0          10m

     [ysjeon71_kubeflow3@master ceph]$

 

        - Provision Storage

     [ysjeon71_kubeflow3@master ceph]$ vi storageclass_fs.yaml

     apiVersion: storage.k8s.io/v1

     kind: StorageClass

     metadata:

       name: rook-cephfs

     # Change "rook-ceph" provisioner prefix to match the operator namespace if needed

     provisioner: rook-ceph.cephfs.csi.ceph.com

     parameters:

       # clusterID is the namespace where operator is deployed.

       clusterID: rook-ceph

       # CephFS filesystem name into which the volume shall be created

       fsName: myfs

       # Ceph pool into which the volume shall be created

       # Required for provisionVolume: "true"

       pool: myfs-data0

       # Root path of an existing CephFS volume

       # Required for provisionVolume: "false"

       # rootPath: /absolute/path

       # The secrets contain Ceph admin credentials. These are generated automatically by the operator

       # in the same namespace as the cluster.

       csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner

       csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph

       csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node

       csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph

     reclaimPolicy: Delete

     [ysjeon71_kubeflow3@master ceph]$ k apply -f storageclass_fs.yaml

     storageclass.storage.k8s.io/rook-cephfs created

     [ysjeon71_kubeflow3@master ceph]$ k get StorageClass

     NAME              PROVISIONER                     AGE

     rook-ceph-block   rook-ceph.rbd.csi.ceph.com      3h47m

     rook-cephfs       rook-ceph.cephfs.csi.ceph.com   53s

     [ysjeon71_kubeflow3@master ceph]$

 

        - Consume the Shared Filesystem: K8s Registry Sample

          As an example, we will start the kube-registry pod with the shared filesystem as the backing store

     $ cd ~/rook/cluster/examples/kubernetes/ceph/csi/cephfs

     $ cat kube-registry.yaml

     apiVersion: v1

     kind: PersistentVolumeClaim

     metadata:

       name: cephfs-pvc

     spec:

       accessModes:

       - ReadWriteMany

       resources:

         requests:

           storage: 1Gi

       storageClassName: rook-cephfs

     ---

     apiVersion: apps/v1

     kind: Deployment

     …

         spec:

           containers:

           - name: registry

             image: registry:2

     …

             volumeMounts:

             - name: image-store

               mountPath: /var/lib/registry

     …

           volumes:

           - name: image-store

             persistentVolumeClaim:

               claimName: cephfs-pvc

               readOnly: false

     $

     [ysjeon71_kubeflow3@master cephfs]$ k apply -f kube-registry.yaml

     persistentvolumeclaim/cephfs-pvc created

     deployment.apps/kube-registry created

     [ysjeon71_kubeflow3@master cephfs]$

 

    c. Object storage

       https://rook.io/docs/rook/v1.3/ceph-object.html

        Object storage exposes an S3 API to the storage cluster for applications to put and get data.

        CephObjectStore -> storageClass -> ObjectBucketClaim

 

        - Create an Object Store

     $ cd ~/rook/cluster/examples/kubernetes/ceph

     [ysjeon71_kubeflow3@master ceph]$ k apply -f object.yaml

     cephobjectstore.ceph.rook.io/my-store created

     [ysjeon71_kubeflow3@master ceph]$

     # To confirm the object store is configured, wait for the rgw pod to start

     [ysjeon71_kubeflow3@master ceph]$ kubectl -n rook-ceph get pod -l app=rook-ceph-rgw

     NAME                                        READY   STATUS    RESTARTS   AGE

     rook-ceph-rgw-my-store-a-689745fc44-dnlbm   1/1     Running   0          40s

     [ysjeon71_kubeflow3@master ceph]$

 

        - Create a Bucket

     [ysjeon71_kubeflow3@master ceph]$ k apply -f storageclass-bucket-delete.yaml

     storageclass.storage.k8s.io/rook-ceph-delete-bucket created

     [ysjeon71_kubeflow3@master ceph]$

     A secret and ConfigMap are created with the same name as the OBC and in the same namespace

     [ysjeon71_kubeflow3@master ceph]$ cat object-bucket-claim-delete.yaml

     apiVersion: objectbucket.io/v1alpha1

     kind: ObjectBucketClaim

     metadata:

       name: ceph-delete-bucket

     spec:

       generateBucketName: ceph-bkt

       storageClassName: rook-ceph-delete-bucket

     [ysjeon71_kubeflow3@master ceph]$

     [ysjeon71_kubeflow3@master ceph]$ k apply -f object-bucket-claim-delete.yaml

     objectbucketclaim.objectbucket.io/ceph-delete-bucket created

     [ysjeon71_kubeflow3@master ceph]$

     $ export AWS_HOST=$(kubectl -n default get cm ceph-delete-bucket -o yaml | grep BUCKET_HOST | awk '{print $2}')

     $ export AWS_ACCESS_KEY_ID=$(kubectl -n default get secret ceph-delete-bucket -o yaml | grep AWS_ACCESS_KEY_ID | awk '{print $2}' | base64 --decode)

     $ export AWS_SECRET_ACCESS_KEY=$(kubectl -n default get secret ceph-delete-bucket -o yaml | grep AWS_SECRET_ACCESS_KEY | awk '{print $2}' | base64 --decode)

     $ export AWS_ENDPOINT_IP=$(kubectl -n rook-ceph get svc rook-ceph-rgw-my-store -o yaml | grep clusterIP | awk '{print $2}')

     $ export AWS_ENDPOINT_PORT=$(kubectl -n rook-ceph get svc rook-ceph-rgw-my-store -o yaml | grep -w port | awk '{print $2}')

     $ export AWS_ENDPOINT=$AWS_ENDPOINT_IP:$AWS_ENDPOINT_PORT

     $ export AWS_BUCKET=$(k get ObjectBucketClaim ceph-delete-bucket -o yaml | grep -w bucketName | awk  '{print $2}')

     [ysjeon71_kubeflow3@master ceph]$ echo $AWS_HOST

     rook-ceph-rgw-my-store.rook-ceph

     [ysjeon71_kubeflow3@master ceph]$ echo $AWS_ACCESS_KEY_ID

     TWHS4NPWZRWLI5RQUS6H

     [ysjeon71_kubeflow3@master ceph]$ echo $AWS_SECRET_ACCESS_KEY

     zypu1DtYM4EdeqnpOzvReTFt64Rp1GrDGISGqLvP

     [ysjeon71_kubeflow3@master ceph]$ echo $AWS_ENDPOINT

     10.109.8.217:80

     [ysjeon71_kubeflow3@master ceph]$ echo $AWS_BUCKET

     ceph-bkt-70338ace-9fb9-44a1-8d80-37d20fc1864a

     [ysjeon71_kubeflow3@master ceph]$
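
         The same values can also be pulled with jsonpath instead of grep/awk, which is less fragile; a sketch against the Secret/ConfigMap that the OBC created above (the ConfigMap also carries BUCKET_NAME, if I recall the key names correctly):

      $ export AWS_HOST=$(kubectl -n default get cm ceph-delete-bucket -o jsonpath='{.data.BUCKET_HOST}')
      $ export AWS_ACCESS_KEY_ID=$(kubectl -n default get secret ceph-delete-bucket -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 --decode)
      $ export AWS_SECRET_ACCESS_KEY=$(kubectl -n default get secret ceph-delete-bucket -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}' | base64 --decode)
      $ export AWS_BUCKET=$(kubectl -n default get cm ceph-delete-bucket -o jsonpath='{.data.BUCKET_NAME}')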

 

        - Consume the Object Storage

      To simplify the s3 client commands, you will want to set these environment variables for use by your client (i.e., inside the toolbox).

     [ysjeon71_kubeflow3@master ceph]$ kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash

     [root@rook-ceph-tools-58df7d6b5c-hk88n /]# yum --assumeyes install s3cmd

     …

      [root@rook-ceph-tools-58df7d6b5c-hk88n /]# export AWS_HOST=rook-ceph-rgw-my-store.rook-ceph

     [root@rook-ceph-tools-58df7d6b5c-hk88n /]# export AWS_ENDPOINT=10.109.8.217:80

     [root@rook-ceph-tools-58df7d6b5c-hk88n /]# export AWS_ACCESS_KEY_ID=TWHS4NPWZRWLI5RQUS6H

     [root@rook-ceph-tools-58df7d6b5c-hk88n /]# export AWS_SECRET_ACCESS_KEY=zypu1DtYM4EdeqnpOzvReTFt64Rp1GrDGISGqLvP

     [root@rook-ceph-tools-58df7d6b5c-hk88n /]# export AWS_BUCKET=ceph-bkt-70338ace-9fb9-44a1-8d80-37d20fc1864a

     [root@rook-ceph-tools-58df7d6b5c-hk88n /]# s3cmd put /tmp/rookObj --no-ssl --host=${AWS_HOST} --host-bucket= s3://${AWS_BUCKET}

     upload: '/tmp/rookObj' -> 's3://ceph-bkt-70338ace-9fb9-44a1-8d80-37d20fc1864a/rookObj'  [1 of 1]

      11 of 11   100% in    0s    25.44 B/s  done

     [root@rook-ceph-tools-58df7d6b5c-hk88n /]#

     [root@rook-ceph-tools-58df7d6b5c-hk88n /]# s3cmd get s3://${AWS_BUCKET}/rookObj /tmp/rookObj-download --no-ssl --host=${AWS_HOST} --host-bucket=

     download: 's3://ceph-bkt-70338ace-9fb9-44a1-8d80-37d20fc1864a/rookObj' -> '/tmp/rookObj-download'  [1 of 1]

      11 of 11   100% in    0s   733.53 B/s  done

     [root@rook-ceph-tools-58df7d6b5c-hk88n /]# cat /tmp/rookObj-download

     Hello Rook

     [root@rook-ceph-tools-58df7d6b5c-hk88n /]#

 

 

8. Prometheus Monitoring

   - https://rook.io/docs/rook/v1.3/ceph-monitoring.html

   - Prometheus Operator

      The Prometheus operator needs to be started in the cluster so it can watch for our requests to start monitoring Rook and respond by deploying the correct Prometheus pods and configuration.

  [ysjeon71_kubeflow3@master ~]$ kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.26.0/bundle.yaml

  clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created

  clusterrole.rbac.authorization.k8s.io/prometheus-operator created

  deployment.apps/prometheus-operator created

  serviceaccount/prometheus-operator created

  [ysjeon71_kubeflow3@master ~]$

  [ysjeon71_kubeflow3@master ~]$ k get pod --all-namespaces | grep prometheus-operator

  default                prometheus-operator-6bfb74db6-6msll                  2/2     Running     2          2m9s

  [ysjeon71_kubeflow3@master ~]$

 

    - Prometheus Instances

      We can create a service monitor that will watch the Rook cluster and collect metrics regularly.

  [ysjeon71_kubeflow3@master monitoring]$ cd ~/rook/cluster/examples/kubernetes/ceph/monitoring

  [ysjeon71_kubeflow3@master monitoring]$ kubectl create -f service-monitor.yaml

  servicemonitor.monitoring.coreos.com/rook-ceph-mgr created

  [ysjeon71_kubeflow3@master monitoring]$ kubectl create -f prometheus.yaml

  serviceaccount/prometheus created

  clusterrole.rbac.authorization.k8s.io/prometheus created

  clusterrole.rbac.authorization.k8s.io/prometheus-rules created

  clusterrolebinding.rbac.authorization.k8s.io/prometheus created

  prometheus.monitoring.coreos.com/rook-prometheus created

  [ysjeon71_kubeflow3@master monitoring]$ kubectl create -f prometheus-service.yaml

  service/rook-prometheus created

  [ysjeon71_kubeflow3@master monitoring]$

  [ysjeon71_kubeflow3@master monitoring]$ kubectl -n rook-ceph get pod prometheus-rook-prometheus-0

  NAME                           READY   STATUS    RESTARTS   AGE

  prometheus-rook-prometheus-0   3/3     Running   1          32s

  [ysjeon71_kubeflow3@master monitoring]$

 

    - Prometheus Web Console

  [ysjeon71_kubeflow3@master monitoring]$ echo "http://$(kubectl -n rook-ceph -o jsonpath={.status.hostIP} get pod prometheus-rook-prometheus-0):30900"

  http://10.142.0.4:30900

  [ysjeon71_kubeflow3@master monitoring]$

  [ysjeon71_kubeflow3@master monitoring]$ k get services rook-prometheus -n rook-ceph

  NAME              TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE

  rook-prometheus   NodePort   10.108.145.174   <none>        9090:30900/TCP   4m1s

  [ysjeon71_kubeflow3@master monitoring]$

 

    URI: http://34.75.82.128:30900/

       

 

    - Prometheus Alerts

  [ysjeon71_kubeflow3@master monitoring]$ cd ~/rook/cluster/examples/kubernetes/ceph/monitoring

   kubectl apply -f rbac.yaml

   kubectl -n rook-ceph edit cephcluster rook-ceph      # or edit ~/rook/cluster/examples/kubernetes/ceph/cluster.yaml and re-apply it

  apiVersion: ceph.rook.io/v1

  kind: CephCluster

  metadata:

    name: rook-ceph

    namespace: rook-ceph

  […]

  spec:

  […]

    monitoring:

      enabled: true     # line added

      rulesNamespace: "rook-ceph"

  […]

 

    - Grafana Dashboards

      The following Grafana dashboards are available:

     • Ceph - Cluster

     • Ceph - OSD

     • Ceph - Pools

 

 

9. Troubleshooting

    - Case #1

      Problem:

          The rook-ceph-mon-a-7d67c65584-m7nrf pod fails with CrashLoopBackOff during deployment

    $ kubectl apply -f cluster.yaml

   …

   $ kubectl get pods -n rook-ceph

   …

    rook-ceph-mon-a-7d67c65584-m7nrf                     0/1     Init:CrashLoopBackOff   6        10m

   …

   $

      Cause:

      [ysjeon71_kubeflow3@master ~]$ k logs rook-ceph-mon-a-7d67c65584-m7nrf -n rook-ceph

     Error from server (BadRequest): container "mon" in pod "rook-ceph-mon-a-7d67c65584-m7nrf" is waiting to start: PodInitializing

     [ysjeon71_kubeflow3@master ~]$ k logs rook-ceph-mon-a-7d67c65584-m7nrf -n rook-ceph -c chown-container-data-dir

     failed to change ownership of '/var/log/ceph' from root:root to ceph:ceph

     failed to change ownership of '/var/lib/ceph/crash' from root:root to ceph:ceph

     failed to change ownership of '/var/lib/ceph/mon/ceph-a' from root:root to ceph:ceph

     chown: changing ownership of '/var/log/ceph': Permission denied

     chown: changing ownership of '/var/lib/ceph/crash': Permission denied

     chown: changing ownership of '/var/lib/ceph/mon/ceph-a': Permission denied

      Solution:

      [ysjeon71_kubeflow3@master ceph]$ k edit deployments rook-ceph-operator -n rook-ceph

      …

         - name: ROOK_HOSTPATH_REQUIRES_PRIVILEGED

          value: "true"    # false를 true로 변경

      …

      [ysjeon71_kubeflow3@master ~]$ k delete -f cluster.yaml

      $ k apply -f cluster.yaml

 

    - Case #2

      Problem: OSD pods are not created on my devices

          https://rook.io/docs/rook/v1.3/ceph-common-issues.html#osd-pods-are-not-created-on-my-devices

      Cause:

          If Rook determines that a device is not available (it has existing partitions or a formatted filesystem), Rook will skip consuming the device.

    [ysjeon71_kubeflow3@worker-1 ~]$ kubectl -n rook-ceph get pod -l app=rook-ceph-osd-prepare

    …

    # view the logs for the node of interest in the "provision" container

    [ysjeon71_kubeflow3@worker-1 ~]$ k logs rook-ceph-osd-prepare-worker-1-2lzf6 -n rook-ceph

    …

    2020-05-08 00:06:45.931260 I | cephosd: skipping device "sda1" because it contains a filesystem "vfat"

    2020-05-08 00:06:45.931264 I | cephosd: skipping device "sda2" because it contains a filesystem "xfs"

    2020-05-08 00:06:45.931268 I | cephosd: skipping device "sdb" because it contains a filesystem "ext4"

    2020-05-08 00:06:45.958256 I | cephosd: configuring osd devices: {"Entries":{}}

    …

    [ysjeon71_kubeflow3@worker-1 ~]

    [ysjeon71_kubeflow3@worker-1 ~]$ lsblk -f

    NAME   FSTYPE LABEL UUID                                 MOUNTPOINT

    sda

    ├─sda1 vfat         79E8-651E                            /boot/efi

    └─sda2 xfs    root  8bf9cd93-b3d1-421e-ba67-5fd6e4189f3d /

    sdb    ext4         5bde26dc-d40f-41fb-88c1-c6c464e6b785 /user1

    [ysjeon71_kubeflow3@worker-1 ~]$

      Solution:

    [root@worker-1 ~]# wipefs -a /dev/sdb -f

    /dev/sdb: 2 bytes were erased at offset 0x00000438 (ext4): 53 ef

    [root@worker-1 ~]# lsblk -f

    NAME   FSTYPE LABEL UUID                                 MOUNTPOINT

    sda

    ├─sda1 vfat         79E8-651E                            /boot/efi

    └─sda2 xfs    root  8bf9cd93-b3d1-421e-ba67-5fd6e4189f3d /

    sdb                                                      /user1

    [root@worker-1 ~]# umount /user1

    [ysjeon71_kubeflow3@master ~]$ k delete pod rook-ceph-operator-7fc446864f-fjq5p -n rook-ceph

    [root@worker-1 ~]# lsblk -f

    NAME                 FSTYPE      LABEL UUID                                   MOUNTPOINT

    sda

    ├─sda1               vfat              79E8-651E                              /boot/efi

    └─sda2               xfs         root  8bf9cd93-b3d1-421e-ba67-5fd6e4189f3d   /

    sdb                  LVM2_member       XGnpK0-KKLt-yEXY-bq2W-JN1v-DIT7-W2wkMX

    └─ceph--2c6e6c30--1ad8--4d12--9459--6ef9a9517363-osd--data--4bd16688--5f51--402d--a934--6a20136dc459

          Add a 10 GB disk to the worker-2 and worker-3 nodes

       yoosungjeon@ysjeon-MacBook-Pro ~ % gcloud compute disks create disk-2 --size 10G --zone=us-east1-d

       yoosungjeon@ysjeon-MacBook-Pro ~ % gcloud compute disks create disk-3 --size 10G --zone=us-east1-d

       yoosungjeon@ysjeon-MacBook-Pro ~ % gcloud compute instances attach-disk worker-2 --disk disk-2 --zone=us-east1-d

       yoosungjeon@ysjeon-MacBook-Pro ~ % gcloud compute instances attach-disk worker-3 --disk disk-3 --zone=us-east1-d
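
          After attaching the disks, restart the operator pod so it re-runs the osd-prepare jobs for the new devices (the same trick used above for worker-1) - a short sketch:

       $ kubectl -n rook-ceph delete pod -l app=rook-ceph-operator
       $ kubectl -n rook-ceph get pod -l app=rook-ceph-osd-prepare     # wait for the prepare jobs to finish
       $ kubectl -n rook-ceph get pod -l app=rook-ceph-osd             # new OSD pods should appear for the added disks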

 

    - Case #3

      Problem: the kubectl patch command fails with an error

    $ k patch CephCluster rook-ceph -n rook-ceph -p '{ "spec": { "dashboard": { "urlPrefix": "/ceph-dashboard" } } }'

    Error from server (UnsupportedMediaType): the body of the request was in an unknown format - accepted media types include: application/json-patch+json, application/merge-patch+json

    $ k patch CephCluster rook-ceph -n rook-ceph -p '{ "spec": { "dashboard": { "ssl": "false" } } }'

    Error from server (UnsupportedMediaType): the body of the request was in an unknown format - accepted media types include: application/json-patch+json, application/merge-patch+json

    $

      Solution:

           Work around it with kubectl edit (a merge-patch alternative is sketched below).
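
           A merge-patch alternative that should also work: CRs such as CephCluster do not accept the default strategic-merge patch, so the patch type has to be given explicitly, and ssl must be a JSON boolean rather than a string - a sketch:

     $ kubectl -n rook-ceph patch cephcluster rook-ceph --type merge -p '{"spec":{"dashboard":{"ssl":false,"urlPrefix":"/ceph-dashboard"}}}'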

 

    - Case #4

      Problem: deleting cluster.yaml, storageclass.storage.k8s.io, etc. hangs (the delete never completes)

     $ k delete -f cluster.yaml

     cephcluster.ceph.rook.io "rook-ceph" deleted

     …

 

     $ k delete -f storageclass-block.yaml

     cephblockpool.ceph.rook.io "rook-ceph-block-pool-iap" deleted

     storageclass.storage.k8s.io "rook-ceph-block-sc-iap" deleted

     …

      Solution:

     $ k delete pod rook-ceph-operator-6c68bb688-jngnn -n rook-ceph

     pod "rook-ceph-operator-6c68bb688-jngnn" deleted

     $
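
      If the CephCluster still hangs in Terminating after the operator restart, the Rook teardown docs fall back to clearing the finalizer on the resource - a last-resort sketch that abandons cleanup, so use it only when the cluster data is disposable:

      $ kubectl -n rook-ceph patch cephcluster rook-ceph --type merge -p '{"metadata":{"finalizers":[]}}'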

 

    - Case #5

      Problem: "rpc error: code = Unknown desc = context canceled"

     [iap@iap01 ceph]$ k describe pod csi-cephfsplugin-provisioner-7487dcb679-6lvx6 -n rook-ceph | grep -i events -A 15

     Events:

       Type     Reason     Age                        From               Message

       ----     ------     ----                       ----               -------

       ... 

       Warning  Failed     53s (x2 over 7m25s)        kubelet, iap09     Failed to pull image "quay.io/k8scsi/csi-resizer:v0.4.0": rpc error: code = Unknown desc = context canceled

       Warning  Failed     <invalid> (x2 over 5m14s)  kubelet, iap09     Error: ErrImagePull

       Normal   Pulling    <invalid> (x3 over 11m)    kubelet, iap09     Pulling image "quay.io/k8scsi/csi-attacher:v2.1.0"

       Warning  Failed     <invalid> (x2 over 5m14s)  kubelet, iap09     Failed to pull image "quay.io/k8scsi/csi-provisioner:v1.4.0": rpc error: code = Unknown desc = context canceled

     [iap@iap01 ceph]$

      Cause:

     [root@iap07 ~]# docker pull quay.io/k8scsi/csi-resizer:v0.4.0

     Trying to pull repository quay.io/k8scsi/csi-resizer ...

     v0.4.0: Pulling from quay.io/k8scsi/csi-resizer

     9ff2acc3204b: Downloading

     0d3d64020a22: Downloading

     dial tcp 13.225.112.61:443: i/o timeout

      Solution:

          The IP of cdn02.quay.io (13.225.112.61) had changed, so a firewall rule update was requested.

 

    - Case #6

      Problem:

          driver name rook-ceph.rbd.csi.ceph.com not found in the list of registered CSI drivers

          driver name rook-ceph.cephfs.csi.ceph.com not found in the list of registered CSI drivers

     [iap@iap01 rook-storage-yaml]$ k get pod

     NAME                               READY   STATUS              RESTARTS   AGE

     wordpress-mysql-6cc97b86fc-lrqgd   0/1     ContainerCreating   0          75s

     [iap@iap01 ~]$ k describe pod wordpress-mysql-6cc97b86fc-lrqgd | grep Events -A 10

     Events:

       Type     Reason                  Age                From                     Message

       ----     ------                  ----               ----                     -------

       Warning  FailedScheduling        19s (x2 over 20s)  default-scheduler        pod has unbound immediate PersistentVolumeClaims (repeated 5 times)

       Normal   Scheduled               16s                default-scheduler        Successfully assigned default/wordpress-mysql-6cc97b86fc-lrqgd to iap09

       Normal   SuccessfulAttachVolume  16s                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-2518fd30-5c45-4887-ba6c-accb947745a5"

       Warning  FailedMount             6s (x5 over 14s)   kubelet, iap09           MountVolume.MountDevice failed for volume "pvc-2518fd30-5c45-4887-ba6c-accb947745a5" : driver name rook-ceph.rbd.csi.ceph.com not found in the list of registered CSI drivers

     [iap@iap01 ~]$

      Solution: 

           With Rook v1.3.0-beta this error occurred in the on-prem environment, while the same setup worked fine on GKE.

           Upgrading to Rook v1.3.6 resolved it.
