2020.05.25
1. Environment
- Rook v1.3.6, Ceph image version: "14.2.10-0 nautilus", Ceph-CSI v2.1.2, Kubernetes 1.16.15, CentOS 7.8
2. Rook / Ceph ?
- Rook is an open source cloud-native storage orchestrator, providing the platform, framework, and support for a diverse set of storage solutions to natively integrate with cloud-native environments.
https://rook.io/docs/rook/v1.3/ceph-examples.html
- Ceph CSI (Container Storage Interface)
Ceph CSI plugins implement an interface between CSI enabled Container Orchestrator (CO) and Ceph cluster. It allows dynamically provisioning Ceph volumes and attaching them to workloads.
- Independent CSI plugins are provided to support RBD and CephFS backed volumes
- Ceph is an open-source software storage platform that implements object storage on a single distributed computer cluster and provides 3-in-1 interfaces for object-, block-, and file-level storage.
https://www.cloudops.com/2019/05/the-ultimate-rook-and-ceph-survival-guide/ ***
https://docs.ceph.com/docs/master/architecture/
- Key daemons
ceph-mon: Cluster monitors. They track active/failed nodes and maintain the master copy of the Ceph storage cluster map.
ceph-mds: Metadata servers. They store inode and directory metadata (the filesystem's directory and file names, and their mapping to objects stored in the RADOS cluster).
ceph-osd: Object storage daemons. They store the actual data and also check the state of OSDs and report it to the monitors.
ceph-rgw: RESTful gateways. The interface that exposes the object storage layer externally.
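Once Rook is installed (section 4 below), each of these daemons runs as a pod in the rook-ceph namespace; a minimal sketch for locating them by the app labels Rook assigns (the same labels are used elsewhere in this note):
$ for d in mon mgr mds osd rgw; do kubectl -n rook-ceph get pod -l app=rook-ceph-$d -o wide; done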
3. Ceph Prerequisites
- https://rook.io/docs/rook/v1.3/ceph-prerequisites.html
- Ceph OSDs have a dependency on LVM in the following scenarios:
OSDs are created on raw devices or partitions
If encryption is enabled (encryptedDevice: true in the cluster CR)
A metadata device is specified
$ sudo yum install -y lvm2
- Ceph requires a Linux kernel built with the RBD (RADOS Block Device) module. RADOS: Reliable Autonomic Distributed Object Store
# lsmod | grep rbd
# modprobe rbd
- If you will be creating volumes from a Ceph shared file system (CephFS), the recommended minimum kernel version is 4.17.
If you have a kernel version less than 4.17, the requested PVC sizes will not be enforced. Storage quotas will only be enforced on newer kernels.
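A minimal check for this on each node (4.17 is the threshold quoted above):
# uname -r    # below 4.17: requested CephFS PVC sizes (quotas) are not enforced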
- NTP configuration (for on-premises environments)
# yum install chrony -y
# vi /etc/chrony.conf # if the servers are inside KT GiGATechHub, change the time servers below and request firewall permission
server time.google.com iburst
server time.kriss.re.kr iburst
server time.bora.net iburst
…
# systemctl enable chronyd && systemctl start chronyd
# chronyc sources
…
# timedatectl set-timezone Asia/Seoul
- When reinstalling Rook Ceph, run the following on each worker node
# rm -rf /var/lib/rook/ /var/lib/kubelet/plugins/rook-ceph* /var/lib/kubelet/plugins_registry/rook-ceph*
# lvscan
…
# lvremove ceph-… # remove the LVs that Ceph created
# vgremove …
# pvremove …
# wipefs -a /dev/sdc # if a "Device or resource busy" error occurs, reboot and run it again
4. Rook Install
- Ceph Storage Quickstart
- https://rook.io/docs/rook/v1.3/ceph-quickstart.html
- https://ruzickap.github.io/k8s-istio-workshop/lab-04/
a. Download YAML files
[ysjeon71_kubeflow3@master ~]$ git clone --single-branch -b release-1.3 https://github.com/rook/rook.git
…
[ysjeon71_kubeflow3@master ~]$ cd ~/rook/cluster/examples/kubernetes/ceph/
b. Common Resources
[ysjeon71_kubeflow3@master ~]$ kubectl apply -f common.yaml
…
[ysjeon71_kubeflow3@master ~]$
c. Operator
[ysjeon71_kubeflow3@master ~]$ vi operator.yaml
…
# Whether to start pods as privileged that mount a host path, which includes the Ceph mon and osd pods.
# This is necessary to workaround the anyuid issues when running on OpenShift.
# For more details see https://github.com/rook/rook/issues/1314#issuecomment-355799641
- name: ROOK_HOSTPATH_REQUIRES_PRIVILEGED
value: "true" # false를 true로 변경
…
[ysjeon71_kubeflow3@master ~]$ kubectl apply -f operator.yaml
configmap/rook-ceph-operator-config created
deployment.apps/rook-ceph-operator created
[ysjeon71_kubeflow3@master ~]$
## verify the rook-ceph-operator is in the `Running` state before proceeding
[ysjeon71_kubeflow3@master ~]$ k get pod -o wide -n rook-ceph
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
rook-ceph-operator-7d99d768f4-bs8q9 1/1 Running 0 2m19s 10.46.0.1 worker-3 <none> <none>
rook-discover-h6s7b 1/1 Running 0 103s 10.32.0.17 worker-1 <none> <none>
rook-discover-mvqgs 1/1 Running 0 103s 10.38.0.16 worker-2 <none> <none>
rook-discover-t5d2h 1/1 Running 0 103s 10.46.0.2 worker-3 <none> <none>
[ysjeon71_kubeflow3@master ~]$
d. Cluster CRD
- Let's create your Ceph storage cluster
• cluster.yaml: Cluster settings for a production cluster running on bare metal
• cluster-on-pvc.yaml: Cluster settings for a production cluster running in a dynamic cloud environment
• cluster-test.yaml: Cluster settings for a test environment such as minikube.
[ysjeon71_kubeflow3@master ~]$ kubectl apply -f cluster.yaml
…
[ysjeon71_kubeflow3@master ~]$
[ysjeon71_kubeflow3@master ~]$ k get pods -n rook-ceph -o wide | cut -c-115 | egrep "NAME|rook-ceph-mgr|rook-ceph-mon|ook-ceph-osd-[0-9]"
NAME READY STATUS RESTARTS AGE IP NODE
rook-ceph-mgr-a-f55d5d7-zvr45 1/1 Running 1 4h45m 10.32.0.20 worker-1
rook-ceph-mon-a-69c75c5bdd-ktrq9 1/1 Running 1 22h 10.38.0.17 worker-2
rook-ceph-mon-d-697d6bf679-5crf2 1/1 Running 0 4h45m 10.32.0.18 worker-1
rook-ceph-mon-e-5ccfcb6fcf-xkhx9 1/1 Running 0 4h45m 10.46.0.5 worker-3
rook-ceph-osd-0-7bdf4b8597-xcnx4 1/1 Running 0 31m 10.32.0.21 worker-1
rook-ceph-osd-1-7dc576d9fb-qsnss 1/1 Running 0 9m46s 10.38.0.22 worker-2
rook-ceph-osd-2-565bbb875f-rztrb 1/1 Running 0 93s 10.46.0.7 worker-3
[ysjeon71_kubeflow3@master ~]$ k logs rook-ceph-operator-7fc446864f-v6mgt -n rook-ceph | egrep "E \||W \|"
2020-05-08 04:47:47.409147 W | ceph-csi: CSI Block volume expansion requires Kubernetes version >=1.16.0
[ysjeon71_kubeflow3@master ~]$
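The CephCluster resource also reports its own state; a hedged check that provisioning finished (the status fields are assumptions based on the Rook v1.3 CRD):
$ kubectl -n rook-ceph get cephcluster rook-ceph
$ kubectl -n rook-ceph get cephcluster rook-ceph -o jsonpath='{.status.state}{"\n"}'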
5. Rook Toolbox
- The Rook toolbox is a container with common tools used for rook debugging and testing.
- https://rook.io/docs/rook/v1.3/ceph-toolbox.html
[ysjeon71_kubeflow3@master ceph]$ k apply -f toolbox.yaml
deployment.apps/rook-ceph-tools created
[ysjeon71_kubeflow3@master ceph]$ kubectl -n rook-ceph get pod -l "app=rook-ceph-tools"
NAME READY STATUS RESTARTS AGE
rook-ceph-tools-58df7d6b5c-j4gfv 1/1 Running 0 32s
[ysjeon71_kubeflow3@master ceph]$
[ysjeon71_kubeflow3@master ceph]$ kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash
[root@rook-ceph-tools-58df7d6b5c-j4gfv /]# ceph status
cluster:
id: cb0faddd-9ceb-4ee8-87ff-e4ade239182d
health: HEALTH_OK
services:
mon: 3 daemons, quorum a,d,e (age 4h)
mgr: a(active, since 3m)
osd: 3 osds: 3 up (since 2m), 3 in (since 2m)
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 3.0 GiB used, 27 GiB / 30 GiB avail
pgs:
[root@rook-ceph-tools-58df7d6b5c-j4gfv /]# ceph osd status
+----+----------+-------+-------+--------+---------+--------+---------+-----------+
| id | host | used | avail | wr ops | wr data | rd ops | rd data | state |
+----+----------+-------+-------+--------+---------+--------+---------+-----------+
| 0 | worker-1 | 1025M | 9210M | 0 | 0 | 0 | 0 | exists,up |
| 1 | worker-2 | 1025M | 9210M | 0 | 0 | 0 | 0 | exists,up |
| 2 | worker-3 | 1025M | 9210M | 0 | 0 | 0 | 0 | exists,up |
+----+----------+-------+-------+--------+---------+--------+---------+-----------+
[root@rook-ceph-tools-58df7d6b5c-j4gfv /]# ceph df
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 30 GiB 27 GiB 5.4 MiB 3.0 GiB 10.02
TOTAL 30 GiB 27 GiB 5.4 MiB 3.0 GiB 10.02
POOLS:
POOL ID STORED OBJECTS USED %USED MAX AVAIL
[root@rook-ceph-tools-58df7d6b5c-j4gfv /]# rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR USED COMPR UNDER COMPR
total_objects 0
total_used 3.0 GiB
total_avail 27 GiB
total_space 30 GiB
[root@rook-ceph-tools-58df7d6b5c-j4gfv /]#
6. Ceph Dashboard
- https://rook.io/docs/rook/v1.3/ceph-dashboard.html
[ysjeon71_kubeflow3@master ~]$ k edit CephCluster rook-ceph -n rook-ceph
…
dashboard:
enabled: true
ssl: false # changed from true to false
urlPrefix: /ceph-dashboard # added
…
[ysjeon71_kubeflow3@master ceph]$ cd ~/rook/cluster/examples/kubernetes/ceph/
[ysjeon71_kubeflow3@master ceph]$ ls dashboard-external-http*
dashboard-external-http.yaml dashboard-external-https.yaml
[ysjeon71_kubeflow3@master ceph]$ k apply -f dashboard-external-http.yaml
service/rook-ceph-mgr-dashboard-external-http created
[ysjeon71_kubeflow3@master ceph]$
[ysjeon71_kubeflow3@master ceph]$ kubectl -n rook-ceph get service | egrep "NAME|rook-ceph-mgr-dashboard-external"
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
rook-ceph-mgr-dashboard-external-http NodePort 10.110.229.161 <none> 7000:32667/TCP 3m44s
[ysjeon71_kubeflow3@master ceph]$
yoosungjeon@ysjeon-MacBook-Pro ~ % gcloud compute firewall-rules create allow-rook-ceph-mgr-dashboard-rule --allow=tcp:32667
yoosungjeon@ysjeon-MacBook-Pro ~ % gcloud compute instances list
NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
master us-east1-b n1-standard-2 10.142.0.2 34.75.83.180 RUNNING
worker-1 us-east1-c custom (1 vCPU, 4.75 GiB) 10.142.0.3 35.227.3.25 RUNNING
worker-2 us-east1-d custom (1 vCPU, 4.75 GiB) 10.142.0.4 34.75.168.175 RUNNING
worker-3 us-east1-d n1-standard-1 10.142.0.5 34.75.65.90 RUNNING
yoosungjeon@ysjeon-MacBook-Pro ~ %
URL:
http://35.227.3.25:32667/ceph-dashboard/
Username: admin
Password: Jq-|.Fgu"g@;I:T1*:@\
[ysjeon71_kubeflow3@master ~]$ kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo
Jq-|.Fgu"g@;I:T1*:@\
[ysjeon71_kubeflow3@master ~]$
7. Storage
- Ceph storage
• Block: Create block storage to be consumed by a pod
• Shared Filesystem: Create a filesystem to be shared across multiple pods
• Object: Create an object store that is accessible inside or outside the Kubernetes cluster
a. Block
https://rook.io/docs/rook/v1.3/ceph-block.html
CephBlockPool -> storageClass -> PersistentVolumeClaim -> volumes
Each OSD must be located on a different node, because the failureDomain is set to host and the replicated.size is set to 3.
- Provision Storage
[ysjeon71_kubeflow3@master ~]$ cd rook/cluster/examples/kubernetes/ceph/
[ysjeon71_kubeflow3@master ~]$ vi storageclass.yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
name: replicapool
namespace: rook-ceph
spec:
failureDomain: host
replicated:
size: 3
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: rook-ceph-block
# Change "rook-ceph" provisioner prefix to match the operator namespace if needed
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
# clusterID is the namespace where the rook cluster is running
clusterID: rook-ceph
# Ceph pool into which the RBD image shall be created
pool: replicapool
# RBD image format. Defaults to "2".
imageFormat: "2"
# RBD image features. Available for imageFormat: "2". CSI RBD currently supports only `layering` feature.
imageFeatures: layering
# The secrets contain Ceph admin credentials.
csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
# Specify the filesystem type of the volume. If not specified, csi-provisioner
# will set default as `ext4`.
csi.storage.k8s.io/fstype: xfs
# Delete the rbd volume when a PVC is deleted
reclaimPolicy: Delete
[ysjeon71_kubeflow3@master ~]$ k apply -f storageclass.yaml
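A minimal verification that the pool and StorageClass exist (the ceph command assumes the toolbox from section 5; the pool should report replication size 3):
$ kubectl -n rook-ceph get cephblockpool replicapool
$ kubectl get storageclass rook-ceph-block
$ ceph osd pool get replicapool size        # run inside the toolbox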
- Consume the storage: Wordpress sample
[ysjeon71_kubeflow3@master ceph]$ cd ~/rook/cluster/examples/kubernetes/
[ysjeon71_kubeflow3@master ceph]$ cat mysql.yaml
…
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: mysql-pv-claim
labels:
app: wordpress
spec:
storageClassName: rook-ceph-block
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
…
spec:
containers:
- image: mysql:5.6
…
volumeMounts:
- name: mysql-persistent-storage
mountPath: /var/lib/mysql
volumes:
- name: mysql-persistent-storage
persistentVolumeClaim:
claimName: mysql-pv-claim
[ysjeon71_kubeflow3@master kubernetes]$ k apply -f mysql.yaml
service/wordpress-mysql created
persistentvolumeclaim/mysql-pv-claim created
deployment.apps/wordpress-mysql created
[ysjeon71_kubeflow3@master kubernetes]$ k apply -f wordpress.yaml
service/wordpress created
persistentvolumeclaim/wp-pv-claim created
deployment.apps/wordpress created
[ysjeon71_kubeflow3@master kubernetes]$ k get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
mysql-pv-claim Bound pvc-17fedd75-3998-4b09-bc13-de112afd8ab7 1Gi RWO rook-ceph-block 16m
wp-pv-claim Bound pvc-adf8d37b-8882-4568-badf-fead0f499a34 1Gi RWO rook-ceph-block 3m10s
[ysjeon71_kubeflow3@master kubernetes]$
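Each bound PVC is backed by an RBD image in replicapool; a hedged way to see them from the toolbox (section 5):
$ rbd ls -p replicapool          # one image per bound PVC
$ rbd du -p replicapool          # provisioned vs. used size per image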
b. Shared Filesystem
https://rook.io/docs/rook/v1.3/ceph-filesystem.html
By default only one shared filesystem can be created with Rook.
CephFilesystem -> storageClass -> PersistentVolumeClaim -> volumes
- Create the Filesystem
Create the filesystem by specifying the desired settings for the metadata pool, data pools, and metadata server in the CephFilesystem CRD.
[ysjeon71_kubeflow3@master ~]$ cd rook/cluster/examples/kubernetes/ceph/
[ysjeon71_kubeflow3@master ceph]$ cat filesystem.yaml
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
name: myfs
namespace: rook-ceph
spec:
metadataPool:
replicated:
size: 3
dataPools:
- replicated:
size: 3
preservePoolsOnDelete: true
metadataServer:
activeCount: 1
activeStandby: true
[ysjeon71_kubeflow3@master ceph]$ k apply -f filesystem.yaml
cephfilesystem.ceph.rook.io/myfs created
[ysjeon71_kubeflow3@master ceph]$ kubectl -n rook-ceph get pod -l app=rook-ceph-mds
NAME READY STATUS RESTARTS AGE
rook-ceph-mds-myfs-a-78d59c77c7-ndps7 1/1 Running 0 10m
rook-ceph-mds-myfs-b-9b59d896c-q898t 1/1 Running 0 10m
[ysjeon71_kubeflow3@master ceph]$
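A minimal check of the new filesystem from the toolbox (pool names follow the myfs-metadata / myfs-data0 naming used by this example):
$ ceph fs ls          # should list myfs with its metadata and data pools
$ ceph mds stat       # shows the active and standby-replay MDS daemons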
- Provision Storage
[ysjeon71_kubeflow3@master ceph]$ vi storageclass_fs.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: rook-cephfs
# Change "rook-ceph" provisioner prefix to match the operator namespace if needed
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
# clusterID is the namespace where operator is deployed.
clusterID: rook-ceph
# CephFS filesystem name into which the volume shall be created
fsName: myfs
# Ceph pool into which the volume shall be created
# Required for provisionVolume: "true"
pool: myfs-data0
# Root path of an existing CephFS volume
# Required for provisionVolume: "false"
# rootPath: /absolute/path
# The secrets contain Ceph admin credentials. These are generated automatically by the operator
# in the same namespace as the cluster.
csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
[ysjeon71_kubeflow3@master ceph]$ k apply -f storageclass_fs.yaml
storageclass.storage.k8s.io/rook-cephfs created
[ysjeon71_kubeflow3@master ceph]$ k get StorageClass
NAME PROVISIONER AGE
rook-ceph-block rook-ceph.rbd.csi.ceph.com 3h47m
rook-cephfs rook-ceph.cephfs.csi.ceph.com 53s
[ysjeon71_kubeflow3@master ceph]$
- Consume the Shared Filesystem: K8s Registry Sample
As an example, we will start the kube-registry pod with the shared filesystem as the backing store
$ cd ~/rook/cluster/examples/kubernetes/ceph/csi/cephfs
$ cat kube-registry.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: cephfs-pvc
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 1Gi
storageClassName: rook-cephfs
---
apiVersion: apps/v1
kind: Deployment
…
spec:
containers:
- name: registry
image: registry:2
…
volumeMounts:
- name: image-store
mountPath: /var/lib/registry
…
volumes:
- name: image-store
persistentVolumeClaim:
claimName: cephfs-pvc
readOnly: false
$
[ysjeon71_kubeflow3@master cephfs]$ k apply -f kube-registry.yaml
persistentvolumeclaim/cephfs-pvc created
deployment.apps/kube-registry created
[ysjeon71_kubeflow3@master cephfs]$
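A hedged check that the RWX volume bound and that the registry replicas mounted it (the k8s-app label and the namespace come from the upstream example manifest and may differ):
$ kubectl get pvc -A | grep cephfs-pvc          # should be Bound with RWX access mode
$ kubectl get pod -A -l k8s-app=kube-registry   # all replicas share the same CephFS volume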
c. Object storage
https://rook.io/docs/rook/v1.3/ceph-object.html
Object storage exposes an S3 API to the storage cluster for applications to put and get data.
CephObjectStore -> storageClass -> ObjectBucketClaim
- Create an Object Store
$ cd ~/rook/cluster/examples/kubernetes/ceph
[ysjeon71_kubeflow3@master ceph]$ k apply -f object.yaml
cephobjectstore.ceph.rook.io/my-store created
[ysjeon71_kubeflow3@master ceph]$
# To confirm the object store is configured, wait for the rgw pod to start
[ysjeon71_kubeflow3@master ceph]$ kubectl -n rook-ceph get pod -l app=rook-ceph-rgw
NAME READY STATUS RESTARTS AGE
rook-ceph-rgw-my-store-a-689745fc44-dnlbm 1/1 Running 0 40s
[ysjeon71_kubeflow3@master ceph]$
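From the toolbox, the pools created for the store and the RGW service can be verified (a minimal sketch; the my-store.rgw.* pool names are the Rook/RGW defaults):
$ ceph osd pool ls | grep my-store
$ kubectl -n rook-ceph get svc rook-ceph-rgw-my-store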
- Create a Bucket
[ysjeon71_kubeflow3@master ceph]$ k apply -f storageclass-bucket-delete.yaml
storageclass.storage.k8s.io/rook-ceph-delete-bucket created
[ysjeon71_kubeflow3@master ceph]$
A secret and ConfigMap are created with the same name as the OBC and in the same namespace
[ysjeon71_kubeflow3@master ceph]$ cat object-bucket-claim-delete.yaml
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
name: ceph-delete-bucket
spec:
generateBucketName: ceph-bkt
storageClassName: rook-ceph-delete-bucket
[ysjeon71_kubeflow3@master ceph]$
[ysjeon71_kubeflow3@master ceph]$ k apply -f object-bucket-claim-delete.yaml
objectbucketclaim.objectbucket.io/ceph-delete-bucket created
[ysjeon71_kubeflow3@master ceph]$
$ export AWS_HOST=$(kubectl -n default get cm ceph-delete-bucket -o yaml | grep BUCKET_HOST | awk '{print $2}')
$ export AWS_ACCESS_KEY_ID=$(kubectl -n default get secret ceph-delete-bucket -o yaml | grep AWS_ACCESS_KEY_ID | awk '{print $2}' | base64 --decode)
$ export AWS_SECRET_ACCESS_KEY=$(kubectl -n default get secret ceph-delete-bucket -o yaml | grep AWS_SECRET_ACCESS_KEY | awk '{print $2}' | base64 --decode)
$ export AWS_ENDPOINT_IP=$(kubectl -n rook-ceph get svc rook-ceph-rgw-my-store -o yaml | grep clusterIP | awk '{print $2}')
$ export AWS_ENDPOINT_PORT=$(kubectl -n rook-ceph get svc rook-ceph-rgw-my-store -o yaml | grep -w port | awk '{print $2}')
$ export AWS_ENDPOINT=$AWS_ENDPOINT_IP:$AWS_ENDPOINT_PORT
$ export AWS_BUCKET=$(k get ObjectBucketClaim ceph-delete-bucket -o yaml | grep -w bucketName | awk '{print $2}')
[ysjeon71_kubeflow3@master ceph]$ echo $AWS_HOST
rook-ceph-rgw-my-store.rook-ceph
[ysjeon71_kubeflow3@master ceph]$ echo $AWS_ACCESS_KEY_ID
TWHS4NPWZRWLI5RQUS6H
[ysjeon71_kubeflow3@master ceph]$ echo $AWS_SECRET_ACCESS_KEY
zypu1DtYM4EdeqnpOzvReTFt64Rp1GrDGISGqLvP
[ysjeon71_kubeflow3@master ceph]$ echo $AWS_ENDPOINT
10.109.8.217:80
[ysjeon71_kubeflow3@master ceph]$ echo $AWS_BUCKET
ceph-bkt-70338ace-9fb9-44a1-8d80-37d20fc1864a
[ysjeon71_kubeflow3@master ceph]$
- Consume the Object Storage
To simplify the s3 client commands, you will want to set the four environment variables for use by your client (i.e. inside the toolbox).
[ysjeon71_kubeflow3@master ceph]$ kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash
[root@rook-ceph-tools-58df7d6b5c-hk88n /]# yum --assumeyes install s3cmd
…
[root@rook-ceph-tools-58df7d6b5c-hk88n /]# export AWS_HOST=rook-ceph-rgw-my-store.rook-ceph
[root@rook-ceph-tools-58df7d6b5c-hk88n /]# export AWS_ENDPOINT=10.109.8.217:80
[root@rook-ceph-tools-58df7d6b5c-hk88n /]# export AWS_ACCESS_KEY_ID=TWHS4NPWZRWLI5RQUS6H
[root@rook-ceph-tools-58df7d6b5c-hk88n /]# export AWS_SECRET_ACCESS_KEY=zypu1DtYM4EdeqnpOzvReTFt64Rp1GrDGISGqLvP
[root@rook-ceph-tools-58df7d6b5c-hk88n /]# export AWS_BUCKET=ceph-bkt-70338ace-9fb9-44a1-8d80-37d20fc1864a
[root@rook-ceph-tools-58df7d6b5c-hk88n /]# s3cmd put /tmp/rookObj --no-ssl --host=${AWS_HOST} --host-bucket= s3://${AWS_BUCKET}
upload: '/tmp/rookObj' -> 's3://ceph-bkt-70338ace-9fb9-44a1-8d80-37d20fc1864a/rookObj' [1 of 1]
11 of 11 100% in 0s 25.44 B/s done
[root@rook-ceph-tools-58df7d6b5c-hk88n /]#
[root@rook-ceph-tools-58df7d6b5c-hk88n /]# s3cmd get s3://${AWS_BUCKET}/rookObj /tmp/rookObj-download --no-ssl --host=${AWS_HOST} --host-bucket=
download: 's3://ceph-bkt-70338ace-9fb9-44a1-8d80-37d20fc1864a/rookObj' -> '/tmp/rookObj-download' [1 of 1]
11 of 11 100% in 0s 733.53 B/s done
[root@rook-ceph-tools-58df7d6b5c-hk88n /]# cat /tmp/rookObj-download
Hello Rook
[root@rook-ceph-tools-58df7d6b5c-hk88n /]#
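To list the bucket contents, the same flags apply (a minimal sketch, run inside the toolbox with the variables above):
$ s3cmd ls s3://${AWS_BUCKET} --no-ssl --host=${AWS_HOST} --host-bucket=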
8. Prometheus Monitoring
- https://rook.io/docs/rook/v1.3/ceph-monitoring.html
- Prometheus Operator
The Prometheus operator needs to be started in the cluster so it can watch for our requests to start monitoring Rook and respond by deploying the correct Prometheus pods and configuration.
[ysjeon71_kubeflow3@master ~]$ kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.26.0/bundle.yaml
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
serviceaccount/prometheus-operator created
[ysjeon71_kubeflow3@master ~]$
[ysjeon71_kubeflow3@master ~]$ k get pod --all-namespaces | grep prometheus-operator
default prometheus-operator-6bfb74db6-6msll 2/2 Running 2 2m9s
[ysjeon71_kubeflow3@master ~]$
- Prometheus Instances
We can create a service monitor that will watch the Rook cluster and collect metrics regularly.
[ysjeon71_kubeflow3@master monitoring]$ cd ~/rook/cluster/examples/kubernetes/ceph/monitoring
[ysjeon71_kubeflow3@master monitoring]$ kubectl create -f service-monitor.yaml
servicemonitor.monitoring.coreos.com/rook-ceph-mgr created
[ysjeon71_kubeflow3@master monitoring]$ kubectl create -f prometheus.yaml
serviceaccount/prometheus created
clusterrole.rbac.authorization.k8s.io/prometheus created
clusterrole.rbac.authorization.k8s.io/prometheus-rules created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
prometheus.monitoring.coreos.com/rook-prometheus created
[ysjeon71_kubeflow3@master monitoring]$ kubectl create -f prometheus-service.yaml
service/rook-prometheus created
[ysjeon71_kubeflow3@master monitoring]$
[ysjeon71_kubeflow3@master monitoring]$ kubectl -n rook-ceph get pod prometheus-rook-prometheus-0
NAME READY STATUS RESTARTS AGE
prometheus-rook-prometheus-0 3/3 Running 1 32s
[ysjeon71_kubeflow3@master monitoring]$
- Prometheus Web Console
[ysjeon71_kubeflow3@master monitoring]$ echo "http://$(kubectl -n rook-ceph -o jsonpath={.status.hostIP} get pod prometheus-rook-prometheus-0):30900"
[ysjeon71_kubeflow3@master monitoring]$
[ysjeon71_kubeflow3@master monitoring]$ k get services rook-prometheus -n rook-ceph
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
rook-prometheus NodePort 10.108.145.174 <none> 9090:30900/TCP 4m1s
[ysjeon71_kubeflow3@master monitoring]$
URI: http://34.75.82.128:30900/
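The Ceph mgr metrics can also be queried through the Prometheus HTTP API (a hedged example; the metric names come from the mgr prometheus module):
$ curl -s "http://34.75.82.128:30900/api/v1/query?query=ceph_health_status"
$ curl -s "http://34.75.82.128:30900/api/v1/query?query=ceph_cluster_total_used_bytes"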
- Prometheus Alerts
[ysjeon71_kubeflow3@master monitoring]$ cd ~/rook/cluster/examples/kubernetes/ceph/monitoring
kubectl apply -f rbac.yaml
kubectl -n rook-ceph edit cephcluster rook-ceph    # or edit ~/rook/cluster/examples/kubernetes/ceph/cluster.yaml and re-apply it
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
name: rook-ceph
namespace: rook-ceph
[…]
spec:
[…]
monitoring:
enabled: true # line added
rulesNamespace: "rook-ceph"
[…]
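A minimal check that the operator created the alert rules in rulesNamespace (the exact PrometheusRule resource name may vary by Rook version):
$ kubectl -n rook-ceph get prometheusrules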
- Grafana Dashboards
The following Grafana dashboards are available: Ceph - Cluster, Ceph - OSD (Single), Ceph - Pools
9. Troubleshooting
- Case #1
Problem:
Deploying rook-ceph-mon-a-7d67c65584-m7nrf fails with CrashLoopBackOff
$ kubectl apply -f cluster.yaml
…
$ kubectl get pods -n rook-ceph
…
rook-ceph-mon-a-7d67c65584-m7nrf 0/1 Init:CrashLoopBackOff 6 10m
…
$
Cause:
[ysjeon71_kubeflow3@master ~]$ k logs rook-ceph-mon-a-7d67c65584-m7nrf -n rook-ceph
Error from server (BadRequest): container "mon" in pod "rook-ceph-mon-a-7d67c65584-m7nrf" is waiting to start: PodInitializing
[ysjeon71_kubeflow3@master ~]$ k logs rook-ceph-mon-a-7d67c65584-m7nrf -n rook-ceph -c chown-container-data-dir
failed to change ownership of '/var/log/ceph' from root:root to ceph:ceph
failed to change ownership of '/var/lib/ceph/crash' from root:root to ceph:ceph
failed to change ownership of '/var/lib/ceph/mon/ceph-a' from root:root to ceph:ceph
chown: changing ownership of '/var/log/ceph': Permission denied
chown: changing ownership of '/var/lib/ceph/crash': Permission denied
chown: changing ownership of '/var/lib/ceph/mon/ceph-a': Permission denied
Solution:
[ysjeon71_kubeflow3@master ceph]$ k edit deployments rook-ceph-operator -n rook-ceph
…
- name: ROOK_HOSTPATH_REQUIRES_PRIVILEGED
value: "true" # false를 true로 변경
…
[ysjeon71_kubeflow3@master ~]$ k delete -f cluster.yaml
$ k apply -f cluster.yaml
- Case #2
Problem: OSD pods are not created on my devices
https://rook.io/docs/rook/v1.3/ceph-common-issues.html#osd-pods-are-not-created-on-my-devices
Cause:
If Rook determines that a device is not available (it has existing partitions or a formatted filesystem), Rook will skip consuming the device.
[ysjeon71_kubeflow3@worker-1 ~]$ kubectl -n rook-ceph get pod -l app=rook-ceph-osd-prepare
…
# view the logs for the node of interest in the "provision" container
[ysjeon71_kubeflow3@worker-1 ~]$ k logs rook-ceph-osd-prepare-worker-1-2lzf6 -n rook-ceph
…
2020-05-08 00:06:45.931260 I | cephosd: skipping device "sda1" because it contains a filesystem "vfat"
2020-05-08 00:06:45.931264 I | cephosd: skipping device "sda2" because it contains a filesystem "xfs"
2020-05-08 00:06:45.931268 I | cephosd: skipping device "sdb" because it contains a filesystem "ext4"
2020-05-08 00:06:45.958256 I | cephosd: configuring osd devices: {"Entries":{}}
…
[ysjeon71_kubeflow3@worker-1 ~]$
[ysjeon71_kubeflow3@worker-1 ~]$ lsblk -f
NAME FSTYPE LABEL UUID MOUNTPOINT
sda
├─sda1 vfat 79E8-651E /boot/efi
└─sda2 xfs root 8bf9cd93-b3d1-421e-ba67-5fd6e4189f3d /
sdb ext4 5bde26dc-d40f-41fb-88c1-c6c464e6b785 /user1
[ysjeon71_kubeflow3@worker-1 ~]$
Solution:
[root@worker-1 ~]# wipefs -a /dev/sdb -f
/dev/sdb: 2 bytes were erased at offset 0x00000438 (ext4): 53 ef
[root@worker-1 ~]# lsblk -f
NAME FSTYPE LABEL UUID MOUNTPOINT
sda
├─sda1 vfat 79E8-651E /boot/efi
└─sda2 xfs root 8bf9cd93-b3d1-421e-ba67-5fd6e4189f3d /
sdb /user1
[root@worker-1 ~]# umount /user1
[ysjeon71_kubeflow3@master ~]$ k delete pod rook-ceph-operator-7fc446864f-fjq5p -n rook-ceph
[root@worker-1 ~]# lsblk -f
NAME FSTYPE LABEL UUID MOUNTPOINT
sda
├─sda1 vfat 79E8-651E /boot/efi
└─sda2 xfs root 8bf9cd93-b3d1-421e-ba67-5fd6e4189f3d /
sdb LVM2_member XGnpK0-KKLt-yEXY-bq2W-JN1v-DIT7-W2wkMX
└─ceph--2c6e6c30--1ad8--4d12--9459--6ef9a9517363-osd--data--4bd16688--5f51--402d--a934--6a20136dc459
Attach a 10 GB disk to the worker-2 and worker-3 nodes
yoosungjeon@ysjeon-MacBook-Pro ~ % gcloud compute disks create disk-2 --size 10G --zone=us-east1-d
yoosungjeon@ysjeon-MacBook-Pro ~ % gcloud compute disks create disk-3 --size 10G --zone=us-east1-d
yoosungjeon@ysjeon-MacBook-Pro ~ % gcloud compute instances attach-disk worker-2 --disk disk-2 --zone=us-east1-d
yoosungjeon@ysjeon-MacBook-Pro ~ % gcloud compute instances attach-disk worker-3 --disk disk-3 --zone=us-east1-d
- Case #3
Problem: The kubectl patch command fails with an error
$ k patch CephCluster rook-ceph -n rook-ceph -p '{ "spec": { "dashboard": { "urlPrefix": "/ceph-dashboard" } } }'
Error from server (UnsupportedMediaType): the body of the request was in an unknown format - accepted media types include: application/json-patch+json, application/merge-patch+json
$ k patch CephCluster rook-ceph -n rook-ceph -p '{ "spec": { "dashboard": { "ssl": "false" } } }'
Error from server (UnsupportedMediaType): the body of the request was in an unknown format - accepted media types include: application/json-patch+json, application/merge-patch+json
$
Solution:
Use the kubectl edit command instead.
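Alternatively, kubectl patch does work against the CRD when the patch type is given explicitly, since only JSON patch and merge patch are accepted for custom resources (a hedged sketch of the merge-patch form):
$ kubectl -n rook-ceph patch cephcluster rook-ceph --type merge -p '{"spec":{"dashboard":{"ssl":false,"urlPrefix":"/ceph-dashboard"}}}'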
- Case #4
Problem: Deleting cluster.yaml, storageclass.storage.k8s.io, etc. blocks (hangs)
$ k delete -f cluster.yaml
cephcluster.ceph.rook.io "rook-ceph" deleted
…
$ k delete -f storageclass-block.yaml
cephblockpool.ceph.rook.io "rook-ceph-block-pool-iap" deleted
storageclass.storage.k8s.io "rook-ceph-block-sc-iap" deleted
…
Solution:
$ k delete pod rook-ceph-operator-6c68bb688-jngnn -n rook-ceph
pod "rook-ceph-operator-6c68bb688-jngnn" deleted
$
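If the delete still hangs after restarting the operator, it is usually a finalizer left on the CephCluster; a hedged way to inspect it and, as a last resort, clear it (clearing skips Rook's own cleanup):
$ kubectl -n rook-ceph get cephcluster rook-ceph -o jsonpath='{.metadata.finalizers}{"\n"}'
$ kubectl -n rook-ceph patch cephcluster rook-ceph --type merge -p '{"metadata":{"finalizers":[]}}'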
- Case #5
Problem: "rpc error: code = Unknown desc = context canceled"
[iap@iap01 ceph]$ k describe pod csi-cephfsplugin-provisioner-7487dcb679-6lvx6 -n rook-ceph | grep -i events -A 15
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
...
Warning Failed 53s (x2 over 7m25s) kubelet, iap09 Failed to pull image "quay.io/k8scsi/csi-resizer:v0.4.0": rpc error: code = Unknown desc = context canceled
Warning Failed <invalid> (x2 over 5m14s) kubelet, iap09 Error: ErrImagePull
Normal Pulling <invalid> (x3 over 11m) kubelet, iap09 Pulling image "quay.io/k8scsi/csi-attacher:v2.1.0"
Warning Failed <invalid> (x2 over 5m14s) kubelet, iap09 Failed to pull image "quay.io/k8scsi/csi-provisioner:v1.4.0": rpc error: code = Unknown desc = context canceled
[iap@iap01 ceph]$
Cause:
[root@iap07 ~]# docker pull quay.io/k8scsi/csi-resizer:v0.4.0
Trying to pull repository quay.io/k8scsi/csi-resizer ...
v0.4.0: Pulling from quay.io/k8scsi/csi-resizer
9ff2acc3204b: Downloading
0d3d64020a22: Downloading
dial tcp 13.225.112.61:443: i/o timeout
Solution:
The IP address of cdn02.quay.io (13.225.112.61) had changed, so a firewall rule change was requested.
- Case #6
Problem:
driver name rook-ceph.rbd.csi.ceph.com not found in the list of registered CSI drivers
driver name rook-ceph.cephfs.csi.ceph.com not found in the list of registered CSI drivers
[iap@iap01 rook-storage-yaml]$ k get pod
NAME READY STATUS RESTARTS AGE
wordpress-mysql-6cc97b86fc-lrqgd 0/1 ContainerCreating 0 75s
[iap@iap01 ~]$ k describe pod wordpress-mysql-6cc97b86fc-lrqgd | grep Events -A 10
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 19s (x2 over 20s) default-scheduler pod has unbound immediate PersistentVolumeClaims (repeated 5 times)
Normal Scheduled 16s default-scheduler Successfully assigned default/wordpress-mysql-6cc97b86fc-lrqgd to iap09
Normal SuccessfulAttachVolume 16s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-2518fd30-5c45-4887-ba6c-accb947745a5"
Warning FailedMount 6s (x5 over 14s) kubelet, iap09 MountVolume.MountDevice failed for volume "pvc-2518fd30-5c45-4887-ba6c-accb947745a5" : driver name rook-ceph.rbd.csi.ceph.com not found in the list of registered CSI drivers
[iap@iap01 ~]$
Solution:
The error above occurred on-prem with Rook v1.3.0-beta, while it worked fine on GKE.
Resolved by upgrading to Rook v1.3.6.