
Rook Ceph Setup

by 여행을 떠나자! 2021. 9. 15.

2020.05.25

1. Environment

    - Rook v1.3.6, Ceph image version 14.2.10-0 (Nautilus), cephcsi v2.1.2, Kubernetes 1.16.15, CentOS 7.8

 

 

2. Rook / Ceph?

    - Rook is an open source cloud-native storage orchestrator, providing the platform, framework, and support for a diverse set of storage solutions to natively integrate with cloud-native environments.

      https://rook.io/docs/rook/v1.3/ceph-examples.html

    - Ceph CSI (Container Storage Interface)

        Ceph CSI plugins implement an interface between CSI enabled Container Orchestrator (CO) and Ceph cluster. It allows dynamically provisioning Ceph volumes and attaching them to workloads.

     - Independent CSI plugins are provided to support RBD and CephFS backed volumes

 

    - Ceph is an open-source software storage platform that implements object storage on a single distributed computer cluster and provides 3-in-1 interfaces for object-, block-, and file-level storage.

      https://www.cloudops.com/2019/05/the-ultimate-rook-and-ceph-survival-guide/ ***

     https://docs.ceph.com/docs/master/architecture/

    - Key daemons (a quick way to see them as Rook pods is sketched below)

      ceph-mon: Cluster monitor. Checks for active/failed nodes and maintains the master copy of the Ceph storage cluster map.

      ceph-mds: Metadata server. Stores metadata for inodes and directories (directory and file names in the filesystem and their mapping to the objects stored in the RADOS cluster).

      ceph-osd: Object storage daemon. Stores the actual data and also checks the state of OSDs and reports it to the monitors.

      ceph-rgw: RESTful gateway. The interface that exposes the object storage layer externally.
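
      Since Rook runs each of these daemons as a pod, a quick way to map them onto a running cluster is to list the pods by their app labels - a minimal sketch, assuming the default rook-ceph namespace used throughout this post:

   $ kubectl -n rook-ceph get pod -l app=rook-ceph-mon     # cluster monitors
   $ kubectl -n rook-ceph get pod -l app=rook-ceph-mgr     # manager daemons
   $ kubectl -n rook-ceph get pod -l app=rook-ceph-osd     # object storage daemons
   $ kubectl -n rook-ceph get pod -l app=rook-ceph-mds     # metadata servers (present only after a CephFilesystem exists)
   $ kubectl -n rook-ceph get pod -l app=rook-ceph-rgw     # RADOS gateways (present only after a CephObjectStore exists)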

 

 

3. Ceph Prerequisites

    https://rook.io/docs/rook/v1.3/ceph-prerequisites.html

    - Ceph OSDs have a dependency on LVM in the following scenarios:

      OSDs are created on raw devices or partitions

      If encryption is enabled (encryptedDevice: true in the cluster CR)

      A metadata device is specified

   $ sudo yum install -y lvm2

    - Ceph requires a Linux kernel built with the RBD (RADOS Block Device) module. RADOS: Reliable Autonomic Distributed Object Store.

   # lsmod | grep rbd

   # modprobe rbd

    - If you will be creating volumes from a Ceph shared file system (CephFS), the recommended minimum kernel version is 4.17

      If you have a kernel version less than 4.17, the requested PVC sizes will not be enforced; storage quotas are only enforced on newer kernels (a quick check is sketched below).
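
      A quick pre-check (a sketch; 4.17 is the threshold quoted above, and the modprobe line is optional, analogous to the rbd check above):

   # uname -r                                # CephFS quota (PVC size) enforcement needs a 4.17+ kernel
   # modprobe ceph && lsmod | grep ceph      # verify the CephFS kernel client module is available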

    - NTP configuration (for on-prem environments)

   # yum install chrony -y

   # vi /etc/chrony.conf          # if the servers sit inside KT GiGATechHub, change these servers and request a firewall opening

   server time.google.com iburst

   server time.kriss.re.kr iburst

   server time.bora.net iburst

   … 

   # systemctl enable chronyd && systemctl start chronyd

   # chronyc sources

   … 

   # timedatectl set-timezone Asia/Seoul

   - When reinstalling Rook Ceph, run the following on each worker node

   # rm -rf /var/lib/rook/ /var/lib/kubelet/plugins/rook-ceph* /var/lib/kubelet/plugins_registry/rook-ceph*

   # lvscan

   … 

   # lvremove ceph-…               # remove the LVs that Ceph created

   # vgremove 

   # pvremove … 

   # wipefs -a /dev/sdc            # if a 'Device or resource busy' error occurs, reboot and run again

 

 

4. Rook Install

    - Ceph Storage Quickstart

    - https://rook.io/docs/rook/v1.3/ceph-quickstart.html

   - https://ruzickap.github.io/k8s-istio-workshop/lab-04/

 

    a. Download YAML files

   [ysjeon71_kubeflow3@master ~]$ git clone --single-branch -b release-1.3 https://github.com/rook/rook.git

   …

   [ysjeon71_kubeflow3@master ~]$ cd ~/rook/cluster/examples/kubernetes/ceph/

 

    b. Common Resources

   [ysjeon71_kubeflow3@master ~]$ kubectl apply -f common.yaml

   …

   [ysjeon71_kubeflow3@master ~]$

 

    c. Operator

   [ysjeon71_kubeflow3@master ~]$ vi operator.yaml

   …

        # Whether to start pods as privileged that mount a host path, which includes the Ceph mon and osd pods.

        # This is necessary to workaround the anyuid issues when running on OpenShift.

        # For more details see https://github.com/rook/rook/issues/1314#issuecomment-355799641

        - name: ROOK_HOSTPATH_REQUIRES_PRIVILEGED

          value: "true"    # false를 true로 변경

   …

   [ysjeon71_kubeflow3@master ~]$ kubectl apply -f operator.yaml

   configmap/rook-ceph-operator-config created

   deployment.apps/rook-ceph-operator created

   [ysjeon71_kubeflow3@master ~]$

 

      ## verify the rook-ceph-operator is in the `Running` state before proceeding

   [ysjeon71_kubeflow3@master ~]$ k get pod -o wide -n rook-ceph

   NAME                                  READY  STATUS   RESTARTS  AGE    IP           NODE      NOMINATED NODE  READINESS GATES

   rook-ceph-operator-7d99d768f4-bs8q9   1/1    Running  0         2m19s  10.46.0.1    worker-3  <none>          <none>

   rook-discover-h6s7b                   1/1    Running  0         103s   10.32.0.17   worker-1  <none>          <none>

   rook-discover-mvqgs                   1/1    Running  0         103s   10.38.0.16   worker-2  <none>          <none>

   rook-discover-t5d2h                   1/1    Running  0         103s   10.46.0.2    worker-3  <none>          <none>

   [ysjeon71_kubeflow3@master ~]$

 

    d. Cluster CRD

       - Let's create the Ceph storage cluster. Rook ships several example cluster manifests (an abridged sketch of cluster.yaml follows the list below):

    • cluster.yaml: Cluster settings for a production cluster running on bare metal

    • cluster-on-pvc.yaml: Cluster settings for a production cluster running in a dynamic cloud environment

    • cluster-test.yaml: Cluster settings for a test environment such as minikube.
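
       For reference, an abridged sketch of the cluster.yaml fields that matter most for this setup; the values reflect this post's environment (Ceph Nautilus, all nodes and devices) and the release-1.3 defaults as I recall them, so check the file you cloned before applying:

    apiVersion: ceph.rook.io/v1
    kind: CephCluster
    metadata:
      name: rook-ceph
      namespace: rook-ceph
    spec:
      cephVersion:
        image: ceph/ceph:v14.2.10
      dataDirHostPath: /var/lib/rook    # the host path cleaned up in section 3 when reinstalling
      mon:
        count: 3
        allowMultiplePerNode: false
      dashboard:
        enabled: true                   # adjusted again in section 6
      storage:
        useAllNodes: true
        useAllDevices: true             # Rook skips devices that already carry a filesystem (see section 9, Case #2)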

 

   [ysjeon71_kubeflow3@master ~]$ kubectl apply -f cluster.yaml

  …

  [ysjeon71_kubeflow3@master ~]$

  [ysjeon71_kubeflow3@master ~]$ k get pods -n rook-ceph -o wide | cut -c-115 | egrep "NAME|rook-ceph-mgr|rook-ceph-mon|ook-ceph-osd-[0-9]" 

  NAME                                  READY   STATUS      RESTARTS   AGE     IP           NODE

  rook-ceph-mgr-a-f55d5d7-zvr45         1/1     Running     1          4h45m   10.32.0.20   worker-1

  rook-ceph-mon-a-69c75c5bdd-ktrq9      1/1     Running     1          22h     10.38.0.17   worker-2

  rook-ceph-mon-d-697d6bf679-5crf2      1/1     Running     0          4h45m   10.32.0.18   worker-1

  rook-ceph-mon-e-5ccfcb6fcf-xkhx9      1/1     Running     0          4h45m   10.46.0.5    worker-3

  rook-ceph-osd-0-7bdf4b8597-xcnx4      1/1     Running     0          31m     10.32.0.21   worker-1

  rook-ceph-osd-1-7dc576d9fb-qsnss      1/1     Running     0          9m46s   10.38.0.22   worker-2

  rook-ceph-osd-2-565bbb875f-rztrb      1/1     Running     0          93s     10.46.0.7    worker-3

  [ysjeon71_kubeflow3@master ~]$ k logs rook-ceph-operator-7fc446864f-v6mgt -n rook-ceph | egrep "E \||W \|"

  2020-05-08 04:47:47.409147 W | ceph-csi: CSI Block volume expansion requires Kubernetes version >=1.16.0

  [ysjeon71_kubeflow3@master ~]$

 

 

5. Rook Toolbox

    - The Rook toolbox is a container with common tools used for rook debugging and testing.

    -  https://rook.io/docs/rook/v1.3/ceph-toolbox.html

  [ysjeon71_kubeflow3@master ceph]$ k apply -f toolbox.yaml

 deployment.apps/rook-ceph-tools created

  [ysjeon71_kubeflow3@master ceph]$ kubectl -n rook-ceph get pod -l "app=rook-ceph-tools"

 NAME                               READY   STATUS    RESTARTS   AGE

  rook-ceph-tools-58df7d6b5c-j4gfv   1/1     Running   0          32s

  [ysjeon71_kubeflow3@master ceph]$

  [ysjeon71_kubeflow3@master ceph]$ kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash

  [root@rook-ceph-tools-58df7d6b5c-j4gfv /]# ceph status

    cluster:

      id:     cb0faddd-9ceb-4ee8-87ff-e4ade239182d

      health: HEALTH_OK

    services:

      mon: 3 daemons, quorum a,d,e (age 4h)

      mgr: a(active, since 3m)

      osd: 3 osds: 3 up (since 2m), 3 in (since 2m)

    data:

      pools:   0 pools, 0 pgs

      objects: 0 objects, 0 B

      usage:   3.0 GiB used, 27 GiB / 30 GiB avail

      pgs:

  [root@rook-ceph-tools-58df7d6b5c-j4gfv /]# ceph osd status

  +----+----------+-------+-------+--------+---------+--------+---------+-----------+

  | id |   host   |  used | avail | wr ops | wr data | rd ops | rd data |   state   |

  +----+----------+-------+-------+--------+---------+--------+---------+-----------+

  | 0  | worker-1 | 1025M | 9210M |    0   |     0   |    0   |     0   | exists,up |

  | 1  | worker-2 | 1025M | 9210M |    0   |     0   |    0   |     0   | exists,up |

  | 2  | worker-3 | 1025M | 9210M |    0   |     0   |    0   |     0   | exists,up |

  +----+----------+-------+-------+--------+---------+--------+---------+-----------+

  [root@rook-ceph-tools-58df7d6b5c-j4gfv /]# ceph df

  RAW STORAGE:

    CLASS     SIZE       AVAIL      USED        RAW USED     %RAW USED

    hdd       30 GiB     27 GiB     5.4 MiB      3.0 GiB         10.02

    TOTAL     30 GiB     27 GiB     5.4 MiB      3.0 GiB         10.02

  POOLS:

    POOL     ID     STORED     OBJECTS     USED     %USED     MAX AVAIL

  [root@rook-ceph-tools-58df7d6b5c-j4gfv /]# rados df

  POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR USED COMPR UNDER COMPR

  total_objects    0

  total_used       3.0 GiB

  total_avail      27 GiB

  total_space      30 GiB

  [root@rook-ceph-tools-58df7d6b5c-j4gfv /]#

 

 

6. Ceph Dashboard

    - https://rook.io/docs/rook/v1.3/ceph-dashboard.html

  [ysjeon71_kubeflow3@master ~]$ k edit CephCluster rook-ceph -n rook-ceph 

  …

  dashboard:

    enabled: true

    ssl: false                    # changed from true to false

    urlPrefix: /ceph-dashboard    # added

  …

  [ysjeon71_kubeflow3@master ceph]$ cd ~/rook/cluster/examples/kubernetes/ceph/

  [ysjeon71_kubeflow3@master ceph]$ ls dashboard-external-http*

  dashboard-external-http.yaml  dashboard-external-https.yaml

  [ysjeon71_kubeflow3@master ceph]$ k apply -f dashboard-external-http.yaml

  service/rook-ceph-mgr-dashboard-external-http created

  [ysjeon71_kubeflow3@master ceph]$

  [ysjeon71_kubeflow3@master ceph]$ kubectl -n rook-ceph get service | egrep "NAME|rook-ceph-mgr-dashboard-external"

  NAME                                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE

  rook-ceph-mgr-dashboard-external-http   NodePort    10.110.229.161   <none>        7000:32667/TCP      3m44s

  [ysjeon71_kubeflow3@master ceph]$

 

  yoosungjeon@ysjeon-MacBook-Pro ~ % gcloud compute firewall-rules create allow-rook-ceph-mgr-dashboard-rule --allow=tcp:32667

  yoosungjeon@ysjeon-MacBook-Pro ~ % gcloud compute instances list

  NAME      ZONE        MACHINE_TYPE               PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP    STATUS

  master    us-east1-b  n1-standard-2                           10.142.0.2   34.75.83.180   RUNNING

  worker-1  us-east1-c  custom (1 vCPU, 4.75 GiB)               10.142.0.3   35.227.3.25    RUNNING

  worker-2  us-east1-d  custom (1 vCPU, 4.75 GiB)               10.142.0.4   34.75.168.175  RUNNING

  worker-3  us-east1-d  n1-standard-1                           10.142.0.5   34.75.65.90    RUNNING

  yoosungjeon@ysjeon-MacBook-Pro ~ %

 

  URL:

    http://35.227.3.25:32667/ceph-dashboard/

       Username: admin

       Password: Jq-|.Fgu"g@;I:T1*:@\

  [ysjeon71_kubeflow3@master ~]$ kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo

  Jq-|.Fgu"g@;I:T1*:@\

  [ysjeon71_kubeflow3@master ~]$

    

7. Storage

    - Ceph storage

   • Block: Create block storage to be consumed by a pod

   • Shared Filesystem: Create a filesystem to be shared across multiple pods

   • Object: Create an object store that is accessible inside or outside the Kubernetes cluster

 

    a. Block

       https://rook.io/docs/rook/v1.3/ceph-block.html

        CephBlockPool -> storageClass -> PersistentVolumeClaim -> volumes

        Each OSD must be located on a different node, because the failureDomain is set to host and the replicated.size is set to 3.

 

        - Provision Storage

     [ysjeon71_kubeflow3@master ~]$ cd rook/cluster/examples/kubernetes/ceph/

     [ysjeon71_kubeflow3@master ~]$ vi storageclass.yaml

     apiVersion: ceph.rook.io/v1

     kind: CephBlockPool

     metadata:

       name: replicapool

       namespace: rook-ceph

     spec:

       failureDomain: host

       replicated:

         size: 3

     ---

     apiVersion: storage.k8s.io/v1

     kind: StorageClass

     metadata:

        name: rook-ceph-block

     # Change "rook-ceph" provisioner prefix to match the operator namespace if needed

     provisioner: rook-ceph.rbd.csi.ceph.com

     parameters:

         # clusterID is the namespace where the rook cluster is running

         clusterID: rook-ceph

         # Ceph pool into which the RBD image shall be created

         pool: replicapool

         # RBD image format. Defaults to "2".

         imageFormat: "2"

         # RBD image features. Available for imageFormat: "2". CSI RBD currently supports only `layering` feature.

         imageFeatures: layering

         # The secrets contain Ceph admin credentials.

         csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner

         csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph

         csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node

         csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph

         # Specify the filesystem type of the volume. If not specified, csi-provisioner

         # will set default as `ext4`.

         csi.storage.k8s.io/fstype: xfs

     # Delete the rbd volume when a PVC is deleted

     reclaimPolicy: Delete

      [ysjeon71_kubeflow3@master ~]$ k apply -f storageclass.yaml
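
         Before wiring this into an application, a minimal standalone PVC against the rook-ceph-block StorageClass looks like the sketch below (the name test-rbd-pvc is illustrative):

      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: test-rbd-pvc
      spec:
        storageClassName: rook-ceph-block
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi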

 

        - Consume the storage: Wordpress sample

     [ysjeon71_kubeflow3@master ceph]$ cd ~/rook/cluster/examples/kubernetes/

     [ysjeon71_kubeflow3@master ceph]$ cat mysql.yaml

     …

     ---

     apiVersion: v1

     kind: PersistentVolumeClaim

     metadata:

       name: mysql-pv-claim

       labels:

         app: wordpress

     spec:

       storageClassName: rook-ceph-block

       accessModes:

       - ReadWriteOnce

       resources:

         requests:

           storage: 1Gi

     ---

     apiVersion: apps/v1

     kind: Deployment

     …

         spec:

           containers:

           - image: mysql:5.6

     …

             volumeMounts:

             - name: mysql-persistent-storage

               mountPath: /var/lib/mysql

           volumes:

           - name: mysql-persistent-storage

             persistentVolumeClaim:

               claimName: mysql-pv-claim

     [ysjeon71_kubeflow3@master kubernetes]$ k apply -f mysql.yaml

     service/wordpress-mysql created

     persistentvolumeclaim/mysql-pv-claim created

     deployment.apps/wordpress-mysql created

     [ysjeon71_kubeflow3@master kubernetes]$ k apply -f wordpress.yaml

     service/wordpress created

     persistentvolumeclaim/wp-pv-claim created

     deployment.apps/wordpress created

     [ysjeon71_kubeflow3@master kubernetes]$ k get pvc

     NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE

     mysql-pv-claim   Bound    pvc-17fedd75-3998-4b09-bc13-de112afd8ab7   1Gi        RWO            rook-ceph-block   16m

     wp-pv-claim      Bound    pvc-adf8d37b-8882-4568-badf-fead0f499a34   1Gi        RWO            rook-ceph-block   3m10s

     [ysjeon71_kubeflow3@master kubernetes]$

 

    b. Shared Filesystem

        https://rook.io/docs/rook/v1.3/ceph-filesystem.html

        By default only one shared filesystem can be created with Rook.

        CephFilesystem -> storageClass -> PersistentVolumeClaim -> volumes

 

        - Create the Filesystem

           Create the filesystem by specifying the desired settings for the metadata pool, data pools, and metadata server in the CephFilesystem CRD.

      [ysjeon71_kubeflow3@master ~]$ cd rook/cluster/examples/kubernetes/ceph/

     [ysjeon71_kubeflow3@master ceph]$ cat filesystem.yaml

     apiVersion: ceph.rook.io/v1

     kind: CephFilesystem

     metadata:

       name: myfs

       namespace: rook-ceph

     spec:

       metadataPool:

         replicated:

           size: 3

       dataPools:

         - replicated:

             size: 3

       preservePoolsOnDelete: true

       metadataServer:

         activeCount: 1

         activeStandby: true

     [ysjeon71_kubeflow3@master ceph]$ k apply -f filesystem.yaml

     cephfilesystem.ceph.rook.io/myfs created

     [ysjeon71_kubeflow3@master ceph]$ kubectl -n rook-ceph get pod -l app=rook-ceph-mds

     NAME                                    READY   STATUS    RESTARTS   AGE

     rook-ceph-mds-myfs-a-78d59c77c7-ndps7   1/1     Running   0          10m

     rook-ceph-mds-myfs-b-9b59d896c-q898t    1/1     Running   0          10m

     [ysjeon71_kubeflow3@master ceph]$

 

        - Provision Storage

     [ysjeon71_kubeflow3@master ceph]$ vi storageclass_fs.yaml

     apiVersion: storage.k8s.io/v1

     kind: StorageClass

     metadata:

       name: rook-cephfs

     # Change "rook-ceph" provisioner prefix to match the operator namespace if needed

     provisioner: rook-ceph.cephfs.csi.ceph.com

     parameters:

       # clusterID is the namespace where operator is deployed.

       clusterID: rook-ceph

       # CephFS filesystem name into which the volume shall be created

       fsName: myfs

       # Ceph pool into which the volume shall be created

       # Required for provisionVolume: "true"

       pool: myfs-data0

       # Root path of an existing CephFS volume

       # Required for provisionVolume: "false"

       # rootPath: /absolute/path

       # The secrets contain Ceph admin credentials. These are generated automatically by the operator

       # in the same namespace as the cluster.

       csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner

       csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph

       csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node

       csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph

     reclaimPolicy: Delete

     [ysjeon71_kubeflow3@master ceph]$ k apply -f storageclass_fs.yaml

     storageclass.storage.k8s.io/rook-cephfs created

     [ysjeon71_kubeflow3@master ceph]$ k get StorageClass

     NAME              PROVISIONER                     AGE

     rook-ceph-block   rook-ceph.rbd.csi.ceph.com      3h47m

     rook-cephfs       rook-ceph.cephfs.csi.ceph.com   53s

     [ysjeon71_kubeflow3@master ceph]$

 

        - Consume the Shared Filesystem: K8s Registry Sample

          As an example, we will start the kube-registry pod with the shared filesystem as the backing store

     $ cd ~/rook/cluster/examples/kubernetes/ceph/csi/cephfs

     $ cat kube-registry.yaml

     apiVersion: v1

     kind: PersistentVolumeClaim

     metadata:

       name: cephfs-pvc

     spec:

       accessModes:

       - ReadWriteMany

       resources:

         requests:

           storage: 1Gi

       storageClassName: rook-cephfs

     ---

     apiVersion: apps/v1

     kind: Deployment

     …

         spec:

           containers:

           - name: registry

             image: registry:2

     …

             volumeMounts:

             - name: image-store

               mountPath: /var/lib/registry

     …

           volumes:

           - name: image-store

             persistentVolumeClaim:

               claimName: cephfs-pvc

               readOnly: false

     $

     [ysjeon71_kubeflow3@master cephfs]$ k apply -f kube-registry.yaml

     persistentvolumeclaim/cephfs-pvc created

     deployment.apps/kube-registry created

     [ysjeon71_kubeflow3@master cephfs]$

 

    c. Object storage

       https://rook.io/docs/rook/v1.3/ceph-object.html

        Object storage exposes an S3 API to the storage cluster for applications to put and get data.

        CephObjectStore -> storageClass -> ObjectBucketClaim

 

        - Create an Object Store

     $ cd ~/rook/cluster/examples/kubernetes/ceph

     [ysjeon71_kubeflow3@master ceph]$ k apply -f object.yaml

     cephobjectstore.ceph.rook.io/my-store created

     [ysjeon71_kubeflow3@master ceph]$

     # To confirm the object store is configured, wait for the rgw pod to start

     [ysjeon71_kubeflow3@master ceph]$ kubectl -n rook-ceph get pod -l app=rook-ceph-rgw

     NAME                                        READY   STATUS    RESTARTS   AGE

     rook-ceph-rgw-my-store-a-689745fc44-dnlbm   1/1     Running   0          40s

     [ysjeon71_kubeflow3@master ceph]$

 

        - Create a Bucket

     [ysjeon71_kubeflow3@master ceph]$ k apply -f storageclass-bucket-delete.yaml

     storageclass.storage.k8s.io/rook-ceph-delete-bucket created

     [ysjeon71_kubeflow3@master ceph]$

     A secret and ConfigMap are created with the same name as the OBC and in the same namespace

     [ysjeon71_kubeflow3@master ceph]$ cat object-bucket-claim-delete.yaml

     apiVersion: objectbucket.io/v1alpha1

     kind: ObjectBucketClaim

     metadata:

       name: ceph-delete-bucket

     spec:

       generateBucketName: ceph-bkt

       storageClassName: rook-ceph-delete-bucket

     [ysjeon71_kubeflow3@master ceph]$

     [ysjeon71_kubeflow3@master ceph]$ k apply -f object-bucket-claim-delete.yaml

     objectbucketclaim.objectbucket.io/ceph-delete-bucket created

     [ysjeon71_kubeflow3@master ceph]$

     $ export AWS_HOST=$(kubectl -n default get cm ceph-delete-bucket -o yaml | grep BUCKET_HOST | awk '{print $2}')

     $ export AWS_ACCESS_KEY_ID=$(kubectl -n default get secret ceph-delete-bucket -o yaml | grep AWS_ACCESS_KEY_ID | awk '{print $2}' | base64 --decode)

     $ export AWS_SECRET_ACCESS_KEY=$(kubectl -n default get secret ceph-delete-bucket -o yaml | grep AWS_SECRET_ACCESS_KEY | awk '{print $2}' | base64 --decode)

     $ export AWS_ENDPOINT_IP=$(kubectl -n rook-ceph get svc rook-ceph-rgw-my-store -o yaml | grep clusterIP | awk '{print $2}')

     $ export AWS_ENDPOINT_PORT=$(kubectl -n rook-ceph get svc rook-ceph-rgw-my-store -o yaml | grep -w port | awk '{print $2}')

     $ export AWS_ENDPOINT=$AWS_ENDPOINT_IP:$AWS_ENDPOINT_PORT

     $ export AWS_BUCKET=$(k get ObjectBucketClaim ceph-delete-bucket -o yaml | grep -w bucketName | awk  '{print $2}')

     [ysjeon71_kubeflow3@master ceph]$ echo $AWS_HOST

     rook-ceph-rgw-my-store.rook-ceph

     [ysjeon71_kubeflow3@master ceph]$ echo $AWS_ACCESS_KEY_ID

     TWHS4NPWZRWLI5RQUS6H

     [ysjeon71_kubeflow3@master ceph]$ echo $AWS_SECRET_ACCESS_KEY

     zypu1DtYM4EdeqnpOzvReTFt64Rp1GrDGISGqLvP

     [ysjeon71_kubeflow3@master ceph]$ echo $AWS_ENDPOINT

     10.109.8.217:80

     [ysjeon71_kubeflow3@master ceph]$ echo $AWS_BUCKET

     ceph-bkt-70338ace-9fb9-44a1-8d80-37d20fc1864a

     [ysjeon71_kubeflow3@master ceph]$
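
         The same values can also be pulled with jsonpath instead of grep/awk, which is less fragile; a sketch against the Secret/ConfigMap that the OBC created above (the ConfigMap also carries BUCKET_NAME, if I recall the key names correctly):

      $ export AWS_HOST=$(kubectl -n default get cm ceph-delete-bucket -o jsonpath='{.data.BUCKET_HOST}')
      $ export AWS_ACCESS_KEY_ID=$(kubectl -n default get secret ceph-delete-bucket -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 --decode)
      $ export AWS_SECRET_ACCESS_KEY=$(kubectl -n default get secret ceph-delete-bucket -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}' | base64 --decode)
      $ export AWS_BUCKET=$(kubectl -n default get cm ceph-delete-bucket -o jsonpath='{.data.BUCKET_NAME}')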

 

        - Consume the Object Storage

      To simplify the s3 client commands, you will want to set these environment variables for use by your client (i.e., inside the toolbox).

     [ysjeon71_kubeflow3@master ceph]$ kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash

     [root@rook-ceph-tools-58df7d6b5c-hk88n /]# yum --assumeyes install s3cmd

     …

      [root@rook-ceph-tools-58df7d6b5c-hk88n /]# export AWS_HOST=rook-ceph-rgw-my-store.rook-ceph

     [root@rook-ceph-tools-58df7d6b5c-hk88n /]# export AWS_ENDPOINT=10.109.8.217:80

     [root@rook-ceph-tools-58df7d6b5c-hk88n /]# export AWS_ACCESS_KEY_ID=TWHS4NPWZRWLI5RQUS6H

     [root@rook-ceph-tools-58df7d6b5c-hk88n /]# export AWS_SECRET_ACCESS_KEY=zypu1DtYM4EdeqnpOzvReTFt64Rp1GrDGISGqLvP

     [root@rook-ceph-tools-58df7d6b5c-hk88n /]# export AWS_BUCKET=ceph-bkt-70338ace-9fb9-44a1-8d80-37d20fc1864a

     [root@rook-ceph-tools-58df7d6b5c-hk88n /]# s3cmd put /tmp/rookObj --no-ssl --host=${AWS_HOST} --host-bucket= s3://${AWS_BUCKET}

     upload: '/tmp/rookObj' -> 's3://ceph-bkt-70338ace-9fb9-44a1-8d80-37d20fc1864a/rookObj'  [1 of 1]

      11 of 11   100% in    0s    25.44 B/s  done

     [root@rook-ceph-tools-58df7d6b5c-hk88n /]#

     [root@rook-ceph-tools-58df7d6b5c-hk88n /]# s3cmd get s3://${AWS_BUCKET}/rookObj /tmp/rookObj-download --no-ssl --host=${AWS_HOST} --host-bucket=

     download: 's3://ceph-bkt-70338ace-9fb9-44a1-8d80-37d20fc1864a/rookObj' -> '/tmp/rookObj-download'  [1 of 1]

      11 of 11   100% in    0s   733.53 B/s  done

     [root@rook-ceph-tools-58df7d6b5c-hk88n /]# cat /tmp/rookObj-download

     Hello Rook

     [root@rook-ceph-tools-58df7d6b5c-hk88n /]#

 

 

8. Prometheus Monitoring

   - https://rook.io/docs/rook/v1.3/ceph-monitoring.html

   - Prometheus Operator

      The Prometheus operator needs to be started in the cluster so it can watch for our requests to start monitoring Rook and respond by deploying the correct Prometheus pods and configuration.

  [ysjeon71_kubeflow3@master ~]$ kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.26.0/bundle.yaml

  clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created

  clusterrole.rbac.authorization.k8s.io/prometheus-operator created

  deployment.apps/prometheus-operator created

  serviceaccount/prometheus-operator created

  [ysjeon71_kubeflow3@master ~]$

  [ysjeon71_kubeflow3@master ~]$ k get pod --all-namespaces | grep prometheus-operator

  default                prometheus-operator-6bfb74db6-6msll                  2/2     Running     2          2m9s

  [ysjeon71_kubeflow3@master ~]$

 

    - Prometheus Instances

      We can create a service monitor that will watch the Rook cluster and collect metrics regularly.

  [ysjeon71_kubeflow3@master monitoring]$ cd ~/rook/cluster/examples/kubernetes/ceph/monitoring

  [ysjeon71_kubeflow3@master monitoring]$ kubectl create -f service-monitor.yaml

  servicemonitor.monitoring.coreos.com/rook-ceph-mgr created

  [ysjeon71_kubeflow3@master monitoring]$ kubectl create -f prometheus.yaml

  serviceaccount/prometheus created

  clusterrole.rbac.authorization.k8s.io/prometheus created

  clusterrole.rbac.authorization.k8s.io/prometheus-rules created

  clusterrolebinding.rbac.authorization.k8s.io/prometheus created

  prometheus.monitoring.coreos.com/rook-prometheus created

  [ysjeon71_kubeflow3@master monitoring]$ kubectl create -f prometheus-service.yaml

  service/rook-prometheus created

  [ysjeon71_kubeflow3@master monitoring]$

  [ysjeon71_kubeflow3@master monitoring]$ kubectl -n rook-ceph get pod prometheus-rook-prometheus-0

  NAME                           READY   STATUS    RESTARTS   AGE

  prometheus-rook-prometheus-0   3/3     Running   1          32s

  [ysjeon71_kubeflow3@master monitoring]$

 

    - Prometheus Web Console

  [ysjeon71_kubeflow3@master monitoring]$ echo "http://$(kubectl -n rook-ceph -o jsonpath={.status.hostIP} get pod prometheus-rook-prometheus-0):30900"

  http://10.142.0.4:30900

  [ysjeon71_kubeflow3@master monitoring]$

  [ysjeon71_kubeflow3@master monitoring]$ k get services rook-prometheus -n rook-ceph

  NAME              TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE

  rook-prometheus   NodePort   10.108.145.174   <none>        9090:30900/TCP   4m1s

  [ysjeon71_kubeflow3@master monitoring]$

 

    URI: http://34.75.82.128:30900/

       

 

    - Prometheus Alerts

  [ysjeon71_kubeflow3@master monitoring]$ cd ~/rook/cluster/examples/kubernetes/ceph/monitoring

   kubectl apply -f rbac.yaml

   kubectl -n rook-ceph edit cephcluster rook-ceph      # or edit ~/rook/cluster/examples/kubernetes/ceph/cluster.yaml and re-apply it

  apiVersion: ceph.rook.io/v1

  kind: CephCluster

  metadata:

    name: rook-ceph

    namespace: rook-ceph

  […]

  spec:

  […]

    monitoring:

      enabled: true     # line added

      rulesNamespace: "rook-ceph"

  […]

 

    - Grafana Dashboards

      The following Grafana dashboards are available:

     • Ceph - Cluster

     • Ceph - OSD

     • Ceph - Pools

 

 

9. Troubleshooting

    - Case #1

      Problem:

          The rook-ceph-mon-a-7d67c65584-m7nrf pod fails with CrashLoopBackOff during deployment

    $ kubectl apply -f cluster.yaml

   …

   $ kubectl get pods -n rook-ceph

   …

    rook-ceph-mon-a-7d67c65584-m7nrf                     0/1     Init:CrashLoopBackOff   6        10m

   …

   $

      Cause:

      [ysjeon71_kubeflow3@master ~]$ k logs rook-ceph-mon-a-7d67c65584-m7nrf -n rook-ceph

     Error from server (BadRequest): container "mon" in pod "rook-ceph-mon-a-7d67c65584-m7nrf" is waiting to start: PodInitializing

     [ysjeon71_kubeflow3@master ~]$ k logs rook-ceph-mon-a-7d67c65584-m7nrf -n rook-ceph -c chown-container-data-dir

     failed to change ownership of '/var/log/ceph' from root:root to ceph:ceph

     failed to change ownership of '/var/lib/ceph/crash' from root:root to ceph:ceph

     failed to change ownership of '/var/lib/ceph/mon/ceph-a' from root:root to ceph:ceph

     chown: changing ownership of '/var/log/ceph': Permission denied

     chown: changing ownership of '/var/lib/ceph/crash': Permission denied

     chown: changing ownership of '/var/lib/ceph/mon/ceph-a': Permission denied

      Solution:

      [ysjeon71_kubeflow3@master ceph]$ k edit deployments rook-ceph-operator -n rook-ceph

      …

         - name: ROOK_HOSTPATH_REQUIRES_PRIVILEGED

          value: "true"    # false를 true로 변경

      …

      [ysjeon71_kubeflow3@master ~]$ k delete -f cluster.yaml

      $ k apply -f cluster.yaml

 

    - Case #2

      Problem: OSD pods are not created on my devices

          https://rook.io/docs/rook/v1.3/ceph-common-issues.html#osd-pods-are-not-created-on-my-devices

      Cause:

          If Rook determines that a device is not available (it has existing partitions or a formatted filesystem), Rook will skip consuming the device.

    [ysjeon71_kubeflow3@worker-1 ~]$ kubectl -n rook-ceph get pod -l app=rook-ceph-osd-prepare

    …

    # view the logs for the node of interest in the "provision" container

    [ysjeon71_kubeflow3@worker-1 ~]$ k logs rook-ceph-osd-prepare-worker-1-2lzf6 -n rook-ceph

    …

    2020-05-08 00:06:45.931260 I | cephosd: skipping device "sda1" because it contains a filesystem "vfat"

    2020-05-08 00:06:45.931264 I | cephosd: skipping device "sda2" because it contains a filesystem "xfs"

    2020-05-08 00:06:45.931268 I | cephosd: skipping device "sdb" because it contains a filesystem "ext4"

    2020-05-08 00:06:45.958256 I | cephosd: configuring osd devices: {"Entries":{}}

    …

    [ysjeon71_kubeflow3@worker-1 ~]

    [ysjeon71_kubeflow3@worker-1 ~]$ lsblk -f

    NAME   FSTYPE LABEL UUID                                 MOUNTPOINT

    sda

    ├─sda1 vfat         79E8-651E                            /boot/efi

    └─sda2 xfs    root  8bf9cd93-b3d1-421e-ba67-5fd6e4189f3d /

    sdb    ext4         5bde26dc-d40f-41fb-88c1-c6c464e6b785 /user1

    [ysjeon71_kubeflow3@worker-1 ~]$

      Solution:

    [root@worker-1 ~]# wipefs -a /dev/sdb -f

    /dev/sdb: 2 bytes were erased at offset 0x00000438 (ext4): 53 ef

    [root@worker-1 ~]# lsblk -f

    NAME   FSTYPE LABEL UUID                                 MOUNTPOINT

    sda

    ├─sda1 vfat         79E8-651E                            /boot/efi

    └─sda2 xfs    root  8bf9cd93-b3d1-421e-ba67-5fd6e4189f3d /

    sdb                                                      /user1

    [root@worker-1 ~]# umount /user1

    [ysjeon71_kubeflow3@master ~]$ k delete pod rook-ceph-operator-7fc446864f-fjq5p -n rook-ceph

    [root@worker-1 ~]# lsblk -f

    NAME                 FSTYPE      LABEL UUID                                   MOUNTPOINT

    sda

    ├─sda1               vfat              79E8-651E                              /boot/efi

    └─sda2               xfs         root  8bf9cd93-b3d1-421e-ba67-5fd6e4189f3d   /

    sdb                  LVM2_member       XGnpK0-KKLt-yEXY-bq2W-JN1v-DIT7-W2wkMX

    └─ceph--2c6e6c30--1ad8--4d12--9459--6ef9a9517363-osd--data--4bd16688--5f51--402d--a934--6a20136dc459

          Add a 10 GB disk to the worker-2 and worker-3 nodes

       yoosungjeon@ysjeon-MacBook-Pro ~ % gcloud compute disks create disk-2 --size 10G --zone=us-east1-d

       yoosungjeon@ysjeon-MacBook-Pro ~ % gcloud compute disks create disk-3 --size 10G --zone=us-east1-d

       yoosungjeon@ysjeon-MacBook-Pro ~ % gcloud compute instances attach-disk worker-2 --disk disk-2 --zone=us-east1-d

       yoosungjeon@ysjeon-MacBook-Pro ~ % gcloud compute instances attach-disk worker-3 --disk disk-3 --zone=us-east1-d
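
          After attaching the disks, restart the operator pod so it re-runs the osd-prepare jobs for the new devices (the same trick used above for worker-1) - a short sketch:

       $ kubectl -n rook-ceph delete pod -l app=rook-ceph-operator
       $ kubectl -n rook-ceph get pod -l app=rook-ceph-osd-prepare     # wait for the prepare jobs to finish
       $ kubectl -n rook-ceph get pod -l app=rook-ceph-osd             # new OSD pods should appear for the added disks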

 

    - Case #3

      Problem: the kubectl patch command fails with an error

    $ k patch CephCluster rook-ceph -n rook-ceph -p '{ "spec": { "dashboard": { "urlPrefix": "/ceph-dashboard" } } }'

    Error from server (UnsupportedMediaType): the body of the request was in an unknown format - accepted media types include: application/json-patch+json, application/merge-patch+json

    $ k patch CephCluster rook-ceph -n rook-ceph -p '{ "spec": { "dashboard": { "ssl": "false" } } }'

    Error from server (UnsupportedMediaType): the body of the request was in an unknown format - accepted media types include: application/json-patch+json, application/merge-patch+json

    $

      Solution:

           Work around it with kubectl edit (a merge-patch alternative is sketched below).
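
           A merge-patch alternative that should also work: CRs such as CephCluster do not accept the default strategic-merge patch, so the patch type has to be given explicitly, and ssl must be a JSON boolean rather than a string - a sketch:

     $ kubectl -n rook-ceph patch cephcluster rook-ceph --type merge -p '{"spec":{"dashboard":{"ssl":false,"urlPrefix":"/ceph-dashboard"}}}'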

 

    - Case #4

      Problem: deleting cluster.yaml, storageclass.storage.k8s.io, etc. hangs (the delete never completes)

     $ k delete -f cluster.yaml

     cephcluster.ceph.rook.io "rook-ceph" deleted

     …

 

     $ k delete -f storageclass-block.yaml

     cephblockpool.ceph.rook.io "rook-ceph-block-pool-iap" deleted

     storageclass.storage.k8s.io "rook-ceph-block-sc-iap" deleted

     …

      Solution:

     $ k delete pod rook-ceph-operator-6c68bb688-jngnn -n rook-ceph

     pod "rook-ceph-operator-6c68bb688-jngnn" deleted

     $
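
      If the CephCluster still hangs in Terminating after the operator restart, the Rook teardown docs fall back to clearing the finalizer on the resource - a last-resort sketch that abandons cleanup, so use it only when the cluster data is disposable:

      $ kubectl -n rook-ceph patch cephcluster rook-ceph --type merge -p '{"metadata":{"finalizers":[]}}'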

 

    - Case #5

      Problem: "rpc error: code = Unknown desc = context canceled"

     [iap@iap01 ceph]$ k describe pod csi-cephfsplugin-provisioner-7487dcb679-6lvx6 -n rook-ceph | grep -i events -A 15

     Events:

       Type     Reason     Age                        From               Message

       ----     ------     ----                       ----               -------

       ... 

       Warning  Failed     53s (x2 over 7m25s)        kubelet, iap09     Failed to pull image "quay.io/k8scsi/csi-resizer:v0.4.0": rpc error: code = Unknown desc = context canceled

       Warning  Failed     <invalid> (x2 over 5m14s)  kubelet, iap09     Error: ErrImagePull

       Normal   Pulling    <invalid> (x3 over 11m)    kubelet, iap09     Pulling image "quay.io/k8scsi/csi-attacher:v2.1.0"

       Warning  Failed     <invalid> (x2 over 5m14s)  kubelet, iap09     Failed to pull image "quay.io/k8scsi/csi-provisioner:v1.4.0": rpc error: code = Unknown desc = context canceled

     [iap@iap01 ceph]$

      Cause:

     [root@iap07 ~]# docker pull quay.io/k8scsi/csi-resizer:v0.4.0

     Trying to pull repository quay.io/k8scsi/csi-resizer ...

     v0.4.0: Pulling from quay.io/k8scsi/csi-resizer

     9ff2acc3204b: Downloading

     0d3d64020a22: Downloading

     dial tcp 13.225.112.61:443: i/o timeout

      Solution:

          The IP of cdn02.quay.io (13.225.112.61) had changed, so a firewall rule update was requested.

 

    - Case #6

      Problem:

          driver name rook-ceph.rbd.csi.ceph.com not found in the list of registered CSI drivers

          driver name rook-ceph.cephfs.csi.ceph.com not found in the list of registered CSI drivers

     [iap@iap01 rook-storage-yaml]$ k get pod

     NAME                               READY   STATUS              RESTARTS   AGE

     wordpress-mysql-6cc97b86fc-lrqgd   0/1     ContainerCreating   0          75s

     [iap@iap01 ~]$ k describe pod wordpress-mysql-6cc97b86fc-lrqgd | grep Events -A 10

     Events:

       Type     Reason                  Age                From                     Message

       ----     ------                  ----               ----                     -------

       Warning  FailedScheduling        19s (x2 over 20s)  default-scheduler        pod has unbound immediate PersistentVolumeClaims (repeated 5 times)

       Normal   Scheduled               16s                default-scheduler        Successfully assigned default/wordpress-mysql-6cc97b86fc-lrqgd to iap09

       Normal   SuccessfulAttachVolume  16s                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-2518fd30-5c45-4887-ba6c-accb947745a5"

       Warning  FailedMount             6s (x5 over 14s)   kubelet, iap09           MountVolume.MountDevice failed for volume "pvc-2518fd30-5c45-4887-ba6c-accb947745a5" : driver name rook-ceph.rbd.csi.ceph.com not found in the list of registered CSI drivers

     [iap@iap01 ~]$

      Solution: 

           With Rook v1.3.0-beta this error occurred in the on-prem environment, while the same setup worked fine on GKE.

           Upgrading to Rook v1.3.6 resolved it.
