Crunchy PostgreSQL Operator #2

by 여행을 떠나자! 2021. 9. 23.

2021.04.22, 2020.09.10

Crunchy PostgreSQL Operator #1

    1.  Crunchy PostgreSQL Operator 소개

    2. PostgreSQL Operator Configuration

    3. Create a PostgreSQL Cluster (Multi-Tenancy)

    4. pgAdmin4

Crunchy PostgreSQL Operator #2

    5. High-Availability

    6. Disaster Recovery: Backups & Restores

    7. Monitoring

    8. pgBouncer


5. High-Availability

- High-Availability: Scaling Up & Down

   https://access.crunchydata.com/documentation/postgres-operator/4.4.1/pgo-client/common-tasks/#high-availability-scaling-up-down

- The Crunchy PostgreSQL Operator supports a distributed-consensus based high-availability (HA) system that keeps its managed PostgreSQL clusters up and running, even if the PostgreSQL Operator disappears. 

   Additionally, it leverages Kubernetes specific features such as Pod Anti-Affinity to limit the surface area that could lead to a PostgreSQL cluster becoming unavailable.

 

- High-Availability (summary)

   https://access.crunchydata.com/documentation/postgres-operator/4.4.1/architecture/high-availability/   

- HA is enabled by default in any newly created PostgreSQL cluster.

- The Crunchy PostgreSQL Operator High-Availability Algorithm

   The key portion of high-availability that the PostgreSQL Operator provides is that it delegates the management of HA to the PostgreSQL clusters themselves.

   Each HA PostgreSQL cluster maintains its availability using concepts that come from the Raft algorithm to achieve distributed consensus.

   The Raft algorithm (“Reliable, Replicated, Redundant, Fault-Tolerant”) was developed for systems that have one “leader” (i.e. a primary) and one-to-many followers (i.e. replicas) to provide the same fault tolerance and safety.

   Each PostgreSQL cluster leverages the distributed etcd key-value store that is bundled with Kubernetes.

- By default, when a new PostgreSQL cluster is created using the PostgreSQL Operator, pod anti-affinity rules will be applied to any deployments comprising the full PG cluster
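The anti-affinity rule mentioned above looks roughly like the following fragment from a Deployment's pod spec. This is an illustrative sketch, not the Operator's exact output; the `pg-cluster` label comes from the labels shown by `pgo show cluster` below, and the weight/topologyKey values are assumptions:

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 1
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: pg-cluster
            operator: In
            values:
            - emo-dev
        topologyKey: kubernetes.io/hostname
```

Because the rule is "preferred" rather than "required", the scheduler spreads the primary and replicas across nodes when it can, but still schedules them if only one node is available.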

 
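The leader-election idea described above (a lease held in a distributed consensus store such as etcd, renewed by the primary and grabbed by a replica when it expires) can be sketched as a toy simulation. This is an assumption-level illustration of the concept, not the Operator's or Patroni's actual code, and real Raft adds terms, quorum, and log replication on top:

```python
class LeaderLease:
    """Stand-in for a key in a distributed consensus store with a TTL."""
    def __init__(self):
        self.holder = None   # name of the current primary
        self.expires = 0.0   # lease expiry timestamp

    def try_acquire(self, node, ttl, now):
        # The lease is free if nobody holds it, the holder stopped renewing
        # it (a crashed primary), or the caller is renewing its own lease.
        if self.holder is None or now >= self.expires or self.holder == node:
            self.holder, self.expires = node, now + ttl
            return True
        return False

def elect(lease, healthy_nodes, ttl, now):
    """Every healthy instance races for the lease; the winner runs as
    primary, the rest keep replicating from it."""
    for node in healthy_nodes:
        if lease.try_acquire(node, ttl, now):
            return node
    return None

lease = LeaderLease()
print(elect(lease, ["pg-a", "pg-b"], ttl=5, now=0))  # pg-a wins, becomes primary
print(elect(lease, ["pg-b"], ttl=5, now=2))          # None: pg-a's lease is still valid
print(elect(lease, ["pg-b"], ttl=5, now=6))          # pg-b: lease expired, failover
```

The third call is the failover path: once the primary stops renewing its lease, a replica acquires it and is promoted, without the Operator itself having to be available.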

a. Creating a New Replica

$ pgo scale emo-dev --replica-count=2
created Pgreplica emo-dev-qfbs
created Pgreplica emo-dev-yfhd
$ pgo show cluster emo-dev
cluster : emo-dev (crunchy-postgres-ha:centos7-11.9-4.4.1)
  pod : emo-dev-5dddffb55c-pvkbm (Running) on iap04 (1/1) (primary)
      pvc: emo-dev (100Gi)
  pod : emo-dev-qfbs-564bf6b648-w4z8b (Running) on iap08 (1/1) (replica)
      pvc: emo-dev-qfbs (100Gi)
  pod : emo-dev-yfhd-64f694bcd9-gj4jv (Running) on iap06 (1/1) (replica)
      pvc: emo-dev-yfhd (100Gi)
  resources : Memory: 1Gi
  deployment : emo-dev
  deployment : emo-dev-backrest-shared-repo
  deployment : emo-dev-pgadmin
  deployment : emo-dev-qfbs
  deployment : emo-dev-yfhd
  service : emo-dev - ClusterIP (10.97.187.147)
  service : emo-dev-pgadmin - ClusterIP (10.100.196.158)
  service : emo-dev-replica - ClusterIP (10.99.113.241)
  pgreplica : emo-dev-qfbs
  pgreplica : emo-dev-yfhd
  labels : pgouser=admin autofail=true crunchy-pgha-scope=emo-dev pg-pod-anti-affinity= deployment-name=emo-dev name=emo-dev pg-cluster=emo-dev pgo-backrest=true pgo-version=4.4.1 crunchy-pgadmin=true crunchy-pgbadger=false crunchy_collect=false workflowid=5bf710f2-d772-4e8e-8862-9481e22b6d75
$ kubectl patch service emo-dev-replica -n pgo -p '{ "spec": { "type": "NodePort" } }'
$ kubectl get svc emo-dev-replica -n pgo -o jsonpath='{.spec.ports[?(@.name=="postgres")].nodePort}'
30796
$

- On a remote client

yoosungjeon@ysjeon-Dev crunchy-postgresql-operator % /Users/yoosungjeon/PostgreSQL/9.5/bin/psql -h 14.52.244.134 -p 30796 -U emo emo-dev
psql (9.5.0, server 12.4)
emo-dev=> select count(*) from tab1;
 count
-------
   324
(1 row)
emo-dev=> insert into tab1 values ( 1004 );
ERROR:  cannot execute INSERT in a read-only transaction
emo-dev=>

 

b. Viewing Available Replicas

$ pgo failover --query emo-dev
Cluster: emo-dev
REPLICA               STATUS      NODE        REPLICATION LAG         PENDING RESTART
emo-dev-qfbs          running     iap08                  0 MB                   false
emo-dev-yfhd          running     iap06                  0 MB                   false
$

 

c. Manual Failover

- The PostgreSQL Operator is set up with an automated failover system based on distributed consensus but there may be times where you wish to have your cluster manually failover.

$ pgo failover emo-dev --target=emo-dev-yfhd
WARNING: Are you sure? (yes/no): y
created Pgtask (failover) for cluster emo-dev
$ pgo show cluster emo-dev
cluster : emo-dev (crunchy-postgres-ha:centos7-11.9-4.4.1)
  pod : emo-dev-5dddffb55c-pvkbm (Running) on iap04 (1/1) (replica)
      pvc: emo-dev (1Gi)
  pod : emo-dev-qfbs-564bf6b648-w4z8b (Running) on iap08 (1/1) (replica)
      pvc: emo-dev-qfbs (1Gi)
  pod : emo-dev-yfhd-64f694bcd9-gj4jv (Running) on iap06 (1/1) (primary)
      pvc: emo-dev-yfhd (1Gi)
…
$

- Notes

   In a non-HA setup, the time it takes Kubernetes to restart PostgreSQL after the process goes down: 53 seconds (measured in the IAP production environment)

   Switchover time with a manual failover: 4 seconds (measured in the IAP production environment)

$ while true
do
    export PGPASSWORD=*****; psql -h 14.52.244.136 -p 31738 -U emo emo-dev -c "select now()" | head -n 3 | tail -n 1
    sleep 1
done
…
 2020-09-09 08:34:20.528232+00
psql: FATAL:  the database system is shutting down
psql: FATAL:  the database system is shutting down
psql: could not connect to server: Connection refused
    Is the server running on host "14.52.244.134" and accepting
    TCP/IP connections on port 31738?
 2020-09-09 08:34:24.760391+00
 2020-09-09 08:34:25.835177+00
…

 

d. Forcibly terminating the primary PostgreSQL instance

$ k delete pod emo-dev-yfhd-64f694bcd9-gj4jv -n pgo
pod "emo-dev-yfhd-64f694bcd9-gj4jv" deleted
$ pgo show cluster emo-dev
cluster : emo-dev (crunchy-postgres-ha:centos7-11.9-4.4.1)
  pod : emo-dev-5dddffb55c-pvkbm (Running) on iap04 (1/1) (primary)
      pvc: emo-dev (1Gi)
  pod : emo-dev-qfbs-564bf6b648-w4z8b (Running) on iap08 (1/1) (replica)
      pvc: emo-dev-qfbs (1Gi)
  pod : emo-dev-yfhd-64f694bcd9-hj4g8 (Pending) on iap08 (0/1) (unknown) --> becomes a replica after some time
      pvc: emo-dev-yfhd (1Gi)
…
$

 

e. Forcibly terminating a replica PostgreSQL instance

-  Does not affect the service

$ k rollout restart deployment emo-dev -n pgo
deployment.extensions/emo-dev restarted
$

 

f. Destroying a Replica

$ pgo scaledown emo-dev --target=emo-dev-yfhd
WARNING: Are you sure? (yes/no): yes
deleted replica emo-dev-yfhd
$

 

 

6. Disaster Recovery: Backups & Restores

- The PostgreSQL Operator uses the open source pgBackRest backup and recovery utility for managing backups and PostgreSQL archives. 

   The PostgreSQL Operator supports creating backups in S3 or any object storage system that uses the S3 protocol. 

   https://access.crunchydata.com/documentation/postgres-operator/4.4.1/architecture/disaster-recovery/#using-s3

https://access.crunchydata.com/documentation/postgres-operator/4.4.1/architecture/disaster-recovery/

 

a. Creating a Backup

    - When a new PostgreSQL cluster is provisioned by the PostgreSQL Operator, a full pgBackRest backup is taken by default.

      This is required in order to create new replicas (via pgo scale) for the PostgreSQL cluster as well as healing during a failover scenario.

$ pgo backup emo-dev                                                      # by default, will create an incremental pgBackRest backup.
$ pgo backup emo-dev --backup-opts="--type=full --repo1-retention-full=3" # to create a full backup and retain the previous 3 full backups
$ pgo df emo-dev
PVC               INSTANCE                     POD                               TYPE        USED     CAPACITY  % USED
----------------- ---------------------------- --------------------------------- ----------- -------- --------- ------
emo-dev           emo-dev                      emo-dev-65847bbd9c-jrpqk          data        144MiB   100GiB    0%
emo-dev-pgbr-repo emo-dev-backrest-shared-repo emo-dev-backrest-shared-repo-...  pgbackrest  11MiB    1024MiB   1%
emo-dev-ewiy      emo-dev-ewiy                 emo-dev-ewiy-5dbb977b89-lzs4f     data        127MiB   100GiB    0%
$

 

b. Displaying Backup Information

$ pgo show backup emo-dev
cluster: emo-dev
storage type: local
stanza: db
    status: ok
    cipher: none
    db (current)
        wal archive min/max (11-1)
        full backup: 20200909-054554F
            timestamp start/stop: 2020-09-09 14:45:54 +0900 KST / 2020-09-09 14:46:18 +0900 KST
            wal start/stop: 000000010000000000000003 / 000000010000000000000003
            database size: 30.1MiB, backup size: 30.1MiB
            repository size: 3.6MiB, repository backup size: 3.6MiB
            backup reference list:
        incr backup: 20200909-054554F_20200909-064518I
            timestamp start/stop: 2020-09-09 15:45:18 +0900 KST / 2020-09-09 15:45:24 +0900 KST
            wal start/stop: 00000002000000000000000F / 00000002000000000000000F
            database size: 30.1MiB, backup size: 47.0KiB
            repository size: 3.6MiB, repository backup size: 3.4KiB
            backup reference list: 20200909-054554F
        incr backup: 20200909-054554F_20200909-083154I
            timestamp start/stop: 2020-09-09 17:31:54 +0900 KST / 2020-09-09 17:31:58 +0900 KST
            wal start/stop: 000000030000000000000016 / 000000030000000000000016
            database size: 30.1MiB, backup size: 2.1MiB
            repository size: 3.6MiB, repository backup size: 220.3KiB
            backup reference list: 20200909-054554F, 20200909-054554F_20200909-064518I
…
$

    local: Uses the storage that is provided by the Kubernetes cluster’s Storage Class that you select

 

c. Restore a Cluster

- The PostgreSQL Operator supports the ability to perform a full restore on a PostgreSQL cluster (i.e. a “clone” or “copy”)

$ pgo create cluster data-lake --restore-from=emo-dev  # Full Restore / clone
$ pgo restore emo-dev --pitr-target="2019-12-31 11:59:59.999999+00" --backup-opts="--type=time"  # Point-in-time-Recovery (PITR) / Restore in-place

Reference: PostgreSQL client utilities

    pg_dump - extract a PostgreSQL database into a script file or other archive file

    pg_dumpall - extract a PostgreSQL database cluster into a script file

    pg_restore - restore a PostgreSQL database from an archive file created by pg_dump

 

 

7. Monitoring

a. pgAdmin4 dashboard

b. View Disk Utilization

$ pgo df emo-dev
PVC               INSTANCE                     POD                               TYPE        USED     CAPACITY  % USED
----------------- ---------------------------- --------------------------------- ----------- -------- --------- ------
emo-dev           emo-dev                      emo-dev-65847bbd9c-jrpqk          data        144MiB   1024MiB   14%
emo-dev-pgbr-repo emo-dev-backrest-shared-repo emo-dev-backrest-shared-repo-...  pgbackrest  11MiB    1024MiB   1%
emo-dev-ewiy      emo-dev-ewiy                 emo-dev-ewiy-5dbb977b89-lzs4f     data        127MiB   1024MiB   12%
$

 

c. Grafana

- pgMonitor is your all-in-one tool to easily create an environment to visualize the health and performance of your PostgreSQL cluster.

   https://access.crunchydata.com/documentation/pgmonitor/4.1/

   https://access.crunchydata.com/documentation/postgres-operator/4.4.1/installation/other/ansible/installing-metrics/

- pgMonitor combines a suite of tools to facilitate the collection and visualization of important metrics that you need to be aware of in your PostgreSQL database and your host environment, including:

   ✓ Connection counts: how busy your system is and whether connections are hanging

   ✓ Database size: how much disk your cluster is using

   ✓ Replication lag: know if your replicas are falling behind in loading data from your primary

   ✓ Transaction wraparound: don’t let your PostgreSQL database stop working

   ✓ Bloat: how much extra space are your tables and indexes using

   ✓ System metrics: CPU, Memory, I/O, uptime

 

- Grafana:  http://14.52.244.134:31814 admin/*****

   Dashboard list:

   ✓ Bloat Details 

       Every such record that has been deleted but is still taking some space is called a dead tuple.

       The space occupied by these dead tuples may be referred to as Bloat.

       VACUUM reclaims the storage occupied by these dead tuples. 

   ✓ CRUD Details

   ✓ PGBackrest

   ✓ PostgreSQL Details

   ✓ PostgreSQL Overview

   ✓ TableSize Details

- Troubleshooting 

   ▷ Problem:  Invalid Username or Password (Username: admin, Password: *****)

   ▷ Cause:

sh-4.2$ vi /data/defaults.ini
#################################### Security ############################
[security]
# default admin user, created on startup
admin_user = 'admin'

# default admin password, can be changed before first start of grafana,  or in profile settings
admin_password = '*****'
...
$ kubectl exec crunchy-grafana-5746f458bb-cb755 -n pgo -it -- sh
sh-4.2$ sqlite3 /data/grafana/data/grafana.db
SQLite version 3.7.17 2013-05-20 00:56:22
sqlite> .schema user
CREATE TABLE `user` (
`id` INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL
, `version` INTEGER NOT NULL
, `login` TEXT NOT NULL
, `email` TEXT NOT NULL
, `name` TEXT NULL
, `password` TEXT NULL
, `salt` TEXT NULL
...
sqlite>  select login, password, salt from user;
admin|b5375fb1f77a178e76fd32c0ccd5e2eb6273dacaccb551ad97560f0b6b9c6eb7c2e70ef623720c4e9e8539073d705496b14f|TDLtpk6XBE
sqlite> .quit

   ▷ Workaround: reset the admin user's password with the grafana-cli command

sh-4.2$  /opt/cpm/bin/grafana-6.7.4/bin/grafana-cli --homepath /opt/cpm/bin/grafana-6.7.4 admin reset-admin-password *****
EROR[09-11|09:05:46] Failed to detect generated javascript files in public/build logger=settings
WARN[09-11|09:05:46] [Removed] Session setting was removed in v6.2, use remote_cache option instead logger=settings
INFO[09-11|09:05:46] Connecting to DB                         logger=sqlstore dbtype=sqlite3
INFO[09-11|09:05:46] Starting DB migration                    logger=migrator
Admin password changed successfully ✔
sh-4.2$

 

 

8. pgBouncer

a. pgBouncer

- pgBouncer is a lightweight connection pooler and state manager that provides an efficient gateway for metering connections to PostgreSQL

  https://access.crunchydata.com/documentation/postgres-operator/4.6.2/tutorial/pgbouncer/

 

b. Config

[iap@iap01 ~]$ pgo create pgbouncer ktxtwinproject
ktxtwinproject pgbouncer added
[iap@iap01 ~]$ kubectl patch service ktxtwinproject-pgbouncer -n pgo -p '{ "spec": { "type": "NodePort" } }'
[iap@iap01 ~]$

[iap@iap01 ~]$ k describe secrets ktxtwinproject-pgbouncer-secret -n pgo | grep Data -A10
Data
====
pgbouncer.ini:  544 bytes
users.txt:      49 bytes
password:       24 bytes
pg_hba.conf:    55 bytes
[iap@iap01 ~]$ k get secrets ktxtwinproject-pgbouncer-secret -n pgo -o jsonpath='{.data.\pgbouncer\.ini}' | base64 --decode
[databases]
* = host=ktxtwinproject port=5432 auth_user=pgbouncer

[pgbouncer]
listen_port = 5432
listen_addr = *
auth_type = md5
auth_file = /pgconf/users.txt
auth_query = SELECT username, password from pgbouncer.get_auth($1)
pidfile = /tmp/pgbouncer.pid
logfile = /dev/stdout
admin_users = pgbouncer
stats_users = pgbouncer
default_pool_size = 20
max_client_conn = 100
max_db_connections = 0
min_pool_size = 0
pool_mode = session
reserve_pool_size = 0
reserve_pool_timeout = 5
query_timeout = 0
ignore_startup_parameters = extra_float_digits
[iap@iap01 ~]$ pgo show pgbouncer ktxtwinproject
CLUSTER        SERVICE                  USERNAME  PASSWORD                 CLUSTER IP    EXTERNAL IP
-------------- ------------------------ --------- ------------------------ ------------- -----------
ktxtwinproject ktxtwinproject-pgbouncer pgbouncer *****                    10.97.225.118
[iap@iap01 ~]$ k get svc -n pgo | egrep 'NAME|ktxtwinproject'
NAME                                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                        AGE
ktxtwinproject-pgbouncer              NodePort    10.97.225.118    <none>        5432:32745/TCP                                 52m
ktxtwinproject                        NodePort    10.100.19.10     <none>        9187:30686/TCP,2022:31415/TCP,5432:32119/TCP   63d
ktxtwinproject-replica                ClusterIP   10.111.194.160   <none>        9187/TCP,2022/TCP,5432/TCP                     63d
...
[iap@iap01 ~]$ k get pod -n pgo | egrep 'NAME|pgbouncer'
NAME                                                   READY   STATUS             RESTARTS   AGE
ktxtwinproject-pgbouncer-65c9855874-v2b75              1/1     Running            0          4m46s
[iap@iap01 ~]$
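In the pgbouncer.ini shown above, `pool_mode = session` means each client holds a server connection from login to disconnect, and up to `max_client_conn` (100) clients are funneled onto at most `default_pool_size` (20) server connections per database/user pair. A toy simulation of that arithmetic (the `ToyPool` class is hypothetical, not pgBouncer's code):

```python
DEFAULT_POOL_SIZE = 20   # matches default_pool_size in the ini above
MAX_CLIENT_CONN = 100    # matches max_client_conn

class ToyPool:
    """Minimal model of a pgBouncer session-mode pool."""
    def __init__(self, size):
        self.size = size
        self.in_use = 0
        self.peak = 0

    def checkout(self):
        # A client gets a server connection only if the pool has room;
        # otherwise it queues until another session ends.
        if self.in_use < self.size:
            self.in_use += 1
            self.peak = max(self.peak, self.in_use)
            return True
        return False

pool = ToyPool(DEFAULT_POOL_SIZE)
granted = sum(pool.checkout() for _ in range(MAX_CLIENT_CONN))
print(granted)     # only 20 sessions get a server connection immediately
print(pool.peak)   # server-side connection count never exceeds default_pool_size
```

This is why pgBouncer protects PostgreSQL from connection storms: the database only ever sees the pool-sized set of connections, no matter how many clients connect to the pgBouncer service.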

 

c. Test

[iap@iap01 ~]$ k exec ktxtwinproject-ukfw-7b5675b449-24c8z -n pgo -it -- bash
bash-4.2$ export PGPASSWORD=ktxtwin@5432; psql -h ktxtwinproject-pgbouncer  -p 5432 -U ktxtwin ktxtwinproject
psql (11.9)
Type "help" for help.

ktxtwinproject=>
or
yoosungjeon@ysjeon-Dev ~ % export PGPASSWORD=ktxtwin@5432; psql -h iap-vip  -p 32745 -U ktxtwin ktxtwinproject
psql (13.2, server 11.9)
Type "help" for help.

ktxtwinproject=>

[iap@iap01 ~]$ k logs ktxtwinproject-pgbouncer-65c9855874-f2428 -n pgo -f
2021-04-22 06:14:34.075 UTC [13] LOG C-0x55c1c4e94950: ktxtwinproject/ktxtwin@10.244.0.0:28593 login attempt: db=ktxtwinproject user=ktxtwin tls=no
2021-04-22 06:14:56.049 UTC [13] LOG stats: 0 xacts/s, 0 queries/s, in 11 B/s, out 37 B/s, xact 33007 us, query 33007 us, wait 969 us
…
2021-04-22 06:15:40.478 UTC [13] LOG C-0x55c1c4e94950: ktxtwinproject/ktxtwin@10.244.0.0:28593 closing because: client close request (age=66s)

 
