2021.04.22, 2020.09.10
1. Crunchy PostgreSQL Operator 소개
2. PostgreSQL Operator Configuration
3. Create a PostgreSQL Cluster (Multi-Tenancy)
4. pgAdmin4
Crunchy PostgreSQL Operator #2
5. High-Availability
6. Disaster Recovery: Backups & Restores
7. Monitoring
8. pgBouncer
5. High-Availability
- High-Availability: Scaling Up & Down
- The Crunchy PostgreSQL Operator supports a distributed-consensus based high-availability (HA) system that keeps its managed PostgreSQL clusters up and running, even if the PostgreSQL Operator disappears.
Additionally, it leverages Kubernetes specific features such as Pod Anti-Affinity to limit the surface area that could lead to a PostgreSQL cluster becoming unavailable.
- High-Availability (요약)
https://access.crunchydata.com/documentation/postgres-operator/4.4.1/architecture/high-availability/
- HA is enabled by default in any newly created PostgreSQL cluster.
- The Crunchy PostgreSQL Operator High-Availability Algorithm
The key portion of high-availability that the PostgreSQL Operator provides is that it delegates the management of HA to the PostgreSQL clusters themselves.
Each HA PostgreSQL cluster maintains its availability using concepts that come from the Raft algorithm to achieve distributed consensus.
The Raft algorithm (“Reliable, Replicated, Redundant, Fault-Tolerant”) was developed for systems that have one “leader” (i.e. a primary) and one-to-many followers (i.e. replicas) to provide the same fault tolerance and safety.
Each PostgreSQL cluster leverages the distributed etcd key-value store that is bundled with Kubernetes.
- By default, when a new PostgreSQL cluster is created using the PostgreSQL Operator, pod anti-affinity rules will be applied to any deployments comprising the full PG cluster
a. Creating a New Replica
$ pgo scale emo-dev --replica-count=2
created Pgreplica emo-dev-qfbs
created Pgreplica emo-dev-yfhd
$ pgo show cluster emo-dev
cluster : emo-dev (crunchy-postgres-ha:centos7-11.9-4.4.1)
pod : emo-dev-5dddffb55c-pvkbm (Running) on iap04 (1/1) (primary)
pvc: emo-dev (100Gi)
pod : emo-dev-qfbs-564bf6b648-w4z8b (Running) on iap08 (1/1) (replica)
pvc: emo-dev-qfbs (100Gi)
pod : emo-dev-yfhd-64f694bcd9-gj4jv (Running) on iap06 (1/1) (replica)
pvc: emo-dev-yfhd (100Gi)
resources : Memory: 1Gi
deployment : emo-dev
deployment : emo-dev-backrest-shared-repo
deployment : emo-dev-pgadmin
deployment : emo-dev-qfbs
deployment : emo-dev-yfhd
service : emo-dev - ClusterIP (10.97.187.147)
service : emo-dev-pgadmin - ClusterIP (10.100.196.158)
service : emo-dev-replica - ClusterIP (10.99.113.241)
pgreplica : emo-dev-qfbs
pgreplica : emo-dev-yfhd
labels : pgouser=admin autofail=true crunchy-pgha-scope=emo-dev pg-pod-anti-affinity= deployment-name=emo-dev name=emo-dev pg-cluster=emo-dev pgo-backrest=true pgo-version=4.4.1 crunchy-pgadmin=true crunchy-pgbadger=false crunchy_collect=false workflowid=5bf710f2-d772-4e8e-8862-9481e22b6d75
$ kubectl patch service emo-dev-replica -n pgo -p '{ "spec": { "type": "NodePort" } }'
$ kubectl get svc emo-dev-replica -n pgo -o jsonpath='{.spec.ports[?(@.name=="postgres")].nodePort}'
30796
$
- on Remote client
yoosungjeon@ysjeon-Dev crunchy-postgresql-operator % /Users/yoosungjeon/PostgreSQL/9.5/bin/psql -h 14.52.244.134 -p 30796 -U emo emo-dev
psql (9.5.0, server 12.4)
emo-dev=> select count(*) from tab1;
count
-------
324
(1 row)
emo-dev=> insert into tab1 values ( 1004 );
ERROR: cannot execute INSERT in a read-only transaction
emo-dev=>
b. Viewing Available Replicas
$ pgo failover --query emo-dev
Cluster: emo-dev
REPLICA STATUS NODE REPLICATION LAG PENDING RESTART
emo-dev-qfbs running iap08 0 MB false
emo-dev-yfhd running iap06 0 MB false
$
c. Manual Failover
- The PostgreSQL Operator is set up with an automated failover system based on distributed consensus but there may be times where you wish to have your cluster manually failover.
$ pgo failover emo-dev --target=emo-dev-yfhd
WARNING: Are you sure? (yes/no): y
created Pgtask (failover) for cluster emo-dev
$ pgo show cluster emo-dev
cluster : emo-dev (crunchy-postgres-ha:centos7-11.9-4.4.1)
pod : emo-dev-5dddffb55c-pvkbm (Running) on iap04 (1/1) (replica)
pvc: emo-dev (1Gi)
pod : emo-dev-qfbs-564bf6b648-w4z8b (Running) on iap08 (1/1) (replica)
pvc: emo-dev-qfbs (1Gi)
pod : emo-dev-yfhd-64f694bcd9-gj4jv (Running) on iap06 (1/1) (primary)
pvc: emo-dev-yfhd (1Gi)
…
$
- 참고사항
Non-HA 환경에서 PostgreSQL 종료 시 Kubernetes가 재기동 시키는데 소용 시간 : 53초 (IAP 운영환경 기준)
Manual failover에 의한 절체 시간 : 4초 (IAP 운영환경 기준)
$ while true
do
export PGPASSWORD=*****; psql -h 14.52.244.136 -p 31738 -U emo emo-dev -c "select now()" | head -n 3 | tail -n 1
sleep 1
done
…
2020-09-09 08:34:20.528232+00
psql: FATAL: the database system is shutting down
psql: FATAL: the database system is shutting down
psql: could not connect to server: Connection refused
Is the server running on host "14.52.244.134" and accepting
TCP/IP connections on port 31738?
2020-09-09 08:34:24.760391+00
2020-09-09 08:34:25.835177+00
…
d. Primary PostreSQL Instance 강제 종료
$ k delete pod emo-dev-yfhd-64f694bcd9-gj4jv -n pgo
pod "emo-dev-yfhd-64f694bcd9-gj4jv" deleted
$ pgo show cluster emo-dev
cluster : emo-dev (crunchy-postgres-ha:centos7-11.9-4.4.1)
pod : emo-dev-5dddffb55c-pvkbm (Running) on iap04 (1/1) (primary)
pvc: emo-dev (1Gi)
pod : emo-dev-qfbs-564bf6b648-w4z8b (Running) on iap08 (1/1) (replica)
pvc: emo-dev-qfbs (1Gi)
pod : emo-dev-yfhd-64f694bcd9-hj4g8 (Pending) on iap08 (0/1) (unknown) —> 일정 시간 후 replica 상태 변경
pvc: emo-dev-yfhd (1Gi)
…
$
e. Replica PostgreSQL Instance 강제 종료
- 서비스에 영향을 주지 않음
$ k rollout restart deployment emo-dev -n pgo
deployment.extensions/emo-dev restarted
$
f. Destroying a Replica
$ pgo scaledown emo-dev --target=emo-dev-yfhd
WARNING: Are you sure? (yes/no): yes
deleted replica emo-dev-yfhd
$
6. Disaster Recovery: Backups & Restores
- The PostgreSQL Operator uses the open source pgBackRest backup and recovery utility for managing backups and PostgreSQL archives.
The PostgreSQL Operator supports creating backups in S3 or any object storage system that uses the S3 protocol.
a. Creating a Backup
- When a new PostgreSQL cluster is provisioned by the PostgreSQL Operator, a full pgBackRest backup is taken by default.
This is required in order to create new replicas (via pgo scale) for the PostgreSQL cluster as well as healing during a failover scenario.
$ pgo backup emo-dev # by default, will create an incremental pgBackRest backup.
$ pgo backup emo-dev --backup-opts="--type=full --repo1-retention-full=3” # to create a full backup and retain the previous 3 full backups
$ pgo df emo-dev
PVC INSTANCE POD TYPE USED CAPACITY % USED
----------------- ---------------------------- --------------------------------- ----------- -------- --------- ------
emo-dev emo-dev emo-dev-65847bbd9c-jrpqk data 144MiB 100GiB 0%
emo-dev-pgbr-repo emo-dev-backrest-shared-repo emo-dev-backrest-shared-repo-... pgbackrest 11MiB 1024MiB 1%
emo-dev-ewiy emo-dev-ewiy emo-dev-ewiy-5dbb977b89-lzs4f data 127MiB 100GiB 0%
$
b. Displaying Backup Information
$ pgo show backup emo-dev
cluster: emo-dev
storage type: local
stanza: db
status: ok
cipher: none
db (current)
wal archive min/max (11-1)
full backup: 20200909-054554F
timestamp start/stop: 2020-09-09 14:45:54 +0900 KST / 2020-09-09 14:46:18 +0900 KST
wal start/stop: 000000010000000000000003 / 000000010000000000000003
database size: 30.1MiB, backup size: 30.1MiB
repository size: 3.6MiB, repository backup size: 3.6MiB
backup reference list:
incr backup: 20200909-054554F_20200909-064518I
timestamp start/stop: 2020-09-09 15:45:18 +0900 KST / 2020-09-09 15:45:24 +0900 KST
wal start/stop: 00000002000000000000000F / 00000002000000000000000F
database size: 30.1MiB, backup size: 47.0KiB
repository size: 3.6MiB, repository backup size: 3.4KiB
backup reference list: 20200909-054554F
incr backup: 20200909-054554F_20200909-083154I
timestamp start/stop: 2020-09-09 17:31:54 +0900 KST / 2020-09-09 17:31:58 +0900 KST
wal start/stop: 000000030000000000000016 / 000000030000000000000016
database size: 30.1MiB, backup size: 2.1MiB
repository size: 3.6MiB, repository backup size: 220.3KiB
backup reference list: 20200909-054554F, 20200909-054554F_20200909-064518I
…
$
local: Uses the storage that is provided by the Kubernetes cluster’s Storage Class that you select
c. Restore a Cluster
- The PostgreSQL Operator supports the ability to perform a full restore on a PostgreSQL cluster (i.e. a “clone” or “copy”)
$ pgo create cluster data-lake --restore-from=emo-dev # Full Restore / clone
$ pgo restore emo-dev --pitr-target="2019-12-31 11:59:59.999999+00" --backup-opts="--type=time" # Point-in-time-Recovery (PITR) / Restore in-place
참고. PostgreSQL
pg_dump — extract a PostgreSQL database into a script file or other archive file
pg_dumpall -- extract a PostgreSQL database cluster into a script file
pg_restore -- restore a PostgreSQL database from an archive file created by pg_dump
7. Monitoring
a. pgAdmin4 dashboard
b. View Disk Utilization
$ pgo df emo-dev
PVC INSTANCE POD TYPE USED CAPACITY % USED
----------------- ---------------------------- --------------------------------- ----------- -------- --------- ------
emo-dev emo-dev emo-dev-65847bbd9c-jrpqk data 144MiB 1024MiB 14%
emo-dev-pgbr-repo emo-dev-backrest-shared-repo emo-dev-backrest-shared-repo-... pgbackrest 11MiB 1024MiB 1%
emo-dev-ewiy emo-dev-ewiy emo-dev-ewiy-5dbb977b89-lzs4f data 127MiB 1024MiB 12%
$
c. Grafana
- pgMonitor is your all-in-one tool to easily create an environment to visualize the health and performance of your PostgreSQL cluster.
https://access.crunchydata.com/documentation/pgmonitor/4.1/
- pgMonitor combines a suite of tools to facilitate the collection and visualization of important metrics that you need be aware of in your PostgreSQL database and your host environment, including:
✓ Connection counts: how busy is your system being accessed and if connections are hanging
✓ Database size: how much disk your cluster is using
✓ Replication lag: know if your replicas are falling behind in loading data from your primary
✓ Transaction wraparound: don’t let your PostgreSQL database stop working
✓ Bloat: how much extra space are your tables and indexes using
✓ System metrics: CPU, Memory, I/O, uptime
- Grafana: http://14.52.244.134:31814 admin/*****
Dashboard list:
✓ Bloat Details
Every such record that has been deleted but is still taking some space is called a dead tuple.
The space occupied by these dead tuples may be referred to as Bloat.
VACUUM reclaims the storage occupied by these dead tuples.
✓ CRUD Details
✓ PGBackrest
✓ PostgreSQL Details
✓ PostgreSQL Overview
✓ TableSize Details
- Troubleshooting
▷ Problem: Invalid Username or Password (Username: admin, Password: *****)
▷ Cause:
sh-4.2$ vi /data/defaults.ini
#################################### Security ############################
[security]
# default admin user, created on startup
admin_user = 'admin'
# default admin password, can be changed before first start of grafana, or in profile settings
admin_password = '*****’
...
$ kubectl exec crunchy-grafana-5746f458bb-cb755 -n pgo -it -- sh
sh-4.2$ sqlite3 /data/grafana/data/grafana.db
SQLite version 3.7.17 2013-05-20 00:56:22
sqlite> .schema user
CREATE TABLE `user` (
`id` INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL
, `version` INTEGER NOT NULL
, `login` TEXT NOT NULL
, `email` TEXT NOT NULL
, `name` TEXT NULL
, `password` TEXT NULL
, `salt` TEXT NULL
...
sqlite> select login, password, salt from user;
admin|b5375fb1f77a178e76fd32c0ccd5e2eb6273dacaccb551ad97560f0b6b9c6eb7c2e70ef623720c4e9e8539073d705496b14f|TDLtpk6XBE
sqlite> .quit
▷ Workaround: grafana-cli 명령어로 admin의 암호를 admin으로 변경
sh-4.2$ /opt/cpm/bin/grafana-6.7.4/bin/grafana-cli --homepath /opt/cpm/bin/grafana-6.7.4 admin reset-admin-password *****
EROR[09-11|09:05:46] Failed to detect generated javascript files in public/build logger=settings
WARN[09-11|09:05:46] [Removed] Session setting was removed in v6.2, use remote_cache option instead logger=settings
INFO[09-11|09:05:46] Connecting to DB logger=sqlstore dbtype=sqlite3
INFO[09-11|09:05:46] Starting DB migration logger=migrator
Admin password changed successfully ✔
sh-4.2$
8. pgBouncer
a. pgBouncer
- pgBouncer is a lightweight connection poooler and state manager that provides an efficient gateway to metering connections to PostgreSQL
https://access.crunchydata.com/documentation/postgres-operator/4.6.2/tutorial/pgbouncer/
b. Config
[iap@iap01 ~]$ pgo create pgbouncer ktxtwinproject
ktxtwinproject pgbouncer added
[iap@iap01 ~]$ kubectl patch service ktxtwinproject-pgbouncer -n pgo -p '{ "spec": { "type": "NodePort" } }'
[iap@iap01 ~]$
[iap@iap01 ~]$ k describe secrets ktxtwinproject-pgbouncer-secret -n pgo | grep Data -A10
Data
====
pgbouncer.ini: 544 bytes
users.txt: 49 bytes
password: 24 bytes
pg_hba.conf: 55 bytes
[iap@iap01 ~]$ k get secrets ktxtwinproject-pgbouncer-secret -n pgo -o jsonpath='{.data.\pgbouncer\.ini}' | base64 --decode
[databases]
* = host=ktxtwinproject port=5432 auth_user=pgbouncer
[pgbouncer]
listen_port = 5432
listen_addr = *
auth_type = md5
auth_file = /pgconf/users.txt
auth_query = SELECT username, password from pgbouncer.get_auth($1)
pidfile = /tmp/pgbouncer.pid
logfile = /dev/stdout
admin_users = pgbouncer
stats_users = pgbouncer
default_pool_size = 20
max_client_conn = 100
max_db_connections = 0
min_pool_size = 0
pool_mode = session
reserve_pool_size = 0
reserve_pool_timeout = 5
query_timeout = 0
ignore_startup_parameters = extra_float_digits
[iap@iap01 ~]$ pgo show pgbouncer ktxtwinproject
CLUSTER SERVICE USERNAME PASSWORD CLUSTER IP EXTERNAL IP
-------------- ------------------------ --------- ------------------------ ------------- -----------
ktxtwinproject ktxtwinproject-pgbouncer pgbouncer ***** 10.97.225.118
[iap@iap01 ~]$ k get svc -n pgo | egrep 'NAME|ktxtwinproject'
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ktxtwinproject-pgbouncer NodePort 10.97.225.118 <none> 5432:32745/TCP 52m
ktxtwinproject NodePort 10.100.19.10 <none> 9187:30686/TCP,2022:31415/TCP,5432:32119/TCP 63d
ktxtwinproject-replica ClusterIP 10.111.194.160 <none> 9187/TCP,2022/TCP,5432/TCP 63d
...
[iap@iap01 ~]$ get pod -n pgo | egrep 'NAME|pgbouncer'
NAME READY STATUS RESTARTS AGE
ktxtwinproject-pgbouncer-65c9855874-v2b75 1/1 Running 0 4m46s
[iap@iap01 ~]$
c. Test
[iap@iap01 ~]$ k exec ktxtwinproject-ukfw-7b5675b449-24c8z -n pgo -it -- bash
bash-4.2$ export PGPASSWORD=ktxtwin@5432; psql -h ktxtwinproject-pgbouncer -p 5432 -U ktxtwin ktxtwinproject
psql (11.9)
Type "help" for help.
ktxtwinproject=>
or
yoosungjeon@ysjeon-Dev ~ % export PGPASSWORD=ktxtwin@5432; psql -h iap-vip -p 32745 -U ktxtwin ktxtwinproject
psql (13.2, server 11.9)
Type "help" for help.
ktxtwinproject=>
[iap@iap01 ~]$ k logs ktxtwinproject-pgbouncer-65c9855874-f2428 -n pgo -f
2021-04-22 06:14:34.075 UTC [13] LOG C-0x55c1c4e94950: ktxtwinproject/ktxtwin@10.244.0.0:28593 login attempt: db=ktxtwinproject user=ktxtwin tls=no
2021-04-22 06:14:56.049 UTC [13] LOG stats: 0 xacts/s, 0 queries/s, in 11 B/s, out 37 B/s, xact 33007 us, query 33007 us, wait 969 us
…
2021-04-22 06:15:40.478 UTC [13] LOG C-0x55c1c4e94950: ktxtwinproject/ktxtwin@10.244.0.0:28593 closing because: client close request (age=66s)
'Kubernetes > DBMS' 카테고리의 다른 글
MariaDB - replication (0) | 2021.10.29 |
---|---|
MariaDB (0) | 2021.10.04 |
Crunchy PostgreSQL Managements (0) | 2021.09.23 |
Crunchy PostgreSQL Operator #1 (0) | 2021.09.23 |
댓글