How to setup high performance PostgreSQL in an AWS Local Zone

0

How to setup PostgreSQL in an AWS Local Zone

AWS
djinn
asked 10 months ago634 views
1 Answer
0
Accepted Answer

PostgreSQL in local zone provides performance at edge to latency sensitive workloads. This in meant to be used only in situations RDS managed services is not available like a Local Zone.

Identifying the instance type for PostgreSQL

Database, from AWS observations, have specific pattern of resource consumption. We find picking right instance type significantly reduce existence of bottlenecks in workload. AWS has 3 popular lines of EC2 instances; c instance with vCore:GB/of RAM ratio 1:2, m instance with ratio 1:4 and r instance with ratio 1:8. r5d.2xlarge is preferred EC2 instance type for typical customer production deployment. r instance are useful for database workloads because large disk caches saves disk activities because storage pages can be cached on RAM. While c and m instance could be relevant for specific workloads, for database workloads r instance is recommended default.

Operating System Networking

In a high velocity transactions system the networking activity starts to also influence performance. The TCP/IP stack behaviour can influence the database performance as a choke hold. These settings in sysctl makes TCP/IP stack more suitable to high performance database;

// /etc/sysctl.conf
net.ipv4.tcp_sack=1
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_wmem = 4096 65536 4194304
net.ipv4.tcp_rmem = 4096 87380 4194304

net.ipv4.tcp_fin_timeout = 1
net.ipv4.tcp_tw_recycle = 1
net.core.wmem_max = 16777216
net.core.rmem_max = 16777216
net.core.wmem_default = 16777216
net.core.rmem_default = 16777216
net.core.netdev_max_backlog = 262144
net.core.somaxconn = 262144

net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_max_orphans = 262144
net.ipv4.tcp_max_syn_backlog = 262144
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2

The configuration is pushed into the kernel using sudo sysctl -p

We can also reduce network overheads by removing iptables because Operating System does not need to support firewall given Security Groups provide the requisite service.

# /etc/modprobe.d/blacklist.conf
2 blacklist ip_tables

followed by sudo reboot

Setting PostgreSQL as the highest priority system

The Linux processes have run-time priority with ranges from lowest to highest 19 to -20. In a single concern system, defining priority of main process is a good idea. In this case we are planning to run PostgreSQL as the main concern. This can be done by adding Nice parameter in systemd file of PostgreSQL

# /etc/systemd/system/multi-user.target.wants/postgresql.service
# systemd service for managing all PostgreSQL clusters on the system. This
# service is actually a systemd target, but we are using a service since
# targets cannot be reloaded.

[Unit]
Description=PostgreSQL RDBMS

[Service]
Type=oneshot
Nice=-20
ExecStart=/bin/true
ExecReload=/bin/true
RemainAfterExit=on

Using separate volume for storing pg stats

PostgreSQL stores in statistics in a specific high activity directory. These statistics are telemetry for the PostgreSQL. Storing this activity in separate volume saves valuable IOPS on core workload of database tables. This activity is 2 step process; Firstly add a volume as candidate directory for storing the data. In this case we use RAM storage.

# Add this line to /etc/fstab
tmpfs /var/lib/pg_stats tmpfs size=1G,nosuid,nodev,noatime 0 0

Followed by adding /var/lib/pg stats/14-main.pg stat tmp as parameter for PostgreSQL variable stats temp directory. We will see entire set of changes in PostgreSQL.

PGTune for your Local Zone EC2 Instance

PGTune is a neat little tool for generating configuration changes for operating hyperscale PostgreSQL. We use the parameters;

ParameterValue
DB Version14
OS TypeLinux
DB TypeOnline Transaction Processing System
Total Memory30 GB
CPU4
Number of Connections100
StorageSSD

PGTune issues this configuration

Add these lines to /etc/postgresql/14/main/postgresql.conf 2
# DB Version: 14
# OS Type: linux
# DB Type: oltp
# Total Memory (RAM): 30 GB
# CPUs num: 4
# Connections num: 100
# Data Storage: ssd

max_connections = 100
shared_buffers = 7680MB
effective_cache_size = 23040MB
maintenance_work_mem = 1920MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 39321kB
min_wal_size = 2GB
max_wal_size = 8GB
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4
max_parallel_maintenance_workers = 2
# For pg_stats we also add
stats_temp_directory = ’/var/lib/pg_stats/14-main.pg_stat_tmp’

Conclusion

These changes produce dramatically enhanced OLTP read-write performance for EC2 in Local Zone PostgreSQL setup. These characteristics place Local Zone EC2 self-managed PostgreSQL as a compelling performance alternative to RDS if customers requirements are such. It still does not out of the box provide resilience characteristics and customers is expected to manage the risks involved with self-managed installations.

AWS
djinn
answered 10 months ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions