Simulating Async Extension Upgrades with pg_upgrade
Major PostgreSQL version upgrades are constrained by extension compatibility. Running pg_upgrade directly against a production data directory without pre-validation risks a failed upgrade inside the maintenance window, most often from unresolved shared objects or missing extension update scripts. Platform engineers and database SREs should isolate extension lifecycle validation into a non-blocking simulation phase that runs in parallel with CI/CD artifact staging. This workflow anchors the broader Extension Upgrade Planning & Compatibility Validation process, ensuring shared object resolution, control file alignment, and catalog dependency graphs are verified before any binary swap occurs.
Simulation Architecture & Exact Execution Pattern
pg_upgrade operates synchronously by design, but its --check mode enables async-compatible simulation when decoupled from production data directories. The simulation clones the source cluster, initializes a target version skeleton, and validates extension compatibility without touching the production data directory. During the check, pg_upgrade collects the shared libraries referenced by C-language functions in the old cluster's pg_proc catalog and attempts to load each one in the target server, where the PG_MODULE_MAGIC block rejects modules built for a different major version.
# 1. Clone the source data directory (exclude only the lock file).
# NOTE: stop the source cluster cleanly (or snapshot the volume) before cloning;
# copying a live data directory yields an inconsistent pg_control and fails --check.
# pg_wal is kept: pg_upgrade starts the cloned server, which needs its last checkpoint record.
rsync -a --exclude=postmaster.pid \
  /var/lib/postgresql/15/main/ /tmp/pg_sim_src/
# 2. Initialize the target version data directory.
# The checksum setting must match the source cluster; drop --data-checksums
# if the source was initialized without checksums.
/usr/lib/postgresql/17/bin/initdb -D /tmp/pg_sim_tgt \
  --encoding=UTF8 --locale=C --data-checksums
# 3. Execute the dry run with extension validation.
/usr/lib/postgresql/17/bin/pg_upgrade \
  --old-datadir /tmp/pg_sim_src \
  --new-datadir /tmp/pg_sim_tgt \
  --old-bindir /usr/lib/postgresql/15/bin \
  --new-bindir /usr/lib/postgresql/17/bin \
  --check --jobs="$(nproc)"
The --check flag makes pg_upgrade run its consistency checks, including loading every shared library the old cluster references, and stop before modifying any system tables. No data is copied or linked during a check run, so --link is intentionally omitted here. Exit code 0 indicates that the checks passed; a non-zero exit requires diagnosis before proceeding. On PostgreSQL 15 and later, diagnostic output is written under pg_upgrade_output.d/ inside the new data directory, in a timestamped subdirectory per run. It includes pg_upgrade_internal.log and, when libraries fail to load, loadable_libraries.txt. The directory is removed after a successful run unless --retain is given.
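As a minimal sketch of automated log handling, the helpers below locate the newest run directory and scan it for failure markers. They assume the PostgreSQL 15+ layout, in which each run writes to a timestamped subdirectory of pg_upgrade_output.d inside the new data directory; the function names latest_run_dir and scan_run_logs are hypothetical, and the grep pattern is only a starting point.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes logs live under <new-datadir>/pg_upgrade_output.d/<timestamp>/
# (PostgreSQL 15+ layout). Function names are illustrative, not part of pg_upgrade.

# Print the newest run directory under the given new data directory.
latest_run_dir() {
  local base="$1/pg_upgrade_output.d"
  [ -d "$base" ] || return 1
  # Timestamped directory names sort lexicographically, so the last is the newest.
  find "$base" -mindepth 1 -maxdepth 1 -type d | sort | tail -n 1
}

# Return 0 if no failure markers appear in the run's files, 1 if any do,
# and 2 if the run directory does not exist.
scan_run_logs() {
  local run_dir="$1"
  local pattern='FATAL|ERROR|could not load library'
  [ -d "$run_dir" ] || return 2
  if grep -rEq "$pattern" "$run_dir"; then
    # Echo the matching lines with file names and line numbers for the CI log.
    grep -rEn "$pattern" "$run_dir" >&2
    return 1
  fi
  return 0
}
```

A pipeline step might call scan_run_logs "$(latest_run_dir /tmp/pg_sim_tgt)" after a failed check. Because the output directory is cleaned up after a successful run, pass --retain to pg_upgrade if the logs of passing runs also need to be inspected.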
Edge-Case Diagnostics & Resolution Patterns
pg_upgrade simulation failures typically fall into a handful of deterministic categories. The following matrix maps each symptom to its root cause and remediation:
| Failure Signature | Root Cause | Resolution |
|---|---|---|
| extension "postgis" requires shared library "postgis-3.so" not found | Target libdir is missing the exact .so version | Pre-install the extension binaries in the target libdir. Verify that pg_config --pkglibdir for the target version points to the directory containing the library. |
| extension "pg_stat_statements" has no update path from version 1.9 to 1.10 | Missing pg_stat_statements--1.9--1.10.sql upgrade script in share/extension/ | Deploy the exact <ext>--<old>--<new>.sql migration file to the target share/extension/. Validate with SELECT * FROM pg_extension_update_paths('pg_stat_statements'); |
| could not load library "$libdir/pgcrypto": undefined symbol: PQencryptPasswordConn | ABI mismatch between PostgreSQL major versions | Rebuild the extension against the target server's headers using pg_config --includedir-server. Ensure the extension package matches the exact target major version. |
| old cluster is missing the "pg_catalog.pg_extension" system catalog | Corrupted or incomplete rsync clone | Re-run rsync with --checksum and verify pg_control integrity using pg_controldata. Ensure pg_upgrade has read access to all .conf files. |
| extension "timescaledb" requires version "2.14.0" but found "2.13.1" | Control file default_version mismatch | Align the target package manager state: apt install timescaledb-2-postgresql-17 or equivalent. In a staging clone, run ALTER EXTENSION timescaledb UPDATE; as the first statement in a fresh session (a TimescaleDB requirement) to verify script execution order. |
When resolving undefined symbol errors, always verify dynamic linker resolution before re-running the simulation:
ldd /usr/lib/postgresql/17/lib/pgcrypto.so | grep "not found"
If ldd reports missing dependencies, the extension was most likely compiled against an incompatible libpq or OpenSSL version. Rebuild it using the target distribution's toolchain, then confirm that the library loads in a scratch target-version instance (for example with LOAD '$libdir/pgcrypto';) before re-executing pg_upgrade --check.
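The missing-library case can also be caught before pg_upgrade runs at all. The sketch below, with the hypothetical helper name missing_libs, reads library references in the "$libdir/name" form that pg_proc.probin uses and reports any that have no matching .so file in a target library directory. It assumes a Linux-style .so suffix and ignores references that use absolute paths.

```shell
#!/usr/bin/env bash
# Sketch only: missing_libs is an illustrative helper, not a PostgreSQL tool.
# Reads library references such as "$libdir/postgis-3" on stdin, one per line,
# and prints the names that have no .so file in the given target directory.
# Returns 1 if anything is missing, 0 otherwise.
missing_libs() {
  local libdir="$1" ref name status=0
  while IFS= read -r ref; do
    [ -n "$ref" ] || continue
    case "$ref" in
      # Only "$libdir/..." references resolve against the package library dir.
      \$libdir/*) name=${ref#\$libdir/} ;;
      *) continue ;;
    esac
    # Tolerate references that already carry the .so suffix.
    if [ ! -f "$libdir/${name%.so}.so" ]; then
      printf '%s\n' "$name"
      status=1
    fi
  done
  return "$status"
}
```

One way to feed it, per database in the cloned source cluster, is psql -AtX -c "SELECT DISTINCT probin FROM pg_proc WHERE probin IS NOT NULL" piped into missing_libs "$(/usr/lib/postgresql/17/bin/pg_config --pkglibdir)". This only proves the files exist; ABI problems such as undefined symbols still surface only when the target server loads the library.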
Safe Automation & Pipeline Integration
Production-grade extension validation requires idempotent, parallelized execution within ephemeral CI runners. The simulation should be triggered automatically upon detection of new PostgreSQL major version artifacts in the package registry. A robust automation pattern follows these steps:
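- Artifact Staging & Directory Mounting: Pull the target PostgreSQL binaries and extension RPMs/DEBs into a containerized runner. Give each job its own disposable copy or copy-on-write snapshot of the cloned source data directory rather than a read-only mount, because pg_upgrade --check starts the old server, which must be able to write to its data directory.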
- Parallel Matrix Execution: Spawn concurrent pg_upgrade --check jobs across different extension combinations. --jobs parallelizes the real upgrade's dump/restore phase rather than the --check pass, so achieve simulation parallelism by isolating each matrix combination to a unique /tmp/pg_sim_tgt_* directory and running the checks concurrently.
- Deterministic Log Parsing: Collect the log files under pg_upgrade_output.d/ (run with --retain so they survive a successful check) and grep for FATAL, ERROR, or WARNING. Fail the pipeline if pg_upgrade returns a non-zero status. Successful runs should archive the logs for audit compliance.
- Gate Enforcement: Only promote the target PostgreSQL package to the production staging environment if the Async Upgrade Simulation phase completes with a clean exit code across all extension matrices.
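The parallel execution and gate steps above can be sketched as a small fan-out helper. This is an illustration under stated assumptions rather than a finished pipeline: run_matrix and the check command passed to it are hypothetical names, and the check command is expected to prepare its own source copy and target directory for the combination it receives before invoking pg_upgrade --check.

```shell
#!/usr/bin/env bash
# Sketch only: run_matrix is an illustrative helper.
# Usage: run_matrix <check_cmd> <combo>...
# <check_cmd> is any command or shell function that runs one simulation for the
# combination name given as its single argument and exits non-zero on failure.
# Combination names must not contain whitespace, colons, or slashes.
run_matrix() {
  local check_cmd="$1"
  shift
  local logdir pids="" combo entry pid failed=0
  logdir=$(mktemp -d)
  # Fan out: one background job per combination, each with its own log file.
  for combo in "$@"; do
    "$check_cmd" "$combo" >"$logdir/$combo.log" 2>&1 &
    pids="$pids $!:$combo"
  done
  # Gate: wait for every job and fail if any of them exited non-zero.
  for entry in $pids; do
    pid=${entry%%:*}
    combo=${entry#*:}
    if ! wait "$pid"; then
      echo "FAILED: $combo (log: $logdir/$combo.log)" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

A CI step would then reduce to something like run_matrix check_one postgis pgcrypto timescaledb || exit 1, where check_one is whatever wrapper the team writes around initdb and pg_upgrade --check. Waiting on every job before failing, instead of aborting on the first error, keeps the full set of failing combinations visible in a single pipeline run.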
For teams managing heterogeneous extension portfolios, integrating this validation into infrastructure-as-code pipelines reduces the need for manual catalog inspection. The official PostgreSQL pg_upgrade documentation details additional flags such as --old-options and --new-options, which pass options directly to the old and new server commands and are useful when custom postgresql.conf parameters prevent either server from starting during the dry run. By enforcing strict simulation gates, organizations can surface extension incompatibilities before the maintenance window opens, which makes the window itself shorter and more predictable.