Simulating Async Extension Upgrades with pg_upgrade

Major PostgreSQL version upgrades are constrained by extension compatibility: every installed extension needs a build and matching SQL scripts for the target version. Running pg_upgrade directly against a production data directory without prior validation risks a failed upgrade inside the maintenance window, most often because of unresolved shared objects or catalog dependencies. Platform engineers and database SREs should isolate extension lifecycle validation into a non-blocking simulation phase that runs in parallel with CI/CD artifact staging. This workflow anchors the broader Extension Upgrade Planning & Compatibility Validation process, ensuring shared object resolution, control file alignment, and catalog dependency graphs are verified before any binary swap occurs.

Simulation Architecture & Exact Execution Pattern

pg_upgrade operates synchronously by design, but its --check mode enables async-compatible simulation when decoupled from production data directories. The simulation clones the source cluster, initializes a target version skeleton, and validates extension compatibility without touching the production data directory. During the check, pg_upgrade briefly starts both clusters, collects the shared libraries referenced by C-language functions in pg_proc, and attempts to load each one in the target server; a library built for a different major version fails its PG_MODULE_MAGIC check at load time. Extension versions recorded in pg_extension can be compared against the target’s pg_available_extensions as a separate step.
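
As a complement to the check itself, the extension inventory can be compared before any cloning starts. The following is a minimal sketch, assuming two plain-text files with one extension name per line; the function name missing_on_target and the file names are hypothetical and not part of pg_upgrade.

```shell
#!/bin/sh
# Hypothetical helper: list extensions installed on the source that the
# target server does not offer at all.
#
# The input files are assumed to be produced per database, for example:
#   psql -AtXc "SELECT extname FROM pg_extension" > src_ext.txt            # source
#   psql -AtXc "SELECT name FROM pg_available_extensions" > tgt_ext.txt    # scratch target
missing_on_target() {
  # $1: names installed on the source, one per line
  # $2: names available on the target, one per line
  awk -v tgt="$2" '
    BEGIN { while ((getline line < tgt) > 0) have[line] = 1 }
    NF && !($0 in have) { print }
  ' "$1"
}
```

Any name printed by missing_on_target src_ext.txt tgt_ext.txt points at a package that still has to be installed on the target before pg_upgrade --check can succeed.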

# 1. Clone the source data directory, excluding only the lock file.
# NOTE: stop the source cluster cleanly (or snapshot the volume) before cloning;
# copying a live data directory yields an inconsistent pg_control and fails --check.
# Keep pg_wal in the clone: pg_upgrade starts the old server during --check,
# and server startup needs the checkpoint record stored there.
rsync -a --exclude=postmaster.pid \
  /var/lib/postgresql/15/main/ /tmp/pg_sim_src/

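# 2. Initialize the target data directory.
# The data checksum setting must match the source cluster; drop --data-checksums
# if the source was initialized without checksums.
/usr/lib/postgresql/17/bin/initdb -D /tmp/pg_sim_tgt \
  --encoding=UTF8 --locale=C --data-checksums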

# 3. Execute dry-run with extension validation
/usr/lib/postgresql/17/bin/pg_upgrade \
  --old-datadir /tmp/pg_sim_src \
  --new-datadir /tmp/pg_sim_tgt \
  --old-bindir /usr/lib/postgresql/15/bin \
  --new-bindir /usr/lib/postgresql/17/bin \
  --check --jobs=$(nproc)

The --check flag makes pg_upgrade validate catalog and shared-object alignment and stop before modifying any system tables. No data is copied or linked during a check run, so --link is intentionally omitted here. Exit code 0 indicates that the clusters are compatible; any non-zero exit requires diagnosis before proceeding. Diagnostic output is written under pg_upgrade_output.d/ inside the new data directory (since PostgreSQL 15, in a timestamped subdirectory that contains a log/ folder), and the directory is removed after a successful run unless --retain is passed.
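
The exit-code handling described above can be sketched as a small wrapper. This is an illustration under stated assumptions: the function name run_check_gate and the archive path are hypothetical, and the output directory is simply whatever location the installed pg_upgrade version writes its logs to.

```shell
#!/bin/sh
# run_check_gate OUT_DIR ARCHIVE CMD [ARGS...]
# Runs CMD, archives OUT_DIR if it exists, and returns CMD's exit status.
run_check_gate() {
  rg_out=$1
  rg_archive=$2
  shift 2

  # Capture the exit status without aborting under `set -e`.
  if "$@"; then rg_status=0; else rg_status=$?; fi

  # Preserve whatever diagnostics were left behind, pass or fail.
  if [ -d "$rg_out" ]; then
    tar -cf "$rg_archive" -C "$rg_out" .
  fi

  if [ "$rg_status" -eq 0 ]; then
    echo "check passed"
  else
    echo "check failed with exit code $rg_status" >&2
  fi
  return "$rg_status"
}
```

A call might look like run_check_gate /tmp/pg_sim_tgt/pg_upgrade_output.d ./check_logs.tar followed by the pg_upgrade command line from step 3 with --retain appended, so that logs survive a successful run.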

Edge-Case Diagnostics & Resolution Patterns

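pg_upgrade simulation failures typically fall into a handful of recurring categories. The following entries pair each symptom with its root cause and remediation. Exact message wording varies with the PostgreSQL and extension versions involved, so treat the signatures as patterns rather than literal strings: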

Failure signature: extension "postgis" requires shared library "postgis-3.so" not found
Root cause: The target libdir is missing the exact .so version.
Resolution: Pre-install the extension binaries in the target libdir. Verify that the library exists in the directory reported by the target’s pg_config --pkglibdir.

Failure signature: extension "pg_stat_statements" has no update path from version 1.9 to 1.10
Root cause: The pg_stat_statements--1.9--1.10.sql upgrade script is missing from share/extension/.
Resolution: Deploy the exact <ext>--<old>--<new>.sql migration file to the target share/extension/. Validate with SELECT * FROM pg_extension_update_paths('pg_stat_statements');

Failure signature: could not load library "$libdir/pgcrypto": undefined symbol: PQencryptPasswordConn
Root cause: ABI mismatch; the library was built against a different PostgreSQL major version or libpq.
Resolution: Rebuild the extension against the target server’s headers using pg_config --includedir-server. Ensure the extension package matches the exact target major version.

Failure signature: old cluster is missing the "pg_catalog.pg_extension" system catalog
Root cause: Corrupted or incomplete rsync clone.
Resolution: Re-run rsync with --checksum and verify pg_control integrity using pg_controldata. Ensure pg_upgrade has read access to all .conf files.

Failure signature: extension "timescaledb" requires version "2.14.0" but found "2.13.1"
Root cause: Control file default_version mismatch.
Resolution: Align the target package manager state: apt install timescaledb-2-postgresql-17 or equivalent. In a staging clone, run ALTER EXTENSION timescaledb UPDATE; as the first statement in a fresh session (a TimescaleDB requirement) to verify script execution order.
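
For automated triage, the signatures above can be mapped to categories with plain pattern matching. The sketch below is illustrative only: the function names and category labels are hypothetical, and the patterns are taken from the signatures listed in this section rather than from an authoritative catalogue of pg_upgrade messages.

```shell
#!/bin/sh
# classify_failure LINE: print a category label for one log line.
classify_failure() {
  case $1 in
    # Checked first: this signature also contains "could not load library".
    *"undefined symbol"*)                  echo "abi-mismatch" ;;
    *"could not load library"*|*"requires shared library"*)
                                           echo "missing-library" ;;
    *"has no update path"*)                echo "missing-update-script" ;;
    *"requires version"*)                  echo "control-version-mismatch" ;;
    *"is missing the"*"system catalog"*)   echo "incomplete-clone" ;;
    *)                                     echo "unclassified" ;;
  esac
}

# classify_log: read log lines on stdin, print "category<TAB>line" for each.
classify_log() {
  while IFS= read -r cl_line; do
    printf '%s\t%s\n' "$(classify_failure "$cl_line")" "$cl_line"
  done
}
```

Piping a collected log file through classify_log gives a pipeline one field to branch on instead of free-form text.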

When resolving undefined symbol errors, always verify dynamic linker resolution before re-running the simulation:

ldd /usr/lib/postgresql/17/lib/pgcrypto.so | grep "not found"

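If ldd reports missing dependencies, a library that the extension links against (commonly libpq or OpenSSL) is absent from the target host or installed in an incompatible version. Note that ldd lists unresolved libraries rather than undefined symbols, so an undefined-symbol error can persist even when ldd output is clean. Install the missing package or rebuild using the target distribution’s toolchain, then confirm that the library loads (for example, LOAD '$libdir/pgcrypto'; in a scratch target instance) before re-executing pg_upgrade --check. pg_isready only confirms that a server accepts connections and does not validate extension libraries.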
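
The same ldd check can be applied to every library in the target libdir. This is a minimal sketch: the function names are hypothetical, and the directory argument is assumed to be the output of the target's pg_config --pkglibdir.

```shell
#!/bin/sh
# missing_libs: read ldd output on stdin and print the names of the
# shared libraries that the dynamic linker could not resolve.
missing_libs() {
  awk '/=> not found/ { print $1 }' | sort -u
}

# check_libdir DIR: run ldd on every .so in DIR and report unresolved
# dependencies. Returns non-zero if any library has a missing dependency.
check_libdir() {
  cd_dir=$1
  cd_rc=0
  for cd_so in "$cd_dir"/*.so; do
    [ -e "$cd_so" ] || continue          # glob matched nothing
    cd_missing=$(ldd "$cd_so" 2>/dev/null | missing_libs)
    if [ -n "$cd_missing" ]; then
      printf '%s is missing: %s\n' "$cd_so" \
        "$(printf '%s' "$cd_missing" | tr '\n' ' ')"
      cd_rc=1
    fi
  done
  return "$cd_rc"
}
```

For example, check_libdir "$(/usr/lib/postgresql/17/bin/pg_config --pkglibdir)" fails the step as soon as any extension library has an unresolved dependency.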

Safe Automation & Pipeline Integration

Production-grade extension validation requires idempotent, parallelized execution within ephemeral CI runners. The simulation should be triggered automatically upon detection of new PostgreSQL major version artifacts in the package registry. A robust automation pattern follows these steps:

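  1. Artifact Staging & Directory Mounting: Pull the target PostgreSQL binaries and extension RPMs/DEBs into a containerized runner. Keep the original snapshot read-only to prevent accidental mutation, but give each run its own writable copy (or copy-on-write overlay) of the source data directory, because pg_upgrade --check starts the old server and must be able to write postmaster.pid there.
  2. Parallel Matrix Execution: Spawn concurrent pg_upgrade --check jobs across different extension combinations. --jobs parallelizes the real upgrade’s dump/restore phase rather than the --check pass, so achieve simulation parallelism by isolating each matrix combination to its own source copy and a unique /tmp/pg_sim_tgt_* directory and running the checks concurrently. When several checks share a host, also give each one distinct --old-port and --new-port values and its own working directory, since pg_upgrade starts its temporary servers on port 50432 by default.
  3. Deterministic Log Parsing: Collect the logs under pg_upgrade_output.d/ in the target data directory and grep for FATAL, ERROR, or WARNING. Fail the pipeline if pg_upgrade exits non-zero or if any such line is found. Pass --retain so that successful runs keep their logs, and archive them for audit compliance.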
  4. Gate Enforcement: Only promote the target PostgreSQL package to the production staging environment if the Async Upgrade Simulation phase completes with a clean exit code across all extension matrices.
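
The parallel execution and gating steps above can be sketched as follows. This is an illustration, not a definitive pipeline: the variables PG_UPGRADE, OLD_BIN, and NEW_BIN, the function names, and the slot layout (one directory per matrix combination holding a writable src/ copy and an initialized tgt/) are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical defaults; override them through the environment.
PG_UPGRADE=${PG_UPGRADE:-/usr/lib/postgresql/17/bin/pg_upgrade}
OLD_BIN=${OLD_BIN:-/usr/lib/postgresql/15/bin}
NEW_BIN=${NEW_BIN:-/usr/lib/postgresql/17/bin}

# sim_ports INDEX: print a distinct "old new" port pair for a matrix slot,
# counting up from pg_upgrade's default port 50432.
sim_ports() {
  echo "$((50432 + 2 * $1)) $((50433 + 2 * $1))"
}

# check_slot SLOT_DIR INDEX: run one --check inside SLOT_DIR.
check_slot() {
  cs_slot=$(cd "$1" && pwd) || return 1
  cs_ports=$(sim_ports "$2")
  (
    # Run from the slot so each check has its own working directory.
    cd "$cs_slot" || exit 1
    "$PG_UPGRADE" \
      --old-datadir "$cs_slot/src" --new-datadir "$cs_slot/tgt" \
      --old-bindir "$OLD_BIN" --new-bindir "$NEW_BIN" \
      --old-port "${cs_ports% *}" --new-port "${cs_ports#* }" \
      --check --retain
  ) > "$cs_slot/check.out" 2>&1
}

# run_matrix ROOT: start one background check per slot directory under ROOT,
# wait for all of them, and return non-zero if any check failed.
run_matrix() {
  rm_root=$1
  rm_i=0
  rm_pids=""
  rm_failed=0
  for rm_slot in "$rm_root"/*/; do
    [ -d "$rm_slot" ] || continue
    check_slot "${rm_slot%/}" "$rm_i" &
    rm_pids="$rm_pids $!"
    rm_i=$((rm_i + 1))
  done
  for rm_pid in $rm_pids; do
    wait "$rm_pid" || rm_failed=$((rm_failed + 1))
  done
  echo "slots=$rm_i failed=$rm_failed"
  [ "$rm_failed" -eq 0 ]
}
```

A pipeline step can then call run_matrix /tmp/pg_sim_matrix and promote the package only when it returns zero; each slot keeps its console output in check.out for later inspection.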

For teams managing heterogeneous extension portfolios, integrating this validation into infrastructure-as-code pipelines removes the need for manual catalog inspection. The official PostgreSQL pg_upgrade documentation describes additional flags such as --old-options and --new-options, which pass options directly to the old and new postgres commands and help when custom postgresql.conf parameters keep the temporary servers from starting during the dry run. Enforcing strict simulation gates moves extension failures out of the maintenance window, which makes major version upgrades shorter and more predictable.