Connection exhaustion
Spark's one-connection-per-partition default crashes PostgreSQL as concurrency rises. PGStyx decouples tasks from connection limits with internal HikariCP pooling.
PG_ERR: too_many_connections
PGStyx replaces Spark's generic JDBC with a connector built for PostgreSQL: keyed upserts, pool-safe concurrency, runtime schema tolerance, and type fidelity for JSONB, UUID, and arrays.
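To make the pooling idea concrete, here is a minimal sketch of how a bounded pool decouples the number of concurrent tasks from the number of open connections. It is not PGStyx's implementation and does not use HikariCP: a plain `java.util.concurrent.Semaphore` stands in for the pool, and the names `BoundedPoolSketch` and `peakConnections` are hypothetical.

```scala
import java.util.concurrent.{Executors, Semaphore, TimeUnit}

object BoundedPoolSketch {
  /** Runs `tasks` concurrent writers that share at most `poolSize` simulated
    * connections and returns the peak number held at the same time. */
  def peakConnections(tasks: Int, poolSize: Int): Int = {
    val permits  = new Semaphore(poolSize)             // stand-in for a connection pool
    val lock     = new Object
    var inUse    = 0
    var peak     = 0
    val executor = Executors.newFixedThreadPool(tasks) // one thread per simulated Spark task

    (1 to tasks).foreach { _ =>
      executor.execute(new Runnable {
        def run(): Unit = {
          permits.acquire()                            // wait until a connection is free
          try {
            lock.synchronized { inUse += 1; peak = math.max(peak, inUse) }
            Thread.sleep(20)                           // pretend to write a batch
          } finally {
            lock.synchronized { inUse -= 1 }
            permits.release()                          // hand the connection back
          }
        }
      })
    }
    executor.shutdown()
    executor.awaitTermination(30, TimeUnit.SECONDS)
    lock.synchronized(peak)
  }

  def main(args: Array[String]): Unit =
    println(s"peak connections: ${peakConnections(tasks = 32, poolSize = 4)}")
}
```

However many tasks run, the peak never exceeds `poolSize`, which is the property that keeps PostgreSQL under its connection limit as parallelism grows.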
PGStyx is a JVM datasource registered as pgstyx. Your existing df.write call keeps its shape — PGStyx takes over inside that call to handle pooling, upserts, type coercion, and schema drift before the rows hit Postgres.
A realistic production upsert: merge incoming orders into a mutable table by order_id, preserve jsonb metadata, and keep Postgres responsive under parallel Spark writes. First, what you write today. Then, what PGStyx collapses it into.
    // Stage table + merge + manual type handling
    val staging = "orders_staging_" + UUID.randomUUID

    df.repartition(8)                                    // cap parallelism manually
      .withColumn("metadata", to_json(col("metadata")))  // cast jsonb → string
      .write
      .format("jdbc")
      .option("url", url)
      .option("dbtable", staging)
      .option("numPartitions", "8")
      .save()

    spark.read.format("jdbc").option("url", url)
      .option("query", s"""
        -- hand-written merge into orders by order_id
        -- preserve metadata and audit columns
        -- update this every time the shape changes
      """).load()

    sql(s"DROP TABLE $staging") // and hope nothing failed mid-run
Still manual in the version above: jsonb casting, staging cleanup.

    // Native upsert, pooled connections, schema drift absorbed
    df.write
      .format("pgstyx")
      .option("url", url)
      .option("dbtable", "orders")
      .option("writeMode", "upsert")
      .option("mergeKeys", "order_id")
      .option("schemaEvolution", "addColumns,widen")
      .save()
The connector isn't the problem on day one. It becomes the problem when concurrency rises, tables go mutable, types get awkward, and a teammate merges a column on a Friday afternoon.
Standard Spark JDBC leaves keyed refresh logic to custom job code. Teams end up in foreachPartition loops or staging tables. PGStyx provides UPSERT in a single .save().
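For readers who want to see what a keyed upsert looks like at the SQL level, the sketch below builds a standard PostgreSQL `INSERT ... ON CONFLICT ... DO UPDATE` statement from a column list and merge keys. The source does not show the SQL PGStyx actually generates, so treat this as an illustration of the general pattern; `UpsertSqlSketch` and `upsertSql` are hypothetical names, and identifiers are left unquoted for brevity.

```scala
object UpsertSqlSketch {
  /** Builds a parameterised PostgreSQL upsert for `table`.
    * Assumes `mergeKeys` are covered by a unique or primary-key constraint,
    * which ON CONFLICT requires. */
  def upsertSql(table: String, columns: Seq[String], mergeKeys: Seq[String]): String = {
    require(mergeKeys.nonEmpty && mergeKeys.forall(k => columns.contains(k)),
      "merge keys must be a non-empty subset of the columns")

    val placeholders = columns.map(_ => "?").mkString(", ")
    // Every non-key column is overwritten with the incoming (EXCLUDED) value.
    val updates = columns
      .filterNot(c => mergeKeys.contains(c))
      .map(c => s"$c = EXCLUDED.$c")
    val action =
      if (updates.isEmpty) "DO NOTHING"
      else "DO UPDATE SET " + updates.mkString(", ")

    s"INSERT INTO $table (${columns.mkString(", ")}) VALUES ($placeholders) " +
      s"ON CONFLICT (${mergeKeys.mkString(", ")}) $action"
  }

  def main(args: Array[String]): Unit =
    println(upsertSql("orders", Seq("order_id", "status", "metadata"), Seq("order_id")))
}
```

With generic JDBC, a statement like this is typically executed per batch inside hand-written `foreachPartition` code, which is the boilerplate the single `.save()` call is meant to replace.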
JSONB, UUID, and arrays get mangled as strings or bytes in generic JDBC. PGStyx treats them as first-class citizens with binary fidelity and schema correctness.
jsonb · uuid · int[]
A widening column or new attribute should not break the write. PGStyx handles add-column and widening cases at runtime so jobs don't turn into rerun work.
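As a rough illustration of the add-column and widening cases, the sketch below compares an incoming schema with the table's current columns and emits the PostgreSQL DDL needed to reconcile them. This is an assumption about the general technique, not PGStyx's actual logic: `SchemaDriftSketch`, `driftDdl`, and the small `widenings` rule set are all hypothetical.

```scala
object SchemaDriftSketch {
  // Hypothetical set of safe widenings, as (current type -> incoming type).
  private val widenings: Set[(String, String)] = Set(
    "smallint" -> "integer",
    "smallint" -> "bigint",
    "integer"  -> "bigint",
    "real"     -> "double precision"
  )

  /** Returns DDL for columns that are new or need a wider type.
    * Columns that already match, or differ in a non-widening way, yield nothing. */
  def driftDdl(table: String,
               tableCols: Map[String, String],
               incoming: Seq[(String, String)]): Seq[String] =
    incoming.flatMap { case (name, pgType) =>
      tableCols.get(name) match {
        case None =>
          Some(s"ALTER TABLE $table ADD COLUMN IF NOT EXISTS $name $pgType")
        case Some(current) if widenings.contains(current -> pgType) =>
          Some(s"ALTER TABLE $table ALTER COLUMN $name TYPE $pgType")
        case _ =>
          None
      }
    }

  def main(args: Array[String]): Unit = {
    val existing = Map("order_id" -> "integer", "status" -> "text")
    val incoming = Seq("order_id" -> "bigint", "status" -> "text", "coupon_code" -> "text")
    driftDdl("orders", existing, incoming).foreach(println)
  }
}
```

A production connector would also have to decide what to do with narrowing or incompatible changes; this sketch simply ignores them.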
Community is usable for commercial work. Pro adds safer schema and validation controls. Enterprise covers tighter security, streaming, and deeper operational requirements.
Start with the core write path: append, overwrite, single-key upsert, pooling, retries, and metrics.
Start with docs →
Use Pro when the question changes from 'can we write?' to 'can we keep this job safe in production?'
Ask about Pro →
Enterprise is for streaming workloads, custom certificate material, and deeper operational tuning on stricter platforms.
Ask about Enterprise →
Start with the implementation guide and a working first upsert in Scala, Python, or SQL. Move to Pro when the workload asks for composite keys, schema evolution, or TLS.