Skip to content

Latest commit

 

History

History
66 lines (54 loc) · 4.32 KB

postgres.md

File metadata and controls

66 lines (54 loc) · 4.32 KB

Postgres

Protocols

  • binary: Postgres Binary COPY protocol, recommend to use in general since fast data parsing speed.
  • csv: Postgres CSV COPY protocol, recommend to use when network is slow (csv usually results in smaller size than binary).
  • cursor: Conventional wire protocol (slowest one), recommend to use only when binary and csv is not supported by the source (e.g. Redshift).

Postgres Connection

Adding `sslmode=require` to connection uri parameter force SSL connection. Example: `postgresql://username:password@host:port/db?sslmode=require`. `sslmode=disable` to disable SSL connection.

To connect to redshift, replace `postgresql://` with `redshift://`.
import connectorx as cx
conn = 'postgres://username:password@server:port/database'         # connection token
query = "SELECT * FROM table"                                   # query string
cx.read_sql(conn, query)                                        # read data from Postgres

Postgres-Pandas Type Mapping

Postgres Type Pandas Type Comment
BOOL bool, boolean(nullable)
INT2 int64, Int64(nullable)
INT4 int64, Int64(nullable)
INT8 int64, Int64(nullable)
FLOAT4 float64
FLOAT8 float64
NUMERIC float64 cannot support precision larger than 28
TEXT object
BPCHAR object
VARCHAR object
CHAR object
BYTEA object
DATE datetime64[ns]
TIME object
TIMESTAMP datetime64[ns]
TIMESTAMPZ datetime64[ns]
UUID object
JSON object
JSONB object
ENUM object need to convert enum column to text manually (::text) when using csv and cursor protocol
ltree object binary protocol supported only after Postgres version 13
lquery object binary protocol supported only after Postgres version 13
ltxtquery object binary protocol supported only after Postgres version 13
INT2[] object list of i64
INT4[] object list of i64
INT8[] object list of i64
FLOAT4[] object list of f64
FLOAT8[] object list of f64
NUMERIC[] object list of f64

Performance (db.m6g.4xlarge RDS)

  • Time chart, lower is better.

time chart

  • Memory consumption chart, lower is better.

memory chart

In conclusion, ConnectorX uses 3x less memory and 13x less time compared with Pandas.