Merge pull request #18 from jfinzel/version_2
Version 2
vipgh0828 authored Sep 29, 2023
2 parents f6c275f + 1623ee0 commit a92f75f
Showing 48 changed files with 7,450 additions and 631 deletions.
4 changes: 2 additions & 2 deletions Makefile
@@ -2,8 +2,8 @@ EXTENSION = pg_fact_loader
 DATA = pg_fact_loader--1.4.sql pg_fact_loader--1.4--1.5.sql \
 pg_fact_loader--1.5.sql pg_fact_loader--1.5--1.6.sql \
 pg_fact_loader--1.6.sql pg_fact_loader--1.6--1.7.sql \
-pg_fact_loader--1.7.sql
-MODULES = pg_fact_loader
+pg_fact_loader--1.7.sql pg_fact_loader--1.7--2.0.sql \
+pg_fact_loader--2.0.sql

 REGRESS := 01_create_ext 02_schema 03_audit \
 04_seeds 05_pgl_setup 06_basic_workers \
60 changes: 9 additions & 51 deletions README.md
@@ -1,5 +1,5 @@
 # pg_fact_loader
-Build fact tables with Postgres using a queue and background workers
+Build fact tables with Postgres using replicated tables and a queue

 [Overview](#overview)
 - [High Level Description](#high_level)
@@ -14,7 +14,6 @@ Build fact tables with Postgres using a queue and background workers
 - [Backfills](#backfills)

 [Administration](#admin)
-- [Checking, Stopping, and Starting Workers](#workers)
 - [Manually Executing Jobs](#manual)
 - [Troubleshooting Errors and Issues](#troubleshoot)

@@ -24,15 +23,12 @@ Build fact tables with Postgres using a queue and background workers
 # <a name="overview"></a>Overview

 ## <a name="high_level"></a>High Level Description
-This extension is for building fact tables asynchronously using queues that contain all
+This extension is for building fact tables using queues that contain all
 write events (inserts, updates, and deletes) as the driver.

-By default, we assume that fact tables are built in a pglogical replica, not an OLTP master,
+By default, we assume that fact tables are built in a logical replica, not an OLTP master,
 which is why we have logic within the codebase that checks for replication stream delay (but it is
-possible to run this whole system locally without any deps on pglogical).
-
-This could be modified in the future to support the new built-in Postgres logical replication
-starting in version 10, perhaps in a later version once it has more features.
+possible to run this whole system locally without any deps on logical replication).

 There are several essential steps to having a working setup where fact tables will be automatically
 built for you as new data comes in that is queued for processing:
@@ -56,7 +52,7 @@ built for you as new data comes in that is queued for processing:
 populate fact table data for every customer: `SELECT customers_fact_merge(customer_id) FROM customers;`.

 10. Enable the configuration for your fact table.
-11. Launch the worker to start continuously processing changes
+11. Schedule the fact_loader.worker() function to run to start continuously processing changes


 ## <a name="full_example"></a>A Full Example
@@ -447,64 +443,26 @@ Here is the typical process then to enable a job, once your configuration is in
 3. Backfill in batches by running your configured `merge` function over the entire set of data. For example:
 `SELECT customers_fact_merge(customer_id) FROM customers;`
 4. Enable the fact_loader job.
-5. ONLY IF there are not already workers running, launch a worker.
+5. Run worker function in whatever scheduled way desired (i.e. crontab).

 If you need to at any point in the future do another backfill on the table, this is the same set of step
 to follow. **However**, it will be better in production to not `TRUNCATE` the fact table, but rather to use
 small batches to refresh the whole table while still allowing concurrent access. This will also avoid overloading
 any replication stream going out of your system.

 To **enable** a fact_table in the `fact_tables` for it to be considered by the worker for refresh,
-simply runn an update, i.e.
+simply run an update, i.e.
 ```sql
 UPDATE fact_loader.fact_tables SET enabled = TRUE WHERE fact_table_relid = 'test_fact.customers_fact';
 ```

-To **deploy** the background worker which will run every minute, run:
-```sql
-SELECT fact_loader.launch_worker();
-```
-
-It is supported to run as many workers as you want up to `max_worker_processes` of course.
-
-All workers nap for 1 minute. Concurrency is handled by locking fact_tables rows for update, which can be
+Concurrency is handled by locking fact_tables rows for update, which can be
 seen in the wrapping `worker()` function. Adding more workers means you will have smaller deltas, and
-more up to date fact tables.
+more up to date fact tables. For example you can schedule 5 calls to `worker()` to kick off from cron every minute.
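
Reviewer note: the README change above replaces the background worker with scheduled calls to `fact_loader.worker()`, suggesting five calls from cron every minute. A minimal sketch of what those crontab entries might look like follows. The helper name `worker_cron_lines`, the database name `my_warehouse`, and the `psql` invocation are assumptions for illustration; only `fact_loader.worker()` comes from the README.

```shell
#!/bin/sh
# Sketch only: print crontab entries that each call fact_loader.worker() once a minute.
# The helper name, database name, and psql flags are illustrative assumptions;
# only fact_loader.worker() itself is taken from the README.
worker_cron_lines() {
    count="$1"    # number of worker() calls to schedule per minute
    dbname="$2"   # hypothetical database where pg_fact_loader is installed
    i=1
    while [ "$i" -le "$count" ]; do
        # "* * * * *" is the cron schedule for "every minute"
        printf '* * * * * psql -X -q -d %s -c "SELECT fact_loader.worker();"\n' "$dbname"
        i=$((i + 1))
    done
}

# Print five entries, matching the "5 calls to worker()" example in the README.
worker_cron_lines 5 my_warehouse
```

The printed lines could be pasted into the crontab of a user that can connect to the database. Identical entries are intentional here: per the README, concurrency between calls is handled by row locks on `fact_tables` inside `worker()`.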


 # <a name="admin"></a>Administration

-## <a name="workers"></a>Checking, Stopping, and Starting Workers
-To check on a worker:
-```sql
-SELECT *
-FROM pg_stat_activity
---backend_type column supported after pg 10
-WHERE backend_type = 'background worker'
-AND query = 'SELECT fact_loader.worker();'
-```
-
-To terminate a worker, bear in mind it is not a problem to terminate active workers.
-Because workers are transactional, you can simply terminate them and no data loss will
-result in pg_fact_loader. Likewise, a hard crash of any system using pg_fact_loader
-will recover just fine upon re-launching workers.
-
-Still, it is ideal to avoid bloat to cleanly terminate workers and restart them using
-this function to kill them, and `launch_workers(int)` to re-launch them:
-```sql
-SELECT fact_loader.safely_terminate_workers();
-```
-
-To start a new worker:
-```sql
-SELECT fact_loader.launch_worker();
-```
-
-To launch a specified number of workers:
-```sql
-SELECT fact_loader.launch_workers(5);
-```
-
 ## <a name="manual"></a>Manually Executing Jobs
 If for some reason you need to manually execute a job in a concurrency-safe way that is integrated
 into `pg_fact_loader`, you can run this function:
7 changes: 7 additions & 0 deletions debian/changelog
@@ -1,3 +1,10 @@
+pg-fact-loader (2.0.0-1) unstable; urgency=medium
+
+* Add support for native logical replication
+* Remove support for background worker
+
+-- Jeremy Finzel <[email protected]> Wed, 12 Jul 2023 14:59:48 -0500
+
 pg-fact-loader (1.7.0-3) unstable; urgency=medium

 * Disable unstable parts of test 17. (Closes: #1023226)
32 changes: 31 additions & 1 deletion debian/control
@@ -2,12 +2,42 @@ Source: pg-fact-loader
 Section: database
 Priority: optional
 Maintainer: Jeremy Finzel <[email protected]>
-Build-Depends: debhelper-compat (= 13), libpq-dev, postgresql-common, postgresql-server-dev-all
+Build-Depends: debhelper-compat (= 12), libpq-dev, postgresql-common, postgresql-server-dev-all
 Standards-Version: 4.6.1
 Rules-Requires-Root: no
 Homepage: https://github.com/enova/pg_fact_loader
 Vcs-Git: https://github.com/enova/pg_fact_loader.git

+Package: postgresql-10-pg-fact-loader
+Architecture: any
+Depends: postgresql-10, ${shlibs:Depends}, ${misc:Depends}
+Description: Build fact tables asynchronously with Postgres
+Use queue tables to build fact tables asynchronously for PostgreSQL 10.
+
+Package: postgresql-11-pg-fact-loader
+Architecture: any
+Depends: postgresql-11, ${shlibs:Depends}, ${misc:Depends}
+Description: Build fact tables asynchronously with Postgres
+Use queue tables to build fact tables asynchronously for PostgreSQL 11.
+
+Package: postgresql-12-pg-fact-loader
+Architecture: any
+Depends: postgresql-12, ${shlibs:Depends}, ${misc:Depends}
+Description: Build fact tables asynchronously with Postgres
+Use queue tables to build fact tables asynchronously for PostgreSQL 12.
+
+Package: postgresql-13-pg-fact-loader
+Architecture: any
+Depends: postgresql-13, ${shlibs:Depends}, ${misc:Depends}
+Description: Build fact tables asynchronously with Postgres
+Use queue tables to build fact tables asynchronously for PostgreSQL 13.
+
+Package: postgresql-14-pg-fact-loader
+Architecture: any
+Depends: postgresql-14, ${shlibs:Depends}, ${misc:Depends}
+Description: Build fact tables asynchronously with Postgres
+Use queue tables to build fact tables asynchronously for PostgreSQL 14.
+
 Package: postgresql-15-pg-fact-loader
 Architecture: any
 Depends: postgresql-15, ${shlibs:Depends}, ${misc:Depends}
2 changes: 1 addition & 1 deletion debian/control.in
@@ -2,7 +2,7 @@ Source: pg-fact-loader
 Section: database
 Priority: optional
 Maintainer: Jeremy Finzel <[email protected]>
-Build-Depends: debhelper-compat (= 13), libpq-dev, postgresql-common, postgresql-server-dev-all
+Build-Depends: debhelper-compat (= 12), libpq-dev, postgresql-common, postgresql-server-dev-all
 Standards-Version: 4.6.1
 Rules-Requires-Root: no
 Homepage: https://github.com/enova/pg_fact_loader
2 changes: 0 additions & 2 deletions debian/patches/series

This file was deleted.

16 changes: 0 additions & 16 deletions debian/patches/test16

This file was deleted.

84 changes: 0 additions & 84 deletions debian/patches/test17

This file was deleted.

2 changes: 1 addition & 1 deletion debian/tests/installcheck
@@ -1,2 +1,2 @@
 #!/bin/sh
-pg_buildext -o shared_preload_libraries=pglogical installcheck
+pg_buildext -o shared_preload_libraries=pglogical -o wal_level=logical installcheck
2 changes: 1 addition & 1 deletion expected/01_create_ext.out
@@ -1,5 +1,5 @@
 -- Allow running regression suite with upgrade paths
-\set v `echo ${FROMVERSION:-1.7}`
+\set v `echo ${FROMVERSION:-2.0}`
 SET client_min_messages TO warning;
 CREATE EXTENSION pglogical;
 CREATE EXTENSION pglogical_ticker;
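
Reviewer note: the changed `\set v` line relies on the shell's `${VAR:-default}` expansion, so the regression suite presumably starts from version 2.0 unless `FROMVERSION` is set in the environment. A small sketch of that expansion follows; the `from_version` helper is hypothetical and exists only to show the behaviour.

```shell
#!/bin/sh
# ${VAR:-default} expands to $VAR when it is set and non-empty, otherwise to the default.
# from_version is a hypothetical helper wrapping the expression used in the test file.
from_version() {
    echo "${FROMVERSION:-2.0}"
}

unset FROMVERSION
from_version        # prints 2.0

FROMVERSION=1.7
from_version        # prints 1.7
```

Under that reading, running the suite with `FROMVERSION=1.7` in the environment would start from the older version, which is what makes the 1.7 to 2.0 upgrade script testable.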