alan-turing-institute · Iain-S · Jul 6, 2023 · Jun 29, 2023 · Jun 29, 2023 · Jul 5, 2023
diff --git a/docs/source/faq.rst b/docs/source/faq.rst
@@ -1,7 +1,17 @@
 FAQ
 ===
 
-Can sqlsynthgen work with two different schemas
-***********************************************
+Can SqlSynthGen work with two different schemas?
+************************************************
 
-sqlsynthgen can only work with a single source schema and a single destination schema at a time. However, you can choose for the destination schema to have a different name to the source schema by setting the DST_SCHEMA environment variable.
+SqlSynthGen can only work with a single source schema and a single destination schema at a time.
+However, the destination schema can have a different name from the source schema: set the ``DST_SCHEMA`` environment variable to choose it.
+
+Which DBMSs does SqlSynthGen support?
+*************************************
+
+* SqlSynthGen most fully supports **PostgreSQL**, which it uses for its end-to-end functional tests.
+* SqlSynthGen also supports **MariaDB** with one exception: you cannot use source statistics (i.e. the ``make-stats`` command).
+* SqlSynthGen *might* work with **SQLite**, but this is largely untested.
+
+Please open a GitHub issue if you would like to see support for another DBMS.
diff --git a/docs/source/installation.rst b/docs/source/installation.rst
@@ -10,7 +10,7 @@ To use sqlsynthgen, first install it:
 
 .. code-block:: console
 
-   $ pip install git+https://github.com/alan-turing-institute/sqlsynthgen.git
+   $ pip install sqlsynthgen
 
 and check that you can view the help message with:
 

diff --git a/docs/source/quickstart.rst b/docs/source/quickstart.rst
@@ -27,21 +27,16 @@ For the simplest case, we will need `make-tables`, `make-generators`, `create-ta
 we need to set environment variables to tell sqlsynthgen how to access our source database (where the real data resides now) and destination database (where the synthetic data will go).
 We can do that in the terminal with the `export` keyword, as shown below, or in a file called `.env`.
 The source and destination may be on the same database server, as long as the database or schema names differ.
+If the source and destination schemas are the default schema for the user on that database, you should not set the `SRC_SCHEMA` and `DST_SCHEMA` variables.
+If you are using a DBMS that does not support schemas (e.g. MariaDB), you must not set them.
 
 .. code-block:: console
 
-   $ export SRC_HOST_NAME='[email protected]'
-   $ export SRC_USER_NAME='someuser'
-   $ export SRC_PASSWORD='secretpassword'
+   $ export SRC_DSN="postgresql://someuser:[email protected]"
    $ export SRC_SCHEMA='myschema'
-   $ export SRC_DB_NAME='source_db'
 
-   $ export DST_HOST_NAME='[email protected]'
-   $ export DST_USER_NAME='someuser'
-   $ export DST_PASSWORD='secretpassword'
+   $ export DST_DSN="postgresql://someuser:[email protected]/dst_db"
    $ export DST_SCHEMA='myschema'
-   $ export DST_DB_NAME='destination_db'
-
 
 Next, we make a SQLAlchemy file that defines the structure of your database using the `make-tables` command:
 

diff --git a/docs/source/tutorials/airbnb.rst b/docs/source/tutorials/airbnb.rst
@@ -1,6 +1,8 @@
 An Introduction to SqlSynthGen
 ==============================
 
+.. _introduction:
+
 `SqlSynthGen <https://github.com/alan-turing-institute/sqlsynthgen/>`_, or SSG for short, is a software package that we have written for synthetic data generation, focussed on relational data.
 When pointed to an existing relational database, SSG creates another database with the same database schema, and populates it with synthetic data.
 By default the synthetic data is crudely low fidelity, but the user is given various ways to configure the behavior of SSG to increase fidelity, while maintaining transparency and control over how the original data is used to inform the synthetic data, to control privacy risks.
@@ -26,14 +28,8 @@ First, we need to provide SSG with the connection parameters, using a ``.env`` f
 
 .. code-block:: console
 
-    SRC_HOST_NAME=localhost
-    SRC_USER_NAME=postgres
-    SRC_PASSWORD=password
-    SRC_DB_NAME=airbnb
-    DST_HOST_NAME=localhost
-    DST_USER_NAME=postgres
-    DST_PASSWORD=password
-    DST_DB_NAME=dst
+    SRC_DSN='postgresql://postgres:password@localhost/airbnb'
+    DST_DSN='postgresql://postgres:password@localhost/dst'
 
 We can start the schema migration process by running the following command::
 

diff --git a/docs/source/tutorials/loan_applications.rst b/docs/source/tutorials/loan_applications.rst
@@ -0,0 +1,65 @@
+Tutorial: Loan Data
+===================
+
+There are many potential applications of synthetic data in banking and finance where the nature of the data, being both personally and commercially sensitive, may rule out sharing real, identifiable data.
+
+Here, we show how to use SqlSynthGen to generate a simple (uniformly random) synthetic version of the freely-available `PKDD'99 <https://relational.fit.cvut.cz/dataset/Financial>`_ dataset.
+This dataset contains 606 successful and 76 unsuccessful loan applications.
+
+The PKDD'99 dataset is stored on a MariaDB database, which means that we need a local MariaDB database to store the synthetic data.
+MariaDB installation instructions can be found `here <https://mariadb.org/download/?t=mariadb&p=mariadb&r=11.2.0#entry-header>`_.
+We presume that you have a local server running on port 3306, with a user called ``myuser``, a password ``mypassword`` and a database called ``financial``.
+
+.. code-block:: console
+
+    $ mysql
+    MariaDB > create user 'myuser'@'localhost' identified by 'mypassword';
+    MariaDB > create database financial;
+    MariaDB > grant all privileges on financial.* to 'myuser'@'localhost';
+    MariaDB > \q
+
+After :ref:`installing SqlSynthGen <enduser>`, we create a ``.env`` file that sets the source database to the one linked at the bottom of the PKDD'99 page and the destination database to the local one:
+
+**.env**
+
+.. code-block:: console
+
+    SRC_DSN="mariadb+pymysql://guest:[email protected]:3306/Financial_ijs"
+    DST_DSN="mariadb+pymysql://myuser:mypassword@localhost:3306/financial"
+
+We run SqlSynthGen's ``make-tables`` command to create a file called ``orm.py`` that contains the schema of the source database.
+
+.. code-block:: console
+
+    $ sqlsynthgen make-tables
+
+Inspecting the ``orm.py`` file, we see that the ``tkeys`` table has a column called ``goodClient``, which is a ``TINYINT``.
+SqlSynthGen doesn't know what to do with ``TINYINT`` columns, so we need to create a config file to tell it how to handle them. This isn't necessary for normal ``Integer`` columns.
+
+**config.yaml**
+
+.. literalinclude:: ../../../tests/examples/loans/config.yaml
+   :language: yaml
+
+We run SqlSynthGen's ``make-generators`` command to create ``ssg.py``, which contains a generator class for each table in the source database:
+
+.. code-block:: console
+
+    $ sqlsynthgen make-generators --config config.yaml
+
+We then run SqlSynthGen's ``create-tables`` command to create the tables in the destination database:
+
+.. code-block:: console
+
+    $ sqlsynthgen create-tables
+
+Alternatively, you could use another tool, such as ``mysqldump``, to create the tables in the destination database.
+
+Finally, we run SqlSynthGen's ``create-data`` command to populate the tables with synthetic data:
+
+.. code-block:: console
+
+    $ sqlsynthgen create-data --num-passes 100
+
+This will make 100 rows in each of the nine tables.
+The data will be entirely random, so you may wish to fine-tune it using the source statistics, custom generators or "story generators" explained in the longer :ref:`introduction <introduction>`.
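
This diff replaces the separate host, user, password and database variables with a single ``SRC_DSN`` / ``DST_DSN`` value. The following is a minimal sketch of how such a DSN could be assembled from the old parts. The helper ``build_dsn`` is hypothetical and is not part of SqlSynthGen. It assumes the DSN is parsed as a SQLAlchemy-style URL, in which special characters in the credentials need to be percent-encoded.

```python
from urllib.parse import quote, urlsplit


def build_dsn(scheme, user, password, host, database, port=None):
    """Assemble scheme://user:password@host[:port]/database.

    Hypothetical helper for illustration only; not part of SqlSynthGen.
    """
    # Percent-encode the credentials so that characters such as '@' or '/'
    # in a password cannot be mistaken for URL delimiters.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    netloc = f"{host}:{port}" if port is not None else host
    return f"{scheme}://{credentials}@{netloc}/{quote(database, safe='')}"


# Rebuild the destination DSN used in the loan tutorial's .env file.
dst_dsn = build_dsn(
    "mariadb+pymysql", "myuser", "mypassword", "localhost", "financial", port=3306
)
print(f'DST_DSN="{dst_dsn}"')

# Sanity check: split the DSN back into its components.
parts = urlsplit(dst_dsn)
print(parts.username, parts.hostname, parts.port, parts.path.lstrip("/"))
```

The printed ``DST_DSN`` line can be pasted into a ``.env`` file. Whether a password containing special characters round-trips correctly depends on how the driver decodes the URL, so it is worth testing the connection before running ``create-tables``.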