From c07a74476682255da72e9ca4df6cf3fa0343acf6 Mon Sep 17 00:00:00 2001
From: Markus Hauru
Date: Wed, 25 Oct 2023 12:53:18 +0100
Subject: [PATCH] Fix typos in docs

---
 docs/source/configuration.rst |  2 +-
 docs/source/faq.rst           |  4 ++--
 docs/source/health_data.rst   | 16 ++++++++--------
 docs/source/introduction.rst  | 10 +++++-----
 docs/source/loan_data.rst     |  2 +-
 5 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/docs/source/configuration.rst b/docs/source/configuration.rst
index c037c03..76a97ab 100644
--- a/docs/source/configuration.rst
+++ b/docs/source/configuration.rst
@@ -1,7 +1,7 @@
 Configuration Reference
 =======================

-SqlSynthGen is configured using a YAML file, which is passed to several commands with the ``--config`` option.
+SqlSynthGen is configured using a YAML file, which is passed to several commands with the ``--config-file`` option.
 Throughout the docs, we will refer to this file as ``config.yaml`` but it can be called anything (the exception being that there will be a naming conflict if you have a vocabulary table called ``config``).

 Below, we see the schema for the configuration file.
diff --git a/docs/source/faq.rst b/docs/source/faq.rst
index 2f816a1..bf2c2f7 100644
--- a/docs/source/faq.rst
+++ b/docs/source/faq.rst
@@ -5,14 +5,14 @@ Can SqlSynthGen work with two different schemas?
 ************************************************

 SqlSynthGen can only work with a single source schema and a single destination schema at a time.
-However, you can choose for the destination schema to have a different name to the source schema by setting the DST_SCHEMA environment variable.
+However, you can choose for the destination schema to have a different name to the source schema by setting the ``DST_SCHEMA`` environment variable.

 Which DBMSs does SqlSynthGen support?
 *************************************

 * SqlSynthGen most fully supports **PostgresSQL**, which it uses for its end-to-end functional tests.
 * SqlSynthGen also supports **MariaDB**, as long as you don't set ``use-asyncio: true`` in your config.
-* SqlSynthGen *might*, work with **SQLite** but this is largely untested.
+* SqlSynthGen *might* work with **SQLite** but this is largely untested.
 * SqlSynthGen may also work with SQL Server.
   To connect to SQL Server, you will need to install `pyodbc `_ and an `ODBC driver `_, after which you should be able to use a DSN setting similar to ``SRC_DSN="mssql+pyodbc://username:password@hostname/dbname?driver=ODBC Driver 18 for SQL Server"``.
diff --git a/docs/source/health_data.rst b/docs/source/health_data.rst
index 6be9218..1467572 100644
--- a/docs/source/health_data.rst
+++ b/docs/source/health_data.rst
@@ -13,10 +13,10 @@ The full configuration we wrote for the CCHIC data set is available `here `_ for this are

 * ``count_measurements``, which counts the relative frequencies of various types of measurements, like blood pressure, pulse taking, different lab results, etc.
diff --git a/docs/source/introduction.rst b/docs/source/introduction.rst
index 1fa80ab..8b584a9 100644
--- a/docs/source/introduction.rst
+++ b/docs/source/introduction.rst
@@ -106,7 +106,7 @@ Now when we run ``create-data`` we get valid, if not very sensible, values in ea
      - 485
      - 534

-SSG’s default generators have minimal fidelity: All data is generated based purely on the datatype of the its column, e.g. random strings in string columns.
+SSG’s default generators have minimal fidelity: All data is generated based purely on the datatype of the column, e.g. random strings in string columns.
 Foreign key relations are respected by picking random rows from the table referenced.
 Even this synthetic data, nearly the crudest imaginable, can be useful for instance for testing software pipelines.
 Note that this data has no privacy implications, since it is only based on the schema.
@@ -121,7 +121,7 @@ This should of course only be done for tables that hold no privacy-sensitive dat
 For instance, in the AirBnB dataset, the ``users`` table has a foreign key reference to a table of world countries: ``users.country_destination`` references the ``countries.country_destination`` primary key column.
 Since the ``countries`` table doesn’t contain personal data, we can make it a vocabulary table.

-Besides manual edition, on SSG we can also customise the generation of ``ssg.py`` via a YAML file,
+Besides manually editing it, we can also customise the generation of ``ssg.py`` via a YAML file,
 typically named ``config.yaml``.
 We identify ``countries`` as a vocabulary table in our ``config.yaml`` file:
@@ -164,7 +164,7 @@ We need to truncate any tables in our destination database before importing the
     $ sqlsynthgen remove-data --config-file config.yaml
     $ sqlsynthgen create-vocab

-Since ``make-generators`` rewrote ``ssg.py``, we must now re-edit it to add the primary key ``VARCHAR`` workaroundsfor the ``users`` and ``age_gender_bkts`` tables, as we did in section above.
+Since ``make-generators`` rewrote ``ssg.py``, we must now re-edit it to add the primary key ``VARCHAR`` workarounds for the ``users`` and ``age_gender_bkts`` tables, as we did in section above.
 Once this is done, we can generate random data for the other three tables with::

     $ sqlsynthgen create-data
@@ -293,7 +293,7 @@ Then, we tell SSG to import our custom ``airbnb_generators.py`` and assign the r
     columns_assigned: ["date_account_created", "date_first_booking"]

 Note how we pass the ``generic`` object as a keyword argument to ``user_dates_provider``.
-Row generators can have positional arguments specified as a list under the ``args`` list and keyword arguments as a dictionary under the ``kwargs`` entry.
+Row generators can have positional arguments specified as a list under the ``args`` entry and keyword arguments as a dictionary under the ``kwargs`` entry.

 Limitations to this approach to increasing fidelity are that rows can not be correlated with other rows in the same table, nor with any rows in other tables, except for trivially fulfilling foreign key constraints as in the default configuration.
 We will see how to address this later when we talk about :ref:`story generators `.
@@ -537,7 +537,7 @@ For instance, it may first yield a row specifying a person in the ``users`` tabl
 Three features make story generators more practical than simply manually writing code that creates the synthetic data bit-by-bit:

 1. When a story generator yields a row, it can choose to only specify values for some of the columns. The values for the other columns will be filled by custom row generators (as explained in a previous section) or, if none are specified, by SSG's default generators. Above, we have chosen to specify the value for ``first_device_type`` but the date columns will still be handled by our ``user_dates_provider`` and the age column will still be populated by the ``user_age_provider``.
-2. Any default values that are set when the rows yielded by the story generator are written into the database are available to the story generator when it resumes. In our example, the user's ``id`` is available so that we can respect the foreign key relationship between ``users`` and ``sessions``, even though we did not explicitly set the user's ``id`` when creating the user on line 8.

 To use and get the most from story generators, we will need to make some changes to our configuration:
diff --git a/docs/source/loan_data.rst b/docs/source/loan_data.rst
index 306b625..4ccaad6 100644
--- a/docs/source/loan_data.rst
+++ b/docs/source/loan_data.rst
@@ -104,7 +104,7 @@ We notice that the ``districts`` table doesn't contain any sensitive data so we
 .. literalinclude:: ../../examples/loans/config2.yaml
    :language: yaml

-We can export the vocabularies to `.yaml` files, delete the old synthetic data, import the vocabularies and create new synthetic data with:
+We can export the vocabularies to ``.yaml`` files, delete the old synthetic data, import the vocabularies and create new synthetic data with:

 .. code-block:: console