feat: Update README News and fix the link of python package badge. (#243)
jalr4ever authored Nov 21, 2024
1 parent edfab2e commit b218059
Showing 3 changed files with 7 additions and 3 deletions.
4 changes: 3 additions & 1 deletion README.md
@@ -5,7 +5,7 @@
<p align="center">

<p align="center">
-<a href="https://github.com/hitsz-ids/synthetic-data-generator/actions"><img alt="Actions Status" src="https://github.com/hitsz-ids/synthetic-data-generator/actions/workflows/python-package.yml/badge.svg"></a>
+<a href="https://github.com/hitsz-ids/synthetic-data-generator/actions"><img alt="Actions Status" src="https://github.com/hitsz-ids/synthetic-data-generator/actions/workflows/ci-test-python-package.yml/badge.svg"></a>
<a href='https://synthetic-data-generator.readthedocs.io/en/latest/?badge=latest'><img src='https://readthedocs.org/projects/synthetic-data-generator/badge/?version=latest' alt='Documentation Status' /></a>
<a href="https://results.pre-commit.ci/latest/github/hitsz-ids/synthetic-data-generator/main"><img alt="pre-commit.ci status" src="https://results.pre-commit.ci/badge/github/hitsz-ids/synthetic-data-generator/main.svg"></a>
<a href="https://github.com/hitsz-ids/synthetic-data-generator/blob/main/LICENSE"><img alt="LICENSE" src="https://img.shields.io/github/license/hitsz-ids/synthetic-data-generator"></a>
@@ -50,6 +50,8 @@ We are excited to have you here and look forward to your contributions, get star

Our current key achievements and timelines are as follows:

+🔥 Nov 21, 2024: 1) Model Integration - We've integrated the `GaussianCopula` model into our Data Processor System. Check out the code example in this [PR](https://github.com/hitsz-ids/synthetic-data-generator/pull/241); 2) Synthetic Quality - We implemented automatic detection of relationships between data columns and added support for specifying them manually, which improves the quality of synthetic data ([Code Example](https://synthetic-data-generator.readthedocs.io/en/latest/user_guides/single_table_column_combinations.html)); 3) Performance Enhancement - We significantly reduced the memory usage of `GaussianCopula` when handling discrete data, enabling training on thousands of categorical data entries with a `2C4G` (2 CPU cores, 4 GB RAM) setup!

🔥 May 30, 2024: The Data Processor module was officially merged. This module will: 1) help SDG convert the format of some data columns (such as Datetime columns) before they are fed into the model (so that they are not treated as discrete types), and convert the model-generated data back into the original format; 2) perform more customized pre-processing and post-processing on various data types; 3) easily handle problems such as null values in the original data; 4) support the plug-in system.

🔥 Feb 20, 2024: A single-table data synthesis model based on LLM is now included; see the Colab examples: <a href="https://colab.research.google.com/drive/1VFnP59q3eoVtMJ1PvcYjmuXtx9N8C7o0?usp=sharing" target="value"> LLM: Data Synthesis</a> and <a href="https://colab.research.google.com/drive/1_chuTVZECpj5fklj-RAp7ZVrew8weLW_?usp=sharing" target="value"> LLM: Off-table Feature Inference</a>.
4 changes: 3 additions & 1 deletion README_ZH_CN.md
@@ -6,7 +6,7 @@
<p align="center">

<p align="center">
-<a href="https://github.com/hitsz-ids/synthetic-data-generator/actions"><img alt="Actions Status" src="https://github.com/hitsz-ids/synthetic-data-generator/actions/workflows/python-package.yml/badge.svg"></a>
+<a href="https://github.com/hitsz-ids/synthetic-data-generator/actions"><img alt="Actions Status" src="https://github.com/hitsz-ids/synthetic-data-generator/actions/workflows/ci-test-python-package.yml/badge.svg"></a>
<a href='https://synthetic-data-generator.readthedocs.io/en/latest/?badge=latest'><img src='https://readthedocs.org/projects/synthetic-data-generator/badge/?version=latest' alt='Documentation Status' /></a>
<a href="https://results.pre-commit.ci/latest/github/hitsz-ids/synthetic-data-generator/main"><img alt="pre-commit.ci status" src="https://results.pre-commit.ci/badge/github/hitsz-ids/synthetic-data-generator/main.svg"></a>
<a href="https://github.com/hitsz-ids/synthetic-data-generator/blob/main/LICENSE"><img alt="LICENSE" src="https://img.shields.io/github/license/hitsz-ids/synthetic-data-generator"></a>
@@ -51,6 +51,8 @@

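Our milestones and timelines are as follows:

+🔥 Nov 21, 2024: 1) Model Integration - We have integrated the GaussianCopula model into our Data Processor system; see the code example in this [PR](https://github.com/hitsz-ids/synthetic-data-generator/pull/241); 2) Synthetic Quality Enhancement - We implemented automatic detection of relationships between data columns and also support specifying those relationships, further improving the fidelity of synthetic data ([Code Example](https://synthetic-data-generator.readthedocs.io/en/latest/user_guides/single_table_column_combinations.html)); 3) Performance Enhancement - We greatly reduced the memory footprint of GaussianCopula when handling discrete data, so that it can train on discrete-column data at the scale of tens of thousands of values with a 2C4G configuration!

🔥 May 30, 2024: The Data Processor module was officially merged. This module can: 1) help SDG convert the format of some data columns (such as Datetime columns) before they are fed into the model, so that they are not treated as discrete types, and convert model-generated data back into the original format; 2) perform more customized pre-processing and post-processing on various data types; 3) easily handle problems such as null values in the original data; 4) support the plug-in system.

🔥 Feb 20, 2024: A single-table data synthesis model based on LLM has been included; see the Colab examples: <a href="https://colab.research.google.com/drive/1VFnP59q3eoVtMJ1PvcYjmuXtx9N8C7o0?usp=sharing" target="value">LLM: Data Synthesis</a> and <a href="https://colab.research.google.com/drive/1_chuTVZECpj5fklj-RAp7ZVrew8weLW_?usp=sharing" target="value">LLM: Off-table Feature Inference</a>.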
2 changes: 1 addition & 1 deletion docs/source/index.rst
@@ -11,7 +11,7 @@ SDG: Synthetic Data Generator
</p>

<p align="center">
-<a href="https://github.com/hitsz-ids/synthetic-data-generator/actions"><img alt="Actions Status" src="https://github.com/hitsz-ids/synthetic-data-generator/actions/workflows/python-package.yml/badge.svg"></a>
+<a href="https://github.com/hitsz-ids/synthetic-data-generator/actions"><img alt="Actions Status" src="https://github.com/hitsz-ids/synthetic-data-generator/actions/workflows/ci-test-python-package.yml/badge.svg"></a>
<a href='https://synthetic-data-generator.readthedocs.io/en/latest/?badge=latest'><img src='https://readthedocs.org/projects/synthetic-data-generator/badge/?version=latest' alt='Documentation Status' /></a>
<a href="https://results.pre-commit.ci/latest/github/hitsz-ids/synthetic-data-generator/main"><img alt="pre-commit.ci status" src="https://results.pre-commit.ci/badge/github/hitsz-ids/synthetic-data-generator/main.svg"></a>
<a href="https://github.com/hitsz-ids/synthetic-data-generator/blob/main/LICENSE"><img alt="LICENSE" src="https://img.shields.io/github/license/hitsz-ids/synthetic-data-generator"></a>