This script generates a specified number of fake records and saves each record as a separate JSON file in the files folder. It uses the Faker library to create realistic-looking data.
- Generates fake identity, contact details, and location data.
- Saves each record as a separate JSON file.
- Deletes any existing files folder (and the dummy records in it) before creating new ones.
- Allows control over the number of records generated via command-line arguments.
The script requires:
- Python 3.x
- Faker library
- tqdm library
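To set up and run the script:
- Install Python 3.x from python.org.
- Clone the repository or download the script.
- Create and activate a virtual environment:
python -m venv env
source env/Scripts/activate
The env/Scripts/activate path applies on Windows (for example, in Git Bash); on Linux or macOS, run source env/bin/activate instead.
- Install the required packages from requirements.txt:
pip install -r requirements.txt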
- Run the script from the command line:
python generate_fake_data.py [number_of_records]
number_of_records (optional): The number of fake records to generate. Defaults to 1,000,000 if not specified.
- Alternatively, build and run the script with Docker:
docker build -t pydatagen .
docker run pydatagen
To generate 100,000 fake records:
python generate_fake_data.py 100000
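Reading the optional record count can be sketched with sys.argv, which the dependency list names for this purpose. This is a minimal sketch, not the script's actual code: get_record_count and DEFAULT_RECORD_COUNT are hypothetical names, and rejecting non-integer or non-positive values is an assumption, since this README does not say how invalid input is handled.

```python
import sys

DEFAULT_RECORD_COUNT = 1_000_000  # default documented in this README


def get_record_count(argv=None):
    """Return how many records to generate.

    Uses the first command-line argument when one is given and falls back
    to DEFAULT_RECORD_COUNT otherwise. Exits with an error message for
    values that are not positive integers (an assumed policy, not documented).
    """
    if argv is None:
        argv = sys.argv
    if len(argv) < 2:
        # No argument supplied: use the documented default.
        return DEFAULT_RECORD_COUNT
    try:
        count = int(argv[1])
    except ValueError:
        sys.exit(f"number_of_records must be an integer, got {argv[1]!r}")
    if count <= 0:
        sys.exit("number_of_records must be a positive integer")
    return count
```

With only the script name, get_record_count(["generate_fake_data.py"]) returns 1000000; get_record_count(["generate_fake_data.py", "100000"]) returns 100000.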
Each record generated by the script includes the following fields:
- Identity
  - id: A unique identifier for the record.
  - full_name: A randomly generated full name.
  - ssn: A randomly generated Social Security Number.
  - dob: A randomly generated date of birth in the format YYYY-MM-DD.
- Contact Details
  - email: A randomly generated email address.
  - phone: A randomly generated phone number.
- Location
  - address: A randomly generated street address.
  - city: A randomly generated city name.
  - state: A randomly generated state name.
  - zipcode: A randomly generated ZIP code.
  - country: A randomly generated country name.
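A single record with these fields could be assembled from Faker provider methods roughly as follows. This is a sketch under assumptions: build_record is a hypothetical helper, and the flat (ungrouped) JSON layout and the use of uuid4 for id are guesses because the README does not document them. The fake argument is expected to be a Faker instance with the default en_US locale, which provides ssn(), state() and zipcode().

```python
import uuid


def build_record(fake):
    """Build one fake record with the fields listed above.

    `fake` is expected to be a faker.Faker instance (default en_US locale).
    """
    return {
        # Identity (assumption: the real script's id scheme is not documented)
        "id": str(uuid.uuid4()),
        "full_name": fake.name(),
        "ssn": fake.ssn(),
        # date_of_birth() returns a datetime.date; isoformat() gives YYYY-MM-DD
        "dob": fake.date_of_birth().isoformat(),
        # Contact details
        "email": fake.email(),
        "phone": fake.phone_number(),
        # Location
        "address": fake.street_address(),
        "city": fake.city(),
        "state": fake.state(),
        "zipcode": fake.zipcode(),
        "country": fake.country(),
    }
```

After from faker import Faker, calling build_record(Faker()) returns one dictionary that json.dump can serialize directly. Passing the Faker instance in as a parameter also lets the function be exercised with a stub object.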
The script performs the following steps:
- Control the Number of Records: Checks if a number of records is provided as a command-line argument. If not, defaults to 1,000,000 records.
- Delete Existing Folder: Deletes the existing files folder if it exists.
- Recreate the Folder: Creates a new files folder.
- Generate Fake Data: Uses the Faker library to generate fake data for each record, including identity, contact details, and location.
- Save Data: Saves each record as a JSON file in the files folder.
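The folder reset and save steps above map onto shutil, pathlib and json from the standard library. The following is a minimal sketch, not the script's actual implementation: reset_folder and save_records are hypothetical names, and the record_<index>.json file naming is an assumption because the README does not specify how files are named.

```python
import json
import shutil
from pathlib import Path


def reset_folder(folder):
    """Delete `folder` if it exists, then recreate it empty."""
    folder = Path(folder)
    if folder.exists():
        shutil.rmtree(folder)  # removes the folder and every file inside it
    folder.mkdir(parents=True)
    return folder


def save_records(records, folder="files"):
    """Write each record to its own JSON file; return how many were written."""
    out_dir = reset_folder(folder)
    written = 0
    for index, record in enumerate(records):
        # Assumption: files are named by the record's position in the sequence.
        path = out_dir / f"record_{index}.json"
        with path.open("w", encoding="utf-8") as fh:
            json.dump(record, fh, indent=2)
        written += 1
    return written
```

Because records can be any iterable, a generator of Faker records can be wrapped in tqdm(..., total=count) to show the progress bar that the dependency list mentions, without building the whole list in memory.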
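The script uses the following modules (json, sys, pathlib and shutil are part of the Python standard library; Faker and tqdm are installed from requirements.txt):
- json: For handling JSON data.
- sys: For accessing command-line arguments.
- pathlib: For handling file paths.
- shutil: For removing directories.
- Faker: For generating fake data.
- tqdm: For displaying a progress bar.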
This project is licensed under the MIT License.