Welcome to NPPAD

NPPAD: An Open Time-series Dataset Covering Various Accidents for Nuclear Power Plants

This repository contains:

  1. The background of this project

  2. Introduction to the dataset

  3. Related data processing scripts

We hope this project gives you the nuclear power plant accident data you need to develop new accident diagnosis algorithms and benchmarks.

  1. Background
  2. Introduction to the dataset
  3. Related scripts
  4. Installation
  5. Maintainers
  6. Contributing
  7. License
  8. Citing


Background

Nuclear energy plays an important role in the global energy supply, especially as a key low-carbon source of power. Safe operation is critical in nuclear power plants. Because human error contributed significantly to three serious nuclear accidents in history, artificial intelligence technologies are increasingly used to assist plant operators in making decisions. Specifically, artificial intelligence algorithms are used to detect accidents and identify their root causes. A continuing challenge is the lack of an open dataset in the nuclear power plant domain for measuring the performance of these algorithms. We present a first-of-its-kind public dataset created with PCTRAN, a widely used simulation software for nuclear power plants. The dataset, NPPAD, covers most of the common types of accidents that can occur in pressurized water reactor nuclear power plants. It contains time-series data on the status or actions of various subsystems, together with accident type and severity information. The dataset also includes other simulation data, such as the amount of radionuclides released, which can support research beyond accident diagnosis.

Introduction to the dataset

Workflow overview

Fig. 1 Overall Workflow Of The Simulation Data Generation

The overall workflow implemented in the script to generate the nuclear power plant accident dataset is shown in Fig. 1. First, an automation script starts the software. Once the software is launched, a nuclear plant operating at 100% power is initialized. Then an operating condition is selected. For the normal operating condition, the simulator runs for a configured time and outputs the data. For abnormal operating conditions, the accident type, accident parameters, and simulation time are configured, and then the simulation data is output. The accidents covered in this work are listed in Table 1. The parameter selection screen is shown in Fig. 2.

Fig. 2 Accident type selection and parameter setting

After that, PCTRAN runs the simulation automatically. The detailed process of accident simulation in PCTRAN is shown in Box 1. First, a set of input parameters is configured according to the selected operations; these parameters determine how the corresponding simulation proceeds. The output data is then collected after the configured simulation time. Repeating this for the different conditions yields the NPPAD dataset. Note: the dataset does not include cases where mitigation system failures are superimposed on nuclear plant accidents, as such superimposed cases are too numerous to cover.

Table 1 Accident sets covered by NPPAD

| Folder name | Accident Type | Severity |
| --- | --- | --- |
| NORM | Normal operating | - |
| LOCA | Loss of Coolant Accident (Hot Leg) | % of 100 cm² |
| LOCAC | Loss of Coolant Accident (Cold Leg) | % of 100 cm² |
| SLBIC | Steam Line Break Inside Containment | % of 100 cm² |
| SLBOC | Steam Line Break Outside Containment | % of 100 cm² |
| SP | Spark Presence for Hydrogen Burn | Other |
| LACP | Loss of AC Power | Other |
| LOF | Loss of Flow (Locked Rotor) | Other |
| ATWS | Anticipated Transient Without Scram | Other |
| TT | Turbine Trip | Other |
| SGATR | Steam Generator A Tube Rupture | % of 1 full tube rupture |
| SGBTR | Steam Generator B Tube Rupture | % of 1 full tube rupture |
| RW | Rod Withdrawal | % (+/-) withdrawn |
| RI | Rod Insertion | % (+/-) insertion |
| FLB | Feedwater Line Break | % of 100 cm² |
| MD | Moderator Dilution | % of unborated injection |
| LR | Load Rejection | % of full load rejected |
| LLB | Letdown Line Break in auxiliary buildings | % of nominal letdown flow |

Dataset structure

The NPPAD dataset covers 18 types of operating conditions; a partial listing is shown in Box 2. Each operating condition sample contains three files: two in mdb format and one in txt format. The mdb files can be opened directly in Microsoft Access. For example, the content of 1.mdb (PlotData) is shown in Box 3. It holds the time series of the status parameters for a LOCA with a break of 1% of 100 cm², where PlotData is a sub-table of 1.mdb. Another useful sub-table is ListPlotVariables, shown in Box 6, which describes the parameters corresponding to the abbreviations used in PlotData. In Box 4, 1Dose.mdb holds the time series of the radionuclides in the nuclear power plant. In addition to the mdb format, we also provide the data in CSV format in the folders Operation_csv_data and Dose_csv_data. The file 1Transient Report.txt, shown in Box 5, describes the actions of the plant subsystems over the simulation time for each accident, which helps the user understand the changes in plant status. The number at the start of each file name (e.g. 1.mdb, 2.mdb) corresponds to the severity of the accident; its exact meaning is given by the 'Severity' column of Table 1.
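
As a small illustration of the file-naming convention described above, the sketch below maps a sample path to its operating condition and severity. The `TABLE_1` dictionary copies only a few rows of Table 1, and the helper name `describe_sample` and the assumed layout (`<folder name>/<number>...`) are for illustration only; they are not part of the NPPAD scripts. For conditions without a severity unit, the sketch simply reports the file number without interpreting it.

```python
import re
from pathlib import PurePosixPath

# A few rows copied from Table 1: folder name -> (accident type, severity unit).
# A unit of None marks conditions that have no severity unit in Table 1.
TABLE_1 = {
    "NORM": ("Normal operating", None),
    "LOCA": ("Loss of Coolant Accident (Hot Leg)", "% of 100 cm²"),
    "SGATR": ("Steam Generator A Tube Rupture", "% of 1 full tube rupture"),
    "TT": ("Turbine Trip", None),
}


def describe_sample(path):
    """Describe a sample file such as 'Operation_csv_data/LOCA/1.csv'.

    Assumes the parent folder is a Table 1 folder name and the file name
    starts with the number explained in the Severity column.
    """
    p = PurePosixPath(path.replace("\\", "/"))  # accept Windows separators too
    folder = p.parent.name
    match = re.match(r"(\d+)", p.name)
    if folder not in TABLE_1 or match is None:
        raise ValueError(f"unrecognised sample path: {path}")
    accident_type, unit = TABLE_1[folder]
    number = int(match.group(1))
    if unit is None:
        return f"{accident_type} (file {number})"
    return f"{accident_type}, severity {number}{unit}"


print(describe_sample("Operation_csv_data/LOCA/1.csv"))
```

Extending `TABLE_1` with the remaining rows of Table 1 would cover the full dataset.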

Related scripts

The following three scripts are provided in Data

  • Method mdbtocsv

Use this method to convert files from mdb format to CSV format. The folders Dose_csv_data and Operation_csv_data in this project are the result of converting the original dataset DATA into CSV format.

    import os
    import re

    import pandas as pd
    import pyodbc

    def mdbtocsv(self):
        driver = '{Microsoft Access Driver (*.mdb, *.accdb)}'
        os.chdir(self.project_path)  # Relative paths are resolved from the project path
        # Create the folders for operation parameters and dose parameters
        os.makedirs(self.operation_data_csv_path, exist_ok=True)
        os.makedirs(self.dose_data_csv_path, exist_ok=True)
        for accident in os.listdir(self.data_path):
            accident_path = os.path.join(self.data_path, accident)
            for name in os.listdir(accident_path):
                if re.fullmatch(r'\d+\.mdb', name):  # Operation data
                    table, csv_root = 'PlotData', self.operation_data_csv_path
                elif re.fullmatch(r'\d+dose\.mdb', name, re.IGNORECASE):  # Dose data
                    table, csv_root = 'ListDS', self.dose_data_csv_path
                else:
                    continue  # Skip 'Transient Report.txt' and any other file
                mdb_file = os.path.join(accident_path, name)
                cnxn = pyodbc.connect(f'Driver={driver};DBQ={mdb_file}')
                try:
                    data_table = pd.read_sql(f'SELECT * FROM {table}', cnxn)
                finally:
                    cnxn.close()
                # Some mdbs have problems with not being in time order
                data_table = data_table.sort_values(by=['TIME'], ascending=True)
                csv_accident_path = os.path.join(csv_root, accident)
                os.makedirs(csv_accident_path, exist_ok=True)
                csv_name = os.path.splitext(name)[0] + '.csv'
                data_table.to_csv(os.path.join(csv_accident_path, csv_name),
                                  header=True, index=False)
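
The time-ordering step in `mdbtocsv` can also be illustrated without pandas. The following is a minimal standard-library sketch that sorts CSV rows by the `TIME` column; the helper name `sort_rows_by_time`, the placeholder column `P`, and the sample values are made up for illustration and are not taken from the dataset.

```python
import csv
import io


def sort_rows_by_time(csv_text):
    """Parse CSV text and return its rows sorted by the numeric TIME column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sorted(rows, key=lambda row: float(row["TIME"]))


# Tiny made-up example with rows deliberately out of time order.
example = "TIME,P\n20,154.9\n0,155.5\n10,155.2\n"
ordered = sort_rows_by_time(example)
print([row["TIME"] for row in ordered])
```

Converting `TIME` to `float` before sorting matters here: sorting the raw strings would place "10" before "2".
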
  • Method generate_dataset

Use this method to generate a standard dataset for supervised learning tasks.

    import os
    from itertools import chain

    import pandas as pd
    import torch
    from torch.utils.data import Dataset

    def generate_dataset(self, dataset_source_path):
        class Mydataset(Dataset):
            def __init__(self, dataset_path):
                """
                1. Read all csv files in order
                2. Add labels (accident types) to self.label,
                   add features (operation data or dose data) to self.feature
                """
                self.dataset_path = dataset_path
                self.feature = []
                self.label = []
                for accident in os.listdir(self.dataset_path):
                    accident_path = os.path.join(self.dataset_path, accident)
                    for size_name in os.listdir(accident_path):
                        csv_data_path = os.path.join(accident_path, size_name)
                        sample_df = pd.read_csv(csv_data_path)
                        sample_value = (sample_df.iloc[:150, 1:]).values  # Take the data of 1500s
                        sample_list = list(chain.from_iterable(sample_value))  # Convert 2-D list to 1-D list
                        # Reconstructed step (absent from the listing): store the sample and its label
                        self.feature.append(sample_list)
                        self.label.append(accident)
                self.label = (pd.Categorical(self.label)).codes  # Encode accident types as integers
                assert len(self.label) == len(self.feature)
                self.length = len(self.feature)

            def __getitem__(self, index):
                x = torch.Tensor(self.feature[index])
                y = self.label[index]
                return {"x": x, "y": y}

            def __len__(self):
                return self.length

        return Mydataset(dataset_source_path)
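
To make the two preprocessing steps in `generate_dataset` concrete, here is a minimal standard-library sketch of flattening a 2-D sample and encoding accident labels as integer codes. It mimics, rather than calls, `pd.Categorical(...).codes` by numbering the sorted unique labels; the helper names and toy values are assumptions for illustration.

```python
from itertools import chain


def flatten_sample(sample_rows):
    """Flatten a 2-D list (time steps x parameters) into a 1-D feature list."""
    return list(chain.from_iterable(sample_rows))


def encode_labels(labels):
    """Map each label to its index among the sorted unique labels."""
    categories = sorted(set(labels))
    index = {name: code for code, name in enumerate(categories)}
    return [index[name] for name in labels]


# Toy data: two time steps with two parameters each, and three sample labels.
features = flatten_sample([[155.5, 290.1], [155.2, 289.8]])
codes = encode_labels(["LOCA", "NORM", "LOCA"])
print(features)
print(codes)
```

Because every sample is flattened to one vector, all samples need the same number of time steps and parameters for the features to line up.
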
  • Method show_parameters

Use this method to plot the variation of physical parameters.

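    import os

    import pandas as pd

    def show_parametes(self, variables, plot_data_path, figture_save_path):
        # Create the output folder if it does not exist yet
        os.makedirs(figture_save_path, exist_ok=True)
        plot_data = pd.read_csv(plot_data_path)
        plot_df = plot_data[variables]
        fig_plot = plot_df.plot()
        # Build the figure name from the variable names, skipping the first entry
        fig_name = ''
        for var in range(1, len(variables)):
            fig_name = fig_name + variables[var] + '-'
        fig_save = fig_plot.get_figure()
        fig_path = os.path.join(figture_save_path, fig_name)
        # Assumed final step: the listing ends before the figure is saved
        fig_save.savefig(fig_path)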


Installation

NPPAD requires Python 3.6 or higher.

To install NPPAD from the source code:

$ git clone 
$ cd NuclearPowerPlantAccidentData/
$ pip install -r requirements.txt



Contributing

We appreciate all contributions. If you encounter a bug, please let us know by filing an issue.


License

NPPAD is released under the MIT license, as found in the LICENSE file.


Citing NPPAD in your research:

Qi, B., Xiao, X., Liang, J. et al. An open time-series simulated dataset covering various accidents for nuclear power plants. Sci Data 9, 766 (2022).

