files for outlier detection for demand data, where slope is used to i… #20

Merged on Jan 11, 2019 (41 commits)

Commits
6245d18  files for outlier detection for demand data, where slope is used to i… (mlamherr, Nov 9, 2018)
19038cf  BA files for testing (mlamherr, Nov 9, 2018)
a934eeb  Update README.md (mlamherr, Nov 10, 2018)
d619f7d  Update README.md (mlamherr, Nov 10, 2018)
26d6357  initial commit, not yet done (mlamherr, Nov 14, 2018)
eac9f64  Merge branch 'anomaly_detect_1' of https://github.com/intvenlab/PreRE… (mlamherr, Nov 14, 2018)
f057087  minor changes (mlamherr, Nov 14, 2018)
6d433d7  fixed case when there are 2-5 consecutive zeros in data; for case whe… (Dec 13, 2018)
bb9d2c0  notebook used for scaling demand data for CA simulations (Dec 15, 2018)
27fa02b  changed date range so there's output for the demo (Jan 9, 2019)
fceffa2  changed to relative data directory (Jan 9, 2019)
db960e4  added files for other demo notebooks (Jan 10, 2019)
9f3325d  demo notebook for county-based demand assignment (Jan 10, 2019)
e5385c1  fixed westernint commands (Jan 10, 2019)
a252f69  minor changes (Jan 10, 2019)
7cb5a2e  Delete AssignBAByCounty.ipynb (mlamherr, Jan 10, 2019)
5b3c83a  Delete .DS_Store (mlamherr, Jan 10, 2019)
768f468  Delete Makefile (mlamherr, Jan 10, 2019)
1f59b9c  Delete officialTest.ipynb (mlamherr, Jan 10, 2019)
6a3210f  Delete officialTest.py (mlamherr, Jan 10, 2019)
0df4ee0  Delete testing_getDataClass.ipynb (mlamherr, Jan 10, 2019)
377af40  Delete test_slope_interpolate.cpython-36-PYTEST.pyc (mlamherr, Jan 10, 2019)
f4996ac  Delete test_from_excel.cpython-36-PYTEST.pyc (mlamherr, Jan 10, 2019)
290a6a1  Delete test_eia_download.cpython-36-PYTEST.pyc (mlamherr, Jan 10, 2019)
9a6b582  Delete test_EIAdownload.cpython-36-PYTEST.pyc (mlamherr, Jan 10, 2019)
797b673  Delete __init__.cpython-36.pyc (mlamherr, Jan 10, 2019)
6b7a3c0  Delete CheckDemandData-Copy1-checkpoint.ipynb (mlamherr, Jan 10, 2019)
be49e0e  Delete CheckDemandData-checkpoint.ipynb (mlamherr, Jan 10, 2019)
4319358  Delete CheckDemandVsLineCapacity-checkpoint.ipynb (mlamherr, Jan 10, 2019)
d0b9cc9  Delete officialTest-checkpoint.ipynb (mlamherr, Jan 10, 2019)
5735486  Delete testEIA-checkpoint.ipynb (mlamherr, Jan 10, 2019)
2803fab  Delete testPCA-checkpoint.ipynb (mlamherr, Jan 10, 2019)
a195530  Delete testing_getDataClass-checkpoint.ipynb (mlamherr, Jan 10, 2019)
f64e828  Merge branch 'develop' into anomaly_detect_1 (rouille, Jan 11, 2019)
b5daf31  fix: Update __all__ (rouille, Jan 11, 2019)
68ef510  docs: Format docstring (rouille, Jan 11, 2019)
15b2d9d  fix: Use absolute path (rouille, Jan 11, 2019)
15caf81  chore: Remove umused import (rouille, Jan 11, 2019)
b92bf79  docs: Follow PEP8 (rouille, Jan 11, 2019)
38ec4c5  chore: Remove unused import (rouille, Jan 11, 2019)
ce94b6e  docs: Correct typos (rouille, Jan 11, 2019)
20 changes: 18 additions & 2 deletions README.md
@@ -51,7 +51,7 @@ A scenario can be defined by adding an entry to the scenario list ***ScenarioLis
* **name**: `scenario_name`;
* **folder_location**: path to folder where MATLAB files are located;
* **input_data_location**: path to folder where input data are located;
* **output_data_location**: path to folder where input will be saved;
* **output_data_location**: path to folder where output data will be saved;
* **start_index**: start index;
* **end_index**: end index;
* **extract**: True/False. Should output data be converted to csv and;
@@ -138,14 +138,30 @@ from prereise.gather.demanddata.eia.test import test_from_excel
test_from_excel.test_from_excel()
```

The notebook [AssembleBAfromExcel_demo.ipynb](https://github.com/intvenlab/PreREISE/blob/sam/prereise/gather/demanddata/EIA/demo/AssembleBAfromExcel_demo.ipynb) illustrates usage.
The [AssembleBAfromExcel_demo.ipynb](https://github.com/intvenlab/PreREISE/blob/anomaly_detect_1/prereise/gather/demanddata/EIA/demo/AssembleBAfromExcel_demo.ipynb) notebook illustrates usage.

To output the demand profile, cleaning steps were applied to the EIA data:
1) missing data imputation - the EIA method was used, i.e., EIA published data was used; beyond this, NA's were converted to float zeros;
2) missing hours were added.
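
The two cleaning steps above can be sketched with pandas. This is a minimal illustration, not the code in `get_eia_data`: the function name `clean_demand` and its arguments are hypothetical, and the sketch assumes the demand data is a DataFrame indexed by hourly timestamps. The README does not say how newly added hours are filled, so they are left as NaN here.

```python
import numpy as np
import pandas as pd


def clean_demand(demand, start, end):
    """Hypothetical helper: fill NA's with 0.0, then add missing hours."""
    # Step 1: convert remaining NA's to float zeros.
    filled = demand.astype(float).fillna(0.0)
    # Step 2: reindex onto a gap-free hourly range. Hours absent from the
    # input show up as NaN rows (assumption: the fill rule is not specified).
    full_index = pd.date_range(start=start, end=end, freq=pd.Timedelta(hours=1))
    return filled.reindex(full_index)


# Toy input: one NA value and two missing hours (01:00 and 03:00).
raw = pd.DataFrame(
    {"BPAT": [7661.0, np.nan]},
    index=pd.to_datetime(["2015-08-01 00:00:00", "2015-08-01 02:00:00"]),
)
cleaned = clean_demand(raw, "2015-08-01 00:00:00", "2015-08-01 03:00:00")
```

Running the NA fill before the reindex keeps the two cases distinguishable: a zero marks a value that was NA in the source, while NaN marks an hour that was missing altogether.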

The BA counts were then distributed across each region where the BA operates, using the region populations as weights. For example, if a BA operates in both WA and OR, the counts assigned to WA are the BA counts multiplied by the population of WA as a fraction of the combined population of WA and OR.
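
A small sketch of the population weighting, under stated assumptions: the helper `distribute_by_population` is hypothetical (it is not part of PreREISE), and the population figures are placeholder weights chosen for easy arithmetic, not real data.

```python
import pandas as pd


def distribute_by_population(ba_demand, populations):
    """Hypothetical helper: split a BA demand series across regions."""
    total = float(sum(populations.values()))
    # Each region receives the BA demand scaled by its population share.
    return pd.DataFrame(
        {region: ba_demand * (pop / total) for region, pop in populations.items()}
    )


# Toy BA demand for two hours; weights 3:1 are placeholders, not real populations.
ba_demand = pd.Series(
    [100.0, 200.0],
    index=pd.to_datetime(["2015-08-01 00:00:00", "2015-08-01 01:00:00"]),
)
shares = distribute_by_population(ba_demand, {"WA": 3.0, "OR": 1.0})
```

Because the weights sum to one, the regional columns add back up to the original BA demand in every hour.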

The next step consists of detecting outliers by looking for large changes in the slope of the demand data. The underlying physical rationale is that demand changes are mostly driven by temperature changes (first or higher order), and thermal mass limits the rate at which demand can change. The slopes of the demand data are approximately normally distributed, so outliers can be found by imposing a z-score threshold of 3. These outliers are then replaced by linear interpolation.

To use the outlier detector:
```python
from prereise.gather.demanddata.eia import find_fix_outliers

fixed_data = find_fix_outliers.slope_interpolate(BA_file, threshold)
```

To test
```python
from prereise.gather.demanddata.eia.test import test_slope_interpolate
test_slope_interpolate.test_slope_interpolate()
```
The [BA_Anomaly_Detection_demo.ipynb](https://github.com/intvenlab/PreREISE/blob/anomaly_detect_1/prereise/gather/demanddata/EIA/demo//BA_Anomaly_Detection_demo.ipynb) notebook illustrates usage.
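
The slope-based approach described above can be sketched as follows. This is an illustration of the general technique only: the function `slope_interpolate_sketch` is hypothetical and operates on a single pandas Series, whereas the signature and internals of `find_fix_outliers.slope_interpolate` may differ (for example, in how runs of zeros are handled).

```python
import numpy as np
import pandas as pd


def slope_interpolate_sketch(demand, threshold=3.0):
    """Hypothetical sketch: flag large-slope points and interpolate over them."""
    # Slope between consecutive hours (the first entry is NaN).
    slope = demand.diff()
    # z-score of each slope relative to the slope distribution.
    z = (slope - slope.mean()) / slope.std()
    outliers = z.abs() > threshold
    # Blank out flagged points, then fill them by linear interpolation.
    cleaned = demand.astype(float).mask(outliers)
    return cleaned.interpolate(method="linear", limit_direction="both")


# Toy series: a steady ramp with a single spike at position 25.
demand = pd.Series(100.0 + np.arange(50))
demand.iloc[25] += 1000.0
fixed = slope_interpolate_sketch(demand)
```

Note that a single spike produces two large slopes (one up, one down), so this sketch also flags the point immediately after the spike; interpolation then restores both from their neighbors.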



## 4. Start simulation
2 changes: 1 addition & 1 deletion prereise/gather/demanddata/EIA/__init__.py
@@ -1 +1 @@
__all__ = ["get_eia_data"]
__all__ = ["get_eia_data", "find_fix_outliers"]
53 changes: 43 additions & 10 deletions prereise/gather/demanddata/EIA/demo/AssembleBAfromExcel_demo.ipynb
@@ -4,7 +4,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Create BA demand counts dataframe from Excel Spreadsheets.\n"
"Create BA demand counts dataframe from Excel Spreadsheets.\n",
"Tested to work on Mac and Windows.\n"
]
},
{
@@ -15,10 +16,6 @@
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"from datetime import datetime\n",
"from dateutil.parser import parse\n",
"\n",
"import os\n",
"\n",
"from prereise.gather.demanddata.eia import get_eia_data"
@@ -33,8 +30,11 @@
"#Set location of EIA data\n",
"dir1 = os.path.join('..','test','data')\n",
"#Set dates\n",
"start = pd.to_datetime('2016-01-01 00:00:00')\n",
"end = pd.to_datetime('2016-12-31 23:00:00')"
"#start = pd.to_datetime('2016-01-01 00:00:00')\n",
"#end = pd.to_datetime('2016-12-31 23:00:00')\n",
"\n",
"start = pd.to_datetime('2015-08-01 00:00:00')\n",
"end = pd.to_datetime('2015-10-31 23:00:00')"
]
},
{
@@ -135,14 +135,47 @@
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2015-08-01 00:00:00</th>\n",
" <td>7661</td>\n",
" <td>40259.0</td>\n",
" <td>1518</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2015-08-01 01:00:00</th>\n",
" <td>7682</td>\n",
" <td>40616.0</td>\n",
" <td>1487</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2015-08-01 02:00:00</th>\n",
" <td>7560</td>\n",
" <td>40259.0</td>\n",
" <td>1416</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2015-08-01 03:00:00</th>\n",
" <td>7363</td>\n",
" <td>39147.0</td>\n",
" <td>1396</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2015-08-01 04:00:00</th>\n",
" <td>7143</td>\n",
" <td>37567.0</td>\n",
" <td>1344</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Empty DataFrame\n",
"Columns: [BPAT, CISO, EPE]\n",
"Index: []"
" BPAT CISO EPE\n",
"2015-08-01 00:00:00 7661 40259.0 1518\n",
"2015-08-01 01:00:00 7682 40616.0 1487\n",
"2015-08-01 02:00:00 7560 40259.0 1416\n",
"2015-08-01 03:00:00 7363 39147.0 1396\n",
"2015-08-01 04:00:00 7143 37567.0 1344"
]
},
"execution_count": 5,