Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ropenaqoknow #376

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 0 additions & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,6 @@ Suggests:
rgdal,
rmarkdown,
purrr,
ggmap,
ropenaq,
vcr (>= 0.5.4),
webmockr
Expand Down
Binary file modified vignettes/data/measurementsIndianapolis.RData
Binary file not shown.
Binary file modified vignettes/data/stationsIndianapolis.RData
Binary file not shown.
82 changes: 66 additions & 16 deletions vignettes/rnoaa_ropenaq.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
title: "ropenaq and rnoaa"
subtitle: "Complementing air quality data with weather data using rnoaa"
author: "Maëlle Salmon"
date: "2020-07-27"
date: "2020-10-19"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{ropenaq and rnoaa}
Expand Down Expand Up @@ -37,6 +37,20 @@ We filter negative values.
load("data/measurementsIndianapolis.RData")
library("dplyr")
measurementsIndianapolis %>% head() %>% knitr::kable()
```



|location |parameter | value|unit |country |city | latitude| longitude|date.utc |cityURL |locationURL |dateUTC |dateLocal |
|:--------------------|:---------|-----:|:-----|:-------|:------------|--------:|---------:|:--------------------|:------------|:--------------------|:-------------------|:----------|
|Indpls-W.18th St |pm25 | 11.9|µg/m³ |US |Indianapolis | 39.78890| -86.21470|2020-10-19T00:00:00Z |Indianapolis |Indpls-W.18th+St |2020-10-19 00:00:00 |2020-10-18 |
|Indpls-Washington Pa |pm25 | 10.9|µg/m³ |US |Indianapolis | 39.81080| -86.11470|2020-10-19T00:00:00Z |Indianapolis |Indpls-Washington+Pa |2020-10-19 00:00:00 |2020-10-18 |
|Indpls - I-70 E |pm25 | 12.5|µg/m³ |US |Indianapolis | 39.78793| -86.13088|2020-10-19T00:00:00Z |Indianapolis |Indpls+-+I-70+E |2020-10-19 00:00:00 |2020-10-18 |
|Indpls-Washington Pa |pm25 | 11.1|µg/m³ |US |Indianapolis | 39.81080| -86.11470|2020-10-18T23:00:00Z |Indianapolis |Indpls-Washington+Pa |2020-10-18 23:00:00 |2020-10-18 |
|Indpls-W.18th St |pm25 | 13.8|µg/m³ |US |Indianapolis | 39.78890| -86.21470|2020-10-18T23:00:00Z |Indianapolis |Indpls-W.18th+St |2020-10-18 23:00:00 |2020-10-18 |
|Indpls - I-70 E |pm25 | 12.7|µg/m³ |US |Indianapolis | 39.78793| -86.13088|2020-10-18T23:00:00Z |Indianapolis |Indpls+-+I-70+E |2020-10-18 23:00:00 |2020-10-18 |

```r
measurementsIndianapolis <- filter(measurementsIndianapolis, value > 0)
```

Expand All @@ -48,16 +62,26 @@ We now transform these data into daily data.
measurementsIndianapolis <- filter(measurementsIndianapolis, !is.na(latitude))
# now transform to daily data
measurementsIndianapolis <- measurementsIndianapolis %>%
#mutate(day = as.Date(dateLocal)) %>%
mutate(day = as.Date(dateLocal)) %>%
group_by(location, day) %>%
summarize(value = mean(value),
longitude = longitude[1],
latitude = latitude[1]) %>%
ungroup()
measurementsIndianapolis %>% head() %>% knitr::kable()

```



|location |day | value| longitude| latitude|
|:---------------|:----------|---------:|---------:|--------:|
|Indpls - I-70 E |2020-09-18 | 6.300000| -86.13088| 39.78793|
|Indpls - I-70 E |2020-09-19 | 5.190476| -86.13088| 39.78793|
|Indpls - I-70 E |2020-09-20 | 6.442857| -86.13088| 39.78793|
|Indpls - I-70 E |2020-09-21 | 5.370000| -86.13088| 39.78793|
|Indpls - I-70 E |2020-09-22 | 11.658824| -86.13088| 39.78793|
|Indpls - I-70 E |2020-09-23 | 10.900000| -86.13088| 39.78793|

Air quality and weather are correlated, so one could be interested in getting a time series of say temperature for the same location. The OpenAQ platform itself does not provide weather data but nearly all stations have geographical coordinates. Our goal here will be to use rnoaa to complement this table with precipitation and temperature.

# Find weather stations
Expand All @@ -70,6 +94,10 @@ Here we query stations with a less than 15km distance from the air quality stati
```r
library("rnoaa")
station_data <- ghcnd_stations()
#> using cached file: /home/maelle/.cache/R/noaa_ghcnd/ghcnd-stations.rds
#> date created (size, mb): 2020-10-19 06:50:53 (2.084)
#> using cached file: /home/maelle/.cache/R/noaa_ghcnd/ghcnd-inventory.rds
#> date created (size, mb): 2020-10-19 06:51:48 (2.572)
lat_lon_df <- select(measurementsIndianapolis,
location,
latitude,
Expand All @@ -86,7 +114,6 @@ stationsIndianapolis <- meteo_nearby_stations(lat_lon_df = as.data.frame(lat_lon
stationsIndianapolis <- unique(bind_rows(stationsIndianapolis) %>% select(- distance))

save(stationsIndianapolis, file = "data/stationsIndianapolis.RData")

```


Expand All @@ -95,20 +122,13 @@ load("data/stationsIndianapolis.RData")
stationsIndianapolis %>% knitr::kable()
```

Now let us plot the AQ and weather stations on a quick and dirty map with no legend, red for AQ stations, blue for weather stations.

> Not shown


```r
library("ggmap")
map <- get_map(location = "Indianapolis", zoom = 11)
ggmap(map) +
geom_point(aes(x = longitude, y = latitude),
data = stationsIndianapolis, col = "blue", size = 4)+
geom_point(aes(x = longitude, y = latitude),
data = measurementsIndianapolis, col = "red", size = 4)
```
|id |name | latitude| longitude|
|:-----------|:--------------------|--------:|---------:|
|US1INMR0134 |INDIANAPOLIS 1.3 SSE | 39.7574| -86.1404|
|US1INMR0122 |INDIANAPOLIS 3.4 WSW | 39.7565| -86.2044|
|US1INMR0155 |INDIANAPOLIS 5.1 ENE | 39.8130| -86.0620|

# Query weather data for these stations

Expand All @@ -122,9 +142,26 @@ all_monitors_clean <- meteo_pull_monitors(monitors,
date_min = as.character(Sys.Date() - 30),
date_max = as.character(Sys.Date())) %>%
dplyr::rename(day = date, location = id)
#> using cached file: /home/maelle/.cache/R/noaa_ghcnd/US1INMR0134.dly
#> date created (size, mb): 2020-10-12 11:35:40 (0.008)
#> file min/max dates: Inf / -Inf
#> using cached file: /home/maelle/.cache/R/noaa_ghcnd/US1INMR0122.dly
#> date created (size, mb): 2020-10-12 11:37:24 (0.008)
#> file min/max dates: Inf / -Inf
#> using cached file: /home/maelle/.cache/R/noaa_ghcnd/US1INMR0155.dly
#> date created (size, mb): 2020-10-19 06:51:57 (0.055)
#> file min/max dates: 2018-07-01 / 2020-04-30
#> Warning in meteo_pull_monitors(monitors, date_min = as.character(Sys.Date() - : The following stations could not be pulled from the GHCN ftp:
#> US1INMR0134, US1INMR0122
#> Any other monitors were successfully pulled from GHCN.
all_monitors_clean %>% head() %>% knitr::kable()
```



|location |day |
|:--------|:---|

Here we notice some values are not available. Therefore, we might need to go back to weather stations searching with, for instance, a larger radius. In this case let's say we're ok with the result of the search.

# Join the two tables, thus complementing the original table
Expand All @@ -138,6 +175,17 @@ measurementsIndianapolis %>% head() %>% knitr::kable()
```



|location |day | value| longitude| latitude|
|:---------------|:----------|---------:|---------:|--------:|
|Indpls - I-70 E |2020-09-18 | 6.300000| -86.13088| 39.78793|
|Indpls - I-70 E |2020-09-19 | 5.190476| -86.13088| 39.78793|
|Indpls - I-70 E |2020-09-20 | 6.442857| -86.13088| 39.78793|
|Indpls - I-70 E |2020-09-21 | 5.370000| -86.13088| 39.78793|
|Indpls - I-70 E |2020-09-22 | 11.658824| -86.13088| 39.78793|
|Indpls - I-70 E |2020-09-23 | 10.900000| -86.13088| 39.78793|


Now some locations are air quality locations and have only missing values in the weather columns, and some locations are weather locations and have only missing values in the air quality columns.

We can plot the data we got.
Expand All @@ -156,3 +204,5 @@ ggplot(data_plot) +
geom_line(aes(x = day, y = value, col = location)) +
facet_grid(parameter ~ ., scales = "free_y")
```

![plot of chunk unnamed-chunk-9](figure/unnamed-chunk-9-1.png)
34 changes: 10 additions & 24 deletions vignettes/rnoaa_ropenaq.Rmd.og
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
purl = FALSE,
eval = FALSE
eval = TRUE
)
```

Expand All @@ -25,7 +25,7 @@ This vignette aims at explaining how you can complement a data.frame with weathe

First, we need to know how many measures are available for Indianapolis for March 2016.

```{r, message = FALSE, warning = FALSE, eval = FALSE, echo = TRUE}
```{r, message = FALSE, warning = FALSE, eval = TRUE, echo = TRUE}
library("ropenaq")

measurementsIndianapolis <- aq_measurements(city = "Indianapolis", parameter = "pm25",
Expand All @@ -37,7 +37,7 @@ save(measurementsIndianapolis, file = "data/measurementsIndianapolis.RData")

We filter negative values.

```{r, message = FALSE, warning = FALSE, eval = FALSE, echo = TRUE}
```{r, message = FALSE, warning = FALSE, eval = TRUE, echo = TRUE}

load("data/measurementsIndianapolis.RData")
library("dplyr")
Expand All @@ -47,12 +47,12 @@ measurementsIndianapolis <- filter(measurementsIndianapolis, value > 0)

We now transform these data into daily data.

```{r, message = FALSE, warning = FALSE, eval = FALSE}
```{r, message = FALSE, warning = FALSE, eval = TRUE}
# only keep stations with geographical information
measurementsIndianapolis <- filter(measurementsIndianapolis, !is.na(latitude))
# now transform to daily data
measurementsIndianapolis <- measurementsIndianapolis %>%
#mutate(day = as.Date(dateLocal)) %>%
mutate(day = as.Date(dateLocal)) %>%
group_by(location, day) %>%
summarize(value = mean(value),
longitude = longitude[1],
Expand All @@ -70,7 +70,7 @@ For finding the right station(s) we shall use the `meteo_nearby_stations` functi

Here we query stations with a less than 15km distance from the air quality stations, with precipitation and temperature data, and with data starting from 2016. Note that the query takes a while.

```{r, eval = FALSE, echo = TRUE, eval = FALSE}
```{r, eval = TRUE, echo = TRUE, eval = TRUE}
library("rnoaa")
station_data <- ghcnd_stations()
lat_lon_df <- select(measurementsIndianapolis,
Expand All @@ -92,30 +92,16 @@ save(stationsIndianapolis, file = "data/stationsIndianapolis.RData")

```

```{r eval = FALSE}
```{r eval = TRUE}
load("data/stationsIndianapolis.RData")
stationsIndianapolis %>% knitr::kable()
```

Now let us plot the AQ and weather stations on a quick and dirty map with no legend, red for AQ stations, blue for weather stations.

> Not shown

```{r, message = FALSE, warning = FALSE, fig.width = 10, fig.height = 10, eval=FALSE}
library("ggmap")
map <- get_map(location = "Indianapolis", zoom = 11)
ggmap(map) +
geom_point(aes(x = longitude, y = latitude),
data = stationsIndianapolis, col = "blue", size = 4)+
geom_point(aes(x = longitude, y = latitude),
data = measurementsIndianapolis, col = "red", size = 4)
```

# Query weather data for these stations

For pulling weather data from these weather monitors, we shall use the `meteo_pull_monitors` function.

```{r eval = FALSE}
```{r eval = TRUE}
library("rnoaa")
monitors <- stationsIndianapolis$id
all_monitors_clean <- meteo_pull_monitors(monitors,
Expand All @@ -131,7 +117,7 @@ Here we notice some values are not available. Therefore, we might need to go bac

Therefore, in this case we will bind the rows of the air quality table with the weather table.

```{r eval = FALSE}
```{r eval = TRUE}
measurementsIndianapolis <- bind_rows(measurementsIndianapolis, all_monitors_clean)
measurementsIndianapolis %>% head() %>% knitr::kable()
```
Expand All @@ -141,7 +127,7 @@ measurementsIndianapolis %>% head() %>% knitr::kable()

We can plot the data we got.

```{r, fig.width = 10, fig.height = 5, warning = FALSE, eval = FALSE}
```{r, fig.width = 10, fig.height = 5, warning = FALSE, eval = TRUE}
data_plot <- measurementsIndianapolis %>%
rename(pm25 = value) %>%
select(- longitude, - latitude)
Expand Down