Data source (healthcare expenditures):
https://www.cms.gov/data-research/statistics-trends-and-reports/national-health-expenditure-data/historical
In retrospect, this is a small and rather messy dataset: the spreadsheet has free-form header and footer text, and most of its tabular data is not of interest here.
Downloaded:
"NHE Summary, including share of GDP, CY 1960-2022 (ZIP)"
1) Applied "Extract All..." to the ZIP file;
2) Used Google Sheets to open ("Import") "NHE_Summary.xlsx";
3) Deleted all rows except the original Row 2 and Row 31 (becoming Rows 1 & 2);
4) Deleted Column A;
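5) Saved ("Download") the data in CSV format as "NHE_cleaned.csv";
6) Read that into an R data frame and reshaped it from wide to long:
library(readr)  # read_csv()
library(tidyr)  # pivot_longer()
nhe_wide <- read_csv("data/NHE22_cleaned.csv")  # header row of years + one row of values
nhe_long <- pivot_longer(nhe_wide, cols = everything(),
                         names_to = "Year", values_to = "PCTGDP")  # one row per year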
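The manual spreadsheet edits (keeping only the original Rows 2 and 31, then deleting Column A) and the final wide-to-long reshape could also be scripted. The sketch below is an illustration, not the workflow actually used: it assumes the untouched sheet has already been read into a data frame with no header row, so that sheet row n is data frame row n (for example via readxl::read_excel(path, col_names = FALSE), noting that readxl skips leading empty rows, so the row numbers should be checked). The helper name clean_nhe is hypothetical, base R is swapped in for tidyr's pivot_longer(), and the integer/numeric conversions are an added assumption about the cell contents.

```r
# Hypothetical helper (not from the original workflow): mirrors the sheet
# edits by keeping sheet rows 2 and 31 and dropping column A, then reshapes
# the result to one row per year. Assumes `raw` holds the untouched sheet
# with no header row, so sheet row n is raw[n, ].
clean_nhe <- function(raw, header_row = 2, value_row = 31) {
  kept <- raw[c(header_row, value_row), -1]  # two rows, column A removed
  years <- as.character(unlist(kept[1, ]))   # year labels, e.g. "1960"
  values <- unlist(kept[2, ])                # matching value for each year
  data.frame(Year = as.integer(years),       # assumes whole-number year labels
             PCTGDP = as.numeric(values),    # assumes plain numeric cells
             row.names = NULL)
}
```

The result has the same Year and PCTGDP columns as nhe_long, except that Year is an integer here, whereas pivot_longer() leaves it as a character column.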
Data source (life expectancy [all races, both sexes]):
1) Downloaded the CSV file from https://catalog.data.gov/dataset/nchs-death-rates-and-life-expectancy-at-birth;
2) Imported it into Google Sheets;
3) Kept Rows 1-120 (all races, both sexes);
4) Kept only Columns A and D (which became A and B);
5) Renamed Column B to "LifeExpectancy";
6) Saved ("Download") the data as "LifeExpectancy_cleaned.csv";
7) Read that into R and set the column names:
library(readr)  # read_csv()
nchs_long <- read_csv("data/NCHS_cleaned.csv")
colnames(nchs_long) <- c("Year", "LifeExp")
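Keeping Rows 1-120 by position relies on the row order of the downloaded file. A possible alternative, sketched in base R, filters on the race and sex labels instead. It assumes the CSV has columns named Year, Race, Sex and "Average Life Expectancy (Years)", with the labels "All Races" and "Both Sexes"; these names should be checked against the actual header, and the helper name clean_nchs is hypothetical.

```r
# Hypothetical helper: selects the "all races, both sexes" rows and keeps
# only the year and life-expectancy columns. Assumes `raw` was read with
# read.csv(..., check.names = FALSE) and that the column names and labels
# below match the downloaded file.
clean_nchs <- function(raw, le_col = "Average Life Expectancy (Years)") {
  # which() drops rows where Race or Sex is NA instead of producing NA rows
  keep <- which(raw$Race == "All Races" & raw$Sex == "Both Sexes")
  out <- raw[keep, c("Year", le_col)]
  names(out) <- c("Year", "LifeExp")  # same names as nchs_long
  out <- out[order(out$Year), ]       # ascending by year
  rownames(out) <- NULL
  out
}
```

Usage would look like clean_nchs(read.csv(path, check.names = FALSE)), where path points at the downloaded file; check.names = FALSE keeps the column name with spaces and parentheses intact.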
That prepared the data for analysis and visualization.