getting-started.rmd

The usdeaths package provides tools for downloading,
reading, and decoding CDC vital statistics mortality files. These are
fixed-width files published by the National Center for Health Statistics
(NCHS) and cover US death records going back to the 1960s. Because the
raw files use numeric codes rather than human-readable labels, working
with them directly is cumbersome. This package handles the download,
parsing, and decoding for you.
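
As a rough illustration of what a fixed-width file with coded fields looks like, the sketch below reads two made-up records with base R's read.fwf(). The layout, field widths, and record values are assumptions for illustration only; they are not taken from an NCHS file or from this package's internals.

```r
# Hypothetical three-field layout; real NCHS layouts differ by year.
layout <- data.frame(
  name  = c("data_year", "shipment_number", "reporting_area"),
  width = c(1, 2, 1),
  stringsAsFactors = FALSE
)

# Two made-up records written to a temporary file.
path <- tempfile(fileext = ".txt")
writeLines(c("9130", "9050"), path)

# Read every field as character so leading zeros such as "05" survive.
raw <- utils::read.fwf(
  path,
  widths     = layout$width,
  col.names  = layout$name,
  colClasses = "character"
)

raw
```

Every value comes back as a bare code, so a separate dictionary is needed before the data can be interpreted.
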
library(usdeaths)

For most users, cdc_import() is all you need. Pass it a
section name and a year, and it resolves the download URL, fetches the
file from the CDC FTP server, reads the fixed-width format, and decodes
all coded columns into human-readable labels:
mort1969 <- cdc_import("mortality_multiple", 1969)
#> Decoding 1921990 rows...
#> # A tibble: 1,921,990 × 43
#> last_digit_data_year shipment_number reporting_area certificate_number ...
#> <chr> <chr> <chr> <chr>
#> 1 1969 13 All other areas NA
#> 2 1969 05 All other areas NA
#> ...

Note: these files can be several hundred megabytes and contain millions of records. The 1969 mortality file has nearly 2 million rows. Download and decode time will vary depending on your connection and hardware.
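
Because the download and decode can be slow, one option is to save the result to disk and reuse it in later sessions. The sketch below is a generic base R pattern, not a feature of the package: with_cache() and slow_import() are hypothetical names, and a small data frame stands in for the real import.

```r
# Hypothetical helper: run `compute()` once and reuse the saved result afterwards.
with_cache <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- compute()
  saveRDS(result, path)
  result
}

# Stand-in for a slow import; counts how often it actually runs.
calls <- 0
slow_import <- function() {
  calls <<- calls + 1
  data.frame(data_year = c("1969", "1969"), stringsAsFactors = FALSE)
}

cache_file <- file.path(tempdir(), "mort1969.rds")
unlink(cache_file)  # start from a clean slate for this demonstration

first  <- with_cache(cache_file, slow_import)  # runs the import and saves it
second <- with_cache(cache_file, slow_import)  # read from disk, no second import
```

In practice the compute argument could be a function that calls cdc_import("mortality_multiple", 1969), with the cache file kept somewhere more permanent than tempdir().
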
cdc_import() is a wrapper around four lower-level
functions. You can call these individually if you need more control over
any step.
get_cdc_url() looks up the download URL for a given
section and year from the internal CDC link table:
url <- get_cdc_url("mortality_multiple", 1969)

download_cdc() fetches the zip file from the CDC FTP
server and returns a path to a temporary file:
temp <- download_cdc(url)

load_data() needs the metadata for the year you are
working with. Each year’s file has its own fixed-width layout and code
dictionary, shipped as a data object in the package:
meta <- data_mortality_multiple_1969
mort1969 <- load_data(temp, meta)

The raw data comes back with numeric codes in every column:
mort1969
#> # A tibble: 1,921,990 × 43
#> last_digit_data_year shipment_number reporting_area certificate_number ...
#> <chr> <chr> <chr> <chr>
#> 1 9 13 0 NA
#> 2 9 05 0 NA
#> ...

decode_all() decodes every coded column in place and
returns the full dataset with human-readable labels:
mort1969_decoded <- decode_all(mort1969, meta)

For large files you may want to verify that the codes map correctly
before committing to a full decode. decode_preview() lets you inspect a
subset of columns and rows side by side with their decoded labels:
# first 5 coded columns, 1000 rows
decode_preview(mort1969, meta)
# first 10 coded columns
decode_preview(mort1969, meta, first_n = 10)
# first 3 coded columns plus columns 12 and 20 specifically
decode_preview(mort1969, meta, first_n = 3, numbers = c(12, 20))
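
The idea behind a side-by-side check can be sketched in base R with a named vector used as a code dictionary. This is only an illustration of the lookup technique, not the package's implementation: the data, the dictionary, and the label "Example area" are made up, and only the pairing of code 0 with "All other areas" comes from the output shown earlier.

```r
# Made-up raw values for one coded column.
raw <- data.frame(
  reporting_area = c("0", "0", "1", "7"),
  stringsAsFactors = FALSE
)

# Hypothetical code dictionary: names are codes, values are labels.
dictionary <- c("0" = "All other areas", "1" = "Example area")

# Indexing a named vector by code returns the matching label,
# and NA for any code missing from the dictionary.
decoded <- unname(dictionary[raw$reporting_area])

# Side-by-side view of codes and labels.
preview <- data.frame(
  code  = raw$reporting_area,
  label = decoded,
  stringsAsFactors = FALSE
)

# Codes with no label point at gaps in the dictionary.
unmapped <- unique(preview$code[is.na(preview$label)])

preview
```

Checking for unmapped codes on a small sample first is cheaper than discovering them after decoding millions of rows.
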