Applies section-specific cleaning and reshaping to each element of the raw scraped data list. Each section has unique quirks that require custom handling before the data can be combined into a lookup table.

clean_all_sections(datas, url_pdf)

Arguments

datas

A named list of raw tibbles, one per section, as returned by scrape_all_sections().

url_pdf

URL to the CDC mortality public use data page, used to scrape mortality user guide links separately.

Value

The same named list with each element cleaned and pivoted to wide format, with one row per year and columns for each subsection's URL, file size, and file type.
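The reshaping to wide format can be illustrated with a minimal sketch. It assumes the tidyr and tibble packages; the input table, its column names (year, subsection, url, file_size, file_type), the example.org URLs, and the output column naming are all hypothetical and are not taken from the package source.

```r
library(tibble)
library(tidyr)

# Hypothetical long-format input: one row per (year, subsection) pair.
raw_long <- tribble(
  ~year, ~subsection,  ~url,                             ~file_size, ~file_type,
  2015,  "user_guide", "https://example.org/ug2015.pdf", "1.2 MB",   "pdf",
  2015,  "us_data",    "https://example.org/us2015.zip", "210 MB",   "zip",
  2016,  "user_guide", "https://example.org/ug2016.pdf", "1.3 MB",   "pdf",
  2016,  "us_data",    "https://example.org/us2016.zip", "215 MB",   "zip"
)

# Pivot so that each year becomes a single row, with one column per
# subsection for the URL, the file size, and the file type.
wide <- pivot_wider(
  raw_long,
  id_cols     = year,
  names_from  = subsection,
  values_from = c(url, file_size, file_type),
  names_glue  = "{subsection}_{.value}"   # assumed naming scheme
)
```

With this input, wide has one row per year and columns such as us_data_url and user_guide_file_size.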

Details

Section-specific handling:

Births

Addenda are filtered out; the user guide URL is forward-filled for years without a dedicated guide.

Period Linked

Rows with no U.S. Data URL are dropped.

Matched Multiple

The redundant 1995-1997 file is dropped in favour of the superseding 1995-2000 file.

Mortality Multiple

User guides are hosted on a separate page and scraped independently. 1997 and 1998 require a further dedicated scrape to extract the Detail Record Layout PDF. The user guide URL is forward-filled for years without a dedicated guide.

Fetal Death

For 2014 and 2015, the plain U.S. Data and U.S. Territories files are dropped in favour of the richer "with cause of death" versions.
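Three of the steps listed above (forward-filling a user guide URL, dropping rows with no U.S. Data URL, and dropping the plain Fetal Death files for 2014 and 2015) could be sketched as follows. This is an illustration using dplyr and tidyr, not the function's actual implementation; the tables, column names, subsection labels, and example.org URLs are hypothetical, and filling downward in ascending year order is an assumption about what "forward-filled" means here.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Births (hypothetical data): 2019 and 2021 have no dedicated user guide.
births <- tibble(
  year           = c(2018, 2019, 2020, 2021),
  user_guide_url = c("https://example.org/ug2018.pdf", NA,
                     "https://example.org/ug2020.pdf", NA)
)

# Carry the most recent guide URL forward into years that lack one.
births_filled <- births %>%
  arrange(year) %>%
  fill(user_guide_url, .direction = "down")

# Period Linked (hypothetical data): 2017 has no U.S. Data URL.
period_linked <- tibble(
  year        = c(2016, 2017, 2018),
  us_data_url = c("https://example.org/pl2016.zip", NA,
                  "https://example.org/pl2018.zip")
)

# Drop rows with no U.S. Data URL.
period_linked_clean <- period_linked %>%
  filter(!is.na(us_data_url))

# Fetal Death (hypothetical long-format data and subsection labels).
fetal <- tibble(
  year       = c(2013, 2014, 2014, 2015, 2015),
  subsection = c("U.S. Data",
                 "U.S. Data",
                 "U.S. Data with cause of death",
                 "U.S. Territories",
                 "U.S. Territories with cause of death")
)

# For 2014 and 2015 only, drop the plain files and keep the
# "with cause of death" versions; other years are left untouched.
fetal_clean <- fetal %>%
  filter(!(year %in% c(2014, 2015) &
             subsection %in% c("U.S. Data", "U.S. Territories")))
```

In this sketch the filter conditions are written per section, which mirrors the point that each section needs its own handling before the tables are combined.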