Extracts downloadable file links from the section of a CDC Vital Statistics page identified by an anchor ID. The function locates the section from the anchor, collects the links inside each .listScroll container, and returns a tidy tibble with metadata about each file.

scrape_cdc_section(page, anchor_id, section_name, subsection_names)

Arguments

page

An HTML document returned by rvest::read_html().

anchor_id

Character string giving the HTML anchor ID for the section.

section_name

Human-readable name of the section.

subsection_names

Character vector of subsection names, one per .listScroll container in the section. Its length must equal the number of .listScroll elements found.

Value

A tibble with columns:

section

Section name

subsection

Subsection name

link_text

Text of the download link

year

Extracted year or leading label

file_size

File size string, if present

url

Absolute URL to the file

file_type

File extension
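
The derived columns can be illustrated with a minimal sketch. It assumes link text of the form "label (size)", which is a guess at the page's format rather than something stated here; the example strings, URLs, and regular expressions are all hypothetical and are not the function's actual implementation.

```r
library(tibble)

# Hypothetical link text and URLs; the real page may format these differently.
link_text <- c("2020 (12 MB)", "User Guide (1.5 MB)")
url <- c("https://www.cdc.gov/data/births2020.zip",
         "https://www.cdc.gov/data/guide.pdf")

# year: the text before the parenthesised size (a year or a leading label).
year <- trimws(sub("\\(.*$", "", link_text))

# file_size: the text inside the parentheses, or NA when none is present.
has_size <- grepl("\\(.*\\)", link_text)
file_size <- ifelse(has_size,
                    sub("^.*\\((.*)\\).*$", "\\1", link_text),
                    NA_character_)

# file_type: the extension taken from the URL.
file_type <- tools::file_ext(url)

files <- tibble(link_text, year, file_size, url, file_type)
```

Taking the extension from the URL rather than from the link text keeps file_type available even when the visible text carries no file name.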

Details

The function assumes that the CDC page groups download links in .listScroll containers and that the anchor is nested three levels below the section root. If the CDC changes the page structure, the DOM traversal may need to be updated.
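
The traversal described above can be sketched on a small in-memory document. This is an illustration under stated assumptions, not the function's source: the HTML fragment, the "Births" anchor ID, the subsection names, and the base URL are all hypothetical.

```r
library(rvest)
library(xml2)

# Hypothetical miniature of a page section. The anchor <a id="Births"> sits
# three levels below the section root (root > div > h3 > a).
html <- '<html><body>
<div id="root">
  <div><h3><a id="Births"></a>Births</h3></div>
  <div class="listScroll"><ul>
    <li><a href="/data/births2020.zip">2020 (12 MB)</a></li>
    <li><a href="/data/births2019.zip">2019 (11 MB)</a></li>
  </ul></div>
  <div class="listScroll"><ul>
    <li><a href="/data/guide2020.pdf">2020 (1 MB)</a></li>
  </ul></div>
</div>
</body></html>'

page <- read_html(html)
anchor <- html_element(page, "#Births")

# Climb three parents from the anchor to reach the section root.
section_root <- xml_parent(xml_parent(xml_parent(anchor)))

# Each .listScroll container corresponds to one subsection.
containers <- html_elements(section_root, ".listScroll")
subsection_names <- c("Data files", "User guides")  # hypothetical names
stopifnot(length(subsection_names) == length(containers))

# Collect absolute URLs per subsection; the base URL is an assumption.
base_url <- "https://www.cdc.gov/"
urls <- vector("list", length(containers))
for (i in seq_along(containers)) {
  hrefs <- html_attr(html_elements(containers[[i]], "a"), "href")
  urls[[i]] <- url_absolute(hrefs, base_url)
}
names(urls) <- subsection_names
```

If the anchor moves to a different depth, only the chain of xml_parent() calls needs to change; the length check on subsection_names then catches a section root that contains an unexpected number of containers.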