The population database reports counts for six age groups, whereas the California and LA County databases use four. The groupings nest cleanly: the “0–4,” “5–11,” and “12–17” groups in the population database together correspond to the “0–17” group in the other two. We therefore collapsed those three groups into “0–17” so that the population counts match the four age-group format.
In the original population database, counts are reported for each single year of age within every demographic subgroup (e.g., county, sex, race/ethnicity). To obtain total estimates per age group and demographic category, the single-year counts were aggregated (summed).
Table 1 below presents a subset of the original and restructured population data to illustrate the resulting summarized age groupings.
step2_pop_df <- step1_pop_df %>%
  group_by(county, health_officer_region, sex, race_ethnicity, age_cat) %>%
  summarise(pop = sum(pop), .groups = "drop")

#-aggregate population counts
##--to 4 age category format:
step2_pop_df_recat <- step1_pop_df %>%
  mutate(new_age_group = if_else(
    age_cat %in% c("0-4", "5-11", "12-17"),
    "0-17",
    age_cat
  )) %>%
  group_by(county, health_officer_region, sex, race_ethnicity, new_age_group) %>%
  summarise(pop = sum(pop), .groups = "drop") %>%
  rename("age_cat" = "new_age_group")
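One way to confirm that this kind of recategorization only regroups counts, and neither drops nor duplicates them, is to run the same steps on a small toy data frame and compare totals. The sketch below is illustrative only: `toy_pop_df`, its counts, and the three adult age labels ("18-49", "50-64", "65+") are hypothetical placeholders, not values from the population database.

```r
library(dplyr)

# Hypothetical stand-in for step1_pop_df: one county/sex stratum with
# several rows per age group. All counts and the adult age labels are invented.
toy_pop_df <- tibble(
  county  = "Example County",
  sex     = "FEMALE",
  age_cat = c("0-4", "0-4", "5-11", "12-17", "18-49", "50-64", "65+"),
  pop     = c(10, 12, 30, 25, 100, 40, 35)
)

# Collapse the three youngest groups into "0-17", then sum within each stratum.
toy_recat <- toy_pop_df %>%
  mutate(age_cat = if_else(
    age_cat %in% c("0-4", "5-11", "12-17"),
    "0-17",
    age_cat
  )) %>%
  group_by(county, sex, age_cat) %>%
  summarise(pop = sum(pop), .groups = "drop")

# Sanity checks: the total population is preserved, four age groups remain,
# and the "0-17" count equals the sum of its component rows (10 + 12 + 30 + 25).
stopifnot(sum(toy_recat$pop) == sum(toy_pop_df$pop))
stopifnot(n_distinct(toy_recat$age_cat) == 4)
stopifnot(toy_recat$pop[toy_recat$age_cat == "0-17"] == 77)
```

The same `stopifnot()` comparisons could in principle be applied to `step1_pop_df` and `step2_pop_df_recat`, assuming `pop` contains no missing values.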
Table 1: Reconciling Age Group Categories and Population Counts