To calculate infection rates by demographic group (county, health officer region, age category, sex, or race/ethnicity), we first compute the total population for each demographic category and store those totals as columns in the population dataframe. The population dataset is then joined to both the master database (which merges all three datasets) and the individual California and Los Angeles datasets.
By creating and maintaining these summarized population counts, we avoid having to recalculate them each time we focus on a different demographic group. For example, once this population dataset is joined to the California data, we can easily calculate population-adjusted infection rates that allow valid comparisons across counties with differing population sizes.
step2_pop_df <- step2_pop_df_recat %>%
  ##-- join population database to the race/ethnicity map
  mutate(race_short = clean(race_ethnicity)) %>%
  select(-race_ethnicity) %>%
  left_join(race_ethnicity_map, by = "race_short") %>%
  relocate(race_coded, race_short, race_long, .after = sex) %>%
  ##-- calculate population totals by demographic
  group_by(county, health_officer_region) %>%
  mutate(total_cnty_pop = sum(pop)) %>%
  ungroup() %>%
  group_by(county, health_officer_region, race_coded, race_short, race_long) %>%
  mutate(total_race_pop = sum(pop)) %>%
  ungroup() %>%
  group_by(county, age_cat) %>%
  mutate(total_age_pop = sum(pop)) %>%
  ungroup() %>%
  group_by(county, health_officer_region, sex) %>%
  mutate(total_sex_pop = sum(pop)) %>%
  ungroup() %>%
  group_by(health_officer_region) %>%
  mutate(total_HOR_pop = sum(pop)) %>%
  ungroup()

##-- get total CA population
total_ca_pop <- step2_pop_df %>%
  distinct(health_officer_region, total_HOR_pop) %>%
  summarise(total_ca_pop = sum(total_HOR_pop, na.rm = TRUE)) %>%
  pull(total_ca_pop)

##-- add in to df
step2_pop_df <- step2_pop_df %>%
  mutate(total_ca_pop = total_ca_pop)
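To illustrate why the stored totals matter, here is a minimal sketch of a population-adjusted rate. It assumes the rate is expressed per 100,000 residents, which is a common convention but is not stated above. The two counties, their counts, and the column names new_infections and rate_per_100k are hypothetical; only total_cnty_pop matches a column created in the code above.

```r
library(dplyr)

# Hypothetical county totals, standing in for the distinct
# county / total_cnty_pop pairs in the population dataframe.
pop_totals <- data.frame(
  county = c("County A", "County B"),
  total_cnty_pop = c(1000000, 50000)
)

# Hypothetical case counts for the same counties.
cases <- data.frame(
  county = c("County A", "County B"),
  new_infections = c(2000, 250)
)

# Join the totals onto the case data, then scale to a rate per 100,000
# (the per-100,000 scaling is an assumption made for this sketch).
rates <- cases %>%
  left_join(pop_totals, by = "county") %>%
  mutate(rate_per_100k = new_infections / total_cnty_pop * 100000)
```

In this toy data County A reports eight times as many infections as County B, yet its rate is lower (200 versus 500 per 100,000), which is the kind of comparison raw counts would obscure.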