12  Combined Dataset

After reconciling column names across the three datasets and standardizing the age group, race/ethnicity, and infection date variables, we combine them into a final, complete database for descriptive exploration and analysis. The combined database and the cleaned individual datasets (California, Los Angeles, and population data) are saved for future reference.

Download Combined Database (CSV)

# Stack the cleaned California and Los Angeles County data, then reorder columns
combined_df <-
  rbind(step2_ca_df, step2_la_cnty_df) %>%
  relocate(health_officer_region, .after = county) %>%
  relocate(pop, .after = race_long)
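
Because `rbind()` on data frames requires both inputs to have the same column names, a quick check before combining can catch a leftover mismatch. The following is a minimal sketch using small made-up data frames; `toy_ca` and `toy_la` are hypothetical stand-ins, not the project data.

```r
# Hypothetical stand-ins for the two cleaned datasets (not the real data)
toy_ca <- data.frame(county = c("Alameda", "Fresno"),
                     new_infections = c(6, 3))
toy_la <- data.frame(new_infections = 12,
                     county = "Los Angeles")

# Columns present in one data frame but not the other (both should be empty)
only_in_ca <- setdiff(names(toy_ca), names(toy_la))
only_in_la <- setdiff(names(toy_la), names(toy_ca))
stopifnot(length(only_in_ca) == 0, length(only_in_la) == 0)

# rbind() matches data frame columns by name, so column order may differ
toy_combined <- rbind(toy_ca, toy_la)

# The combined row count should equal the sum of the inputs
stopifnot(nrow(toy_combined) == nrow(toy_ca) + nrow(toy_la))
```

The same two checks could be run on `step2_ca_df` and `step2_la_cnty_df` before building `combined_df`, assuming both objects are already in the session.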

Exports:

# Final Exports to "cleaned_data" project directory:

#-write.csv(combined_df, file = here("data/cleaned_data/combined_df.csv"), row.names = FALSE)
#-write.csv(step2_ca_df, file = here("data/cleaned_data/cleaned_ca_df.csv"), row.names = FALSE)
#-write.csv(step2_la_cnty_df, file = here("data/cleaned_data/cleaned_la_cnty_df.csv"), row.names = FALSE)
#-write.csv(step2_pop_df, file = here("data/cleaned_data/cleaned_pop_df.csv"), row.names = FALSE)
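
The export calls above are commented out in the rendered document. As a rough sketch of the same pattern, the snippet below writes a named list of data frames to CSV and reads one back to confirm the round trip. The folder `out_dir`, the list `export_list`, and the toy data frames are assumptions made for illustration; a temporary directory is used so the sketch does not touch the project's `cleaned_data` folder.

```r
# Hypothetical output folder; a temporary directory keeps the sketch self-contained
out_dir <- file.path(tempdir(), "cleaned_data")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Toy stand-ins for the cleaned data frames (not the real data)
export_list <- list(
  combined_df   = data.frame(county = c("Alameda", "Los Angeles"),
                             new_infections = c(6, 12)),
  cleaned_ca_df = data.frame(county = "Alameda",
                             new_infections = 6)
)

# Write each data frame to "<name>.csv" without row names
for (nm in names(export_list)) {
  write.csv(export_list[[nm]],
            file = file.path(out_dir, paste0(nm, ".csv")),
            row.names = FALSE)
}

# Read one file back to confirm the export round trip
check_df <- read.csv(file.path(out_dir, "combined_df.csv"))
```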

Glimpse of database structure:

dplyr::glimpse(combined_df)
Rows: 100,688
Columns: 24
$ mmwr_year              <dbl> 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023,…
$ mmwr_week              <dbl> 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,…
$ start_date             <date> 2023-05-28, 2023-06-04, 2023-06-11, 2023-06-18…
$ end_date               <date> 2023-06-03, 2023-06-10, 2023-06-17, 2023-06-24…
$ county                 <chr> "Alameda", "Alameda", "Alameda", "Alameda", "Al…
$ health_officer_region  <chr> "Bay Area", "Bay Area", "Bay Area", "Bay Area",…
$ age_cat                <chr> "0-17", "0-17", "0-17", "0-17", "0-17", "0-17",…
$ sex                    <chr> "FEMALE", "FEMALE", "FEMALE", "FEMALE", "FEMALE…
$ race_coded             <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ race_short             <chr> "White NH", "White NH", "White NH", "White NH",…
$ race_long              <chr> "White, Non-Hispanic", "White, Non-Hispanic", "…
$ pop                    <dbl> 34155, 34155, 34155, 34155, 34155, 34155, 34155…
$ new_infections         <dbl> 6, 1, 2, 10, 19, 25, 23, 18, 22, 35, 29, 43, 69…
$ cumulative_infected    <dbl> 6, 7, 9, 19, 38, 63, 86, 104, 126, 161, 190, 23…
$ new_unrecovered        <dbl> 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 3, 2,…
$ cumulative_unrecovered <dbl> 0, 1, 1, 1, 1, 2, 2, 3, 4, 5, 6, 6, 7, 7, 10, 1…
$ new_severe             <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ cumulative_severe      <dbl> 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ total_cnty_pop         <dbl> 1656037, 1656037, 1656037, 1656037, 1656037, 16…
$ total_race_pop         <dbl> 485940, 485940, 485940, 485940, 485940, 485940,…
$ total_age_pop          <dbl> 333476, 333476, 333476, 333476, 333476, 333476,…
$ total_sex_pop          <dbl> 849329, 849329, 849329, 849329, 849329, 849329,…
$ total_HOR_pop          <dbl> 8391874, 8391874, 8391874, 8391874, 8391874, 83…
$ total_ca_pop           <dbl> 39109070, 39109070, 39109070, 39109070, 3910907…
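
The printed values suggest that each cumulative column is the running total of its weekly counterpart within a stratum. The sketch below checks this for the leading values shown above, assuming they all belong to the same county, age, sex, and race stratum (the glimpse output is consistent with this but does not confirm it). On the full data, the same comparison would need to be done within each stratum.

```r
# First values of the first stratum, copied from the glimpse output above
new_infections      <- c(6, 1, 2, 10, 19, 25, 23, 18, 22, 35, 29)
cumulative_infected <- c(6, 7, 9, 19, 38, 63, 86, 104, 126, 161, 190)

# A cumulative column should equal the running sum of the weekly counts
running_total <- cumsum(new_infections)
is_consistent <- all(running_total == cumulative_infected)
is_consistent
```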