combined_df <-
  rbind(step2_ca_df, step2_la_cnty_df) %>%
  relocate(health_officer_region, .after = county) %>%
  relocate(pop, .after = race_long)

After reconciling column names across the three datasets and standardizing the age group, race/ethnicity, and infection date variables, we merge them into a final, complete database and begin our descriptive explorations and analyses. The combined database and the cleaned individual datasets (California, Los Angeles, and population data) are saved for future reference.
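Because `rbind()` on data frames requires both inputs to have the same set of column names, a quick check before stacking can catch a column that slipped through reconciliation. The following is a minimal sketch on toy data, not the pipeline's actual code: the two small data frames, their values, and the helper `check_same_columns()` are hypothetical stand-ins.

```r
library(dplyr)

# Hypothetical toy stand-ins for step2_ca_df and step2_la_cnty_df
toy_ca_df <- data.frame(
  county = c("Alameda", "Alameda"),
  race_long = c("White, Non-Hispanic", "White, Non-Hispanic"),
  new_infections = c(6, 1),
  health_officer_region = c("Bay Area", "Bay Area"),
  pop = c(34155, 34155)
)

# Same columns in a different order; all values are made up for illustration
toy_la_cnty_df <- data.frame(
  pop = 50000,
  county = "Los Angeles",
  health_officer_region = "Toy Region",
  race_long = "White, Non-Hispanic",
  new_infections = 12
)

# Hypothetical helper: stop early if the two column sets differ
check_same_columns <- function(a, b) {
  only_a <- setdiff(names(a), names(b))
  only_b <- setdiff(names(b), names(a))
  if (length(only_a) > 0 || length(only_b) > 0) {
    stop("Column mismatch. Only in first: ", toString(only_a),
         "; only in second: ", toString(only_b))
  }
  invisible(TRUE)
}

check_same_columns(toy_ca_df, toy_la_cnty_df)

# rbind() matches data frame columns by name, so column order may differ
toy_combined_df <-
  rbind(toy_ca_df, toy_la_cnty_df) %>%
  relocate(health_officer_region, .after = county) %>%
  relocate(pop, .after = race_long)

names(toy_combined_df)
```

A check like this fails loudly with the offending column names, which is easier to act on than the generic error `rbind()` raises when names do not match.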
Download Combined Database (CSV)
Exports:
# Final Exports to "cleaned_data" project directory:
#-write.csv(combined_df, file = here("data/cleaned_data/combined_df.csv"), row.names = FALSE)
#-write.csv(step2_ca_df, file = here("data/cleaned_data/cleaned_ca_df.csv"), row.names = FALSE)
#-write.csv(step2_la_cnty_df, file = here("data/cleaned_data/cleaned_la_cnty_df.csv"), row.names = FALSE)
#-write.csv(step2_pop_df, file = here("data/cleaned_data/cleaned_pop_df.csv"), row.names = FALSE)

Glimpse of database structure:
dplyr::glimpse(combined_df)

Rows: 100,688
Columns: 24
$ mmwr_year <dbl> 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023,…
$ mmwr_week <dbl> 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,…
$ start_date <date> 2023-05-28, 2023-06-04, 2023-06-11, 2023-06-18…
$ end_date <date> 2023-06-03, 2023-06-10, 2023-06-17, 2023-06-24…
$ county <chr> "Alameda", "Alameda", "Alameda", "Alameda", "Al…
$ health_officer_region <chr> "Bay Area", "Bay Area", "Bay Area", "Bay Area",…
$ age_cat <chr> "0-17", "0-17", "0-17", "0-17", "0-17", "0-17",…
$ sex <chr> "FEMALE", "FEMALE", "FEMALE", "FEMALE", "FEMALE…
$ race_coded <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ race_short <chr> "White NH", "White NH", "White NH", "White NH",…
$ race_long <chr> "White, Non-Hispanic", "White, Non-Hispanic", "…
$ pop <dbl> 34155, 34155, 34155, 34155, 34155, 34155, 34155…
$ new_infections <dbl> 6, 1, 2, 10, 19, 25, 23, 18, 22, 35, 29, 43, 69…
$ cumulative_infected <dbl> 6, 7, 9, 19, 38, 63, 86, 104, 126, 161, 190, 23…
$ new_unrecovered <dbl> 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 3, 2,…
$ cumulative_unrecovered <dbl> 0, 1, 1, 1, 1, 2, 2, 3, 4, 5, 6, 6, 7, 7, 10, 1…
$ new_severe <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ cumulative_severe <dbl> 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ total_cnty_pop <dbl> 1656037, 1656037, 1656037, 1656037, 1656037, 16…
$ total_race_pop <dbl> 485940, 485940, 485940, 485940, 485940, 485940,…
$ total_age_pop <dbl> 333476, 333476, 333476, 333476, 333476, 333476,…
$ total_sex_pop <dbl> 849329, 849329, 849329, 849329, 849329, 849329,…
$ total_HOR_pop <dbl> 8391874, 8391874, 8391874, 8391874, 8391874, 83…
$ total_ca_pop <dbl> 39109070, 39109070, 39109070, 39109070, 3910907…
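
The glimpse suggests that each `cumulative_*` column is the running total of its `new_*` counterpart (for example, `new_infections` of 6, 1, 2, 10, 19 alongside `cumulative_infected` of 6, 7, 9, 19, 38). A minimal sketch of how that relationship could be verified is shown here on a toy subset. The grouping columns are an assumption about how the running totals were computed, and `toy_df`, `checked_df`, `recomputed`, and `mismatches` are illustrative names.

```r
library(dplyr)

# Toy subset copied from the first values shown in the glimpse output
toy_df <- data.frame(
  county = "Alameda",
  age_cat = "0-17",
  sex = "FEMALE",
  race_coded = 1,
  mmwr_year = 2023,
  mmwr_week = 22:26,
  new_infections = c(6, 1, 2, 10, 19),
  cumulative_infected = c(6, 7, 9, 19, 38)
)

# Assumption: running totals restart within each county/age/sex/race stratum
checked_df <- toy_df %>%
  group_by(county, age_cat, sex, race_coded) %>%
  arrange(mmwr_year, mmwr_week, .by_group = TRUE) %>%
  mutate(recomputed = cumsum(new_infections)) %>%
  ungroup()

# Rows where the stored and recomputed totals disagree
mismatches <- filter(checked_df, recomputed != cumulative_infected)
nrow(mismatches)
```

The same pattern could be applied to `new_unrecovered` and `new_severe` against their cumulative columns; any rows returned in `mismatches` would point to strata worth inspecting by hand.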