9 Infection Dates

To support date-based analyses, we aggregate infection records by MMWR week and year. For both the Los Angeles County and California datasets, we generate two new columns, mmwr_year and mmwr_week, and then remove the original date-based fields. We also add the columns start_date and end_date, which hold the first and last day of each MMWR week, as reference points for later steps.

In the California dataset, the field time_int encodes the year and MMWR week as a six-digit integer (YYYYWW). To create the new fields, we extract the first four digits as mmwr_year (integer division by 100) and the last two digits as mmwr_week (remainder after division by 100), store both as factors, and then drop the original time_int column.
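As a quick illustration of the arithmetic, the sketch below decodes a single made-up time_int value. The value 202322 is a hypothetical example, not a record from the dataset. The last two lines show why a factor needs to pass through as.character() before it is used as a number again.

```r
# Hypothetical value: MMWR week 22 of 2023 encoded as YYYYWW
time_int <- 202322L

mmwr_year <- time_int %/% 100  # integer division drops the last two digits: 2023
mmwr_week <- time_int %% 100   # remainder keeps only the last two digits: 22

# Converting a factor back to a number must go through as.character();
# as.integer() alone would return the internal level codes (1, 2, 3).
week_factor <- factor(c(22, 23, 24))
week_values <- as.integer(as.character(week_factor))  # 22 23 24
```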

In the Los Angeles County dataset, the codebook identifies the field dt_report as the last day of the MMWR week. However, this field contains only missing values, so we removed it. Instead, we convert the infection date field, dt_dx, to a proper date format and then use the MMWRweek package to derive mmwr_year and mmwr_week.
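The conversion can be tried on a single value first. The sketch below assumes dt_dx holds strings in day, abbreviated month, year order. The sample string "30May2023" is hypothetical and chosen only to match that pattern.

```r
library(lubridate)  # parse_date_time()
library(MMWRweek)   # MMWRweek()

# Hypothetical raw value in day-month-year form
dt_dx <- "30May2023"

# Parse the string, then keep only the calendar date
dx_date <- as.Date(parse_date_time(dt_dx, orders = "%d%b%Y"))

# MMWRweek() returns a data frame with MMWRyear, MMWRweek and MMWRday
mmwr <- MMWRweek(dx_date)
mmwr$MMWRyear  # 2023
mmwr$MMWRweek  # 22
```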

Code

##-- California dataset:
step2_ca_df <- step1_ca_df %>%
  ##-- pull MMWR year and week from the time_int field (YYYYWW)
  mutate(
    mmwr_year = factor(time_int %/% 100),
    mmwr_week = factor(time_int %% 100)
  ) %>%
  add_start_end_dates() %>%
  select(-time_int) %>%
  relocate(mmwr_year, mmwr_week, start_date,
           end_date, .before = everything())


##-- LA county dataset:
step2_la_cnty_df <- step1_la_cnty_df %>%
  ##-- parse dt_dx (day, abbreviated month, year) into a Date;
  ##-- as.Date() needs no format argument for date-time input
  mutate(
    DATE_FIX = as.Date(parse_date_time(dt_dx, "%d%b%Y"))
  ) %>%
  ##-- use the date to create the new MMWR fields
  add_mmwr_week_columns(date_col = "DATE_FIX") %>%
  add_start_end_dates() %>%
  ##-- drop the raw and intermediate date fields
  select(-c(DATE_FIX, dt_dx)) %>%
  relocate(mmwr_year, mmwr_week, start_date,
           end_date, .before = everything()) %>%
  relocate(county, .before = age_cat)

To streamline this process, we created two helper functions:

add_mmwr_week_columns() : takes the name of a date column and adds two fields, mmwr_year and mmwr_week
add_start_end_dates() : uses mmwr_year and mmwr_week to generate the corresponding MMWR week start and end dates
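The helper definitions are not reproduced in this section. The following is a minimal sketch of how they could be written, assuming the MMWRweek package supplies the week arithmetic. The function bodies, the argument name df, and the demo data frame are our illustration here and may differ from the actual helpers.

```r
library(MMWRweek)  # MMWRweek(), MMWRweek2Date()

# Sketch only: adds mmwr_year and mmwr_week derived from a Date column
add_mmwr_week_columns <- function(df, date_col) {
  mmwr <- MMWRweek(df[[date_col]])  # columns: MMWRyear, MMWRweek, MMWRday
  df$mmwr_year <- mmwr$MMWRyear
  df$mmwr_week <- mmwr$MMWRweek
  df
}

# Sketch only: adds the first (Sunday) and last (Saturday) day of each MMWR week
add_start_end_dates <- function(df) {
  # as.character() first, so factor columns yield their labels, not level codes
  yr <- as.integer(as.character(df$mmwr_year))
  wk <- as.integer(as.character(df$mmwr_week))
  df$start_date <- MMWRweek2Date(MMWRyear = yr, MMWRweek = wk)  # MMWR day 1
  df$end_date <- df$start_date + 6
  df
}

# Usage on a small, made-up data frame
demo <- data.frame(DATE_FIX = as.Date(c("2023-05-30", "2023-06-07")))
demo <- add_mmwr_week_columns(demo, date_col = "DATE_FIX")
demo <- add_start_end_dates(demo)
```

Taking the data frame as the first argument lets both functions sit inside a pipeline, and the as.character() step lets add_start_end_dates() accept either numeric or factor week columns.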

The data frames now have the following structure:

mmwr_year	mmwr_week	start_date	end_date	county	age_cat	new_infections
2023	22	2023-05-28	2023-06-03	Los Angeles	0-17	15
2023	23	2023-06-04	2023-06-10	Los Angeles	0-17	17
2023	24	2023-06-11	2023-06-17	Los Angeles	0-17	23