Requirement
In many instances for indications and modelling work, premium records need to be earned over calendar periods in order to better match premium with losses. Typically in SAS this is done by a loop that increments between the effective and expiry date of a policy and outputting the results at each step.
In R this can be done in a much easier way with both data.table
and dplyr
, although the data.table
solution is much faster. The code below will show how to split an entire policy period over a user-selected period as a fraction of 1.000
. This earned_factor
can be applied against any written measures to get an earned result.
References
Required packges
data.table
- Extension ofdata.frame
dplyr
- A Grammar of Data Manipulationtidyr
- Easily Tidy Data withspread()
andgather()
Functionszoo
- S3 Infrastructure for Regular and Irregular Time Series (Z’s Ordered Observations)
Inputs
policy_data
- Sample data.frame containing:policy
- Policy numberpolicy_eff
- Policy effective datepolicy_exp
- Policy expiry date- NOTE: There is an assumption that the combinations of the three values above are unique
increment_period
- Length of the earning period, e.g.day
,week
,month
,quarter
oryear
, seeseq.Date
. For this example the increment_period is1 quarter
to match theas.yearqtr
function used. If using other periods, then theas.yearqtr
function needs to be changed.reporting_date
- Date the earning will stop
Outputs
increment_start
- Beginning of the earning periodincrement_end
- End of the earning periodearned_period
- The amount of time (as a fraction of 1) earned in the period
Initiate a sample problemset
# define data
policy_data <- data.frame( policy = c("a","b","c"),
pol_eff = as.Date(c("2018-01-01","2018-07-03","2018-06-30")),
pol_exp = as.Date(c("2018-07-03","2019-10-16","2019-10-15")))
policy_data_by <- c("policy", "pol_eff", "pol_exp")
# increment_period <- "1 quarter"
reporting_date <- as.Date("2019-07-25")
Solution 1: data.table
library(data.table)
library(zoo)
# convert the data.frame to a data.table
# keep records where the pol_eff is before the reporting_date
# create increment_date as a vector of quarterly dates in each row of data up to earlier of pol_exp and reporting_date
# transposed for each combination of policy_data_by
policy_data_dt <- setDT(policy_data)[pol_eff <= reporting_date,
.(increment_date = seq(from = as.Date(as.yearqtr(pol_eff), frac = 0),
to = pmin(pol_exp, reporting_date),
by = "1 quarter")),
by = c(policy_data_by)]
# use the increment_date to define the first and last date of the increment_period
policy_data_dt[, `:=` (increment_start = pmax(as.Date(as.yearqtr(increment_date), frac = 0), pol_eff),
increment_end = pmin(as.Date(as.yearqtr(increment_date), frac = 1), pmin(pol_exp, reporting_date))
)]
# calculate the earned_period
# remove the increment_date variable
policy_data_dt[, `:=` (earned_period = round( as.numeric(increment_end - increment_start + 1) /
as.numeric(pol_exp - pol_eff + 1), 3),
increment_date = NULL
)]
## policy pol_eff pol_exp increment_start increment_end earned_period
## 1: a 2018-01-01 2018-07-03 2018-01-01 2018-03-31 0.489
## 2: a 2018-01-01 2018-07-03 2018-04-01 2018-06-30 0.495
## 3: a 2018-01-01 2018-07-03 2018-07-01 2018-07-03 0.016
## 4: b 2018-07-03 2019-10-16 2018-07-03 2018-09-30 0.191
## 5: b 2018-07-03 2019-10-16 2018-10-01 2018-12-31 0.195
## 6: b 2018-07-03 2019-10-16 2019-01-01 2019-03-31 0.191
## 7: b 2018-07-03 2019-10-16 2019-04-01 2019-06-30 0.193
## 8: b 2018-07-03 2019-10-16 2019-07-01 2019-07-25 0.053
## 9: c 2018-06-30 2019-10-15 2018-06-30 2018-06-30 0.002
## 10: c 2018-06-30 2019-10-15 2018-07-01 2018-09-30 0.195
## 11: c 2018-06-30 2019-10-15 2018-10-01 2018-12-31 0.195
## 12: c 2018-06-30 2019-10-15 2019-01-01 2019-03-31 0.190
## 13: c 2018-06-30 2019-10-15 2019-04-01 2019-06-30 0.192
## 14: c 2018-06-30 2019-10-15 2019-07-01 2019-07-25 0.053
Solution 2: dplyr
library(dplyr)
library(tidyr)
library(zoo)
policy_data_dp <- policy_data %>%
# turn the data.frame into a tibble which is more efficent
as_tibble() %>%
# keep records where the pol_eff is before the reporting_date
filter(pol_eff <= reporting_date) %>%
# set the processing to be done on each row
rowwise() %>%
# create increment_date as a vector of quarterly dates in each row of data up to earlier of pol_exp and reporting_date
mutate(increment_date = list(seq(from = as.Date(as.yearqtr(pol_eff), frac = 0),
to = pmin(pol_exp, reporting_date),
by = "1 quarter"))) %>%
# transposed for each row of data
unnest(increment_date) %>%
# reset grouping
ungroup() %>%
# use the increment_date to define the first and last date of the increment_period
# calculate the earned_period
mutate( increment_start = pmax(as.Date(as.yearqtr(increment_date), frac = 0), pol_eff),
increment_end = pmin(as.Date(as.yearqtr(increment_date), frac = 1), pmin(pol_exp, reporting_date)),
earned_period = round( as.numeric(increment_end - increment_start + 1) /
as.numeric(pol_exp - pol_eff + 1), 3)
) %>%
# remove the increment_date variable
select(-increment_date) %>%
# identity
identity()
## # A tibble: 14 x 6
## policy pol_eff pol_exp increment_start increment_end earned_period
## <chr> <date> <date> <date> <date> <dbl>
## 1 a 2018-01-01 2018-07-03 2018-01-01 2018-03-31 0.489
## 2 a 2018-01-01 2018-07-03 2018-04-01 2018-06-30 0.495
## 3 a 2018-01-01 2018-07-03 2018-07-01 2018-07-03 0.016
## 4 b 2018-07-03 2019-10-16 2018-07-03 2018-09-30 0.191
## 5 b 2018-07-03 2019-10-16 2018-10-01 2018-12-31 0.195
## 6 b 2018-07-03 2019-10-16 2019-01-01 2019-03-31 0.191
## 7 b 2018-07-03 2019-10-16 2019-04-01 2019-06-30 0.193
## 8 b 2018-07-03 2019-10-16 2019-07-01 2019-07-25 0.053
## 9 c 2018-06-30 2019-10-15 2018-06-30 2018-06-30 0.002
## 10 c 2018-06-30 2019-10-15 2018-07-01 2018-09-30 0.195
## 11 c 2018-06-30 2019-10-15 2018-10-01 2018-12-31 0.195
## 12 c 2018-06-30 2019-10-15 2019-01-01 2019-03-31 0.19
## 13 c 2018-06-30 2019-10-15 2019-04-01 2019-06-30 0.192
## 14 c 2018-06-30 2019-10-15 2019-07-01 2019-07-25 0.053