Derivation of weights

Author

Thomas E. Metherell

Dependencies

library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(ggplot2)
Warning: package 'ggplot2' was built under R version 4.4.1
library(knitr)
Warning: package 'knitr' was built under R version 4.4.2
library(mice)

Attaching package: 'mice'
The following object is masked from 'package:stats':

    filter
The following objects are masked from 'package:base':

    cbind, rbind
library(targets)
Warning: package 'targets' was built under R version 4.4.2
imputed_data <- tar_read(weighting_data_imputed)
weights_raw <- tar_read(exclusion_weights_raw)
weights_final <- tar_read(exclusion_weights_truncated)

Establishing multiple imputation convergence

Below are the trace plots from the multiple imputation carried out for the purpose of deriving weights.

plot(imputed_data)

Raw weights distributions

The histograms below show the distributions of raw derived weights. Below that, some extreme percentiles (of participants included in analysis) are printed. The distributions of participants included in the analysis are shown in blue, and those of excluded participants in red.

Period 1

Density plot

data_raw_1 <- data.frame(weight = weights_raw$period1$weights, included = weights_raw$period1$included)
data_raw_1 <- data_raw_1 %>% mutate(included = case_match(included, 1 ~ "Yes", 0 ~ "No"))

ggplot(data = data_raw_1, mapping = aes(x = weight, y = after_stat(density))) +
  stat_density(fill = "grey") +
  stat_density(mapping = aes(fill = included), alpha = 0.5) +
  scale_x_continuous(limits = c(1, 100), transform = "log10") +
  labs(x = "Weight (truncated)", y = "Density", fill = "Included?")
Warning: Removed 2931 rows containing non-finite outside the scale range
(`stat_density()`).
Removed 2931 rows containing non-finite outside the scale range
(`stat_density()`).

Quantiles (included participants only)

quantile(data_raw_1$weight[data_raw_1$included == "Yes"], seq(0.9, 1, 0.01)) |> kable()
x
90% 2.175036
91% 2.248922
92% 2.321564
93% 2.421259
94% 2.525850
95% 2.685853
96% 2.979170
97% 5.727629
98% 7.703989
99% 11.180924
100% 486.939526

Period 2

Density plot

data_raw_2 <- data.frame(weight = weights_raw$period2$weights, included = weights_raw$period2$included)
data_raw_2 <- data_raw_2 %>% mutate(included = case_match(included, 1 ~ "Yes", 0 ~ "No"))

ggplot(data = data_raw_2, mapping = aes(x = weight, y = after_stat(density))) +
  stat_density(fill = "grey") +
  stat_density(mapping = aes(fill = included), alpha = 0.5) +
  scale_x_continuous(limits = c(1, 100), transform = "log10") +
  labs(x = "Weight (truncated)", y = "Density", fill = "Included?")
Warning: Removed 2606 rows containing non-finite outside the scale range
(`stat_density()`).
Removed 2606 rows containing non-finite outside the scale range
(`stat_density()`).

Quantiles (included participants only)

quantile(data_raw_2$weight[data_raw_2$included == "Yes"], seq(0.9, 1, 0.01)) |> kable()
x
90% 2.040035
91% 2.098268
92% 2.178179
93% 2.321083
94% 2.599574
95% 4.105450
96% 4.735356
97% 5.583270
98% 6.695225
99% 21.900281
100% 165.648023

On the basis of the above distributions we elected to truncate weights at the 98th percentile (i.e. any values above the 98th percentile were reduced to the value at the 98th percentile). This is to avoid certain extreme predictions dominating the results.

Truncated weights distributions

Below, the distributions of weights as truncated are shown.

Period 1

data_final_1 <- data.frame(weight = weights_final$period1$weights, included = weights_final$period1$included)
data_final_1 <- data_final_1 %>% mutate(included = case_match(included, 1 ~ "Yes", 0 ~ "No"))

ggplot(data = data_final_1, mapping = aes(x = weight, y = after_stat(density))) +
  stat_density(fill = "grey") +
  stat_density(mapping = aes(fill = included), alpha = 0.5) +
  scale_x_log10() +
  labs(x = "Weight (truncated)", y = "Density", fill = "Included?")

Period 2

data_final_2 <- data.frame(weight = weights_final$period2$weights, included = weights_final$period2$included)
data_final_2 <- data_final_2 %>% mutate(included = case_match(included, 1 ~ "Yes", 0 ~ "No"))

ggplot(data = data_final_2, mapping = aes(x = weight, y = after_stat(density))) +
  stat_density(fill = "grey") +
  stat_density(mapping = aes(fill = included), alpha = 0.5) +
  scale_x_log10() +
  labs(x = "Weight (truncated)", y = "Density", fill = "Included?")