---
title: "Validation against the survey package"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Validation against the survey package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(weightflow)
has_survey <- requireNamespace("survey", quietly = TRUE)
```

weightflow's calibration is meant to reproduce the established results of the
`survey` package on the methods they share: raking, post-stratification and
linear (GREG) calibration. On top of these it adds the staged cascade
(eligibility, nonresponse, selection) and a recipe-aware bootstrap. This
vignette checks that agreement directly: given the same starting weights and
the same control totals, the two packages should return the same weights up to
numerical tolerance. Each comparison below reports the maximum absolute
difference between the two weight vectors.

To make every unit comparable one-to-one, the recipes below use **only** the
calibration step (no dropping or nonresponse), so no rows are removed.

```{r data}
d <- sample_survey
N <- nrow(population)
```

## Post-stratification

Post-stratifying to the population counts of `region`: each region's weights
are rescaled so the weighted count matches the known total.

```{r poststratify, eval = has_survey}
library(survey)

# weightflow
wf <- weighting_spec(d, base_weights = pw) |>
  step_calibrate(method = "poststratify",
                 margins = list(region = c(table(population$region)))) |>
  prep()
w_wf <- wf$final_weight

# survey
des <- svydesign(ids = ~1, weights = ~pw, data = d)
pr <- data.frame(region = names(table(population$region)),
                 Freq = as.numeric(table(population$region)))
des_ps <- postStratify(des, ~region, pr)
w_sv <- weights(des_ps)

c(max_abs_weight_diff = max(abs(w_wf - w_sv)))
```

## Raking

Raking (iterative proportional fitting) to the `region` and `sex` margins. We
tighten `survey`'s convergence settings so both packages solve the system to
the same precision.

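
For readers unfamiliar with iterative proportional fitting, the idea can be
sketched in a few lines of base R. This is only an illustration under stated
assumptions: the toy data and targets are invented, `rake_sketch()` is a
hypothetical helper, and it is not the implementation used by weightflow or
`survey`. It cycles over the margins, rescaling the weights within each
category to hit that category's target, and stops once the weights no longer
change.

```{r ipf-sketch}
# Toy sample and control totals, invented for illustration
toy <- data.frame(
  region = c("N", "N", "S", "S", "S", "N"),
  sex    = c("F", "M", "F", "M", "F", "M"),
  pw     = c(10, 12, 8, 9, 11, 10)
)
toy_targets <- list(
  region = c(N = 40, S = 60),
  sex    = c(F = 55, M = 45)
)

# Hypothetical helper: one pass rescales the weights margin by margin;
# passes repeat until the largest weight change falls below `tol`
rake_sketch <- function(data, w, targets, tol = 1e-10, maxit = 100) {
  for (iter in seq_len(maxit)) {
    w_old <- w
    for (v in names(targets)) {
      category <- as.character(data[[v]])
      current  <- ave(w, category, FUN = sum)  # weighted count of each unit's category
      w <- w * targets[[v]][category] / current
    }
    if (max(abs(w - w_old)) < tol) break
  }
  unname(w)
}

toy_w <- rake_sketch(toy, toy$pw, toy_targets)

# Both margins should now match their targets
tapply(toy_w, toy$region, sum)
tapply(toy_w, toy$sex, sum)
```

The stopping rule is why the tolerance matters in a comparison: two correct
implementations that stop at different precisions will differ slightly in the
last digits. The package comparison for the same method follows.
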
```{r raking, eval = has_survey}
# weightflow
wf <- weighting_spec(d, base_weights = pw) |>
  step_calibrate(method = "raking",
                 margins = list(region = c(table(population$region)),
                                sex = c(table(population$sex)))) |>
  prep()
w_wf <- wf$final_weight

# survey (tight epsilon so it fully converges, like weightflow)
des <- svydesign(ids = ~1, weights = ~pw, data = d)
ps <- data.frame(sex = names(table(population$sex)),
                 Freq = as.numeric(table(population$sex)))
des_rk <- rake(des, list(~region, ~sex), list(pr, ps),
               control = list(epsilon = 1e-10, maxit = 100))
w_sv <- weights(des_rk)

c(max_abs_weight_diff = max(abs(w_wf - w_sv)))
```

## Linear (GREG) calibration

Linear calibration to the totals of the design matrix of `~ region + sex`,
including the intercept (the population size `N`).

```{r greg, eval = has_survey}
totals <- colSums(model.matrix(~ region + sex, population))

# weightflow
wf <- weighting_spec(d, base_weights = pw) |>
  step_calibrate(method = "linear", formula = ~ region + sex,
                 totals = totals) |>
  prep()
w_wf <- wf$final_weight

# survey
des <- svydesign(ids = ~1, weights = ~pw, data = d)
des_cal <- calibrate(des, ~ region + sex, population = totals,
                     calfun = "linear")
w_sv <- weights(des_cal)

c(max_abs_weight_diff = max(abs(w_wf - w_sv)))
```

## Same estimates

The agreement carries over to estimates.
A calibrated total of a survey outcome matches between the two packages:

```{r estimate, eval = has_survey}
wf <- weighting_spec(d, base_weights = pw) |>
  step_calibrate(method = "raking",
                 margins = list(region = c(table(population$region)),
                                sex = c(table(population$sex)))) |>
  prep()
total_wf <- sum(wf$final_weight * d$employed, na.rm = TRUE)

des <- svydesign(ids = ~1, weights = ~pw, data = d)
des_rk <- rake(des, list(~region, ~sex), list(pr, ps),
               control = list(epsilon = 1e-10, maxit = 100))
total_sv <- as.numeric(svytotal(~employed, des_rk, na.rm = TRUE))

c(weightflow = total_wf, survey = total_sv, difference = total_wf - total_sv)
```

## What weightflow adds

The point of agreement is trust: where the methods overlap, weightflow returns
the same weights and estimates as `survey`, up to numerical tolerance. On top
of that shared core, weightflow contributes two things. The first is the
**staged cascade**: unknown eligibility, ineligible dropping, within-household
selection, and person- or household-level nonresponse, each as a pipeable step
with diagnostics. The second is a **bootstrap that re-applies the whole
recipe** on each replicate, so the variance reflects every adjustment (see the
*Variance estimation* article). For design-based inference you can always
export the final weights back to `survey`/`srvyr`.
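
To make "re-applies the whole recipe" concrete, here is a minimal base-R
sketch of the idea. It is not weightflow's bootstrap: the toy data, the
targets and the helper `apply_recipe()` are all hypothetical, the "recipe" is a
single post-stratification step, and the resampling is a naive
with-replacement draw of units that ignores any strata or clusters. The point
it illustrates is only that the adjustment is recomputed inside every
replicate rather than applied once and held fixed.

```{r bootstrap-sketch}
set.seed(1)

# Toy sample and region totals, invented for illustration
boot_toy <- data.frame(
  region   = rep(c("N", "S"), times = c(30, 20)),
  employed = rbinom(50, 1, 0.6),
  pw       = 2
)
boot_targets <- c(N = 70, S = 30)

# Hypothetical one-step recipe: post-stratify to region, then estimate a total
apply_recipe <- function(data) {
  region <- as.character(data$region)
  w <- data$pw * boot_targets[region] / ave(data$pw, region, FUN = sum)
  sum(w * data$employed)
}

boot_totals <- replicate(200, {
  idx <- sample(nrow(boot_toy), replace = TRUE)  # resample units
  apply_recipe(boot_toy[idx, ])                  # redo the adjustment on the replicate
})

c(estimate = apply_recipe(boot_toy), boot_se = sd(boot_totals))
```
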