Install: claude install-skill choxos/BiostatAgent
# Resampling Strategies
## Overview
This guide covers resampling methods for model validation with the rsample package: data splitting, cross-validation, bootstrapping, and specialized resampling for time series and grouped data.
## Data Splitting
### Basic Train/Test Split
```r
library(rsample)
set.seed(123)
# Simple split (75% training)
split <- initial_split(data, prop = 0.75)
# Stratified split (maintain outcome proportions)
split <- initial_split(data, prop = 0.75, strata = outcome)
# Stratified split on a continuous outcome (binned into `breaks` quantile groups)
split <- initial_split(data, prop = 0.75, strata = outcome, breaks = 4)
# Extract sets
train <- training(split)
test <- testing(split)
```
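To see what stratification does in practice, the following is a minimal sketch on the built-in `iris` data. The dataset and the `iris_*` names are illustrative assumptions, not part of the original examples. It compares class counts in the two partitions.

```r
library(rsample)

set.seed(123)

# iris ships with base R; Species has 50 rows per class
iris_split <- initial_split(iris, prop = 0.75, strata = Species)
iris_train <- training(iris_split)
iris_test  <- testing(iris_split)

# Class counts in each partition; with strata the classes stay balanced
table(iris_train$Species)
table(iris_test$Species)
```

Without `strata`, the class counts in a small test set can drift noticeably from the full-data proportions, which is why stratifying is usually worth it for imbalanced outcomes.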
### Three-Way Split (Train/Validation/Test)
```r
# Single validation set
split <- initial_validation_split(data, prop = c(0.6, 0.2))
train <- training(split)
val <- validation(split)
test <- testing(split)
# Wrap the three-way split as an rset (one resample) for use with tuning functions
val_set <- validation_set(split)
```
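As a quick check of how the three-way split and `validation_set()` relate, here is a small sketch using the built-in `mtcars` data. The dataset and variable names are assumptions for illustration, and the calls require a version of rsample that provides `initial_validation_split()`.

```r
library(rsample)

set.seed(123)

# 60% training, 20% validation, remaining 20% test
three_way <- initial_validation_split(mtcars, prop = c(0.6, 0.2))

sizes <- c(
  train      = nrow(training(three_way)),
  validation = nrow(validation(three_way)),
  test       = nrow(testing(three_way))
)
sizes

# validation_set() returns an rset with a single split:
# analysis = training rows, assessment = validation rows
val_rset  <- validation_set(three_way)
val_split <- val_rset$splits[[1]]
nrow(analysis(val_split))
nrow(assessment(val_split))
```

The test set is not part of the resulting rset, so it stays untouched until final evaluation.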
## Cross-Validation
### V-Fold Cross-Validation
```r
# Basic 10-fold CV (train_data is the training set, e.g. training(split))
folds <- vfold_cv(train_data, v = 10)
# Stratified CV
folds <- vfold_cv(train_data, v = 10, strata = outcome)
# Repeated CV
folds <- vfold_cv(train_data, v = 10, repeats = 5, strata = outcome)
# Access individual folds
folds$splits[[1]]
analysis(folds$splits[[1]])   # analysis set: rows used to fit the model
assessment(folds$splits[[1]]) # assessment set: held-out rows used to score it
```
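The `analysis()` and `assessment()` accessors can be combined into a manual cross-validation loop. The sketch below assumes the built-in `mtcars` data and an arbitrary linear model (`mpg ~ wt + hp`), both chosen only for illustration.

```r
library(rsample)

set.seed(123)
folds <- vfold_cv(mtcars, v = 5)

# Fit on each analysis set, then compute RMSE on the matching assessment set
fold_rmse <- vapply(folds$splits, function(s) {
  fit  <- lm(mpg ~ wt + hp, data = analysis(s))
  held <- assessment(s)
  sqrt(mean((held$mpg - predict(fit, newdata = held))^2))
}, numeric(1))

fold_rmse        # one RMSE per fold
mean(fold_rmse)  # cross-validated RMSE estimate
```

In a tidymodels workflow this loop is normally handled by higher-level functions; writing it out by hand is mainly useful for understanding what each fold contains.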
### Leave-One-Out CV
```r
# LOO CV (useful for small datasets)
loo_folds <- loo_cv(train_data)
```
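Each leave-one-out resample holds out exactly one row, so a dataset with n rows yields n splits. A minimal sketch, again assuming the built-in `mtcars` data and an illustrative model:

```r
library(rsample)

loo_folds <- loo_cv(mtcars)
nrow(loo_folds)  # one split per row of mtcars

# Prediction error for the single held-out row of each split
loo_errors <- vapply(loo_folds$splits, function(s) {
  fit  <- lm(mpg ~ wt, data = analysis(s))
  held <- assessment(s)
  unname(held$mpg - predict(fit, newdata = held))
}, numeric(1))

sqrt(mean(loo_errors^2))  # leave-one-out RMSE
```

Because the model is refit once per row, the cost grows linearly with the number of observations, which is why this approach suits small datasets.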
### Mont