12 Reproducibility & checkpointing
didgpu’s checkpointing is the difference between a 12-hour bootstrap that resumes after a crash and a 12-hour bootstrap that has to start over.
12.1 Turn it on
Set checkpoint_dir (and a seed, so the run can be reproduced) in the didgpu() call:
fit <- didgpu(
  panel, ...,
  bootstrap_reps = 2000,
  checkpoint_dir = "~/didgpu-runs/2026-05-28-fit",
  seed = 17
)
didgpu writes one small RDS file per bootstrap cell. Each file holds the per-replicate sufficient statistics, not the panel, and is usually a few KB.
12.2 Resume after a crash
Just rerun the same call with the same checkpoint_dir and seed. didgpu reads the existing cells, picks up at the next missing one, and finishes:
fit <- didgpu(
  panel, ...,
  bootstrap_reps = 2000,
  checkpoint_dir = "~/didgpu-runs/2026-05-28-fit", # same path
  seed = 17                                        # same seed
)
No tryCatch, no manual sharding, no “what was I up to?” archaeology.
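To make the "skip what exists, compute what is missing" behaviour concrete, here is a minimal base-R sketch of the pattern. It is not didgpu's implementation: the file naming scheme (cell_00042.rds), the run_cells() helper, and the toy_cell() stand-in are all hypothetical, and the write-then-rename step is simply one common way to avoid half-written files.

```r
# Hypothetical layout: one "cell_<index>.rds" file per replicate.
# didgpu's real file names and cell contents may differ.
cell_path <- function(dir, b) file.path(dir, sprintf("cell_%05d.rds", b))

run_cells <- function(dir, reps, compute_cell) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  written <- integer(0)
  for (b in seq_len(reps)) {
    path <- cell_path(dir, b)
    if (file.exists(path)) next      # already checkpointed, so skip it
    tmp <- paste0(path, ".tmp")
    saveRDS(compute_cell(b), tmp)    # write under a temporary name first
    file.rename(tmp, path)           # rename only once the file is complete
    written <- c(written, b)
  }
  written                            # indices computed by this call
}

# Toy stand-in for the per-replicate work.
toy_cell <- function(b) list(rep = b, stat = b^2)

dir <- file.path(tempdir(), "didgpu-resume-sketch")
unlink(dir, recursive = TRUE)

first   <- run_cells(dir, 3, toy_cell)   # stands in for a run that died after 3 cells
resumed <- run_cells(dir, 10, toy_cell)  # same directory: only cells 4..10 are computed
```

Because the second call checks the directory before doing any work, rerunning it is cheap and idempotent, which is the property the resume workflow above relies on.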
12.3 How “same seed = same result” works
didgpu’s bootstrap uses a deterministic per-cell seed derived from (seed, replicate_index). So:
- Cells produced before the crash are bit-identical to what the resumed run would produce.
- The aggregator sees a complete cell set regardless of which run wrote each one.
- Bootstrap point estimate, SE, CI, and joint p-values are reproducible across crashes.
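The guarantees listed above follow from each cell depending only on (seed, replicate_index), never on how many cells ran before it. A minimal base-R sketch of that idea follows, under stated assumptions: cell_seed() and draw_cell() are hypothetical helpers, and the mixing constant is arbitrary. didgpu's actual derivation is not documented here and may well differ.

```r
# Hypothetical helper: map (seed, replicate_index) to a single integer seed.
# The multiplier is an arbitrary choice for illustration only.
cell_seed <- function(seed, replicate_index) {
  as.integer((seed * 1000003 + replicate_index) %% .Machine$integer.max)
}

# One bootstrap cell: resample unit indices using only that cell's own seed.
# Note that set.seed() resets the global RNG state as a side effect.
draw_cell <- function(seed, replicate_index, n_units) {
  set.seed(cell_seed(seed, replicate_index))
  sample.int(n_units, size = n_units, replace = TRUE)
}

# Cell 42 drawn on its own (as if written before a crash) ...
before_crash <- draw_cell(17, 42, n_units = 50)

# ... matches cell 42 drawn as part of a full pass over cells 1..100.
full_pass <- lapply(1:100, function(b) draw_cell(17, b, n_units = 50))
same <- identical(before_crash, full_pass[[42]])
```

The contrast is with a single RNG stream consumed sequentially, where the draws for cell 42 would depend on every cell computed before it and a resumed run could not reproduce them without replaying the whole stream.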
12.4 Aggregating already-saved cells
If you want to look at a partial run without re-fitting:
partial <- didgpu_aggregate_cells(
  checkpoint_dir = "~/didgpu-runs/2026-05-28-fit"
)
partial
Returns the aggregated fit using whatever cells exist on disk. Useful for sanity-checking convergence while a long run is still going.
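As a rough illustration of aggregating over whatever cells exist, the sketch below builds a toy checkpoint directory with gaps and summarises only the files it finds. Everything about the on-disk format is an assumption: the file names, the scalar est field, and the aggregate_partial() helper are hypothetical, and real cells hold sufficient statistics, not a single estimate.

```r
# Toy checkpoint directory: only 5 cells have been written so far.
# File names and the "est" field are hypothetical, not didgpu's format.
dir <- file.path(tempdir(), "didgpu-partial-sketch")
unlink(dir, recursive = TRUE)
dir.create(dir, recursive = TRUE)
for (b in c(1L, 2L, 3L, 5L, 8L)) {
  saveRDS(list(rep = b, est = 0.1 * b),
          file.path(dir, sprintf("cell_%05d.rds", b)))
}

# Aggregate whichever cells are present; missing indices are simply ignored.
aggregate_partial <- function(dir) {
  files <- list.files(dir, pattern = "\\.rds$", full.names = TRUE)
  est <- vapply(files, function(f) readRDS(f)$est, numeric(1))
  list(n_cells = length(est), boot_se = sd(est))
}

partial_sketch <- aggregate_partial(dir)
partial_sketch$n_cells   # how many replicates the summary is based on
```

Reporting the cell count next to the summary makes it obvious how much of the run a partial aggregate reflects, which matters when judging convergence.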
A worked example (deliberately interrupt a 1000-rep bootstrap, resume it, then compare the result to a fresh run) is in progress.