---
title: "Binary ODA: Voting on the Refugee Act of 1980"
author: "oda"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Binary ODA: Voting on the Refugee Act of 1980}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

## Research question

The Refugee Act of 1980 extended asylum protections in U.S. immigration law. Because the Act was sponsored by Democrats, a plausible hypothesis is that support was partisan: Democrats would tend to vote in favor (Pro) and Republicans against (Con).^[Yarnold, B.M. (1990). *Refugees without refuge: Formation and failed implementation of U.S. political asylum policy in the 1980's.* Lanham, MD: University Press of America.]

The data below record the vote of 407 U.S. House members alongside their party affiliation. Optimal Data Analysis (UniODA) is used to determine whether party affiliation discriminates vote direction, and to quantify the strength of the association.

## Data

Party affiliation (0 = Republican, 1 = Democrat) is the attribute; vote (0 = Con, 1 = Pro) is the class variable. Published cell frequencies are reconstructed directly into observation-level vectors - no external data file is required.

```{r data}
library(oda)

# Cross-classification: rows = vote (class), cols = party (attribute).
#            Rep (0)   Dem (1)
# Con (0)      118        78     n(Con) = 196
# Pro (1)       34       177     n(Pro) = 211

vote  <- c(rep(0L, 118), rep(0L, 78), rep(1L, 34), rep(1L, 177))
party <- c(rep(0L, 118), rep(1L, 78), rep(0L, 34), rep(1L, 177))

table(vote, party, dnn = c("Vote (0=Con, 1=Pro)", "Party (0=Rep, 1=Dem)"))
```

## Fit the ODA model

Party affiliation is binary (0/1); ODA scans it as an ordered attribute (no categorical flag), which is consistent with the MegaODA reference analysis. Because the analysis specifies a directional hypothesis *a priori* (Democrats favor the Act, i.e.
higher party values predict vote = Pro), `direction = "greater"` enforces MPE Chapter 2 directional ordered ODA. Leave-one-out (LOO) jackknife validity analysis is included.

```{r fit-canonical, eval=FALSE}
# Canonical reference run (mc_iter = 25000L; not evaluated in CRAN vignette)
fit <- oda_fit(
  x = party,
  y = vote,
  attr_type = "ordered",
  direction = "greater",
  mc_iter = 25000L,
  loo = "on"
)
```

```{r fit}
# CRAN-safe run: mc_iter = 500L for vignette rendering speed.
# Training rule, ESS, and confusion matrix are identical to the canonical run.
# The MC p-value reflects fewer permutations; use the canonical run for publication.
fit <- oda_fit(
  x = party,
  y = vote,
  attr_type = "ordered",
  direction = "greater",
  mc_iter = 500L,
  mc_seed = 42L,
  loo = "on"
)
```

## Rule and confusion matrix

```{r print-fit}
print(fit)
```

ODA identified a single cut at 0.5, separating the two party values:

- If party <= 0.5 (Republican) -> predict vote = Con (0)
- If party > 0.5 (Democrat) -> predict vote = Pro (1)

This rule is consistent with the directional hypothesis: Democrats tended to support the Act and Republicans tended to oppose it.

```{r confusion}
# Confusion matrix: actual vote (rows) x predicted vote (cols)
conf_mat <- matrix(
  c(fit$confusion$TN, fit$confusion$FP,
    fit$confusion$FN, fit$confusion$TP),
  nrow = 2L, byrow = TRUE,
  dimnames = list(Actual = c("Con(0)", "Pro(1)"),
                  Predicted = c("Con(0)", "Pro(1)"))
)
print(conf_mat)
```

## ESS / PAC / PV interpretation

```{r metrics}
summary(fit)
```

```{r pv}
# Predictive value: accuracy when the model makes a prediction into each class
pv_con <- fit$confusion$TN / (fit$confusion$TN + fit$confusion$FN)
pv_pro <- fit$confusion$TP / (fit$confusion$TP + fit$confusion$FP)
cat("PV Con (0):", round(pv_con * 100, 1), "%\n")
cat("PV Pro (1):", round(pv_pro * 100, 1), "%\n")
```

- **PAC (sensitivity per class):** 60.2% of Con votes (118 of 196) and 83.9% of Pro votes (177 of 211) are classified correctly.
  Because 50% correct per class is expected by chance, the model classifies Pro votes about 1.7 times as well as chance (83.9% vs. 50%).
- **ESS = 44.09%** (the mean PAC of 72.05%, rescaled so that 0% is chance and 100% is perfect classification) indicates a moderate effect.^[Yarnold, P.R., & Soltysik, R.C. (2005). *Optimal Data Analysis: A Guidebook with Software for Windows.* Washington, D.C.: APA Books.] The asymmetry (60% vs. 84%) arises because Pro votes came overwhelmingly from Democrats (177 of 211), whereas Con votes were more evenly split between the parties (118 Republicans, 78 Democrats).
- **PV:** When the model predicts a Con vote, it is correct ~77.6% of the time; when it predicts a Pro vote, ~69.4%.

## Monte Carlo and LOO validity

The directional MC p-value and LOO result are shown in the `summary` output above.

- **LOO stability:** The leave-one-out ESS equals the training ESS exactly (44.09%), indicating the rule is completely stable - no single observation materially alters the model.
- **LOO Fisher exact p < .001:** Statistical significance is confirmed in holdout analysis; the one-sided Fisher test is appropriate for the directional hypothesis.
- **MC p-value:** The printed `p(MC)` is a directional Fisher-randomization p-value (one-tailed), consistent with the *a priori* hypothesis that Democrats vote Pro. The directional p-value is at most half the nondirectional p-value. Interpret by decision threshold (e.g., p < 0.05).

## Notes on reproducibility and current scope

**Fixture parity.** The training rule, confusion matrix, and ESS are verified against MegaODA.exe output in the package test suite (`tests/testthat/test-fixture-vignettes.R`, Example 1).

**MC p-value calibration.** The MC p shown here reflects `mc_iter = 500L` for CRAN build speed. MegaODA reports p = 0.000000 (exact zero) at 25000 iterations; with 500 iterations a near-zero p will still be reported accurately (STOP fires early). Use the canonical run with `mc_iter = 25000L` (chunk `fit-canonical`, `eval=FALSE`) for publication-quality results. Training ESS and confusion matrix are unaffected by `mc_iter`.

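
The effect of `mc_iter` on p-value resolution can be illustrated with a plain permutation test. The sketch below uses base R only and is not the oda package's Monte Carlo routine: it has no early stopping, and the helper `ess_directional()` is a hypothetical name introduced here. It assumes that, for a binary attribute under the directional hypothesis, the only rule to evaluate is the cut at 0.5 (party > 0.5 predicts Pro). With `n_iter` permutations the estimated p can only be a multiple of `1 / n_iter`, so 500 iterations resolve p to 0.002 and 25000 iterations to 0.00004.

```r
# Illustrative base-R permutation check; NOT the oda package's internal MC code.
vote  <- c(rep(0L, 118), rep(0L, 78), rep(1L, 34), rep(1L, 177))
party <- c(rep(0L, 118), rep(1L, 78), rep(0L, 34), rep(1L, 177))

# ESS (%) of the fixed directional rule "party > 0.5 -> Pro".
# Hypothetical helper: mean sensitivity rescaled so chance = 0, perfect = 100.
ess_directional <- function(x, y) {
  pred     <- as.integer(x > 0.5)
  sens_con <- mean(pred[y == 0L] == 0L)  # share of Con votes predicted Con
  sens_pro <- mean(pred[y == 1L] == 1L)  # share of Pro votes predicted Pro
  100 * (sens_con + sens_pro - 1)
}

ess_obs <- ess_directional(party, vote)

# Shuffle the class labels and recompute ESS under the same directional rule.
set.seed(42L)
n_iter   <- 500L
ess_perm <- replicate(n_iter, ess_directional(party, sample(vote)))

# One-tailed estimate: share of permutations at least as extreme as observed.
n_extreme  <- sum(ess_perm >= ess_obs)
p_mc       <- n_extreme / n_iter
resolution <- 1 / n_iter                 # smallest nonzero estimate possible

cat("Observed ESS:", round(ess_obs, 2), "%\n")
cat("Permutations >= observed:", n_extreme, "of", n_iter, "\n")
cat("Estimated p:", p_mc, " (resolution", resolution, ")\n")
```

For these data a permuted ESS as large as the observed one should essentially never occur, so the estimate is expected to be 0 at either iteration count; more iterations only tighten the bound on how small the true p could be.
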
**Directional analysis.** This vignette uses `direction = "greater"` (MPE Chapter 2 binary ordered directional ODA). The directional constraint restricts the MC permutation and LOO searches to the hypothesized direction only, yielding a one-tailed p-value consistent with the *a priori* hypothesis. Directional analysis of categorical/table attributes (MPE Chapter 4 DIRECTIONAL) is assigned to Phase 6C and is not yet implemented.
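
The quantities reported in this vignette follow arithmetically from the published cell counts, so they can be cross-checked without the package. The following is a minimal base-R sketch with illustrative variable names. `fisher.test()` from the stats package is used as an independent check of the one-tailed versus two-tailed distinction; it is not a reproduction of the package's LOO or MC output.

```r
# Base-R cross-check from the published 2 x 2 table (no oda calls).
tab <- matrix(
  c(118,  78,
     34, 177),
  nrow = 2L, byrow = TRUE,
  dimnames = list(Vote = c("Con", "Pro"), Party = c("Rep", "Dem"))
)

# Rule under test: Republican -> Con, Democrat -> Pro.
sens_con <- tab["Con", "Rep"] / sum(tab["Con", ])   # PAC for Con votes
sens_pro <- tab["Pro", "Dem"] / sum(tab["Pro", ])   # PAC for Pro votes
ess      <- 100 * (sens_con + sens_pro - 1)         # chance = 0, perfect = 100

pv_con <- tab["Con", "Rep"] / sum(tab[, "Rep"])     # accuracy of "Con" predictions
pv_pro <- tab["Pro", "Dem"] / sum(tab[, "Dem"])     # accuracy of "Pro" predictions

# One-sided test matches the a priori direction (odds ratio > 1 in this layout);
# the two-sided test is shown only for comparison.
p_one <- fisher.test(tab, alternative = "greater")$p.value
p_two <- fisher.test(tab, alternative = "two.sided")$p.value

cat("PAC Con / Pro:", round(100 * sens_con, 1), "/", round(100 * sens_pro, 1), "%\n")
cat("ESS:", round(ess, 2), "%\n")
cat("PV Con / Pro:", round(100 * pv_con, 1), "/", round(100 * pv_pro, 1), "%\n")
cat("Fisher p, one-sided:", format(p_one), " two-sided:", format(p_two), "\n")
```

The one-sided p-value here refers to the training table only; the LOO and MC p-values printed by `summary(fit)` come from the package's own procedures.
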