---
title: "Getting started with oda"
author: "oda"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with oda}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

`oda` is a pure-R implementation of the MegaODA / CTA classification engine.
It provides three main tools:

- **`oda_fit()`** - Optimal Data Analysis (ODA): univariate binary or
  multiclass classification with a single attribute.
- **`cta_fit()`** - Classification Tree Analysis (CTA): recursive ODA-node
  trees with ENUMERATE, LOO STABLE, and MINDENOM endpoint constraints.
- **Translation and graphics** - endpoint staging, propensity weights, tree
  plots, and NOVOmetric bootstrap CIs via `cta_staging_table()`,
  `cta_propensity_weights()`, `novo_boot_ci()`, `plot.cta_tree()`, and the
  ggplot2 renderers `plot_cta_tree()` / `plot_lort_tree()` (requires ggplot2).

## Binary ODA

`oda_fit()` finds the single cutpoint that maximises Mean PAC (percentage of
accurate classifications, averaged across classes with inverse-frequency
weighting when `priors_on = TRUE`). ESS (Effect Strength for Sensitivity)
measures how far Mean PAC exceeds the chance benchmark of 50%.

```{r oda-binary}
library(oda)

x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L)

fit <- oda_fit(x, y, mcarlo = FALSE)
print(fit)
```

The fitted rule, per-class PAC, Mean PAC, and ESS are available directly:

```{r oda-fields}
fit$rule$cut_value   # cutpoint
fit$rule$direction   # which side maps to class 1
fit$ess              # ESS (%)
```

## Classification Tree Analysis

`cta_fit()` grows a binary classification tree in which each split node is an
ODA model. `mindenom` sets the minimum leaf size; `mc_iter` controls the Monte
Carlo permutation test used to screen candidate splits; `loo = "stable"` keeps
only splits whose leave-one-out ESS matches the training ESS.

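
To give a feel for what a Monte Carlo permutation test on a candidate split involves, here is a minimal base-R sketch. It is not the code `oda` runs internally: the helpers `ess_at_cut()`, `best_ess()`, and `perm_p_value()` are hypothetical names introduced only for this illustration, the outcome is assumed to be coded 0/1, and the p-value is taken as the simple proportion of permutations that match or beat the observed ESS.

```r
# Hypothetical helper (not part of oda): ESS (%) of the rule
# "predict class 1 when x > cut" for an outcome coded 0/1.
ess_at_cut <- function(x, y, cut) {
  pred <- as.integer(x > cut)
  pac1 <- mean(pred[y == 1L] == 1L)   # PAC for class 1
  pac0 <- mean(pred[y == 0L] == 0L)   # PAC for class 0
  mean_pac <- 100 * (pac1 + pac0) / 2
  (mean_pac - 50) / 50 * 100          # 0 = chance, 100 = perfect
}

# Hypothetical helper: best ESS over all midpoints between adjacent
# unique values. abs() covers both rule directions, because flipping
# the direction flips the sign of ESS.
best_ess <- function(x, y) {
  ux <- sort(unique(x))
  cuts <- (head(ux, -1) + tail(ux, -1)) / 2
  max(abs(vapply(cuts, function(k) ess_at_cut(x, y, k), numeric(1))))
}

# Hypothetical helper: shuffle the class labels n_iter times and count
# how often the shuffled data reach the observed best ESS.
perm_p_value <- function(x, y, n_iter = 500L, seed = 42L) {
  set.seed(seed)
  observed <- best_ess(x, y)
  permuted <- replicate(n_iter, best_ess(x, sample(y)))
  mean(permuted >= observed)
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L)

obs_ess <- best_ess(x, y)
p_value <- perm_p_value(x, y)
```

With these eight observations, only 2 of the 70 possible label arrangements separate the classes perfectly, so the estimate should land near 2/70 (about 0.03). More iterations reduce the Monte Carlo error of that estimate.
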
The example below uses a small synthetic dataset. For publication analyses use
at least `mc_iter = 5000L`.

```{r cta-small}
X <- data.frame(
  x1 = c(1, 2, 3, 4, 5, 6, 7, 8),
  x2 = c(0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L)
)
y <- c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L)

tree <- cta_fit(X, y,
  priors_on  = TRUE,
  mindenom   = 2L,
  mc_iter    = 500L,
  mc_seed    = 42L,
  loo        = "off",
  attr_names = c("x1", "x2")
)
print(tree)
```

```{r cta-plot, fig.width = 5, fig.height = 3.5}
plot(tree)
```

## Further reading

The public pkgdown article series currently focuses on stable,
publication-ready workflows:

| Article | Content |
|---------|---------|
| ODA basics | `oda_fit()` in depth: rule, PAC, ESS, LOO |
| Directional ODA | Directional binary ODA and fixed/identity-map categorical examples |
| Multiclass ODA | C >= 3 class variables; PAC per class; K-class chance benchmark |
| CTA basics | `cta_fit()`: MINDENOM, LOO STABLE, ENUMERATE, pruning |
| CTA graphics | `cta_plot_data()` data contract; `plot.cta_tree()` and `plot_cta_tree()` renderers |
| NOVOboot CI | `novo_boot_ci()`: fixed-confusion NOVOmetric bootstrap for binary 2 x 2 confusion tables |

The example-driven package vignettes are available after installing with
`build_vignettes = TRUE` and running `browseVignettes("oda")`:

| Vignette | Content |
|----------|---------|
| Binary ODA: Migraine Attacks | Binary ordered ODA example |
| Binary Ordered Directional ODA: The Refugee Act | Directional ordered ODA example |
| Binary Categorical ODA: Gully Erosion | Nondirectional MegaODA parity run for the gully example |
| Multiclass Categorical ODA: Protein Type | Nondirectional MegaODA parity run for the protein example |

Note: Some longer-form draft articles, including the full practitioner guide,
validation tiers, CTA translation, MDSA family, and myeloma CTA walkthrough,
are intentionally withheld from the public site until their examples and
artifacts are finalized.