---
title: "Getting started with oda"
author: "oda"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with oda}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

`oda` is a pure-R implementation of the MegaODA / CTA classification
engine. It provides three main tools:

- **`oda_fit()`**: Optimal Data Analysis (ODA), univariate binary or
  multiclass classification with a single attribute.
- **`cta_fit()`**: Classification Tree Analysis (CTA), recursive ODA-node
  trees with ENUMERATE, LOO STABLE, and MINDENOM endpoint constraints.
- **Translation and graphics**: endpoint staging, propensity weights, tree
  plots, and NOVOmetric bootstrap CIs via `cta_staging_table()`,
  `cta_propensity_weights()`, `novo_boot_ci()`, `plot.cta_tree()`, and the
  ggplot2 renderers `plot_cta_tree()` / `plot_lort_tree()` (these require
  ggplot2).

## Binary ODA

`oda_fit()` finds the single cutpoint that maximises Mean PAC: the percentage
of accurate classifications (PAC) within each class, averaged across classes,
with inverse-frequency weighting when `priors_on = TRUE`. For a two-class
problem, ESS (Effect Strength for Sensitivity) measures how far Mean PAC
exceeds the chance benchmark of 50%, rescaled so that 0 is chance and 100 is
perfect classification.

```{r oda-binary}
library(oda)

x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L)

fit <- oda_fit(x, y, mcarlo = FALSE)
print(fit)
```

The fitted rule, per-class PAC, Mean PAC, and ESS are stored on the fitted
object. For example:

```{r oda-fields}
fit$rule$cut_value    # cutpoint
fit$rule$direction    # which side maps to class 1
fit$ess               # ESS (%)
```
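
The definitions in this section can be checked by hand in base R. The chunk
below is a minimal sketch, not the package's internal code: the data, the
cutpoint of 4.5, and the names `x_demo`, `y_demo`, and `cut_demo` are
illustrative assumptions, and the ESS line assumes the usual chance-corrected
rescaling with a chance benchmark of 100 / (number of classes).

```{r ess-by-hand}
# Hand-computed per-class PAC, Mean PAC, and ESS for an assumed cutpoint rule.
# x_demo, y_demo, and cut_demo are illustrative names, not part of oda.
x_demo   <- c(1, 2, 3, 4, 5, 6, 7, 8)
y_demo   <- c(0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L)
cut_demo <- 4.5                         # assumed cutpoint between 4 and 5

pred <- as.integer(x_demo > cut_demo)   # predict class 1 above the cut

# Per-class PAC: share of each actual class that is classified correctly (%)
pac <- tapply(pred == y_demo, y_demo, mean) * 100

mean_pac <- mean(pac)                   # unweighted mean across classes
chance   <- 100 / length(pac)           # 50% for two classes
ess_demo <- (mean_pac - chance) / (100 - chance) * 100

pac
mean_pac
ess_demo
```

With these data each class has a PAC of 75%, so Mean PAC is 75% and ESS is
50%, halfway between chance and perfect classification.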

## Classification Tree Analysis

`cta_fit()` grows a binary classification tree in which each split node is an
ODA model. `mindenom` sets the minimum leaf size; `mc_iter` controls the Monte
Carlo permutation test used to screen candidate splits; `loo = "stable"` keeps
only splits whose leave-one-out ESS matches the training ESS.
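
The Monte Carlo screening step can be pictured as a label-permutation test.
The chunk below is a generic base-R illustration under stated assumptions, not
the procedure `cta_fit()` actually runs: the helpers `best_ess()` and
`perm_p()` are hypothetical, the class variable is assumed to be binary, and
candidate cuts are taken to be the midpoints between adjacent attribute values.

```{r mc-permutation-sketch}
# Hypothetical helper: best two-class ESS over all midpoint cuts.
best_ess <- function(x, y) {
  classes <- sort(unique(y))
  xs      <- sort(unique(x))
  stopifnot(length(classes) == 2L, length(xs) >= 2L)
  cuts <- (xs[-1] + xs[-length(xs)]) / 2    # midpoints between adjacent values
  ess_at <- function(cut) {
    hi       <- x > cut
    pac_hi   <- mean(hi[y == classes[2]])   # second class predicted above the cut
    pac_lo   <- mean(!hi[y == classes[1]])  # first class predicted below the cut
    mean_pac <- 100 * (pac_hi + pac_lo) / 2
    abs(mean_pac - 50) / 50 * 100           # abs() covers the reversed direction
  }
  max(vapply(cuts, ess_at, numeric(1)))
}

# Hypothetical helper: share of label permutations whose best ESS is at least
# as large as the observed one.
perm_p <- function(x, y, iter = 500L, seed = 42L) {
  set.seed(seed)
  observed <- best_ess(x, y)
  permuted <- replicate(iter, best_ess(x, sample(y)))
  mean(permuted >= observed)
}

x_mc <- c(1, 2, 3, 4, 5, 6, 7, 8)
y_mc <- c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L)

best_ess(x_mc, y_mc)
perm_p(x_mc, y_mc)
```

More iterations give a finer-grained and less noisy p-value estimate, which is
one way to read the recommendation to raise `mc_iter` for published work.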

The example below uses a small synthetic dataset, a reduced `mc_iter = 500L`,
and LOO screening switched off (`loo = "off"`). For publication analyses, use
at least `mc_iter = 5000L`.

```{r cta-small}
X <- data.frame(
  x1 = c(1, 2, 3, 4, 5, 6, 7, 8),
  x2 = c(0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L)
)
y <- c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L)

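tree <- cta_fit(X, y,
  priors_on  = TRUE,            # inverse-frequency class weighting
  mindenom   = 2L,              # minimum leaf size
  mc_iter    = 500L,            # Monte Carlo iterations
  mc_seed    = 42L,             # fixed seed for a reproducible permutation test
  loo        = "off",           # no leave-one-out screening in this example
  attr_names = c("x1", "x2")    # labels for the columns of X
)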
print(tree)
```

```{r cta-plot, fig.width = 5, fig.height = 3.5}
plot(tree)
```
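
The idea behind `loo = "stable"`, comparing leave-one-out ESS with training
ESS, can be sketched for a single cutpoint rule in base R. This is an
illustration under stated assumptions rather than the package's algorithm: the
helpers `predict_cut()`, `ess_binary()`, and `fit_cut()` are hypothetical, the
class variable is assumed to be coded 0/1, and ties between cuts go to the
first candidate.

```{r loo-sketch}
# Hypothetical helpers for a 0/1 class variable; none of these are oda functions.
predict_cut <- function(x, cut, above_is_1) {
  as.integer(if (above_is_1) x > cut else x <= cut)
}

ess_binary <- function(pred, y) {
  pac <- c(mean(pred[y == 0L] == 0L), mean(pred[y == 1L] == 1L)) * 100
  (mean(pac) - 50) / 50 * 100
}

# Try every midpoint cut in both directions and keep the highest training ESS.
fit_cut <- function(x, y) {
  xs   <- sort(unique(x))
  cuts <- (xs[-1] + xs[-length(xs)]) / 2
  grid <- expand.grid(cut = cuts, above_is_1 = c(TRUE, FALSE))
  grid$ess <- mapply(
    function(cut, above_is_1) ess_binary(predict_cut(x, cut, above_is_1), y),
    grid$cut, grid$above_is_1
  )
  grid[which.max(grid$ess), ]   # first maximum wins on ties
}

x_loo <- c(1, 2, 3, 4, 5, 6, 7, 8)
y_loo <- c(0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L)

train_ess <- fit_cut(x_loo, y_loo)$ess

# Refit without observation i, then classify the held-out observation.
loo_pred <- vapply(seq_along(x_loo), function(i) {
  rule <- fit_cut(x_loo[-i], y_loo[-i])
  predict_cut(x_loo[i], rule$cut, rule$above_is_1)
}, integer(1))
loo_ess <- ess_binary(loo_pred, y_loo)

c(training = train_ess, loo = loo_ess)
isTRUE(all.equal(train_ess, loo_ess))   # TRUE would count as "stable" here
```

A rule whose held-out ESS falls below its training ESS is fitting noise in at
least some folds, which is the kind of split a stability constraint is meant
to screen out.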

## Further reading

The public pkgdown article series currently focuses on stable,
publication-ready workflows:

| Article | Content |
|---------|---------|
| ODA basics | `oda_fit()` in depth: rule, PAC, ESS, LOO |
| Directional ODA | Directional binary ODA and fixed/identity-map categorical examples |
| Multiclass ODA | Class variables with C >= 3 categories; PAC per class; K-class chance benchmark |
| CTA basics | `cta_fit()`: MINDENOM, LOO STABLE, ENUMERATE, pruning |
| CTA graphics | `cta_plot_data()` data contract; `plot.cta_tree()` and `plot_cta_tree()` renderers |
| NOVOboot CI | `novo_boot_ci()`: fixed-confusion NOVOmetric bootstrap for binary 2 x 2 confusion tables |

The example-driven package vignettes are available after installing with
`build_vignettes = TRUE` and running `browseVignettes("oda")`:

| Vignette | Content |
|----------|---------|
| Binary ODA: Migraine Attacks | Binary ordered ODA example |
| Binary Ordered Directional ODA: The Refugee Act | Directional ordered ODA example |
| Binary Categorical ODA: Gully Erosion | Nondirectional MegaODA parity run for the gully example |
| Multiclass Categorical ODA: Protein Type | Nondirectional MegaODA parity run for the protein example |

Note: Some longer-form draft articles, including the full practitioner guide,
validation tiers, CTA translation, MDSA family, and myeloma CTA walkthrough,
are intentionally withheld from the public site until their examples and
artifacts are finalized.