---
title: "Tutorial: Creating FFTs for heart disease"
author: "Nathaniel Phillips and Hansjörg Neth"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: fft.bib
csl: apa.csl
vignette: >
  %\VignetteIndexEntry{Tutorial: Creating FFTs for heart disease}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, echo = FALSE}
knitr::opts_chunk$set(collapse = FALSE, 
                      comment = "#>", 
                      prompt = FALSE,
                      tidy = FALSE,
                      echo = TRUE, 
                      message = FALSE,
                      warning = FALSE,
                      # Default figure options:
                      dpi = 100, 
                      fig.align = 'center', 
                      fig.height = 6.0, 
                      fig.width  = 6.5, 
                      out.width = "580px")
```


```{r pkgs, echo = FALSE, message = FALSE, results = 'hide'}
library(FFTrees)
```


## Tutorial: Creating FFTs for heart disease

This tutorial on using the **FFTrees** package follows the examples presented in @phillips2017FFTrees (freely available in [html](https://journal.sjdm.org/17/17217/jdm17217.html) | [PDF](https://journal.sjdm.org/17/17217/jdm17217.pdf)):

- Phillips, N. D., Neth, H., Woike, J. K. & Gaissmaier, W. (2017). 
FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees. 
_Judgment and Decision Making_, _12_ (4), 344–368.

In the following, we explain how to use **FFTrees** to create, evaluate and visualize FFTs in four simple steps. 


### Step\ 1: Install and load the FFTrees package

We can install FFTrees from CRAN using `install.packages()`. (We only need to do this once.) 

```{r install-pkg, eval = FALSE}
# Install the package from CRAN:
install.packages("FFTrees")
```

To use the package, we first need to load it into our current R session. 
We load the package using `library()`: 

```{r load-pkg-2, eval = TRUE, message = TRUE}
# Load the package:
library(FFTrees)
```

The **FFTrees** package contains several vignettes (like this one) that guide users through the package's functionality. 
To open the main guide, run `FFTrees.guide()`: 

```{r load-guide, eval = FALSE}
# Open the main package guide: 
FFTrees.guide()
```


### Step\ 2: Create FFTs from training data (and test on testing data)

In this example, we will create FFTs from a heart disease data set. 
The training data are in an object called `heart.train`, and the testing data are in an object called `heart.test`. For these data, we will predict `diagnosis`, a binary criterion that indicates whether each patient has or does not have heart disease (i.e., is at high or low risk).

To create an `FFTrees` object, we use the function `FFTrees()` with two main arguments: 

1. `formula` expects a formula indicating a binary criterion variable as a function of one or more predictor variable(s) to be considered for the tree. The shorthand `formula = diagnosis ~ .` means to include all predictor variables. 

2. `data` specifies the training data used to construct the FFTs (which must include the criterion variable). 

Here is how we can construct our first FFTs: 

```{r fft-create, message = FALSE}
# Create an FFTrees object:
heart.fft <- FFTrees(formula = diagnosis ~ .,           # Criterion and (all) predictors
                     data = heart.train,                # Training data
                     data.test = heart.test,            # Testing data
                     main = "Heart Disease",            # General label
                     decision.labels = c("Low-Risk", "High-Risk")  # Decision labels (False/True)
                     )
```

Evaluating this expression runs code that examines the data, optimizes thresholds for each cue based on our current goals, and creates and evaluates `r heart.fft$trees$n`\ FFTs. 
The resulting `FFTrees` object, which contains the tree definitions, their decisions, and their performance statistics, is assigned to the `heart.fft`\ object.

#### Other arguments

- `algorithm`: Two algorithms are available for building FFTs: `"ifan"` [@phillips2017FFTrees] and `"dfan"` [@phillips2017FFTrees]. (`"max"` [@martignon2008categorization] and `"zigzag"` [@martignon2008categorization] are no longer supported.) 

- `max.levels`: Changes the maximum number of levels that are allowed in the tree.

The following arguments apply when using the `"ifan"` or `"dfan"` algorithms for creating new FFTs:

- `goal.chase`: The `goal.chase` argument changes which statistic is maximized during tree _construction_. Possible values include `"acc"`, `"bacc"`, `"wacc"`, `"dprime"`, and `"cost"`. The default is `"wacc"` with a sensitivity weight of\ 0.50 (which renders it identical to `"bacc"`). 

- `goal`: The `goal` argument changes which statistic is maximized when _selecting_ trees after construction. Possible values include `"acc"`, `"bacc"`, `"wacc"`, `"dprime"`, and `"cost"`.

- `my.tree` or `tree.definitions`: We can define a new tree from a verbal description (as a set of sentences), 
or manually specify sets of FFTs as a data frame (in appropriate format). 
See the [Manually specifying FFTs](FFTrees_mytree.html) vignette for details. 
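
The arguments above can be combined in a single call to `FFTrees()`. The following is a minimal sketch rather than a recommended setting: the object name `heart.alt` and the specific values chosen for `max.levels`, `goal.chase`, and `goal` are assumptions made for illustration only. 

```{r fft-create-alt, eval = FALSE}
# A sketch: create FFTs with non-default settings (illustrative values only):
heart.alt <- FFTrees(formula = diagnosis ~ .,     # Criterion and (all) predictors
                     data = heart.train,          # Training data
                     data.test = heart.test,      # Testing data
                     algorithm = "dfan",          # Algorithm used to build FFTs
                     max.levels = 3,              # Maximum number of levels per tree
                     goal.chase = "bacc",         # Statistic maximized when constructing trees
                     goal = "acc"                 # Statistic maximized when selecting trees
                     )
```

Printing `heart.alt` and comparing it with `heart.fft` shows how these choices change the resulting set of FFTs. 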


### Step\ 3: Inspect and summarize FFTs

Now we can inspect and summarize the generated decision trees. 
We will start by printing the `FFTrees` object to return basic information to the console: 

```{r fft-print}
# Print an FFTrees object:
heart.fft
```

The output tells us several pieces of information:

- The tree with the highest weighted accuracy\ `wacc` (with a sensitivity weight of\ 0.5) is selected as the best tree.

- Here, the best tree (FFT\ \#1) uses three cues: `thal`, `cp`, and `ca`.

- Several summary statistics for this tree are reported for both training and test data.

All statistics to evaluate each tree can be derived from a 2\ x\ 2 confusion table:  

```{r fft-confusion-table, out.width="50%", echo = FALSE, fig.cap = "**Table 1**: A 2x2 confusion table illustrating the types of frequency counts for 4 possible outcomes."}
knitr::include_graphics("confusiontable.jpg")
```

For definitions of all accuracy statistics, see the [accuracy statistics](FFTrees_accuracy_statistics.html) vignette.
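
To illustrate how such statistics follow from the four frequency counts, here is a minimal sketch in base\ R. The counts are hypothetical (they are not taken from the heart disease data), and the variable names `hi`, `mi`, `fa`, and `cr` are simply shorthand for hits, misses, false alarms, and correct rejections. 

```{r fft-confusion-sketch}
# Hypothetical frequency counts of a 2 x 2 confusion table:
hi <- 60  # hits: true cases classified as positive
mi <- 10  # misses: true cases classified as negative
fa <- 20  # false alarms: non-cases classified as positive
cr <- 60  # correct rejections: non-cases classified as negative

n <- hi + mi + fa + cr      # total number of cases

sens <- hi / (hi + mi)      # sensitivity
spec <- cr / (cr + fa)      # specificity
acc  <- (hi + cr) / n       # overall accuracy
bacc <- (sens + spec) / 2   # balanced accuracy

sens.w <- 0.5                                # sensitivity weight
wacc <- sens.w * sens + (1 - sens.w) * spec  # weighted accuracy

round(c(sens = sens, spec = spec, acc = acc, bacc = bacc, wacc = wacc), 3)
```

With a sensitivity weight of\ 0.5, `wacc` and `bacc` are identical; other weights shift the emphasis towards sensitivity or specificity. 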


### Step\ 4: Visualize the final FFT

We use `plot(x)` to visualize an FFT (from an\ `FFTrees` object\ `x`). 
Using `data = "train"` evaluates an\ FFT for training data (fitting), whereas `data = "test"` predicts the performance of an\ FFT for a different dataset: 

```{r fft-plot, fig.width = 6.5, fig.height = 6}
# Plot predictions of the best FFT when applied to test data:
plot(heart.fft,      # An FFTrees object
     data = "test")  # data to use (i.e., either "train" or "test")?
```

#### Other arguments

The `plot()` function for `FFTrees` objects accepts several other arguments: 

- `tree`: Which tree in the object should be plotted? To plot a tree other than the best-fitting tree (FFT \#1), specify another tree as an integer (e.g., `plot(heart.fft, tree = 2)`).

- `data`: For which dataset should statistics be shown? 
Either `data = "train"` (showing fitting or "Training" performance by default), or `data = "test"` (showing prediction or "Testing" performance).

- `stats`: Should accuracy statistics be shown with the tree? This argument has been deprecated: to show only the tree, without any performance statistics, use `what = "tree"` instead. 

```{r fft-no-stats, fig.width = 8, fig.height = 4, out.width = "500px"}
# Plot only the tree, without accuracy statistics:
plot(heart.fft, what = "tree")
# plot(heart.fft, stats = FALSE)  #  The 'stats' argument has been deprecated.
```

- `comp`: Should statistics from competitive algorithms be shown in the ROC curve? To remove the performance statistics of competitive algorithms (e.g., regression, random forests), include the argument `comp = FALSE`. 

- `what`: Which parts of an `FFTrees` object should be visualized (e.g., `"all"`, `"icontree"`, or `"tree"`)? 
Using `what = "roc"` plots tree performance as an ROC\ curve. 
To show individual cue accuracies (in ROC space), specify `what = "cues"`:

```{r fft-cues, fig.width = 6, fig.height = 6, out.width = "500px"}
# Plot cue accuracies (for training data) in ROC space:
plot(heart.fft, what = "cues")
```
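
The remaining arguments described above can be used in the same way. The following calls are a brief sketch that assumes the `heart.fft` object created in Step\ 2 contains at least two FFTs: 

```{r fft-plot-args, eval = FALSE}
# Plot the second FFT (rather than the best FFT #1), applied to test data:
plot(heart.fft, tree = 2, data = "test")

# Plot the best FFT without the performance of competitive algorithms:
plot(heart.fft, comp = FALSE)

# Plot tree performance as an ROC curve:
plot(heart.fft, what = "roc")
```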

See the [Plotting FFTrees](FFTrees_plot.html) vignette for details on plotting FFTs. 


### Advanced functions

Creating sets of FFTs and evaluating them on data by printing and plotting individual FFTs provides the core functionality of **FFTrees**. 
However, the package also provides more advanced functions for accessing, defining, using and evaluating FFTs. 

#### Accessing outputs 

An `FFTrees` object contains many different outputs. 
Basic performance information on the current data and set of FFTs is available via the `summary()` function. 
To see and access parts of an `FFTrees` object, use `str()` or `names()`: 

```{r fft-names}
# Show the names of all outputs in heart.fft:
names(heart.fft)
```
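
As a brief sketch of the other access options mentioned above (assuming the `heart.fft` object from Step\ 2), we can summarize the object, inspect its top-level structure, and extract a single element, such as the number of FFTs it contains: 

```{r fft-access, eval = FALSE}
# Summarize the data and the performance of the FFTs:
summary(heart.fft)

# Show only the top-level structure of the object:
str(heart.fft, max.level = 1)

# Extract a single element (here: the number of FFTs):
heart.fft$trees$n
```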

Key elements of an `FFTrees` object are explained in the vignette on [Creating FFTs with FFTrees()](FFTrees_function.html). 


#### Predicting for new data

To predict classification outcomes for new data, use the standard `predict()` function. 
For example, here's how to predict the classifications for data in the `heartdisease` object (which actually is just a combination of `heart.train` and `heart.test`): 

```{r fft-predict, eval = FALSE}
# Predict classifications for a new dataset:
predict(heart.fft, 
        newdata = heartdisease)
```
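
To see how such predictions relate to the true criterion values, we can store them and cross-tabulate them with `diagnosis`. This is a minimal sketch: it assumes that `predict()` returns one classification per row of `newdata` by default, and the object name `heart.pred` is chosen for illustration. 

```{r fft-predict-table, eval = FALSE}
# Store the predicted classifications (assumed: one decision per row of newdata):
heart.pred <- predict(heart.fft, 
                      newdata = heartdisease)

# Cross-tabulate the predictions with the true criterion values:
table(predicted = heart.pred, 
      actual = heartdisease$diagnosis)
```

The four cells of the resulting table correspond to the four outcomes of the 2\ x\ 2 confusion table. 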


#### Directly defining FFTs

To define a specific FFT and apply it to data, we can define a tree by providing its verbal description to the `my.tree` argument. 
Similarly, we can define sets of FFT definitions (as a data frame) and evaluate them on data by using the `tree.definitions` argument of `FFTrees()`. 
As we often start from an existing set of FFTs, **FFTrees** provides a set of functions for extracting, converting, and modifying tree definitions. 

See the vignette on [Manually specifying FFTs](FFTrees_mytree.html) for defining FFTs from descriptions and modifying tree definitions. 
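
As a rough sketch of the first approach, a verbal description can be passed to `my.tree` as a character string. The tree below is purely illustrative (not a recommended FFT), the object name `heart.my` is an assumption, and the exact wording rules for such descriptions are covered in the vignette linked above: 

```{r fft-mytree-sketch, eval = FALSE}
# A sketch: define an FFT from a verbal description (illustrative tree only):
heart.my <- FFTrees(formula = diagnosis ~ ., 
                    data = heart.train, 
                    data.test = heart.test, 
                    my.tree = paste("If sex = 1, predict True.", 
                                    "If age < 45, predict False.", 
                                    "If thal = {fd, normal}, predict True.", 
                                    "Otherwise, predict False."))

# Plot the manually defined FFT, applied to test data:
plot(heart.my, data = "test")
```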


## Vignettes

<!-- Table of all vignettes: -->

Here is a complete list of the vignettes available in the **FFTrees** package: 

|   | Vignette | Description |
|--:|:------------------------------|:-------------------------------------------------|
|   | [Main guide: FFTrees overview](guide.html) | An overview of the **FFTrees** package |
| 1 | [Tutorial: FFTs for heart disease](FFTrees_heart.html)   | An example of using `FFTrees()` to model heart disease diagnosis |
| 2 | [Accuracy statistics](FFTrees_accuracy_statistics.html) | Definitions of accuracy statistics used throughout the package |
| 3 | [Creating FFTs with FFTrees()](FFTrees_function.html) | Details on the main `FFTrees()` function |
| 4 | [Manually specifying FFTs](FFTrees_mytree.html)   | How to directly create FFTs without using the built-in algorithms |
| 5 | [Visualizing FFTs](FFTrees_plot.html) | Plotting `FFTrees` objects, from full trees to icon arrays |
| 6 | [Examples of FFTs](FFTrees_examples.html) | Examples of FFTs from different datasets contained in the package |


<!-- eof. -->

## References