---
title: "Tutorial: Creating FFTs for heart disease"
author: "Nathaniel Phillips and Hansjörg Neth"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: fft.bib
csl: apa.csl
vignette: >
  %\VignetteIndexEntry{Tutorial: Creating FFTs for heart disease}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, echo = FALSE}
knitr::opts_chunk$set(collapse = FALSE,
                      comment = "#>",
                      prompt = FALSE,
                      tidy = FALSE,
                      echo = TRUE,
                      message = FALSE,
                      warning = FALSE,
                      # Default figure options:
                      dpi = 100,
                      fig.align = 'center',
                      fig.height = 6.0,
                      fig.width = 6.5,
                      out.width = "580px")
```

```{r pkgs, echo = FALSE, message = FALSE, results = 'hide'}
library(FFTrees)
```

## Tutorial: Creating FFTs for heart disease

This tutorial on using the **FFTrees** package follows the examples presented in @phillips2017FFTrees (freely available in [html](https://journal.sjdm.org/17/17217/jdm17217.html) | [PDF](https://journal.sjdm.org/17/17217/jdm17217.pdf)):

- Phillips, N. D., Neth, H., Woike, J. K. & Gaissmaier, W. (2017). FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees. _Judgment and Decision Making_, _12_ (4), 344–368.

In the following, we explain how to use **FFTrees** to create, evaluate and visualize FFTs in four simple steps.

### Step\ 1: Install and load the FFTrees package

We can install FFTrees from CRAN using `install.packages()`. (We only need to do this once.)

```{r install-pkg, eval = FALSE}
# Install the package from CRAN:
install.packages("FFTrees")
```

To use the package, we first need to load it into the current R session. We load the package using `library()`:

```{r load-pkg-2, eval = TRUE, message = TRUE}
# Load the package:
library(FFTrees)
```

The **FFTrees** package contains several vignettes (like this one) that guide users through the package's functionality.
To open the main guide, run `FFTrees.guide()`:

```{r load-guide, eval = FALSE}
# Open the main package guide:
FFTrees.guide()
```

### Step\ 2: Create FFTs from training data (and test on testing data)

In this example, we will create FFTs from a heart disease data set. The training data are in an object called `heart.train`, and the testing data are in an object called `heart.test`. For these data, we will predict `diagnosis`, a binary criterion that indicates whether each patient has or does not have heart disease (i.e., is at high risk or low risk).

To create an `FFTrees` object, we use the function `FFTrees()` with two main arguments:

1. `formula` expects a formula indicating a binary criterion variable as a function of one or more predictor variables to be considered for the tree. The shorthand `formula = diagnosis ~ .` means to include all predictor variables.

2. `data` specifies the training data used to construct the FFTs (which must include the criterion variable).

Here is how we can construct our first FFTs:

```{r fft-create, message = FALSE}
# Create an FFTrees object:
heart.fft <- FFTrees(formula = diagnosis ~ .,    # Criterion and (all) predictors
                     data = heart.train,         # Training data
                     data.test = heart.test,     # Testing data
                     main = "Heart Disease",     # General label
                     decision.labels = c("Low-Risk", "High-Risk")  # Decision labels (False/True)
                     )
```

Evaluating this expression runs code that examines the data, optimizes thresholds for each cue based on our current goals, and creates and evaluates `r heart.fft$trees$n`\ FFTs. The resulting `FFTrees` object, which contains the tree definitions, their decisions, and their performance statistics, is assigned to the `heart.fft`\ object.

#### Other arguments

- `algorithm`: There are two different algorithms available to build FFTs: `"ifan"` [@phillips2017FFTrees] and `"dfan"` [@phillips2017FFTrees]. (`"max"` [@martignon2008categorization] and `"zigzag"` [@martignon2008categorization] are no longer supported.)
- `max.levels`: Changes the maximum number of levels that are allowed in the tree.

The following arguments apply when using the `"ifan"` or `"dfan"` algorithms for creating new FFTs:

- `goal.chase`: The `goal.chase` argument changes which statistic is maximized during tree construction. Possible values include `"acc"`, `"bacc"`, `"wacc"`, `"dprime"`, and `"cost"`. The default is `"wacc"` with a sensitivity weight of\ 0.50 (which renders it identical to `"bacc"`).

- `goal`: The `goal` argument changes which statistic is maximized when _selecting_ trees after construction. Possible values include `"acc"`, `"bacc"`, `"wacc"`, `"dprime"`, and `"cost"`.

- `my.tree` or `tree.definitions`: We can define a new tree from a verbal description (as a set of sentences), or manually specify sets of FFTs as a data frame (in appropriate format). See the [Manually specifying FFTs](FFTrees_mytree.html) vignette for details.

### Step\ 3: Inspect and summarize FFTs

Now we can inspect and summarize the generated decision trees. We will start by printing the `FFTrees` object to return basic information to the console:

```{r fft-print}
# Print an FFTrees object:
heart.fft
```

The output provides several pieces of information:

- The tree with the highest weighted accuracy (`wacc`, here with a sensitivity weight of\ 0.5) is selected as the best tree.

- Here, the best tree (FFT\ \#1) uses three cues: `thal`, `cp`, and `ca`.

- Several statistics summarize this tree's performance on the training and test data.
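
To make the `wacc` criterion from the list above more concrete, the following minimal sketch computes sensitivity, specificity, balanced accuracy and weighted accuracy in base R. The counts are hypothetical (they are not taken from the heart disease data), and `weighted_accuracy()` is an illustrative helper, not a function of the **FFTrees** package. It assumes that weighted accuracy is the sensitivity-weighted average of sensitivity and specificity, which is why a weight of\ 0.5 reproduces `bacc`:

```r
# Hypothetical frequency counts of the four possible outcomes
# (illustrative values, not taken from the heart disease data):
hi <- 40  # hits (true positives)
mi <- 10  # misses (false negatives)
fa <- 15  # false alarms (false positives)
cr <- 35  # correct rejections (true negatives)

# Sensitivity and specificity:
sens <- hi / (hi + mi)
spec <- cr / (cr + fa)

# Illustrative helper (not part of FFTrees):
# weighted average of sensitivity and specificity.
weighted_accuracy <- function(sens, spec, sens_w = 0.5) {
  sens_w * sens + (1 - sens_w) * spec
}

# Balanced accuracy is the unweighted mean of sensitivity and specificity:
bacc <- (sens + spec) / 2

# With a sensitivity weight of 0.5, weighted accuracy equals bacc:
wacc <- weighted_accuracy(sens, spec, sens_w = 0.5)
all.equal(wacc, bacc)

# A higher sensitivity weight rewards sensitive trees more strongly:
weighted_accuracy(sens, spec, sens_w = 0.75)
```

Raising the sensitivity weight above\ 0.5 shifts the criterion towards trees that miss fewer true cases, at the expense of more false alarms.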

All statistics to evaluate each tree can be derived from a 2\ x\ 2 confusion table:

```{r fft-confusion-table, out.width="50%", echo = FALSE, fig.cap = "**Table 1**: A 2x2 confusion table illustrating the types of frequency counts for 4 possible outcomes."}
knitr::include_graphics("confusiontable.jpg")
```

For definitions of all accuracy statistics, see the [accuracy statistics](FFTrees_accuracy_statistics.html) vignette.

### Step\ 4: Visualize the final FFT

We use `plot(x)` to visualize an FFT (from an\ `FFTrees` object\ `x`). Using `data = "train"` evaluates an\ FFT for training data (fitting), whereas `data = "test"` predicts the performance of an\ FFT for a different dataset:

```{r fft-plot, fig.width = 6.5, fig.height = 6}
# Plot predictions of the best FFT when applied to test data:
plot(heart.fft,      # An FFTrees object
     data = "test")  # data to use (i.e., either "train" or "test")?
```

#### Other arguments

The `plot()` function for `FFTrees` objects accepts several other arguments:

- `tree`: Which tree in the object should be plotted? To plot a tree other than the best fitting tree (FFT\ \#1), specify another tree as an integer (e.g., `plot(heart.fft, tree = 2)`).

- `data`: For which dataset should statistics be shown? Either `data = "train"` (showing fitting or "Training" performance by default), or `data = "test"` (showing prediction or "Testing" performance).

- `stats`: Should accuracy statistics be shown with the tree? This argument has been deprecated. To show only the tree, without any performance statistics, use `what = "tree"` instead.

```{r fft-no-stats, fig.width = 8, fig.height = 4, out.width = "500px"}
# Plot only the tree, without accuracy statistics:
plot(heart.fft, what = "tree")
# plot(heart.fft, stats = FALSE)  # The 'stats' argument has been deprecated.
```

- `comp`: Should statistics from competing algorithms be shown in the ROC curve? To remove the performance statistics of competing algorithms (e.g., regression, random forests), include the argument `comp = FALSE`.

- `what`: Which parts of an `FFTrees` object should be visualized (e.g., `"all"`, `"icontree"` or `"tree"`)? Using `what = "roc"` plots tree performance as an ROC\ curve. To show individual cue accuracies (in ROC space), specify `what = "cues"`:

```{r fft-cues, fig.width = 6, fig.height = 6, out.width = "500px"}
# Plot cue accuracies (for training data) in ROC space:
plot(heart.fft, what = "cues")
```

See the [Plotting FFTrees](FFTrees_plot.html) vignette for details on plotting FFTs.

### Advanced functions

Creating sets of FFTs and evaluating them on data by printing and plotting individual FFTs provides the core functionality of **FFTrees**. However, the package also provides more advanced functions for accessing, defining, using and evaluating FFTs.

#### Accessing outputs

An `FFTrees` object contains many different outputs. Basic performance information on the current data and set of FFTs is available via the `summary()` function. To see and access parts of an `FFTrees` object, use `str()` or `names()`:

```{r fft-names}
# Show the names of all outputs in heart.fft:
names(heart.fft)
```

Key elements of an `FFTrees` object are explained in the vignette on [Creating FFTs with FFTrees()](FFTrees_function.html).

#### Predicting for new data

To predict classification outcomes for new data, use the standard `predict()` function. For example, here is how to predict the classifications for data in the `heartdisease` object (which is simply the combination of `heart.train` and `heart.test`):

```{r fft-predict, eval = FALSE}
# Predict classifications for a new dataset:
predict(heart.fft,
        newdata = heartdisease)
```

#### Directly defining FFTs

To define a specific FFT and apply it to data, we can define a tree by providing its verbal description to the `my.tree` argument. Similarly, we can define sets of FFT definitions (as a data frame) and evaluate them on data by using the `tree.definitions` argument of `FFTrees()`.
As we often start from an existing set of FFTs, **FFTrees** provides a set of functions for extracting, converting, and modifying tree definitions.

See the vignette on [Manually specifying FFTs](FFTrees_mytree.html) for defining FFTs from descriptions and modifying tree definitions.

## Vignettes

Here is a complete list of the vignettes available in the **FFTrees** package:

|   | Vignette | Description |
|--:|:------------------------------|:-------------------------------------------------|
|   | [Main guide: FFTrees overview](guide.html) | An overview of the **FFTrees** package |
| 1 | [Tutorial: FFTs for heart disease](FFTrees_heart.html) | An example of using `FFTrees()` to model heart disease diagnosis |
| 2 | [Accuracy statistics](FFTrees_accuracy_statistics.html) | Definitions of accuracy statistics used throughout the package |
| 3 | [Creating FFTs with FFTrees()](FFTrees_function.html) | Details on the main `FFTrees()` function |
| 4 | [Manually specifying FFTs](FFTrees_mytree.html) | How to directly create FFTs without using the built-in algorithms |
| 5 | [Visualizing FFTs](FFTrees_plot.html) | Plotting `FFTrees` objects, from full trees to icon arrays |
| 6 | [Examples of FFTs](FFTrees_examples.html) | Examples of FFTs from different datasets contained in the package |

## References