Title: | Generate, Visualise, and Evaluate Fast-and-Frugal Decision Trees |
---|---|
Description: | Create, visualize, and test fast-and-frugal decision trees (FFTs) using the algorithms and methods described by Phillips, Neth, Woike & Gaissmaier (2017), <doi:10.1017/S1930297500006239>. FFTs are simple and transparent decision trees for solving binary classification problems. FFTs can be preferable to more complex algorithms because they require very little information, are easy to understand and communicate, and are robust against overfitting. |
Authors: | Nathaniel Phillips [aut] |
Maintainer: | Hansjoerg Neth <[email protected]> |
License: | CC0 |
Version: | 2.0.0.9000 |
Built: | 2025-02-15 04:26:00 UTC |
Source: | https://github.com/ndphillips/fftrees |
add_fft_df adds the definition(s) of one or more FFT(s) (in the multi-line format of an FFTrees object) or a single FFT (as a tidy data frame) to the multi-line FFT definitions of an FFTrees object.
add_fft_df allows for collecting and combining (sets of) tree definitions after manipulating them with other tree trimming functions.
add_fft_df(fft, ffts_df = NULL, quiet = FALSE)
fft |
A (set of) FFT definition(s)
(in the multi-line format of an |
ffts_df |
A set of FFT definitions (as a data frame,
usually from an |
quiet |
Hide feedback messages (as logical)?
Default: |
A (set of) FFT definition(s) in the one-line FFT definition format used by an FFTrees object (as a data frame).
get_fft_df for getting the FFT definitions of an FFTrees object;
read_fft_df for reading one FFT definition from tree definitions;
write_fft_df for writing one FFT to tree definitions;
FFTrees for creating FFTs from and applying them to data.
Other tree definition and manipulation functions: add_nodes(), drop_nodes(), edit_nodes(), flip_exits(), get_fft_df(), read_fft_df(), reorder_nodes(), select_nodes(), write_fft_df()
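The tree definition functions are typically combined into a workflow: extract the definitions from an FFTrees object, read one FFT as a tidy data frame, modify it, and collect the result with add_fft_df. The following is a minimal sketch under stated assumptions, not a definitive recipe. It assumes that the FFTrees package (version 2.0 or later) is installed, that read_fft_df accepts a tree argument and flip_exits accepts a nodes argument, and that Tree 1 for the heart disease data has at least two nodes. The names fft_1, fft_1_flip, and ffts_new are illustrative.

```r
# Guarded so that the sketch is skipped when FFTrees is not installed.
have_fftrees <- requireNamespace("FFTrees", quietly = TRUE)

if (have_fftrees) {
  library(FFTrees)

  # Create FFTs for the heart disease data (as in the FFTrees() examples):
  x <- FFTrees(diagnosis ~ ., data = heart.train, data.test = heart.test)

  ffts_df <- get_fft_df(x)                   # all FFT definitions of x
  fft_1   <- read_fft_df(ffts_df, tree = 1)  # one FFT in tidy format (one row per node)

  fft_1_flip <- flip_exits(fft_1, nodes = 1) # reverse the exit of the first node

  # Collect the modified FFT together with the existing definitions:
  ffts_new <- add_fft_df(fft_1_flip, ffts_df = ffts_df)

  print(nrow(ffts_df))   # number of definitions before adding
  print(nrow(ffts_new))  # number of definitions after adding
}
```

The resulting set of definitions could then be passed to FFTrees via its tree.definitions argument to evaluate the modified tree on data.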
add_nodes allows adding one or more nodes to an existing FFT definition (in the tidy data frame format).
add_nodes allows directly setting and changing the value(s) of class, cue, direction, threshold, and exit in an FFT definition for the specified nodes.
There is only rudimentary verification for plausible entries. Importantly, however, as add_nodes is ignorant of data, the values of its variables are not validated for a specific set of data.
Values in nodes refer to their new position in the final FFT. Duplicate values of nodes are ignored (and only the last entry is used).
When a new exit node is added, the exit type of a former final node is set to the signal value (i.e., exit_types[2]).
add_nodes( fft, nodes = NA, class = NA, cue = NA, direction = NA, threshold = NA, exit = NA, quiet = FALSE )
fft |
One FFT definition (as a data frame in tidy format, with one row per node). |
nodes |
The FFT nodes to be added (as an integer vector).
Values refer to their new position in the final FFT
(i.e., after adding all |
class |
The class values of |
cue |
The cue names of |
direction |
The direction values of |
threshold |
The threshold values of |
exit |
The exit values of |
quiet |
Hide feedback messages (as logical)?
Default: |
One FFT definition (as a data frame in tidy format, with one row per node).
drop_nodes for deleting nodes from an FFT definition;
edit_nodes for editing nodes in an FFT definition;
flip_exits for reversing exits in an FFT definition;
reorder_nodes for reordering nodes of an FFT definition;
select_nodes for selecting nodes in an FFT definition;
get_fft_df for getting the FFT definitions of an FFTrees object;
read_fft_df for reading one FFT definition from tree definitions;
add_fft_df for adding FFTs to tree definitions;
FFTrees for creating FFTs from and applying them to data.
Other tree definition and manipulation functions: add_fft_df(), drop_nodes(), edit_nodes(), flip_exits(), get_fft_df(), read_fft_df(), reorder_nodes(), select_nodes(), write_fft_df()
add_stats assumes the input of the 4 essential classification outcomes (as frequency counts in a data frame "data" with variable names "hi", "fa", "mi", and "cr") and uses them to compute various decision accuracy measures.
add_stats( data, correction = 0.25, sens.w = NULL, my.goal = NULL, my.goal.fun = NULL, cost.outcomes = NULL, cost.each = NULL )
data |
A data frame with 4 frequency counts (as integer values, named |
correction |
numeric. Correction added to all counts for calculating |
sens.w |
numeric. Sensitivity weight (for computing weighted accuracy, |
my.goal |
Name of an optional, user-defined goal (as character string).
Default: |
my.goal.fun |
User-defined goal function (with 4 arguments |
cost.outcomes |
list. A list of length 4 named |
cost.each |
numeric. An optional fixed cost added to all outputs (e.g., the cost of using the cue).
Default: |
Providing numeric values for cost.each (as a vector) and cost.outcomes (as a named list) allows computing cost information for the counts of corresponding classification decisions.
A data frame with variables of computed accuracy and cost measures (but dropping inputs).
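To make the relation between the four counts and the derived measures concrete, the following base R sketch computes a few standard measures from a toy data frame. It is not the implementation of add_stats: the counts and cost values are hypothetical, the formulas are the textbook definitions of sensitivity, specificity, accuracy, and (weighted) balanced accuracy, and the cost formula (mean outcome cost per decision plus cost.each) is an assumption made for illustration.

```r
# Toy frequency counts (hypothetical values), one row per classifier:
counts <- data.frame(hi = c(40, 30), fa = c(10, 5), mi = c(10, 20), cr = c(40, 45))

sens.w        <- 0.5                                 # weight of sensitivity in wacc
cost.outcomes <- list(hi = 0, fa = 1, mi = 1, cr = 0) # hypothetical outcome costs
cost.each     <- 0                                   # hypothetical fixed cost per decision

n <- counts$hi + counts$fa + counts$mi + counts$cr

stats <- data.frame(
  sens = counts$hi / (counts$hi + counts$mi),  # hit rate
  spec = counts$cr / (counts$cr + counts$fa),  # correct rejection rate
  acc  = (counts$hi + counts$cr) / n           # overall accuracy
)
stats$bacc <- (stats$sens + stats$spec) / 2                       # balanced accuracy
stats$wacc <- sens.w * stats$sens + (1 - sens.w) * stats$spec     # weighted accuracy

# Mean cost per decision (assumed definition):
stats$cost <- (counts$hi * cost.outcomes$hi + counts$fa * cost.outcomes$fa +
               counts$mi * cost.outcomes$mi + counts$cr * cost.outcomes$cr) / n + cost.each

print(stats)
```

With sens.w = 0.5, weighted accuracy coincides with balanced accuracy; other weights shift the emphasis toward sensitivity or specificity.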
Data from the Blood Transfusion Service Center in Hsin-Chu City in Taiwan.
blood
A data frame containing 748 rows and 5 columns.
Months since last donation
Total number of donations
Total blood donated (in c.c.)
Months since first donation
Criterion: Did the person donate blood (in March 2007)? Values: 0/no vs. 1/yes (76.2% vs. 23.8%).
https://archive.ics.uci.edu/ml/datasets/Blood+Transfusion+Service+Center
Original owner and donor:
Prof. I-Cheng Yeh
Department of Information Management
Chung-Hua University
Other datasets: breastcancer, car, contraceptive, creditapproval, fertility, forestfires, heart.cost, heart.test, heart.train, heartdisease, iris.v, mushrooms, sonar, titanic, voting, wine
Physiological data of patients tested for breast cancer.
breastcancer
A data frame containing 699 patients (rows) and 9 variables (columns).
Clump Thickness
Uniformity of Cell Size
Uniformity of Cell Shape
Marginal Adhesion
Single Epithelial Cell Size
Bare Nuclei
Bland Chromatin
Normal Nucleoli
Mitoses
Criterion: Absence/presence of breast cancer. Values: FALSE vs. TRUE (65.0% vs. 35.0%).
We made the following enhancements to the original data for improved usability:
The ID number of the cases was excluded.
The numeric criterion with value 2 for benign and 4 for malignant was converted to logical (i.e., TRUE/FALSE).
16 cases were excluded because they contained NA values.
Other than that, the data remains consistent with the original dataset.
https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Original)
Original creator:
Dr. William H. Wolberg (physician) University of Wisconsin Hospitals Madison, Wisconsin, USA
Other datasets: blood, car, contraceptive, creditapproval, fertility, forestfires, heart.cost, heart.test, heart.train, heartdisease, iris.v, mushrooms, sonar, titanic, voting, wine
A dataset on car evaluations based on basic features, derived from a simple hierarchical decision model.
car
A data frame containing 1728 cars (rows) and 7 variables (columns).
price for buying the car, Factor (high, low, med, vhigh)
price of the maintenance, Factor (high, low, med, vhigh)
number of doors, Factor (2, 3, 4, 5more)
capacity in terms of persons to carry, Factor (2, 4, more)
the size of luggage boot, Factor (big, med, small)
estimated safety of the car, Factor (high, low, med)
Criterion: Category of acceptability rating. Values: unacc / vgood / good / acc
The criterion variable is a car's acceptability rating.
The criterion for this dataset has not yet been binarized. Before using it with FFTrees, this prerequisite step should be completed based on individual preferences.
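One way to complete this binarization step is sketched below in base R. The sketch uses a small toy data frame rather than the car data itself; the column names rating and acceptable are hypothetical, and treating every rating other than unacc as acceptable is only one possible preference.

```r
# Toy stand-in for the car data (hypothetical column names and rows):
toy_car <- data.frame(
  safety = c("high", "low", "med", "high"),
  rating = c("vgood", "unacc", "acc", "good"),
  stringsAsFactors = TRUE
)

# Binarize the 4-level rating into a logical criterion:
# here, every rating other than "unacc" counts as acceptable.
toy_car$acceptable <- toy_car$rating %in% c("acc", "good", "vgood")

print(table(toy_car$rating, toy_car$acceptable))
```

A stricter definition (e.g., only good and vgood) would change the base rate of the criterion and thus the trees that are created.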
http://archive.ics.uci.edu/ml/datasets/Car+Evaluation
Original creators and donors: Marko Bohanec and Blaz Zupan
Bohanec, M., Rajkovic, V. (1990): Expert system for decision making. Sistemica, 1 (1), 145–157.
Other datasets: blood, breastcancer, contraceptive, creditapproval, fertility, forestfires, heart.cost, heart.test, heart.train, heartdisease, iris.v, mushrooms, sonar, titanic, voting, wine
The main inputs are two logical vectors of prediction and criterion values.
classtable( prediction_v = NULL, criterion_v = NULL, correction = 0.25, sens.w = NULL, cost.outcomes = NULL, cost_v = NULL, my.goal = NULL, my.goal.fun = NULL, quiet_mis = FALSE, na_prediction_action = "ignore" )
prediction_v |
logical. A logical vector of predictions. |
criterion_v |
logical. A logical vector of (TRUE) criterion values. |
correction |
numeric. Correction added to all counts for calculating |
sens.w |
numeric. Sensitivity weight parameter (from 0 to 1, for computing |
cost.outcomes |
list. A list of length 4 with names 'hi', 'fa', 'mi', and 'cr' specifying
the costs of a hit, false alarm, miss, and correct rejection, respectively.
For instance, |
cost_v |
numeric. Additional cost value of each decision (as an optional vector of numeric values).
Typically used to include the cue cost of each decision (as a constant for the current level of an FFT).
Default: |
my.goal |
Name of an optional, user-defined goal (as character string). Default: |
my.goal.fun |
User-defined goal function (with 4 arguments |
quiet_mis |
A logical value passed to hide/show |
na_prediction_action |
What happens when no prediction is possible? (Experimental and currently unused.) |
The primary confusion matrix is computed by confusionMatrix.
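The four classification outcomes that underlie such a confusion matrix can be illustrated with two short logical vectors. This base R sketch does not call classtable or confusionMatrix; it only shows how hits, false alarms, misses, and correct rejections follow from comparing predictions with criterion values. The example vectors are made up.

```r
# Toy predictions and criterion values (hypothetical):
prediction_v <- c(TRUE, TRUE,  FALSE, FALSE, TRUE, FALSE)
criterion_v  <- c(TRUE, FALSE, FALSE, TRUE,  TRUE, FALSE)

hi <- sum( prediction_v &  criterion_v)  # hits: predicted TRUE, actually TRUE
fa <- sum( prediction_v & !criterion_v)  # false alarms: predicted TRUE, actually FALSE
mi <- sum(!prediction_v &  criterion_v)  # misses: predicted FALSE, actually TRUE
cr <- sum(!prediction_v & !criterion_v)  # correct rejections: predicted FALSE, actually FALSE

print(c(hi = hi, fa = fa, mi = mi, cr = cr))
```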
A subset of the 1987 National Indonesia Contraceptive Prevalence Survey.
contraceptive
A data frame containing 1473 cases (rows) and 10 variables (columns).
Wife's age, Numeric
Wife's education, Numeric, (1=low, 2, 3, 4=high)
Husband's education, Numeric, (1=low, 2, 3, 4=high)
Number of children ever born, Numeric
Wife's religion, Numeric, (0=Non-Islam, 1=Islam)
Wife now working?, Numeric, (0=Yes, 1=No)
Husband's occupation, Numeric, (1, 2, 3, 4)
Standard-of-living index, Numeric, (1=low, 2, 3, 4=high)
Media exposure, Numeric, (0=Good, 1=Not good)
Criterion: Use of a contraceptive (as logical). Values: FALSE vs. TRUE (42.7% vs. 57.3%).
The samples describe married women who were either not pregnant or did not know if they were pregnant at the time of the interview.
The problem consists in predicting a woman's current contraceptive method choice (here: binarized cont.crit) based on her demographic and socio-economic characteristics.
We made the following enhancements to the original data for improved usability:
The criterion was binarized from a class attribute variable with three levels (1 = No-use, 2 = Long-term, 3 = Short-term) into a logical variable (TRUE vs. FALSE).
Other than that, the data remains consistent with the original dataset.
https://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice
Original creator and donor: Tjen-Sien Lim
Other datasets: blood, breastcancer, car, creditapproval, fertility, forestfires, heart.cost, heart.test, heart.train, heartdisease, iris.v, mushrooms, sonar, titanic, voting, wine
This data reports predictors and the result of credit card applications. Its attribute names and values have been changed to symbols to protect confidentiality.
creditapproval
A data frame containing 690 cases (rows) and 15 variables (columns).
categorical: b, a
continuous
continuous
categorical: u, y, l, t
categorical: g, p, gg
categorical: c, d, cc, i, j, k, m, r, q, w, x, e, aa, ff
categorical: v, h, bb, j, n, z, dd, ff, o
continuous
categorical: t, f
categorical: t, f
continuous
categorical: t, f
categorical: g, p, s
continuous
continuous
Criterion: Credit approval. Values: TRUE (+) vs. FALSE (-) (44.5% vs. 55.5%).
This dataset contains a mix of attributes – continuous, nominal with small sample sizes, and nominal with larger sample sizes. There are also a few missing values.
We made the following enhancements to the original data for improved usability:
Any missing values, denoted as "?" in the dataset, were transformed into NA values.
Binary factor variables with exclusive "t" and "f" values were converted to logical vectors (TRUE/FALSE).
Other than that, the data remains consistent with the original dataset.
https://archive.ics.uci.edu/ml/datasets/Credit+Approval
Other datasets: blood, breastcancer, car, contraceptive, fertility, forestfires, heart.cost, heart.test, heart.train, heartdisease, iris.v, mushrooms, sonar, titanic, voting, wine
Calculate key descriptive statistics for a given set of data.
describe_data(data, data_name, criterion_name, baseline_value)
data |
A data frame with a criterion variable |
data_name |
A character string specifying a name for the data. |
criterion_name |
A character string specifying the criterion name. |
baseline_value |
The value in |
A data frame with the descriptive statistics.
data(heartdisease)
describe_data(heartdisease, "heartdisease", criterion_name = "diagnosis", baseline_value = TRUE)
drop_nodes deletes one or more nodes from an existing FFT definition (by removing the corresponding rows from the FFT definition in the tidy data frame format).
When dropping the final node, the last remaining node becomes the new final node (i.e., gains a second exit).
Duplicates in nodes are dropped only once (rather than incrementally) and nodes not in the range 1:nrow(fft) are ignored. Dropping all nodes yields an error.
drop_nodes is the inverse function of select_nodes. Inserting new nodes is possible by add_nodes.
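The handling of the nodes argument described above can be mimicked in base R. The sketch below is not the implementation of drop_nodes: it uses a toy data frame with the hypothetical columns node and cue, and it omits the exit handling that gives a new final node its second exit.

```r
# Toy FFT definition in a tidy layout (hypothetical columns, one row per node):
fft <- data.frame(node = 1:3, cue = c("thal", "cp", "ca"), stringsAsFactors = FALSE)

nodes <- c(2, 2, 7)  # a duplicate (2) and an out-of-range value (7)

# Keep each valid node once; ignore values outside 1:nrow(fft):
valid <- unique(nodes[nodes %in% seq_len(nrow(fft))])

if (length(valid) >= nrow(fft)) {
  stop("Dropping all nodes is not possible.")
}

dropped  <- fft[!(fft$node %in% valid), , drop = FALSE]  # analogous to dropping nodes
selected <- fft[  fft$node %in% valid,  , drop = FALSE]  # analogous to selecting nodes

dropped$node <- seq_len(nrow(dropped))  # renumber the remaining nodes
print(dropped)
```

Dropping and selecting the same nodes partition the rows of the original definition, which is the sense in which the two operations are inverse.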
drop_nodes(fft, nodes = NA, quiet = FALSE)
fft |
One FFT definition (as a data frame in tidy format, with one row per node). |
nodes |
The FFT nodes to drop (as an integer vector).
Default: |
quiet |
Hide feedback messages (as logical)?
Default: |
One FFT definition (as a data frame in tidy format, with one row per node).
add_nodes for adding nodes to an FFT definition;
edit_nodes for editing nodes in an FFT definition;
select_nodes for selecting nodes in an FFT definition;
get_fft_df for getting the FFT definitions of an FFTrees object;
read_fft_df for reading one FFT definition from tree definitions;
add_fft_df for adding FFTs to tree definitions;
FFTrees for creating FFTs from and applying them to data.
Other tree definition and manipulation functions: add_fft_df(), add_nodes(), edit_nodes(), flip_exits(), get_fft_df(), read_fft_df(), reorder_nodes(), select_nodes(), write_fft_df()
edit_nodes allows manipulating one or more nodes of an existing FFT definition (in the tidy data frame format).
edit_nodes allows directly setting and changing the value(s) of class, cue, direction, threshold, and exit in an FFT definition for the specified nodes.
There is only rudimentary verification for plausible entries. Importantly, however, as edit_nodes is ignorant of data, the values of its variables are not validated for a specific set of data.
Repeated changes of a node are possible (by repeating the corresponding integer value in nodes).
edit_nodes( fft, nodes = NA, class = NA, cue = NA, direction = NA, threshold = NA, exit = NA, quiet = FALSE )
fft |
One FFT definition (as a data frame in tidy format, with one row per node). |
nodes |
The FFT nodes to be edited (as an integer vector).
Default: |
class |
The class values of |
cue |
The cue names of |
direction |
The direction values of |
threshold |
The threshold values of |
exit |
The exit values of |
quiet |
Hide feedback messages (as logical)?
Default: |
One FFT definition (as a data frame in tidy format, with one row per node).
add_nodes for adding nodes to an FFT definition;
drop_nodes for deleting nodes from an FFT definition;
flip_exits for reversing exits in an FFT definition;
reorder_nodes for reordering nodes of an FFT definition;
select_nodes for selecting nodes in an FFT definition;
get_fft_df for getting the FFT definitions of an FFTrees object;
read_fft_df for reading one FFT definition from tree definitions;
add_fft_df for adding FFTs to tree definitions;
FFTrees for creating FFTs from and applying them to data.
Other tree definition and manipulation functions: add_fft_df(), add_nodes(), drop_nodes(), flip_exits(), get_fft_df(), read_fft_df(), reorder_nodes(), select_nodes(), write_fft_df()
Clean factor variables in prediction data
fact_clean(data.train, data.test, show.warning = T)
data.train |
A training dataset |
data.test |
A testing dataset |
show.warning |
logical |
This dataset describes a sample of 100 volunteers providing a semen sample that was analyzed according to the WHO 2010 criteria.
fertility
A data frame containing 100 rows and 10 columns.
Season in which the analysis was performed. (winter, spring, summer, fall)
Age at the time of analysis
Childhood diseases (i.e., chicken pox, measles, mumps, polio) (yes(1), no(0))
Accident or serious trauma (yes(1), no(0))
Surgical intervention (yes(1), no(0))
High fevers in the last year (less than three months ago (-1), more than three months ago (0), no (1))
Frequency of alcohol consumption (several times a day, every day, several times a week, once a week, hardly ever or never)
Smoking habit (never (-1), occasional (0), daily (1))
Number of hours spent sitting per day
Criterion: Diagnosis normal (TRUE) vs. altered (FALSE) (88.0% vs. 12.0%).
Sperm concentrations are related to socio-demographic data, environmental factors, health status, and life habits.
We made the following enhancements to the original data for improved usability:
The criterion was redefined from a factor variable with two levels (N = Normal, O = Altered) into a logical variable (TRUE vs. FALSE).
Other than that, the data remains consistent with the original dataset.
https://archive.ics.uci.edu/ml/datasets/Fertility
Original contributors:
David Gil Lucentia Research Group Department of Computer Technology University of Alicante
Jose Luis Girela Department of Biotechnology University of Alicante
Other datasets: blood, breastcancer, car, contraceptive, creditapproval, forestfires, heart.cost, heart.test, heart.train, heartdisease, iris.v, mushrooms, sonar, titanic, voting, wine
FFTrees is the workhorse function of the FFTrees package for creating fast-and-frugal trees (FFTs).
FFTs are decision algorithms for solving binary classification tasks, i.e., they predict the values of a binary criterion variable based on one or multiple predictor variables (cues).
Using FFTrees on data usually generates a range of FFTs and corresponding summary statistics (as an FFTrees object) that can then be printed, plotted, and examined further.
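How an FFT arrives at a binary prediction can be sketched in base R without the package. The function below hard-codes the tree that the my.tree example in this manual describes in words ("If age < 50, predict False. If sex = 1, predict True. If chol > 300, predict True, otherwise predict False."). The function name apply_toy_fft and the four toy cases are hypothetical, and the sketch does not reflect how FFTrees represents or applies trees internally.

```r
# Toy cases (hypothetical values):
toy <- data.frame(
  age  = c(45, 60, 62, 70),
  sex  = c(1, 1, 0, 0),
  chol = c(250, 220, 320, 240)
)

# Apply a 3-node FFT: each non-final node has one exit, the final node has two.
apply_toy_fft <- function(d) {
  pred <- rep(NA, nrow(d))            # NA = not yet classified

  exit_1 <- d$age < 50                # Node 1: exit with FALSE if age < 50
  pred[exit_1] <- FALSE

  exit_2 <- is.na(pred) & d$sex == 1  # Node 2: exit with TRUE if sex = 1
  pred[exit_2] <- TRUE

  rest <- is.na(pred)                 # Node 3 (final): classify all remaining cases
  pred[rest] <- d$chol[rest] > 300

  pred
}

print(apply_toy_fft(toy))
```

Cases that exit at an early node never consult later cues, which is what makes such trees frugal.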
The criterion and predictor variables are specified in formula notation. Based on the settings of data and data.test, FFTs are trained on a (required) training dataset (given the set of current goal values) and evaluated on (or predict) an (optional) test dataset.
If an existing FFTrees object object or tree.definitions are provided as inputs, no new FFTs are created. When both arguments are provided, tree.definitions take priority over the FFTs in an existing object. Specifically,
If tree.definitions are provided, these are assigned to the FFTs of x.
If no tree.definitions are provided, but an existing FFTrees object object is provided, the trees from object are assigned to the FFTs of x.
FFTrees( formula = NULL, data = NULL, data.test = NULL, algorithm = "ifan", train.p = 1, goal = NULL, goal.chase = NULL, goal.threshold = NULL, max.levels = NULL, numthresh.method = "o", numthresh.n = 10, repeat.cues = TRUE, stopping.rule = "exemplars", stopping.par = 0.1, sens.w = 0.5, cost.outcomes = NULL, cost.cues = NULL, main = NULL, decision.labels = c("False", "True"), my.goal = NULL, my.goal.fun = NULL, my.tree = NULL, object = NULL, tree.definitions = NULL, quiet = list(ini = TRUE, fin = FALSE, mis = FALSE, set = TRUE), comp = NULL, force = NULL, rank.method = NULL, rounding = NULL, store.data = NULL, verbose = NULL, do.comp = NULL, do.cart = NULL, do.lr = NULL, do.rf = NULL, do.svm = NULL )
formula |
A formula. A |
data |
A data frame. A dataset used for training (fitting) FFTs and alternative algorithms.
|
data.test |
A data frame. An optional dataset used for model testing (prediction) with the same structure as data. |
algorithm |
A character string. The algorithm used to create FFTs. Can be |
train.p |
numeric. What percentage of the data to use for training when |
goal |
A character string indicating the statistic to maximize when selecting trees:
|
goal.chase |
A character string indicating the statistic to maximize when constructing trees:
|
goal.threshold |
A character string indicating the criterion to maximize when optimizing cue thresholds:
|
max.levels |
integer. The maximum number of nodes (or levels) considered for an FFT.
As all combinations of possible exit structures are considered, larger values of |
numthresh.method |
How should thresholds for numeric cues be determined (as character)?
|
numthresh.n |
The number of numeric thresholds to try (as integer).
Default: |
repeat.cues |
May cues occur multiple times within a tree (as logical)?
Default: |
stopping.rule |
A character string indicating the method to stop growing trees. Available options are:
All stopping methods use |
stopping.par |
numeric. A numeric parameter indicating the criterion value for the current |
sens.w |
A numeric value from |
cost.outcomes |
A list of length 4 specifying the cost value for one of the 4 possible classification outcomes.
The list elements must be named |
cost.cues |
A list containing the cost of each cue (in some common unit).
Each list element must have a name corresponding to a cue (i.e., a variable in |
main |
string. An optional label for the dataset. Passed on to other functions, like |
decision.labels |
A vector of strings of length 2 for the text labels for negative and positive decision/prediction outcomes
(i.e., left vs. right, noise vs. signal, 0 vs. 1, respectively, as character).
E.g.; |
my.goal |
The name of an optimization measure defined by |
my.goal.fun |
The definition of an outcome measure to optimize, defined as a function
of the frequency counts of the 4 basic classification outcomes |
my.tree |
A verbal description of an FFT, i.e., an "FFT in words" (as character string).
For example, |
object |
An optional existing |
tree.definitions |
An optional |
quiet |
A list of 4 logical arguments: Should detailed progress reports be suppressed?
Setting list elements to |
comp , do.comp , do.lr , do.cart , do.svm , do.rf , force , rank.method , rounding , store.data , verbose
|
Deprecated arguments (unused or replaced, to be retired in future releases). |
An FFTrees object with the following elements:
The name of the binary criterion variable (as character).
The names of all potential predictor variables (cues) in the data (as character).
The formula specified when creating the FFTs.
A list of FFTs created, with further details contained in n, best, definitions, inwords, stats, level_stats, and decisions.
The original training and test data (if available).
A list of defined control parameters (e.g., algorithm, goal, sens.w, as well as various thresholds, stopping rule, and cost parameters).
A list of cue information, with further details contained in thresholds and stats.
print.FFTrees for printing FFTs;
plot.FFTrees for plotting FFTs;
summary.FFTrees for summarizing FFTs;
inwords for obtaining a verbal description of FFTs;
showcues for plotting cue accuracies.
# 1. Create fast-and-frugal trees (FFTs) for heart disease:
heart.fft <- FFTrees(formula = diagnosis ~ .,
                     data = heart.train,
                     data.test = heart.test,
                     main = "Heart Disease",
                     decision.labels = c("Healthy", "Diseased")
)

# 2. Print a summary of the result:
heart.fft # same as:
# print(heart.fft, data = "train", tree = "best.train")

# 3. Plot an FFT applied to training data:
plot(heart.fft) # same as:
# plot(heart.fft, what = "all", data = "train", tree = "best.train")

# 4. Apply FFT to (new) testing data:
plot(heart.fft, data = "test") # predict for Tree 1
plot(heart.fft, data = "test", tree = 2) # predict for Tree 2

# 5. Predict classes and probabilities for new data:
predict(heart.fft, newdata = heartdisease)
predict(heart.fft, newdata = heartdisease, type = "prob")

# 6. Create a custom tree (from verbal description) with my.tree:
custom.fft <- FFTrees(
  formula = diagnosis ~ .,
  data = heartdisease,
  my.tree = "If age < 50, predict False. If sex = 1, predict True. If chol > 300, predict True, otherwise predict False.",
  main = "My custom FFT")

# Plot the (pretty bad) custom tree:
plot(custom.fft)
fftrees_cuerank takes an FFTrees object x and optimizes its goal.threshold (from x$params) for all cues in newdata (of type data).
fftrees_cuerank(x = NULL, newdata = NULL, data = "train", rounding = NULL)
x |
An |
newdata |
A dataset with cues to be ranked (as data frame). |
data |
The type of data with cues to be ranked (as character: |
rounding |
integer. An integer value indicating the decimal digit
to which non-integer numeric cue thresholds are to be rounded.
Default: |
fftrees_cuerank creates a data frame cuerank_df that is added to x$cues$stats.
Note that the cue directions and thresholds computed by FFTrees always predict positive criterion values (i.e., TRUE or signal, rather than FALSE or noise). Using these thresholds for negative exits (i.e., for predicting instances of FALSE or noise) usually requires a reversal (e.g., negating cue direction).
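The reversal mentioned here can be illustrated with a single numeric cue in base R. The cue values and the threshold of 50 are made up, and the sketch only demonstrates the logic of negating a direction; it does not call fftrees_cuerank.

```r
# Toy cue values and a hypothetical threshold:
cue_v     <- c(35, 48, 52, 61, 70)
threshold <- 50

# A rule of the form "if cue > 50, predict TRUE (signal)":
predict_signal <- cue_v > threshold

# For a negative (noise) exit, the cases that leave the tree are those
# for which the reversed condition holds: "if cue <= 50, predict FALSE".
exit_noise <- cue_v <= threshold

# Both formulations sort every case into exactly one of the two groups:
print(data.frame(cue_v, predict_signal, exit_noise))
```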
fftrees_cuerank is called (twice) by the fftrees_grow_fan algorithm to grow fast-and-frugal trees (FFTs).
A modified FFTrees object (with cue rank information for the current data type in x$cues$stats).
fftrees_ffttowords provides a verbal description of a tree definition (as defined in an FFTrees object). Thus, fftrees_ffttowords translates an abstract FFT definition into natural language output.
fftrees_ffttowords is the complement function to fftrees_wordstofftrees, which parses a verbal description of an FFT into the abstract tree definition of an FFTrees object.
The final sentence (or tree node) of the FFT's description always predicts positive criterion values (i.e., TRUE instances) first, before predicting negative criterion values (i.e., FALSE instances). Note that this may require a reversal of exit directions, if the final cue predicted FALSE instances.
Note that the cue directions and thresholds computed by FFTrees always predict positive criterion values (i.e., TRUE or signal, rather than FALSE or noise). Using these thresholds for negative exits (i.e., for predicting instances of FALSE or noise) usually requires a reversal (e.g., negating cue direction).
fftrees_ffttowords(x = NULL, mydata = "train", digits = 2)
x |
An |
mydata |
The type of data to which a tree is being applied (as character string "train" or "test").
Default: |
digits |
How many digits to round numeric values (as integer)? |
A modified FFTrees object x with x$trees$inwords containing a list of string vectors.
fftrees_wordstofftrees for converting a verbal description of an FFT into an FFTrees object;
fftrees_create for creating FFTrees objects;
fftrees_grow_fan for creating FFTs by applying algorithms to data;
print.FFTrees for printing FFTs;
plot.FFTrees for plotting FFTs;
summary.FFTrees for summarizing FFTs;
FFTrees for creating FFTs from and applying them to data.
heart.fft <- FFTrees(diagnosis ~ .,
                     data = heartdisease,
                     decision.labels = c("Healthy", "Disease")
)

inwords(heart.fft)
fftrees_grow_fan is called by fftrees_define to create new FFTs by applying the fan algorithms (specifically, either ifan or dfan) to data.
fftrees_grow_fan(x, repeat.cues = TRUE)
x |
An |
repeat.cues |
Can cues be considered/used repeatedly (as logical)?
Default: |
fftrees_create for creating FFTrees objects;
fftrees_define for defining FFTs;
fftrees_grow_fan for creating FFTs by applying algorithms to data;
fftrees_wordstofftrees for creating FFTs from verbal descriptions;
FFTrees for creating FFTs from and applying them to data.
fftrees_ranktrees ranks trees in an FFTrees object x based on the current goal (either "cost" or as specified in x$params$goal).
fftrees_ranktrees is called by the main FFTrees function when creating FFTs from and applying them to (training) data.
fftrees_ranktrees(x, data = "train")
x |
An |
data |
The type of data to be used (as character).
Default: |
FFTrees for creating FFTs from and applying them to data.
Perform a grid search over factor levels and return accuracy statistics for a given factor cue.
fftrees_threshold_factor_grid( thresholds = NULL, cue_v = NULL, criterion_v = NULL, directions = "=", goal.threshold = NULL, sens.w = NULL, my.goal = NULL, my.goal.fun = NULL, cost.each = NULL, cost.outcomes = NULL )
thresholds |
numeric. A vector of factor thresholds to consider. |
cue_v |
numeric. Feature/cue values. |
criterion_v |
logical. A logical vector of (TRUE) criterion values. |
directions |
character. Character vector of threshold directions to consider. |
goal.threshold |
A character string indicating the criterion to maximize when optimizing cue thresholds:
|
sens.w |
numeric. Sensitivity weight parameter (from |
my.goal |
Name of an optional, user-defined goal (as character string). Default: |
my.goal.fun |
User-defined goal function (with 4 arguments |
cost.each |
numeric. A constant cost value to add to each value (e.g., the cost of the cue). |
cost.outcomes |
list. A list of length 4 with names 'hi', 'fa', 'mi', and 'cr' specifying
the costs of a hit, false alarm, miss, and correct rejection, respectively, in some common currency.
For instance, |
A data frame containing accuracy statistics for factor thresholds.
fftrees_threshold_numeric_grid
for numeric cues.
Perform a grid search over thresholds and return accuracy statistics for a given numeric cue
fftrees_threshold_numeric_grid(
  thresholds,
  cue_v,
  criterion_v,
  directions = c(">", "<="),
  goal.threshold = NULL,
  sens.w = NULL,
  my.goal = NULL,
  my.goal.fun = NULL,
  cost.each = NULL,
  cost.outcomes = NULL
)
thresholds |
numeric. A vector of thresholds to consider. |
cue_v |
numeric. Feature values. |
criterion_v |
logical. A logical vector of (TRUE) criterion values. |
directions |
character. Possible directions to consider. |
goal.threshold |
A character string indicating the criterion to maximize when optimizing cue thresholds:
|
sens.w |
numeric. Sensitivity weight parameter (from |
my.goal |
Name of an optional, user-defined goal (as character string). Default: |
my.goal.fun |
User-defined goal function (with 4 arguments |
cost.each |
numeric. A constant cost value to add to each value (e.g., the cost of the cue). |
cost.outcomes |
list. A list of length 4 with names 'hi', 'fa', 'mi', and 'cr' specifying
the costs of a hit, false alarm, miss, and correct rejection, respectively, in some common currency.
For instance, |
A data frame containing accuracy statistics for numeric thresholds.
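The grid search described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name `threshold_grid_sketch`, the returned column names, and the toy data are assumptions, and only balanced accuracy is computed (no costs or user-defined goals).

```r
# Hypothetical helper (not the package's implementation):
# evaluate each threshold/direction pair for a numeric cue.
threshold_grid_sketch <- function(thresholds, cue_v, criterion_v,
                                  directions = c(">", "<=")) {
  grid <- expand.grid(threshold = thresholds, direction = directions,
                      stringsAsFactors = FALSE)
  stats <- lapply(seq_len(nrow(grid)), function(i) {
    # Predict "signal" (TRUE) according to the current direction
    decision <- if (grid$direction[i] == ">") {
      cue_v > grid$threshold[i]
    } else {
      cue_v <= grid$threshold[i]
    }
    hi <- sum(decision & criterion_v)    # hits
    fa <- sum(decision & !criterion_v)   # false alarms
    mi <- sum(!decision & criterion_v)   # misses
    cr <- sum(!decision & !criterion_v)  # correct rejections
    sens <- hi / (hi + mi)
    spec <- cr / (cr + fa)
    data.frame(hi = hi, fa = fa, mi = mi, cr = cr,
               sens = sens, spec = spec, bacc = (sens + spec) / 2)
  })
  cbind(grid, do.call(rbind, stats))
}

# Toy data: larger cue values go with TRUE criterion values
cue_v       <- c(1, 2, 3, 4, 5, 6)
criterion_v <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)
res <- threshold_grid_sketch(thresholds = c(2, 3, 4), cue_v, criterion_v)

# Row with the highest balanced accuracy
res[which.max(res$bacc), ]
```

Each row of the result corresponds to one threshold and direction, so the best combination for a goal can be read off by sorting or by `which.max()`.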
fftrees_threshold_factor_grid
for factor cues.
FFTrees
object
fftrees_wordstofftrees
converts a verbal description
of an FFT (provided as a string of text) into
a tree definition (of an FFTrees
object).
Thus, fftrees_wordstofftrees
provides a simple
natural language parser for FFTs.
fftrees_wordstofftrees
is the complement function to
fftrees_ffttowords
, which converts an abstract tree definition
(of an FFTrees
object) into a verbal description
(i.e., provides natural language output).
To increase robustness, the parsing of fftrees_wordstofftrees
allows for lower- or uppercase spellings (but not typographical variants)
and ignores the else-part of the final sentence (i.e., the part
beginning with "otherwise").
fftrees_wordstofftrees(x, my.tree)
x |
An |
my.tree |
A character string. A verbal description (as a string of text) defining an FFT. |
An FFTrees
object with a new tree definition as described by my.tree
.
fftrees_ffttowords
for converting FFTs into verbal descriptions;
print.FFTrees
for printing FFTs;
plot.FFTrees
for plotting FFTs;
summary.FFTrees
for summarizing FFTs;
FFTrees
for creating FFTs from and applying them to data.
Open the FFTrees package guide
FFTrees.guide()
No return value, called for side effects.
flip_exits
reverses the exits of
one or more nodes
from an existing FFT definition
(in the tidy data frame format).
flip_exits
alters the value(s) of the non-final
exits specified in nodes
(from 0 to 1, or from 1 to 0).
By contrast, exits of final nodes
remain unchanged.
Duplicates in nodes
are flipped only once
(rather than repeatedly) and nodes
not in
the range 1:nrow(fft)
are ignored.
flip_exits
is a more specialized function
than edit_nodes
.
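The flipping rules above can be illustrated with a small base R sketch. It is a hypothetical re-implementation for illustration only: the function name `flip_exits_sketch`, the column names, and the exit coding (0 = noise, 1 = signal, 0.5 = final node) are assumptions about the tidy format.

```r
# Hypothetical re-implementation for illustration only.
# Assumes a tidy FFT definition with an `exit` column coded
# 0 (noise), 1 (signal), and 0.5 (final, bi-directional node).
flip_exits_sketch <- function(fft, nodes) {
  nodes <- unique(nodes)                         # flip duplicates only once
  nodes <- nodes[nodes %in% seq_len(nrow(fft))]  # ignore out-of-range nodes
  nodes <- nodes[fft$exit[nodes] != 0.5]         # leave final exits unchanged
  fft$exit[nodes] <- 1 - fft$exit[nodes]         # 0 -> 1, 1 -> 0
  fft
}

fft <- data.frame(
  cue       = c("thal", "cp", "ca"),
  direction = c("=", "=", ">"),
  threshold = c("rd,fd", "a", "0"),
  exit      = c(1, 0, 0.5),
  stringsAsFactors = FALSE
)

# Node 1 is flipped once, node 3 (final) is kept, node 7 is ignored
flip_exits_sketch(fft, nodes = c(1, 1, 3, 7))$exit
```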
flip_exits(fft, nodes = NA, quiet = FALSE)
fft |
One FFT definition (as a data frame in tidy format, with one row per node). |
nodes |
The FFT nodes whose exits are to be flipped (as an integer vector).
Default: |
quiet |
Hide feedback messages (as logical)?
Default: |
One FFT definition (as a data frame in tidy format, with one row per node).
add_nodes
for adding nodes to an FFT definition;
edit_nodes
for editing nodes in an FFT definition;
drop_nodes
for deleting nodes from an FFT definition;
reorder_nodes
for reordering nodes of an FFT definition;
select_nodes
for selecting nodes in an FFT definition;
get_fft_df
for getting the FFT definitions of an FFTrees
object;
read_fft_df
for reading one FFT definition from tree definitions;
add_fft_df
for adding FFTs to tree definitions;
FFTrees
for creating FFTs from and applying them to data.
Other tree definition and manipulation functions:
add_fft_df()
,
add_nodes()
,
drop_nodes()
,
edit_nodes()
,
get_fft_df()
,
read_fft_df()
,
reorder_nodes()
,
select_nodes()
,
write_fft_df()
A dataset of forest fire statistics.
forestfires
A data frame containing 517 rows and 13 columns.
Integer - x-axis spatial coordinate within the Montesinho park map: 1 to 9
Integer - y-axis spatial coordinate within the Montesinho park map: 2 to 9
Factor - month of the year: "jan" to "dec"
Factor - day of the week: "mon" to "sun"
Numeric - FFMC index from the FWI system: 18.7 to 96.20
Numeric - DMC index from the FWI system: 1.1 to 291.3
Numeric - DC index from the FWI system: 7.9 to 860.6
Numeric - ISI index from the FWI system: 0.0 to 56.10
Numeric - temperature in Celsius degrees: 2.2 to 33.30
Numeric - relative humidity in percent: 15.0 to 100
Numeric - wind speed in km/h: 0.40 to 9.40
Numeric - outside rain in mm/m2: 0.0 to 6.4
Criterion: Was there a fire (greater than 1.00 ha)?
Values: TRUE
(yes) vs. FALSE
(no) (47.0% vs. 53.0%).
We made the following enhancements to the original data for improved usability:
The criterion was redefined from a numeric variable that indicated the number of hectares that burned in a fire into a logical variable (TRUE
(for values >1) vs. FALSE
(for values <=1)).
Other than that, the data remains consistent with the original dataset.
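The redefinition of the criterion can be sketched as follows. The vector name `area` and its values are made up for illustration and are not taken from the dataset.

```r
# Hypothetical burned areas in hectares (not values from the dataset)
area <- c(0, 0.5, 1, 1.2, 10.7)

# TRUE for fires larger than 1 ha, FALSE otherwise
fire <- area > 1
fire

# Number of FALSE and TRUE cases
table(fire)
```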
http://archive.ics.uci.edu/ml/datasets/Forest+Fires
Original creators: Prof. Paulo Cortez and Aníbal Morais, Department of Information Systems, University of Minho, Portugal
Other datasets:
blood
,
breastcancer
,
car
,
contraceptive
,
creditapproval
,
fertility
,
heart.cost
,
heart.test
,
heart.train
,
heartdisease
,
iris.v
,
mushrooms
,
sonar
,
titanic
,
voting
,
wine
get_best_tree
selects (looks up and identifies) the best tree (as an integer)
from the set (or “fan”) of FFTs contained in the current FFTrees
object x
,
given an existing type of data
('train' or 'test') and
a goal
for which corresponding statistics are available
in the designated data
type (in x$trees$stats
).
get_best_tree(x, data, goal, my.goal.max = TRUE)
x |
An |
data |
The type of data to consider (as character: either 'train' or 'test'). |
goal |
A goal (as character) to be maximized or minimized when selecting a tree
from an existing |
my.goal.max |
Default direction for user-defined |
Importantly, get_best_tree
only identifies and selects the 'tree' identifier
(as an integer) from the set of existing trees with known statistics,
rather than creating new trees or computing new cue thresholds.
More specifically, goal
is used for identifying and selecting the 'tree'
identifier (as an integer) of the best FFT from an existing set of FFTs, but not for
computing new cue thresholds (see goal.threshold
and fftrees_cuerank()
) or
creating new trees (see goal.chase
and fftrees_ranktrees()
).
An integer denoting the tree
that maximizes/minimizes goal
in data
.
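As a rough illustration of this lookup, the sketch below picks a tree ID from a table of existing statistics. The table `tree_stats`, its values, and the helper `best_tree_sketch` are hypothetical; the actual layout of `x$trees$stats` may differ.

```r
# Hypothetical statistics table, one row per tree (not real results)
tree_stats <- data.frame(
  tree = 1:4,
  bacc = c(0.78, 0.81, 0.74, 0.80),
  cost = c(0.25, 0.22, 0.30, 0.19)
)

# Look up the tree ID that maximizes (or minimizes) an existing statistic
best_tree_sketch <- function(stats, goal, maximize = TRUE) {
  values <- stats[[goal]]
  idx <- if (maximize) which.max(values) else which.min(values)
  as.integer(stats$tree[idx])
}

best_tree_sketch(tree_stats, "bacc")                    # maximize accuracy
best_tree_sketch(tree_stats, "cost", maximize = FALSE)  # minimize cost
```

No tree is created or changed here: the function only reads statistics that already exist, which mirrors the distinction drawn above.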
FFTrees
for creating FFTs from and applying them to data.
Other utility functions:
get_exit_type()
,
get_fft_df()
x
of FFT exit descriptions)
get_exit_type
checks and converts a vector x
of FFT exit descriptions into exits of an FFT
that correspond to the current options of
exit_types
(as a global constant).
get_exit_type(x, verify = TRUE)
x |
A vector of FFT exit descriptions. |
verify |
A flag to turn verification on/off (as logical).
Default: |
get_exit_type
also verifies that the exit types conform to an FFT
(e.g., only the exits of the final node are bi-directional).
A vector of exit_types
(or an error).
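A simplified version of this conversion and check might look as follows. The helper `exit_type_sketch`, its lookup table, and the numeric codes 0, 1, and 0.5 are assumptions for illustration; the package's own rules and accepted spellings may differ.

```r
# Hypothetical normalizer for illustration; the package's rules may differ.
exit_type_sketch <- function(x) {
  key <- tolower(trimws(as.character(x)))
  lookup <- c(
    "0" = 0, "false" = 0, "noise" = 0, "left" = 0,
    "1" = 1, "true" = 1, "signal" = 1, "right" = 1,
    "0.5" = 0.5, "final" = 0.5, "both" = 0.5
  )
  out <- unname(lookup[key])
  if (anyNA(out)) stop("Unknown exit description")
  # Exactly one bi-directional exit, and it must belong to the final node
  if (sum(out == 0.5) != 1 || out[length(out)] != 0.5) {
    stop("Only the final node may have a bi-directional exit")
  }
  out
}

exit_type_sketch(c(FALSE, " True ", 2/4))
exit_type_sketch(c("left", "right", "both"))
```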
FFTrees
for creating FFTs from and applying them to data.
Other utility functions:
get_best_tree()
,
get_fft_df()
get_exit_type(c(0, 1, .5))
get_exit_type(c(FALSE, " True ", 2/4))
get_exit_type(c("noise", "signal", "final"))
get_exit_type(c("left", "right", "both"))
FFTrees
object x
)
get_fft_df
gets the FFT definitions
of an FFTrees
object x
(as a data.frame
).
get_fft_df(x)
x |
An |
The FFTs in the data.frame
returned
are represented in the one-line per FFT definition format
used by an FFTrees
object.
In addition to looking up x$trees$definitions
,
get_fft_df
verifies that the FFT definitions
are valid (given current settings).
A set of FFT definitions (as a data.frame
/tibble
,
in the one-line per FFT definition format used by an FFTrees
object).
read_fft_df
for reading one FFT definition from tree definitions;
write_fft_df
for writing one FFT to tree definitions;
add_fft_df
for adding FFTs to tree definitions;
FFTrees
for creating FFTs from and applying them to data.
Other utility functions:
get_best_tree()
,
get_exit_type()
Other tree definition and manipulation functions:
add_fft_df()
,
add_nodes()
,
drop_nodes()
,
edit_nodes()
,
flip_exits()
,
read_fft_df()
,
reorder_nodes()
,
select_nodes()
,
write_fft_df()
heartdisease
data
This data further characterizes the variables (cues) in the heartdisease
dataset.
heart.cost
A list of length 13 containing the cost of each cue in the heartdisease
dataset (in dollars).
Each list element is a single (positive numeric) value.
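One way such a cost list could be used is to add up the costs of the cues that a tree inspects. The sketch below uses a made-up list `cue_costs` with illustrative values (not those of the dataset) and a hypothetical set of cues.

```r
# Hypothetical cue costs (illustrative values, not the dataset's)
cue_costs <- list(age = 1, sex = 1, cp = 1, thal = 100, ca = 100)

# Cues that a hypothetical FFT inspects, in order
fft_cues <- c("thal", "cp", "ca")

# Cumulative cost when a case exits at node 1, 2, or 3
cost_by_node <- cumsum(unlist(cue_costs[fft_cues]))
cost_by_node

# Worst-case cost: every cue in the tree is looked up
max_cost <- sum(unlist(cue_costs[fft_cues]))
max_cost
```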
https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/costs/
heartdisease
data.
Other datasets:
blood
,
breastcancer
,
car
,
contraceptive
,
creditapproval
,
fertility
,
forestfires
,
heart.test
,
heart.train
,
heartdisease
,
iris.v
,
mushrooms
,
sonar
,
titanic
,
voting
,
wine
Testing data for the heartdisease
data.
This subset is used to test the prediction performance of a model trained on the heart.train
data.
The dataset heartdisease
contains both datasets.
heart.test
A data frame containing 153 rows and 14 columns (see heartdisease
for details).
https://archive.ics.uci.edu/ml/datasets/Heart+Disease
heartdisease
dataset.
Other datasets:
blood
,
breastcancer
,
car
,
contraceptive
,
creditapproval
,
fertility
,
forestfires
,
heart.cost
,
heart.train
,
heartdisease
,
iris.v
,
mushrooms
,
sonar
,
titanic
,
voting
,
wine
Training data for a binary prediction model (here: FFT) on (a subset of) the heartdisease
data.
The complementary subset for model testing is heart.test
.
The data in heartdisease
contains both subsets.
heart.train
A data frame containing 150 rows and 14 columns (see heartdisease
for details).
https://archive.ics.uci.edu/ml/datasets/Heart+Disease
heartdisease
dataset.
Other datasets:
blood
,
breastcancer
,
car
,
contraceptive
,
creditapproval
,
fertility
,
forestfires
,
heart.cost
,
heart.test
,
heartdisease
,
iris.v
,
mushrooms
,
sonar
,
titanic
,
voting
,
wine
A dataset predicting the diagnosis
of 303 patients tested for heart disease.
heartdisease
A data frame containing 303 rows and 14 columns, with the following variables:
True value of binary criterion: TRUE = Heart disease, FALSE = No heart disease
Age (in years)
Sex, 1 = male, 0 = female
Chest pain type: ta = typical angina, aa = atypical angina, np = non-anginal pain, a = asymptomatic
Resting blood pressure (in mm Hg on admission to the hospital)
Serum cholesterol in mg/dl
Fasting blood sugar > 120 mg/dl: 1 = true, 0 = false
Resting electrocardiographic results. "normal" = normal, "abnormal" = having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV), "hypertrophy" = showing probable or definite left ventricular hypertrophy by Estes' criteria.
Maximum heart rate achieved
Exercise induced angina: 1 = yes, 0 = no
ST depression induced by exercise relative to rest
The slope of the peak exercise ST segment.
Number of major vessels (0-3) colored by fluoroscopy
"normal" = normal, "fd" = fixed defect, "rd" = reversible defect
Note that this is a simplified version of the 303 cases of the Cleveland Clinic Foundation (V.A. Medical Center, Long Beach and Cleveland Clinic Foundation; Principal investigator: Robert Detrano, MD, PhD).
The original dataset contains 3 further subsets (from Budapest, Hungary; Long Beach CA; and Zurich, Switzerland), a total of 76 raw attributes, and some missing values.
The original criterion variable num
is integer valued from 0 (no presence) to 4 (maximum).
To obtain a binary criterion diagnosis
, values from 1 to 4 have been collapsed to TRUE
.
https://archive.ics.uci.edu/ml/datasets/Heart+Disease
heart.cost
dataset for cost information.
Other datasets:
blood
,
breastcancer
,
car
,
contraceptive
,
creditapproval
,
fertility
,
forestfires
,
heart.cost
,
heart.test
,
heart.train
,
iris.v
,
mushrooms
,
sonar
,
titanic
,
voting
,
wine
inwords
generates and provides a verbal description
of a fast-and-frugal tree (FFT) from an FFTrees
object.
When data
remains unspecified, inwords
will only look up x$trees$inwords
.
When data
is set to either "train" or "test", inwords
first employs
fftrees_ffttowords
to re-generate the verbal descriptions of FFTs in x
.
inwords(x, data = NULL, tree = 1)
x |
An |
data |
The type of data to which a tree is being applied (as character string "train" or "test").
Default: |
tree |
The tree to display (as an integer). |
A verbal description of an FFT (as a character string).
fftrees_ffttowords
for converting FFTs into verbal descriptions;
print.FFTrees
for printing FFTs;
plot.FFTrees
for plotting FFTs;
summary.FFTrees
for summarizing FFTs;
FFTrees
for creating FFTs from and applying them to data.
A famous dataset from R.A. Fisher (1936) simplified to predict only the virginica class (i.e., as a binary classification problem).
iris.v
A data frame containing 150 rows and 5 columns.
sepal length in cm
sepal width in cm
petal length in cm
petal width in cm
Criterion: Does an iris belong to the class "virginica"?
Values: TRUE
vs. FALSE
(33.33% vs. 66.67%).
To improve usability, we made the following changes:
The criterion was binarized from a factor variable with three levels
(Iris-setosa
, Iris-versicolor
, Iris-virginica
),
into a logical variable (i.e., TRUE
for all instances of Iris-virginica
and FALSE
for the two other levels).
Other than that, the data remains consistent with the original dataset.
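The binarization can be reproduced approximately from the `iris` data that ships with base R. This is a sketch under assumptions: base R labels the species without the "Iris-" prefix, and the column name `virginica` is chosen here for illustration.

```r
# Base R's iris data uses the species levels
# "setosa", "versicolor", and "virginica".
iris_v <- iris[, 1:4]

# Logical criterion: TRUE for virginica, FALSE for the two other species
iris_v$virginica <- iris$Species == "virginica"

# Share of TRUE cases
mean(iris_v$virginica)
```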
https://archive.ics.uci.edu/ml/datasets/Iris
Fisher, R.A. (1936): The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, Part II, pp. 179–188.
Other datasets:
blood
,
breastcancer
,
car
,
contraceptive
,
creditapproval
,
fertility
,
forestfires
,
heart.cost
,
heart.test
,
heart.train
,
heartdisease
,
mushrooms
,
sonar
,
titanic
,
voting
,
wine
Data describing poisonous vs. non-poisonous mushrooms.
mushrooms
A data frame containing 8,124 rows and 23 columns.
See http://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/agaricus-lepiota.names for column descriptions.
Criterion: Is the mushroom poisonous?
Values: TRUE
(poisonous) vs. FALSE
(edible) (48.2% vs. 51.8%).
cap-shape, character (bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s)
cap-surface, character (fibrous=f, grooves=g, scaly=y, smooth=s)
cap-color, character (brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y)
Are there bruises? logical (TRUE/FALSE)
odor, character (almond=a, anise=l, creosote=c, fishy=y, foul=f, musty=m, none=n, pungent=p, spicy=s)
gill-attachment, character (attached=a, descending=d, free=f, notched=n)
gill-spacing, character (close=c, crowded=w, distant=d)
gill-size, character (broad=b, narrow=n)
gill-color, character (black=k, brown=n, buff=b, chocolate=h, gray=g, green=r, orange=o, pink=p, purple=u, red=e, white=w, yellow=y)
stalk-shape, character (enlarging=e, tapering=t)
stalk-root, character (bulbous=b, club=c, cup=u, equal=e, rhizomorphs=z, rooted=r)
stalk-surface-above-ring, character (fibrous=f, scaly=y, silky=k, smooth=s)
stalk-surface-below-ring, character (fibrous=f, scaly=y, silky=k, smooth=s)
stalk-color-above-ring, character (brown=n, buff=b, cinnamon=c, gray=g, orange=o, pink=p, red=e, white=w, yellow=y)
stalk-color-below-ring, character (brown=n, buff=b, cinnamon=c, gray=g, orange=o, pink=p, red=e, white=w, yellow=y)
veil-type, character (partial=p, universal=u)
veil-color, character (brown=n, orange=o, white=w, yellow=y)
ring-number, character (none=n, one=o, two=t)
ring-type, character (cobwebby=c, evanescent=e, flaring=f, large=l, none=n, pendant=p, sheathing=s, zone=z)
spore-print-color, character (black=k, brown=n, buff=b, chocolate=h, green=r, orange=o, purple=u, white=w, yellow=y)
population, character (abundant=a, clustered=c, numerous=n, scattered=s, several=v, solitary=y)
habitat, character (grasses=g, leaves=l, meadows=m, paths=p, urban=u, waste=w, woods=d)
This dataset includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms
in the Agaricus and Lepiota Family. Each species is classified as poisonous
(True or False).
The Guide clearly states that there is no simple rule for determining the edibility of a mushroom;
no rule like “leaflets three, let it be” for Poisonous Oak and Ivy.
We made the following enhancements to the original data for improved usability:
Any missing values, denoted as "?" in the dataset, were transformed into NAs.
Binary factor variables with exclusive "t" and "f" values were converted to logical TRUE/FALSE
vectors.
The binary factor criterion variable with exclusive "p" and "e" values was converted to a logical TRUE/FALSE
vector.
Other than that, the data remains consistent with the original dataset.
https://archive.ics.uci.edu/ml/datasets/Mushroom
Mushroom records drawn from The Audubon Society Field Guide to North American Mushrooms (1981). G.H. Lincoff (Pres.), New York: A.A. Knopf.
Other datasets:
blood
,
breastcancer
,
car
,
contraceptive
,
creditapproval
,
fertility
,
forestfires
,
heart.cost
,
heart.test
,
heart.train
,
heartdisease
,
iris.v
,
sonar
,
titanic
,
voting
,
wine
FFTrees
object
plot.FFTrees
visualizes an FFTrees
object created by the FFTrees
function.
plot.FFTrees
is the main plotting function of the FFTrees package and is
called when evaluating the generic plot
on an FFTrees
object.
plot.FFTrees
visualizes a selected FFT, key data characteristics, and various aspects of classification performance.
As x
may not contain test data, plot.FFTrees
by default plots the performance characteristics
for training data (i.e., fitting), rather than for test data (i.e., for prediction).
When test data is available, specifying data = "test"
plots prediction performance.
Whenever the sensitivity weight (sens.w
) is set to its default of sens.w = 0.50
,
a level shows balanced accuracy (bacc
). If, however, sens.w
deviates from its default,
the level shows the tree's weighted accuracy value (wacc
) and the current sens.w
value (below the level).
Many aspects of the plot (e.g., its panels) and the FFT's appearance (e.g., labels of its nodes and exits) can be customized by setting corresponding arguments.
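The relation between balanced and weighted accuracy mentioned above can be sketched numerically. The sketch assumes the usual definition of weighted accuracy as a weighted mean of sensitivity and specificity; the helper name `wacc_sketch` and the input values are made up for illustration.

```r
# Weighted accuracy: sensitivity and specificity weighted by sens.w
wacc_sketch <- function(sens, spec, sens.w = 0.50) {
  sens * sens.w + spec * (1 - sens.w)
}

sens <- 0.80
spec <- 0.60

# With the default weight, weighted accuracy equals balanced accuracy
wacc_sketch(sens, spec)

# A higher weight moves the value toward sensitivity
wacc_sketch(sens, spec, sens.w = 0.75)
```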
## S3 method for class 'FFTrees'
plot(
  x = NULL,
  data = "train",
  what = "all",
  tree = 1,
  main = NULL,
  cue.labels = NULL,
  decision.labels = NULL,
  truth.labels = NULL,
  cue.cex = NULL,
  threshold.cex = NULL,
  decision.cex = 1,
  comp = TRUE,
  show.header = NULL,
  show.tree = NULL,
  show.confusion = NULL,
  show.levels = NULL,
  show.roc = NULL,
  show.icons = NULL,
  show.iconguide = NULL,
  hlines = TRUE,
  label.tree = NULL,
  label.performance = NULL,
  n.per.icon = NULL,
  level.type = "bar",
  which.tree = NULL,
  decision.names = NULL,
  stats = NULL,
  grayscale = FALSE,
  ...
)
x |
An |
data |
The type of data in
By default, |
what |
What should be plotted (as a character string)? Valid options are:
Default: |
tree |
The tree to be plotted (as an integer, only valid when the corresponding tree argument is non-empty).
Default: |
main |
The main plot label (as a character string). |
cue.labels |
An optional string of labels for the cues / nodes (as character vector). |
decision.labels |
A character vector of length 2 indicating the content-specific names for noise vs. signal predictions/exits. |
truth.labels |
A character vector of length 2 indicating the content-specific names for true noise vs. signal cases (using 'decision.labels' if unspecified). |
cue.cex |
The size of the cue labels (as numeric). |
threshold.cex |
The size of the threshold labels (as numeric). |
decision.cex |
The size of the decision labels (as numeric). |
comp |
Should the performance of competing algorithms (e.g., logistic regression or random forests) be shown in the ROC plot (if available, as logical)? |
show.header |
Show header with basic data properties (in top panel, as logical)? |
show.tree |
Show nodes and exits of FFT (in middle panel, as logical)? |
show.confusion |
Show a 2x2 confusion matrix (in bottom panel, as logical)? |
show.levels |
Show performance levels (in bottom panel, as logical)? |
show.roc |
Show ROC curve (in bottom panel, as logical)? |
show.icons |
Show exit cases as icon arrays (in middle panel, as logical)? |
show.iconguide |
Show icon guide (in middle panel, as logical)? |
hlines |
Show horizontal panel separation lines (as logical)?
Default: |
label.tree |
A label for the FFT (optional, as character string). |
label.performance |
A label for the performance section (optional, as character string). |
n.per.icon |
The number of cases represented by each icon (as numeric). |
level.type |
The type of performance levels to be drawn at the bottom (as character string, either |
which.tree |
Deprecated argument. Use |
decision.names |
Deprecated argument. Use |
stats |
Deprecated argument. Should statistical information be plotted (as logical)?
Use |
grayscale |
logical. If |
... |
Graphical parameters (passed to text of panel titles,
to |
An invisible FFTrees
object x
and a plot visualizing and describing an FFT (as side effect).
showcues
for plotting cue accuracies;
print.FFTrees
for printing FFTs;
summary.FFTrees
for summarizing FFTs;
FFTrees
for creating FFTs from and applying them to data.
Other plot functions:
showcues()
# Create FFTs (for heartdisease data):
heart_fft <- FFTrees(formula = diagnosis ~ ., data = heart.train)

# Visualize the default FFT (Tree #1, what = 'all'):
plot(heart_fft, main = "Heart disease",
     decision.labels = c("Absent", "Present"))

# Visualize cue accuracies (in ROC space):
plot(heart_fft, what = "cues",
     main = "Cue accuracies for heart disease data")

# Visualize tree diagram with icon arrays on exit nodes:
plot(heart_fft, what = "icontree", n.per.icon = 2,
     main = "Diagnosing heart disease")

# Visualize performance comparison in ROC space:
plot(heart_fft, what = "roc",
     main = "Performance comparison for heart disease data")

# Visualize predictions of FFT #2 (for new test data) with custom options:
plot(heart_fft, tree = 2, data = heart.test,
     main = "Predicting heart disease",
     cue.labels = c("1. thal?", "2. cp?", "3. ca?", "4. exang"),
     decision.labels = c("ok", "treat"),
     truth.labels = c("Healthy", "Sick"),
     n.per.icon = 2,
     show.header = TRUE, show.confusion = TRUE,
     show.levels = TRUE, show.roc = TRUE,
     hlines = FALSE, font = 3, col = "steelblue")

# For details, see
# vignette("FFTrees_plot", package = "FFTrees")
predict.FFTrees
predicts binary classification outcomes or their probabilities from newdata
for an FFTrees
object.
## S3 method for class 'FFTrees'
predict(
  object = NULL,
  newdata = NULL,
  tree = 1,
  type = "class",
  sens.w = NULL,
  method = "laplace",
  data = NULL,
  ...
)
object |
An |
newdata |
dataframe. A data frame of test data. |
tree |
integer. Which tree in the object should be used? By default, |
type |
string. What should be predicted? Can be |
sens.w , data
|
deprecated |
method |
string. Method of calculating class probabilities. Either 'laplace', which applies the Laplace correction, or 'raw' which applies no correction. |
... |
Additional arguments passed on to |
Either a logical vector of predictions, or a matrix of class probabilities.
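The difference between the 'laplace' and 'raw' methods can be illustrated with a small sketch. It assumes the standard two-class Laplace correction, (k + 1) / (n + 2), applied to the cases in one exit of a tree; the helper `leaf_prob_sketch` and the counts are hypothetical, and the package may compute its probabilities differently in detail.

```r
# Probability of the signal class in an exit with n cases, k of them signals
leaf_prob_sketch <- function(k, n, method = c("laplace", "raw")) {
  method <- match.arg(method)
  if (method == "laplace") (k + 1) / (n + 2) else k / n
}

# Raw proportion vs. Laplace-corrected estimate
leaf_prob_sketch(k = 9, n = 10, method = "raw")
leaf_prob_sketch(k = 9, n = 10, method = "laplace")

# The correction avoids extreme estimates for small exits
leaf_prob_sketch(k = 0, n = 3, method = "raw")
leaf_prob_sketch(k = 0, n = 3, method = "laplace")
```

The correction pulls estimates toward 0.5, and the pull is strongest when an exit contains few cases.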
print.FFTrees
for printing FFTs;
plot.FFTrees
for plotting FFTs;
summary.FFTrees
for summarizing FFTs;
FFTrees
for creating FFTs from and applying them to data.
# Create training and test data:
set.seed(100)
breastcancer <- breastcancer[sample(nrow(breastcancer)), ]
breast.train <- breastcancer[1:150, ]
breast.test <- breastcancer[151:303, ]

# Create an FFTrees object from the training data:
breast.fft <- FFTrees(
  formula = diagnosis ~ .,
  data = breast.train
)

# Predict classification outcomes for test data:
breast.fft.pred <- predict(breast.fft,
  newdata = breast.test
)

# Predict class probabilities for test data:
breast.fft.pred <- predict(breast.fft,
  newdata = breast.test,
  type = "prob"
)
print.FFTrees
prints basic information on FFTs for an FFTrees
object x
.
As x
may not contain test data, print.FFTrees
by default prints the performance characteristics for training data (i.e., fitting), rather than for test data (i.e., for prediction).
When test data is available, specify data = "test"
to print prediction performance.
## S3 method for class 'FFTrees'
print(x = NULL, tree = 1, data = "train", ...)
x |
An |
tree |
The tree to be printed (as an integer, only valid when the corresponding tree argument is non-empty).
Default: |
data |
The type of data in
By default, |
... |
additional arguments passed to |
An invisible FFTrees
object x
and summary information on an FFT printed to the console (as side effect).
plot.FFTrees
for plotting FFTs;
summary.FFTrees
for summarizing FFTs;
inwords
for obtaining a verbal description of FFTs;
FFTrees
for creating FFTs from and applying them to data.
read_fft_df
reads and returns
the definition of a single FFT (as a tidy data frame)
from the multi-line FFT definitions of an FFTrees
object.
read_fft_df
allows reading individual tree definitions
to manipulate them with other tree trimming functions.
write_fft_df
provides the inverse functionality.
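The conversion from a set of tree definitions to one tidy definition can be sketched in base R. The layout below is an assumption for illustration: one row per tree, with the values of its nodes separated by ";". The column names and the helper `read_fft_sketch` are hypothetical and may not match the package's format.

```r
# One FFT in an assumed one-row format: node values separated by ";"
ffts_df <- data.frame(
  tree       = 1L,
  nodes      = 3L,
  classes    = "c;c;n",
  cues       = "thal;cp;ca",
  directions = "=;=;>",
  thresholds = "rd,fd;a;0",
  exits      = "1;0;0.5",
  stringsAsFactors = FALSE
)

# Hypothetical reader: split each field into one row per node
read_fft_sketch <- function(ffts_df, tree = 1) {
  def <- ffts_df[ffts_df$tree == tree, ]
  split_field <- function(s) strsplit(s, ";", fixed = TRUE)[[1]]
  data.frame(
    class     = split_field(def$classes),
    cue       = split_field(def$cues),
    direction = split_field(def$directions),
    threshold = split_field(def$thresholds),
    exit      = as.numeric(split_field(def$exits)),
    stringsAsFactors = FALSE
  )
}

fft <- read_fft_sketch(ffts_df, tree = 1)
fft
```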
read_fft_df(ffts_df, tree = 1)
ffts_df |
A set of FFT definitions (as a data frame,
usually from an |
tree |
The ID of the to-be-selected FFT (as an integer),
corresponding to a tree in |
One FFT definition (as a data frame in tidy format, with one row per node).
get_fft_df
for getting the FFT definitions of an FFTrees
object;
write_fft_df
for writing one FFT to tree definitions;
add_fft_df
for adding FFTs to tree definitions;
FFTrees
for creating FFTs from and applying them to data.
Other tree definition and manipulation functions:
add_fft_df()
,
add_nodes()
,
drop_nodes()
,
edit_nodes()
,
flip_exits()
,
get_fft_df()
,
reorder_nodes()
,
select_nodes()
,
write_fft_df()
reorder_nodes
allows reordering
the nodes
in an existing FFT definition
(in the tidy data frame format).
reorder_nodes
allows directly setting and changing the node
order in an FFT definition by specifying nodes
.
When a former non-final node becomes a final node,
the exit type of the former final node
is set to the signal value (i.e., exit_types[2]
).
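The exit rule described above can be sketched as follows. This is a hypothetical illustration: the helper `reorder_nodes_sketch`, the column names, and the exit coding (0 = noise, 1 = signal, 0.5 = final node) are assumptions about the tidy format.

```r
# Hypothetical sketch; assumes an `exit` column coded 0, 1, and 0.5 (final).
reorder_nodes_sketch <- function(fft, order) {
  # The order must be a permutation of the existing node positions
  stopifnot(length(order) == nrow(fft),
            setequal(order, seq_len(nrow(fft))))
  out <- fft[order, , drop = FALSE]
  # A former final node gets the signal exit
  out$exit[out$exit == 0.5] <- 1
  # The node now in last position becomes the final node
  out$exit[nrow(out)] <- 0.5
  rownames(out) <- NULL
  out
}

fft <- data.frame(cue = c("thal", "cp", "ca"),
                  exit = c(1, 0, 0.5),
                  stringsAsFactors = FALSE)

# Move the former final node (ca) to the front
reordered <- reorder_nodes_sketch(fft, order = c(3, 1, 2))
reordered
```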
reorder_nodes(fft, order = NA, quiet = FALSE)
fft |
One FFT definition (as a data frame in tidy format, with one row per node). |
order |
The desired node order (as an integer vector).
The values of |
quiet |
Hide feedback messages (as logical)?
Default: |
One FFT definition (as a data frame in tidy format, with one row per node).
add_nodes
for adding nodes to an FFT definition;
edit_nodes
for editing nodes in an FFT definition;
drop_nodes
for deleting nodes from an FFT definition;
flip_exits
for reversing exits in an FFT definition;
select_nodes
for selecting nodes in an FFT definition;
get_fft_df
for getting the FFT definitions of an FFTrees
object;
read_fft_df
for reading one FFT definition from tree definitions;
add_fft_df
for adding FFTs to tree definitions;
FFTrees
for creating FFTs from and applying them to data.
Other tree definition and manipulation functions:
add_fft_df()
,
add_nodes()
,
drop_nodes()
,
edit_nodes()
,
flip_exits()
,
get_fft_df()
,
read_fft_df()
,
select_nodes()
,
write_fft_df()
select_nodes
selects
one or more nodes
from an existing FFT definition
(by filtering the corresponding row(s) from the FFT definition
in the tidy data frame format).
When not selecting the final node, the last selected node becomes the new final node (i.e., gains a second exit).
Duplicates in nodes
are selected only once
(rather than incrementally) and nodes
not in
the range 1:nrow(fft)
are ignored.
select_nodes
is the inverse function
of drop_nodes
.
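A base R sketch of these selection rules is shown below. The helper `select_nodes_sketch`, the column names, and the exit coding (0.5 = final node) are assumptions for illustration, as is the choice to keep the selected nodes in their original order.

```r
# Hypothetical sketch; assumes an `exit` column coded 0, 1, and 0.5 (final).
select_nodes_sketch <- function(fft, nodes) {
  nodes <- sort(unique(nodes))                   # select duplicates only once
  nodes <- nodes[nodes %in% seq_len(nrow(fft))]  # ignore out-of-range nodes
  out <- fft[nodes, , drop = FALSE]
  out$exit[nrow(out)] <- 0.5                     # last selected node becomes final
  rownames(out) <- NULL
  out
}

fft <- data.frame(cue = c("thal", "cp", "ca"),
                  exit = c(1, 0, 0.5),
                  stringsAsFactors = FALSE)

# Node 1 is selected once, node 9 is ignored, node 2 becomes the final node
selected <- select_nodes_sketch(fft, nodes = c(2, 1, 1, 9))
selected
```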
select_nodes(fft, nodes = NA, quiet = FALSE)
fft |
One FFT definition (as a data frame in tidy format, with one row per node). |
nodes |
The FFT nodes to select (as an integer vector).
Default: |
quiet |
Hide feedback messages (as logical)?
Default: |
One FFT definition (as a data frame in tidy format, with one row per node).
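The handling of duplicates and out-of-range values described above can be sketched in base R. The helper `clean_nodes` is hypothetical and only illustrates the stated rule; it is not the package's code, and it makes no claim about the order in which the package returns the selected rows.

```r
# Hypothetical helper (not part of FFTrees):
clean_nodes <- function(nodes, n_nodes) {
  nodes <- unique(nodes)              # duplicates are selected only once
  nodes[nodes %in% seq_len(n_nodes)]  # values outside 1:n_nodes are ignored
}

# For an FFT with 3 nodes:
keep <- clean_nodes(c(2, 2, 5, 1, 0), n_nodes = 3)
keep

# Selecting these nodes corresponds to dropping the remaining ones,
# which is the sense in which select_nodes and drop_nodes are inverses:
drop <- setdiff(seq_len(3), keep)
drop
```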
add_nodes
for adding nodes to an FFT definition;
drop_nodes
for deleting nodes from an FFT definition;
edit_nodes
for editing nodes in an FFT definition;
flip_exits
for reversing exits in an FFT definition;
reorder_nodes
for reordering nodes of an FFT definition;
get_fft_df
for getting the FFT definitions of an FFTrees
object;
read_fft_df
for reading one FFT definition from tree definitions;
add_fft_df
for adding FFTs to tree definitions;
FFTrees
for creating FFTs from and applying them to data.
Other tree definition and manipulation functions:
add_fft_df()
,
add_nodes()
,
drop_nodes()
,
edit_nodes()
,
flip_exits()
,
get_fft_df()
,
read_fft_df()
,
reorder_nodes()
,
write_fft_df()
showcues
plots the cue accuracies of an FFTrees
object
created by the FFTrees
function (as points in ROC space).
If the optional arguments cue.accuracies
and alt.goal
are specified,
their values take precedence over the corresponding settings of an FFTrees
object x
(but do not change x
).
showcues
is called when the main plot.FFTrees
function is set to what = "cues"
.
showcues(
  x = NULL,
  cue.accuracies = NULL,
  alt.goal = NULL,
  main = NULL,
  top = 5,
  quiet = list(ini = TRUE, fin = FALSE, set = TRUE),
  ...
)
x |
An |
cue.accuracies |
An optional data frame specifying cue accuracies directly (without specifying |
alt.goal |
An optional alternative goal to sort the current cue accuracies (without using the goal of |
main |
A main plot title (as character string). |
top |
How many of the top cues should be highlighted (as an integer)? |
quiet |
Should user feedback messages be suppressed (as a list of 3 logical arguments)?
Default: |
... |
Graphical parameters (passed to |
A plot showing the cue accuracies of an FFTrees
object (as points in ROC space).
print.FFTrees
for printing FFTs;
plot.FFTrees
for plotting FFTs;
summary.FFTrees
for summarizing FFTs;
FFTrees
for creating FFTs from and applying them to data.
Other plot functions:
plot.FFTrees()
# Create fast-and-frugal trees (FFTs) for heart disease:
heart.fft <- FFTrees(formula = diagnosis ~ .,
                     data = heart.train,
                     data.test = heart.test,
                     main = "Heart Disease",
                     decision.labels = c("Healthy", "Diseased")
)

# Show cue accuracies (in ROC space):
showcues(heart.fft, main = "Predicting heart disease")
The file contains patterns of sonar signals bounced off a metal cylinder or bounced off a roughly cylindrical rock at various angles and under various conditions. The transmitted sonar signal is a frequency-modulated chirp, rising in frequency.
sonar
A data frame containing 208 rows and 60 columns.
V1: Number in the range 0.0 to 1.0 that represents the energy within a particular frequency band, integrated over a certain period of time.
V2 to V60: (see V1)
Criterion: Did a sonar signal bounce off a metal cylinder (or a rock)?
Values: TRUE
(metal cylinder) vs. FALSE
(rock) (53.37% vs. 46.63%).
We made the following enhancements to the original data for improved usability:
The binary factor criterion variable with exclusive "m" and "r" values was converted to a logical TRUE/FALSE
vector.
Other than that, the data remains consistent with the original dataset.
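The conversion described above can be sketched in base R. The toy vector below is illustrative and not taken from the data; the name `type` for the criterion column is an assumption. The class counts of 111 metal cylinders and 97 rocks come from the original UCI description and reproduce the percentages stated above.

```r
# Toy stand-in for the raw criterion (illustrative values, not real data):
type_raw <- factor(c("m", "r", "r", "m", "m"))

# Convert the binary factor into a logical vector:
type <- type_raw == "m"  # TRUE = metal cylinder, FALSE = rock
type

# Base rates implied by the original class counts (111 vs. 97 of 208 cases):
base_rates <- round(100 * c(111, 97) / 208, 2)
base_rates
```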
https://archive.ics.uci.edu/ml/datasets/Connectionist+Bench+(Sonar,+Mines+vs.+Rocks)
Gorman, R. P., and Sejnowski, T. J. (1988). Analysis of hidden units in a layered network trained to classify sonar targets. Neural Networks, 1, pp. 75–89.
Other datasets:
blood
,
breastcancer
,
car
,
contraceptive
,
creditapproval
,
fertility
,
forestfires
,
heart.cost
,
heart.test
,
heart.train
,
heartdisease
,
iris.v
,
mushrooms
,
titanic
,
voting
,
wine
FFTrees
object
summary.FFTrees
summarizes key contents of an FFTrees
object.
## S3 method for class 'FFTrees'
summary(object, tree = NULL, ...)
object |
An |
tree |
The tree to summarize (as an integer, but may be a vector).
If |
... |
Additional arguments (currently ignored). |
Given an FFTrees
object x
,
summary.FFTrees
selects key parameters from x$params
and provides the definitions and performance statistics for tree
from x$trees
.
Inspect and query x
for additional details.
summary.FFTrees
returns an invisible list containing two elements:
definitions
and corresponding performance measures of tree
s;
stats
on decision frequencies, derived probabilities, and costs (separated by train
and test
).
A header prints descriptive information of the FFTrees
object (to the console):
Its main
title, number of trees (object$trees$n
), and the name of the criterion variable (object$criterion_name
).
By default, information on all available trees is shown and returned.
Specifying tree
filters the output list elements for the corresponding tree(s).
When only a single tree
is specified, the printed header includes a verbal description of
the corresponding tree.
While summary.FFTrees
provides key details about the specified tree
(s),
the individual decisions (stored in object$trees$decisions
) are not shown or returned.
An invisible list with elements containing the definitions
and performance stats
of the FFT(s) specified by tree
(s).
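A minimal usage sketch, assuming the FFTrees package is installed and using the heart disease data that ships with it. Because the result is returned invisibly, it must be assigned to be inspected; the element names checked here are those stated above.

```r
library(FFTrees)

# Create FFTs for the heart disease data:
heart.fft <- FFTrees(formula = diagnosis ~ .,
                     data = heart.train,
                     data.test = heart.test)

# Print a summary of tree 1 and keep the invisible return value:
s <- summary(heart.fft, tree = 1)

# Inspect the returned list:
names(s)
```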
print.FFTrees
for printing FFTs;
plot.FFTrees
for plotting FFTs;
inwords
for obtaining a verbal description of FFTs;
FFTrees
for creating FFTs from and applying them to data.
Data indicating who survived on the Titanic.
titanic
A data frame containing 2,201 rows and 4 columns.
Factor - Class (first, second, third, or crew)
Factor - Age group (child or adult)
Factor - Sex (male or female)
Logical - Whether the passenger survived (TRUE) or not (FALSE)
See Titanic
of the R datasets package for details and
the same data (in a 4-dimensional table
).
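The relation between the 4-dimensional table and a data frame with one row per person can be sketched in base R. This is one possible conversion, shown as an illustration rather than the way the package's data was actually built; the column names are those of the `Titanic` table, which may differ from those of `titanic`.

```r
# Turn the 4-dimensional table into a frequency data frame
# (columns: Class, Sex, Age, Survived, Freq):
tab <- as.data.frame(datasets::Titanic)

# Repeat each combination Freq times to obtain one row per person:
long <- tab[rep(seq_len(nrow(tab)), times = tab$Freq),
            c("Class", "Age", "Sex", "Survived")]
rownames(long) <- NULL

# Recode the criterion as a logical vector:
long$Survived <- long$Survived == "Yes"

dim(long)  # 2201 rows and 4 columns
```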
https://www.encyclopedia-titanica.org
Dawson, Robert J. MacG. (1995). The ‘Unusual Episode’ Data Revisited. Journal of Statistics Education, 3. https://doi.org/10.1080/10691898.1995.11910499.
Other datasets:
blood
,
breastcancer
,
car
,
contraceptive
,
creditapproval
,
fertility
,
forestfires
,
heart.cost
,
heart.test
,
heart.train
,
heartdisease
,
iris.v
,
mushrooms
,
sonar
,
voting
,
wine
A dataset of votes for each of the U.S. House of Representatives Congressmen on the 16 key votes identified by the CQA.
voting
A data frame containing 435 rows and 16 columns.
handicapped-infants, logical (TRUE, FALSE)
water-project-cost-sharing, logical (TRUE, FALSE)
adoption-of-the-budget-resolution, logical (TRUE, FALSE)
physician-fee-freeze, logical (TRUE, FALSE)
el-salvador-aid, logical (TRUE, FALSE)
religious-groups-in-schools, logical (TRUE, FALSE)
anti-satellite-test-ban, logical (TRUE, FALSE)
aid-to-nicaraguan-contras, logical (TRUE, FALSE)
mxmissile, logical (TRUE, FALSE)
immigration, logical (TRUE, FALSE)
synfuels-corporation-cutback, logical (TRUE, FALSE)
education-spending, logical (TRUE, FALSE)
superfund-right-to-sue, logical (TRUE, FALSE)
crime, logical (TRUE, FALSE)
duty-free-exports, logical (TRUE, FALSE)
export-administration-act-south-africa, logical (TRUE, FALSE)
Criterion: Were the voters Democratic (or Republican) congressmen?
Values: TRUE
(democrat) / FALSE
(republican) (61.52% vs. 38.48%).
The CQA lists nine different types of votes: Voted for, paired for, and announced for (these three simplified to yea), voted against, paired against, and announced against (these three simplified to nay), voted present, voted present to avoid conflict of interest, and did not vote or otherwise make a position known (these three simplified to an unknown disposition).
We made the following enhancements to the original data for improved usability:
Any missing values, denoted as "?" in the dataset, were transformed into NAs.
Binary factor variables with exclusive "y" and "n" values were converted to logical TRUE/FALSE vectors.
The binary character criterion variable with exclusive "democrat" and "republican" values was converted to a logical TRUE/FALSE
vector.
Other than that, the data remains consistent with the original dataset.
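The three enhancements listed above can be sketched in base R on a toy data frame. The values and the column names `party`, `vote1`, and `vote2` are made up for illustration and are not part of the dataset.

```r
# Toy stand-in for the raw data (illustrative values only):
raw <- data.frame(
  party = c("democrat", "republican", "democrat"),
  vote1 = c("y", "n", "?"),
  vote2 = c("?", "y", "n"),
  stringsAsFactors = FALSE
)

clean <- raw
vote_cols <- setdiff(names(clean), "party")

clean[vote_cols] <- lapply(clean[vote_cols], function(v) {
  v[v == "?"] <- NA  # missing votes ("?") become NA
  v == "y"           # "y" becomes TRUE, "n" becomes FALSE, NA stays NA
})

# Recode the criterion as a logical vector:
clean$party <- clean$party == "democrat"

clean
```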
https://archive.ics.uci.edu/ml/datasets/Congressional+Voting+Records
Congressional Quarterly Almanac, 98th Congress, 2nd session 1984, Congressional Quarterly Inc., Volume XL. Washington, D.C., 1985.
Other datasets:
blood
,
breastcancer
,
car
,
contraceptive
,
creditapproval
,
fertility
,
forestfires
,
heart.cost
,
heart.test
,
heart.train
,
heartdisease
,
iris.v
,
mushrooms
,
sonar
,
titanic
,
wine
Chemical and tasting data from wines in Northern Portugal.
wine
A data frame containing 6497 rows and 13 columns.
fixed acidity (numeric)
volatile acidity (numeric)
citric acid (numeric)
residual sugar (numeric)
chlorides (numeric)
free sulfur dioxide (numeric)
total sulfur dioxide (numeric)
density (numeric)
pH value (numeric)
sulphates (numeric)
alcohol (numeric)
quality (numeric, score between 0 and 10)
Criterion: Is the wine red
or white
? (24.61% vs. 75.39%)
http://archive.ics.uci.edu/ml/datasets/Wine+Quality
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis (2009). Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, 47 (4), 547–553. https://doi.org/10.1016/j.dss.2009.05.016
Other datasets:
blood
,
breastcancer
,
car
,
contraceptive
,
creditapproval
,
fertility
,
forestfires
,
heart.cost
,
heart.test
,
heart.train
,
heartdisease
,
iris.v
,
mushrooms
,
sonar
,
titanic
,
voting
write_fft_df
writes
the definition of a single FFT (as a tidy data frame)
into the one-line FFT definition used by an FFTrees
object.
write_fft_df
allows turning individual tree definitions
into the one-line FFT definition format
used by an FFTrees
object.
read_fft_df
provides the inverse functionality.
write_fft_df(fft, tree = -99L)
fft |
One FFT definition (as a data frame in tidy format, with one row per node). |
tree |
The ID of the to-be-written FFT (as an integer).
Default: |
An FFT definition in the one-line
FFT definition format used by an FFTrees
object
(as a data frame).
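A hedged sketch of how the tree definition functions might be combined, assuming the FFTrees package is installed. The calls to `select_nodes` and `write_fft_df` follow the usage shown in this manual; the argument names of `read_fft_df` (`ffts_df`, `tree`) are an assumption, as its usage is not shown in this section.

```r
library(FFTrees)

# Create FFTs for the heart disease data:
x <- FFTrees(formula = diagnosis ~ ., data = heart.train)

# Get all FFT definitions and read tree 1 as a tidy data frame:
ffts_df <- get_fft_df(x)
fft <- read_fft_df(ffts_df, tree = 1)  # assumed argument names

# Keep at most the first two nodes:
keep <- seq_len(min(2, nrow(fft)))
short_fft <- select_nodes(fft, nodes = keep)

# Write the trimmed FFT back into the one-line definition format:
one_line <- write_fft_df(short_fft, tree = 1)
one_line
```

The resulting one-line definition could then be collected with `add_fft_df` alongside other trimmed trees.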
get_fft_df
for getting the FFT definitions of an FFTrees
object;
read_fft_df
for reading one FFT definition from tree definitions;
add_fft_df
for adding FFTs to tree definitions;
FFTrees
for creating FFTs from and applying them to data.
Other tree definition and manipulation functions:
add_fft_df()
,
add_nodes()
,
drop_nodes()
,
edit_nodes()
,
flip_exits()
,
get_fft_df()
,
read_fft_df()
,
reorder_nodes()
,
select_nodes()