YaDT from command line

dTcmd is a command-line program that uses some of the features of the YaDT C++ classes to build decision trees. It takes a metadata table and a training table as input and constructs a decision tree. Command-line options set the minimum number of cases required to split a node and the confidence limits used when pruning the tree. An optional test table and an optional scoring table may also be specified. Tables can be stored as comma-separated text files, as gzipped text files, or in an internal binary format. Built trees can be saved as PMML-compliant XML documents, as text files, or in binary format.

Command line arguments:

> dTcmd32 <input options> <tree options> <output options>

dTcmd64 is the 64-bit build of dTcmd32. It runs 10-15% faster than dTcmd32 on both Windows and Linux.

Command line options:


Input data options

Input data options for dTcmd:

The option -f <file> is a shorthand for -fm <file>.names -fd <file>.data
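
The shorthand can be pictured as a simple rewrite of the argument list. The Python sketch below is only an illustration of that expansion: `expand_f_option` is a hypothetical helper, not part of YaDT, and it does not invoke dTcmd.

```python
def expand_f_option(args):
    """Rewrite every '-f <file>' pair as '-fm <file>.names -fd <file>.data'.

    Hypothetical helper for illustration only; dTcmd performs this
    expansion itself when it sees -f.
    """
    expanded = []
    i = 0
    while i < len(args):
        if args[i] == "-f" and i + 1 < len(args):
            stem = args[i + 1]
            expanded += ["-fm", stem + ".names", "-fd", stem + ".data"]
            i += 2  # skip the option and its value
        else:
            expanded.append(args[i])
            i += 1
    return expanded


long_form = expand_f_option(["-f", "golf", "-tstd"])
print(" ".join(long_form))
```

Under this reading, `-f golf` and `-fm golf.names -fd golf.data` describe the same pair of input files.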

Tables are represented either:

Text files and gzipped text files can be mixed (e.g., the metadata in a gzipped text file and the training data in a plain text file).


Tree construction options

The following parameters affect the tree construction algorithm:


Output options

The following options affect the outputs of dTcmd:

Zero, one, or more of these options can be specified.


Text files

Text files encode tables in comma-separated format. To change the separator to the character c, use the option -sep <c>. For instance, -sep " " switches to space-separated columns. The special string "?" represents unknown/null values.
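
To make the format concrete, here is a minimal Python sketch of reading such a file outside dTcmd. The function name `read_table` is an assumption for illustration, and the sketch uses a plain split on the separator; whether dTcmd supports quoting or escaping is not stated here, so none is assumed.

```python
def read_table(text, sep=","):
    """Split separated text into rows, mapping the string "?" to None."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore empty lines
        rows.append([None if cell == "?" else cell for cell in line.split(sep)])
    return rows


comma_rows = read_table("sunny,85,?,false\nrain,?,80,true\n")
space_rows = read_table("sunny 85 ? false\n", sep=" ")
print(comma_rows)
print(space_rows)
```

The second call mirrors what -sep " " selects: the same table layout with spaces between columns.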


Gzipped text files

Gzipped text files are files with suffix .gz obtained by compressing text files with gzip.
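
Such files can also be produced or inspected with Python's standard gzip module. The sketch below is an illustration only: `open_table` is a hypothetical helper that picks the reader from the file suffix, and the temporary file stands in for a real data file.

```python
import gzip
import os
import tempfile


def open_table(path):
    """Open a table file as text, decompressing when the name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


# Write a one-row gzipped table to a temporary directory, then read it back.
tmp_dir = tempfile.mkdtemp()
gz_path = os.path.join(tmp_dir, "golf.data.gz")
with gzip.open(gz_path, "wt", encoding="utf-8") as out:
    out.write("sunny,85,85,false,1,Don't Play\n")

with open_table(gz_path) as table:
    first_line = table.readline().rstrip("\n")
print(first_line)
```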


Metadata table

Metadata tables have three columns, which in order represent:

For instance, the file golf.names

outlook,string,discrete
temperature,integer,continuous
humidity,integer,continuous
windy,string,discrete
toPlay,string,class

describes training data consisting of the following columns:
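
As a rough illustration, each row of golf.names can be split into its three fields with a few lines of Python. The labels `name`, `data_type`, and `column_type` are assumptions inferred from the example rows, and `parse_metadata` is a hypothetical helper, not a YaDT function.

```python
GOLF_NAMES = """outlook,string,discrete
temperature,integer,continuous
humidity,integer,continuous
windy,string,discrete
toPlay,string,class
"""


def parse_metadata(text, sep=","):
    """Turn each three-field metadata row into a small dictionary."""
    columns = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore empty lines
        name, data_type, column_type = line.split(sep)
        columns.append(
            {"name": name, "data_type": data_type, "column_type": column_type}
        )
    return columns


columns = parse_metadata(GOLF_NAMES)
for col in columns:
    print(col["name"], col["data_type"], col["column_type"])
```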


Training data table

Training data tables have one column for each row of the metadata table, in the same order as those rows. Unknown values are not allowed in columns whose type is weights or class. Here is the golf.data training data file:

sunny,85,85,false,1,Don't Play
sunny,80,90,true,1,Don't Play
overcast,83,78,false,1.5,Play
rain,70,96,false,0.8,Play
rain,68,80,false,2,Play
rain,65,70,true,1,Don't Play
overcast,64,65,true,2.5,Play
sunny,72,95,false,1,Don't Play
sunny,69,70,false,1,Play
rain,75,80,false,1.5,Play
sunny,75,70,true,3,Play
overcast,72,90,true,1.5,Play
overcast,81,75,false,1,Play
rain,71,80,true,1,Don't Play
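
A quick way to sanity-check a training file before building a tree is to total the case weights per class. The Python sketch below does this for the rows above. It assumes that the fifth field is the case weight and the sixth the class; neither position is stated in the text, so treat both as assumptions. `weight_per_class` is a hypothetical helper.

```python
GOLF_DATA = """sunny,85,85,false,1,Don't Play
sunny,80,90,true,1,Don't Play
overcast,83,78,false,1.5,Play
rain,70,96,false,0.8,Play
rain,68,80,false,2,Play
rain,65,70,true,1,Don't Play
overcast,64,65,true,2.5,Play
sunny,72,95,false,1,Don't Play
sunny,69,70,false,1,Play
rain,75,80,false,1.5,Play
sunny,75,70,true,3,Play
overcast,72,90,true,1.5,Play
overcast,81,75,false,1,Play
rain,71,80,true,1,Don't Play
"""


def weight_per_class(text, weight_index=4, class_index=5, sep=","):
    """Sum the (assumed) weight column for each value of the (assumed) class column."""
    totals = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore empty lines
        fields = line.split(sep)
        label = fields[class_index]
        totals[label] = totals.get(label, 0.0) + float(fields[weight_index])
    return totals


totals = weight_per_class(GOLF_DATA)
for label, total in sorted(totals.items()):
    print(label, round(total, 2))
```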


Binary data table

dTcmd can save and load a binary file containing a binary representation of a metadata table and a training table (see options -bd <file> and -db <file>). Binary input/output is faster, and binary files are much smaller than the corresponding text files. However, binary files are not guaranteed to be readable by future or past versions of YaDT!


Binary tree

dTcmd can save and load a binary file containing a binary representation of a decision tree (see options -bt <file> and -tb <file>). Binary files are not guaranteed to be readable by future or past versions of YaDT!


XML tree

dTcmd can save a PMML-compliant XML representation of the built tree to a file or to standard output (see options -x <file> and -xstd).


Confusion matrix and text trees

dTcmd can save a text representation of the built tree, and of the confusion matrices over the training and test data, to a file or to standard output (see options -t <file> and -tstd).


Verbose log

dTcmd can save a verbose log of the computation in progress to a file or to standard output (see options -l <file> and -lstd).


Test data table

A test data table has exactly the same format as a training data table.


Score data table

A score data table has the same format as a training data table, with the following exceptions:

An example score file for the golf example is the following:

overcast,80,75,false,1
rain,90,75,true,2
sunny,98,82,false,3
sunny,80,75,true,4
overcast,90,75,false,5
rain,78,82,false,6


Scored data table

Scoring a score data table with a tree yields a scored data table, written as a text file that contains, in the same row order as the score data table:

An example scored data file for the golf score data table is the following:

1,Play,1
2,Don't Play,1
3,Don't Play,0.8
4,Play,1
5,Play,0.9
6,Play,1
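
In the two golf examples, the last field of each score row matches the first field of a scored row. Assuming that this field can be used as a join key (an assumption read off the examples, not a documented guarantee), the Python sketch below pairs each input row with its scored output. `pair_rows` is a hypothetical helper, and the third scored field is kept as a plain number without assuming its meaning.

```python
SCORE = """overcast,80,75,false,1
rain,90,75,true,2
sunny,98,82,false,3
sunny,80,75,true,4
overcast,90,75,false,5
rain,78,82,false,6
"""

SCORED = """1,Play,1
2,Don't Play,1
3,Don't Play,0.8
4,Play,1
5,Play,0.9
6,Play,1
"""


def pair_rows(score_text, scored_text, sep=","):
    """Match score rows to scored rows through the shared key field."""
    scored = {}
    for line in scored_text.splitlines():
        if not line.strip():
            continue
        key, label, value = line.split(sep)
        scored[key] = (label, float(value))

    paired = []
    for line in score_text.splitlines():
        if not line.strip():
            continue
        fields = line.split(sep)
        key = fields[-1]  # assumed join key: last field of the score row
        label, value = scored[key]
        paired.append((fields[:-1], label, value))
    return paired


paired = pair_rows(SCORE, SCORED)
for attributes, label, value in paired:
    print(",".join(attributes), "->", label, value)
```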