YaDT: Yet another Decision Tree builder

Version 1.2.5 (October 2010)
(c) Salvatore Ruggieri, 2002-2010
http://www.di.unipi.it/~ruggieri

The C4.5 decision tree induction algorithm [4] is a standard reference among classification models in data mining and machine learning. Previous work [3] introduced EC4.5, a patch to the original C4.5 implementation that greatly improves its time efficiency (up to 5X on public datasets [6]). Building on the results of [3] and on some further optimizations, a new implementation has been designed and written from scratch in standard C++. This new implementation, called YaDT [1], provides the following benefits:

and still it improves over

YaDT has recently been enhanced [2] with parallelism on multi-core machines (for now, Linux only) using the FastFlow library. This yields up to a 2.7X speedup over sequential YaDT on a typical quad-core desktop machine.

YaDT is distributed free for research and/or educational purposes. See the licence.

References

[1] S. Ruggieri. YaDT: Yet another Decision Tree builder. 16th International Conference on Tools with Artificial Intelligence (ICTAI 2004): 260-265. IEEE Press, November 2004.

[2] M. Aldinucci, S. Ruggieri, M. Torquati. Porting Decision Tree Algorithms to Multicore using FastFlow. 21st European Conference on Machine Learning and 14th Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2010), Part I: 7-23. Vol. 6321 of LNCS, Springer, September 2010.

[3] S. Ruggieri. Efficient C4.5. IEEE Transactions on Knowledge and Data Engineering, 14(2):438-444, March-April 2002.

[4] J.R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.

[5] Data Mining Group. Predictive Model Markup Language (PMML), version 2.0, http://www.dmg.org

[6] S. Hettich and S.D. Bay. The UCI KDD Archive, Irvine, CA: University of California, Department of Information and Computer Science. http://kdd.ics.uci.edu