Discrimination discovery

Decision trees

  • YaDT: Yet another Decision Tree builder. YaDT is a new from-scratch implementation of the entropy-based tree construction algorithm. It has been designed and implemented in C++ with a strong emphasis on efficiency (time and space) and portability (Windows/Linux, 32/64-bit executables).

    Obtaining software: YaDT 1.2.5 (October 2010) with 32/64-bit libraries for Visual Studio 2010/GCC 4.1; YaDT 1.2.3 (February 2007) with libraries for Visual Studio 2005/GCC 4.0; and YaDT 1.2.1 (January 2005) with libraries for Visual Studio 2003/GCC 3.2.

    Reference papers:

  • Efficient C4.5. Following an analytic evaluation of the run-time behavior of the C4.5 algorithm, EC4.5 is a more efficient version of the C4.5 decision tree construction algorithm. It improves on C4.5 by adopting the best among three strategies for computing the information gain of continuous attributes. EC4.5 builds the same decision trees as C4.5 while running up to 5 times faster.

    Obtaining software: a patch from C4.5 release 8 to EC4.5 Beta 1.0 for Linux platforms is available here for educational and research use. I suggest, however, downloading the even faster YaDT tree builder.

    Reference paper: S. Ruggieri. Efficient C4.5. IEEE Transactions on Knowledge and Data Engineering. Vol 14, Issue 2, March-April 2002, 438-444.

    For other decision tree software, see www.KDnuggets.com (Analytics and Data Mining Resources).
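    Both tree builders above rely on entropy-based information gain, and EC4.5's improvements concern how the gain of continuous attributes is computed. The minimal Python sketch below, assuming plain information gain over a binary split `value <= threshold`, shows the common sort-and-scan way of evaluating every candidate threshold of one continuous attribute in a single pass. It is not code from YaDT or EC4.5, it does not implement EC4.5's three strategies, and further refinements C4.5 applies (such as the gain ratio criterion) are omitted; the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(counts):
    """Shannon entropy, in bits, of a class distribution given as counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def best_threshold(values, labels):
    """Return (threshold, gain) for the binary split value <= threshold that
    maximizes information gain, or (None, 0.0) if no split has positive gain.

    Cases are sorted once by attribute value; a single left-to-right sweep
    then moves one case at a time from the right partition to the left one,
    so class counts for both sides are always available without rescanning.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    left, right = Counter(), Counter(labels)
    base = entropy(list(right.values()))
    best = (None, 0.0)
    for i in range(n - 1):
        value, label = pairs[i]
        left[label] += 1
        right[label] -= 1
        if pairs[i + 1][0] == value:
            continue  # a threshold must separate distinct values
        n_left = i + 1
        cond = (n_left / n) * entropy(list(left.values())) \
            + ((n - n_left) / n) * entropy(list(right.values()))
        gain = base - cond
        if gain > best[1]:
            best = (value, gain)
    return best
```

    For instance, `best_threshold([1, 2, 3, 4], ['a', 'a', 'b', 'b'])` returns `(2, 1.0)`: splitting at 2 separates the two classes perfectly. Sorting dominates the cost, so one attribute is evaluated in O(n log n) time.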

Polyhedral analysis

  • Learning from Polyhedral Sets. The software implements a learning procedure that abstracts a collection of polyhedra (solutions of linear systems) into a minimal and representative parameterized linear system. It also checks whether a given polyhedron can be obtained from some instance of the parameters and, if so, computes the parameter values. The software is written in SWI Prolog.

    Obtaining software: lps 1.0 (August 2013).

    Reference papers:
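    The text does not describe how lps represents parameterized systems. As a deliberately simplified, hypothetical illustration of checking whether a polyhedron is obtainable from some parameter instance and computing the parameter values, the sketch below assumes a template in which the constraint coefficients are fixed and only the right-hand-side constants are parameters. It matches constraints syntactically, up to positive scaling, rather than testing polyhedron equality semantically, and it is written in Python rather than SWI Prolog; all names are hypothetical.

```python
from fractions import Fraction


def scale_factor(row, coeffs):
    """Return lam > 0 with coeffs == lam * row, or None if none exists."""
    if len(row) != len(coeffs):
        return None
    pivot = next((k for k, a in enumerate(row) if a != 0), None)
    if pivot is None:
        return None  # zero template rows are not allowed in this sketch
    lam = Fraction(coeffs[pivot]) / Fraction(row[pivot])
    if lam <= 0 or any(Fraction(c) != lam * a for a, c in zip(row, coeffs)):
        return None
    return lam


def instance_parameters(template, polyhedron):
    """Match a polyhedron, given as constraints (coeffs, rhs) meaning
    coeffs . x <= rhs, against a hypothetical template of fixed coefficient
    rows whose right-hand sides p_j are the parameters.

    Returns the parameter values (exact Fractions) if every constraint is
    matched to a distinct template row and every row is used; else None.
    """
    params = [None] * len(template)
    for coeffs, rhs in polyhedron:
        for j, row in enumerate(template):
            lam = scale_factor(row, coeffs)
            if lam is not None and params[j] is None:
                params[j] = Fraction(rhs) / lam  # normalize to the row's scale
                break
        else:
            return None  # constraint not matched by any unused template row
    return params if all(p is not None for p in params) else None
```

    With the triangle template `[(1, 0), (0, 1), (-1, -1)]`, that is x <= p0, y <= p1, -x - y <= p2, the polyhedron 2x <= 4, y <= 3, -x - y <= 0 yields the parameter values 2, 3 and 0 (as `Fraction` objects). Exact rational arithmetic avoids spurious mismatches from floating-point rounding.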

  • Typing linear constraints and moding CLP(R) programs. The software implements a type system for linear constraints and a well-moding checker for CLP(R) programs.

    Obtaining software: clpt 1.3 beta (November 2008).

    Reference paper: S. Ruggieri, F. Mesnard. Typing linear constraints. ACM Transactions on Programming Languages and Systems. Vol 32, Issue 6, July 2010, Article 21.
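    The type system and moding checker are not detailed here. As a rough, hypothetical illustration of one reasoning step that a moding analysis for linear constraints can rely on, the sketch below propagates groundness to a fixpoint: a linear equation in which every variable with a nonzero coefficient is ground except one determines that remaining variable, which therefore becomes ground too. Inequalities do not fix a value and are left out. This is not the clpt type system, and the names and data layout are assumptions.

```python
def propagate_groundness(equations, ground):
    """Compute the variables that become ground at fixpoint.

    equations: list of dicts mapping variable name -> coefficient, each
    standing for sum(coef * var) = constant (the constant is omitted since
    it does not affect groundness). ground: variables known to be ground.
    """
    ground = set(ground)  # copy, so the caller's set is not modified
    changed = True
    while changed:
        changed = False
        for eq in equations:
            # Variables with a zero coefficient do not constrain anything.
            free = [v for v, a in eq.items() if a != 0 and v not in ground]
            if len(free) == 1:
                # One unknown with a nonzero coefficient: its value is fixed.
                ground.add(free[0])
                changed = True
    return ground
```

    For example, with the equations x + y = c1 and 2y - z = c2 and only `x` ground, the loop first grounds `y` and then `z`. Each pass that reports a change adds a new variable, so the loop terminates.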

Environments for Knowledge Discovery in Databases

  • KDD Markup Language - Mining Query Language. KDDML-MQL is an environment that supports the specification and execution of complex Knowledge Discovery in Databases (KDD) processes in the form of high-level queries. The environment consists of two layers: the bottom one, called KDDML, and the top one, called MQL.

    Obtaining software: visit the KDDML/MQL web site.

    Reference paper: A. Romei, S. Ruggieri, F. Turini. KDDML: a middleware language and system for knowledge discovery in databases. Data and Knowledge Engineering. Vol 57, Issue 2, May 2006, 179-220.