Oracle® Data Mining Application Developer's Guide,
10g Release 2 (10.2)

Part Number B14340-01

6 Java API Overview

This chapter introduces the new Oracle Data Mining Java API. You can use the Java API to create thin client applications that access the rich data mining functionality within the Oracle Database.

The ODM Java API is an Oracle implementation of the Java Data Mining (JDM) 1.0 standard API for data mining. The ODM Java API implements Oracle-specific extensions to JDM 1.0, in compliance with the JSR-73 standards extension framework. The full range of data mining functions and algorithms available in the Database, including the new predictive analytics features in the DBMS_PREDICTIVE_ANALYTICS PL/SQL package, is exposed through the ODM Java API.

The ODM Java API replaces the proprietary Java API for data mining that was available with Oracle 10.1. It is fully compatible with the Oracle 10g Release 2 (10.2) PL/SQL API for data mining.

This chapter includes the following topics:

6.1 The JDM 1.0 Standard

JDM 1.0 is an industry standard Java API for data mining, developed under the Java Community Process (JCP). It defines Java interfaces that vendors can implement for their Data Mining Engines.

JDM interfaces support mining functions, including classification, regression, clustering, attribute importance, and association, and specific mining algorithms, including naïve Bayes, support vector machines, decision trees, and k-means.

For a complete description of the JDM 1.0 standard, visit the JSR-000073 Data Mining API page of the Java Community Process Web Site.

http://jcp.org/aboutJava/communityprocess/final/jsr073

You can download the JDM 1.0 javadoc from the Oracle Data Mining page of the Oracle Technology Network.

http://www.oracle.com/technology/products/bi/odm/index.html

The Java packages defined by the JDM standard are summarized in Table 6-1.

Table 6-1 JDM 1.0 Standard High-Level Packages

Package Description

javax.datamining

Defines the classes and interfaces used in JDM subpackages.

javax.datamining.base

Defines the interfaces for top-level objects and interfaces. This package was introduced to avoid cyclic package dependencies.

javax.datamining.resource

Defines objects that support connecting to the Data Mining Server and executing tasks.

javax.datamining.data

Defines objects that support logical and physical data, model signature, taxonomy, category set, and the generic superclass category matrix.

javax.datamining.statistics

Defines objects that support attribute statistics.

javax.datamining.rule

Defines objects that support rules and their predicate components.

javax.datamining.task

Defines objects that support tasks for building, computing statistics, importing, and exporting models. The task package has an optional apply subpackage, which is mainly used for supervised and clustering functions.

javax.datamining.association

Defines objects that support the build settings and model for association rules.

javax.datamining.clustering

Defines objects that support the build settings, models and apply output for clustering.

javax.datamining.attributeimportance

Defines objects that support the build settings and model for attribute importance.

javax.datamining.supervised

Defines objects that support the build settings and model for supervised learning functions. This package includes optional subpackages for classification and regression and a test task that is common to both.

javax.datamining.algorithm

Defines objects that support algorithm-specific settings. This package has optional subpackages for different algorithms.

javax.datamining.modeldetail

Defines objects that support the details of various model representations. This package includes optional subpackages for different types of models.


6.2 Oracle Extensions to JDM 1.0

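The ODM Java API adds functionality that is not part of the JDM standard. The Oracle extensions to the JDM API provide major additional features, including the feature extraction mining function, the NMF, O-Cluster, and ABN algorithms, and data transformations.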

See Also:

Oracle Data Mining Java API Reference (javadoc) for detailed information about the ODM Java API.

The Java packages defined by the Oracle extensions to the JDM standard are summarized in Table 6-2.

Table 6-2 Oracle High-Level Packages that Extend the JDM 1.0 Standards

Package Description

oracle.dmt.jdm.featureextraction

Defines objects related to feature extraction, which supports the scoring operation.

oracle.dmt.jdm.algorithm.nmf

Defines objects related to the Non-Negative Matrix Factorization (NMF) algorithm.

oracle.dmt.jdm.algorithm.ocluster

Defines objects related to the Orthogonal Partitioning Clustering (O-Cluster) algorithm.

oracle.dmt.jdm.algorithm.abn

Defines objects related to the Adaptive Bayes Network (ABN) classification algorithm.

oracle.dmt.jdm.transform

Defines objects related to data transformations.


6.3 Principal Objects in the ODM Java API

In the JDM standard API, named objects are objects that can be saved using the saveObject method of a Connection instance. All named objects inherit from the javax.datamining.MiningObject interface.

The JDM standard supports both persistent and transient named objects. Persistent objects (persistentObject) are saved permanently in the database. Transient objects (transientObject) exist only for the duration of the session.

The persistent and transient named objects supported by the Oracle extensions to the JDM API are listed in Table 6-3.

Table 6-3 Named Objects in ODM Java API

Persistent Objects      Transient Objects
Model                   ApplySettings
BuildSettings           PhysicalDataSet
Task
CostMatrix
TestMetrics

Note:

The LogicalData and Taxonomy objects in the standard JDM API are not supported by Oracle.
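The persistent/transient split in Table 6-3 can be illustrated with a small model of object lifetimes. The following is a minimal sketch that uses only the standard Java library. The class NamedObjectStoreSketch, its Kind enum, and its saveObject, exists, and endSession methods are hypothetical illustrations, not the javax.datamining Connection API, whose saveObject method has its own signature.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the javax.datamining API: it models only how long
// named objects live, using the persistent/transient split of Table 6-3.
final class NamedObjectStoreSketch {

    // Object kinds from Table 6-3, each tagged with its lifetime.
    enum Kind {
        MODEL(true), BUILD_SETTINGS(true), TASK(true), COST_MATRIX(true), TEST_METRICS(true),
        APPLY_SETTINGS(false), PHYSICAL_DATA_SET(false);

        final boolean persistent;

        Kind(boolean persistent) {
            this.persistent = persistent;
        }
    }

    // Saved objects by name. In ODM, persistent objects live in the database;
    // this sketch keeps everything in one in-memory map for simplicity.
    private final Map<String, Kind> saved = new HashMap<>();

    // Hypothetical counterpart of saving a named object under a name.
    void saveObject(String name, Kind kind) {
        saved.put(name, kind);
    }

    boolean exists(String name) {
        return saved.containsKey(name);
    }

    // Ending the session discards transient objects and keeps persistent ones.
    void endSession() {
        saved.values().removeIf(kind -> !kind.persistent);
    }
}
```

After endSession, a saved Model is still present while a saved PhysicalDataSet is gone, mirroring the two columns of Table 6-3.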

The named objects in the ODM Java API are described in the following sections.

6.3.1 PhysicalDataSet Object

A PhysicalDataSet object refers to the data to be used as input to a data mining operation. In JDM, PhysicalDataSet objects reference specific data through a Uniform Resource Identifier (URI), which could specify a table, a file, or some other data source.

In the ODM Java API, a PhysicalDataSet must reference a table or a view within the database instance referenced in the Connection. The syntax of a physical data set URI in the ODM Java API is the Oracle syntax for specifying a table or a view:

[SchemaName.]TableName

or

[SchemaName.]ViewName
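The URI syntax above is simple enough to check on the client side. The following is a minimal sketch, assuming only simple unquoted Oracle identifiers (quoted names are rejected); the class DataSetUri and its parse method are hypothetical helpers, not part of the ODM Java API. It upper-cases names because Oracle stores unquoted identifiers in upper case.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the ODM Java API: checks a physical data
// set URI against the [SchemaName.]TableName or [SchemaName.]ViewName syntax.
// Only simple unquoted identifiers are handled; quoted names are rejected.
final class DataSetUri {
    // Unquoted identifier: a letter, then letters, digits, '_', '$' or '#'.
    private static final String IDENT = "[A-Za-z][A-Za-z0-9_$#]*";
    private static final Pattern SYNTAX =
            Pattern.compile("(?:(" + IDENT + ")\\.)?(" + IDENT + ")");

    final Optional<String> schema; // empty when no schema prefix is given
    final String objectName;       // table or view name

    private DataSetUri(Optional<String> schema, String objectName) {
        this.schema = schema;
        this.objectName = objectName;
    }

    // Parses the whole URI; unquoted names are upper-cased, as Oracle stores them.
    static DataSetUri parse(String uri) {
        Matcher m = SYNTAX.matcher(uri.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a [SchemaName.]TableName URI: " + uri);
        }
        Optional<String> schema =
                Optional.ofNullable(m.group(1)).map(s -> s.toUpperCase(Locale.ROOT));
        return new DataSetUri(schema, m.group(2).toUpperCase(Locale.ROOT));
    }
}
```

For example, parsing "dmuser.mining_data_build_v" yields schema DMUSER and object MINING_DATA_BUILD_V, while a three-part name such as "a.b.c" is rejected.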

In JDM, PhysicalDataSet objects can support multiple data representations. Oracle Data Mining supports two types of data representation: single-record case and wide data. The Oracle implementation requires users to specify the case-id column in the physical data set. Refer to Oracle Data Mining Concepts for more details.

In the ODM Java API, a PhysicalDataSet object is transient. It is stored in the Connection as an in-memory object.

6.3.2 BuildSettings Object

A BuildSettings object captures the high-level specifications used to build a model. The ODM Java API supports the following mining functions: classification, regression, attribute importance, association, clustering, and feature extraction.

A BuildSettings object can specify the desired type of result without identifying a particular algorithm. If an algorithm is not specified in the BuildSettings object, the Data Mining Server (DMS) selects an algorithm based on the build settings and the characteristics of the data.

BuildSettings has a verify method, which checks that the input specifications for a model satisfy the requirements of the ODM Java API.
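The idea of verifying settings before a build, with the algorithm left optional, can be sketched as follows. This is a hypothetical illustration using only the standard Java library, not the ODM verify method: the class BuildSettingsSketch, its enums, and its rules (supervised functions need a target; a chosen algorithm must support the chosen function) are assumptions for illustration, and the algorithm list is only the subset named in this chapter.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Optional;

// Hypothetical model of build-settings validation; not the ODM Java API.
final class BuildSettingsSketch {
    enum MiningFunction { CLASSIFICATION, REGRESSION, CLUSTERING, FEATURE_EXTRACTION }

    // Illustrative subset of algorithms, each with the functions it supports.
    enum Algorithm {
        NAIVE_BAYES(EnumSet.of(MiningFunction.CLASSIFICATION)),
        DECISION_TREE(EnumSet.of(MiningFunction.CLASSIFICATION)),
        ADAPTIVE_BAYES_NETWORK(EnumSet.of(MiningFunction.CLASSIFICATION)),
        SVM(EnumSet.of(MiningFunction.CLASSIFICATION, MiningFunction.REGRESSION)),
        K_MEANS(EnumSet.of(MiningFunction.CLUSTERING)),
        O_CLUSTER(EnumSet.of(MiningFunction.CLUSTERING)),
        NMF(EnumSet.of(MiningFunction.FEATURE_EXTRACTION));

        final EnumSet<MiningFunction> supported;

        Algorithm(EnumSet<MiningFunction> supported) {
            this.supported = supported;
        }
    }

    final MiningFunction function;
    final Optional<Algorithm> algorithm; // empty: leave the choice to the server
    final Optional<String> target;       // required only for supervised functions

    BuildSettingsSketch(MiningFunction function, Algorithm algorithm, String target) {
        this.function = function;
        this.algorithm = Optional.ofNullable(algorithm);
        this.target = Optional.ofNullable(target);
    }

    // Returns the problems found; an empty list means the settings passed.
    List<String> verify() {
        List<String> problems = new ArrayList<>();
        boolean supervised = function == MiningFunction.CLASSIFICATION
                || function == MiningFunction.REGRESSION;
        if (supervised && !target.isPresent()) {
            problems.add(function + " requires a target attribute");
        }
        if (!supervised && target.isPresent()) {
            problems.add(function + " does not use a target attribute");
        }
        algorithm.ifPresent(a -> {
            if (!a.supported.contains(function)) {
                problems.add(a + " does not support " + function);
            }
        });
        return problems;
    }
}
```

Returning a list of problems, rather than throwing on the first one, lets a caller report every issue in the settings at once.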

In the ODM Java API, a BuildSettings object is persistent. It is stored as a table with a user-specified name in the user schema. This settings table is interoperable with the PL/SQL API for data mining. Normally, you should not modify the build settings table manually.

6.3.3 Task Object

A Task object represents all the information needed to perform a mining operation. The execute method of the Connection object is used to start the execution of a mining task.

Mining operations, which often process input tables with millions of records, can be time consuming. For this reason, the JDM API supports the asynchronous execution of mining tasks.

Mining tasks are stored as DBMS_SCHEDULER job objects in the user schema. The saved job object is in a DISABLED state until the execute method causes it to start execution.

The execute method returns a javax.datamining.ExecutionHandle object, which provides methods for monitoring an asynchronous task. ExecutionHandle methods include waitForCompletion and getStatus.
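The asynchronous pattern described above (start the task, then poll its status or block until it completes) can be sketched with standard java.util.concurrent types. This is a minimal sketch, not javax.datamining.ExecutionHandle: the class TaskHandleSketch, its Status values, and the seconds-based timeout are assumptions chosen for illustration, and the real getStatus and waitForCompletion signatures may differ.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for an execution handle, built on java.util.concurrent.
// It mirrors the idea of getStatus and waitForCompletion but is not
// javax.datamining.ExecutionHandle.
final class TaskHandleSketch {
    enum Status { EXECUTING, SUCCESS, ERROR }

    private final Future<?> future;

    private TaskHandleSketch(Future<?> future) {
        this.future = future;
    }

    // Submits the work and returns at once, so the caller is not blocked.
    static TaskHandleSketch execute(ExecutorService pool, Runnable miningWork) {
        return new TaskHandleSketch(pool.submit(miningWork));
    }

    // Non-blocking poll of the current status.
    Status getStatus() {
        return future.isDone() ? waitForCompletion(0) : Status.EXECUTING;
    }

    // Blocks for at most timeoutSeconds, then reports the status reached.
    Status waitForCompletion(long timeoutSeconds) {
        try {
            future.get(timeoutSeconds, TimeUnit.SECONDS);
            return Status.SUCCESS;
        } catch (TimeoutException e) {
            return Status.EXECUTING; // still running when the timeout expired
        } catch (ExecutionException | CancellationException e) {
            return Status.ERROR;     // the task threw or was cancelled
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return Status.EXECUTING;
        }
    }
}
```

A client that builds a model over millions of rows can call getStatus periodically to update a progress display, and call waitForCompletion only when it actually needs the result.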


6.3.4 Model Object

A Model object results from the application of an algorithm to data, as specified in a BuildSettings object.

Models can be used in several operations. They can be:

  • inspected, for example to examine the rules produced from a decision tree or association

  • tested for accuracy

  • applied to data for scoring

  • exported to an external representation such as native format or PMML

  • imported for use in the DMS

When a model is applied to data, it is submitted to the DMS for interpretation. A Model references its BuildSettings object as well as the Task that created it.

6.3.5 TestMetrics Object

A TestMetrics object results from testing a supervised model with test data. The metrics computed depend on the type of mining function. For classification models, accuracy, the confusion matrix, lift, and the receiver operating characteristic (ROC) can be computed to assess the model. Similarly, for regression models, R-squared and root mean squared (RMS) error can be computed.
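To make the metrics above concrete, the following sketch computes accuracy and a confusion matrix for classification, and RMS error and R-squared for regression, from arrays of actual and predicted values. It is a hypothetical illustration using only the standard Java library, not the ODM TestMetrics API; the class TestMetricsSketch and its method names are assumptions. R-squared is computed here as the usual coefficient of determination, 1 - SS_res / SS_tot.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helpers illustrating common test metrics; not the ODM TestMetrics API.
final class TestMetricsSketch {
    private TestMetricsSketch() {}

    // Fraction of cases whose predicted class equals the actual class.
    static double accuracy(String[] actual, String[] predicted) {
        checkLengths(actual.length, predicted.length);
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i].equals(predicted[i])) correct++;
        }
        return (double) correct / actual.length;
    }

    // Confusion matrix as: actual class -> (predicted class -> count).
    static Map<String, Map<String, Integer>> confusionMatrix(String[] actual, String[] predicted) {
        checkLengths(actual.length, predicted.length);
        Map<String, Map<String, Integer>> m = new TreeMap<>();
        for (int i = 0; i < actual.length; i++) {
            m.computeIfAbsent(actual[i], k -> new TreeMap<>())
             .merge(predicted[i], 1, Integer::sum);
        }
        return m;
    }

    // Root mean squared error of numeric predictions.
    static double rmse(double[] actual, double[] predicted) {
        checkLengths(actual.length, predicted.length);
        double sum = 0;
        for (int i = 0; i < actual.length; i++) {
            double d = actual[i] - predicted[i];
            sum += d * d;
        }
        return Math.sqrt(sum / actual.length);
    }

    // Coefficient of determination: 1 - SS_res / SS_tot.
    static double rSquared(double[] actual, double[] predicted) {
        checkLengths(actual.length, predicted.length);
        double mean = 0;
        for (double a : actual) mean += a;
        mean /= actual.length;
        double ssRes = 0, ssTot = 0;
        for (int i = 0; i < actual.length; i++) {
            ssRes += (actual[i] - predicted[i]) * (actual[i] - predicted[i]);
            ssTot += (actual[i] - mean) * (actual[i] - mean);
        }
        return 1 - ssRes / ssTot;
    }

    private static void checkLengths(int a, int b) {
        if (a != b || a == 0) throw new IllegalArgumentException("need equal, non-empty arrays");
    }
}
```

For example, with actual values {1, 2, 3, 4} and predictions {1, 2, 3, 5}, the squared errors sum to 1, so the RMS error is sqrt(1/4) = 0.5; the total sum of squares about the mean 2.5 is 5, so R-squared is 1 - 1/5 = 0.8.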

See Also:

"Testing a Model".

6.3.6 ApplySettings Object

An ApplySettings object allows users to tailor the results of an apply task. It contains a set of ordered items. Output can consist of:

  • Data to be passed through to the output from the input dataset, for example key attributes

  • Values computed from the apply itself, for example score, probability, and in the case of decision trees, rule identifiers

  • Probabilities for specific classes of a multi-class target. For example, in a classification model with target favoriteColor, users could select specific colors to receive the probability that each color is the favorite

Each mining function class defines a method to construct a default ApplySettings object. This saves programming effort when only standard output is needed. For example, typical output for a classification apply includes the top prediction and its probability.
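The ordered output items described above can be illustrated with a small sketch that builds one apply-output row from a map of class probabilities. This is a hypothetical illustration using only the standard Java library, not the ODM ApplySettings API: the class ApplyOutputSketch, its methods, and the column names CASE_ID, PREDICTION, and PROBABILITY are assumptions. A LinkedHashMap keeps the output items in the order they were added.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of ordered apply output; not the ODM ApplySettings API.
final class ApplyOutputSketch {
    private ApplyOutputSketch() {}

    // Default-style row: the passed-through case id, then the top predicted
    // class and its probability. Ties go to the first entry in iteration order.
    static Map<String, Object> defaultClassificationRow(Object caseId,
                                                        Map<String, Double> classProbabilities) {
        if (classProbabilities.isEmpty()) {
            throw new IllegalArgumentException("no class probabilities");
        }
        String top = null;
        double best = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : classProbabilities.entrySet()) {
            if (e.getValue() > best) {
                best = e.getValue();
                top = e.getKey();
            }
        }
        Map<String, Object> row = new LinkedHashMap<>(); // preserves item order
        row.put("CASE_ID", caseId);
        row.put("PREDICTION", top);
        row.put("PROBABILITY", best);
        return row;
    }

    // Probabilities for selected categories only, such as chosen favoriteColor values.
    static Map<String, Double> probabilitiesFor(Map<String, Double> classProbabilities,
                                                String... categories) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (String c : categories) {
            out.put(c, classProbabilities.getOrDefault(c, 0.0));
        }
        return out;
    }
}
```

With probabilities red 0.2, blue 0.7, and green 0.1, the default row for case 101 is CASE_ID 101, PREDICTION blue, PROBABILITY 0.7, and selecting "red" and "green" returns just those two probabilities.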