Carletti and Massimo Coppola
"Structured Parallel Programming and Shared Objects: Experiences in Data Mining Classifiers",
to appear in the proceedings of the ParCo 2001 Int'l Conference, September 2001, Naples, Italy. (Elsevier)
We propose the addition of a shared object abstraction to a
skeleton-based parallel programming language, to deal with large shared data
in dynamically behaving data-driven computations. These are commonly found in
the field of Data Mining.
We report parallel speed-up results of a task-parallel C4.5 classifier using a Shared Tree object. With respect to this implementation, we analyse the positive impact of the methodology on the fundamental characteristics of the language: expressiveness, programmability, and efficiency of applications.
Coppola and Marco Vanneschi
"High-Performance Data Mining with Skeleton-based Structured Parallel Programming"
To appear in Parallel Computing, Special Issue on Parallel Data Intensive Algorithms and Applications (2001)
We show how to apply a Structured Parallel Programming methodology
based on skeletons to Data Mining problems, reporting several results
about three commonly used mining techniques, namely association rules,
decision tree induction and spatial clustering.
We analyze the structural patterns common to these applications, looking at application performance and software engineering efficiency. Our aim is to clearly state what features a Structured Parallel Programming Environment (PPE) should have to be useful for parallel Data Mining.
Within SkIE, the skeleton-based PPE that we have developed, we study the different data access patterns of parallel implementations of Apriori, C4.5 and DBSCAN. To allow efficient development of applications on huge databases, we need to address reads of large partitions, frequent and sparse accesses to small blocks, and an irregular mix of small and large transfers.
We examine the addition of an object/component interface to the skeleton structured model, to simplify the development of environment-integrated, parallel Data Mining applications.
Domenica Arlia and Massimo Coppola
"Experiments in parallel clustering with DBSCAN",
Euro-Par 2001, Manchester, UK. LNCS 2150.
We present a new result concerning the parallelisation of DBSCAN, a Data Mining algorithm for density-based spatial clustering. The overall structure of DBSCAN has been mapped to a skeleton-structured program that performs parallel exploration of each cluster. The approach is useful to improve performance on high-dimensional data, and is general with respect to the spatial index structure used. We report preliminary results of the application running on a Beowulf with good efficiency.
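As a rough illustration of the cluster exploration mentioned above, the following sequential Python sketch expands each cluster through repeated region queries. It is a textbook-style DBSCAN with a brute-force neighbourhood search, not the skeleton-structured program described in the paper. The names `region_query` and `dbscan` and the parameters `eps` and `min_pts` are assumptions chosen for this example. The region queries are independent of one another, which is what makes them natural candidates for parallel execution.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (brute force).

    A spatial index could replace this linear scan; the scan is an
    assumption made to keep the sketch self-contained.
    """
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # points[i] is a core point: explore its whole cluster.
        labels[i] = cluster
        frontier = list(neighbours)
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                frontier.extend(j_neighbours)  # j is also a core point
        cluster += 1
    return labels
```

For example, calling `dbscan` on two well-separated groups of three points plus one isolated point, with `eps=1.5` and `min_pts=3`, yields two clusters and one noise label.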
Becuzzi, Massimo Coppola, Salvatore Ruggieri, and Marco Vanneschi
"Parallelisation of C4.5 as a Particular Divide & Conquer Computation",
3rd Workshop on High Performance Data Mining (IPDPS 2000, Cancun), LNCS 1800.
In this work we present our line of research and current results on applying structured parallel programming tools to the development of scalable data-mining applications. We discuss how to exploit the divide-and-conquer nature of the well-known C4.5 classification algorithm in spite of its in-core memory requirements. We advocate applying external memory techniques to manage the data, and we report current experimental results.
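The divide-and-conquer structure of decision tree induction can be sketched in a few lines of Python. This is a toy, in-memory illustration and not the parallelisation described in the paper: it splits on attributes in a fixed order, whereas C4.5 selects the split attribute by gain ratio, and it does no pruning. The names `build_tree`, `classify` and `majority` are assumptions made for the example.

```python
from collections import Counter, defaultdict


def majority(rows):
    """Most frequent class label among rows of the form (record, label)."""
    return Counter(label for _, label in rows).most_common(1)[0][0]


def build_tree(rows, attrs):
    """Build a decision tree by divide and conquer.

    A leaf is a class label (a string); an internal node is a tuple
    (attribute, {value: subtree}).
    """
    labels = {label for _, label in rows}
    if len(labels) == 1 or not attrs:
        return majority(rows)  # base case: emit a leaf
    # Toy choice: take attributes in the given order.
    # C4.5 would pick the attribute with the best gain ratio here.
    attr, rest = attrs[0], attrs[1:]
    # Divide: partition the rows by the value of the chosen attribute.
    partitions = defaultdict(list)
    for record, label in rows:
        partitions[record[attr]].append((record, label))
    # Conquer: each partition is an independent subproblem.
    return (attr, {v: build_tree(part, rest) for v, part in partitions.items()})


def classify(tree, record):
    """Follow the tree from the root down to a leaf label."""
    while isinstance(tree, tuple):
        attr, children = tree
        tree = children[record[attr]]
    return tree
```

Because the recursive calls on different partitions share no state, they are the natural units to evaluate independently; the amount of data each call touches shrinks as the tree deepens.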
Becuzzi, Massimo Coppola, and Marco Vanneschi
"Mining of Association Rules in Very Large Databases: a Structured Parallel Approach",
Euro-Par 1999, Toulouse, LNCS 1685.
The continuing development of new parallel architectures creates a strong demand for high-level, programmer-friendly parallel tools. We show some results on the mining of association rules, a well-known Data Mining technique, which we ported from sequential to parallel within the PQE2000/SkIE environment. The main goals achieved are the low effort spent in parallelizing the code, the machine independence of the resulting application, source code portability and performance portability. We report test results for the same parallel program on three different architectures.
Aldinucci, Massimo Coppola, and Marco Danelutto
"Rewriting skeleton programs: how to evaluate the data-parallel stream-parallel tradeoff"
Int. Workshop on Constructive Methods for Parallel Programming, CMPP 1998, Goteborg.
Some skeleton-based parallel programming models allow the programmer to
use both data and stream parallel skeletons within the same program.
It is known that particular skeleton nestings can be formally rewritten into different nestings that preserve the functional semantics. However, the kind and possibly the amount of parallelism usefully exploitable may change while rewriting takes place.
Here we discuss an original framework allowing the user (and/or the compiling tools) of a skeleton-based parallel programming language to evaluate whether or not the transformation of a skeleton program is worthwhile in terms of the final program performance. We address, in particular, the evaluation of transformations exchanging data parallel and stream parallel skeleton subtrees.
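To make the idea of semantics-preserving rewriting concrete, the Python sketch below models two skeletons as plain higher-order functions and checks that two different nestings compute the same result. It only illustrates functional equivalence: everything runs sequentially, and it contains no cost model or evaluation framework of the kind discussed in the paper. The names `pipe`, `data_map` and `run_stream` are assumptions made for this example.

```python
def pipe(*stages):
    """Stream-parallel composition: an item flows through the stages in order."""
    def run(item):
        for stage in stages:
            item = stage(item)
        return item
    return run


def data_map(worker):
    """Data-parallel skeleton: apply worker to every element of a collection."""
    def run(collection):
        return [worker(x) for x in collection]
    return run


def run_stream(skeleton, stream):
    """Apply a skeleton to each item of an input stream (sequentially here)."""
    return [skeleton(item) for item in stream]


def f(x):
    return x + 1


def g(x):
    return x * 2


# Nesting A: one data-parallel map whose worker is a two-stage pipeline.
nesting_a = data_map(pipe(f, g))
# Nesting B: a two-stage pipeline whose stages are data-parallel maps.
nesting_b = pipe(data_map(f), data_map(g))

stream = [[1, 2, 3], [4, 5]]
result_a = run_stream(nesting_a, stream)
result_b = run_stream(nesting_b, stream)
```

Both nestings return the same values for every input, but they expose parallelism differently: nesting A parallelises across the elements of one collection, while nesting B can additionally overlap the two stages on successive stream items.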