Gianluca Carletti and Massimo Coppola
"Structured Parallel Programming and Shared Objects: Experiences in Data Mining Classifiers",
to appear in the proceedings of the ParCo 2001 Int'l Conference, September 2001, Naples, Italy. (Elsevier)
We propose the addition of a shared object abstraction to a
skeleton-based parallel programming language, to deal with large shared data
in dynamically behaving data-driven computations. These are commonly found in
the field of Data Mining.
We report parallel speed-up results of a task-parallel C4.5
classifier using a Shared Tree object. With respect to this implementation, we
analyse the positive impact of the methodology on the fundamental
characteristics of the language: expressiveness, programmability, and
efficiency of applications.
Massimo Coppola and Marco Vanneschi
"High-Performance Data Mining with Skeleton-based Structured Parallel Programming"
To appear in Parallel Computing, Special Issue on Parallel Data Intensive Algorithms and Applications (2001)
We show how to apply a Structured Parallel Programming methodology
based on skeletons to Data Mining problems, reporting several results
about three commonly used mining techniques, namely association rules,
decision tree induction and spatial clustering.
We analyze the
structural patterns common to these applications, looking at
application performance and software engineering efficiency. Our aim
is to clearly state what features a Structured Parallel Programming
Environment (PPE) should have to be useful for parallel Data Mining.
Within
the skeleton-based PPE SkIE that we have developed, we study the
different patterns of data access of parallel implementations of
Apriori, C4.5 and DBSCAN. We need to address large partition reads,
frequent and sparse access to small blocks, as well as an irregular mix of
small and large transfers, to allow efficient development of
applications on huge databases.
We examine the addition of an
object/component interface to the skeleton structured model, to
simplify the development of environment-integrated, parallel Data
Mining applications.
Domenica Arlia and Massimo Coppola
"Experiments in parallel clustering with DBSCAN",
Euro-Par 2001, Manchester, UK. LNCS 2150.
We present a new result concerning the parallelisation of DBSCAN, a Data Mining algorithm for density-based spatial clustering. The overall structure of DBSCAN has been mapped to a skeleton-structured program that performs parallel exploration of each cluster. The approach is useful to improve performance on high-dimensional data, and is general with respect to the spatial index structure used. We report preliminary results of the application running on a Beowulf with good efficiency.
Primo Becuzzi, Massimo Coppola, Salvatore Ruggieri, and Marco Vanneschi
"Parallelisation of C4.5 as a Particular Divide & Conquer Computation",
3rd Workshop on High Performance Data Mining (IPDPS 2000, Cancun), LNCS 1800.
In this work we present our line of research and current results on applying structured parallel programming tools to the development of scalable data-mining applications. We discuss how to exploit the divide and conquer nature of the well-known C4.5 classification algorithm in spite of its in-core memory requirements. We advocate applying external memory techniques to manage the data, and we report current experimental results.
Primo Becuzzi, Massimo Coppola, and Marco Vanneschi
"Mining of Association Rules in Very Large Databases: a Structured Parallel Approach",
Euro-Par 1999, Toulouse, LNCS 1685.
The continuing development of new parallel architectures creates a strong demand for high-level, programmer-friendly parallel tools. We show some results regarding mining of association rules, a well-known Data Mining algorithm, which we ported from sequential to parallel within the PQE2000/SKIE environment. The main goals achieved are the low effort spent in parallelizing the code, the machine independence of the resulting application, source code portability, and performance portability. We report test results for the same parallel program on three different architectures.
Marco Aldinucci, Massimo Coppola, and Marco Danelutto
"Rewriting skeleton programs: how to evaluate the data-parallel stream-parallel tradeoff"
Int. Workshop on Constructive Methods for Parallel
Programming, CMPP 1998, Goteborg.
Some skeleton-based parallel programming models allow the programmer to
use both data and stream parallel skeletons within the same program.
It is known that particular skeleton nestings can be formally
rewritten into different nestings that preserve the functional
semantics. Indeed, the kind and possibly the amount of parallelism
usefully exploitable may change while rewriting takes place.
Here we discuss an original framework allowing the user (and/or the
compiling tools) of a skeleton-based parallel programming language to
evaluate whether or not the transformation of a skeleton program
is worthwhile in terms of the final program performance. We address,
in particular, the evaluation of transformations exchanging data
parallel and stream parallel skeleton subtrees.