Workpackage 8

Workpackage leader: Prof. Marco Danelutto, Dipartimento di Informatica, Università di Pisa

Description of the activities

Activity 1

Duration (months)

Duration (person-months): 299

Total planned cost: 3,600 ML (1859.24 KEuro)

High performance, integrated programming environment

Developing GRID/network computing applications generally requires capabilities beyond those needed in either sequential or parallel/distributed programming. Such applications must manage computations and environments that are typically dynamic, heterogeneous in both hardware and software composition, and built on resource hierarchies with different features (e.g. memory and network). Furthermore, in this context the interaction between services, data and remote resources needs to be organized. There is growing agreement in the scientific community that current programming languages and tools are not sufficient to support the development of GRID applications.

A goal of this WP is therefore the study and design of a high performance programming environment that targets GRID/netputing platforms and efficiently supports multidisciplinary applications. To realize this environment, we will exploit results from previous projects on structured parallel programming environments for parallel/distributed architectures built from LANs of homogeneous PEs (e.g. PQE2000 and ASI-PQE). We will also study the aspects that lead to the definition of an efficient high performance application support tool: usability, dynamic and heterogeneous configuration, portability, interoperability, performance reliability, fault tolerance and security.

The programming environment will be designed and implemented using component technology. This will make the design and implementation of the individual parts of the environment more efficient. It will also reduce the effort required to develop applications, since the implemented components can be reused in different combinations to achieve better results. Finally, the component based implementation will allow individual items of the environment to be reused in other contexts, possibly related to GRID/netputing but not relying on the integrated programming environment as a whole. The component space of the programming environment will be structured by identifying suitable "component compositional schemas" (patterns) that make it possible to identify federations of component systems. To this end, the algebraic properties of these composition patterns will be studied so that a formal approach to environment design can be followed.
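As an illustration of what "algebraic properties of composition patterns" can mean, the following minimal Python sketch defines two hypothetical patterns, `pipe` (sequential composition) and `farm` (independent application to every item), and checks two laws on sample data: `pipe` is associative, and a farm of a pipeline computes the same result as a pipeline of farms. The names and the sequential execution are assumptions made for the example, not part of the environment described here.

```python
from typing import Any, Callable, List

Stage = Callable[[Any], Any]


def pipe(*stages: Stage) -> Stage:
    """Sequential composition: the output of each stage feeds the next."""
    def run(x: Any) -> Any:
        for stage in stages:
            x = stage(x)
        return x
    return run


def farm(stage: Stage) -> Callable[[List[Any]], List[Any]]:
    """Apply a stage independently to each item (sequentially in this sketch)."""
    def run(items: List[Any]) -> List[Any]:
        return [stage(x) for x in items]
    return run


def inc(x: int) -> int:
    return x + 1


def dbl(x: int) -> int:
    return x * 2


data = [1, 2, 3]

# Law 1: pipe is associative, so stages can be regrouped freely.
left = pipe(pipe(inc, dbl), inc)
right = pipe(inc, pipe(dbl, inc))

# Law 2: farm distributes over pipe, so two farms can be fused into one.
fused = farm(pipe(inc, dbl))
split = pipe(farm(inc), farm(dbl))
```

Laws of this kind are what would let a tool rewrite one composition into an equivalent one with a different cost.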

The programming environment, or the components developed to implement it, will make use of cost models developed specifically to let the final user migrate applications between different execution environments while preserving their efficiency and performance. The cost models will also be used to handle the dynamic situations typical of GRID/netputing environments (e.g. to drive the choice between alternative ways of using hw/sw components).
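A cost model that drives the choice between alternatives can be sketched as follows. This is a deliberately simple, assumed model (latency plus transfer time plus compute time); the `Resource` fields, the function names and the numbers are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Resource:
    name: str
    flops: float      # sustained compute rate, operations per second
    bandwidth: float  # bytes per second towards the resource
    latency: float    # seconds to start a transfer


def estimated_time(work_ops: float, data_bytes: float, r: Resource) -> float:
    """Assumed cost model: startup latency + data transfer + computation."""
    return r.latency + data_bytes / r.bandwidth + work_ops / r.flops


def cheapest(work_ops: float, data_bytes: float,
             resources: Iterable[Resource]) -> Resource:
    """Pick the resource with the lowest estimated completion time."""
    return min(resources, key=lambda r: estimated_time(work_ops, data_bytes, r))


# Hypothetical alternatives: a slow local node and a fast but distant one.
local = Resource("local", flops=1e9, bandwidth=1e9, latency=0.0)
remote = Resource("remote", flops=1e10, bandwidth=1e6, latency=0.05)

small_input = cheapest(1e10, 1e6, [local, remote])  # little data to move
large_input = cheapest(1e10, 1e8, [local, remote])  # transfer dominates
```

With the same amount of computation, the model selects the remote node when little data must be moved and the local node when the transfer dominates.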

During the design of the components of the high performance GRID/netputing programming environment, non classical distributed cooperation patterns will be taken into account. In particular, the peer-to-peer pattern will be considered among the patterns that can model the interactions between different hw/sw entities. By adopting this innovative interaction pattern between parallel/distributed entities we expect to achieve scalability and performance improvements with respect to frameworks based solely on client/server models. These improvements will be mainly due to the absence of the bottlenecks typical of the classical client/server pattern.

Activity 1: Expected results

The main goal of this activity is the implementation of a high performance programming environment that provides programmers of parallel/distributed applications for GRID/netputing platforms with a set of very high performance components and a "structured" methodology that assists them in assembling such components into the final application. In particular, the programming environment must provide everything needed to support the typical features of GRID environments, mainly those related to dynamic behavior and heterogeneity. The component based implementation of the programming environment will allow the developed components to be used seamlessly within the integrated environment, as well as outside it. Outside the environment, the high performance components will be used to implement (parts of) applications developed with methodologies different from those used within the integrated environment, which can nevertheless share items of the integrated environment thanks to the adoption of a "standard" component definition. The design and implementation of the whole integrated programming environment will be focused on a "structured" component composition methodology. Standard ways of using the components of the environment will be provided as primitives to the programmer of (multidisciplinary) high performance applications. The programmer must be able to use such primitives with minimum effort, without having to program the low level details that current GRID tools require to achieve the same result.

The development of cost models for GRID and/or heterogeneous WAN architectures, and their use in deriving performance models for applications developed with the integrated programming environment, will serve two different goals:

- on the one hand, the development of heuristics and/or tools for developing applications whose performance is predictable, even if only to some degree of approximation

- on the other hand, the development of heuristics and/or tools that make it possible, at execution time, to manage the dynamic situations typical of GRID/netputing environments, in particular to retarget the application behavior in real time to take these dynamic features into account.

Last but not least, the adoption of non classical cooperation models, such as peer-to-peer, in the design and implementation of the integrated programming environment will make it possible to overcome some of the limits inherent in the client/server model usually adopted in network environments. In particular, by implementing all (or part of) the integrated environment with components that cooperate in a peer-to-peer way, we expect to eliminate the performance limits currently observed in this framework, which are due to the centralization that results from implementing certain services as servers accessed by a range of remote entities.

The activity is planned in phases:

- (from 0 to +12 months) feasibility study and identification of the most promising research tracks

- (from +13 to +18 months) architectural design and (from +19 to +24 months) detailed design of the integrated programming environment

- (from +25 to +36 months) implementation of the alpha, beta and 1.0 versions of the integrated programming environment

Research units involved and their tasks

Scientific lead: Laforenza Domenico

Person-months: 299

Cost: 3,600 ML (1859.24 KEuro)

Notes: High performance integrated development environment for Computational Grids

Total: 299 person-months, 3,600 ML (1859.24 KEuro)

Activity 2

Duration (months)

Duration (person-months): 120

Total planned cost: 1,404 ML (725.11 KEuro)

Resource management (information service, monitoring, scheduler)

The Grid Information Service (GIS), a fundamental component of any GRID infrastructure, is the subsystem that publishes and searches information about the resources available on the network. The main goal of the GIS is to support optimal choices of the computational resources on which a computation is scheduled, but also to identify the data and software components available on the GRID. This research activity will define and implement a scalable, fault tolerant GIS, also investigating innovative technologies such as peer-to-peer. Furthermore, we will study data models that allow the information handled by the GIS to be described and updated simply, and that allow GIS queries to be performed efficiently with respect to different application needs. An important requirement for the GIS concerns the handling of dynamic information: to avoid impacts on scalability, we want to achieve a high level of accuracy in the dynamic information distributed by the GIS without requiring the synchronizations needed to guarantee a consistent view of the global status.

A second research track will specify and develop the tools and infrastructures that allow end users and GIS services to collect information on the status of the different GRID resources, and on faults or possible error situations. Furthermore, where component based applications are designed to modify their configuration and behavior in response to dynamic changes in the execution environment, the monitoring service should provide those applications with all the information they need for that purpose. Within this research track, an environment will be designed and implemented that allows application behavior to be monitored. This feature is necessary to validate the application cost models, that is, to provide the scheduling/reconfiguration tools with information about application performance prediction. Last but not least, this research track will provide GRID managers with alarm and troubleshooting tools that can be used to identify faults and to provide explanations and hints for solving such problems.
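One common way to distribute dynamic information without global synchronization is a soft-state registry, in which each published entry simply expires unless it is refreshed. The sketch below is an assumed illustration of that idea in Python, not the GIS design itself; the class name, the attribute dictionary and the explicit `now` argument (used instead of a real clock so the example is deterministic) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Entry:
    attrs: Dict[str, float]  # resource description, e.g. {"free_cpus": 8}
    published_at: float      # time of the last update from the resource
    ttl: float               # seconds after which the entry is stale


class SoftStateRegistry:
    """Resources push updates; stale entries are ignored by queries."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def publish(self, name: str, attrs: Dict[str, float],
                now: float, ttl: float = 60.0) -> None:
        # A new publication overwrites the previous one: no locking,
        # no coordination with other resources.
        self._entries[name] = Entry(dict(attrs), now, ttl)

    def query(self, predicate: Callable[[Dict[str, float]], bool],
              now: float) -> List[str]:
        # Return the names of live entries that satisfy the predicate.
        return sorted(
            name for name, e in self._entries.items()
            if now - e.published_at <= e.ttl and predicate(e.attrs)
        )


registry = SoftStateRegistry()
registry.publish("nodeA", {"free_cpus": 8}, now=0.0)
registry.publish("nodeB", {"free_cpus": 2}, now=50.0)
live_at_55 = registry.query(lambda a: a["free_cpus"] >= 1, now=55.0)
live_at_70 = registry.query(lambda a: a["free_cpus"] >= 1, now=70.0)
```

The accuracy of a query is bounded by the time-to-live rather than by a global consistency protocol, which is the trade-off the paragraph above describes.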

A third research track within this activity concerns the policies and tools needed to allocate, co-allocate, reserve and schedule GRID resources to fulfill application requests. Consider a GRID used to implement high performance, cooperative applications: in this case the main requirement is the ability to identify a set of resources on which the software components can be co-allocated. In some cases, e.g. to satisfy QoS needs or to allow exclusive use of resources, it is necessary to implement a distributed reservation system that allows efficient scheduling. We will also study the "high-throughput" scheduling of parameter sweeping batch applications, mainly the aspects concerning the relationship between scheduling and (possibly replicated or cached) data accesses. In particular, different policies will be evaluated, concerning data or code migration, or remote access to data by application code. Moreover, within a single framework we will study the problems related to resource reservation and scheduling in platforms different from GRID. Problems related to time-shared and space-shared use of resources in this framework will be taken into account, with respect to the high performance and high throughput requirements. The study will be based on the definition of some project patterns often found in typical high performance applications and on some case studies representative of those patterns.
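The core requirement of co-allocation, that a set of resources is reserved for the same time window either completely or not at all, can be sketched as follows. This is a centralized, in-memory simplification written for illustration; a distributed reservation system would additionally need a commit protocol between sites. All names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


class ReservationTable:
    """All-or-nothing reservation of several resources for one time window."""

    def __init__(self, resources: Iterable[str]) -> None:
        # Each resource keeps a list of half-open booked intervals [start, end).
        self._booked: Dict[str, List[Tuple[float, float]]] = {
            r: [] for r in resources
        }

    def is_free(self, resource: str, start: float, end: float) -> bool:
        # Free if the window overlaps none of the existing bookings.
        return all(end <= s or start >= e for s, e in self._booked[resource])

    def co_allocate(self, resources: List[str],
                    start: float, end: float) -> bool:
        if start >= end:
            raise ValueError("start must precede end")
        # Check every resource first, so a failure leaves nothing booked.
        if not all(self.is_free(r, start, end) for r in resources):
            return False
        for r in resources:
            self._booked[r].append((start, end))
        return True


table = ReservationTable(["n1", "n2", "n3"])
first = table.co_allocate(["n1", "n2"], 0, 10)    # both free: granted
clash = table.co_allocate(["n2", "n3"], 5, 15)    # n2 busy: refused as a whole
later = table.co_allocate(["n2", "n3"], 10, 20)   # adjacent window: granted
```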

Activity 2: Expected results

From the viewpoint of the architectural design of the GIS, different strategies will be evaluated to achieve scalability; for instance, structures based on hierarchies will be compared with alternative structures based on cells that exploit peer-to-peer cooperation strategies, with a view to arriving at a joint approach. Different data models will be evaluated for describing resources (machines, software components, data, load information, resource reservations, etc.) so that complex queries on the available resources and their features can be computed.

Specific application environments may also require the development of ad hoc models. We will implement efficient information search and delivery services that guarantee accurate access to information even when it is highly dynamic, such as the information concerning the system status.

The components involved in the monitoring activity will be defined, in particular the producers and the consumers of performance events. In this phase we should evaluate how much of the information produced should actually be stored within the GIS, also taking into account that monitoring information takes the form of event streams from which higher level knowledge must be extracted. Libraries used to instrument the code will also be designed and implemented, as well as tools supporting the decisions needed to deal with alarm situations on the GRID and the related troubleshooting activity.

Last but not least, as far as resource reservation and scheduling are concerned, we will arrive at a classification of component based applications according to specific project patterns. The goal is to identify performance models for each of these patterns, so that more detailed performance prediction information can be provided to the scheduling/reservation tools. Note that, for specific patterns, the scheduling tools will be able to make application reconfiguration choices during the loading phase. Such choices concern, for instance, the mapping of components to resources and the related parallelism degree; that is, they are performance tuning choices based on knowledge of the resources actually present on the platform. Different scheduling and reservation policies will be implemented, aimed at minimizing the execution latency or maximizing the throughput of the previously defined case studies. We will also evaluate different scheduling policies, both experimentally and through simulation tools, and implement prototype resource brokers able both to identify the amount of resources matching the user and performance needs and to configure, in an optimal (or sub-optimal) way, the allocation of application components on the resources (performance tuning).
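How a per-pattern performance model can drive the choice of the parallelism degree at load time is sketched below for one assumed pattern, a task farm. The model used here (service time is the larger of the emitter time and the worker time divided by the number of workers) is a textbook simplification adopted for the example; the function names and parameters are hypothetical.

```python
def farm_service_time(t_worker: float, t_emitter: float, n_workers: int) -> float:
    """Assumed model: the slower of the emitter and the pool of workers."""
    return max(t_emitter, t_worker / n_workers)


def choose_parallelism(t_worker: float, t_emitter: float,
                       target: float, available: int) -> int:
    """Smallest worker count meeting the target service time.

    If no count up to `available` meets the target, use every available
    resource (the best this model can do).
    """
    if available < 1:
        raise ValueError("at least one resource is required")
    for n in range(1, available + 1):
        if farm_service_time(t_worker, t_emitter, n) <= target:
            return n
    return available


# A task takes 10 s on a worker, the emitter needs 1 s per task,
# and the user asks for one result every 2.5 s on a 16-node platform.
degree = choose_parallelism(10.0, 1.0, target=2.5, available=16)
```

A broker built this way reserves only as many resources as the model says are needed, and the model also exposes when a target is unreachable because the emitter is the bottleneck.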

The activity is planned in phases:

- (from 0 to +12 months) feasibility study and identification of the most promising research tracks

- (from +13 to +18 months) architectural design and (from +19 to +24 months) detailed design of the GIS components to be implemented (this activity will take into account the choices made in the meantime in the design of the integrated programming environment)

- (from +25 to +36 months) implementation of the alpha, beta and 1.0 versions of the GIS components and integration of these components into the integrated programming environment.

Research units involved and their tasks

Scientific lead: Laforenza Domenico

Person-months: 120

Cost: 1,404 ML (725.11 KEuro)

Notes: Resource Management

Total: 120 person-months, 1,404 ML (725.11 KEuro)

Activity 3

Duration (months)

Duration (person-months): 165

Total planned cost: 1,696 ML (875.91 KEuro)

Libraries

Software development strategies for scientific applications, which aim at goals such as software reuse, portability, extensibility, modularity, fault tolerance and efficiency, are based on the use of libraries as "building blocks". Given the evolution of application needs and of hardware and software architectures, we now need to redesign the strategies used to develop such libraries, in order to preserve the features mentioned above in wide frameworks of interdisciplinary and interoperable software, development tools, languages, machines, etc. What is actually needed is the ability to implement efficient and reliable numerical software modules within environments that make it simple and reliable to compose such modules with other modules or software items, so that increasingly complex application software can be implemented.

The component based approach (which can be considered an evolution of the object oriented model) is an answer to this requirement. Within this activity of the workpackage we will study the implementation of an extensible set of components (toolkit) to support scientific applications, allowing sophisticated numerical algorithms and software to be used transparently and efficiently on a wide range of high performance GRID/netputing platforms. The basic idea is to associate with the same interface one or more machine dependent implementations of a single piece of software or of a set of different software packages, and to design suitable interfaces (APIs) that allow mathematical abstractions to be used to identify solver classes rather than particular algorithms. The toolkit items will be developed according to a methodology such that parallel software modules developed within shared or distributed memory environments (implemented using standard and reliable tools such as communication libraries like MPI, or languages with extensions, e.g. C, FORTRAN + OpenMP) will be fully integrated as components of the overall environment, while guaranteeing the maximum degree of reusability and interoperability.
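The idea of one interface backed by several implementations, selected by problem class rather than by algorithm name, can be sketched in Python as follows. The registry, the platform tags ("portable", "smp") and the two quadrature routines are hypothetical stand-ins chosen for the example; in the toolkit the alternatives would be machine dependent parallel codes.

```python
from typing import Callable, Dict, Tuple

# Common interface: integrate f over [a, b] using n subintervals.
Integrator = Callable[[Callable[[float], float], float, float, int], float]

_REGISTRY: Dict[Tuple[str, str], Integrator] = {}


def register(problem: str, platform: str):
    """Associate an implementation with a (problem class, platform) pair."""
    def decorator(fn: Integrator) -> Integrator:
        _REGISTRY[(problem, platform)] = fn
        return fn
    return decorator


@register("quadrature", "portable")
def trapezoid(f, a, b, n):
    h = (b - a) / n
    inner = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * (f(a) + f(b)) + inner)


@register("quadrature", "smp")
def simpson(f, a, b, n):
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of subintervals")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def solver_for(problem: str, platform: str) -> Integrator:
    """Return the platform-specific implementation, else the portable one."""
    return _REGISTRY.get((problem, platform), _REGISTRY[(problem, "portable")])


# The caller names the problem class, never the algorithm.
integrate = solver_for("quadrature", "smp")
area = integrate(lambda x: x * x, 0.0, 1.0, 10)
```

Application code written against `solver_for` is unchanged when a new platform-specific implementation is registered.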

The toolkit supporting scientific applications will host components that solve classical problems at the core of a large number of technical/scientific applications, such as basic linear algebra operations, integral computations in one or more dimensions, and FFT, as well as components to be used in advanced high performance data mining applications.

The toolkit for scientific applications will also provide functionality for 3D spatial data handling, analysis and visualization. The availability of sophisticated 3D models is a requirement of many scientific and simulation applications. Such models should satisfy accuracy and reliability requirements, and should be efficient for real time and Internet based applications. Despite technological improvements in rendering and data transmission, operations on the geometric model of a complex 3D representation remain the bottleneck that must be overcome to achieve the required performance figures. To this end it is important to develop representations that are optimal in the number of bits, and efficient algorithms for simplification, compression and decompression operations.

Further help will come from the possibility of using morphological analysis to select meaningful information from huge amounts of data. Efficient and portable parallel algorithms will be embedded in suitable components to provide functionality for 3D data analysis, simplification and compression.

Activity 3: Expected results

The result of this workpackage activity consists in the implementation of a high performance scientific library core for GRID and/or heterogeneous WANs, and in providing these libraries to (multidisciplinary) application programmers as a fully integrated item of the programming environment that results from the first activity of this workpackage. We expect that the adoption of component technology for both the libraries and the overall environment will simplify the library integration process. Furthermore, by choosing some kind of component standard to implement the programming environment components and, as a consequence, the library items, we will decouple the activities related to library development from those related to programming environment development, at least in the initial phase of the project.

The implementation of component based numerical toolkits for high performance scientific applications is the goal of several research projects related to the "Common Component Architecture Forum" (CCAF), e.g. the ALICE project at Argonne, LSA at Indiana University, and Babel. We therefore expect that the results already achieved within such projects can be completely reused within our framework. We also expect to make significant contributions to the CCAF goals.

In more detail, the expected scientific results are:

- identification and analysis of different integration methodologies

- identification and analysis of mechanisms/tools to define interfaces of scientific libraries

- experiments with, and analysis of, services that allow the libraries to be used efficiently and transparently within a dynamic, heterogeneous GRID programming environment

- a contribution to the definition of standards for scientific software integration.

The activity is planned in phases:

- (from 0 to +12 months) feasibility study and identification of the most promising research tracks, in particular those concerning the choice of the modules to be provided to the programmer and the component model to adopt (this activity is performed in close contact with the activities related to the programming environment design and implementation)

- (from +13 to +18 months) architectural design and (from +19 to +24 months) detailed design of the library toolkit

- (from +25 to +36 months) implementation of the alpha, beta and 1.0 versions of the toolkit

Research units involved and their tasks

Scientific lead: Murli Almerico

Person-months: 165

Cost: 1,696 ML (875.91 KEuro)

Notes: Scientific libraries for GRID.

Total: 165 person-months, 1,696 ML (875.91 KEuro)

Activity 4

Duration (months)

Duration (person-months)

Total planned cost: 732 ML (378.05 KEuro)

Problem solving environments.

High performance network computing programming tools can be used effectively to build problem solving environments (PSEs) for different application domains. A problem solving environment aims at providing all the functionality needed to solve problems in one particular application domain or in several domains. A PSE can be defined for a specific problem but can encapsulate an infrastructure that is problem independent. PSEs aim at including user code as a module in a larger, composite application. In particular, in high performance PSEs these modules will usually be executed on remote machines. PSE systems allow the programmer of a composite application to manage the application components through a workflow model executed on the host as part of the primary user interface. As these systems are designed for a distributed execution environment (e.g. a GRID), they include one or more subsystems that transfer data and control between the different processes of the application.

Within this activity we plan to study the requirements and define the properties of a general problem solving environment suitable for use in a dynamic, high performance network computing framework based on component technology. We also plan to characterize the functional extensions normally available as part of PSEs for network and GRID computing. Currently, the terms PSE and workbench are used to describe a large variety of tools supporting the development and execution of applications. When these terms are used in communication between system developers and users (or application developers), a high degree of ambiguity arises about the definition of the PSE at hand. A framework with well defined terms and categories will be extremely helpful to all the parties involved in the development of PSEs targeting GRID and network computing.

For this reason, within this activity we plan to investigate and design problem solving environments taking into account the following main aspects: application domains, composition capabilities, user code integration, data management, interaction, and configuration management capabilities. In the design of problem solving environments for network computing we will take into account the strong relationship between the PSE features and the programming tools developed within this workpackage, which may be used to implement, compose and integrate the PSE components.
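The workflow model mentioned above, in which user code is one module of a larger composite application, can be illustrated with a small dependency graph executed in topological order. This is a minimal local sketch using the Python standard library (`graphlib`, Python 3.9 or later); the step names and the `run_workflow` helper are hypothetical, and a real PSE would dispatch each step to a remote resource instead of calling it in-process.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Set

Step = Callable[[Dict[str, Any]], Any]


def run_workflow(steps: Dict[str, Step],
                 deps: Dict[str, Set[str]],
                 inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Run each step after its dependencies, passing their results to it."""
    results: Dict[str, Any] = dict(inputs)
    for name in TopologicalSorter(deps).static_order():
        if name in results:
            continue  # an input supplied by the user, nothing to execute
        needed = {d: results[d] for d in deps.get(name, set())}
        results[name] = steps[name](needed)
    return results


# Hypothetical composite application: user code ("solve") sits between
# an input supplied by the user ("mesh") and a rendering module.
steps = {
    "solve": lambda d: d["mesh"] * 2,
    "render": lambda d: f"plot({d['solve']})",
}
deps = {"solve": {"mesh"}, "render": {"solve"}}

out = run_workflow(steps, deps, {"mesh": 21})
```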

Activity 4: Expected results

The main goal of this activity of the workpackage, concerning PSEs, is the development of a model that allows high performance PSEs to be implemented on top of GRID and/or heterogeneous WANs. The model will be used as a general framework to develop specific distributed and GRID-aware PSEs. In particular, the expected result is a general PSE architecture defined on top of the integrated programming environment developed within this workpackage. Developing PSEs on top of the integrated environment will allow a modular implementation of the PSE. Furthermore, the adoption of a component based model derived from the one used in the integrated programming environment will provide the user with new and more powerful mechanisms to implement PSE features that were not originally planned, by assembling components that the PSE already provides as "building blocks".

The activity is planned in phases:

- (from 0 to +12 months) feasibility study on the implementation of GRID/netputing PSEs using component technologies such as those used in the implementation of the integrated programming environment that results from the first activity of this workpackage

- (from +13 to +18 months) identification of a specific PSE and (from +19 to +24 months) design of this PSE on top of the integrated programming environment

- (from +25 to +36 months) implementation of a prototype of the candidate PSE, demonstrating the feasibility of the approach.

Research units involved and their tasks

Scientific lead: Murli Almerico

Cost: 732 ML (378.05 KEuro)

Notes: Problem Solving Environments for GRID

Total: 732 ML (378.05 KEuro)