In recent decades, the development of parallel architectures was
dominated by two types. The first type is the distributed-memory (dm)
machine: each processor has its own memory and is connected to the other
processors by some kind of network. The second type is the
shared-memory (sm) machine: the processors are connected through a
common memory, which they also use to communicate.
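The sm communication style can be sketched in a few lines. The following is a minimal illustration (an assumed example, not code from any particular system): two "processors" are modelled as threads, and the shared list stands in for the common memory through which they exchange their results.

```python
import threading

def shared_memory_sum(values):
    """Split the work between two threads; each writes its partial
    result into shared memory, where the other can read it."""
    shared = [0, 0]                      # the common memory of the "node"
    half = len(values) // 2

    def worker(slot, chunk):
        # Communicate simply by writing into the shared array.
        shared[slot] = sum(chunk)

    threads = [
        threading.Thread(target=worker, args=(0, values[:half])),
        threading.Thread(target=worker, args=(1, values[half:])),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared[0] + shared[1]

print(shared_memory_sum(range(16)))      # sums 0..15
```

No explicit message is ever sent: the second thread's result becomes visible to the first simply because both address the same memory.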
In the first type of architecture, there are two possibilities for
interprocessor communication. First, the processors can communicate by
sending messages to each other over the network (message passing, mp).
Second, a software layer can simulate a common address space.
This case is called distributed shared memory (dsm).
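By contrast, processors of a dm machine have separate address spaces and can only exchange messages. A minimal sketch of the mp style (an assumed example; a `multiprocessing.Pipe` stands in for the interconnection network, not for any real mp library):

```python
from multiprocessing import Pipe, Process

def partner(conn):
    """The remote processor: receive a chunk, sum it, send the result back."""
    chunk = conn.recv()                  # message in over the "network"
    conn.send(sum(chunk))                # message out
    conn.close()

def message_passing_sum(values):
    half = len(values) // 2
    here, there = Pipe()
    p = Process(target=partner, args=(there,))
    p.start()
    here.send(list(values[half:]))       # ship half the work to the partner
    local = sum(values[:half])           # compute the local half meanwhile
    remote = here.recv()                 # wait for the partner's message
    p.join()
    return local + remote

if __name__ == "__main__":
    print(message_passing_sum(list(range(16))))
```

Here every piece of data the remote processor sees must be sent explicitly; there is no memory the two processes could read in common.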
At the end of the 1990s, a new parallel architecture was promoted by the
US-ASCI (United States Accelerated Strategic Computing Initiative) project.
The aim was to combine the two types mentioned above in order to
create new, more efficient, less expensive, and more scalable parallel
computers. The idea was to take standard sm machines as building blocks (nodes)
and to connect them with a fast network. Hence, the resulting architecture
has at least two levels of hierarchy: processors within one node can communicate
very fast over their shared memory, while processors of different nodes have to
communicate through the slower network. The situation may become even more
complex if the network does not guarantee equal access times between all nodes.
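The two-level hierarchy can be sketched as follows (again an assumed, illustrative example): each "node" is a process whose "processors" are threads sharing that process's memory, while the nodes themselves can exchange results only by message passing over a pipe, the slower path.

```python
import threading
from multiprocessing import Pipe, Process

def node_sum(chunk, n_threads=2):
    """Fast level: the threads of one node cooperate through shared memory."""
    shared = [0] * n_threads
    size = len(chunk) // n_threads
    def worker(i):
        shared[i] = sum(chunk[i * size:(i + 1) * size])
    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(shared)

def remote_node(conn):
    """Slow level: a second node receives work and replies by message."""
    conn.send(node_sum(conn.recv()))
    conn.close()

def hybrid_sum(values):
    half = len(values) // 2
    here, there = Pipe()
    p = Process(target=remote_node, args=(there,))
    p.start()
    here.send(list(values[half:]))       # cross the inter-node network once
    local = node_sum(list(values[:half]))
    remote = here.recv()
    p.join()
    return local + remote

if __name__ == "__main__":
    print(hybrid_sum(list(range(16))))
```

The design point the sketch makes is that communication cost depends on where the partner sits: within a node it is a shared write, between nodes it is a message, so an efficient program crosses the inter-node boundary as rarely as possible.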
The number of systems of this type has increased enormously in the last five
years. Looking at the list of the five hundred fastest supercomputers in the
world (Top500), we can see that 37.6% (Constellations and Clusters) belong to
this kind of architecture. Moreover, three systems are among the top ten,
which shows that these systems can also compete in terms of peak performance.
At the moment, the world's biggest supercomputer project, the
Earth Simulator, belongs to this kind of architecture. It will consist of
640 nodes with 8 processors each. The total amount of memory will be
10 terabytes and the total peak performance will be about 40 TFLOPS.