In recent decades, the development of parallel architectures was
dominated by two types. The first type is the distributed-memory (dm)
machine: each processor has its own memory and is connected to the other
processors by some kind of network. The second type is the
shared-memory (sm) machine: the processors are connected through a
common memory, which they also use to communicate.
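The sm communication style can be sketched in a few lines. The following is a minimal illustration (an assumed example, not code from any particular system): two "processors" are modelled as threads, and the shared list stands in for the common memory through which they exchange their results.

```python
import threading

def shared_memory_sum(values):
    """Split the work between two threads; each writes its partial
    result into shared memory, where the other can read it."""
    shared = [0, 0]                      # the common memory of the "node"
    half = len(values) // 2

    def worker(slot, chunk):
        # Communicate simply by writing into the shared array.
        shared[slot] = sum(chunk)

    threads = [
        threading.Thread(target=worker, args=(0, values[:half])),
        threading.Thread(target=worker, args=(1, values[half:])),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared[0] + shared[1]

print(shared_memory_sum(range(16)))      # sums 0..15
```

No explicit message is ever sent: the second thread's result becomes visible to the first simply because both address the same memory.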
In the first type of architecture, there are two possibilities for
interprocessor communication. First, the processors can communicate by
sending messages to each other over the network (message passing, mp).
Second, a software layer can simulate a common address space.
This case is called distributed shared memory (dsm).
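By contrast, processors of a dm machine have separate address spaces and can only exchange messages. A minimal sketch of the mp style (an assumed example; a `multiprocessing.Pipe` stands in for the interconnection network, not for any real mp library):

```python
from multiprocessing import Pipe, Process

def partner(conn):
    """The remote processor: receive a chunk, sum it, send the result back."""
    chunk = conn.recv()                  # message in over the "network"
    conn.send(sum(chunk))                # message out
    conn.close()

def message_passing_sum(values):
    half = len(values) // 2
    here, there = Pipe()
    p = Process(target=partner, args=(there,))
    p.start()
    here.send(list(values[half:]))       # ship half the work to the partner
    local = sum(values[:half])           # compute the local half meanwhile
    remote = here.recv()                 # wait for the partner's message
    p.join()
    return local + remote

if __name__ == "__main__":
    print(message_passing_sum(list(range(16))))
```

Here every piece of data the remote processor sees must be sent explicitly; there is no memory the two processes could read in common.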
At the end of the 1990s, a new parallel architecture was promoted by the
US-ASCI (United States Accelerated Strategic Computing Initiative) project.
The aim was to combine the two types mentioned above in order to
create new, more efficient, less expensive, and more scalable parallel
computers. The idea was to take standard sm machines as building blocks (nodes)
and to connect them with a fast network. Hence, the resulting architecture
has at least two levels of hierarchy: processors within one node can communicate
very fast over their shared memory, while processors of different nodes have to
communicate through the slower network. The situation may become even more
complex if the network does not guarantee equal access times between all nodes.
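The two-level hierarchy can be sketched as follows (again an assumed, illustrative example): each "node" is a process whose "processors" are threads sharing that process's memory, while the nodes themselves can exchange results only by message passing over a pipe, the slower path.

```python
import threading
from multiprocessing import Pipe, Process

def node_sum(chunk, n_threads=2):
    """Fast level: the threads of one node cooperate through shared memory."""
    shared = [0] * n_threads
    size = len(chunk) // n_threads
    def worker(i):
        shared[i] = sum(chunk[i * size:(i + 1) * size])
    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(shared)

def remote_node(conn):
    """Slow level: a second node receives work and replies by message."""
    conn.send(node_sum(conn.recv()))
    conn.close()

def hybrid_sum(values):
    half = len(values) // 2
    here, there = Pipe()
    p = Process(target=remote_node, args=(there,))
    p.start()
    here.send(list(values[half:]))       # cross the inter-node network once
    local = node_sum(list(values[:half]))
    remote = here.recv()
    p.join()
    return local + remote

if __name__ == "__main__":
    print(hybrid_sum(list(range(16))))
```

The design point the sketch makes is that communication cost depends on where the partner sits: within a node it is a shared write, between nodes it is a message, so an efficient program crosses the inter-node boundary as rarely as possible.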
The number of systems of this type has increased enormously in the last five
years. Looking at the list of the five hundred fastest supercomputers in the
world (Top500), we can see that 37.6% (Constellations and Clusters) belong to
this kind of architecture. Moreover, three systems are among the top ten,
which shows that these systems can also compete in terms of peak performance.
At the moment, the world's biggest supercomputer project, the
Earth Simulator, belongs to this kind of architecture. It will consist of
640 nodes with 8 processors each. The total amount of memory will be
10 terabytes and the total peak performance will be about 40 TFLOPS.