next up previous
Next: Bibliography Up: Algorithmic and Implementations of Previous: Algorithmic and Implementations of

Introduction

Persistent storage in modern computers is handled by magnetic hard disk drives. They form the last, and therefore the slowest, level in the memory hierarchy. Disk drive technology is very sophisticated and provides accelerated growth in disk capacity. Nowadays, a single off-the-shelf drive is capable of storing approximately 80 GB, and this capacity doubles almost every 14 to 18 months. Nevertheless, this is not sufficient to manage the ever-growing data volumes. The fast-growing processing power of modern computers, the global availability of information, and the increasing use of mixed-media content demand more flexible storage systems. The simplest solution, using many disk drives independently, seems to solve the capacity problem but does not scale with the number of data requests and the number of disks used: a very popular data element may be stored on one disk that many users want to access, resulting in hot spots and very slow data delivery.

The above reasoning implies that a number of problems have to be addressed before a larger collection of disk drives satisfies the needs of its users. We define a storage network as a collection of disks connected by an arbitrary network. The underlying topology or architecture of the network is unrestricted; it may vary from disks attached to a SCSI bus up to very complex networks like the Internet, where the disk drives are associated with server nodes. Taking a closer look at this abstraction, two crucial observations become apparent. Larger demands for capacity are easily met by attaching more disks to the network. Nevertheless, the data needs to be distributed in an intelligent way to provide an even utilization of the disks. Only such a distribution can ensure scalability in terms of data requests as the number of disk drives increases. Furthermore, it offers the possibility of parallel data access, provided that the data resides on different disk drives. This improves not only the access time but also the utilization, because large data requests are served by more than one disk.

Using a storage network imposes another inherent restriction: availability. Every disk has an estimated operation time during which an error is rather unlikely to occur. This quantity is called the mean time to/between failure (MTTF or MTBF, respectively), and for modern drives it is on the order of 1,000,000 hours. Regarding the whole storage network, the MTBF decreases significantly. This is due to the fact that it depends on the failure rate $\lambda_{sys}$ [18,8], which is simply the sum of the failure rates $\lambda_i$ of each participating disk $D_i$[*]. This problem can be solved by introducing some kind of redundancy. The easiest way is to store $k$ copies of all data, as in a RAID 1 system (with $k = 1$) [11] or the PAST storage server [12,7]. In case of a disk failure, the system remains operational and the faulty disk can be replaced. Because copying the data wastes a large amount of resources, more space-efficient methods are sought. One of the common approaches is the use of parity information, as in RAID 4/5 [11,5], Random RAID [3], Swarm [10,9], and many video servers [2,13,19,20,1]. The idea is to split a data object into $l$ subblocks and derive an additional subblock $l+1$ which stores the result of an XOR operation over all subblocks. Assuming the subblocks reside on different disks, one disk failure can be tolerated, and all data can be rebuilt by accessing the remaining $l$ subblocks. This technique can be extended to tolerate multiple disk faults by deriving multiple pieces of parity information [4,15,16,17].
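To make the scaling of the failure rate concrete, assume independent disks with exponentially distributed lifetimes, so that each MTBF is the inverse of the corresponding failure rate:

```latex
\lambda_{sys} = \sum_{i=1}^{n} \lambda_i
  \quad\Longrightarrow\quad
MTBF_{sys} = \frac{1}{\lambda_{sys}} = \frac{MTBF_{disk}}{n}
\quad \text{(for $n$ identical disks)}
```

Under these assumptions, a storage network of 1,000 disks with a per-disk MTBF of 1,000,000 hours has a system MTBF of only 1,000 hours, i.e. roughly six weeks, which illustrates why redundancy is indispensable at scale.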
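The single-parity scheme described above can be sketched in a few lines. This is a minimal illustration, not the implementation of any of the cited systems; the function names are assumptions made for this example.

```python
def xor_parity(subblocks):
    """Derive the parity subblock as the bytewise XOR over all
    l equal-length data subblocks (RAID 4/5 style single parity)."""
    parity = bytearray(len(subblocks[0]))
    for block in subblocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def rebuild(surviving):
    """Rebuild the single lost subblock (data or parity) by XOR-ing
    the l surviving subblocks; XOR is its own inverse."""
    return xor_parity(surviving)

# A data object split into l = 3 subblocks, each stored on its own disk.
data = [b"abcd", b"efgh", b"ijkl"]
parity = xor_parity(data)

# Simulate the failure of the disk holding subblock 1 and recover it
# from the two remaining data subblocks plus the parity subblock.
rebuilt = rebuild([data[0], data[2], parity])
assert rebuilt == data[1]
```

The same `rebuild` call recovers the parity subblock if the parity disk fails instead, which is why a single disk failure of either kind can be tolerated.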

The second problem is the data distribution problem. It must answer the following question: where should the data be stored so that any data request can be answered as fast as possible without forming hot spots? Furthermore, the data distribution has to ensure scalability: the more disks there are in a storage network, the more data requests should be possible, or the faster the data should be delivered. This can only be achieved if the data is distributed so that many disks participate in delivering it. There is a large number of different approaches, including deterministic striping, e.g. [11,6,2], and randomized placement, e.g. [1,3,14].
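The two placement families can be contrasted with a short sketch. The function names and the choice of hash are illustrative assumptions, not taken from the cited systems.

```python
import hashlib

def striped_disk(block_id, num_disks):
    """Deterministic striping: block i goes to disk i mod n, so
    consecutive blocks of an object land on different disks and can
    be fetched in parallel."""
    return block_id % num_disks

def hashed_disk(block_id, num_disks):
    """Randomized placement: a hash of the block identifier selects a
    pseudo-random disk, spreading load independently of any access
    pattern and thus avoiding hot spots."""
    digest = hashlib.sha256(str(block_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_disks

# Eight consecutive blocks placed on four disks:
print([striped_disk(i, 4) for i in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
print([hashed_disk(i, 4) for i in range(8)])
```

Striping gives perfect balance for sequential access but is sensitive to skewed, pattern-dependent workloads; hashed placement balances any request distribution in expectation, which matters because, as noted below, the storage network cannot know the access pattern in advance.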

The above reasoning can be summarized by defining a number of properties a storage network has to address.

  1. Availability: Because of the increased sensitivity to disk faults, storage networks need to implement some kind of fault-tolerance mechanism to tolerate the loss of disk drives.
  2. Space and Access Balance: To ensure good disk utilization and provide scalability, the data should be evenly distributed over all disk drives. The data requests are usually generated by a file system above the storage network. Hence, the storage network has no knowledge about the access pattern and must handle any possible request distribution.
  3. Resource Efficiency: Redundancy implies a necessary waste of resources. Nevertheless, a system should use all of its resources in a useful way.
  4. Access Efficiency: Stored data that cannot be accessed is lost data. This property defines the time and space requirements needed to access an arbitrary data element. These requirements are of crucial importance when very large networks are concerned.
  5. Heterogeneity: Heterogeneity describes the ability to handle disks of different sizes and different speeds.
  6. Adaptability: The information volume is constantly growing, and even generously planned systems can quickly run out of space. This property measures the ability to adapt to changed demands in capacity or performance.
  7. Locality: Locality describes the degree of communication required to answer any data request. It distinguishes between centralized and distributed approaches.

It seems impossible for a storage network to meet all the above requirements, but almost all of the described storage networks were developed to solve at least one of them. In the rest of this paper we give an overview of many very different systems by describing their solutions to the above requirements.


Kay Salzwedel 2002-01-14