The above reasoning implies that a number of problems have to be addressed until a larger collection of disk drives satisfies the imposed needs of users. We define a storage network as a collection of disks connected by an arbitrary network. The underlying topology or architecture of the network is unrestricted. It may vary from disks attached to a SCSI bus up to very complex networks like the Internet where the disks drives are associated with server nodes. Having a closer look at this abstraction, two crucial observations become apparent. Larger demands for capacity are easily solved by attaching more disks to the network. Nevertheless, the data need to be distributed in an intelligent way to provide an even utilization of disks. Only this distribution can ensure the scalability in term of data requests when the number of disk drives is increased. Furthermore, it offers the possibility of a parallel data access provided that the data resides on different disk drives. This improves not only the acces time but also the utilization because large data requests are served by more then one disk.
But using a storage network imposes another inherent restriction - availability. Every disk has an estimated operation time during which an error is rather unlikely to occur. This entity is called mean time to/between failure (MTTF or MTBF, respectively) and for modern drives this number is in the order of 1,000,000 hours. Regarding a whole storage network, the MTBF decreases significantly. This is due to the fact, that it depends on the failure rate [18,8] which is simply the sum of the failure rates of each participating disk . This problem can be solved by using redundancy of some kind. The easiest way is to store copies of all data like in an RAID 1 system (with ) [11] or the PAST storage server [12,7]. In case of a disk failure, the system is still operational and the faulty disk can be replaced. Because the copying of data results in a large waste of resources, more space efficient methods are sought. One of the common approaches is the use of parity information like in RAID 4/5 [11,5], Random RAID [3], Swarm [10,9], and many video servers [2,13,19,20,1]. The idea is to split a data object into subblocks and derive an additional subblock which stores the result of an XOR operation over all subblocks. Assuming the subblocks reside on different disks, one disk failure can be tolerated and all data can be rebuild by accessing the remaining subblocks. This technique can be expanded to allow multiple disk faults by deriving multiple parity information [4,15,16,17].
The second problem is the data distribution problem. It has to give an answer to the following question. Where should the data be stored so that any data request can be answered as fast as possible without forming hot spots? Furthermore, the data distribution has to ensure scalability. The more disks are in a storage network the more data requests should be possible or the faster should the data be delivered. This can only be done if the data is distributed so that many disks participate when delivering data. There are a large number of different approaches including deterministic striping, e.g. [11,6,2] and randomized placement, e.g. [1,3,14].
The above reasoning can be summerized be defining a number of properties a storage network has be concerned with.
It seems impossible for a storage network to meet all the above requirements. But almost all of the described storage networks are developed to solve at least one of them. In the rest of this paper we will overview many very different systems by describing their solutions to the above requirements.