Sat. Jan 21st, 2023

Data Storage and Recovery Systems

Nobody wants to find that a failed PC results in the loss of their data.

For this reason, having data backed up and securely stored is extremely important.

Much specialised hardware exists for this purpose, and some of this is summarised below.

Data Storage

In terms of immediate data storage, one technique for improving the reliability of the data is RAID: Redundant Array of Independent Disks.

RAID spreads data across more than one disk, using a controller (usually dedicated hardware) which performs the necessary logic and monitors the drives. RAID exists in several variations:

RAID 0: Also called striping. Data is split into segments, and successive segments are stored on alternate disks. This is not true RAID: it provides a larger storage volume, but because it achieves this by spreading data across multiple disks, there is no redundancy. In the event of a single drive failure, all data will be lost.

RAID 1: Also called mirroring. A mirrored disk is exactly what it sounds like. If you have two 2TB drives, you have a total usable space of 2TB, but both drives hold a complete copy of all information. In the event of a single drive failure, a complete copy is still available on the other drive.

RAID 5: Requires a minimum of three disks, and uses n drives of equal size (s) to provide (n-1) * s storage. For example, with three 2TB hard drives, a usable storage space of 4TB is available. With six 2TB drives, the usable space is 10TB.
RAID 5 works by splitting the data into segments (like RAID 0), with each stripe holding n-1 data blocks plus one parity block. So, if you have five drives, part of a document would be split into four blocks; the RAID controller then performs an XOR operation on the four blocks, and stores the result as the fifth, parity block. The position of the parity block rotates from drive to drive with each stripe, so no single drive holds all the parity (a dedicated parity drive is RAID 4). With a RAID 5 system, if any single drive fails, the lost data can always be recalculated from the remaining drives. The drive can be replaced and the data rebuilt.
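The capacity figures and the XOR rebuild described above can be checked with a few lines of Python. This is only a minimal sketch: the function names (`usable_capacity_tb`, `xor_blocks`) and the four-byte blocks are made up for illustration, and a real controller works on whole disk blocks rather than short byte strings.

```python
from functools import reduce


def usable_capacity_tb(level: int, drives: int, size_tb: float) -> float:
    """Usable space for the three RAID levels described above."""
    if level == 0:
        return drives * size_tb          # striping: every drive holds data
    if level == 1:
        return size_tb                   # mirroring: every drive holds the same copy
    if level == 5:
        if drives < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return (drives - 1) * size_tb    # one drive's worth of space goes to parity
    raise ValueError(f"unsupported RAID level: {level}")


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a five-drive array: four data blocks plus one parity block.
data_blocks = [b"DATA", b"BACK", b"UPS!", b"RAID"]
parity = xor_blocks(data_blocks)
stripe = data_blocks + [parity]

# Simulate losing drive 2, then rebuild its block from the four survivors.
failed = 2
survivors = [block for i, block in enumerate(stripe) if i != failed]
rebuilt = xor_blocks(survivors)

print(usable_capacity_tb(5, 3, 2))   # 4
print(usable_capacity_tb(5, 6, 2))   # 10
print(rebuilt)                       # b'UPS!'
```

The rebuild works because XOR is its own inverse: XOR-ing the parity block with every surviving data block cancels them out and leaves the missing one.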

To an extent, RAID systems can also affect the performance of a system: reading data from multiple drives simultaneously can increase the rate at which data is fetched, because in many cases the bottleneck is the rate at which a single drive can deliver data to the rest of the system.

Although software can be used to create RAID arrays, this consumes some of the host’s processing capability. You would expect to find dedicated RAID hardware (expansion cards) in a business setting.

Data Recovery Systems

…aka backups.

Whilst RAID systems protect live systems against drive failure and improve their availability, backup systems are used for longer-term data storage.

Backups can be performed using either HDDs or magnetic tape. (Cloud storage is also an option, but as this is simply using someone else’s computer, the hardware isn’t any different).

HDDs have the advantage of higher read and write speeds than tape: that means backing up the data is faster, and retrieving it is also quicker. Plus, being a disk, an HDD has the advantage of allowing random access: that is, you can access any part of the disk directly.

Magnetic tape has none of the above advantages. In particular, the lack of random access makes finding a specific area slow, because the tape has to be wound to the right position first. However, magnetic tape has the advantage of being stable for extremely long periods in storage, and as such is ideal for archiving data which is of high value but unlikely to be required often. Past bank transactions, for example, are vitally important to keep, but the likelihood of needing them is low.

Forensic data recovery

As a footnote: disks contain an area used to store something called the ‘Table of contents’, shortened to TOC. (Real file systems have their own names for this structure, such as the file allocation table in FAT or the master file table in NTFS.) A disk contains many addressable regions, and as such, it is important to maintain a list of where on the disk everything can be found.

On most operating systems, when a file is deleted, all that happens is that its entry in the TOC is removed. The data for that file still exists on the disk; there is just no longer an entry saying where it can be found.

Without an entry marking the area of disk as ‘in use’, it could be overwritten the next time a document is saved. Of course, it might not be.
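A toy model makes this behaviour concrete. The `ToyDisk` class below is purely hypothetical and far simpler than any real file system: it keeps raw bytes plus a dictionary acting as the TOC, and it always saves into the first unlisted gap it finds.

```python
class ToyDisk:
    """A deliberately simplified disk: raw bytes plus a table of contents."""

    def __init__(self, size: int) -> None:
        self.blocks = bytearray(size)                 # the raw storage
        self.toc: dict[str, tuple[int, int]] = {}     # name -> (offset, length)

    def _find_free(self, length: int) -> int:
        """Return the first offset where `length` bytes fit in unlisted space."""
        offset = 0
        for start, used in sorted(self.toc.values()):
            if start - offset >= length:
                break
            offset = max(offset, start + used)
        if offset + length > len(self.blocks):
            raise OSError("disk full")
        return offset

    def save(self, name: str, data: bytes) -> None:
        offset = self._find_free(len(data))
        self.blocks[offset:offset + len(data)] = data
        self.toc[name] = (offset, len(data))

    def delete(self, name: str) -> None:
        del self.toc[name]        # only the entry goes; the bytes are untouched


disk = ToyDisk(32)
disk.save("a.txt", b"secret plans")
disk.delete("a.txt")
print(b"secret plans" in disk.blocks)   # True: the data is still on the disk

disk.save("b.txt", b"shopping")         # reuses the space that is now unlisted
print(b"secret plans" in disk.blocks)   # False: partly overwritten
print(bytes(disk.blocks[:12]))          # b'shoppinglans'
```

Note that the tail of the deleted file ("lans") survives the second save, which is why partially overwritten files can sometimes still be recovered in part.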

Data recovery software works by ignoring the TOC, and instead scanning the entire disk, analysing the data at every location and trying to guess what (if anything) was once saved there. This means it is possible to retrieve deleted files, although the chance of a file being retrievable diminishes with time: each new document saved may overwrite part of the deleted file’s data.
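One common way of "guessing" is to look for the marker bytes that many file formats place at their start and end. The sketch below assumes that approach; the `carve` function and the `SIGNATURES` table are illustrative names, the "disk" is a hand-made byte string, and real recovery tools know far more formats and cope with fragmented files, which this does not.

```python
# Illustrative signature table: file type -> (start marker, end marker).
# JPEG files begin with FF D8 FF and end with FF D9; PDFs begin with
# "%PDF-" and end with "%%EOF".
SIGNATURES = {
    "jpeg": (b"\xff\xd8\xff", b"\xff\xd9"),
    "pdf": (b"%PDF-", b"%%EOF"),
}


def carve(image: bytes) -> list[tuple[str, int, bytes]]:
    """Scan raw bytes for known markers and return (type, offset, data) guesses."""
    found = []
    for kind, (header, footer) in SIGNATURES.items():
        start = image.find(header)
        while start != -1:
            end = image.find(footer, start + len(header))
            if end == -1:
                break                  # header with no footer: likely overwritten
            end += len(footer)
            found.append((kind, start, image[start:end]))
            start = image.find(header, end)
    return sorted(found, key=lambda item: item[1])


# A fake "disk": zeroed space and leftover bytes around two deleted files.
raw = (
    bytes(16)
    + b"%PDF-1.4 old report %%EOF"
    + b"\x00junk\x00"
    + b"\xff\xd8\xff\xe0 pixel data \xff\xd9"
    + bytes(8)
)

for kind, offset, data in carve(raw):
    print(kind, offset, len(data))
```

Nothing here consults a table of contents: the scan relies only on what the bytes themselves look like, which is why it still works after the entries have been removed.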

This can be taken a stage further with true forensic data recovery. (example here)