Serial ATA RAID systems - an overview of performance

Capturing or generating signals using computer-based instrumentation cards at rates of megasamples or gigasamples per second inevitably hits a bottleneck when the data must be streamed continuously to or from the host computer's storage drive. This is one reason why very fast memory is often installed on the cards themselves. However, where many gigabytes or terabytes of data are concerned, on-board memory may not provide sufficient storage space.

Is there a way forward? Drives that use SATA (Serial Advanced Technology Attachment) provide very high data transmission speeds. At the time of writing these notes the link rate is commonly 6 Gbit/s (750 MBytes/s raw, as 8 bits = one byte), and since 2013 up to 16 Gbit/s (2 GBytes/s) is possible. It must be kept in mind that these are theoretical speeds and other factors must be considered too. The SATA specification encompasses a whole information transfer protocol that takes up some of the available bandwidth. Specifically, 8b/10b encoding sends 10 bits on the link for every 8 bits of data, so a 6 Gbit/s link can carry at most 600 MBytes/s of data. The read/write performance of the storage system also depends on hard drive cache size, spindle speed and access times. One important change in recent years has been the introduction of solid state drives (SSDs). With prices falling, these should be considered the drive of choice for ultra fast read/write operations. Currently SSDs give more than twice the performance of traditional hard disk drives, with read/write rates of around 500 MBytes per second.
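The arithmetic behind the 6 Gbit/s figure can be sketched in a few lines of Python. This is only an illustration: the function names are our own, decimal units (1 MByte = 1,000,000 bytes) are assumed, and only the 8b/10b line coding is accounted for, not any other protocol overhead.

```python
# Illustrative sketch: converts a SATA link rate into byte rates.
# Function names are hypothetical; decimal units (1 MByte = 1e6 bytes) assumed.

def raw_mbytes_per_s(line_rate_gbit_s):
    """Raw link rate in MBytes/s, counting every bit on the wire."""
    return line_rate_gbit_s * 1e9 / 8 / 1e6


def payload_mbytes_per_s(line_rate_gbit_s):
    """Upper bound on data rate after 8b/10b coding (8 data bits per 10 link bits)."""
    data_bits_per_s = line_rate_gbit_s * 1e9 * 8 / 10
    return data_bits_per_s / 8 / 1e6


if __name__ == "__main__":
    rate = 6.0  # Gbit/s, the common SATA link rate quoted in the text
    print(f"raw:     {raw_mbytes_per_s(rate):.0f} MBytes/s")
    print(f"payload: {payload_mbytes_per_s(rate):.0f} MBytes/s")
```

For the 6 Gbit/s link this gives 750 MBytes/s raw and 600 MBytes/s of payload, which is why a single SATA SSD delivering around 500 MBytes/s is already close to what its link can carry.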

Taking everything above into account, for a single drive data storage system, it is the storage drive that remains the main limitation to read/write data streaming speed. This limitation can be overcome by using an array of multiple drives.

Redundant Arrays of Inexpensive Disks (RAID). The word "redundant" might be a little misleading here. In fact, RAID usefully combines multiple small, inexpensive disk drives into an array that yields performance and data security benefits which can exceed those of a single large (more expensive) drive. This array of drives appears to the computer as a single logical storage unit (drive). The key to increased performance under RAID is parallelism: simultaneous access to multiple disks allows data to be written to or read from a RAID array faster than would be possible with a single drive.

RAID is most commonly available in configurations RAID 0, 1, 2, 3, 4, 5 or 10, but how to choose? Here we will look more closely at levels 0 and 1, both of which will work with just two drives and represent the entry level most applicable to a PC-based or PXI chassis instrumentation system.

RAID Level 0. At this level, data is split across drives by a process called "striping", resulting in higher data throughput. Since no redundant information (that is, a second copy of the data) is stored, performance is maximised and, with two drives, can be expected to approach double that of a single drive, but the failure of any disk in the array results in data loss. A RAID 0 can be created with disks of differing sizes, but the storage space added to the array by each disk is limited to the size of the smallest disk. For example, if a 1 TByte disk is striped together with a 2 TByte disk, the total size of the array available for RAID 0 data will be 2 TBytes.
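To make striping concrete, here is a minimal Python sketch of how consecutive blocks of data could be dealt out across the drives in turn. It is not how any particular RAID controller is implemented: the function name is hypothetical, and a simple round-robin layout with one block per stripe unit is assumed.

```python
# Illustrative sketch of RAID 0 striping (simple round-robin layout assumed).
# locate_block is a hypothetical helper, not part of any RAID controller API.

def locate_block(logical_block, n_drives):
    """Map a logical block number to (drive index, block position on that drive)."""
    if n_drives < 1:
        raise ValueError("at least one drive is required")
    drive = logical_block % n_drives      # drives take turns receiving blocks
    position = logical_block // n_drives  # how far down that drive the block sits
    return drive, position


if __name__ == "__main__":
    # With two drives, even-numbered blocks land on drive 0 and odd on drive 1,
    # so two consecutive blocks can be written at the same time.
    for block in range(6):
        drive, position = locate_block(block, n_drives=2)
        print(f"logical block {block} -> drive {drive}, position {position}")
```

Because neighbouring blocks sit on different drives, a long sequential stream keeps every drive busy at once, which is where the throughput gain comes from. It also shows why losing one drive loses the data: every file of more than one block has pieces on both drives.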

RAID Level 1. This provides redundancy by writing all data to two (or more) drives in a "mirroring" process. As the data is identical on each drive, a copy of the data is always safe, but it must be noted that there is no gain in storage size in this arrangement. For example, using two 1 TByte drives will yield 1 TByte! A Level 1 array tends to be slightly faster on reads and slower on writes than a single drive, and it is much slower than a Level 0 array. However, with RAID Level 1, if either drive fails, a copy of the data remains.
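The capacity rules for the two levels can be summarised in a short Python sketch. The function name is our own, and the sketch assumes the simple case described above, where every drive contributes only as much space as the smallest drive in the array.

```python
# Illustrative sketch of usable capacity for RAID 0 and RAID 1.
# raid_capacity is a hypothetical helper; sizes are in TBytes.

def raid_capacity(level, drive_sizes_tbytes):
    """Return usable array capacity in TBytes for RAID level 0 or 1."""
    if len(drive_sizes_tbytes) < 2:
        raise ValueError("RAID 0 and RAID 1 need at least two drives")
    smallest = min(drive_sizes_tbytes)
    if level == 0:
        # Striping: each drive contributes as much as the smallest drive.
        return smallest * len(drive_sizes_tbytes)
    if level == 1:
        # Mirroring: every drive holds the same data, so capacity is one drive's worth.
        return smallest
    raise ValueError("only RAID levels 0 and 1 are covered by this sketch")


if __name__ == "__main__":
    print(raid_capacity(0, [1, 2]))  # 1 TByte striped with 2 TByte
    print(raid_capacity(1, [1, 1]))  # two mirrored 1 TByte drives
```

The two calls reproduce the examples in the text: 2 TBytes for the mixed-size RAID 0 pair and 1 TByte for the mirrored pair.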

The chart below compares the performance of a single drive with that of a pair of SATA hard drives running with a RAID controller. Its values should not be taken as absolute, as many factors cause variation, including the data block size and general hard drive performance, but it does give a relative idea of the performance advantages that could be gained.

Single SATA drive compared with twin SATA hard drives used in a RAID 0 and 1 arrangement

So the choice really comes down to which is more important to your application: performance or data security. Most users of computer-based signal capture or waveform generator cards will go for RAID 0 as the fastest, and therefore the most viable, way to stream a large amount of data at the highest speed, selecting good quality drives to minimise the possibility of data loss through drive failure. There is a way of combining RAID 1 and 0 to get the best of both worlds, but at least four drives are required.

A common way to connect two drives into a RAID system is to use a controller built into the motherboard of "high-end" PCs, most commonly servers. If the computer chassis provides room for two or more drives but the motherboard does not have the RAID facility, or its SATA connections are limited in number or speed, then a RAID controller card may be used. A controller card usually gives better performance too and is the recommended method when streaming to / from our Spectrum digitisers and waveform generators. A range of RAID controllers is available, usually with 2, 4, 6 or 8 internal SATA drive connectors, most falling in the price range £50 to £800. A few RAID controller cards have more connectors. Areca™ is one company producing such RAID controller cards.

Below is an example of a computer system set up to stream radar signal data onto six 960 GByte SSDs at ultra high speed.

RAID 0 streaming system, with cover removed showing M4i signal capture and controller card
Computer: 4U 19" rack mount chassis with Supermicro motherboard
 
        1x Intel™ Xeon E5-1620v2 3.7GHz 4 Core processor
        16GB of DDR3 RAM
        Expansion slots: Three x4-lane PCIe, four x8-lane PCIe
        8 bay 2.5-inch SAS3/SATA drive cage fitted into PC chassis. Drives externally accessible
 
Operating system: MS Windows 7 Pro 64 bit installed onto Enterprise 7200 RPM SATA 1TB HDD
 
RAID controller card: Areca™ 8 port 1883I with PCIe x8-lane interface
 
RAID configuration: Six 960 GByte Crucial™ SSDs configured as RAID 0, with 6 Gbit/s SATA links to the controller card
 
Signal capture: Two synchronised M4i.4411-x8 4 channel 130MS/s 16-bit digitiser (A/D) cards from Spectrum Instrumentation GmbH with a PCIe x8-lane interface

Result: Real-time data throughput from the M4i.4411-x8 cards via their x8-lane PCI-Express interfaces to the six solid-state drives was found in total to be just over 3.3 GBytes per second. The deep 4 GByte on-board memory of the M4i.4411-x8 cards acted as a transfer buffer, so that any brief variations in data transfer speed were of no consequence.
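As a rough sanity check on these figures, the Python sketch below works out what share of the total each SSD has to absorb and how long the array could record before filling up. It is a back-of-envelope estimate only, under our own assumptions: decimal units, the measured rate spread evenly over the six drives and sustained throughout, the whole array capacity usable for data, and a 600 MBytes/s ceiling per 6 Gbit/s SATA link once 8b/10b encoding is taken into account. The function names are hypothetical.

```python
# Back-of-envelope estimate from the figures quoted in the text.
# Assumptions: decimal units, even load across drives, full capacity usable.

N_DRIVES = 6
DRIVE_SIZE_GBYTES = 960
MEASURED_RATE_GBYTES_S = 3.3   # "just over 3.3 GBytes per second"
SATA_LINK_LIMIT_GBYTES_S = 0.6  # 6 Gbit/s link after 8b/10b encoding


def per_drive_rate(total_rate_gbytes_s, n_drives):
    """Share of the total stream each drive must absorb, in GBytes/s."""
    return total_rate_gbytes_s / n_drives


def recording_minutes(n_drives, drive_size_gbytes, total_rate_gbytes_s):
    """Time to fill a RAID 0 array of equal drives at a constant rate, in minutes."""
    capacity_gbytes = n_drives * drive_size_gbytes
    return capacity_gbytes / total_rate_gbytes_s / 60


if __name__ == "__main__":
    share = per_drive_rate(MEASURED_RATE_GBYTES_S, N_DRIVES)
    minutes = recording_minutes(N_DRIVES, DRIVE_SIZE_GBYTES, MEASURED_RATE_GBYTES_S)
    print(f"per-drive share: {share:.2f} GBytes/s")
    print(f"below SATA link limit: {share < SATA_LINK_LIMIT_GBYTES_S}")
    print(f"time to fill array: {minutes:.0f} minutes")
```

Under these assumptions each drive takes about 0.55 GBytes/s, close to the limit of its SATA link, and the 5.76 TByte array would fill in roughly 29 minutes of continuous streaming.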

This system was thoroughly tested by DataQuest Solutions before release to the customer. The M4i signal card that we provided came with Spectrum's test software SBench6™, plus programming examples (including C/C++, Visual Basic and Delphi). These illustrate the functionality of the system and how it can be further programmed for a specific application. This ensures that the end user can quickly test the delivered system and be satisfied that it is working within the required technical specification.

As a quick comparison, PXI / PXIe systems have the RAID controller and storage drives combined into a 3U-high module, three slots wide, with two 2.5" SSDs. As you might expect, this is a very compact arrangement, but it does limit the number of drives and hence the storage space. The other option is a PXI controller card with a cable link to an external box holding multiple HDDs or SSDs.



DataQuest Solutions Ltd. | Phone: 01526 557171 | Email: info@dqsolutions.co.uk