White Paper 01/2014

RAID systems within Industry
Functioning, variants and fields of application of RAID systems

https://support.industry.siemens.com/cs/ww/en/view/109737064
Warranty and liability

Note
The Application Examples are not binding and do not claim to be complete regarding the circuits shown, equipping and any eventuality. The Application Examples do not represent customer-specific solutions. They are only intended to provide support for typical applications. You are responsible for ensuring that the described products are used correctly. These Application Examples do not relieve you of the responsibility to use safe practices in application, installation, operation and maintenance. When using these Application Examples, you recognize that we cannot be made liable for any damage/claims beyond the liability clause described. We reserve the right to make changes to these Application Examples at any time without prior notice. If there are any deviations between the recommendations provided in these Application Examples and other Siemens publications, e.g. catalogs, the contents of the other documents have priority.

We do not accept any liability for the information contained in this document.

Any claims against us based on whatever legal reason resulting from the use of the examples, information, programs, engineering and performance data etc., described in this Application Example shall be excluded. Such an exclusion shall not apply in the case of mandatory liability, e.g. under the German Product Liability Act (Produkthaftungsgesetz), in case of intent, gross negligence, or injury of life, body or health, guarantee for the quality of a product, fraudulent concealment of a deficiency or breach of a condition which goes to the root of the contract (wesentliche Vertragspflichten). The damages for a breach of a substantial contractual obligation are, however, limited to the foreseeable damage, typical for the type of contract, except in the event of intent or gross negligence or injury to life, body or health. The above provisions do not imply a change of the burden of proof to your detriment.

Any form of duplication or distribution of these Application Examples or excerpts hereof is prohibited without the expressed consent of the Siemens AG.

Security information
Siemens provides products and solutions with industrial security functions that support the secure operation of plants, systems, machines and networks. In order to protect plants, systems, machines and networks against cyber threats, it is necessary to implement and continuously maintain a holistic, state-of-the-art industrial security concept. Siemens products and solutions only form one element of such a concept. Customer is responsible to prevent unauthorized access to its plants, systems, machines and networks. Systems, machines and components should only be connected to the enterprise network or the internet if and to the extent necessary and with appropriate security measures (e.g. use of firewalls and network segmentation) in place. Additionally, Siemens guidance on appropriate security measures should be taken into account. For more information about industrial security, please visit http://www.siemens.com/industrialsecurity.

Siemens products and solutions undergo continuous development to make them more secure. Siemens strongly recommends to apply product updates as soon as available and to always use the latest product versions. Use of product versions that are no longer supported, and failure to apply latest updates may increase customer's exposure to cyber threats. To stay informed about product updates, subscribe to the Siemens Industrial Security RSS Feed under http://www.siemens.com/industrialsecurity.

Entry-ID: 109737064, Version, 01/2014
Table of contents

Warranty and liability
1 Motivation
2 Targets
3 Implementation
3.1 Software RAID vs. Hardware RAID
3.2 RAID levels
3.2.1 RAID 0
3.2.2 RAID 1
3.2.3 RAID 5
4 Maximizing the System Availability
4.1 Recognizing impending failures in time
4.2 Minimizing the system down-time via hot swap
4.3 Immediate recovery of a RAID group via hot spare
5 Conclusion
1 Motivation

Information processing systems employed in rough environments are subjected to stress in the form of dust, shock/vibration, electromagnetic interference and extreme temperatures.

Figure 1

Special industrial PCs have been developed which withstand these conditions and are specifically protected against these influences.

Apart from protection against external influences, the data of information processing systems requires particular attention. For error-free and smooth operation, the data is just as important as the hardware components. If a data carrier fails, operation is interrupted until the data carrier has been exchanged and the backed-up data restored. If only a very old backup or no backup of the system exists, extensive and time-consuming setup measures become necessary. Even with a backup, however, the most recent data, which might be used in running production processes, is lost.

How can the availability of a data carrier, and hence of the entire information processing system, be increased?

2 Targets

One aim is to increase the availability of the data by means of suitable measures. Closely related to this is the required protection of the saved data against loss when a data carrier fails. The measures must not reduce the performance of the system.
3 Implementation

The objectives mentioned above can be reached by using RAID systems for data storage.

Explanation of Terms

Table 1
Abbreviation: RAID
Meaning: Redundant Array of Independent Disks

This refers to the combination of several physical data carriers into one logical data carrier. There are different implementations, which are discussed below.

3.1 Software RAID vs. Hardware RAID

Several physical data carriers can be combined into one logical data carrier either by software or by special hardware. The hardware variant is implemented by means of a RAID controller. The advantages and disadvantages of both options are listed in the table below.

Table 2
Software RAID
Pro:
- In the case of a HW defect, the saved data can also be read from a different system
Contra:
- Load on the CPU and the system bus
- Cache usage problematic during system failure or power cut

Hardware RAID
Pro:
- Relieves the CPU and the system bus
- Exchange possible during runtime
Contra:
- Additional hardware/costs for the controller
- A defect at the controller requires an identical model as replacement

3.2 RAID levels

RAID levels describe the way in which several data carriers are combined and used. The various RAID levels are mainly distinguished by the following characteristics:
- Available total capacity for data storage
- Data security during failure of a data carrier
- Data throughput during reading
- Data throughput during writing
- Number of required hard discs

RAID levels 0, 1 and 5 have become established. These are discussed in greater detail below.
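Two of the characteristics listed above, the available total capacity and the number of required data carriers, can be computed directly for the three levels discussed here. The following is a minimal sketch and not taken from the original paper: it assumes that all data carriers have the same size and that RAID 1 mirrors the same data onto every carrier. The function name `usable_capacity` and its parameters are hypothetical.

```python
# Minimum number of data carriers per RAID level (levels 0, 1 and 5 only).
MINIMUM_CARRIERS = {0: 2, 1: 2, 5: 3}


def usable_capacity(level: int, carrier_count: int, carrier_size_gb: float) -> float:
    """Return the usable capacity in GB of a RAID group of equal-sized carriers."""
    if level not in MINIMUM_CARRIERS:
        raise ValueError(f"unsupported RAID level: {level}")
    if carrier_count < MINIMUM_CARRIERS[level]:
        raise ValueError(
            f"RAID {level} needs at least {MINIMUM_CARRIERS[level]} data carriers"
        )
    if level == 0:
        # Striping without redundancy: the full capacity is usable.
        return carrier_count * carrier_size_gb
    if level == 1:
        # Mirroring: every carrier holds the same data.
        return carrier_size_gb
    # RAID 5: the equivalent of one carrier is used for parity.
    return (carrier_count - 1) * carrier_size_gb


if __name__ == "__main__":
    for raid_level, count in ((0, 2), (1, 2), (5, 3)):
        print(f"RAID {raid_level}, {count} x 500 GB:",
              usable_capacity(raid_level, count, 500), "GB usable")
```

With two 500 GB carriers, RAID 0 offers the full 1000 GB and RAID 1 offers 500 GB; RAID 5 with three such carriers offers 1000 GB.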
3.2.1 RAID 0

In RAID level 0, the data to be saved is divided between the individual data carriers. This is done without redundancy. Parallel access to the individual data carriers increases the data throughput during reading as well as during writing. However, this type of interconnection bears the risk of data loss: if one physical data carrier fails, the entire logical data carrier is defective as well.

Figure 2: Level 0, Striping
Data carrier 1: A, C, E
Data carrier 2: B, D, F

Table 3: Speed (Read, Write), Data security

3.2.2 RAID 1

In RAID level 1, the data to be saved is mirrored, i.e. stored identically on each data carrier. This redundant type of data storage preserves the data even if one data carrier fails.

Figure 3: Level 1, Mirroring
Data carrier 1: A, B, C, D, E, F
Data carrier 2: A, B, C, D, E, F

Table 4: Speed (Read, Write), Data security
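The block layouts of Figure 2 and Figure 3 can be reproduced with a few lines of code. The sketch below is only an illustration under simplifying assumptions: blocks are plain list entries, a failed data carrier is represented by `None`, and all function names are hypothetical. It shows that a striped set cannot be read once one carrier is missing, whereas a mirrored set still delivers all blocks.

```python
def stripe(blocks, carrier_count=2):
    """RAID 0: distribute the blocks round-robin over the carriers."""
    carriers = [[] for _ in range(carrier_count)]
    for index, block in enumerate(blocks):
        carriers[index % carrier_count].append(block)
    return carriers


def mirror(blocks, carrier_count=2):
    """RAID 1: store a full copy of all blocks on every carrier."""
    return [list(blocks) for _ in range(carrier_count)]


def read_striped(carriers):
    """Reassemble the blocks; fails if any carrier is missing (None)."""
    if any(carrier is None for carrier in carriers):
        raise OSError("RAID 0: one carrier failed, data is lost")
    total = sum(len(carrier) for carrier in carriers)
    count = len(carriers)
    return [carriers[i % count][i // count] for i in range(total)]


def read_mirrored(carriers):
    """Read from the first carrier that is still available."""
    for carrier in carriers:
        if carrier is not None:
            return list(carrier)
    raise OSError("RAID 1: all carriers failed, data is lost")


if __name__ == "__main__":
    data = list("ABCDEF")
    print(stripe(data))   # layout as in Figure 2
    print(mirror(data))   # layout as in Figure 3
```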
3.2.3 RAID 5

In RAID level 5, the data to be saved is distributed across the data carriers. Additionally, a parity data block is generated for the saved data, which in turn is stored on a different physical data carrier than the data blocks it protects. Using this parity data block, the original information can be restored when a data carrier is faulty.

Considered simply, the following applies for the first line in the figure:

$A + B = P_{AB}$

This enables reconstructing the original data in the event of an error, for example:

$B = P_{AB} - A$
$A = P_{AB} - B$

The distributed storage of the parity data blocks on all data carriers prevents an excessive load on individual data carriers. In addition, the data throughput during reading can be increased. RAID level 5 requires at least 3 hard drives.

Figure 4: Level 5, Striping/Parity
Data carrier 1: A, $P_{CD}$, E
Data carrier 2: B, C, $P_{EF}$
Data carrier 3: $P_{AB}$, D, F

Table 5: Speed (Read, Write), Data security
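The paper describes the parity with a simplified addition. RAID 5 implementations typically compute it as a bytewise XOR instead, which has the same property: any one block of a stripe can be recomputed from the remaining blocks. The following sketch illustrates this for the first line of Figure 4. The helper `xor_blocks` and the sample block contents are assumptions made for illustration only.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Return the bytewise XOR of equally sized blocks."""
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    result = bytearray(length)
    for block in blocks:
        for position, byte in enumerate(block):
            result[position] ^= byte
    return bytes(result)


# First stripe of Figure 4: data blocks A and B plus their parity block.
block_a = b"DATA-A"
block_b = b"DATA-B"
parity_ab = xor_blocks(block_a, block_b)

# If the carrier holding B fails, B is rebuilt from the parity and A.
rebuilt_b = xor_blocks(parity_ab, block_a)
# If the carrier holding A fails, A is rebuilt from the parity and B.
rebuilt_a = xor_blocks(parity_ab, block_b)

if __name__ == "__main__":
    print(rebuilt_a == block_a, rebuilt_b == block_b)
```

XOR is its own inverse, so the same function serves for generating the parity and for reconstruction; no separate subtraction step is needed.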
4 Maximizing the System Availability

4.1 Recognizing impending failures in time

Apart from the redundant storage of data in the RAID array, early detection of errors is an important aspect for increasing the availability of the system. Error detection includes in particular the monitoring of critical system components: the speed of the casing fans, the casing temperature and the S.M.A.R.T.¹ state of the data carriers. If abnormal values indicate the impending failure of a component, this should be reported by an automatic message via e-mail or SMS text message. The required spare parts can then be procured in time and, if necessary, be exchanged without delay. Diagnosis should be possible locally as well as remotely.

¹ S.M.A.R.T. = Self-Monitoring, Analysis and Reporting Technology, a system for monitoring hard discs and providing early detection of impending defects.

4.2 Minimizing the system down-time via hot swap

Another way of reducing the down-times of an information processing system is the ability to exchange defective components, especially data carriers, during runtime. This ability is referred to as hot swap. In connection with the redundant saving of data in a RAID system, this enables uninterrupted operation while the defective data carrier is exchanged.

4.3 Immediate recovery of a RAID group via hot spare

After the failure of a data carrier, the risk of data loss is increased. If another data carrier fails before the defective data carrier has been replaced, the data can no longer be restored. To avoid this, a hot spare data carrier can be installed, which immediately steps in when a defect occurs. In this way the RAID group, and hence the data consistency, remains fully available. The defective data carrier does not have to be replaced immediately; the replacement can be carried out as part of a routine service task.
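The hot spare behaviour of section 4.3 can be sketched as a small state model. This is a deliberately simplified illustration, not a description of any particular RAID controller: the class `RaidGroup`, its fields and the carrier names are hypothetical, and the rebuild onto the spare, which takes time on a real system, is assumed to complete instantly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RaidGroup:
    """Toy model of a redundant RAID group with optional hot spares."""
    active_carriers: List[str]
    hot_spares: List[str]
    required_carriers: int  # carriers needed for full redundancy

    def is_redundant(self) -> bool:
        """True while the group can still tolerate the failure of one carrier."""
        return len(self.active_carriers) >= self.required_carriers

    def fail(self, carrier: str) -> None:
        """Mark a carrier as failed; a hot spare steps in if one is available."""
        self.active_carriers.remove(carrier)
        if self.hot_spares:
            # Assumption: the rebuild onto the spare finishes immediately.
            self.active_carriers.append(self.hot_spares.pop(0))


if __name__ == "__main__":
    group = RaidGroup(["disk0", "disk1"], ["spare0"], required_carriers=2)
    group.fail("disk0")
    print(group.active_carriers, group.is_redundant())
    group.fail("disk1")
    print(group.active_carriers, group.is_redundant())
```

After the first failure the spare takes over and the group stays redundant. After the second failure no spare is left, so the group runs without redundancy until a carrier is replaced.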
5 Conclusion

Depending on the RAID level used, RAID systems are suitable for reducing the risk of data loss as well as for increasing the speed at which data is transferred. This can, for example, reduce the time necessary for creating reports based on many process values from archive data or databases.

Combining data carriers into a RAID level 5 group gives the user security against data loss during the failure of a data carrier as well as improved performance when transferring the saved data.

Furthermore, useful functions and capabilities, such as automatic notification in the case of an error and hot swapping of data carriers, help to increase the overall availability of a system.

However, a RAID system is no substitute for regular backups of important data; it must be considered a useful supplement.