Another Advantage Of Linux Software RAID

I am a long-time fan of Linux software RAID. But in reading an online discussion which happened to mention variations in actual sizes of storage devices (disks or flash) of supposedly identical capacity, I realized that there is yet another advantage of Linux software RAID: the fact that it can build arrays out of disk partitions, not necessarily entire disks. This means you can create the partitions to be a standard size, slightly smaller than the maximum capacity of the disk. (Each disk will have just one partition on it, the rest of the disk being unused.) Then if you have to replace a failed disk, you can create a new partition of the same size on the replacement. Otherwise, if you use the entire disk, you run the risk that the new disk is a bit smaller than the old one, and trying to add it to the RAID container may fail.
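As a rough sketch of the procedure described above (not something from the original post), the following shell script prints the commands that would create one fixed-size partition per disk, build a mirror from them, and later prepare a replacement disk the same way. The device names /dev/sdX, /dev/sdY and /dev/sdZ, the 3725GiB partition end, and the helper names run, make_member, make_mirror and replace_member are all assumptions for illustration. By default nothing is executed; the commands are only echoed.

```shell
#!/usr/bin/env bash
# Sketch only: device names and the partition size are assumptions.
# With DRY_RUN=1 (the default) commands are printed instead of executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create a single partition ending at a fixed offset, leaving the tail of
# the disk unused so that a slightly smaller replacement still fits.
make_member() {
    local disk=$1 end=$2
    run parted --script "$disk" mklabel gpt mkpart primary 1MiB "$end" set 1 raid on
}

# Build a two-way RAID-1 array from the first partition of each disk.
# (Assumes sdX-style names, where partition 1 of /dev/sdX is /dev/sdX1.)
make_mirror() {
    local md=$1 a=$2 b=$3
    make_member "$a" 3725GiB
    make_member "$b" 3725GiB
    run mdadm --create "$md" --level=1 --raid-devices=2 "${a}1" "${b}1"
}

# Partition a replacement disk identically and add it to the array.
replace_member() {
    local md=$1 new=$2
    make_member "$new" 3725GiB
    run mdadm "$md" --add "${new}1"
}

make_mirror /dev/md0 /dev/sdX /dev/sdY
replace_member /dev/md0 /dev/sdZ
```

Running it with DRY_RUN=0 as root would execute the commands for real and destroy any data on the named disks, so the dry-run output is worth reading first.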

I am a long-time fan of Linux software RAID. But in reading an online discussion which happened to mention variations in actual sizes of storage devices (disks or flash) of supposedly identical capacity, I realized that there is yet another advantage of Linux software RAID: the fact that it can build arrays out of disk partitions, not necessarily entire disks.
This means you can create the partitions to be a standard size, slightly smaller than the maximum capacity of the disk. (Each disk will have just one partition on it, the rest of the disk being unused.) Then if you have to replace a failed disk, you can create a new partition of the same size on the replacement.
Otherwise, if you use the entire disk, you run the risk that the new disk is a bit smaller than the old one, and trying to add it to the RAID container may fail.
This is true, however all modern RAID controllers have a mechanism to achieve the same goal. LSI refers to this as “size coercion”, and enabling it limits disks to the next lowest round half gigabyte:
    # /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | grep -i Coer
    Non Coerced Size: 3.637 TB [0x1d1b0beb0 Sectors]
    Coerced Size: 3.637 TB [0x1d1b00000 Sectors]
    non-coerced size: 7812988592 sectors * 512 bytes/sector / (1024*1024*1024) = 3725.52327
    coerced size:     7812939776 sectors * 512 bytes/sector / (1024*1024*1024) = 3725.50000
Yes, it’s still technically possible to have some “larger” disks to have exactly 0x1d1b00000 sectors and an errant “smaller” disk to have one fewer, which would still break in this case, but the chances of that happening are vanishingly low.
I like linux mdraid, but there are a number of situations in which good quality hardware RAID is vastly superior (performance being the main one). The kinds of issues that people seem to have with hardware RAID either stem from using poor quality components (fake raid, motherboard hardware raid, low-end RAID controllers which are really just slightly smart HBAs etc), or from not having support contracts on hardware (and running into issues like “my RAID controller died and I couldn’t find the right replacement model so I couldn’t recover the array”).
If you’re in a situation where your options are motherboard RAID vs linux software RAID, or using hardware RAID without support contracts or cold spares and don’t want to/can’t recover from backup, then mdraid might well be the best option.
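The coercion arithmetic quoted above can be checked with a few lines of shell. This is only a sketch of the rounding as described in this message (down to a multiple of half a GiB, assuming 512-byte sectors); coerce_sectors is a hypothetical helper name, not part of MegaCli, and real controllers may offer other coercion granularities.

```shell
#!/usr/bin/env bash
# Round a sector count down to a multiple of 0.5 GiB (512-byte sectors assumed).
SECTOR_BYTES=512
SECTORS_PER_HALF_GIB=$(( 512 * 1024 * 1024 / SECTOR_BYTES ))   # 1048576

coerce_sectors() {
    local raw=$1
    # Integer division discards the remainder; multiplying back rounds down.
    echo $(( raw / SECTORS_PER_HALF_GIB * SECTORS_PER_HALF_GIB ))
}

# Non-coerced size from the MegaCli output above, given in hex.
coerce_sectors $(( 0x1d1b0beb0 ))   # prints 7812939776, i.e. 0x1d1b00000
```

The same figure can be used as the target size when choosing a standard partition size for mdraid members.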

On Fri, 8 Jan 2016 22:09:15 +1300, Daniel Lawson wrote:
I like linux mdraid, but there are a number of situations in which good quality hardware RAID is vastly superior (performance being the main one).
The CPU processing required to do software RAID gets lost in the idle rounding error on current machines.
The kinds of issues that people seem to have with hardware RAID either stem from using poor quality components (fake raid, motherboard hardware raid, low-end RAID controllers which are really just slightly smart HBAs etc), or from not having support contracts on hardware (and running into issues like “my RAID controller died and I couldn’t find the right replacement model so I couldn’t recover the array”).
It offers far more opportunities for headaches than that. Software RAID rules.

Hi Lawrence,
At present I have no experience with Linux RAID, but I've been contemplating some scenarios (see below) which would give me some idea of the flexibility and resilience of Linux based RAID 1 mirroring. I'd be interested to know how well you think Linux RAID would stand up to my testing / abuse ;-)
Setup: Let's say I have 4 x 32GB USB sticks from different vendors. On each one I create one 31.5GB Ext4 partition with names of "disk_1", "disk_2", ...etc. Linux RAID 1 is used to mirror the 4 x USB sticks, and some data is written to the mirror-sets. I then remove disk_4 and run as a 4 x member mirror-set with 3 members present and one member missing.
Scenario 1: Using console terminal commands I remove disk_3 USB stick from the mirror-set and take it to another computer. Can it be mounted and read as single JOBD or must it be mounted as a single unit of a mirror set in order to be read?
Scenario 2: At the command line I remove disk_3 USB stick from the mirror set. Can I then mount this disk on this same computer as a single JOBD or must it be as a single member of another mirror-set? E.g. So that I could do a network based backup of disk_3 while the other two disks in the mirror-set continue their normal operations.
Scenario 3: Without using commands to elegantly remove disk_3 from the mirror-set, I just pull out disk_3 USB stick. I continue writing data to the remaining mirror-set. One hour later I plug back in disk_3. Will it auto-magically be joined to the existing mirror-set and updated so it contains current data? ...OR, do I need to enter commands to get it to be rejoined to the existing mirror-set?
Scenario 4: Without using commands to elegantly remove disk_3 from the mirror-set, I just pull out disk_3 USB stick. I then plug in disk_4 USB stick. Will disk_4 be auto-magically joined to the existing mirror-set and updated? ...OR do I need to enter commands?
thanks, Ian.
Date: Thu, 7 Jan 2016 19:03:43 +1300 From: ldo(a)geek-central.gen.nz To: wlug(a)list.waikato.ac.nz Subject: [wlug] Another Advantage Of Linux Software RAID
I am a long-time fan of Linux software RAID...

On Mon, 11 Jan 2016 09:54:42 +1300, Ian Stewart wrote:
Scenario 1: Using console terminal commands I remove disk_3 USB stick from the mirror-set and take it to another computer. Can it be mounted and read as single JOBD or must it be mounted as a single unit of a mirror set in order to be read?
You mean JBOD? That should work if you mount it read-only. I would avoid trying to write to it, or you could corrupt the RAID metadata. I was effectively doing this sort of thing with older versions of GRUB, by setting it to boot off one of the members of the RAID container.
Scenario 2: At the command line I remove disk_3 USB stick from the mirror set. Can I then mount this disk on this same computer as a single JOBD or must it be as a single member of another mirror-set? E.g. So that I could do a network based backup of disk_3 while the other two disks in the mirror-set continue their normal operations.
Again, it should work if you mount it read-only. Also, note that you don’t need to remove it from the array in order to make direct accesses to it on the same machine. It still remains available under its original device name. I take advantage of this for doing badblocks scans on disks. If you are doing backups, I would do them at the file level. You can reduce the impact on the system with ionice(1).
Scenario 3: Without using commands to elegantly remove disk_3 from the mirror-set, I just pull out disk_3 USB stick. I continue writing data to the remaining mirror-set. One hour later I plug back in disk_3. Will it auto-magically be joined to the existing mirror-set and updated so it contains current data? ...OR, do I need to enter commands to get it to be rejoined to the existing mirror-set?
Can’t recall encountering this actual situation (all the cases where disks disappeared from the array were due to crashes). But I’m pretty sure if it doesn’t get picked up automatically, you can re-add it manually.
Scenario 4: Without using commands to elegantly remove disk_3 from the mirror-set, I just pull out disk_3 USB stick. I then plug in disk_4 USB stick. Will disk_4 be auto-magically joined to the existing mirror-set and updated? ...OR do I need to enter commands?
Same situation as scenario 3. There are commands if needed to do this manually.
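For reference, the manual commands alluded to in these replies could look roughly like the sketch below. The array name /dev/md0, the member name /dev/sdc1 and the helper names run, remove_member and rejoin_member are assumptions for illustration; the mdadm options themselves (--fail, --remove, --re-add, --add) are standard. As written the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch only: with DRY_RUN=1 (the default) commands are printed, not run.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Graceful removal (scenarios 1 and 2): mark the member failed, then detach it.
remove_member() {
    local md=$1 part=$2
    run mdadm "$md" --fail "$part"
    run mdadm "$md" --remove "$part"
}

# Return a former member (scenario 3). If --re-add is refused, fall back to
# --add, which treats the device as new and triggers a full resync; a brand
# new stick (scenario 4) would need --add in any case.
# In dry-run mode only the --re-add line is printed, because echo succeeds.
rejoin_member() {
    local md=$1 part=$2
    run mdadm "$md" --re-add "$part" || run mdadm "$md" --add "$part"
}
```

For example, remove_member /dev/md0 /dev/sdc1 followed later by rejoin_member /dev/md0 /dev/sdc1 prints the three commands in the order they would be issued.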

Hi Lawrence, Thanks for your reply. Sorry, yes, I meant JBOD.
Can it be mounted and read as single JOBD or must it be mounted as a single unit of a mirror set in order to be read? That should work if you mount it read-only. I would avoid trying to write to it, or you could corrupt the RAID metadata.
Yeah, I figure that although it's RAID 1, there will still be RAID related meta-data on the disk. I wondered if this meta-data made the disk unreadable. E.g. If you had one volume of a RAID5 set then you can't recover your data from it, but a volume of RAID 1 should be OK to recover data. I also wondered if device names would cause conflicts. But, as you say, mounting it read only it should then be OK.
For scenarios 3 and 4, I was thinking along the lines that your desktop computer at work has RAID1 based USB drives for all your ~/home/ data. You just pull out the USB stick when you go home at the end of the day, and plug it in again in the morning. If the office burns down at night and you lose your computer, then your data is all on the USB stick and available to be read by another computer. This RAID1 approach may not be the best way to get yourself a backup, but would be one of the quickest / easiest to manage.
Maybe at a wlug meeting we could get Linux RAID1 up and running on a laptop with some USB sticks and subject it to some testing and see if it handles the abuse and how auto-magic it is, etc.
cheers, Ian.

Maybe at a wlug meeting we could get Linux RAID1 up and running on a laptop with some USB sticks and subject it to some testing and see if it handles the abuse and how auto-magic it is, etc.
You've preempted my question, as I'm still looking for topics for the meeting this month ! @Lawrence + @Ian Would you be interested in demoing some RAID stuff? I'm sure Ian can supply hardware... ;-) Cheers, Peter -- Peter Reutemann Dept. of Computer Science University of Waikato, NZ +64 (7) 858-5174 http://www.cms.waikato.ac.nz/~fracpete/ http://www.data-mining.co.nz/

Hands up who saw this coming hahaha ;-)
11. Jan 2016 13:10 by fracpete(a)waikato.ac.nz:
Maybe at a wlug meeting we could get Linux RAID1 up and running on a laptop with some USB sticks and subject it to some testing and see if it handles the abuse and how auto-magic it is, etc.
You've preempted my question, as I'm still looking for topics for the meeting this month !
@Lawrence + @Ian Would you be interested in demoing some RAID stuff? I'm sure Ian can supply hardware... ;-)

Hands up who saw this coming hahaha ;-)
Shh... Don't scare my presenters... ;-) Cheers, Peter

On Mon, 11 Jan 2016 13:10:53 +1300, Peter Reutemann wrote:
@Lawrence + @Ian Would you be interested in demoing some RAID stuff? I'm sure Ian can supply hardware... ;-)
I can show setup and status commands, perhaps even removing/adding disks. Can Ian provide hardware failures? ;) Note I have only used RAID-1.

On Mon, 11 Jan 2016 12:29:46 +1300, Ian Stewart wrote:
I wondered if this meta-data made the disk unreadable. E.g. If you had one volume of a RAID5 set then you can't recover your data from it, but a volume of RAID 1 should be OK to recover data.
Yes, it has worked fine so far with RAID-1. That’s why I’ve only used RAID-1. ;)
I also wondered if device names would cause conflicts.
No, because the RAID devices are named /dev/md0, /dev/md1 etc. The original disk names /dev/sda etc remain unaffected.
For scenarios 3 and 4, I was thinking along the lines that your desktop computer at work has RAID1 based USB drives for all your ~/home/ data. You just pull out the USB stick when you go home at the end of the day, and plug it in again in the morning. If the office burns down at night and you lose your computer, then your data is all on the USB stick and available to be read by another computer. This RAID1 approach may not be the best way to get yourself a backup, but would be one of the quickest / easiest to manage.
Good point. But you can’t be sure that syncing has finished when you want to unplug. I have a custom script that uses rsync to back up between volumes. I can easily run that several dozen times a day. It’s fast.
Maybe at a wlug meeting we could get Linux RAID1 up and running on a laptop with some USB sticks and subject it to some testing and see if it handles the abuse and how auto-magic it is, etc.
Certainly can do. Keep the volumes to a modest size, so it won’t take too long. :)

Hi Lawrence, With regard to a wlug presentation... I was thinking along the lines of a laptop plugged into the overhead projector with its main partition for Linux and, say, 2 x 1GB partitions of dummy data in a mirror-set. Then also have one or two USB sticks which have 1GB partitions on them that can be plugged into the laptop and join the mirror-set.
Can Ian provide hardware failures?
The failure scenario I envisage would be to pull out a USB stick and observe the remainder of the mirror-set still works OK. Plug in a USB stick and observe the merging of the member back into the mirror-set. Do you think that would be OK? Would 1GB partitions take too long to sync in a demo situation? Would, say, 100MB partitions be more suitable?
I have one of Ian Young's Thinkpad laptops that works well with the overhead projector. I don't think Ian would mind if you used it to demo Linux RAID. It's got Ubuntu/Mate 15.04 on it at the moment. Let me know what is your preferred flavour of distro/desktop and I'll install it. I'd add GParted for use in generating same sized partitions. Any other apps you'd want added?
Note I have only used RAID-1
I figure that RAID-1 is all that most people would be interested in as a single member, removed from the mirror-set, can be demoed as a backup device.
But you can’t be sure that syncing has finished when you want to unplug.
Is there a way to monitor the degree of completion of syncing of an added member to the mirror-set? It's now more than 15 years ago that I used DEC's OpenVMS RAID-1 called Volume Shadowing, but I recollect that they had a feature called "mini-merge" as opposed to "full-merge". My understanding was that mini-merge would find the files that had changed and only update them as part of the syncing when a member gets added back to the mirror-set, whereas "full-merge" was a block by block merging process. Does Linux RAID have this sort of "mini-merge" functionality?
I have a custom script that uses rsync to back up between volumes.
It would be good if you could also demo this. It may highlight that rsync is a better way of doing backups than pulling a member off a mirror-set. cheers, Ian.

On Mon, 11 Jan 2016 19:53:36 +1300, Ian Stewart wrote:
I was thinking along the lines of a laptop plugged into the overhead projector with its main partition for Linux and, say, 2 x 1GB partitions of dummy data in a mirror-set. Then also have one or two USB sticks which have 1GB partitions on them that can be plugged into the laptop and join the mirror-set.
I was going to suggest that you can even use loopback image files instead of partitions. However, I’m not so sure how to emulate taking one of these offline. Therefore...
The failure scenario I envisage would be to pull out a USB stick and observe the remainder of the mirror-set still works OK. Plug in a USB stick and observe the merging of the member back into the mirror-set.
Yes, sounds like the best idea.
Do you think that would be OK? Would 1GB partitions take too long to sync in a demo situation? Would, say, 100MB partitions be more suitable?
Best thing I guess is to do a rehearsal. Find an optimum size so it takes maybe half a minute to a minute to resync, not too slow and not too fast, either. :)
Let me know what is your preferred flavour of distro/desktop and I'll install it.
I don’t think it matters much. I’ll be sticking to the mdadm command line, which should work the same regardless.
I'd add GParted for use in generating same sized partitions. Any other apps you'd want added?
mdadm and rsync, of course. My existing backup scripts are designed to run across machines. So I can show them, but probably not run them.
Note I have only used RAID-1
I figure that RAID-1 is all that most people would be interested in as a single member, removed from the mirror-set, can be demoed as a backup device.
Fine.
But you can’t be sure that syncing has finished when you want to unplug.
Is there a way to monitor the degree of completion of syncing of an added member to the mirror-set?
Come to think of it, I believe there is a percentage-complete indication in the status display, if a resync is in progress.
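The status display in question is /proc/mdstat, which shows a progress line while a resync or recovery is running. A minimal sketch of pulling the percentage out of it follows; sync_progress is a hypothetical helper name, and the parsing assumes the usual "recovery = 8.5%" layout of that file.

```shell
#!/usr/bin/env bash
# Print "<array> <percent>" for every array that is currently resyncing
# or recovering. Reads /proc/mdstat unless another file is given.
sync_progress() {
    local file=${1:-/proc/mdstat}
    awk '
        /^md[0-9]+ :/ { array = $1 }              # remember the current array
        /(resync|recovery) *=/ {
            for (i = 1; i <= NF; i++)             # the field ending in % is
                if ($i ~ /%$/) print array, $i    # the progress figure
        }
    ' "$file"
}
```

No output means no resync is in progress. For interactive use, watch -n1 cat /proc/mdstat shows the same information live, and mdadm --wait /dev/md0 blocks until any resync or recovery on that array has finished, which would address the concern about unplugging too early.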
It's now more than 15 years ago that I used DEC's OpenVMS RAID-1 called Volume Shadowing, but I recollect that they had a feature called "mini-merge" as opposed to "full-merge". My understanding was that mini-merge would find the files that had changed and only update them as part of the syncing when a member gets added back to the mirror-set, whereas "full-merge" was a block by block merging process. Does Linux RAID have this sort of "mini-merge" functionality?
Linux software RAID is strictly block-level. Volume Shadowing sounds like something at the filesystem level, no doubt specific to ODS-2, the standard VMS filesystem.
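One md feature that seems relevant to the mini-merge question, although it is not mentioned in this thread, is the write-intent bitmap. With a bitmap, md tracks which regions of the array have been written to, so a member that is re-added after a short absence only needs those regions resynced. This is still block-level, not file-level. The sketch below checks /proc/mdstat for a bitmap line; has_bitmap is a hypothetical helper name and /dev/md0 an assumed array name.

```shell
#!/usr/bin/env bash
# Succeed if the named array (e.g. md0) shows a "bitmap:" line in mdstat.
# An internal bitmap can be added to an existing array (as root) with:
#   mdadm --grow --bitmap=internal /dev/md0
has_bitmap() {
    local array=$1 file=${2:-/proc/mdstat}
    awk -v a="$array" '
        /^md/                 { cur = $1 }      # track which array we are in
        cur == a && /bitmap:/ { found = 1 }
        END                   { exit !found }   # exit 0 only if a bitmap was seen
    ' "$file"
}
```

A call such as has_bitmap md0 && echo "partial resync possible" could then be used before deciding whether pulling and re-adding a member is likely to be quick.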
I have a custom script that uses rsync to back up between volumes.
It would be good if you could also demo this. It may highlight that rsync is a better way of doing backups than pulling a member off a mirror-set.
I can demo rsync, and I can show examples of the sorts of scripts I write with it. Lawrence
participants (5)
- Daniel Lawson
- Eric Light
- Ian Stewart
- Lawrence D'Oliveiro
- Peter Reutemann