Setting up a file server with Fedora 11 and Software RAID

WARNING TECHNICAL CONTENT!

I am sure this is a familiar story. Our media collection is spread out over two NAS (Network Attached Storage) devices and two computers, on about 8 hard disks. There is no redundancy (if a disk dies it takes the media with it) and the only backup strategy is to write to DVDs, which is a waste as we have several bookcases full of the original CDs and DVDs!!

It is time for a new file server with redundancy - in other words a RAID based file system.

I tried out a few options. Windows Home Server looked promising but it doesn't do RAID 5. It does do mirroring and you can add any disk you like (USB, SATA, PATA, hard disks of any size you have), but to get redundancy you have to tell it to duplicate a directory. That means the duplicated data takes twice its size in raw disk space, whereas RAID 5 only gives up one disk's worth of space to parity.

So in the end I decided to use a software RAID solution based on the recently released Fedora 11 Linux (64 bit release). It also includes virtualisation support, which would allow me to run a virtual machine or two if I wanted.

I considered using old hardware and just buying some new hard disks, but after pricing the disks it was only about 30% more to build a whole new computer. In the end I decided not to reuse old hardware as I expect this solution to be operational for two to three years, with the only hardware changes being adding an extra disk or two (another reason to use Linux as it allows the RAID to grow).

I decided on a Core 2 Quad system with 4GB of RAM and 6 SATA ports on the motherboard so I would not need an extra card, a 250GB SATA disk to boot from and to hold software and a couple of virtual machines, and five 1.5TB SATA disks to create a 6TB storage area. The case can also hold about 8 disks comfortably and with two 120mm fans the disks are running at about 35 Celsius (34 to 38 according to the palimpsest application).
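
The 6TB figure comes from the RAID 5 rule that one disk's worth of space goes to parity. Here is a small sketch of that arithmetic for anyone sizing their own array. The function name raid5_usable_gb is made up for illustration, and sizes are in decimal GB as printed on the box.

```shell
# Usable space of a RAID 5 array: one disk's worth of capacity goes to parity.
# raid5_usable_gb is a hypothetical helper name; sizes are in decimal GB.
raid5_usable_gb() {
    disks=$1
    size_gb=$2
    echo $(( (disks - 1) * size_gb ))
}

raid5_usable_gb 5 1500   # five 1.5TB disks
raid5_usable_gb 4 1500   # the same disks with only four in the array
```

The formatted filesystem will report somewhat less than this because of filesystem overhead and because most tools count in binary units.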

Now I do know that there will be performance bottlenecks due to maxing out the onboard SATA controllers and ports, but as I have a Gigabit network in the house, being restricted to 80 to 100MB/s is not a problem, and even with the expected maximum of 4 clients trying to stream video it shouldn't get close to that. Besides, I can always add a SATA card later, although I may need to find a PCI video card as the motherboard I chose does not have onboard video. That was a compromise as I did not want to wait another week or two for the motherboard I originally chose to be in stock. Finally, a 460W power supply (hopefully of decent quality - anyone used Gigabyte power supplies?) should provide ample power.

In my research I found the following tutorial, which proved to be invaluable.
http://www.linuxhomenetworking.com/wiki/index.php/Quick_HOWTO_:_Ch26_:_Linux_Software_RAID

Be prepared to install and reinstall as you learn how to set up Linux and deal with the software RAID. The first time I installed I did not set up the software RAID during the install. I installed to the 250GB disk (/dev/sda) and, once in Linux and safely logged in at a terminal (or command line prompt for the MSDOS/Windows user) as the administrator user (watch out, the word "root" may trigger the Australian government's internet filter), I was able to run the commands to partition the disks, set the partition types and then run the build command for the RAID:
mdadm --create --verbose /dev/md0 --level=5 \
--raid-devices=5 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
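
The partitioning that has to happen before the create command could look something like the sketch below. It is an assumption, not a record of what was typed on this build: it assumes parted, one partition spanning each whole disk, and the disk names used in this post. It only prints the commands so you can read them first, because running them wipes the partition table on each disk. print_partition_cmds is an illustrative name.

```shell
# Print (do not run) the partitioning commands for each RAID member disk.
# Assumptions: parted is the tool, each disk gets one partition covering the
# whole disk, and the partition is flagged as a Linux RAID member.
print_partition_cmds() {
    for disk in "$@"; do
        echo "parted -s $disk mklabel msdos"
        echo "parted -s $disk mkpart primary 0% 100%"
        echo "parted -s $disk set 1 raid on"
    done
}

print_partition_cmds /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
```

Check the disk names with great care before running any of the printed lines as the administrator user.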

The create command failed with a completely useless message (about something called a RUN_ARRAY) which actually meant that the raid5 module had not been installed properly. I solved that by reinstalling and setting up the array using the installer. I won't try to explain how that works as I needed a few attempts to get it right. Even then I still ended up doing it all from the command line.
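
If you hit the same RUN_ARRAY message, one thing worth checking is whether the kernel actually has RAID 5 support loaded. The sketch below is an assumption about how to check, not something from the original install. It looks for raid5 on the Personalities line of /proc/mdstat, and it assumes the module is called raid456, which is the usual name on kernels of this era. has_raid5 is just an illustrative name.

```shell
# Report whether the kernel has the RAID 5 personality loaded, going by the
# "Personalities" line of an mdstat-style file (default: /proc/mdstat).
has_raid5() {
    grep -q '^Personalities.*\[raid5\]' "${1:-/proc/mdstat}" 2>/dev/null
}

if has_raid5; then
    echo "raid5 personality is loaded"
else
    echo "raid5 personality not loaded; as root, try: modprobe raid456"
fi
```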

After booting into Linux after the install, the array was degraded but was being rebuilt automatically. The palimpsest application can be used to monitor the progress.

The RAID rebuild (strange way to describe it when the RAID hasn't even been built yet!!) failed at 8%. I found after some trial and error that I could build the array if I used just 4 of the disks.

mdadm --create --verbose /dev/md0 --level=5 \
--raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

So that is what I did. This worked and I could format the array, mount it, copy to it and share it using samba with no problems.

mkfs.ext4 /dev/md0

One thing to note is that the build for four disks takes about 10 hours.

So now I added the extra disk to the array.
mdadm --add /dev/md0 /dev/sdf1
mdadm --grow -n 5 /dev/md0

You will be very happy to know it took two days for the rebuild to add the extra disk!

Of course the local electricity company decided to cut the power when the rebuild was 66% complete. My UPS allowed me to shut down, but when I powered it back up again the boot dropped to the command line and I had to fix things from there.

As I had not created a /etc/mdadm.conf (I was waiting for everything to complete) I had to tell Linux what the RAID was composed of as follows:
mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1 \
/dev/sdd1 /dev/sde1 /dev/sdf1

This started the rebuild which continued at 67% and actually ran twice as fast as it was before. Now I needed to create the /etc/mdadm.conf so Linux would boot properly. As the recovery console had the root filesystem mounted in read-only mode I had to remount it:
mount -o rw,remount /
Then I created the config file as follows

mdadm --detail --verbose --scan > /etc/mdadm.conf

Then it was just a matter of waiting for the rebuild to complete. Either of the following will show you the progress:

cat /proc/mdstat
mdadm -D /dev/md0
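
If you only want the percentage, for example to log it every so often, it can be pulled out of /proc/mdstat with a bit of sed. This is a minimal sketch that assumes the usual progress line format ("recovery = 12.6% ..." or "reshape = 67.0% ..."). md_progress is an illustrative name.

```shell
# Print the percentage from the first progress line (recovery, resync or
# reshape) of an mdstat-style file (default: /proc/mdstat).
# Prints nothing if no rebuild is in progress.
md_progress() {
    sed -n 's/.*= *\([0-9.][0-9.]*\)%.*/\1/p' "${1:-/proc/mdstat}" | head -n 1
}

if [ -r /proc/mdstat ]; then
    md_progress /proc/mdstat
fi
```

For something to leave running in a terminal, watch -n 60 cat /proc/mdstat does the job without any parsing.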

Once it was finished I resized the filesystem to use the extra space on the RAID:
resize2fs /dev/md0

Now I rebooted, Linux completed the boot and I could configure samba etc.

This link is a good starting point for configuring linux

http://www.fedoraguide.info/index.php?title=Main_Page

Now I want to make sure that if something happens, such as a disk failure, I am informed by email (the file server will be out in the garage when it is deemed stable enough).


I added the following line to /etc/mdadm.conf
MAILADDR myemail@mydomain.type

from http://forums.fedoraforum.org/showthread.php?t=68018

and I edited /etc/mail/sendmail.mc (the m4 source that sendmail.cf is generated from)

dnl # Uncomment and edit the following line if your outgoing mail needs to
dnl # be sent out through an external mail server:
dnl #
define(`SMART_HOST', `smtp.myisp.com')
dnl #

After this I had to install the package needed to rebuild the sendmail configuration

yum install sendmail-cf

and run the following to regenerate sendmail.cf

/etc/mail/make
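
A quick way to check the alert path is to confirm which address mdadm will use and then ask mdadm to send a test message. The helper below is only a sketch, and mdadm_mailaddr is a made up name. It reads the MAILADDR line from /etc/mdadm.conf unless another file is given. The test command itself is left as a comment because it needs the administrator user and a working mail setup.

```shell
# Print the address from the MAILADDR line of an mdadm.conf-style file
# (default: /etc/mdadm.conf). Prints nothing if the file or line is missing.
mdadm_mailaddr() {
    conf=${1:-/etc/mdadm.conf}
    [ -r "$conf" ] || return 0
    awk '$1 == "MAILADDR" { print $2; exit }' "$conf"
}

addr=$(mdadm_mailaddr)
echo "mdadm alerts would go to: ${addr:-no MAILADDR set}"

# To send one test alert for every array found, run as the administrator user:
#   mdadm --monitor --scan --oneshot --test
```

For alerts on real failures mdadm also has to be running in monitor mode in the background. On Fedora that is normally handled by the mdmonitor service, but it is worth checking on your own install.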

I don't know if this will work and I will have to do some testing, take a backup of the Linux system disks (clonezilla), completely fill the RAID to stress/performance test it and test for corrupt files...

So far this has taken two weeks. About half of that time was spent waiting for RAID rebuilds to finish and trying to figure out why five disks were causing a problem. I still have no idea, but two of the disks have reallocated sectors so maybe that was the cause. If that number grows then I will have to replace them one at a time (at least I get to test replacing a failed disk). It will be a while before I start reusing the 8 disks and two NASes this will hopefully replace.
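
To keep an eye on the reallocated sector count without opening palimpsest, the number can be read from the command line. This sketch assumes the smartmontools package is installed and that the disk reports the standard Reallocated_Sector_Ct attribute. reallocated_sectors is an illustrative name.

```shell
# Extract the raw Reallocated_Sector_Ct value from `smartctl -A` output
# supplied on standard input. Prints nothing if the attribute is absent.
reallocated_sectors() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# Typical use, as the administrator user:
#   smartctl -A /dev/sdb | reallocated_sectors
```

Noting the value for each disk every week or so is enough to see whether it is growing.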
