Raid:Manual Rebuild

From SME Server
Revision as of 13:03, 16 February 2013 by Stephdl (talk | contribs)
Jump to navigationJump to search
PythonIcon.png Skill level: Advanced
The instructions on this page may require deviations from standard procedures. A good understanding of linux and Koozali SME Server is recommended.


Warning.png Warning:
Get it right or you will lose data. Take a backup! Let the raid sync, this can take quite a while.


SME Servers Raid Options are largely automated, if you built your system with a single hard disk simply logon as admin and select Disk Redundancy to add a new drive to your RAID1 array. The same procedure is used if you have a disk failure in a RAID array and you have replaced that failed disk.

But with the best laid plans things don't always go according to plan, these are the processes required to do it manually.

See also: Hard Disk Partitioning and Raid#Resynchronising_a_Failed_RAID

HowTo: Manage/Check a RAID1 Array from the command Line

What is the Status of the Array

[root@ ~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb2[2] sda2[0]
      488279488 blocks [2/1] [U_]
      [=>...................]  recovery =  6.3% (31179264/488279488) finish=91.3min speed=83358K/sec
md1 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

unused devices: <none>

HowTo: Reinstate a disk from the RAID1 Array with the command Line

Look at the mdstat

First we must determine which drive is in default.


[root@ ~]#cat /proc/mdstat
Personalities : [raid1] 
md1 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
      
md2 : active raid1 sdb2[2](F) sda2[0]
      52323584 blocks [2/1] [U_]
      
unused devices: <none>

(S)= Spare (F)= Fail [0]= number of the disk


Important.png Note:
As we can see the partition sdb2 is in default, we can see the flag: sdb2 [2] (F). We need to resynchronize the disk sdb to the existing array md2.


Fail and remove the disk, sdb in this case

[root@ ~]# mdadm --manage /dev/md2 --fail /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md2
[root@ ~]# mdadm --manage /dev/md2 --remove /dev/sdb2
mdadm: hot removed /dev/sdb2
[root@ ~]# mdadm --manage /dev/md1 --fail /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md1
[root@ ~]# mdadm --manage /dev/md1 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1

Do your Disk Maintenance here

At this point the disk is idle.

[root@ ~]# cat /proc/mdstat
Personalities : [raid1] 
md1 : active raid1 sda1[0]
      104320 blocks [2/1] [U_]
      
md2 : active raid1 sda2[0]
      52323584 blocks [2/1] [U_]
      
unused devices: <none>


Important.png Note:
You'll have to determine if your disk can be reinstated at the array. In fact sometimes a raid can get out of sync after a power failure but also some times for physical outages of the hard disk. It is necessary to test the hard disk if this occurs repeatedly. For this we will use smartctl.


For all the details available by SMART on the disk

[root@ ~]# smartctl -a /dev/sdb

At least two types of tests are possible, short (~ 1 min) and long (~ 10 min to 90 min).

[root@ ~]# smartctl -t short /dev/sdb #short test
[root@ ~]# smartctl -t long  /dev/sdb #long test

to access the results / statistics for these tests:

[root@ ~]# smartctl -l selftest /dev/sdb

You can refer to this page for more information how activate or understand the Analysis and Reporting Technology (SMART) Monitor_Disk_Health


Important.png Note:
if you need to change the disk due to physical failure found by the smartctl command, install a new disk of the same capacity (or more) and enter the following commands to recreate new partitions by copying them from healthy disk sda.


[root@ ~]# sfdisk -d /dev/sda > sfdisk_sda.output
[root@ ~]# sfdisk /dev/sdb < sfdisk_sda.output

If you want to reinstate the same disk without replacing it, go to the next step.

Add the partitions back

[root@ ~]# mdadm --manage /dev/md1 --add /dev/sdb1
mdadm: hot added /dev/sdb1
[root@ ~]# mdadm --manage /dev/md2 --add /dev/sdb2
mdadm: hot added /dev/sdb2

Another Look at the mdstat

[root@sme8-64-dev ~]# cat /proc/mdstat
Personalities : [raid1] 
md1 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
      
md2 : active raid1 sdb2[2] sda2[0]
      52323584 blocks [2/1] [U_]
      [>....................]  recovery =  1.9% (1041600/52323584) finish=14.7min speed=57866K/sec

unused devices: <none>


Important.png Note:
with a new disk it may be worthwhile to reinstall grub to avoid problems on startup error. The grub is the program that allows you to launch the operating systems. Please enter the following commands.


HowTo: Write the GRUB boot sector

[root@ ~]# dd if=/dev/sda1 of=/dev/sdb1
[root@ ~]# grub

    GNU GRUB  version 0.95  (640K lower / 3072K upper memory)

 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the possible
   completions of a device/filename.]

grub> device (hd1) /dev/sdb

grub> root (hd1,0)
 Filesystem type is ext2fs, partition type 0xfd

grub> setup (hd1)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd0)"...  16 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd1) (hd1)1+16 p (hd1,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.

grub> quit