Problems with my Raid Array
Submitted by Russell, Fri 18 Feb 05. Edited Sun 20 Feb 05.
OK, I had a non-critical failure. I built my raid6 array with 4 drives (one missing) and mounted it. This was a replacement array, and once the new array was online I planned to shut down the first array (still in the same machine) and give its drives to the second array as second parity and a spare drive.
As can be the case in my office, I didn't get around to that change right away, and yesterday, when I prepared to make these changes, I found that the array had degraded to only two drives (no parity) several days earlier. I had not gotten an email notice about the failure.
It seems that even though the mdadm --monitor command was running, it was not watching my new array. I have explicitly added the command:
mdadm --monitor /dev/md1 &
to my /etc/rc.local
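Since the missing email was the real problem here, it may be worth spelling out a fuller monitor line. As far as I know, monitor mode only sends mail when it has an address, either from --mail on the command line or from a MAILADDR line in mdadm.conf. Below is a minimal sketch that just prints a candidate command for review. The address root@localhost, the 300 second delay, and the monitor_cmd helper are all assumptions made for illustration, and the flags should be checked against man mdadm on the installed version.

```shell
# Hypothetical address; replace with a mailbox somebody actually reads.
ADMIN_MAIL="root@localhost"

# Print (not run) a monitor command for one array, so it can be
# reviewed before being pasted into /etc/rc.local.
# --mail sets the alert address, --delay is the polling interval in
# seconds, --daemonise puts mdadm in the background without needing "&".
monitor_cmd() {
    printf 'mdadm --monitor --mail=%s --delay=300 --daemonise %s\n' "$ADMIN_MAIL" "$1"
}

monitor_cmd /dev/md1
# prints: mdadm --monitor --mail=root@localhost --delay=300 --daemonise /dev/md1
```

Printing the line instead of running it keeps the sketch harmless on a machine without the array.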
I was remote at the time I found the problem, so the first thing I did was to add one of the drives from the old array.
So I start watching the sync process, and about 10 minutes later the machine stops responding to me. My ssh session locks up. The server stops responding to pings. I can't do anything.
I get into the office in the morning and find the machine just locked up. No error messages on screen, just totally unresponsive. So I reboot, mount the array (/dev/md1) read-only, and tell it to add a drive.
mdadm /dev/md1 -a /dev/hdc1
and it SLOWLY starts the sync process. All the drives in this array are 160 GB disks. I have timed the current process at 3 minutes per 1% completed, or about 5 hours for the whole process.
I guess that's not totally unrealistic: it took about that long to copy the files onto this array, and that only involved about 1/2 of the data. To do this, I guess the RAID process needs to read and process 300 GB of data and write a new 150~160 GB of data.
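The arithmetic behind that estimate is easy to script. This is only a sketch: the 3 minutes per 1% figure is the one timed above, while the helper names estimate_total and estimate_remaining and the 50% example are made up for illustration.

```shell
# Measured pace from the text: 3 minutes for each 1% of the rebuild.
minutes_per_percent=3

# Print the time for a full 0-100% rebuild as hours and minutes.
estimate_total() {
    total_min=$(( $1 * 100 ))
    printf '%dh %02dm\n' $(( total_min / 60 )) $(( total_min % 60 ))
}

# Print the time still needed, given the pace and the percent already done.
estimate_remaining() {
    left_min=$(( $1 * (100 - $2) ))
    printf '%dh %02dm\n' $(( left_min / 60 )) $(( left_min % 60 ))
}

estimate_total "$minutes_per_percent"          # prints 5h 00m
estimate_remaining "$minutes_per_percent" 50   # prints 2h 30m
```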
Still, I don't like the idea of leaving the office central file server in read-only mode for the first 4 to 5 hours of the business day. My other choice is to mount it r/w, but I'm not sure that is safe. (Keep in mind that I currently have no redundancy on this array...)
9:15am
26% of the re-sync completed. I needed to change the mount mode to read/write because a user needed to save files to the server.
I hope I don't regret it.
9:52am 31% ---- still running
10:49am 50% ---- still running
11:53am 71% ---- not dead yet
12:35pm 84% ---- still not dead
[root@backuppc russell]# /sbin/mdadm --detail /dev/md1
/dev/md1:
Version : 00.90.01
Creation Time : Wed Jan 26 11:06:30 2005
Raid Level : raid6
Array Size : 312576512 (298.10 GiB 320.08 GB)
Device Size : 156288256 (149.05 GiB 160.04 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Fri Feb 18 12:53:44 2005
State : clean, no-errors
Active Devices : 2
Working Devices : 3
Failed Devices : 0
Spare Devices : 1
Rebuild Status : 90% complete
Number Major Minor RaidDevice State
0 33 65 0 active sync /dev/hdf1
1 34 65 1 active sync /dev/hdh1
2 0 0 -1 removed
3 0 0 -1 removed
4 22 1 2 spare /dev/hdc1
UUID : f8e14f75:1e6932f3:23750999:74c8702d
Events : 0.611826
Fri Feb 18 12:53:49 EST 2005
[root@backuppc russell]#
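The percentage can be pulled out of that output rather than read by eye. A minimal sketch, assuming the Rebuild Status line looks the way it does in the output above. The rebuild_percent helper is a made-up name, and the two sample lines fed to it are copied from that output.

```shell
# Read mdadm --detail output on stdin and print the rebuild percentage.
# Prints nothing when there is no "Rebuild Status" line (no rebuild running).
rebuild_percent() {
    sed -n 's/^ *Rebuild Status *: *\([0-9][0-9]*\)% complete.*/\1/p'
}

# Demonstration on two lines taken from the output above.
printf '%s\n' 'State : clean, no-errors' 'Rebuild Status : 90% complete' | rebuild_percent
# prints: 90
```

On the live machine the same helper could be fed directly, for example with mdadm --detail /dev/md1 | rebuild_percent.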
12:53pm 90%
1:11pm 95%
1:31pm Re-sync complete...
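The log entries are enough to check the pace after the fact. This sketch takes two log entries (24-hour times and their percentages) and prints minutes per 1%. The pace helper is a made-up name, and it assumes both entries fall on the same day.

```shell
# Usage: pace START_HH:MM START_PERCENT END_HH:MM END_PERCENT
# Prints the average minutes taken per 1% between the two log entries.
pace() {
    awk -v s="$1" -v sp="$2" -v e="$3" -v ep="$4" 'BEGIN {
        split(s, a, ":"); split(e, b, ":")
        mins = (b[1] * 60 + b[2]) - (a[1] * 60 + a[2])
        printf "%.1f\n", mins / (ep - sp)
    }'
}

pace 09:15 26 13:31 100   # prints 3.5
pace 09:52 31 12:35 84    # prints 3.1
```

So from 9:15am to the finish the rebuild averaged about 3.5 minutes per 1%, a little slower than the 3 minutes per 1% timed at the start.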
Now to add the 2nd parity drive
7pm: finally got the full array online:
[root@backuppc root]# mdadm --detail /dev/md1
/dev/md1:
Version : 00.90.01
Creation Time : Wed Jan 26 11:06:30 2005
Raid Level : raid6
Array Size : 312576512 (298.10 GiB 320.08 GB)
Device Size : 156288256 (149.05 GiB 160.04 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Fri Feb 18 19:12:09 2005
State : clean, no-errors
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Number Major Minor RaidDevice State
0 33 65 0 active sync /dev/hdf1
1 34 65 1 active sync /dev/hdh1
2 22 1 2 active sync /dev/hdc1
3 3 65 3 active sync /dev/hdb1
4 33 1 -1 spare /dev/hde1
UUID : f8e14f75:1e6932f3:23750999:74c8702d
Events : 0.618446
Feb 20, 7am
OK, so the machine reset this morning. (I think I have a loose wire on the power supply; I will check it when I am back in the office.) But when it restarted, the array re-assembled with drives missing. I believe this is because I need to add lines to /etc/mdadm.conf. I believe I have it right now. My current /etc/mdadm.conf lines:
DEVICE /dev/hde1 /dev/hdb1 /dev/hdh1 /dev/hdf1 /dev/hdc1
ARRAY /dev/md1 devices=/dev/hdh1,/dev/hdf1,/dev/hdc1,/dev/hdb1,/dev/hde1
Since it will be 4+ hours until the array is done re-adding the second parity drive, I don't think I will test those settings today. But ideally I won't need to rebuild anything if the machine resets again.
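An alternative to listing member devices on the ARRAY line is to identify the array by its UUID, which mdadm.conf also accepts and which does not depend on device names staying the same. This is an untested sketch, not what is in the config above. The UUID is the one from the mdadm --detail output, and the array_line helper is a made-up name.

```shell
# UUID copied from the mdadm --detail output earlier in this post.
MD_UUID="f8e14f75:1e6932f3:23750999:74c8702d"

# Print an ARRAY line for mdadm.conf that identifies the array by UUID.
array_line() {
    printf 'ARRAY %s UUID=%s\n' "$1" "$2"
}

array_line /dev/md1 "$MD_UUID"
# prints: ARRAY /dev/md1 UUID=f8e14f75:1e6932f3:23750999:74c8702d
```

Running mdadm --detail --scan on a working array prints similar ARRAY lines, which can save typing the UUID by hand.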