Drives Disappearing - Hot swap issues on Win 2k3 Server

The drives in question are for backing up company data on our Windows 2003 Server.

A little backstory: We had a system working where we rotated three different 1TB WD SATA drives, one each night, as our backup procedure. They were hot swappable and worked like a charm for years.

Our data grew, so we purchased three 2TB WD SATA drives to replace the old three and effectively double the amount of “past” we could store in our backups. However, these three 2TB drives (purchased from two different places) didn’t want to work via hot swap with the motherboard’s onboard Marvell SATA/RAID controller. (A drive would only show up if you rebooted with it in the bay.) I figured the controller was at fault, so I purchased this addon card.

Right off the bat this setup seemed to work better, but only at first. While it looked like the drives showed up, that night’s backup failed and I came into work to see that the drive was not there. It now takes a few swaps before a drive will fully show up! The drive may be detected, but it isn’t loading right… you try to write to the disk and it tells you there’s an I/O error.

Anyone know what could be up? The 1TB drives still work great…
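For what it's worth, a failure like the one described above (drive looks present, then the write fails with an I/O error) could be caught before the nightly job starts rather than the next morning. Here is a minimal sketch, assuming Python is available on the server; the function name `drive_is_writable`, the retry count, and the delay are made up for illustration and are not part of any backup product.

```python
import os
import tempfile
import time


def drive_is_writable(mount_point, attempts=3, delay_seconds=5.0):
    """Return True if a small file can be written, synced and read back.

    mount_point is the root of the backup volume (hypothetical, e.g. "E:\\").
    """
    payload = b"backup-drive-probe"
    for attempt in range(attempts):
        try:
            # Create the probe file on the target volume itself.
            fd, path = tempfile.mkstemp(prefix="probe_", dir=mount_point)
            try:
                with os.fdopen(fd, "wb") as handle:
                    handle.write(payload)
                    handle.flush()
                    os.fsync(handle.fileno())  # push the data to the device
                with open(path, "rb") as handle:
                    if handle.read() == payload:
                        return True
            finally:
                os.remove(path)  # always clean up the probe file
        except OSError:
            # Missing volume or I/O error: fall through and retry.
            pass
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

The backup script could call this first and send an alert if it returns False. It only confirms that one small write round-trips; it does not prove the drive will stay online for a whole backup run.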

Hi dude, are you able to provide the model number of the drives? They may not support RAID (as tends to happen with some WD drives).

WD2001FASS is the model # of the drives in question.

WD1001FALS is the model # of the drives which still work.

When this new drive disappears and the backup then tries to use it, we get lots of System Events, e.g.:

Information - Error 36 = Command [0x2a] on physical disk 0 failed. Sense key=0xb, ASC=0x0, ASCQ=0x0.

Warning - Error 3 = Request to physical disk 0 is timed out.

Error - Error 15 = The device, \Device\Harddisk2, is not ready for access yet.

Error - Error 55 = The file system structure on the disk is corrupt and unusable. Please run the chkdsk utility on the volume Off-Site-Backups-1-10-11.

Thanks for your help!!
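As a side note, the hex codes in the first event can be read off directly if the driver is reporting standard SCSI values, which is an assumption here. Under that assumption, opcode 0x2a is WRITE(10) and sense key 0xb is ABORTED COMMAND, which fits a write being abandoned when the drive drops out. A small sketch that decodes a message of this shape (the lookup tables are deliberately partial, and `decode_command_failure` is just an illustrative name):

```python
import re

# Partial tables of standard SCSI values; anything else decodes as "unknown".
SCSI_OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}
SENSE_KEYS = {
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0xB: "ABORTED COMMAND",
}

# Matches text such as "Command [0x2a] ... Sense key=0xb".
COMMAND_PATTERN = re.compile(
    r"Command \[(0x[0-9a-fA-F]+)\].*?Sense key=(0x[0-9a-fA-F]+)"
)


def decode_command_failure(message):
    """Return (command name, sense key name), or None if the text does not match."""
    match = COMMAND_PATTERN.search(message)
    if match is None:
        return None
    opcode = int(match.group(1), 16)
    sense_key = int(match.group(2), 16)
    return (
        SCSI_OPCODES.get(opcode, "unknown"),
        SENSE_KEYS.get(sense_key, "unknown"),
    )


event_text = (
    "Command [0x2a] on physical disk 0 failed. "
    "Sense key=0xb, ASC=0x0, ASCQ=0x0."
)
print(decode_command_failure(event_text))
```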

Hi dude!! That “Warning - Error 3” you provided just nailed it… The drive is timing out; that model does not have any feature to prevent it from timing out in a RAID…

That model lacks TLER (Time-Limited Error Recovery), which is present on some older WD drives (although WD said it was not intentional for them to have it) and on most RAID-specific (Enterprise) WD drives, where it prevents the time-out…

If I were you, I’d change those drives for RE3s or RE4s, which are built for RAID…
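To make the TLER point concrete, here is a toy model of the race between a drive's internal error recovery and the controller's patience. Every number in the usage lines is an assumption picked for illustration (7 seconds is the commonly quoted TLER limit; the 8-second controller timeout and 30-second desktop recovery are placeholders, not measured values for these drives or this controller).

```python
def drive_dropped(recovery_seconds, controller_timeout_seconds,
                  tler_limit_seconds=None):
    """Return True if the controller would give up before the drive answers."""
    if tler_limit_seconds is not None:
        # TLER caps how long the drive keeps retrying before it reports
        # the error and lets the RAID layer handle it.
        recovery_seconds = min(recovery_seconds, tler_limit_seconds)
    return recovery_seconds > controller_timeout_seconds


# Assumed values, for illustration only.
print(drive_dropped(30, 8))                        # no TLER: long recovery
print(drive_dropped(30, 8, tler_limit_seconds=7))  # TLER caps recovery at 7 s
```

This only matters when a RAID controller is watching the drive with a short timeout, so it may not apply to a drive used on its own.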

Thanks for your quick reply! 

Before I go out and buy some new drives, I just want to verify that this is indeed what is going on.

The drives that are timing out are actually not part of a RAID. There is a RAID of two drives in our server, but the backup drives are only plugged in one at a time, and only for backups.

Does the information you gave me still apply in this scenario?

Thanks,

Chocko

It does not… They should not time out outside of a RAID unless there’s another factor in play… What about the voltage supplied?

Bummer! I’m swapping these drives in and out of the same bay, so the power source is the same whether I plug in the 1TB or the 2TB drive. So, something else must be going on here.

I’ll try ruling out the bay this evening… what other things could I look into?

The power source does not seem to be the problem, nor does the cradle.

What other things might cause this problem??

Drive firmware may have independent sleep modes that the OS is unaware of. Microsoft addressed a similar issue here with a hotfix (Server 2008).

http://support.microsoft.com/kb/977178/#top

I am focusing on the long spin-up time of 2TB drives. If this is the case, you may have to update the drive firmware or look for another brand.

PS. The WD2001FASS is considered to be a troublesome drive… just google!

Thanks for your reply. It seems a plausible explanation, though I can’t seem to find any firmware updates for this drive. Also, my operating system is Windows 2003 Server, so that hotfix doesn’t really help me. I tried searching that site for a related 2k3 Server hotfix, but was unable to find one.

Are there settings anywhere where I can change the sleep mode or spin-up time of the drive?

I am not sure about 2003; normally Windows power management is handled through the ACPI interface. Modern WD HDDs do not honor the ACPI command set. I have seen that traditional APM has some effect on some WD drives. Try modifying the APM (Advanced Power Management) settings in the motherboard BIOS setup.

There wasn’t anything applicable in the APM settings, though there was a setting for HDD timeout, and I increased it from 10 to 30 seconds… though this didn’t seem to have any effect.

I even ran a plethora of Windows updates, one of them even touching on the Marvell controller, but these also had no effect.

The 1TB drive is still recognized quickly and easily, while all three of my 2TB drives only half-initialize. They don’t show up in My Computer or Storage Management. A small amount of drive information does appear in the MRU tray utility, but not as much as there should be.

There must be something else going on. Can someone help me confirm if there is a firmware update for these drives?

The HDD timeout in the BIOS settings sets how long the BIOS waits at boot for HDDs to identify themselves. I was referring to the APM settings on the Power Management tab. Newer motherboards may not have this feature. It is always a good idea to contact WD support personnel to get more info on your drives.

I’m experiencing a similar issue in my setup, which is:

  • Win2k3

  • RocketRaid 2300 controller

  • WD10EVDS (4 of them in a RAID-5 setup)

If I enable the idle timer on the controller, from time to time one of the four drives suffers a time-out and disconnects from the controller.

They are not re-discovered/re-connected automatically - even after a reboot.

I’m using this setup for backup purposes. So performance is not that important.

Reliability is. And I picked these drives because they are (also) positioned for 24x7 usage.

What can be done to prevent this from happening, other than replacing these with RE drives?