Velociraptors in RAID0 cause system to freeze repeatedly


#1

Hello,

My two WD3000HLFS Velociraptors in RAID0 cause my system to freeze for about 3 minutes at intervals of approximately 1 hour 15 minutes.

Well, it doesn’t freeze completely: apps already running in RAM work normally until an HDD operation is needed, but launching new apps or doing anything that requires HDD activity (almost everything) stops responding during this time. After 3 minutes the system suddenly ‘awakes’ and everything runs normally…  until it hangs again after about 60-90 minutes… etc.

During this 3min period:

  • the system is incredibly slow or doesn’t respond at all

  • the hdd activity led becomes constantly lit

  • Windows performance monitor shows the hdd activity at a constant 100% (with just a few minor fluctuations), but almost no data is transferred (just a few kB/s), and the response (access) time of running processes grows to more than 1000ms.

  • CPU and RAM are idle.
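The stall pattern in the list above (disk 100% busy, only a few kB/s transferred, response times over 1000ms) could be picked out of an exported performance log with something like the sketch below. It is only an illustration: the `Sample` fields, the `find_stalls` name, and the 95% and 100 kB/s thresholds are assumptions made up for this example; only the 1000ms figure comes from the post.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    t: float          # seconds since logging started (hypothetical log field)
    busy_pct: float   # disk active time in percent
    kb_per_s: float   # data actually transferred
    resp_ms: float    # average disk response time


def find_stalls(samples, busy_min=95.0, kb_max=100.0, resp_min=1000.0):
    """Return (start, end) times of runs of samples that match the stall pattern."""
    stalls = []
    start = None
    last = None
    for s in samples:
        stalled = (s.busy_pct >= busy_min
                   and s.kb_per_s <= kb_max
                   and s.resp_ms >= resp_min)
        if stalled:
            if start is None:
                start = s.t       # a new stall begins here
            last = s.t
        elif start is not None:
            stalls.append((start, last))  # the stall ended on the previous sample
            start = None
    if start is not None:
        stalls.append((start, last))      # log ended while still stalled
    return stalls
```

Comparing the start times of consecutive stalls would show whether the 60-90 minute interval is really regular, which might point at a scheduled task rather than at the drives.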

The system ran perfectly stable for half a year before. The problem appeared for no apparent reason, since I didn’t install any new ‘strange’ stuff or change any OS settings significantly.

Overheating of any HW is improbable.

I also:

  • have the latest mobo BIOS and drivers for all devices

  • tested CPU and RAM stability, and they seem to work flawlessly

  • tested the system with numerous antivirus programs - no malware/spyware found

  • ran a CHKDSK test - no errors found

  • ran a verify RAID volume test in Intel Rapid Storage - no errors found

  • tested both hard drives in WD Data Lifeguard Diagnostics but found no errors - SMART data ok, both tests (quick, extended) successfully passed (twice)

  • reinstalled the OS (fully formatted the HDDs with DLG, deleted the old RAID, created a new RAID, used the latest AHCI/RAID driver) - problem persists

I don’t have any other hdds to try atm.

The problem occurs no matter what I’m doing - idle at desktop, browsing web, playing games… and it’s quite annoying tbh :-/

Note: DLG couldn’t recognize the hdds or perform any tests while they were in RAID, so I connected them separately to another pc as secondary hdds to test them. But I suppose that’s normal, isn’t it?

system specs:

Asus Maximus III Formula P55 mobo
Core i7 860
4 GB RAM 
Radeon HD 5870
2x 300GB Velociraptor WD3000HLFS in RAID 0 (using mobo-integrated intel raid controller)
Win 7 64bit 

I have also found other ppl with similar problems but with different drives, no solutions tho (e.g. here).

Any help would be welcome and appreciated.


#2

Note: DLG couldn’t recognize the hdds or perform any tests while they were in RAID, so I connected them separately to another pc as secondary hdds to test them. But I suppose that’s normal, isn’t it?    = YES

I had a similar problem on my Gigabyte X58 UD5, and it turned out to be a defective onboard JMB controller (with a hard disk connected, running Win7 64 Ultimate). So disconnect all drives other than the Velos and see what happens.

I know it sounds like a strange suggestion, but you could try reducing the SATA speed to 1.5 Gb/s via the jumpers on the drive(s) and see what happens.


#3

Thanks for the reply.

Disconnected all SATA devices other than the Velos - no change.

I’ve never tried to ‘play’ with the jumper settings before, so I’ll see what I can do…  o_o

BTW: I also have an additional JMB RAID controller on my mobo, but I use the Intel one integrated in the chipset cos it’s faster.


#4

Well, I know I am guessing, but better than nothing, I hope:

And the JMB is disabled in BIOS ?

Are all other USB devices disconnected other than mouse and keyboard ?

Be aware that the standard or safe BIOS setup values, especially on ASUS boards, do not always (never on the X series :) ) give the right voltages for the chipset. (Judging from your post you are aware of this, but just to be safe.)

The standard cooling ASUS supplies for the chipset is nearly always insufficient. Here it sometimes (strangely) helps to lower the RAM speed (yes, I know that the memory controller is on the CPU) and/or to peel off the ASUS sticker on the SB (south bridge) cooler and mount a blower to cool the SB (it sometimes gets too hot because it is close to the GPU(s) (graphics card(s))).

It is difficult for me to understand why Intel (un)wisely removed thermal monitoring from their latest ICHs (south bridges). It is cumbersome to mount a temperature probe so that you get a reading you can trust, and without known-good values from before the failure it is hard to diagnose any temperature-related problem with the south bridge.

It could also be a general problem related to the P55 chipset and/or the 860, and then you are really in trouble.

Another thing to try is going back to the latest Intel Matrix driver (instead of the new Rapid driver); I have seen some people having trouble with the Rapid driver.

The problem could also come from a degrading PSU (power supply), if your PSU was only just sufficient for your config when you built the system.

Any PSU will degrade a little with time because capacitors lose capacity, and if the PSU was on the edge capacity-wise or temperature-wise, that will slowly give disk errors (at least on the X58 or 5520 chipset) when you use the system, even if everything looks ok.

This is really a nasty thing, because the values that Win7 constantly writes to disk (paging file etc.) sometimes contain a small error; the errors add up, and you have to reinstall your OS.

So capacity-wise, as a rule of thumb, the PSU must be able to supply the system at 80% load.

Using Corsair HX series or other high quality PSU (there is a reason that they are expensive) will normally give you 10 to 20% (or even more) headroom on top of that.


#5

And the JMB is disabled in BIOS ?  -  No, it never was before tho. I’ll disable it.

Are all other USB devices disconnected other than mouse and keyboard ?  -   Yes.

Be aware that standard or safe BIOS setup values on especially ASUS  does not always (never on the X series :) ) give the right voltages for chipset.  -  Yeah, I encountered it on my former ASUS X38 board, which tended to overdo the cpu and chipset voltage a lot. Now I also use the ‘auto’ voltage settings for all pieces of HW (except DRAM - forced 1.65V), but it seems to work fine and the values are not exaggerated. The PCH (SB) voltage is 1.06 V, which seems to be ok.

ASUS standard supplied cooling for the chipset is nearly always not sufficient,  -  My mobo’s (maximus 3 formula) heatsinks are large enough I think and the case (CM HAF 922) is especially designed for good airflow.

 

here it sometimes (strangely) helps to lower RAM speed (yes i know that the memory controller is on the CPU)  -  RAM speed is already lowered. It’s running at 1600MHz, 7-8-7-20 1T, 1.65V; stock is 1866MHz with the same timings and voltage (was too lazy to do the OC :) )

 

 and /or peel of the ASUS sticker on SB cooler (south bridge) and mount a blower to cool the SB (it some times get too hot cause its close to the GPU  -  there’s no sticker on it :), PCH (SB) temp is 40°C at idle and under load it doesn’t exceed 50°C, I think (not sure tho, will test it later)

 

Difficult for me to understand why Intel (un)wisely removed thermal monitoring from their latest ICHs (south bridges) cause it is cumbersome to get a temperature probe mounted so you get a temperatur reading that you can trust and thus diagnose any temperature related problem with the south bridge cause you dont know the ok values before the failiure/problem.

It could also be a generel problem related to the P55 chipset and/or the 860 and then you are really in trouble. - Well I can see the chipset temps in both BIOS and Windows cos ASUS has quite a handy utility with all the volt and temp values. I know the i7 860 is a hot one but it runs at stock clocks and a big fat Noctua NH U12P is sitting on it so there is no problem with overheating.

 

Another thing to try is to go back to the latest intel matrix driver (instead of the new RAPID driver) have seen some peob having trouble with the RAPID.  -  the problem started when I had the Matrix driver 8.9.0.1023 (latest); I then updated to the Rapid driver when reinstalling the OS.

PSU  -  I have an Enermax Modu82+ 625W which is 9 months old (the whole pc was purchased in November 2009). Enermax is one of those high quality (=expensive) brands, and the wattage is suitable for my config, I suppose.

 


#6

Tomas7 wrote:

And the JMB is disabled in BIOS ?  -  No, it never was before tho. I’ll disable it.

Are all other USB devices disconnected other than mouse and keyboard ?  -   Yes.

 

Be aware that standard or safe BIOS setup values on especially ASUS  does not always (never on the X series :) ) give the right voltages for chipset.  -  Yeah, I faced it on my former ASUS X48 board that was likely to overdo the chipset voltage a lot. Now I also use the ‘auto’ voltage settings for all pieces of HW (except DRAM - forced 1.65V), but it seems to work fine and the values are not exaggerated. The PCH (SB) voltage is 1,06 V which seems to be ok.

"ok, looks like you know what you are doing here, but be sure that the needed chipset voltages have been raised so you don’t brick your CPU (thinking here of the voltages needed when you raise the memory voltage)"

 

ASUS standard supplied cooling for the chipset is nearly always not sufficient,  -  My mobo’s (maximus 3 formula) heatsinks are large enough I think and the case (CM HAF 922) is especially designed for good airflow.

"don’t agree completely (i always water cool the chipset on ASUS boards to be sure of long-term stability, but maybe i am a little paranoid here)"

 

here it sometimes (strangely) helps to lower RAM speed (yes i know that the memory controller is on the CPU)  -  RAM speed is already lowered. Running at 1600MHz, 7-8-7-20 1T, 1.65V, the stock is 1866MHz with same timings and voltage (was too lazy to do the OC :) )

"would still recommend trying one of the standard lower speeds until you have found the failure; you can always raise it later"

 

 and /or peel of the ASUS sticker on SB cooler (south bridge) and mount a blower to cool the SB (it some times get too hot cause its close to the GPU  -  there’s no sticker on it :), PCH (SB) temp is 40°C in idle and under load it doesn’t exceed 50°C me thinks (not sure tho, will test it later)

 "oki"

Difficult for me to understand why Intel (un)wisely removed thermal monitoring from their latest ICHs (south bridges) cause it is cumbersome to get a temperature probe mounted so you get a temperature reading that you can trust and thus diagnose any temperature related problem with the south bridge cause you don’t know the ok values before the failure/problem.

It could also be a generel problem related to the P55 chipset and/or the 860 and then you are really in trouble. - Well I can see the chipset temps in both BIOS and Windows cos ASUS has quite a handy utility with all the volt and temp values. I know the i7 860 is a hot one but it runs at stock clocks and a big fat Noctua NH U12P is sitting on it so there is no problem with overheating.

 

Another thing to try is to go back to the latest intel matrix driver (instead of the new RAPID driver) have seen some peob having trouble with the RAPID.  -  the problem started when I had the matrix driver 8.9.0.1023 (latest), then updated to the rapid when reinstaling the OS.

 

PSU  -  I have Enermax Modu82+ 625W which is 9 months old (whole pc was purchased in november 2009). Enermax is one of these high quality (=expensive) brands and the wattage is suitable for my config i suppose. (How many watts does that ATI GPU need?)

I had a problem with one of my Enermax PSUs because Enermax uses “gold” plated connectors while the MB connector was “silver”, and over time I got some kind of electromigration = bad connection. (The rule of thumb is “gold” on “gold” and “silver” on “silver”.)

The “gold” contact was also quite soft, so there was not sufficient contact pressure. I replaced the power connectors/wires with some from an old PSU and the problem was gone, and I retired the other Enermax PSU. (Don’t open a PSU if you are not a qualified engineer; the voltages in there can kill you.)

So maybe you could borrow a PSU and test?

Also consider that when Nvidia and ATI optimize their drivers, the GPU will, in my experience, need more power.

 

 

 


#7

Disabling the JMicron controller didn’t help.

Voltages:

I’ve been running with these voltage and frequency settings since the PC was set up, and it has always been stable that way (no bluescreens etc). It spent hours in stability tests, too.

But I’ll set all to ‘auto’ (it will lower dram to 1333MHz, 1.5V and some lazy latencies) and see if anything happens…

EDIT: didn’t help…

PSU:

ATI Radeon HD5870 has 188W TDP

Core i7 860 has 95W TDP

…so if I’m calculating this right, the total system consumption shouldn’t exceed roughly 400W (or 450W max)
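For what it’s worth, the arithmetic can be checked against the 80% rule of thumb mentioned earlier in the thread. This is just a sketch: it reads that rule as "estimated draw should stay under 80% of the PSU rating", which is an interpretation, the function names are made up for the example, and TDP figures are not the same thing as measured power draw. The 450W figure is the rough upper estimate from this post, not a measurement.

```python
def psu_budget_watts(psu_rating_watts, max_load_pct=80):
    """Watts available if the PSU is only loaded to max_load_pct of its rating."""
    return psu_rating_watts * max_load_pct / 100


def headroom_watts(psu_rating_watts, estimated_draw_watts, max_load_pct=80):
    """Positive result: the estimated draw fits inside the budget."""
    return psu_budget_watts(psu_rating_watts, max_load_pct) - estimated_draw_watts


gpu_tdp = 188          # Radeon HD5870 TDP, from the post
cpu_tdp = 95           # Core i7 860 TDP, from the post
tdp_sum = gpu_tdp + cpu_tdp
estimated_draw = 450   # the poster's rough upper estimate for the whole system
psu_rating = 625       # Enermax Modu82+ 625W

print(tdp_sum)                                     # 283
print(psu_budget_watts(psu_rating))                # 500.0
print(headroom_watts(psu_rating, estimated_draw))  # 50.0
```

So even with the pessimistic 450W estimate there is still some margin under this reading of the rule, as long as the PSU actually delivers its rated output.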

I have an identical PSU in another PC with a very similar config (same CPU and GPU, MSI GD65 P55 mobo, single WD Velo drive) and it runs fine.

I looked into the PSU specifications but didn’t find what the connectors are made of. Well, these more complicated electricity ‘thingies’ are beyond my knowledge tbh. I’m just an amateur.

But I’ve used Enermax PSUs with ASUS boards in several PCs before and never had any problems with them…

I’ll try to borrow some HW but won’t be able to get it soon enough, I’m afraid…

EDIT: btw a guy on the ASUS forum suggested that “some sectors may be slow [150ms+] to respond and cause a raid degrade even though a test didnt show bad sectors”  …  but I don’t know how to test that. Any ideas?
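One rough way to look for slow regions is to read a large file (or the whole drive) sequentially in fixed-size chunks and time every read. The sketch below does that; the function name, the 1 MiB chunk size, and the `clock` parameter are choices made for this example, and only the 150ms threshold comes from the quoted suggestion. It is not a replacement for a vendor diagnostic, and on a RAID0 volume it can’t tell which member drive was slow, so it makes more sense on a drive connected on its own.

```python
import time


def find_slow_chunks(f, chunk_size=1024 * 1024, slow_ms=150.0,
                     clock=time.perf_counter):
    """Read f sequentially; return (offset, elapsed_ms) for every slow read."""
    slow = []
    offset = 0
    while True:
        t0 = clock()
        data = f.read(chunk_size)
        elapsed_ms = (clock() - t0) * 1000.0
        if not data:
            break                       # end of file reached
        if elapsed_ms >= slow_ms:
            slow.append((offset, elapsed_ms))
        offset += len(data)
    return slow


if __name__ == "__main__":
    import sys
    # Usage: python slowscan.py <path to a large file on the drive under test>
    # buffering=0 avoids Python's own buffering; the OS cache can still hide
    # slow reads if the file was read recently.
    with open(sys.argv[1], "rb", buffering=0) as f:
        for off, ms in find_slow_chunks(f):
            print(f"offset {off}: {ms:.0f} ms")
```

If the same offsets show up as slow on repeated runs, that would at least be a hint worth passing on with an RMA request.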


#8

Did you use the DOS version of DLG to test?

If not, mount the drive in your good machine (disconnect all other disks so that only the disk you want to test with DOS DLG is connected), write zeros to the whole drive, and afterwards run the extended test.

Do that for both Velos.

Note the time each test takes to complete. If there is a big difference on one drive (the slow sector thing), RMA it and enclose these posts for the WD RMA department’s reference.
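What counts as a "big difference" is a judgment call. As a sketch, the two completion times could be compared like this; the 20% threshold and the function names are assumptions for illustration only, not a WD guideline.

```python
def completion_gap(minutes_a, minutes_b):
    """Relative difference between two test durations, as a fraction of the faster run."""
    faster = min(minutes_a, minutes_b)
    return abs(minutes_a - minutes_b) / faster


def looks_suspicious(minutes_a, minutes_b, threshold=0.2):
    """True if one drive took noticeably longer than the other (threshold is an assumption)."""
    return completion_gap(minutes_a, minutes_b) > threshold
```

Since both drives are the same model and size, they would normally be expected to finish the same test in about the same time, which is why the comparison is meaningful at all.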

If nothing abnormal is found on the drives, I think it’s time to RMA your MB to ASUS (ideally, if you have access to another MB of the same model, swap it in and test with that before RMAing).

The idea here is to replace some of the “big” components (PSU, MB, disk(s), cables, memory etc.) to try to pinpoint the error.


#9

I managed to borrow and test a brand new Seagate HDD and the problem seems to be gone.

(The drive runs in AHCI mode.)

I can’t get the DOS (CD) version of DLG running though, due to the bug where the licence agreement file can’t be located (also mentioned somewhere on this forum). And I don’t have a floppy drive…

So I’m gonna install the Velos in RAID (with the same settings) on the second PC when it’s available, to make sure that the problem is not in my RAID controller/mobo.

Or use the drives separately to (hopefully) reveal the faulty one… 

Otherwise I’ll take a big hammer to deal with the naughty pc…  xD