Disk standby

I’ve been trying to track down why the disk exits standby so frequently. My system is currently not being used. I’ve found that the disk will wake up at random times, sometimes less than 2 minutes after entering standby, but at other times it will stay in standby for several hours. What I have found is that 95% of the disk I/O is being done by Kjournal and flush, and both are doing I/O to update the atime of various files: first Kjournal writes its journal entry for the atime update, then flush writes the inode.

A few times I’ve also seen the disk exit standby when no disk I/O has been done. md1 is mounted with noatime,nodiratime. /dev/root is mounted with relatime.

I’m looking for help. Are there options that will stop the atime from being updated all the time? And why is the atime being updated when noatime is set for md1?
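One way to see which processes are sending I/O to the disk is the kernel's block_dump switch. It is available on kernels that still provide it; it was removed in later kernels. As root, echo 1 > /proc/sys/vm/block_dump makes the kernel log every block write and every dirtied inode, and dmesg then shows these messages. The sketch below just counts those messages per process name. It assumes the message shapes shown in its comments, and the function name is made up for illustration. Keep syslog off the disk while block_dump is on, or its own log writes will show up in the count.

```shell
#!/bin/sh
# Hypothetical helper: count block_dump messages per process, read from stdin.
# Assumed message shapes (older kernels, optionally prefixed by a timestamp):
#   [ 1234.567890] kjournald(1234): WRITE block 5678 on sda1
#   [ 1234.600000] flush-8:0(567): dirtied inode 91011 (messages) on sda1
block_dump_summary() {
  awk '
    / (READ|WRITE) block [0-9]+ on / || / dirtied inode [0-9]+ / {
      name = ""
      # The process appears as the first field ending in "(pid):".
      for (i = 1; i <= NF; i++)
        if ($i ~ /\([0-9]+\):$/) { name = $i; break }
      if (name == "") next
      # Strip the "(pid):" suffix so all threads of one name are counted together.
      sub(/\([0-9]+\):$/, "", name)
      count[name]++
    }
    END { for (n in count) print count[n], n }
  ' | sort -rn
}
```

For example, as root: enable block_dump, wait through a few wake-ups, run dmesg | block_dump_summary, then echo 0 > /proc/sys/vm/block_dump to turn it off again.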

RAC


Rac8006,

I think you are on the right track… so let's clarify for the laymen of this forum.

From en.wikipedia.org

Reading a file changes its atime eventually requiring a disk write, which has been criticized as it is inconsistent with a read only file system. This behaviour can usually be disabled by adding the noatime mount option in /etc/fstab. Turning off atime updating breaks POSIX compliance, and some applications, such as mbox-driven “new mail” notifications, [4] and some file usage watching utilities, notably tmpwatch. Linux kernel developer Ingo Molnár called atime “perhaps the most stupid Unix design idea of all times,” [5] [6] adding: “[T]hink about this a bit: ‘For every file that is read from the disk, let’s do a … write to the disk! And, for every file that is already cached and which we read from the cache … do a write to the disk!’” He further emphasized the performance impact thus:

atime updates are by far the biggest I/O performance deficiency that Linux has today. Getting rid of atime updates would give us more everyday Linux performance than all the pagecache speedups of the past 10 years, _combined_.

File system cache may significantly reduce this activity to one disk write per cache flush.

Solutions

Current versions of Linux, Mac OS X, Solaris, FreeBSD, NetBSD, and OpenBSD support a noatime mount option, which causes the atime field never to be updated. This breaks compliance with POSIX.

Current versions of the Linux kernel support four mount options, which can be specified in fstab:

  • strictatime (formerly atime, and formerly the default; strictatime as of 2.6.30) – always update atime
  • relatime (“relative atime”, introduced in 2.6.20 and the default as of 2.6.30) – only update atime under certain circumstances (explained below)
  • nodiratime – never update atime of directories, but do update atime of other files
  • noatime – never update atime of any file or directory; implies nodiratime; highest performance, but least compatible

strictatime accords with POSIX. File systems mounted with the noatime option do not update the atime on reads, while the relatime option provides for updates only if the previous atime is older than the mtime or ctime, or the previous atime is over 24 hours in the past. Many users use noatime without problem, so long as they do not use an application which depends on atime, and this offers some benefits over relatime (no writing of atime ever on read).
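The relatime rule described above can be written out as a small decision function. It also shows why relatime alone still produces some atime writes. Any file that is read after it has been written (a log file, for instance) gets one atime update per modification, and every file that is read gets one at least once a day. This is only a sketch of the rule as described. Times are given as Unix seconds, and the function name is hypothetical. As far as I know, the kernel's own check also treats an atime equal to the mtime or ctime as stale, so -le is used here.

```shell
#!/bin/sh
# Sketch of the relatime rule: should a read update the atime?
# Arguments are Unix timestamps in seconds: atime mtime ctime now.
# Exit status 0 means "update atime", 1 means "leave it alone".
relatime_would_update() {
  r_atime=$1 r_mtime=$2 r_ctime=$3 r_now=$4
  # The file was modified since it was last read.
  [ "$r_atime" -le "$r_mtime" ] && return 0
  # The inode was changed (e.g. chmod, rename) since it was last read.
  [ "$r_atime" -le "$r_ctime" ] && return 0
  # The previous atime is at least 24 hours old.
  [ $((r_now - r_atime)) -ge 86400 ] && return 0
  return 1
}
```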

Now, how this pertains to the WD Cloud: the Cloud is built on Linux, so even when nothing is happening, system logs are always being written. WD has moved most of these logs to a ramdisk to keep most of that activity in memory, but not all of it.

Even monitorio.sh is guilty. Before the drive is put into standby with hdparm -y, it uses echo > file to write a date-time file and touch on a date-time file to keep the last date-time stamp.

After a certain build-up of unwritten data, the system will wake suddenly to flush out the data to disk. This will occur several times from 2 seconds to several minutes after the disk has gone into standby; then the system will sleep for several hours at a time (provided that you have switched off the scans as well as the cron).  

I’ve tried using “sync” in monitorio to no avail. At first I thought I saw a decrease in the number of initial wake-ups, and sometimes that was true. But that theory went to **bleep** last year when I got consistent wake-ups every 25 minutes. This indicated that some wake-ups are coming from outside, and it was also the time when the WD server was having problems.

I gave up…

However, do continue your research… perhaps it may be as simple as remounting a logging ramdisk with noatime.
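Before remounting anything, it may be worth confirming which atime mode each filesystem is really using, since that is what the kernel reports in /proc/mounts (fields: device, mount point, type, options, ...). The sketch below prints the atime-related option for each mount. When neither noatime nor relatime is listed, atime is updated on every read, and the sketch labels that strictatime. The function name is hypothetical. A filesystem can then be switched without a reboot, as root, with mount -o remount,noatime followed by the mount point. This does not persist across a reboot.

```shell
#!/bin/sh
# Print "mountpoint atime-mode" for each entry in a mounts file
# (default /proc/mounts). Fields: device mountpoint fstype options ...
atime_modes() {
  awk '{
    mode = "strictatime"    # neither noatime nor relatime listed: update on every read
    n = split($4, opts, ",")
    for (i = 1; i <= n; i++)
      if (opts[i] == "noatime" || opts[i] == "relatime" || opts[i] == "strictatime")
        mode = opts[i]
    print $2, mode
  }' "${1:-/proc/mounts}"
}
```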


Ralphael

I’m confused. I was testing a possible fix for the standby problem, so I checked my system this morning. It had entered standby around 3 AM and exited standby at 9 AM, when I checked to see how the fix was working. I checked again around 5 PM, and the system had been in standby from 9 AM until then.

The patch seems to be working. 

Now the confusion. My other system upgraded to the new firmware at 3 AM and entered standby at 4 AM, when it was done installing. When I checked it at 5 PM, it was still in standby. Yet the new firmware's release notes do not mention any fix for the disk standby problem.

I guess I’ll have to spend a couple of days testing to see if the new firmware corrects the problem.

RAC

How are you checking your standbys? With the user logs, I hope?

Here is my script for copying the logs to my share called Cloudy. Replace Cloudy with your share name.

rm -f /shares/Cloudy/user.log
cp /var/log/user.log /shares/Cloudy
chmod 777 /shares/Cloudy/user.log
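A slightly more careful version of the same copy, as a sketch. The share becomes a parameter instead of being edited into the script. The copy is made world-readable (mode 644) rather than 777, which is enough to open it from another machine and does not set the execute bit. The default /shares/Cloudy path comes from the script above; the function name is hypothetical.

```shell
#!/bin/sh
# Copy the standby log to a share so it can be read from another machine.
# $1: share directory (default /shares/Cloudy)
# $2: log file to copy (default /var/log/user.log)
copy_user_log() {
  cl_share=${1:-/shares/Cloudy}
  cl_log=${2:-/var/log/user.log}
  cl_dest="$cl_share/$(basename "$cl_log")"
  rm -f "$cl_dest"                      # drop the previous copy, as the original does
  cp "$cl_log" "$cl_dest" || return 1
  chmod 644 "$cl_dest"                  # owner read/write, everyone else read; no execute bit
}
```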

Your SMB fix addresses one of many problems associated with sleep, but it is not the main sleep problem: turning off SMB did not greatly change my Cloud's sleep patterns.

One of your problems was SMB contention among 3 devices, so fixing one of the devices may end up fixing your whole network.

I hate to accuse WD of hiding this ongoing problem for 4 years. But it has gone on to the point that most average users would argue with me that nobody has this problem except a select few who notice it. For the most part, nobody notices it much anymore, since WD is hiding it better.

  1. Setting the sleep time to a fixed 10 minutes gives the illusion that the device actually sleeps, when in fact it wakes up several times an hour. Remember what I said in another post: when we had the option of 1 hour before it sleeps, it never slept. A write-to-disk flush somehow occurs exactly 2 seconds after you put the drive into standby, so it waits another hour and the process repeats. People complained 4 years ago, and the new firmware took away the 30- and 60-minute sleep options and gave us only on/off.

  2. They also removed the disk access light, so accessing the drive no longer blinks an LED. This used to drive me crazy with the WD Live, when I could see the drive blinking like crazy throughout the night for months on end.

  3. In release 3.04, the scans are paused if CPU and disk activity go beyond a certain CPU percentage. This delays the initial thrashing until the disk is full of data and the drive's return period has expired, and only then do the scans resume thrashing.

  4. They also reduced the CPU usage of the scans with nice.

None of these fixes solve the sleep problem. The scans are still running, cron still wakes up the device at 3 AM to check for updates even if you tell it not to, logs are rotated, and so on…

These fixes limit the CPU allocated to the jobs, letting the device be a nicer NAS instead of a brick dedicated to scanning every file you loaded onto its hard drive. None of the fixes over the last 4 years has allowed the device to actually sleep.

I know it well, and my logs show how often the device wakes up.

After stopping all the jobs, including cron, the best I have seen is 4-6 hours of standby, with a scattering of 4-second, 4-minute and 20-minute wake-ups. On the worst nights I get 20-30 wake-ups through the night. On the best nights I get 10 wake-ups, still including a few 4-second ones.

Here is my script that I use to turn off the jobs on my Cloud.

#!/bin/bash
/etc/init.d/nfs-kernel-server stop
/etc/init.d/nfs-common stop
/etc/init.d/upnp_nas stop
/etc/init.d/mDNSResponder stop
/etc/init.d/wdphotodbmergerd stop
/etc/init.d/wdnotifierd stop
/etc/init.d/wdmcserverd stop
/etc/init.d/wddispatcherd stop

/etc/init.d/cron stop
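The same list can also be kept as data and stopped in a loop. That makes it easy to comment a service in or out, and it reports any init script that a given firmware does not have. A minimal sketch, assuming all the scripts live in one directory passed as the first argument (normally /etc/init.d); the function name is hypothetical.

```shell
#!/bin/sh
# Stop each named service via its init script in the given directory.
# $1: init script directory (normally /etc/init.d); remaining args: service names.
# A missing script is reported and makes the function return 1;
# the exit status of each "stop" command itself is not checked.
stop_services() {
  ss_dir=$1
  shift
  ss_status=0
  for ss_name in "$@"; do
    if [ -x "$ss_dir/$ss_name" ]; then
      "$ss_dir/$ss_name" stop
    else
      echo "skipping $ss_name: no init script in $ss_dir" >&2
      ss_status=1
    fi
  done
  return "$ss_status"
}
```

For example: stop_services /etc/init.d nfs-kernel-server nfs-common upnp_nas mDNSResponder wdphotodbmergerd wdnotifierd wdmcserverd wddispatcherd cron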

I hope my input provides you with some insights. Thanks for continuing your monitoring.

Ralphael

I was wrong: the new firmware still has the sleep problem. As for my test system, it slept 6 hours, then 8 hours. I think it would have slept 14 hours if I had not logged in to check the log.

One thing I found about the cron job is that it does a random sleep before rebooting for new firmware, and the system can be rebooted even if no firmware was downloaded.

If people don’t look at the user.log file, they may not know that their system is not sleeping very much. A quick check would be to grep standby /var/log/user.log.
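The quick check can be taken a step further by counting the wake-ups and how many of them came after only a short standby. The sketch below assumes the user.log line format shown in its comment, where the number after "after" is the seconds spent in standby. The function name and the 60-second default threshold are arbitrary choices.

```shell
#!/bin/sh
# Count "exit standby" events in a user.log, and how many followed a short standby.
# Expected line shape:
#   May 12 00:02:08 WDMyCloud logger: exit standby after 111 (since ...)
# $1: log file (default /var/log/user.log), $2: "short" threshold in seconds (default 60)
wakeup_counts() {
  awk -v limit="${2:-60}" '
    $0 ~ /exit standby after [0-9]+/ {
      # The standby duration is the field right after the word "after".
      for (i = 1; i < NF; i++)
        if ($i == "after") { secs = $(i + 1) + 0; break }
      total++
      if (secs < limit) short++
    }
    END { printf "wake-ups: %d, after less than %d s: %d\n", total, limit, short }
  ' "${1:-/var/log/user.log}"
}
```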

I think atime may be the major cause of not sleeping; I’m just not sure what the ramifications of turning it off are. It would stop the writes, but there are side effects. Relatime is supposed to correct the problem, but it doesn’t seem to.

The scans should only be done if files are added.  Not at specific times.

As for the SMB problem, nobody knew what the cause was, just that excessive disk usage was happening. When I reported my findings to the Samba group, they said that several OEMs have been complaining about this problem for several years.

This is my stop script

/etc/rc2.d/S85wdmcserverd stop
/etc/rc2.d/S86wdphotodbmergerd stop
/etc/rc2.d/S92wdnotifierd stop
/etc/rc2.d/S20winbind stop
/etc/rc2.d/S20minidlna stop
#/etc/rc2.d/S85twonky stop
/etc/rc2.d/S50netatalk stop
/etc/rc2.d/S60mDNSResponder stop
/etc/rc2.d/S95wdAutoMount stop
/etc/rc2.d/S20nfs-common stop
/etc/rc2.d/S20nfs-kernel-server stop
/etc/rc2.d/S61upnp_nas stop

You asked how I check the standby. Here is my script; I call it sleep.awk, and I run it as follows:

cat /var/log/user.log|./sleep.awk

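#!/bin/sh
# sleep.awk: list each standby period from user.log (read on stdin), then the total
awk '$8 ~ /after/ {
    hh1 = int($9 / 3600)
    mm1 = int(($9 % 3600) / 60)
    ss1 = $9 % 60
    Total = Total + $9
    # date, standby start, wake time, seconds asleep, h:mm:ss
    printf "%3s %2s %8s %8s %5d %2d:%02d:%02d\n", $1, $2, substr($12, 1, 8), $3, $9, hh1, mm1, ss1
}
END {
    # convert Total here, not $9 (which only holds the last line read)
    hh1 = int(Total / 3600)
    mm1 = int((Total % 3600) / 60)
    ss1 = Total % 60
    printf "Total Sleep Time: %2d:%02d:%02d\n", hh1, mm1, ss1
}'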

RAC

PS What is the purpose of this line?  Why are you making it executable?

chmod 777 /shares/Cloudy/user.log


I have to admit, it was laziness :stuck_out_tongue: When I first copied the user.log file over to the share, I did not have permission to read it from my Mac, so I just did a chmod to turn all the bits on, which includes making it executable :stuck_out_tongue:

I like your stop script and will of course be copying it… and I can see that you are indeed another Linux guru.

I had modified monitorio once (down to a dozen lines) before I upgraded to 3.04, and I never bothered to copy it back in. I had wanted to add log formatting that shows the number of hours, minutes and seconds the device was asleep, rather than the bare number of seconds. I still think the echo statement and touch file just before hdparm -y cause one of the wake-up flushes, because time after time the disk wakes up a few seconds after hdparm is issued. Then exactly 5 minutes later (I have mine set to 5 minutes instead of 10), it tries to sleep again and is usually successful.

Maybe I’ll try again, as you seem to have a better sleep pattern than I do. My new and improved monitorio has a sync and sleep 5 in it, plus some fancy loops that monitor for a few seconds to check that everything has actually flushed before hdparm’ing it. With it, perhaps I might have a drive that sleeps 24/7 :stuck_out_tongue: And if SMB is a problem, I can always turn it off as a sleep event and turn it back on when the drive wakes up with an AFP mount.

In regards to executing scripts, I copied all my scripts into the /usr/bin directory so I just have to SSH into the device and type a script command like

standbyon (just copies an enabled standby.conf to /etc/standby.conf)

standbyoff

cplogs (the script above to copy the log to my shares)

So with all this said…

Remember to vote on the ideas page for fixing the sleep. I have no idea if it would help, but at least there would be 3 people wanting their device fixed for sleeping.

Good job rac8006… keep it up…

Would it be possible to spin the disk down but leave the NAS itself awake?  Wouldn’t that stop it from spinning up from SMB/CIFS packets?

hello Jac70,

Actually that is exactly what does happen. Only the disk is set to standby and the NAS is still awake with logs and temp files being written to ramdisk;  unlike a laptop, there is no actual sleep hardware in this device.

However, something does turn on the disk at random times, from SMB/CIFS packets to log writes and more. So far, none of the jobs and processes that we have identified and turned off really hinder the operation of the Cloud. For example, I still have remote Cloud access when I’m at the gym, and I’m able to watch movies despite having turned off the scans and thumbnails. The only drawback I can see is that my movies and photos don’t have thumbnails.

I also modified monitorio.sh to do the sync and sleep 5. I remember that a long time ago people used to say you need to do sync;sync; because the first one doesn't always write everything. So I do sync;sync; sleep 5. I am still seeing times where the system wakes up and no data has been read or written.

RAC

Okay Rac8006, have you been reading my old posts, or are we eerily doing the same things because of like minds?

Yeah, I did a couple of syncs too at one point. When I still got those 2-second wake-ups, I tried a looped sync call: check for any disk activity, sync again, and wait up to 60 seconds for it to be all clear before finally issuing the hdparm command. I think it worked for the 5-second ones, but I still got the 5-, 10- and 20-minute ones, so I said **** and called it quits.
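The looped sync described above can be sketched as follows. It assumes the disk's counters are read from /sys/block/sda/stat, where field 1 is reads completed and field 5 is writes completed. The loop flushes, then waits until those counters have not changed for a given number of seconds, syncing again whenever new I/O shows up, and gives up after a maximum wait. The function names and the 5- and 60-second defaults are hypothetical. As the post says, an approach like this may help with the few-second wake-ups without touching the later ones.

```shell
#!/bin/sh
# Print "reads writes" (fields 1 and 5) from a block-device stat file,
# e.g. /sys/block/sda/stat.
io_snapshot() {
  awk '{ print $1, $5 }' "$1"
}

# Sync, then wait until the counters stay unchanged for $2 seconds (default 5),
# giving up after $3 seconds (default 60). Returns 0 when quiet, 1 on timeout.
wait_for_quiet() {
  wq_stat=$1
  wq_needed=${2:-5}
  wq_max=${3:-60}
  wq_quiet=0
  wq_waited=0
  sync
  wq_last=$(io_snapshot "$wq_stat")
  while [ "$wq_waited" -lt "$wq_max" ]; do
    sleep 1
    wq_waited=$((wq_waited + 1))
    wq_now=$(io_snapshot "$wq_stat")
    if [ "$wq_now" = "$wq_last" ]; then
      wq_quiet=$((wq_quiet + 1))
      [ "$wq_quiet" -ge "$wq_needed" ] && return 0
    else
      # New I/O appeared: flush again and restart the quiet countdown.
      wq_quiet=0
      wq_last=$wq_now
      sync
    fi
  done
  return 1
}
```

It would be used in place of a bare sync before the standby command, for example wait_for_quiet /sys/block/sda/stat 5 60 && hdparm -y /dev/sda. The stat file comes from sysfs, not from the disk, so polling it should not itself cause disk I/O.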

I attribute the wake-ups “when no data has been read or written” to Ethernet I/O and hardware interrupts. Remember that accessing the Cloud app wakes up the device. That is an Ethernet I/O event that the drive recognizes, and I am guessing it has to be a hardware interrupt rather than something monitoring a port.

I am fairly sure it was the WD servers that caused the series of wake-ups at 25-minute intervals throughout the night. Once I deleted the cloud access in annoyance, poof, no more 25-minute wake-ups.

Alternatively, there are still logs generated by various parts of Linux that haven’t been accounted for; perhaps even redirecting them to /dev/null could still trigger an I/O interrupt. No idea.

However, the fact that you can get 8 to 14 hours of straight sleep is good enough. But if you have been reporting total sleep hours and the max is really 4-6 hours, then I think WD still has a bit of work to do.

Ralphael

The sleep times I reported are 6 hours with no wake-up. I also had 8 hours with no wake-up. I’m going to let my test system go for about 17 hours before I check it.

The disk has 8 partitions:

  • Partitions 1 and 2 are the root file system.
  • Partition 3 is swap.
  • Partition 4 is the data partition.
  • Partitions 5 and 6 are identical.
  • Partitions 7 and 8 contain the same data but are different sizes.

The monitorio.sh script checks for I/O activity on partitions 1 and 2 only. If for some reason a read or write to the swap partition is done, the disk would wake up with no activity on the partitions being monitored.
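To catch I/O on partitions that monitorio.sh does not watch, such as swap on partition 3, one option is to compare two snapshots of /proc/diskstats. Its fields are major, minor, device name, reads completed, reads merged, sectors read, time reading, writes completed, and so on. The sketch below prints every device whose read or write count changed between the two snapshots. The function name and the default "sda" name prefix are assumptions.

```shell
#!/bin/sh
# Print devices whose read or write counts differ between two saved copies
# of /proc/diskstats. $1: earlier snapshot, $2: later snapshot,
# $3: device-name prefix to look at (default "sda").
diskstats_changed() {
  awk -v prefix="${3:-sda}" '
    index($3, prefix) != 1 { next }                   # skip other devices (md1, loop, ...)
    NR == FNR { before[$3] = $4 " " $8; next }        # first file: remember reads/writes
    ($3 in before) && before[$3] != $4 " " $8 { print $3 }
  ' "$1" "$2"
}
```

For example, save the output of cat /proc/diskstats to a file before the disk goes into standby and again after the next wake-up, then run diskstats_changed on the two files. Keep both files on the ramdisk so that saving them does not itself write to the disk. cat /proc/swaps shows whether the swap partition is in use at all.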

RAC

Rac8006, 

congrats on your upcoming 14 hour “device” sleep. 

If you suspect atime, perhaps you may be right since you found the SMB flaw which may be fixed in an upcoming firmware.

good luck and do keep posting up your findings.

There are always users like myself who actually read and enjoy playing with the tiny Linux Cloud.


Ah, I see. Thank you.  Here’s the list of services I have disabled so far:

and a sample of my user.log over a 24-hour period.

May 12 00:02:08 WDMyCloud logger: exit standby after 111 (since 2015-05-12 00:00:17.750207001 -0400)

May 12 00:14:05 WDMyCloud logger: exit standby after 108 (since 2015-05-12 00:12:17.740207001 -0400)

May 12 03:00:08 WDMyCloud logger: exit standby after 9353 (since 2015-05-12 00:24:15.070207001 -0400)

May 12 03:05:02 WDMyCloud logger: disable lazy init

May 12 05:05:08 WDMyCloud logger: exit standby after 6588 (since 2015-05-12 03:15:20.170207000 -0400)

May 12 05:50:07 WDMyCloud logger: exit standby after 2090 (since 2015-05-12 05:15:17.840207000 -0400)

May 12 07:17:13 WDMyCloud logger: exit standby after 4616 (since 2015-05-12 06:00:17.320207000 -0400)

May 12 09:17:13 WDMyCloud logger: exit standby after 6591 (since 2015-05-12 07:27:22.800207000 -0400)

May 12 10:27:17 WDMyCloud logger: exit standby after 3595 (since 2015-05-12 09:27:22.820207000 -0400)

May 12 17:30:44 WDMyCloud logger: exit standby after 24798 (since 2015-05-12 10:37:26.850207000 -0400)

May 12 17:53:50 WDMyCloud logger: exit standby after 776 (since 2015-05-12 17:40:54.620207000 -0400)

May 12 18:57:44 WDMyCloud logger: exit standby after 3225 (since 2015-05-12 18:03:59.800207000 -0400)

May 12 19:08:05 WDMyCloud logger: exit standby after 12 (since 2015-05-12 19:07:53.840207000 -0400)

May 12 20:00:13 WDMyCloud logger: exit standby after 2518 (since 2015-05-12 19:18:15.100207000 -0400)

May 12 21:20:08 WDMyCloud logger: exit standby after 1767 (since 2015-05-12 20:50:41.270207000 -0400)

May 12 22:00:14 WDMyCloud logger: exit standby after 1797 (since 2015-05-12 21:30:17.730207000 -0400)

May 12 23:50:14 WDMyCloud logger: exit standby after 1154 (since 2015-05-12 23:31:00.750207000 -0400)

I need that Samba patch. :frowning:

If I understand which services you have running, I’m assuming it's the S** services, which means that Samba is running. The numbers from your log don’t show the Samba problem I was having; my system would wake up every 12 minutes. In fact, your numbers don’t look that bad. It looks like your system slept for 19 hours 11 minutes in total.

rac8006, 

If I read your script correctly, you are totalling the sleep time and not looking at the max sleep time.

While it is true that his device slept for 19 hours 11 minutes in total:

  1. it woke up at 00:02 after sleeping for 1 minute 51 seconds (with the sleep timer at 10 minutes);

  2. it woke up again at 00:14 after sleeping for 1 minute 48 seconds;

  3. at 00:24 it went to sleep for 2 hours 35 minutes before being woken at 03:00, which is the cron job checking for an update.

The max sleep time, the longest continuous stretch the device slept, was from 10:37 AM until 5:30 PM, about 6 hours 53 minutes.

Almost 7 hours is good, but it isn’t 19 hours as you are reading it.
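The difference between total and longest single sleep can be computed from the same user.log lines that sleep.awk reads. This is a minimal sketch, assuming the "exit standby after N (since ...)" line format shown in the log above, where N is seconds in standby. The function name is hypothetical. It reports the total, the longest single standby, and the time the disk woke from that standby.

```shell
#!/bin/sh
# Report total and longest single standby from user.log lines such as
#   May 12 17:30:44 WDMyCloud logger: exit standby after 24798 (since ...)
# Reads the files named as arguments, or stdin if none are given.
standby_summary() {
  awk '
    function hms(s) { return sprintf("%d:%02d:%02d", int(s / 3600), int((s % 3600) / 60), s % 60) }
    $6 == "exit" && $7 == "standby" && $8 == "after" {
      secs = $9 + 0
      total += secs
      # Remember the longest standby and the log time at which it ended.
      if (secs > longest) { longest = secs; woke = $1 " " $2 " " $3 }
    }
    END { printf "total %s, longest %s (woke %s)\n", hms(total), hms(longest), woke }
  ' "$@"
}
```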

As it stands, with all the various jobs killed, this device is working. However, I still urge WD to patch the firmware. Control of the scans should be at the menu level, and the device should sleep for at least 4-6 hours at a time on average without the user having to SSH into it to kill jobs. We will probably still do it, but it won’t be a necessity as it is now.

Go vote on the ideas page and tell WD to fix this sleep problem once and for all.

Ralphael

Any sleep time I have mentioned in the past has been a single sleep time. I do total the sleep time in my script, but I have never mentioned my own total. I only mentioned the total sleep time for jac70 because sleeping 19 hours out of 24 is not bad.

Below is the last 24-hour log from both of my systems. I let both systems sit from just before 11 PM until I logged in just after 7 PM the next day. My test system has the Samba fix and the atime fix. I'm not sure why the stock system is not showing the Samba problem. On the test system, there were no reads/writes to sda1/sda2/md1 from 6:38:10 AM until 7:04:10 PM, and none from 3:10:12 AM to 6:33:04 AM.

These are the results on my patched system (columns: date of the wake-up, time standby began, time the disk woke, seconds in standby, h:mm:ss):

May 13 22:48:05 03:00:12 15127 4:12:07
May 13 03:10:12 06:33:04 12172 3:22:52
May 13 06:38:10 19:04:10 44760 12:26:00

These are the results from my other system, which is stock:

May 13 22:53:04 03:00:11 14827 4:07:07
May 13 03:10:11 03:10:19 8 0:00:08
May 13 03:15:25 03:17:12 107 0:01:47
May 13 03:22:17 03:24:38 141 0:02:21
May 13 03:29:43 03:51:46 1323 0:22:03
May 13 03:56:51 05:01:21 3870 1:04:30
May 13 05:06:26 06:05:12 3526 0:58:46
May 13 06:25:33 06:25:47 14 0:00:14
May 13 06:35:57 09:06:46 9049 2:30:49
May 13 09:11:51 19:05:28 35617 9:53:37

RAC

Rac8006

Alright, it is good to clear that up, because we don’t want everyone totaling up their sleep time and saying it's good :stuck_out_tongue:

However, the max sleep hours aren't really what we are after either. It is good to get WD to maximize a single continuous sleep, but we really need to minimize the high number of “2-second, 5-second, 50-second wake-ups”. According to all the internet experts, these hurt the drive more than an always-on mode. It is like flicking a light switch on and off until the bulb breaks.

I am almost ok even with an hourly wake up… almost … but I prefer at least several hours between wake-ups.

I like the results for your patched system; that is totally acceptable.

Your stock system looks exactly like mine, which is also stock: erratic wake-ups, fitful tossing and turning with 8-second and 14-second wake-ups, then full REM sleep of 9:53 hours on yours and 4 to 6 hours on mine.

Anyways Rac8006, you have stirred up quite a bit of dust at my place as I revisited much of this stuff.

Hopefully WD will create a sleep fix someday…

Ralphael

I put the atime fix in the stock machine.  Will check the standby intervals tomorrow.

RAC