My cloud nightmare!

I just bought the 4TB My Cloud NAS and thought the deployment would be a breeze. But it became a nightmare, and only after I logged in to the Linux box as root did I understand why.

WD, bad job!

  1. From a user’s point of view, the NAS seems fine to set up out of the box, but when I tried to access it from Windows Vista and 8, I could only create folders in the public share, and any file copy would take forever or just return a 0x8007003b error. So I googled the error and finally saw that my firmware was old, so I updated the firmware to the latest. Now I can copy some files, but at only a 1MB/sec transfer rate. This is unbelievably bad performance. One more thing: this box reports an error when I try to get the latest firmware, even though all internet status is fine (so I have to download the firmware and load it locally).

  2. From a technical point of view, this product is a very bad and CHEAP design. Let’s see what an ssh session can tell me:

2.1) CPU: ARMv7 !

WDMyCloud:~# cat /proc/cpuinfo
Processor       : ARMv7 Processor rev 1 (v7l)

2.2) Physical RAM: 230MB !

WDMyCloud:~# cat /proc/meminfo
MemTotal:         230560 kB

2.3) System load: 3 when idling, but will be 5.5 with one samba session copying files!

The CPU used and the RAM installed already spell trouble for system performance. But a system load of 3 when there is no Samba connection is really stunning.

I think the WD design team should give me some explanations. From software to hardware, this product is a total failure. Anyone with a bit of Linux/Unix experience could probably tell them that when the system load is more than 4, the system is virtually stalled!

Aside from the fact that this post is in the wrong forum (this forum is for the My Cloud and Mobile Apps, not the NAS products), none of this is valid.

If it were valid, no one would get the performance that many users are seeing:

[Attached image: Untitled.png]

Digging into the Load Average question: 

Have you actually looked to see why the Load Average is greater than 1?

I have…

CloudNAS:/# top -b -n 1 | awk '{if (NR <=7) print; else if ($8 == "D") {print; count++} } END {print "Total status D: "count}'
top - 07:57:46 up 4 days, 17:05, 1 user, load average: 3.00, 3.10, 3.25
Tasks: 90 total, 1 running, 89 sleeping, 0 stopped, 0 zombie
%Cpu(s): 4.7 us, 1.1 sy, 0.0 ni, 92.8 id, 1.3 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem: 230560 total, 176200 used, 54360 free, 40292 buffers
KiB Swap: 500732 total, 26928 used, 473804 free, 66092 cached

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    3 root 20 0 0 0 0 D 0.0 0.0 0:00.00 cpu1_hotplug_th
  371 root -2 0 0 0 0 D 0.0 0.0 0:59.41 btn_t
 2237 root 20 0 0 0 0 D 0.0 0.0 14:17.55 pfe_ctrl_timer
Total status D: 3

The btn_t process is a kernel thread polling a hardware register to see if the reset button is being pressed.

It’s not dependent on interrupts and not waiting on disk I/O, so it’s not interfering with performance.

cpu1_hotplug_th is part of the ARM architecture support used to power cores up and down dynamically. You can tell the process doesn’t actually run much (if at all) after booting (low PID and near-zero runtime).
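For anyone who would rather check this from a script, here is a rough Python equivalent of the top/awk one-liner above. It is only a sketch: the function name `d_state_tasks` is made up for illustration, and it assumes the same 12-column `top -b` layout shown in this thread (state in the 8th column, command in the 12th). The sample text is abridged from the listings posted here.

```python
def d_state_tasks(top_output: str) -> list[str]:
    """Return the COMMAND of every task whose state column (S) is 'D'."""
    names = []
    in_table = False
    for line in top_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "PID":
            # Everything after the column header is the process table.
            in_table = True
            continue
        # Assumed layout: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if in_table and len(fields) >= 12 and fields[7] == "D":
            names.append(fields[11])
    return names


# Abridged sample taken from the top listings in this thread.
SAMPLE = """\
top - 07:57:46 up 4 days, 17:05, 1 user, load average: 3.00, 3.10, 3.25
Tasks: 90 total, 1 running, 89 sleeping, 0 stopped, 0 zombie

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    1 root 20 0 1688 532 444 S 0.0 0.2 0:09.21 init
    3 root 20 0 0 0 0 D 0.0 0.0 0:00.00 cpu1_hotplug_th
  371 root -2 0 0 0 0 D 0.0 0.0 0:59.41 btn_t
 2237 root 20 0 0 0 0 D 0.0 0.0 14:17.55 pfe_ctrl_timer
"""

blocked = d_state_tasks(SAMPLE)
print("Tasks in state D:", blocked)
print("Total status D:", len(blocked))
```

On a live box the input would be the captured output of `top -b -n 1` instead of the hard-coded sample.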

Tony,

I actually searched for the culprit when the unit had just been rebooted and was showing a load of 3, but I could not find any particular process that caught my attention. vmstat showed no blocked processes either.

If I run your top/awk, I get the same result you posted. So you are telling me that a system average load of 3 is normal. Well, I am not quite convinced. I am a Unix/Linux person after all, and on a 2-core system like this a load of 3 should already pose a big burden; in fact, I would typically point a finger at any system with an average load per CPU above 1. But anyway, read on:

So after my initial post, I started a 368GB directory copy using Windows Explorer. The maximum transfer rate was still 1MB/sec, and it had not finished by this morning (with other funky errors). So I cancelled the copy from the Windows side. Now the system’s idle load is 4!

top - 10:15:23 up 10:34,  1 user,  load average: 4.11, 4.66, 4.91
Tasks:  88 total,   1 running,  87 sleeping,   0 stopped,   0 zombie
%Cpu(s): 50.2 us,  0.3 sy,  0.0 ni, 49.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:    230560 total,   177964 used,    52596 free,     3708 buffers
KiB Swap:   500732 total,     7096 used,   493636 free,   104248 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 5813 root      20   0 64140  17m  592 S  99.9  7.6 606:57.04 forked-daapd
 8177 root      20   0  2664 1012  688 R   0.7  0.4   3:12.53 top
 2214 root      20   0     0    0    0 D   0.3  0.0   1:11.16 pfe_ctrl_timer
    1 root      20   0  1688  532  444 S   0.0  0.2   0:09.21 init
    2 root      20   0     0    0    0 S   0.0  0.0   0:00.00 kthreadd
    3 root      20   0     0    0    0 D   0.0  0.0   0:00.00 cpu1_hotplug_th
    4 root      20   0     0    0    0 S   0.0  0.0   0:03.53 ksoftirqd/0
    5 root      20   0     0    0    0 S   0.0  0.0   0:01.03 kworker/0:0

Is forked-daapd killing the unit? I did use my iPad to access the My Cloud last night to monitor the ‘My Cloud’ activity, and I thought the iPad app’s access used httpd. How forked-daapd can become a persistent process taking up one core after a long, failed Windows file copy and an iPad cloud access, I can’t figure out.

As I have described in my initial post, this is the “WD My Cloud 4TB” unit I just purchased (called NAS or not - to me it is a NAS design technically), and firmware was updated to WDMyCloud v03.03.01-156 : Core F/W last night just to get me to this point (I was getting ‘0x8007003B unexpected network error’ with the older firmware.)

One thing I forgot to mention:

I am using a wireless connection with all my PCs - the full bandwidth is 54Mbps. Obviously I can’t get your kind of transfer rate. But 1MB/sec? That is still too low.

If I do not use Windows Explorer, but run xcopy /E /V /Y S:\music X:\music from a command prompt, the task manager shows the wireless connection at roughly 1/3 utilization - that would be roughly 54/3/8 = 2.25MB/sec.

So xcopy gives me about a 2.25MB/sec transfer rate, but a Windows Explorer copy gives me <1MB/sec. Is Windows Explorer doing more stuff during a copy?
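The arithmetic behind those numbers can be written out as a small sketch. The function names are made up for illustration, and it treats 54Mbps as the nominal link rate; real-world wireless payload throughput is normally well below the nominal figure, so these are upper bounds rather than measurements.

```python
def payload_rate_mb_per_s(link_mbps: float, utilization: float) -> float:
    """Convert a link rate in megabits/s at a given utilization to megabytes/s."""
    return link_mbps * utilization / 8  # 8 bits per byte


def utilization_for(rate_mb_per_s: float, link_mbps: float) -> float:
    """Fraction of the nominal link rate that a given MB/s transfer occupies."""
    return rate_mb_per_s * 8 / link_mbps


LINK_MBPS = 54  # nominal wireless rate quoted in the post

xcopy_rate = payload_rate_mb_per_s(LINK_MBPS, 1 / 3)
explorer_share = utilization_for(1.0, LINK_MBPS)

print(f"xcopy at 1/3 utilization: {xcopy_rate:.2f} MB/sec")
print(f"1 MB/sec Explorer copy uses {explorer_share:.1%} of the nominal link")
```

So a 1MB/sec copy is using only around a seventh of the nominal link rate, which is why repeating the test on a wired connection is the quickest way to tell a network problem from a NAS problem.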

Any other thoughts on what the problem could be? Obviously, one smbd session is taking 1 core, and the system average load is ~4.3 during the xcopy session.

allenz wrote:

So you are telling me that a system average load of 3 is normal. Well, I am not quite convinced. I am a Unix/Linux person after all, and on a 2-core system like this a load of 3 should already pose a big burden; in fact, I would typically point a finger at any system with an average load per CPU above 1. But anyway, read on:

Yes, given the hardware architecture of this box, a load of 3 is normal, since the three uninterruptible processes aren’t actually doing anything with the CPU.

The load is three, yet the CPU load is 0% (99.8% idle). That still means 99.8% of the CPU is available to other processes, regardless of the CPU queue. If those hardware triggers were interrupt based (instead of polled by software) the load would be 0.x. But as long as the CPU is just “waiting” for you to press the reset button, then it’s not going to be a real load.
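To put a number on that: the Linux load average counts tasks in uninterruptible sleep (state D) as well as runnable ones, so a rough way to estimate real CPU demand is to subtract the permanently-D kernel threads and divide by the core count. The sketch below does just that. The function name is hypothetical, the subtraction is only an approximation (the load average is a damped average, not an instantaneous count), and the second example assumes the same three D-state threads were present during the forked-daapd run.

```python
def cpu_demand_per_core(load_avg: float, d_state_tasks: int, cores: int) -> float:
    """Rough estimate of runnable load per core, ignoring D-state sleepers."""
    runnable = max(0.0, load_avg - d_state_tasks)
    return runnable / cores


CORES = 2          # dual-core ARM, per this thread
D_STATE_TASKS = 3  # cpu1_hotplug_th, btn_t, pfe_ctrl_timer

idle = cpu_demand_per_core(3.00, D_STATE_TASKS, CORES)
busy = cpu_demand_per_core(4.11, D_STATE_TASKS, CORES)  # assumption: same 3 D tasks

print(f"idle box: {idle:.2f} runnable tasks per core")
print(f"with forked-daapd busy: {busy:.2f} runnable tasks per core")
```

Read this way, the idle figure of 3 corresponds to essentially no CPU demand, and the figure of about 4.1 corresponds to roughly one busy process on a two-core box.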

Being a Linux/Unix person, you should certainly be aware that you can’t just look at the load average as the sole source of information.

allenz wrote:

 

Is forked-daapd killing the unit? I did use my iPad to access the My Cloud last night to monitor the ‘My Cloud’ activity, and I thought the iPad app’s access used httpd.

forked-daapd is the iTunes server.   It has nothing to do with the iPad My Cloud App.  Any time you write a file to an iTunes Server-hosted share, forked-daapd is going to start indexing all that content (under the presumption that it’s media), extracting metadata, etc. etc.   If you don’t want it to slow your writes down, then disable iTunes server before you do bulk writes and then turn it back on when you’re done.

And, no, it’s not “killing” the unit… Sure, it’s running close to 100% CPU. But any process that’s not waiting for I/O or being multiplexed with other busy processes will ALWAYS use as much CPU as it can. In your example, there’s no other process needing CPU, so of course it will be close to 100%.

allenz wrote:

As I have described in my initial post, this is the “WD My Cloud 4TB” unit I just purchased (called NAS or not - to me it is a NAS design technically), 

Understood, but you had posted this thread to the wrong forum.  You posted it to the My Cloud App forum (not the NAS forum.)  A moderator has since moved it to the correct place.

As to the weird errors you’re getting – I have no idea about that.

I just deleted and re-uploaded my entire iTunes library (14,000 tracks, 44GB of data) and forked-daapd kicked off as soon as the first file was transferred – the box barely slowed down, and never missed a beat.

Last week I re-uploaded all of my Videos using FreeFileSync (about 988 gig) and it had no issues, either.

allenz wrote:

One thing I forgot to mention:

 

I am using a wireless connection with all my PCs - the full bandwidth is 54Mbps. Obviously I can’t get your kind of transfer rate. But 1MB/sec? That is still too low.

Well, you never know.  Unless you can duplicate slow speeds on a wired connection, you can’t rule out wireless as the issue.

Yes, Windows Explorer does more during a copy than XCOPY does.

I’m a Cisco-Certified network engineer with 20+ years of experience. I’ve learned to despise wireless for anything more than a convenient connection. You should never use WiFi when you’re expecting performance.

The system’s physical RAM still bugs me:

top - 11:51:04 up  1:16,  1 user,  load average: 4.39, 4.47, 4.37
Tasks:  90 total,   3 running,  87 sleeping,   0 stopped,   0 zombie
%Cpu(s):  6.9 us, 43.6 sy,  0.0 ni, 23.1 id, 25.3 wa,  0.0 hi,  1.2 si,  0.0 st
KiB Mem:    230560 total,   184264 used,    46296 free,     1412 buffers
KiB Swap:   500732 total,    11388 used,   489344 free,   133416 cached

This is during the long xcopy session. I see the system starting to use swap (meaning slow) even with just one Samba connection. In vmstat, I start to see some swap in/out activity and, from time to time, blocked processes:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  0  11760  46448   1380 132724    4   20    44  4136 2448 2259 10 43 41  6
 1  0  11760  46276   1380 132860    0    0    92     0 1696 1497 11 40 29 20
 1  0  11784  47240   1364 131720    0   24     0 20928 2724 2522  6 49 15 31
 1  2  11752  47516   1372 131524   12    4   552     4 2143 2144  9 49 28 15
 1  1  11756  46016   1360 132824    0   12   336    12 2967 2827 11 51 26 12
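The vmstat samples above are easier to judge when totalled. The following sketch parses exactly the five rows posted here and sums the swap-in (si) and swap-out (so) columns, counts the samples with blocked processes (b > 0), and averages I/O wait (wa). The helper name `parse_vmstat` is made up for illustration, and the values are left in whatever units vmstat reported.

```python
VMSTAT = """\
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  0  11760  46448   1380 132724    4   20    44  4136 2448 2259 10 43 41  6
 1  0  11760  46276   1380 132860    0    0    92     0 1696 1497 11 40 29 20
 1  0  11784  47240   1364 131720    0   24     0 20928 2724 2522  6 49 15 31
 1  2  11752  47516   1372 131524   12    4   552     4 2143 2144  9 49 28 15
 1  1  11756  46016   1360 132824    0   12   336    12 2967 2827 11 51 26 12
"""


def parse_vmstat(text: str) -> list[dict[str, int]]:
    """Turn a vmstat column header plus data rows into a list of dicts."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, (int(v) for v in row))) for row in rows]


samples = parse_vmstat(VMSTAT)
swap_in = sum(s["si"] for s in samples)
swap_out = sum(s["so"] for s in samples)
blocked_samples = sum(1 for s in samples if s["b"] > 0)
avg_wa = sum(s["wa"] for s in samples) / len(samples)

print(f"swap in total: {swap_in}, swap out total: {swap_out}")
print(f"samples with blocked processes: {blocked_samples} of {len(samples)}")
print(f"average I/O wait: {avg_wa:.1f}%")
```

Across these five samples the swap traffic is small next to the block output (bo) figures, so the samples by themselves do not show swap to be the bottleneck.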

RAM is cheap these days. Why have just 230MB installed? Billed as My Cloud, the typical use would be mostly at home, and in most cases a family should be counted as 2+ persons, meaning 2+ simultaneous Samba connections. The choice of ARM and RAM really makes me wonder (I can certainly understand that ARM was chosen to save power, but why not 4 cores?).

I used to have a Buffalo NAS (with an Intel CPU), which performed much better than this one.

IMHO, you’re tilting at windmills.  (Looking for problems that don’t exist.)

The Cloud is freaking FAST.  It’s among the fastest in its class, and certainly at the top of the list in its price range. What difference does it make what it’s doing with swap?

It’s almost twice as fast as my two QNAP TS-41x NASes for which I spent more than 4x the $$$.

It doesn’t have only 230MB of memory.  It has 256MB of memory (a portion of which is allocated to a RAM disk).
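As a quick sanity check on those two figures, the gap between 256MB installed and the MemTotal value reported earlier in the thread works out as below. This is plain arithmetic only; /proc/meminfo does not say what the missing memory is used for, so the split between a RAM disk and other reservations is not something this sketch can show.

```python
INSTALLED_MIB = 256    # installed memory, per the reply above
MEMTOTAL_KB = 230560   # MemTotal from /proc/meminfo earlier in the thread

installed_kb = INSTALLED_MIB * 1024
gap_kb = installed_kb - MEMTOTAL_KB
gap_mib = gap_kb / 1024

print(f"installed: {installed_kb} kB, visible: {MEMTOTAL_KB} kB")
print(f"not visible to the OS as general RAM: {gap_kb} kB ({gap_mib:.1f} MiB)")
```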

allenz wrote:

RAM is cheap these days.

Uhm, no, it’s not.  You have to think like a marketing department before you say that.  Even if it cost $10 more to add another 256M of memory, that’s $10 that’s not going to change much other than increasing the price.

allenz wrote:

a family should be counted as 2+ persons, meaning 2+ simultaneous Samba connections. The choice of ARM and RAM really makes me wonder (I can certainly understand that ARM was chosen to save power, but why not 4 cores?).

Hardly ever.   I mean, how often do you really think two users are going to be writing files simultaneously?   The chances are so close to zero that it’s not even worth thinking about.   And if the likelihood is really as high as you think in your environment, then a home NAS isn’t the best choice, because memory ceases to be the issue then.  The issue during simultaneous writes is disk thrashing.  

Why not 4-core?  Because the price of the CPU would double.    The CPU in this box is plenty well suited to the task.   Most home NASes use either ARM or some other small scale processor.   

Now I am getting this weird error:

File creation error - The system cannot find the file specified.
X:\>xcopy /E /V /Y S:\music X:\music
S:\music\04 Track 04 4.m4a
S:\music\06 Track 06 5.m4a
S:\music???-Demo.mp3
File creation error - The system cannot find the file specified.

Unable to create directory - X:\music\Compilations(2005) Concerto pour deux Voix

While xcopy reports a failure on the command line, the directory is actually there when I check from Explorer. This is a bit like what I observed yesterday before the firmware update: Windows Explorer reports a failure, but some files have already been copied (not sure whether the copied files are good, though).

Well, I will try connecting my PC by wire and give it another try. If there is still a problem, I guess I will move on and return the unit.

Allenz:   I think what you’re seeing there specifically has to do with a very specific Samba bug.

WD is aware of it and we’re waiting for a fix.

Thanks, Tony,

It does indeed look like a bug to me (I had to deal with a Samba bug on Solaris at work from v3.0 to v3.5, and finally got the issue resolved with v3.6.x; now at home, sigh ;))

Anyway, it looks like if I repeat the xcopy, I can also get another type of error:

File creation error - The specified network name is no longer available.

Sounds like the Samba share somehow disappeared during the long operation. The previously reported missing-directory issue could be the same problem.

allenz wrote:

 

File creation error - The specified network name is no longer available.

Please try doing the same thing while wired.

I would wager that the error will not happen.  I might lose, though…

Well, after several xcopy failures (the failures all occurred during or in between copies of some huge files, 300MB each), I have already wrapped the unit up to return it to the vendor.

I did not bother to try the wired setup. Even if I could copy all the music there over a wired connection, what about future use? It could be a bug and fixable eventually, but my future daily usage would be almost exclusively over a wireless connection. Nowadays, in a typical house, I don’t know how many people have the luxury of wiring the whole house; wireless is the norm. If this product has issues with that, then it is not for me. Testing it in a wired setup would mostly just satisfy curiosity.

If a fix is expected within a week, I might wait to give it a try. I can’t wait longer than that (for return purposes).

The point of my question was to rule out wireless problems.

For certain, the Cloud has no idea whether you’re wired or wireless.  There’s no difference as far as it’s concerned.

But if the problems you’re having go away when wired, then chances are you’ll experience the same issues (with wireless) regardless of which product you use.

allenz wrote:

Thanks, Tony,

 

It does indeed look like a bug to me (I had to deal with a Samba bug on Solaris at work from v3.0 to v3.5, and finally got the issue resolved with v3.6.x; now at home, sigh ;))

A little more info on this (possible) bug is in this thread:

http://community.wd.com/t5/WD-My-Cloud/Editing-Picture-Metadata-or-File-Name-Not-Working-Properly/m-p/633195/highlight/false#M2425

You can look in the Samba log files to see if that’s what’s happening.

BTW, the Cloud (as of this date) is running Samba Version 4.0.0rc5, Nov 13, 2012.

The fix referenced in that Bugzilla report was pushed two months later.

WD just needs to incorporate an updated Samba build.

Looks like your V3.6.x at work may have the same issue.  :)