HOW-TO: Troubleshoot NAS Performance concerns

TonyPh12345 · November 8, 2013, 5:26pm

Here’s a simple way to isolate some specific causes of poor performance with Network Attached Storage (NAS).

First, just a quick statement about expectations.

DON’T USE WIRELESS when performance is a must. There are just far too many variables that create unpredictability on WiFi networks. If you are using wireless, then you pretty much have to take what you get. There are a number of guides on the Internet about how to troubleshoot and tune WiFi networks for better performance, and I’m not going to duplicate those here.

The first thing to look for when having slow performance to the NAS is network-related issues. Again, we’re not talking Wireless here… this is for WIRED networks.

There are a couple of “rules” one needs to consider first.

Rule 1) Do NOT manually set the speed or duplex on a network device “statically” unless you also make the exact same changes at the other end of the cable.

In other words, never set your PC’s network interface manually to a specific speed (10 / 100 / 1000). Never set your network interface to a specific duplex setting (Full / Half) UNLESS you can explicitly configure the switch at the other end of the cable to have the exact same settings.

Most consumer-grade network devices (routers, switches, hubs, etc.) do not have the capability to manually set the speed and duplex settings.

It’s a poor practice anyway, because setting the speed or duplex manually disables Autonegotiation on that end of the link. When Autonegotiation is disabled on one end, the device at the other end can still sense the link speed but cannot detect the duplex setting, and the IEEE 802.3 specification requires that device to fall back to HALF duplex. If you have one end set to a STATIC speed or duplex and the far end is NOT set the same, the settings will mismatch and network errors will skyrocket.

Yes, there are some exceptions to this behavior, but ALL of those exceptions cause performance problems.

Nowadays, there’s just rarely a good reason to mess with these settings anyway.
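To make Rule 1 concrete, here is a small Python model of how the two ends of a 10/100 link settle when one side is forced and the other autonegotiates. It is a simplified sketch, not a full implementation of 802.3 autonegotiation: the function names, the "auto" marker, and the assumption that two autonegotiating ends agree on 100/full are all illustrative. Gigabit copper requires autonegotiation, so forcing 1000 Mb/s behaves differently on many devices.

```python
from typing import Tuple, Union

Mode = Tuple[int, str]       # (speed in Mb/s, "full" or "half")
Setting = Union[str, Mode]   # "auto", or a mode forced by hand


def resolve(a: Setting, b: Setting, best: Mode = (100, "full")) -> Tuple[Mode, Mode]:
    """Return the mode each end of the cable actually ends up using.

    best is the (assumed) mode chosen when both ends autonegotiate.
    """
    if a == "auto" and b == "auto":
        return best, best
    if a == "auto":
        # Parallel detection: the auto end can sense the far end's speed,
        # but not its duplex, so it falls back to half duplex.
        return (b[0], "half"), b
    if b == "auto":
        return a, (a[0], "half")
    # Both ends forced: each keeps its own setting, matched or not.
    return a, b


def is_mismatch(a: Setting, b: Setting) -> bool:
    """True when the two ends end up running different modes."""
    end_a, end_b = resolve(a, b)
    return end_a != end_b


if __name__ == "__main__":
    pc = (100, "full")   # PC NIC forced to 100/full by hand
    switch = "auto"      # typical consumer switch port
    print(resolve(pc, switch))      # ((100, 'full'), (100, 'half'))
    print(is_mismatch(pc, switch))  # True
```

Note that forcing HALF duplex on the PC would not mismatch in this model, which is exactly why forcing FULL duplex against an autonegotiating switch is the classic failure.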

Rule 2) Don’t listen to all those guides on the Internet that recommend enabling jumbo frames. Unless you know EXACTLY WHY you’re doing it, and fully understand the impact it will have on the rest of your network, just don’t go there.

As a general rule, don’t enable jumbo frames on ANY device in your network unless you can enable them on ALL devices on that same network. This includes routers, switches, wireless APs, etc.

A “jumbo frame” is a generic term for any frame that carries more than the standard 1500-byte MTU defined by IEEE 802.3. Enabling jumbo frames means you’re raising the Maximum Transmission Unit (MTU) of a given device. Mismatched MTUs on your network devices can cause devices to discard packets (because they lack the internal memory to receive larger packets) or, worse, cause packets to be “fragmented” (large packets carved up into many smaller packets) at some intermediate step in the network.

Yes, there are times when jumbo frames are useful. You just better know what you’re doing. The performance gains from jumbo frames usually aren’t that substantial for home users anyway.
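The MTU arithmetic behind Rule 2 can be sketched in a few lines of Python. This assumes IPv4 with a 20-byte header and no options; the function names are illustrative. Keep in mind that only routers fragment IPv4 packets (IPv6 routers never do), and on a plain switched LAN an oversized frame is usually just dropped. To probe a path from Windows, `ping -f -l 1472 <NAS address>` sends 1472 bytes of data with the Don't Fragment flag set; if that fails while smaller sizes succeed, something in the path is using a smaller MTU.

```python
IP_HEADER = 20    # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def ipv4_fragment_count(packet_len: int, mtu: int) -> int:
    """Number of IPv4 fragments a packet of packet_len bytes (header included)
    turns into when a router forwards it onto a link with this MTU."""
    if packet_len <= mtu:
        return 1
    payload = packet_len - IP_HEADER
    # Every fragment but the last must carry a multiple of 8 payload bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return -(-payload // per_fragment)  # ceiling division


def max_unfragmented_ping(mtu: int) -> int:
    """Largest ping data size (Windows `ping -l`) that fits in one packet."""
    return mtu - IP_HEADER - ICMP_HEADER


if __name__ == "__main__":
    print(ipv4_fragment_count(9000, 1500))  # 7
    print(max_unfragmented_ping(1500))      # 1472
```

So a single 9000-byte jumbo packet crossing a 1500-byte link becomes seven fragments, and losing any one of them means the whole packet has to be resent.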

So now let’s look at some specifics on your own network.

Using Notepad, copy the following text and save it as a file named “errors.bat” on your Windows desktop (set “Save as type” to “All Files” so Notepad doesn’t add a .txt extension):

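@echo off
rem Show network error counters, refreshing about every 5 seconds. Press Ctrl+C to stop.
:begin
cls
netstat -es | findstr /C:Error /C:IP /C:Retra /C:ICMP
rem Pinging localhost 5 times (1 second apart) is a simple delay.
ping 127.0.0.1 -n 5 -w 1000 > nul
goto begin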

Next, shut down all applications & software running on your PC to help reduce the “noise” on the network during the next steps.

Now, double-click the icon for the batch file you created, and you’ll get a screen that looks something like this:

Errors 0 0
IPv4 Statistics
  Received Header Errors = 0
  Received Address Errors = 0
IPv6 Statistics
  Received Header Errors = 0
  Received Address Errors = 0
ICMPv4 Statistics
  Errors 0 0
ICMPv6 Statistics
  Errors 0 0
TCP Statistics for IPv4
  Segments Retransmitted = 42
TCP Statistics for IPv6
  Segments Retransmitted = 0
UDP Statistics for IPv4
  Receive Errors = 2
UDP Statistics for IPv6
  Receive Errors = 0

Don’t worry about the initial numbers; they’ll be different for everybody.

It’s sometimes helpful to start from a fresh boot so that the numbers are as close to zero as possible.

Now start a large file copy FROM your PC, TO your NAS (not the other way around) … one that would ordinarily take a few minutes.

The screen will refresh roughly every five seconds.

What you’re looking for are CHANGES to those numbers over time.

Changes in these numbers indicate network errors.
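If you’d rather not eyeball the numbers, here is a hedged Python sketch that compares two saved snapshots and lists only the counters that went up. It assumes you capture each snapshot with the same filter the batch file uses, e.g. `netstat -es | findstr /C:Error /C:IP /C:Retra /C:ICMP > before.txt`, then again into `after.txt` once the copy has run for a while. The file names, the script name, and the parsing rules are assumptions based on the English-language output shown above.

```python
import re
import sys


def parse_netstat(text: str) -> dict:
    """Turn (filtered) `netstat -es` output into {"Section: Counter": value}.

    Rows like "Segments Retransmitted = 42" give one entry; two-column rows
    like "Errors 0 0" give "(received)" and "(sent)" entries.
    """
    counters = {}
    section = "Interface"  # the first Errors row comes from the `-e` part
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if "Statistics" in line:  # section header, e.g. "TCP Statistics for IPv4"
            section = line
            continue
        m = re.match(r"(.+?)\s*=\s*(\d+)$", line)
        if m:
            counters[f"{section}: {m.group(1)}"] = int(m.group(2))
            continue
        m = re.match(r"(.+?)\s+(\d+)\s+(\d+)$", line)
        if m:
            counters[f"{section}: {m.group(1)} (received)"] = int(m.group(2))
            counters[f"{section}: {m.group(1)} (sent)"] = int(m.group(3))
    return counters


def increases(before: dict, after: dict) -> dict:
    """Counters that went up between two snapshots, mapped to the increase."""
    return {name: value - before.get(name, 0)
            for name, value in after.items()
            if value > before.get(name, 0)}


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python netstat_delta.py before.txt after.txt
    with open(sys.argv[1]) as f_before, open(sys.argv[2]) as f_after:
        delta = increases(parse_netstat(f_before.read()), parse_netstat(f_after.read()))
    for name, change in delta.items():
        print(f"{name}: +{change}")
```

An empty result is the “perfect” case; anything listed is worth reading about in the sections below.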

On the first line: Errors

These are PHYSICAL errors. If either of these numbers is increasing, you’ve got a hardware problem. If the first number (RECEIVE errors) is increasing, the problem could be the switch, the cable, or the PC’s network interface. If the second number (SEND errors) is increasing, the problem is most likely a hardware issue in your PC. These counters will NOT reveal a NAS hardware or NAS cable problem.

In the next section: IPv4 Statistics

If “Received Header Errors” is increasing, then something on your network is sending malformed packets that are being discarded by your PC. If the number stops increasing when the file transfer stops, then this points to an issue with the NAS device.

Don’t worry about the “Received Address Errors” value. An increase of a few per minute isn’t unusual.

We’ll skip over the ICMP and IPv6 statistics because they’re not relevant here.

Next, and probably most important, is “TCP Statistics for IPv4: Segments Retransmitted.”

If this number is increasing, you’ve found a significant issue. The faster the increase, the worse the performance will be. A “perfect” scenario will, of course, be ZERO increase.

Each time this number increments, it’s indicating that a packet from your PC to your NAS got discarded, or arrived too late at the NAS to be of use. The PC then had to retransmit the data. This causes a performance penalty because, for a period of time, NO data is moving across the network.

A probable reason for this (and all too common) is that the switch(es) used in the network are discarding the packets due to buffer overflows. Many “consumer” grade switches have TINY buffers… sometimes as little as 8 KBytes per port.

Most consumer-grade NASes, though, are tuned for TCP segments or “windows” of up to 64 KBytes. You’ll need to read some tutorials on TCP/IP to understand what these numbers mean. But as an example, if your PC or NAS sends a burst of 64 KBytes of data and your switch can’t buffer it all, the switch drops part of that data, causing retransmissions and a slowdown.

For this reason, I recommend Gigabit Ethernet switches with a MINIMUM of 128 KBytes of buffer memory per port.
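A rough way to see why an 8-KByte buffer is too small is a simple “fluid” model: while a burst arrives at the input rate, the output port can forward only part of it, and the rest must wait in the buffer. The Python sketch below uses that model; the two scenarios (two gigabit senders converging on the NAS port, and a gigabit PC sending to a NAS on a 100 Mb/s port) are hypothetical, and real switches share and manage buffers in more complicated ways.

```python
def peak_queue_bytes(burst_bytes: float, in_bps: float, out_bps: float) -> float:
    """Bytes a switch port must buffer when a burst arrives faster than it drains.

    Simple fluid model: data arrives at in_bps for the length of the burst and
    leaves through the output port at out_bps; nothing else is competing.
    """
    if in_bps <= out_bps:
        return 0.0  # the port forwards data as fast as it arrives
    # While the burst is arriving, only out_bps/in_bps of it can leave;
    # the remainder has to sit in the buffer.
    return burst_bytes * (1 - out_bps / in_bps)


BURST = 64 * 1024        # one 64-KByte window, as in the text
SMALL_BUFFER = 8 * 1024  # the "tiny" per-port buffer mentioned in the text

if __name__ == "__main__":
    # Hypothetical case 1: two gigabit senders converge on the NAS's gigabit port.
    need = peak_queue_bytes(BURST, 2e9, 1e9)
    print(need, need > SMALL_BUFFER)   # 32768.0 True
    # Hypothetical case 2: gigabit PC sending to a NAS on a 100 Mb/s port.
    print(round(peak_queue_bytes(BURST, 1e9, 1e8)))  # 58982
```

Both cases need several times more than 8 KBytes, while 128 KBytes per port leaves headroom for either.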

I also recommend switches on which IEEE 802.3x Flow Control is either permanently enabled or configurable. Flow control lets the switch signal the attached nodes directly: if the switch senses that its buffers are about to be overrun, it can send a “Pause” frame telling the sender to stop sending data momentarily. This can prevent a lengthy and “expensive” TCP timeout and retransmit.
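To put “momentarily” in numbers: an 802.3x Pause frame carries a 16-bit pause_time counted in quanta of 512 bit times, so the longest single pause is short. The Python sketch below only does that arithmetic; the function and constant names are illustrative. At gigabit speed the maximum works out to roughly 34 ms, which is short compared with a TCP retransmission timeout, typically hundreds of milliseconds or more.

```python
PAUSE_QUANTUM_BITS = 512   # IEEE 802.3x: pause_time is counted in units of 512 bit times
MAX_PAUSE_QUANTA = 0xFFFF  # pause_time is a 16-bit field


def pause_duration_s(quanta: int, link_bps: float) -> float:
    """Seconds a sender stays quiet after receiving a Pause frame."""
    if not 0 <= quanta <= MAX_PAUSE_QUANTA:
        raise ValueError("pause_time must fit in 16 bits")
    return quanta * PAUSE_QUANTUM_BITS / link_bps


if __name__ == "__main__":
    for name, bps in (("100 Mb/s", 1e8), ("1 Gb/s", 1e9)):
        ms = pause_duration_s(MAX_PAUSE_QUANTA, bps) * 1000
        print(f"longest single pause at {name}: {ms:.2f} ms")
```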

WD NASes that are known to support Flow Control (and have the feature enabled by default):

WD My Book Live
WD My Book Live Duo
WD My Cloud (NOT the EX4)

The WD My Cloud EX4 does not appear to have Flow Control enabled. The command line lacks the utility to check, and my switches do not report ever seeing “Pause” frames received from the EX4.

So, if you’re getting discarded / retransmitted segments, I recommend repeating the test when the NAS is connected DIRECTLY to your PC. Consult various other posts in the forums on how to set this up.

Next, do the same thing on the Cloud NAS to see problems in the other direction.

SSH into your Cloud. Consult the manual for how to enable SSH. Don’t worry, you’re not going to be voiding your warranty because you’re not going to edit anything.

When you’re logged in, paste the following command into the command line EXACTLY as it appears:

watch -n 1 'ifconfig -s eth0 ; netstat -s | egrep -i retrans\|error\|tcp:'

Your screen will begin outputting information like this:

Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0 1500 0 178399 0 0 0 1964124 0 0 0 BMRU

Tcp:
    3 segments retransmited
    0 packet receive errors

Now, start copying a large file FROM your NAS TO your PC (the opposite direction from the steps above), then return to the SSH window to watch the statistics.

The only numbers that should be increasing regularly are in the RX-OK and TX-OK columns. (The columns can be difficult to line up, but they are the third and seventh numbers after “eth0” on the second line.)

Any other increasing number indicates an error condition. If other numbers on the “eth0” line are increasing, you may have a cable or switch problem.

If RX-ERR is increasing, it is a possible switch port or cable issue.

If RX-DRP is increasing, this may indicate the NAS is too busy to handle the traffic. If your performance is poor and you’re certain no other users are accessing the NAS, then this is suspicious.

If RX-OVR is increasing, this is a switch problem, or something on your network is configured with jumbo frames that shouldn’t be.

If TX-ERR is increasing rapidly (more than 1 every few seconds) this may be a hardware fault in the NAS.

If TX-DRP is increasing, the NAS is trying to send packets faster than your network will accept them – Are you using a Gigabit Ethernet interface?
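Because the `ifconfig -s` columns are hard to line up by eye, here is a hedged Python sketch that maps each number to its column name and reports which non-OK columns grew between two samples. It assumes you copy two screenfuls from the `watch` window into text on your PC (it does not assume Python is available on the NAS), and it reads column names from the “Iface” header so it copes with net-tools versions that omit the Met column. The function names are illustrative.

```python
IGNORED = {"MTU", "Met", "RX-OK", "TX-OK"}  # not error counters


def parse_ifconfig_s(text: str, iface: str = "eth0") -> dict:
    """Map column name -> value for one interface row of `ifconfig -s` output."""
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Iface":
            header = fields
        elif header and fields[0] == iface:
            # Skip the Iface name and the trailing Flg column; the rest are numbers.
            return {name: int(value) for name, value in zip(header[1:-1], fields[1:-1])}
    raise ValueError(f"no {iface} row found below an Iface header")


def suspicious(before: dict, after: dict) -> dict:
    """Error, drop, and overrun columns that increased between two samples."""
    return {name: value - before.get(name, 0)
            for name, value in after.items()
            if name not in IGNORED and value > before.get(name, 0)}
```

For example, if RX-DRP went from 0 to 3 during the copy, `suspicious` returns `{"RX-DRP": 3}`, which points at the “NAS too busy” case described above.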

Lastly, under TCP, the retransmissions here indicate the same issue as described above.

Let me know if this helps y’all!

Ichigo · November 9, 2013, 8:42pm

Awesome guide Tony, bookmarked.

TonyPh12345 · January 28, 2014, 12:46am

I just added a paragraph regarding IEEE 802.3x Flow Control. If shopping for switches, it’s a desirable feature.

My NetGear GS108Tv2 switches have this feature, and it is being actively used.