Any interests in Kernel 4.0 on My book live?

Yes, mostly patches. Most changes to smb.conf are no longer necessary with this and other recent Samba versions. Some, like max xmit and write cache size, are still required.
Same with sysctl. All the network tuning parameters I used to modify/optimize are now set directly by my network kernel driver, either as defaults or dynamically calculated.

Sorry about that, it’s essentially a tar dump from my development system. I have not cleaned up everything, also because I have many questions about Samba config.h make

Libkcapi is the Linux Kernel Crypto API User Space Interface Library, which I modified, compiled and installed. Maybe I did something wrong there, or there is a bug in the libkcapi install script, as the library seems corrupt.
readelf -a /usr/lib/libkcapi.so.hmac
readelf: Error: Not an ELF file - it has the wrong magic bytes at the start
You can safely remove this library… Sorry for the inconvenience…
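
If anyone wants to double-check before deleting, here is a minimal sketch of the same test readelf performs, done by hand: it looks at the first four bytes of a file for the ELF magic (0x7f followed by 'ELF'). The function name is_elf is made up for this example; head, od and tr are the standard tools.

```shell
# is_elf FILE: succeed if FILE starts with the 4-byte ELF magic (0x7f 'E' 'L' 'F').
# Hypothetical helper for illustration, not part of libkcapi or binutils.
is_elf() {
    # Dump the first 4 bytes as hex and strip spaces/newlines, e.g. "7f454c46".
    magic=$(head -c 4 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "7f454c46" ]
}
```

For example, `for f in /usr/lib/libkcapi*; do is_elf "$f" || echo "not ELF: $f"; done` lists the files under that name which are not ELF objects.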

Nope, I just don’t use it. Totally fine to turn it back on, no impact on performance.

Dash is the Debian default non-interactive shell. For root (the only user I use on the system), I changed it to bash, for the same reasons you mention…

Just edit /etc/resolv.conf. In Debian 9 that no longer works, since DNS and resolv.conf are managed by systemd. It took me an hour to figure out why LinuxMint 19 would not resolve any address.
Avahi-daemon is a fine solution too. Good hint, will include that in my config as it makes config simpler.

With a failed drive unbricking under my belt, I’ve been having a play with the image you uploaded, thanks Ewald!

Being a fair Linux n00b, I’ve persuaded Webmin on there to add some kind of GUI. Is there some way to get the LED working as per the WD OS?

FWIW, the DNS stuff seems to be linked to Microsoft fiddling (breaking) SMB.

Hello everyone!
I’ve tried to install the new kernels posted by Ewald here, both 4.9.33 and 4.9.99. In both cases I ran into a very strange network performance issue. SSH refreshes the screen only once every 1-2 seconds, or when I press a key on the client machine. SMB listing works OK, but file transfer speed is extremely low, around 1MB per minute. It looks as if the TCP stack in the MBL kernel is waiting for an early ACK, which generally does not happen in modern networks, or it just freezes every few seconds for no reason. I am not at all sure about this guess, though. lsmod is empty, and the only sysctl value that is not default is:
fs.inotify.max_user_watches = 32768

The filesystem is old and original, but as far as I know, that should not influence TCP performance.

By the way, could somebody who has successfully compiled the kernel send me iscsi_trgt.ko for any of these kernels? Setting up the toolchain for a ppc kernel may take quite a long time, and making an iSCSI target is my only goal.

The led driver is functional:

echo red >/sys/class/leds/a3g_led/color

That said, you will still miss two functionalities:

  • package wd-nas (e.g. wd-nas_02.50.00-142897_powerpc), which includes a number of scripts such as /usr/local/sbin/monitorio.sh, ledconfig.sh, ledCtrl.sh etc. to set the led lights on certain events/conditions e.g. blue light when monitorio.sh manually sets the drive in standby. I posted this package and 2 other packages earlier in this thread, feel free to install & test. This is all user space code (shell scripts).

  • blinking green light on disk io
    I made this functionality conditional in the kernel SATA driver (drivers/ata/sata_dwc_ncq.c). It’s only turned on when compiled with DWC_DEBUG (lines with signal_hdd_led). If you know how to compile the kernel yourself, it’s easy to change, from

#if defined(CONFIG_APOLLO3G) && defined(DWC_DEBUG)
signal_hdd_led(1 /* blink=yes */, 2 /* _3G_LED_GREEN */);
#endif

to

#if defined(CONFIG_APOLLO3G)
signal_hdd_led(1 /* blink=yes */, 2 /* _3G_LED_GREEN */);
#endif

Alternatively, I can compile a version for you…
Ewald

@Vshmuk,
I did a check of the code but could not see such an issue. Much of L2 is in hardware and L3 is in the common kernel network section. On my test network, I have low-cost Gb switches (TP-Link TL-SG108/TL-SG105), but I also tested with enterprise-class switches from Cisco and HP (10Gb). Performance was always close to the theoretical limit of gigabit interface speeds (e.g. 117MB/s Samba reads, 122MB/s netcat).
The only thing I can think of is that I may have forgotten to turn off jumbo packets on the Debian image I posted, so it might have an MTU of 4080, which provides the best performance given the HW.
If your switch does not support jumbo packets, you get this type of behavior, or worse e.g. with TP-Link ethernet-over-power extenders.
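
To check whether jumbo frames survive the path between client and MBL, a rough sketch follows. It assumes a Linux client with iputils ping (the -M do option sets Don't Fragment) and IPv4; the function names and the example address are placeholders, not anything from the image.

```shell
# ping_payload MTU: largest ICMP echo payload that fits in one IPv4 packet
# of size MTU (20-byte IPv4 header without options + 8-byte ICMP header).
ping_payload() {
    echo $(( $1 - 28 ))
}

# check_path_mtu HOST MTU: send 3 pings of the maximum size with
# fragmentation forbidden, so an undersized hop shows up as lost replies
# or a "message too long" error.
check_path_mtu() {
    ping -c 3 -M do -s "$(ping_payload "$2")" "$1"
}
```

Run `ip link show eth0` on the MBL to see the current MTU, then `check_path_mtu 192.168.1.50 4080` from the client (replace the address with the MBL's). If the large pings fail while `check_path_mtu 192.168.1.50 1500` works, something on the path drops jumbo frames, and `ip link set dev eth0 mtu 1500` on the MBL puts it back to the standard size.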

In kernel 4.9.x, the kernel does such a great job wrt. auto-tuning default parameters that I left everything default for ease-of-support & best performance (need to check why fs.inotify.max_user_watches was changed).

Did you try netcat or SFTP? Have you tested the disk read/write speed with the dd tests posted earlier or /home/root/bench.sh?
With Samba there are so many variables that impact performance. The sample config on my image is tuned for Windows 10 with SMB3.x.
Ewald

Well, MTU on eth0 is set to 1500.
fs.inotify.max_user_watches was changed because it is set in /etc/sysctl.d/ in my filesystem. There were more settings which could theoretically impact performance, so I removed them and left only this one because it seems unrelated to the network stack.
At the moment I am trying SSH (which obviously underlies SFTP), and it lags even at the ‘text level’. Midnight Commander and similar ncurses tools refresh their screens only if I send data to the socket (like pressing a key) OR after 2-5 seconds.
About switches: mine is a mid-range ASUS RT-N56ac router, and I haven’t had any problems with it or with previous kernel versions on the WD MBL. That’s why this is so strange.

UPD:

“Have you tested the disk read/write speed with the dd tests posted earlier or /home/root/bench.sh?”

Yes, they are okay. The problem is only with sending data to network.

“With Samba there are so many variables that impact performance. The sample config on my image is tuned for Windows 10 with SMB3.x.”

Generally I use Apple’s AFP to transfer files, and it’s unusable at the moment (8M per minute or so). Anyway, I’ve also tried connecting through SSH from different Debian-based amd64 clients.

Add: I’ve tried to compile the kernel myself, using your patches and config. The problem persists even with direct Ethernet connection without switches. Still no clue.

Thanks, I did see that and had a play with installing the DEB but I must be doing something wrong as nothing seems to have changed.

That would be awesome if you could manage it, I’m struggling to sort this out so I think kernel compilation is beyond me :slight_smile:

FWIW, I’ve simply untarred your “Debian Jessie 8.11 optimized for MyBookLive” from a few posts above if that matters.

@Vshmuk,

Really sorry to hear you have such bad and unworkable network performance.:unamused:

Good that you were able to eliminate switches etc. as possible root cause, even though with MTU 1500, I have never seen issues of this magnitude.

Unfortunately, I don’t have an Apple client to test. So I have not tested AFP, not even sure I compiled that in the kernel or that netatalk is installed…

Windows, Android, iPhone and multiple Linux systems are being used as test clients. Any chance to test NFS or Samba from your Debian client? I tested Linux Mint 18 (Ubuntu 16.x Xenial), 19 (Ubuntu 18.x Bionic Beaver) and Debian 7, 8 and 9 with ssh, sftp, NFS, netcat (nc) and Samba. I also tested another MBL with original firmware and my software. All ~100MB/s.

Have you tried low-level tools such as netcat?

The network speed you get of course depends on the protocol being used. With SFTP, I also get only 6 to 7 MB/s read speeds. I have not had the chance to recompile “scp” with hardware crypto libraries, so all encryption is in software and in user space.

The one thing that Samba is very sensitive to is a change in network configuration (e.g. MTU or a kernel parameter) while a share is mounted. Performance may drop below 1MB/s. You need to disconnect the share first, execute “service smbd restart” and then map the drive again. In some scenarios a reboot of Windows might be needed to restore decent performance. On Windows 10 or Linux, I have not seen the need for a reboot though…

A few final things to check:

  • flush the arp cache

  • netstat -s (you will see delayed acks, retransmits, …)

  • could it be hardware related (cable)?
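
To make the netstat check a bit easier to read, here is a small sketch that keeps only the interesting counters. The function name tcp_trouble is invented for this example, and the exact wording of the counters differs between net-tools versions, which is why the patterns are deliberately loose.

```shell
# tcp_trouble: read `netstat -s` style text on stdin and keep only the lines
# that mention retransmissions, delayed ACKs or timeouts.
# "retrans" also matches the "retransmited" spelling of older net-tools.
tcp_trouble() {
    grep -iE 'retrans|delayed ack|timeout'
}
```

Typical use is `netstat -s | tcp_trouble`, run once before and once after a slow transfer so the counters can be compared. For the arp cache, `ip neigh flush all` (as root) is one way to flush it on a Linux client.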

Good luck!
Ewald

Thank you for your answer!
Well, here is what I have done so far (of course I tried different computers and cables).
I realized that it’s purely a networking problem and that it does not depend on the high-level protocol. The network stack stalls whenever it is not receiving any packets. For example, if I ping the MyBook from a separate tab, the file transfer starts working, with bandwidth peaks every second (the frequency of the ICMP echo). Netcat has the same issue. Maybe it’s an issue with the PHY driver, who knows.

Anyway, I’ve taken the 4.1 kernel from here: GitHub - MyBookLive/kernel-4.0.x: My Book Live patches for vanilla Linux kernel, and with those patches it works pretty well, without any networking problems. I get 60MB/s from my WD Green (dd to /dev/null) on this kernel, and now I’m not sure how to tune it up, or whether I have to, or whether 60MB/s is OK.

@MattD_AU,

Ah, I wish one could just install these packages as-is. The reality is that they need to be fully rewritten for Debian Jessie and the new systemd. Maybe there is a way to do a quick rewrite of just monitorio.sh… No promises though. When I started this work, there were only two goals: use a recent version of Debian with recent security patches and get maximum performance out of our (old) hardware.

I will post a version of kernel 4.9.119 with led activity on disk IO. It’s running a 48-hour stress test now…
UPDATE:
New kernel 4.9.119 with hdd activity led indicator here
To install, save the compressed tar file to /tmp, then:

cd /
tar xvzf /tmp/kernel_4.9.119_hdd_led.tgz
cd /boot
cp uImage uImage.save
cp uImage_4.9.119_hdd_led uImage
systemctl reboot

Sorry for the delay. I now recall the reason why I deactivated this: the original code was defective and could never work. They had a workaround in user space, so I had to re-code a few things. For fun I also added magenta and cyan colors (just try: echo cyan >/sys/class/leds/a3g_led/color).

I must admit, it’s nicer with the hdd activity led :wink:
Please note that once there is an hdd error, the color will stay solid red and no more hdd activity will be shown.
To restore, just type:

echo green >/sys/class/leds/a3g_led/color
I will post the source code shortly, once it has passed the 96-hour test.
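
A small wrapper for the color commands above, as a sketch only. It accepts the color names mentioned in this thread (whether the driver takes blue under that exact name is an assumption) and writes to the same sysfs node; set_led and LED_NODE are names invented for this example.

```shell
# LED_NODE: sysfs attribute quoted in this thread; override it for dry runs.
LED_NODE=${LED_NODE:-/sys/class/leds/a3g_led/color}

# set_led COLOR: write COLOR to the LED node, accepting only the color
# names that appear in this thread (the driver may support more or fewer).
set_led() {
    case "$1" in
        red|green|blue|magenta|cyan)
            printf '%s\n' "$1" > "$LED_NODE" ;;
        *)
            echo "unknown color: $1" >&2
            return 1 ;;
    esac
}
```

`set_led green` restores the normal state after an hdd error; pointing LED_NODE at a scratch file lets you try it without touching the hardware.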

Ewald

Very nice, thanks for posting! It is also helpful in case of network issues - I usually take a look at network activity of devices to troubleshoot first.

Absolutely looking forward to the source code!

I noticed that before, I ran a kernel 4.9.99mbl+, while it’s now 4.9.119mbl+x. Is the ‘x’ just for the LED activity issue, or are there further changes?

For the record: The 4.9.99mbl+ ran for 25 days without any changes, the MBL is now the most problem-free server in my house. Thanks for that.

@Ewald
I think you’ve been very successful and I’m extremely grateful for your time investment. Certainly it’s nice to run it again without worrying about unpatched vulnerabilities, and the performance improvement is absolutely out of sight.

My unit is currently sitting bare on the desk with a temporary old testing HDD and is still banging into GbE limits for read speed and writing at 55MB/s. Pretty sure that’s a 3x & 2x improvement!

@emk2203,
The extension is really of no importance. The “+x” basically means that it’s cross compiled from an Intel server.
With 16 cores and 3 SSDs it compiles a full kernel in about 10s :smiley:
But I want to keep track of which compiler generated the kernel as I’ve had some issues with versions of gcc (byte swap code in certain contexts).

BTW. The kernel with hdd led support is called uImage_4.9.119_hdd_led but you can’t tell from uname.

Both 4.9.99 and 4.9.119 are very stable and there is no reason to upgrade, except for the hdd led activity.

Ewald

Kernel 4.9.119 with hdd activity led support released:

Patches (should be usable on any 4.9 version, but .77+ recommended)
Precompiled version

Changes:

  • hdd activity led support (incl. sources)

  • fixes from 4.9.119

  • sources released for network stack and EMAC ethernet driver

Unless any major bug turns up, this will be the last 4.9 release.
The little time I have is being spent on 4.14 and Debian 9.

Ewald

2X netconsole howto
The bad news: my development MBL has no UART pads, so I can’t solder a console cable to watch it booting and/or see where a kernel halts or panics. This is of course very annoying when doing kernel dev work.:unamused:
Specifically 4.14 has crashed on me more than I hoped for :sweat: , despite all the massive prep work from the LEDE/OpenWrt team.

The good news: with “2X netconsole” you can both watch system boot (uboot system console) and watch the kernel console messages (linux system console). I wrote a little howto guide and posted it here.
No more soldering! Unless of course you enjoy the thrill of it…

It’s even possible to interact with u-boot and issue console commands (and a whole lot more when flashing your own uboot), but enabling this is a bit more challenging and risky, so I have not documented this.
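
For the Linux side, the kernel's own netconsole documentation describes a boot parameter of the form netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr], with 6665 and 6666 as the default source and target ports. The sketch below just assembles that string; it is not taken from the howto, and the function name and addresses are placeholders.

```shell
# netconsole_arg SRC_PORT SRC_IP DEV TGT_PORT TGT_IP TGT_MAC
# Print a Linux "netconsole=" boot parameter from its six fields.
# Hypothetical helper; it only formats the string and does not validate it.
netconsole_arg() {
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}
```

`netconsole_arg 6665 192.168.1.50 eth0 6666 192.168.1.10 ""` prints netconsole=6665@192.168.1.50/eth0,6666@192.168.1.10/ (an empty MAC means broadcast). On the receiving machine a UDP listener such as `nc -u -l -p 6666` (or `nc -u -l 6666`, depending on the netcat flavor) shows the messages.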

Enjoy,
Ewald

Not exactly the correct thread, but somewhat related.
My friend’s WD MBL 3TB seems to have crashed. It came with a WD Red NAS drive, which I removed and scanned to find that there are numerous bad sectors.
I tried Guide 3 in the other thread and copied the image, but the WD MBL does not seem to boot (there is an amber/yellow light that stays solid), and my router does not seem to receive any DHCP request for an IP address.
I also have a WD Blue 500 GB HDD to test - and I tried the same 3TB image on this drive. The result is the same - it seems to be stuck on some amber light (I can hear the HDD operating underneath though).

Would netconsole help in some way to see whether this MBL is booting up at all? I suspect that the circuit board has a problem.

@vakharia,
Yes, double netconsole might help as you will see both the uboot messages as well as kernel boot messages.
And it avoids the effort (and risk) of soldering on a UART header.
The one challenge you will have though is the following: to save the U-Boot netconsole settings to non-volatile memory (flash), you need to be able to read boot.scr from the drive. That happens before kernel boot. But if the problem is really HW or some corruption in the root file system, you won’t get there, as /boot will not be readable. Please note, the root file system has to be ext3 or ext2.
One alternative is to serve boot.scr (which activates netconsole) from TFTP (basically boot from TFTP).
NOTE: The default uboot TFTP/Bootp address is (serverip=) 172.25.102.35

Have you done a file system check on the filesystems on the drive (sd1 and /sd2)?

Ewald

@Ewald: Thank you for responding. However, I should clarify that I am still on stock firmware and stock kernel. The MBL just stopped booting up one day. So, I don’t know how ‘double netconsole’ should be used when the drive is not even booting up (it seems stuck on the yellow solid light).
So far I have tried the following:

  1. Using Linux system restore, I deleted all the partitions and tried to write the 3TB image on the WD Red NAS drive that came with the MBL. I can confirm that that disk had a lot of bad sectors, so the result was that the yellow light did not go away.
  2. I have another spare 500 GB WD Blue, which is fine. I removed all the partitions and then wrote the same 3TB image to that disk and tried to boot that. I again am stuck on the yellow light.

I have also used the WD utility from Windows to try and repair the 3TB WD Red NAS drive, but the tests failed (too many bad sectors).

So, I am not sure what else can be done to check the filesystem.

I can set up a TFTP server, but I am not sure what needs to be done on the 500 GB WD Blue (so that it can interact with TFTP).

I suspect that the motherboard/circuit board of the MBL has gone bad, but I don’t know if there is a way to test this theory (besides changing the hard disks, which I have already done).

The first thing you can do is double-check the partition layout to confirm it’s correct.

I’ve used this HOWTO and its “debrick.sh” script several times to debrick otherwise dead MBLs and to install new, clean drives as large as 8TB.