Like most, I learn a lot more by doing things wrong before doing them right. Maybe I can save someone some of my learning pain, I mean curve!

Wednesday, September 2, 2009

Let's Tune 'er Up!!

I've had a little time since my last post to work on the tuning. Please consider this somewhere between coarse and medium tuning, and certainly not fine tuning!

As I stated in a previous post, using the 0.71RC1 straight install and enabling nothing more than "large read/write" and "use sendfile" in CIFS, I was able to achieve transfer rates of approximately 17 MB/sec. Performance with ftp was actually lower, ~9-10 MB/sec. In order to tune a system, one must know what the components can do. The components in this case were the raw drives, the zfs partition and samba/cifs. The tests of these components and their results are shown below.
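
Before diving into the tests, one aside for reference: as far as I can tell, those two CIFS checkboxes correspond to the following smb.conf settings. This is my assumption about what the WebGUI writes out, not a dump of the generated file:

large readwrite = yes
use sendfile = yes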

Disk Test
I ran diskinfo from an ssh console against the first drive in my array (all three drives are the same - 1.5 TB WD Caviar Green WD15EADS).

nas01:/# diskinfo -tv ad4
ad4
512 # sectorsize
1500301910016 # mediasize in bytes (1.4T)
2930277168 # mediasize in sectors
2907021 # Cylinders according to firmware.
16 # Heads according to firmware.
63 # Sectors according to firmware.
ad:WD-WCAVY0511783 # Disk ident.

Seek times:
Full stroke: 250 iter in 7.376821 sec = 29.507 msec
Half stroke: 250 iter in 5.211402 sec = 20.846 msec
Quarter stroke: 500 iter in 8.364027 sec = 16.728 msec
Short forward: 400 iter in 3.197826 sec = 7.995 msec
Short backward: 400 iter in 3.506082 sec = 8.765 msec
Seq outer: 2048 iter in 0.774789 sec = 0.378 msec
Seq inner: 2048 iter in 0.571217 sec = 0.279 msec

Transfer rates:
outside: 102400 kbytes in 1.051929 sec = 97345 kbytes/sec
middle: 102400 kbytes in 1.142268 sec = 89646 kbytes/sec
inside: 102400 kbytes in 1.950994 sec = 52486 kbytes/sec

Clearly, disk performance is not the cause of the bottleneck, since even the slowest (inside) transfer rate is roughly three times faster than my sustained 17 MB/sec transfer rate.

zfs Partition
I used dd to create a 1GB file using two different block sizes, 1 & 64 KB.


nas01:/mnt/vdev0mgmt# dd if=/dev/zero of=mytestfile.out bs=1K count=1048576
1048576+0 records in
1048576+0 records out
1073741824 bytes transferred in 47.028245 secs (22831850 bytes/sec)

nas01:/mnt/vdev0mgmt# dd if=/dev/zero of=mytestfile.out bs=64K count=16384
16384+0 records in
16384+0 records out
1073741824 bytes transferred in 11.612336 secs (92465619 bytes/sec)
Again, both of these cases exceeded my test case, though only barely with the 1 KB block size, so the file system is not the culprit.
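
One caveat on using /dev/zero for this kind of test: if compression were enabled on the dataset, the zeros would compress to almost nothing and the numbers would be wildly optimistic. ZFS leaves compression off by default, but it is easy to double-check with something like the following (the pool name here is just a guess based on my mount point):

nas01:/# zfs get compression vdev0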

Network Transfer
I used iperf to test transfer rates in both directions, from server to workstation and vice versa, with iperf running in server mode on the receiving end (see the note after the results). The workstation is a quad-core Intel box with 8 GB of RAM; memory use during the tests never exceeded 6 GB, eliminating any workstation disk interaction. I ran iperf with two transaction sizes, 8 and 64 KB. The results are:

C:\>iperf -l 8K -t 30 -i 2 -c 192.168.2.5
------------------------------------------------------------
Client connecting to 192.168.2.5, TCP port 5001
TCP window size: 8.00 KByte (default)
------------------------------------------------------------
[148] local 192.168.2.207 port 57889 connected with 192.168.2.5 port 5001
[ ID] Interval Transfer Bandwidth
[148] 0.0-30.0 sec 984 MBytes 275 Mbits/sec

C:\>iperf -l 64K -t 30 -i 2 -c 192.168.2.5
------------------------------------------------------------
Client connecting to 192.168.2.5, TCP port 5001
TCP window size: 8.00 KByte (default)
------------------------------------------------------------
[148] local 192.168.2.207 port 57890 connected with 192.168.2.5 port 5001
[ ID] Interval Transfer Bandwidth
[148] 0.0-30.3 sec 1.70 GBytes 482 Mbits/sec

Again, both of these cases exceeded my test case significantly but did show that transaction size has a lot to do with network efficiency.
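
For completeness, the other end of each of these iperf runs is nothing fancier than iperf running in server mode on the box being measured; only the client side made it into my captures:

nas01:/# iperf -s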

samba/cifs Test
I used the Microsoft RoboCopy utility to copy a 1 GB file from and to the NAS server from my Windows workstation. The results are:

W:\tmp>robocopy . c: temp.file (READ)
Speed : 19140465 Bytes/sec.
Speed : 1095.226 MegaBytes/min.

Ended : Wed Sep 02 17:57:30 2009

This test more or less approximated my original test though it was slightly faster.

Conclusions:
  1. Neither the hard drives, the file system, nor the network was a major contributor to the bottleneck.
  2. The bottlenecks, then, are "probably" in the kernel (stack, ipc, filesystem and network drivers) and in samba/cifs.

Being basically lazy, and definitely not a good scientist (sorry teachers :( ), I surfed the web and found some tried and true tunings for samba/cifs as well as some items that seem to make sense for the kernel. Note that testing has shown this configuration works for my server. I think the samba/cifs settings will likely help on any server, as they have for me over the years across multiple Linux and BSD distributions. The kernel tunings are likely to have a heavy dependency on the hardware that I use, namely the 1.6 GHz dual-core Atom 330, the Intel chipset (945GC northbridge and ICH7 southbridge) and Realtek NIC (RTL8111C) built into the MS-9832 motherboard, and its 2 GB of RAM. If your hardware is too dissimilar, you will "definitely" need to validate values on your own.

Here's what I did to tune my server.

samba/cifs tweaks
I added the following two lines to the auxiliary parameters on the services/cifs configuration page.

max xmit = 65535
socket options = TCP_NODELAY IPTOS_LOWDELAY SO_SNDBUF=65535 SO_RCVBUF=65535

I also set the send and receive buffer fields on that page to 65535 to ensure those are the values actually in use.
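
If you want to sanity-check what actually lands in the generated configuration, the [global] section should end up containing something like the following (a sketch of the expected result, not a dump of my file), and testparm will complain about anything Samba does not understand. The path below is my assumption of where the WebGUI writes the file:

[global]
    max xmit = 65535
    socket options = TCP_NODELAY IPTOS_LOWDELAY SO_SNDBUF=65535 SO_RCVBUF=65535

nas01:/# testparm -s /var/etc/smb.conf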

kernel tweaks
I harvested my kernel tunings from multiple locations, with references to their sources embedded as remarks below. These additions were made to my /cf/boot/loader.conf file since I am booting from a USB flash drive. I used the advanced file editor in the WebGUI to make these changes, since it takes care of remounting the flash drive read/write and then resetting it to read-only.

# http://acs.lbl.gov/TCP-tuning/FreeBSD.html
kern.ipc.shmmax=67108864
kern.ipc.shmall=32768
# http://harryd71.blogspot.com/2008/10/tuning-freenas-zfs.html
vm.kmem_size_max="1024M"
vm.kmem_size="1024M"
vfs.zfs.prefetch_disable=1
# http://wiki.freebsd.org/ZFSTuningGuide
vfs.zfs.arc_max="100M"
# ups spinup time for drive recognition
hw.ata.to=15
# System tuning - Original -> 2097152
kern.ipc.maxsockbuf=16777216
# System tuning
kern.ipc.nmbclusters=32768
# System tuning
kern.ipc.somaxconn=8192
# System tuning
kern.maxfiles=65536
# System tuning
kern.maxfilesperproc=32768
# System tuning
net.inet.tcp.delayed_ack=0
# System tuning
net.inet.tcp.inflight.enable=0
# System tuning
net.inet.tcp.path_mtu_discovery=0
# http://acs.lbl.gov/TCP-tuning/FreeBSD.html
net.inet.tcp.recvbuf_auto=1
# http://acs.lbl.gov/TCP-tuning/FreeBSD.html
net.inet.tcp.recvbuf_inc=16384
# http://acs.lbl.gov/TCP-tuning/FreeBSD.html
net.inet.tcp.recvbuf_max=16777216
# System tuning
net.inet.tcp.recvspace=65536
# http://acs.lbl.gov/TCP-tuning/FreeBSD.html
net.inet.tcp.rfc1323=1
# http://acs.lbl.gov/TCP-tuning/FreeBSD.html
net.inet.tcp.sendbuf_auto=1
# http://acs.lbl.gov/TCP-tuning/FreeBSD.html
net.inet.tcp.sendbuf_inc=8192
# System tuning
net.inet.tcp.sendspace=65536
# System tuning
net.inet.udp.maxdgram=57344
# System tuning
net.inet.udp.recvspace=65536
# System tuning
net.local.stream.recvspace=65536
# System tuning
net.local.stream.sendspace=65536
# http://acs.lbl.gov/TCP-tuning/FreeBSD.html
net.inet.tcp.sendbuf_max=16777216
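
After the reboot, it is worth confirming that the values actually took; a quick spot check from the console works (substitute any of the variables above). Anything still showing its default either did not apply from loader.conf or got its name mangled:

nas01:/# sysctl kern.ipc.maxsockbuf kern.ipc.somaxconn net.inet.tcp.delayed_ack vfs.zfs.prefetch_disable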

While I originally had the intent of testing the impact of the individual settings, I quickly grew bored of the reboots and rigor required (hence my reason for choosing an engineering vs. a scientific career :)). I can clearly say, though, that you "must" disable the zfs prefetch (vfs.zfs.prefetch_disable=1 above) in order to get read rates up to the levels that I have achieved.

Tuned Results!
My first tests were to verify that I achieved my end goal of having read and write rates in excess of 35 MB/sec. My goal was achieved. Yeah!!!


C:\>robocopy w: . temp.file (READ)
Speed : 39179078 Bytes/sec.
Speed : 2241.844 MegaBytes/min.


C:\>robocopy . w: temp.file2 (WRITE)
Speed : 35574390 Bytes/sec.
Speed : 2035.582 MegaBytes/min.

While the previous test verified my goal, I wanted to see whether the changes to the kernel network configuration also changed the base network performance. They did.

C:\>iperf -l 8k -t 30 -i 2 -c 192.168.2.5
------------------------------------------------------------
Client connecting to 192.168.2.5, TCP port 5001
TCP window size: 8.00 KByte (default)
------------------------------------------------------------
[148] local 192.168.2.207 port 58332 connected with 192.168.2.5 port 5001
[ ID] Interval Transfer Bandwidth
[148] 0.0-30.0 sec 985 MBytes 276 Mbits/sec

C:\>iperf -l 64k -t 30 -i 2 -c 192.168.2.5
------------------------------------------------------------
Client connecting to 192.168.2.5, TCP port 5001
TCP window size: 8.00 KByte (default)
------------------------------------------------------------
[148] local 192.168.2.207 port 58334 connected with 192.168.2.5 port 5001
[ ID] Interval Transfer Bandwidth
[148] 0.0-30.3 sec 1.89 GBytes 537 Mbits/sec

nas01:/# iperf -l 8k -t 30 -i 2 -c 192.168.2.207
------------------------------------------------------------
Client connecting to 192.168.2.207, TCP port 5001
TCP window size: 65.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.2.5 port 55338 connected with 192.168.2.207 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-30.1 sec 2.11 GBytes 601 Mbits/sec

nas01:/# iperf -l 64k -t 30 -i 2 -c 192.168.2.207
------------------------------------------------------------
Client connecting to 192.168.2.207, TCP port 5001
TCP window size: 65.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.2.5 port 49701 connected with 192.168.2.207 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-30.1 sec 2.12 GBytes 604 Mbits/sec

The network tuning barely moved the small-transaction rate from workstation to server (275 to 276 Mbps) but raised the large-transaction rate from 482 to 537 Mbps. I did not capture the screen for the pre-tuning runs in the other direction (server to workstation, which is what matters for reads from the NAS), but those were approximately 550 Mbps untuned and are now 601 Mbps for the small transaction size and 604 Mbps for the large.

As I sleepily look back at what I have done, I realize that I have not tested the NAS server to the point of determining the effect of filling the cache, since I have tested with a file size of 1 GB and have 2 GB of RAM in the server. That will have to wait for another day.
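
When I do get around to it, the simplest approach I know of is to repeat the same dd and robocopy tests with a file at least twice the size of RAM, so the cache cannot hide the disks. Something along these lines (the file name is just an example):

nas01:/mnt/vdev0mgmt# dd if=/dev/zero of=mybigtestfile.out bs=64K count=65536   # 4 GB, twice the 2 GB of RAM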

Again, as I warned earlier, you "must" test to make sure that these tweaks are amenable to your hardware. If you are running a 32-bit version of FreeNAS, there are many more kernel tweaks needed. If you are running with less memory, you will need to reduce some of the allocations. If you have a much larger server with many more clients, you will need to increase the allocations and will probably want a better NIC than my bargain basement Realtek.

Realtek takes a beating in many of the NAS and network centric forums; however, depending upon the usage patterns, the Realtek can be a more than capable GigE NIC, as this testing shows. It just bears getting the tuning right!!!

Bye for now, more errors are coming this way!!!

9 comments:

  1. Thanks for this blogpost! But I recommend not to use the ZFS tuning settings (see http://wiki.freebsd.org/ZFSTuningGuide).

    Good blogpost!

  2. Thank you for the information.

  3. Wow, thanks for those hints!

    Being even less of a scientist/engineer than you are, I didn't even bother trying the kernel tweaks. The CIFS tweaks alone got me from ~30MB/s to 55MB/s.

    I might try fiddling around with the kernel though, seeing as we both got almost exactly the same hardware.

    Anyway, thanks a thousand times!

  4. Like alex, I haven't even gotten to kernel tweaking and the samba changes alone took me from ~25MB/s to about 50ishMB/s via gige from a laptop to an intel ss4200 running 8 32bit with a zfs storage pool. I'm rather liking this!

  5. Thank you and thank you! You saved me a headache, and FreeNAS. The CIFS/SAMBA tweak alone has increased my speed from ~25MB/s to ~55-75MB/s.

    Once again, I can't thank you enough. I was about to trash FreeNAS!

    thanks,
    Kelvin

  6. freenas:/mnt/# iperf -1 -t 30 -c 192.168.178.7
    ------------------------------------------------------------
    Client connecting to 192.168.178.7, TCP port 5001
    TCP window size: 257 KByte (default)
    ------------------------------------------------------------
    [ 3] local 192.168.178.8 port 53272 connected with 192.168.178.7 port 5001
    [ ID] Interval Transfer Bandwidth
    [ 3] 0.0-30.8 sec 3.31 GBytes 925 Mbits/sec


    fast enough i think :)
    only enabled the tuning switch within nas and set kmem_size for 2gb of ram

  7. Hi there,

    Thanks so much for posting this! Even though these are fairly old posts, it's been extremely helpful as I try to troubleshoot my ZFS FreeNAS system!

    One question: Can you confirm the type of ZFS partition used here? Are the 4 drives set up in a raidz2 configuration? Reason I'm asking is because I'm running a similar setup (4x 1TB Western Digital drives) and while I get similar per-disk performance, the 1GB write into the partition is quite poor. For example:


    ===
    lechon:~# diskinfo -tv ad4
    ad4
    512 # sectorsize
    1000204886016 # mediasize in bytes (932G)
    1953525168 # mediasize in sectors
    1938021 # Cylinders according to firmware.
    16 # Heads according to firmware.
    63 # Sectors according to firmware.
    ad:WD-WMAV51495305 # Disk ident.
    ...
    ...

    Transfer rates:
    outside: 102400 kbytes in 1.088883 sec = 94041 kbytes/sec
    middle: 102400 kbytes in 1.195945 sec = 85623 kbytes/sec
    inside: 102400 kbytes in 1.999985 sec = 51200 kbytes/sec
    ===


    However, the write into the ZFS partition is anywhere from 5-22x *slower* than what you've posted from your setup:

    ===
    lechon:/mnt/all# dd if=/dev/zero of=mytestfile.out bs=1K count=1048576
    1048576+0 records in
    1048576+0 records out
    1073741824 bytes transferred in 255.053981 secs (4209861 bytes/sec)

    (using a 64k block size)
    lechon:/mnt/all# dd if=/dev/zero of=mytestfile2.out bs=64K count=16384
    16384+0 records in
    16384+0 records out
    1073741824 bytes transferred in 238.061992 secs (4510345 bytes/sec)
    ===

    All 4 of my drives are attached via SATA to a separate PCI SATA card - Is there a way to tell if perhaps the card itself is slow and the bottle neck occurs when issuing the write across all the drives in the ZFS partition? Any thoughts or help would be appreciated.

    Thanks in advance!,
    george

  8. Cool experiment. I've used Iperf before and tested some of the free network capacity tools and have found pathtest to be the most accurate - check it out, it is most customizable and, best part, free! www.testmypath.com

  9. Hello!

    What tweaks do I have to apply in order to run a 32bit System?
    Thanks for your help!

