Like most, I learn a lot more by doing things wrong before doing them right. Maybe I can save someone some of my learning pain, I mean curve!

Friday, October 1, 2010

A new day, a new perspective

OK, I won't butcher another song :), but it's amazing how much 24 hours can change one's perspective. I have traced the majority of my performance issues to a bad WD15EADS drive. While the drive isn't reporting any problems, it will not transfer at a speed greater than 10 MB/s. The drive is still under warranty, so I am going to file a warranty repair request with WD today. In the meantime, I have configured my 2 good WD drives in a zfs mirror that is large enough to hold everything that I need to have online for a while.
Thanks to the anonymous commenter on my earlier post who called my attention to the correct sector size for the EADS drives.  It is 512 bytes, not 4KB, so I have removed my gnop configuration.
Irrespective of the drive, I have finally been able to confirm that my current motherboard's ICH7 implementation does not support AHCI, so I am definitely limited to ATA100 performance. With the bad drive out and while restoring with zfs send/receive, I did see all of my drives peak at 100 MB/s from time to time. I can now reliably get in-the-box transfers in the 35 - 45 MB/s range. This is more in line with my expectations.
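For anyone curious, the mirror-and-restore dance above boils down to something like the following.  This is a sketch, not my literal command history; the device names (ada1, ada2) and the pool/dataset names are placeholders.

```
# Create a two-way zfs mirror from the two good WD drives
zpool create tank mirror /dev/ada1 /dev/ada2

# Replicate a dataset from the old pool with zfs send/receive
zfs snapshot oldpool/data@migrate
zfs send oldpool/data@migrate | zfs receive tank/data
```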

I'm still of the opinion that if you are going to use zfs on FreeBSD 8.1 that 8 GB of RAM is the minimum.  While I now have a reasonable solution with 2 GB, it has taken an inordinate amount of my time to get it there.  I hope others reading this will be able to learn from my experience and save a lot of time.

Without going into a lot of boring detail, the issue with the RAM is as much a problem with FreeBSD as it is with zfs. If you are doing large transfers (i.e. > 10X RAM), FreeBSD 8.1 will not release its Inactive memory fast enough, and the cache will starve, at a minimum impacting performance. The worst part is that FreeBSD will panic and shut down from time to time. And since the FreeBSD support for my current board's ACPI isn't solid, the box locks up and doesn't recover without physical intervention. So from a stability standpoint, it seems to me that you need enough RAM to make it unlikely that you will starve the cache and put the OS in a position to panic. I am hoping that this will be addressed in 9.0, but I'm not enough of a FreeBSD kernel geek to know.

Given that I am back to a sufficient level of performance and the reliability is good, as long as I don't try to remotely execute big transfers, I am going to hang with my current motherboard. In looking around a bit, I received some good recommendations regarding Intel's i3 as well as their current Atom chips.  I like the i3s; however, the cost is a bit more than I want to spend at this time. The current 4 GB limitation on the Atom keeps me from wanting to make an investment in them at this time. I'll keep my eyes open, and when chip makers present a sufficiently interesting motherboard, I'll look at upgrading.

lbe

PS: if you are interested in what my current /boot/loader.conf looks like, here it is.  Given the limitations of my current level of ignorance, I think this is about as good as it gets for FreeBSD 8.1 on a dual core Atom 330 configured as a NAS server using zfs.

kern.ipc.shmmax=67108864
kern.ipc.shmall=32768
vm.kmem_size_max="1024M"
vm.kmem_size="1024M"
vfs.zfs.arc_min="256M"
vfs.zfs.arc_max="784M"
vfs.zfs.vdev.cache.bshift="16"
vfs.zfs.vdev.cache.size="10m"
vfs.zfs.vdev.cache.max="16384"
vfs.zfs.vdev.min_pending="4"
vfs.zfs.vdev.max_pending="12"
vfs.zfs.vdev.aggregation_limit="131072"
vfs.zfs.vdev.ramp_rate="2"
vfs.zfs.vdev.time_shift="6"
kern.ipc.maxsockbuf=16777216
kern.ipc.nmbclusters=32768
kern.ipc.somaxconn=32768
kern.maxfiles=950000
net.inet.tcp.delayed_ack=0
net.inet.tcp.inflight.enable=0
net.inet.tcp.path_mtu_discovery=1
net.inet.tcp.recvbuf_auto=1
net.inet.tcp.recvbuf_inc=524288
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.recvspace=65536
net.inet.tcp.rfc1323=1
net.inet.tcp.sendbuf_auto=1
net.inet.tcp.sendbuf_inc=524288
net.inet.tcp.sendspace=65536
net.inet.udp.maxdgram=57344
net.inet.udp.recvspace=65536
net.local.stream.recvspace=65536
net.local.stream.sendspace=65536
net.inet.tcp.sendbuf_max=16777216
aio_load="YES"
net.inet.tcp.mssdflt=9142

Tuesday, September 28, 2010

Motherboard Blues

My apologies to Wilbert Harrison, who wrote the blues classic Kansas City.  Hum along if you know the tune ;)

I'm going to Insanity, Insanity here I come
I'm going to Insanity, Insanity here I come
They got a crazy way of serving there
And I'm gonna get me some.

I'll be standing in the corner
In the corner of FreeBSD and hardware support
I'm gonna be standing in the corner
In the corner of FreeBSD and hardware support
With my Insanity server
And a bottle of Insanity RAM.

Well I might take IDE
I might take SATA, but if I have USB
I'm gonna get there just the same
I'm going to Insanity , Insanity here I come
They got a crazy way of serving there
And I'm gonna get me some.

I'm gonna pack my files
Leave at the break of dawn
I'm gonna pack my files
Everydrive will be sleeping
Nobody will know where I've replicated
Cause if I stay in the zpool
I know I'm gonna scrub.
Gotta find a friendly mobo
And that's the reason why,
I'm going to Insanity
Insanity here I come
They got a crazy way of serving there
And I'm gonna get me some

It's amazing where my mind goes sometimes.  This little ditty started with just the first line, and before I knew it I had the whole song done.  If you like blues, or don't know what blues are, then check out Roy Clark's version of this great standard.  He adds the line "Kansas City, I spent a week there one day."  I feel much like that with my current learnings: Insanity, I spent a week there one day.

Over the last couple of weeks since my last post, I have been tweaking and testing and overall have been unhappy with where I currently am with my server.  In short, I have reached the conclusion that 2 GB of RAM just isn't enough for a zfs file server "if" you want a consistent level of "reasonable" performance (i.e. aggregate file transfer >= 30 MB/sec).  My current motherboard only supports 2 GB.  So, I am halting my testing with it and am turning my search to finding its replacement.

I have started a thread on Hard Forum, Need MB rec for Low Power, ZFS, FreeBSD, Home NAS, asking for help. 

I'll let you know what decision I make!!

lbe

Wednesday, September 8, 2010

FreeBSD ain’t free, if I value my time and include the cost of confusion!

There I said it.  Flame on, all of you FreeBSD ideologues!!!  If you look through the history of this blog, you will see that I have been working at addressing my NAS needs for a while, well over a year with FreeNAS running on FreeBSD 7.2 and now rolling my own with FreeBSD 8.1.  If you look at the timestamps of the entries, you will notice some significant gaps.

There are several sayings that could be applied during these gaps.  “No news is good news” or “ignorance is bliss” are examples of the positive and the polite; however, to do it justice requires Latin: “non impediti ratione cogitationis.”  In English, this translates to “unimpeded by the thought process.”  Like most things that I know, I can’t take credit for this phrase. I learned it from a couple of my idols, Click and Clack the Tappet Brothers of Car Talk.

I now know that I wasn’t even thinking about some things that could have and should have bothered me.  Unfortunately, now that I know about them, and think I am on a path to addressing them, I am worrying about what else I do not know!  Read on ...

What have I learned?

1.      If you want to run zfs without worries, run it on some big honking hardware!
2.     Even with big honking hardware, you should still be worried. You will still get yours, just later instead of sooner :)
3.     Reliably running zfs on FreeBSD on commodity, low power hardware requires the sacrifice of millions of brain cells on a weekly if not daily basis!
4.     Always have backups!!
5.     Always have backups of your backups!!!

Gripes first -

I won’t lie and say that I always RTFM, but I RTFM much more than most.  So when I started out on this venture to build the perfect home NAS server, I spent a lot of time in my recliner with my laptop on my bulging belly, both of which can be attested to by my wife, considering not only what choice I should make but also how I should make it.  I really thought I had found an appliance approach in FreeNAS with which I would be happy.  Me being happy with IT appliances in general is a non sequitur because I always want to add some things that aren’t done just the way I want.  But FreeNAS got me a long way toward where I wanted to go.

Initially FreeNAS was reliable and seemed to be performant, but I had to go and add sabnzbd to download my wife’s, yes it must be her fault :), favorite TV shows from supernews and add Sick-Beard to handle setting up the downloads.  What’s the harm with a couple of little Python programs?  It wasn’t that difficult to hack Python 2.6, sabnzbd and Sick-Beard into my embedded version of FreeNAS.  For brevity’s sake, I’ll skip the details.  Prior to doing this, FreeNAS was running pretty well, though I had figured out that FreeBSD 7.2 and my Atom 330 based MB didn’t get along well with hyper-threading enabled.  But following this, I started to notice performance problems when synchronizing my digital photography library from my desktop to the server using a robocopy script that overwrote everything every time (my error, not robocopy's).  But I said, what the hey, and ignored it.

Then the server started crashing from time to time and I couldn’t figure out why.  I enabled a syslog server, pored over the logs, couldn’t find anything, couldn’t make it crash, ...  I was going crazy trying to figure out what was up.  I reached a point where I was ready to give up on FreeBSD and go back to my tried-and-true reliable Linux and ignore file system goodness until btrfs is ready for use.  But before totally jumping, and having to migrate data, I decided to stick a different CF card into my CF/IDE adapter and install FreeBSD 8.1 RC2.  I decided I could forego the FreeNAS GUI if I could get stability and consistent performance.  Voila, it appeared that I had it, until ... I tried to duplicate about 500 GB of data to an external drive to start keeping in a drawer at my office as an offsite backup of the critical files.  When doing this, I started to see big-time performance problems and was able to crash the server somewhat regularly with a cryptic but key error message about running out of kmem.

In researching this, I learned the real meaning of some of the tuning values that I had in my /boot/loader.conf – vm.kmem_size_min and vm.kmem_size_max as well as vfs.zfs.arc_min and vfs.zfs.arc_max.  With a little trial and error, I came up with appropriate settings for my little box and eliminated the crashes; however, the transfer performance would drop off as rsync ran over time to replicate my critical data.  Finally, I came across references to problems with the Western Digital 1.5 TB (WD15EADS) Green drives that I am using.

The drives have a 4KB physical sector but report 512 bytes to the BIOS.  So performance drops off on really big writes because zfs on FreeBSD sends 4KB of data to the drive as 8 separate writes of 512 bytes, which forces the firmware in the drive into read-modify-write cycles: the 1st 512 bytes costs a 4KB write, and the 2nd through 8th each cost a 4KB read plus a 4KB write, so 4KB of logical writes becomes 4KB write + (4KB read + 4KB write) X (4KB/512 bytes - 1) = 60KB of internal I/O, roughly a 15X increase in the drive's workload.  The drive's built-in 32 MB cache helps until it fills, and then the zfs arc kicks in and begins to fill.  So all in all, no big deal, right?
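The arithmetic above is easy to sanity-check.  This little sketch assumes the worst case described: every 512-byte sub-write after the first triggers a full read-modify-write of a 4KB physical sector.

```shell
# Back-of-the-envelope write amplification for a 512e drive with 4KB sectors.
sector_kb=4    # physical sector size in KB
logical_kb=4   # one 4KB zfs write, sent as 8 x 512-byte writes
sub_writes=8   # 4KB / 512 bytes

# 1 full-sector write + 7 read-modify-write cycles (read 4KB + write 4KB each)
internal_kb=$(( sector_kb + (sub_writes - 1) * (sector_kb + sector_kb) ))

echo "internal I/O: ${internal_kb} KB for ${logical_kb} KB of logical writes"
echo "amplification: $(( internal_kb / logical_kb ))x"
```

So 4KB of logical writes turns into 60KB of internal drive traffic, a 15x amplification before the drive's cache fills.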

Actually it is a very big deal if you are writing files to zfs that are larger than your arc plus the size of the buffer on the drive.  And because of the behavior of the zfs arc cache code on FreeBSD (notice I am not calling it a bug, because I don’t know enough to point to where it is), the allocated memory is not made available for re-use at a rate fast enough to sustain the transfer speed, and the throughput drops over time.

You can observe this yourself by executing a copy and watching the free memory in top drop while the inactive memory increases.  This is further exacerbated by memory in the wired pool not being marked as inactive quickly enough.  This “appears” to me to indicate that FreeBSD and/or zfs is too aggressive in grabbing memory for caching relative to the rate at which it releases it.  This results in transfer speeds well below those of other operating systems running on the same hardware.
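If you prefer numbers to eyeballing top, a loop like this works on FreeBSD (a sketch; the counters are in pages, and I'm assuming the standard vm.stats sysctl names):

```
# Sample the free, inactive, and wired page counts every 2 seconds
# while a big copy runs; free falls while inactive and wired climb.
while true; do
    sysctl vm.stats.vm.v_free_count vm.stats.vm.v_inactive_count \
        vm.stats.vm.v_wire_count
    sleep 2
done
```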

Fortuitously, FreeBSD provides gnop, a drive geometry abstraction layer, to create another lie to offset the lie the drive tells the BIOS (512-byte sectors instead of 4KB sectors).  Unfortunately, this layer is not saved as metadata on the drive, so it will not persist through a reboot.  Fortunately, I found a script on a Japanese web site (nothing on the site was in English except for the script and the tests) that I used to create the gnop geometry entries prior to starting zfs.
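The gist of that script, as I understand gnop, is along these lines.  The device names are placeholders, and since the .nop providers vanish at reboot, this has to run (e.g. from an rc script) before the pool is imported.

```
# Lie back: present each drive as a 4KB-sector provider
gnop create -S 4096 /dev/ad4
gnop create -S 4096 /dev/ad6

# Import the pool so it attaches to ad4.nop and ad6.nop
zpool import tank
```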

This significantly increased my performance; however, over time the transfer rate would still drop because of the aforementioned memory allocation issue (note I didn’t say problem or bug ;).  But I found another person who created a one-line perl command (yes, perl to the rescue!!! who needs a stinking snake ;) that tries to allocate an exorbitant amount of memory.  This triggers the FreeBSD memory management to release the memory, and the kernel kills the overreaching little process to boot.  The result is that the memory is freed up for re-use.
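I won't claim this is the author's exact one-liner, but the idea is simply to ask for far more memory than the box has.  Something like this does it (the 3 GB figure is illustrative for a 2 GB box, not a tuned value; don't run it on a machine doing real work):

```
# Try to build a ~3 GB string; FreeBSD reclaims inactive pages to satisfy
# the allocation, then kills the over-reaching perl process
perl -e '$x = "A" x (3 * 1024 * 1024 * 1024)'
```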

So with the gnop geometry implemented and the perl one-liner running in cron, I am able to sustain a whopping 9-10 MB/sec transfer rate that pretty much renders the server unusable while big transfers occur.  While this isn’t great, it is much better than getting down to 1-2 MB/sec and crashing!

Fortunately, these big writes don’t occur too often, so most of the time my little low-power box can pump transfer rates on the order of 30 - 40 MB/sec as long as the files don’t exceed about 750 MB, based upon my current tunings.  I am currently in the process of implementing an L2ARC using a CF card in a CF/IDE adapter and a ZIL using a higher write speed CF card (my cheap version of SSDs).  I am also going to add a third CF on which to place the cache and log directories for sabnzbd and Sick-Beard as well as the SQLite3 database for mediatomb to ensure that these applications remain fairly responsive during periods of heavy SMB or rsync usage.  While this sounds expensive, I ordered all of the parts needed from Amazon for slightly under $100.  You could use USB flash drives with similar benefits if you want to go even cheaper.  But I am not always happy with FreeBSD’s performance with USB media, so I decided to go the CF/IDE route.
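For reference, adding the L2ARC and ZIL devices is a one-liner each once the CF cards show up as disks.  The device names below are placeholders for my CF/IDE adapters, not my actual configuration.

```
zpool add tank cache /dev/ad1   # ordinary CF card as L2ARC
zpool add tank log /dev/ad3     # higher write-speed CF card as ZIL
```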

Recommendations

I’ll put out an update in the next couple of weeks to let you know how things go along with what I think is a prudent methodology for others to use in selecting their hardware, configuring their OS and zfs, and in tuning.

If you feel like building a home server using zfs on low power, relatively wimpy hardware, take 2 aspirin and lie down until the feeling goes away :)!  If you can’t resist, here are my top ten tips:

1.     Run zfs on 64-bit capable hardware.
2.     Hyper-threading may be a problem with FreeBSD kernels and some chipsets.  The problems are both performance and stability related. Test, Test, Test!!
3.     Put as much memory as you can in the box, 2 GB minimum >4 GB recommended.
4.     Be careful choosing disks.  Stay away from advanced format disks that don’t honestly report their physical sector size. 
5.     If you use disks that report a different sector size to the BIOS than their physical one, use gnop to correct it.
6.     Use raw disks; do NOT partition them.
7.     Implement an L2ARC and ZIL using flash or SSD.
8.     Set your vm.kmem_size_max to roughly ½ your total physical memory.
9.     Set your vfs.zfs.arc_max to roughly ¾ of the vm.kmem_size.
10.  Hit the file system hard, both reads and writes, and monitor your vm.kmem_size and vfs.zfs.misc.arcstats.
11.  BONUS TIP:  Perform tests that match your expected usage pattern so that you aren’t surprised as I was when performing large transfers.
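Applying tips 8 and 9 to a 2 GB box like mine gives roughly this /boot/loader.conf fragment (illustrative values, not a universal recommendation):

```
vm.kmem_size_max="1024M"   # ~1/2 of 2 GB physical RAM
vm.kmem_size="1024M"
vfs.zfs.arc_max="768M"     # ~3/4 of vm.kmem_size
vfs.zfs.arc_min="256M"
```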

In closing, thanks to sub.mesa for his concise documentation on FreeBSD, zfs and his commentary on WD drives.  Thanks to Brendan Gregg for an excellent article on the zfs L2ARC.

While I have little doubt that zfs on FreeBSD is the most performant, reliable copy-on-write filesystem available today without spending large sums of money, I am not sure how long this will be true.  I believe that the FreeBSD release cycle and the conservative nature of its maintainers may actually be working against its users' desires in this case.

The fuse-zfs project seems to have exorcised many of its reliability demons and is now more feature-rich with its implementation of zfs (pool version 23) on Linux.  It still lags some on performance, but not by a whole lot.

btrfs appears to be coming along at a pretty fast rate.  Though both are owned/maintained by Oracle, btrfs seems to me to have a life going forward even if Oracle totally shuts down its participation; whereas zfs’ path past the currently released code seems to be dead outside of Solaris.

And while these quandaries exist, Microsoft continues to fairly quietly sell Windows Server 2008 with a very tried and tested file system with robust snapshot and performance capabilities. Flame on if you must ye ideologues of ole.  NetApp continues to sell their Filers; and Veritas continues to sell its very expensive solutions. 

I personally believe that it is time for the open source world to put its differences behind it, BSD or Linux, ext or ufs, zfs or btrfs, and pick something that can deliver a robust and performant copy-on-write filesystem with the right features!  Both zfs and btrfs have similar delivery goals but go about things somewhat differently.  In the end, I am an engineer, sigh; I care about what works reliably, what is performant, and what is supportable.

Thanks for sticking with my griping and complaining this far.  I promise I’ll be better next post ;)

lbe

Friday, August 20, 2010

Trials and Disappointments of the last year

It is hard to believe that it has been almost a year since my last update.  I assure you that this is not because I have ceased making errors or learning from them.  With Volker moving to Debian, FreeNAS being frozen on FreeBSD 7.2 (at least for now), Oracle's acquisition of Sun and the resulting demise of OpenSolaris, and my inability to achieve stability with SABnzbd and Sick-Beard hacked into embedded FreeNAS, I decided in June to hang up FreeNAS.

As you can see in my blog, I have had a lot of interest in getting to the benefits of ZFS.  I spent a lot of time investigating the possibility of going to Linux with BTRFS and had overall reached a decision to do that.  But, because of my laissez-faire nature, I really didn't want to write out and reload a couple of terabytes of data to make the transfer.  So I decided to give FreeBSD 8.1, which was in RC at the time, a chance.  While I like the allure of appliances like FreeNAS with their simple web interfaces and fences to keep you on the right path, as a long-term fiddler and IT designer, I always feel hemmed in by this, hence my desire to add SABnzbd and Sick-Beard.

While installing FreeBSD 8.1, I came close, very close, to abandoning it.  It had been years since I worked with it at the system administration level.  The differences in paradigms from Linux and the at times lacking documentation were driving me crazy.  By making some poor decisions, I got into multi-day ports compilations on my relatively weak hardware.  But in the wee hours of the mornings across a weekend, I ended up with a configuration that is stable and performant.  It meets my fileserving needs as well as lets me run a handful of other services.  My ZFS volumes cleanly imported and have been running well.  I have had two kernel panics since having it in, err hm, production, both associated with heavy writes while moving some ISO images to the server.  I have been tuning vm.kmem_size and vfs.zfs.arc_max values to attempt to get this settled down and think I may be there now.

I'll close this post out with the following observation/recommendation - If you administer, or have a mind to administer, FreeBSD directly and are being pinched by the current state of FreeNAS, give FreeBSD 8.1 a try.  It isn't nirvana, but it isn't keeping me awake at night either.

I'll post more later with more configuration information.

Cheers, lbe

About Me

Houston, Texas, United States
Geek, sometimes it's biting the head off of a chicken, sometimes it's getting hit in the head while working on something :)

Followers