09 January 2008 - 21:51
Isode Revisited
A few years ago Isode published a benchmark of their version 10.1 server against an unspecified version of OpenLDAP (most likely 2.0).
http://www.isode.com/whitepapers/m-vault-benchmarking.html
Recently we at Symas were given the opportunity to work with Isode to test out their new 14.0 server with a completely rewritten backend, so we took this chance to compare against the current OpenLDAP 2.4.7 release.
Using the same AMD quad-core server as in our CPU scaling tests, we obtained only 4632 auths/second using 8 cores with Isode on a 1 million entry DB, compared to the 29048 auths/second we got with 8 cores on OpenLDAP. The DB load times were quite comparable, with OpenLDAP loading the DB in 3:05 (m:ss) and Isode loading the same LDIF in 4:02. I suspect much of the load-time disparity is because I'm using BDB 4.6.21 with OpenLDAP, while Isode is still using BDB 4.5.20. (After rebuilding OpenLDAP with BDB 4.5.20, this load took 3:57, so clearly Isode's bulk loader is on par with OpenLDAP's.)
I would have run a test with 5 million entries next, but Steve Kille @ Isode asked "how would it behave with a DB much larger than the system memory, like 50 million entries?" The last time we tested OpenLDAP with 50 million entries was on an SGI Altix with 8 processors and 128GB of RAM; with only 16GB of RAM this would mainly be a test of disk speed, but I figured what the heck. It's still an interesting test, because the other UMich-derived LDAP servers using LDBM-based backends still can't operate at this scale without self-corrupting. It took 11:27:40 to slapadd this LDIF file using the XFS filesystem, resulting in a 69GB database on disk. OpenLDAP delivered only 160 auths/second on this database.
The first attempt to load the DB into Isode failed because their bulk loading tool was (erroneously) trying to cache all of its processed entries in RAM while loading. Not surprisingly, it ran out of memory after loading 10,470,200 entries. After receiving a patched tool from Isode, I found that it appeared to complete its load in about 4 hours, but then the process hung waiting for the filesystem driver, and even after 20 hours it hadn't recovered. I rebooted and retried this load a couple of times, getting the same hang each time. I finally gave up and recreated the data disk using ext2 instead of XFS, after which it successfully loaded the database in only 3:18:29, over 3 times faster than OpenLDAP on XFS. After a couple of hours of querying to prime the cache, it delivered 8 auths/second.
After changing the filesystem, I loaded OpenLDAP again using ext2; this load took 6:59:25 (versus 11:27:40 on XFS), and the server delivered 196 auths/second (versus 160 on XFS). Apparently the choice of filesystem can have a large impact on disk I/O.
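To put the authentication rates side by side, here is a small sketch that only recomputes ratios from the figures quoted in this post. The dictionary keys and the `ratio` helper are names made up for illustration; nothing here comes from either server's tooling.

```python
# Authentication rates (auths/second) as quoted in this post.
# The keys are illustrative labels, not anything the servers report.
rates = {
    ("OpenLDAP", "1M"): 29048,
    ("Isode", "1M"): 4632,
    ("OpenLDAP", "50M, XFS"): 160,
    ("OpenLDAP", "50M, ext2"): 196,
    ("Isode", "50M, ext2"): 8,
}


def ratio(faster, slower):
    """Return how many times higher the first rate is than the second."""
    return rates[faster] / rates[slower]


comparisons = [
    ("OpenLDAP vs Isode, 1M entries",
     ("OpenLDAP", "1M"), ("Isode", "1M")),
    ("OpenLDAP vs Isode, 50M entries on ext2",
     ("OpenLDAP", "50M, ext2"), ("Isode", "50M, ext2")),
    ("OpenLDAP on ext2 vs XFS, 50M entries",
     ("OpenLDAP", "50M, ext2"), ("OpenLDAP", "50M, XFS")),
]

for label, faster, slower in comparisons:
    print(f"{label}: {ratio(faster, slower):.2f}x")
```

By these numbers OpenLDAP's lead is roughly 6.3x at 1 million entries and 24.5x at 50 million entries on ext2, while ext2 versus XFS is worth a little over 1.2x for OpenLDAP at 50 million entries.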
I was still disturbed by the extremely slow slapadd times here; the database is 50x larger than the 1M DB, but the XFS load took about 220x longer (11:27:40 versus 3:05). I noticed there were long periods of time where the process was using 100% CPU but no entries were actually getting added. Debugging and profiling revealed that a lot of CPU time was being spent in BDB's __env_alloc_free() function. It seems that the BDB scalability issue we first attempted to address years ago still exists in BDB 4.6. Tweaking slapadd to use process-private memory instead of shared memory for its environment brought the load time down to only 3:04:21. We'll probably release this tweak in OpenLDAP 2.4.8.
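The scaling claim is easy to check by hand. The sketch below converts the load times quoted in this post to seconds and divides each 50M load by the 1M load; `to_seconds` and the labels are hypothetical helpers written for this illustration. The post does not say which filesystem the 3:04:21 run used, so that entry is labeled only by the tweak.

```python
def to_seconds(clock):
    """Convert an 'm:ss' or 'h:mm:ss' string into a number of seconds."""
    total = 0
    for part in clock.split(":"):
        total = total * 60 + int(part)
    return total


# slapadd load times as quoted in this post.
baseline_1m = to_seconds("3:05")  # 1M entries, BDB 4.6.21
loads_50m = {
    "XFS": "11:27:40",
    "ext2": "6:59:25",
    "process-private environment": "3:04:21",
}

# Ratio of each 50M load to the 1M load; 50x would be linear scaling.
scaling = {
    label: to_seconds(clock) / baseline_1m
    for label, clock in loads_50m.items()
}

for label, factor in scaling.items():
    print(f"50M load ({label}): {factor:.0f}x the 1M load time")
```

This works out to roughly 223x on XFS, 136x on ext2, and 60x with the process-private tweak, so the tweaked load is much closer to the 50x that linear scaling would give.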