In March of 2013 I was working on a project to build new hardware-based MySQL servers for a fairly large (550G at the time) InnoDB database. The application fronting this database is public and, in addition to its daily tasks, runs many reports, searches, etc. Six months before, the primary server which houses this database cratered and took down the application for five days. Much sleep was lost. After resurrecting the database with help from various team members, we began looking toward the future. During the outage we tried many things, including a restore to a new piece of hardware, which failed miserably because the new machine was based on NUMA architecture. I filed this away for later use, which brings us to March 2013.
Upper management types had a TPS (Transactions Per Second) number in their heads that they wanted to see during benchmarking. I am to this day not sure where they got it. The number was 3000, and it didn’t matter that the current machine was only doing 700. I knew the number was achievable but that some major tweaking was going to have to be done at both the OS level and the MySQL level. The company is standardized on RHEL 6, which made OS performance tuning a breeze and is a tale for another article. Once the OS-level tuning was completed it was on to MySQL tuning and benchmarking.
After several benchmarks it was obvious we were nowhere near the 3000 TPS mark desired by upper management. We were closer to 1600. Still better than the 700 of the original server, but not there yet. After installing Linux perf and using “perf top” I began seeing a pattern during bench tests with over 512 concurrent threads. The kernel would suddenly spend a lot of time waiting for memory locations. Research commenced and I came across this guy’s blog post. After quickly putting together a poor man’s profiler I discovered that while I was having similar issues with mutexes, they were not the same as described in the aforementioned post but were still related to the same area of processing; that is to say, memory contention. It is worth mentioning here that the machine being tested was a 32-core machine with 384G of physical RAM and an InnoDB buffer pool (innodb_buffer_pool_size) of around 290G.
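For readers who have not met a poor man’s profiler before, the idea is to dump every thread’s backtrace with gdb and count how many threads share the same call stack; a pile of threads parked on the same mutex shows up right away. What follows is only a minimal sketch and not the exact script used here. The function names collapse_stacks and count_stacks are made up for illustration, and it assumes gdb’s usual “Thread N” / “#N” backtrace layout.

```shell
#!/bin/sh
# Poor man's profiler sketch: collapse gdb backtraces into per-stack counts.
# collapse_stacks and count_stacks are hypothetical helper names.

# collapse_stacks reads "thread apply all bt" output on stdin and prints
# one comma-joined call stack per thread.
collapse_stacks() {
    awk '
        /^Thread/ { if (s != "") print s; s = "" }
        /^#/ {
            # Frames look like "#1  0x... in func (...)" or "#0  func (...)".
            fn = ($2 ~ /^0x/) ? $4 : $2
            s = (s == "") ? fn : s "," fn
        }
        END { if (s != "") print s }
    '
}

# count_stacks ranks identical stacks by how many threads share them.
count_stacks() {
    collapse_stacks | sort | uniq -c | sort -rn
}

# Usage against a live server (needs gdb and permission to attach):
#   gdb -batch -ex "set pagination 0" -ex "thread apply all bt" \
#       -p "$(pidof mysqld)" 2>/dev/null | count_stacks
```

Keep in mind that attaching gdb pauses the process while the backtraces are collected, so this is something to run during a bench test rather than against a busy production server.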
I decided to give tcMalloc a shot. tcMalloc is part of a suite of tools written by Google called gPerfTools or “Google Performance Tools”. It includes several performance and multi-threading tools that replace or enhance native Linux libraries. You can choose to invoke them however you like. For tcMalloc itself, there are several versions of the library available. After researching what each lib did (they are all available after compiling), I decided on using tcmalloc-minimal and invoking it through a mysqld wrapper while also using numactl. The wrapper is pretty simple, as you can see below.
LD_PRELOAD="/usr/local/lib/libtcmalloc_minimal.so" exec $numactl --interleave=all $mysqld "$@"
This wrapper is then called by my.cnf at startup and launches mysqld under numactl with libtcmalloc_minimal preloaded. Your mysqld.log file will prove to you it has been loaded and give you some diagnostic information. You can also see it working in perf top or any number of available performance monitoring tools. Running the same bench tests against the system again showed a large difference. The poor man’s profiler showed me that there were very few mutex issues, and perf top told me that the kernel was spending much less time trying to figure out how to use its memory. CPU and overall load dropped by 60%, and the TPS which had become known as “the number” was finally achieved.
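If you want one more sanity check beyond the log file, on Linux you can look at the memory map of the running process and see whether the library is actually mapped. Here is a small sketch of that check; lib_in_maps and lib_loaded are hypothetical helper names, and the library name pattern is an assumption you should match to whatever you built.

```shell
#!/bin/sh
# Sketch: confirm a preloaded library is mapped into a running process.
# lib_in_maps and lib_loaded are hypothetical helper names.

# lib_in_maps FILE PATTERN: succeed if any mapping listed in FILE matches PATTERN.
lib_in_maps() {
    grep -q -- "$2" "$1"
}

# lib_loaded PID PATTERN: check a live process via /proc/PID/maps (Linux only).
lib_loaded() {
    lib_in_maps "/proc/$1/maps" "$2"
}

# Usage:
#   lib_loaded "$(pidof mysqld)" libtcmalloc_minimal && echo "tcmalloc is mapped"
```

If the check fails, the usual suspects are a wrong path in LD_PRELOAD or the wrapper not being the thing that actually starts mysqld.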
Too good to be true?
Too good to be true was my first thought. I benched and benched and was finally convinced that it was “safe” enough to go into production. It was rolled out and has run for the better part of eight months with no issue. Customer accolades about performance came in and the world was a happy place. If you are waiting for “until….”, you will have to keep waiting, as for the time being no issues have been found using tcMalloc in this fashion. If there is any downside to this process, it’s the extra work involved (which I feel is minimal considering what you get).
Why doesn’t MySQL fix this?
Actually, MySQL has addressed kernel mutex issues and NUMA (apparently) in 5.6, and the company I work for is planning an upgrade to 5.6. While I unfortunately won’t be around for this, I hope to hear about its performance once completed. Percona, a spin-off of MySQL, also ships libraries natively that address this very issue. I have not tested this in any heavy fashion, but preliminary tests I have run comparing Percona with regular MySQL come back favorably.