Hope this will serve as a shortcut for those trying to strip every millisecond from haproxy running on FreeBSD. If you’re reading this you don’t need any explanation of what this all means, so let’s start!
Get the ingredients:
- cd /usr/ports/lang/gcc46 && make install clean (this will install gcc 4.6 and all of its dependencies. Run your test to verify gcc is working)
- get latest haproxy with ‘fetch http://haproxy.1wt.eu/download/1.4/src/haproxy-1.4.20.tar.gz’
- uncompress it wherever you like
In the package you’ll find a Makefile.bsd. Copy it to Makefile.bsd.profile. Edit the “profile” one, find the section “# tools options” and change it to:
CFLAGS = -Wall $(COPTS) $(DEBUG) -fprofile-generate
Set up your cputype and other optimization switches as you like, then
make -f Makefile.bsd.profile
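The Makefile edit can also be scripted instead of done by hand. This is only a sketch: `add_pgo_flag` is a hypothetical helper name, and it assumes that CFLAGS and LDFLAGS are each assigned on a single line in Makefile.bsd. It appends the flag to the existing values rather than rewriting them.

```shell
#!/bin/sh
# Hypothetical helper: copy a makefile, appending one flag to the
# CFLAGS and LDFLAGS assignments. Assumes each variable is assigned
# on a single line that starts with "CFLAGS =" or "LDFLAGS =".
add_pgo_flag() {
    src=$1
    dst=$2
    flag=$3
    sed -e "/^CFLAGS[[:space:]]*=/ s/\$/ $flag/" \
        -e "/^LDFLAGS[[:space:]]*=/ s/\$/ $flag/" \
        "$src" > "$dst"
}

# Usage, from the unpacked haproxy directory:
#   add_pgo_flag Makefile.bsd Makefile.bsd.profile -fprofile-generate
#   add_pgo_flag Makefile.bsd Makefile.bsd.use     -fprofile-use
```

Both derived makefiles start from the untouched Makefile.bsd, so the only difference between the two builds is the profiling flag.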
You’ll end up with a pretty big executable. Start it with your usual config file and let it balance your traffic for some time. It collects runtime statistics that gcc will use in the later compile, and writes them as “.gcda” files in your source tree when the process exits.
Meanwhile copy your Makefile.bsd.profile to Makefile.bsd.use, and change the CFLAGS and LDFLAGS to
CFLAGS = -Wall $(COPTS) $(DEBUG) -fprofile-use
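Now stop your running haproxy. Let it exit normally (a soft stop is the safe choice) rather than hard-killing it, because gcc’s instrumentation only writes the profile data when the process exits cleanly. You may keep a copy of the big executable for your records.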
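Before rebuilding, it may be worth checking that the profiling run actually left data behind. This is a minimal sketch: `count_gcda` and `require_profile_data` are hypothetical helper names, and it assumes the “.gcda” files land somewhere under the source tree.

```shell
#!/bin/sh
# Count the gcc profile data files below a directory (default: current dir).
count_gcda() {
    find "${1:-.}" -type f -name '*.gcda' | wc -l | tr -d ' '
}

# Return non-zero, with a hint on stderr, when no profile data exists.
require_profile_data() {
    dir=${1:-.}
    n=$(count_gcda "$dir")
    if [ "$n" -eq 0 ]; then
        echo "no .gcda files under $dir; did haproxy exit cleanly?" >&2
        return 1
    fi
    echo "$n profile data files found"
}

# Usage, from the unpacked haproxy directory:
#   require_profile_data . && echo "ok to rebuild with -fprofile-use"
```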
make -f Makefile.bsd.use clean && make -f Makefile.bsd.use
This will rebuild the executable using the information gathered at runtime. In my case the executable is some 10% smaller than the one compiled with the same options and no PGO.
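To put a number on the size difference, something like the following could be used. It is a sketch only: `size_of` and `pct_smaller` are hypothetical helper names, and the file names in the usage line are placeholders for wherever you saved the two builds.

```shell
#!/bin/sh
# Size of a file in bytes (tr strips the padding BSD wc adds).
size_of() {
    wc -c < "$1" | tr -d ' '
}

# Print how much smaller the second file is than the first, in percent.
pct_smaller() {
    base=$(size_of "$1")
    pgo=$(size_of "$2")
    awk -v b="$base" -v p="$pgo" 'BEGIN { printf "%.1f\n", (b - p) * 100 / b }'
}

# Usage (placeholder file names):
#   pct_smaller haproxy.nopgo haproxy.pgo
```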
I don’t really know yet whether all this pays off: I started from the end. I have long-term CPU usage statistics for the machines that run haproxy. After these first hours, which I know is not statistically valid, I can say I use some 7% less CPU compared to haproxy compiled with Clang 3 (not stock gcc 4.2). I’ll post an update when valid data is available.
The platform I’m running on is FreeBSD 8.1 i386 (beware of using CPUTYPE=native on this platform!).
My compile-fu is not that strong, so if you find flaws in my procedure, please contact me so that I can update the document.
After a week of running daily loads, I can't tell whether the PGO executable is better than the other. There is a 10%/90% split between user and kernel time, so we're trying to evaluate something that lives in that 10%. So far there is no noticeable difference in user-mode processor usage when running the PGO executable.
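The arithmetic behind that remark can be sketched as follows. The 10% user share comes from the measurement above; the 20% user-mode saving is a purely hypothetical figure, and the sketch assumes PGO only affects user-mode time. `total_saving` is a hypothetical helper name.

```shell
#!/bin/sh
# Overall CPU saving (in percent) when only the user-mode share gets faster.
#   $1 = user-mode share of total CPU time, in percent
#   $2 = saving achieved inside user-mode code, in percent
total_saving() {
    awk -v u="$1" -v r="$2" 'BEGIN { printf "%.1f\n", u * r / 100 }'
}

# 10% user share, hypothetical 20% user-mode saving
total_saving 10 20   # prints 2.0
```

Even a generous user-mode gain shrinks to a couple of percent of total CPU, which is easily lost in day-to-day load variation.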