
Using gcc 4.6 to compile haproxy with PGO (Profile guided optimization) on FreeBSD



Hope this will serve as a shortcut for those trying to shave every millisecond off haproxy running on FreeBSD. If you’re reading this you don’t need any explanation of what this all means, so let’s start!

Get the ingredients:

-          cd /usr/ports/lang/gcc46 && make install clean (this will install gcc 4.6 and all of its dependencies. Run your test to verify gcc is working)

-          get the latest haproxy with ‘fetch http://haproxy.1wt.eu/download/1.4/src/haproxy-1.4.20.tar.gz’

-          uncompress it wherever you like

Inside the package you’ll find a Makefile.bsd. Copy it to Makefile.bsd.profile. Edit the “profile” one, find the section # tools options and change it to:
# tools options
CC = gcc46
LD = gcc46

then add -fprofile-generate to the compiler and linker flags:

CFLAGS  = -Wall $(COPTS) $(DEBUG) -fprofile-generate
LDFLAGS = -g -fprofile-generate

Set up your CPUTYPE and other optimization switches as you like, then

make -f Makefile.bsd.profile

You’ll end up with a pretty big executable. Start it with your usual config file and let it balance your traffic for some time. The instrumented executable records runtime statistics and, when it exits, writes them to several “.gcda” files in your source tree. These are what the later compile uses.
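The second build is only as good as the profile it reads, so it may be worth checking that profile data was actually written before rebuilding. As far as I know, gcc's instrumentation writes the .gcda files when the instrumented process exits, so the count should be non-zero once the profiling haproxy has been stopped. A minimal sketch (count_gcda is a hypothetical helper name, not something shipped with haproxy or gcc):

```shell
#!/bin/sh
# count_gcda is a hypothetical helper: it prints how many .gcda profile
# files exist below a directory (default: the current directory).
count_gcda() {
    # wc -l pads its output with blanks on BSD, so strip all whitespace.
    count=$(find "${1:-.}" -type f -name '*.gcda' | wc -l | tr -d '[:space:]')
    echo "$count"
}
```

Run from the top of the haproxy source tree, count_gcda prints the number of profile files found. If it prints 0, the instrumented executable probably has not exited yet or could not write to the source tree.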

Meanwhile copy your Makefile.bsd.profile to Makefile.bsd.use, and change the CFLAGS and LDFLAGS to

CFLAGS  = -Wall $(COPTS) $(DEBUG) -fprofile-use
LDFLAGS = -g -fprofile-use
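Both Makefile variants differ from Makefile.bsd in the same four variables, so the copies can also be derived with sed instead of being edited by hand. This is only a sketch: it assumes the stock file defines CC, LD, CFLAGS and LDFLAGS each on a single line starting in the first column, and derive_pgo_makefile is a hypothetical helper name of mine.

```shell
#!/bin/sh
# derive_pgo_makefile is a hypothetical helper, not part of haproxy.
# Assumption: CC, LD, CFLAGS and LDFLAGS are each defined on one line
# starting in the first column of the source Makefile.
derive_pgo_makefile() {
    src=$1    # e.g. Makefile.bsd
    dst=$2    # e.g. Makefile.bsd.profile or Makefile.bsd.use
    flag=$3   # -fprofile-generate or -fprofile-use
    # Point CC and LD at gcc46, and append the PGO flag to the existing
    # CFLAGS and LDFLAGS values ("&" stands for the matched line).
    sed -e 's/^CC[[:space:]]*=.*/CC = gcc46/' \
        -e 's/^LD[[:space:]]*=.*/LD = gcc46/' \
        -e "s/^CFLAGS[[:space:]]*=.*/& $flag/" \
        -e "s/^LDFLAGS[[:space:]]*=.*/& $flag/" \
        "$src" > "$dst"
}
```

For example, derive_pgo_makefile Makefile.bsd Makefile.bsd.profile -fprofile-generate would produce the first variant, and the same call with Makefile.bsd.use and -fprofile-use the second. Check the result by eye before building, since your Makefile.bsd may not match the assumptions above.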

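Now stop your running haproxy. Let it exit normally (for example with a soft stop via SIGUSR1) rather than with kill -9, because the profile data is only written when the process exits cleanly. You may want to keep a copy of the big executable for your records.

make -f Makefile.bsd.use clean && make -f Makefile.bsd.use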

This will rebuild the executable using the information gathered at runtime. In my case the executable is about 10% smaller than one compiled with the same options and no PGO.
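The size comparison is easy to repeat if you keep a non-PGO build around. A small sketch, where size_saving_percent is a hypothetical helper name of mine and the file names in the example below are placeholders:

```shell
#!/bin/sh
# size_saving_percent is a hypothetical helper: it prints by how many
# percent (truncated to a whole number) the second file is smaller than
# the first. A negative result means the second file is larger.
size_saving_percent() {
    base=$(wc -c < "$1" | tr -d '[:space:]')
    pgo=$(wc -c < "$2" | tr -d '[:space:]')
    if [ "$base" -eq 0 ]; then
        echo "size_saving_percent: $1 is empty" >&2
        return 1
    fi
    echo $(( (base - pgo) * 100 / base ))
}
```

For instance, size_saving_percent haproxy.nopgo haproxy compares a saved non-PGO executable (placeholder name) with the freshly built one. File size is only a rough indicator and says nothing about speed.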

How to benchmark

I don’t really know, but I started from the end. I have long-term CPU usage statistics for the machines that run haproxy. After these first few hours (not statistically valid, I know) I can say I use some 7% less CPU compared to haproxy compiled with Clang 3 (not the stock gcc 4.2). I’ll post an update when valid data is available.

The platform I’m running on is FreeBSD 8.1 i386 (beware of using CPUTYPE=native on this platform!).

Disclaimer

My compile-fu is not that strong, so if you find flaws in my procedure, please contact me so that I can update the document.

Results (updated)

After a week of running daily loads, I cannot say whether the PGO executable is better than the other. There is a 10%/90% split between user and kernel time, so we're trying to measure something that lives within that 10%. So far, running the PGO executable shows no noticeable difference, if any, in user-mode processor usage.
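To see why the effect is so hard to measure, it helps to put numbers on the 10%/90% split. The sketch below assumes that PGO only changes user-mode time and leaves kernel time untouched. total_cpu_saving is a hypothetical helper name of mine, and the 20% figure in the example is made up for illustration, not measured.

```shell
#!/bin/sh
# total_cpu_saving is a hypothetical helper. Arguments, both in percent:
#   $1  share of total CPU time spent in user mode
#   $2  saving achieved in user-mode time
# It prints the resulting saving in total CPU time, in percent.
# Assumption: kernel time is not affected by how haproxy is compiled.
total_cpu_saving() {
    awk -v share="$1" -v gain="$2" \
        'BEGIN { printf "%.1f\n", share * gain / 100 }'
}
```

With the 10% user share above, total_cpu_saving 10 20 prints 2.0: even a hypothetical 20% saving in user-mode code would move total CPU usage by only 2%, which is easily lost in day-to-day load variation.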