Pink Iguana


ICC, G++, Clang Benchmarks and Valgrind 3.9

Jeff Cogswell, Slashdot, Speed Test: Comparing Intel C++, GNU C++, and LLVM Clang Compilers, here. You can sort of tell how people get confused with compiler benchmarks. These folks prioritize benchmarking templates and time to compile, which is perfectly reasonable. However, if you have competitive numerical code that extensively uses templates, then either something is wrong or you are from a distant alien future. For me, I would like optimization switches on my compiler that would set it off compiling for a solid week, searching for the optimal set of machine instructions to translate my code into. I would like Appel’s VST and an SDN Network OS hooked up to some massively competitive optimizing compiler (ICC?) where I input my code and a theorem about the code’s execution, and the compiler goes away for a period of time and then returns a correctness proof of the theorem and an optimized executable version of the code. The more powerful the optimization capabilities of the compiler, the better. I don’t really want the compiler to quickly spit out some detuned code that I am then going to have to run over and over in some highly competitive environment. Take your time and generate some code that doesn’t suck too much. Fast, optimized, competitive code on a Haswell/Broadwell with a correctness proof – done.

When it comes to compilers (our current focus), we need to look at the specific language we’re compiling and how it gets compiled; and we want to keep things realistic. In benchmarking C++ compilers, we need to factor in, for example, how the compiler deals with templates. Templates are instantiated at compile time, and loads of templates can have an impact on how long it takes to compile. But we also need to consider the real world: Exactly how many templates are we compiling in a typical production application? If our code uses a couple dozen templates, does it matter if compiling ten million templates takes a bit longer on one compiler than on another? We could argue that if a developer is working at a computer and compiling hundreds of times a day, then yes, it does matter. As a software developer myself, I know how being forced to wait a few extra seconds on every compilation can really add up over the course of the day.
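To make “templates are compiled, not run” concrete, here is a minimal sketch of my own (a toy, not anything from Cogswell’s benchmark): mentioning Fib<40> forces the compiler to instantiate a whole chain of structs, so the arithmetic is paid for at compile time and the binary just prints a constant.

#include <cstdio>

// Each Fib<N> the program mentions forces the compiler to instantiate
// Fib<N-1> and Fib<N-2> as well; the work happens during compilation.
template <unsigned N>
struct Fib {
    static const unsigned long long value = Fib<N - 1>::value + Fib<N - 2>::value;
};

template <> struct Fib<1> { static const unsigned long long value = 1; };
template <> struct Fib<0> { static const unsigned long long value = 0; };

int main() {
    unsigned long long v = Fib<40>::value;  // 102334155, computed at compile time
    std::printf("%llu\n", v);
    return 0;
}

Scale the depth of instantiation up and compile time grows; the run time of the resulting program does not.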

Another aspect of benchmarking a compiler is not how long the compiler takes to compile, but the quality of the resulting code. That gets tricky as well, because there are so many compiler options that can skew the results. One compiler might have switches to generate vectorized code turned on by default, while another compiler doesn’t. The resulting code from the first compiler would be vectorized and may run faster than the code from the second, while the person who performed the benchmark, unaware of vectorization in the first place, may come to the possibly incorrect conclusion that the second compiler “generates slower code.” And there are different levels of vectorization, each with its own switch for different architectures, such as older SIMD architectures with 128-bit registers or newer ones with 512-bit registers.
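A hedged illustration of why the switches matter (the file name and flags below are illustrative; check each compiler’s documentation): a plain saxpy loop whose emitted code may be scalar, 128-bit SIMD, or wider, depending entirely on the options and not on the source.

#include <cstddef>
#include <cstdio>
#include <vector>

// A plain saxpy loop: whether the generated code is scalar or SIMD
// depends on optimization and architecture switches, not on this source.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    for (std::size_t i = 0; i < y.size(); ++i)
        y[i] = a * x[i] + y[i];
}

int main() {
    std::vector<float> x(1u << 20, 1.0f), y(1u << 20, 2.0f);
    saxpy(3.0f, x, y);
    std::printf("%f\n", y[0]);  // expect 5.000000
    return 0;
}

// Illustrative builds:
//   g++     -O3 -march=native -fopt-info-vec          saxpy.cpp
//   clang++ -O3 -march=native -Rpass=loop-vectorize   saxpy.cpp
//   icpc    -O3 -xHost -qopt-report=2                 saxpy.cpp
// Comparing default builds across compilers can silently pit
// vectorized code against scalar code.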

Julian Seward, LWN.net, Valgrind 3.9.0 released, here.

We are pleased to announce a new release of Valgrind, version 3.9.0,
available from http://www.valgrind.org.

3.9.0 is a feature release with many improvements and the usual
collection of bug fixes.  This release adds support for MIPS64/Linux,
Intel AVX2 instructions and POWER8 instructions.  DFP support has been
added for S390.  Initial support for hardware transactional memory has
been added for Intel and POWER platforms.  Support for Mac OS X 10.8
(Mountain Lion) has been improved.  Accuracy of Memcheck on vectorized
code has been improved.
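For context on the Memcheck item, here is a minimal sketch of my own showing the kind of defect it catches: a heap over-read in a loop that an optimizing compiler may well vectorize. The file name and flags are standard Valgrind/GCC usage, nothing specific to 3.9.0.

#include <cstdio>
#include <cstdlib>

int main(int argc, char** argv) {
    // Size comes from the command line so the compiler cannot fold the loops away.
    int n = (argc > 1) ? std::atoi(argv[1]) : 1024;
    int* data = new int[n];
    for (int i = 0; i < n; ++i) data[i] = i;

    // Deliberate off-by-one: reads one element past the end of the heap block.
    long sum = 0;
    for (int i = 0; i <= n; ++i)
        sum += data[i];

    std::printf("%ld\n", sum);
    delete[] data;
    return 0;
}

// Standard usage:
//   g++ -g -O2 -march=native overread.cpp -o overread
//   valgrind --tool=memcheck ./overread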
