Jeff Cogswell, Slashdot, Comparing C++ Compilers’ Parallel-Programming Performance, here. Nice summary – I expect there are a bunch of folks who fall into the “we don’t parallel program, right?” camp and cannot find the motivation to process this material.
In general, both compilers give you plenty of options for generating vectorized code. Sometimes you have to do a little work, though, to get the compiler to actually vectorize your code. But it was clear that the Intel compiler was more willing to vectorize the code. The g++ compiler resisted at times, requiring different command-line options and occasionally reworking the code a bit. This is actually no surprise: one might argue that Intel is staying a step ahead with its compilers, since it can build them for its processors before those processors are even released. The g++ team, however, can only work with what’s available to them. (That, and they don’t have huge amounts of money to pour into it the way Intel does.)
Either way, the g++ compiler held up well against the Intel compiler. I was troubled by how different the generated assembly code was between the 4.7 and 4.8.1 compilers, not just with the vectorization but throughout the code. But that’s another comparison for another day.
Lockless Inc., Auto-vectorization with gcc 4.7, here. gcc does have the virtue of being free, and with some code and build tweaks it handles much of the contemporary vectorization work. If you do not need competitive SIMD performance, it’ll do.
gcc is very good, and can auto-vectorize many inner loops. However, if the expressions get too complex, vectorization will fail. gcc also may not be able to get the most optimal form of the loop kernel. In general, the simpler the code, the more likely gcc is to give good results.
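Here is a minimal sketch of "simple" versus "harder". The function names are hypothetical and do not come from the Lockless article. The first loop is an independent element-wise kernel of the kind gcc typically vectorizes at -O3. The second feeds each iteration's result into the next, which usually defeats straightforward auto-vectorization. Actual results depend on the gcc version and target flags.

```c
#include <stddef.h>

/* Simple element-wise kernel: every iteration is independent, so gcc can
   usually turn it into SIMD code (vectorization is enabled by -O3, or by
   adding -ftree-vectorize). */
void saxpy(float *restrict y, const float *restrict x, float a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Running sum: each output depends on the total carried over from the
   previous iteration, a loop-carried dependence that blocks the simple
   one-lane-per-iteration vectorization used for saxpy. */
void prefix_sum(float *restrict y, const float *restrict x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += x[i];
        y[i] = acc;
    }
}
```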
However, you cannot expect gcc to give you the results you want without a few tweaks. You may need to add “-ffast-math” to let it treat floating-point arithmetic as associative. You will definitely need to tell the compiler about alignment and array-overlap considerations to get good code.
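The sketch below puts those tweaks into one hypothetical function, mul_and_sum. The restrict qualifiers tell gcc the arrays never overlap. __builtin_assume_aligned, a gcc builtin, promises the pointers are aligned. The floating-point total is only split into SIMD partial sums when something like -ffast-math permits reassociation. The 32-byte alignment is an assumption chosen to suit AVX, and the caller has to actually provide it.

```c
#include <stddef.h>

#define VEC_ALIGN 32 /* assumed alignment in bytes (suits AVX); callers must honor it */

/* Element-wise product plus a running total.
   - restrict: promises out, a and b never overlap, so gcc needs no
     runtime alias check.
   - __builtin_assume_aligned: promises VEC_ALIGN-byte alignment; passing a
     misaligned pointer is undefined behavior.
   - The float sum only vectorizes when reassociation is allowed
     (e.g. -O3 -ffast-math), because SIMD partial sums add the terms in a
     different order than the scalar loop. */
float mul_and_sum(float *restrict out, const float *restrict a,
                  const float *restrict b, size_t n)
{
    float *restrict o = __builtin_assume_aligned(out, VEC_ALIGN);
    const float *restrict pa = __builtin_assume_aligned(a, VEC_ALIGN);
    const float *restrict pb = __builtin_assume_aligned(b, VEC_ALIGN);
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        o[i] = pa[i] * pb[i];
        sum += o[i];
    }
    return sum;
}
```

A caller would typically declare its arrays with _Alignas(32), or allocate them with C11 aligned_alloc, so the alignment promise actually holds.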
On the other hand, gcc will still attempt to vectorize code that hasn’t been modified at all. It just won’t be able to get nearly as much of a performance improvement as you might hope.
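The hypothetical unannotated loop below shows why. Without restrict, gcc must assume dst might overlap x or y. It can still vectorize by versioning the loop: a runtime overlap check selects either a SIMD path or a scalar fallback. The extra checks, and the guarantees the compiler cannot assume, are one plausible reason untouched code gains less. Whether gcc emits such a check depends on its version and cost model.

```c
#include <stddef.h>

/* No restrict qualifiers: dst may legally overlap x or y, so the compiler
   must preserve the scalar semantics. gcc may vectorize by versioning the
   loop with a runtime overlap check and keeping a scalar fallback for the
   overlapping case. */
void add_arrays(float *dst, const float *x, const float *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = x[i] + y[i];
}
```

Passing x + 1 as dst turns the loop into a running sum, so every result depends on the one before it. That is exactly the kind of overlap the compiler has to guard against.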
However, as time passes, more inner loop patterns will be added to the vectorizable list. Thus if you are using later versions of gcc, don’t take the above results for granted. Check the output of the compiler yourself to see if it is behaving as you might expect. You might be pleasantly surprised by what it can do.
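One way to check for yourself is to compile a small candidate loop with optimization reports switched on. The file name and commands in the comment are assumptions about a typical setup. -fopt-info-vec is the reporting option in newer gcc releases; older ones such as 4.7 used -ftree-vectorizer-verbose=N for a similar purpose. The clamp loop is just an illustrative pattern with a conditional in its body.

```c
/* Hypothetical test file, e.g. clamp.c. Typical ways to inspect it:
 *   gcc -O3 -fopt-info-vec-optimized -c clamp.c   (report loops that were vectorized)
 *   gcc -O3 -fopt-info-vec-missed -c clamp.c      (report why others were not)
 *   gcc -O3 -S clamp.c                            (read the generated assembly in clamp.s)
 */
#include <stddef.h>

/* A conditional inside the loop body: gcc can only vectorize this if it
   converts the branch into a branch-free select (if-conversion), which
   makes it a useful pattern to check across compiler versions. */
void clamp_negative(float *restrict out, const float *restrict in, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] > 0.0f ? in[i] : 0.0f;
}
```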