Georg Hager’s Blog, Intel vs. GCC for the OpenMP vector triad: Barrier shootout! here. If Hager keeps this up he will be first ballot Fun People.
Looking at the results for the two compilers we see that GCC has not learned anything over the last five years (this is for how long we have been comparing compilers in terms of OpenMP barrier overhead): The barrier takes roughly a factor of 20 longer with gcc than with the Intel compiler. Comparing with the ECM performance model  for the vector triad we see that the Intel compiler’s barrier is fast enough to at least get near the performance limit in the L2 cache (blue dashed line). Both compilers are on par where it’s easy, i.e., in L3 cache and memory, where the loop is so long that the overhead is negligible.
Note that the bad performance of g++ in this benchmark is not due to some “magic” compiler option that I’ve missed. It’s the devastatingly slow OpenMP barrier.