
Floating point peak performance of Kaveri and other recent AMD and Intel chips

Rahul Garg, AnandTech, Floating point peak performance of Kaveri and other recent AMD and Intel chips, here.

The peak CPU performance will depend on the SIMD ISA that your code was written and compiled for. We consider three cases: SSE, AVX (without FMA) and AVX with FMA (either FMA3 or FMA4).

 

CPU floating-point peak performance

Platform                  Kaveri    Trinity   Llano     Haswell   Ivy Bridge
Chip                      7850K     5800K     3870K     4770K     3770K
CPU frequency             3.7 GHz   3.8 GHz   3.0 GHz   3.5 GHz   3.5 GHz
SSE fp32 (/cycle)         16        16        32        32        32
SSE fp64 (/cycle)         8         8         16        16        16
AVX fp32 (/cycle)         16        16        -         64        64
AVX fp64 (/cycle)         8         8         -         32        32
AVX FMA fp32 (/cycle)     32        32        -         128       -
AVX FMA fp64 (/cycle)     16        16        -         64        -
SSE fp32 (gflops)         59.2      60.8      96        112       112
SSE fp64 (gflops)         29.6      30.4      48        56        56
AVX fp32 (gflops)         59.2      60.8      -         224       224
AVX fp64 (gflops)         29.6      30.4      -         112       112
AVX FMA fp32 (gflops)     118.4     121.6     -         448       -
AVX FMA fp64 (gflops)     59.2      60.8      -         224       -
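
The gflops rows in the CPU table follow from the per-cycle rows multiplied by the clock frequency. As a rough sketch of that arithmetic (the names `CLOCK_GHZ`, `FP32_PER_CYCLE` and `peak_gflops` are made up for illustration, and `None` marks an ISA the chip does not support), the fp32 rows could be reproduced like this:

```python
# Peak GFLOPS = (flops per cycle for the whole chip) x (clock in GHz).
# All numbers are copied from the CPU table; None marks an unsupported ISA.
CLOCK_GHZ = {
    "Kaveri": 3.7, "Trinity": 3.8, "Llano": 3.0,
    "Haswell": 3.5, "Ivy Bridge": 3.5,
}

FP32_PER_CYCLE = {
    "SSE":     {"Kaveri": 16, "Trinity": 16, "Llano": 32,
                "Haswell": 32, "Ivy Bridge": 32},
    "AVX":     {"Kaveri": 16, "Trinity": 16, "Llano": None,
                "Haswell": 64, "Ivy Bridge": 64},
    "AVX FMA": {"Kaveri": 32, "Trinity": 32, "Llano": None,
                "Haswell": 128, "Ivy Bridge": None},
}


def peak_gflops(flops_per_cycle, clock_ghz):
    """Return peak GFLOPS rounded to one decimal, or None if unsupported."""
    if flops_per_cycle is None:
        return None
    return round(flops_per_cycle * clock_ghz, 1)


for isa, row in FP32_PER_CYCLE.items():
    for chip, per_cycle in row.items():
        print(f"{isa:8s} {chip:11s} {peak_gflops(per_cycle, CLOCK_GHZ[chip])}")
```

The fp64 rows work the same way with half the per-cycle figures.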

It is no secret that AMD’s Bulldozer family cores (Steamroller in Kaveri and Piledriver in Trinity) are no match for recent Intel cores in FP performance due to the shared FP unit in each module. As a comparison point, one core in Haswell has the same floating point performance per cycle as two modules (or four cores) in Steamroller.
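
The per-core comparison can be checked from the whole-chip AVX FMA fp32 figures in the CPU table. This is only a sketch: the core count for the 4770K (four) is an assumption not stated in the table, and the two-module layout of Kaveri is read off the sentence above.

```python
# Whole-chip AVX FMA fp32 flops per cycle, taken from the CPU table.
HASWELL_FMA_FP32_PER_CYCLE = 128
KAVERI_FMA_FP32_PER_CYCLE = 32

HASWELL_CORES = 4    # assumption: the 4770K is a quad-core part
KAVERI_MODULES = 2   # two modules (four cores), per the text

per_haswell_core = HASWELL_FMA_FP32_PER_CYCLE // HASWELL_CORES
per_kaveri_module = KAVERI_FMA_FP32_PER_CYCLE // KAVERI_MODULES

# How many Steamroller modules match one Haswell core per cycle.
modules_per_haswell_core = per_haswell_core // per_kaveri_module

print(per_haswell_core, per_kaveri_module, modules_per_haswell_core)
```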

Now onto GPU peaks. Here, for Haswell, we chose to include both GT2 and GT3e variants.

GPU floating-point peak performance

Platform                  Kaveri    Trinity   Llano     Haswell GT3e   Haswell GT2   Ivy Bridge
Chip                      7850K     5800K     3870K     4770R          4770K         3770K
GPU frequency             720 MHz   800 MHz   600 MHz   1.3 GHz        1.25 GHz      1.15 GHz
fp32/cycle                1024      768       800       640            320           256
fp64/cycle (OpenCL)       64        48**      0         0              0             0
fp64/cycle (Direct3D)     64        0?        0         160            80            64
fp32 gflops               737.3     614       480       832            400           294.4
fp64 gflops (OpenCL)      46.1      38.4**    0         0              0             0
fp64 gflops (Direct3D)    46.1      0?        0         208            100           73.6

The fp64 support situation is a bit of a mess because some GPUs support fp64 only under certain APIs. The fp64 rate of Intel’s GPUs does not appear to be published, but David Kanter provides an estimate of 1/4 of the fp32 rate. However, Intel enables fp64 only under DirectCompute, not under OpenCL, on any of its GPUs.
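
To make the estimate concrete, the Direct3D fp64 entries for the Intel GPUs can be derived from the fp32 figures under the assumed 1/4 rate. The sketch below uses hypothetical names (`INTEL_GPUS`, `fp64_estimate`); the 1/4 factor is the estimate quoted above, not a published Intel figure.

```python
# (fp32 flops per cycle, GPU clock in GHz), copied from the GPU table.
INTEL_GPUS = {
    "Haswell GT3e (4770R)": (640, 1.3),
    "Haswell GT2 (4770K)": (320, 1.25),
    "Ivy Bridge (3770K)": (256, 1.15),
}

FP64_RATE = 0.25  # estimated fp64:fp32 ratio, not an official number


def fp64_estimate(fp32_per_cycle, clock_ghz, rate=FP64_RATE):
    """Return (fp64 flops per cycle, fp64 GFLOPS) under the assumed rate."""
    per_cycle = fp32_per_cycle * rate
    return per_cycle, round(per_cycle * clock_ghz, 1)


for name, (fp32_per_cycle, clock_ghz) in INTEL_GPUS.items():
    per_cycle, gflops = fp64_estimate(fp32_per_cycle, clock_ghz)
    print(f"{name}: {per_cycle:.0f} fp64/cycle, {gflops} gflops")
```

These figures apply only where fp64 is exposed, which for Intel means DirectCompute; under OpenCL the effective fp64 rate is zero.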
