Pink Iguana

Gather and Scatter SIMD

Kirill R., 13 May 2016, Intel Advisor Documentation, Intel Advisor MAP Gather/Scatter, here.

Knights Landing introduces an Intel® Advanced Vector Extensions 512 (Intel® AVX-512) v(p)gather instruction that generally performs better and is more widely applicable and flexible than the v(p)gather instructions in Intel® Advanced Vector Extensions 2 (Intel® AVX2) or Knights Corner (which is IMCI ISA-based). The Intel AVX-512 gather (and scatter) instructions support various combinations of index vs. offset vs. vector width, and introduce an explicit mask argument. Figure 10.14 provides a typical example of vgather instruction operands and corresponding Intel Intrinsic function syntax.
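To make the operand roles concrete, here is a minimal scalar sketch of what a masked gather of 32-bit floats with 32-bit indices computes. It is only a model for reasoning about semantics, not Intel's implementation and not a reproduction of Figure 10.14; the function name `masked_gather_f32_model` and the `LANES` constant are made up for this illustration. The argument order loosely mirrors the AVX-512 intrinsic `_mm512_mask_i32gather_ps(src, k, vindex, base_addr, scale)`, where the scale must be 1, 2, 4, or 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES 16 /* 512-bit vector / 32-bit elements (illustrative constant) */

/* Hypothetical scalar model of a masked gather:
 * for each lane i, if bit i of mask is set, load a float from
 * base + idx[i] * scale (a byte offset); otherwise keep src[i]. */
void masked_gather_f32_model(float dst[LANES], const float src[LANES],
                             uint16_t mask, const int32_t idx[LANES],
                             const void *base, int scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    const unsigned char *bytes = (const unsigned char *)base;

    for (int i = 0; i < LANES; ++i) {
        if (mask & (1u << i)) {
            /* memcpy avoids alignment and aliasing assumptions */
            memcpy(&dst[i], bytes + (ptrdiff_t)idx[i] * scale, sizeof(float));
        } else {
            dst[i] = src[i]; /* masked-off lane passes through unchanged */
        }
    }
}
```

With `scale` set to `sizeof(float)`, `idx[i]` is an ordinary element index into a `float` table. Each lane reads from an independent address, which is part of the reason the hardware instruction costs more than a single contiguous load.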

However, Intel AVX-512 code utilizing v(p)gather (and newly introduced v(p)scatter) instructions still demonstrates substantially worse performance than similar code using contiguous vector data load/store. While gather/scatter-based vectorized code is faster than its scalar (or Intel AVX2/IMCI) counterpart, it is still wise to look for opportunities to improve or avoid it.
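One common way to avoid gather/scatter is to change the data layout so the hot loop touches memory contiguously. The sketch below is an illustration under assumed names (`ParticleAoS`, `ParticlesSoA`, `scale_x_aos`, and `scale_x_soa` are all hypothetical, not taken from the Intel material). In the array-of-structures version the `x` fields sit 16 bytes apart, so a vectorizing compiler may have to use gathers and scatters or extra shuffles; in the structure-of-arrays version the same update is a unit-stride load and store.

```c
#include <stddef.h>

/* Array of structures: x values are strided in memory. */
typedef struct { float x, y, z, w; } ParticleAoS;

/* Structure of arrays: each field is its own contiguous array. */
typedef struct { float *x, *y, *z, *w; } ParticlesSoA;

/* Strided access: consecutive iterations read and write addresses
 * sizeof(ParticleAoS) bytes apart. */
void scale_x_aos(ParticleAoS *p, size_t n, float k)
{
    for (size_t i = 0; i < n; ++i)
        p[i].x *= k;
}

/* Unit-stride access: consecutive iterations touch adjacent floats,
 * which maps onto plain contiguous vector loads and stores. */
void scale_x_soa(ParticlesSoA *p, size_t n, float k)
{
    float *x = p->x;
    for (size_t i = 0; i < n; ++i)
        x[i] *= k;
}
```

Both functions compute the same result; whether the layout change pays off depends on how the other fields are accessed elsewhere in the program, so it is worth measuring rather than assuming.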

Intel, 18 Aug 2018, User and Reference Guide for the Intel C++ Compiler 15.0, Intrinsics for Integer Gather and Scatter Operations, here.  Intrinsics for FP Gather and Scatter Operations, here.

Overview: Intrinsics Reference

Intrinsics are assembly-coded functions that allow you to use C++ function calls and variables in place of assembly instructions.

Intrinsics are expanded inline, eliminating function call overhead. While providing the same benefit as inline assembly, intrinsics improve code readability, assist instruction scheduling, and help reduce debugging effort.

Intrinsics provide access to instructions that cannot be generated using the standard constructs of the C and C++ languages.
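As a small illustration of the points above, the following sketch adds two float arrays using SSE intrinsics from `<xmmintrin.h>`, with each intrinsic call standing in for one vector instruction. The function name `add_f32` and the `HAVE_SSE` macro are invented for this example, and the preprocessor guard is an assumption added so the code still compiles as plain scalar C on targets without SSE.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_add_ps, ... */
#define HAVE_SSE 1
#endif

/* dst[i] = a[i] + b[i] for i in [0, n) */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#ifdef HAVE_SSE
    /* Process four floats per iteration with unaligned loads and stores. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar loop handles the tail, or everything when SSE is unavailable. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

The vector values live in ordinary C variables of type `__m128`, and the compiler handles register allocation and instruction scheduling, which is the readability benefit described above.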

Intrinsics for Intel® C++ Compilers

The Intel® C++ Compiler enables easy implementation of assembly instructions through the use of intrinsics. Intrinsics are provided for the following instructions:

  • Intel® Many Integrated Core (Intel® MIC) instructions
  • Intel® Advanced Vector Extensions 2 (Intel® AVX2) instructions
  • Intel® Advanced Vector Extensions (Intel® AVX) instructions
  • Intel® Streaming SIMD Extensions 4 (Intel® SSE4) instructions
  • Intel® Supplemental Streaming SIMD Extensions 3 (SSSE3) instructions
  • Intel® Streaming SIMD Extensions 3 (Intel® SSE3) instructions
  • Intel® Streaming SIMD Extensions 2 (Intel® SSE2) instructions
  • Intel® Streaming SIMD Extensions (Intel® SSE) instructions
  • MMX™ Technology instructions
  • Carry-less Multiplication instruction and Advanced Encryption Standard Extensions instructions
  • Half-float conversion instructions

The Short Vector Math Library (svml) intrinsics are documented in this reference.

NOTE

Many routines in the svml library are more highly optimized for Intel® microprocessors than for non-Intel® microprocessors.

 

 

