Pink Iguana

ber, cokelley did good – got job @AT&T. jss

Abby Jackson, 25 Jun 2018, Business Insider, IT’S OFFICIAL: AT&T is buying the digital ad firm AppNexus, here.

  • AT&T has purchased the digital ad firm AppNexus for an undisclosed amount.

  • The transaction is expected to close during the third quarter of 2018.

  • It most likely signals that the company plans to challenge the advertising titans Google and Facebook.


David Deutsch, Hennessy & Patterson (short form), and Jeff Dean

David Albert, 12 Aug 2011, The New York Times, Explaining it All: How We Became the Center of the Universe, here. My son told me this is solid, and he was right.

David Deutsch’s “Beginning of Infinity” is a brilliant and exhilarating and profoundly eccentric book. It’s about everything: art, science, philosophy, history, politics, evil, death, the future, infinity, bugs, thumbs, what have you. And the business of giving it anything like the attention it deserves, in the small space allotted here, is out of the question. But I will do what I can.

It hardly seems worth saying (to begin with) that the chutzpah of this guy is almost beyond belief, and that any book with these sorts of ambitions is necessarily, in some overall sense, a failure, or a fraud, or a joke, or madness. But Deutsch (who is famous, among other reasons, for his pioneering contributions to the field of quantum computation) is so smart, and so strange, and so creative, and so inexhaustibly curious, and so vividly intellectually alive, that it is a distinct privilege, notwithstanding everything, to spend time in his head. He writes as if what he is giving us amounts to a tight, grand, cumulative system of ideas — something of almost mathematical rigor — but the reader will do much better to approach this book with the assurance that nothing like that actually turns out to be the case. I like to think of it as more akin to great, wide, learned, meandering conversation — something that belongs to the genre of, say, Robert Burton’s “Anatomy of Melancholy” — never dull, often startling and fantastic and beautiful, often at odds with itself, sometimes distasteful, sometimes unintentionally hilarious, sometimes (even, maybe, secondarily) true.

David Deutsch and Richard Jozsa, 1992, The Royal Society, Rapid solution of problems by quantum computation, here.

A class of problems is described which can be solved more efficiently by quantum computation than by any classical or stochastic method. The quantum computation solves the problem with certainty in exponentially less time than any classical deterministic computation.

David Deutsch, 1985, Proceedings of the Royal Society of London, Quantum Theory, the Church-Turing principle and the universal quantum computer, here.

It is argued that underlying the Church-Turing hypothesis there is an implicit physical assertion. Here, this assertion is presented explicitly as a physical principle: ‘every finitely realizable physical system can be perfectly simulated by a universal model computing machine operating by finite means’. Classical physics and the universal Turing machine, because the former is continuous and the latter discrete, do not obey the principle, at least in the strong form above. A class of model computing machines that is the quantum generalization of the class of Turing machines is described, and it is shown that quantum theory and the ‘universal quantum computer’ are compatible with the principle. Computing machines resembling the universal quantum computer could, in principle, be built and would have many remarkable properties not reproducible by any Turing machine. These do not include the computation of non-recursive functions, but they do include ‘quantum parallelism’, a method by which certain probabilistic tasks can be performed faster by a universal quantum computer than by any classical restriction of it. The intuitive explanation of these properties places an intolerable strain on all interpretations of quantum theory other than Everett’s. Some of the numerous connections between the quantum theory of computation and the rest of physics are explored. Quantum complexity theory allows a physically more reasonable definition of the ‘complexity’ or ‘knowledge’ in a physical system than does classical complexity theory.

HPCWire, 17 Apr 2018, Hennessy & Patterson: A New Golden Age for Computer Architecture, here.

What an exciting time to be a computer architect!

1. Carver Mead, and Lynn Conway. “Introduction to VLSI systems,” Addison-Wesley, 1980.

2. Mark Hill. “A Primer on the Meltdown & Spectre Hardware Security Design Flaws and their Important Implications,” Computer Architecture Today Blog, February 15, 2018, https://www.sigarch.org/a-primer-on-the-meltdown-spectre-hardware-security-design-flaws-and-their-important-implications.

3. John L. Hennessy, and David A. Patterson. “Domain Specific Architectures,” in Computer architecture: a quantitative approach, Sixth Edition, Elsevier, 2018.

4. Norman P. Jouppi, Cliff Young, Nishant Patil, David A. Patterson, et al. “In-datacenter performance analysis of a Tensor Processing Unit,” in Proc. 44th Annual International Symposium on Computer Architecture, pp. 1-12. ACM, 2017.

5. Luis Ceze, Mark Hill, Karthikeyan Sankaralingam, and Thomas Wenisch. “Democratizing Design for Future Computing Platforms,” June 26, 2017, www.cccblog.org/2017/06/26/democratizing-design-for-future-computing-platforms.

6. www.riscv.org.

7. DARPA, Broad Agency Announcement. “Electronics Resurgence Initiative,” September 13, 2017.

8. Yunsup Lee, Andrew Waterman, Henry Cook, Brian Zimmer, et al. “An agile approach to building RISC-V microprocessors,” IEEE Micro 36, no. 2 (2016): 8-20.

9. David A. Patterson, and Borivoje Nikolić. “Agile Design for Hardware, Parts I, II, III,” EE Times, July 27 to August 3, 2015.

Tom Simonite, 18 Apr 2018, Wired, Google’s new AI Head is so Smart he doesn’t need AI, here. Recent 24 Apr 2018 Princeton CS talk, here.

You are not expected to understand this

David Cassel, 15 Jan 2017, The New Stack, ‘You are Not Expected to Understand this’: An Explainer on Unix’s Most Notorious Code Comment, here. After leaving Bell Labs, Murray Hill, and Princeton for Wall Street in the early ’90s, I’m interviewing with the CTO at First Boston and he says to me, after a long set of interviews, “the next interviewer who will come into the room is the world’s expert on Unix.” So he leaves the room to get the expert, and I am left to guess who is gonna walk in that door next: dmr? ken? honey? Then a young woman walks in and says “Dr. Kirby Smallberries is finishing up a meeting and will be with you shortly.” I guess ken was too busy that day?

The phrase “You are Not Expected to Understand This” is probably the most famous comment in the history of Unix.

And last month, at the Systems We Love conference in San Francisco, systems researcher Arun Thomas explained to an audience exactly what it was that they weren’t supposed to understand.

Computer science teacher Ozan Onay, who was in the audience, called it “one of my favorite talks of the day,” writing on his blog that “Nothing should be a black box, even when Dennis Ritchie says it’s ok!”

Hennessy and Patterson, 4 Jun 2018, A New Golden Age for Computer Architecture ISCA 2018 Turing Lecture, here. The slides are here. Domain Specific Architecture Rally. Nice talk – references in the slides seem good as well. If you know what you are doing, Leiserson says, you can get 62K times better performance than Python, Doh. Leiserson’s paper There’s Plenty of Room at the Top is hidden behind a journal paywall somewhere. Presumably it advocates performance optimization on open architectures like RISC-V. Brilliant.

Virus Capsid Geometry

Jordana Cepelewicz, 19 Jul 2017, Quanta Magazine, The Illuminating Geometry of Viruses, here. They always seem to swing for the fences to get a vaccine or a cure for something, rather than just determining organization and structure.

More than a quarter billion people today are infected with the hepatitis B virus (HBV), the World Health Organization estimates, and more than 850,000 of them die every year as a result. Although an effective and inexpensive vaccine can prevent infections, the virus, a major culprit in liver disease, is still easily passed from infected mothers to their newborns at birth, and the medical community remains strongly interested in finding better ways to combat HBV and its chronic effects. It was therefore notable last month when Reidun Twarock, a mathematician at the University of York in England, together with Peter Stockley, a professor of biological chemistry at the University of Leeds, and their respective colleagues, published their insights into how HBV assembles itself. That knowledge, they hoped, might eventually be turned against the virus.

Their accomplishment has gained further attention because only this past February the teams also announced a similar discovery about the self-assembly of a virus related to the common cold. In fact, in recent years, Twarock, Stockley and other mathematicians have helped reveal the assembly secrets of a variety of viruses, even though that problem had seemed forbiddingly difficult not long before.

Their success represents a triumph in applying mathematical principles to the understanding of biological entities. It may also eventually help to revolutionize the prevention and treatment of viral diseases in general by opening up a new, potentially safer way to develop vaccines and antivirals.

Gather and Scatter SIMD

Kirill R., 13 May 2016, Intel Advisor Documentation, Intel Advisor MAP Gather/Scatter, here.

Knights Landing introduces an Intel® Advanced Vector Extensions 512 (Intel® AVX-512) v(p)gather instruction that normally provides better effectiveness and wider applicability/flexibility than v(p)gather instructions in Intel® Advanced Vector Extensions 2 (Intel® AVX2) or Knights Corner (which is IMCI ISA-based). Intel AVX-512 gather (and scatter) support various combinations of index vs. offset vs. vector width, and introduce an explicit mask argument. Figure 10.14 provides a typical example of vgather instruction operands and corresponding Intel Intrinsic function syntax.

However, Intel AVX-512 code utilizing v(p)gather (and newly introduced v(p)scatter) instructions still demonstrate substantially worse performance than similar code using contiguous vector data load/store.  While gather/scatter–based vectorized code is faster than its scalar (or Intel AVX2/IMCI) counterpart, it is still wise to look for opportunities to improve or avoid it.

Intel, 18 Aug 2018, User and Reference Guide for the Intel C++ Compiler 15.0, Intrinsics for Integer Gather and Scatter Operations, here.  Intrinsics for FP Gather and Scatter Operations, here.

Overview: Intrinsics Reference

Intrinsics are assembly-coded functions that allow you to use C++ function calls and variables in place of assembly instructions.

Intrinsics are expanded inline eliminating function call overhead. Providing the same benefit as using inline assembly, intrinsics improve code readability, assist instruction scheduling, and help reduce debugging.

Intrinsics provide access to instructions that cannot be generated using the standard constructs of the C and C++ languages.

Intrinsics for Intel® C++ Compilers

The Intel® C++ Compiler enables easy implementation of assembly instructions through the use of intrinsics. Intrinsics are provided for the following instructions:

  • Intel® Many Integrated Core (Intel® MIC) instructions
  • Intel® Advanced Vector Extensions 2 (Intel® AVX2) instructions
  • Intel® Advanced Vector Extensions (Intel® AVX) instructions
  • Intel® Streaming SIMD Extensions 4 (Intel® SSE4) instructions
  • Intel® Supplemental Streaming SIMD Extensions 3 (SSSE3) instructions
  • Intel® Streaming SIMD Extensions 3 (Intel® SSE3) instructions
  • Intel® Streaming SIMD Extensions 2 (Intel® SSE2) instructions
  • Intel® Streaming SIMD Extensions (Intel® SSE) instructions
  • MMX™ Technology instructions
  • Carry-less Multiplication instruction and Advanced Encryption Standard Extensions instructions
  • Half-float conversion instructions

The Short Vector Math Library (svml) intrinsics are documented in this reference.

NOTE

Many routines in the svml library are more highly optimized for Intel® microprocessors than for non-Intel microprocessors.

 

 

Jeff Dean, the x86 RISC chips, and Power9

Jeff Dean, 2009, Google, Challenges in Building Large-Scale Information Retrieval Systems, here.

1999 vs. 2009

  • # docs: ~70M to many billion (~100X)
  • queries processed/day: ~1000X
  • per doc info in index: ~3X
  • update latency: months to minutes (~10000X)
  • avg. query latency: <1s to <0.2s (~5X)
  • More machines * faster machines: ~1000X

Cade Metz, 21 Mar 2018, New York Times, Computer Chip Visionaries Win Turing Award, here. Students, the x86 architecture was one of the greatest RISC architectures and one of Patterson’s greatest accomplishments, and for this he won the Turing Award.

Today, more than 99 percent of all new chips use the RISC architecture, according to the association.

Joel Hruska, 21 Mar 2018, ExtremeTech, IBM’s Power9 Could Dent x86 Server Market With Emphasis on GPU Compute, here.

EETimes has a profile of Power9’s overall market performance as part of the Facebook Open Compute Project. Google, Alibaba, and Tencent are all working on Power9 systems, with Tencent claiming that Power9 offers 30 percent more performance than an equivalent x86 installation while using less rack space and fewer servers. There are also reports of Power9 being quietly prepped for deployment by at least one major web company and for a major data center customer. IBM reportedly wants to win at least 20 percent of the Linux server market for >$5,000 servers and the company has put a major push behind Power9 as a GPU compute and artificial intelligence research platform, with the launch of its AC922 “Newell” platform late in 2017.

Multi buffering and SIMD in AVX2

Vinodh Gopal, et al., July 2010, Intel, Processing Multiple Buffers in Parallel to Increase Performance on Intel Architecture Processors, here. Kind of what you would expect. You could apply this to High Frequency Trading, pulling book updates off the wire. The AVX2 SIMD is about twice as wide as when this paper was written.

There are many algorithms, such as hashing or encryption, which are applied to a stream of data buffers. This occurs in networking, storage and other applications. Since the amount of data being processed is large, there is an ever-increasing need for very high performance implementations of these algorithms.

In many cases, one way to do this is to process multiple independent buffers in parallel. For example, a networking application might be encrypting each data packet. Each packet is encrypted independently from the other packets. This means that it should be possible to process several packets at the same time.

This may also be done when there is not a stream of buffers in a literal sense. For example, in a data de-duplication application, the first step is usually to partition the input data into a number of chunks, and then to compute the hash digest of each chunk. This is a perfect case where hashing multiple buffers in parallel can speed up the hashing step, as they are independent.

Implementations of the multi-buffer techniques may involve changes at the application level, but can result in speed improvements of 2-3X on Intel processors. The multi-buffer technique increases the performance of a single thread, similar to the Stitching methods in [4], but is a complementary and different approach.

One of the main challenges is to design a scheduler that can process the multiple data buffers of different sizes with minimal performance overheads. This paper shows how this can be done, illustrates this with pseudo code, and presents the measured performance gains.

Daniel Lemire and Leonid Boytsov, 15 May 2014, arxiv.org, Decoding billions of integers per second through vectorization, here. We can use this in trading systems to process done electronic trades feeding the trading system – the trader’s blotter. Make a fast SIMD quad-word aligned proto buffer implementation.
