KWRegan, GLL, “Truth from Zero?”
David Silver is the lead author of the paper, “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm,” which was posted twelve days ago on the ArXiv. It announces an algorithm called AlphaZero that, given the rules of any two-player game of strategy and copious hardware, trains a deep neural network to play the game at skill levels approaching perfection.
Today I review what is known about AlphaZero and discuss how to compare it with known instances of perfect play.
AlphaZero is a generalization of AlphaGo Zero, which was announced last October 18 on the Google DeepMind website under the heading “Learning From Scratch.” A paper in Nature, with Silver as lead author, followed the next day. Unlike the original AlphaGo, whose victory over the human champion Lee Sedol we covered, AlphaGo Zero had no input other than the rules of Go and some symmetry properties of the board. From round-the-clock self-play it soon acquired as tutor the world’s best player—itself.
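The self-play idea can be caricatured in a toy setting. The sketch below is entirely our own illustration, not DeepMind's method: it uses single-pile Nim (take 1–3 stones; taking the last stone wins), a tabular value estimate in place of a deep network, and Monte Carlo updates from self-play games in place of gradient training. Both "players" share and improve the same table, so the program is, as with AlphaGo Zero, tutored by itself.

```python
import random
from collections import defaultdict

# Toy stand-in for self-play learning (our illustration, not AlphaZero):
# single-pile Nim; take 1-3 stones; whoever takes the last stone wins.
random.seed(0)
Q = defaultdict(float)          # Q[(stones, move)] = estimated value for the mover
ALPHA, EPSILON = 0.1, 0.2       # learning rate and exploration rate (assumed values)

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def pick_move(stones):
    moves = legal_moves(stones)
    if random.random() < EPSILON:                       # explore occasionally
        return random.choice(moves)
    return max(moves, key=lambda m: Q[(stones, m)])     # otherwise play greedily

def self_play_episode(start=10):
    """Play one game against itself and update Q from the outcome."""
    history, stones = [], start
    while stones > 0:
        m = pick_move(stones)
        history.append((stones, m))
        stones -= m
    # The player who made the last move won; rewards alternate going backward.
    reward = 1.0
    for state_move in reversed(history):
        Q[state_move] += ALPHA * (reward - Q[state_move])
        reward = -reward

for _ in range(20000):
    self_play_episode()

def greedy(stones):
    """The learned policy: the move with the highest estimated value."""
    return max(legal_moves(stones), key=lambda m: Q[(stones, m)])
```

After training, `greedy(3)` returns `3` (take everything and win), and with enough episodes the policy approaches the known perfect strategy of leaving the opponent a multiple of four stones. A real deep network generalizes across states instead of tabulating each one, which is what lets the same recipe scale from Nim to Go, chess, and shogi.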