Chess Mastered with a Self-Play Algorithm


  • Chess Mastered with a Self-Play Algorithm

    Chess Mastered with a Self-Play Algorithm

    December 6, 2017

    A history-changing event in chess play!

    From chess24:

    DeepMind’s AlphaZero crushes chess

    20 years after Deep Blue defeated Garry Kasparov in a match, chess players have awoken to a new revolution. The AlphaZero algorithm developed by Google and DeepMind took just four hours of playing against itself to synthesise the chess knowledge of one and a half millennia and reach a level where it not only surpassed humans but crushed the reigning World Computer Champion Stockfish 28 wins to 0 in a 100-game match. All the brilliant stratagems and refinements that human programmers used to build chess engines have been outdone, and like Go players we can only marvel at a wholly new approach to the game.

    After DeepMind's AlphaZero the chess engine world, and the chess world, will never be quite the same again.

    The bombshell came in a quietly released academic paper published on 5 December 2017: Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm.

    The full article and a link to download the paper are at:

    https://chess24.com/en/read/news/dee...-crushes-chess

    The contents are stunning. The DeepMind team had managed to prove that a generic version of their algorithm, with no specific knowledge other than the rules of the game, could train itself for four hours at chess, two hours in shogi (Japanese chess) or eight hours in Go and then beat the reigning computer champions – i.e. the strongest known players of those games. In chess it wasn’t just a beating, but sheer demolition.

    Stockfish is the reigning TCEC computer chess champion, and while it failed to make the final this year it went unbeaten in 51 games. In a match with the chess-trained AlphaZero, though, it lost 28 games and won none, with the remaining 72 drawn. With White AlphaZero scored a phenomenal 25 wins and 25 draws, while with Black it “merely” scored 3 wins and 47 draws. It turns out the starting move is really important after all!
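    As a quick arithmetic check, the reported results are internally consistent under standard chess scoring (1 point per win, half a point per draw). A minimal sketch, using only the figures quoted above:

```python
# Verify the reported AlphaZero vs Stockfish match score using
# standard chess scoring: win = 1 point, draw = 0.5 points.

def match_points(wins: int, draws: int) -> float:
    return wins * 1.0 + draws * 0.5

white = match_points(wins=25, draws=25)   # AlphaZero with White: 37.5 / 50
black = match_points(wins=3, draws=47)    # AlphaZero with Black: 26.5 / 50
total = white + black

print(f"AlphaZero total: {total} / 100")  # 64.0 / 100
```

    So the 28 wins and 72 draws add up to a 64-36 match result in AlphaZero's favour.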

  • #2
    Re: Chess Mastered with a Self-Play Algorithm

    Originally posted by Wayne Komer View Post
    Chess Mastered with a Self-Play Algorithm
    The contents are stunning. The DeepMind team had managed to prove that a generic version of their algorithm, with no specific knowledge other than the rules of the game, could train itself for four hours at chess, two hours in shogi (Japanese chess) or eight hours in Go and then beat the reigning computer champions – i.e. the strongest known players of those games. In chess it wasn’t just a beating, but sheer demolition.
    Thanks Wayne, this is super-important news. How long would it take this generic algorithm to train to beat proprietary computer trading algorithms, eh?! :)

    Bravo AlphaZero, and kudos to the team behind it! Sid Belzberg has posted a very interesting description in another thread this fall.

    Comment


    • #3
      Re: Chess Mastered with a Self-Play Algorithm

      Chess Mastered with a Self-Play Algorithm

      December 6, 2017

      When I first read this article, I thought that it was the end of chess as we know it. If Magnus used the program to prepare his games, how could he be beaten?

      This caution today:

      Princeton University's AI expert Prof Joanna Bryson added that people should be cautious about buying too deeply into the firm's hype.

      But she added that its knack for good publicity had put it in a strong position against challengers.

      "It's not only about hiring the best programmers," she said.

      "It's also very political, as it helps make Google as strong as possible when negotiating with governments and regulators looking at the AI sector."

      http://www.bbc.com/news/technology-42251535

      Comment


      • #4
        Re: Chess Mastered with a Self-Play Algorithm

        This is for real. And it turns everything about machine chess upside down.

        No opening book.
        No tablebases.
        No brute force approach.

        Just start with the rules and within four hours be better than the best program of 2016.

        Think what happens when this gets applied to stuff other than games.

        Comment


        • #5
          Re: Chess Mastered with a Self-Play Algorithm

          Originally posted by Wayne Komer View Post
          could train itself for four hours at chess [....] and then beat the reigning computer champion
          Speechless!

          Comment


          • #6
            Re: Chess Mastered with a Self-Play Algorithm

            Originally posted by Garland Best View Post
            Just start with the rules and within four hours be better than the best program of 2016.

            Think what happens when this gets applied to stuff other than games.
            Garland, here's a start for one possible scenario. :) -

            The First Millions

            Their first target was MTurk, the Amazon Mechanical Turk. After its launch in 2005 as a crowdsourcing Internet marketplace, it had grown rapidly, with tens of thousands of people around the world anonymously competing around the clock to perform highly structured chores called HITs, “Human Intelligence Tasks.” These tasks ranged from transcribing audio recordings to classifying images and writing descriptions of web pages, and all had one thing in common: If you did them well, nobody would know that you were an AI. Prometheus 10.0 was able to do about half of the task categories acceptably well. For each such task category, the Omegas had Prometheus design a lean custom-built narrow AI software module that could do precisely such tasks and nothing else. They then uploaded this module to Amazon Web Services, a cloud-computing platform that could run on as many virtual machines as they rented. For every dollar they paid to Amazon’s cloud-computing division, they earned more than $2 from Amazon’s MTurk division. Little did Amazon suspect that such an amazing arbitrage opportunity existed within their own company!

            To cover their tracks, they had discreetly created thousands of MTurk accounts during the preceding months in the names of fictitious people, and the Prometheus-built modules now assumed their identities. The MTurk customers typically paid after about eight hours, at which point the Omegas reinvested the money in more cloud-computing time, using still better task modules made by the latest version of the ever-improving Prometheus. Because they were able to double their money every eight hours, they soon started saturating MTurk’s task supply, and found that they couldn’t earn more than about $1 million per day without drawing unwanted attention to themselves. But this was more than enough to fund their next step, eliminating any need for awkward cash requests to the chief financial officer.
            Read the excerpt aptly titled "The Last Invention of Man" or, better, get the whole book: in Quebec it's already available in local libraries; I just finished it last month.

            Interesting times ahead.

            Comment


            • #7
              Re: Chess Mastered with a Self-Play Algorithm

              Chess Mastered with a Self-Play Algorithm

              December 6, 2017

              Some online comments:

              Maelic - It is a nice step in a different direction, perhaps the start of a revolution, but AlphaZero is not yet better than Stockfish, and if you bear with me I will explain why. Most people are very excited now and wishing for a sensation, so they don't really read the paper or think about what it says, which leads to uninformed opinions.

              The testing conditions were terrible. 1 min/move is not really a suitable time control for any engine testing, but you could tolerate that. What is intolerable, though, is the hash table size: with the 64 cores Stockfish was given, you would expect around 32 GB or more, otherwise the table fills up very quickly, leading to a marked reduction in strength. It was given 1 GB, and that is far from the ideal value! Also, SF was not given any endgame tablebases, which are the current norm for any computer chess engine.

              The computational power behind each entity was very different: while SF was given 64 CPU threads (really a lot, I've got to say), AlphaZero was given 4 TPUs. A TPU is a specialized chip for machine learning and neural network calculations. Its estimated power compared to a classical CPU is as follows: 1 TPU ~ 30x E5-2699v3 (an 18-core machine), so AlphaZero had at its back the power of ~2000 Haswell cores. That is nowhere near a fair match. And yet, even though the result was dominant, it was not where it would be if SF faced itself at 2000 cores vs 64 cores; in that case the win percentage would be much more heavily in favor of the more powerful hardware.
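              Taking the commenter's "1 TPU ~ 30x E5-2699v3" estimate at face value (it is a rough community figure, not an official benchmark), the arithmetic behind the "~2000 Haswell cores" claim works out like this:

```python
# Back-of-the-envelope check of the commenter's hardware comparison.
# Assumption (the commenter's own estimate, not an official figure):
# 1 TPU is roughly as powerful as 30 Xeon E5-2699v3 machines, 18 cores each.

TPUS = 4
XEONS_PER_TPU = 30        # commenter's rough estimate
CORES_PER_XEON = 18

alphazero_core_equiv = TPUS * XEONS_PER_TPU * CORES_PER_XEON
stockfish_cores = 64

print(alphazero_core_equiv)                    # 2160, i.e. "~2000 Haswell cores"
print(alphazero_core_equiv / stockfish_cores)  # roughly a 34x hardware advantage
```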

              From those observations we can draw a conclusion: AlphaZero is not as close in strength to SF as Google would like us to believe. The incorrect match settings suggest either a lack of knowledge about classical brute-force calculating engines and how they are properly used, or an intention to create conditions in which SF would be defeated.

              With all that said, it is still an amazing achievement and definitely fresh air in computer chess, most welcome these days. But for a new computer chess champion we will have to wait a little bit longer.
              ________

              - It is not necessary to play perfectly, only better than your opponent.

              - I want this program for stock exchange analysis !!!

              - The funny and ironic thing is that the games between AlphaZero and Stockfish were much more exciting than the games in London Chess Classic.

              - I read the full paper.

              As was said in the first comment above, the result, at least to me as a spectator, appears to be not completely valid, because the hardware was not equivalent between the two engines. I don't understand why the DeepMind team wouldn't grant Stockfish the same hardware as their engine. Alternatively, why not run AlphaZero on the same hardware as Stockfish?
              The time allotted for each move as well as the rules for resignation, on the other hand, seem to have been clearly thought-out and implemented.

              Nevertheless, a stunning result from a program that learned by itself without the benefit of human input in the form of evaluation functions, heuristics, and what not.

              - The hardware was not the same because a different approach is used: Stockfish can't use tensor processing units. In any case, they provide a comparison in the paper of how strength varies with time per move. Stockfish still wins at short time controls because of limitations of the Monte Carlo simulation approach when there are not "enough" simulations.

              - What's the status of the "academic paper"? Has it been peer-reviewed, i.e. checked by independent experts not involved in the study - who may (or may not) raise points here first mentioned by maelic?

              It doesn't seem to be the case: it is just an authors' preprint, with no scientific journal and no acknowledgements to reviewers. If so, it won't be taken seriously in the scientific world but will be considered "grey literature".

              - I do not think that arguments about unfairness in hardware play a major role, because Figure 2 in the report shows that Stockfish reaches a plateau and more thinking time only marginally increases its Elo. So more hardware for Stockfish will not make it much stronger.

              We also have to consider that AlphaZero was itself at a disadvantage, as it did not use opening books or endgame tablebases as far as I understand, while Stockfish did. What would happen if AlphaZero could use endgame tablebases and opening books?

              Whether the paper is peer-reviewed is not very relevant. If they make a reasonable case, show the games, and others can follow their arguments, not much will change. In theoretical physics, preprints are taken very seriously.
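              Since several comments above argue in Elo terms, here is the standard Elo expected-score formula for reference (a textbook logistic model, not something specific to the paper):

```python
# Standard Elo expected-score formula: the expected score (0..1) of a
# player rated `diff` points above the opponent.

def expected_score(diff: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

print(round(expected_score(0), 3))    # 0.5   (equal strength)
print(round(expected_score(100), 3))  # 0.64
print(round(expected_score(400), 3))  # 0.909
```

              Under this model, AlphaZero's 64/100 match result corresponds to an edge of roughly 100 Elo points.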
              ________

              Stockfish was reaching a plateau, yes, but maybe it was because of hardware limitations! I'm by no means a computer hardware expert, but I don't think 'better' hardware means more thinking time; rather, it allows faster processing, exploration of more positions, deeper calculations, etc.

              Your point about AlphaZero being at a disadvantage because it didn't have opening books and endgame tablebases is a valid one.

              - Frankly, I find there is a lot of hype around all this, but if one just looks at the games it should become apparent that Stockfish made a number of incomprehensible mistakes (I am referring specifically to the ones GM Hammer considers positional masterpieces), so I honestly do not share his excitement at all.

              - In essence, this AlphaZero algorithm of tabula rasa (self-play reinforced learning) is an outside-the-box approach.

              Although it defeated SF (28 wins in 100 games), the chip used by the DeepMind program was much superior. Since the algorithmic approaches of the two chess engines are radically different, it's premature to conclude whether AlphaZero is indeed superior. For example, if Stockfish were backed by the equivalent of AlphaZero's 4 TPUs (~2000 Haswell cores), would the game score even out (or even shift in SF's favor)?

              Comment


              • #8
                Re: Chess Mastered with a Self-Play Algorithm

                Do I understand correctly that, after the learning period, the algorithm gains (statistical) knowledge of all parts of the game (opening, middlegame, endgame) and stores all that information somewhere? If that is right, the match is not really fair, as Stockfish is not great without opening and endgame bases.
                While the technical specs of the Stockfish machine look great, its play does not.

                I think that with a longer learning period, AZ would try to solve chess. Hmm, but they say that is impossible with current and future hardware: too many possibilities.

                Anyway, the article gave some new ideas in chess programming. Let's hope there will be more development in this field, and that AZ lasts longer than Deep Blue (dismantled after the match).

                Comment


                • #9
                  Re: Chess Mastered with a Self-Play Algorithm

                  It seems a bit fishy to me. AlphaZero learned to play book openings by playing itself for a very short period of time. The show Person of Interest had an AI computer learn/solve chess in a similar fashion. I think it would be more interesting and believable if Stockfish were run on equivalent hardware, with an opening book and endgame tablebases, by a separate team of programmers. I wonder if this too is fake news. It is cool if it is real, but...

                  Comment


                  • #10
                    Re: Chess Mastered with a Self-Play Algorithm

                    Originally posted by Vlad Drkulec View Post
                    It seems a bit fishy to me. AlphaZero learned to play book openings by playing itself for a very short period of time. The show Person of Interest had an AI computer learn/solve chess in a similar fashion. I think it would be more interesting and believable if Stockfish were run on equivalent hardware, with an opening book and endgame tablebases, by a separate team of programmers. I wonder if this too is fake news. It is cool if it is real, but...
                    They married a Monte Carlo tree search to a deep residual convolutional neural network stack. Monte Carlo tree search is a heuristic search algorithm. This technology was not around several years ago. It is a radically new methodology for a computer to teach itself starting with a clean slate. I am certain it is for real. These things run on the TensorFlow Research Cloud, which offers 180 petaflops(!) of compute. A petaflop is 10 to the power of 15 floating-point calculations per second (a quadrillion!), so that is 180,000 trillion calculations per second. It is very real!
                    https://www.tensorflow.org/tfrc/

                    Here is the paper, with 10 example games between AlphaZero and Stockfish. arxiv.org is a well-known repository of scientific papers. AlphaZero totally crushed Stockfish! Hardly "fake news".
                    https://arxiv.org/pdf/1712.01815.pdf
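                    To make the "Monte Carlo tree search + neural network" idea concrete, here is a minimal UCT-style MCTS sketch on a toy take-away game (take 1-3 stones; whoever takes the last stone wins). This is only an illustration: AlphaZero replaces the random rollout below with a learned value network and biases move selection with a learned policy prior. The game, names, and parameters here are invented for the example.

```python
import math
import random

random.seed(0)  # deterministic demo

class Node:
    """One game state: `stones` left; the player to move alternates by depth."""
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones
        self.parent = parent
        self.move = move          # the move (1, 2 or 3) that led to this node
        self.children = []
        self.visits = 0
        self.wins = 0.0           # wins for the player who moved INTO this node

    def untried_moves(self):
        tried = {c.move for c in self.children}
        return [m for m in (1, 2, 3) if m <= self.stones and m not in tried]

def rollout(stones):
    """Random playout; True if the player to move wins (takes the last stone)."""
    mover = True
    while True:
        stones -= random.randint(1, min(3, stones))
        if stones == 0:
            return mover
        mover = not mover

def best_move(stones, iterations=2000):
    root = Node(stones)
    for _ in range(iterations):
        node = root
        # 1) Selection: descend via the UCB1 formula while fully expanded
        while not node.untried_moves() and node.children:
            node = max(node.children, key=lambda c: c.wins / c.visits
                       + math.sqrt(2 * math.log(node.visits) / c.visits))
        # 2) Expansion: add one unexplored child
        moves = node.untried_moves()
        if moves:
            m = random.choice(moves)
            child = Node(node.stones - m, parent=node, move=m)
            node.children.append(child)
            node = child
        # 3) Simulation (AlphaZero calls its value network here instead)
        mover_wins = False if node.stones == 0 else rollout(node.stones)
        # 4) Backpropagation: flip the perspective at every level
        while node is not None:
            node.visits += 1
            if not mover_wins:    # the player who moved into `node` won
                node.wins += 1
            mover_wins = not mover_wins
            node = node.parent
    return max(root.children, key=lambda c: c.visits).move

# From 5 stones the winning move is to take 1, leaving the opponent
# a losing multiple of 4.
print(best_move(5, 4000))
```

                    The backpropagation step is the key two-player detail: each node stores wins from the perspective of the player who moved into it, so the result is flipped at every level as it propagates up the tree.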
                    Last edited by Sid Belzberg; Thursday, 7th December, 2017, 03:08 AM.

                    Comment


                    • #11
                      Re: Chess Mastered with a Self-Play Algorithm

                      Originally posted by Sid Belzberg View Post
                      They married a Monte Carlo tree search to a deep residual convolutional neural network stack. Monte Carlo tree search is a heuristic search algorithm. This technology was not around several years ago. It is a radically new methodology for a computer to teach itself starting with a clean slate. I am certain it is for real. These things run on the TensorFlow Research Cloud, which offers 180 petaflops(!) of compute. A petaflop is 10 to the power of 15 floating-point calculations per second (a quadrillion!), so that is 180,000 trillion calculations per second. It is very real!
                      https://www.tensorflow.org/tfrc/

                      Here is the paper, with 10 example games between AlphaZero and Stockfish. arxiv.org is a well-known repository of scientific papers. AlphaZero totally crushed Stockfish! Hardly "fake news".
                      https://arxiv.org/pdf/1712.01815.pdf
                      AlphaZero totally crushed a somewhat crippled version of Stockfish. I'm not saying what they have done is not impressive, but can they beat a fully functional Stockfish, not limited to one minute per move, and with a competing team managing Stockfish's computer hardware and parameters? My impression of the games is that they were cherry-picked. I find it hard to believe that the computer would so quickly find an Avrukh-like repertoire on its own.

                      Comment


                      • #12
                        Re: Chess Mastered with a Self-Play Algorithm

                        Originally posted by Vlad Drkulec View Post
                        AlphaZero totally crushed a somewhat crippled version of Stockfish. I'm not saying what they have done is not impressive, but can they beat a fully functional Stockfish, not limited to one minute per move, and with a competing team managing Stockfish's computer hardware and parameters? My impression of the games is that they were cherry-picked. I find it hard to believe that the computer would so quickly find an Avrukh-like repertoire on its own.
                        Go was orders of magnitude more difficult to master than chess, and the thing did it. I am fully confident the games were not cherry-picked, and even beefing up Stockfish's hardware would not come close to beating this type of software. This is very telling: the thing that beat Stockfish had nothing to do with brute force and everything to do with software methodologies. This is an entirely new approach to AI and chess.
                        Originally posted by Chess Paper
                        "We also analysed the relative performance of AlphaZero's MCTS search compared to the state-of-the-art alpha-beta search engines used by Stockfish and Elmo. AlphaZero searches just 80 thousand positions per second in chess and 40 thousand in shogi, compared to 70 million for Stockfish and 35 million for Elmo. AlphaZero compensates for the lower number of evaluations by using its deep neural network to focus much more selectively on the most promising variations – arguably a more "human-like" approach to search, as originally proposed by Shannon (27). Figure 2 shows the scalability of each player with respect to thinking time, measured on an Elo scale, relative to Stockfish or Elmo with 40ms thinking time. AlphaZero's MCTS scaled more effectively with thinking time than either Stockfish or Elmo, calling into question the widely held belief (4, 11) that alpha-beta search is inherently superior in these domains."
                        I also think this spells curtains for anti-computer cheating variants of chess, such as those worked on by Paul Bonham. This thing takes a heuristic approach to learning, which means it can also master games with incomplete information, such as No-Limit Texas Hold'em poker, where for the first time AI can beat poker pros at their own game.
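                        To put the search rates quoted from the paper in perspective, a one-line calculation:

```python
# Positions evaluated per second in chess, as reported in the preprint.
alphazero_nps = 80_000
stockfish_nps = 70_000_000

ratio = stockfish_nps / alphazero_nps
print(f"Stockfish examines ~{ratio:.0f}x more positions per second")  # ~875x
```

                        So AlphaZero won while examining nearly three orders of magnitude fewer positions, which is the paper's argument for the selectivity of its neural-network-guided search.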
                        Last edited by Sid Belzberg; Thursday, 7th December, 2017, 04:20 AM.

                        Comment


                        • #13
                          Re: Chess Mastered with a Self-Play Algorithm

                          Originally posted by Sid Belzberg View Post
                          They married an Monte Carlo tree Search to a deep residual convolutional neural network stack. The Monte Carlo Tree Search is Heuristic search algorithm. This technology was not around several years ago. It is a radically new methodology for a computer to teach itself starting with a clean slate.
                          You know, this sounds like total abracadabra with many fancy words. I would like to know how different it is from Kotov's huge-tree method, explained in simple terms. By "the technology", do you mean their TPU processors? I did not get what kind of memory, and how much of it, they use to store the gained knowledge. Do you think that Stockfish, after 40,000,000 games, would be the same as a fresh install if it were allowed to build its own opening tree and endgame tablebase from those 40M games?

                          I looked through several games. Stockfish was playing like an amateur. In many games its pieces were really badly placed: a blocked queen on h8 in one game; in another, the combination Ra8, Nb8, Nb7, Ba6. :?

                          Would love to see ideal match games: AZ vs AZ :)

                          Comment


                          • #14
                            Re: Chess Mastered with a Self-Play Algorithm

                            Originally posted by Egidijus Zeromskis View Post
                            I looked through several games. Stockfish was playing like an amateur. In many games its pieces were really badly placed: a blocked queen on h8 in one game; in another, the combination Ra8, Nb8, Nb7, Ba6. :?
                            Yes. Stockfish doesn't play that badly on my computer. The openings in one or two of the games mirror ChessBase videos on those openings. These are the suspicious parts of this story.


                            Would love to see ideal match games: AZ vs AZ :)
                            AZ Powerbook coming out on Chessbase soon.

                            Comment


                            • #15
                              Re: Chess Mastered with a Self-Play Algorithm

                              Originally posted by Egidijus Zeromskis
                              You know this sounds as a total makra-adabra with many fancy words.
                              Just because you have not taken the time to look up what the words mean does not mean they are baloney. Here is the original AlphaGo Zero paper; that program learned to play Go at a world-championship level in a matter of days, starting with only the basic rules. That was a far more impressive feat than mastering chess. As far as the quality of the games goes, I am very sure that at this point this thing could beat any traditional chess-playing program, and I have no doubt you will see more convincing games shortly. Educate yourself before jumping to conclusions.
                              https://www.nature.com/articles/natu...wjxeTUgZAUMnRQ

                              Comment
