With all the talk in another thread about AlphaZero and Stockfish and a paradigm shift, I want to present a "numerical look" at chess perfection. This will be somewhat philosophical.
So let's play God for a moment, and imagine that we have in front of us the total chess search tree. Every possible variation from the opening position is in there. It's a tree structure, so you can go back and forth along the branches, or jump to any node in the tree at will. All branches terminate with a result, so every move has a score based on how many wins for the moving side it leads to, how many losses for the moving side it leads to, and how many draws it leads to.
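The post doesn't say exactly how the win, loss and draw counts combine into one move value. Here is a minimal Python sketch that assumes (hypothetically) an expected-score rule: a win counts 1, a draw 1/2 and a loss 0, averaged over every game the move leads to. The class name and the counts are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoveOutcomes:
    """Counts of terminal results reachable after one move, from the mover's view."""
    wins: int
    losses: int
    draws: int

    def value(self) -> float:
        """Hypothetical single-number score: win = 1, draw = 1/2, loss = 0."""
        total = self.wins + self.losses + self.draws
        if total == 0:
            raise ValueError("a move must lead to at least one terminal result")
        return (self.wins + 0.5 * self.draws) / total


# Hypothetical counts for two candidate moves.
a = MoveOutcomes(wins=6, losses=2, draws=2)
b = MoveOutcomes(wins=3, losses=3, draws=4)
print(a.value(), b.value())  # 0.7 0.5
```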
Ok, so let's make a few assumptions:
(1) the average number of choices at any ply of a chess game is 35.
(2) the average length of a chess game between any 2 chess engines, not including opening book or endgame tablebase moves, is 80 plies for each engine.
What we are going to try and do is "score" chess perfection. Not a rating, but a "score" based on the following:
From assumption (1), if we sort the 35 choices available on an average ply by the value each move has in God's chess search tree, we can assign each move a perfection score: the top (best) move gets 35 points, the 2nd best move gets 34 points, the 3rd best move gets 33 points, and so on down to the worst move, which gets 1 point. These points have no value except for assessing how perfectly a person or engine is playing chess.
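The ranking rule above can be sketched in Python, assuming we somehow already know each move's value (the move names and values here are hypothetical). With 35 legal moves the best one earns 35 points and the worst earns 1.

```python
def perfection_points(move_values: dict[str, float]) -> dict[str, int]:
    """Assign perfection points to every legal move on one ply.

    Moves are sorted best-to-worst by their (assumed known) value; the best
    move earns len(move_values) points and the worst earns 1. Exact ties are
    broken by dictionary order, because Python's sort is stable.
    """
    ordered = sorted(move_values, key=lambda m: move_values[m], reverse=True)
    n = len(ordered)
    return {move: n - position for position, move in enumerate(ordered)}


# Hypothetical values for a 4-move position.
print(perfection_points({"e4": 0.55, "d4": 0.54, "Nf3": 0.53, "a4": 0.40}))
# {'e4': 4, 'd4': 3, 'Nf3': 2, 'a4': 1}
```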
If we had a perfect chess engine that played the best move on each and every ply, guaranteed, and it played the average of 80 plies per game, its perfection score would be 80 x 35 points = 2800 points (don't confuse this with 2800 Elo; it's not at all the same thing).
So 2800 points is the perfect score for a typical 80-ply game.
Now let's say we have an engine that plays the 2nd best move on every ply, guaranteed. Its chess perfection score would be 80 x 34 points = 2720 points. This engine is playing at a perfection rate of 2720/2800 * 100 = 97.143% (rounded to three decimal places). Of course, you'd get the same result from 34/35 * 100.
If a person or engine played the worst possible move on every ply, guaranteed, it would achieve the lowest possible perfection score of 80 x 1 point = 80 points. That assumes it could still last 80 plies in a typical game, but in reality it would last maybe 20, so let's be realistic and give it a perfection score of 20 points.
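The arithmetic for the best, 2nd best and worst cases can be checked with a short Python sketch. The constants are just the post's two assumptions; the function names are made up for illustration.

```python
CHOICES = 35  # assumption (1): average number of legal moves per ply
PLIES = 80    # assumption (2): plies played by each engine per game


def points_for_rank(rank: int, choices: int = CHOICES) -> int:
    """Best move (rank 1) earns `choices` points; the worst (rank == choices) earns 1."""
    if not 1 <= rank <= choices:
        raise ValueError("rank out of range")
    return choices + 1 - rank


def perfection_rate(score: int, plies: int = PLIES, choices: int = CHOICES) -> float:
    """Score as a percentage of the perfect score, plies * choices."""
    return 100 * score / (plies * choices)


perfect = PLIES * points_for_rank(1)             # best move every ply
second = PLIES * points_for_rank(2)              # 2nd best move every ply
worst = PLIES * points_for_rank(CHOICES)         # worst move, still 80 plies
short_worst = 20 * points_for_rank(CHOICES)      # worst move, realistic 20 plies
print(perfect, second, worst, short_worst, round(perfection_rate(second), 3))
# 2800 2720 80 20 97.143
```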
Now here's where things get interesting: where on this scale between 20 points (worst) and 2800 points (best) do you think the current version, Stockfish 8, would be, playing on a typical computer (quad-core i7, 16 GB RAM) with opening book and endgame tablebase enabled? (The book and tablebase don't matter here, because moves coming from them don't count toward the perfection score.)
Would it be maybe equivalent to a theoretical engine that always plays 2nd best move guaranteed on every ply? Would it be 97.143% perfect?
Here's why I ask this question: there is a limit to absolutely perfect chess, which I think a lot of people are ignoring. And if we are to believe that AlphaZero scored 28 - 0 (with 72 draws) in 100 games versus that version of Stockfish playing on that hardware, then we must believe that version of Stockfish on that hardware is very far down in perfection score: maybe equivalent to playing only the 4th best move on every ply.
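To put "only the 4th best move on every ply" on the same percentage scale as the 97.143% figure, here is a small Python sketch. It assumes the post's 35 choices per ply; since every ply gets the same rank, the game length cancels out.

```python
CHOICES = 35  # assumption (1): average number of legal moves per ply


def rate_for_constant_rank(rank: int, choices: int = CHOICES) -> float:
    """Perfection rate (%) of an engine that always plays the rank-th best move."""
    return 100 * (choices + 1 - rank) / choices


for rank in range(1, 6):
    print(f"always rank {rank}: {rate_for_constant_rank(rank):.3f}%")
# always rank 1: 100.000%
# always rank 2: 97.143%
# always rank 3: 94.286%
# always rank 4: 91.429%
# always rank 5: 88.571%
```

So on this scale, an engine that always plays the 4th best move sits at roughly 91.4%.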
But when we look at Stockfish's moves versus other engines, can we really believe it is playing only the 4th best move on every ply? That would mean on every ply, there are always 3 better moves!
But of course, these are averages. Stockfish might be playing the absolute best move on, say, half the plies. But then on the other half it must be playing only the 7th or 8th best move. Does that really sound believable?
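Because points drop by exactly one per rank, a game's total depends only on the average rank of the moves played. A quick Python sketch under the post's assumptions shows that best-move-half-the-time plus 7th-best-the-rest scores the same as 4th best every ply (the rank lists are illustrative).

```python
CHOICES = 35  # assumption (1): average number of legal moves per ply


def game_score(ranks: list[int], choices: int = CHOICES) -> int:
    """Total perfection points for a game, given the rank of each move played."""
    return sum(choices + 1 - r for r in ranks)


mixed = [1] * 40 + [7] * 40  # best move on half the plies, 7th best on the rest
steady = [4] * 80            # 4th best move on every ply
print(game_score(mixed), game_score(steady))  # 2560 2560
```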
It could be playing 7th or 8th best moves in positions where the top 7 or 8 moves are almost identical in score. Remember, the move scores come from a chess tree with more nodes than there are atoms in our universe, so two of these scores will almost never tie exactly, to infinite decimal places. But they could be so close that we could call them VIRTUALLY identical.
How often would this come up in a typical game, that there are 7 or 8 moves for the side to move that are just about identical in value on God's search tree? As much as 40 plies of an 80 ply game?
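One way to make "virtually identical" concrete is a tolerance. The Python sketch below assumes a hypothetical threshold `epsilon` and hypothetical move values. It lists the moves within `epsilon` of the best move and counts how many plies in a game offer at least `k` such moves. The choice of `epsilon` is exactly the open question, so treat it as a free parameter.

```python
def near_ties(move_values: dict[str, float], epsilon: float) -> list[str]:
    """Moves whose value lies within `epsilon` of the best move's value, best first."""
    best = max(move_values.values())
    close = [m for m, v in move_values.items() if best - v <= epsilon]
    return sorted(close, key=lambda m: move_values[m], reverse=True)


def plies_with_many_ties(game: list[dict[str, float]], epsilon: float, k: int) -> int:
    """Number of plies offering at least k virtually identical top moves."""
    return sum(1 for ply in game if len(near_ties(ply, epsilon)) >= k)


# Hypothetical values: one quiet position with many near-equal moves, one sharp one.
quiet = {"Nf3": 0.5002, "Be2": 0.5001, "h3": 0.5000, "a3": 0.4999, "Qd2": 0.4700}
sharp = {"Rxe6": 0.62, "Qh5": 0.40, "Kg1": 0.35}
print(near_ties(quiet, epsilon=0.001))  # ['Nf3', 'Be2', 'h3', 'a3']
print(plies_with_many_ties([quiet, sharp], epsilon=0.001, k=4))  # 1
```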
You can see why I warned that this is a philosophical approach: there are no hard and fast answers to these questions. I'm just presenting another way of looking at where current engines may be on the path to absolute chess perfection. Until now, we have largely assumed that today's alpha-beta brute-force engines are very close to such perfection.
If AlphaZero is really that much better, winning 28 out of 28 decisive games against Stockfish at or near its best, then it challenges that viewpoint. And THAT means you should be able to go through a typical Stockfish 8 game played on typical hardware as described and find a few dozen plies where it played the 5th, 6th, 7th or 8th best move. I wonder if anyone could actually do that: identify those plies and prove with analysis that on each of them there were a handful of slightly better, or at least equal, moves that Stockfish did not play.
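The bookkeeping for that kind of check could look like the Python sketch below. It assumes we already have, for each ply, the move played plus reference evaluations of every candidate. In practice these would have to come from something like a much deeper analysis, which is only a stand-in for God's tree. The function names and the centipawn numbers are hypothetical. Only strictly better moves push the played move down in rank, so exact ties don't count against it.

```python
def move_rank(played: str, reference: dict[str, int]) -> int:
    """1-based rank of `played`; only strictly better moves push it down."""
    return 1 + sum(1 for v in reference.values() if v > reference[played])


def flag_plies(game: list[tuple[str, dict[str, int]]],
               low: int = 5, high: int = 8) -> list[int]:
    """Indices of plies where the played move ranked between `low` and `high`."""
    return [i for i, (played, ref) in enumerate(game)
            if low <= move_rank(played, ref) <= high]


# Hypothetical reference evaluations (centipawns, side to move) for two plies.
game = [
    ("e4", {"e4": 30, "d4": 28, "c4": 25}),
    ("h3", {"Nf3": 20, "Be2": 19, "O-O": 18, "a3": 17, "Re1": 16, "h3": 15}),
]
print(flag_plies(game))  # [1]
```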