A new game-by-game rating system

  • A new game-by-game rating system

    I've come up with a new way to score chess games. It's a way in which a rating number can be applied, independently to each player, from just a single game in which all moves are provided. It doesn't depend on who is playing or on what their ELO ratings are. It depends on what moves are played during the game, and it is called Game Performance Rating (GPR). Well, that's what I'm calling it, and I'm the one doing it.

    Disclaimer: it is possible someone else has already thought of something like this. After all, there's a lot of move by move analysis being done these days, to detect cheaters for example. But so far I haven't heard of an actual scoring system based on move by move analysis. If any of you have, please advise.

    For a long time I've not liked the traditional chess rating system. It's just too slow. I mean, you play a game, you win, lose or draw, and your overall rating goes up or down some tiny amount depending on who you played and what his or her rating was. Although the current rating system is not bad in terms of things like tournament pairings, the question is whether "not bad" is really good enough. For example, you might have lost the game in something like 80 moves where you played excellently throughout, then blundered in the endgame. The strength of your play during the game is not captured at all. Only the result is tabulated.

    So after lots of thinking, I've figured out a system to make individual moves the granularity. And again, it doesn't depend on who you are playing or his or her current rating. When you play an individual move, it shouldn't depend on who you are playing. Maybe sometimes it does, but it shouldn't.

    Ok, so I'm not yet going to explain how this works. But I tried it on a pgn file that I happened to have, of a game between Fischer and Smyslov in 1965 in Havana, Cuba. Fischer wasn't yet at the apex of his career; he was still 7 years away from his World Championship. Smyslov was a veteran Russian player and World Champion from, I think, the 1950s; I don't know much about him. Here is the pgn of this particular game:

    [Date "1965.06.26"]
    [Result "1-0"]
    [FEN "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"]
    [White "Bobby Fischer"]
    [Black "Vassily Smyslov"]
    [Event "The Capablanca memorial"]
    [Site "Havana,Cuba"]
    [Round "2"]
    1.e4 e5 2.Nf3 Nc6 3.Bb5 a6 4.Ba4 Nf6 5.d3 d6 6.c3 Be7 7.Nbd2 O-O 8.Nf1 b5 9.Bb3 d5 10.Qe2 dxe4 11.dxe4 Be6 12.Bxe6 fxe6 13.Ng3 Qd7 14.O-O Rad8 15.a4 Qd3 16.Qxd3 Rxd3 17.axb5 axb5 18.Ra6 Rd6 19.Kh1! Nd7?! 20.Be3 Rd8?! 21.h3?! h6?! 22.Rfa1 Ndb8 23.Ra8 Rd1+ 24.Kh2 Rxa1 25.Rxa1 Nd7 26.b4! Kf7 27.Nf1 Bd6 28.g3 Nf6 29.N1d2 Ke7 30.Ra6 Nb8 31.Ra5! c6 32.Kg2 Nbd7 33.Kf1 Rc8? 34.Ne1! Ne8 35.Nd3 Nc7 36.c4! bxc4 37.Nxc4 Nb5 38.Ra6 Kf6 39.Bc1! Bb8 40.Bb2 c5 41.Nb6 Nxb6 42.Rxb6 c4 43.Nc5 c3 44.Bc1 1-0

    Based on these moves, here is my GPR assessment of the game:

    Fischer: GPR 8571
    Smyslov: GPR 4272

    As you will notice, Fischer's GPR is almost exactly twice Smyslov's. Does that mean Fischer played "twice as strong" as Smyslov? Well, if you knew the process I'm using, you would likely agree that yes, it does mean that.

    This new system gives a whole slew of numbers to the game itself, numbers much different from 1-0 or 0-1 or 1/2-1/2. These numbers are fair and unbiased to either player; any subjective calculations are applied equally to each player. Obviously you all don't know what to think because you don't know what I'm doing. I don't know if or when I'm going to disclose what I'm doing, so you can call me crazy or say I don't have a clue what I'm doing if you want. Yes, it's a faulty process, but the key is that the faults are being applied equally to each player, so the relativity of the rating numbers should be solid.

    I'm going to be posting more on this topic as time goes on. The process is time-consuming and I can't see it ever being used at a tournament site in real time, although someday that might actually be possible, perhaps for tie-breaks for example. I don't expect it to ever replace the ELO system.

    Next up... I will do a similar calculation for a game that was played at much shorter time control, and I am expecting it to show that each player, who will be super GM level, will play worse than Fischer and Smyslov did in the above game, because of the shortened time control. How much worse I don't yet know.

  • #2
    Interesting.

    I am not sure if my comment is relevant, but what about players deliberately preparing for a specific opponent, knowing his/her tendencies, weaknesses, strengths and so forth? In chess many players do not simply play the pieces, they play the psychology of the opponent, particularly at the higher levels. A blunder according to the engines might be a brilliancy according to the psychology of the opponent. And what about bluffs?

    Also, there is the old expression, "It ain't how, it's how many."

    • #3
      Originally posted by Brad Thomson
      Interesting.

      I am not sure if my comment is relevant, but what about players deliberately preparing for a specific opponent, knowing his/her tendencies, weaknesses, strengths and so forth? In chess many players do not simply play the pieces, they play the psychology of the opponent, particularly at the higher levels. A blunder according to the engines might be a brilliancy according to the psychology of the opponent. And what about bluffs?

      Also, there is the old expression, "It ain't how, it's how many."

      It should get more interesting as more results get provided.

      One thing I should mention is that openings are not rated. So preparation and opening theory don't come into the result. Now, in the middlegames and endgames, it is certainly possible that a bluff or a cheapo could be attempted (especially in faster time controls) and this would bring down the rating, even if the bluff resulted in a win. The opponent's rating would also suffer if he or she went for the bluff, of course. But that's the price of increased granularity and being more "up to the minute".

      • #4
        Ok, here's another pair of Game Performance Rating (GPR) calculations on the Rapid game whose pgn is given below:

        [Date "2021.06.27"]
        [Result "1/2-1/2"]
        [FEN "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"]
        [White "Magnus Carlsen"]
        [Black "Anish Giri"]
        [Event "Goldmoney Asian Rapid 2021"]
        [Round "7"]
        1.e4 e5 2.Nf3 Nc6 3.Bc4 Nf6 4.d3 Bc5 5.O-O d6 6.c3 O-O 7.Re1 a6 8.a4 Ba7 9.Nbd2 Ng4 10.Re2 Kh8 11.b4 f5 12.Bb3 Bd7 13.Ra2 Qe8 14.exf5 Bxf5 15.Nf1 Be6 16.b5 Bxb3 17.Qxb3 Ne7 18.d4 Ng6 19.c4 e4 20.Ng3 exf3 21.Rxe8 Raxe8 22.Qd1 Nh4 23.h3 fxg2 24.Re2 Rxe2 25.Qxe2 Nxf2 26.Be3 Nxh3+ 27.Kh2 g1=Q+ 28.Bxg1 Nf3+ 29.Kh1 Nhxg1 30.Qe7 Rg8 31.Nf5 Nxd4 32.Nh6 gxh6 33.Qf6+ Rg7 34.Qf8+ Rg8 35.Qf6+ Rg7 36.Qf8+ Rg8 37.Qf6+ 1/2-1/2

        Based on these moves, here is my rating assessment of the game:

        Carlsen Game Performance Rating - Rapid (GPR-R): 1238
        Giri Game Performance Rating - Rapid (GPR-R): 1058


        As we see, the game was drawn even though Carlsen significantly outplayed Giri. We know what a difference of 180 rating points means roughly in ELO, but we don't really know that yet for this new system... but we do know it is fairly significant.

        But what's really interesting is to compare these 2 numbers with the numbers Fischer and Smyslov put up in a normal classical time control:

        Fischer: GPR 8571
        Smyslov: GPR 4272


        Basically Carlsen played at only 1/7th the strength of Fischer at slow time control. This is what you are getting with Rapid chess! Much, much weaker play.

        I just noticed something too, these games were played 56 years and 1 day apart. I picked them randomly, so that's a bit amazing.
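
        The arithmetic quoted in this post can be checked with a short sketch. It only reproduces the numbers already given in the thread (the four ratings and the two PGN dates); it says nothing about how GPR itself is calculated, and it assumes, as the post does, that GPR and GPR-R can be compared directly. The helper name years_and_days is made up for illustration.

```python
from datetime import date

# Ratings exactly as posted in the thread.
gpr = {"Fischer": 8571, "Smyslov": 4272, "Carlsen": 1238, "Giri": 1058}

# Dates from the [Date] tags of the two PGNs.
fischer_smyslov = date(1965, 6, 26)
carlsen_giri = date(2021, 6, 27)


def years_and_days(earlier, later):
    """Whole calendar years between two dates, plus the leftover days."""
    years = later.year - earlier.year
    anniversary = earlier.replace(year=earlier.year + years)
    if anniversary > later:
        # The anniversary has not been reached yet in the final year.
        years -= 1
        anniversary = earlier.replace(year=earlier.year + years)
    return years, (later - anniversary).days


gap = years_and_days(fischer_smyslov, carlsen_giri)
carlsen_minus_giri = gpr["Carlsen"] - gpr["Giri"]
fischer_over_smyslov = gpr["Fischer"] / gpr["Smyslov"]
fischer_over_carlsen = gpr["Fischer"] / gpr["Carlsen"]

print(gap)                             # (56, 1)
print(carlsen_minus_giri)              # 180
print(round(fischer_over_smyslov, 2))  # 2.01
print(round(fischer_over_carlsen, 2))  # 6.92
```

        The last ratio is where the "1/7th" figure comes from. Whether a ratio of two GPR values really corresponds to a ratio of playing strength depends on the undisclosed method.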

        Carlsen's two weakest moves were 24.Re2 and 26.Be3. Both of these would be considered serious blunders in a slow time control. Giri was even worse, with 26...Nxh3+ instead of 26...Nd3. Truly a monumental gaffe for a super GM to play, but it's Rapid chess. Giri also had 24...Rxe2 which was much worse than ...Nxf2.

        It's interesting that they both blundered on the same moves, 24 and 26. Then Carlsen threw a win away with 31.Nf5 instead of 31.Qxc7, but this was not as big number-wise as his other 2 blunders.
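
        To look up the moves being discussed by number, the movetext can be split into White and Black moves with a few lines of code. This is only a minimal sketch for movetext written like the PGN above (move numbers attached to White's move, no comments or variations); it is not part of the GPR calculation, and the function name parse_movetext is hypothetical.

```python
import re

# Movetext copied from the Carlsen-Giri PGN above.
MOVETEXT = (
    "1.e4 e5 2.Nf3 Nc6 3.Bc4 Nf6 4.d3 Bc5 5.O-O d6 6.c3 O-O 7.Re1 a6 8.a4 Ba7 "
    "9.Nbd2 Ng4 10.Re2 Kh8 11.b4 f5 12.Bb3 Bd7 13.Ra2 Qe8 14.exf5 Bxf5 15.Nf1 Be6 "
    "16.b5 Bxb3 17.Qxb3 Ne7 18.d4 Ng6 19.c4 e4 20.Ng3 exf3 21.Rxe8 Raxe8 22.Qd1 Nh4 "
    "23.h3 fxg2 24.Re2 Rxe2 25.Qxe2 Nxf2 26.Be3 Nxh3+ 27.Kh2 g1=Q+ 28.Bxg1 Nf3+ "
    "29.Kh1 Nhxg1 30.Qe7 Rg8 31.Nf5 Nxd4 32.Nh6 gxh6 33.Qf6+ Rg7 34.Qf8+ Rg8 "
    "35.Qf6+ Rg7 36.Qf8+ Rg8 37.Qf6+ 1/2-1/2"
)

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def parse_movetext(movetext):
    """Return {move_number: [white_move, black_move_or_None]}."""
    moves = {}
    number = None
    for token in movetext.split():
        if token in RESULTS:
            break
        match = re.fullmatch(r"(\d+)\.(\S+)", token)
        if match:
            # A token such as "24.Re2": move number plus White's move.
            number = int(match.group(1))
            moves[number] = [match.group(2).rstrip("!?"), None]
        elif number is not None:
            # Any other token is Black's reply to the current move number.
            moves[number][1] = token.rstrip("!?")
    return moves


moves = parse_movetext(MOVETEXT)
for n in (24, 26, 31):
    white, black = moves[n]
    print(n, white, black)
```

        The rstrip("!?") call drops annotation marks such as the "!" and "?!" in the Fischer-Smyslov PGN, so the same function should also handle that game.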

        Well, I'll wait and see what comments this brings about if any.

        • #5
          If I may suggest, search for documentation on Dr. Ken Regan's IPR (Intrinsic Performance Rating)

          • #6
            It may be possible to argue that Tal, who played many "unsound" moves that the engines reject, would have won fewer games if he had played better moves, since his opponents would not have been baffled by the complications Tal presented by his "unsound" moves. If so, then the absolute quality of the moves is not the point, at least with respect to Tal; rather, it is the psychology of the moves and the final results, and not whether the player tends to play moves selected by engines more often than his opponents, that decide who the better player is. Chess is about winning; the player who wins the most is the best, irrespective of whether or not the engines like his/her games better than those of someone else. Engines would tell us that Tal made a huge number of mistakes, but that his opponents made even worse ones. Engines would probably also tell us that Capablanca's opponents made far fewer mistakes than the opponents of Tal, but that Capablanca made fewer mistakes still. Perhaps your ideas could be expanded by considering Tal games versus those of Capablanca? There may not be an elite player who, according to the engines, made more mistakes than Tal, or fewer than Capablanca. Thus by your system Capablanca, and in fact a vast number of players, would be rated far higher than Tal, which would be absurd. Or am I missing your point?

            • #7
              Originally posted by Brad Thomson
              It may be possible to argue that Tal, who played many "unsound" moves that the engines reject, would have won fewer games if he had played better moves, since his opponents would not have been baffled by the complications Tal presented by his "unsound" moves. If so, then the absolute quality of the moves is not the point, at least with respect to Tal; rather, it is the psychology of the moves and the final results, and not whether the player tends to play moves selected by engines more often than his opponents, that decide who the better player is. Chess is about winning; the player who wins the most is the best, irrespective of whether or not the engines like his/her games better than those of someone else. Engines would tell us that Tal made a huge number of mistakes, but that his opponents made even worse ones. Engines would probably also tell us that Capablanca's opponents made far fewer mistakes than the opponents of Tal, but that Capablanca made fewer mistakes still. Perhaps your ideas could be expanded by considering Tal games versus those of Capablanca? There may not be an elite player who, according to the engines, made more mistakes than Tal, or fewer than Capablanca. Thus by your system Capablanca, and in fact a vast number of players, would be rated far higher than Tal, which would be absurd. Or am I missing your point?
              My point is only to show that chess ratings can be more granular and up-to-the-minute than the slow ELO system. I don't argue that ELO is faulty, except for one thing: ELO says nothing about the quality of a single game.

              I like the points you are raising, and your questions are very valid to ask. Chess as a professional endeavor is indeed all about winning, and whatever method is best for that (and legal / ethical) should be used. Chess as a game, that is different. That is all about finding good moves in all situations, always assuming that your opponent is going to play the very best replies.

              If it could be shown that a Tal-like method -- playing sub-optimal moves that lead into complicated quagmires of tactics to befuddle the opponent -- could take someone all the way to the World Championship, I have to think someone out there right now would be (and maybe is?) doing it. Tal himself got to a peak ELO rating of 2705, so he was beating some really good players who were playing solid chess (because I'm assuming only Tal was trying this method?). Did Tal himself ever write anything to the effect that "you too can win using my method, just play into very complicated tactics and your opponent will melt away!" Obviously maybe not that crass, but I'm asking seriously, did he ever try to get others to try his method?

              For that matter, are there any chess coaches / trainers who are advocating this method? Again, chess as a game is about finding good moves, and I would guess that most chess teaching techniques focus on that. But I may be wrong, I am not involved at all in the teaching of chess.

              It sounds like Tal versus Capablanca would have been a great match! And yes, part of my point too is to enable comparing greats from different eras. So yes, using this method to compare Tal and Capablanca would be a fantastic idea.

              Can you provide me with some number of Tal games (pgn) that you think are the greatest examples of his method, and an equal number of Capablanca games where he plays most flawlessly? Say maybe 8 games from each player, or less but not more than 8 because this would take some time to do the ratings. I'd be very glad to get back to you.

              • #8
                Originally posted by Aris Marghetis
                If I may suggest, search for documentation on Dr. Ken Regan's IPR (Intrinsic Performance Rating)
                Hey Aris, thanks for that! Wow, I just downloaded his whitepaper.... it's going to take a while to understand it, if I even can understand it.

                My method is much simpler, but maybe I'll find it has problems that Dr. Regan had to find solutions for.

                • #9
                  Originally posted by Pargat Perrer
                  Can you provide me with some number of Tal games (pgn) that you think are the greatest examples of his method, and an equal number of Capablanca games where he plays most flawlessly? Say maybe 8 games from each player, or less but not more than 8 because this would take some time to do the ratings. I'd be very glad to get back to you.
                  I am not qualified, given my limited chess abilities. There may be others on this forum who are. It might also be interesting to look at the games of some of the Canadians. There is certainly a parallel, for example, between Tal and Capablanca, and Hergott and O'Donnell. Deen loves wild games with lots of tactical opportunities while Tom loves to grind out wins while keeping the draw in hand if possible. Thus, while they were of equal strength, Deen had more decisive results while Tom tended to draw more. We could also put Lawrence Day into the Tal/Hergott category and Bryon Nickoloff into the Capablanca/O'Donnell category. I would be interested in your assessments of these players' games.

                  • #10
                    Originally posted by Brad Thomson

                    I am not qualified, given my limited chess abilities. There may be others on this forum who are. It might also be interesting to look at the games of some of the Canadians. There is certainly a parallel, for example, between Tal and Capablanca, and Hergott and O'Donnell. Deen loves wild games with lots of tactical opportunities while Tom loves to grind out wins while keeping the draw in hand if possible. Thus, while they were of equal strength, Deen had more decisive results while Tom tended to draw more. We could also put Lawrence Day into the Tal/Hergott category and Bryon Nickoloff into the Capablanca/O'Donnell category. I would be interested in your assessments of these players' games.
                    Yes, this is indeed a very interesting thread, thanks guys. I've been thinking about this quite a bit the last couple of days, and maybe there's another angle to consider (which I think we sometimes think of as the "Euwe approach"), which is how better players (in any endeavour) seem to adjust their approaches to the opponent. A very easy sports example to grasp is if a football defence is very good against the run, then opponents will usually look to prepare more passing plays. Of course, these adjustments can be taken down to lower and lower levels. I don't know for sure, but I suspect that Tal might not play all the same key moves against a Botvinnik as against a Bronstein? It would be interesting to see what at least a 2400 thinks of this discussion. Anyhow, good stuff, thanks.

                    • #11
                      No matter what system is used, my rating sucks and is only getting worse. LOL

                      • #12
                        Originally posted by Brad Thomson
                        It may be possible to argue that Tal, who played many "unsound" moves that the engines reject, would have won fewer games if he had played better moves, since his opponents would not have been baffled by the complications Tal presented by his "unsound" moves. If so, then the absolute quality of the moves is not the point, at least with respect to Tal; rather, it is the psychology of the moves and the final results, and not whether the player tends to play moves selected by engines more often than his opponents, that decide who the better player is. Chess is about winning; the player who wins the most is the best, irrespective of whether or not the engines like his/her games better than those of someone else. Engines would tell us that Tal made a huge number of mistakes, but that his opponents made even worse ones. Engines would probably also tell us that Capablanca's opponents made far fewer mistakes than the opponents of Tal, but that Capablanca made fewer mistakes still. Perhaps your ideas could be expanded by considering Tal games versus those of Capablanca? There may not be an elite player who, according to the engines, made more mistakes than Tal, or fewer than Capablanca. Thus by your system Capablanca, and in fact a vast number of players, would be rated far higher than Tal, which would be absurd. Or am I missing your point?
                        I decided this morning to do a bit more research into this. I'm interested in Capablanca because he and Emanuel Lasker were the only World Champions that I know of who dabbled in and even created chess variants.

                        I found this on the Wikipedia page for Capablanca:

                        Statistical ranking systems place Capablanca high among the greatest players of all time. Nathan Divinsky and Raymond Keene's book "Warriors of the Mind" (1989) ranks him fifth, behind Garry Kasparov, Anatoly Karpov, Bobby Fischer and Mikhail Botvinnik—and immediately ahead of Emanuel Lasker. In his 1978 book The Rating of Chessplayers, Past and Present, Arpad Elo gave retrospective ratings to players based on their performance over the best five-year span of their career. He concluded that Capablanca was the strongest of those surveyed, with Lasker and Botvinnik sharing second place. Chessmetrics (2005) is rather sensitive to the length of the periods being compared, and ranks Capablanca between third and fourth strongest of all time for peak periods ranging in length from one to 15 years. Its author, the statistician Jeff Sonas, concluded that Capablanca had more years in the top three than anyone except Lasker, Karpov and Kasparov—though Alekhine had more years in the top two positions. A 2006 study found that Capablanca was the most accurate of all the World Champions when compared with computer analysis of World Championship match games, but this analysis was criticized for using a second-rank chess program, Crafty, modified to limit its calculations to six moves by each side, and for favoring players whose style matched that of the program. However a 2011 computer analysis by Bratko and Guid using the stronger engines Rybka 2 and Rybka 3 found similar results to the 2006 Crafty analysis for Capablanca.

                        So it looks like Capablanca is in fact measured by an engine technique to be the most accurate World Champion of all time (probably not including Carlsen).

                        So I guess I need to ask, why would you consider Capablanca being rated far higher than Tal to be absurd? I know you also added "and in fact a vast number of players" but that would need to be specified more accurately before considering it.


                        • #13
                          Originally posted by Pargat Perrer
                          So I guess I need to ask, why would you consider Capablanca being rated far higher than Tal to be absurd? I know you also added "and in fact a vast number of players" but that would need to be specified more accurately before considering it.
                          It seems to me that if we assess the players based upon the quality of their moves insofar as engines appraise them, then Tal ranks much lower than many players, since so many of his moves were "unsound". But the point is that the moves worked, he was a world champion. As I said earlier, it ain't how it's how many. I think wins versus losses rather than engine appraisals is the correct way to assess chess players' strengths.

                          • #14
                            Originally posted by Brad Thomson

                            It seems to me that if we assess the players based upon the quality of their moves insofar as engines appraise them, then Tal ranks much lower than many players, since so many of his moves were "unsound". But the point is that the moves worked, he was a world champion. As I said earlier, it ain't how it's how many. I think wins versus losses rather than engine appraisals is the correct way to assess chess players' strengths.
                            So here's a thought experiment: let's say we find someone who has some hideous disfiguration, and he plays chess events, and everyone loses to him because they are so overwhelmed by his appearance (and maybe noises) that they just can't focus on their chess at all and play terrible moves. No matter how resolute his opponents are before the games, they just break down during the games and make multiple blunders. He sets a new record for consecutive wins in FIDE events, and achieves a new record ELO rating. Does that make him the greatest player ever?

                            Now, I'm not trying to be ridiculous. In fact, I like what you are arguing for, and it's interesting because AlphaZero has shown that SOME moves that most engines would consider substandard actually have more value than they are being given credit for. Exactly what types of moves and when to play them is something human players are anxious to discover for themselves. So there is an argument that the minimax chess engines are not getting it all correct. And the most interesting thing is, AZ is showing this AGAINST the minimax chess engines, which are immune to any psychological influence! This fascinates me, because it indicates there is still more to chess than the minimax engines are telling us.

                            But my disfigured player example is to show that psychological influences can be deceiving. Tal had an overall losing record against many of his contemporaries, as his Wikipedia page shows. The players who had a winning record against Tal are marked with an asterisk (*):

                            Mikhail Tal -- Career Score Versus Some Major Grandmasters

                            Only official tournament or match games have been taken into account. '+' corresponds to Tal's wins, '−' to his losses and '=' to draws.

                            Mikhail Botvinnik: +12−12=20
                            David Bronstein: +12−8=19
                            Bobby Fischer: +4−2=5
                            Efim Geller: +6−6=23
                            Anatoly Karpov*: +0−1=19
                            Garry Kasparov*: +1−2=9
                            Paul Keres*: +4−8=20
                            Viktor Korchnoi*: +4−13=27
                            Bent Larsen: +12−7=18
                            Tigran Petrosian*: +4−5=35
                            Lev Polugaevsky*: +2−8=22
                            Lajos Portisch: +9−5=18
                            Vasily Smyslov*: +3−4=21
                            Boris Spassky*: +6−9=27
                            Leonid Stein*: +0−3=15
                            Miguel Najdorf: +3−1=5
                            Pal Benko: +8−1=3
                            Wolfgang Uhlmann: +4−0=3
                            Borislav Ivkov: +3−1=11
                            Svetozar Gligoric: +10−2=22

                            Total: +107−98=342 (wow! 342 draws!)

                            His overall record against these players is just a tad over 50%. I think Tal's unsound moves must have cost him almost as many points as they earned him. The fact that he won a World Championship is indicating that he did have great overall chess talent, and for one brief period it all came together for him.
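
                            The totals and the "tad over 50%" figure can be reproduced from the table with a short sketch. It uses only the win/loss/draw counts listed above and the usual convention of 1 point for a win and half a point for a draw; the variable and function names are made up for illustration.

```python
# Tal's career scores from the table above: (wins, losses, draws) per opponent.
records = {
    "Botvinnik": (12, 12, 20),
    "Bronstein": (12, 8, 19),
    "Fischer": (4, 2, 5),
    "Geller": (6, 6, 23),
    "Karpov": (0, 1, 19),
    "Kasparov": (1, 2, 9),
    "Keres": (4, 8, 20),
    "Korchnoi": (4, 13, 27),
    "Larsen": (12, 7, 18),
    "Petrosian": (4, 5, 35),
    "Polugaevsky": (2, 8, 22),
    "Portisch": (9, 5, 18),
    "Smyslov": (3, 4, 21),
    "Spassky": (6, 9, 27),
    "Stein": (0, 3, 15),
    "Najdorf": (3, 1, 5),
    "Benko": (8, 1, 3),
    "Uhlmann": (4, 0, 3),
    "Ivkov": (3, 1, 11),
    "Gligoric": (10, 2, 22),
}


def score_percentage(wins, losses, draws):
    """Percentage score with 1 point per win and 0.5 per draw."""
    games = wins + losses + draws
    return 100.0 * (wins + 0.5 * draws) / games


wins = sum(w for w, _, _ in records.values())
losses = sum(l for _, l, _ in records.values())
draws = sum(d for _, _, d in records.values())
pct = score_percentage(wins, losses, draws)

# Opponents who beat Tal more often than he beat them.
ahead_of_tal = [name for name, (w, l, _) in records.items() if l > w]

print(wins, losses, draws)  # 107 98 342
print(round(pct, 1))        # 50.8
print(ahead_of_tal)
```

                            That works out to 278 points from 547 games, with 9 of the 20 opponents holding a plus score against Tal.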

                            Based on this, I don't think it's absurd at all that Capablanca should be rated higher than Tal. But I am NOT disputing your overall point, that wins and losses should be the correct way to assess chess players' strengths, provided that the relative strength of the opponents is factored in, which with ELO it is.

                            So.... I'm not arguing against ELO. I'm only arguing that we could use a parallel rating system that tells us about individual games. It can even tell us about parts of individual games (except for openings). You can rate your middlegame strength, you can rate your rook-and-pawn endgame strength, you can rate your minor-pieces-and-pawns endgame strength, and anything else you want to rate. You can even rate your games against specific openings; it's just that the play DURING the openings isn't rated.

                            And it is independent of who you are playing, it depends only on what YOU do. The only catch is that at this point in time, we don't yet know what the absolute numbers mean. We need lots of rated games for that.

                            • #15
                              Your thought experiment is extreme, but not completely nonsensical. We know that Ruy Lopez suggested playing with the sun at your back so that it is in your opponent's eyes, and we know too of Lasker's terrible, stinking cigars that could not have helped his opponents to play their best. Now, of course we could "judge" a game by the extent to which one's moves coincided with those suggested by the top engines and "rate" it accordingly. But I think this would lead to many draws being rated higher than wins, insofar as the wins were provoked by "unsound" moves that introduced complications the opponent was unable to work through, while a game without a single "unsound" move will lead to a draw if the opponent plays just as well. If we are going to rate draws higher than wins then there must be something wrong with the system.
