Reasons to believe there's FIDE rating "inflation"?


  • #31
    Re: Reasons to believe there's FIDE rating "inflation"?

    Originally posted by Patrick Kirby View Post
    I tend to agree with Tom's point of view: a rating is just an estimate of strength relative to a certain pool.
    It is that, but I wouldn't say it's just that and nothing more.

    Tom was right that ratings are not intended to be used to compare players of different eras: they're designed (and work well) as predictors of the outcome of games between rated players. But that doesn't mean that they can't be used for a purpose they were not intended for. Addressing the question of "inflation" is a step toward understanding just what we can validly infer from the rating statistics.

    Originally posted by Patrick Kirby View Post
    A specific rating can be inflated or deflated relative to other players playing in the same pool, but ratings as a whole can't be deflated or inflated, since a rating is by definition a relative, not absolute, metric.
    I thought so a few weeks ago, but here's what changed my mind:
    Imagine we add (or subtract) 500 rating points from everyone's rating right now. Since rating predictions work based on the difference between the ratings of the two players, they would predict just as well as they do now, but wouldn't it make sense to talk about the ratings as a whole being "inflated" or "deflated" compared to their values before we made the 500 point change?
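A quick sketch of the point about differences, assuming the usual logistic Elo expectation with a 400-point scale (the post doesn't name a formula, so that part is an assumption). The two ratings are hypothetical, picked only for illustration:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Logistic Elo expectation for player A against player B (assumed formula)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Hypothetical ratings, not taken from any real list.
a, b = 2700.0, 2500.0
shift = 500.0  # the "add 500 to everyone" thought experiment

before = expected_score(a, b)
after = expected_score(a + shift, b + shift)

# Only the difference (a - b) enters the formula, so the shift cancels out.
print(f"before shift: {before:.4f}")
print(f"after shift:  {after:.4f}")
```

Both lines print the same expectation, which is the sense in which the shifted ratings "predict just as well" even though every number on the list has moved.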


    Originally posted by Patrick Kirby View Post
    For the same reason there's no basis to compare the ratings of players from different eras since they competed against different pools of opposition.
    Again, I don't know why that would be true.
    [possibly not-very-helpful analogy: that's a bit like saying we can't compare strands at different ends of a rope because they're so far apart. True, but there's a lot of overlap in the connections between one end and the other.]
    It's true that Capablanca's and Fischer's opponents were all different from Carlsen's, but every year between those eras there was a lot of overlap in the pools of opponents, with (mostly) the same players competing against each other. So while a direct comparison is impossible (Carlsen can't play Capa), an indirect one may be.



    • #32
      Re: Reasons to believe there's FIDE rating "inflation"?

      Originally posted by John Upper View Post

      I thought so a few weeks ago, but here's what changed my mind:
      Imagine we add (or subtract) 500 rating points from everyone's rating right now. Since rating predictions work based on the difference between the ratings of the two players, they would predict just as well as they do now, but wouldn't it make sense to talk about the ratings as a whole being "inflated" or "deflated" compared to their values before we made the 500 point change?
      Yes, but only if we used these ratings to assign titles like Master or Grandmaster based on numerical performance using benchmarks determined before the addition (subtraction) of the points.

      Maybe better is to simply assign the highest rated player in the world a rating of 3000 and compute everyone's rating based on a comparison with their rating.

      So, for example, Carlsen is 3000, Kramnik and Aronian are about 2940, etc. When someone overtakes Carlsen then they become 3000 and Carlsen drops to whatever number his new rating would represent vs that person's number.

      In order to make a GM norm you would need to perform at say 2700, or exactly 300 points below the rating of the currently highest-rated player on the planet.
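A minimal sketch of that re-anchoring idea, assuming it amounts to shifting the whole list by one constant so the top player sits at 3000. The function names and the three sample ratings are hypothetical; the 300-point gap is the "say 2700" figure from above:

```python
from __future__ import annotations


def reanchor(ratings: dict[str, float], top_value: float = 3000.0) -> dict[str, float]:
    """Shift every rating by the same offset so the highest one equals top_value."""
    offset = top_value - max(ratings.values())
    return {name: rating + offset for name, rating in ratings.items()}


def gm_norm_threshold(top_value: float = 3000.0, gap: float = 300.0) -> float:
    """Performance level needed for a norm: a fixed gap below the top player."""
    return top_value - gap


# Hypothetical list, for illustration only.
current = {"Player A": 2870.0, "Player B": 2810.0, "Player C": 2600.0}
anchored = reanchor(current)

print(anchored)
print(gm_norm_threshold())
```

Because every rating moves by the same offset, the differences between players (and so the predictions) are untouched; only the titles' benchmark moves with the top player.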
      "Tom is a well known racist, and like most of them he won't admit it, possibly even to himself." - Ed Seedhouse, October 4, 2020.



      • #33
        Re: Reasons to believe there's FIDE rating "inflation"?

        Originally posted by Tom O'Donnell View Post
        Maybe better is to simply assign the highest rated player in the world a rating of 3000 and compute everyone's rating based on a comparison with their rating.

        So, for example, Carlsen is 3000, Kramnik and Aronian are about 2940, etc. When someone overtakes Carlsen then they become 3000 and Carlsen drops to whatever number his new rating would represent vs that person's number.
        Don't we already have something like that now: the ranking list?

        Lots of squash clubs have something like that -- a club ladder -- but they allow you to sort players only when they play head-to-head. They don't give any way to tell you e.g. how many wins against player #5 earns you the right to become player #3.

        An advantage of having a rating list which is computed independently of the chess rankings is that it allows us to make three and four-way comparisons between people who have never played each other.

        Originally posted by Tom O'Donnell View Post
        In order to make a GM norm you would need to perform at say 2700, or exactly 300 points below the rating of the currently highest-rated player on the planet.
        If I'm right, and the level of play is constantly going up, then this would make it harder (and eventually much harder) to get the GM title in the future than it is now. (though that might not be a bad thing)
        Last edited by John Upper; Thursday, 7th February, 2013, 12:43 AM.



        • #34
          Re: Reasons to believe there's FIDE rating "inflation"?

          Originally posted by John Upper View Post
          Tom was right that ratings are not intended to be used to compare players of different eras: they're designed (and work well) as predictors of the outcome of games between rated players. But that doesn't mean that they can't be used for a purpose they were not intended for. Addressing the question of "inflation" is a step toward understanding just what we can validly infer from the rating statistics.

          It's true that Capablanca's and Fischer's opponents were all different from Carlsen's, but every year between those eras there was a lot of overlap in the pools of opponents, with (mostly) the same players competing against each other. So while a direct comparison is impossible (Carlsen can't play Capa), an indirect one may be.

          Originally posted by John Upper View Post
          Lots of squash clubs have something like that -- a club ladder -- but they allow you to sort players only when they play head-to-head. They don't give any way to tell you e.g. how many wins against player #5 earns you the right to become player #3.

          An advantage of having a rating list which is computed independently of the chess rankings is that it allows us to make three and four-way comparisons between people who have never played each other.
          Not sure if you saw my earlier post in this thread John, but I'll repeat my question so that you can try and explain WHY you want to "validly infer" whether Carlsen is better than Capa or vice-versa:

          Would you waste time trying to determine whether Paul Beckwith or Paul Morphy is the better career sub-2400 player? (according to another poster who claims Morphy was 2344).

          The point of the question is in my other post, that comparing Carlsen to Capa in this age of computer engines being regularly rated at 40/120 time controls well over 3000, even over 3100, is like comparing Beckwith to Morphy.

          WHY DO IT? And just how "valid" can it be -- it's all just conjecture, no?
          Only the rushing is heard...
          Onward flies the bird.



          • #35
            Re: Reasons to believe there's FIDE rating "inflation"?

            Originally posted by Paul Bonham View Post
            Not sure if you saw my earlier post in this thread John, but I'll repeat my question so that you can try and explain WHY you want to "validly infer" whether Carlsen is better than Capa or vice-versa:

            Would you waste time trying to determine whether Paul Beckwith or Paul Morphy is the better career sub-2400 player? (according to another poster who claims Morphy was 2344).

            The point of the question is in my other post, that comparing Carlsen to Capa in this age of computer engines being regularly rated at 40/120 time controls well over 3000, even over 3100, is like comparing Beckwith to Morphy.
            I saw it.

            I didn't respond to it because:
            1) I was put off by its irrelevant trolling of Paul Beckwith and Mate Milinkovic. (do you do that in all your posts?)
            2) saying it's "silly" or a "waste of time" to discuss what we can infer from ratings is really just a way of saying that you're not interested in this subject and want to discuss something else. That's fine, but it's why the forum has a "New Thread" button.

            Originally posted by Paul Bonham View Post
            It's all a little silly because we have much greater players now who are head and shoulders above any human player. The search for chess perfection or the nearest thing to it should lead to the games of Houdini or Shredder or Stockfish or a dozen other chess engines.
            Of course computers are stronger than the best chess players. So?

            This thread isn't about the "search for chess perfection", it's about the significance of ratings and what we can infer from them. That's a specific case of a more general problem: the attempt to quantify quality, which is a fundamental first step in turning any topic into a science.

            I don't see how the existence of super-strong computers makes this silly or less interesting.

            Originally posted by Paul Bonham View Post
            And just how "valid" can it be -- it's all just conjecture, no?
            No. Statistics and logic can show us that some of our conventional ideas and jaded clichés are wrong.



            • #36
              Re: Reasons to believe there's FIDE rating "inflation"?

              Originally posted by John Upper View Post

              If I'm right, and the level of play is constantly going up, then this would make it harder (and eventually much harder) to get the GM title in the future than it is now. (though that might not be a bad thing)
              It's supposed to be harder, that's the point of my recommendation. Alternatively, you could assign player number 300 in the world a rating of say 2600 and base all calculations for norms, etc. around that player's rating.

              As the communal level of play gets better, in order to distance yourself from the pack you have to get (or stay) even better. It makes no sense to say that 2200 represented "Mastery" back in, say, 1960 (or whatever benchmark date you want to use) and then say that a person in 2013 is a Master based on that benchmark.

              Again, if Mr. X is improving at a rate of one point per year based on the measure of the "old him" vs the "new him" but the world is getting better on average by two points then that person's rating should be going down, not up. He may be improving but against the pool he's getting worse.
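The arithmetic behind the Mr. X example, as a rough sketch. It assumes a rating tracks strength relative to the pool average and that both improvement rates are constant; the function name is made up for illustration:

```python
def relative_rating_change(own_gain_per_year: float,
                           pool_gain_per_year: float,
                           years: int) -> float:
    """Change in rating relative to the pool, under the linear assumption above."""
    return (own_gain_per_year - pool_gain_per_year) * years


# Mr. X improves by 1 point a year while the pool improves by 2.
for years in (1, 5, 10):
    print(years, relative_rating_change(1.0, 2.0, years))
```

The result is negative for any positive number of years: he is stronger than his old self, yet his rating against the pool should drift down by one point a year.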

              That's why I was laughing my head off a few years ago when I suggested that CFC ratings were becoming inflated and a bunch of people logged on here to tell me that they were improving because they were studying, etc. Ya, well so is everyone else. You could still be improving and it would be correct for your rating to be dropping.
              "Tom is a well known racist, and like most of them he won't admit it, possibly even to himself." - Ed Seedhouse, October 4, 2020.



              • #37
                Re: Reasons to believe there's FIDE rating "inflation"?

                If the current ratings of active FIDE players are plotted as a histogram (number of players vs. rating), they ONLY approximate a Gaussian distribution (the lower plot).


                (bin width: 25 rating points)

                An interesting observation from the Gaussian fit: it predicts more players rated above 2800 than there actually are LOL (this is quite visible in a log-scaled plot)

                The top plot is of ALL players in the FIDE rating list (active and inactive). There is a strange jump at around 2000, probably a result of the lowering of the rating floor.
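For anyone who wants to redo this kind of plot, here is a rough sketch of the two calculations involved: binning ratings into 25-point bins, and asking a Gaussian how many players it "expects" above some threshold. The sample ratings, the mean, the standard deviation and the player count are all hypothetical placeholders, not values from the FIDE list or from the fit above:

```python
import math
from collections import Counter
from statistics import NormalDist


def bin_counts(ratings, width=25):
    """Count ratings per bin, keyed by the bin's left edge."""
    return Counter(math.floor(r / width) * width for r in ratings)


def expected_above(n_players, mean, sd, threshold):
    """Number of players a Gaussian with this mean/sd predicts above threshold."""
    return n_players * (1.0 - NormalDist(mean, sd).cdf(threshold))


# Tiny made-up sample, just to show the binning.
sample = [2000, 2010, 2024, 2025, 2049]
print(sorted(bin_counts(sample).items()))

# Placeholder fit parameters; substitute the values from an actual fit.
print(expected_above(n_players=100_000, mean=2000.0, sd=250.0, threshold=2800.0))
```

Comparing `expected_above(...)` with the actual head count above the threshold is one way to put a number on how far the real tail departs from the fitted curve.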



                • #38
                  Re: Reasons to believe there's FIDE rating "inflation"?

                  Originally posted by Tom O'Donnell View Post
                  That's why I was laughing my head off a few years ago when I suggested that CFC ratings were becoming inflated and a bunch of people logged on here to tell me that they were improving because they were studying, etc. Ya, well so is everyone else. You could still be improving and it would be correct for your rating to be dropping.
                  I think when I suggested that CFC ratings were becoming highly inflated several months ago, I may have gotten the same response. Chess people sure love their high ratings.
                  Shameless self-promotion on display here
                  http://www.youtube.com/user/Barkyducky?feature=mhee



                  • #39
                    Re: Reasons to believe there's FIDE rating "inflation"?

                    Originally posted by Brad Thomson View Post
                    If FIDE ratings are neither inflated nor deflated, does this necessarily mean that anyone today with a higher rating than Fischer's highest ever rating is therefore a better player than Fischer ever was?
                    Some thoughts on what (sometimes) makes a better player, and whether a rating system (ELO or similar) or top notch engine can really get to the heart of a player to player strength comparison.

                    There are aspects of strength involving rising to (or surpassing) the level of the [available] competition. Depends on the actual level of competition.

                    And in outfoxing/outplaying an opponent by better preparation, steering of openings and positions, move by move, into waters better navigated by self than opponent, understanding of opponent weaknesses and taking advantage of any achilles heels, subtle though they may be. Including psychological weaknesses, if such may be discerned.

                    We don't know if Fischer would have risen to (or surpassed) the strength of Karpov, Kasparov, or Carlsen, had he been able to play contemporaneously with them, study their games, use their same computer tools & databases, and meet them on level conditions.

                    He had a relatively shorter career. His development ended a bit early. How he might have progressed is a mystery.

                    Player styles are different. Some take more risks. Some play more solidly (larger chance of not losing, lesser chance of achieving a win).

                    Some may have the knack of understanding their opponent enough to be able to vary their own style (move choices & strategic or tactical plans) to most discomfit the opponent, at the most appropriate times during a game or match.

                    If chess is in fact a draw (perfectly played), a player seeking maximal results vs another human must (of necessity) play into positions where the opponent is relatively more likely to make a mistake [defined as a move that changes the position from perfect (drawable by best moves) to imperfect (loss if opponent plays best moves)], and be able to play subsequent positions relatively more accurately, in order to convert the win after such a mistake.

                    How do you measure such strength/skill/knack, i.e. the ability to conquer any opponent you face, by variously utilizing 1) objectively best moves, 2) measured risk moves (to unbalance an even position enough that the opponent is likely to go wrong), 3) deep bluffing moves (when you realize you may be lost unless you take extra risk or introduce extra complications), 4) deep trap moves that may be easy for an opponent to overlook?

                    Using best available chess engines to measure average error in the recorded games of chess giants seems to be flawed if chess is indeed a draw (perfectly played). Because the engines do not know (for sure) when an evaluation advantage (fraction of a pawn or whatever) has crossed from the theoretical draw into a theoretical win/loss.

                    Hypothetically, if chess is indeed a draw (perfectly played), the average error for chess champions should be based on how many times they played a move that is an actual mistake--that actually lowers the game theoretical value (win to draw, or draw to loss).

                    Making moves of one or two-tenths of a pawn error (per Houdini or Rybka or Stockfish evaluation) may not in fact be an error, though their calculations may be the best we have to rely on.

                    Opening positions & middlegame positions, with exponentially possible game trees thereafter, are generally beyond current computing ability to prove such engine flagged 'errors' have indeed changed the theoretical expectancy value (1, 1/2, 0) of the game. Same, perhaps, for more complex endgames.
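To make the distinction concrete, a purely hypothetical sketch: it pretends we had an oracle giving the game-theoretic value (1, 1/2, 0) before and after each move, which as noted is beyond current computing for most positions. All numbers below are invented, and the 0.1-pawn threshold is just the "one or two-tenths of a pawn" figure used for illustration:

```python
def count_true_mistakes(values_before_after):
    """Count moves that lower the game-theoretic value for the player moving.

    Each item is (value_before, value_after) from the mover's point of view,
    with values in {1.0, 0.5, 0.0}.
    """
    return sum(1 for before, after in values_before_after if after < before)


def count_engine_flagged(eval_losses, threshold=0.1):
    """Count moves whose engine evaluation loss (in pawns) reaches the threshold."""
    return sum(1 for loss in eval_losses if loss >= threshold)


# Invented four-move fragment: only the third move turns a draw into a loss.
oracle_values = [(0.5, 0.5), (0.5, 0.5), (0.5, 0.0), (0.0, 0.0)]
engine_losses = [0.15, 0.0, 0.9, 0.2]

print(count_true_mistakes(oracle_values))
print(count_engine_flagged(engine_losses))
```

In this made-up fragment the engine flags three moves while only one actually changes the theoretical result, which is the gap between "average error" and "real mistakes" being described.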

                    Karpov may have played the white side of a Caro-Kann better than any Fischer could have played the black side (for talking purposes here), but Fischer may have been able to plan for that by adopting a Sicilian defense where he may have understood he had better chances to Win (else Draw).

                    Fischer's (or any other truly great player's) genius might in part have been (just illustrating my ideas here), in part, the ability to outplay a Karpov or a Kasparov or a Carlsen by steering games into positions Fischer had investigated and/or understood better than they themselves. To attempt to hoist them by their own petards, as it were.

                    Same for queen pawn openings, flank openings, etc, where the better player may be better able to happily & successfully go.

                    So how does the rating difference account for predictions of individual match-up results (Elo not truly intended for such), or the genius/knack of rising to the occasion, no matter who the opponent, by better preparation of openings, and better understanding of self & opponent abilities & weaknesses (including psychological, if any), and relatively better handling of the opening/middlegame/endgame situations to maximize one's own strengths and maximize the pressure on the opponent's weaknesses?

                    And how does an engine-provided series of eval differences (preferred engine moves vs. human-chosen moves) over many games truly represent the real mistake measure from a game-theoretical value (1, 1/2, 0) perspective?

                    Over the years (even now) we've seen engines that don't know when to mix it up, when to complicate, when to bluff, when to take a knowing risk to discomfit the opponent, to try to maximize the human chances of going wrong.

                    They don't always play openings well on their own, needing a fixed book to keep them out of trouble in the openings.

                    They don't always sac a pawn (give up material) when it may lead to overwhelming difficulties for the opponent. Their intuition (hah!) or calculation or tree pruning algorithms are unable to determine the appropriate moment to tighten the screws against an all too human opponent.

                    Sure they play strongly, sure they can/may win a huge preponderance of the time, but they do not exhibit the knack of genius, the understanding of human strengths & weaknesses, the ability to adjust to try to ensure a draw (to hold a match, win a tourney) or take necessary risks to imbalance and win ... when warranted by a game or match or tourney situation.

                    How can we take such aspects into account in our theoretical discussions and engine assisted historical analyses of games and moves and results?

                    Food for thought.



                    • #40
                      Re: Reasons to believe there's FIDE rating "inflation"?

                      Originally posted by John Upper View Post
                      I saw it.

                      I didn't respond to it because:
                      1) I was put off by its irrelevant trolling of Paul Beckwith and Mate Milinkovic. (do you do that in all your posts?)
                      So you can compare Carlsen to Capablanca, and that isn't "trolling" Carlsen, but when I do the same, comparing Beckwith to Morphy, I'm trolling Beckwith? And if you look at my post, you'll see that that is ALL I was doing, using Beckwith as an example of a career sub-2400 player (which is a matter of public record and was done with no disrespect).

                      And when I needed an example of someone who is concerned about 400-point rating gaps between players, there was no better example than Mate Milinkovic, who launched a thread on that topic. How is that trolling him?

                      Try again, you'll have to do much better than that. So far on the logic and inference front, you're 0 for 1.



                      Originally posted by John Upper View Post
                      2) saying it's "silly" or a "waste of time" to discuss what we can infer from ratings is really just a way of saying that you're not interested in this subject and want to discuss something else. That's fine, but it's why the forum has a "New Thread" button.
                      I most definitely AM interested in this subject, and that is why I am asking you for an explanation of WHY you pursue this wild goose chase of trying to "validly infer" that Carlsen is better than Capablanca ever was or vice-versa.

                      Now you are 0 for 2.


                      Originally posted by John Upper View Post
                      Of course computers are stronger than the best chess players. So?

                      This thread isn't about the "search for chess perfection", it's about the significance of ratings and what we can infer from them. That's a specific case of a more general problem: the attempt to quantify quality, which is a fundamental first step in turning any topic into a science.
                      You should be VERY CAREFUL when talking about any more general problem of quantifying quality, when you've already said that the original purpose of chess ratings "doesn't mean that they (chess ratings) can't be used for a purpose they were not intended for".

                      Can you think of any historical figures who might have wanted to create such a science, and who tried to do it using some kind of statistics for purposes they weren't intended for?

                      I am sure you have no greater interest in this general problem other than the specific case you mention (comparing players of different generations). If you somehow "solved" this wider general problem and you or someone else used your solution to help create a science, what would you think if your solution or science were credited with starting a prejudice against a whole group of people? There are already people out there who think chess ratings are indicative of more than just chess ability. You should let such sleeping dogs lie.

                      Now maybe you understand why I ask: WHY do you want to try and "validly infer" that Carlsen is better than Capablanca (or vice-versa)?




                      Originally posted by John Upper View Post
                      No. Statistics and logic can show us that some of our conventional ideas and jaded clichés are wrong.
                      Again you speak in general terms and try and glorify statistics and logic. I asked how valid a comparison between Carlsen and Capablanca can be and you insist that statistics and logic can (apparently) give us a definitive answer to this question. I defy you to produce such a definitive answer in a form that cannot be either outright refuted or at least reduced to a hypothesis. Be forewarned I am familiar with mathematical induction, although I don't think it would ever come to that. The very fact that you've mentioned some kind of intergenerational overlap of pools of players used in ratings calculations tells me that you're on a very slippery slope.

                      You've already falsely accused me of trolling and falsely stated I'm not interested in the subject of this thread. Now you're getting deeper, indicating that there are jaded cliches we all have that are wrong. Normally that would be a rather safe and accepted thing to say, but given everything else, you're painting a picture that's looking darker and darker.

                      I think you should give up on this thread while you are only behind.
                      Only the rushing is heard...
                      Onward flies the bird.



                      • #41
                        Re: Reasons to believe there's FIDE rating "inflation"?

                        To reference a few items across multiple replies in this thread, first the "2344" for Morphy is the "Intrinsic Performance Rating" (IPR) computed in my compendium for what Tim Krabbé identified as his 59 most important games. (I have not updated the compendium in a while, pending a large upgrade to my model, which last month's alleged-cheating developments delayed further.)

                        Fischer's IPRs were 2920, 2830, and 2970 for his three Candidates' Matches in 1971. That says today's top crowd would still have been the underdogs in those matches, though they might have prevailed against the nervous inaccuracies shown in the WC match. And regarding how he'd do if transported to today with ChessBase and databases and current theoretical knowledge, the best indicator may be the 2725 IPR he put up in the 1992 match with Spassky. Not bad for being rusty!



                        • #42
                          Re: Reasons to believe there's FIDE rating "inflation"?

                          Originally posted by Egidijus Zeromskis View Post
                          Code:
                          http://chesstalk.info/forum/picture.php?albumid=36&pictureid=464
                          I didn't see this code until I hit "quote" to respond.
                          It looks like you tried to attach an image to your post, but it doesn't appear on my browser (Chrome), and when I cut-and-paste the URL I get nothing.
                          Is there another way to link to it?

                          edit: never mind, I found it by navigating through your pictures album, here:

                          http://www.chesstalk.info/forum/albu...&pictureid=464


                          Originally posted by Egidijus Zeromskis View Post
                          An interesting observation from the Gaussian fit: it predicts more players rated above 2800 than there actually are LOL
                          Isn't that what we'd expect if there is deflation?
                          Last edited by John Upper; Saturday, 9th February, 2013, 05:58 PM. Reason: found image



                          • #43
                            Re: Reasons to believe there's FIDE rating "inflation"?

                            Isn't that what we'd expect if there is deflation?

                            I don't know why the picture does not show up. During preview it was ok.


                            I think that the real distribution is NOT Gaussian. It falls off faster (fewer players with higher ratings).
                            I made similar pictures several years ago, though I cannot find them.


                            Last edited by Egidijus Zeromskis; Saturday, 9th February, 2013, 08:58 PM.



                            • #44
                              Re: Reasons to believe there's FIDE rating "inflation"?

                              Originally posted by Egidijus Zeromskis View Post
                              I think that the real distribution is NOT Gaussian. It falls off faster (fewer players with higher ratings).
                              Could that be because of a different K value for higher-rated players, making each rating point harder to earn?



                              • #45
                                Re: Reasons to believe there's FIDE rating "inflation"?

                                Originally posted by John Upper View Post
                                Could that be because of a different K value for higher-rated players, making each rating point harder to earn?
                                It starts at 2400, as I remember. However, it takes much more effort to climb up.
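A small sketch of what a K value does, assuming the standard Elo update (new = old + K * (score - expected)) with the usual 400-point logistic expectation. The K values of 20 and 10 and the switch at 2400 are illustrative assumptions here, not a statement of the exact FIDE regulations:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Logistic Elo expectation for player A against player B (assumed formula)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def updated_rating(rating: float, opponent: float, score: float, k: float) -> float:
    """One-game Elo update: score is 1.0 for a win, 0.5 for a draw, 0.0 for a loss."""
    return rating + k * (score - expected_score(rating, opponent))


# Same result (a win against an equal-rated opponent), two assumed K values.
print(updated_rating(2300.0, 2300.0, 1.0, k=20.0))  # below the assumed 2400 cutoff
print(updated_rating(2500.0, 2500.0, 1.0, k=10.0))  # above it
```

With the smaller K the same win is worth half as many points, so each point is "harder to earn". Note that a loss also costs half as much, so on this sketch alone a lower K slows movement in both directions rather than pushing ratings down.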

