CFC Ratings Auditor


  • #31
    Originally posted by Tony Li View Post

    I am not in favor of a lifetime rating floor, but a reduced k-factor for seniors seems reasonable.

    (and earlier:)

    3. Introduce a 12-month, 100-point rating floor when OTB reopens. Since a 100 point decline in playing strength is unlikely over a 1 year period, this should provide stable players protection against extremely underrated players when OTB returns. This will be a temporary measure to fight rating deflation.
    I'm also not in favour of a lifetime rating floor, but I would like to make the decline more gradual. It could be a 100-point rating floor per year, but at some point the senior will be overrated. The senior likely doesn't care about winning C or D-class prizes when they used to be A-class or Expert. They don't play chess to win money, and often willingly donate money to chess causes.

    One doesn't know the exact age at which one's strength will diminish. Some, like Spraggett, stay strong in their 50s. But it is a shock when one drops to a lower class. Perhaps the floors should be the next class limit: 1800, 1600, 1400, or 1200. Juniors can bounce up and down, but seniors' ratings seem to never recover from a disastrous tournament, only gaining a few points for a hard-fought good result. Juniors today are harder to beat due to coaching, having no glaring bad habits. Seniors returning to the game of their youth still have their bad habits and get shell-shocked. With two rounds a day, a senior can be brain dead by the second round (it's happened to me).
    Last edited by Erik Malmsten; Thursday, 13th August, 2020, 09:45 AM.

    Comment


    • #32
      Originally posted by Egidijus Zeromskis View Post

      Just curious why "60"?
      semi random choice. I'm just over 60 myself and rated below what would probably be my floor rating under most definitions and know others in the same position. Folklore has it strength declines starting in early 40's and I can't argue with that. If you thought some other number made the point, I wouldn't argue.

      Comment


      • #33
        Originally posted by Tony Li View Post

        Hi Roger,



        Would you be in favor of exploring the Glicko system?
        I should have added to my previous post: Glicko sets a lower K factor for more active players (and juniors often are, not to mention your plan of 6 rounds/day), so your juniors playing 6 rounds a day will wind up with a smaller K factor - not what you seem to be promoting.
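The point about activity and K factor can be illustrated with a minimal sketch of the published Glicko-1 update (not the CFC's system, and not the "activity and streak" variant mentioned elsewhere in the thread). The player, the opponents and their rating deviations (RD) below are hypothetical, and the between-period RD inflation for inactivity is left out. The sketch only shows that more games in a rating period give a smaller RD, and hence a smaller step size, which plays the role of K.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant q

def g(rd):
    """Discount applied to an opponent whose own rating is uncertain."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)

def expected(r, r_opp, rd_opp):
    """Glicko-1 expected score against one opponent."""
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))

def updated_rd(r, rd, opponents):
    """New RD after one rating period; opponents is a list of (rating, rd)."""
    inv_d2 = Q**2 * sum(
        g(rd_j) ** 2 * expected(r, r_j, rd_j) * (1 - expected(r, r_j, rd_j))
        for r_j, rd_j in opponents
    )
    return math.sqrt(1 / (1 / rd**2 + inv_d2))

def step_size(new_rd):
    """Multiplier on sum(g * (score - expected)); it plays the role of K."""
    return Q * new_rd**2

# Hypothetical player: rated 1500 with RD 200, facing 1500-rated opponents with RD 50.
field = [(1500, 50)]
rd_after_3 = updated_rd(1500, 200, field * 3)    # a quiet rating period
rd_after_12 = updated_rd(1500, 200, field * 12)  # a very active rating period
print(round(step_size(rd_after_3), 1), round(step_size(rd_after_12), 1))
```

The second printed value is smaller than the first: under these assumptions the 12-game player's rating moves less per game result than the 3-game player's.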

        Comment


        • #34
          Interestingly, I have already received interested emails from two European countries about our election!

          Comment


          • #35
            Originally posted by Fred McKim View Post
            Not to mention the "whacky" bonus points for winning a tournament !
            I'm not sure that those bonuses were that whacky.

            Comment


            • #36
              Originally posted by Roger Patterson View Post

              semi random choice. I'm just over 60 myself and rated below what would probably be my floor rating under most definitions and know others in the same position. Folklore has it strength declines starting in early 40's and I can't argue with that. If you thought some other number made the point, I wouldn't argue.
              When you are over sixty you have to know your limitations. My last tournament was in the U.S. and just happened to be near a Whole Foods which featured a hot buffet. The food wasn't cheaper than fast food but it was healthier and thus influenced my performance: 4/5 in games played with a loss against a FIDE master and a win against another FIDE master. I played juniors who managed to beat other higher rated players but I managed to win those games and scored 2/3 against higher rated players. During such tournaments I don't take my blood pressure or cholesterol meds as taking them can lead to brain fog and blunders.

              Comment


              • #37
                Originally posted by Aris Marghetis View Post

                I'm curious for your opinion John (we seem to be in similar life stages): would a rating floor be appealing to you? I personally believe that it would be good for retaining longer-term CFC members.

                I understand that would induce a mathematical effect, but I'm hopeful for fair countervailing measures?!
                Hi Aris
                What do I need a rating floor for? Just don't take away all my rating points when I play an underrated player. Hey, I don't mind losing, but 90% of the underrated players are made by the current rating system. When I started in 1973 you could not move up and play in a section that was higher than your current rating. If you wanted to move into the next section, you either won your section or you won enough games to move up next time. What was wrong with that system? Nothing, in my opinion. Then some guy at the CFC decided to tweak that system, and then the CFC let juniors play active chess, get an established rating, and get a low regular rating. No, I'll just complain and live with the messed-up system. K factors don't work and rating floors don't work, so why do you want to use them? You use them to decimate ratings and have players leave chess. That is all those things do.

                Comment


                • #38
                  Originally posted by Roger Patterson View Post

                  3b) It is good that you follow the rules but my question was would you enforce them.

                  3a) Although you follow the rules, perhaps you should talk to Paul about the extent to which some junior organizers will game the system. Your proposal is 6 games / day, 12 games for a regular weekend, 18 games for a long weekend. It has been the judgement of the CFC and/or VMs (so far) that this is not classical slow chess.

                  4) I have several issues with your argument. Yes it is true that the non linear function could be used. But would it result in any greater accuracy?

                  - Second, your calculation assumes that the non-linear function is a better fit to reality than the linear function. This is not the case. Although the underlying theory of the rating system gives an expected result given by that function, actual results differ from that curve substantially. See this curve: http://www.victoriachess.com/cfc/stats/expected.JPG This feature is not isolated to the CFC rating system - FIDE exhibits it as well. Judging by that curve, this hypothetical player will score an average of about 5 points, not 2.3 (or a performance rating of 200 points lower). So claiming that the nonlinear curve produces more accurate results in some alternate reality where expected results match that curve is simply not relevant to the real world. (and yes, these actual expected results differing from theory means that you do get a higher rating if on average you play higher rated opposition than if you play lower rated opposition on average).

                  - Third - you are framing the point estimates as being right and accurate. In fact they are subject to statistical noise - even if the theoretical expected result matched actual results, your hypothetical player will score somewhere between 0 and 5 points or, with your preferred 700 point rule, somewhere between 200 and 700 points below the rating of his competitors. That's quite a big range.

                  5) I have no particular experience with Glicko but have a couple of issues. As per my answer to you in (4) the rating system does not function the way theory says it should. So I question whether the theoretical improvement of a Glicko system would be realized in practice.

                  ps : I should add, the results of the curve I linked to show why it is important to analyze carefully and not rely on a theoretical understanding of the rating system .

                  3. My proposal is to allow the lowest group in a multi-section tournament to have double rounds. Normally we have 5 rounds in a weekend or 6 rounds in a long weekend, so there wouldn't be more than 10/12 games over 2/3 days, respectively. It might feel like a lot of games for you in the masters section, but fatigue shouldn't be a factor. Players in those sections just play blitz in between rounds...

                  The enforcement would come through verification that there are other sections in the tournament. Not sure how many of these "rogue" junior organizers there are but can we not address concerns on a case-by-case basis? Too many organizers is not yet a problem in Canadian chess.

                  4. The graph is yelling at us! The lack of a crossover suggests that there is a bias - not surprising given the current bonus system increases rating spread. Would it be easy for you to plot the same chart where the expected rating spread is 600 (or whenever the actuals cross 0.09)?

                  5. The Australian case had the maximum rd set too high! I am thinking of the version which reflects activity and streak.

                  Comment


                  • #39
                    Originally posted by Roger Patterson View Post

                    I should have added to my previous post: Glicko sets a lower K factor for more active players (and juniors often are, not to mention your plan of 6 rounds/day), so your juniors playing 6 rounds a day will wind up with a smaller K factor - not what you seem to be promoting.
                    I am in favor of creating an anchor for our rating system - it's the best organic way to minimize drift.

                    Improvement tends to happen in between tournaments, not during. Currently, the exposure to bonuses is higher for more active players - why would very active players be underrated?

                    Also, my 6 rounds a day is not about catching up rating- half the K factor would be reasonable. It would be implemented in the same way as blitz tournaments are rated within the quick system.

                    Comment


                    • #40
                      Originally posted by John Brown View Post

                      Hi Aris
                      What do I need a rating floor for? Just don't take away all my rating points when I play an underrated player. Hey, I don't mind losing, but 90% of the underrated players are made by the current rating system. When I started in 1973 you could not move up and play in a section that was higher than your current rating. If you wanted to move into the next section, you either won your section or you won enough games to move up next time. What was wrong with that system? Nothing, in my opinion. Then some guy at the CFC decided to tweak that system, and then the CFC let juniors play active chess, get an established rating, and get a low regular rating. No, I'll just complain and live with the messed-up system. K factors don't work and rating floors don't work, so why do you want to use them? You use them to decimate ratings and have players leave chess. That is all those things do.
                      John, are you basically suggesting no playing up? We are very close to that in most tournaments - some allow 100 points playing up.

                      To your point, there are some anomalous results when players with different ratings play. I think a key role for the Ratings Auditor could be to encourage organizers to host successive tournaments with different section cut-offs. I remember when tournaments in Ontario were all Open, U2000, U1600 and U1200, and a lot of players would be stuck at the ends of the sections. It helps to have the same organizer mix it up so that, all of a sudden, people in the middle of sections are playing at the top or bottom.

                      Comment


                      • #41
                        I think a mix of sectioned tournaments and open tournaments where everyone plays in one section, perhaps with the addition of accelerated pairings for very large tournaments, is probably the best way to go. Michigan has a bottom half tournament when the rating bands are 1900+, 1700+, 1500+. I find personally that if I play in a tournament with everyone rated over 1900 or over 2000, I will usually maintain my rating or gain points. If I play in a tournament where I am playing a lot of 1500 players, I will lose points and the games will be of poor quality. Usually there is one game per 5 or 6 where I just play poorly, whether that is against a 1700, 1900 or 2300 player. I also have one or two games where I play well against my opponent regardless of rating.

                        In most cases, I will not travel to a tournament where the prospect is that I will be playing a lot of low rated players. The prizes are not as big a motivator as the opportunity to play strong players. This makes sense, as the cost of travel to a tournament requiring a hotel stay is usually on the order of $500 to $600 all in. I usually avoid tournaments where the costs are more than $150 per night. Travelling to Michigan and commuting usually has a cost of about $150 all in when you factor in tolls. I am less likely to avoid a Michigan event where I am likely to play low rated players, in part because I usually consider a tournament on the basis of how much it costs and how many strong players I am likely to play. They usually have 6 or 7 round events, which means that I will play 4 or 5 strong players who are close to or above my strength. I usually factor in gas and transportation and perceived wear and tear on myself as well.

                        Some tournaments don't provide enough time to get meals and coffee, and sources of food are not available. I won't attend those events unless I can bring my own food (which is not always possible in Michigan unless you want delays at the border).

                        I usually shop for things like milk, eggs and vitamins which can often greatly reduce the cost in Michigan especially when they sell milk for 88 cents per gallon jug and a dozen eggs for the same price. Of course I also fill up on gas which in normal times usually involves a savings of 20 to 40 percent in Michigan. Gas prices in Windsor were briefly Michigan like during the depths of the pandemic but when a tank of gas lasts two months or more due to sheltering in place the price of gas is largely irrelevant. My last fill up was June 10th at Costco and I am probably just below half a tank at the moment. I will top up next week when I go to Costco as I am getting close to running out of the slow acting insulin though I could probably drag it out a bit to the following week.

                        Tony: I disagree that you don't improve during a tournament. I think you improve whenever you are tested by strong opposition. I also find that I experience improvement whenever I systematically spend time studying chess. Many of the problems of older players are probably because either they do not spend time studying chess or, because they are not using a coach, they study ineffectively. I often steal the repertoires of my students, at least online, to help expand my own repertoire.
                        Last edited by Vlad Drkulec; Friday, 14th August, 2020, 04:29 AM.

                        Comment


                        • #42
                          Originally posted by Tony Li View Post


                          4. The graph is yelling at us! The lack of a crossover suggests that there is a bias - not surprising given the current bonus system increases rating spread. Would it be easy for you to plot the same chart where the expected rating spread is 600 (or whenever the actuals cross 0.09)?

                          .
                          1) FIDE has the same phenomenon (that expected and actual result curves don't match). FIDE does not have bonus points, so this is not a function of having bonus points.

                          2) "crossover" ???? why would you expect crossover? You would expect the two curves to be identical.

                          3) "Would it be easy for you to plot the same chart where the expected rating spread is 600 (or whenever the actuals cross 0.09)" ?? It's on the graph. The bottom axis is rating difference. (although I don't understand your insertion of the word "expected" in this sentence.)

                          4) "given the current bonus system increases rating spread" ?? why would you say that?

                          Comment


                          • #43
                            Originally posted by Roger Patterson View Post

                            1) FIDE has the same phenomenon (that expected and actual result curves don't match). FIDE does not have bonus points, so this is not a function of having bonus points.

                            2) "crossover" ???? why would you expect crossover? You would expect the two curves to be identical.

                            3) "Would it be easy for you to plot the same chart where the expected rating spread is 600 (or whenever the actuals cross 0.09)" ?? It's on the graph. The bottom axis is rating difference. (although I don't understand your insertion of the word "expected" in this sentence.)

                            4) "given the current bonus system increases rating spread" ?? why would you say that?

                            1. The FIDE phenomenon is completely different. The fit is not perfect, but there is crossover. In the Chessbase article below, you can see there are rating differences where the favorite is expected to gain rating points, and other differences where the underdog is expected to gain rating. In the CFC chart, the favorite is expected to lose rating points at every rating difference.

                            https://en.chessbase.com/post/what-s...the-elo-system

                            http://www.victoriachess.com/cfc/stats/expected.JPG <- Paul, this shows the urgency to act!

                            2. We can never get a perfect fit, but crossovers suggest some balance on average. The lack of a crossover suggests a serious bias in the rating system.

                            3. The plot will fit well if you recalculate the "Expected Theory" using a spread of 550 or 600, instead of the 400 in the rating formula. It is surprising that this has not been attempted!

                            4. Bonuses are supposed to track improvement. The current bonus is equally generous to higher rated players as lower rated. However, higher rated players both improve more slowly and are more active. Fitting the Expected Theory line using 550/600 should clarify that.


                            P.S. interesting that there was a match between two players rated 1500 points apart where the underdog won 1/4.
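The recalculation proposed in point 3 can be sketched as follows. This is only an illustration of the theoretical curve with the spread made a parameter; `expected_score` and `spread` are names chosen for this sketch, the rating differences are arbitrary sample points, and no CFC results data is used.

```python
def expected_score(rating_diff, spread=400):
    """Theoretical expected score for the player who is rating_diff points higher."""
    return 1 / (1 + 10 ** (-rating_diff / spread))

# Compare the standard curve (400) with the two spreads suggested above.
for diff in (0, 200, 400, 600, 800):
    row = [round(expected_score(diff, s), 3) for s in (400, 550, 600)]
    print(diff, row)
```

A larger spread flattens the curve, so at every positive rating difference the favorite is expected to score less than under the 400-point formula.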

                            Comment


                            • #44
                              Originally posted by Tony Li View Post


                              1. The FIDE phenomenon is completely different. The fit is not perfect, but there is crossover. In the Chessbase article below, you can see there are rating differences where the favorite is expected to gain rating points, and other differences where the underdog is expected to gain rating. In the CFC chart, the favorite is expected to lose rating points at every rating difference.

                              https://en.chessbase.com/post/what-s...the-elo-system

                              http://www.victoriachess.com/cfc/stats/expected.JPG <- Paul, this shows the urgency to act!

                              2. We can never get a perfect fit, but crossovers suggest some balance on average. The lack of a crossover suggests a serious bias in the rating system.

                              3. The plot will fit well if you recalculate the "Expected Theory" using a spread of 550 or 600, instead of the 400 in the rating formula. It is surprising that this has not been attempted!

                              4. Bonuses are supposed to track improvement. The current bonus is equally generous to higher rated players as lower rated. However, higher rated players both improve more slowly and are more active. Fitting the Expected Theory line using 550/600 should clarify that.


                              P.S. interesting that there was a match between two players rated 1500 points apart where the underdog won 1/4.
                              1) The crossover you refer to in the link is a crossover in the rating points awarded - not in the actual versus theoretical curve. It comes about because FIDE has put a 400 point limit on the rating difference in the application of the expected score formula - as clearly indicated in the link you gave. (Indeed, back in the day, when the CFC used a fully linear calculation, it capped the rating difference at, I think, 350 points, but maybe it was 400.) The article clearly explains this as being due to this artificial rule rather than as a result of the lower rated player performing worse or better than the theoretical expected score. The FIDE system is showing the same type of result: a mismatch between theoretical expected and actual results, with no crossover.

                              2) Yes, if you curve fit the actual result scale, it fits the same formula, 1/(1+10^((R1-R2)/400)) (going from memory here), with 400 changed to 600. I noticed that years ago when I drew up the curve. It's interesting that it's such a nice ratio, but it does not mean that if you adjusted everyone's rating by that ratio you would get sensible results. Consider a 2600 player and a 1400 player - a difference of 1200 points. Scaling that to a difference of 800 points (the ratio you want to use) would mean either a) keeping the 1400 at 1400 but changing 2600 to 2200; b) changing the 1400 to 1800 but keeping the 2600 at 2600; or something in between. None of these seem particularly reasonable.

                              If what you want is just to use the actual results curve to calculate expected score, I suspect you will not have a stable rating system. The whole edifice of the rating system is based on a theory that results in the theoretical curve. If you use some other curve, you will get something different. But of course you could check that - go back 10 years ago, re-rate all the events using your new curve and see what happens. My bet is that after 10 years of using the new curve you will find a) a huge shift to a much different set of ratings and b) the "new" curve no longer fits the actual results vs rating difference.

                              I'll point out that this curve was calculated for data before the current bonus system was in place.

                              3) yes the bonus is equally generous to high and low rated players who perform well. But high rated players who improve slowly don't get bonuses. Bonuses are awarded to players who score well beyond expectations, presumably because they improved rapidly. (plus a few who get it due to statistical noise). So I don't see a problem.

                              4) yes, I looked into it at the time. well, first there are poor statistics that far down the curve (not too many games with a rating difference of 1500). It was something like a 300 player beating an 1800 player.
                              Last edited by Roger Patterson; Friday, 14th August, 2020, 08:27 PM. Reason: typo on some of the numbers.
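The rescaling arithmetic in point 2 can be checked with a short sketch. It assumes the fitted spread is exactly 600 and the formula spread is 400, as stated in the post; the function and variable names are illustrative only.

```python
def expected_score(rating_diff, spread=400):
    """Theoretical expected score for the player who is rating_diff points higher."""
    return 1 / (1 + 10 ** (-rating_diff / spread))

def rescale_difference(diff, fitted_spread=600, formula_spread=400):
    """Rating difference that gives the same expected score under the formula spread."""
    return diff * formula_spread / fitted_spread

high, low = 2600, 1400
new_diff = rescale_difference(high - low)  # 1200 * 400 / 600

# The two extreme ways of absorbing the change described in the post.
option_a = (low + new_diff, low)    # lower the 2600 player, keep the 1400 player
option_b = (high, high - new_diff)  # keep the 2600 player, raise the 1400 player
print(new_diff, option_a, option_b)
```

A 1200-point gap on the fitted curve predicts the same score as an 800-point gap on the standard curve, which is why matching the curve by rescaling would require moving one or both ratings by hundreds of points.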

                              Comment


                              • #45
                                Originally posted by Roger Patterson View Post



                                If what you want is just to use the actual results curve to calculate expected score, I suspect you will not have a stable rating system. The whole edifice of the rating system is based on a theory that results in the theoretical curve. If you use some other curve, you will get something different. But of course you could check that - go back 10 years ago, re-rate all the events using your new curve and see what happens. My bet is that after 10 years of using the new curve you will find a) a huge shift to a much different set of ratings and b) the "new" curve no longer fits the actual results vs rating difference.

                                r.
                                I'll just add a quote from the Sonas article you linked to:

                                "It cannot be solved by simply adjusting the Elo Expectancy Table; if we did that, then the ratings would respond accordingly and we would be in the same boat again. It is much trickier to solve than that."

                                which was my view when I generated the graph for CFC data.

                                Comment
