Megabase 2010- a caution

  • Megabase 2010- a caution

    NB - I have posted an update on Dec. 27 which contains additional information.

    Further to Hugh Brodie's note - I just received my copy a few days ago via a special offer for ChessBase Magazine subscribers. The purpose of this posting is to offer a word of caution about buying Mega Database 2010.

    It contains 4,463,293 games, game fragments, tournament reports, and random odds and ends.

    There appears to have been little or no data validation or cleanup done: over 60,000 games lack a valid result (1-0, 0-1, or 1/2-1/2), some games have numbers instead of letters for the players' names, and there are lots of "games" with 0 moves, etc.
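The checks described above can be sketched in a few lines of Python. This is only an illustration, assuming the games have already been exported to simple records; the dict keys (`white`, `black`, `result`, `moves`) and the function name are hypothetical, not anything ChessBase provides.

```python
import re

VALID_RESULTS = {"1-0", "0-1", "1/2-1/2"}


def find_problems(game):
    """Return a list of problem labels for one game record.

    `game` is a hypothetical dict with the keys 'white', 'black',
    'result' and 'moves' (a list of moves in any notation).
    """
    problems = []
    if game.get("result") not in VALID_RESULTS:
        problems.append("invalid result")
    for colour in ("white", "black"):
        name = game.get(colour) or ""
        # A usable player name should contain at least one letter.
        if not re.search(r"[^\W\d_]", name):
            problems.append(f"bad {colour} name")
    if len(game.get("moves") or []) == 0:
        problems.append("no moves")
    return problems


# Made-up sample records, for illustration only.
sample = [
    {"white": "Smith, A", "black": "Jones, B", "result": "1-0", "moves": ["e4", "e5"]},
    {"white": "12345", "black": "Jones, B", "result": "*", "moves": []},
]
report = [find_problems(g) for g in sample]
# report[0] is [] and report[1] is
# ['invalid result', 'bad white name', 'no moves']
```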

    With a games database, bigger doesn't mean better. If you filter the games using ChessBase 10's "Good Games Criteria" you end up with only about 1.7 million games (about 38% of the total). By ChessBase's own definition, most of the games (about 62%) fall into the not-so-good category. A database of nearly 4.5 million games is slow to work with (even on my quad-core processor with 6 GB of RAM).

    Since I have a lot of disk space, I like to keep two versions of the database: the full database, which can be used to look up the odd thing, and a more useful (to me) smaller database of just the high-quality games (minimum of 10 moves, both players rated over 2100, complete game data, no duplicate games).
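For anyone doing this outside ChessBase, the selection criteria above might look roughly like the sketch below. The record layout is an assumption: each game is a dict with hypothetical keys `white`, `black`, `white_elo`, `black_elo`, `result`, `date` and `moves` (a list of half-moves). "Complete game data" is read here as "none of the basic fields is empty", and a duplicate as "same players, date and moves"; both readings are guesses at what is meant.

```python
def is_high_quality(game, min_moves=10, min_rating=2100):
    """Apply the criteria from the post to one game record (a dict)."""
    # "Complete game data": assumed to mean these fields are all filled in.
    if any(not game.get(key) for key in ("white", "black", "result", "date")):
        return False
    # Both players must be rated over the threshold; a missing rating fails.
    lowest = min(game.get("white_elo") or 0, game.get("black_elo") or 0)
    if lowest <= min_rating:
        return False
    # 'moves' holds half-moves, so round up to count full moves.
    full_moves = (len(game.get("moves") or []) + 1) // 2
    return full_moves >= min_moves


def select_high_quality(games):
    """Yield qualifying games, skipping exact duplicates."""
    seen = set()
    for game in games:
        if not is_high_quality(game):
            continue
        key = (game["white"], game["black"], game["date"], tuple(game["moves"]))
        if key in seen:
            continue
        seen.add(key)
        yield game
```

Because `select_high_quality` is a generator that only reads from the source, the full database is never modified; the smaller one is built by writing out whatever it yields.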

    I have been totally frustrated in trying to produce the second database with Chessbase 10. While I can identify which games I would like to flag for deletion, when I run the Pack Database command it consistently results in a corrupted database. It doesn't matter which games, or how many are flagged for deletion. To top it off Chessbase 10 can just pack up and stop working at almost any point. I have the same problem on two computers, so it is probably not a problem due to my hardware or other software.

    The bottom line is you may want to hold off on obtaining Mega Database 2010, or at least be aware of the potential headaches. I will post an update if I can get things working. I will be contacting Chessbase support after the holidays.

    Dave
    Last edited by Dave Broughton; Sunday, 27th December, 2009, 10:20 AM.

  • #2
    Re: Megabase 2010- a caution

    Originally posted by Dave Broughton
    Further to Hugh Brodie's note - I just received my copy a few days ago via a special offer for ChessBase Magazine subscribers. The purpose of this posting is to offer a word of caution about buying Mega Database 2010.

    It contains 4,463,293 games, game fragments, tournament reports, and random odds and ends.
    How many of my games appear? I never buy their database.
    Gary Ruben
    CC - IA and SIM


    • #3
      Re: Megabase 2010- a caution

      Gary Ruben wrote:

      How many of my games appear? I never buy their database.
      I don't know about Megabase, but my Canadian database has one of your games - a 53-move loss to Jack Woodbury (Manitoba Open, 1965). It was originally published in the April, 1966 issue of "Canadian Chess Chat".

      However - in my "correspondence" database, I have 142 of your games, from 1971 to 2007.


      • #4
        Re: Megabase 2010- a caution

        Originally posted by Gary Ruben
        How many of my games appear? I never buy their database.
        Hi Gary - I am afraid that you don't have any games in their database. They do market a separate correspondence database that may have some of your games.

        Cheers

        Dave


        • #5
          Re: Megabase 2010- a caution

          Originally posted by Hugh Brodie
          I don't know about Megabase, but my Canadian database has one of your games - a 53-move loss to Jack Woodbury (Manitoba Open, 1965). It was originally published in the April, 1966 issue of "Canadian Chess Chat".

          However - in my "correspondence" database, I have 142 of your games, from 1971 to 2007.
          It sounds possible as I think I might have played in that 1965 Open. I came to Toronto in 1964 but returned to Manitoba for a few months in the spring of 1965. It would have been the only event I played that year, I guess.

          I seem to recall Jack was a fine chess player back then.

          If you have the game in pgn would you mind sending it to g.ruben at earthlink.net as I'm curious where my blunder came. I'm curious what my play looked like that year. I have no scoresheets from that event.

          In 1971 to 1974 I was probably playing a lot more OTB than correspondence.
          Gary Ruben
          CC - IA and SIM


          • #6
            Re: Megabase 2010- a caution

            Originally posted by Dave Broughton
            Hi Gary - I am afraid that you don't have any games in their database. They do market a separate correspondence database that may have some of your games.

            Cheers

            Dave
            Thanks, Dave. I don't submit any games to them so they must get the games from the ICCF.

            Gary
            Gary Ruben
            CC - IA and SIM


            • #7
              Re: Megabase 2010- a caution

              Originally posted by Dave Broughton
              I have been totally frustrated in trying to produce the second database with Chessbase 10. While I can identify which games I would like to flag for deletion, when I run the Pack Database command it consistently results in a corrupted database. It doesn't matter which games, or how many are flagged for deletion. To top it off Chessbase 10 can just pack up and stop working at almost any point. I have the same problem on two computers, so it is probably not a problem due to my hardware or other software.
              I have had the same problems myself. The program seems incapable of doing these operations on giant databases. I am glad to know that I am not alone...


              • #8
                Re: Megabase 2010- a caution

                I have stayed with CB9. After seeing CB10 in action, I felt there weren't enough improvements to warrant spending the money on an upgrade. CB9 does hang or shut down periodically, but I've never lost any more data than possibly a game I was entering.

                As I wrote a while back, I tried various operations with the 100-million game database provided by FICS (all FICS games played since 1998). CB9 seemed to start generating corrupted games after passing the 30 to 40 million game mark when I tried merging several databases. Subsets of these databases (one as large as 17 million games) caused no problems at all.

                I have occasionally had problems "packing" deleted games when the database had text files within it. Copying the original database to a new one without the text files cleared up the problem.


                • #9
                  Re: Megabase 2010- a caution

                  Originally posted by Dave Broughton
                  Since I have a lot of disk space, I like to keep two versions of the database: the full database, which can be used to look up the odd thing, and a more useful (to me) smaller database of just the high-quality games (minimum of 10 moves, both players rated over 2100, complete game data, no duplicate games).

                  I have been totally frustrated in trying to produce the second database with Chessbase 10. While I can identify which games I would like to flag for deletion, when I run the Pack Database command it consistently results in a corrupted database. It doesn't matter which games, or how many are flagged for deletion. To top it off Chessbase 10 can just pack up and stop working at almost any point. I have the same problem on two computers, so it is probably not a problem due to my hardware or other software.
                  Dave,

                  A few years ago I used ChessBase 8 to produce a 2100+ database, but I didn't follow your recipe at all. Instead I searched for the games by rating and by decade, and when the output had compiled I copied it to the clip database and from there into a new database named "2100 club." I had to do this several times to cover the entire database.

                  Deleting huge numbers of games and then trying to pack the resulting output strikes me as a formula for RAM overload. But I'm not a CB ubergeek, so I can't really say why things should work or not. But I do recommend the method I described as I have never had any trouble with it.
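The general shape of that method (search one decade at a time, copy the hits into a new database, never delete from the original) can be sketched as follows. This is not how ChessBase works internally; it is a plain-Python illustration on hypothetical game records with `date`, `white_elo` and `black_elo` keys, and "2100+" is assumed to mean both players rated at least 2100.

```python
def year_of(game):
    """Return the year from a PGN-style date such as '1965.04.??', or None."""
    year = str(game.get("date") or "")[:4]
    return int(year) if year.isdigit() else None


def build_subset(source, first_decade, last_decade, min_rating=2100):
    """Copy qualifying games into a new list, one decade per pass."""
    new_db = []
    for start in range(first_decade, last_decade + 10, 10):
        # One search pass per decade; the hits play the role of the clip database.
        clip = [
            g for g in source
            if year_of(g) is not None
            and start <= year_of(g) < start + 10
            and min(g.get("white_elo") or 0, g.get("black_elo") or 0) >= min_rating
        ]
        new_db.extend(clip)
    return new_db
```

Each pass handles only one decade's worth of hits, and the source is left untouched, so nothing ever needs to be packed.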


                  • #10
                    Re: Megabase 2010- a caution - some things now working ok

                    Hi - after lots of testing on 3 computers (all running Windows 7, two 64-bit and one 32-bit, with 3 to 6 GB of RAM), I found one piece of software that seems to cause ChessBase 10 (and 9 too) serious problems: WD Anywhere Backup, which I use to automatically back up my networked home computers (it can also be used on standalone PCs).

                    After turning off the backups of any ChessBase-related folders, things seem to be working. I suspect there must have been file contention: ChessBase's database packing would update one of the component files of the Mega database, and the backup software would start backing that file up while ChessBase still needed it for further processing.

                    I now have a much more manageable 1.7 million game database. When I run the ChessBase good games filter, it selects a little over a million games. Its selection criteria are tougher than mine: a good game is one in which at least one player has an Elo rating above 2350 or holds an IM or GM title. The function excludes blitz, rapid and simultaneous games, games shorter than seven moves, and drawn games less than 20 moves long.
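Written out as a predicate, the criteria as described in this post would look something like the sketch below. It follows the wording of the post, not ChessBase's actual code, and the record keys (`white_elo`, `black_elo`, `white_title`, `black_title`, `kind`, `full_moves`, `result`) are hypothetical.

```python
EXCLUDED_KINDS = {"blitz", "rapid", "simul"}
STRONG_TITLES = {"IM", "GM"}


def is_good_game(game):
    """Return True if a game record meets the criteria listed in the post."""
    top_elo = max(game.get("white_elo") or 0, game.get("black_elo") or 0)
    titles = {game.get("white_title"), game.get("black_title")}
    # At least one player rated above 2350, or holding an IM or GM title.
    if top_elo <= 2350 and not (titles & STRONG_TITLES):
        return False
    # Blitz, rapid and simultaneous games are excluded.
    if game.get("kind") in EXCLUDED_KINDS:
        return False
    moves = game.get("full_moves") or 0
    # Games shorter than seven moves are excluded.
    if moves < 7:
        return False
    # Drawn games under 20 moves are excluded.
    if game.get("result") == "1/2-1/2" and moves < 20:
        return False
    return True
```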

                    My original comments about the quality of games in Mega Database 2010 are still valid - there are easily close to 100,000 "games" which are little more than electronic garbage. The database does come with a nice feature to add current games during 2010, though I don't know if it is significantly better than the free games service from TWIC.

                    Cheers

                    Dave Broughton


                    • #11
                      Re: Megabase 2010- a caution

                      I think many of the game fragments and zero-move games are in the database simply to "complete" a tournament - i.e. there may be several games missing from a round-robin event, and the zero-move games are added so that an accurate crosstable can be generated.


                      • #12
                        Re: Megabase 2010- a caution

                        Hello Hugh,

                        I did not find any reference to a game database on FIDE's site; is there another site where it can be downloaded?

                        Thanks,

                        François Bertin

                        Originally posted by Hugh Brodie
                        As I wrote a while back, I tried various operations with the 100-million game database provided by FICS (all FICS games played since 1998). CB9 seemed to start generating corrupted games after passing the 30 to 40 million game mark when I tried merging several databases. Subsets of these databases (one as large as 17 million games) caused no problems at all.


                        • #13
                          Re: Megabase 2010- a caution

                          I made no reference to the FIDE site - the large database was on the FICS (Free Internet Chess Server) site and contains millions of games played on that server - the vast majority being of poor quality or played by weak players. In addition - only the nicknames/handles of the players are given.


                          • #14
                            Re: Megabase 2010- a caution

                            I have also observed problems with CB10. First, I found I could no longer load TWIC updates into MegaBase; this was resolved in a fashion with MB2009 by providing a games update feature using the ChessBase banks, not as timely but adequate. Second, I have increasingly found incomplete games, i.e. scores that stopped many moves before the notation in reference books. Finally, there is a lot of crap in all large databases but MB2009 seems to be a significant step down and gets worse after integrity checks with CB10.
                            I recall a GM friend of mine telling me that he never, ever attempts to modify, correct, or append new games to MegaBase because of bad experiences.


                            • #15
                              Re: Megabase 2010- a caution

                              The problem with the TWIC games is that the players' first names are not provided in full - only the initial(s). This causes problems when you merge TWIC games into another database (e.g. a Megabase) which has complete first names. A "check for doubles" won't necessarily pick up the same games with names spelt differently. This happened when I was updating "Canbase" yesterday - I had a number of games attributed to "Halldor, P" which I already had under the "real" name of "Palsson, Halldor" - I had to delete them manually. (someone reported this by email - otherwise I never would have known)
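A looser name comparison can flag such cases for manual review before merging. The sketch below is only an illustration with hypothetical helper names: it treats two "Surname, Given" strings as a possible match if the surnames agree and one given name is a prefix of the other, or if surname and given name appear to have been swapped, as in the "Halldor, P" case above. Multi-part initials and spelling variants are not handled.

```python
def split_name(name):
    """Split 'Surname, Given' into lowercase (surname, given) parts."""
    surname, _, given = name.partition(",")
    return surname.strip().lower(), given.strip().rstrip(".").lower()


def names_may_match(a, b):
    """Return True if two player names could refer to the same person."""
    sa, ga = split_name(a)
    sb, gb = split_name(b)
    # Same surname, and one given name is a prefix of the other,
    # so an initial such as "H" or "H." matches "Halldor".
    if sa == sb and ga and gb and (ga.startswith(gb) or gb.startswith(ga)):
        return True
    # Swapped order: one record's "surname" is really the other's given
    # name, and its initial is the first letter(s) of the other's surname.
    pairs = (((sa, ga), (sb, gb)), ((sb, gb), (sa, ga)))
    for (s1, g1), (s2, g2) in pairs:
        if g1 and s1 == g2 and s2.startswith(g1):
            return True
    return False


print(names_may_match("Halldor, P", "Palsson, Halldor"))  # True
print(names_may_match("Palsson, H.", "Palsson, Halldor"))  # True
print(names_may_match("Palsson, H", "Palsson, Jon"))  # False
```

A match here only means "look at this pair"; the games themselves still need to be compared before anything is deleted.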
