The 100 million game database

  • The 100 million game database

    Yes, a 100-million-game database is out there for you game junkies to download. It consists of all the games played on FICS (Free Internet Chess Server) since 1999. Of course, the quality of the games varies greatly, and many are incomplete or were never started. I'm hoping to extract the games in which both players are masters (or higher), and possibly certain openings. Even if 99.99% of the games are garbage, that still leaves 10,000 quality games (0.01% of 100 million).
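    A minimal Python sketch of that kind of extraction follows. It assumes each game starts with an `[Event "..."]` tag line and carries `WhiteElo`/`BlackElo` tags, and it takes "master" to mean a rating of 2200 or more (a hypothetical threshold; FICS ratings are not FIDE ratings). It reads a monthly `.pgn.bz2` file directly through Python's `bz2` module, so nothing has to be decompressed to disk first. All function names are illustrative, not part of any existing tool.

```python
import bz2
import re
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Matches a PGN tag pair line such as: [WhiteElo "2315"]
TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')


def iter_games(lines: Iterable[str]) -> Iterator[Tuple[Dict[str, str], str]]:
    """Yield (headers, raw_text) per game; assumes each game opens with [Event ...]."""
    headers: Dict[str, str] = {}
    buf = []
    for line in lines:
        if line.startswith("[Event "):
            if headers:  # flush the previous game before starting a new one
                yield headers, "".join(buf)
            headers, buf = {}, []
        match = TAG_RE.match(line)
        if match:
            headers[match.group(1)] = match.group(2)
        buf.append(line)
    if headers:
        yield headers, "".join(buf)


def rating(headers: Dict[str, str], tag: str) -> Optional[int]:
    """Return the numeric rating, or None if the tag is missing or not a number (e.g. "?")."""
    try:
        return int(headers.get(tag, ""))
    except ValueError:
        return None


def both_masters(headers: Dict[str, str], threshold: int = 2200) -> bool:
    """True only if both players have a known rating at or above the (hypothetical) threshold."""
    white, black = rating(headers, "WhiteElo"), rating(headers, "BlackElo")
    return white is not None and black is not None and min(white, black) >= threshold


def extract_master_games(src_path: str, dst_path: str, threshold: int = 2200) -> int:
    """Stream one .pgn.bz2 file and write the qualifying games to a plain PGN file."""
    kept = 0
    with bz2.open(src_path, "rt", encoding="utf-8", errors="replace") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for headers, text in iter_games(src):
            if both_masters(headers, threshold):
                dst.write(text.rstrip("\n") + "\n\n")
                kept += 1
    return kept
```

    For example, `extract_master_games("fics-2009-10.pgn.bz2", "masters-2009-10.pgn")` would write the qualifying games from that month and return how many were kept. Games with an unknown rating are skipped, since they cannot be shown to meet the threshold.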

    I successfully downloaded all the PGN files (in .bz2 format, which seems to be recognized by archive programs such as WinRAR). Allow 14 GB of disk space for the compressed download, and another 80 GB for the uncompressed PGN files. It's a torrent; there were lots of seeders, and everything downloaded overnight for me.
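    If an archive program isn't handy, Python's standard `bz2` module can do the same job. This is a minimal sketch, assuming the torrent's `FICS-games/` folder of `*.pgn.bz2` files; `decompress_bz2` and `decompress_all` are hypothetical helper names. Each file is streamed in 1 MiB chunks, so memory use stays small, although the output still needs the full ~80 GB of disk.

```python
import bz2
import shutil
from pathlib import Path
from typing import List


def decompress_bz2(src: Path, dst_dir: Path) -> Path:
    """Stream-decompress one .pgn.bz2 file into dst_dir and return the .pgn path."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / src.with_suffix("").name  # fics-2009-10.pgn.bz2 -> fics-2009-10.pgn
    with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, 1 << 20)  # copy in 1 MiB chunks
    return dst


def decompress_all(src_dir: Path, dst_dir: Path) -> List[Path]:
    """Decompress every *.pgn.bz2 file in src_dir, in filename (i.e. date) order."""
    return [decompress_bz2(p, dst_dir) for p in sorted(src_dir.glob("*.pgn.bz2"))]
```

    Usage would look like `decompress_all(Path("FICS-games"), Path("FICS-pgn"))`.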

    I tried merging them into one CB (ChessBase) database, but kept getting integrity errors whenever the database grew beyond about 48 million games. An integrity check gave "Fatal error - CBH prolog damaged". A smaller database gave me no problems. It takes about 24 hours to convert all the PGN files to CBH files.

    Let me know if any of you have success in getting and processing these games.

    ------------------------------------------------------------------------

    Posted on the newsgroup rec.games.chess.misc:

    Dear all,

    Here is a .torrent file for all recorded FICS games from November 1999
    onwards. The games have been converted into PGN format and verified.

    If there are no seeds, please send me an e-mail. I attempt to keep at
    least one seed with sufficient bandwidth alive at all times, but it is
    not automated.


    Torrent file: http://marcelk.net/logics/FICS-games...200910.torrent


    The contents look like this:


    FICS-games/
    fics-1999-11.pgn.bz2
    fics-1999-12.pgn.bz2
    fics-2000-01.pgn.bz2
    ... etc ...
    fics-2009-08.pgn.bz2
    fics-2009-09.pgn.bz2
    fics-2009-10.pgn.bz2


    The total number of games is over 100 million.
    The total download size is around 15 GB, compressed (bzip2).


    http://ficsgames.com/ has a searchable version of this data online.


    Marcel van Kervinck
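    The listing in the announcement elides the middle months. Assuming the `fics-YYYY-MM.pgn.bz2` pattern shown holds for every month from November 1999 to October 2009 with no gaps (an assumption; the "... etc ..." hides the actual list), a short Python sketch can generate the expected names, 120 monthly files, and report any that are missing from a finished download. The helper names are hypothetical.

```python
from typing import Iterable, List, Tuple


def expected_files(start: Tuple[int, int] = (1999, 11),
                   end: Tuple[int, int] = (2009, 10)) -> List[str]:
    """List fics-YYYY-MM.pgn.bz2 names for every month from start to end inclusive."""
    year, month = start
    names = []
    while (year, month) <= end:  # tuples compare year first, then month
        names.append(f"fics-{year:04d}-{month:02d}.pgn.bz2")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def missing_files(present: Iterable[str]) -> List[str]:
    """Return the expected monthly files that are not among the names in `present`."""
    have = set(present)
    return [name for name in expected_files() if name not in have]
```

    After `from pathlib import Path`, a check might read `missing_files(p.name for p in Path("FICS-games").iterdir())`; an empty list means every expected month is present.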

  • #2
    Re: The 100 million game database

    You're too much, Hugh!
    And arm-wrestled a 2400 and won!
    :D


    • #3
      Re: The 100 million game database

      Strange things happen with CB9 when handling large files! Databases with more than about 45 million games start giving corrupt player data (i.e. a "games" listing shows the White and Black players with the same names for games past about game number 45 million). Integrity checks fail.

      I then created three databases of roughly equal size (around 33 million games each) to avoid the above problem. This time, games near the end of each file lost all their move data (i.e. these games showed no moves when I double-clicked on them, although the move count in the game description was non-zero). Integrity checks started up OK, but I didn't wait for any to complete.

      File 1:
      All games with a game number greater than 28,165,270 had no move data.
      File 2:
      All games with a game number greater than 28,167,356 had no move data.
      File 3:
      All games with a game number greater than 28,263,712 had no move data.

      I will try again, creating four files of roughly 25 million games each.
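      In all three files the move data vanished somewhere between about 28.17 and 28.26 million games, so keeping every database comfortably below that point seems prudent. A minimal Python sketch of the arithmetic, assuming a hypothetical safety cap of 28 million games per database and a round total of 100 million (the exact count is only known to be "over 100 million"): it picks the fewest databases that respect the cap and spreads the games evenly across them.

```python
from typing import List


def plan_chunks(total_games: int, cap: int) -> List[int]:
    """Split total_games into the fewest near-equal chunks that each hold at most cap games."""
    if total_games <= 0 or cap <= 0:
        raise ValueError("total_games and cap must be positive")
    n = (total_games + cap - 1) // cap       # ceiling division: fewest chunks that fit the cap
    base, extra = divmod(total_games, n)     # the first `extra` chunks get one more game
    return [base + 1 if i < extra else base for i in range(n)]
```

      With those assumptions, `plan_chunks(100_000_000, 28_000_000)` gives four databases of 25 million games each, in line with the plan above. Spreading the games evenly, rather than filling each database to the cap, leaves the same margin below the observed failure point in every file.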


      • #4
        Re: The 100 million game database

        Interesting. Sort of. I don't care if Carl Bilodeau doesn't like your suit, either :)
        He's in Hebert's tow, sounding off (brown-noser, but with the children) re the Quebec youth championships; he's not bilingual, as you are, Hugh. Nor do I have access, as you do as a current CFC governor, regarding TRANSPARENCY (in a grassroots campaign bought off, just like some top players, for what, a few hundred $$?) of the bids for the editorship of the new CFC newsletter (in English only, but overseen by a francophone female junior coordinator). Is there any way to filter on the basis of rating, i.e. screen each quarter or third (maybe try CB8, who knows) and scratch those games in which both opponents are rated below 2000? Then this database may have some "integrity".

        Then send me the results after you've put in thousands of hours:)

        Cheers, Dave
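        Dave's rule (scratch a game only when both players are rated below 2000) can be written as a small Python predicate. This is a minimal sketch, assuming each game's text is already available as a string and that ratings appear in `WhiteElo`/`BlackElo` tags; `both_below` and `screen` are hypothetical names. Games with a missing or non-numeric rating are kept rather than scratched, a design choice that could just as easily be reversed.

```python
import re
from typing import List

# Captures the tag name and numeric value of a rating tag, e.g. [WhiteElo "1875"]
ELO_RE = re.compile(r'^\[(WhiteElo|BlackElo)\s+"(\d+)"\]', re.MULTILINE)


def both_below(game_text: str, floor: int = 2000) -> bool:
    """True if both ratings are present, numeric, and below floor."""
    ratings = dict(ELO_RE.findall(game_text))
    if len(ratings) < 2:  # a rating is missing or not a number: don't scratch
        return False
    return all(int(value) < floor for value in ratings.values())


def screen(games: List[str], floor: int = 2000) -> List[str]:
    """Keep every game except those in which both players are rated below floor."""
    return [game for game in games if not both_below(game, floor)]
```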
