Cascla: the 39 categories

  • Cascla: the 39 categories

    There are 39 categories of Cascla which are possible for a conventional board starting set-up. For games with all legal moves, this set of 39 Casclas covers all possibilities.

    Also, the Cascla concept scales to settings such as Chess960, with its different starting positions, and to variants that use more or fewer pieces in the starting position (such as the so-called 'Capablanca Chess', played on a 10x8 board with two extra pieces (Ministers) per side), among others. So long as castling is a permitted operation in the chess variant being considered, Cascla has relevance for that setting.

    Let us list the 39 categories of Cascla (without lag included) for a conventional board starting set-up.

    K/K: Both sides castle Kingside (subset with 6 possibilities):
    K'+/K- ; K'=/K= ; K'-/K+ ; K+/K'- ; K=/K'= ; K-/K'+.

    K/Q: White castles Kingside, Black castles Queenside (subset with 6 possibilities):
    K'+/Q- ; K'=/Q= ; K'-/Q+ ; K+/Q'- ; K=/Q'= ; K-/Q'+.

    Q/K: White castles Queenside, Black castles Kingside (subset with 6 possibilities):
    Q'+/K- ; Q'=/K= ; Q'-/K+ ; Q+/K'- ; Q=/K'= ; Q-/K'+.

    Q/Q: Both sides castle Queenside (subset with 6 possibilities):
    Q'+/Q- ; Q'=/Q= ; Q'-/Q+ ; Q+/Q'- ; Q=/Q'= ; Q-/Q'+.

    K/N, N/K: One side castles Kingside, the other side does not castle (N) (subset with 6 possibilities):
    K+/N- ; K=/N= ; K-/N+ ; N+/K- ; N=/K= ; N-/K+.

    Q/N, N/Q: One side castles Queenside, the other side does not castle (N) (subset with 6 possibilities):
    Q+/N- ; Q=/N= ; Q-/N+ ; N+/Q- ; N=/Q= ; N-/Q+.

    N/N: Neither side castles (subset with 3 possibilities):
    N+/N- ; N=/N= ; N-/N+.

    Adding up the seven subsets' populations gives us a grand total of 39.
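
    As a cross-check on the total, the list above can be generated mechanically. This is only a sketch: it assumes the prime (') marks the side that castled first, and that each side's +, = or - is that side's own result. The function name is made up for illustration.

```python
from itertools import product

# (White symbol, Black symbol) for a White win, a draw, and a Black win
RESULT_PAIRS = [("+", "-"), ("=", "="), ("-", "+")]

def cascla_categories():
    """Enumerate the categories in the White/Black notation used above."""
    cats = []
    for white, black in product("KQN", repeat=2):
        if white != "N" and black != "N":
            # Both sides castled: assume the prime marks who castled first.
            for first in ("W", "B"):
                w_mark = "'" if first == "W" else ""
                b_mark = "'" if first == "B" else ""
                for rw, rb in RESULT_PAIRS:
                    cats.append(f"{white}{w_mark}{rw}/{black}{b_mark}{rb}")
        else:
            # At most one side castled, so there is no "first" to record.
            for rw, rb in RESULT_PAIRS:
                cats.append(f"{white}{rw}/{black}{rb}")
    return cats

print(len(cascla_categories()))
```

    The four both-castled pairings contribute 4 x 2 x 3 = 24 entries and the remaining five pairings contribute 5 x 3 = 15, which is where the 39 comes from.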

  • #2
    Re: Cascla: the 39 categories

    Originally posted by Frank Dixon (quoted in full above)

    Hi Frank, this is an interesting idea. I'm in favor of anything that can capture more properties of recorded games.

    I'd like to make a few hopefully constructive suggestions.

    (1) Capturing the information about half-moves between castling.... this seems like overkill. Would you really expect to find some meaningful pattern existing where this value is 17 versus where it is 13, all other values being the same, for example? I think anything you capture there, even over millions of games, is just going to be statistical noise.

    (2) The same goes for capturing which side castles before the other, whether it's Black or White. It may be harder to convince you of this, but I think this also will result in statistical noise. Is it really possible that castling first (or alternatively, NOT castling first) is some kind of meta-influence in game results? Think of all the possible chess openings, and then think of FOR EACH OPENING whether castling first or not castling first is an influence (remember, there are transpositions to most openings, so one could castle first in a specific opening OR not castle first in that same opening). I just don't think anything meaningful would be found in that capture. However, I do concede that it's one binary bit of information (White castles first = 1 (true) or 0 (false)). So if you really want it, that's fine.

    The rest is good stuff. Kudos on thinking of this idea.

    My idea for shortening the syntax:

    - first character is game result (+, - or =)
    - second character is White castling (K, Q or N)
    - third character is Black castling (K, Q or N)

    You would end up with 27 possibilities:

    +KK =KK -KK
    +KQ =KQ -KQ

    +QK =QK -QK
    +QQ =QQ -QQ

    +KN =KN -KN
    +QN =QN -QN
    +NK =NK -NK
    +NQ =NQ -NQ
    +NN =NN -NN


    If you do insist on capturing which side castles first, this is one more character: W or B. And you would only have to add this character for those results that do NOT contain an 'N', i.e. the games where both sides castled.

    So you would end up with your 39 possibilities but with a shorter syntax:

    +KKW =KKW -KKW
    +KQW =KQW -KQW

    +KKB =KKB -KKB
    +KQB =KQB -KQB

    +QKW =QKW -QKW
    +QQW =QQW -QQW

    +QKB =QKB -QKB
    +QQB =QQB -QQB

    +KN =KN -KN
    +QN =QN -QN
    +NK =NK -NK
    +NQ =NQ -NQ
    +NN =NN -NN
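
    A minimal sketch of an encoder for this shorter syntax follows. The function name and argument names are hypothetical, and it assumes the result character is given from White's point of view.

```python
from itertools import product

def short_cascla(result, white, black, first=None):
    """Build a code such as '+KKW' or '=NQ'.

    result: '+', '=' or '-' (assumed to be from White's point of view)
    white, black: 'K', 'Q' or 'N'
    first: 'W' or 'B'; required only when both sides castled
    """
    if result not in ("+", "=", "-"):
        raise ValueError("result must be '+', '=' or '-'")
    if white not in ("K", "Q", "N") or black not in ("K", "Q", "N"):
        raise ValueError("castling must be 'K', 'Q' or 'N'")
    code = result + white + black
    if white != "N" and black != "N":
        if first not in ("W", "B"):
            raise ValueError("first must be 'W' or 'B' when both sides castled")
        code += first
    return code

# Enumerate every code to confirm the count of 39.
all_codes = set()
for r, w, b in product("+=-", "KQN", "KQN"):
    firsts = ("W", "B") if w != "N" and b != "N" else (None,)
    for f in firsts:
        all_codes.add(short_cascla(r, w, b, f))
print(len(all_codes))
```

    Dropping the W/B character from the both-castled codes collapses these back to the 27 three-character codes.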


    And again, if you REALLY want to capture number of half moves between castlings (which I hope you realize is just too much information!), you just need to add a number for the games where both sides castled. So for the cases where one side castles 17 half moves before the other, this would be:

    +KKW17 =KKW17 -KKW17
    +KQW17 =KQW17 -KQW17

    +KKB17 =KKB17 -KKB17
    +KQB17 =KQB17 -KQB17

    +QKW17 =QKW17 -QKW17
    +QQW17 =QQW17 -QQW17

    +QKB17 =QKB17 -QKB17
    +QQB17 =QQB17 -QQB17

    +KN =KN -KN
    +QN =QN -QN
    +NK =NK -NK
    +NQ =NQ -NQ
    +NN =NN -NN
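
    Codes in this form are easy to validate and split back into fields. The sketch below uses a regular expression; the field names in the returned dictionary are my own choice, not part of the proposal.

```python
import re

# result, White castling, Black castling, then optional W/B and optional lag
CODE_RE = re.compile(r"^([+=-])([KQN])([KQN])(?:([WB])(\d+)?)?$")

def parse_code(code):
    """Split a code such as '+KQW17' into its fields, or raise ValueError."""
    m = CODE_RE.match(code)
    if m is None:
        raise ValueError(f"not a valid code: {code!r}")
    result, white, black, first, lag = m.groups()
    both_castled = white != "N" and black != "N"
    # W/B is only meaningful when both sides castled; this sketch requires it
    # there (so the 27 three-character codes would need a looser check).
    if both_castled != (first is not None):
        raise ValueError(f"W/B marker does not match castling in {code!r}")
    return {
        "result": result,
        "white": white,
        "black": black,
        "first": first,
        "lag": int(lag) if lag is not None else None,
    }

print(parse_code("+KQW17"))
```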


    Frank, are you familiar with the Hadoop Big Data stack? Hadoop is an open-source framework for dealing with terabytes or even petabytes of data. Hadoop Core includes the HDFS distributed file system, along with MapReduce.

    Other tools in the Hadoop stack include Hive and Spark (plenty of other tools as well -- Pig, Sqoop, Flume, Oozie -- but Hive and Spark are the main ones). Hive allows for SQL-like queries, and Spark is for data analytics and machine learning. Spark programs can be written in Python or Scala, among other languages.
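
    Before involving a cluster, the kind of group-by a Hive or Spark job would run can be prototyped on a small sample in plain Python. This is a stand-in for illustration, not Hive or Spark code, and the sample codes are made up rather than taken from real games.

```python
from collections import Counter, defaultdict

def result_breakdown(codes):
    """Count results ('+', '=', '-') per castling pattern (e.g. 'KK', 'QN')."""
    table = defaultdict(Counter)
    for code in codes:
        result, pattern = code[0], code[1:3]  # ignores any W/B or lag suffix
        table[pattern][result] += 1
    return {pattern: dict(counts) for pattern, counts in table.items()}

# Made-up sample input, purely to show the shape of the output.
sample = ["+KK", "=KK", "-KQ", "+KK", "=NN"]
print(result_breakdown(sample))
```

    On a real data set the same grouping would be a single GROUP BY over the castling pattern and result columns.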

    If you are familiar with any of that, then all you need is the program (most easily written in Python) to go through pgn games and parse the games to produce the above data. You store it all in a Hadoop HDFS storage cluster (or alternatively, store it in an SQL database such as MySQL, then transfer it to a Hadoop HDFS cluster using Sqoop). Once you have it in HDFS files, you can analyze it using Spark to extract the patterns.
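
    To give a feel for the parsing step, here is a rough standard-library sketch that takes the movetext of one game (not the header tags) and produces the short code plus the lag in half-moves. It assumes the game starts from the normal position with White to move, trusts the notation without checking that moves are legal, and takes the result from White's point of view; a production parser would more likely build on a dedicated PGN library.

```python
import re

RESULTS = {"1-0": "+", "0-1": "-", "1/2-1/2": "="}

def castling_code(movetext):
    """Return (code, lag) for one game's movetext, e.g. ('=KKW', 3)."""
    text = re.sub(r"\{[^}]*\}", " ", movetext)   # drop {comments}
    text = re.sub(r";[^\n]*", " ", text)         # drop ;rest-of-line comments
    text = re.sub(r"\$\d+", " ", text)           # drop $NAG annotations
    prev = None
    while prev != text:                          # drop (variations), innermost first
        prev = text
        text = re.sub(r"\([^()]*\)", " ", text)
    text = re.sub(r"\d+\.+", " ", text)          # drop move numbers like '4.' or '4...'

    result, castled, ply = None, {}, 0
    for tok in text.split():
        if tok in RESULTS:
            result = RESULTS[tok]
            break
        if tok == "*":                           # unfinished game
            break
        side = "W" if ply % 2 == 0 else "B"
        move = tok.rstrip("+#!?").replace("0", "O")  # accept 0-0 as well as O-O
        if move in ("O-O", "O-O-O") and side not in castled:
            castled[side] = ("K" if move == "O-O" else "Q", ply)
        ply += 1
    if result is None:
        raise ValueError("no result token found in movetext")

    white, w_ply = castled.get("W", ("N", None))
    black, b_ply = castled.get("B", ("N", None))
    code, lag = result + white + black, None
    if w_ply is not None and b_ply is not None:
        code += "W" if w_ply < b_ply else "B"
        lag = abs(w_ply - b_ply)
    return code, lag

print(castling_code("1. e4 e5 2. Nf3 Nc6 3. Bc4 Bc5 4. O-O Nf6 5. d3 O-O 1/2-1/2"))
```

    In the example game White castles on half-move 7 and Black on half-move 10, so the lag is 3 and the code is =KKW.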

    This is a lot of work, and you need the HDFS cluster. The nice thing about such a cluster is that it can be commodity hardware. Combined with the open-source nature of Hadoop, it's not all that expensive.

    The expense is the amount of work needed to produce good results. You first of all need to write the parser that produces the above data from raw pgn files. That could be done in Python. But you'd have to test it extensively. You'd have to be 100% certain that no pgn file is parsed incorrectly. That by itself is a HUGE amount of work.

    Then you'd need to set up the cluster, and then transfer the data to the cluster using either Hadoop commands or Sqoop. That might only be gigabytes of data, not necessarily terabytes. It doesn't actually matter: Hadoop can handle petabytes, and that is what it is for. RDBMS technology bogs down in the terabyte range, and Hadoop was created to overcome that very problem.

    Bottom line is it would cost lots of money for programming. I could do it for you since I work with Hadoop, but it's certain that you or anyone else in chess can't afford me or anyone like me.

    So your idea is great, by all means go ahead with your idea (minus I hope the number of half moves between castling). It's just that for anyone to actually use it, it would take a devoted volunteer who knows Hadoop or some similar technology to really turn it into any kind of result. So you may never see any actual discovery come out of it.

    But wouldn't it be interesting.... if somewhere in all those millions of pgn games..... there were such patterns waiting to be discovered.... that's what I love about data analytics!!!!
    Only the rushing is heard...
    Onward flies the bird.
