Hacking Chess: Data Munging

This is a supplement to Hacking Chess with the MongoDB Pipeline. It explains how to roll your own data sets from chess games.

Download a collection of chess games you like. I’m using 1132 wins in less than 10 moves, but any of them should work.

These files are in a format called Portable Game Notation (PGN), which is a human-readable notation for chess games. For example, the first game in TEN.PGN (helloooo 80s filenames) looks like:

[Event "?"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "Gedult D"]
[Black "Kohn V"]
[Result "1-0"]
[ECO "B33/09"]

1.e4 c5 2.Nf3 Nc6 3.d4 cxd4 4.Nxd4 Nf6
5.Nc3 e5 6.Ndb5 d6 7.Nd5 Nxd5 8.exd5 Ne7
9.c4 a6 10.Qa4  1-0

This represents a 10-turn win at an unknown event. The “ECO” field shows which opening was used (a Sicilian in the game above).
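To make the format concrete, here is a minimal PHP sketch that splits a single game into its tag pairs and its moves. It is only an illustration of the layout described above, not the converter used later in this post: parsePgnGame() is a hypothetical helper name, and the sketch assumes plain movetext with no comments, variations, or annotations.

```php
<?php
// Illustrative sketch only: parsePgnGame() is a hypothetical helper,
// not part of any PGN library. Assumes simple movetext (no {comments},
// no (variations), no annotation glyphs).
function parsePgnGame(string $pgn): array
{
    $game = ['tags' => [], 'moves' => [], 'result' => null];

    // Tag pairs look like: [White "Gedult D"]
    preg_match_all('/^\[(\w+)\s+"([^"]*)"\]\s*$/m', $pgn, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $game['tags'][$m[1]] = $m[2];
    }

    // Movetext is whatever remains once the tag lines are removed.
    $movetext = trim(preg_replace('/^\[.*\]\s*$/m', '', $pgn));

    foreach (preg_split('/\s+/', $movetext, -1, PREG_SPLIT_NO_EMPTY) as $token) {
        if (in_array($token, ['1-0', '0-1', '1/2-1/2', '*'], true)) {
            $game['result'] = $token; // game termination marker
            continue;
        }
        // Strip a leading move number such as "10." from "10.Qa4".
        $san = preg_replace('/^\d+\.+/', '', $token);
        if ($san !== '') {
            $game['moves'][] = $san;
        }
    }
    return $game;
}

$pgn = <<<'PGN'
[Event "?"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "Gedult D"]
[Black "Kohn V"]
[Result "1-0"]
[ECO "B33/09"]

1.e4 c5 2.Nf3 Nc6 3.d4 cxd4 4.Nxd4 Nf6
5.Nc3 e5 6.Ndb5 d6 7.Nd5 Nxd5 8.exd5 Ne7
9.c4 a6 10.Qa4  1-0
PGN;

$game = parsePgnGame($pgn);
echo json_encode($game), "\n";
```

For the game above, the tags end up keyed by name (so $game['tags']['ECO'] holds the opening code) and the moves become a flat list of half-moves, White and Black alternating.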

Unfortunately for us, MongoDB doesn’t import PGNs in their native format, so we’ll need to convert them to JSON. I found a PGN->JSON converter in PHP, part of the DHTML Chess package from dhtmlgoodies.com, that did the job. Scroll down to the “download” section of its page to get the .zip.

It’s one of those zips that vomits its contents into whatever directory you unzip it in, so create a new directory for it.

So far, we have:

$ mkdir chess
$ cd chess
$ ftp ftp://ftp.pitt.edu/group/student-activities/chess/PGN/Collections/ten-pg.zip ./
$ unzip ten-pg.zip
$ wget http://www.dhtmlgoodies.com/scripts/dhtml-chess/dhtml-chess.zip
$ unzip dhtml-chess.zip

Now, create a simple script, say parse.php, to run through the chess matches and output them in JSON, one per line:

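<?php
// Assumption: the include path, the PgnParser class name, and getNumberOfGames()
// are placeholders; check the unzipped dhtml-chess files for the actual names.
require_once "PgnParser.class.php";

$parser = new PgnParser("TEN.PGN");
$numGames = $parser->getNumberOfGames();
for ($i=0; $i<$numGames; $i++) {
    echo $parser->getGameDetailsAsJson($i)."\n";
}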


Run parse.php and dump the results into a file:

$ php parse.php > games.json

Now you’re ready to import games.json.
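Before importing, it can be worth confirming that every line of games.json really is a standalone JSON document, since the script above writes one game per line. The following is a small sketch of such a check; checkJsonLines() is a hypothetical helper name, not part of the converter package.

```php
<?php
// Hypothetical helper: reports how many lines of a file decode as JSON
// and which line numbers do not.
function checkJsonLines(string $path): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $valid = 0;
    $invalidLines = [];
    $lineNo = 0;
    while (($line = fgets($handle)) !== false) {
        $lineNo++;
        $line = trim($line);
        if ($line === '') {
            continue; // ignore blank lines
        }
        json_decode($line);
        if (json_last_error() === JSON_ERROR_NONE) {
            $valid++;
        } else {
            $invalidLines[] = $lineNo;
        }
    }
    fclose($handle);

    return ['valid' => $valid, 'invalid_lines' => $invalidLines];
}

// Only run the check if the file from the previous step exists.
if (is_file('games.json')) {
    $report = checkJsonLines('games.json');
    echo $report['valid'], " valid lines, ",
         count($report['invalid_lines']), " invalid lines\n";
}
```

If any line numbers are reported as invalid, look at those games in the original PGN file first; the sketch only tells you where decoding failed, not why.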

Back to the original “hacking” post

7 thoughts on “Hacking Chess: Data Munging”

  1. 700 lines in PHP
    20 lines in Coffeescript:

    game = game.replace(/"/g,"")
    lines = game.split("\n")
    hash = {}
    for i in [0..7]
      s = lines[i].replace("[","").replace("]","")
      arr = s.split(" ")
      hash[arr[0]]=arr[1]
    moves = lines[8].split(" ")
    list = []
    for move,i in moves
      if i%2==0
        white=move.split(".")[1]
      else
        black=move
        list.push [white,black]
        white=""
    if white != ""
      list.push [white]
    hash["moves"]=list
    alert JSON.stringify(hash)


  2. game = game.replace(/"/g,"")
    lines = game.split("\n")
    assert 10,lines.length
    hash = {}
    for i in [0..7]
      s = lines[i].replace("[","").replace("]","")
      arr = s.split(" ")
      hash[arr[0]]=arr[1]
    moves = lines[8].split(" ")
    list = []
    for move,i in moves
      if i%2==0
        white=move.split(".")[1]
      else
        black=move
        list.push [white,black]
        white=""
    if white != ""
      list.push [white]
    hash["moves"]=list
    alert JSON.stringify(hash)


    1. Cool! I’m not familiar with CoffeeScript. Can you do file I/O with it? (And no dumping on PHP. That package wasn’t great, but it does have a lot more functionality than I’m using above.)


      1. CoffeeScript transpiles into JavaScript and has exactly the same features.
        Check out http://jashkenas.github.com/coffee-script/
        and the PGN2JSON code here http://tinkerbin.com/iY2VCcDF
        (change language to Coffeescript before running)
        I think CS and MongoDB is a perfect match!


    2. Since I can do the same thing in PHP:

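      $game = str_replace('"', '', $game);
      $lines = explode("\n", $game);
      $hash = array();
      for ($i = 0; $i < 8; $i++) {
      	list($key, $value) = explode(" ", trim($lines[$i], "[]"), 2);
      	$hash[$key] = $value;
      }
      $list = array();
      foreach (explode(" ", $lines[8]) as $turn => $move) {
      	if($turn % 2 == 0) {
      		list($junk, $white) = explode(".", $move);
      	} else {
      		$list[] = array($white, $move);
      		$white = null;
      	}
      }
      if ($white !== null) {
      	$list[] = array($white);
      }
      $hash["moves"] = $list;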

      in the same number of lines, what do I win? 😉 Maybe the knowledge / experience to understand that every language is just a tool in a developer’s toolbox and that being arrogant about one’s preferred choice of language is rather juvenile… 🙂


  3. Hi,
    I need to convert about 80 PGN files to JSON. If you are still on this project, can you tell me how to change parse.php so that it converts all the PGN files in a folder to .json files of the same names? Also, how do I run parse.php?
    It is probably a simple “for” or “while” loop, but I am not a programmer and understand PHP only as part of WordPress coding.


