This is a supplement to Hacking Chess with the MongoDB Pipeline. It has instructions for rolling your own data sets from chess games.
Download a collection of chess games you like. I’m using 1132 wins in less than 10 moves, but any of them should work.
These files are in a format called portable game notation (.PGN), which is a human-readable notation for chess games. For example, the first game in TEN.PGN (helloooo 80s filenames) looks like:
[Event "?"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "Gedult D"]
[Black "Kohn V"]
[Result "1-0"]
[ECO "B33/09"]

1.e4 c5 2.Nf3 Nc6 3.d4 cxd4 4.Nxd4 Nf6 5.Nc3 e5 6.Ndb5 d6 7.Nd5 Nxd5 8.exd5 Ne7 9.c4 a6 10.Qa4 1-0
This represents a 10-turn win at an unknown event. The “ECO” field shows which opening was used (a Sicilian in the game above).
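To make the structure of a PGN record concrete, here is a minimal sketch in Python (not the PHP converter used in this post) that pulls the bracketed tag pairs and the move list out of the game above. The function name `parse_game` and the output shape are assumptions made for illustration; they do not mirror the converter's JSON, and the sketch ignores PGN features such as comments and variations.

```python
import re

# Tag pairs look like: [White "Gedult D"]
TAG_RE = re.compile(r'\[(\w+)\s+"([^"]*)"\]')
# Move numbers look like "1." or "10." and are dropped from the move list.
MOVE_NUMBER_RE = re.compile(r'\d+\.+')
# Game-termination markers that can end the movetext.
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def parse_game(pgn_text):
    """Parse one PGN game into a dict of tags plus a flat list of moves.

    Hypothetical helper for illustration only.
    """
    game = dict(TAG_RE.findall(pgn_text))
    # Whatever is left after removing the tag pairs is movetext.
    movetext = TAG_RE.sub(" ", pgn_text)
    # Strip the move numbers, then split on whitespace.
    tokens = MOVE_NUMBER_RE.sub(" ", movetext).split()
    # The last token repeats the result and is not a move.
    game["moves"] = [t for t in tokens if t not in RESULTS]
    return game


example = '''[Event "?"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "Gedult D"]
[Black "Kohn V"]
[Result "1-0"]
[ECO "B33/09"]

1.e4 c5 2.Nf3 Nc6 3.d4 cxd4 4.Nxd4 Nf6 5.Nc3 e5 6.Ndb5 d6 7.Nd5 Nxd5 8.exd5 Ne7 9.c4 a6 10.Qa4 1-0'''

game = parse_game(example)
print(game["White"], game["ECO"], len(game["moves"]))  # prints: Gedult D B33/09 19
```

The move list has 19 entries because White's tenth move ends the game before Black replies.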
Unfortunately for us, MongoDB doesn’t import PGNs in their native format, so we’ll need to convert them to JSON. I found a PGN->JSON converter in PHP that did the job here. Scroll down to the “download” section to get the .zip.
It’s one of those zips that vomits its contents into whatever directory you unzip it in, so create a new directory for it.
So far, we have:
$ mkdir chess
$ cd chess
$
$ ftp ftp://ftp.pitt.edu/group/student-activities/chess/PGN/Collections/ten-pg.zip ./
$ unzip ten-pg.zip
$
$ wget http://www.dhtmlgoodies.com/scripts/dhtml-chess/dhtml-chess.zip
$ unzip dhtml-chess.zip
Now, create a simple script, say parse.php, to run through the chess matches and output them in JSON, one per line:
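<?php
// Parts of this script were lost when the page was extracted. The include
// path and the class name below are assumptions about the DHTML Chess
// package; check them against the files you unzipped.
include "PgnParser.class.php";

$parser = new PgnParser("TEN.PGN");

// Print each game as a JSON document on its own line.
$numGames = $parser->getNumberOfGames();
for ($i=0; $i<$numGames; $i++) {
    echo $parser->getGameDetailsAsJson($i)."\n";
}
?>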
Run parse.php and dump the results into a file:
$ php parse.php > games.json
Now you’re ready to import games.json.
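Before importing, it can be worth confirming that the file really does contain one JSON document per line. The following is a small Python sketch of such a check, not part of the original workflow; the function name `count_valid_json_lines` and the sample field name are made up for illustration.

```python
import json


def count_valid_json_lines(lines):
    """Return (number of valid JSON objects, line numbers that failed).

    Hypothetical helper: blank lines are skipped, and a line counts as
    valid only if it parses to a JSON object (a dict in Python).
    """
    valid = 0
    bad = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            doc = json.loads(line)
        except json.JSONDecodeError:
            bad.append(number)
            continue
        if isinstance(doc, dict):
            valid += 1
        else:
            bad.append(number)  # parsed, but is not a JSON object
    return valid, bad


# In-memory demo; the "white" field is an invented example.
sample = ['{"white": "Gedult D"}', '', 'not json']
print(count_valid_json_lines(sample))  # prints: (1, [3])
```

To check the real file, pass an open file handle instead of the sample list, for example `count_valid_json_lines(open("games.json", encoding="utf-8"))`.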
700 lines in PHP
20 lines in Coffeescript:
game = game.replace(/"/g,"")
lines = game.split("\n")
assert 10,lines.length
hash = {}
for i in [0..7]
  s = lines[i].replace("[","").replace("]","")
  arr = s.split(" ")
  hash[arr[0]]=arr[1]
moves = lines[8].split(" ")
list = []
for move,i in moves
  if i%2==0
    white=move.split(".")[1]
  else
    black=move
    list.push [white,black]
    white=""
if white != ""
  list.push [white]
hash["moves"]=list
alert JSON.stringify(hash)
Cool! I’m not familiar with Coffeescript, can you do file IO with it? (And no dumping on PHP. That package wasn’t great, but it does have a lot more functionality than I’m using above.)
CoffeeScript transpiles into JavaScript and has exactly the same features.
Check out http://jashkenas.github.com/coffee-script/
and the PGN2JSON code here http://tinkerbin.com/iY2VCcDF
(change language to Coffeescript before running)
I think CS and MongoDB is a perfect match!
Nice 🙂
Since I can do the same thing in PHP:
in the same number of lines, what do I win? 😉 Maybe the knowledge / experience to understand that every language is just a tool in a developer’s toolbox and that being arrogant about one’s preferred choice of language is rather juvenile… 🙂
Hi,
I need to convert about 80 PGN files to JSON. If you are still on this project, can you tell me how to change parse.php so that it converts all the PGN files in a folder to .json files of the same names? Also, how do I run parse.php?
It is probably a simple “for” or “while” loop, but I am not a programmer; I understand PHP only as part of WordPress coding.
Thanks
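For the batch question above, one possible shape for the loop is sketched below in Python rather than PHP, so it is an illustration of the idea and not a drop-in change to parse.php. The function `convert_folder` and its `convert` argument are hypothetical: `convert` stands for whatever routine turns the text of one PGN file into a list of games. Reading the files as Latin-1 is also an assumption about how older PGN collections are encoded.

```python
import json
from pathlib import Path


def convert_folder(folder, convert):
    """Write NAME.json next to every NAME.pgn found in `folder`.

    `convert` is a placeholder for any function that takes the text of a
    PGN file and returns a list of JSON-serializable games.
    """
    written = []
    for pgn_path in sorted(Path(folder).iterdir()):
        # Match the extension case-insensitively (TEN.PGN, ten.pgn, ...).
        if pgn_path.suffix.lower() != ".pgn":
            continue
        # Assumption: old collections are Latin-1, which never fails to decode.
        text = pgn_path.read_text(encoding="latin-1")
        json_path = pgn_path.with_suffix(".json")
        with json_path.open("w", encoding="utf-8") as out:
            # One JSON document per line, as in games.json above.
            for game in convert(text):
                out.write(json.dumps(game) + "\n")
        written.append(json_path)
    return written
```

Called as `convert_folder("chess", my_converter)`, this would leave a `.json` file beside each `.pgn` file in the `chess` directory, where `my_converter` is whichever PGN parsing function you supply.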