Chess game data mining: exploring the advantage of the bishop pair with pgn-extract


In chess, many structural and material features are widely accepted to affect the outcome of a game and are therefore either desirable to achieve or best avoided. The most basic is that having more material than your opponent usually conveys a significant advantage, but there are plenty of others that apply when material is balanced. For instance, an isolated pawn often represents a weakness because it has to be defended by a piece, while a rook on the seventh rank is likely to be hard for your opponent to defend against.

Despite the fact that the experience of chess players bears out these widely held wisdoms, it does no harm to try to find empirical evidence to support them. So in this post I am going to explore one of the ways in which one might mine a database of chess games to find evidence of the impact of a particular feature in practice. In the process I will also touch on the care that must be taken both in preparing the data and in interpreting the results.

Previous studies

It is widely acknowledged that a player with two bishops has an advantage over an opponent with either two knights or a bishop and knight, and previous studies have sought to quantify this by examining the results of games. The earliest study I am aware of was by GM Gennady Timoshchenko[1], in which he examined the relative strengths of different bishop and knight combinations. I haven’t been able to locate the original text of this study but some details are given by Larry Kaufman in The Relative Value of the Pieces[2]. He reports that Timoshchenko looked at 150,000 games and found in one of his results that two bishops won by a margin of 70% to 30% over two knights. Kaufman also reports that when comparing a single bishop against a single knight, the number of pawns on the board affected the balance, with larger numbers of pawns hindering a bishop’s mobility more than that of a knight. Unfortunately, Kaufman provides no details of the nature of the games studied (e.g., player rating levels), the way in which the material balance was identified, or the number of games identified with a particular piece combination.

In 1995, Mark Sturman[3] extended the original study using a database of over 350,000 games and noted that he only presented results where there were at least 100 games available for his data points. He found that the greatest advantage was the case of BB vs NN but only had three data points: for 4, 5 or 6 pawns. From his diagram it appears that the winning percentages for the bishop pair were between 66% and 70%. However, it isn’t clear what role draws played in calculating the percentages.

In 1999, Kaufman conducted a further study[4] and provided useful details of his method: a database of about 925,000 games was culled to around 300,000 selecting only those having a FIDE rating of at least 2300. He required the material balance of interest to persist for at least 6 ply and a sample size of at least 200 for each data point. Kaufman was interested in interpreting advantage in material terms rather than win percentages. Overall he concluded that the bishop pair was worth half a pawn on average but also refined this by saying that, “the bishop pair is worth less than half a pawn when most or all the pawns are on the board, and more than half a pawn when half or more of the pawns are gone.”

Preparing the data

For this exploration I am not going to try to reproduce all of the results of previous studies but simply the case of two bishops versus two knights, with the remaining material completely balanced. Nevertheless, the principles could easily be applied to any other material balances of interest, such as B vs N, BBN vs BNN, Q vs RR, etc.

For the data processing I am going to be using pgn-extract[5] – a program for processing games in PGN notation. I first started writing this program just for my personal use back in 1994 but later released it as free, open-source software, and I still continue to maintain and extend its functionality in response to user requests.

For the source of the games I used the free PGN source KingBase (as of Jan 2018, without any of the 2018 updates). These are all games played since 1990 by players with a rating of at least 2000 – not quite as strong as Kaufman’s data set[4] but still of a reasonable strength.

The first stage of the process was to clean up the data. While the dataset contains over 2 million games, over 100,000 of these turn out to be duplicates. I used the -D (delete duplicates) option of pgn-extract to remove them, leaving around 1.95 million. Of these, just over 100 had Result tags that conflicted with the result recorded at the end of the moves. This is not an uncommon feature of freely available data. Typically the examples here were games where one side had won with checkmate but the Result tag recorded either a draw or a loss for the winning player! These were corrected using the --fixresulttags option of pgn-extract.

The next stage was to isolate those games in which one player had a bishop pair and the other a knight pair, with all other material being equal. Kaufman[4] required a material stability of six ply and this is an important consideration. A BB vs NN balance that lasts only fleetingly is unlikely to have a significant impact on a game’s outcome. To show the difference this makes to the size of the data set: with a stability of 2-ply, our 1.95 million games were reduced to 39,000, whereas a stability of 4-ply reduced it to 31,000 and 6-ply to 25,000. With such large differences in the size of data set to be analysed, there is clearly potential for significant differences in the results.
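To make the idea of a stability length concrete, here is a minimal Python sketch. It assumes we already have, for each ply of a game, a flag saying whether the material balance of interest holds at that point. The function names and the list-of-flags representation are hypothetical and are not taken from pgn-extract, whose own implementation may differ in detail.

```python
def longest_run(flags):
    """Length of the longest unbroken run of True values in flags."""
    best = current = 0
    for holds in flags:
        # Extend the current run, or reset it when the balance is broken.
        current = current + 1 if holds else 0
        best = max(best, current)
    return best


def is_stable(flags, min_ply):
    """True if the balance holds for at least min_ply consecutive plies."""
    return longest_run(flags) >= min_ply


# Hypothetical game: the balance appears briefly, is broken, then returns.
flags = [False, True, True, True, False, True]
print(is_stable(flags, 2), is_stable(flags, 4))
```

A game like this one would be counted under a 2-ply requirement but rejected under a 4-ply or 6-ply requirement, which is the kind of effect behind the shrinking data set sizes.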

The material match (-z) option of pgn-extract allows a particular material combination to be specified along with a stability length in ply. For instance:

6 b2n0q*r*p* b0n2q=r=p=

specifies that, for a stability of 6-ply, one side must have 2 bishops, 0 knights and any number of queens, rooks and pawns, while the opponent must have 0 bishops, 2 knights and exactly the same number of queens, rooks and pawns as the other side. With the -z option this pattern will be applied equally to both White and Black, matching all BB vs NN games regardless of player colour. A second stage was then applied to those mixed games to separate them into colour-specific sets using the -y option, which uses the same material pattern syntax but applies the first pattern to White and the second pattern to Black. The pattern to isolate games where Black has the two bishops is:

6 b0n2q*r*p* b2n0q=r=p=
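As a way of reading these patterns, the following Python sketch parses the three forms that appear in this post (an explicit count, `*` for any number, and `=` for the same number as the opponent) and tests them against piece counts. It is only an illustration: the function names are hypothetical, the real pgn-extract syntax may offer more than is covered here, and this is not how pgn-extract itself is implemented.

```python
import re


def parse_side(spec):
    """Turn 'b2n0q*r*p*' into {'b': '2', 'n': '0', 'q': '*', ...}."""
    tokens = re.findall(r"([bnqrp])(\d+|\*|=)", spec)
    if "".join(piece + want for piece, want in tokens) != spec:
        raise ValueError(f"unrecognised material pattern: {spec!r}")
    return dict(tokens)


def parse_pattern(line):
    """Split '6 b2n0q*r*p* b0n2q=r=p=' into (stability, first, second)."""
    depth, first, second = line.split()
    return int(depth), parse_side(first), parse_side(second)


def count_ok(want, have, opponent_have):
    """Check one piece count against '*', '=' or an explicit number."""
    if want == "*":
        return True
    if want == "=":
        return have == opponent_have
    return have == int(want)


def material_matches(first, second, side_a, side_b):
    """Apply the first pattern to side_a and the second to side_b."""
    return (all(count_ok(w, side_a[p], side_b[p]) for p, w in first.items())
            and all(count_ok(w, side_b[p], side_a[p]) for p, w in second.items()))


depth, first, second = parse_pattern("6 b2n0q*r*p* b0n2q=r=p=")
bishops = {"b": 2, "n": 0, "q": 1, "r": 2, "p": 6}
knights = {"b": 0, "n": 2, "q": 1, "r": 2, "p": 6}
print(depth, material_matches(first, second, bishops, knights))
```

Matching regardless of colour, in the spirit of the -z option, would then amount to trying the two sides in both orders.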

The separated games were extracted to two files bbnn.pgn (12744 games) and nnbb.pgn (12369 games). Each file was then further separated into files of White BB win, White BB loss, Black BB win, etc. by using the -Tr option of pgn-extract to select only those games having a particular result. For instance:

pgn-extract -Tr1-0 bbnn.pgn --output white-bb-win.pgn 
pgn-extract -Tr0-1 nnbb.pgn --output black-bb-win.pgn 
pgn-extract -Tr0-1 bbnn.pgn --output white-bb-loss.pgn 

These win/loss/draw files served as the basis for the analysis of the basic results covered in the next section.
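The full set of splits can be scripted. The sketch below only builds and prints the command lines, following the form of the commands shown above. The draw and black-bb-loss file names are hypothetical, and the `1/2-1/2` spelling of the draw result passed to -Tr is an assumption that should be checked against the pgn-extract documentation before use.

```python
import shlex

# (input file, result to select, output file).
# Only the first, third and fourth rows appear in the post; the other
# output names are hypothetical and simply follow the same naming scheme.
SPLITS = [
    ("bbnn.pgn", "1-0", "white-bb-win.pgn"),
    ("bbnn.pgn", "1/2-1/2", "white-bb-draw.pgn"),
    ("bbnn.pgn", "0-1", "white-bb-loss.pgn"),
    ("nnbb.pgn", "0-1", "black-bb-win.pgn"),
    ("nnbb.pgn", "1/2-1/2", "black-bb-draw.pgn"),
    ("nnbb.pgn", "1-0", "black-bb-loss.pgn"),
]


def build_command(source, result, target):
    """Build one pgn-extract invocation as an argument list."""
    return ["pgn-extract", f"-Tr{result}", source, "--output", target]


for source, result, target in SPLITS:
    # shlex.join (Python 3.8+) renders the list as a shell command line.
    print(shlex.join(build_command(source, result, target)))
```

Each argument list could be passed to `subprocess.run` once the commands have been checked by eye.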

Basic results

The overall win:draw:loss proportions for the side with the bishop pair, taken from the 25,000 games with 6-ply stability, were 0.45:0.28:0.27. This confirms that, on average, having the two bishops provides a significant advantage over having the two knights, all other material being equal.

Previous studies also looked at the effect of pawn numbers on the outcomes. The games can be further subdivided by specifying an explicit number of pawns for each side; for instance:

6 b0n2q*r*p8 b2n0q=r=p=

matches only those games where Black has the bishop pair while there are still 8 pawns on the board for both sides. The bbnn.pgn and nnbb.pgn files were analysed for 0 to 8 pawns and the win/draw/loss percentages calculated for each.

The following table combines results for both White and Black and shows the breakdown of results for the side with the bishop pair when the number of pawns is taken into account. It only shows results for pawn numbers with at least 100 games.

Win/draw/loss proportions for BB vs NN (side with the bishop pair)
pawns   win     draw    loss    # games
8       0.416   0.311   0.274   2131
7       0.421   0.272   0.308   10061
6       0.461   0.280   0.259   10681
5       0.508   0.268   0.225   5178
4       0.522   0.303   0.175   1833
3       0.510   0.355   0.135   602
2       0.455   0.497   0.049   143

In our data set, there is relatively little difference in the win percentage for 3, 4 and 5 pawns but the loss percentage decreases continuously from 7 pawns to 2. The lower loss percentage for 8 pawns compared to 7 is an interesting anomaly which we consider further in the next section.
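One way to fold draws into a single figure is the conventional chess score, counting a win as 1 and a draw as 1/2. This measure is not used in the analysis above; the Python sketch below simply applies it to the rows of the table as one possible summary.

```python
# pawns -> (win, draw, loss) for the side with the bishop pair,
# copied from the table above.
RESULTS = {
    8: (0.416, 0.311, 0.274),
    7: (0.421, 0.272, 0.308),
    6: (0.461, 0.280, 0.259),
    5: (0.508, 0.268, 0.225),
    4: (0.522, 0.303, 0.175),
    3: (0.510, 0.355, 0.135),
    2: (0.455, 0.497, 0.049),
}


def expected_score(win, draw, loss):
    """Score per game: 1 for a win, 1/2 for a draw, 0 for a loss."""
    return win + draw / 2


scores = {pawns: expected_score(*row) for pawns, row in RESULTS.items()}
for pawns in sorted(scores, reverse=True):
    print(f"{pawns} pawns: {scores[pawns]:.3f}")
```

On this measure the figure for 8 pawns is again slightly higher than for 7, after which the score rises steadily as the pawn count falls to 2.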

Taking a deeper look at the results

Aside from examining a much wider range of material combinations in a similar fashion, the previous studies referenced here didn’t really go beyond this level of basic analysis. However, a little care is needed in taking the proportions in the table above at face value. For instance, notice that the number of games recorded in the table is actually 30,629, which is more than the 25,113 games that were isolated from the original data set. The reason is, of course, that a bishop pair arising when there are N pawns on the board is quite likely to remain when there are fewer pawns on the board, and this persistence is part of the long-term influence of the material balance. This duplication across different numbers of pawns accounts for the roughly 5,500 additional entries.
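The arithmetic behind these totals can be checked directly from the figures already given: the per-pawn game counts in the table and the sizes of the bbnn.pgn and nnbb.pgn files. The variable names in this short Python check are only illustrative.

```python
# Number of games per pawn count, copied from the table.
games_per_pawn_count = {8: 2131, 7: 10061, 6: 10681, 5: 5178,
                        4: 1833, 3: 602, 2: 143}

# Sizes of the two colour-specific files.
white_bb_games = 12744   # bbnn.pgn
black_bb_games = 12369   # nnbb.pgn

table_total = sum(games_per_pawn_count.values())
distinct_games = white_bb_games + black_bb_games
extra_entries = table_total - distinct_games

print(table_total, distinct_games, extra_entries)
```

The difference is the number of table entries that come from games counted under more than one pawn count.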

What isn’t so obvious is that a bishop pair arising when there are six pawns each on the board and persisting until there are three pawns each will not necessarily contribute to the statistics for five and four pawns each. The reason for this is the stability constraint on material matches. If material stability does not persist for the full 6 ply during the five- and four-pawn stages of the game then the game will not be classified as BB vs NN for those numbers of pawns, despite the fact that it should be for the purposes of statistical analysis.

Another case to consider is where the bishop pair identified when there are N pawns each is surrendered without further pawns being exchanged. For all values of N from 2 to 7, around 20% of the games fall into this category. Should these games be considered as influencing the win rate of BB vs NN? The influence will almost certainly depend on the length of the retention and the stage of the game. Interestingly, for the case of 8 pawns, the bishop pair is surrendered in this way in only 10% of the games, suggesting that bishop pairs obtained early tend to be retained for longer and exercise a greater influence on the game’s outcome. It is possible that this (at least partly) explains the anomalous loss rate noted above when there is a bishop pair with 8 pawns. However, I have not confirmed this speculation.


This post has highlighted some of the ways in which pgn-extract[5] might be used to mine data from large chess databases to quantify the influence of particular material combinations. While I have only focussed on the single case of a bishop pair against a knight pair, the approach is widely applicable to other cases. I have also tried to highlight some of the care that must be taken in data preparation and in the interpretation of the results.


In summer 2018 I supervised an MSc dissertation by Joshua Cheah who used pgn-extract and his own program to explore a much broader range of material balances and positional characteristics, such as outposts. While the particular analysis presented here is my own, working with Joshua was the motivation to put together this post and he tracked down the references to the previous studies.


  1. Timoshchenko, Gennady, ICCA Journal, Dec 1993.
  2. Kaufman, Larry, The Relative Value of the Pieces, Computer Chess Reports, 4:2, pp 33-34, 1994.
  3. Sturman, Mark, Beware the Bishop Pair, Computer Chess Reports, 5:2, pp 58-59, 1995.
  4. Kaufman, Larry, The Evaluation of Material Imbalances, Chess Life, 1999.
  5. Barnes, David J., pgn-extract: Portable Game Notation (PGN) Manipulator for Chess Games, 1994-2018.

Programming++: building on what has gone before

When learning to program, the earliest struggles are often with the mechanics of simply generating valid code. This can be deeply frustrating to novices because programming language translators (compilers) are unwaveringly rigid in what they are prepared to accept and will reject everything that does not conform. But once those syntactic basics have been acquired, the door is open to creating new and exciting programs that no one has ever written before.

One of the ways in which programmers create new programs is to pay attention to what others have written in the past. By building on what others have already written, they can often save themselves a lot of time and, thereby, focus their creative energies on those parts that are novel in their own particular programs. As a teacher, I encourage this approach early on when I advise my students to make use of library classes for storing collections of data, for instance. While it is certainly important to understand how those library classes work, that doesn’t mean you have to write the code from scratch every time you need to use a collection. The key is to take what is available and either write your code around it or adapt it to fit your needs.

I applied this principle recently when I was asked to add some new functionality to a program that I have been maintaining on-and-off for the past 20 years. It is an open-source project written in C called pgn-extract. It allows chess players to search files of chess games for matches on all sorts of different criteria, such as particular players, openings, endings, etc. One of the program’s users (JS) asked me if it would be possible to make it look for particular board positions arising during a game. For instance, it might be interesting to look for examples of how a tricky endgame was played out by expert players where the pieces were in a particular configuration. The complicated part was that JS didn’t want users to have to specify the exact position of every piece on the board. Rather, there would be a few pieces they were interested in but the rest didn’t matter.

To my knowledge there wasn’t anything like this already in existence but the task reminded me of the familiar Computer Science topic of pattern matching. This is where you want to look for something that matches a fuzzy pattern rather than something exact, often using regular expression notation. For instance, the pattern “[cC]at.*” matches words starting with “cat” or “Cat”, such as “cattle” and “Catcher”. The notation is well known in computing through the Unix command grep. While this notation is generally used for matching ordinary text, I thought it might be possible to adapt the idea to match chess boards – particularly if a board could be represented in a text-like form.
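A quick way to experiment with this kind of pattern is Python's re module. In strict regular-expression syntax a trailing `*` repeats the character before it, so the sketch below spells "cat followed by anything" as `[cC]at.*` and, for comparison, also shows what the literal pattern `[cC]at*` accepts. The word list is made up for the illustration.

```python
import re

words = ["cattle", "Catcher", "cab", "scatter"]

# 'c' or 'C', then "at", then any further characters; fullmatch
# requires the whole word to fit the pattern from its first letter.
starts_with_cat = re.compile(r"[cC]at.*")
cat_words = [w for w in words if starts_with_cat.fullmatch(w)]

# Read literally, "[cC]at*" means 'c' or 'C', then 'a', then zero or
# more 't' characters, so a word beginning "ca" is enough to match.
literal = re.compile(r"[cC]at*")
literal_hits = [w for w in words if literal.match(w)]

print(cat_words)
print(literal_hits)
```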

My starting point was a nice description of the implementation of grep written by Rob Pike (documented by Brian Kernighan). Representing a board textually is actually fairly easy and chess players do it all the time. For instance P6r means: “white pawn, six empty squares and then a black rook”. I adapted the familiar grep notation to better fit a chess context to be able to distinguish between black and white pieces, and then added it to my program – acknowledging the dependence on Rob Pike, of course!
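For readers who have not seen it, the matcher that Kernighan describes handles literal characters, `.`, `^`, `$` and `*` in three short recursive functions. What follows is a Python transliteration of that well-known design, written for this illustration; it is not the C code in pgn-extract and it leaves out any chess-specific extensions.

```python
def match(regexp, text):
    """Search for regexp anywhere in text."""
    if regexp.startswith("^"):
        return match_here(regexp[1:], text)
    # Try every starting offset, including the empty suffix.
    for start in range(len(text) + 1):
        if match_here(regexp, text[start:]):
            return True
    return False


def match_here(regexp, text):
    """Match regexp at the beginning of text."""
    if not regexp:
        return True
    if len(regexp) >= 2 and regexp[1] == "*":
        return match_star(regexp[0], regexp[2:], text)
    if regexp == "$":
        return not text
    if text and (regexp[0] == "." or regexp[0] == text[0]):
        return match_here(regexp[1:], text[1:])
    return False


def match_star(c, regexp, text):
    """Match zero or more occurrences of c, followed by regexp."""
    i = 0
    while True:
        if match_here(regexp, text[i:]):
            return True
        if i < len(text) and (c == "." or text[i] == c):
            i += 1
        else:
            return False


print(match("^cat.*", "cattle"), match("^cat", "Catcher"))
```

The appeal of this design as a starting point is that each pattern character is handled in one obvious place, so adding new kinds of pattern element means adding new cases to `match_here`.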

Happily the original requester, JS, was pleased with the result but then he went on to demonstrate a nice serendipitous use of the new feature that I would never have thought of. He sent me a picture of former world champion Mikhail Tal staring hard at an unseen opponent. The picture is fairly well known, but who was the opponent on the receiving end of Tal’s menacing stare? JS encoded the part of the board visible in the picture using the new notation (*/*/*/*/????b??q/*/????N??P/R??Q1BR1) and ran the program over all the games that Mikhail Tal ever played! The result was the game against Nikola Padevsky, played in Leipzig in 1960.
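To give a feel for how such a board pattern might be applied, here is a small Python sketch. It assumes that `?` stands for any single square, `*` for any run of squares, a digit for that many empty squares, and a letter for that exact piece. Those meanings are inferred from the examples in this post rather than taken from the pgn-extract documentation, the function names are hypothetical, and the test position is a bare skeleton built from the pattern itself, not the real position from Tal's game.

```python
import re


def expand_rank(rank):
    """Turn a FEN rank such as '4b2q' into 8 characters, '_' for empty."""
    return "".join("_" * int(ch) if ch.isdigit() else ch for ch in rank)


def rank_regex(pattern):
    """Translate one rank of the board pattern into a regular expression."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")            # any single square
        elif ch == "*":
            parts.append(".*")           # any run of squares
        elif ch.isdigit():
            parts.append("_" * int(ch))  # that many empty squares
        else:
            parts.append(re.escape(ch))  # an exact piece letter
    return "".join(parts)


def board_matches(pattern, fen_board):
    """Compare a board pattern with the piece-placement field of a FEN."""
    pattern_ranks = pattern.split("/")
    board_ranks = fen_board.split("/")
    if len(pattern_ranks) != 8 or len(board_ranks) != 8:
        return False
    return all(re.fullmatch(rank_regex(p), expand_rank(r)) is not None
               for p, r in zip(pattern_ranks, board_ranks))


pattern = "*/*/*/*/????b??q/*/????N??P/R??Q1BR1"
skeleton = "8/8/8/8/4b2q/8/4N2P/R2Q1BR1"  # made-up position for illustration
print(board_matches(pattern, skeleton))
```

Searching a game would then mean testing the pattern against the position after every move and reporting the game on the first match.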

Programming++ is not just about the mechanics of writing correct code; it is about taking and adapting existing ideas to create something new, and sometimes something fun!

Depending on Technology

Last weekend I did something out of the ordinary and attended the FIDE Candidates Chess Tournament in London, where eight participants are currently competing against each other for the right to take on the current World Chess Champion for the title. Chess has been part of my life since my teenage years, just a few years longer than computing has, although I gave up serious competitive chess about 15 years ago when writing my first programming text book. I had been invited to go along by my friend and colleague Julio, and while I am fairly out of touch with the contemporary chess scene, he was easily able to point out the two Grandmasters sitting in the corner of Starbucks, or the ‘second’ of one of the tournament favourites as we passed him on the street on our way home in the evening.

Though watching others play chess has famously been compared to being as entertaining as watching paint dry, I have to confess that I found the day enormously enjoyable, despite spending the best part of six hours in a darkened room with practically no conversation! But the day was thought-provoking from the perspective of a Computer Scientist, too, because I was reminded once again of just how dependent we are on computer-based technology in practically every area of our lives.

In my youthful, confident years of playing chess, I used to be amused at the idea of taking on a chess-playing computer and would have considered buying one as a waste of money. The technology was so crude and the power so limited that it was easy for an average club player to obtain a better position from the opening and go on to win by either a well-timed attack beyond the program’s limited vision, or better strategic play. However, as everyone now knows since Deep Blue beat Garry Kasparov, the tables have been turned and it is the programs that would laugh if they could! Any chess player who seriously aspires to be any good would be a fool not to use computer analysis and competition in their study to get better.

But therein lies a problem. In a scenario not dissimilar from having covert Google access while playing “Who Wants to be a Millionaire?”, the chess world is now struggling to deal with the fact that players may be tempted not to leave that computer support behind in the training room but to have it with them, and depend on it, in a tournament game. Such is the concern that someone in the audience might have access to computer analysis and be able to communicate it to the players, that we spectators at the Candidates were not permitted to carry any electronic devices into the tournament hall and had to pass through an airport security gate and be body-searched with hand-held wands. This is the chess-world’s equivalent of doping control; interestingly, not of the participants but of the spectators!

This is dependence on technology that has become addictive: the computers are too powerful and unfettered access to them is considered harmful.

Fortunately, there was access in the playing hall to technology considered harmless. To help the spectators follow the games, a large screen displayed the current state of play on each board, with an indication of how much time each player had left. The playing boards and pieces contain electronic sensors to keep the displays up to date. In addition, one of the tournament sponsors – Samsung – had supplied Galaxy tablets that displayed the same board information and provided a video commentary.

Assuming that you are not in the paint-drying camp, it was clear from the start that things were going to get pretty exciting later on, as four of the players had a huge time advantage over their opponents, who would likely struggle to complete their required forty moves in two hours.

In the most desperate situation was Vassily Ivanchuk who had about 20 seconds left to complete 15 moves! But, at this point, technological failure struck as the live boards stopped updating and my Galaxy tablet froze, too. Despite repeated reboots it wasn’t going to give me the video feedback, and I could only watch Ivanchuk’s hunched back obscuring the board.

Now I was the one dependent on technology, and when it failed I was completely scuppered. I might as well have been sitting in the dark.

Of course, this sort of situation where technology lets you down is all too familiar – your mobile phone battery is flat and you feel completely isolated. Yet neither situation is a complete disaster. Failure here is really just an inconvenience.

But what happens when dependence on technology is life critical – in a medical situation or when flying, for instance?  And as technological control pushes into more and more areas of our lives (cars, electronic wallets, etc.) the potential for mere inconvenience to become something much more serious increases dramatically.

As I sat in the technological dark, watching Ivanchuk pluck at his eyebrows trying to find his own unassisted way out, I was reminded of the need for those of us who teach Computer Science to inculcate in our students a strong sense of responsibility towards those whose lives might one day be dependent on the quality of the programs they write.