In chess, many structural and material features are widely accepted to affect the outcome of a game, and are therefore either desirable to achieve or best avoided. The most basic is that having more material than your opponent usually conveys a significant advantage, but there are plenty of others that apply when material is balanced. For instance, an isolated pawn often represents a weakness because it has to be defended by a piece, while a rook on the seventh rank is likely to be hard for your opponent to defend against.
Although the experience of chess players bears out this widely held wisdom, it does no harm to try to find empirical evidence to support it. So in this post I am going to explore one of the ways in which one might mine a database of chess games to find evidence of the impact of a particular feature in practice. In the process I will also touch on the care that must be taken both in preparing the data and in interpreting the results.
It is widely acknowledged that a player with two bishops has an advantage over an opponent with either two knights or a bishop and knight, and previous studies have sought to quantify this by examining the results of games. The earliest study I am aware of was by GM Gennady Timoshchenko, in which he examined the relative strengths of different bishop and knight combinations. I haven’t been able to locate the original text of this study but some details are given by Larry Kaufman in Beware the Bishop Pair. He reports that Timoshchenko looked at 150,000 games and found in one of his results that two bishops won by a margin of 70% to 30% over two knights. Kaufman also reports that when comparing a single bishop against a single knight, the number of pawns on the board affected the balance, with larger numbers of pawns hindering a bishop’s mobility more than that of a knight. Unfortunately, Kaufman provides no details of the nature of the games studied (e.g., player rating levels), the way in which the material balance was identified, or the number of games identified with a particular piece combination.
In 1995, Mark Sturman extended the original study using a database of over 350,000 games and noted that he only presented results where there were at least 100 games available for his data points. He found that the greatest advantage was the case of BB vs NN but only had three data points: for 4, 5 or 6 pawns. From his diagram it appears that the winning percentages for the bishop pair were between 66% and 70%. However, it isn’t clear what role draws played in calculating the percentages.
In 1999, Kaufman conducted a further study and provided useful details of his method: a database of about 925,000 games was culled to around 300,000 by selecting only games between players with a FIDE rating of at least 2300. He required the material balance of interest to persist for at least 6 ply and a sample size of at least 200 for each data point. Kaufman was interested in interpreting advantage in material terms rather than win percentages. Overall he concluded that the bishop pair was worth half a pawn on average but also refined this by saying that, “the bishop pair is worth less than half a pawn when most or all the pawns are on the board, and more than half a pawn when half or more of the pawns are gone.”
Preparing the data
For this exploration I am not going to try to reproduce all of the results of previous studies but simply the case of two bishops versus two knights, with the remaining material completely balanced. Nevertheless, the principles could easily be applied to any other material balances of interest, such as B vs N, BBN vs BNN, Q vs RR, etc.
For the data processing I am going to be using pgn-extract – a program for processing games in PGN notation. I first started writing this program just for my personal use back in 1994 but later released it as free, open-source software, and I still continue to maintain and extend its functionality in response to user requests.
For the source of the games I used the free PGN source KingBase (as of Jan 2018, without any of the 2018 updates). These are all games played since 1990 by players with a rating of at least 2000 – not quite as strong as Kaufman’s data set but still of a reasonable strength.
The first stage of the process was to clean up the data. While the dataset contains over 2 million games, over 100,000 of these turn out to be duplicates. I used the -D (delete duplicates) option of pgn-extract to remove them, leaving around 1.95 million. Of these, just over 100 had Result tags that conflicted with the result recorded at the end of the moves. This is not an uncommon feature of freely available data. Typically the examples here were games where one side had won with checkmate but the Result tag recorded either a draw or a loss for the winning player! These were corrected using the --fixresulttags option of pgn-extract.
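To make the Result tag problem concrete, here is a minimal Python sketch of how such conflicts could be spotted. It is not how pgn-extract works internally; the helper names (`split_games`, `result_conflict`) are hypothetical, and it assumes each game begins with an `[Event ...]` tag and ends with a result token that is not followed by a comment.

```python
import re

# Hypothetical helpers for illustration only; not part of pgn-extract.
RESULT_TAG = re.compile(r'\[Result\s+"([^"]*)"\]')
TERMINATOR = re.compile(r'(1-0|0-1|1/2-1/2|\*)\s*$')


def split_games(pgn_text):
    """Split PGN text into games, assuming every game starts with an [Event tag."""
    parts = re.split(r'\n\s*\n(?=\[Event )', pgn_text.strip())
    return [part.strip() for part in parts if part.strip()]


def result_conflict(game):
    """Return (tag_result, move_text_result) if the two disagree, else None."""
    tag = RESULT_TAG.search(game)
    end = TERMINATOR.search(game)
    if tag is None or end is None:
        return None  # nothing to compare
    if tag.group(1) == end.group(1):
        return None
    return tag.group(1), end.group(1)


SAMPLE = '''[Event "A"]
[Result "1-0"]

1. e4 e5 2. Qh5 Ke7 3. Qxe5# 1-0

[Event "B"]
[Result "1/2-1/2"]

1. f3 e5 2. g4 Qh4# 0-1
'''

conflicts = [c for c in map(result_conflict, split_games(SAMPLE)) if c]
print(conflicts)
```

In the sample, the second game ends in mate for Black while its tag claims a draw, which is the kind of inconsistency described above.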
The next stage was to isolate those games in which one player had a bishop pair and the other a knight pair, with all other material being equal. Kaufman required a material stability of six ply and this is an important consideration: a BB vs NN balance that lasts only fleetingly is unlikely to have a significant impact on the game’s outcome. To show the difference this makes to the size of the data set: with a stability of 2-ply, our 1.95 million games were reduced to 39,000, whereas a stability of 4-ply reduced it to 31,000 and 6-ply to 25,000. With such large differences in the size of data set to be analysed, there is clearly potential for significant differences in the results.
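The idea of stability can be sketched in a few lines of Python. This is an illustration under assumptions, not pgn-extract's implementation: it supposes we already have one boolean per ply saying whether the position shows BB vs NN with all other material equal, and the function name `has_stable_match` is made up for the example.

```python
def has_stable_match(flags, min_ply):
    """True if flags contains at least min_ply consecutive True values.

    flags holds one boolean per ply (half-move): does the material
    balance of interest hold in the position after that ply?
    """
    run = 0
    for matched in flags:
        run = run + 1 if matched else 0
        if run >= min_ply:
            return True
    return False


# A balance that appears for three ply and is then traded away.
fleeting = [False, True, True, True, False, False]

for ply in (2, 4, 6):
    print(ply, has_stable_match(fleeting, ply))
```

A game like `fleeting` would be kept with a 2-ply requirement but dropped at 4-ply or 6-ply, which is why the data set shrinks as the requirement grows.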
The material match (-z) option of pgn-extract allows a particular material combination to be specified along with a stability length in ply. For instance:
:-z 6 b2n0q*r*p* b0n2q=r=p=
specifies that, for a stability of 6-ply, one side must have 2 bishops, 0 knights and any number of queens, rooks and pawns, while the opponent must have 0 bishops, 2 knights and exactly the same number of queens, rooks and pawns as the other side. With the -z option this pattern will be applied equally to both White and Black, matching all BB vs NN games regardless of player colour. A second stage was then applied to those mixed games to separate them into colour-specific games using the -y option, which uses the same material pattern syntax but applies the first pattern to White and the second pattern to Black. The pattern to isolate games where Black has the two bishops is:
:-y 6 b0n2q*r*p* b2n0q=r=p=
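The pattern semantics described above can be mimicked in Python to check one's understanding. The sketch below handles only the three forms that appear in this post (an explicit count, `*` for any number, and `=` for the same number as the other side); pgn-extract's real syntax may be richer, and all function names here are hypothetical.

```python
import re

# Each piece letter is followed by a count, '*' (any) or '=' (same as opponent).
PIECE_SPEC = re.compile(r'([bnqrp])(\d+|\*|=)')


def parse_pattern(pattern):
    """Turn e.g. 'b2n0q*r*p*' into {'b': '2', 'n': '0', 'q': '*', ...}."""
    return dict(PIECE_SPEC.findall(pattern))


def side_matches(spec, own, other):
    """Check one side's piece counts against its half of the pattern."""
    for piece, want in spec.items():
        if want == '*':
            continue
        if want == '=':
            if own[piece] != other[piece]:
                return False
        elif own[piece] != int(want):
            return False
    return True


def matches_fixed_colours(first, second, white, black):
    """Colour-specific: first pattern for White, second for Black."""
    return (side_matches(parse_pattern(first), white, black)
            and side_matches(parse_pattern(second), black, white))


def matches_either_colour(first, second, white, black):
    """Colour-blind: try the pattern pair both ways round."""
    return (matches_fixed_colours(first, second, white, black)
            or matches_fixed_colours(first, second, black, white))


white = {'b': 2, 'n': 0, 'q': 1, 'r': 2, 'p': 6}
black = {'b': 0, 'n': 2, 'q': 1, 'r': 2, 'p': 6}

print(matches_fixed_colours('b2n0q*r*p*', 'b0n2q=r=p=', white, black))
print(matches_fixed_colours('b0n2q*r*p*', 'b2n0q=r=p=', white, black))
print(matches_either_colour('b0n2q*r*p*', 'b2n0q=r=p=', white, black))
```

Here White holds the bishops, so the Black-has-bishops pattern fails when colours are fixed but succeeds when either colour may take either role.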
The separated games were extracted to two files, bbnn.pgn (12744 games) and nnbb.pgn (12369 games). Each file was then further separated into files of White BB win, White BB loss, Black BB win, etc. by using the -Tr option of pgn-extract to select only those games having a particular result. For instance:
pgn-extract -Tr1-0 bbnn.pgn --output white-bb-win.pgn
pgn-extract -Tr0-1 nnbb.pgn --output black-bb-win.pgn
pgn-extract -Tr0-1 bbnn.pgn --output white-bb-loss.pgn
etc.
These win/loss/draw files served as the basis for the analysis of the basic results covered in the next section.
The overall win:draw:loss proportions for the side with the bishop pair, from the 25,000 games with 6-ply stability, were 0.45:0.28:0.27. Clearly, this confirms that, on average, having the two bishops provides a significant advantage over having the two knights, all other material being equal.
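Since the earlier studies left it unclear how draws entered their percentages, it may help to spell out one possible convention. The Python sketch below turns raw counts into proportions and then folds draws in as half a point each; that scoring convention is an assumption made for illustration, not something taken from the studies, and the counts passed to `wdl_proportions` are made up to reproduce the proportions quoted above.

```python
def wdl_proportions(wins, draws, losses):
    """Convert raw game counts into win, draw and loss proportions."""
    total = wins + draws + losses
    return wins / total, draws / total, losses / total


def expected_score(win, draw):
    """Average score per game, counting a draw as half a point (assumed convention)."""
    return win + draw / 2


# Illustrative counts only, chosen to give the proportions quoted in the text.
win, draw, loss = wdl_proportions(45, 28, 27)
print(win, draw, loss)
print(round(expected_score(win, draw), 2))
```

Under this convention the quoted 0.45:0.28:0.27 split corresponds to an average score of about 0.59 per game for the bishop pair.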
Previous studies also looked at the effect of pawn numbers on the outcomes. The games can be further subdivided by specifying an explicit number of pawns for each side; for instance:
:-y 6 b0n2q*r*p8 b2n0q=r=p=
matches only those games where Black has the bishop pair while there are still 8 pawns on the board for both sides. The nnbb.pgn files were analysed for 0 to 8 pawns and the win/draw/loss percentages calculated for each.
The following table combines results for both White and Black and shows the breakdown of percentages for the side with the bishop pair when the number of pawns is taken into account. It only shows results for pawn numbers with at least 100 games.
In our data set, there is relatively little difference in the win percentage for 3, 4 and 5 pawns but the loss percentage decreases continuously from 7 pawns to 2. The lower loss percentage for 8 pawns compared to 7 is an interesting anomaly which we consider further in the next section.
Taking a deeper look at the results
Aside from examining a much wider range of material combinations in a similar fashion, the previous studies referenced here didn’t really go beyond this level of basic analysis. However, a little care is needed before taking the percentages in the table above at face value. For instance, notice that the number of games recorded in the table is actually 30,629, which is more than the 25,113 games that were isolated from the original data set. The reason is, of course, that a bishop pair arising when there are N pawns on the board is quite likely to remain when there are fewer pawns on the board, and this persistence is part of the long term influence of the material balance. Games counted under more than one pawn number account for the roughly 5,500 additional entries.
What isn’t so obvious is that a bishop pair arising when there are six pawns each on the board and persisting until there are three pawns each will not necessarily contribute to the statistics for five and four pawns each. The reason for this is the stability constraint on material matches. If material stability does not persist for the full 6 ply during the five- and four-pawn stages of the game then the game will not be classified as BB vs NN for those numbers of pawns, despite the fact that it should be for the purposes of statistical analysis.
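Both effects, the double counting and the missed pawn stages, can be seen in a small Python sketch. It models a game as one entry per ply: the number of pawns each side has while BB vs NN holds with equal material, or `None` when it does not. This representation and the name `qualifying_pawn_counts` are assumptions for illustration; pgn-extract works on real positions and may count stability slightly differently.

```python
def qualifying_pawn_counts(plies, min_ply=6):
    """Return the pawn counts for which the balance is stable for min_ply ply.

    plies has one entry per ply: pawns per side while BB vs NN holds
    with all other material equal, or None when the balance is absent.
    """
    qualified = set()
    run_value, run_length = None, 0
    for pawns in plies:
        if pawns is not None and pawns == run_value:
            run_length += 1
        else:
            # The pawn count changed or the balance was lost: restart the run.
            run_value = pawns
            run_length = 1 if pawns is not None else 0
        if run_value is not None and run_length >= min_ply:
            qualified.add(run_value)
    return qualified


# Bishop pair held from six pawns each down to three pawns each,
# but the five- and four-pawn stages pass quickly.
game = [6] * 8 + [5] * 3 + [4] * 4 + [3] * 10

print(sorted(qualifying_pawn_counts(game)))
print(sorted(qualifying_pawn_counts(game, min_ply=2)))
```

With a 6-ply requirement this single game is counted in two rows of the table (six and three pawns), yet it is missing from the five- and four-pawn rows even though the bishop pair was present throughout.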
Another case to consider is where the bishop pair identified when there are N pawns each is surrendered without further pawns being exchanged. For all values of N from 2 to 7, around 20% of the games fall into this category. Should these games be considered as influencing the win rate of BB vs NN? The influence will almost certainly depend on the length of the retention and the stage of the game. Interestingly, for the case of 8 pawns, the bishop pair is surrendered in this way in only 10% of the games, suggesting that bishop pairs obtained early tend to be retained for longer and exercise a greater influence on the game’s outcome. It is possible that this (at least partly) explains the anomalous loss rate noted above when there is a bishop pair with 8 pawns. However, I have not confirmed this speculation.
This post has highlighted some of the ways in which pgn-extract might be used to mine data from large chess databases to quantify the influence of particular material combinations. While I have only focussed on the single case of a bishop pair against a knight pair, the approach is widely applicable to other cases. I have also tried to highlight some of the care that must be taken in data preparation and in the interpretation of the results.
In summer 2018 I supervised an MSc dissertation by Joshua Cheah who used pgn-extract and his own program to explore a much broader range of material balances and positional characteristics, such as outposts. While the particular analysis presented here is my own, working with Joshua was the motivation to put together this post and he tracked down the references to the previous studies.
- Timoshchenko, Gennady, ICCA Journal, Dec 1993.
- Kaufman, Larry, The Relative Value of the Pieces, Computer Chess Reports, 4:2, pp 33-34, 1994. Online: http://www.chesscomputeruk.com/html/computer_chess_reports.html
- Sturman, Mark, Beware the Bishop Pair, Computer Chess Reports, 5:2, pp 58-59, 1995. Online: http://www.chesscomputeruk.com/html/computer_chess_reports.html
- Kaufman, Larry, The Evaluation of Material Imbalances, Chess Life, 1999. Online: https://www.chess.com/article/view/the-evaluation-of-material-imbalances-by-im-larry-kaufman
- Barnes, David J., pgn-extract: Portable Game Notation (PGN) Manipulator for Chess Games, 1994-2018. Online: https://www.cs.kent.ac.uk/~djb/pgn-extract/.