An empirical validation protocol for large-scale agent-based models

by Sylvain Barde and Sander van der Hoog, discussion paper KDPE 1712, July 2017.

Non-technical summary

Despite recent advances in bringing agent-based models (ABMs) to the data, the estimation or calibration of model parameters remains a challenge, especially for large-scale agent-based macroeconomic models. Most methods, such as the method of simulated moments (MSM), require in-the-loop simulation of new data, which may not be feasible for models that are computationally heavy to simulate. Nevertheless, because ABMs are becoming an important tool for policy making, it is important to be able to validate them properly, so that they can be compared to other policy-related models.

Our work proposes a new 3-stage protocol for validating computationally demanding simulation models, based on previous work by Salle & Yildizoglu (2014). The major advantage of this protocol is that it relies on a set of methodologies that are all available as ‘off-the-shelf’ software, requiring only a coordination of their implementation:

1. Efficient sampling & Data generation: The protocol starts by generating an efficient sampling of the parameter space of the ABM, using the Near-Orthogonal Latin Hypercube (NOLH) design of Cioppa and Lucas (2007). This is followed by the generation of simulated data using the ABM on the sample points provided by the NOLH. This is the computationally heavy step, which only needs to be performed once.

2. Probabilistic modelling using MIC scoring & Empirical data: The simulated data is first used to train the Markov Information Criterion (MIC) algorithm of Barde (2016a, 2016b) on each sample point, which is then scored on the empirical data. This provides a ‘response surface’ by mapping from the pre-determined set of NOLH parameter calibration points into a fitness landscape using the MIC score as the fitness metric.

3. Surrogate modelling of MIC Response Surface using kriging: In the final stage we use kriging (Krige, 1951) to generate a surrogate model of the set of MIC responses and obtain an interpolated ‘MIC response surface’. By optimizing over this simpler model it is relatively quick to find new candidate sample points with possibly better performance.
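
The three stages can be sketched end to end on a toy problem. The sketch below is illustrative only and rests on several assumptions: a plain Latin hypercube stands in for the NOLH design, a hypothetical `toy_score` function replaces both the ABM simulation and the MIC scoring, and a simple Gaussian-kernel kriging predictor with a constant mean is written directly in NumPy rather than taken from the off-the-shelf software the summary refers to. All function names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N_POINTS, N_DIMS = 40, 2  # toy sizes, not the 513 points / 8 parameters of the paper


def latin_hypercube(n_points, n_dims, rng):
    """Plain Latin hypercube on [0, 1]^d (assumption: stand-in for the NOLH design)."""
    # Each column is an independent permutation of the strata 0..n_points-1,
    # so every stratum is hit exactly once in every dimension.
    strata = np.column_stack([rng.permutation(n_points) for _ in range(n_dims)])
    jitter = rng.random((n_points, n_dims))  # position inside each stratum
    return (strata + jitter) / n_points


def toy_score(theta):
    """Hypothetical stand-in for 'simulate the model, then score it'. Lower is better."""
    target = np.array([0.3, 0.7])
    return np.sum((theta - target) ** 2, axis=-1)


def gaussian_kernel(a, b, length_scale=0.3):
    """Gaussian covariance between every row of a and every row of b."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def fit_kriging(x, y, nugget=1e-6):
    """Kriging predictor with a constant mean; the nugget keeps the solve stable."""
    mean = y.mean()
    cov = gaussian_kernel(x, x) + nugget * np.eye(len(x))
    weights = np.linalg.solve(cov, y - mean)
    return lambda x_new: mean + gaussian_kernel(x_new, x) @ weights


# Stage 1: sample the parameter space (the expensive simulations would run here).
design = latin_hypercube(N_POINTS, N_DIMS, rng)

# Stage 2: one score per sample point gives the raw response surface.
scores = toy_score(design)

# Stage 3: fit the surrogate and search it cheaply for a promising new point.
surrogate = fit_kriging(design, scores)
candidates = rng.random((5000, N_DIMS))
best = candidates[np.argmin(surrogate(candidates))]
print("candidate with lowest predicted score:", best)
```

The point of the design is that only stages 1 and 2 touch the expensive model; the search in stage 3 evaluates the surrogate alone, so thousands of candidate points cost almost nothing.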

The 3-stage protocol is applied to the Eurace@Unibi model of Dawid et al. (2016, 2017). This macroeconomic ABM displays strong emergent behaviour and is able to generate a wide variety of nonlinear economic dynamics, including endogenous business and financial cycles. In addition, it is a computationally heavy simulation model, so it fits our targeted use-case. The empirical data used for the analysis consists of monthly data for 3 macroeconomic variables (industrial production, CPI and unemployment rate) from 30 OECD countries and the Eurozone.

Following the first step of the protocol, 513 distinct sample points are generated using an NOLH design for 8 core parameters of the model. The simulated data generated in the first stage consists of 1000 simulated series of 1000 time periods for each sample point. Because a single series requires 13 minutes and 20 seconds to run, generating the full set requires 114,000 CPU hours. While using a High Performance Computing (HPC) cluster speeds up the process, Eurace@Unibi remains a heavy model to simulate. By contrast, the scoring of the data using the MIC in the second stage only requires a modest 513 CPU hours.
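
The 114,000 CPU-hour figure follows directly from the numbers quoted in this paragraph. The short calculation below reproduces it; the variable names are ours, and the only inputs are the values stated in the text.

```python
SAMPLE_POINTS = 513                # NOLH sample points
SERIES_PER_POINT = 1000            # simulated series per sample point
SECONDS_PER_SERIES = 13 * 60 + 20  # 13 min 20 s = 800 s per series
SCORING_CPU_HOURS = 513            # MIC scoring cost quoted in the text

total_series = SAMPLE_POINTS * SERIES_PER_POINT
simulation_cpu_hours = total_series * SECONDS_PER_SERIES / 3600

print(simulation_cpu_hours)                      # 114000.0
print(simulation_cpu_hours / SCORING_CPU_HOURS)  # about 222
```

Simulation is therefore roughly 222 times more expensive than scoring, and the scoring figure works out to about one CPU hour per sample point.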

The results obtained by applying the protocol to the Eurace@Unibi model are promising, as tight estimates are obtained for several parameters of the model. The quality of the kriging model is tested by re-running the protocol on a new sample of points obtained through the optimisation of the response surface, and by verifying that the predicted MIC values provided by the kriging model match the realised MIC scores obtained through the validation exercise. The very high correlation (0.926) between the MIC scores predicted by the kriging model and the ones obtained by re-running the entire protocol confirms that kriging provides an accurate interpolation of the MIC response surface.
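
The validation check described here boils down to correlating two vectors of scores. Below is a minimal sketch in plain Python, assuming the predicted and realised MIC scores are available as equal-length lists and that the reported figure is a Pearson correlation (the summary does not say which coefficient is used). The function name `pearson` and the numbers in the usage example are hypothetical and are not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for five new sample points (illustration only).
predicted_mic = [10.2, 11.5, 9.8, 12.1, 10.9]  # from the kriging surrogate
realised_mic = [10.4, 11.2, 10.1, 12.4, 10.6]  # from re-running the protocol
print(round(pearson(predicted_mic, realised_mic), 3))
```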

While this exercise provides a successful proof-of-concept for the protocol, particularly the kriging stage, several weaknesses are identified. The first is that the NOLH design cannot be extended easily if further samples are required. This can be remedied by using another design, such as a Sobol sequence, which can be extended more easily. Another weakness is that the MIC measurement relies on univariate conditioning, which probably introduces measurement errors when used in a multivariate setting and reduces the accuracy of the response surface. Solving this problem requires using a multivariate version of the MIC, which is the focus of ongoing research.
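
The extensibility property mentioned above can be illustrated with any low-discrepancy sequence. The sketch below uses a Halton sequence as a stand-in, chosen only because it fits in a few lines of standard-library Python; it is not the Sobol sequence named in the text, and the function names are hypothetical. What it shows is the shared property: the points of a longer sequence begin with exactly the points already generated, so earlier simulations remain valid when the sample is enlarged.

```python
PRIMES = (2, 3, 5, 7, 11, 13, 17, 19)  # one base per dimension (8 parameters)


def van_der_corput(index, base):
    """Radical inverse of `index` in `base`, a value in [0, 1)."""
    value, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        value += digit / denom
    return value


def halton(start, stop, n_dims):
    """Halton points with sequence indices start..stop-1 in n_dims dimensions."""
    return [
        tuple(van_der_corput(i, PRIMES[d]) for d in range(n_dims))
        for i in range(start, stop)
    ]


initial = halton(1, 9, 8)     # first batch of 8 sample points
extension = halton(9, 17, 8)  # 8 further points, requested later

# Extending the design keeps every point of the first batch unchanged.
assert initial + extension == halton(1, 17, 8)
```

A Latin hypercube does not have this property: its strata depend on the total number of points, so enlarging the sample generally means building a new design.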
