“All models are wrong, but some are useful”
– George Box
What is a model?
Mathematical models are part of our everyday life: they are used in car design and performance prediction, modelling the spread of COVID infections, assessing the impact of carbon dioxide emissions on the climate, forecasting the weather in a city or stock prices in the market, predicting consumer behaviour and economic activity, and so on. They are, however, often poorly understood by the general population, and sometimes by the people who use them, especially if they did not create them themselves.
At the most basic level, a model can be argued to be a mathematical equation: a defined mathematical relationship that maps a variable – or a collection of variables – to a desired output. Simple models have few input variables and a single output or a handful of outputs, and can be run on any household computer in a matter of seconds. Complex models can even contain submodels (a model inside the model), have hundreds of variables, and take days or even weeks on supercomputers to give an output, such as the ones used for climate predictions. Nonetheless, a feature shared by simple and complex models alike is that, if they aspire to be useful, they must appropriately capture the most important variables that give rise to the output – they must appropriately model the process they intend to represent. This sounds intuitive at first glance, but it is effectively the Achilles’ heel of every model, and every modeller’s primary source of concern, regardless of the context the model is intended for.
Let us consider the Australian housing market for a minute. If we are interested in understanding and predicting, say, the price of a house, we would have to create a model that captures the most important variables that drive said price. Most people are familiar with some of these variables: size of the house, number of bedrooms and bathrooms, size of the block of land, location, and more. However, to create an effective model – that is, a model that accurately predicts the price of a house in Australia over a reasonable period in the future – we would have to consider many variables that might not be presently available or easy to obtain, for example how much people will stretch themselves at auctions to secure the house given their present financial situations. We would additionally want to consider variables that will simply never be available at the time of modelling, for example future changes in zoning policies, future changes in the cost of building materials, and more. Therefore, we would be forced to make assumptions about these variables when creating the model. Do not let the word assumption mislead you, though: assumptions are a normal part of any model, even the most effective and useful ones. Not all assumptions are innocuous, however, and some are more dangerous than others, which is why these assumptions can also be conceptualised as limitations. In the author’s opinion, the most important ones to be aware of are:
(i) that the variables currently included in the model are sufficient to capture what the model intends to represent (in our case the price of the house),
(ii) that the current performance of the model – that is, the difference or error between the prediction and the observed data (the predicted price and the actual observed price in our example) – is acceptable, and
(iii) that past relationships and interactions between these variables will remain stable enough in the future, so that the model will remain useful for a reasonable amount of time.
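To make assumption (ii) concrete, the toy house-price model can be sketched in a few lines of Python. Every weight, feature and price below is invented purely for illustration; a real model would estimate its parameters from data rather than hard-code them.

```python
# Toy linear house-price model: price = base + sum(weight_i * feature_i).
# All weights and figures below are made up for illustration only.

def predict_price(features, weights, base):
    """Linear model: a weighted sum of features plus a base price."""
    return base + sum(weights[name] * value for name, value in features.items())

weights = {
    "floor_area_m2": 3_000,   # dollars per square metre of house
    "bedrooms": 25_000,       # dollars per bedroom
    "land_m2": 800,           # dollars per square metre of land
}

house = {"floor_area_m2": 180, "bedrooms": 4, "land_m2": 600}
predicted = predict_price(house, weights, base=150_000)   # 1,270,000

# Assumption (ii): is the gap between prediction and observation acceptable?
observed = 1_250_000
error = abs(predicted - observed)          # 20,000 dollars
relative_error = error / observed          # ~1.6%
```

Whether a 1.6% error is "acceptable" is exactly the judgment call described in assumption (ii): tolerable for a rough valuation, potentially ruinous for a builder locking in a fixed selling price.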
We will touch on these assumptions throughout this article as, depending on context, they can make or break the model, and lead to disastrous consequences. If you are, for example, a small company using the model we just created to lock in the selling price of the house before committing to building it, how much money can you afford to lose per house due to the model’s errors? On the other hand, if you are a government using this model to predict how much revenue you will raise via stamp duty over a term, these errors could cost you billions in miscalculations. If you are a postgraduate student using a different model to, say, predict the effect of interest rates on inflation in Australia, creating a model with subpar assumptions could simply mean that you would have to settle for a mid-range journal instead of a top one, or that you would have to revise your work substantially to get it published. On the other hand, if you are the Reserve Bank of Australia, this can translate into erroneous predictions of how many times and by how much you need to raise interest rates, ultimately costing thousands and thousands of dollars a year to already struggling households – to real people. If you make the wrong assumptions about vaccine effectiveness during COVID modelling, you could end up creating draconian policies that amount to little reduction in infection transmission, while at the same time violating important principles such as medical informed consent. All in the name of predictions born from models with wrongful and/or incomplete assumptions.
It is hard to overstate how much these assumptions matter. Complex models such as epidemiological, economic, climate and behavioural ones are, however, much more sensitive to these assumptions and limitations, mainly – but not only – because of the model’s incapacity to account for the complexity of the problem at hand: no one can effectively model how people ultimately react to policy, for example, or how people will react to the introduction of disruptive technology. Importantly, the variables included in the model will primarily be a function of the creators’ biases (what they think is important, and why) as well as of current technological limitations (how much computational power can be used). There is sadly no escaping these shortcomings. Furthermore, the “cost” of an error in a model is not a straightforward calculation or conclusion; it is often a moral question – one cannot simply find a scale to weigh the worth of saving a life, making a wrong medical diagnosis, triaging life-saving supplies, or destroying people’s businesses to maximise another output. These shortcomings should be held front of mind whenever creating or using a model to inform policy, or when on the receiving end of said policy, especially because many models are simply proven wrong as soon as they hit the real world.
Testing a model in the real world
The first thing anyone needs to do after conceptualising and building a model is to test it to ascertain its performance. It is common to first test the newly created model against a portion of the data that was available but not used to create the model. This data is known to the modeller but not to the model, and therefore it can be used as “unknown” data – a proxy for future events. Provided the model performs well against this data – that is, that the errors between the predicted data and the actual data are minimal and/or acceptable – the model can be put to the test in the real world, which is ultimately where it needs to perform. This is where the real problems start.
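The holdout procedure just described can be sketched as follows. The data, the two-thirds/one-third split, the one-parameter model, and the mean absolute error metric are all illustrative choices, not a prescription.

```python
# Holdout testing: build the model on one portion of the data,
# then evaluate it on the portion the model has never "seen".
# The data and the simple one-parameter model are invented for illustration.

def mean_absolute_error(predictions, actuals):
    """Average absolute gap between predicted and observed values."""
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)

# (input, observed output) pairs from a roughly linear process.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1), (6.0, 11.9)]

# First two-thirds build the model; the last third is held out as "unknown" data.
cut = (2 * len(data)) // 3
train, test = data[:cut], data[cut:]

# Fit a one-parameter model y ≈ slope * x by least squares on the training data.
slope = sum(x * y for x, y in train) / sum(x * x for x, _ in train)

# Evaluate only on the held-out data: a proxy for future events.
test_error = mean_absolute_error([slope * x for x, _ in test],
                                 [y for _, y in test])
```

The key discipline is that `test` never influences the fitted `slope`; a small `test_error` is then (imperfect) evidence that the model may generalise beyond the data it was built on.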
Say I am using a model to make predictions on the price of a stock X, but since I already created the model I will be acting on its prediction (what is the point of creating the model if I am not going to use it?). The model predicts the price of my stock X will rise tomorrow, so I go in and buy as much as I can to capitalise on my trustworthy model. Sure enough, the price goes up, and I conclude that the model’s prediction was correct. But I failed to notice that my intervention (the purchase of X) could have been the cause of the increase in X’s price, rather than the model’s prediction being correct. Would the price have risen had I not intervened in the market? Some might rightfully argue that a single person’s action is unlikely to have such an effect on the price of a stock (a market-level effect), and that the most likely explanation is that the model’s prediction was right. But what if I am a managed fund or a large investor and the sums of capital I am using are large enough to affect the market price of the stock? The latter scenario is precisely what happens when we use models’ predictions to enact policy: we are directly introducing changes to the relationships between the variables, thus violating one of the most important assumptions we made – that past relationships and interactions between these variables will remain stable enough in the future. We might have added instability that we did not account for when modelling, and depending on the policy at hand, this instability can be significant.
When scientists test these kinds of scenarios they do their utmost to carefully control the most important variables, and one of the most important ways to control for the aforementioned problem is to set up what is known as a control group: a group that has, ideally, all the characteristics of the group upon which a treatment will be performed, so that the only difference between the groups is the treatment, and thus the results can ideally be attributed only to the treatment. But there are many instances where having an effective control group is simply impossible: we do not have a second Australia (or a second planet Earth) with comparable population and economic behaviour to test, for example, whether interest rate policies derived from our economic models are correct, or to test whether our COVID models were right about the number of deaths prevented by lockdowns and mandatory vaccination, or to test the effects of our policies to cut carbon dioxide emissions and then compare those effects against the IPCC’s predictions. Even if we had this second Australia, the ethical implications of the proposed test would make it unviable in the first place. We are thus left with the only available option: hypothesising and calculating – to the best of our ability, with yet another truckload of assumptions and limitations – what the effects would have been had we not acted on our models’ predictions. We would nonetheless still be blind to that alternative reality, because by enacting and enforcing a given policy (say, COVID lockdowns) we already changed the trajectory that the model was supposed to capture. In this light, many might argue that the model’s predictions were correct, and that the policy enacted prevented the forecasted outcome.
But the possibility that the model’s predictions were wrong is also real, yet we will never know for certain because we simply could not observe Australia without said policies (comparing Australia to countries that did not implement some of these policies comes with the pitfall that these foreign populations are not Australians, and do not behave or react like Australians, thus limiting the strength of the extrapolations).
Many doomsday predictions of famine, population booms or shrinkage, shortages, and so on have come to the fore in the past, and many have not eventuated. These failed predictions are the bread and butter of the most ardent critics of the scientists and policymakers who chose to act on them, and critics hold these failed predictions up as living proof of how wrong the scientists and policymakers got it. But I would argue that when the goalposts change – when the population’s behaviour or the economic landscape changes following policy enacted out of belief in a prediction – neither the prediction nor the implemented policy can be evaluated anymore, because we could only ever observe the “experimental” group with the policy applied, never a control group without any intervention. That the prediction was right and that the policy saved the situation will be the argument of those who stand with said scientists and policymakers for whatever reason, while those who want to drag them through the mud for their preferred reasons will insist they were wrong – both sides simply digging their heels in further. But the reality does not change: in many instances of predictions and ensuing policy we will simply never know whether the prediction was right, or whether the policy effectively made a difference with regard to the prediction. We can only evaluate to what degree the assumptions were correct, and in this light the assumption that the relationships between variables will remain the same for the foreseeable future is usually the one that fails due to, for example, the introduction of disruptive technology and unforeseen events – no economic model could have predicted the COVID pandemic or the creation of ChatGPT, for example.
Fortunately, in many other model-testing scenarios we can simply do away with the problem of the control group and collect the real data that the models were supposed to forecast. Climate change models tend to fall into this basket, mainly because, for the most part, very few countries have implemented policies that dramatically curb their carbon dioxide emissions, while the models have been running for many years and iterations and their predictions have been compared to the real, observed data. This does not guarantee, however, that the models and their predictions will fare better than in other scenarios.
The website RealClimate currently collates a succinct view of these iterations, comparing the observed temperature to the forecasted (and hindcasted) values coming from these models, and these comparisons highlight some of the problems of the current climate models. Briefly, there is evidence that some models are running ‘too hot’, meaning that their predictions exaggerate the impacts of global warming, forecasting a future that gets hotter faster than it might. Others argue that, for the most part, the models might be correct but might also not be, for a myriad of reasons. Other experts simply argue that our current models fall short on many important fronts, to the point that they are not fit for purpose, for yet another set of reasons. Amongst the latter, the most notable reason is what is known as the “climate sensitivity” to carbon dioxide – how much warming we can expect from carbon dioxide emissions – which appears to differ greatly between models and scientific approaches. It seems critical that, if experts and policymakers are to pontificate about the potential issues of carbon dioxide emissions, they should at least estimate this critical parameter within boundaries that do not allow for such divergence in outcomes across models. It also raises the question: how can we all be so sure of the exact impact of carbon dioxide emissions when not even the experts can agree on such a critical issue? In reality, the projected increase in global mean surface temperature depends largely on the model used to calculate it.
In their defence, these models are trying to capture one of the most complex problems, involving some of the most complex equations, in which there will always be discrepancies among groups in terms of initial assumptions, which variables are the most critical to include and exclude, how to account for the unaccountable, and so on. But these discrepancies are what allow the critical reader to conclude that the statement that “the science is settled”, as some put it, simply does not match the reality of the scientific endeavour in climate modelling – or in any complex scientific endeavour, for that matter – and the need to “get it right” in this space can understandably drive models to overfit the data (a problem where a model fits the known data too well and, by extension, does a poor job of predicting future data). The problems in complex modelling thus compound exponentially as soon as we leave the modelling space and enter the “what do we do about it” space.
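Overfitting is easy to demonstrate with a toy example. Below, a polynomial is forced through every training point exactly (zero error on the known data), yet it predicts a future observation far worse than a deliberately simple straight line. The data, the interpolation approach, and the simple slope are all invented for illustration.

```python
# Overfitting sketch: a model that matches the known data perfectly
# can still predict the future poorly. All data here is made up.

def lagrange_fit(points):
    """Return a polynomial that passes exactly through every training point
    (Lagrange interpolation) - the extreme case of fitting known data."""
    def poly(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if i != j:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return poly

# Noisy observations of a roughly linear process, y ≈ 2x.
train = [(0.0, 0.1), (1.0, 2.2), (2.0, 3.8), (3.0, 6.3)]
test_x, test_y = 5.0, 10.2           # a "future" observation

overfit = lagrange_fit(train)        # zero error on the training data...
simple_slope = 2.05                  # ...versus a deliberately simple line

overfit_error = abs(overfit(test_x) - test_y)       # large: the wiggles explode
simple_error = abs(simple_slope * test_x - test_y)  # small: the trend holds
```

The interpolating polynomial has chased the noise in the training data; outside the range it was built on, its error dwarfs that of the crude straight line, which is the essence of the overfitting problem described above.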
Additionally, the past lack of transparency around climate model tuning (parameterisation: creating equations to account for variables and effects that the model itself cannot account for) further casts shadows of doubt on the basic premise required to build an effective model; a premise that applies to all modelling endeavours, not just climate: that we fully and truly understand and account for the problem we are modelling. This does not mean that scientists do not know well what they already know about the problem; their expertise is truly a blessing for the shared progress of humanity. It is simply a statement of how much they do not know about the problem – the known unknowns and the unknown unknowns – as well as how much they could not account for in the model, and the effects of not accounting for it. The truth is in the model: if I created a model based on my understanding of a complex problem and the model fails to predict the outcomes in the short or long term within acceptable margins, can I really say with confidence that I truly understand and account for the problem I am trying to model? This debate is beyond the scope of this article, but it alludes directly to the first modelling assumption we explored – that the variables currently included in the model are sufficient to capture what the model intends to represent.
Should we even use models, then?
No model can perfectly account for all the variables and all the complexities in a problem, and we still have not exhausted the well of knowledge on most of our problems, especially the most pressing ones. Does this mean we should not use any model in any decision-making process?
Models, when properly understood, are some of the most useful tools we have at our disposal to make sense of the world, to make predictions, and to act on them. Beyond the recurring problem of model limitations, the biggest problem lies at the intersection between those limitations and the political and ethical realms: what do we do about these models and predictions? This is not a problem of policymakers only; scientists and experts can be politically driven as well, and have all manner of incentives to downplay the limitations of their models, or to not disclose them in the first place. While politicians suffer from the need to be popular and to come up with announcements that will address people’s concerns and win votes, scientists and experts have their own set of incentives, many of which can become maladaptive in time: they need to publish in journals, appear trustworthy in front of the media and the political liaisons that come to them for answers, deliver soundbites for news articles or executive reports that cannot account for the complexity of the problem or model at hand, and so on. It is in this space that the nuances naturally embedded in complex modelling become an afterthought, and the whole exercise of using complex models to peek into the most likely future is bastardised. From a careful and nuanced conversation about the complexity of the problem at hand and our imperfect understanding of it – as well as the models that try to capture said complexity – it gets transformed into a simplistic, headline-friendly and dangerous exercise in crystal-ball gazing where livelihoods, economies and hard-earned civil liberties are at risk. Models simply become the next ghastly tool for people’s clairvoyance, now bestowed with the power of computers, yet still employed in the age-old ritual of influencing people to make decisions – decisions that, often, are of the moral kind rather than the scientific one.
We do not have a God-given mandate to prioritise life over risks, or safety over freedom, for example, and no amount of data or models can tilt those scales.
The problem is thus neither our currently limited understanding of complex problems, nor the limitations of our modelling approaches, but the undying human nature of those bestowed with power to use whatever methods necessary to actualise their visions. To them, these complex models – and the experts who create them – are just the perfect excuse to mask their real motivations behind the curtain of science’s authority. Whether the models are correct or not is a secondary matter, because if they are not, by the time the evidence is in it will be someone else’s problem, many confounding variables will have entered the complexity of reality (and can thus be used to shift the blame), and there will be all forms of legal and political recourses protecting them from their past sins.
What can normal people like us do? Remain sceptical, read widely, and ask the hard questions. If there is a silver lining coming out of the past couple of years, it is that people are increasingly distrustful of authorities, their shifting discourses, and their incapacity to admit their mistakes. This is fertile ground for people to exercise the critical thinking they have for so long outsourced to representatives, to experts, and to the never-ending parade of know-it-all activists on social media. When people think critically, put their representatives to the test, and push for more transparency, narratives cloaked in scientific jargon and shrouded in intellectual classism start falling apart.
Models were, are, and will always be a critical part of our understanding of the world, and of making sense of what might come to pass in the years ahead. Despite their limitations, we will use them one way or another. We simply need to remain vigilant when someone uses them to make large-scale decisions – the larger the scale, the more vigilant we need to be. Otherwise, our very hard-earned ways of living, the fruit of our labour, and even our civil rights might all be modelled out of existence.