This discussion relates to the paper “Operational Resilience in the UK Financial Sector: Practical Guidance” by R. D. Chanon, L. Habahbeh, P. Klumpes and S. Mann discussed at the IFoA Sessional on 10 December 2024.
The Moderator (Ms C. Hodges): Thank you for joining us for today’s sessional meeting on Operational Resilience. We welcome Robert Chanon, Lawrence Habahbeh and Paul Klumpes. I am Cath Hodges, the moderator. I am a client partner at Foremost, and I lead the modelling proposition there. We are a technical firm, so my focus tends to be on model risk management and quantitative risk techniques like risk appetite. I sit on the ILAG (Investment and Life Assurance Group) risk management working group, where I have a particular responsibility for prudential risk.
Operational resilience is not new. Human beings have been implementing operational resilience planning since we fortified citadels, laid away stores of food that could be preserved and trained up our defence forces, even if it was not known by that name.
Operational resilience has been a hot topic in financial regulation recently, with the Prudential Regulation Authority (PRA) issuing discussion papers, the operational resilience statement of policy and various speeches, alongside Financial Conduct Authority (FCA) guidelines and new requirements coming from the EU. The latter include the Digital Operational Resilience Act (DORA), which comes into effect in January. Many other countries, including Canada and Hong Kong, are also issuing new national guidelines.
It is also a crucial time for operational resilience for reasons outside of regulation – geopolitical risks, deep fakes and climate change are all creating new risks all the time. The danger of being caught unaware has never been greater.
Operational resilience is not the same as crisis planning. Recovery or crisis planning is vital but reactive. Operational resilience is a much more holistic subject. At its heart is the need not to wait for things to happen, but to actively manage our businesses to prevent threats from crystallising and to mitigate the impacts of those that do.
We are fortunate to have three experts in the field as speakers today.
Dr Robert D. Chanon is the Global Insurance and Pensions Consulting Partner at Tata Consultancy Services. With extensive experience in enterprise risk across academia and industry, he has previously served as a Chief Risk Officer and Director of Enterprise Risk. Dr Chanon holds a PhD in Actuarial Science, an MBA and a joint honours degree in Languages, Economics and Politics, having studied at Science Politique Bordeaux and Université Paris Dauphine. He is a member of the Institute and Faculty of Actuaries (IFoA), a certified Fellow of the Institute of Risk Management, and a member of the Information Systems Audit and Control Association (ISACA) with CISA and CRISC qualifications. He regularly contributes to risk management publications, industry journals and research studies and facilitates the IFoA’s Climate Change course.
Lawrence Habahbeh is a distinguished actuary with extensive experience. His expertise spans enterprise risk management, market risk and counterparty credit risk. He brings a wealth of knowledge in assessing and mitigating complex geopolitical and financial risks. He has a proven track record of developing innovative risk management strategies and tools, leveraging his deep understanding of both market dynamics and geopolitical contexts. His role involves providing strategic insights and solutions to clients, helping them navigate intricate risk landscapes and enhance their resilience. As a member of the IFoA, Lawrence plays a leading role in the Risk Management Board and the Operational Resilience and Third-Party Risk Management Task and Finish Group. His leadership in spearheading the Climate and Sustainability Working Group and the Black Swan Insurance Working Party underscores his commitment to advancing risk management frameworks in response to emerging threats and uncertainties. His academic background, including an MSc in Actuarial Science from Heriot-Watt University, complements his practical experience. Lawrence regularly contributes to risk management publications, industry journals and research studies.
We also have Dr Paul Klumpes, Associate Professor of Accounting at Aalborg University. He was previously Professor of Accounting at Abu Dhabi University, Professor of Finance at Nottingham Business School, Professor of Accounting at EDHEC Business School and Imperial College London and Swiss Re Chair of Risk Accounting at the University of Nottingham. He holds a PhD in accounting and is a Fellow of CPA Australia and an Honorary Fellow of the IFoA. He regularly publishes in academically refereed journals and writes book chapter contributions. His current research interests cover the interrelationship of financial sector institutions, voluntary reporting and risk management practices for emerging risks related to climate, nature and cyber-attacks.
Dr P. J. M. Klumpes, Hon. F.I.A.: Thank you very much, Cath (Hodges), and welcome everybody to this session. It is a privilege to work with such esteemed colleagues and with Cath (Hodges) to deliver this presentation. We are very pleased to have so many participants from around the world as this is a topic of relevance internationally.
I will give an initial introduction to the paper and the overall framework to the paper. Robert (Chanon) will discuss his insights into risk culture and IT issues. Lawrence (Habahbeh) will present his operational scenarios framework that is particularly important from an actuarial perspective. I will then return with concluding remarks, after which we will invite Cath (Hodges) to provide her insights and commentary on the paper. We will go to the Q&A, and Cath (Hodges) will end the session with some closing remarks.
The paper to which we refer in this discussion is published in the British Actuarial Journal. We also have a summary paper that has been recently published in the Enterprise Risk magazine at the Institute of Risk Management.
In today’s presentation, I have tried to summarise the key aspects that cover:
What is the journey that your organisation is on?
Where are you up to with that journey?
What matters, who matters and why does it matter?
Operational resilience has been around for many years. However, it is only relatively recently that regulators have addressed operational resilience as an overarching framework, encompassing a more holistic set of issues of the organisation, which allows it to deliver its critical operations through disruption.
What are the critical operations (otherwise known as important business services or IBS)? Services are deemed to be important if their disruption would have a material impact on the organisation’s viability, cause customer harm or impact its strategy. There is very much a cause and effect here. It is not just any service. It is a service that also has a deliverable to a customer, a key stakeholder or to the organisation itself.
Finally, there is the issue of what is called impact tolerance. Tolerance is an issue that must be quantified in most contexts. For example, I have previously been an auditor. Auditors must identify their risk tolerance to the client and what they can sustain in terms of intrinsic risk. In the context of operational resilience, what is needed is a quantification of the level of disruption that the IBS can absorb before it impacts the organisation’s viability or causes significant customer harm. Setting impact tolerances is therefore important, and actuaries have skills that they can bring to bear in this area.
Once we have identified the key concepts, the next stage is to identify what we might call the key principles as set out by the PRA and/or the FCA in the UK. There has been a series of statements and policies since 2017, as Cath (Hodges) has highlighted, when they began to elaborate on what this concept of operational resilience was all about. The actual guidance is made up of a few statements of policy, and some of these statements are specifically in the context of information technology and important business services; more recently, there has been guidance about the role of critical third parties to the UK financial sector. Therefore, this is an all-encompassing set of guidance. However, to a researcher in this area, it is vague. It is not very specific as to what the relevant concepts are, how they should be built together or their interconnection with enterprise risk management. Statements issued through speeches by key managers at various industry bodies, which are available on websites, elucidate some of the aspects of the guidance.
The EU requirements are very much in the context of parliamentary and policy-level detailed guidance concerning the NIS2 Directive, which is the overall ICT directive, meant to elucidate and elaborate on the requirements. It is supplemented by DORA. DORA applies to all critical infrastructure industries. It is not just the financial sector – the NIS2 and DORA framework is meant to cover all critical infrastructure, whether it is defence, utilities, power or telecommunications; all of these actors are subject to these requirements. Furthermore, it is incorporated in the Directive and in DORA that the supply chain is also directly affected. If you are a supply chain provider to an EU-based critical infrastructure firm, your organisation is also subject to these requirements, and that is not always obvious from the requirements themselves.
We have also undertaken some research to understand what is happening in other countries. I specifically focused on what might be called the former British Commonwealth countries, where there is a similar legal framework to the UK and an advanced system of financial service provision. If you have a look at some of these countries, like Canada, Singapore in particular and maybe also Australia, there is much more specific and explicit guidance. In the context of Singapore, for example, there is a specific focus on customer requirements. It is embedded within the Monetary Authority of Singapore requirements that firms must specifically address the requirements and needs of their consumer end users. Canada, by contrast, has a very explicit connection with the design of the enterprise risk management system. In other words, the organisation must document specified risk resilience and risk culture, but it must also explain how that connects back to the overall enterprise risk management system. That is a requirement that is much more specific. Australia, New Zealand and Hong Kong have gone some way down this path as well, although perhaps not in as much detail. Hong Kong has embedded the operational resilience requirements into its policy manual and has very detailed explanations and requirements in terms of mapping. It also specifies how the regulator is planning to implement and operationalise these requirements through onsite inspections and other forms of double checking.
I have now elucidated the key principles at a very high level – further details are available in the paper. I would now like to pass the presentation across to Robert (Chanon), who will address the IT and risk culture issues.
Dr R. D. Chanon: Thank you, Paul (Klumpes). Through our research and through my own experience, risk culture is key to the success of this type of project within an enterprise. ISO 31000 highlights the importance of the organisational risk culture as part of the internal context for the design of an effective operational risk resilience programme. Drawing from personal experience, for a project like this to be successful, the “tone from the top”, or support from the CEO, is key. That will be the driver for changing the risk culture within an organisation and the risk attitudes of employees throughout the organisation, so that they are conscious that what they are doing may impact the operational resilience of that organisation.
Secondly, again with support from the board, risk appetite and tolerances are also key to understanding the level of risk that an organisation can take on. This is where actuaries can help an organisation understand what their risk appetite and risk strategy can be, understanding that risk appetite is directly tied back to strategic goals.
The third point we made in the paper is that within DORA and all the other papers that have been published around the world, it is essential to start with a strong, solid information technology foundation for managing IT risk. There is within ISACA a good framework that gives us the base for scenarios and loss events, but there tends to be a disconnect between Enterprise Risk Management (ERM) and the ISACA framework. Specific loss events were defined in the framework: Actor/Threat Community; Intent and Motivation; Threat Event; Asset/Resource (IT assets); Effect and Timing.
These give a very good basis to start with, but they need to be developed into an ERM framework that takes an ERM approach to operational resilience.
Within the ISACA framework, they describe the top-down scenario identification and the bottom-up scenario identification that is quite new to IT. There is usually a different culture within the IT function, the risk management function and the rest of the business. These need to be brought together to achieve operational resilience.
The risk factors for IT are different. There are external environmental factors and internal environmental factors, but IT perceives risk as vulnerabilities or threats. This is especially true today in a digital working space, for example, in insurance and banking, where we have moved from a paper-based environment to a digital environment. For operational resilience, a firm should be thinking differently about their ERM framework, not just as an IT department that provides services. From recent events, it is apparent that IT becomes the cyber-attack surface, the very first function that gets attacked and hence requires different risk management capabilities looking at stress and scenario development within IT. It is key to understanding what those threats and vulnerabilities are and generating the service tickets within IT systems such as ServiceNow for them to remediate those vulnerabilities.
IT risk is something that is not very well understood because it is such a specialist topic. Westerman (2005) published a pyramid of four basic risks (Figure 1).

Figure 1. Westerman’s pyramid of four basic risks.
At the bottom of the pyramid is the availability of systems. If a system is available, is it accessible? Can you either access it yourself, or is it accessible to someone outside? The next step is accuracy: are its data accurate or not? At the top is the agility of the IT system, meaning not only the agility of it working but the agility to change. That is key to the other three risks below. If the system is not agile, it cannot be accurate, it cannot be very accessible and it is certainly not going to be available.
This brings us on to the risk maturity model – we have used Carnegie Mellon’s (Table 1).
Table 1. Carnegie Mellon risk maturity model

Risk maturity is crucial in operational resilience. At an initial stage, resilience is not very high. There is a trade-off between being at an initial stage of this matrix and not being resilient, versus being defined, managed and optimised. All of these are directly tied to risk budgets. With severe budget constraints, and at an initial stage, there will be no capability of delivering operational resilience. To get to a defined state, it needs a budget and a commitment from the board, but it also needs the right skills within risk teams. Optimising can take 3–5 years. But even getting from the initial to the defined stage needs a plan, a budget, support from the company’s board and at least a 3-year time horizon.
Why does it matter? Over the last few years, we have seen quite a lot of changes in the regulation on and around consumer duty and conduct risk. Consumer duty refers to the regulatory requirement that financial firms act in the best interests of their customers and work to remove conduct risk, that is, the risk of inappropriate, unethical or unlawful behaviour in a firm by employees and management.
When we discuss operational resilience, we must consider the customer because the customer is the one paying for the product/service. Hence, operational resilience is also about the customer first and foremost, and this is why consumer duty and conduct risks are so important. They must be at the forefront of consideration when constructing an operational risk framework for operational resilience.
The three pillars of a customer-centric operational resilience framework are: identify and prepare; respond and react; and recover and learn (Figure 2).

Figure 2. Customer-centric operational resilience.
The foundation of these pillars is the integral physical IT and cybersecurity considerations.
At the top of your pyramid, you have customer-centric operational resilience, which must be the foremost priority of the board, not just because DORA says this, or because it is a regulation, but rather because it is protecting your customer. If you do not protect your customer, your system fails, and the customer goes elsewhere.
We build on pillars. In the first pillar, as in risk management, we identify and prepare the firm. In the second pillar, we respond and react: we have identified the risks, and then we go through risk mitigation and react through better controls. In the third pillar, we recover and learn from what has happened, going through our normal Business Continuity Plans (BCPs). All of this rests on a foundation of physical integrity and IT infrastructure that is also cybersecure.
Mr L. Habahbeh: Thank you, Robert (Chanon). We are going to talk about operational resilience risk preparation and developing an operational resilience scenario framework. Before we go into detail, it is important to appreciate that operational resilience is not limited to a particular event. Resilience needs to be to the consequences generated by different types of events: events that are considered greater than the ability of any single organisation to cope with, or events that would require a state-level response. One recent example would be the COVID-19 pandemic. These events, what we call emerging risks, tend to share similar characteristics or traits. They can be local, or they can spread globally. They act as amplifiers to existing risks. For example, look at the current geopolitical events in Europe and the Middle East, amplified by the July 2024 CrowdStrike outage, in which a software update impacted about 8.5 million PCs worldwide. They also have the potential to be systemic risk triggers, meaning that these emerging risks can act as potential disruptions to manmade financial, economic and security systems that support our way of life. They happen concurrently, meaning that you can have multiple events with similar consequences happening at the same time, where the consequences come together in a particular risk context. They start generating unforeseen effects and they are difficult to quantify. This is not true of all of them, but some of them do not have any historical data that we can use. There are many assumptions that need to go into understanding these types of risk when we conduct a risk assessment. Their interaction and interconnectedness with other types of risks are not fully clear.
When we talk about these emerging risks, we are talking about risks that can generate common consequences. For example, they can generate local or global travel disruption, and that can feed into other types of risk impacts, such as supply chains and imports of food, energy and pharmaceuticals that can have knock-on effects on other sectors of the economy (Figure 3).

Figure 3. Emerging risks and common consequences.
Another example could be events that can impact the power grid of a country with implications for other sectors in the economy, such as energy, communication and transportation (Figure 4).

Figure 4. Propagation of risks through socio-economic systems.
Such events tend to generate knock-on consequences. An important point is that when conducting a risk assessment, we do not want to focus on a single event. Usually in the risk register we try to ascertain the likelihood of having, for example, another COVID-type pandemic over the one-year capital horizon. We need to change that question and need to ask, “What is the likelihood that we get any single event out of a portfolio of extreme events that can generate similar consequences?” Obviously, the probability of this happening is much higher than thinking about the probability of a single event happening over the 1-year capital horizon (Table 2).
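To make the portfolio question concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that the events are independent, and it uses hypothetical one-year probabilities that are not taken from the paper. Emerging risks are described above as concurrent and interconnected, so in practice a dependence structure would be needed and this simple product formula would not apply as it stands.

```python
from math import prod


def prob_any_event(annual_probs):
    """Probability that at least one event occurs within the year.

    Assumes the events are independent (an illustrative simplification).
    """
    return 1.0 - prod(1.0 - p for p in annual_probs)


# Hypothetical one-year probabilities, chosen only for illustration.
portfolio = {
    "pandemic": 0.02,
    "large-scale cyber-attack": 0.05,
    "major natural disaster": 0.03,
    "critical third-party outage": 0.04,
}

p_any = prob_any_event(portfolio.values())
p_max_single = max(portfolio.values())

print(f"Largest single-event probability:   {p_max_single:.3f}")
print(f"Probability of at least one event:  {p_any:.3f}")
```

Under these assumptions the probability of at least one event always exceeds the largest single-event probability, which is the point of reframing the risk register question.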
Table 2. Classification of emerging risks

We can break them down as follows. First, there are the “known knowns”, which are easily identified risks. Then we have the emerging risks, which are concurrent and systemic in nature and act as amplifiers. Then there are the completely unknown ones. These are the ones that are referred to in the literature as events that are extremely rare and have a huge impact. We cannot imagine these events happening and tend to explain them after the fact. These are the Black Swan type of events that come out of left field, for example, failure of or loss of confidence in a major component of the financial sector, or a major disruption to multiple third-party service providers. When we think about operational resilience, we need to think about the financial ecosystem, which contains different components: financial institutions, such as banks, investment banks, asset managers and mortgage providers, and non-bank financial institutions, such as insurance companies, hedge funds and pension funds. Then there are critical third-party (CTP) service providers, for example, cloud service providers, payment systems and central counterparties. We need to be able to understand the linkages between all of these components so that we can conduct an operational resilience risk assessment at the organisational level. We must see what kind of events can impact those linkages and cause disruptions: for example, large-scale cyber-attacks, geopolitical incidents including sustained civil unrest and strikes, pandemics, and natural disasters such as large-magnitude earthquakes similar to the ones experienced in February 2023 in Turkey, or the Icelandic volcanic eruption of 2010–11 that caused travel disruption. These are the kinds of events or risk classes that we need to focus on because they can create disruptions to our important business services. These risks mostly generate common consequences.
What we mean by common consequences is that if we focus on the consequences that were generated by the COVID-19 pandemic back in 2020, the first thing it did was create travel disruption. Whenever there is a travel disruption, there are knock-on consequences and triggered risks that filter through the economy, impacting different sectors, and that also triggers other risks at a local organisational level. For example, during the pandemic, because of these travel disruptions and the failure of imports of pharmaceuticals, there were secondary impacts on the availability of different medications in different parts of a particular economy. It is important to appreciate that resilience is not to a particular event, but rather it is to the consequences generated and triggered by a particular event as the risk rips through society. The consequences generated can be immediate, and they can in turn trigger higher-order risks. Currently, we do not have a map of all the consequences generated from a particular pandemic and the interaction of these consequences that are transferred to other parts of the system. That needs to be researched further and summarised in a dependence map whereby we can identify the dependence between the organisation and the external critical third parties that are providing services and how those critical third parties are mapped to the internal processes of the organisation, such as the people, systems and procedures.
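One possible way to hold such a dependence map is sketched below in Python. The two-level structure (important business service to internal resource, internal resource to critical third party) and every name in it are hypothetical, introduced only to illustrate the idea; the paper does not prescribe a data model.

```python
from collections import defaultdict

# Hypothetical dependence map. Each important business service (IBS) lists the
# internal resources (people, systems, procedures) it needs; each internal
# resource lists the critical third parties (CTPs) it relies on.
IBS_TO_RESOURCES = {
    "payments": ["payments platform", "operations team"],
    "claims handling": ["claims system", "operations team"],
}
RESOURCE_TO_CTPS = {
    "payments platform": ["cloud provider", "payment scheme"],
    "claims system": ["cloud provider"],
    "operations team": ["telecoms provider"],
}


def services_exposed_to(ctp):
    """Return {IBS: [internal resources]} for every IBS that depends on `ctp`."""
    exposed = defaultdict(list)
    for ibs, resources in IBS_TO_RESOURCES.items():
        for resource in resources:
            if ctp in RESOURCE_TO_CTPS.get(resource, []):
                exposed[ibs].append(resource)
    return dict(exposed)


for name in ("cloud provider", "payment scheme"):
    print(name, "->", services_exposed_to(name))
```

A lookup of this kind answers the question "if this third party is disrupted, which important business services are affected, and through which internal resource?"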
For both financial and non-financial institutions, as well as the third parties they work with, critical infrastructure is energy, communication, transportation, healthcare and water. When there is a disruptive event that affects a critical infrastructure, it generates initial impacts. These impacts cascade through the economy and start generating second and third-order triggered risks. For example, if a natural event disrupts the power grid and causes a loss of power, that can disrupt GPS and mobile phones. It could cascade into the energy sector. It could impact the transportation sector, and it could cascade to other critical sectors of the economy with knock-on impacts on supply chains. It could prevent people from accessing their cash from cash machines, causing bank failures and perhaps even bank runs. We need to be able to understand all of these connections and linkages in order to assess operational resilience and the risks from different types of events, how they impact linkages and how those linkages might impact the important business services that the organisation is providing.
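The idea of first, second and third-order impacts can be sketched as a breadth-first search over a graph of "disruption flows to" links. The links below are hypothetical and only loosely follow the power-grid example; they are not a calibrated model of any real economy.

```python
from collections import deque

# Hypothetical "disruption flows to" links between sectors and services.
CASCADES = {
    "power grid": ["communication", "energy", "transportation"],
    "communication": ["cash machines"],
    "transportation": ["supply chains"],
    "energy": [],
    "cash machines": [],
    "supply chains": [],
}


def impact_orders(initial):
    """Breadth-first search returning the order at which each node is first hit.

    Order 1 is the initial impact, order 2 the nodes it triggers directly,
    order 3 the nodes those trigger, and so on.
    """
    order = {initial: 1}
    queue = deque([initial])
    while queue:
        node = queue.popleft()
        for downstream in CASCADES.get(node, []):
            if downstream not in order:  # keep the earliest order only
                order[downstream] = order[node] + 1
                queue.append(downstream)
    return order


for node, level in impact_orders("power grid").items():
    print(f"order {level}: {node}")
```

Breadth-first search is used here because it visits nodes in order of distance from the initial event, so each sector is labelled with the earliest order at which it could be affected.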
Key considerations in building an operational resilience framework:
The first thing is that you need to have a holistic view to understand different actors with whom you engage and interact and define scenarios that can potentially disrupt the important business services being provided.
First, you need to identify the important business services. In a bank, for example, an IBS would be providing payment services to clients. The next step is to assign impact tolerance levels using some metrics. For example, under the UK regulatory framework, we can assess impact tolerance using outage time or the number of customers affected by a disruptive event. We need to consider the risk appetite of the board. We also need to consider the historical narrative of similar disruptions so that we can assess what levels of disruption to clients and customers can be tolerated. In the UK, payments need to be cleared over a period of a maximum of two business days, which means that if I am going to set an impact tolerance for payments as an important business service, it needs to be no more than three days. There is a cut-off point at three days, beyond which the disruption could generate significant stress to customers. Customers might start closing accounts, which could bring the soundness and viability of the organisation into question. Depending on the size of the organisation, that can create a systemic risk that can spread to the whole financial sector.
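As a minimal sketch of how an impact tolerance might be recorded and tested against a scenario, consider the Python below. The three-day outage limit follows the payments example above; the customer threshold of 10,000 and the class and method names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImpactTolerance:
    """Impact tolerance for one important business service (IBS)."""
    service: str
    max_outage_days: float       # metric 1: outage time
    max_customers_affected: int  # metric 2: number of customers affected

    def breached_by(self, outage_days, customers_affected):
        """True if the scenario exceeds the tolerance on either metric."""
        return (outage_days > self.max_outage_days
                or customers_affected > self.max_customers_affected)


# Three days follows the example in the text; 10,000 customers is hypothetical.
payments = ImpactTolerance("payments", max_outage_days=3,
                           max_customers_affected=10_000)

# Hypothetical scenarios: (description, outage in days, customers affected).
scenarios = [
    ("short regional outage", 2, 500),
    ("prolonged outage", 4, 500),
    ("brief but widespread outage", 1, 20_000),
]
for description, days, customers in scenarios:
    status = "breach" if payments.breached_by(days, customers) else "within tolerance"
    print(f"{description}: {status}")
```

Treating a breach on either metric as a breach overall is a design choice for this sketch; a firm could equally weight or combine the metrics differently in line with its board’s risk appetite.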
Consider a range of risks and shocks – the type of events that we want to look for are the disruptive events that are emerging risks, which are greater than the ability of a single firm to cope with, and the ones that would require a state-level response. One way to identify these emerging threats is to have a dedicated team that specialises in the identification, assessment and management of extreme event risks specifically, be they natural disasters or manmade extreme events, such as nuclear threats and emerging technologies like AI.
Some questions to answer relating to emerging risks would be as follows:
How to identify them?
How to assess them in terms of their likelihood of occurrence?
How to assess the quantifiable impacts, for example, the economic damage, the financial damage and the number of deaths and injuries?
How to assess the non-quantifiable dimensions, such as loss of certain services and psychological impacts?
The level of disruption would depend, of course, on the regulatory-stipulated requirements. We gave an example of an important business service for a bank, where payments disrupted beyond three days could lead to the health and viability of the institution being at risk. Based on this kind of information, we can set impact tolerance levels at the important business service level.
Dr Klumpes: I am going to talk a little bit more about why this matters. In other words, why is it important to get this right and identify all scenarios? Why is it important to understand the IT? There are two elements to that. First, it is now clear that the PRA, especially in the context of insurance, is going to implement its compliance and enforcement actions. This addresses specifically both the form and nature of the intervention. It also applies to CTPs: it is not just the major insurer or the major financial institution, but also its supply chain, that is potentially going to be subject to actions. As is stated very clearly, this also concerns individuals – the “PRA authorised person” – and their conduct in connection with providing that protection to the policyholders and/or the customers. That is quite a clear message coming out that perhaps was not elucidated so much in the original operational resilience guidelines that were issued some years ago.
I would like to wrap up with the resilience maturity dashboard. This is a very interesting concept that was initiated by the Organisation for Economic Co-operation and Development (OECD) in collaboration with a number of tax administration authorities around the world. It was originally designed in the context of enterprise risk management with the aim of understanding how that authority viewed enterprise risk management in terms of the level of their maturity. We can begin at the stage where ERM is an emerging concept; in other words, it is not well understood. Perhaps some pockets understand ERM and its connection to operational resilience, but it is generally done in a reactive mode. That is clearly not correct in the context of operational resilience, which is meant to be active and proactive.
We can then move to the progression stage. This is where there is some capability in place and understanding of the interconnection of risk assessment, risk culture and risk management. There is a systematic organisational approach, but it is very limited in context. It is very limited as to the budget and to its connection with resource allocation.
The next stage is the established stage. In this case, there is a culture that embeds operational resilience and enterprise risk management into the organisation and into its small processes. There is coordination, promotion and risk information that is being considered through the decision-making process.
We then move to the leading organisations. In a leading organisation, we expect ERM and operational resilience to be fully integrated into both strategic planning and performance management and connected to the risk appetite. There would be a culture across the organisation of understanding the roles and responsibilities of individuals, and a continual flow of information.
Finally, we get to the aspirational level. This is a very high level because essentially at this point operational resilience is completely integrated, not only with strategy, not only with performance management, but through the entire organisational culture. There is then a use for AI and other advanced technology or tools to identify, monitor and treat the risk in a dynamic way. That is the major challenge that we have found with our research in the UK context.
This ERM resilience maturity dashboard concept is across several dimensions. It covers the overall design, the strategy and the governance and more importantly, the softer aspects of ERM resilience dimensions related to identification, evaluation, treatment, review, communication and reporting. We undertook an analysis looking at the leading insurers, banks and asset managers in the UK. Figure 5 gives a summary of our findings.

Figure 5. OR maturity of UK financial institutions.
The banks, represented in Figure 5 by the blue bars, are the most mature and often have well-established support, both for the payment systems and through their asset management operations. The insurers lag a little behind the banks. The reason they lag is not that they lack a basic understanding of the concepts and the tools, but that they are unable to integrate these into their holistic processes and to recognise the importance of thorough review and continuous monitoring.
Thirdly, we discovered the asset managers, who are in terms of their overall size slightly smaller in scale and complexity than the banks and insurers, lag behind both. They have been picking up and are now approaching a relatively high level. About 40% of asset managers say they are covering nearly half, if not more, of the requirements. As for operational resilience maturity, they are moving from an emerging to a well-established level. That is our analysis of the situation until the end of last year.
Mr Chanon: There are several subject areas in which we can enhance our operational resilience, including the importance of a solid information technology base, customer-centricity for the organisation and the operational resilience scenario framework. The Bank of England and PRA now have a robust enforcement framework.
With the current regulations, organisations must identify their level of operational maturity, their risk appetite and their ability to pay for losses, because every operational risk has a financial impact.
We then identified further issues that need examination: critical third-party risk management, the so-called TPRM, climate and nature transition planning issues and recovery and resolution.
Mr Habahbeh: To conclude, some of the serious deficiencies of national risk assessments, not just in the UK but in Jordan and elsewhere, are a lack of attention given to high-uncertainty risk and emerging risks. We tend to focus on recent events. Take, for example, the COVID-19 pandemic. There was ample evidence to suggest that we had a serious problem, but no one paid attention until it crystallised. Scientists had data from the 1700s to be able to predict the 2010–11 Icelandic volcanic eruption. The world fails to identify these extreme risks. Therefore, there needs to be a single point of accountability either within a government department or at different institutions that oversees the identification, assessment and management of these extreme events. They should bring together or have access to academics and a range of expertise that support the process of identifying, assessing and managing extreme events.
Mr Chanon: In the UK at the national level, there is some risk identification, but it does not flow down to businesses. There is a disconnect between what the state can do as the overall governing body and how that can help businesses.
Moderator: Thank you very much for that very erudite presentation. The crucial thing is to think of operational resilience as an enabler, not a limiting factor. Having a strong operational resilience framework is what enables financial institutions to push boundaries. If we think about what can go wrong and if we have plans to avoid and mitigate those impacts, that is what allows our businesses to move forward. It is about balancing risk, reward and customers.
I am going to start off by talking about risk culture. There are many mentions of risk culture in the paper. This is what will make or break operational resilience activity in an organisation. If there is not a good risk culture, there will not be good engagement. There will not be buy-in, and therefore, there will not be the quality of insight that is needed for a good operational resilience framework.
I am happy to see mention of diversity because it is a very important theme in operational resilience. In this context, it is an example of something that is truly multidisciplinary across legal, HR, IT, finance and all of the professions involved in a financial institution, but also diversity in every sense. By having a diverse group of people in the workshops and the brainstorming sessions we can avoid groupthink and get the most successful outcomes from those sessions.
The other thing about risk culture, which is mentioned in the paper, is the issue of being considerate of the people within the organisations and hence not adding to workloads. Bringing this into people’s “business as usual” within their day-to-day working lives is crucial because if people see it as something adding to their workloads, they won’t be as welcoming to it as they would otherwise be. People who are overworked or overstressed make mistakes, make bad decisions and have poorer risk judgement. So, that is a nice theme to see drawn out within the paper.
The phrase “user experience” is used in this paper in a way that I have not seen elsewhere in the context of risk management. The paper puts forth the idea that the people in an organisation who are building these frameworks need to be enjoying the process.
A lot of building up an operational resilience framework in practice happens through workshops with a lot of brainstorming and human input. That makes it fundamentally fallible because, frankly, people make bad decisions and make mistakes and the perception of risk tends to be flawed.
There is an oft-quoted survey that shows that nearly all drivers think that they are safer than the average driver. This is clearly not true. We are subject to a lot of biases. One I would specifically draw attention to is the peak-end rule, which says that the things that we find easiest to bring to mind are the extreme events, situations, examples and recent examples. That creates a skew in what we contribute to meetings and what we include in papers or frameworks.
I would recommend doing some research into calibration training, which reduces some of those biases and inaccuracies. Some examples of calibration training would be to move to using things like range assessments and precise definitions, rather than using point examples. This would mean that instead of estimating that an impact is going to be between X and Y, we would say that there is a 95% chance that the impact is going to be greater than X pounds and less than Y pounds. Some of the descriptions that are used in risk management are much less easy to define consistently across a group of people. We often talk about the reasonable worst case. If something is reasonably likely to happen, that is very subjective and could mean anything. Is that a 20% or a 60% chance of happening? So, precise definitions help.
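As an illustration of the range-assessment idea described above, the following is a minimal Python sketch of how a team might check its own calibration: each estimate is a stated 95% range, and the hit rate is the share of realised outcomes that fell inside the stated range. The class name, function name and all figures are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class RangeEstimate:
    low: float     # lower bound of the stated 95% range, in pounds
    high: float    # upper bound of the stated 95% range, in pounds
    actual: float  # realised impact, in pounds


def hit_rate(estimates):
    """Share of estimates whose realised outcome fell inside the stated range."""
    hits = sum(1 for e in estimates if e.low <= e.actual <= e.high)
    return hits / len(estimates)


# Hypothetical workshop estimates compared with what later happened.
estimates = [
    RangeEstimate(10_000, 50_000, 42_000),     # inside the range
    RangeEstimate(5_000, 20_000, 35_000),      # above the range
    RangeEstimate(100_000, 400_000, 150_000),  # inside the range
    RangeEstimate(1_000, 8_000, 9_500),        # above the range
]

rate = hit_rate(estimates)
print(f"Stated confidence: 95%, observed hit rate: {rate:.0%}")
```

If the observed hit rate is well below the stated 95%, the ranges were too narrow, which is the kind of overconfidence that calibration training aims to reduce.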
Using other forms of workshops can help with eliminating some of those biases. One example is postmortem analysis. It is a different way of approaching problems. In the workshop, you must assume that an event has already happened, and the discussion within the workshop is what has caused that event to happen. You assume it has happened already, so that overrules some of that inbuilt bias.
I will move on to the identification of risks and scenarios that Lawrence (Habahbeh) talked about. Sometimes we notice things when we first see them, and sometimes, we are blind to things we see very often. Unfortunately, experience is not a good way to identify risks at all. For the purposes of operational resilience and identification of risks and scenarios, we talk about an iceberg model. An awful lot of the risk is completely invisible. It is below the surface. We need to be creative and need to find ways to predict the unpredictable, and how events can combine to create impacts that we have never seen before.
On identification of risks and scenarios, it is important to consider the tail risks – those hugely disruptive events that have very low probability. Risk managers are inclined to ignore these events because they are so unlikely, but they may have such a large impact that they still dominate the calibration of the total risk. Hence, that is an important point to consider in operational resilience that perhaps we tend to neglect in our risk management.
As well as emerging and yet-to-emerge risks, it is important to consider connections or non-independence. Consider this example from the airline industry. For a long time, the industry claimed that there was about a one in a billion chance of anything happening to all the hydraulic pipes. The probability of anything happening to any one of them is small, and it would need four to go wrong to cause a serious problem. But the pipes are very close together, and if something happens to one, it is likely to happen to all of them.
For actuaries, such non-independence should be an intuitive concept, and it is important to bring it into thinking about risk events and operational resilience, and also accumulations. Here, my example would be head injuries in sports, where the risk events start to pile up. They are accumulating well before there is any visible impact.
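The hydraulic pipe example can be made concrete with a small Python sketch. It compares the probability that all four lines fail when failures are assumed independent with the probability when a single common-cause event (for instance, damage to the area where the pipes run together) can take out all of them at once. The per-line failure probability and the common-cause probability are purely illustrative assumptions, not industry figures.

```python
def all_fail_independent(p_single, n_lines):
    """Probability that every line fails, assuming failures are independent."""
    return p_single ** n_lines


def all_fail_with_common_cause(p_single, n_lines, p_common):
    """Probability that every line fails when a common-cause event
    (probability p_common) disables all lines together; otherwise
    the lines fail independently."""
    return p_common + (1 - p_common) * p_single ** n_lines


# Illustrative assumptions only.
p_single = 0.005   # chance that any one line fails
n_lines = 4        # four lines must fail for a serious problem
p_common = 1e-5    # chance of an event that damages all lines at once

independent = all_fail_independent(p_single, n_lines)
with_common = all_fail_with_common_cause(p_single, n_lines, p_common)

print(f"Independent assumption: {independent:.2e}")
print(f"With common cause:      {with_common:.2e}")
print(f"Understatement factor:  {with_common / independent:,.0f}x")
```

Under these assumptions the independent calculation gives a figure below one in a billion, while even a rare common-cause event raises the true probability by several orders of magnitude.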
My final point is that risk avoidance and mitigation techniques need to be holistic. We need to approach this across all the dynamics. We might find a risk event that has arisen due to human error. For example, there was a terrible tram crash in the United States, which was due to a driver who had been on his phone and not paying attention. A mitigation or avoidance strategy to make sure that does not repeat would be making sure the drivers are not able to take their phones on the tram in the future. However, it is also important to back that up with some system protections as well, such as adding automatic braking to engines.
Conversely, if financial modelling has produced an error because of a flaw in the model, the model would be corrected and risk prevention strategies would be added within the model, probably along with extra human analysis and checking. It is about creating that holistic system of risk avoidance and mitigation, considering barriers, redundancies and recoveries across the various kinds of issues including human errors.
Moderator (starting the Q&A session): Any views on how important the speed of mitigation action can be in minimising the impact of risks, and therefore as a key part of overall operational resilience? This requires detailed execution planning: for example, procuring medical equipment and drugs in an emerging pandemic with global competition for restricted resources, or adjusting a portfolio of assets by risk in response to financial market disruption events.
Mr Chanon: The speed of execution is key. If you have identified a risk, there is a direct correlation between the actual cost of the risk and the passage of time if it is not mitigated. These tend to be parabolic. The earlier you mitigate it, the less it is going to cost.
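The remark that costs tend to be parabolic in the time left unmitigated can be sketched in a few lines of Python. The quadratic form and every parameter below are assumptions chosen purely for illustration; the discussion does not specify a particular cost function.

```python
def cost_of_delay(days_unmitigated, base_cost=10_000.0, growth=500.0):
    """Hypothetical cost curve: a fixed base cost plus a term that grows
    with the square of the number of days the risk is left unmitigated."""
    return base_cost + growth * days_unmitigated ** 2


early = cost_of_delay(2)    # mitigation after 2 days
late = cost_of_delay(10)    # mitigation after 10 days

print(f"Mitigate after 2 days:  {early:,.0f}")
print(f"Mitigate after 10 days: {late:,.0f}")
```

With these illustrative numbers, waiting five times as long makes the cost five times larger overall, because the delay-driven component grows by a factor of twenty-five.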
Mr Habahbeh: If it is a disruptive event, it is important to understand the causes, but it is also important to try to map the consequences of this event as they ripple through different systems and sectors of society and then see how that could impact your position. Once you map these consequences, you have probably prepared yourself for a Black Swan event without even predicting it. A strong mapping exercise includes mapping the consequences; understanding dependencies (both internal dependencies and external components); understanding the external dependencies with other components of the system, including how they interact; and the density at each node of the operational resilience grid. Once that kind of modelling and mapping takes place, mitigation strategies can be applied to minimise the impact of the onset of these kinds of events. That matters because once these events ripple through society, they propagate, and they can have a discernible direction and magnitude.
Mr Chanon: It is key at the start of the analysis to understand the risk and to conduct a root cause analysis to put mitigations in place. If the risk is not understood well, those mitigations can be ineffective or partially ineffective. By understanding the root cause, we gain time, cost and efficiency.
Moderator: I noticed in the paper and in the slides that the data shows that operational resilience maturity in insurance firms seemed to fall in 2022. Do you have any theories as to why that was?
Dr Klumpes: We looked at two insurers, and in one of them, there was a change in the leadership of the firm and in the governance. After that change, they decided that almost the entire management of the main risks related to cyber and risk management would be effectively outsourced to the IT department. The big difference between that firm and the other firm was that the other firm continued to evolve and integrate in a holistic way, in its judgements, in the notion of interdisciplinarity and with regard to diversity.
I think that sometimes in firms their culture tends to be dominated by one area. If it is dominated by IT, that will yield an IT solution. But is that really a business solution?
Mr Chanon: For an operational resilience project to be successful, it has to come from the top. It is top-down. It is part of the job, even down to putting it in job descriptions, and that makes a lot of difference. That is not because people see it as part of their job description, but because people start to understand risk management, and that then leads to fewer problems. The key to being successful, not only for operational resilience but for ERM, is that culture.
Moderator: In the paper, you mentioned horizon scanning as a way to identify risks. Could you tell us more about how to conduct horizon scanning?
Mr Habahbeh: To be able to apply strategic foresight and horizon scanning to identify types of disruptions, companies need to have a unit that specialises in extreme risks – to identify, assess and manage those events as they happen. That would require open channels to speak to academics and different risk/domain experts to get their opinions to support operations. Imagine, for example, the exposure in a financial ecosystem to different CTP service providers. There could be a few CTP service providers that are very dense, meaning that every component of the financial system and the non-banking financial system is connected to it. If a malicious actor can understand this map and they are able to generate a concurrent cyber-attack on these CTP service providers, they can create significant damage that can potentially cascade into a systemic event impacting the whole financial ecosystem. In trying to map those dependencies and those sorts of linkages, we can borrow from graph theory the concept of nodes and linkages and then work out concentrations to gain insights as to exactly where a company stands in this network operational grid.
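The node-and-linkage idea borrowed from graph theory can be sketched in plain Python. The firms, providers and dependency map below are entirely hypothetical; the sketch only shows how one might measure how concentrated the system is on each critical third-party (CTP) provider and which firms a concurrent attack on several providers would reach directly.

```python
from collections import Counter

# Hypothetical map: each firm and the CTP providers it depends on.
dependencies = {
    "Bank A": {"CloudCo", "PayNet"},
    "Bank B": {"CloudCo"},
    "Insurer C": {"CloudCo", "DataHost"},
    "Asset Manager D": {"PayNet"},
    "Insurer E": {"DataHost"},
}


def provider_concentration(deps):
    """Fraction of firms that depend on each provider (node density)."""
    counts = Counter(p for providers in deps.values() for p in providers)
    return {provider: n / len(deps) for provider, n in counts.items()}


def firms_hit(deps, attacked):
    """Firms with at least one dependency among the attacked providers."""
    return sorted(firm for firm, providers in deps.items() if providers & attacked)


concentration = provider_concentration(dependencies)
hit = firms_hit(dependencies, {"CloudCo", "PayNet"})

for provider, share in sorted(concentration.items(), key=lambda kv: -kv[1]):
    print(f"{provider}: {share:.0%} of firms depend on it")
print("Firms directly hit by a concurrent attack on CloudCo and PayNet:", hit)
```

A fuller analysis would also follow second-order links between firms, which this sketch deliberately leaves out.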
Mr Chanon: To develop an emerging risk model that is current and relevant, Gen AI and different modelling techniques could be useful as they create an update every 15 minutes. There are news events around the world that may impact the insurance portfolio or banking portfolio, or the third-party supplier may impact your business. Updates could be proactive as to when an emerging risk is identified and provide updated judgement of how much mitigation will cost.
Question: Graph theory was mentioned. What other areas of mathematics, other than stochastic modelling, can help with developing operational resilience insights? Monte Carlo simulation is the obvious answer, but does anyone want to add to that?
Mr Habahbeh: It is important to understand nature. It is important to speak to physicists who are dealing with complex system analysis because they would have interesting theories about how these sorts of system components interact with each other, creating emergent behaviour that cannot be decomposed into the individual components generating that behaviour. Impacts of black swans or highly improbable events are generated from within the system itself, for example the COVID-19 pandemic. When we observe that inputs and outputs are becoming random and unpredictable and the behaviour of the system reaches a critical threshold, physicists and seismologists would be able to draw parallels with earthquake analysis and volcanoes. That is because these tend to exhibit similar behaviour, such as heavy-tailed or power-law distributions of the earthquake magnitude, or of the time the system remains under duress, or in how it organises itself into that critical state, according to a paper published in 1987 by Per Bak called “Self-Organised Criticality”. It is important to understand those concepts of rare events that can occur in natural and manmade systems and to see how these concepts apply to the management, not the prediction, of such events.
Dr Klumpes: Neil Cantle, together with John Evans, had developed this cladistics model of mapping and interdependencies. There are theories connected to nature – it is not just a matter of dependencies. It is understanding the nature of interdependencies and some of these non-reciprocals and the fact that we cannot expect some form of contractual reaction. Events that can happen through biodiversity loss, degradation, etc., have implications for operational resilience, especially for those organisations that choose to target net zero or choose to adhere to the Montreal agreement, and therefore make commitments about biodiversity loss. These are then monitored by Non-Governmental Organisations (NGOs) and the stakeholders very carefully. Therefore, it does require some out-of-the-box thinking, not just in terms of the theories and the statistical models but also in terms of the generic aspects.
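To illustrate what a heavy-tailed distribution implies in practice, the following Python sketch uses a simple Monte Carlo simulation. It draws losses from a Pareto distribution, chosen here only as a convenient textbook example of a heavy tail, and measures how much of the total loss comes from the largest 1% of events. The shape parameter, sample size and seed are arbitrary assumptions.

```python
import random


def top_share(losses, fraction=0.01):
    """Share of the total loss contributed by the largest `fraction` of events."""
    ordered = sorted(losses, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)


rng = random.Random(42)   # fixed seed so the run is repeatable
alpha = 1.5               # assumed Pareto shape; smaller means a heavier tail
losses = [rng.paretovariate(alpha) for _ in range(100_000)]

share = top_share(losses)
print(f"Largest 1% of events account for {share:.1%} of total losses")
```

When the tail is heavy, a handful of extreme events carries a disproportionate share of the total, which is why the discussion stresses managing such events rather than trying to predict each one.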
Question: Operational resilience has become a feature of the operating model for banks, insurers and asset managers. The pensions regulator does not seem to be at the same stage for workplace trust-based pensions, master trusts or local government schemes. Do you think that this will come soon?
Mr Chanon: I am working on a project at a pension provider under the Pensions Regulator (TPR). In the new guidance they issued, ERM is mentioned. It is not mentioned as ISO 31000. There is no mention of COSO. I find it strange that those who are subject to the Pensions Regulator are not subject to FCA regulation. I think that is an anomaly and I do not think that is quite right. There is a lot more to come in the pension space.
Dr Klumpes: I conducted research for the Australian Law Reform Commission about 30 years ago on the issue of the diversity of the complexity of the regulatory system affecting financials. On the one hand, there is pure contract-based fiduciary responsibility between, say, an asset manager with a full-on insurance contract. On the other hand, there are notions of equitable trusts, many of them involving a third-party trustee. But they are very much at the occupational pool level. These organisations do not have the same concept of what is called a responsible entity. In the UK, having discussed with colleagues at the TPR and others, we see there is a gap. If the chosen pension provider is one of the big insurance firms, and/or its asset management subsidiary, it will be subject to the requirements of the FCA and the Prudential Regulation Authority (PRA), where relevant. On the other hand, if it were NEST, or one of the local government pension schemes, it would be a completely different situation. In the European Union, the regulator is meant to cover both entities. In Australia, that is not the case.
Question: Prompted by the discussion around complex systems and non-linear behaviour, does an occurrence such as a sudden shift in asset prices, as in the financial crisis, fall within operational resilience because it affects the financial system’s ability to pay, for example, if there was a sudden shift in the prices of fossil fuel assets?
Mr Habahbeh: I think you are referring to whether there is a climate risk premium currently reflected in asset prices across different asset classes, be they equity or fixed income issues that are issued by fossil fuel companies. In this case, we are talking about pricing, and we are talking about a premium that is not priced in. Do you think that there will come a tipping point or a Minsky moment where all investors will start selling these assets across the board and across different markets? It is a plausible scenario. However, it can purely impact the market risk and counterparty credit risk of financial institutions that hold exposure to these assets through equity, or through fixed income investments or other types of structured products, where the underlying assets reflect a company that trades, sells or engages with any fossil fuel assets. I do not see a tipping point, or the market suddenly sensing a problem and selling. There would more likely be some gradual movement of prices to reflect the risk.
Dr Klumpes: I have done some research into the impact of behavioural biases on investment decision-making. It affects issues such as to what extent an investment manager decides to have a positive or negative sentiment towards the stock, be they underweight or overweight. We had a look at things like the hit rate and the win-to-loss ratio, which are not often understood but typically vary. We discovered that this notion of trading on gains and losses rather than risks and rewards is the classic theory by Tversky and Kahneman. This notion of prospect theory is quite relevant to the notion of operational resilience. Some of these systemic issues can cause a tipping point in decision-making behaviour. It goes from being totally rational to being much more intuition based. Therefore, it is about win-to-loss and gains and losses that are then driving the investor in a herd sense. We do find some evidence in support of that.
Mr Chanon: Recall the recent price hikes in gas and electricity because of the Ukraine war. Germany was heavily reliant on gas, and they had to build liquid gas facilities for storage to be able to reduce their reliance on Russian gas. That is also operational resilience, even without thinking about the financial modelling and behavioural aspects.
Moderator: Actuaries are well placed to take a key role in operational resilience. We are already familiar with the mathematical concepts involved. We can apply our capital estimation techniques in the field of risk quantification. We have skills in communicating at the required level and with the boards. We have a holistic understanding of the financial industry, and we can collaborate across professions. We are well placed as a profession to take a leading role in operational resilience. Operational resilience is a concept that does not just apply to the financial industry. Learning how operational resilience is tackled in other industries such as the airline or healthcare sectors can be beneficial, since they can offer more tangible examples. Thank you for giving us a practical framework with which to approach operational resilience and insightful comments on the paper today. Thank you to Robert (Chanon), Lawrence (Habahbeh) and Paul (Klumpes) and to everybody who is listening.




