1. Introduction: AI integrators as a hidden problem areaFootnote 1
The integration of new technologies into the military is common. However, the prospect of integrating AI-based systems specifically into resort-to-force (RtF) decision-making processes presents a unique set of challenges. It changes not only how war is waged, but also how it is decided upon. While the responsible use of AI on the battlefield has received thorough scholarly attention (Schraagen, Reference Schraagen2024), comparatively little attention has been given to AI systems for high-level strategic military and political RtF decision making (Erskine & Miller, Reference Erskine and Miller2024).Footnote 2 AI has the potential to contribute to RtF decisions in many ways: from low-level data analysis, such as detecting enemy vehicles in an image (otherwise done by junior workers), to medium-level aggregation tasks, such as identifying patterns in enemy movements or communications (otherwise done by trained analysts), to high-level analysis, taking in a broad range of data and providing probable scenarios, perhaps alongside recommendations on potential courses of action (otherwise done by senior analysts). AI-based systems can shape and influence the way decisions are made. The decision to go to war is usually made by civilian leaders following a specific process involving many different stakeholders, including military personnel, intelligence analysts, legal advisers and democratic control institutions. Integrating AI systems, such as decision-support systems (DSS), into this already complex decision-making process will influence how decisions to go to war are made. For treatments of decision-making processes on the resort to force and their potential transformation through the integration of AI systems, see, e.g., Deeks (Reference Deeks2024), as well as, from this Special Issue, Sienknecht (Reference Sienknecht2026) and Erskine and Davis (Reference Erskine and Davis2026). In the present article, we concentrate on one specific aspect of this complex human–machine teaming, namely the role of integrators. While integrators are indispensable for the responsible use of such systems, their significance has only recently been recognised and subjected to closer scrutiny (e.g., see on continuous integration pipelines, the Australian Government (2025) “AI technical standard,” Statement 32 [p. 90]; the Chief Digital and Artificial Intelligence Office’s (CDAO) (2023) Responsible AI [RAI] Toolkit).
Given the potentially devastating and harmful effects of AI-based systems in RtF decision making, a responsible integration of AI itself is crucial. This article asks which challenges and shortcomings emerge from the integration process for AI in RtF decisions and how we could ensure a responsible integration. To answer this research question, we explicitly focus on the constitution of the socio-technical system (Trist & Bamforth, Reference Trist and Bamforth1951) that ties together different actor groups and the technology when AI is integrated into the decision-making structures of the political and military system. While the socio-technical system approach was initially used to identify optimisation potential between the social and the technical, we will use it to identify potential pitfalls that emerge from merging social and technological systems in the specific case of AI technology in decision making on the resort to force.
In this way, we approach the integration of AI into decisions about war and peace – and thus about life and death – from a rather technical perspective by emphasising the various, sometimes conflicting, relationships between the groups of actors set into relation by the use of AI. With this focus, our article deals with one neglected aspect of the fourth of Erskine and Miller’s (Reference Erskine and Miller2024, pp. 139–40) “four complications” that accompany the introduction of AI-driven systems into RtF decision making: namely, the risk that AI could exacerbate organisational pathologies.
Socio-technical systems consist of different actor groups, organisational structures, the technology and the task the technology is fulfilling. In the present case, one can distinguish between at least three relevant actor groups of such socio-technical systems: developers, integrators and users.Footnote 3 Each group can be further differentiated according to the specific roles and job titles that individual members can have. However, we argue that it is useful to distinguish between these three groups, since these groupings address the broader function these people fulfil: developers are those involved in creating the AI in the first place, integrators are those involved in ensuring that the AI properly fits into the existing socio-technical systems of the military and government, while users are those who end up utilising the AI. Job titles for the three groups can be as follows:
• Developers: software engineers, data scientists, machine learning engineers, software architects, AI developers, privacy and security experts, systems engineers, user-interface experts, etc.
• Integrators: project managers, user-interface experts, human factors experts, privacy and security experts, lawyers, AI integrators, systems engineers, etc.
• Users: civilian leaders and politicians, military leaders, advisors (legal, ethical, strategic, policy, intelligence, etc.), etc.
The same job title can appear for different groups. For example, privacy experts would be involved in both developing an AI, and in making sure that it is used safely; lawyers might be involved in the initial development stages (to ensure the development is legal), and then involved in the integration (to ensure that the usage is legal); and user interface experts would be involved in developing the initial product, and later adjusting it to specific users and/or supporting the users in a more general way. Here we used the plural rather than the singular to indicate that these may be different individuals working as developers or integrators (i.e., it does not have to be the same privacy expert, lawyer or user interface expert), but it can – and for some people does – happen that they are part of one, two or even all three groups. Individuals who are part of more than one group are typically very experienced, and as such, the intersections between different groups can be rather small.
Despite the relevance of these three groups, research and public discourse about AI pitfalls often focus on either those who develop such AI systems or the end-users.Footnote 4 The group of integrators is usually not in focus, provided they are doing their job well. For example, the summary of the North Atlantic Treaty Organization’s (NATO) Artificial Intelligence Strategy consistently talks about developing and using AI but only mentions integration when it focuses on the interoperability between different tasks (NATO, 2021). Thus, the roles of developers and users are often overemphasised compared to the group of integrators. However, without integrators, the AI system could not be used responsibly, as we illustrate later.
Integrators require their own special attention when it comes to military applications of AI. This is now starting to emerge in military directives (Ministry of Defence, 2024) and guidelines (Devitt et al., Reference Devitt, Gan, Scholz and Bolla2025). Notably, much work remains to be done, and some of these works have received significant criticism. For example, the MoD directive has been criticised as being “a doctrine of process, not prohibition” that treats civilian deaths as a question of trade-offs and lacks meaningful human control (Overton, Reference Overton2025). Nonetheless, we see in these works promising early steps towards extending general military toolkits (e.g., CDAO, 2023), civilian principles (e.g., Organisation for Economic Co-operation and Development [OECD], 2022) and guidelines (e.g., OECD, 2022) to the responsible development, integration and usage of AI RtF systems. However, despite being based on nuances of the development, integration and application of AI systems, these initiatives do not focus on the mutual dependencies and influences between entire actor groups. Thus, below, we explicitly focus on the role of integrators within socio-technical systems.
Integrators usually have extensive technological knowledge and experience working in the military, as is also depicted by recent job advertisements in this area (Artificial Intelligence Jobs, 2025; Huntington Ingalls Industries, 2025; NATO Employment, 2025). They are responsible for integrating AI into already existing organisational structures, thus guaranteeing a trouble-free integration with other technologies and existing decision-making structures, and a responsible use of the system by the users. They aim to avoid harmful effects both in the development process, e.g., so that developers do not lack information about use cases, and in AI usage, e.g., so that users understand what information is displayed by the AI and can properly interpret and query it. In this sense, integrators take a sandwich position between the developers and the users. Moreover, the need for integrators is implicitly outlined in the Institute of Electrical and Electronics Engineers (IEEE) 7000-2021 standard, specifying responsibilities that include “to translate stakeholder values and ethical considerations into system requirements and design practices” (Devitt & Copeland, Reference Devitt and Copeland2022, p. 38). Thus integrators fulfil a central role in operating a responsible AI.Footnote 5
To analyse a responsible integration process for AI, we make a case for shifting focus to the role of the integrators in the highly sensitive context of beginning a potential war. Our group-based approach in the analysis of the integration process complements more fine-grained studies focusing on guiding individual experts (e.g., CDAO, 2023; Trusted Autonomous Systems, 2023), high-level political and end-point orientedFootnote 6 approaches (e.g., Australian Government, 2024; OECD, 2024), and legal best practices (e.g., Asia Pacific Institute for Law and Security, 2025). We build on the 10 pillars of responsible development of mathematical works (Chiodo & Müller, Reference Chiodo and Müller2025a) and adapt them to the integration process of AI systems into the military. Our definition of integrators, which is wider than that of the other existing guidelines, toolkits and directives, will be paired with a general toolkit coming from mathematics to build a holistic perspective on integration issues of responsible AI RtF. Our proposed 10 pillars of responsible AI integration are a method to address the pitfalls of end-point oriented approaches and help those working with AI to achieve responsible outcomes.
In what follows, we first discuss (in Section 2) historical examples of the integration of new technologies into the military, highlighting what distinguishes previous technologies from the integration of AI systems into military decision making. Section 3 further conceptualises the relationships between humans, AI-enabled technology and the organisational context as a socio-technical system. Section 4 details some potential challenges arising from this configuration, along three dimensions: the technology itself, the human–machine relationship and the group of integrators itself. To address the identified challenges, Section 5 adapts the recommendations for developers by Chiodo, Müller, & Sienknecht (Reference Chiodo, Müller and Sienknecht2024) into pillars of responsible AI integration (see Table 1 in Appendix A). We demonstrate the usefulness of our approach by illustrating it with a hypothetical example of an off-the-shelf LLM system being integrated into the military decision-making process. These challenges and pillars then form the basis of policy recommendations for the education and employment of integrators.
2. Historical examples of integrating new technology in the military
Integrating new technologies into military structures is no new phenomenon; they have repeatedly transformed warfare over time, making modern battlefields vastly different from those of the past. Technological development has always been driven by desires to improve one’s capabilities against adversaries, i.e., by being faster, more accurate and, ideally, more successful. Technology “has changed warfare more than any other variable [including] politics, economics, ideology, culture, strategy, tactics, leadership, philosophy, psychology” (Roland, Reference Roland2016, p. 1). However, each technological development requires careful integration to ensure that it enhances rather than complicates military processes and operations, thus making the group of integrators central for success. In the following, we illustrate the integration of new technologies and resulting challenges with the examples of aircraft carriers, cryptographic tools and publicly available messaging apps to discuss military operations.
The integration of new technology is often complicated. Consider, for example, the integration of aviation into the navy. Warships have existed for millennia, yet significant time and effort was needed to integrate aeroplanes into the navy in the form of aircraft carriers. Even though fixed-wing aircraft were invented in 1903 (National Air and Space Museum, 2024), it took until 1910 for one to take off from a stationary ship (Moore, Reference Moore1981), and until 1917 for one to land on a moving shipFootnote 7 (Iredale, Reference Iredale2015, pp. 20–21). Early aircraft carriers simply transported seaplanes, lowering and raising them from the water via cranes (Toppan, Reference Toppan2001). It took additional innovations, such as arrestor gears, to make their proper integration possible.Footnote 8 This is the work of engineers acting as integrators; making things that already exist work in harmony – in this case, ships, aircraft and rope – while dealing with many groups, from aircraft engineers to aviators to naval captains, all of whom had different demands, requirements and knowledge bases. But when done correctly, such integration changed the face of war. Aircraft carriers became the backbone of naval superpowers during the Second World War and have remained so ever since.
While the invention of aircraft carriers was a military success story, some innovations more clearly demonstrate their harmful effects when not integrated properly. Cryptography is such an example. Unlike aircraft carriers, cryptography is invisible. It is an information tool, helping create and maintain a knowledge asymmetry favourable to those using it. Cryptographic tools are used right up to the highest levels of command, enabling all parties to carry out their communications in perfect secrecy, or so they hope. And while the mathematics that underpins most types of cryptography is extremely robust, and thus at the development stage such tools are often deemed safe and secure, it is the human–computer interface that introduces most cryptographic weaknesses and vulnerabilities.
In World War II, Japanese misuse of cryptographic tools – such as reusing codewords – allowed U.S. cryptographers to uncover plans for a major naval strike on MidwayFootnote 9 (Smith, Reference Smith2001, p. 138). Better integration, including user education, might have prevented this by encouraging varied codewords. This contributed to Admiral Yamamoto’s defeat in what has been referred to as “one of the most consequential naval engagements in world history” (Symonds, Reference Symonds2018, p. 293). Twelve months later, over-reliance on cryptographic security meant the Japanese transmitted the entire escort itinerary of a flight transporting Yamamoto, which the Americans decrypted, enabling them to locate, intercept and shoot down the transport, killing Yamamoto (McNaughton, Reference McNaughton2006, p. 185). Here, better integration would have helped users understand and practise the principle of data minimisation: transmit as little as possible. Inadequate integration of cryptography and its protocols cost Yamamoto the Battle of Midway, and, a year later, his life.
Getting a plane onto a moving ship is surprisingly difficult; getting a secure message onto one is even harder. In both scenarios, integrators played a decisive role: in scenario one for the better, and in scenario two for the worse. But neither the good nor the bad integration of new technology is confined to distant history. Recently, the ineffective use of the Signal messaging app by high-level members of the US government made worldwide news when a Signal chat group was set up to carry out RtF discussions about an impending military operation in Yemen, and a journalist was inadvertently added to it. He was subsequently privy to all such discussion, in real time, which he later reported on (Goldberg, Reference Goldberg2025; Goldberg & Harris, Reference Goldberg and Harris2025).
The issue arose because one government official had an incorrect phone number stored on their iPhone through an (incorrect) “contact suggestion” update (Lowell, Reference Lowell2025). Because the phone was also used for numerous other purposes, updated contact numbers automatically and had no mechanism in place to verify that numbers actually belonged to the person named in the phone book, this was an incident waiting to happen.
While these deliberations should not have been conducted over Signal in the first place, it is an example of how badly things can go when no integrators are present, and when users simply take existing technologies out of the box and try to do the integration themselves. Here, there was no failure of the encryption itself, and no hacking involved. Rather, a combination of unfortunate events and oversights occurred; things that well-trained integrators would have no doubt flagged and rectified beforehand if they had been involved in the decision to use a publicly available app on an iPhone to communicate about upcoming military operations.
Obviously, the constitution of socio-technical systems bears pitfalls and challenges, which arise most often precisely at the human–machine interface, where such systems tend to fail. This highlights the role and responsibility of the group of integrators within the broader setting. So what can we learn from these examples about the integration of AI-based systems into RtF decision making?
3. The integration of AI-enabled systems into the military and the constitution of a socio-technical system
One might argue that integrating AI is almost identical to integrating cryptography. This is not true. One key difference is that users of cryptography know what they want (transmit messages securely) and failure is falsifiable (enemies read the message and act on it); while users of AI in RtF decision making often do not know a priori precisely what they want the tool to do, beyond “give the right advice or analysis,” and it is unfalsifiable when it fails; “what could have been” when reviewing a decision to start a war is impossible to know. Think about basing a pre-emptive strike on the AI-interpretation of adversarial troop movements; this is near-impossible to falsify. At all points, information will never be perfect, so neither will decisions. Hence, the goal for integrators in AI-RtF is nowhere near as well-defined as in cryptography. Perfection, as understood in the cryptographic setting, is an elusive target here. So what should integrators be aiming for, and how should they hope to achieve it?
Integrating AI in RtF decision making presents genuinely new challenges not faced by those integrating other (military) technologies. In contrast to technology used to achieve a specified goal, AI integration here leads to a different composition of relationships between actors and technology, because AI systems act (semi-)independently (e.g., basing recommendations or predictions on independently identified patterns in vast datasets).Footnote 10 Such systems may heavily influence top-level decisions on whether to start a war, and even if AI contributes to a well-reasoned RtF decision, there will undoubtedly be human injuries and loss of life on both sides; military and civilian. The decision to go to war is never bloodless, even if it is “right” according to just war theory (Moseley, Reference Moseley, Fieser and Dowdenn.d.). And the integration of such systems varies, depending on their degree of automation; they are either intended to replace a human decision maker, or assist in decision making (see Vold, Reference Vold2026).
Many existing frameworks for AI use in the military, such as The Political Declaration on Responsible Military Use of Artificial Intelligence and Autonomy (US Department of State, 2024) and REAIM’s (2024) Blueprint for Action, do not draw any distinction between AI being used in a RtF decision-making capacity, and AI being used within a war context. Indeed, frameworks such as those just mentioned seem to implicitly refer to the latter only, thus leaving AI in RtF as an under-addressed issue; a potential “blind spot” of policymakers (see for a similar argument, Sienknecht, Reference Sienknecht2026). In addition, such frameworks are very end-point-oriented, giving descriptions of outcomes and end objectives for the use of AI, but without detailing how those developing, integrating, or using such AI should go about achieving such outcomes.
However, integrating weapons technologies does not happen in a normative vacuum. International guidelines in the normative environment of socio-technical systems constrain its constitution (see Fig. 1). This also applies to AI. In the 1977 Additional Protocol to the Geneva Conventions, Article 36 stipulates that all new weapons, means and methods of warfare are subject to review under the Protocol and wider international law. Arguably, the development, integration and use of AI RtF decision systems must follow the same rules as other weapons, means and methods of warfare, requiring signatories to determine if the “study, development, acquisition or adoption of a new weapon, means or method of warfare […] [and] its employment would, in some or all circumstances, be prohibited by this Protocol or by any other [applicable] rule of international law.” This is tricky for AI RtF decision systems, as existing protocols and international law have not kept pace with the technology. Whether their employment would, in some or all circumstances, be prohibited cannot be determined conclusively, as the functionality and operations of AI systems cannot necessarily be (pre-)determined. The fact that the technology is so advanced, yet so unaccountable (e.g., Erskine, Reference Erskine2024; Sienknecht, Reference Sienknecht2024), poses particular challenges, since its effects cannot be fully predicted.

Figure 1. The socio-technical system and its normative environment.
Article 51 of the UN Charter stresses the “inherent right of individual or collective self-defence,” while lacking further guidance on what constitutes self-defence, including the status of controversial, pre-emptive cases. Pre-emptive strikes already raise substantial questions when the decisions are made by humans, as is evidenced by the long-standing discussions of the Caroline affair in international law (Forcese, Reference Forcese2018). Besides questions of necessity, proportionality is a key concern regarding pre-emptive strikes in international law. Addressing these does not just require a rational reasoning model familiar with law and politics, but one that can readily incorporate ethical concerns on a more holistic level, satisfying complex AI alignment requirements that likely go beyond localised standards in battlefield decision making. Additionally, as of today, it is entirely unclear if and how the involvement of AI in RtF decision making changes existing reporting duties to and by the UN Security Council. At present, how to determine whether any RtF AI system can meet these existing standards, and whether specific AI decisions have actually met them, are open questions.
3.1. The constitution of a new socio-technical system
Integrating new technologies into organisational settings restructures existing, and creates new, actor-technology relations via the tasks the technology is supposed to fulfil. It creates a (de facto new) socio-technical system (Emery & Trist, Reference Emery, Trist and Emery1969), incorporating both existing elements from the “old” organisational and system setting and aspects which are unique to the new technology, in complex ways (see Fig. 1). Research on socio-technical systems originated in the 1950s from the work of Eric L. Trist on the UK coal mining industry. It focuses on (1) integrating new technologies into social systems, (2) human–machine interaction and (3) restructuring of organisational structures (Karafyllis, Reference Karafyllis, Liggieri and Müller2019, p. 300). But how is such a socio-technical system constituted through the process of integrating AI into military decision-making structures?
Context-specific factors make each integration process, and thus each socio-technical system, unique. Nonetheless, we can sketch a common integration process which might help us better understand the constitution of such socio-technical systems. Each integration of RtF AI, though unique, follows a common trajectory. It begins with discussions between political and military stakeholders about technological solutions to specific problems. Ideally, integrators are involved early to manage expectations (e.g., feasibility of deploying AI in resource-limited locations) and provide developers with essential background knowledge for creating appropriate models. During this initialisation phase, developers, users and integrators collaborate to define the AI system’s specifications. So, in this first step, all three actor groups are set in relation to each other via specifying the details of the AI system to be built.
In the development and data collection phase, close cooperation between users, integrators, developers and Intelligence Services ensures the model is trained on relevant, adequate and well-curated data. Given the high stakes of RtF decisions, the model must undergo rigorous testing before deployment, incorporating feedback and updates for further refinement. Once validated, the model gets integrated into the decision-making structure. Integrators are responsible for this and must ensure it complements other intelligence sources and (ideally) aligns with existing processes. This step is crucial in determining whether AI can be used without creating flaws in the decision-making process and whether AI information complements existing information obtained by analysts and intelligence. At this point, users are trained in the functionalities of the AI system and how to interpret the results, and user feedback helps refine the system further.
Once the AI is successfully (technically) integrated, the socio-technical system operates (semi-)independently, necessitating ongoing maintenance, updates and emergency strategies for potential system shutdowns. Such strategies are vital to address flaws or prevent cascading failures, especially in security-critical contexts where the technology interacts with multiple processes.
This broad description of the integration process underlines that the socio-technical system’s constitution starts long before the AI is ready for use. From planning to development, integration and usage, the socio-technical system changes and reconfigures itself. However, this description is a best-practice example of AI integration leading to a robust socio-technical system. In practice, research has revealed several problems in the composition of the socio-technical system regarding the design, usage and organisational restructuring (Pasmore et al., Reference Pasmore, Winby, Albers Mohrman and Vanasse2019, p. 68). One decisive problem is the interaction of social and technical factors: focusing on only one can lead to unpredictable and detrimental relationships within the system (Walker et al., Reference Walker, Stanton, Salmon and Jenkins2008). The “interaction of social and technical factors creates the conditions for successful (or unsuccessful) system performance” (Walker et al., Reference Walker, Stanton, Salmon and Jenkins2008, p. 480). In the present case, it is both about interaction between the social groups involved, and between the groups and the machine. Given the complexity of AI, the different logics of the involved groups, and the necessity to tie them together, we can expect multiple challenges with serious consequences in the context of potential war.
4. Challenges arising in AI-based socio-technical systems
Integrating AI systems into high-level RtF decision making bears several challenges that emerge from the specific context in which the AI system is embedded. While military decision processes on the conduct of war are by now well-understood, well-rehearsed and continuously improved (see Center for Army Lessons Learned, 2024), decision-making processes on the RtF, by contrast, are riddled with more known and unknown unknowns, including: (1) each decision to go to war typically involves a new enemy, (2) there are fewer historical decisions available to train the AI, i.e., there is only one decision to go to war vs. many decisions to use force in each war and (3) historical data may be inappropriate in the new context. In short: developers have to build an AI with insufficient available data, and integrators have to put it into a decision chain with a static structure but whose context is dynamic and unique each time. In military lingo: deciding on a mission is different from executing said mission. This presents new, unique challenges for everyone involved, which may not be addressable by the two dominant trends of data- and model-centric AI (compare Zha et al., Reference Zha, Bhat, Lai, Yang and Hu2023). For the former, we lack sufficient data. For the latter, we lack (among other things) sufficient understanding to (adequately) mathematically model the involved decision spaces and thought processes, thus limiting approaches that directly integrate human knowledge into the AI training instead of relying on historical data sets (see Deng et al., Reference Deng, Ji, Rainey, Zhang and Lu2020 for a discussion on integrating human knowledge into AI). In the following, we zoom in on the challenges of integrating AI into RtF decision making arising (a) from the technology itself, (b) from the role of the integrators and (c) from the human–machine interaction.
4.1. Challenges arising from the technology itself
AI, especially so-called “third wave AI technology,” differs from technologies that are mere tools in the hands of humans. The United States Government Accountability Office (2018) describes three historical waves of AI development: (1) expert systems using logical reasoning; (2) classical statistical learning and (3) “contextual adaptation,” i.e., building on the previous waves to develop AI systems capable of “contextual sophistication, abstraction, and explanation” (p. 18). This third wave is ongoing and encompasses current attempts to build and integrate AI for RtF decision making.
Integrating third-wave AI is the most difficult. First-wave AI systems were often easier to explain and interpret due to their limited reasoning capabilities and use of hard-coded logic. Second-wave AI systems were still rather limited in scope, requiring limited contextual understanding (e.g., translation of documents or identification of targets). Integrating third-wave AI systems requires extensive contextual domain knowledge on multiple fronts. Thus, those building such systems have been identified as potential bottlenecks for future development and improvements (Schuering & Schmid, Reference Schuering and Schmid2024, p. 251), making their integration more demanding.
The AI-based technology to be implemented in military decision-making processes conditions the range of possibilities within such processes themselves. The technological (Winner, Reference Winner1980) and mathematical (Müller & Chiodo, Reference Müller and Chiodo2023) aspects of these AI systems thus have politics. One must consider this to responsibly integrate AI into existing and new organisational structures. These factors range from the AI’s lack of creativity, as it bases its decisions on the past, to the lack of falsifiability of its decisions.
In socio-technical systems, it is necessary to consider that “optimisation of either socio, or far more commonly the technical, tends to increase not only the quantity of unpredictable, ‘un-designed,’ non-linear relationships, but those relationships that are injurious to the system’s performance” (Walker et al., Reference Walker, Stanton, Salmon and Jenkins2008, p. 480). Hence, successful integration of AI into RtF decision-making processes requires simultaneous optimisation of humans, machines and organisational structures (for further discussion of military decision-making systems as complex adaptive systems [CAS], see Osoba, Reference Osoba2024, Reference Osoba2026). At present it seems to us that the military is enjoying a boost in the technological dimension (with the advancement of AI), with a slower boost in the socio-component (insufficient training of those working with AI), leaving the human–machine teaming rather unbalanced: powerful AI-enabled systems, and comparatively poorly educated humans. Similar examples arose in the intelligence branches post-9/11, where new demands on analysts, and their increasing numbers, required new forms of training and led to an (ongoing) debate about their professionalisation (Gentry, Reference Gentry2016; Marrin, Reference Marrin2009).
4.2. Challenges arising from the role of integrators
Within this demanding socio-technological context, the importance of integrators further increases. However, they are often ignored as relevant actors, identified neither as those primarily building the technology nor as those using it.Footnote 11 As long as integrators do their job well, they are invisible to everyone.
Integrators are generally “literate in data terminology and processes, especially in the military context, and use this knowledge alongside their operational intelligence experience to bridge the gap between industry or research institutions, and the Field Army.”Footnote 12 In this sense, integrators link two largely disparate groups: (1) AI developers: technically trained but often with little, if any, training or experience in politics, international relations, or military activity and (2) AI users: mainly trained in the humanities, social sciences or law, occupying a variety of positions, including politicians, political and intelligence analysts, and high-ranking military personnel, often with little, if any, technical training.Footnote 13 Senior military officials seldom understand deeper technical aspects of AI, and developers often lack expertise in security studies or the normative and legal restrictions on RtF. Yet integrators must somehow understand both fields of expertise, and thus bear a great burden in integrating AI responsibly. They require a deep understanding of the activities, perspectives, and norms of both developers and users, who may have little understanding of each other. Given their central role, integrators should be involved from the outset, even before the model is built, to facilitate an uncomplicated integration. However, the positions of integrators in the wider work context and the organisation’s structures may lead to power imbalances between the three groups. This asymmetry in hierarchy can, among other things, lead to employee silence during the initialisation, integration, usage and maintenance of the AI. This is extremely detrimental for integrators of high-stakes AI systems, as they must be able to address problems, worries and risks openly with users, their team and developers.
Furthermore, the limited number of users, alongside potential power imbalances between them and the integrators, can severely limit available testing options. Classical testing regimes, such as A/B testing user interfaces, may be largely ineffective when there are only a handful of powerful potential users at any time and a lack of continuity between periods of office.Footnote 14 Additionally, if the integrators disagree with the ideology and politics of users like politicians or high-ranking military officials, such users could override current or past integrators’ decisions, or worse, fire those employed as external consultants. Ultimately, integrators’ expertise may be trumped by user power and political will. Examples of this exist, such as the purchase of Abrams tanks forced through by the US Congress when the Army would have preferred different tools for its arsenal (see Sisk, Reference Sisk2014).
These power imbalances and (workplace) politics can negatively affect the options available to integrators, and thus their morale and willingness to “go the extra mile” when integrating AI responsibly. Morale is a well-known, essential aspect in the military and for software developers (e.g., Besker, Ghanbari, Martini & Bosch, Reference Besker, Ghanbari, Martini and Bosch2020; Whitaker, Reference Whitaker1997). However, it is also crucial for those who must perform the difficult task of integrating AI in stressful, adversarial circumstances. The saying that “from an engineer’s perspective the users are the problem, and from a user’s perspective the engineers are the problem” manifests in unique ways when the users have more (institutional) power.
Likewise, the specific expertise needed to successfully integrate AI at the nexus of the political and military system might increase the power of the integrator. As in other technical areas, integrators’ (political) bargaining power mostly comes from their position as a bottleneck in the creation of socio-technical systems; without integrators, the system cannot be implemented properly. However, it is a largely open question whether and how they could (and should) use this to change the course of RtF decision systems.
4.3. Challenges arising from human–machine interaction
Due to the described sandwich position of integrators between developers and users, integrators must look in two separate directions: they must understand the users and their (mis)behaviour, but also their suppliers (i.e., the AI developers), and align the expectations of the users with the technical feasibilities and capabilities. A developer-user mismatch jeopardises safe AI integration. Users may try to game the AI or misappropriate it, even if developed responsibly with limited use cases. Similarly, developers may have overlooked certain aspects of the problem, user expectations, or limitations in what users understand or can do.
For example, users may prefer simple AI inputs and outputs, as demonstrated in assisted planning decision systems by the US military in 2003: “[users] consistently looked for and asked for one simple way to enter the statement, and shied away from the rich, flexible, but necessarily complex approach the tool offered” (Forbus et al., Reference Forbus, Kott and Rasch2003, p. 24). Such users may prefer gist-based reasoning over analytical verbatim-based reasoning (compare Reyna, Reference Reyna2004), thereby introducing biases into risk perception and decision making. As developers are looking for the “political gist” necessary to develop the system, users are looking for the “technical gist” necessary to interpret the results. Thus, Bosch and Bronkhorst (Reference Bosch and Bronkhorst2018, p. 1) suggest that AI for military decision making must “adapt itself dynamically to the decision maker, by taking into account his objectives, preferences and track record (e.g., susceptibility to bias),” necessitating early user involvement in development and integration.
This pressures integrators, who must balance user preferences, technical options and contextual necessities. For AI, differing preferences can be particularly tricky, ranging from simple interface preferences and information presentation to higher-level epistemic issues of differing assumptions about the representative nature and rationality of AI, its outputs and its mathematical formulation, including (social) justice issues. Across wider mathematics, and specifically AI, we see a chasm between those viewing mathematics and AI as neutral and pure, and those understanding its culture- and context-dependency (see also Müller et al., Reference Müller, Chiodo and Franklin2022; Rittberg, Reference Rittberg and Ernest2024); users, integrators and developers may well take differing positions here, depending on their training and experience with mathematics and technology. They may not necessarily be aware of these biases and the standpoints of others.
Balancing different needs is especially difficult for third-wave AI systems, as the technical options available for development are still comparatively limited, and the requirements, in particular those concerning context and analysis, are ever-increasing. Further complications arise as existing research does not always address practical limitations, e.g., research on explainable AI often focuses on users with good technical knowledge (Saeed & Omlin, Reference Saeed and Omlin2023, p. 7). This assumption does not necessarily hold, even for well-educated decision makers as users.
These are some core challenges emerging from integrating AI into RtF decision making and the constitution of a socio-technical system.Footnote 15 In the next section, we outline how such integration can be carried out more responsibly, by introducing the “10 pillars of responsible integration.”
5. How to address challenges in the integration process? The 10 pillars of responsible AI integration
Having discussed challenges arising from integrating AI systems into RtF decision making, we now give “10 pillars of responsible AI integration” that address these challenges. While there are several approaches that address difficulties in integrating AI technology, many of these focus on decision making on the battlefield and/or distinguish mainly between developers and users. For example, Devitt and Copeland’s (Reference Devitt and Copeland2022) overview of “Australia’s Approach to AI Governance in Security & Defence” takes the same end-point oriented approach mentioned earlier, as illustrated by their discussion of Australia’s AI Ethics Principles (p. 13) or the OECD AI principles (p. 20), and their emphasis on AI in the conduct of war, overlooking AI in RtF roles. Despite these differences, their findings provide substantial support for the observations and suggestions made in this article. Their overview shows that public attitudes in Australia are broadly supportive of AI systems, but much less trusting of AI decision making (where RtF primarily lies), showing a distinction between the two in terms of public faith (Devitt & Copeland, Reference Devitt and Copeland2022, p. 3). Devitt and Copeland (Reference Devitt and Copeland2022, p. 15) argue that “some have questioned the value of AU-EP [Australia’s ethics principles] without them being embedded in policy, practise and accountability mechanisms.” Purely end-point oriented frameworks can suffer from such criticism and ineffectiveness, and, as outlined in Müller et al. (Reference Müller, Chiodo and Franklin2022), these frameworks need to be properly embedded in the processes and disciplinary culture. Additional, specific professional needs for education and training in AI ethics principles (Devitt & Copeland, Reference Devitt and Copeland2022, pp. 14 & 30) are also identified, as are markedly different “responsibilities of AI purchasers and AI developers” (p. 14), and the need for clearer documentation guidelines to manage ethical risks. With our proposed 10 pillars, we break down these requirements into a level of detail and structure more appropriate for integration work. The pillars span from initialisation concerns, through data handling and feedback loops, to emergency response strategies, and thus address the full life cycle of AI integration (see Table 1 in Appendix A and Fig. 2 for more details):
1. Initialisation.
2. Diversity and perspectives.
3. Handling data and development.
4. Data manipulation and output inference.
5. Interpretation of the problem and its technical solution.
6. Communication and documentation.
7. Falsifiability and feedback loops.
8. Explainability and safe AI.
9. Technical artefacts and processes have politics.
10. Emergency response strategies.
These 10 pillars draw on established tech-ethics frameworks, follow the AI life cycle, build on decades of experience working with mathematicians and other technical experts, form part of a long project on ethics in mathematics, and have been tried and tested in industry, research and teaching (see Chiodo, Müller, & Sienknecht, Reference Chiodo, Müller and Sienknecht2024; Chiodo & Müller, Reference Chiodo and Müller2025a, pp. 11–13). As happens for all mathematical work (Chiodo & Müller, Reference Chiodo and Müller2025a), the early and late pillars are less technical than the middle, reflecting the multidimensional challenges faced by integrators discussed earlier. Figure 2 illustrates the relationship between the constitution of the socio-technical system and the 10 pillars. The first pillars (1–2) and last pillars (9–10) can be more closely associated with the social system, while the middle pillars (3–8) can be more closely associated with the technical system. At the same time, this does not imply that the pillars can be exclusively assigned to one system, since – as previously discussed – the socio and technical system mutually constitute each other and lead to a new system with its own configurations. In particular, the intermediate pillars such as 3 and 6 specifically and unavoidably straddle both the social and the technical systems. Integrators must be experts of socio-technical systems, so their pillars go beyond technical aspects present for developers (Chiodo, Müller, & Sienknecht, Reference Chiodo, Müller and Sienknecht2024) and general aspects present in all mathematical work (Chiodo & Müller, Reference Chiodo and Müller2025a); they are also deeply connected with the (uniquely powerful) users.

Figure 2. The socio-technical system of integrating AI into the military, and the 10 pillars of responsible AI integration.
The proposed 10 pillars provide a natural timeline to deal with ethical issues in technology, particularly concerning its development, integration and usage. With regard to human rights, we see a marked difference between decision making within the context of war and RtF AI. The administrative perspective (Devitt & Copeland, Reference Devitt and Copeland2022, p. 18) is sensible when thinking about AI use within the military for internal processes and perhaps more broadly to guide military operations in conflict zones, but it is difficult to carry over in any meaningful way to RtF decision-making processes. Indeed, one can view the distinction between RtF AI and AI used in decisions within the context of war as the distinction between Pillar 1 (Should we be doing this?) and Pillars 2–10 (How do we do this in the best possible way?). This is mirrored in the responsibility outlined in existing leadership doctrines that give perfectly sensible directives for leaders within a war about using AI, such as the “responsibility as a leader is to ensure the pursuit of your goals is ethical and lawful” (Devitt & Copeland, Reference Devitt and Copeland2022, p. 26), but which are very difficult to interpret for RtF decision making, which is more focused on setting such goals; especially at a time when existing international treaties are repeatedly called into question.
Similar but more specialised themes appear in guidelines on lethal autonomous weapons (see also Devitt & Copeland, Reference Devitt and Copeland2022, p. 34, which sets out a 9-step process). However, the most technical pillars (3–5) are often merged into one step there (Devitt & Copeland, Reference Devitt and Copeland2022, p. 34, step 2), showing that a deeper technical understanding is required to see and understand the ethical issues arising in technical aspects of AI. Depending on one’s perspective, pillars close to the boundary between the technical and the social system, or between the social system and the normative environment, can easily be overlooked, highlighting the need for integrators who can work at these boundaries (as also evidenced by the relative competencies displayed in Table 1 in Appendix A). As mentioned earlier, the need for clearly articulated processes for integrators is also implicitly outlined in the IEEE 7000-2021 standard. Our proposed pillars provide an action-guided perspective on these processes for integrators, liaising between users and developers.
As a process-oriented safety model from mathematics which integrates humanity-centred principles, the 10 pillars of the manifesto are ideally placed to assist integrators with their work. Not only is AI a clear sub-branch at the intersection of mathematics and computer science, and thus follows all the standard steps that any general mathematical process follows, but the generality of the pillars also allows them to be used in instances where no deep or longstanding understanding of safety and responsibility has been developed (e.g., when technical experts have to learn about such issues quickly). They are explicitly designed to be a first step for (mathematical and technical) experts. Of course, specific guidance for integrators needs to be derived and tailored from this, as would be the case for any (new) mathematical area. However, the pillars provide a clear starting point for such future work, building on the experience that developers and integrators may need different guidelines than non-technical workers. Perhaps most importantly, the fact that the pillars were initially derived with general mathematical work in mind (Chiodo & Müller, Reference Chiodo and Müller2025a) would make them particularly useful for developers, integrators and users, as they provide these groups with an actionable perspective that is in a sense technology-independent and focused on higher-level abstract reasoning, extending other holistic perspectives of systems control (see also Group of Governmental Experts, 2019, p. 6). To demonstrate the usefulness of our approach, we apply it to a hypothetical example of integrating off-the-shelf LLMs in decision-making processes on the resort to force.
5.1. Applying the 10 pillars to the integration of off-the-shelf LLMs in decision-making processes on the resort to force
Since AI systems will play an increased role in producing knowledge, reasoning and analytical judgement, their responsible integration becomes a crucial challenge. In the following, we will focus on a hypothetical example in which a government aims to integrate a generative AI, based on LLMs, to produce texts, provide summaries, analyses and recommendations, and to translate data, which will thus impact decisions regarding the resort to force (e.g., Logan, Reference Logan2024, p. 223; Grand-Clément, Reference Grand-Clément2023). However, despite AI systems’ obvious advantages in terms of speed and capability to analyse large amounts of data, there are several risks associated with relying on such systems. One such risk, which we will focus on below, emerges from states adapting off-the-shelf foundation models. This scenario is not unlikely, as constraints on resources and time may compel government entities to adopt off-the-shelf models. These models are widely accessible, often low-cost or free, and operable on standard hardware without specialised training. As also displayed in the Signal case, this accessibility facilitates direct adoption by end-users, bypassing traditional IT integration and oversight mechanisms. However, such direct deployment introduces a complex set of risks and governance challenges, far exceeding those posed by simpler applications. By carefully applying our 10 pillars, we show some essential characteristics of such a responsible integration process.
1. Initialisation: The first question an integrator should ask is whether it is even possible to responsibly integrate an existing LLM for the purpose of RtF decision making. At present, such LLMs are not designed for high-stakes AI RtF contexts. Early studies have shown that existing LLMs can display (sudden) escalatory behaviour exceeding that of humans (Rivera et al., Reference Rivera, Mukobi, Reuel, Lamparth, Smith and Schneider2024, p. 837), and make their users overly reliant on their outputs (Rivera et al., Reference Rivera, Mukobi, Reuel, Lamparth, Smith and Schneider2024, p. 837; Chiodo et al., Reference Chiodo, Müller, Siewert, Wetherall, Yasmine and Burden2025, p. 6), effectively transferring power from elected governments to private enterprises (Rivera et al., Reference Rivera, Mukobi, Reuel, Lamparth, Smith and Schneider2024, p. 836). An integrator would flag such issues early on, and would make recommendations as to the overall feasibility of integrating such tools safely, as well as consider what other options are available, including whether a dedicated in-house LLM might be needed, or whether the current (non-LLM) decision-making processes are actually the best available options.
2. Diversity and perspectives: What does one need to know about who made the LLM and the people who will use it, and what are its limitations and biases in terms of RtF decision making? These LLMs are not usually trained or adjusted to specific cultural, linguistic and political aspects of the nation, which can lead to misleading writing styles or data presentations for users. Additionally, the LLM may not have been developed specifically for surveilling and analysing (media) data from a particular country, which increases the risk of misinterpreting news data. Users might not be aware of these shortcomings. And users might not be aware that the “base” pre-safety-tuningFootnote 16 models are fundamentally less safe and less representative than safety-tuned models in this RtF context (Rivera et al., Reference Rivera, Mukobi, Reuel, Lamparth, Smith and Schneider2024, p. 842), and it is “seemingly easy to revert safety-aligned models to their base state” (Rivera et al., Reference Rivera, Mukobi, Reuel, Lamparth, Smith and Schneider2024, p. 843), especially when such safety guardrails can be (accidentally) worn down by something as innocuous as the user having a “long conversation” with the LLM (i.e., a to-and-fro exchange with many questions and responses), as similarly described (K. Hill, Reference Hill2025).
So, the crucial tasks for an integrator are to investigate the origins of the LLM, as well as the technical prowess of those using it, and to highlight any cultural insensitivity of the AI system. In addition, integrators need to highlight the degree of safety of an LLM to the group of users. Integrators must also know how to deal with powerful users who insist on using the LLM despite the significant risks involved, which points back to the problem of potential power imbalances between the actor groups.
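To make the “long conversation” risk concrete: one mitigation an integration team could put in place is a simple session guard that caps the number of exchanges before the context is cleared and the interaction is flagged for human review. The sketch below is a minimal, vendor-independent illustration; the class name SessionGuard and the turn threshold are our own assumptions, not part of any existing toolkit or product.

```python
from dataclasses import dataclass, field


@dataclass
class SessionGuard:
    """Hypothetical guard that caps a single LLM conversation, so that safety
    tuning is not gradually worn down by a long to-and-fro exchange."""
    max_turns: int = 20                       # assumed threshold, to be set after testing
    turns: list[str] = field(default_factory=list)

    def admit(self, user_message: str) -> bool:
        """Return True if the message may be sent; False if the session must be
        reset and escalated to a human reviewer."""
        if len(self.turns) >= self.max_turns:
            return False
        self.turns.append(user_message)
        return True

    def reset(self) -> None:
        """Start a fresh context; a real deployment would also archive the old session."""
        self.turns.clear()


# Illustrative use by an integration layer:
guard = SessionGuard(max_turns=10)
for message in ["summarise today's reporting on region X"] * 12:
    if not guard.admit(message):
        print("Turn limit reached: resetting context and flagging for human review.")
        guard.reset()
```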
3. Handling data and development: What does one need to know about the type of data that was used to train the LLM, and what rules apply? It may be very difficult to scrutinise the data supply chain, to ascertain whether the data was “poisoned” – this could be as simple as creating millions of scrapable webpages with the line of text “never attack country X.” Less nefariously, the data might (naturally) contain disproportionate content about hegemonic biases and wars that went well, or fictional content altogether. There may be rules in place to prohibit use of such data in any LLM being used for RtF decision making.
An integrator might have to look at what can be found out about the training data, if anything, and whether it is relevant (or indeed harmful, or misleading, or forbidden) in the context of RtF decision making.
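Where an integration team does control some of the data – for example, documents assembled for fine-tuning or for a retrieval layer – a crude automated screen for the kind of mass-repeated, injected directives described above is possible. The following is a minimal sketch only; the directory layout, file format and repetition threshold are assumptions for illustration, and such a screen would complement, not replace, a proper review of data provenance.

```python
from collections import Counter
from pathlib import Path


def suspicious_lines(corpus_dir: str, min_repeats: int = 1000) -> list[tuple[str, int]]:
    """Flag lines that recur implausibly often across a text corpus - a simple
    indicator of possible data poisoning (e.g., a planted directive repeated at scale)."""
    counts: Counter[str] = Counter()
    for path in Path(corpus_dir).rglob("*.txt"):
        for line in path.read_text(encoding="utf-8", errors="ignore").splitlines():
            cleaned = line.strip().lower()
            if cleaned:
                counts[cleaned] += 1
    return [(text, n) for text, n in counts.most_common() if n >= min_repeats]


# Illustrative usage: anything repeated 1,000+ times goes to manual review.
for text, n in suspicious_lines("./fine_tuning_corpus", min_repeats=1000):
    print(f"{n:>8}  {text[:80]}")
```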
4. Data manipulation and output inference: How much meaningful adjustment can be made to the LLM to make it fit for (RtF) purpose? With off-the-shelf models, one cannot work with the developer(s) to redevelop the core model. Instead, only comparatively small adjustments – fine tuningFootnote 17 – can be made; how such tuning is done, and what data is used to do it, can be the difference between an effective and totally ineffective LLM. Moreover, users might not know that the LLM being used is fundamentally constrained by the core model, and that biases in it (e.g., unnecessary escalatory behaviour) are hard to remove and will require substantive human oversight for the socio-technical system (also described by Chiodo et al., Reference Chiodo, Müller, Siewert, Wetherall, Yasmine and Burden2025).
An integrator would understand that any LLM output is not an authoritative analysis, and would put additional checks in place, such as a robust “human-in-the-loop” system, to mitigate this and to ensure the proper handling of system outputs (a minimal sketch of such a gate is given below). However, integrators should also be mindful not to construct a system that is so restrictive or so complicated that it runs the risk of being overridden or bypassed by a sufficiently powerful user who “does not like it.”
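The sketch below shows, at the software level only, what such a human-in-the-loop gate might look like: no LLM recommendation is forwarded without explicit human approval. The callables `query_llm` and `request_human_review`, and the fields they return, are hypothetical placeholders rather than components of any real system discussed in this article.

```python
# Hypothetical sketch of a "human-in-the-loop" gate: no LLM recommendation is
# forwarded unless a designated human reviewer has explicitly approved it.
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class ReviewedOutput:
    raw_output: str   # what the LLM produced
    reviewer: str     # who approved it
    comments: str     # the reviewer's notes


def gated_recommendation(prompt: str,
                         query_llm: Callable[[str], str],
                         request_human_review: Callable[[str], Tuple[bool, str, str]]
                         ) -> Optional[ReviewedOutput]:
    """Return an LLM recommendation only if a human reviewer approves it."""
    raw = query_llm(prompt)                                    # placeholder call to the deployed model
    approved, reviewer, comments = request_human_review(raw)   # blocks until a human responds
    if not approved:
        return None                                            # rejected outputs are never forwarded
    return ReviewedOutput(raw, reviewer, comments)
```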
5. Interpretation of the problem and its technical solution: What balance needs to be struck between relevance, reliability and sufficient ease of use to make such LLMs deployable for RtF decision making? LLMs are optimised for content generation, and tuned to create outputs that are as pleasing and convincing as possible, potentially exacerbating confirmation bias in users (Chiodo et al., Reference Chiodo, Müller, Siewert, Wetherall, Yasmine and Burden2025, p. 6). Their “ethics” is not specialised to international law or the morality of armed conflict (Rivera et al., Reference Rivera, Mukobi, Reuel, Lamparth, Smith and Schneider2024, p. 838), and so their suggestions might be misaligned with human notions of proportionality, necessity, strategy, legality, ethics and the overall national interest.
An integrator would need to understand, through rigorous investigation, how the LLM currently behaves, and establish whether fine tuning on specialised (and potentially hidden or secret) data sources, alongside sufficient other guardrails in the implementation, can overcome these shortcomings in performance. At the same time, integrators might try to preserve most usability aspects of the LLM, given its ease of use and the temptation for users to simply discard the integrated version and turn to the freely available one.
6. Communication and documentation: What operational, integration and output aspects of the LLM need to be recorded and communicated, both internally and to users? Off-the-shelf LLMs come with practically no documentation from their developers, meaning that all understanding and documentation of how they operate needs to be built from scratch. This is in stark contrast to the documentation requirements for the safety and integrity of critical military systems, and to the legal obligations for explaining an RtF decision. And there might be “fundamental risks” in the LLMs, such as a built-in, unavoidable propensity to be overly confident and to prioritise giving wrong outputs over a simple “I don’t know” response (Kalai et al., Reference Kalai, Nachum, Vempala and Zhang2025). Knowledge of such limitations might be restricted to recent research output that has not yet reached mainstream coverage, such as Kalai et al. (Reference Kalai, Nachum, Vempala and Zhang2025).
An integrator might need to explicitly communicate details about what has been done with the LLM (fine tuning, additional guardrails and safety mechanisms, etc.), so that users know what they are working with and some of its limitations. Following from this, users may also need clear safety training, such as outlining scenarios where the LLM cannot perform properly (e.g., during nuclear escalation situations), or its general limitations and propensity to mislead rather than admit it has no answer. Integrators may also need to set up thorough logging systems to record what went into, and what came out of, the LLM during operation (a minimal sketch of such a logging wrapper is given below).
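A bare-bones illustration of such a logging wrapper follows, assuming only a generic `query_llm(prompt)` callable; the record fields and file-based storage are simplifications of what a hardened, access-controlled audit system would actually require.

```python
# Hypothetical sketch of an audit log for LLM use: every exchange is recorded
# with a timestamp, user, model version and a hash of the record contents
# (so later alteration of an individual entry is detectable).
import datetime
import hashlib
import json


def logged_query(prompt, query_llm, model_version, user_id,
                 log_path="rtf_llm_audit.jsonl"):
    """Query the LLM and append a record of the exchange to an append-only log."""
    output = query_llm(prompt)
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "user": user_id,
        "model_version": model_version,
        "prompt": prompt,
        "output": output,
    }
    # Hash the record contents so later alteration of this entry is detectable.
    record["record_hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode("utf-8")).hexdigest()
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return output
```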
7. Falsifiability and feedback loops: How does one know whether the LLM’s analysis or recommendation on the use (or avoidance) of force is “correct,” and whether it might lead to an (unnecessarily) escalatory feedback loop with opposing nations? Falsifiability of an RtF decision is extremely complex, and past data are thin on the ground (see section 3). In the long term, later core models may have been trained on AI-(assisted) RtF decisions, leading to an overall decay in performance (known as “model collapse” – see Burden et al., Reference Burden, Chiodo, Grosse Ruse-Khan, Markschies, Müller, Ó Héigeartaigh and Zech2024). In addition, these LLMs might be participating in an “algorithmic environment” with other LLMs (Rivera et al., Reference Rivera, Mukobi, Reuel, Lamparth, Smith and Schneider2024, p. 837), interacting in ways, and at speeds, that diverge from typical human interaction (Chiodo & Müller, Reference Chiodo and Müller2025b).
An integrator might configure the LLM to provide detailed, reasoned justifications behind its analyses and recommendations, with human-verifiable data points. They may also set up methods to identify whether, which, and how opposing nations are using LLMs in RtF decision support (perhaps through a combination of spy networks and analysis of observed actions), and couple that with “action rate limiters” to dampen any escalatory doom-cycles that may arise (a minimal sketch of such a limiter is given below).
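One way to picture an “action rate limiter” is as a cooling-off window that caps how many escalatory recommendations can be passed on per period. The sketch below is illustrative only; the quota, the window length and the upstream classifier that decides what counts as “escalatory” are all assumptions.

```python
# Hypothetical sketch of an "action rate limiter": escalatory recommendations
# beyond a fixed quota within a time window are held back for further human
# and diplomatic review rather than passed on. All thresholds are illustrative.
import time
from collections import deque


class EscalationRateLimiter:
    def __init__(self, max_escalations=1, window_seconds=86400.0):
        self.max_escalations = max_escalations   # e.g., at most one per day
        self.window = window_seconds
        self._events = deque()                   # timestamps of recent escalatory passes

    def allow(self, is_escalatory: bool) -> bool:
        """Return False if an escalatory recommendation would exceed the quota."""
        if not is_escalatory:
            return True
        now = time.monotonic()
        while self._events and now - self._events[0] > self.window:
            self._events.popleft()               # drop events outside the window
        if len(self._events) >= self.max_escalations:
            return False                         # hold back for further review
        self._events.append(now)
        return True
```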
8. Explainability and safe AI: What aspects of the LLM’s operation need to be understood or understandable by users, and which parts need to be constantly monitored and maintained for proper operation? The LLM’s reasoning for a suggestion might not include a list of the factors that were, or indeed were not, used in the analysis; users might be totally blind to what was not considered. There have also been cases of an LLM explaining its reasoning differently from the actual reasoning it applied (Anthropic, 2025). In addition, higher levels of automation might lead to a reduction of Petrov-style “sanity checks” of outputs.Footnote 18
An integrator may implement measures forcing the LLM to describe the factors it considered in its reasoning, as well as measures to elicit the explanation given by the LLM and compare it with the internal reasoning it actually used, to check for consistency. They may also design and implement in-house systems, independent of the LLM, to sanity-check outputs against established military, political and legal doctrine on RtF and swiftly inform users of discrepancies, and to monitor for events such as periodic updates to the core model (which may wreak havoc on its integration); two such checks are sketched below.
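The sketch below illustrates two of these checks in their simplest possible form: one verifies that a required set of decision factors is at least mentioned in the LLM’s stated reasoning (a surface check only, which says nothing about whether the explanation is faithful to the model’s internal computation), and one detects silent changes to the core-model version. The factor list and the `get_model_version` callable are hypothetical.

```python
# Hypothetical sketch of two monitoring checks an integrator might run:
# (1) flag required decision factors that never appear in the LLM's stated
#     reasoning (a surface check, not a proof of faithful explanation);
# (2) flag a mismatch between the validated core-model version and the one
#     currently being served.
REQUIRED_FACTORS = {"proportionality", "necessity", "legality", "civilian risk"}


def missing_factors(explanation: str) -> set:
    """Return required factors that the LLM's explanation never mentions."""
    text = explanation.lower()
    return {factor for factor in REQUIRED_FACTORS if factor not in text}


def core_model_changed(get_model_version, pinned_version: str) -> bool:
    """True if the served model no longer matches the version that was validated."""
    return get_model_version() != pinned_version
```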
9. Technical artefacts and processes have politics: What are the political aspects of integrating an off-the-shelf LLM for RtF decisions, and how might they compromise the overall process? It is unlikely that such LLMs were developed with local and national politics – including values, assumptions, the constitution, etc. – in mind. The LLM might (intentionally or inadvertently) favour the developer’s (national and/or corporate) perspective, and thus their (national and/or corporate) interests, which may differ wildly from the national interest of the government using it. Even if shown to be “safe,” RtF software produced by foreign powers might be deemed politically unpalatable by users (as the electorate might see them as “traitors”), and they may refuse to make use of the integrated system, instead using the (public) version in secret or unofficial ways, allowing them to keep “clean hands” in the public eye.
An integrator might check the country of origin of the LLM, avoiding any that are sourced from, owned by, or heavily influenced by adversarial foreign powers. They may even evaluate the trade-off between how reliable, and how politically palatable, the core model they use is.
10. Emergency response strategies: What processes are in place if the LLM ceases to operate safely, is compromised, or gives recommendations leading to an unnecessary escalation, conflict, or war? There is no guarantee that the LLM will always operate as intended, and if it does fail or hallucinate, the consequences can amount to a catastrophic man-made disaster: war. An LLM is a fragile technology, and it might only take a very odd or novel military exercise by a neighbouring nation, or a relatively simple cyber-attack, to push it beyond its reliability boundaries and make it fail.
An integrator might prevent such an LLM from ever acting autonomously, and establish a central method to shut down all use of it if it is deemed to be malfunctioning. They might set it up to require multiple, independent human verifications before attacks based on its advice can be initiated (a minimal sketch of such an authorisation gate is given below). They may ensure that any vehicles and weaponry to be used in such initial attacks are fitted with emergency “stand down” and/or self-destruct mechanisms for a last-minute abort. And they may ensure that clear diplomatic channels for de-escalatory communication are established in the event of an unjustifiable resort to force, and advise the government to have a restitution policy in place should an unwarranted attack be carried out.
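A bare-bones illustration of two of these measures – a central shutdown switch and an n-person authorisation rule – follows. The class and its in-memory state are stand-ins for what would need to be a hardened, independently audited system; names and defaults are hypothetical.

```python
# Hypothetical sketch of a central kill switch combined with an n-person
# authorisation rule: no action based on the LLM's advice proceeds unless the
# system is live and enough independent human approvals have been recorded.
class IntegrationController:
    def __init__(self, required_approvals: int = 2):
        self.required_approvals = required_approvals
        self.shutdown = False
        self._approvals = set()      # IDs of humans who have approved

    def emergency_shutdown(self) -> None:
        """Centrally disable all use of the LLM-based decision support."""
        self.shutdown = True

    def approve(self, officer_id: str) -> None:
        """Record one independent human approval."""
        self._approvals.add(officer_id)

    def may_act_on_recommendation(self) -> bool:
        """Allow action only if not shut down and enough approvals exist."""
        return (not self.shutdown
                and len(self._approvals) >= self.required_approvals)
```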
Applying the 10 pillars to this specific case underlines the complexity and difficulty of integrating an off-the-shelf LLM into the decision-making process on the resort to force. The pillar-oriented questions address the challenges and risks that arise from the technology itself, the human–machine interaction, and the specific position of the integrators. While we focused in this case study on the role of the integrators, all three groups – developers, integrators and users – must address these pillars, albeit with different tasks, to avoid harmful AI-assisted RtF decision systems. Table 1 in Appendix A summarises the likely mismatches and problematic areas in the socio-technical setting described above, based on the knowledge gaps of each group and their differences in worldviews and workplace-culture expectations – notably, the pillars where developers are strong and users are weak, and vice versa. These mismatches further underscore how the 10 pillars can address shortcomings in existing guidelines: an action-oriented framework makes explicit the complexities that emerge in the actor groups’ relationships during the development, integration and use of AI-assisted RtF decision systems.
6. Concluding remarks
Our examination of socio-technical systems for RtF AI has revealed that, in line with technological advancement, the social aspect – encompassing those involved in human–machine interaction – must undergo a corresponding evolution. The potential challenges emerging during the constitutive process highlight that a pivotal requirement for productive human–machine interaction is the education of all involved actor groups. Given the limited research in this area, there is also negligible education available for those who want, or are obliged, to undertake this work. Respective training is absent from many university curricula, resulting in a lack of education among the engineers, computer scientists, and other technical experts hired as integrators. These individuals likely lack the requisite knowledge of the complexities, risks and issues involved. The area remains largely unknown even for educators, as the subject is in its infancy and curricular content is yet to be determined. Overcoming this knowledge deficit may necessitate closer military–civilian research collaboration. It is therefore evident that funding and supporting research into AI integration for strategic decision making is necessary to increase knowledge about potential benefits and drawbacks. Given these limitations, we must develop specific recommendations on hiring integrators to keep the military’s capabilities up-to-date, and morally and legally aligned.
Despite uncertainty regarding the specific challenges emerging from AI integration, we have identified several challenges associated with human–machine interaction, the sandwich position of integrators, and the technology itself. To enhance relationships between the three groups, it is crucial to acknowledge integrators’ significant role in the rollout of new technologies. Theirs are not simply junior technical positions; rather, they are complex, safety-critical roles. Such considerations should be reflected in integrators’ standing among developers and users, the competencies attributed to them, and their contract conditions. This would enable the resolution of power imbalances arising in such situations, particularly when integrators are (temporarily) contracted, or when their work is modified or overruled.
Educating integrators along the 10 pillars of responsible AI integration will help address these challenges. We recommend that integrators receive explicit, mandatory induction training reflecting their specific role, the challenges they will face, and their unique responsibilities. To further enhance their reputational standing among users, we recommend implementing requirements for facilitated discussion between integrators and users during the integration process and during long-term maintenance.Footnote 19 This supplements the establishment of minimum standards for integrator responsibilities. Each implementation domain is distinct, as are the senior military or government officials involved, and the AI itself. Therefore, adjustments are inevitable, and troubleshooting essential, throughout the process. Organisational structures should be adapted to the different challenges and requirements of integrating AI into the decision-making process (see Ryan, Reference Ryan2024; Sienknecht, Reference Sienknecht2026, on the necessity of military institutions learning and adapting to changing technologies).
It is also essential to provide clear instructions and comprehensive documentation helping integrators find compromises between users’ expectations regarding AI functionality and operation, and developers’ (technical) constraints. This could be achieved through establishing standards or codes of conduct, including well-defined guidelines and rules of accountability. Ideally, these should be interoperable between allied nations, aligning with existing discussions on battlefield-related decisions, such as command accountability (e.g., Kraska, Reference Kraska2021), and the legal interoperability between allies, including NATO (e.g., S. Hill & Marsan, Reference Hill and Marsan2018).
The “10 pillars of responsible integration” extend the discussion about developers of AI-assisted RtF decision systems, started by Chiodo, Müller, & Sienknecht (Reference Chiodo, Müller and Sienknecht2024), to those integrating such systems. The pillars might serve as an orientation for the practical work of integrating AI into RtF decision making. By identifying potential challenges in the socio-technical systems and offering very practical solutions to them, we hope to help facilitate the functional and responsible integration of AI into RtF decision-making processes – a development whose effects can only be surmised today. Of course, no amount of education or technological improvement can prevent the potential harm caused by an AI system deciding to go to war. Thus, society should do everything possible to address the potential challenges and complications arising from this new scenario. While parts of this endeavour can build on existing discourse surrounding the integration of AI in the context of war, our analysis has shown that AI-related RtF questions require substantially new perspectives.
Funding statement
The authors have no funding sources to declare.
Competing interests
The authors have no competing interests to declare.
Appendix A
These are the high-level pillars, with short one-line descriptions, as presented in Chiodo and Müller (Reference Chiodo and Müller2025a), with slight modifications for the “Integrators” and “Users” columns; the “Developers” column matches Chiodo and Müller (Reference Chiodo and Müller2025a) almost identically, replacing instances of “mathematics” with “AI,” “resort-to-force AI decision making,” or some combination of these. These pillars are explored in much greater depth there, with many more sub-questions and sub-sub-questions (approximately 25 for each pillar), and details on how to begin actioning each of them. What is presented here is the “tip of the iceberg,” laid out side by side for developers, integrators and users, to help compare how common issues manifest in all three groups.
We use the following demarcations to identify the relative competencies or shortcomings of each group in each pillar:
Normal text = well versed, highly competent, able to deal with most unknown problems and new situations.
Bold text = general awareness with a reasoned and structured approach to the typical known problems.
ALL CAPITALS AND BOLD TEXT = general lack of awareness and understanding, only able to deal with very simple situations and problems.
Table 1. 10 pillars of responsible AI integration tailored to the groups of developers, integrators and users (font coded by relative competencies)Footnote 20

Dennis Müller is a research associate at the Institute of Mathematics Education at the University of Cologne, Germany, and a research affiliate at the Centre for the Study of Existential Risk, University of Cambridge. He studied mathematics at Bonn, Cambridge, and RWTH Aachen, and is a founding member of the Ethics in Mathematics Project. In 2022, Dennis held a Design and Technology Fellowship with the Fellowships at Auschwitz for the Study of Professional Ethics (FASPE). His current research focuses on sustainable mathematics education, ethics in mathematics and operations research.
Maurice Chiodo is a research associate at the Centre for the Study of Existential Risk, University of Cambridge, and the principal investigator of the Ethics in Mathematics Project. Originally a mathematician with over a decade of experience in academic mathematics research, he now focuses on Ethics in Mathematics, addressing the ethical challenges and risks posed by mathematics, mathematicians and mathematically powered technologies, including AI, finance, modelling, surveillance and statistics.
Mitja Sienknecht is a Postdoctoral Researcher at the European New School of Digital Studies/European University Viadrina Frankfurt (Oder). Previously, she held positions at Bielefeld University, Berlin Social Science Center (WZB) and the University of Münster. Her current research focuses on the transformation of war through AI technology, responsibility and norms in world politics, and inter- and intraorganisational relations in security studies.
