
Why compliance analysts make great AI prompt engineers

Author: Albert Phelps

Before I became a prompt engineer (or “the AI Whisperer” - thanks, TechInformed!), I did a very different job: compliance in a large UK retail bank.

In terms of public perception and hype, the differences between my two roles might seem vast, but I will argue in this blog that they share significant common ground. After all, the key to being an effective compliance manager is the ability to express yourself clearly and to interpret regulation, risk and control frameworks, and business activities coherently. These skills are very similar to those needed to be an effective prompt engineer.

By applying prompt engineering techniques to advanced models like GPT-4, we can unlock high-quality outputs tailored to help compliance managers, regulatory experts and others do more work, of higher quality, with a high degree of interpretability and transparency.

Snake oil, you might say - but not so fast: there are real-world studies to back this view up.

Setting the scene

Many compliance functions see AI as a potentially dangerous ‘black box’ with a mysterious, unexplainable layer between input and output. This concern is well-founded. Everyone has tales of Large Language Model (LLM) hallucinations, and although accuracy tends to improve with every new model release, no LLM, naively prompted, is 100% reliable. Moreover, the inner workings of ChatGPT, or any LLM, are certainly not explainable unless you happen to have two PhDs in mechanistic interpretability and work for one of the top research labs (and maybe not even then!).

But we don’t need to give up on, or reject, these tools. There is a way to use LLMs and still get explainable, reliable outputs. Methods such as breaking queries down into manageable, auditable steps and/or using multiple clean copies of an LLM can help us reach the right answer: accurate and explainable.

In this blog I want to demonstrate that the discipline of prompt engineering can be considered ‘modern AI compliance’: the work lies in breaking the model’s reasoning into understandable, explainable and auditable sub-steps, using advanced techniques like Chain of Thought reasoning and Selection-Inference, popularised by labs like OpenAI and DeepMind. Most of the papers cited here come out of those two frontier labs.

The techniques outlined below align nicely with the main responsibilities of a compliance and business control function: outcome observability, process-adherence monitoring and explainability.

Prompt engineering techniques

There are two main techniques prompt engineers use to try to improve LLM outputs:

  1. Outcome supervision: conditioning the LLM to produce higher quality results by appending examples or instructions that encourage it to perform ‘Chain of Thought’ reasoning.
  2. Process supervision: also focused on improving LLM reasoning, but by breaking each reasoning step into an individual LLM completion, so that every single reasoning step can be analysed for errors.

Outcome supervision (Chain of Thought)

‘Focussing on outcomes and measuring them has enabled the FCA to deliver its objectives of consumer protection, competition in the interests of consumers and market integrity’

Outcomes-focussed regulation: a measure of success?
Speech by Charles Randell, Chair of the FCA and PSR, to the Finance & Leasing Association

In prompt engineering, we also rely heavily on outcome supervision, where we reward or punish our LLM based on the outcomes it achieves. One example of outcome-based supervision, which can be applied post-training (i.e. only with prompts), is Chain of Thought reasoning.

Initially developed in 2022 [1], with subsequent improvements and evolutions [2], it uses examples of successful outcomes, or special instructions designed to improve outcomes, to induce better performance from the model.


Simple example

A good example of this, especially with the early-2023 generation of LLMs, is maths.

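A minimal sketch of the idea, using the kind of arithmetic word problem studied in the original Chain of Thought paper [1] (the wording below is illustrative rather than a verbatim model transcript):

```
Prompt:
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
   Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 balls.
   5 + 6 = 11. The answer is 11.
Q: The cafeteria had 23 apples. They used 20 to make lunch and bought
   6 more. How many apples do they have?
A:

Typical completion:
The cafeteria started with 23 apples. They used 20, leaving 23 - 20 = 3.
They bought 6 more, so 3 + 6 = 9. The answer is 9.
```

Because the worked example demonstrates intermediate reasoning, the model is much more likely to spell out its own intermediate steps, and get them right, rather than jumping straight to a number.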

Pros and cons

Pros

Broad applicability

Significantly improves base LLM performance on specific reasoning tasks.

Simpler to implement

Can be applied even without examples: instructions such as “take a deep breath and think step-by-step” [3] have been shown to significantly increase model performance (and have spawned excellent viral ads).
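As a minimal sketch of how that looks in practice (assuming the OpenAI Python SDK; the model name and the compliance-flavoured question are illustrative assumptions, not taken from the cited studies):

```python
# A minimal sketch of zero-shot Chain of Thought prompting.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment;
# the model name and question are illustrative.
from openai import OpenAI

client = OpenAI()

question = (
    "A control failed on 3 of the 12 sampled transactions. "
    "What is the sample failure rate as a percentage?"
)

# Appending a simple instruction like this has been shown to improve
# reasoning performance even without worked examples [2][3].
prompt = f"{question}\n\nTake a deep breath and think step-by-step."

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```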

Cons

Vulnerability to hallucinations

Without the step-by-step feedback provided by process supervision, models are more prone to inventing steps or facts—hallucinations—that can lead to unreliable answers.

Inaccuracy in reasoning

Models trained with outcome supervision may use flawed logic or incorrect steps but still arrive at the correct answer by coincidence. This misalignment means that while the outcome may be correct, the model’s reasoning process is not reliable for teaching or understanding.

Process supervision (Selection-Inference)

If you find outcome supervision too loose, process supervision is stricter, resembling typical regulatory compliance standards: it demands full documentation and accuracy at every step to produce a valid result. In prompt engineering, process supervision tends to appear through a method called Selection-Inference (SI). This method emphasises causality: it ensures that each reasoning step is transparent and checkable for errors, carefully choosing relevant information through a structured reasoning process before providing an answer [5].

With process supervision, prompt engineering achieves a level of explainability that surpasses traditional models—and even human capabilities in some respects. Unlike humans, whose reasoning processes remain largely internal and opaque, well-designed AI systems can offer a fully transparent account of their decision-making paths.


The selection step focuses on choosing the facts from the provided data that are pertinent to solving the problem. The inference step then uses only the information picked out during selection to deduce new evidence leading toward the final answer.

Unlike approaches that provide post hoc rationalizations, the SI framework ensures each step of reasoning follows from the preceding one, making the entire reasoning process transparent, interpretable, and more amenable to debugging.

‘Each inference step is made in isolation, based solely on the limited information provided by the Selection module, without direct access to the question or to previous steps of reasoning.’ (‘Selection-Inference’, DeepMind [5])
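To make the separation concrete, here is a minimal sketch of the two modules as prompt templates. The wording is illustrative rather than taken from the paper; the property that matters is that the inference prompt never sees the question or the full context.

```python
# Illustrative Selection and Inference prompt templates (a sketch, not the
# exact prompts used in the Selection-Inference paper [5]).

SELECTION_PROMPT = """\
Context:
{context}

Question:
{question}

Quote the passages from the Context that are relevant to answering the
Question, verbatim. Do NOT answer the Question."""

# The inference prompt deliberately omits the question and the full context,
# so each inference step is made in isolation from them.
INFERENCE_PROMPT = """\
Facts:
{selection}

Using only the Facts above, list the new conclusions that follow from them.
Number each conclusion and give a one-sentence justification."""
```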

Real-world compliance example

In this example, we’ll apply these techniques to analyse the Digital Operational Resilience Act (DORA). We’ll apply the following logic steps:

  1. Present a Question that guides the inquiry, and provide Context (in this case, DORA). An LLM reviews the Context and Selects the pertinent information from DORA that addresses the posed question (but, crucially, doesn’t answer it yet).
  2. A fresh version of an LLM runs an Inference based on the Selection. At this stage, a fresh GPT-4 context window is opened, and the model sees only the Selection, not the overall context or question.
  3. Again, a fresh version of an LLM generates an Answer, based on the Question, Selection and Inference, in a fresh context window. This enables us to capture a full reasoning trace, as shown in the diagram and in the sketch below.

This gives us a natural language, causal reasoning trace from question to answer.
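The following is a minimal sketch of how those three calls might be wired together. It assumes the OpenAI Python SDK and reuses the illustrative SELECTION_PROMPT and INFERENCE_PROMPT templates from the earlier sketch; the file name, model choice and final prompt wording are also assumptions rather than the exact setup used here.

```python
# A sketch of the three-step Selection-Inference pipeline described above.
# Each call to complete() opens a fresh context window, so no step sees
# more than the material it is explicitly given.
from pathlib import Path

from openai import OpenAI

client = OpenAI()

def complete(prompt: str) -> str:
    """One completion in a fresh context window (a single, stateless call)."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

question = (
    "How does DORA address the management of third-party ICT service risks "
    "and promote collective cyber defences?"
)
context = Path("dora_extract.txt").read_text()  # hypothetical local extract of DORA

# Step 1: Selection - quote the pertinent passages without answering.
selection = complete(SELECTION_PROMPT.format(context=context, question=question))

# Step 2: Inference - a fresh call that sees only the Selection.
inferences = complete(INFERENCE_PROMPT.format(selection=selection))

# Step 3: Answer - a fresh call that sees the Question, Selection and Inferences.
answer = complete(
    f"Question:\n{question}\n\nSelected passages:\n{selection}\n\n"
    f"Inferences:\n{inferences}\n\n"
    "Using only the material above, answer the Question."
)

# (question, selection, inferences, answer) is the natural-language reasoning
# trace that a reviewer can audit step by step.
print(selection, inferences, answer, sep="\n\n---\n\n")
```

The design choice that matters for compliance is the statelessness of each call: because nothing leaks between steps, every link in the chain can be reviewed, challenged or re-run independently.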

[Figure: the Selection-Inference reasoning trace from Question, through Selection and Inference, to Answer]

Step 1

The LLM takes the question “How does DORA address the management of third-party ICT service risks and promote collective cyber defences?” and parses the DORA regulation to find the most pertinent content. In this case, it selects Article 28 and Article 45.

These articles are passed to a new context window.

Step 2

A fresh version of the LLM, in a new context window, reviews these Articles, produces a series of Inferences describing what they mean, and draws any conclusions it believes are pertinent based on its knowledge and the prompt engineering applied. In this instance, the model made five separate inferences, for example:

Inference 3: The management of ICT third-party risk by financial entities is subject to the principle of proportionality. This infers that the extent and depth of risk management practices should be aligned with the entity's size, complexity, and the criticality of its ICT-related dependencies. This proportionality allows for a flexible approach to risk management that can be tailored to the specific characteristics and risk profiles of different financial entities.

All five inferences are passed to another new context window, along with the original question, Article 28, and Article 45 from the DORA regulations.

Step 3

A fresh version of the LLM takes the Question, the Selection and the Inferences and summarises them into an answer to the question. The following is the answer produced from the question, selection and inferences set out above:

DORA addresses third-party ICT service risks and enhances collective cyber defenses by mandating financial entities to integrate these risks into their overall ICT risk management frameworks, ensuring a comprehensive approach to risk management. It holds financial entities fully accountable for compliance and obligations under the law, even when outsourcing to third-party ICT services, emphasizing the importance of due diligence and oversight. The principle of proportionality allows entities to tailor risk management practices to their specific characteristics and risk profiles. Furthermore, DORA encourages the exchange of cyber threat information among financial entities within trusted communities, aiming to improve sector-wide digital operational resilience while respecting confidentiality and data protection laws. This approach combines individual responsibility with collaborative efforts to strengthen the financial sector's defence against cyber threats.

Pros and cons

Pros

Reduces hallucinations

One of the significant challenges with LLMs is their tendency to produce confabulations or "hallucinations" during reasoning. Process supervision can help by grounding each step of reasoning in validated information or logic, which is crucial for domains demanding high-precision answers.

Transparent reasoning

Modelled on neurosymbolic AI, the system operates through independent modules. Each Inference is made in isolation, relying solely on targeted inputs from the Selection module, without reference to the question or previous reasoning steps. This provides a clear, natural-language explanation of the reasoning process, enhancing both trust and interpretability in LLM applications.

Cons

Complex implementation

Implementing process supervision requires meticulously creating or annotating the reasoning steps that lead to a conclusion, which can be significantly more labour-intensive and complex compared to outcome-based supervision.

Scalability challenge

Given the need for detailed step-by-step guidance, scaling process supervision to cover a broad array of knowledge domains and problems can be challenging. Without synthetic data, this might limit the breadth of contexts where the model can effectively apply its reasoning capabilities.

Summary

I believe compliance analysts have the most to gain, and a great head start, in becoming experts in prompt engineering.

Two reasons stand out:

Maximise your impact: Using prompt engineering techniques, you can bring your compliance expertise directly into AI operations. Your in-depth understanding of regulations and business processes positions you perfectly to help design these systems. This not only ensures AI operations are guided by necessary regulations and standards but also enhances their efficiency and effectiveness.

Lead with transparency: Championing process supervision methods within prompt engineering offers an opportunity to make AI decision-making processes as transparent as the logic of your own compliance decisions. This way, you're not only helping to advance the use of AI but also ensuring these advances come with a high level of integrity and accountability.

Critically, the efficiency improvements associated with employing structured reasoning in LLMs are matched by a comparable boost in interpretability. This dual benefit – increased performance alongside clear, understandable outcomes – reinforces the significant value of incorporating these methodologies into your compliance toolkit [6][7].

Prompt engineering shouldn’t be the preserve of technical teams - it’s the best tool we have for making AI-enabled workflows across your business more explainable, more controllable and less biased.


References

  1. Wei, Jason, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V. Le, and Denny Zhou. "Chain-of-thought prompting elicits reasoning in large language models." Advances in Neural Information Processing Systems 35 (2022): 24824-24837.
  2. Kojima, Takeshi, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. "Large language models are zero-shot reasoners." Advances in Neural Information Processing Systems 35 (2022): 22199-22213.
  3. Yang, Chengrun, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V. Le, Denny Zhou, and Xinyun Chen. "Large language models as optimizers." arXiv preprint arXiv:2309.03409 (2023).
  4. Lightman, Hunter, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. "Let's Verify Step by Step." arXiv preprint arXiv:2305.20050 (2023).
  5. Creswell, Antonia, Murray Shanahan, and Irina Higgins. "Selection-inference: Exploiting large language models for interpretable logical reasoning." arXiv preprint arXiv:2205.09712 (2022).
  6. Zhou, Pei, Jay Pujara, Xiang Ren, Xinyun Chen, Heng-Tze Cheng, Quoc V. Le, Ed H. Chi, Denny Zhou, Swaroop Mishra, and Huaixiu Steven Zheng. "Self-discover: Large language models self-compose reasoning structures." arXiv preprint arXiv:2402.03620 (2024).
  7. Suzgun, Mirac, and Adam Tauman Kalai. "Meta-prompting: Enhancing language models with task-agnostic scaffolding." arXiv preprint arXiv:2401.12954 (2024).

Tomoro works with the most ambitious business & engineering leaders to realise the AI-native future of their organisation. We deliver agent-based solutions which fit seamlessly into businesses’ workforces, from design to build to scaled deployment.

Founded by experts with global experience in delivering applied AI solutions for tier 1 financial services, telecommunications and professional services firms, Tomoro’s mission is to help pioneer the reinvention of business through deeply embedded AI agents.

Powered by our world-class applied AI R&D team, working in close alliance with OpenAI, we are a team of proven leaders in turning generative AI into market-leading competitive advantage for our clients.

We’re looking for a small number of the most ambitious clients to work with in this phase; if you think your organisation could be the right fit, please get in touch.