Before I became a prompt engineer (or “the AI Whisperer” - thanks, TechInformed!), I did a very different job: compliance in a large UK retail bank.
In terms of public perception and hype, the differences between my two roles might seem vast, but I will argue in this blog that they share significant common ground. After all, the key to being an effective compliance manager is the ability to express yourself clearly and to interpret regulation, risk and control frameworks, and business activities coherently. These skills are very similar to those needed to be an effective prompt engineer.
By applying prompt engineering techniques to advanced models like GPT-4, we can unlock high-quality outputs that are tailored to help compliance managers, regulatory experts and others do more work, of higher quality, and with the utmost interpretability and transparency.
Snake oil, you might say - but not so fast: there are real-world studies to back this view up.
Many compliance functions see AI as a potentially dangerous ‘black box’ with a mysterious, unexplainable layer between input and output. This concern is well-founded. Everyone has tales of Large Language Model (LLM) hallucinations, and although models tend to improve in accuracy with every new release, no LLM, naively prompted, is 100% reliable. Even setting accuracy aside, the inner workings of ChatGPT, or any LLM, are certainly not explainable unless you happen to have two PhDs in mechanistic interpretability and work for one of the top research labs (and maybe not even then!).
But we don’t need to give up on these tools or reject them outright. There is a way to use LLMs and still get explainable, reliable outputs. Methods such as breaking queries down into manageable, explainable steps, or using multiple clean copies of an LLM, can help us reach accurate, explainable answers.
In this blog I want to demonstrate that the discipline of prompt engineering can be considered ‘modern AI compliance’, where the work lies in breaking the model’s reasoning into understandable, explainable and auditable sub-steps, using advanced techniques like Chain of Thought reasoning and Selection-Inference, popularised by labs such as OpenAI and DeepMind. Most of the papers cited here come out of those two frontier labs.
The techniques outlined below align nicely with the main responsibilities of a compliance and business control function: outcome observability, process-adherence monitoring and explainability.
There are two main techniques prompt engineers use to improve LLM outputs: outcome supervision and process supervision.
‘Focussing on outcomes and measuring them has enabled the FCA to deliver its objectives of consumer protection, competition in the interests of consumers and market integrity’
Outcomes-focussed regulation: a measure of success?
Speech by Charles Randell, Chair of the FCA and PSR, to the Finance & Leasing Association
In prompt engineering, we also rely heavily on outcome supervision, where we reward or punish our LLM based on the outcomes it achieves. One example of outcome-based supervision, which can be applied post-training (i.e. with prompts alone), is Chain of Thought reasoning.
Initially developed in 2022 1, with subsequent improvements and evolutions 2, it uses examples of successful outcomes, or special instructions designed to improve outcomes, to induce better performance from the model.
Simple example
A good example of this, especially with the early-2023 generation of LLMs, is maths:
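Something like the sketch below captures the idea (the word problem and numbers are invented for illustration, not taken from the original paper). The naive prompt asks only for an answer; the Chain of Thought prompt includes a worked example so the model imitates the reasoning rather than guessing:

```python
# A minimal Chain of Thought sketch: the invented worked example is what induces
# the step-by-step answer.
naive_prompt = (
    "Q: A branch reviews 12 files a day for 5 days, then 9 files a day for 3 days. "
    "How many files in total?\nA:"
)

cot_prompt = (
    "Q: A team reviews 8 files a day for 4 days. How many files in total?\n"
    "A: Let's think step by step. 8 files a day for 4 days is 8 * 4 = 32. The answer is 32.\n\n"
    "Q: A branch reviews 12 files a day for 5 days, then 9 files a day for 3 days. "
    "How many files in total?\n"
    "A: Let's think step by step."
)

# With the worked example in context, the model is far more likely to reply
# "12 * 5 = 60, 9 * 3 = 27, 60 + 27 = 87" than to guess a single unexplained number.
```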
Pros and cons
Broad applicability
Significantly improves a base LLM’s performance on specific reasoning tasks.
Simpler to implement
Can be applied even without examples: statements such as “take a deep breath and think step-by-step” 3 have been shown to significantly increase model performance (and have spawned excellent viral ads).
Vulnerability to hallucinations
Without the step-by-step feedback provided by process supervision, models are more prone to inventing steps or facts—hallucinations—that can lead to unreliable answers.
Inaccuracy in reasoning
Models trained with outcome supervision may use flawed logic or incorrect steps but still arrive at the correct answer by coincidence. This misalignment means that while the outcome may be correct, the model’s reasoning process is not reliable for teaching or understanding.
If you find outcome supervision too loose, process supervision is stricter, resembling typical regulatory compliance standards: it demands full documentation and accuracy at every step to produce a valid result. In prompt engineering, process supervision is most commonly applied through a method called Selection-Inference (SI). This method emphasises causality: it ensures that each reasoning step is transparent and checkable for errors, carefully choosing relevant information through a structured reasoning process before providing an answer 5.
With process supervision, prompt engineering achieves a level of explainability that surpasses traditional models—and even human capabilities in some respects. Unlike humans, whose reasoning processes remain largely internal and opaque, well-designed AI systems can offer a fully transparent account of their decision-making paths.
The selection step chooses the facts from the provided data that are pertinent to solving the problem. The inference step then uses only the information picked out during selection to deduce new evidence leading toward the final answer.
Unlike approaches that provide post hoc rationalizations, the SI framework ensures each step of reasoning follows from the preceding one, making the entire reasoning process transparent, interpretable, and more amenable to debugging.
‘Each inference step is made in isolation, based solely on the limited information provided by the Selection module, without direct access to the question or to previous steps of reasoning’
‘Selection-Inference’, DeepMind 5
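To make that concrete, here is a minimal sketch of how a Selection-Inference loop can be wired up with prompts alone. It assumes the OpenAI Python SDK; the model name, prompt wording, number of steps and the `ask` helper are illustrative choices, not the exact setup from the paper:

```python
# A minimal Selection-Inference loop, assuming the OpenAI Python SDK (openai>=1.0).
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    """Send one prompt in a fresh context window and return the model's reply."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def selection_inference(question: str, facts: list[str], steps: int = 3) -> str:
    trace: list[str] = []
    for _ in range(steps):
        # Selection: pick only the facts needed for the next reasoning step.
        selected = ask(
            "From the facts below, quote only those needed for the next reasoning step.\n"
            f"Question: {question}\nFacts:\n" + "\n".join(facts + trace)
        )
        # Inference: a fresh call sees ONLY the selected facts - not the question,
        # not the earlier steps - and states one new fact that follows from them.
        trace.append(ask(
            "Using only the statements below, state one new fact that follows from them.\n"
            + selected
        ))
    # The final answer is composed from the question and the accumulated, auditable trace.
    return ask(f"Question: {question}\nReasoning steps:\n" + "\n".join(trace) + "\nAnswer:")
```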
Real-world compliance example
In this example, we’ll apply these techniques to analyse the Digital Operational Resilience Act (DORA). We’ll work through the three steps below, which give us a natural language, causal reasoning trace from question to answer.
Step 1
The LLM takes the question “How does DORA address the management of third-party ICT service risks and promote collective cyber defences?” and parses the DORA regulation to find the most pertinent content. In this case, Article 28 and Article 45.
These articles are passed to a new context window.
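As a rough sketch, the selection step can be expressed as a single prompt template (the wording here is illustrative, not the exact prompt we used):

```python
# Step 1 (Selection), sketched as a prompt template. `regulation_text` would hold
# the full DORA text; the reply (here, Articles 28 and 45) is all that moves forward.
def build_selection_prompt(question: str, regulation_text: str) -> str:
    return (
        "You are a compliance analyst. From the regulation below, quote only the Articles "
        "relevant to the question, with their Article numbers, and nothing else.\n\n"
        f"Question: {question}\n\nRegulation:\n{regulation_text}"
    )
```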
Step 2
A fresh version of the LLM, in a new context window, reviews these Articles and produces a series of inferences describing what they mean, drawing any conclusions it believes are pertinent based on its knowledge and the prompt it has been given. In this instance, the model made five separate inferences; for example:
Inference 3: The management of ICT third-party risk by financial entities is subject to the principle of proportionality. This implies that the extent and depth of risk management practices should be aligned with the entity's size, complexity, and the criticality of its ICT-related dependencies. This proportionality allows for a flexible approach to risk management that can be tailored to the specific characteristics and risk profiles of different financial entities.
All five inferences are passed to another new context window, along with the original question, Article 28, and Article 45 from the DORA regulations.
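Again as an illustrative sketch, the inference step can be expressed as its own prompt template, fed nothing but the output of the selection step:

```python
# Step 2 (Inference), sketched as a prompt template. The fresh context window sees only the
# Articles selected in Step 1, so every inference is traceable back to its source text.
def build_inference_prompt(selected_articles: str) -> str:
    return (
        "Below are extracts from a regulation. Working only from these extracts, list the "
        "distinct inferences that follow from them, one per line, each as a standalone statement.\n\n"
        + selected_articles
    )
```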
Step 3
A fresh version of the LLM takes the question, the selected Articles and the inferences, and summarises them into an answer relevant to the question. The following is the answer produced from the material set out above:
DORA addresses third-party ICT service risks and enhances collective cyber defenses by mandating financial entities to integrate these risks into their overall ICT risk management frameworks, ensuring a comprehensive approach to risk management. It holds financial entities fully accountable for compliance and obligations under the law, even when outsourcing to third-party ICT services, emphasizing the importance of due diligence and oversight. The principle of proportionality allows entities to tailor risk management practices to their specific characteristics and risk profiles. Furthermore, DORA encourages the exchange of cyber threat information among financial entities within trusted communities, aiming to improve sector-wide digital operational resilience while respecting confidentiality and data protection laws. This approach combines individual responsibility with collaborative efforts to strengthen the financial sector's defence against cyber threats.
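For completeness, this final summarisation step can also be sketched as a prompt template that pulls the question, selection and inferences back together (illustrative wording only):

```python
# Step 3 (Answer), sketched as a prompt template. A third fresh context window combines the
# question, the selected Articles and the numbered inferences into a single, auditable answer.
def build_answer_prompt(question: str, selected_articles: str, inferences: list[str]) -> str:
    numbered = "\n".join(f"Inference {i + 1}: {text}" for i, text in enumerate(inferences))
    return (
        f"Question: {question}\n\n"
        f"Selected Articles:\n{selected_articles}\n\n"
        f"Inferences:\n{numbered}\n\n"
        "Using only the material above, write a concise answer to the question."
    )
```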
Pros and cons
Reduces hallucinations
One of the significant challenges with LLMs is their tendency to produce confabulations or "hallucinations" during reasoning. Process supervision can help by grounding each step of reasoning in validated information or logic, which is crucial for domains demanding high-precision answers.
Transparent reasoning
Modelled on neurosymbolic AI, the system operates through independent modules. Each inference is made in isolation, relying solely on targeted inputs from the Selection module, without reference to the question or previous reasoning steps. This method provides a clear, natural language explanation of the reasoning process, enhancing both trust and interpretability in LLM applications.
Complex implementation
Implementing process supervision requires meticulously creating or annotating the reasoning steps that lead to a conclusion, which can be significantly more labour-intensive and complex compared to outcome-based supervision.
Scalability challenge
Given the need for detailed step-by-step guidance, scaling process supervision to cover a broad array of knowledge domains and problems can be challenging. Without synthetic data, this might limit the breadth of contexts where the model can effectively apply its reasoning capabilities.
I believe compliance analysts have the most to gain, and a great head start, in becoming experts in prompt engineering.
Two reasons stand out:
Maximise your impact: Using prompt engineering techniques, you can bring your compliance expertise directly into AI operations. Your in-depth understanding of regulations and business processes positions you perfectly to help design these systems. This not only ensures AI operations are guided by necessary regulations and standards but also enhances their efficiency and effectiveness.
Lead with transparency: Championing process supervision methods within prompt engineering offers an opportunity to make AI decision-making processes as transparent as the logic of your own compliance decisions. This way, you're not only helping to advance the use of AI but also ensuring these advances come with a high level of integrity and accountability.
Critically, the efficiency improvements associated with employing structured reasoning in LLMs are matched by a comparable boost in interpretability. This dual benefit – increased performance alongside clear, understandable outcomes – reinforces the significant value of incorporating these methodologies into your compliance toolkit. 6 7.
Prompt engineering shouldn't be the preserve of the technical teams - it’s the best tool for making your business more explainable, more controllable and less biased across all AI-enabled workflows in the business.
Tomoro works with the most ambitious business & engineering leaders to realise the AI-native future of their organisation. We deliver agent-based solutions which fit seamlessly into businesses’ workforce; from design to build to scaled deployment.
Founded by experts with global experience in delivering applied AI solutions for tier 1 financial services, telecommunications and professional services firms, Tomoro’s mission is to help pioneer the reinvention of business through deeply embedded AI agents.
Powered by our world-class applied AI R&D team, working in close alliance with OpenAI, we are a team of proven leaders in turning generative AI into market-leading competitive advantage for our clients.
We’re looking for a small number of the most ambitious clients to work with in this phase; if you think your organisation could be the right fit, please get in touch.