Having built a number of these solutions (and rescued a few more), we've written this blog for leaders and organisations who are:
At the heart of knowledge management solutions is a concept called retrieval-augmented generation (RAG). This diagram outlines the key concepts of how RAG works:
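In code terms, the flow looks something like the minimal, framework-agnostic sketch below. Here `embed`, `search_index`, and `generate` are placeholders for whichever embedding model, vector store, and LLM you choose, not a specific product's API:

```python
# A minimal sketch of the RAG flow: retrieve, augment, generate.
# The three helper functions are assumptions to be swapped for your own stack.

def embed(text: str) -> list[float]:
    """Turn text into a vector with your chosen embedding model."""
    ...

def search_index(query_vector: list[float], k: int = 5) -> list[str]:
    """Return the k document chunks whose vectors sit closest to the query."""
    ...

def generate(prompt: str) -> str:
    """Call your chosen LLM with the augmented prompt."""
    ...

def answer_with_rag(question: str) -> str:
    # 1. Retrieve: find the chunks of knowledge most relevant to the question.
    chunks = search_index(embed(question))
    # 2. Augment: place those chunks in the prompt alongside the question.
    prompt = (
        "Answer using only the context below.\n\n"
        "Context:\n" + "\n\n".join(chunks) + f"\n\nQuestion: {question}"
    )
    # 3. Generate: the model produces an answer grounded in the retrieved content.
    return generate(prompt)
```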
There is no single 'right way' to build a knowledge management solution.
We see three approaches that cover most, if not all, knowledge management use cases.
These approaches can be navigated by considering the following questions:
If you are querying between 1-10 documents at a time, you are best off using a tool like Microsoft Copilot or OpenAI’s ChatGPT (Approach 1) due to the speed you can deploy it and the low cost (assuming you have one of these tools available already). These solutions are performant for ad-hoc queries across small volumes of knowledge. (For example, we use “NDABot”, a custom ChatGPT GPT, to review Non-Disclosure Agreements with our clients and flag any terms we need to investigate further.)
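In practice, Approach 1 amounts to placing the documents directly in the model's context window. The sketch below assumes the OpenAI Python SDK; the file names and model choice are illustrative:

```python
# A rough sketch of Approach 1: with only a handful of documents, skip
# retrieval entirely and put the full text into the model's context.
from pathlib import Path
from openai import OpenAI

client = OpenAI()

# Illustrative file names; any small set of documents works the same way.
docs = [Path(name).read_text() for name in ["nda_v1.txt", "nda_v2.txt"]]

response = client.chat.completions.create(
    model="gpt-4o",  # assumption: any chat model with a large context window
    messages=[
        {"role": "system", "content": "You review NDAs and flag unusual terms."},
        {"role": "user", "content": "\n\n---\n\n".join(docs) + "\n\nWhich clauses need further review?"},
    ],
)
print(response.choices[0].message.content)
```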
If you are expecting to query lots of documents (100+) but the queries will always be variations upon “give me a summary answer to this question”, then Approach 2 will be appropriate for your needs. This is more involved, as it requires working out how vast amounts of content are best embedded and stored, but with that optimisation this approach will provide suitable accuracy for these types of questions.
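To make that concrete, here is a rough sketch of the embed-and-store step, assuming the OpenAI embeddings API. The chunk size, overlap and `corpus` contents are illustrative starting points that will need tuning for your own content:

```python
# A sketch of Approach 2: chunk the corpus, embed it once, then answer
# summary questions by retrieving the nearest chunks.
import numpy as np
from openai import OpenAI

client = OpenAI()

corpus = ["...full text of document 1...", "...full text of document 2..."]  # your documents

def chunk(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split a document into overlapping chunks so facts aren't cut in half."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# Build the index once up front...
chunks = [c for doc in corpus for c in chunk(doc)]
index = embed(chunks)

# ...then pull the most relevant chunks for each question by cosine similarity.
def top_chunks(question: str, k: int = 5) -> list[str]:
    q = embed([question])[0]
    scores = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(scores)[-k:][::-1]]
```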
Note: we’ve worked with several customers who started out expecting only Q&A summary questions to be asked of their knowledge chatbot, and then, as trust built, users quickly began to ask more complex questions. If you’ve followed Approach 2, this is where the possibility of ‘hallucinations’ or inappropriate answers starts to arise.
It is critical to have good ‘prompt monitoring’ in place to track how users are engaging with the system and to continually measure the accuracy of responses against a designated benchmark.
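A minimal version of that monitoring can be as simple as logging every interaction and regularly re-scoring a fixed benchmark set, as in the sketch below. The scoring function is deliberately left open; in practice it might be human review or an LLM-as-judge comparison:

```python
# A sketch of prompt monitoring: log every query/response pair, and track
# accuracy over time against a designated benchmark set.
import json
import time

def log_interaction(question: str, answer: str, path: str = "prompt_log.jsonl") -> None:
    """Append each user query and model response for later review."""
    with open(path, "a") as f:
        f.write(json.dumps({"ts": time.time(), "question": question, "answer": answer}) + "\n")

def benchmark_accuracy(benchmark: list[dict], answer_fn, score_fn) -> float:
    """Re-run a fixed question set through the system and return mean accuracy.

    `benchmark` items look like {"question": ..., "expected": ...} (illustrative);
    `score_fn` compares an answer to the expected one and returns 0.0-1.0.
    """
    scores = [score_fn(answer_fn(item["question"]), item["expected"]) for item in benchmark]
    return sum(scores) / len(scores)
```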
If queries are expected to be more complex (i.e. comparing concepts or inferring new information from existing information) then the kind of retrieval used in Approach 2 will break down. In this case, you need a more thorough approach to extracting and indexing your fact-base from your knowledge, and different ways of joining and parsing this content that give you confidence in the response. (You can learn more about the technical details here: Graph databases as RAG backends | Tomoro.ai.)
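As a toy illustration of the idea, the sketch below extracts facts as (subject, relation, object) triples into a graph and answers a comparison-style query by traversing it. Here networkx stands in for a production graph database, and the triples are hand-written rather than extracted by an LLM:

```python
# A toy sketch of Approach 3: store extracted facts in a graph, then answer
# comparison queries by joining over edges rather than fuzzy similarity search.
import networkx as nx

graph = nx.MultiDiGraph()

# In a real system an LLM extraction step would produce these triples from
# your documents; they are hand-written here for illustration.
facts = [
    ("Product A", "has_price", "£100"),
    ("Product B", "has_price", "£120"),
    ("Product A", "described_in", "spec_v2.pdf"),
]
for subject, relation, obj in facts:
    graph.add_edge(subject, obj, relation=relation)

def attribute_of(entity: str, relation: str) -> list[str]:
    """Follow edges of a given relation; every hop keeps a traceable source."""
    return [o for _, o, d in graph.out_edges(entity, data=True) if d["relation"] == relation]

# A comparison query becomes two deterministic lookups, not a similarity guess.
print(attribute_of("Product A", "has_price"))  # ['£100']
print(attribute_of("Product B", "has_price"))  # ['£120']
```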
The final key question is the level of detail and confidence you require on source document traceability.
In our experience, the only method which provides high confidence of full traceability to source facts is Approach 3. In the other approaches, we often see missing, incorrect or hallucinated references, especially in complex, multi-document queries.
The best way to understand the relative performance of these approaches is with a series of examples.
There are many models to choose from:
In most situations, the best choice is to keep it simple and use the most powerful model (e.g. Gemini, GPT-4, etc.) that you can within the constraints of your IT strategy, especially at the start of your initiative.
However, there are some exception scenarios:
If your organisation tends to use a significant volume of defined acronyms and terms which aren't present, or don't have the same definition, in standard English (or another language), then you should investigate fine-tuning as the method to embed this understanding into your knowledge AI chatbot.
If your business involves highly sensitive data, and you want the best possible understanding of the knowledge held within the LLM given the nature of its training data, it may be more appropriate to choose a model with greater transparency on the training process (e.g. Phi-2, Mixtral 8x7B, etc.). These models are smaller (and hence trade capability for speed) but can be fine-tuned with relevant data to improve performance to reliable standards.
In both of these scenarios, using smaller models fine-tuned to optimise for performance will increase speed while letting you control for accuracy. Here, 'performance' can mean the time to first token returned, shorter and more accurate succinct results instead of pages of text, or both.
Fine-tuning is the process of using labelled examples (typically 100s) of ‘if you see A, respond with B’ to update the model so future results more closely match those labelled examples.
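For illustration, this is roughly what those labelled examples look like in the chat-style JSONL format that, for instance, the OpenAI fine-tuning API accepts (the acronym and its definition here are made up):

```python
# A sketch of a fine-tuning dataset: each line is one 'if you see A,
# respond with B' labelled example in chat-message form.
import json

examples = [
    {"messages": [
        {"role": "user", "content": "What does TCV mean in our contracts?"},
        {"role": "assistant", "content": "TCV is Total Contract Value: the full value of a contract over its whole term."},
    ]},
    # ...typically hundreds of these pairs, covering your terms and acronyms.
]

with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```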
There are many flavours of fine-tuning, each of which can affect the underlying model in different ways, with different levels of permanence. This will be the topic for a future blog.
Fine-tuning can become expensive, potentially increasing the cost of running your knowledge chatbot by 10-100x. Therefore, as noted above, it's important to use it only where appropriate.
In conclusion, AI knowledge chatbots start out easy, and then become trickier relatively quickly!
We often see companies, sensibly, evolving through these approaches step by step, unless there is a compelling reason to jump straight to the end. This is to be encouraged, as it gives colleagues and businesses time to build the literacy and skills required to use this technology safely.
With that said, it can quickly become a problem if users have mis-set expectations of the AI knowledge solution's performance. We have seen several organisations roll out their solution with Approach 1 or 2, and then, as users become more dependent on it, they ask more complex queries which either fail or, worse, provide incorrect information based on the context. Ongoing monitoring of query prompts and performance is critical to control runaway issues.
If you’d like to learn more or explore the technical details behind these solutions, please feel free to reach out to us.
Tomoro works with the most ambitious business and engineering leaders to realise the AI-native future of their organisation. We deliver agent-based solutions which fit seamlessly into businesses’ workforces, from design to build to scaled deployment.
Founded by experts with global experience in delivering applied AI solutions for tier 1 financial services, telecommunications and professional services firms, Tomoro’s mission is to help pioneer the reinvention of business through deeply embedded AI agents.
Powered by our world-class applied AI R&D team, working in close alliance with OpenAI, we are a team of proven leaders in turning generative AI into market-leading competitive advantage for our clients.
We’re looking for a small number of the most ambitious clients to work with in this phase. If you think your organisation could be the right fit, please get in touch.