Not another blog on AI for data management!

Authors: Sandi Chanda and Albert Phelps

Yes, it is. For several years, machine learning and artificial intelligence have been applied to better manage data. So what’s new?

Through our experience building data management agents, we have learned how best to use GenAI for data management tasks. Drawing on our day-to-day work, we cover the following:

  1. Why and how are AI agents good at core data management activities?
  2. LLM-powered AI agents are excellent at all of these tasks if given the right direction.
  3. Don’t think of GenAI as a replacement for your current data management toolkit; it is an enhancer!
  4. How to get started

Why and how are AI agents good at core data management activities?

If we list the tasks involved in managing data, it looks something like this:

  1. Understand the definition of data
  2. Check if the data is good or bad
  3. Fix the data if it is bad
  4. Change the structure of the fixed data to match the target system
  5. Repeat the above tasks many times at different points of the data lifecycle

All of these are either analysis tasks or rule creation and execution tasks: analysing values or attribute names, understanding definitions, deciding which rules to run to measure data quality, or executing fixes based on the results.
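To make the "rule creation and execution" half concrete, here is a minimal Python sketch, assuming data quality rules can be expressed as simple predicates over column values. The `Rule` class, the `run_rules` function, and the example columns are hypothetical illustrations, not part of the Data Squad itself.

```python
from dataclasses import dataclass
from typing import Any, Callable

Record = dict[str, Any]

@dataclass
class Rule:
    """A declarative data quality rule: a name, the column it checks, and a predicate."""
    name: str
    column: str
    check: Callable[[Any], bool]

def run_rules(records: list[Record], rules: list[Rule]) -> dict[str, list[int]]:
    """Execute every rule against every record; return the failing row indices per rule."""
    failures: dict[str, list[int]] = {rule.name: [] for rule in rules}
    for i, record in enumerate(records):
        for rule in rules:
            if not rule.check(record.get(rule.column)):
                failures[rule.name].append(i)
    return failures

# Hypothetical rules an analyst (human or agent) might derive from column definitions
rules = [
    Rule("customer_id_present", "customer_id", lambda v: v is not None and v != ""),
    Rule("country_is_iso2", "country", lambda v: isinstance(v, str) and len(v) == 2 and v.isupper()),
]

records = [
    {"customer_id": "C001", "country": "GB"},
    {"customer_id": "", "country": "uk"},
    {"customer_id": "C003", "country": None},
]

print(run_rules(records, rules))  # {'customer_id_present': [1], 'country_is_iso2': [1, 2]}
```

Keeping rules as data rather than hard-coded logic is what lets an agent propose, review, and re-run them at different points of the lifecycle.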

LLM-powered AI agents are excellent at all of these tasks if given the right direction.

They are flexible enough to handle surprises, but controllable enough to steer towards a target state and ensure multi-hop explainability when they do decide changes are needed.

So, what did we do to prove it? We built a squad of AI agents: the Data Squad!

The Data Squad helps users manage their data’s end-to-end lifecycle, enabling data transformations from raw feeds to curated, well-described, high-quality datasets.

We built the Data Squad to do the dirty data management jobs we hate! We've now rolled the squad out across several horrendous datasets and, working alongside expert humans, they've drastically increased the accuracy and speed with which we can understand, clean up, migrate, and manage complex data.

Let’s introduce the agents that make up the squad:

  1. Metadata Analyst: Summarises the dataset's metadata, including column schema and sensitivity classification.
  2. Data Quality Analyst: Reviews and assesses the dataset's quality, proposing rules and remedial steps for data cleansing.
  3. Anomaly Detector: Identifies and explains anomalies within the dataset using statistical and contextual reasoning.
  4. Data Architect: Optimises the dataset's structure by proposing normalised and semi-normalised data models.
  5. Data Transformer: Transforms the dataset into the desired format, incorporating insights from metadata, quality, and structure analyses.
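One way to picture how the squad hands work between roles is a shared context that each agent reads from and adds its findings to. The sketch below assumes that design; the real agents are LLM-powered, whereas here three of the five roles are replaced by simple deterministic stand-ins (all function names are hypothetical), and the Anomaly Detector uses the interquartile-range (IQR) rule as its "statistical reasoning".

```python
import statistics
from typing import Any, Callable

# Shared context that every agent reads from and adds its findings to
Context = dict[str, Any]

def metadata_analyst(ctx: Context) -> Context:
    """Stand-in: infer a coarse type for each column from its non-null values."""
    schema = {}
    for col, values in ctx["data"].items():
        non_null = [v for v in values if v is not None]
        is_numeric = all(isinstance(v, (int, float)) for v in non_null)
        schema[col] = "numeric" if is_numeric else "text"
    return {"schema": schema}

def data_quality_analyst(ctx: Context) -> Context:
    """Stand-in: report columns that break a simple completeness rule (no nulls)."""
    issues = [f"{col}: contains nulls" for col, values in ctx["data"].items() if None in values]
    return {"dq_issues": issues}

def anomaly_detector(ctx: Context) -> Context:
    """Stand-in: flag numeric outliers using the interquartile-range (IQR) rule."""
    anomalies = {}
    for col, kind in ctx["schema"].items():  # relies on the Metadata Analyst's output
        if kind != "numeric":
            continue
        values = [v for v in ctx["data"][col] if v is not None]
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        anomalies[col] = [v for v in values if v < low or v > high]
    return {"anomalies": anomalies}

def run_squad(data: dict[str, list], agents: list[Callable[[Context], Context]]) -> Context:
    """Run agents in order, merging each agent's findings into the shared context."""
    ctx: Context = {"data": data}
    for agent in agents:
        ctx.update(agent(ctx))
    return ctx

data = {
    "order_value": [10, 12, 11, 13, 12, 95],
    "region": ["north", "south", None, "north", "east", "west"],
}
result = run_squad(data, [metadata_analyst, data_quality_analyst, anomaly_detector])
print(result["schema"])     # {'order_value': 'numeric', 'region': 'text'}
print(result["dq_issues"])  # ['region: contains nulls']
print(result["anomalies"])  # {'order_value': [95]}
```

Because later agents see everything earlier agents produced, a Data Architect or Data Transformer appended to the list could build on the metadata, quality, and anomaly findings in the same way.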

Too good to be true? Check out the Data Squad in action

In this 10-minute video, we capture the end-to-end process that the Data Squad enables and bring the art of the possible to life!

Don’t think of GenAI as a replacement for your current data management toolkit; it is an enhancer!

Okay, but let’s not get carried away here. We have found the Data Squad massively useful for running highly manual tasks effectively, but that does not mean you should throw away the data management tools you have built and invested in. They still have a role to play: they are the tools that agentive solutions like the Data Squad use and get better with.

For now, treat the Data Squad as your best new colleagues in the Data Management team, not as a replacement for all your other resources. We’ve used them across many aspects of the data lifecycle:

Source data extraction: Use rule-based and AI-based techniques to extract data for migration from tables, documents, and other media.

Generating metadata: Use rule-based and AI-based techniques to generate data product metadata descriptions (entities, relationships, etc.).

Writing DQ rules and code: AI-written DQ scripts that use existing rules, the to-be data structure, and metadata definitions to infer and develop a complete set of DQ checks.

Testing DQ: An AI agent to orchestrate the execution of data quality rules across all datasets – in the context of the appropriate metadata and high-quality examples.

Assisting with DQ issue resolution: A two-pronged triage and resolution flow:
  1. Automated resolution for low-complexity issues.
  2. Escalation to SMEs for review of higher-complexity issues, with a feedback loop to improve automated corrections over time.
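The two-pronged triage flow above can be sketched as a small router. This is a minimal illustration under stated assumptions, not the Data Squad's implementation: each issue is assumed to carry a hypothetical numeric `complexity` score, whitespace trimming stands in for a real automated fix, and the feedback loop is modelled as a lookup table of SME corrections that are reused the next time the same bad value appears.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Issue:
    column: str
    bad_value: str
    complexity: int  # hypothetical score: 1 (trivial) to 5 (needs expert judgement)

@dataclass
class TriageFlow:
    """Route DQ issues: auto-fix simple ones, escalate the rest, learn from SME fixes."""
    max_auto_complexity: int = 2
    learned_fixes: dict[tuple[str, str], str] = field(default_factory=dict)
    review_queue: list[Issue] = field(default_factory=list)

    def handle(self, issue: Issue) -> Optional[str]:
        key = (issue.column, issue.bad_value)
        if key in self.learned_fixes:
            # Feedback loop: an SME has already corrected this exact value before
            return self.learned_fixes[key]
        if issue.complexity <= self.max_auto_complexity:
            # Automated resolution; whitespace trimming stands in for a real fix
            return issue.bad_value.strip()
        # Too complex to fix automatically: escalate for SME review
        self.review_queue.append(issue)
        return None

    def record_sme_fix(self, issue: Issue, corrected: str) -> None:
        """Store an SME's correction so identical issues are resolved automatically next time."""
        self.learned_fixes[(issue.column, issue.bad_value)] = corrected
        self.review_queue.remove(issue)

flow = TriageFlow()
print(flow.handle(Issue("country", " GB ", complexity=1)))  # GB
print(flow.handle(Issue("country", "U.K.", complexity=4)))  # None (escalated)
flow.record_sme_fix(flow.review_queue[0], "GB")
print(flow.handle(Issue("country", "U.K.", complexity=4)))  # GB (learned from the SME)
```

In practice the complexity score and the automated fix would come from the agents themselves; the point of the sketch is that every SME decision shrinks the set of issues that need escalating next time.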

How to get started

If you want to try it out yourself, start using LLMs to conduct data analysis jobs for you. Understand how they work, what they are good at, and their limitations. Get comfortable with prompting LLMs to carry out data jobs.
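As a starting point for prompting an LLM to do a data job, the sketch below builds a prompt that asks for column descriptions from a few sample rows and then validates the JSON reply. It is a hypothetical example: the function names, the prompt wording, and the reply format are assumptions, and the actual model call is deliberately left out so you can plug in whichever LLM client you use (a hand-written reply stands in for the model's answer).

```python
import json

def build_metadata_prompt(table_name: str, rows: list[dict]) -> str:
    """Ask an LLM to describe each column, given a few sample rows, and reply in JSON."""
    columns = list(rows[0])
    sample = "\n".join(json.dumps(row) for row in rows)
    return (
        f"You are a data analyst. Here are sample rows from the table '{table_name}':\n"
        f"{sample}\n\n"
        f"For each column in {columns}, return a JSON object that maps the column name to "
        '{"description": string, "likely_type": string, "possibly_sensitive": boolean}. '
        "Return only the JSON object."
    )

def parse_metadata_response(reply: str, expected_columns: list[str]) -> dict:
    """Parse the model's reply and check that it covers every column we asked about."""
    metadata = json.loads(reply)  # raises json.JSONDecodeError if the reply is not JSON
    missing = [col for col in expected_columns if col not in metadata]
    if missing:
        raise ValueError(f"Reply is missing columns: {missing}")
    return metadata

rows = [
    {"cust_id": "C001", "email": "ana@example.com"},
    {"cust_id": "C002", "email": "ben@example.com"},
]
prompt = build_metadata_prompt("customers", rows)
# Send `prompt` to your LLM of choice; a hand-written example reply is used here instead
reply = (
    '{"cust_id": {"description": "Customer identifier", "likely_type": "string", '
    '"possibly_sensitive": false}, '
    '"email": {"description": "Customer email address", "likely_type": "string", '
    '"possibly_sensitive": true}}'
)
metadata = parse_metadata_response(reply, list(rows[0]))
print(metadata["email"]["possibly_sensitive"])  # True
```

Validating the reply against the columns you asked about is a cheap way to see where the model follows instructions and where it falls short, which is exactly the kind of understanding worth building before relying on it.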

But if you want to give our Data Squad a go, you know where to find us!

Tomoro works with the most ambitious business & engineering leaders to realise the AI-native future of their organisation. We deliver agent-based solutions that fit seamlessly into businesses’ workforces, from design to build to scaled deployment.

Founded by experts with global experience in delivering applied AI solutions for tier 1 financial services, telecommunications and professional services firms, Tomoro’s mission is to help pioneer the reinvention of business through deeply embedded AI agents.

Powered by our world-class applied AI R&D team, working in close alliance with OpenAI, we are a team of proven leaders in turning generative AI into market-leading competitive advantage for our clients.

We’re looking for a small number of the most ambitious clients to work with in this phase. If you think your organisation could be the right fit, please get in touch.