
I have conversations every week with savvy, forward-thinking SaaS leaders. They see the power of AI, and they’re eager to apply it to their business. Inevitably, the same question comes up:
"This is amazing, but can't we just use the latest models from ChatGPT on our own company data to get these answers?"
It’s an honest question, and on the surface, it makes perfect sense. The tools are more accessible than ever. Why not just DIY it?
The simple answer is that the magic of getting reliable, game-changing insights from AI has very little to do with the AI model itself. The magic is in the data you feed it. Without an immense amount of preparation, asking an LLM to analyze your raw company data is like asking a world-class chef to make a gourmet meal out of a dumpster.
The result will be messy and unpredictable, and you definitely shouldn't bet your business on it. At Lumopath, we’ve dedicated our entire business to solving this, and I want to share a bit of the complex data work required to make AI a trustworthy strategic partner.
When you point an LLM at raw, unstructured data from your various work tools (think years of Slack messages, millions of emails, and endless document revisions), you run into three critical roadblocks:
Hallucinations: AI models are designed to find patterns and generate likely responses. When they encounter messy, incomplete, or contradictory data, they fill in the gaps by making things up. In a business context, this could mean inventing a customer complaint or misattributing the reason for a missed goal, leading you to solve a problem that doesn't actually exist.
Incomplete Takeaways & Context Window Limits: You can’t just upload your entire company’s digital history and ask a question. LLMs have a “context window,” a limit on how much information they can consider at once. You are forced to feed them small chunks of data, which means the model never sees the full picture and gives superficial, out-of-context answers.
Lack of Business Nuance: An AI model doesn’t inherently know that an email with the subject "Acme QBR Prep" is an internal, proactive effort related to the "Acme Corp" account. Without being taught this context, it’s just another email. It's a sophisticated parrot, not a business strategist.
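To make the context-window limit concrete, here is a minimal Python sketch of the naive chunking a DIY setup ends up doing. It approximates tokens as whitespace-separated words (real tokenizers count differently), and the function name `chunk_by_budget`, the sample thread, and the 12-token budget are illustrative assumptions, not any vendor's API.

```python
def chunk_by_budget(messages: list[str], max_tokens: int) -> list[list[str]]:
    """Greedily pack messages into chunks whose approximate token count
    stays within max_tokens. Tokens are approximated as whitespace-separated
    words, a rough stand-in for a real tokenizer."""
    chunks: list[list[str]] = []
    current: list[str] = []
    used = 0
    for msg in messages:
        cost = len(msg.split())
        # Start a new chunk when this message would overflow the budget.
        # A single message larger than the budget still becomes its own chunk.
        if current and used + cost > max_tokens:
            chunks.append(current)
            current, used = [], 0
        current.append(msg)
        used += cost
    if current:
        chunks.append(current)
    return chunks


thread = [
    "Acme says the export feature is broken",
    "Looping in support to investigate",
    "Root cause was a permissions change on our side",
    "Fixed and confirmed with Acme",
]
for i, chunk in enumerate(chunk_by_budget(thread, max_tokens=12)):
    print(i, chunk)
```

With this tiny budget, the complaint and the root cause land in different chunks, so a question asked of the root-cause chunk alone never learns which customer was affected. Real context windows are far larger, but so is a company's history.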
So, how do you get from raw data to reliable insights? It requires a robust, multi-layered data engine. This is something an organization could attempt to build in-house, but it is an incredibly costly and difficult undertaking.
Here is a simplified look at the data processing work we do at Lumopath before we let any LLM work its magic.

Step 1: Raw Data Extraction & Synthesis
First, you need to pull data from everywhere work happens. That means direct API integrations with dozens of tools across your tech stack, from communications (email, calendar, chat) to context systems (CRM, HRIS) to ticketing and project management. This creates a single, massive pool of raw activity.
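For illustration only, here is a minimal sketch of what "a single pool" can mean in practice: records from different tools normalized into one shared schema with UTC timestamps. The `Activity` fields and the email and chat payload shapes are hypothetical; real vendor APIs return different structures.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Activity:
    """One unit of work in the unified pool (hypothetical schema)."""
    source: str          # which tool it came from, e.g. "email" or "chat"
    actor: str           # raw sender identifier from that tool
    timestamp: datetime  # always normalized to UTC
    text: str            # subject line or message body


def from_email(raw: dict) -> Activity:
    # Assumed payload: {"from": str, "sent_at": ISO-8601 string with offset, "subject": str}
    ts = datetime.fromisoformat(raw["sent_at"]).astimezone(timezone.utc)
    return Activity("email", raw["from"], ts, raw["subject"])


def from_chat(raw: dict) -> Activity:
    # Assumed payload: {"user": str, "ts": Unix seconds as float, "text": str}
    ts = datetime.fromtimestamp(raw["ts"], tz=timezone.utc)
    return Activity("chat", raw["user"], ts, raw["text"])


pool = [
    from_chat({"user": "U123", "ts": 1714569300.0, "text": "QBR deck draft is up"}),
    from_email({"from": "dana@example.com",
                "sent_at": "2024-05-01T09:00:00-04:00",
                "subject": "Acme QBR Prep"}),
]
pool.sort(key=lambda a: a.timestamp)  # one chronological stream across tools
```

Normalizing timestamps up front is what lets a 9:00 a.m. Eastern email and a chat message sent fifteen minutes later sort into one timeline.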
Step 2: Data Cleanup & Pre-Processing (The Unsung Hero)
This step is critical and tedious. Our engine automatically cleans the raw data to ensure accuracy. Examples include:
De-duplicating activities (e.g., recognizing that a single meeting invite sent to 10 people is one event, not ten).
Excluding automated "noise" (e.g., system-generated notifications from Jira or out-of-office replies).
Establishing cross-platform user identities to understand that activity from multiple email addresses or accounts belongs to one person.
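The three cleanup rules above can be sketched as a single pass over the activity stream. This is a simplified illustration, not Lumopath's production logic: it assumes each copy of a calendar invite carries a shared `event_id`, and the noise rules and the `IDENTITY` alias table are hypothetical hand-written examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str  # ID shared by every copy of the same invite (assumed available)
    sender: str
    subject: str


# Hypothetical noise rules; a real system needs many more.
NOISE_SENDERS = {"jira@example.com"}
NOISE_SUBJECT_PREFIXES = ("Automatic reply:", "Out of Office:")

# Hypothetical alias table: every known address or handle -> one canonical person.
IDENTITY = {
    "dana@example.com": "dana",
    "dana.k@personal.example": "dana",
    "U123": "dana",
}


def clean(events: list[Event]) -> list[Event]:
    seen: set[str] = set()
    out: list[Event] = []
    for e in events:
        # 1. De-duplicate: ten copies of one invite collapse to one event.
        if e.event_id in seen:
            continue
        seen.add(e.event_id)
        # 2. Drop automated noise such as tool notifications and auto-replies.
        if e.sender in NOISE_SENDERS or e.subject.startswith(NOISE_SUBJECT_PREFIXES):
            continue
        # 3. Map the sender to a canonical person; unknown IDs pass through unchanged.
        out.append(Event(e.event_id, IDENTITY.get(e.sender, e.sender), e.subject))
    return out


raw = [Event("evt-1", "dana@example.com", "Acme QBR Prep")] * 10 + [
    Event("evt-2", "jira@example.com", "PROJ-42 updated"),
    Event("evt-3", "U123", "Automatic reply: travelling"),
    Event("evt-4", "dana.k@personal.example", "Renewal notes"),
]
cleaned = clean(raw)
```

Thirteen raw records reduce to two real activities, both attributed to the same person.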
Step 3: Identification & Labeling (Adding Business Context)
This is where we turn raw data into business intelligence.
Account Labeling: We use a multi-pronged approach to tie activity to the right customer account. For example, an internal email between two employees is mapped to "Acme Corp" because its subject line mentions "Acme QBR Prep." This is how we uncover the ~65% of "hidden work" that never gets logged in a CRM.
Topic Extraction & Categorization: Using our own AI models, we then label each activity with crucial business context. We identify whether the activity relates to an "Escalation," "Onboarding," or "Renewal," and we classify it along dimensions such as "Proactive vs. Reactive" and "Internal vs. External."
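A minimal sketch of two possible "prongs" for account labeling follows: match a participant's email domain first, then fall back to customer names in the subject line. The `ACCOUNTS` directory and its matching rules are hypothetical, and topic categorization, which relies on trained models, is not shown.

```python
import re
from typing import Optional

# Hypothetical account directory: account name -> signals that tie activity to it.
ACCOUNTS = {
    "Acme Corp": {"domains": {"acme.com"}, "keywords": {"acme"}},
    "Globex": {"domains": {"globex.com"}, "keywords": {"globex"}},
}


def label_account(subject: str, participants: list[str]) -> Optional[str]:
    """Return the matching account name, or None if no signal fires."""
    domains = {p.rsplit("@", 1)[-1].lower() for p in participants if "@" in p}
    words = set(re.findall(r"[a-z0-9]+", subject.lower()))
    # Prong 1: a participant's email domain belongs to a customer.
    for name, signals in ACCOUNTS.items():
        if domains & signals["domains"]:
            return name
    # Prong 2: internal-only threads that name the customer in the subject line.
    for name, signals in ACCOUNTS.items():
        if words & signals["keywords"]:
            return name
    return None


print(label_account("Acme QBR Prep", ["dana@ourco.example", "lee@ourco.example"]))
```

The call above labels the thread "Acme Corp" even though no one from Acme is on it, which is exactly the internal work a CRM misses. Keyword matching this naive would misfire on ambiguous names, which is why a real approach needs more signals.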
Step 4: Time Mapping & Inference
Finally, for activities that don’t have a built-in duration (like writing a document or a series of Slack messages), our algorithms intelligently infer the time spent based on dozens of contextual clues.
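As a stand-in for the richer inference described above, here is one simple, commonly used heuristic: group a person's messages into sessions separated by long pauses, and count each session's span plus a small allowance for writing its first message. The 10-minute gap and 2-minute allowance are assumptions chosen for illustration, not Lumopath's parameters.

```python
from datetime import datetime, timedelta

# Illustrative assumptions, not tuned values.
SESSION_GAP = timedelta(minutes=10)  # a longer pause ends a work session
FIRST_MESSAGE_ALLOWANCE = timedelta(minutes=2)  # time spent writing a session's first message


def estimate_time(timestamps: list[datetime]) -> timedelta:
    """Estimate active time from one person's message timestamps."""
    if not timestamps:
        return timedelta(0)
    ts = sorted(timestamps)
    total = timedelta(0)
    session_start = prev = ts[0]
    for t in ts[1:]:
        if t - prev > SESSION_GAP:
            # Close the session: its span plus the allowance for its first message.
            total += (prev - session_start) + FIRST_MESSAGE_ALLOWANCE
            session_start = t
        prev = t
    # Close the final session.
    total += (prev - session_start) + FIRST_MESSAGE_ALLOWANCE
    return total


day = datetime(2024, 5, 1)
msgs = [day.replace(hour=h, minute=m) for h, m in [(9, 0), (9, 4), (9, 9), (14, 0), (14, 1)]]
print(estimate_time(msgs))
```

Messages at 9:00, 9:04, 9:09, 14:00, and 14:01 form two sessions: 9 + 2 minutes and 1 + 2 minutes, for 14 minutes in total.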
Only after this entire process is complete (after the data is extracted, cleaned, contextualized, and structured) can an LLM be layered on top to deliver the insights behind the data-driven decisions our partners rely on.
This foundational work is what allows leaders to move beyond anecdotes and ask the questions that truly matter:
"Which of my clients are taking up the most effort, and is that effort aligned with their revenue?"
"What are the specific, repeatable behaviors that separate my top performers from the rest of the team?"
"Where are the systemic process bottlenecks that are draining my team's capacity?"
While LLMs will continue to evolve and become more powerful, their value will always be constrained by the quality of the data they are fed. Our mission at Lumopath is to do the hard part for you, transforming your organization's digital exhaust into your most valuable strategic asset.
Written by: Mikey Renan
Cofounder @ Lumopath