Case Study: Automating Quarterly Reports with an AI-Powered Workflow
Introduction
At one of our productized services for Ruby and Rails maintenance (FastRuby.io), we spend a lot of time thinking about how to make our processes maintainable, efficient, and adaptable over time.
That mindset is what led us to build an internal AI system to automate one of the most important (and time-consuming) parts of our fixed-cost monthly maintenance service (Bonsai): our quarterly progress reports.
This article walks through how we mapped a very human, manual process to a structured AI-driven workflow and what we learned along the way.
What Is Bonsai?
Bonsai is our signature fixed-cost, monthly maintenance service for Ruby and Rails applications.
The idea is simple:
- Gradually pay down technical debt
- Keep applications secure, stable, and maintainable
- Move at a pace that makes sense for each customer
Every quarter, we deliver a detailed report that helps clients understand the technical health of their application and where we are heading next.
Each report has three core sections:
- Next Priorities: What we recommend working on next, based on impact and risk.
- Accomplishments: A summary of what was completed during the quarter.
- Additional Findings: Important observations related to security, complexity, and dependencies.
These reports are valuable, but producing them is also unbillable work, which is what sparked this project.
The Original (Very Human) Workflow
Before AI entered the picture, the process looked roughly like this:
1. Gather Data
We collected data from three main sources:
- Jira tickets
- GitHub pull requests
- Static analysis tools (RubyCritic, Brakeman, Bundler Audit, etc.)
2. Analyze, Synthesize, Summarize
This is the most time-consuming part. We extract information from the raw data to build each section of the report by:
- Cross-referencing Jira tickets with merged PRs
- Extracting meaningful accomplishments
- Reviewing static analysis results and grouping findings
- Inferring next priorities based on gaps, risks, and client goals
3. Write the Report
Writing the report is straightforward once the previous steps are complete. It involves turning all of the raw content into a polished, client-ready document. This content must then be structured and validated against previous Bonsai reports to ensure a consistent style and show progress over time.
Finally, the entire report must be thoroughly reviewed to double-check for accuracy and completeness before finalization.
This worked, but it didn’t scale. And it relied heavily on senior engineers’ time.
Mapping the Process to an AI-Powered Workflow
Instead of jumping straight to a “do everything” agent, we broke the workflow into explicit, specialized components that mirror the real-world process.
Think less autonomous robot, more well-run assembly line with three key stages:
- Data Providers
- Curators
- Reviewers
Data Providers
The data provider layer serves as the initial stage, tasked with gathering and standardizing raw input. This involves collecting data, specifically Jira issues, and linking them to their respective GitHub pull requests.
Furthermore, this layer offers access to the outputs of various static analysis tools, such as RubyCritic, Brakeman, and Bundler Audit.
At this stage, no summarization or interpretation happens. The goal is boring, reliable data access. Exactly what we want.
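To make the idea concrete, here is a minimal sketch of what a data provider in this layer could look like. This is illustrative only: the class and struct names (`JiraGithubProvider`, `Issue`, `PullRequest`) and the key-matching heuristic are assumptions, not our actual implementation.

```ruby
# Hypothetical data provider: collects Jira issues and GitHub PRs,
# then links them by issue key with no summarization or interpretation.
Issue = Struct.new(:key, :summary)
PullRequest = Struct.new(:title, :url)

class JiraGithubProvider
  def initialize(issues:, pull_requests:)
    @issues = issues
    @pull_requests = pull_requests
  end

  # Link each Jira issue to the PRs whose titles mention its key
  # (e.g. "BON-123"), producing standardized records for the curators.
  def linked_records
    @issues.map do |issue|
      prs = @pull_requests.select { |pr| pr.title.include?(issue.key) }
      { issue: issue, pull_requests: prs }
    end
  end
end
```

In practice the issue and PR lists would come from the Jira and GitHub APIs; the point is that this layer only gathers and standardizes, it never interprets.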
Curators
Curators take raw data and turn it into structured, usable inputs for report sections.
Accomplishments
This curator accepts the combined data from Jira and GitHub. Then, with help from an LLM, we extract key facts and enrich each completed task with three to seven bullet points.
The prompt for these bullet points looks like this:
For each task, create 3-7 bullet points that capture:
- **Primary deliverable**: What was completed (reference PR URLs for traceability)
- **Technical highlights**: Key implementation details from commit summaries
- **Problem-solving**: Issues resolved or decisions made
- **Business value**: Why this work matters (when evident from the data)
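A curator's job at this step is mostly assembling the prompt from the linked data. The following sketch shows one way to do that in Ruby; the method name and the task hash shape are hypothetical, though the template text mirrors the excerpt above.

```ruby
# Illustrative prompt builder for the Accomplishments curator.
# `task` is a hypothetical hash shaped like the linked records
# produced by the data provider layer.
def accomplishments_prompt(task)
  <<~PROMPT
    For each task, create 3-7 bullet points that capture:
    - **Primary deliverable**: What was completed (reference PR URLs for traceability)
    - **Technical highlights**: Key implementation details from commit summaries
    - **Problem-solving**: Issues resolved or decisions made
    - **Business value**: Why this work matters (when evident from the data)

    Task: #{task[:summary]}
    Pull requests: #{task[:pull_requests].join(", ")}
    Commit summaries: #{task[:commits].join("; ")}
  PROMPT
end
```

The resulting string is what gets sent to the LLM, one call per completed task.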
Additional Findings
With help from an LLM, we group static analysis results into three categories: Security, Complexity, and Dependencies.
Next Priorities
With an LLM, we infer logical action items based on gaps identified in Accomplishments and Additional Findings.
At this point, we have content, but not something we would present to a customer.
That’s where reviewers come in.
Reviewers: The Quality Gate
Reviewers are a key part of this system.
For Accomplishments & Additional Findings
Each subsection is reviewed by an LLM tasked with:
- Accuracy Check: Does the content reflect the underlying data correctly?
- Completeness Check: Does it tell a complete, coherent story?
- Quality Assessment: How does it compare to real Bonsai reports in tone, clarity, and structure? We provide examples from previous high-quality reports to guide this step.
- Decision: The options are to approve the draft, revise it with an improved version, or request more information.
This is an example from the prompt we used:
1. Accuracy check: Does the subsection content accurately reflect all key information from the bullet points and task summary and description
2. Completeness check: Does the content paragraph tell a complete story?
3. Quality assessment: Compare the title and the content paragraph with the provided examples from previous report for content completeness, tone, and style
4. Decision: Decide to either approve, provide an improved draft or request additional information
The reviewer returns a structured JSON response that includes the following fields:
- Accuracy: pass/fail based on accuracy check
- Completeness: pass/fail based on completeness assessment
- Quality: excellent/good/neutral/poor/bad based on quality assessment
- Decision: approve/request_info/revise based on final decision made
- Revised list: [“…”] if the decision is to revise, your revised list.
- Reasoning: 2-3 sentences explaining your decision
The JSON structure looks like this when a revision is needed:
{
  "accuracy": "pass",
  "completeness": "pass",
  "quality": "good",
  "decision": "revise",
  "revised_subsection": {
    "title": "...",
    "body": "..."
  },
  "reasoning": "..."
}
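Because the response is structured, acting on it is a simple branch on the decision field. Here is a minimal sketch of that handler; the field names match the JSON above, but the method itself is illustrative, not our production code.

```ruby
require "json"

# Sketch of acting on the reviewer's structured JSON response.
# Approve keeps the draft, revise swaps in the reviewer's version,
# and request_info surfaces the reviewer's reasoning to the caller.
def apply_review(draft, review_json)
  review = JSON.parse(review_json)
  case review["decision"]
  when "approve"      then draft
  when "revise"       then review["revised_subsection"]
  when "request_info" then raise "Reviewer requested more information: #{review["reasoning"]}"
  else raise "Unknown decision: #{review["decision"].inspect}"
  end
end
```

In the real workflow a `request_info` decision loops back to the curators with more context rather than raising, but the dispatch logic is the same.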
For Next Priorities
The reviewer returns the same JSON structure, but the prompt is slightly different for the Next Priorities section of the report. It focuses on:
- Logical correctness
- High-value coverage (one item per theme)
- Professional writing quality
This is an example from the prompt we used:
1. Accuracy check: Does the list of next priorities accurately reflect the logical next steps for the project as they can be inferred from the data provided?
2. Completeness check: Does the list contain all high-value items that can be inferred from the data, keeping it to ONE single item per theme?
3. Quality assessment: Does the writing meet professional standards?
4. Decision: Decide to either approve, provide an improved draft or request additional information.
The output is still structured and machine-readable, making it easy to act on programmatically.
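The "one item per theme" rule is also easy to enforce programmatically as a belt-and-suspenders check on top of the LLM review. A tiny sketch, assuming each priority carries a `:theme` key (a hypothetical shape, not our actual schema):

```ruby
# Illustrative guard for the "ONE single item per theme" rule:
# keep only the first priority proposed for each theme.
def one_per_theme(priorities)
  priorities.uniq { |p| p[:theme] }
end
```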
From JSON to Report
Once all sections pass review, the resulting JSON objects are assembled into a complete quarterly report.
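The assembly step itself is plain templating. A minimal sketch, assuming approved sections are keyed by name and each subsection carries the `title`/`body` fields from the reviewer JSON (the method name and Markdown layout are illustrative):

```ruby
# Hypothetical assembly of approved section JSONs into one Markdown document,
# following the three-section structure of a Bonsai report.
SECTION_ORDER = ["Next Priorities", "Accomplishments", "Additional Findings"].freeze

def assemble_report(quarter:, sections:)
  parts = ["# Bonsai Quarterly Report (#{quarter})"]
  SECTION_ORDER.each do |name|
    parts << "## #{name}"
    sections.fetch(name).each do |sub|
      parts << "### #{sub["title"]}"
      parts << sub["body"]
    end
  end
  parts.join("\n\n")
end
```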
At that point, an engineer who worked on the project does a final human review, not from scratch, but from a strong, high-quality draft.
The result: faster reports, consistent quality, and far less cognitive overhead.
Why This Project Matters
Quarterly reports are essential to the Bonsai service, but they’re also unbillable overhead.
By automating the most time-consuming parts of the process:
- We reduce internal cost
- We keep Bonsai pricing competitive
- We maintain (and often improve) report quality
Most importantly, this is not about replacing engineers. It’s about letting LLMs do what they’re good at:
- Synthesizing large amounts of data
- Producing consistent first drafts
- Acting as tireless reviewers
That frees our software engineering team to focus on judgment, context, and real problem-solving work.
What We Learned
Here are a few lessons learned from this internal project.
1. Agents vs. Workflows Matter
Not every problem needs a ReAct-style agent.
In many cases, a well-defined workflow of specialized components is:
- Easier to reason about
- Easier to debug
- Easier to improve incrementally
2. Start from the Real World
Mapping an existing human process to AI works far better than inventing a “clever” AI-first design.
If humans already do it well, you can usually teach a machine to help.
3. Reviewers Are Non-Negotiable
Generation is cheap. Quality is not.
Explicit review steps, especially when guided by real examples, are what make the output trustworthy enough for client-facing work.
This project is one of our favorite examples of using AI to augment engineers rather than replace them. Like most good internal tools, it started with a simple question:
“Why are we still doing this the hard way?”
Turns out, we didn’t have to.
If you’re maintaining a Ruby application and want to reduce overhead without sacrificing quality, get in touch to learn more about Bonsai and how we apply AI where it actually makes sense.
If you’re interested in a similar, custom AI solution to some of your most time-consuming tasks, get in touch with us and we would be happy to help.
OmbuLabs.ai is a Philadelphia-based software boutique building custom AI solutions. Your data holds untapped potential. We can turn it into a competitive advantage 🚀