Case Study: Automating Quarterly Reports with an AI-Powered Workflow
Introduction
At one of our productized services for Ruby and Rails maintenance (FastRuby.io), we spend a lot of time thinking about how to make our processes maintainable, efficient, and adaptable over time.
That mindset is what led us to build an internal AI system to automate one of the most important (and time-consuming) parts of our fixed-cost monthly maintenance service (Bonsai): our quarterly progress reports.
This article walks through how we mapped a very human, manual process to a structured AI-driven workflow and what we learned along the way.
What Is Bonsai?
Bonsai is our signature fixed-cost, monthly maintenance service for Ruby and Rails applications.
The idea is simple:
- Gradually pay down technical debt
- Keep applications secure, stable, and maintainable
- Move at a pace that makes sense for each customer
Every quarter, we deliver a detailed report that helps clients understand the technical health of their application and where we are heading next.
Each report has three core sections:
- Next Priorities: What we recommend working on next, based on impact and risk.
- Accomplishments: A summary of what was completed during the quarter.
- Additional Findings: Important observations related to security, complexity, and dependencies.
These reports are valuable, but producing them is also unbillable work, which is what sparked this project.
The Original (Very Human) Workflow
Before AI entered the picture, the process looked roughly like this:
1. Gather Data
We collected data from three main sources:
- Jira tickets
- GitHub pull requests
- Static analysis tools (RubyCritic, Brakeman, Bundler Audit, etc.)
2. Analyze, Synthesize, Summarize
This is the most time-consuming part. We extract information from the raw data to build each section of the report by:
- Cross-referencing Jira tickets with merged PRs
- Extracting meaningful accomplishments
- Reviewing static analysis results and grouping findings
- Inferring next priorities based on gaps, risks, and client goals
3. Write the Report
Writing the report is straightforward once the previous steps are complete. It involves turning all of the raw content into a polished, client-ready document. This content must then be structured and validated against previous Bonsai reports to ensure a consistent style and show progress over time.
Finally, the entire report must be thoroughly reviewed to double-check for accuracy and completeness before finalization.
This worked, but it didn’t scale. And it relied heavily on senior engineers’ time.
Mapping the Process to an AI-Powered Workflow
Instead of jumping straight to a “do everything” agent, we broke the workflow into explicit, specialized components that mirror the real-world process.
Think less autonomous robot, more well-run assembly line with three key stages:
- Data Providers
- Curators
- Reviewers
Data Providers
The data provider layer serves as the initial stage, tasked with gathering and standardizing raw input. This involves collecting data, specifically Jira issues, and linking them to their respective GitHub pull requests.
Furthermore, this layer offers access to the outputs of various static analysis tools, such as RubyCritic, Brakeman, and Bundler Audit.
At this stage, no summarization or interpretation happens. The goal is boring, reliable data access. Exactly what we want.
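To make the idea concrete, here is a minimal sketch of what a data provider in this layer could look like. This is illustrative only: the class and struct names (`JiraGithubProvider`, `Issue`, `PullRequest`) and the key-matching heuristic are assumptions, not our actual implementation.

```ruby
# Hypothetical data provider: collects Jira issues and GitHub PRs,
# then links them by issue key with no summarization or interpretation.
Issue = Struct.new(:key, :summary)
PullRequest = Struct.new(:title, :url)

class JiraGithubProvider
  def initialize(issues:, pull_requests:)
    @issues = issues
    @pull_requests = pull_requests
  end

  # Link each Jira issue to the PRs whose titles mention its key
  # (e.g. "BON-123"), producing standardized records for the curators.
  def linked_records
    @issues.map do |issue|
      prs = @pull_requests.select { |pr| pr.title.include?(issue.key) }
      { issue: issue, pull_requests: prs }
    end
  end
end
```

In practice the issue and PR lists would come from the Jira and GitHub APIs; the point is that this layer only gathers and standardizes, it never interprets.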
Curators
Curators take raw data and turn it into structured, usable inputs for report sections.
Accomplishments
This curator accepts the combined data from Jira and GitHub. Then, with help from an LLM, we extract key facts and enrich each completed task with three to seven bullet points.
The prompt for these bullet points looks like this:
For each task, create 3-7 bullet points that capture:
- **Primary deliverable**: What was completed (reference PR URLs for traceability)
- **Technical highlights**: Key implementation details from commit summaries
- **Problem-solving**: Issues resolved or decisions made
- **Business value**: Why this work matters (when evident from the data)
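A curator's job at this step is mostly assembling the prompt from the linked data. The following sketch shows one way to do that in Ruby; the method name and the task hash shape are hypothetical, though the template text mirrors the excerpt above.

```ruby
# Illustrative prompt builder for the Accomplishments curator.
# `task` is a hypothetical hash shaped like the linked records
# produced by the data provider layer.
def accomplishments_prompt(task)
  <<~PROMPT
    For each task, create 3-7 bullet points that capture:
    - **Primary deliverable**: What was completed (reference PR URLs for traceability)
    - **Technical highlights**: Key implementation details from commit summaries
    - **Problem-solving**: Issues resolved or decisions made
    - **Business value**: Why this work matters (when evident from the data)

    Task: #{task[:summary]}
    Pull requests: #{task[:pull_requests].join(", ")}
    Commit summaries: #{task[:commits].join("; ")}
  PROMPT
end
```

The resulting string is what gets sent to the LLM, one call per completed task.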
Additional Findings
With help from an LLM, we group static analysis results into three categories: Security, Complexity, and Dependencies.
Next Priorities
With an LLM, we infer logical action items based on gaps identified in Accomplishments and Additional Findings.
At this point, we have content, but not something we would present to a customer.
That’s where reviewers come in.
Reviewers: The Quality Gate
Reviewers are a key part of this system.
For Accomplishments & Additional Findings
Each subsection is reviewed by an LLM tasked with:
- Accuracy Check: Does the content reflect the underlying data correctly?
- Completeness Check: Does it tell a complete, coherent story?
- Quality Assessment: How does it compare to real Bonsai reports in tone, clarity, and structure? We provide examples from previous high-quality reports to guide this step.
- Decision: The options are to approve the draft, revise it with an improved version, or request more information.
This is an example from the prompt we used:
1. Accuracy check: Does the subsection content accurately reflect all key information from the bullet points and task summary and description
2. Completeness check: Does the content paragraph tell a complete story?
3. Quality assessment: Compare the title and the content paragraph with the provided examples from previous report for content completeness, tone, and style
4. Decision: Decide to either approve, provide an improved draft or request additional information
The reviewer returns a structured JSON response that includes the following fields:
- Accuracy: pass/fail based on accuracy check
- Completeness: pass/fail based on completeness assessment
- Quality: excellent/good/neutral/poor/bad based on quality assessment
- Decision: approve/request_info/revise based on final decision made
- Revised list: [“…”] if the decision is to revise, your revised list.
- Reasoning: 2-3 sentences explaining your decision
The JSON structure looks like this when a revision is needed:
{
  "accuracy": "pass",
  "completeness": "pass",
  "quality": "good",
  "decision": "revise",
  "revised_subsection": {
    "title": "...",
    "body": "..."
  },
  "reasoning": "..."
}
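Because the response is structured, acting on it is a simple branch on the decision field. Here is a minimal sketch of that handler; the field names match the JSON above, but the method itself is illustrative, not our production code.

```ruby
require "json"

# Sketch of acting on the reviewer's structured JSON response.
# Approve keeps the draft, revise swaps in the reviewer's version,
# and request_info surfaces the reviewer's reasoning to the caller.
def apply_review(draft, review_json)
  review = JSON.parse(review_json)
  case review["decision"]
  when "approve"      then draft
  when "revise"       then review["revised_subsection"]
  when "request_info" then raise "Reviewer requested more information: #{review["reasoning"]}"
  else raise "Unknown decision: #{review["decision"].inspect}"
  end
end
```

In the real workflow a `request_info` decision loops back to the curators with more context rather than raising, but the dispatch logic is the same.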
For Next Priorities
The reviewer returns the same JSON structure, but the prompt is slightly different for the Next Priorities section of the report. It focuses on:
- Logical correctness
- High-value coverage (one item per theme)
- Professional writing quality
This is an example from the prompt we used:
1. Accuracy check: Does the list of next priorities accurately reflect the logical next steps for the project as they can be inferred from the data provided?
2. Completeness check: Does the list contain all high-value items that can be inferred from the data, keeping it to ONE single item per theme?
3. Quality assessment: Does the writing meet professional standards?
4. Decision: Decide to either approve, provide an improved draft or request additional information.
The output is still structured and machine-readable, making it easy to act on programmatically.
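The "one item per theme" rule is also easy to enforce programmatically as a belt-and-suspenders check on top of the LLM review. A tiny sketch, assuming each priority carries a `:theme` key (a hypothetical shape, not our actual schema):

```ruby
# Illustrative guard for the "ONE single item per theme" rule:
# keep only the first priority proposed for each theme.
def one_per_theme(priorities)
  priorities.uniq { |p| p[:theme] }
end
```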
From JSON to Report
Once all sections pass review, the resulting JSON objects are assembled into a complete quarterly report.
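The assembly step itself is plain templating. A minimal sketch, assuming approved sections are keyed by name and each subsection carries the `title`/`body` fields from the reviewer JSON (the method name and Markdown layout are illustrative):

```ruby
# Hypothetical assembly of approved section JSONs into one Markdown document,
# following the three-section structure of a Bonsai report.
SECTION_ORDER = ["Next Priorities", "Accomplishments", "Additional Findings"].freeze

def assemble_report(quarter:, sections:)
  parts = ["# Bonsai Quarterly Report (#{quarter})"]
  SECTION_ORDER.each do |name|
    parts << "## #{name}"
    sections.fetch(name).each do |sub|
      parts << "### #{sub["title"]}"
      parts << sub["body"]
    end
  end
  parts.join("\n\n")
end
```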
At that point, an engineer who worked on the project does a final human review, not from scratch, but from a strong, high-quality draft.
The result: faster reports, consistent quality, and far less cognitive overhead.
Why This Project Matters
Quarterly reports are essential to the Bonsai service, but they’re also unbillable overhead.
By automating the most time-consuming parts of the process:
- We reduce internal cost
- We keep Bonsai pricing competitive
- We maintain (and often improve) report quality
Most importantly, this is not about replacing engineers. It’s about letting LLMs do what they’re good at:
- Synthesizing large amounts of data
- Producing consistent first drafts
- Acting as tireless reviewers
That frees our software engineering team to focus on judgment, context, and real problem-solving work.
What We Learned
Here are a few lessons learned from this internal project.
1. Agents vs. Workflows Matter
Not every problem needs a ReAct-style agent.
In many cases, a well-defined workflow of specialized components is:
- Easier to reason about
- Easier to debug
- Easier to improve incrementally
2. Start from the Real World
Mapping an existing human process to AI works far better than inventing a “clever” AI-first design.
If humans already do it well, you can usually teach a machine to help.
3. Reviewers Are Non-Negotiable
Generation is cheap. Quality is not.
Explicit review steps, especially when guided by real examples, are what make the output trustworthy enough for client-facing work.
This project is one of our favorite examples of using AI to augment engineers rather than replace them. Like most good internal tools, it started with a simple question:
“Why are we still doing this the hard way?”
Turns out, we didn’t have to.
If you’re maintaining a Ruby application and want to reduce overhead without sacrificing quality, get in touch to learn more about Bonsai and how we apply AI where it actually makes sense.
If you’re interested in a similar, custom AI solution to some of your most time-consuming tasks, get in touch with us and we would be happy to help.
OmbuLabs.ai is a Philadelphia-based software boutique building custom AI solutions. Your data holds untapped potential. We can turn it into a competitive advantage 🚀