
After months of work, we’re excited to release Agentune Analyze & Improve v1.0 – the next major milestone in autonomous, performance-driven agent engineering.
If you’ve shipped a real customer-facing agent, you’ve probably hit the same wall as everyone else: plenty of conversation logs and outcome metrics, but no repeatable way to turn them into agent improvements.
Agentune is the open-source answer to this: an engine for the Analyze → Improve → Evaluate loop of agent performance optimization.

Agentune Analyze produces recommendations for agent improvements (prompt, workflows, additional data, tools), such as:

Recommendation: If the user hasn’t mentioned their decision timeline by minute three, insert a gentle timeline probe.
Finding: Deals with early timeline qualification show 2× higher conversion rates.
Evidence example: In several unsuccessful calls, the rep waits until the final minute to ask about timing, discovering too late that the buyer was not ready to purchase this quarter.

Recommendation: End the call with a clear next step or calendar commitment.
Finding: While 50% of successful calls include a scheduled follow-up, only 8% of unsuccessful calls do.
Evidence example: In several conversations, the agent promises to “send an email” but fails to confirm a specific date/time, and the customer ends the call without agreeing to any follow-up.

Recommendation: Insert a clarification step when the user provides partial account information but expresses urgency.
Finding: Calls with incomplete account details plus urgency lead to repeated back-and-forth in 30% of cases and 45% longer handle times.
Evidence example: In several conversations, users provide only their first name and last four digits, say “this is urgent,” and the agent attempts troubleshooting before verifying full account identity, causing delays.

Recommendation: Trigger a confirmation step when the user gives contradictory details (e.g., mismatched dates or amounts).
Finding: In 30% of conversations with contradictions, unresolved confusion leads to compliance-risk notes or follow-up calls.
Evidence example: In multiple calls, customers state two different withdrawal amounts, and the agent proceeds without clarification, resulting in incorrect case filings.
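Each of the examples above follows the same recommendation/finding/evidence triplet, which you can picture as a simple structured record. The sketch below is purely illustrative – the class and field names are assumptions, not Agentune’s actual output schema:

```python
from dataclasses import dataclass

# Purely illustrative record shape for one recommendation; field names are
# assumptions, not Agentune's actual output schema.
@dataclass
class Recommendation:
    recommendation: str      # the suggested agent change
    finding: str             # the statistical pattern motivating it
    evidence_example: str    # a representative conversation pattern

rec = Recommendation(
    recommendation="End the call with a clear next step or calendar commitment.",
    finding="50% of successful calls include a scheduled follow-up, vs only 8% of unsuccessful calls.",
    evidence_example="The agent promises to 'send an email' but never confirms a date/time.",
)
print(rec.recommendation)
```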
Several months ago we released Agentune-Simulate, a customer-agent conversation simulator that lets you test agents offline, generate data, and evaluate conversation outcomes (e.g. sales conversion). Now we’re expanding that into a full optimization stack covering the entire Analyze → Improve → Evaluate loop.
The next release will go further: connecting the dots between conversations and structured data (CRM, ERP, product catalog, etc.), to further contextualize the insights and ground them in enterprise data.
Before talking about analysis and improvement, it’s worth recapping what’s already there.
Agentune-Simulate lets you build a “twin” customer simulator from real conversations. It learns the distributions of intents, phrasing and flows, and then uses that to simulate new conversations against your agent – before you go live.
Typical workflow: ingest real conversations, build the customer simulator from them, run simulated conversations against your agent, and evaluate the outcomes.
At a code level, it looks roughly like this (from the PyPI quickstart):
```bash
pip install agentune-simulate
```

```python
from agentune.simulate import SimulationSessionBuilder
from langchain_openai import ChatOpenAI

# 1. Prepare or load your outcomes and vector store
outcomes = ...       # outcome labels per conversation / scenario
vector_store = ...   # semantic index over your real conversations

# 2. Build a simulation session
session = SimulationSessionBuilder(
    default_chat_model=ChatOpenAI(model="gpt-4o"),
    outcomes=outcomes,
    vector_store=vector_store,
).build()

# 3. Run simulations against your agent
results = await session.run_simulation(real_conversations=conversations)
```

With just that, you can run simulated conversations against your agent and evaluate their outcomes (e.g. sales conversion) before going live.
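Once each simulated conversation is labeled with an outcome, evaluating a metric like sales conversion is just an aggregation over those labels. Here is a minimal, self-contained sketch – the helper and outcome labels are illustrative, not part of the Agentune API:

```python
def conversion_rate(outcomes: list[str], success_label: str = "converted") -> float:
    """Fraction of simulated conversations that reached the desired outcome."""
    if not outcomes:
        return 0.0
    return sum(1 for o in outcomes if o == success_label) / len(outcomes)

# Hypothetical outcome labels extracted from a batch of simulation results
simulated_outcomes = ["converted", "no_sale", "converted", "no_sale", "no_sale"]
print(conversion_rate(simulated_outcomes))  # 0.4
```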
This was the first release: a solid Evaluate layer.
The new package agentune-analyze fills in the Analyze → Improve part of the loop.
At a high level, it’s designed to “turn real conversations into insights that measurably improve your AI agents”, replacing intuition-driven tuning with evidence-driven decisions.
Before a tool like this, you likely had the raw material: conversation transcripts and outcome labels for each of them. But you didn’t have a systematic, evidence-driven way to connect those outcomes to specific agent behaviors you could change. Agentune Analyze & Improve is explicitly built to do that.
You provide your conversation data, a per-conversation outcome, and a description of the problem you want to optimize.
Agentune analyzes the conversations against those outcomes and surfaces the patterns that drive them.
Output: a set of interpretable drivers, each with a finding, supporting conversation evidence, and a measure of predictive strength.
Given those drivers, the Improve component generates concrete recommendations for prompts, workflows, additional data, and tools.
Those recommendations then flow naturally into your agent iteration and evaluation process.
A minimal runnable example demonstrating the complete Agentune Analyze workflow: loading conversation data, running analysis, and generating action recommendations.
```bash
pip install agentune-analyze
export OPENAI_API_KEY="your-api-key"
```

```python
import asyncio
import os
from pathlib import Path

from agentune.analyze.api.base import LlmCacheOnDisk, RunContext
from agentune.analyze.feature.problem import ProblemDescription


async def main() -> None:
    data_dir = Path(__file__).parent / 'data'

    # Define the problem
    problem = ProblemDescription(
        target_column='outcome',
        problem_type='classification',
        target_desired_outcome='process paused - customer needs to consider the offer',
        name='Customer Service Conversation Outcome Prediction',
        description='Analyze auto insurance conversations and suggest improvements',
        target_description='The final outcome of the conversation'
    )

    # Create run context with LLM caching
    async with await RunContext.create(
        llm_cache=LlmCacheOnDisk(str(Path(__file__).parent / 'llm_cache.db'), 300_000_000)
    ) as ctx:
        # Load data
        conversations_table = await ctx.data.from_csv(
            str(data_dir / 'conversations.csv')
        ).copy_to_table('conversations')
        messages_table = await ctx.data.from_csv(
            str(data_dir / 'messages.csv')
        ).copy_to_table('messages')

        # Configure join strategy
        join_strategy = messages_table.join_strategy.conversation(
            name='messages',
            main_table_key_col='conversation_id',
            key_col='conversation_id',
            timestamp_col='timestamp',
            role_col='author',
            content_col='message'
        )

        # Split data
        split_data = await conversations_table.split(train_fraction=0.9)

        # Run analysis
        results = await ctx.ops.analyze(
            problem_description=problem,
            main_input=split_data,
            secondary_tables=[messages_table],
            join_strategies=[join_strategy]
        )

        # Generate recommendations
        recommendations = await ctx.ops.recommend_actions(
            analyze_input=split_data,
            analyze_results=results,
            recommender=ctx.defaults.conversation_action_recommender()
        )


if __name__ == '__main__':
    if 'OPENAI_API_KEY' not in os.environ:
        raise ValueError('Please set OPENAI_API_KEY environment variable')
    asyncio.run(main())
```

```bash
cd agentune_analyze/examples
python e2e_simple_example.py
```
Agentune Analyze includes utilities to generate interactive HTML dashboards for exploring your results. These are convenience tools to help visualize outputs so you can integrate them into your applications.
The interactive analysis dashboard includes:
Note: The R² (R-squared) metric shows what percentage of variance in the target outcome is explained by each feature. Values range from 0 (no predictive power) to 1 (perfect prediction). Higher values indicate stronger predictive features.
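As a concrete illustration of that metric (plain Python, independent of Agentune): R² compares a feature’s prediction errors against the errors of simply predicting the mean outcome.

```python
def r_squared(observed: list[float], predicted: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # residual error
    ss_tot = sum((o - mean) ** 2 for o in observed)                  # error of mean baseline
    return 1 - ss_res / ss_tot

observed = [1.0, 0.0, 1.0, 1.0, 0.0]
print(r_squared(observed, observed))                 # 1.0: perfect prediction
print(r_squared(observed, [sum(observed) / 5] * 5))  # 0.0: no predictive power
```

(In general this quantity can go negative for a predictor worse than the mean; the per-feature scores shown in the dashboard fall in the 0–1 range described above.)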
For examples of how to create dashboards, check out the Getting Started Notebook.
Example dashboard:

The recommendations dashboard provides:
Note: You can also access the full text report programmatically using recommendations.raw_report if needed for further processing.
Example dashboard:

All Agentune code can be used as a Python library. It has no dependencies beyond Python packages and makes no additional assumptions about the environment it runs in unless told to by user code. It doesn't require any external services or processes, and it doesn't run any sub-processes by default.
This is linked to two product requirements:
For detailed information on architecture and design principles, see the Architecture Guide.
If you’re already running agents in production, here’s a concrete path:
Open an issue on GitHub or contact the maintainers. See the main README for details.
Reach us at agentune-dev@sparkbeyond.com
SparkBeyond delivers AI for Always-Optimized operations. Our Always-Optimized™ platform extends Generative AI's reasoning capabilities to KPI optimization, enabling enterprises to constantly monitor performance metrics and receive AI-powered recommendations that drive measurable improvements across operations.
The Always-Optimized™ platform combines battle-tested machine learning techniques for structured data analysis with Generative AI capabilities, refined over more than a decade of enterprise deployments. Our technology enables dynamic feature engineering, automatically discovering complex patterns across disparate data sources and connecting operational metrics with contextual factors to solve the hardest challenges in customer and manufacturing operations. Since 2013, SparkBeyond has delivered over $1B in operational value for hundreds of Fortune 500 companies and partners with leading System Integrators to ensure seamless deployment across customer and manufacturing operations. Learn more at SparkBeyond.com or follow us on LinkedIn.
Apply key dataset transformations through no/low-code workflows to clean, prep, and scope your datasets as needed for analysis