Octuner - Multi-Provider LLM Optimizer
Optimize LLM providers, models, and parameters without the guesswork.
Octuner is a lightweight library that automates the decisions involved in integrating LLMs, especially in multi-step model chaining scenarios.
Why Octuner?
Building LLM applications often feels like solving a puzzle:
- Which provider? OpenAI, Gemini, Anthropic… or self-hosted (Ollama, vLLM, etc.)?
- Which model? GPT-4o, Gemini Pro, Claude…?
- Which parameters? temperature, top_p, max_tokens…?
- How to balance quality, cost, and latency?
Things get harder with model chaining, where each step depends on the previous one and the number of combinations to test grows with every step.
Manual trial-and-error leads to inconsistent performance, wasted budget, and provider lock-in. Octuner removes the guesswork.
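To get a feel for why manual trial-and-error scales poorly, the sketch below counts the configurations of a two-step chain. The candidate lists are illustrative assumptions, not Octuner defaults.

```python
from itertools import product

# Illustrative candidates per step (assumed values, not Octuner defaults)
provider_models = ["openai/GPT-4o", "gemini/Gemini Pro", "anthropic/Claude"]
temperatures = [0.0, 0.3, 0.7, 1.0]

# Every configuration a single step could take
step_configs = list(product(provider_models, temperatures))

# A two-step chain picks one configuration per step
chain_configs = list(product(step_configs, repeat=2))

print(len(step_configs))   # 12
print(len(chain_configs))  # 144
```

Even with only two knobs per step, the two-step chain already has 144 candidate configurations, each of which would need to be scored on the whole dataset.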
Quick Start
Build a tiny sentiment chain that first explains the sentiment of a text, then outputs a single-word label. The chain takes an explicit YAML config path so it is ready for optimization.
1. Create your model chain
from octuner import MultiProviderTunableLLM


class SentimentChain:
    def __init__(self, config_file: str):
        # Reason step (clear explanation)
        self.reasoner = MultiProviderTunableLLM(
            config_file,
            default_provider="openai",
            default_model="gpt-4o-mini",
        )
        # Label step (concise single-word output)
        self.labeler = MultiProviderTunableLLM(
            config_file,
            default_provider="gemini",
            default_model="gemini-1.5-flash",
        )

    def _build_reason_prompt(self, text: str) -> str:
        return (
            "Explain the sentiment (positive/negative/neutral) of the text below. "
            "Keep the reasoning short and specific.\n\n"
            f"Text: {text}\n"
        )

    def _build_label_prompt(self, reasoning: str) -> str:
        return (
            "Given the reasoning below, respond with only one word: "
            "positive | negative | neutral.\n\n"
            f"Reasoning:\n{reasoning}\n"
        )

    def predict(self, text: str) -> dict:
        reason = self.reasoner.call(self._build_reason_prompt(text)).text
        label = self.labeler.call(self._build_label_prompt(reason)).text.strip().lower()
        return {"sentiment": label, "why": reason}
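Before spending API credits, the chain's wiring can be checked offline. The sketch below is only a stand-in: `FakeResponse`, `FakeLLM`, and `predict_with` are hypothetical helpers that mimic the `.call(prompt).text` shape used in `predict`. They are not part of Octuner, and the prompts are shortened for brevity.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in exposing the .text attribute that predict() reads."""
    text: str


class FakeLLM:
    """Hypothetical stub: returns a canned reply and records the prompts it saw."""

    def __init__(self, reply: str):
        self.reply = reply
        self.prompts = []

    def call(self, prompt: str) -> FakeResponse:
        self.prompts.append(prompt)
        return FakeResponse(self.reply)


def predict_with(reasoner, labeler, text: str) -> dict:
    """Same two-step flow as SentimentChain.predict, with injected steps."""
    reason = reasoner.call(f"Text: {text}\n").text
    label = labeler.call(f"Reasoning:\n{reason}\n").text.strip().lower()
    return {"sentiment": label, "why": reason}


reasoner = FakeLLM("The wording is enthusiastic.")
labeler = FakeLLM(" Positive\n")
result = predict_with(reasoner, labeler, "I love this!")
print(result)
# {'sentiment': 'positive', 'why': 'The wording is enthusiastic.'}
```

The recorded `labeler.prompts` confirm that the reasoning from the first step reaches the second step, which is the property a chained pipeline depends on.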
2. Add a dataset and metric
dataset = [
    {"input": "I love this!", "target": {"sentiment": "positive"}},
    {"input": "This is awful.", "target": {"sentiment": "negative"}},
    {"input": "It's fine.", "target": {"sentiment": "neutral"}},
]


def metric(output, target):
    return 1.0 if output["sentiment"] == target["sentiment"] else 0.0
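The metric above is an exact string match, so a label such as "Positive." would score 0.0. A minimal sketch of a more forgiving variant, and of averaging it over the dataset, is shown below. `normalize_label`, `lenient_metric`, `mean_score`, and the `keyword_predict` stub are hypothetical helpers for illustration, not Octuner APIs.

```python
import string

dataset = [
    {"input": "I love this!", "target": {"sentiment": "positive"}},
    {"input": "This is awful.", "target": {"sentiment": "negative"}},
    {"input": "It's fine.", "target": {"sentiment": "neutral"}},
]


def normalize_label(raw: str) -> str:
    """Lowercase and strip whitespace and surrounding punctuation."""
    return raw.strip().strip(string.punctuation).strip().lower()


def lenient_metric(output, target):
    """Exact match after normalization: 1.0 for a hit, 0.0 otherwise."""
    return 1.0 if normalize_label(output["sentiment"]) == target["sentiment"] else 0.0


def mean_score(predict, data, metric_fn):
    """Average metric_fn over every example in data."""
    scores = [metric_fn(predict(ex["input"]), ex["target"]) for ex in data]
    return sum(scores) / len(scores)


def keyword_predict(text: str) -> dict:
    """Hypothetical offline stand-in for chain.predict with messy labels."""
    lowered = text.lower()
    if "love" in lowered:
        return {"sentiment": "Positive."}
    if "awful" in lowered:
        return {"sentiment": "negative"}
    return {"sentiment": " Neutral "}


print(mean_score(keyword_predict, dataset, lenient_metric))  # 1.0
```

Whether to normalize inside the metric or inside `predict` is a design choice; keeping it in the metric leaves the chain's raw output visible for debugging.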
3. Optimize
from octuner import AutoTuner, apply_best

chain = SentimentChain("configs/llm.yaml")  # explicit YAML config path

tuner = AutoTuner.from_component(
    component=chain,
    entrypoint=lambda c, x: c.predict(x),
    dataset=dataset,
    metric=metric,
)

# Focus on the most impactful knobs first
tuner.include([
    "reasoner.provider_model", "reasoner.temperature",
    "labeler.provider_model", "labeler.temperature",
])

result = tuner.search(max_trials=12, mode="pareto")
result.save_best("optimized_sentiment_chain.yaml")

apply_best(chain, "optimized_sentiment_chain.yaml")
print(chain.predict("The new UI is a joy to use."))
Full example
from octuner import MultiProviderTunableLLM, AutoTuner, apply_best


class SentimentChain:
    def __init__(self, config_file: str):
        self.reasoner = MultiProviderTunableLLM(
            config_file,
            default_provider="openai",
            default_model="gpt-4o-mini",
        )
        self.labeler = MultiProviderTunableLLM(
            config_file,
            default_provider="gemini",
            default_model="gemini-1.5-flash",
        )

    def _build_reason_prompt(self, text: str) -> str:
        return (
            "Explain the sentiment (positive/negative/neutral) of the text below. "
            "Keep the reasoning short and specific.\n\n"
            f"Text: {text}\n"
        )

    def _build_label_prompt(self, reasoning: str) -> str:
        return (
            "Given the reasoning below, respond with only one word: "
            "positive | negative | neutral.\n\n"
            f"Reasoning:\n{reasoning}\n"
        )

    def predict(self, text: str) -> dict:
        reason = self.reasoner.call(self._build_reason_prompt(text)).text
        label = self.labeler.call(self._build_label_prompt(reason)).text.strip().lower()
        return {"sentiment": label, "why": reason}


dataset = [
    {"input": "I love this!", "target": {"sentiment": "positive"}},
    {"input": "This is awful.", "target": {"sentiment": "negative"}},
    {"input": "It's fine.", "target": {"sentiment": "neutral"}},
]


def metric(output, target):
    return 1.0 if output["sentiment"] == target["sentiment"] else 0.0


chain = SentimentChain("configs/llm.yaml")

tuner = AutoTuner.from_component(
    component=chain,
    entrypoint=lambda c, x: c.predict(x),
    dataset=dataset,
    metric=metric,
)

tuner.include([
    "reasoner.provider_model", "reasoner.temperature",
    "labeler.provider_model", "labeler.temperature",
])

result = tuner.search(max_trials=12, mode="pareto")
result.save_best("optimized_sentiment_chain.yaml")

apply_best(chain, "optimized_sentiment_chain.yaml")
print(chain.predict("The new UI is a joy to use."))
Key Features
Multi-Provider Optimization
Automatically discover the best combination of:
- Providers: OpenAI, Gemini, Anthropic, or self-hosted (Ollama, vLLM, etc.)
- Models: GPT-4o, Gemini Pro, Claude, or self-hosted (Llama, Mistral, etc.)
- Parameters: temperature, top_p, max_tokens, web search
- Capabilities: Web search, etc.
Multiple Optimization Modes
- Pareto: Balance quality, cost, and latency (default)
- Constrained: Maximize quality within cost/latency limits
- Scalarized: Optimize weighted combination of metrics
- Quality-focused: Maximize performance regardless of cost/time
- Cost-focused: Minimize spending while meeting quality thresholds
- Speed-focused: Optimize for fastest response within quality bounds
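The Pareto, constrained, and scalarized modes correspond to standard multi-objective selection strategies. The sketch below shows those generic techniques on made-up trial results. It is not Octuner's internal implementation, and the `Trial` fields, numbers, weights, and limits are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    name: str
    quality: float  # higher is better
    cost: float     # lower is better
    latency: float  # lower is better


def dominates(a: Trial, b: Trial) -> bool:
    """a dominates b if it is no worse on every axis and strictly better on one."""
    no_worse = a.quality >= b.quality and a.cost <= b.cost and a.latency <= b.latency
    better = a.quality > b.quality or a.cost < b.cost or a.latency < b.latency
    return no_worse and better


def pareto_front(trials):
    """Keep every trial that no other trial dominates."""
    return [t for t in trials if not any(dominates(o, t) for o in trials)]


def best_constrained(trials, max_cost, max_latency):
    """Highest quality among trials inside the cost and latency limits."""
    feasible = [t for t in trials if t.cost <= max_cost and t.latency <= max_latency]
    return max(feasible, key=lambda t: t.quality) if feasible else None


def best_scalarized(trials, w_quality, w_cost, w_latency):
    """Highest weighted score: reward quality, penalize cost and latency."""
    return max(
        trials,
        key=lambda t: w_quality * t.quality - w_cost * t.cost - w_latency * t.latency,
    )


# Made-up results for four hypothetical configurations
trials = [
    Trial("A", quality=0.95, cost=4.0, latency=2.0),
    Trial("B", quality=0.90, cost=1.0, latency=1.0),
    Trial("C", quality=0.85, cost=2.0, latency=1.5),  # dominated by B
    Trial("D", quality=0.70, cost=0.2, latency=0.5),
]

print([t.name for t in pareto_front(trials)])                      # ['A', 'B', 'D']
print(best_constrained(trials, max_cost=1.5, max_latency=1.5).name)  # B
print(best_scalarized(trials, 1.0, 0.1, 0.1).name)                 # B
```

A Pareto search returns a set of trade-offs and leaves the final pick to the developer, while the constrained and scalarized strategies each reduce the problem to a single winner.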
Flexible Parameter Control
providers:
  openai:
    model_capabilities:
      gpt-4o-mini:
        supported_parameters: [temperature, top_p, max_tokens]
        parameter_ranges:
          temperature: [0.0, 2.0]
          max_tokens: [50, 4000]
        default_parameters:
          temperature: 0.7
          max_tokens: 1000
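A plausible reading of this configuration is that `supported_parameters` lists what a model accepts, `parameter_ranges` bounds the values that may be tried, and `default_parameters` gives the starting point. The sketch below mirrors the `gpt-4o-mini` entry as a Python dict and checks candidate parameters against it. `check_params` is a hypothetical helper, not an Octuner function, and real code would load the file with a YAML parser instead of a hand-written dict.

```python
# Python mirror of the gpt-4o-mini entry from the YAML config
model_config = {
    "supported_parameters": ["temperature", "top_p", "max_tokens"],
    "parameter_ranges": {"temperature": [0.0, 2.0], "max_tokens": [50, 4000]},
    "default_parameters": {"temperature": 0.7, "max_tokens": 1000},
}


def check_params(params: dict, config: dict) -> list:
    """Return a list of problems; an empty list means the params look valid."""
    problems = []
    for name, value in params.items():
        if name not in config["supported_parameters"]:
            problems.append(f"{name}: not supported")
            continue
        bounds = config["parameter_ranges"].get(name)
        if bounds is not None:
            low, high = bounds
            if not low <= value <= high:
                problems.append(f"{name}: {value} outside [{low}, {high}]")
    return problems


print(check_params(model_config["default_parameters"], model_config))
# []
print(check_params({"temperature": 2.5, "top_k": 40}, model_config))
# ['temperature: 2.5 outside [0.0, 2.0]', 'top_k: not supported']
```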
Web Search Integration
- OpenAI: Built-in web search capabilities
- Gemini: Native Google grounding tool for web context
- Tunable: Let optimization decide when web search improves performance
Learn More
- Installation & Setup - Get started quickly
- Getting Started - Complete workflow, examples, and custom provider setup
- API Reference - Complete API documentation
- Contributing - How to contribute to Octuner
Octuner helps developers build better LLM applications by systematically optimizing the trade-off between quality, cost, and latency through explicit configuration management and data-driven parameter tuning.