SHIPIT Agent

v1.0.7 — Agents for every role

New in 1.0.7: 12 new tools (GitHub · GitLab · SQL · Vision · PDF · LangSmith/OTel trace exporters · Figma · Salesforce · Stripe · Google Sheets · Zendesk · read-only LinkedIn search) and 9 new specialist personas (code-reviewer-bot · release-engineer · figma-designer · sales-rep · account-executive · sales-ops · recruiter · finance-analyst · customer-support-agent). shipit-agent now ships agents for every role, not just developers. See the changelog for the full story.

v1.0.6 — Bulletproof 24h Autopilot, Dashboard Renderer, LiteLLM Proxy

New in 1.0.6: Autopilot hardened for 24-hour runs (cumulative budgets across resume · SIGTERM-safe · end-to-end dollar tracking · corrupt-checkpoint quarantine), a new render_dashboard tool the agent drives to produce Claude-Desktop-style HTML one-pagers, and first-class LiteLLM-proxy support — plug every agent into your own proxy in three fields.

v1.0.5 — Prebuilt Agents, ShipCrew, Notifications, Cost Tracking

New in 1.0.5: 40 prebuilt agent personas, ShipCrew DAG orchestration, Slack/Discord/Telegram notifications, and cost tracking with budgets. See the changelog for the verified notebook and test coverage shipped in this repo.

SHIPIT Agent is a standalone Python agent library focused on a clean runtime:

bring your own LLM — or use any of seven built-in provider adapters
attach Python tools, remote MCP servers, or connector-style third-party tools (Gmail, Drive, Slack, Linear, Notion, Jira, Confluence)
attach packaged or custom skills to steer agent behavior and reusable workflows
iterate tool-using agents with configurable retry and router policies
stream structured events (including reasoning / thinking blocks) as they happen
inspect every step: reasoning, tool arguments, tool outputs, retries, final answer
compose reusable agent profiles with system prompts and tool selections locked in
keep clean boundaries between runtime, tools, MCP, policies, and profiles

Built for developers who want the agent loop observable, interchangeable, and out of the way.

Install

pip install shipit-agent

With optional extras:

pip install 'shipit-agent[openai]'         # OpenAI SDK
pip install 'shipit-agent[anthropic]'      # Anthropic SDK (native thinking blocks)
pip install 'shipit-agent[litellm]'        # LiteLLM (Bedrock, Gemini, Groq, Together, …)
pip install 'shipit-agent[playwright]'     # In-process browser for open_url and web_search
pip install 'shipit-agent[all]'            # Everything

30-second example

from shipit_agent import Agent
from shipit_agent.llms import OpenAIChatLLM

agent = Agent.with_builtins(llm=OpenAIChatLLM(model="gpt-4o-mini"))

for event in agent.stream("Search the web for today's Bitcoin price in USD."):
    print(event.type, event.message)

Emits events like:

run_started           Agent run started
step_started          LLM completion started
reasoning_started     🧠 Model reasoning started
reasoning_completed   🧠 Model reasoning completed
tool_called           Tool called: web_search
tool_completed        Tool completed: web_search
run_completed         Agent run completed

Why SHIPIT Agent

Live reasoning events

Extended thinking blocks from o1/o3/gpt-5/Claude/gpt-oss are automatically extracted and streamed as reasoning_started / reasoning_completed events. Your UI can show a live "Thinking" panel for free.
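As a rough sketch of how a UI might consume these events, the loop below toggles a "Thinking" indicator on reasoning_started and reasoning_completed. The Event dataclass and the hard-coded sample list are stand-ins (assumptions, not library types) so the snippet runs without an API key; with the library installed, the same loop would presumably iterate over agent.stream(...) instead.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Stand-in for a streamed event; only `type` and `message` are assumed."""
    type: str
    message: str


def render_thinking_panel(events):
    """Return the lines a simple 'Thinking' panel would display."""
    lines = []
    thinking = False
    for event in events:
        if event.type == "reasoning_started":
            thinking = True
            lines.append("[thinking...]")
        elif event.type == "reasoning_completed":
            thinking = False
            lines.append("[done thinking]")
        elif not thinking:
            # Everything outside a reasoning block goes to the main log.
            lines.append(f"{event.type}: {event.message}")
    return lines


# Hard-coded sample mirroring the event names shown in the 30-second example.
sample = [
    Event("run_started", "Agent run started"),
    Event("reasoning_started", "Model reasoning started"),
    Event("reasoning_completed", "Model reasoning completed"),
    Event("tool_called", "Tool called: web_search"),
    Event("run_completed", "Agent run completed"),
]
panel_lines = render_thinking_panel(sample)
print("\n".join(panel_lines))
```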

Reasoning guide

Truly incremental streaming

agent.stream() runs the agent on a background thread and yields events through a queue as they happen. Works in Jupyter, VS Code, WebSocket, SSE, and terminals.
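The background-thread-plus-queue pattern described here can be sketched with the standard library alone. This is an illustration of the general technique, not the library's actual implementation; stream_events and fake_agent_run are hypothetical names.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def stream_events(run):
    """Run `run(emit)` on a background thread and yield what it emits."""
    events = queue.Queue()

    def worker():
        try:
            run(events.put)
        finally:
            events.put(_DONE)  # always unblock the consumer

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = events.get()  # blocks until the producer emits something
        if item is _DONE:
            return
        yield item


def fake_agent_run(emit):
    """Hypothetical agent loop that emits three events."""
    for name in ("run_started", "tool_called", "run_completed"):
        emit(name)


received = list(stream_events(fake_agent_run))
print(received)
```

Because the consumer only blocks on the queue, each event can be rendered as soon as the producer emits it, which is what makes the pattern usable from notebooks, WebSocket handlers, and terminals alike.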

Streaming guide

Bulletproof Bedrock tool pairing

Every toolUse gets a paired toolResult. Planner output is injected as user context rather than as orphaned tool results, and hallucinated tool names receive synthetic error results, so multi-iteration Bedrock loops keep working.
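The pairing rule can be illustrated with a small sketch. The dictionary shapes below (id, name, input, tool_use_id, status, content) are simplified assumptions for illustration and are not the exact Bedrock or shipit-agent message formats.

```python
def pair_tool_results(tool_calls, tools):
    """Return exactly one result per tool call, in call order.

    `tool_calls` is a list of dicts with `id`, `name`, and `input` keys;
    `tools` maps tool names to callables. Both shapes are assumptions.
    """
    results = []
    for call in tool_calls:
        tool = tools.get(call["name"])
        if tool is None:
            # Hallucinated tool name: answer with a synthetic error result
            # so the call is never left unpaired.
            results.append({
                "tool_use_id": call["id"],
                "status": "error",
                "content": f"Unknown tool: {call['name']}",
            })
            continue
        results.append({
            "tool_use_id": call["id"],
            "status": "success",
            "content": str(tool(**call["input"])),
        })
    return results


tools = {"add": lambda a, b: a + b}
calls = [
    {"id": "t1", "name": "add", "input": {"a": 2, "b": 3}},
    {"id": "t2", "name": "does_not_exist", "input": {}},
]
paired = pair_tool_results(calls, tools)
print(paired)
```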

Architecture

Semantic tool discovery

tool_search lets the agent ask "which tool should I use for X?" and get a ranked shortlist, which avoids loading all 28 tool schemas into context and cuts down on hallucinated tool names.
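To make the idea of a ranked shortlist concrete, here is a deliberately simple sketch. The source does not describe how tool_search scores tools, so this uses plain word overlap as a stand-in for semantic ranking; rank_tools and the tool descriptions are hypothetical.

```python
import re


def rank_tools(query, tools, limit=3):
    """Rank tools by how many query words appear in their description."""
    query_words = set(re.findall(r"[a-z0-9]+", query.lower()))
    scored = []
    for name, description in tools.items():
        words = set(re.findall(r"[a-z0-9]+", description.lower()))
        score = len(query_words & words)
        if score:
            scored.append((score, name))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]


# Descriptions are made up for this example.
catalog = {
    "web_search": "search the web for pages matching a query",
    "open_url": "fetch and read the contents of a web page url",
    "render_dashboard": "render an html dashboard from structured data",
}
shortlist = rank_tools("search the web for news", catalog)
print(shortlist)
```

Only the shortlisted tools then need their full schemas placed in the prompt, which is the context saving the feature is aiming for.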

Tool search guide

Zero-friction provider switching

Edit one line in .env — SHIPIT_LLM_PROVIDER=openai — and build_llm_from_env() does the rest. Seven providers supported out of the box.
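The environment-driven switch can be pictured as a lookup from SHIPIT_LLM_PROVIDER to a factory. The sketch below shows that pattern only; the factories, the registry, and the "openai" default are assumptions, and the real build_llm_from_env() may behave differently.

```python
import os


# Hypothetical factories standing in for real adapter constructors.
def make_openai():
    return "OpenAI adapter"


def make_anthropic():
    return "Anthropic adapter"


FACTORIES = {"openai": make_openai, "anthropic": make_anthropic}


def build_from_env(env=None):
    """Pick a factory from SHIPIT_LLM_PROVIDER and call it."""
    env = os.environ if env is None else env
    # Defaulting to "openai" when the variable is unset is an assumption.
    provider = env.get("SHIPIT_LLM_PROVIDER", "openai").strip().lower()
    factory = FACTORIES.get(provider)
    if factory is None:
        known = ", ".join(sorted(FACTORIES))
        raise ValueError(f"Unknown provider {provider!r}; expected one of: {known}")
    return factory()


print(build_from_env({"SHIPIT_LLM_PROVIDER": "anthropic"}))
```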

Environment setup

Playwright-powered open_url

In-process Chromium fetches JS-rendered pages with a realistic UA, handles anti-bot 503s, and falls back to stdlib urllib if Playwright isn't installed. No external scraper services.
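A minimal sketch of the "Playwright if available, otherwise urllib" fallback might look like this. It is not the open_url implementation: the function names and user-agent string are illustrative, and the anti-bot 503 handling mentioned above is left out.

```python
import importlib.util
import urllib.request

USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) ExampleFetcher/1.0"  # illustrative


def fetch_with_playwright(url):
    """Render the page in headless Chromium and return the final HTML."""
    from playwright.sync_api import sync_playwright  # imported lazily

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page(user_agent=USER_AGENT)
            page.goto(url)
            return page.content()
        finally:
            browser.close()


def fetch_with_urllib(url):
    """Plain HTTP fetch; JavaScript on the page is not executed."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def choose_fetcher(playwright_available=None):
    """Prefer Playwright when it is importable, else fall back to urllib."""
    if playwright_available is None:
        playwright_available = importlib.util.find_spec("playwright") is not None
    return fetch_with_playwright if playwright_available else fetch_with_urllib


# Prints the name of whichever fetcher this environment would use.
print(choose_fetcher().__name__)
```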

Prebuilt tools

Parallel tool execution

When the LLM returns multiple tool calls, run them concurrently with parallel_tool_execution=True. Results stay in order. Typically 2-3x faster for multi-tool turns.
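The combination of concurrent execution and ordered results can be sketched with a thread pool. This shows the general technique rather than what parallel_tool_execution=True does internally; run_tools_in_parallel and slow_echo are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_tools_in_parallel(calls):
    """Run (function, argument) pairs concurrently; results keep call order."""
    if not calls:
        return []
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        # Executor.map yields results in submission order, not finish order.
        return list(pool.map(lambda call: call[0](call[1]), calls))


def slow_echo(value):
    """Hypothetical tool: the first call is deliberately the slowest."""
    time.sleep(0.05 if value == "first" else 0.01)
    return value.upper()


results = run_tools_in_parallel([(slow_echo, "first"), (slow_echo, "second")])
print(results)
```

Even though the second call finishes first, its result still comes back second, so each result can be matched to the tool call that produced it.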

Parallel execution guide

Hooks & middleware

AgentHooks provides @on_before_llm, @on_after_llm, @on_before_tool, and @on_after_tool hooks for cost tracking, rate limiting, content filtering, and guardrails, with no subclassing required.
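The decorator-based approach can be illustrated with a tiny registry. HookRegistry below is a hypothetical stand-in written for this example; it is not the AgentHooks class, and the real decorator signatures may differ.

```python
from collections import defaultdict


class HookRegistry:
    """Illustrative registry; not the library's AgentHooks class."""

    def __init__(self):
        self._hooks = defaultdict(list)

    def on(self, stage):
        """Decorator that registers a function for a named stage."""
        def register(func):
            self._hooks[stage].append(func)
            return func
        return register

    def fire(self, stage, payload):
        """Call every hook registered for `stage`, in registration order."""
        for func in self._hooks[stage]:
            func(payload)


hooks = HookRegistry()
tool_counts = defaultdict(int)


@hooks.on("before_tool")
def count_tool_calls(payload):
    tool_counts[payload["tool"]] += 1


@hooks.on("before_tool")
def block_dangerous_tools(payload):
    if payload["tool"] == "delete_everything":  # hypothetical tool name
        raise PermissionError("tool blocked by guardrail")


hooks.fire("before_tool", {"tool": "web_search"})
hooks.fire("before_tool", {"tool": "web_search"})
print(dict(tool_counts))
```

Plain functions registered through a decorator keep concerns such as counting and guardrails separate, which is the benefit of avoiding subclassing.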

Hooks guide

Async runtime

AsyncAgentRuntime provides async run() and async stream() for FastAPI, Starlette, and other async Python code, with the same features as the sync runtime.
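For readers new to async streaming, the shape of consuming an async event stream looks roughly like this. The stream_events generator is a hypothetical stand-in for an async stream() method, so no AsyncAgentRuntime signature is assumed.

```python
import asyncio


async def stream_events(prompt):
    """Hypothetical async event source standing in for an async stream()."""
    for event_type in ("run_started", "tool_called", "run_completed"):
        await asyncio.sleep(0)  # yield control to the event loop
        yield f"{event_type}: {prompt}"


async def collect(prompt):
    """Consume the stream the way an async request handler would."""
    return [event async for event in stream_events(prompt)]


collected = asyncio.run(collect("hello"))
print(collected)
```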

Async guide

Graceful error recovery

Tool failures produce error messages instead of crashing the run, so the LLM sees the error and can try a different approach. The default retry policy avoids retrying failures that are caused by bugs rather than transient conditions.
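One way to picture this behavior: wrap each tool call, retry only errors that are plausibly transient, and turn everything else into a message for the model. The retryable set, the return shape, and the helper names below are assumptions for illustration, not the library's retry policy.

```python
TRANSIENT_ERRORS = (TimeoutError, ConnectionError)  # assumed retryable set


def call_tool_safely(tool, arguments, max_attempts=3):
    """Run a tool and always return a message the LLM can read."""
    for attempt in range(1, max_attempts + 1):
        try:
            return {"ok": True, "content": str(tool(**arguments))}
        except TRANSIENT_ERRORS as exc:
            if attempt == max_attempts:
                return {"ok": False, "content": f"Tool failed after {attempt} attempts: {exc}"}
        except Exception as exc:
            # Likely a bug or bad arguments: retrying would fail the same way.
            return {"ok": False, "content": f"Tool error: {type(exc).__name__}: {exc}"}


attempts = {"count": 0}


def flaky_tool():
    """Hypothetical tool that times out once, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise TimeoutError("upstream timed out")
    return "fresh data"


def buggy_tool(value):
    """Hypothetical tool with a programming error."""
    return value + 1  # raises TypeError when value is a string


recovered = call_tool_safely(flaky_tool, {})
failed = call_tool_safely(buggy_tool, {"value": "x"})
print(recovered)
print(failed)
```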

Error recovery guide

Next steps

Install and run the quick start — get an agent running in five minutes
Explore streaming events — understand the 14 event types and what they carry
Reasoning and thinking steps — render a live "Thinking" panel in your UI
Create a custom tool — build a new tool from scratch
Use skills — packaged skills, custom skills, Agent, and DeepAgent workflows
MCP integration — attach remote MCP servers to extend capabilities
Parallel tool execution — speed up multi-tool turns
Hooks & middleware — add cost tracking, logging, and guardrails
Async runtime — use with FastAPI and async Python
Context window management — track tokens and manage context limits
Error recovery — graceful failure handling and retry policies

Try it now: runnable examples

The repo ships with 7 numbered, copy-pasteable examples covering every major feature. Pick one and run it in 30 seconds.

#	What	Run
1	Hello, agent. The shortest possible runnable example	`python examples/01_hello_agent.py`
2	Live streaming with colored reasoning events	`python examples/02_streaming_with_reasoning.py`
3	Same agent, 5 different LLM providers back-to-back	`python examples/03_provider_swap.py`
4	End-to-end research workflow with web search + URL fetching	`python examples/04_research_agent.py "your question"`
5	Custom tools — function-style and class-style	`python examples/05_custom_tool.py`
6	Persistent chat session with file-backed memory	`python examples/06_chat_session.py`
7	Semantic tool discovery with `tool_search`	`python examples/07_tool_search.py`

See the full examples README →

Provider compatibility matrix

Provider	Reasoning blocks	Tool calling	Streaming	Bedrock pairing	Built-in tools
OpenAI (`o1`, `o3`, `o4`, `gpt-5`)	✅ Native	✅	✅	n/a	✅
OpenAI (`gpt-4o`, `gpt-4o-mini`)	❌	✅	✅	n/a	✅
Anthropic (`claude-opus-4`, `claude-3.7`)	✅ Native (with `thinking_budget_tokens`)	✅	✅	n/a	✅
AWS Bedrock (`gpt-oss-120b`)	✅ Via LiteLLM	✅	✅	✅ Bulletproof	✅
AWS Bedrock (`anthropic.claude-*`)	✅ Via LiteLLM	✅	✅	✅ Bulletproof	✅
Google Gemini (`gemini-1.5-pro`)	❌	✅	✅	n/a	✅
Google Vertex AI	❌	✅	✅	n/a	✅
Groq (`llama-3.3-70b`)	❌	✅	✅	n/a	✅
Together AI	❌	✅	✅	n/a	✅
Ollama (local)	❌	✅	✅	n/a	✅
DeepSeek R1 (via LiteLLM proxy)	✅ Native	✅	✅	n/a	✅
LiteLLM Proxy (self-hosted gateway)	✅ Pass-through	✅	✅	n/a	✅

Tip: if you want a "Thinking" panel UI without paying for o1/Claude, AWS Bedrock's openai.gpt-oss-120b-1:0 is the cheapest reasoning-capable model in the matrix and ships with Agent.with_builtins(llm=BedrockChatLLM()) out of the box.

What you get vs. what you don't

✅ shipit-agent does	❌ shipit-agent does NOT do
Run agents with tools, MCP, memory, sessions	Train models or fine-tune
Stream events incrementally as they happen	Provide a hosted control plane
Extract reasoning blocks from providers that expose them	Replace LangChain / LangGraph / CrewAI wholesale
Guarantee Bedrock tool-pairing correctness	Manage your cloud infrastructure
Support 9 LLM providers via one API	Lock you into a specific vendor
Ship with 28+ built-in tools	Force you to use any of them
Stay out of your way (small, focused runtime)	Hide the agent loop behind abstractions

This is a library, not a framework. The runtime is small enough to read in one sitting (shipit_agent/runtime.py is under 400 lines). Bring your own LLM, tools, and storage; the runtime composes them and gets out of the way.