SHIPIT Agent

v1.0.7 — Agents for every role

New in 1.0.7: 12 new tools (GitHub · GitLab · SQL · Vision · PDF · LangSmith/OTel trace exporters · Figma · Salesforce · Stripe · Google Sheets · Zendesk · read-only LinkedIn search) and 9 new specialist personas (code-reviewer-bot · release-engineer · figma-designer · sales-rep · account-executive · sales-ops · recruiter · finance-analyst · customer-support-agent). shipit-agent now ships agents for every role, not just developers. See the changelog for the full story.

v1.0.6 — Bulletproof 24h Autopilot, Dashboard Renderer, LiteLLM Proxy

New in 1.0.6: Autopilot hardened for 24-hour runs (cumulative budgets across resume · SIGTERM-safe · end-to-end dollar tracking · corrupt-checkpoint quarantine), a new render_dashboard tool the agent drives to produce Claude-Desktop-style HTML one-pagers, and first-class LiteLLM-proxy support — plug every agent into your own proxy in three fields.

v1.0.5 — Prebuilt Agents, ShipCrew, Notifications, Cost Tracking

New in 1.0.5: 40 prebuilt agent personas, ShipCrew DAG orchestration, Slack/Discord/Telegram notifications, and cost tracking with budgets. See the changelog for the verified notebook and test coverage shipped in this repo.

SHIPIT Agent is a standalone Python agent library focused on a clean runtime:

  • bring your own LLM — or use any of seven built-in provider adapters
  • attach Python tools, remote MCP servers, or connector-style third-party tools (Gmail, Drive, Slack, Linear, Notion, Jira, Confluence)
  • attach packaged or custom skills to steer agent behavior and reusable workflows
  • iterate tool-using agents with configurable retry and router policies
  • stream structured events (including reasoning / thinking blocks) as they happen
  • inspect every step: reasoning, tool arguments, tool outputs, retries, final answer
  • compose reusable agent profiles with system prompts and tool selections locked in
  • keep clean boundaries between runtime, tools, MCP, policies, and profiles

Built for developers who want the agent loop observable, interchangeable, and out of the way.


Install

pip install shipit-agent

With optional extras:

pip install 'shipit-agent[openai]'         # OpenAI SDK
pip install 'shipit-agent[anthropic]'      # Anthropic SDK (native thinking blocks)
pip install 'shipit-agent[litellm]'        # LiteLLM (Bedrock, Gemini, Groq, Together, …)
pip install 'shipit-agent[playwright]'     # In-process browser for open_url and web_search
pip install 'shipit-agent[all]'            # Everything

30-second example

from shipit_agent import Agent
from shipit_agent.llms import OpenAIChatLLM

agent = Agent.with_builtins(llm=OpenAIChatLLM(model="gpt-4o-mini"))

for event in agent.stream("Search the web for today's Bitcoin price in USD."):
    print(event.type, event.message)

Emits events like:

run_started           Agent run started
step_started          LLM completion started
reasoning_started     🧠 Model reasoning started
reasoning_completed   🧠 Model reasoning completed
tool_called           Tool called: web_search
tool_completed        Tool completed: web_search
run_completed         Agent run completed
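
A UI usually routes these events by type. The sketch below is illustrative only: it assumes each event exposes the type and message attributes used in the example above, and it feeds in stand-in events built with SimpleNamespace instead of a live agent.stream() call. The name summarize_events is hypothetical and not part of shipit-agent.

```python
from types import SimpleNamespace


def summarize_events(events):
    """Split an event stream into reasoning activity, tool calls, and completion."""
    summary = {"reasoning_steps": 0, "tool_calls": [], "completed": False}
    for event in events:
        if event.type == "reasoning_completed":
            summary["reasoning_steps"] += 1
        elif event.type == "tool_called":
            # Assumes the "Tool called: <name>" message format shown above.
            summary["tool_calls"].append(event.message.partition(": ")[2])
        elif event.type == "run_completed":
            summary["completed"] = True
    return summary


# Stand-in events mirroring the sample output; a real run would pass
# agent.stream(...) here instead.
sample_events = [
    SimpleNamespace(type="run_started", message="Agent run started"),
    SimpleNamespace(type="reasoning_started", message="Model reasoning started"),
    SimpleNamespace(type="reasoning_completed", message="Model reasoning completed"),
    SimpleNamespace(type="tool_called", message="Tool called: web_search"),
    SimpleNamespace(type="tool_completed", message="Tool completed: web_search"),
    SimpleNamespace(type="run_completed", message="Agent run completed"),
]

print(summarize_events(sample_events))
```

Because the function only iterates, it should work equally well on a list or on a generator of events, provided the real event objects carry the same two attributes.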

Why SHIPIT Agent

  • Live reasoning events


    Extended thinking blocks from o1/o3/gpt-5/Claude/gpt-oss are automatically extracted and streamed as reasoning_started / reasoning_completed events. Your UI can show a live "Thinking" panel for free.

    Reasoning guide

  • Truly incremental streaming


    agent.stream() runs the agent on a background thread and yields events through a queue as they happen. Works in Jupyter, VS Code, WebSocket, SSE, and terminals.

    Streaming guide

  • Bulletproof Bedrock tool pairing


    Every toolUse gets a paired toolResult. Planner output is injected as user context, not orphan tool-results. Hallucinated tool names get synthetic error results. Multi-iteration Bedrock loops just work.

    Architecture

  • Semantic tool discovery


    tool_search lets the agent ask "which tool should I use for X?" and get a ranked shortlist. No more 28-tool context bloat, no more tool hallucinations.

    Tool search guide

  • Zero-friction provider switching


    Edit one line in .env (SHIPIT_LLM_PROVIDER=openai) and build_llm_from_env() does the rest. Seven providers supported out of the box.

    Environment setup

  • Playwright-powered open_url


    In-process Chromium fetches JS-rendered pages with a realistic UA, handles anti-bot 503s, and falls back to stdlib urllib if Playwright isn't installed. No external scraper services.

    Prebuilt tools

  • Parallel tool execution


    When the LLM returns multiple tool calls, run them concurrently with parallel_tool_execution=True. Results stay in order. Typically 2-3x faster for multi-tool turns.

    Parallel execution guide

  • Hooks & middleware


    AgentHooks with @on_before_llm, @on_after_llm, @on_before_tool, @on_after_tool for cost tracking, rate limiting, content filtering, and guardrails. No subclassing.

    Hooks guide

  • Async runtime


    AsyncAgentRuntime with async run() and async stream() for FastAPI, Starlette, and modern async Python. Same features as the sync runtime.

    Async guide

  • Graceful error recovery


    Tool failures produce error messages instead of crashing the run. The LLM sees the error and can try a different approach. Safer retry defaults avoid retrying failures caused by bugs rather than transient errors.

    Error recovery guide
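
The background-thread-plus-queue pattern described under "Truly incremental streaming" can be sketched with the standard library alone. This is a minimal illustration of the technique, not the shipit-agent implementation: Event, stream_events, fake_agent_run, and the run_failed event type are all hypothetical names introduced for the example.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class Event:
    type: str
    message: str


_DONE = object()  # sentinel marking the end of the stream


def stream_events(run: Callable[[Callable[[Event], None]], None]) -> Iterator[Event]:
    """Run `run(emit)` on a background thread and yield events as they arrive."""
    events: queue.Queue = queue.Queue()

    def worker() -> None:
        try:
            run(events.put)  # the run reports progress by calling emit(event)
        except Exception as exc:
            # Hypothetical event type: surface the failure to the consumer.
            events.put(Event("run_failed", str(exc)))
        finally:
            events.put(_DONE)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = events.get()  # blocks until the worker emits something
        if item is _DONE:
            return
        yield item


def fake_agent_run(emit: Callable[[Event], None]) -> None:
    """Stand-in for an agent loop; emits three events."""
    emit(Event("run_started", "Agent run started"))
    emit(Event("tool_called", "Tool called: web_search"))
    emit(Event("run_completed", "Agent run completed"))


for event in stream_events(fake_agent_run):
    print(event.type, event.message)
```

The consumer sees each event as soon as the worker puts it on the queue, which is what lets the same loop drive a notebook cell, a terminal, or a WebSocket handler.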
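
For "Parallel tool execution", the key property is that results come back in call order even when the calls finish out of order. A minimal sketch of that idea using concurrent.futures follows; run_tool_calls and the two lookup tools are hypothetical, and the runtime's own parallel_tool_execution option may be implemented differently.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_tool_calls(tool_calls, tools, parallel=True):
    """Execute (name, kwargs) tool calls and return results in call order."""

    def call(tool_call):
        name, kwargs = tool_call
        return tools[name](**kwargs)

    if not parallel or len(tool_calls) < 2:
        return [call(tc) for tc in tool_calls]
    with ThreadPoolExecutor(max_workers=len(tool_calls)) as pool:
        # Executor.map yields results in input order, regardless of which
        # call finishes first.
        return list(pool.map(call, tool_calls))


def slow_lookup(query):
    time.sleep(0.2)  # simulate a slow network call
    return f"slow:{query}"


def fast_lookup(query):
    return f"fast:{query}"


tools = {"slow_lookup": slow_lookup, "fast_lookup": fast_lookup}
calls = [("slow_lookup", {"query": "a"}), ("fast_lookup", {"query": "b"})]
print(run_tool_calls(calls, tools))  # ['slow:a', 'fast:b']
```

Threads suit this case because tool calls are usually I/O-bound; the speed-up depends on how long the slowest call in the turn takes.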
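
The pairing rule from "Bulletproof Bedrock tool pairing" and the behavior from "Graceful error recovery" combine naturally: every tool request gets exactly one result, and unknown tools or raised exceptions become error results the model can read. The sketch below assumes dictionaries shaped loosely like the toolUse and toolResult blocks of the Bedrock Converse API; pair_tool_results is a hypothetical helper, not the library's code.

```python
def pair_tool_results(tool_uses, tools):
    """Return exactly one toolResult per toolUse, in the same order."""
    results = []
    for use in tool_uses:
        tool = tools.get(use["name"])
        if tool is None:
            # Hallucinated tool name: answer with a synthetic error result.
            text, status = f"Unknown tool: {use['name']}", "error"
        else:
            try:
                text, status = str(tool(**use["input"])), "success"
            except Exception as exc:
                # Tool failure: report it to the model instead of raising.
                text, status = f"{type(exc).__name__}: {exc}", "error"
        results.append(
            {
                "toolResult": {
                    "toolUseId": use["toolUseId"],
                    "content": [{"text": text}],
                    "status": status,
                }
            }
        )
    return results


demo_tools = {"add": lambda a, b: a + b, "divide": lambda a, b: a / b}
demo_uses = [
    {"toolUseId": "t1", "name": "add", "input": {"a": 2, "b": 3}},
    {"toolUseId": "t2", "name": "divide", "input": {"a": 1, "b": 0}},
    {"toolUseId": "t3", "name": "no_such_tool", "input": {}},
]

for item in pair_tool_results(demo_uses, demo_tools):
    result = item["toolResult"]
    print(result["toolUseId"], result["status"], result["content"][0]["text"])
```

Since the loop appends one result per request no matter which branch runs, the next model turn never sees a tool request without a matching result.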



Try it now — runnable examples

The repo ships with 7 numbered, copy-pasteable examples covering every major feature. Pick one and run it in 30 seconds.

# | What                                                          | Run
1 | Hello, agent. The shortest possible runnable example          | python examples/01_hello_agent.py
2 | Live streaming with colored reasoning events                  | python examples/02_streaming_with_reasoning.py
3 | Same agent, 5 different LLM providers back-to-back            | python examples/03_provider_swap.py
4 | End-to-end research workflow with web search + URL fetching   | python examples/04_research_agent.py "your question"
5 | Custom tools — function-style and class-style                 | python examples/05_custom_tool.py
6 | Persistent chat session with file-backed memory               | python examples/06_chat_session.py
7 | Semantic tool discovery with tool_search                      | python examples/07_tool_search.py

See the full examples README →


Provider compatibility matrix

Provider                              | Reasoning blocks                         | Tool calling | Streaming | Bedrock pairing | Built-in tools
OpenAI (o1, o3, o4, gpt-5)            | ✅ Native                                |              |           | n/a             |
OpenAI (gpt-4o, gpt-4o-mini)          |                                          |              |           | n/a             |
Anthropic (claude-opus-4, claude-3.7) | ✅ Native (with thinking_budget_tokens)  |              |           | n/a             |
AWS Bedrock (gpt-oss-120b)            | ✅ Via LiteLLM                           |              |           | ✅ Bulletproof  |
AWS Bedrock (anthropic.claude-*)      | ✅ Via LiteLLM                           |              |           | ✅ Bulletproof  |
Google Gemini (gemini-1.5-pro)        |                                          |              |           | n/a             |
Google Vertex AI                      |                                          |              |           | n/a             |
Groq (llama-3.3-70b)                  |                                          |              |           | n/a             |
Together AI                           |                                          |              |           | n/a             |
Ollama (local)                        |                                          |              |           | n/a             |
DeepSeek R1 (via LiteLLM proxy)       | ✅ Native                                |              |           | n/a             |
LiteLLM Proxy (self-hosted gateway)   | ✅ Pass-through                          |              |           | n/a             |

Tip: if you want a "Thinking" panel UI without paying for o1/Claude, AWS Bedrock's openai.gpt-oss-120b-1:0 is the cheapest reasoning-capable model in the matrix and works out of the box with Agent.with_builtins(llm=BedrockChatLLM()).


What you get vs. what you don't

✅ shipit-agent does                            | ❌ shipit-agent does NOT do
Run agents with tools, MCP, memory, sessions    | Train models or fine-tune
Stream events incrementally as they happen      | Provide a hosted control plane
Extract reasoning blocks from any provider      | Replace LangChain / LangGraph / CrewAI wholesale
Guarantee Bedrock tool-pairing correctness      | Manage your cloud infrastructure
Support 9 LLM providers via one API             | Lock you into a specific vendor
Ship with 28+ built-in tools                    | Force you to use any of them
Stay out of your way (small, focused runtime)   | Hide the agent loop behind abstractions

This is a library, not a framework. The runtime is small enough to read in one sitting (shipit_agent/runtime.py is under 400 lines). Bring your own LLM, tools, and storage; the runtime composes them and gets out of the way.