The Claude API has matured considerably. What was once a straightforward text-in, text-out endpoint now encompasses streaming, tool use, a 1M token context window, prompt caching that can cut costs by 90%, a 50%-off Batch API, an Agent SDK that runs autonomous loops, and structured outputs. The documentation has grown to match.
The result is a powerful surface area — and a real onboarding challenge. Developers new to the API can land in the wrong part of the docs, pick the wrong model tier, or miss cost levers that would save them thousands of dollars a month at scale.
This guide cuts straight to what you need: the setup, the core patterns you'll use in every integration, and the advanced features worth knowing before you go to production.
Setup: API Key, SDK, First Call
Start at platform.claude.com — this is the developer console where you create API keys, monitor usage, and manage workspaces. New accounts receive $5 in free credits, no credit card required.
Anthropic provides official client SDKs in Python, TypeScript, Java, Go, Ruby, C#, and PHP. Each SDK provides idiomatic interfaces, type safety, and built-in support for features like streaming, retries, and error handling.
Install the Python SDK:
pip install anthropic
Store your key as an environment variable — never hardcode it:
export ANTHROPIC_API_KEY="sk-ant-api03-..."
Your first call takes just a few lines:
import anthropic

client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude"}]
)
print(message.content[0].text)
The SDK reads ANTHROPIC_API_KEY from the environment automatically. The messages array is the conversation: a list of role/content objects. The model field selects your model tier.
Which model to use: Opus 4.6 is best for complex analysis, coding, and creative tasks requiring deep reasoning. Sonnet 4.6 is the ideal balance of intelligence and speed for most production workloads. Haiku 4.5 provides lightning-fast responses for high-volume, latency-sensitive applications. For most new integrations, start with Sonnet 4.6 — it's the sweet spot of capability and cost, and you can scale up to Opus or down to Haiku once you understand your workload.

Core Patterns: System Prompts, Multi-Turn, Streaming
System prompts are how you set Claude's behavior across the entire conversation — persona, constraints, output format, domain context. Pass them as a top-level system parameter, not as a user message:
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a concise technical writer. Respond in plain English, no jargon.",
    messages=[{"role": "user", "content": "Explain rate limiting."}]
)
Multi-turn conversations require you to manage the message history yourself — the API is stateless. Pass the full conversation array on each call:
messages = [
    {"role": "user", "content": "What is Docker?"},
    {"role": "assistant", "content": "Docker is a containerization platform..."},
    {"role": "user", "content": "How does it relate to Kubernetes?"}
]
Each call sends the complete history. This means your context window is your state store — and token costs accumulate with every turn.
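Because the API is stateless, a thin helper that owns the history list is often all the state management you need. A minimal sketch — `append_turn` and `history` are illustrative names, not SDK APIs:

```python
# Minimal client-side conversation state; the API stores nothing between
# calls, so the full history travels with every request.

def append_turn(history, role, content):
    """Append one role/content message and return the updated history."""
    history.append({"role": role, "content": content})
    return history

history = []
append_turn(history, "user", "What is Docker?")
append_turn(history, "assistant", "Docker is a containerization platform...")
append_turn(history, "user", "How does it relate to Kubernetes?")

# The whole list is passed as messages=history on the next call,
# so per-call token cost grows with conversation length.
```

Truncating or summarizing old turns is your responsibility too — a common pattern is to summarize early turns once the history approaches your token budget.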
Streaming is the pattern you want for any user-facing application. Without it, users stare at a spinner while the model generates a full response. With it, tokens appear as they're generated:
with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain async/await"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
At the HTTP level, setting "stream": true on a Messages request streams the response incrementally via server-sent events (SSE). The Python and TypeScript SDKs offer several ways to consume that stream; the .stream() helper shown above is the cleanest path for most use cases.
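Under the hood, a stream is a sequence of typed SSE events, and text arrives in content_block_delta events. A sketch that reassembles text from event dicts shaped like the ones the API emits — the events list here is hand-written sample data, not a live response:

```python
# Hand-written sample events mimicking the SSE event shapes the API sends;
# for a text response, only content_block_delta events carry text.
events = [
    {"type": "message_start"},
    {"type": "content_block_start"},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hel"}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "lo"}},
    {"type": "content_block_stop"},
    {"type": "message_stop"},
]

# Reassemble the streamed text by filtering for text deltas.
text = "".join(
    e["delta"]["text"]
    for e in events
    if e["type"] == "content_block_delta" and e["delta"].get("type") == "text_delta"
)
```

This is essentially what the SDK's text_stream helper does for you, which is why the helper is usually the right choice.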
Tool Use: Giving Claude Access to Real Data
Tool use (also called function calling) is how you connect Claude to external systems — APIs, databases, calculators, search engines. You define the tools, Claude decides when to call them, you execute the calls, and Claude incorporates the results.
Client tools are specified in the tools top-level parameter of the API request. Each tool definition includes a name, description, and input schema.
tools = [{
    "name": "get_weather",
    "description": "Get current weather for a city",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"}
        },
        "required": ["city"]
    }
}]
The response loop: call the API with tools defined, check if stop_reason == "tool_use", execute the tool, append the result to messages, and call the API again. The tool use documentation includes the complete turn-by-turn pattern with code examples.
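That loop in sketch form, with a hand-rolled response dict standing in for the real API response object and run_tool standing in for your own execution layer — both are illustrative, not SDK names:

```python
def run_tool(name, args):
    # Hypothetical local implementation of the get_weather tool.
    if name == "get_weather":
        return {"city": args["city"], "temp_c": 21, "conditions": "clear"}
    raise ValueError(f"unknown tool: {name}")

def tool_use_turn(response, messages):
    """Handle one tool_use stop: execute the tool, append the result."""
    if response["stop_reason"] != "tool_use":
        return messages  # final answer; nothing to execute
    block = next(b for b in response["content"] if b["type"] == "tool_use")
    messages.append({"role": "assistant", "content": response["content"]})
    messages.append({
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": str(run_tool(block["name"], block["input"])),
        }],
    })
    return messages  # ready to send back to the API

# A hand-written response dict standing in for a real API response:
fake_response = {
    "stop_reason": "tool_use",
    "content": [{"type": "tool_use", "id": "toolu_01",
                 "name": "get_weather", "input": {"city": "Paris"}}],
}
messages = tool_use_turn(fake_response,
                         [{"role": "user", "content": "Weather in Paris?"}])
```

The key invariant: the assistant's tool_use content goes back into the history verbatim, and the tool_result must reference the matching tool_use_id.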
For production systems with multiple tools, the SDK now includes a Tool Runner in beta that handles this loop automatically — worth evaluating before building a custom orchestration layer.
Production Cost Levers: Caching and Batching
These two features have the highest ROI for any team using the API at scale, and most developers discover them too late.
Prompt caching lets you reuse expensive context — system prompts, long documents, tool definitions — across API calls, with cache reads billed at roughly 10% of the normal input token price (writing to the cache costs slightly more than a normal input token). Caches are isolated per workspace, ensuring data separation between workspaces within the same organization.
Enable it by adding a cache_control marker to the content block you want cached — typically the end of your system prompt or tool definitions; everything up to that breakpoint is cached and reused on subsequent calls. For a 3,000-token system prompt making 10,000 daily requests at Sonnet pricing, the difference between uncached and cached is roughly $2,700/month vs. $270/month. That's a decision worth making in your first week.
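A sketch of what that looks like in a request body, plus the arithmetic behind those numbers. Prices here assume Sonnet-class input at $3 per million tokens and cached reads at a tenth of that — verify against current pricing before relying on them:

```python
# Imagine a ~3,000-token system prompt (truncated placeholder here).
LONG_SYSTEM_PROMPT = "You are a support agent for ExampleCo. Policies: ..."

# Marking the system block cacheable: everything up to this breakpoint
# is cached and reused on subsequent requests.
request = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "system": [{
        "type": "text",
        "text": LONG_SYSTEM_PROMPT,
        "cache_control": {"type": "ephemeral"},  # cache breakpoint
    }],
    "messages": [{"role": "user", "content": "Reset my password."}],
}

def monthly_input_cost(tokens_per_call, calls_per_day, price_per_mtok):
    """30-day input-token cost in dollars, ignoring cache-write surcharge."""
    return tokens_per_call * calls_per_day * 30 * price_per_mtok / 1_000_000

uncached = monthly_input_cost(3_000, 10_000, 3.00)  # full input price
cached = monthly_input_cost(3_000, 10_000, 0.30)    # 10% cached-read price
```

The system prompt's cost drops by an order of magnitude; the per-turn user messages still bill at the full rate.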
The Batches API offers significant cost savings — all usage is charged at 50% of standard API prices. The tradeoff is latency: batch requests complete within 24 hours (typically much faster), making this ideal for any workload that doesn't need real-time responses. Document processing, content generation pipelines, evaluation runs, nightly analysis jobs — all prime candidates. The pricing discounts from prompt caching and Message Batches can stack, providing even greater cost savings when both features are used together. Combined, these two levers can reduce costs by up to 95% versus naive API usage.
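A sketch of assembling a batch: each entry pairs a custom_id (so you can match results back to inputs) with an ordinary Messages request body. The docs dict and the summarization prompt are illustrative:

```python
# Hypothetical documents to summarize overnight, keyed by our own IDs.
docs = {
    "doc-001": "First quarterly report text...",
    "doc-002": "Second quarterly report text...",
}

# Each batch item wraps standard Messages API params under a custom_id.
batch_requests = [
    {
        "custom_id": doc_id,
        "params": {
            "model": "claude-sonnet-4-6",
            "max_tokens": 1024,
            "messages": [
                {"role": "user", "content": f"Summarize:\n\n{text}"}
            ],
        },
    }
    for doc_id, text in docs.items()
]
```

With the SDK, a list like this is submitted via the batches endpoint, polled until processing ends, and results are retrieved by custom_id — the request shape is the same Messages payload you already know.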

The Agent SDK: When You Need Autonomous Loops
For applications that go beyond single-turn or multi-turn chat — tasks that require reading files, running commands, making decisions across multiple steps — the Agent SDK is a higher-level abstraction built on the same tools powering Claude Code.
from claude_agent_sdk import query, ClaudeAgentOptions
import asyncio

async def main():
    options = ClaudeAgentOptions(
        allowed_tools=["Read", "Edit", "Bash"],
        permission_mode="acceptEdits"
    )
    async for message in query(
        prompt="Find and fix the bug in auth.py",
        options=options
    ):
        print(message)

asyncio.run(main())
The Agent SDK includes built-in tools for reading files, running commands, and editing code, so your agent can start working immediately without you implementing tool execution. Everything that makes Claude Code powerful is available in the SDK.
The key distinction: the Messages API requires you to build and manage the agentic loop yourself. The Agent SDK provides it ready-made. For most agentic use cases — autonomous code review, file processing pipelines, research agents — the SDK is the faster path to production.
What This Means
The Claude API is not a single tool — it's a family of surfaces that serve different architectural needs. The Messages API is your core building block: fast, flexible, fine-grained control. Streaming is the default for anything user-facing. Tool use is how you ground Claude in real data. Prompt caching and the Batch API are not optional optimizations — they're the difference between sustainable and unsustainable unit economics at scale. The Agent SDK is the shortcut for applications that need autonomous, multi-step execution.
The practical progression: start with a basic Messages API call, add streaming immediately, implement prompt caching once your system prompt stabilizes, and only introduce tool use and the Agent SDK when your use case genuinely requires them. Complexity compounds quickly in AI systems; introduce it in response to demonstrated need, not in anticipation of it.
For the full, always-current reference — including model string names, rate limit tiers, and SDK-specific patterns — the official Claude API documentation is the authoritative source. Bookmark it; it updates frequently.