Universal Tool Design Cheatsheet for AI Agentic Engineering
I've built a lot of AI tools. The pattern I use doesn't change whether I'm working with raw JSON, Langchain, Strands, Anthropic SDK, or Pydantic. Only the syntax varies.
The thinking is always the same: understand the problem, design the boundary, handle errors, define returns, implement. This cheatsheet is that thinking, applicable to any framework.
Use it as a reference when building tools. Use it to review other people's work. Use it to catch mistakes before they become expensive bugs.
Phase 1: Understand the Problem (Before You Write Anything)
Seriously—don't write code yet.
Answer these questions first. Write them down:
- What business problem does this tool solve? (not “what does it do”, but why does it matter?)
- Why can't Claude do this without calling a tool? (what's the gap you're filling?)
- Who will use this? (LLM? Humans? Both?)
- What data must the tool receive to make a decision?
- What data must the tool return?
Worked example: an upload_thumbnail tool for an e-commerce platform
- Problem: Product images can't go live without validation. Bad dimensions break layouts. Corrupted files break pipelines.
- Why tool: Claude can't directly access S3 or validate pixel dimensions. The system needs to.
- User: E-commerce AI assistant managing product catalogs. Or a human who needs Claude to handle uploads.
- Needs: File location, dimensions, product ID, file format metadata
- Returns: CDN URL (where image lives), thumbnail ID (for tracking), status (success/failure)
This takes 10 minutes. Skipping it costs you hours later.
Phase 2: Design the Boundary (Validate at Entry)
The boundary is where the LLM hands data to your system. Validate aggressively here.
Why? Because catching errors at the boundary is far cheaper than discovering them after they've propagated through your system.
For each parameter, ask:
- Is this required or optional? (Use the decision tree below)
- What format is valid? (enum values? regex pattern? numeric range?)
- What constraints prevent disasters? (min/max file size, date ranges, format validation)
- Can I validate this immediately? (at the boundary, not deep in processing)
Required vs. Optional: The Real Logic
This is the distinction that trips people up. Here's the actual decision:
Make it REQUIRED if:
- You can validate it at the boundary (immediately, without calling other services)
- It prevents logical errors downstream (like invoice amount mismatches)
- The LLM can reliably provide it (has access to the information)
Make it OPTIONAL if:
- It can be generated or extracted asynchronously (e.g., OCR on an image for alt_text)
- It's a nice-to-have that improves validation but isn't critical
- The LLM might not have access to it
Quick decision tree:
Is this parameter critical to prevent errors?
├─ YES → Make it REQUIRED + add constraints
│ Example: invoice_amount (catches PO mismatches before processing)
│
└─ NO → Can it be generated later?
├─ YES → OPTIONAL (compute async)
│ Example: alt_text (from image analysis after upload)
│
└─ NO → Stop. Does the LLM really need to provide this?
Maybe it's not a parameter at all.
Phase 3: Define Error States (What Can Go Wrong)
Most tool designs fail here. They define errors like: { status: "error", message: "something failed" }. That's useless.
List every way your tool can fail:
- Invalid input (user error, LLM hallucination)
- Resource not found (the thing doesn't exist)
- Permission denied (auth error)
- Service unavailable (downstream system down)
- Timeout (performance)
- Partial success (batch operation: some succeeded, some failed)
For each error state, define:
- Error code (machine-readable: SCREAMING_SNAKE_CASE)
- Human message (so the LLM understands what went wrong)
- Suggested action (what should the LLM do next?)
- Retryable: can it try again, or is the failure terminal?
Real Example: upload_thumbnail Errors
DIMENSION_MISMATCH
Message: "Image dimensions 500x400 do not match required 600x400"
Action: "Re-upload with correct dimensions or use image scaling"
Retry: Yes
PRODUCT_NOT_FOUND
Message: "Product ID 'Product-99999999' does not exist in database"
Action: "Verify product ID with user and retry"
Retry: No (need valid product ID from user)
FILE_CORRUPTED
Message: "File size mismatch: expected 524288 bytes, got 262144"
Action: "Re-upload from original source"
Retry: Yes
SIZE_EXCEEDS_LIMIT
Message: "File size (3.5 MB) exceeds maximum (2 MB)"
Action: "Compress image and retry"
Retry: Yes
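The four error states above share a shape, so in code they can come from one small helper. A hedged sketch: `error_response` and its field names are illustrative, not a fixed API.

```python
# Illustrative helper: build a structured error payload the LLM can act on.
# The function name and field names are assumptions, not a standard API.
def error_response(code: str, message: str, action: str, retryable: bool) -> dict:
    return {
        "status": "error",
        "error_code": code,          # machine-readable, SCREAMING_SNAKE_CASE
        "error_message": message,    # human-readable explanation
        "suggested_action": action,  # what the LLM should do next
        "retryable": retryable,      # can the LLM try again?
    }

resp = error_response(
    "SIZE_EXCEEDS_LIMIT",
    "File size (3.5 MB) exceeds maximum (2 MB)",
    "Compress image and retry",
    True,
)
```

A helper like this keeps every tool in the system emitting the same four fields, so the LLM never has to guess which keys to read.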
Each one tells the LLM what to do next. That's the point. Bad errors make the LLM guess.
Phase 4: Design the Return Contract (What Claude Gets Back)
Be explicit about what happens.
On success:
- What's the primary result? (what the user wanted)
- What metadata is useful? (ID, timestamp, URL)
- What can Claude do next with this result?
On failure:
- Error code (machine-readable)
- Error message (human-readable)
- Suggested action
- Retryable flag
If async:
- Job ID (for polling)
- Status (pending/processing/complete/failed)
- Polling URL
- Estimated completion time
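For the async case, a return shaped like this gives the LLM everything it needs to poll (a sketch; the field names and values are illustrative):

```json
{
  "status": "pending",
  "job_id": "JOB-20250214-XYZ789",
  "polling_url": "https://api.example.com/jobs/JOB-20250214-XYZ789",
  "estimated_completion_seconds": 30
}
```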
Critical: Make Success Explicit
Don't assume the LLM understands what happened. Be obvious:
{
"status": "success",
"thumbnail_id": "THUMB-20250214-ABC123",
"cdn_url": "https://cdn.example.com/thumbnails/...",
"alt_text": "Red running shoe, side view"
}
The LLM will key off status. Make it explicit, not implicit.
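The failure case deserves the same explicitness. A hedged sketch of the matching error shape, reusing the fields defined in Phase 3:

```json
{
  "status": "error",
  "error_code": "DIMENSION_MISMATCH",
  "error_message": "Image dimensions 500x400 do not match required 600x400",
  "suggested_action": "Re-upload with correct dimensions or use image scaling",
  "retryable": true
}
```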
Phase 5: Write the Manifest (Implementation)
Whether you're using JSON, Langchain, Pydantic, or Strands—follow this structure:
1. Name
– snake_case, verb-based
– ✅ upload_thumbnail, delete_invoice, fetch_user_data
– ❌ thumbnail, data, processor
2. Description
– 2-3 sentences: what it does, why Claude would use it, when
– Be specific, not generic
– ✅ “Upload product thumbnail to CDN. Validates 600x400px, <2MB, jpg/png/webp. Returns CDN URL.”
– ❌ “Gets data”
3. Parameters
– Type, format, constraints, description, and an example for each
– Required array: which params MUST be present?
– Constraints: min/max, enum, regex
4. Returns
– Success schema: all fields with descriptions
– Error schema: error_code, error_message, suggested_action, retryable
– Async schema (if needed): job_id, status, polling info
Implementation: Raw JSON (OpenAI Format)
{
"name": "upload_thumbnail",
"description": "Upload product thumbnail to CDN. Validates 600x400px, <2MB, jpg/png/webp. Returns CDN URL.",
"parameters": {
"type": "object",
"properties": {
"thumbnail_url": {
"type": "string",
"description": "S3 presigned URL of the thumbnail file",
"format": "uri",
"pattern": "^https://s3\\.amazonaws\\.com/.*\\.(jpg|jpeg|png|webp)$"
},
"product_id": {
"type": "string",
"description": "Product ID: 'Product-' + 6-8 digits",
"pattern": "^Product-[0-9]{6,8}$"
},
"image_width": {
"type": "integer",
"description": "Image width in pixels (must be 600)",
"minimum": 600,
"maximum": 600
}
},
"required": ["thumbnail_url", "product_id", "image_width"]
}
}
Implementation: Langchain (Python)
from langchain.tools import tool
from typing import Optional
@tool
def upload_thumbnail(
thumbnail_url: str,
product_id: str,
image_width: int,
image_height: int,
file_size_bytes: int,
file_format: str,
alt_text: Optional[str] = None
) -> dict:
"""
Upload product thumbnail to CDN.
Validates dimensions (600x400px), file size (<2MB), and format.
Returns CDN URL on success.
Args:
thumbnail_url: S3 presigned URL. Example: https://s3.amazonaws.com/thumb.jpg
product_id: Format: Product-123456 to Product-12345678
image_width: Must be exactly 600 pixels
image_height: Must be exactly 400 pixels
file_size_bytes: Between 1KB and 2MB
file_format: One of: jpg, jpeg, png, webp
alt_text: Optional accessibility text
Returns:
dict with: status, thumbnail_id (success), error_details (failure)
"""
    # Implementation omitted: validate at the boundary, then upload and
    # return the dict described in the docstring.
    raise NotImplementedError
Implementation: Anthropic SDK (Python)
upload_tool = {
"name": "upload_thumbnail",
"description": "Upload product thumbnail. Validates 600x400px, <2MB, jpg/png/webp.",
"input_schema": {
"type": "object",
"properties": {
"thumbnail_url": {
"type": "string",
"description": "S3 presigned URL"
},
"product_id": {
"type": "string",
"description": "Format: Product-123456"
},
"image_width": {
"type": "integer",
"description": "Must be 600 pixels"
},
"image_height": {
"type": "integer",
"description": "Must be 400 pixels"
}
},
"required": ["thumbnail_url", "product_id", "image_width", "image_height"]
}
}
Implementation: Pydantic (Python)
from pydantic import BaseModel, Field
from typing import Optional
class UploadThumbnailInput(BaseModel):
thumbnail_url: str = Field(
...,
description="S3 presigned URL",
pattern="^https://s3\\.amazonaws\\.com/.*"
)
product_id: str = Field(
...,
description="Format: Product-123456",
pattern="^Product-[0-9]{6,8}$"
)
image_width: int = Field(
...,
description="Must be 600 pixels",
ge=600, le=600
)
image_height: int = Field(
...,
description="Must be 400 pixels",
ge=400, le=400
)
alt_text: Optional[str] = Field(
None,
description="Optional accessibility text",
min_length=10, max_length=500
)
Validation Strategy: Boundary vs. Async
Two approaches. Know when to use each.
Boundary Validation (Immediate)
Validate at entry using constraints. Catch errors before they propagate.
When: Parameters you can check without external services
Example: invoice_amount must match PO amount range
"invoice_amount": {
"type": "number",
"minimum": 0.01,
"maximum": 999999.99,
"description": "Must match PO amount within ±tolerance"
}
Async Validation (Later)
Validate after processing. Makes sense for expensive operations (OCR, image analysis).
When: Parameters requiring computation or external services
Example: alt_text semantic validation against image content (after upload)
"alt_text": {
"type": "string",
"description": "Optional. Validated asynchronously against image."
}
Common Mistakes (Don't Do These)
❌ Vague descriptions
"gets data" instead of "Retrieves invoice history for past 90 days"
❌ Missing constraints
Unbounded string allows 100,000 character input
❌ Required parameters LLM can't provide
Making "file_hash_sha256" required when only metadata is known
❌ Useless error states
"error" instead of "PRODUCT_NOT_FOUND: verify product ID"
❌ Missing return schema
Forgetting to document what success looks like
❌ Async tools with no job IDs
"will process in background" but no way to check status
❌ No examples for complex params
Pattern without showing what's valid
❌ Computing in LLM, not system
Asking LLM for SHA-256 hash instead of extracting from file
❌ Late boundary validation
Discovering PO mismatch after days of processing
❌ State management gaps
Allowing duplicate uploads without handling overwrites
Real-World Patterns
Pattern 1: File Upload Tools
Required:
- file_url (validate S3 access immediately)
- file_format (enum: jpg, png, webp)
- file_size_bytes (validate < 10MB at boundary)
Optional:
- alt_text (generated from image async)
- metadata (extracted from file async)
Error cases:
- INVALID_URL
- SIZE_EXCEEDS_LIMIT
- UNSUPPORTED_FORMAT
- FILE_CORRUPTED
Pattern 2: Data Reconciliation Tools
Required:
- reference_id (must exist)
- amount (must match expected value ±tolerance)
- date (must be within acceptable range)
Optional:
- notes (context, not critical)
Error cases:
- RECORD_NOT_FOUND
- AMOUNT_MISMATCH
- DATE_OUT_OF_RANGE
- DUPLICATE_DETECTED
Pattern 3: Action Tools (Delete, Update)
Required:
- resource_id (must exist)
- confirm_action (true to proceed)
- reason (audit trail)
Optional:
- cascade (delete related records? yes/no)
Error cases:
- RESOURCE_NOT_FOUND
- PERMISSION_DENIED
- CONFIRMATION_REQUIRED
- CASCADING_FAILED
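The confirmation gate in Pattern 3 can be sketched directly. Hypothetical names throughout; a real handler would also check existence, permissions, and write the audit trail.

```python
# Hypothetical action-tool handler: refuse to act without explicit confirmation.
def delete_resource(resource_id: str, confirm_action: bool, reason: str,
                    cascade: bool = False) -> dict:
    if not confirm_action:
        return {
            "status": "error",
            "error_code": "CONFIRMATION_REQUIRED",
            "error_message": f"Deleting {resource_id} requires confirm_action=true",
            "suggested_action": "Confirm with the user, then retry with confirm_action=true",
            "retryable": True,
        }
    # ... existence check, permission check, audit log using `reason` ...
    return {"status": "success", "deleted": resource_id, "cascaded": cascade}

blocked = delete_resource("RES-123", False, "cleanup")
done = delete_resource("RES-123", True, "cleanup")
```

The gate turns a destructive action into a two-step protocol: the LLM must surface the decision to the user before the system will act.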
The Quick Checklist
Before considering a tool “done”:
□ Name is clear and actionable (verb-based)
□ Description explains the why, not just the what
□ Required parameters prevent logical errors
□ Constraints prevent invalid inputs
□ Error cases are comprehensive (not just "error")
□ Error messages tell LLM what to do next
□ Return schema is complete (success + error + async)
□ Complex parameters have examples
□ Validation happens at boundary
□ Async operations return job IDs + polling
□ No vague descriptions
□ No unbounded strings/integers
□ No required params LLM can't reliably provide
□ Error codes clearly indicate next steps
□ Return fields are documented
When to Add Parameters vs. Handle Internally
Add as Parameter If:
- The LLM should decide this value
- Different values change behavior
- You want boundary validation
- It affects business logic
Handle Internally If:
- Only the system decides (backend concern)
- It's implementation detail (encryption, compression)
- It's derived from other parameters
- It's infrastructure config (database ID, S3 bucket)
Examples
✅ Parameter: po_number (LLM decides which PO to reconcile)
❌ Parameter: database_id (system decides internally)
✅ Parameter: invoice_amount (LLM provides, system validates)
❌ Parameter: encrypted_at_rest (backend concern)
✅ Parameter: file_url (LLM knows where file is)
❌ Parameter: s3_bucket_name (hardcoded in backend)
Quick Reference: Parameter Types
| Type | Constraints | Example |
|---|---|---|
| string | minLength, maxLength, pattern, enum | “user@example.com” |
| integer | minimum, maximum, enum | 42 |
| number | minimum, maximum | 3.14 |
| boolean | (none) | true |
| array | minItems, maxItems, items schema | [1, 2, 3] |
| object | properties, required | {“name”: “John”} |
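As a JSON Schema fragment, a few of those constraints in combination (values illustrative):

```json
{
  "tags": {
    "type": "array",
    "items": {"type": "string", "maxLength": 30},
    "minItems": 1,
    "maxItems": 10
  },
  "priority": {
    "type": "integer",
    "minimum": 1,
    "maximum": 5
  }
}
```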
Quick Reference: Standard Error Codes
INPUT_ERRORS:
INVALID_FORMAT
MISSING_REQUIRED_FIELD
VALUE_OUT_OF_RANGE
DATA_ERRORS:
NOT_FOUND
ALREADY_EXISTS
DUPLICATE_DETECTED
AUTH_ERRORS:
PERMISSION_DENIED
UNAUTHORIZED
ACCESS_REVOKED
SYSTEM_ERRORS:
SERVICE_UNAVAILABLE
TIMEOUT
INTERNAL_ERROR
BUSINESS_LOGIC_ERRORS:
AMOUNT_MISMATCH
STATE_INVALID_FOR_TRANSITION
QUOTA_EXCEEDED
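If you centralize these codes in code, a string-valued enum keeps them machine-readable and typo-proof. A sketch; the grouping comments simply mirror the taxonomy above.

```python
from enum import Enum

# Central registry of error codes, mirroring the taxonomy above.
class ErrorCode(str, Enum):
    # Input errors
    INVALID_FORMAT = "INVALID_FORMAT"
    MISSING_REQUIRED_FIELD = "MISSING_REQUIRED_FIELD"
    VALUE_OUT_OF_RANGE = "VALUE_OUT_OF_RANGE"
    # Data errors
    NOT_FOUND = "NOT_FOUND"
    ALREADY_EXISTS = "ALREADY_EXISTS"
    DUPLICATE_DETECTED = "DUPLICATE_DETECTED"
    # Auth errors
    PERMISSION_DENIED = "PERMISSION_DENIED"
    UNAUTHORIZED = "UNAUTHORIZED"
    ACCESS_REVOKED = "ACCESS_REVOKED"
    # System errors
    SERVICE_UNAVAILABLE = "SERVICE_UNAVAILABLE"
    TIMEOUT = "TIMEOUT"
    INTERNAL_ERROR = "INTERNAL_ERROR"
    # Business-logic errors
    AMOUNT_MISMATCH = "AMOUNT_MISMATCH"
    STATE_INVALID_FOR_TRANSITION = "STATE_INVALID_FOR_TRANSITION"
    QUOTA_EXCEEDED = "QUOTA_EXCEEDED"

code = ErrorCode.NOT_FOUND
```

Because the enum subclasses str, members compare equal to their string values, so they serialize cleanly into the error payloads shown earlier.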
Test Before You Build
Mental walkthrough. If you can't answer all six, your design isn't complete:
- Happy Path — LLM provides correct data → system processes → clear success response
- Invalid Input — LLM provides wrong type → system rejects at boundary → actionable error message
- Missing Required — LLM forgets a parameter → system says which one
- Not Found — LLM provides valid but non-existent ID → system clearly indicates it
- Async Operation — LLM calls async tool → gets job_id immediately → can poll for status
- Partial Failure — Batch operation: some succeed, some fail → LLM sees both with reasons
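Three of the six walkthroughs can even be run as plain assertions against a toy validator (`validate_upload` is hypothetical and deliberately minimal):

```python
import re

# Toy validator covering three of the six walkthrough cases.
def validate_upload(params: dict) -> dict:
    required = ["thumbnail_url", "product_id", "image_width"]
    missing = [k for k in required if k not in params]
    if missing:  # Missing Required: say which one
        return {"status": "error", "error_code": "MISSING_REQUIRED_FIELD",
                "error_message": f"Missing required: {', '.join(missing)}"}
    if not re.match(r"^Product-[0-9]{6,8}$", params["product_id"]):
        return {"status": "error", "error_code": "INVALID_FORMAT",
                "error_message": "product_id must match Product-######"}
    if params["image_width"] != 600:  # Invalid Input: reject at the boundary
        return {"status": "error", "error_code": "VALUE_OUT_OF_RANGE",
                "error_message": "image_width must be exactly 600"}
    return {"status": "success"}  # Happy Path: explicit success

good = {"thumbnail_url": "https://s3.amazonaws.com/thumb.jpg",
        "product_id": "Product-123456", "image_width": 600}
happy = validate_upload(good)                                # 1. Happy Path
missing = validate_upload({"product_id": "Product-123456"})  # 3. Missing Required
bad_width = validate_upload({**good, "image_width": 500})    # 2. Invalid Input
```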
Universal Principle
The thinking is the same across every framework.
- Understand the problem (Phase 1)
- Design the boundary (Phase 2)
- Define error states (Phase 3)
- Design return contract (Phase 4)
- Write the manifest (Phase 5)
Whether you use JSON, Langchain, Strands, Anthropic SDK, or Pydantic—only the syntax changes. The thinking doesn't.
Build smarter tools. Design the boundary first. The rest follows.
Last updated: February 2025
References:
– Anthropic Tool Use
– OpenAI Function Calling
– JSON Schema
– Pydantic