agent.py · business · LangChain · v1.0.0
LangChain: Data Extraction Agent
Extracts structured data from unstructured documents like invoices, resumes, and reports.
Setup time: ~10 min
Model: GPT-4o
Cost: ~$0.15/day
Last updated: Mar 16, 2026
Template
agent.py
# Install: pip install langchain langchain-openai pydantic
import json
import sys
from typing import Optional

# ChatPromptTemplate lives in langchain-core, which is installed with langchain
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
# --- Define extraction schemas ---
# Optional fields default to None so a missing key does not fail validation.
class Invoice(BaseModel):
    vendor: str = Field(description="Company or person who sent the invoice")
    invoice_number: Optional[str] = Field(default=None, description="Invoice or reference number")
    date: str = Field(description="Invoice date in YYYY-MM-DD format")
    due_date: Optional[str] = Field(default=None, description="Payment due date in YYYY-MM-DD format")
    total: float = Field(description="Total amount due")
    currency: str = Field(default="USD", description="Currency code")
    line_items: list[dict] = Field(default_factory=list, description="List of line items with description and amount")


class Contact(BaseModel):
    name: str = Field(description="Full name")
    email: Optional[str] = Field(default=None, description="Email address")
    phone: Optional[str] = Field(default=None, description="Phone number")
    company: Optional[str] = Field(default=None, description="Company name")
    role: Optional[str] = Field(default=None, description="Job title or role")
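# --- Extraction chain ---
llm = ChatOpenAI(model="gpt-4o", temperature=0)


def extract_data(text: str, schema_class: type[BaseModel]) -> dict:
    schema_json = json.dumps(schema_class.model_json_schema(), indent=2)
    # The schema is passed as a template variable, not interpolated with an
    # f-string: the braces in the JSON schema would otherwise be parsed as
    # prompt placeholders and raise an error.
    prompt = ChatPromptTemplate.from_messages([
        ("system", """Extract structured data from the text. Return valid JSON matching this schema:
{schema_json}
If a field is not found, use null. Be precise with numbers and dates."""),
        ("human", "Text to extract from:\n\n{text}"),
    ])
    # JSON mode asks the model for a bare JSON object rather than fenced Markdown,
    # so json.loads can parse the reply directly.
    chain = prompt | llm.bind(response_format={"type": "json_object"})
    result = chain.invoke({"text": text, "schema_json": schema_json})
    parsed = json.loads(result.content)
    # Validate against the schema before returning a plain dict
    return schema_class(**parsed).model_dump()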
if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("Usage: python agent.py <invoice|contact> <file.txt>")
        sys.exit(1)
    schema_map = {"invoice": Invoice, "contact": Contact}
    schema = schema_map.get(sys.argv[1])
    if not schema:
        print(f"Unknown schema: {sys.argv[1]}. Use: {list(schema_map.keys())}")
        sys.exit(1)
    with open(sys.argv[2]) as f:
        text = f.read()
    result = extract_data(text, schema)
    print(json.dumps(result, indent=2))

Setup
1. Copy the agent.py content above into a file named agent.py.
2. Create a Python virtual environment and install dependencies.
3. Set your OPENAI_API_KEY environment variable.
4. Run: python agent.py <invoice|contact> <file.txt>
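Once step 4 prints its JSON, the date fields are worth a second look: the Invoice schema declares date and due_date as plain strings and only asks for YYYY-MM-DD in the field descriptions, so nothing enforces the format. A minimal sketch of a post-run check using only the standard library; invalid_date_fields is a hypothetical helper that is not part of the template, and the sample values are made up for illustration.

```python
import re
from datetime import date

# Strict shape check: four-digit year, two-digit month, two-digit day.
_ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def invalid_date_fields(record: dict, fields=("date", "due_date")) -> list[str]:
    """Return the names of date fields that are set but not valid YYYY-MM-DD."""
    bad = []
    for name in fields:
        value = record.get(name)
        if value is None:
            # Missing or null fields are skipped; due_date is optional in the schema.
            continue
        if not isinstance(value, str) or not _ISO_DATE.fullmatch(value):
            bad.append(name)
            continue
        try:
            # Rejects well-shaped but impossible dates such as 2026-02-30.
            date.fromisoformat(value)
        except ValueError:
            bad.append(name)
    return bad


if __name__ == "__main__":
    sample = {"date": "2026-03-16", "due_date": "16/03/2026"}  # made-up values
    print(invalid_date_fields(sample))  # prints ['due_date']
```

An empty list means every date that was present parsed cleanly. Whether to reject the record or send it back for another extraction pass is left to the caller.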
Run with LangChain
This is a Python script using LangChain. Set up a virtual environment and install dependencies.
# 1. Create a virtual environment
python -m venv venv && source venv/bin/activate
# 2. Install dependencies
pip install langchain langchain-openai pydantic
# 3. Set your API key
export OPENAI_API_KEY="sk-..."
# 4. Save the agent.py from above and run it with a schema name (invoice or contact) and a text file
python agent.py invoice your_invoice.txt

Version History
v1.0.0 · Initial release · Mar 16, 2026
Framework
LangChain
Requirements
Python 3.10+
OpenAI API key
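Both requirements can be checked before the first run. A minimal sketch, assuming the key is supplied through the OPENAI_API_KEY environment variable as in the setup steps; preflight is a hypothetical helper and not part of agent.py.

```python
import os
import sys


def preflight(env=os.environ, version=sys.version_info) -> list[str]:
    """Return a list of problems; an empty list means both requirements are met."""
    problems = []
    # Compare only (major, minor) against the stated minimum of Python 3.10.
    if tuple(version[:2]) < (3, 10):
        problems.append("Python 3.10+ is required")
    # An unset or empty key is treated as missing.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for issue in preflight():
        print(f"Problem: {issue}")
```

The env and version parameters exist only so the check can be exercised without changing the real environment.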