From Prompts to Real Files: A Developer's Guide to AI File Generation

Originally published at dev.to

Ask ChatGPT to "create a sales report PDF with a revenue chart." A year ago, it would paste some markdown and wish you luck. Today, it spins up a sandboxed Python environment, runs reportlab and matplotlib, and hands you a real, downloadable PDF file.

This is the shift from text generation to artifact generation -- and every major LLM vendor now supports it through their API. Claude, OpenAI, and Gemini each give developers a way to prompt an LLM and get back actual files: PDFs, spreadsheets, charts, slide decks, whatever you can create with Python.

This post walks through the universal pattern behind file generation, then shows you exactly how to do it with each vendor -- working code included.


The Universal Pattern

Despite different APIs, all three vendors follow the same three-step architecture:

[Figure: the three-step flow -- tool declaration, sandboxed execution, file retrieval]

Every vendor-specific implementation is a variation on this flow. The details change, but three concepts repeat everywhere:

  1. Tool declaration -- you opt in to code execution by including a specific tool in your API request. It's never on by default.
  2. Sandboxed execution -- the LLM's code runs in an isolated container with no internet access. Common libraries (pandas, matplotlib, reportlab) come pre-installed.
  3. File retrieval -- each vendor has a different mechanism to get the bytes out. Some give you a file ID to download; others return bytes inline.

Once you internalize this pattern, learning any vendor's API is just a matter of mapping it to these three steps.
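To make the mapping concrete, here is a minimal sketch of the three steps as a vendor-neutral interface. Everything in it (GeneratedFile, FileGenBackend, generate_files, EchoBackend) is a hypothetical name invented for illustration, not part of any vendor SDK, and the stand-in backend makes no network calls.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class GeneratedFile:
    """One file produced by a sandboxed run (hypothetical container type)."""
    filename: str
    data: bytes


class FileGenBackend(Protocol):
    """Hypothetical adapter interface; each vendor would get its own implementation."""

    def build_request(self, prompt: str) -> dict:
        """Step 1: return request kwargs with the code-execution tool declared."""
        ...

    def execute(self, request: dict) -> object:
        """Step 2: send the request; the vendor runs the code in its sandbox."""
        ...

    def retrieve_files(self, response: object) -> list[GeneratedFile]:
        """Step 3: pull file bytes out of the response (by ID or inline)."""
        ...


def generate_files(backend: FileGenBackend, prompt: str) -> list[GeneratedFile]:
    """Run the three steps in order against any backend."""
    request = backend.build_request(prompt)
    response = backend.execute(request)
    return backend.retrieve_files(response)


class EchoBackend:
    """Stand-in backend for local testing; it never touches the network."""

    def build_request(self, prompt: str) -> dict:
        return {"prompt": prompt, "tools": ["code_execution"]}

    def execute(self, request: dict) -> object:
        # Pretend the sandbox wrote the prompt text into a file.
        return {"files": {"report.txt": request["prompt"].encode()}}

    def retrieve_files(self, response: object) -> list[GeneratedFile]:
        return [GeneratedFile(name, data) for name, data in response["files"].items()]
```

Calling generate_files(EchoBackend(), "hello") returns a single GeneratedFile named report.txt. A real adapter would put the vendor-specific request and parsing code from the sections below behind the same three methods.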


Claude: Code Execution + Files API

Claude's file generation is the most full-featured option for document creation. It provides a persistent container with full bash access, a rich set of pre-installed document libraries, and a clean Files API for uploads and downloads.

Generating a PDF from a Prompt

Enable the code_execution_20250825 tool, send your prompt, then extract file IDs from the response and download them through the Files API.

import anthropic

client = anthropic.Anthropic()

# Step 1: Request with code execution enabled
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    tools=[{"type": "code_execution_20250825", "name": "code_execution"}],
    messages=[{
        "role": "user",
        "content": "Create a one-page PDF sales report with a revenue chart for Q1 2026."
    }]
)

# Step 2: Extract file IDs from the response
file_ids = []
for block in response.content:
    if block.type == "bash_code_execution_tool_result":
        result = block.content
        if result.type == "bash_code_execution_result":
            for item in result.content:
                if hasattr(item, "file_id"):
                    file_ids.append(item.file_id)

# Step 3: Download each generated file
for file_id in file_ids:
    content = client.beta.files.download(file_id)
    metadata = client.beta.files.retrieve_metadata(file_id)
    content.write_to_file(metadata.filename)
    print(f"Saved: {metadata.filename}")

The response content blocks have a nested structure: you're looking for bash_code_execution_tool_result blocks, which contain bash_code_execution_result objects, which contain items with file_id attributes. The files.download() call gives you the raw bytes; retrieve_metadata() gives you the original filename.

Why bash_code_execution? When you include the code_execution_20250825 tool, Claude actually gets two sub-tools: bash_code_execution (run shell commands) and text_editor_code_execution (create and edit files). To generate a file, Claude typically writes a Python script with the text editor sub-tool, then runs it via bash. The result block is named after whichever sub-tool produced the output -- and since it's the bash execution that creates the final file, that's the block type you parse. This is also why Claude has full bash access unlike the other vendors: it's not running Python in a restricted interpreter, it's executing real shell commands. The _20250825 tool version introduced this bash/text-editor split, replacing the earlier _20250522 version that was Python-only.

Uploading a CSV, Getting Back a Chart + PDF

To process your own data, upload via the Files API first, then attach the file to your prompt alongside the code execution tool.

import anthropic

client = anthropic.Anthropic()

# Upload your input file (the context manager closes the handle afterwards)
with open("sales_data.csv", "rb") as f:
    uploaded = client.beta.files.upload(file=f)

# Send the file + prompt with code execution
response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    betas=["files-api-2025-04-14"],
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Analyze this sales CSV. Create a bar chart of revenue by region "
                        "and save it as 'revenue_chart.png'. Also generate a one-page PDF "
                        "summary report of the key findings."
            },
            {"type": "container_upload", "file_id": uploaded.id},
        ],
    }],
    tools=[{"type": "code_execution_20250825", "name": "code_execution"}],
)

# Download all generated files
for block in response.content:
    if block.type == "bash_code_execution_tool_result":
        result = block.content
        if result.type == "bash_code_execution_result":
            for item in result.content:
                if hasattr(item, "file_id"):
                    content = client.beta.files.download(item.file_id)
                    metadata = client.beta.files.retrieve_metadata(item.file_id)
                    content.write_to_file(metadata.filename)
                    print(f"Downloaded: {metadata.filename}")

A single prompt can produce multiple files. In this case, you'll get both the PNG chart and the PDF report. Always iterate the full response -- never assume a single file.
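Since both snippets above walk the same nested structure, the traversal can be factored into one helper. This is a sketch, not SDK code: extract_file_ids is a hypothetical name, and the fake_content objects are stand-ins shaped like the blocks described earlier rather than real API output.

```python
from types import SimpleNamespace


def extract_file_ids(content_blocks) -> list[str]:
    """Collect file IDs from bash_code_execution_tool_result blocks, in order, without duplicates."""
    seen = []
    for block in content_blocks:
        if getattr(block, "type", None) != "bash_code_execution_tool_result":
            continue
        result = block.content
        if getattr(result, "type", None) != "bash_code_execution_result":
            continue  # for example, an error result that carries no files
        for item in result.content:
            file_id = getattr(item, "file_id", None)
            if file_id and file_id not in seen:
                seen.append(file_id)
    return seen


# Stand-in objects that mimic the nesting (not real API output)
fake_content = [
    SimpleNamespace(type="text", text="Here is your report."),
    SimpleNamespace(
        type="bash_code_execution_tool_result",
        content=SimpleNamespace(
            type="bash_code_execution_result",
            content=[SimpleNamespace(file_id="file_a"), SimpleNamespace(file_id="file_b")],
        ),
    ),
]
print(extract_file_ids(fake_content))  # ['file_a', 'file_b']
```

With a real response you would call extract_file_ids(response.content) and then download each ID as shown above.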

Container Reuse: The Key to Iteration Workflows

Claude containers persist for 30 days. When your first request creates a container, the response includes a container.id. Pass it to subsequent calls and Claude picks up right where it left off -- all files from the previous request are still on disk.

# First call creates the container
response1 = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Generate a sales report PDF."}],
    tools=[{"type": "code_execution_20250825", "name": "code_execution"}],
)
container_id = response1.container.id

# Subsequent calls reuse the same container
response2 = client.messages.create(
    container=container_id,
    model="claude-sonnet-4-6",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Update the chart on page 2 to use a pie chart instead."}],
    tools=[{"type": "code_execution_20250825", "name": "code_execution"}],
)

This enables "conversational file editing" -- users can iterate on documents without re-uploading data or starting from scratch.
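If you make many follow-up calls, you may want something that remembers the container ID for you. The sketch below is one way to do that; ContainerSession is a hypothetical helper (not part of the Anthropic SDK), and it takes the create function as an argument so it can be exercised with a stand-in instead of a live client.

```python
from types import SimpleNamespace
from typing import Callable, Optional


class ContainerSession:
    """Remembers the container ID across calls (hypothetical helper, not an SDK class)."""

    def __init__(self, create: Callable[..., object]):
        self._create = create  # for example, client.messages.create
        self.container_id: Optional[str] = None

    def send(self, **kwargs):
        # Reuse the container from the previous call, if there was one.
        if self.container_id is not None:
            kwargs["container"] = self.container_id
        response = self._create(**kwargs)
        # Store the container ID reported by the response for the next call.
        container = getattr(response, "container", None)
        if container is not None:
            self.container_id = container.id
        return response


# Stand-in for the API call, so the sketch runs without credentials
recorded_calls = []


def fake_create(**kwargs):
    recorded_calls.append(kwargs)
    return SimpleNamespace(container=SimpleNamespace(id="container_123"))


session = ContainerSession(fake_create)
session.send(messages=[{"role": "user", "content": "Generate a sales report PDF."}])
session.send(messages=[{"role": "user", "content": "Use a pie chart instead."}])
```

The first call goes out without a container argument; the second carries the ID returned by the first.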

Pre-installed Libraries

Claude's sandbox comes with the document generation essentials: reportlab (PDFs), python-docx (Word), python-pptx (PowerPoint), openpyxl (Excel), pandas, matplotlib, pillow, pypdf, pdfplumber, seaborn, scipy, and scikit-learn. Because Claude has full bash access, you can also try to pip install other packages during the session; whether that succeeds depends on the sandbox's network restrictions, since the container is described above as having no internet access.


OpenAI: Responses API + Code Interpreter

OpenAI's Responses API (the successor to the deprecated Assistants API) uses the Code Interpreter tool for file generation. The pattern is similar to Claude, but the response structure and file retrieval mechanism differ.

Generating a CSV with Code Interpreter

Enable the code_interpreter tool, then parse container_file_citation annotations from the response to find generated files.

from openai import OpenAI

client = OpenAI()

# Step 1: Request with code interpreter enabled
response = client.responses.create(
    model="gpt-5.2",
    tools=[{
        "type": "code_interpreter",
        "container": {"type": "auto"}
    }],
    input="Generate a CSV file named 'q1_report.csv' with 10 rows of financial data."
)

# Step 2: Extract file references from annotations
# The response structure nests deep: output → message → content → output_text → annotations
for item in response.output:
    if item.type == "message":
        for content_block in item.content:
            if content_block.type == "output_text":
                for annotation in content_block.annotations:
                    if annotation.type == "container_file_citation":
                        # Step 3: Download from the container endpoint
                        file_data = client.containers.files.content.retrieve(
                            file_id=annotation.file_id,
                            container_id=annotation.container_id
                        )
                        with open(annotation.filename, "wb") as f:
                            f.write(file_data.read())
                        print(f"Downloaded: {annotation.filename}")

The annotation traversal is the trickiest part. Don't try to shortcut it with response.output_text -- that gives you a plain string with citation markers, not the actual file references.
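The traversal is easier to reuse, and to test, as a generator. The following is a sketch under the response shape described above; iter_file_citations is a hypothetical helper name, and fake_response is a stand-in object rather than real API output.

```python
from types import SimpleNamespace
from typing import Iterator


def iter_file_citations(response) -> Iterator[object]:
    """Yield every container_file_citation annotation found in response.output."""
    for item in response.output:
        if getattr(item, "type", None) != "message":
            continue
        for block in item.content:
            if getattr(block, "type", None) != "output_text":
                continue
            # annotations may be absent or None on some blocks
            for annotation in getattr(block, "annotations", None) or []:
                if getattr(annotation, "type", None) == "container_file_citation":
                    yield annotation


# Stand-in response that mimics the nesting (not real API output)
fake_response = SimpleNamespace(output=[
    SimpleNamespace(type="code_interpreter_call"),
    SimpleNamespace(type="message", content=[
        SimpleNamespace(type="output_text", annotations=[
            SimpleNamespace(type="url_citation"),
            SimpleNamespace(
                type="container_file_citation",
                file_id="cfile_1",
                container_id="cntr_1",
                filename="q1_report.csv",
            ),
        ]),
    ]),
])

for citation in iter_file_citations(fake_response):
    print(citation.filename, citation.file_id, citation.container_id)
```

Each yielded annotation carries the file_id, container_id, and filename needed for the download call in Step 3.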

Uploading a File, Transforming It

Upload via the standard Files API, then pass the file ID in the container config.

from openai import OpenAI

client = OpenAI()

# Upload the file (the context manager closes the handle afterwards)
with open("sales_data.csv", "rb") as f:
    uploaded = client.files.create(file=f, purpose="user_data")

# Pass it to code interpreter via container config
response = client.responses.create(
    model="gpt-5.2",
    tools=[{
        "type": "code_interpreter",
        "container": {
            "type": "auto",
            "file_ids": [uploaded.id]
        }
    }],
    input="Analyze this sales CSV. Create a bar chart of revenue by region and save it as a PNG."
)

# Download generated files from annotations
for item in response.output:
    if item.type == "message":
        for content_block in item.content:
            if content_block.type == "output_text":
                for annotation in content_block.annotations:
                    if annotation.type == "container_file_citation":
                        file_data = client.containers.files.content.retrieve(
                            file_id=annotation.file_id,
                            container_id=annotation.container_id
                        )
                        with open(annotation.filename, "wb") as f:
                            f.write(file_data.read())
                        print(f"Downloaded: {annotation.filename}")

You can also request higher memory tiers -- 1g (default), 4g, 16g, or 64g -- by setting "memory_limit" in the container config. Useful when processing large datasets.
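A small builder can keep the tool configuration in one place and catch typos in the memory tier before the request goes out. This is a sketch: code_interpreter_tool and ALLOWED_MEMORY_LIMITS are hypothetical names, and the list of tiers is simply the one given in this post, so check it against the current API reference.

```python
# Tiers as listed in this post; treat the exact set as an assumption.
ALLOWED_MEMORY_LIMITS = ("1g", "4g", "16g", "64g")


def code_interpreter_tool(file_ids=None, memory_limit=None) -> dict:
    """Build the code_interpreter tool entry for a Responses API request."""
    container = {"type": "auto"}
    if file_ids:
        container["file_ids"] = list(file_ids)
    if memory_limit is not None:
        if memory_limit not in ALLOWED_MEMORY_LIMITS:
            raise ValueError(
                f"memory_limit must be one of {ALLOWED_MEMORY_LIMITS}, got {memory_limit!r}"
            )
        container["memory_limit"] = memory_limit
    return {"type": "code_interpreter", "container": container}


tool = code_interpreter_tool(file_ids=["file_abc"], memory_limit="4g")
print(tool)
```

The resulting dictionary would go in the tools list of client.responses.create, in place of the inline dictionaries used above.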

OpenAI Gotchas

The cfile_ 404 trap. Generated files have IDs prefixed with cfile_. If you try to download them using the standard client.files.content() endpoint, you'll get a 404. You must use client.containers.files.content.retrieve() instead. It is an easy mistake to make, because both kinds of ID look like ordinary file IDs.
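One way to avoid the trap is to route every download through a single function that picks the endpoint from the ID prefix. The sketch below assumes the cfile_ prefix convention described here; fetch_file is a hypothetical name, and the two fetchers are passed in as callables so the routing can be tested without network access.

```python
def fetch_file(file_id, container_id, fetch_container_file, fetch_standard_file):
    """Route a download to the right endpoint based on the file ID prefix."""
    if file_id.startswith("cfile_"):
        # Container files can only be fetched with their container ID.
        if container_id is None:
            raise ValueError("container files need a container_id")
        return fetch_container_file(file_id, container_id)
    return fetch_standard_file(file_id)


# Stand-in fetchers; with a real client these would wrap
# client.containers.files.content.retrieve(...) and client.files.content(...)
def fake_container_fetch(file_id, container_id):
    return f"container:{container_id}:{file_id}"


def fake_standard_fetch(file_id):
    return f"files:{file_id}"


print(fetch_file("cfile_123", "cntr_9", fake_container_fetch, fake_standard_fetch))
print(fetch_file("file_456", None, fake_container_fetch, fake_standard_fetch))
```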

20-minute container expiry. OpenAI containers are ephemeral -- they expire after 20 minutes of inactivity. Download your files immediately after generation. There is no 30-day persistence like Claude.

Missing annotations fallback. There's a known edge case where container_file_citation annotations don't appear in the response. When this happens, check response.output for items of type code_interpreter_call and inspect their outputs for file references:

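# file_refs is assumed to hold the container_file_citation annotations collected earlier
if not file_refs:
    for item in response.output:
        if item.type == "code_interpreter_call":
            # outputs may be missing or None, so fall back to an empty list
            for output_item in getattr(item, "outputs", None) or []:
                if hasattr(output_item, "file_id"):
                    # Download using output_item.file_id and output_item.container_id
                    pass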

Gemini: Inline Results + Structured Output

Gemini takes a fundamentally different approach. It doesn't return downloadable file artifacts with file IDs. Instead, code execution results come back inline -- matplotlib charts as raw image bytes, everything else as text or JSON.

This isn't a technical limitation -- Google has the infrastructure to build containers and file artifact systems. The gap is strategic. Google's file generation story lives in Google Workspace, not in the developer API:

  • Gemini in Docs generates full first drafts from prompts, matching writing styles and pulling data from Gmail, Drive, and the web.
  • Gemini in Sheets builds entire spreadsheets from natural language and auto-populates cells with live data.
  • Gemini in Slides generates themed slides, with full presentation generation from a single prompt on the roadmap.

This makes business sense for Google. Anthropic and OpenAI are API-first companies -- their revenue comes from developers using their APIs, so building sandboxes and file download endpoints directly serves their customers. Google's revenue comes from Workspace subscriptions. When Gemini generates a spreadsheet in Workspace, it creates a Google Sheet (not an .xlsx), keeping users in the Google ecosystem. An API that produces vendor-neutral files would undermine that.

The practical implication: Gemini's API-level file generation gap is unlikely to close anytime soon. The structured output and inline image patterns below are the right long-term approaches, not temporary workarounds.

For developers, this means Gemini is best suited for quick charts and data transforms, while complex document creation belongs with Claude or OpenAI.

Generating a Chart (Inline Image)

Enable the code_execution tool, then extract image bytes directly from the response parts.

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    config=types.GenerateContentConfig(
        tools=[types.Tool(code_execution=types.ToolCodeExecution)]
    ),
    contents="Generate a bar chart of quarterly revenue: Q1=$2.1M, Q2=$2.8M, Q3=$3.2M, Q4=$3.9M."
)

# Gemini returns results inline -- no separate download step
for part in response.candidates[0].content.parts:
    if part.executable_code:
        print("Code ran:", part.executable_code.code[:80], "...")
    if part.code_execution_result:
        print("Output:", part.code_execution_result.output)
    image = part.as_image()  # None when the part carries no inline image
    if image is not None:
        with open("revenue_chart.png", "wb") as f:
            f.write(image.image_bytes)
        print("Chart saved as revenue_chart.png")

No file IDs, no download endpoints. The image bytes are right there in the response. For text/data output, it shows up in code_execution_result.output.
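If the generated code draws more than one chart, writing every image to the same filename would overwrite the earlier ones. Here is a sketch of a helper that numbers the outputs instead. save_inline_images is a hypothetical name, the .png extension is an assumption about what matplotlib produced, and the stand-in classes only mimic the as_image() / image_bytes shape used above.

```python
import tempfile
from pathlib import Path


def save_inline_images(parts, out_dir, stem="chart") -> list[Path]:
    """Write each inline image to its own numbered file and return the paths."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for part in parts:
        image = part.as_image()
        if image is None:
            continue  # text, code, or execution-result part
        target = out_path / f"{stem}_{len(saved) + 1}.png"
        target.write_bytes(image.image_bytes)
        saved.append(target)
    return saved


# Stand-ins that mimic the part/image shape (not real SDK objects)
class FakeImage:
    def __init__(self, image_bytes):
        self.image_bytes = image_bytes


class FakePart:
    def __init__(self, image_bytes=None):
        self._image_bytes = image_bytes

    def as_image(self):
        return FakeImage(self._image_bytes) if self._image_bytes is not None else None


with tempfile.TemporaryDirectory() as demo_dir:
    demo_paths = save_inline_images([FakePart(), FakePart(b"one"), FakePart(b"two")], demo_dir)
    demo_names = [p.name for p in demo_paths]
print(demo_names)  # ['chart_1.png', 'chart_2.png']
```

With a real response you would pass response.candidates[0].content.parts as the first argument.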

Structured Output for CSV Generation

Gemini's strongest file generation pattern is actually indirect: get structured JSON data back, then format it locally with whatever library you prefer.

import json
import pandas as pd
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Ask for structured JSON output
response = client.models.generate_content(
    model="gemini-2.5-flash",
    config=types.GenerateContentConfig(response_mime_type="application/json"),
    contents="Return a JSON array of 10 tech companies with fields: name, ticker, market_cap, sector."
)

# Convert to CSV locally -- you control the formatting
data = json.loads(response.text)
df = pd.DataFrame(data)
df.to_csv("tech_companies.csv", index=False)
print(f"Saved {len(df)} rows to tech_companies.csv")

This "structured output" approach gives you full control over formatting, because the file is rendered by your own code, and it is the most reliable way to produce files from Gemini. Let the model do what it's good at (data generation), and handle the file formatting yourself.
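Because the model's JSON is only as reliable as the prompt, it can help to check the rows before writing the file. The sketch below does that with the standard library only; json_rows_to_csv is a hypothetical helper, and the sample JSON string is made-up illustration data, not model output.

```python
import csv
import io
import json


def json_rows_to_csv(json_text: str, fields: list[str]) -> str:
    """Validate a JSON array of objects and render it as CSV text."""
    rows = json.loads(json_text)
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array at the top level")
    buffer = io.StringIO()
    # extrasaction="ignore" drops any unexpected keys the model added
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for index, row in enumerate(rows):
        if not isinstance(row, dict):
            raise ValueError(f"row {index} is not a JSON object")
        missing = [name for name in fields if name not in row]
        if missing:
            raise ValueError(f"row {index} is missing fields: {missing}")
        writer.writerow(row)
    return buffer.getvalue()


# Made-up sample input for illustration
sample = '[{"name": "Example Corp", "ticker": "EXM", "note": "extra key"}]'
csv_text = json_rows_to_csv(sample, ["name", "ticker"])
print(csv_text)
```

In the Gemini example above, you would pass response.text as json_text and write the returned string to disk.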

30-Second Execution Timeout

Gemini's code execution sandbox has a hard 30-second timeout. This makes it ideal for quick chart generation and data transforms, but rules it out for heavy document creation tasks like multi-page PDF reports or complex PowerPoint decks. For those, use Claude or OpenAI.


Which API for What?

| Feature | Claude | OpenAI | Gemini |
|---|---|---|---|
| Sandbox Type | Reusable container (30-day expiry) | Ephemeral container (20-min idle timeout) | Stateless sandbox (30s timeout) |
| Resources | 5 GiB disk, 5 GiB RAM, 1 CPU | Up to 64 GB RAM (tiered) | Token-limited (inline output) |
| Shell Access | Full bash | Python only | Python only |
| File Download | Files API (files.download()) | Container endpoint (containers.files.content.retrieve()) | Inline in response (no download step) |
| Best Use Case | Complex documents (PDF, DOCX, PPTX) | Heavy data processing + file gen | Quick charts and data transforms |
| pip install | Yes (bash access) | No (isolated sandbox) | No (isolated sandbox) |

The short version:

  • Complex documents (PDF reports, slide decks, Word docs with formatting): Claude. The pre-installed document libraries and 30-day container persistence make it the best fit.
  • Large dataset processing (crunching big CSVs, Excel transformations): OpenAI. The ability to request up to 64 GB of RAM is unmatched.
  • Quick visualizations (charts, graphs, simple data summaries): Gemini. Inline image return means fewer API calls and faster turnaround.
  • Maximum formatting control: Any model's Structured Output mode. Get JSON data back, render locally with your own libraries.

The Self-Hosted Alternative: Run Your Own Sandbox

The three vendor APIs above all run code in their infrastructure. You send a prompt, they spin up a container, and they hand you back the file. This is convenient, but it means your data leaves your network, you're bound by each vendor's sandbox limits (30-second timeouts, no internet, fixed library sets), and you pay per-execution fees.

There's a fourth option: run the sandbox yourself. In this pattern, you call any LLM API to generate code (without enabling the vendor's code execution tool), then execute that code locally in an isolated environment on your own machines. You get the same prompt-to-file workflow, but you control the execution environment.
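As a rough illustration of the pattern, the sketch below runs a string of generated Python in a temporary working directory with a timeout and collects whatever files it wrote. run_generated_code is a hypothetical helper, and a plain subprocess is not a security boundary: a real deployment would put this behind a container, VM, or similar isolation layer.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_generated_code(code: str, timeout_s: float = 60.0) -> dict[str, bytes]:
    """Run code in a throwaway directory and return the files it created."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "generated_script.py"
        script.write_text(code, encoding="utf-8")
        # Raises subprocess.TimeoutExpired if the script runs too long.
        completed = subprocess.run(
            [sys.executable, script.name],
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        if completed.returncode != 0:
            raise RuntimeError(f"generated code failed:\n{completed.stderr}")
        # Collect every top-level file except the script itself.
        return {
            path.name: path.read_bytes()
            for path in Path(workdir).iterdir()
            if path.is_file() and path.name != script.name
        }


# In practice the code string would come from an LLM response.
demo_code = 'with open("out.txt", "w") as f:\n    f.write("hi")\n'
demo_files = run_generated_code(demo_code, timeout_s=30)
print(sorted(demo_files))  # ['out.txt']
```

The timeout, working directory, and installed libraries are all yours to choose here, which is the main point of the self-hosted approach.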

Why Self-Host?

  • Data residency. In regulated industries (healthcare, finance, government), sending code and data to a third-party sandbox may violate compliance requirements. A local sandbox keeps everything on your infrastructure.
  • No vendor sandbox limits. You choose the timeout, the RAM, the disk, the installed libraries. Need 10 minutes of execution time? A GPU? Network access to internal services? Your sandbox, your rules.
  • Cost at scale. Vendor sandbox pricing is per-session or per-hour. At high volume, running your own execution infrastructure can be significantly cheaper.
  • Model flexibility. Since you're decoupling "generate the code" from "run the code," you can use any LLM -- including open-source models, fine-tuned models, or your own -- to prod
