Message Windowing

Overview

Long conversations can exceed Claude’s context window. NeoCash’s message windowing system in src/lib/message-windowing.ts trims conversations to fit within an estimated 160,000-token budget while preserving the context that matters most: the first user message and the most recent messages.

The Problem

A typical NeoCash conversation might include:

Dozens of messages with detailed financial analysis
Uploaded documents (PDFs, spreadsheets) as file attachments
Tool call results with structured data
Extended thinking blocks

Without windowing, a long goal thread with documents could easily exceed Claude’s 200k-token context window, causing the API request to fail.

Token Budget

Constants

Parameter	Default	Purpose
Token budget	160,000	Maximum estimated tokens for the API call
Recent count	6	Number of recent messages to keep fully intact
Char-to-token ratio	4:1	Approximate conversion (4 characters ≈ 1 token)
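
The constants above imply a simple character-based estimate. The following is a minimal sketch of that heuristic, not the actual NeoCash code: the names DEFAULT_TOKEN_BUDGET, CHARS_PER_TOKEN, estimateTokens and fitsBudget are illustrative, and rounding up is an assumption.

```typescript
// Illustrative constants mirroring the table above (names are assumptions).
const DEFAULT_TOKEN_BUDGET = 160_000;
const CHARS_PER_TOKEN = 4;

// Estimate tokens from character count using the 4:1 heuristic.
// Rounding up is an assumption; it keeps the estimate conservative.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// True when the summed estimate of all texts is within the budget.
function fitsBudget(
  texts: string[],
  budget: number = DEFAULT_TOKEN_BUDGET
): boolean {
  const total = texts.reduce((sum, t) => sum + estimateTokens(t), 0);
  return total <= budget;
}
```

Because the ratio is only approximate, the result is best treated as an estimate rather than an exact token count.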

Three-Step Strategy

Step 1: Convert Unsupported File Types

Before windowing, file attachments are processed:

DOCX/XLSX: Extracted to plain text via extractFileText()
Other unsupported types: Replaced with placeholder text
Images and PDFs: Passed through to Claude as-is

This ensures file content is always available in some form, even if the original binary format isn’t supported.
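
As a rough illustration of the three cases above, the routing decision could be sketched as a lookup on the attachment's media type. This is an assumption about how the decision might be made, not the actual implementation: classifyAttachment and FileAction are hypothetical names, and the signature of extractFileText() is not shown because the source does not give it.

```typescript
// Hypothetical outcome of the Step 1 decision for a single attachment.
type FileAction = "extract-text" | "placeholder" | "pass-through";

// Standard media types for DOCX and XLSX files.
const EXTRACTABLE_TYPES = new Set<string>([
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
  "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
]);

// Decide how an attachment is handled before windowing.
function classifyAttachment(mediaType: string): FileAction {
  // Images and PDFs are passed through as-is.
  if (mediaType.startsWith("image/") || mediaType === "application/pdf") {
    return "pass-through";
  }
  // DOCX/XLSX would be handed to a text extractor.
  if (EXTRACTABLE_TYPES.has(mediaType)) {
    return "extract-text";
  }
  // Everything else is replaced with placeholder text.
  return "placeholder";
}
```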

Step 2: Strip Files from Old Messages

The most recent N messages (default: 6) keep their file attachments intact. For all older messages:

File parts are replaced with text placeholders describing the attachment
This dramatically reduces token count for older messages while preserving the conversation flow

For example, a 50-page PDF attachment in message #3 becomes a simple text note like “Attached: financial-report.pdf” while the same PDF in message #18 (recent) stays as the full file content.
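
A minimal sketch of this step, assuming a simplified message shape: ChatMessage, MessagePart and stripOldFiles are illustrative stand-ins, not the real UIMessage type or the actual function, and the placeholder wording follows the example above.

```typescript
// Simplified, assumed message shapes for illustration only.
type TextPart = { type: "text"; text: string };
type FilePart = { type: "file"; filename: string; data: string };
type MessagePart = TextPart | FilePart;

interface ChatMessage {
  id: string;
  role: "user" | "assistant";
  parts: MessagePart[];
}

// Replace file parts with text placeholders in all but the last
// `recentCount` messages. Returns new objects; the input is not mutated.
function stripOldFiles(
  messages: ChatMessage[],
  recentCount: number = 6
): ChatMessage[] {
  const cutoff = Math.max(0, messages.length - recentCount);
  return messages.map((msg, index) => {
    if (index >= cutoff) return msg; // recent message: keep files intact
    return {
      ...msg,
      parts: msg.parts.map((part): MessagePart =>
        part.type === "file"
          ? { type: "text", text: `Attached: ${part.filename}` }
          : part
      ),
    };
  });
}
```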

Step 3: Drop Old Messages if Over Budget

If the conversation still exceeds the token budget after file stripping:

Keep the first user message, however old it is, because it provides the original conversation context
Remove the oldest of the remaining messages one at a time until the estimate falls under budget
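
The drop loop described above might look roughly like the sketch below. It follows the description literally, so it makes no attempt to protect recent messages; whether the real code does so is not stated here. Msg and dropOldMessages are hypothetical names, and the estimator is passed in as a parameter.

```typescript
// Simplified, assumed message shape for illustration only.
interface Msg {
  role: "user" | "assistant";
  text: string;
}

// Remove the oldest messages (except the first user message) until the
// estimated total is within budget or nothing removable is left.
function dropOldMessages(
  messages: Msg[],
  budget: number,
  estimate: (m: Msg) => number
): { messages: Msg[]; trimmed: boolean; estimatedTokens: number } {
  const kept = [...messages];
  const firstUser = kept.find((m) => m.role === "user");
  let total = kept.reduce((sum, m) => sum + estimate(m), 0);
  let trimmed = false;

  while (total > budget) {
    // Oldest message that is not the protected first user message.
    const index = kept.findIndex((m) => m !== firstUser);
    if (index === -1) break; // only the first user message remains
    const [removed] = kept.splice(index, 1);
    if (removed) total -= estimate(removed);
    trimmed = true;
  }
  return { messages: kept, trimmed, estimatedTokens: total };
}
```

Note that if the first user message alone exceeds the budget, this sketch returns it anyway, so a caller would still need to handle that case.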

API

async function prepareMessagesForAPI(
  messages: UIMessage[],
  options?: {
    tokenBudget?: number;
    recentCount?: number;
  }
): Promise<{
  messages: UIMessage[];
  trimmed: boolean;
  estimatedTokens: number;
}>

Parameters

messages: The full conversation message array
options.tokenBudget: Override the default 160k budget
options.recentCount: Override the default 6 recent messages to keep

Return Value

messages: The trimmed message array ready for the API
trimmed: Whether any messages were removed
estimatedTokens: Approximate token count of the output
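
A caller might use the return value along these lines. This is only a sketch: windowForRequest is a hypothetical wrapper, the UIMessage interface is a simplified stand-in for the real type, and the windowing function is passed in as a parameter so the example stays self-contained. The override values are arbitrary.

```typescript
// Simplified stand-in for the real UIMessage type (assumption).
interface UIMessage {
  id: string;
  role: "user" | "assistant" | "system";
  parts: unknown[];
}

interface PrepareResult {
  messages: UIMessage[];
  trimmed: boolean;
  estimatedTokens: number;
}

// Same shape as the prepareMessagesForAPI signature shown above.
type PrepareFn = (
  messages: UIMessage[],
  options?: { tokenBudget?: number; recentCount?: number }
) => Promise<PrepareResult>;

// Hypothetical caller: windows the conversation with overridden limits
// and logs a warning when messages were dropped.
async function windowForRequest(
  prepare: PrepareFn,
  conversation: UIMessage[]
): Promise<UIMessage[]> {
  const { messages, trimmed, estimatedTokens } = await prepare(conversation, {
    tokenBudget: 120_000,
    recentCount: 4,
  });
  if (trimmed) {
    console.warn(`Conversation trimmed to ~${estimatedTokens} tokens`);
  }
  return messages;
}
```

In application code, prepareMessagesForAPI itself would be passed as the prepare argument.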

Design Decisions

Why Keep the First Message?

The first user message often contains the core question or goal context. Dropping it would make the conversation incoherent — the AI would lose the “why” of the entire thread.

Why 6 Recent Messages?

Six messages typically cover three complete exchange rounds (one user message and one assistant reply each). This gives Claude enough recent context, with attachments intact, to maintain conversational coherence while leaving room for system prompts and tool definitions.

Why 160k Budget?

Claude’s context window is 200k tokens. The 160k budget leaves a 40k buffer for:

System prompt (~2-4k tokens)
Tool definitions (~8-10k tokens)
Memory context (~200 tokens)
Response generation space
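
As a worked example of the buffer arithmetic, the sketch below takes the upper end of each estimate listed above and computes what remains for the response. The constant names are illustrative, and the figures are the approximate values from this section, not measurements.

```typescript
// Figures taken from this section; names are illustrative.
const CONTEXT_WINDOW = 200_000;
const TOKEN_BUDGET = 160_000;

// Upper-end estimates for the fixed overhead items.
const SYSTEM_PROMPT_MAX = 4_000;
const TOOL_DEFINITIONS_MAX = 10_000;
const MEMORY_CONTEXT = 200;

// Buffer left once the message budget is fully used: 40,000 tokens.
const bufferTokens = CONTEXT_WINDOW - TOKEN_BUDGET;

// Worst-case fixed overhead: 14,200 tokens.
const overheadTokens =
  SYSTEM_PROMPT_MAX + TOOL_DEFINITIONS_MAX + MEMORY_CONTEXT;

// Remaining space for response generation: 25,800 tokens.
const responseHeadroom = bufferTokens - overheadTokens;
```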