Skip the Browser: How HTML-to-Markdown Cuts Agent Token Costs by 80%

Ryan Turner · Head of Growth

For most read-only agent tasks, you do not need a full browser at all. Fetch the page, convert it to clean markdown, and hand that to the model. Stripping markup, scripts, and styling before the model reads anything removes noise the model never needed. As a result, it cuts your token bill, often by more than half.

The mistake is treating every web task as a browser-automation problem. Reading a docs page, pulling an article, or grabbing a product spec is a fetch-and-convert problem. You only reach for a browser when the page fights back.

Key Takeaways
  • For read-only tasks, fetch and convert to markdown instead of driving a browser.
  • Raw HTML wastes tokens on markup, inline scripts, styles, and boilerplate the model ignores.
  • Practitioners report token cuts of around 80% from this swap; measure your own pages before you trust any single figure.
  • Use the MCP Fetch reference server or a render API that returns markdown directly.
  • Keep a real browser for logins, JS-gated content, and interactive flows.

This post sits inside a larger guide on how to give AI agents live web access. Here we focus on the cheapest path: skip the browser when you can.

Why does raw HTML waste so many tokens?

Raw HTML carries a large payload the model does not need. HTML-to-markdown conversion is the step that strips tags, inline scripts, style blocks, tracking pixels, nav chrome, and footer boilerplate, keeping only the readable content. Without that step, the model pays for every one of those tokens on input. Moreover, that cost recurs on every page, every run, across every agent in your fleet.

Think about a typical article page. The text you want might be a few thousand words. The HTML around it, however, carries <div> nesting, class soup, analytics snippets, and ad-tech scaffolding that often outweighs the prose. Feed that straight into a context window and you burn budget on structure the model discards anyway.

Markdown, in contrast, keeps the content and drops the noise. Headings stay headings, links stay links, and lists stay lists. Everything else, the scripts, the styling, the layout wrappers, falls away. You get the meaning, not the machinery.
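To make the contrast concrete, here is a minimal sketch of the conversion using only Python's standard library. It is a toy, not a production converter: the class name, the set of skipped tags, and the sample page are all illustrative assumptions, and a real pipeline would more likely lean on a maintained converter.

```python
import re
from html.parser import HTMLParser

# Tags whose whole contents are treated as noise (an illustrative choice).
SKIP_TAGS = {"script", "style", "nav", "footer", "noscript"}
HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}


class MarkdownSketch(HTMLParser):
    """Toy converter: keeps headings, paragraphs, links, and list items."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []      # output fragments, joined at the end
        self.skip_depth = 0  # > 0 while inside a skipped tag
        self.href = None     # href of the <a> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        if self.skip_depth:
            return
        if tag in HEADINGS:
            self.parts.append("\n\n" + HEADINGS[tag])
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag in ("ul", "ol"):
            self.parts.append("\n")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            if self.href:
                self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth:
            return
        if tag == "a" and self.href:
            self.parts.append(f"]({self.href})")
            self.href = None
        elif tag in ("ul", "ol"):
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            # Collapse whitespace runs but keep a single boundary space.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html):
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    lines = [re.sub(r" {2,}", " ", line.strip())
             for line in "".join(parser.parts).split("\n")]
    kept = []
    for line in lines:
        # Drop leading blank lines and collapse runs of blank lines.
        if line or (kept and kept[-1]):
            kept.append(line)
    return "\n".join(kept).strip()


SAMPLE = """<html><head><style>.a{color:red}</style><script>track()</script></head>
<body><nav><a href="/">Home</a></nav>
<div class="wrap"><h1>Install guide</h1>
<p>Read the <a href="/docs">docs</a> first.</p>
<ul><li>Step one</li><li>Step two</li></ul></div>
<footer>Copyright</footer></body></html>"""

print(html_to_markdown(SAMPLE))
```

The heading, the link, and the list items survive; the script, the style block, the nav, and the footer do not.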

The scale matters because agents are about to be everywhere. In 2025, Gartner predicted that 40% of enterprise apps would feature task-specific AI agents by the end of 2026, up from under 5% in 2025 (Gartner, "Gartner Predicts 40% of Enterprise Apps Will Feature Task-Specific AI Agents by 2026"). When that many agents read the web, per-page token waste compounds into a real line item.

How much can HTML-to-markdown actually save?

The savings are large but page-dependent, so treat any headline number as a starting point, not a promise. Practitioners report cuts of around 80% from converting HTML to markdown before the model reads it (dev.to, "Browser Tools for AI Agents Part 4: Skip the Browser," 2026). That figure is self-reported by practitioners and vendors, not independently verified, so it belongs in your hypothesis column rather than your budget.

The ratio swings hard with the page. A content-heavy page wrapped in light markup will save less than a script-bloated app shell that hides a single paragraph of real text. Both shrink, but by very different amounts.

So measure your own targets. Take ten representative pages, count tokens for the raw HTML version and the markdown version of each, then look at the spread. In our testing, cuts usually land well past half, sometimes far more. However, the only number that matters for your budget is the one you measured on your own pages. Anchor your cost model in that, not in a headline.
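The measurement described above can be sketched in a few lines. Two assumptions to flag: the function names are made up for illustration, and the default counter uses a crude four-characters-per-token estimate. Swap in your model's real tokenizer before trusting the output.

```python
from statistics import median


def approx_tokens(text):
    """Rough estimate only: assumes about 4 characters per token."""
    return max(1, len(text) // 4)


def reduction(html, markdown, count=approx_tokens):
    """Fraction of input tokens removed by converting html to markdown."""
    return 1 - count(markdown) / count(html)


def spread(pairs, count=approx_tokens):
    """Summarize the reduction across (html, markdown) pairs."""
    cuts = sorted(reduction(html, md, count) for html, md in pairs)
    return {"min": cuts[0], "median": median(cuts), "max": cuts[-1]}
```

Feed it your ten page pairs and read the `min` first. The worst page, not the average, tells you how far you can lean on the savings.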

This habit pays off twice. You shrink input tokens today. Furthermore, you build a measurement baseline that flags regressions when a target site changes its layout next quarter. From our work across agent workloads, that baseline is the difference between catching a cost spike in a dashboard and discovering it in an invoice.
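One way to turn that baseline into an alert, as a sketch: store the measured reduction per URL, re-measure on a schedule, and flag any page whose reduction dropped by more than a threshold. The function name and the 10-point default are assumptions, not a recommendation.

```python
def flag_regressions(baseline, current, max_drop=0.10):
    """Return {url: (old_cut, new_cut)} where the token reduction fell
    by more than max_drop since the baseline was recorded.

    baseline and current map URL -> reduction fraction (0.0 to 1.0).
    URLs missing from current are skipped rather than flagged.
    """
    flagged = {}
    for url, old_cut in baseline.items():
        new_cut = current.get(url)
        if new_cut is not None and old_cut - new_cut > max_drop:
            flagged[url] = (old_cut, new_cut)
    return flagged
```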

How do you convert HTML to markdown in an agent pipeline?

Two patterns cover most cases: a fetch-and-convert tool wired into your agent, or a render API that returns markdown directly. Both remove the same noise. The difference is who runs the fetch and how well it handles sites that resist automated access.

Option 1: the MCP Fetch reference server

The simplest entry point is the MCP Fetch reference server, which fetches a URL and converts the HTML to markdown in one step. It ships in the official Model Context Protocol servers repo, so any MCP-compatible agent can call it as a tool. For internal docs, public articles, and sites that do not block bots, this is often all you need.

The catch is access. A plain fetch goes out from your server IP, and a growing share of the web now treats unfamiliar automated traffic as hostile. Imperva's 2025 Bad Bot Report found that automated bots made up 51% of all web traffic in 2024, the first time in a decade that bots passed humans, with bad bots at 37%. Defenses tuned for that volume will often block a naked fetch before you ever get HTML to convert.
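If you run the plain fetch yourself, it helps to detect a block explicitly instead of converting an error page. The sketch below uses only the standard library. The status codes treated as blocks and the user-agent string are assumptions; tune both to what your targets actually return.

```python
import urllib.error
import urllib.request

# Statuses treated as "the site refused us" (an assumption, not a standard).
BLOCK_STATUSES = {401, 403, 429, 503}


def plain_fetch(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return (status, html). HTTP errors come back as (code, "")."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "example-agent/0.1"}
    )
    try:
        with opener(request, timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            body = response.read().decode(charset, errors="replace")
            return response.status, body
    except urllib.error.HTTPError as error:
        return error.code, ""


def looks_blocked(status, html):
    """True when the response is a refusal or carries no usable body."""
    return status in BLOCK_STATUSES or not html.strip()
```

The `opener` parameter exists so the function can be exercised without network access; in normal use you leave it at the default.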

Option 2: a render API that returns markdown

When the target resists a plain fetch, move the fetch onto infrastructure built to get through, and ask it to return markdown directly. Massive's Web Render API exposes a Browsing endpoint with format=markdown, so the page comes back prompt-ready in a single call. No separate fetch step, no client-side converter to maintain, no HTML staged in memory.
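As a rough illustration of the single-call shape, the sketch below only builds the request URL. The `format=markdown` parameter comes from the description above. Everything else is a placeholder: the base URL is deliberately fake, the `url` parameter name is a guess, and authentication is omitted, so check the vendor's API reference for the real values.

```python
from urllib.parse import urlencode

# Placeholder host: the real endpoint URL and auth scheme are not given here.
HYPOTHETICAL_ENDPOINT = "https://render.example.invalid/browsing"


def build_markdown_request(target_url, endpoint=HYPOTHETICAL_ENDPOINT):
    """Build a GET URL asking the render service for markdown output.

    "url" is an assumed parameter name; "format=markdown" is from the text.
    """
    query = urlencode({"url": target_url, "format": "markdown"})
    return f"{endpoint}?{query}"
```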

Two things make this practical at scale. First, markdown is a first-class output format on the endpoint, not a bolt-on, so the conversion happens where the page is rendered. Second, the request leaves from a real consumer-device network spanning 195+ countries and roughly 1.3M daily active devices, so the fetch reaches sites that reject datacenter traffic. Residential proxies are connections that route through real consumer devices rather than datacenter ranges, which is why they read as ordinary visitors. In our own vendor benchmarking, residential IPs reached success rates of roughly 85 to 99% on protected sites, versus roughly 20 to 40% for datacenter IPs. Treat that as a vendor benchmark, not independent research.

That access matters more every month. On July 1, 2025, Cloudflare began blocking AI crawlers by default across roughly 20% of the web (Cloudflare, "Cloudflare Just Changed How AI Crawlers Scrape the Internet-at-Large"). If your fetch cannot reach the page, the cheapest markdown pipeline in the world returns nothing.

You can also tune the call. The Browsing endpoint offers speed tiers and a difficulty parameter, runs sync or async, and holds sticky sessions up to 12 minutes on the same egress when a multi-step read needs continuity. For one-shot reads, in contrast, request markdown and move on.

When do you still need a real browser?

You still need a browser when the content does not exist until something runs in one. Logins, multi-step forms, infinite scroll, and JS-gated content all require a live rendering context and real interaction. Fetch-and-convert returns an empty shell on those pages, because the markup arrives before the data does.

The honest rule we apply: skip the browser for read-only, reach for one for read-write or interactive. If your task is "read this page and summarize it," convert to markdown. If it is "log in, click through three screens, and submit," however, you need automation that drives a real session. Browser automation is the practice of programmatically driving a real rendering engine to click, type, and wait, exactly the work fetch-and-convert cannot do.

When you do cross that line, the framework and the infrastructure both matter. For example, picking the automation layer is its own decision, covered in agent browser frameworks. Similarly, the question of running that fleet yourself versus buying it shows up fast, which is the focus of managed browser infrastructure. The decision tree is simple at the top: try markdown first, escalate to a browser only when the page forces it.
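The decision tree can be sketched as a small dispatcher. The three callables are hypothetical stand-ins that you would supply (a plain fetch-and-convert, a render API call, and a browser driver), and the 200-character floor for "this looks like real content" is an arbitrary assumption.

```python
def read_page(url, interactive, plain_md, render_md, drive_browser,
              min_chars=200):
    """Try the cheapest path first; escalate only when the page forces it.

    Each callable takes a URL and returns markdown text, or "" when
    nothing usable came back. Returns (path_used, markdown).
    """
    if interactive:
        # Logins, forms, and click-through flows go straight to a browser.
        return "browser", drive_browser(url)
    for name, fetch in (("plain", plain_md), ("render", render_md)):
        markdown = fetch(url)
        if len(markdown.strip()) >= min_chars:
            return name, markdown
    # Both cheap paths returned an empty shell, so escalate.
    return "browser", drive_browser(url)
```

Logging `path_used` per URL also shows which targets routinely force an escalation, which is useful input when you revisit the cost model.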

One more reason to default to markdown: it is the format your grounding layer wants anyway. Grounding is the practice of feeding a model live, retrieved context so its answers track real sources instead of stale training data. Clean markdown feeds directly into retrieval and context assembly, which is why it shows up again in grounding LLMs with live web data. In other words, skipping the browser is not just cheaper; it produces the exact artifact the rest of your pipeline already expects.

Frequently Asked Questions

Does HTML-to-markdown always cut tokens by 80%?

No. The 80% figure is practitioner and vendor self-reported, not independently verified, and the real number depends on the page. Script-heavy pages save more; lean pages save less. Therefore, measure ten of your own targets to set a budget you can trust.

Will I lose data converting HTML to markdown?

You lose layout and styling, not content. Headings, links, lists, and text survive; scripts, CSS, and chrome do not. If you need attribute-level detail like specific data tags, capture raw HTML for those pages and convert everything else.

Why not just fetch the page myself?

You can, and the MCP Fetch server makes it easy, until the target blocks you. With bots now the majority of web traffic and many sites blocking unfamiliar automated requests by default, plain fetches fail often enough that a render API on a real-device network becomes the reliable path.

Does markdown output help with AI Overviews or search tasks?

For reading arbitrary pages, yes. For structured SERP or AI-answer retrieval, however, a dedicated Search endpoint is usually a better fit than fetching result pages, since it returns the data already parsed instead of leaving you to convert search HTML.