How do I split documents for AI?

Default to the **smart** cut (also called recursive). It tries to cut **by paragraph** first. If a paragraph is too long, it cuts **by sentence**, and if a sentence is still too long, by word. That preserves meaning best. LangChain, a popular developer library, uses a similar recursive approach: its `RecursiveCharacterTextSplitter` by default falls back from paragraphs to lines, words and single characters. Many off-the-shelf RAG setups follow it. Cutting **by paragraph** works well for technical docs and books, and **by sentence** suits chats and short descriptions. **Equal pieces** are fast but break meaning, so use them only as a last resort.
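The smart cut can be sketched in a few lines of Python. This is a minimal illustration, not LangChain's or this tool's actual code, and it rests on several assumptions. Paragraphs are taken to be separated by blank lines. Sentences are taken to end in `.`, `!` or `?` followed by whitespace. Length is approximated by counting words, where a real setup would use the model's tokenizer. The names `recursive_split` and `count_words` are hypothetical.

```python
import re
from typing import Callable


def count_words(text: str) -> int:
    """Rough stand-in for a tokenizer: one word counts as one token."""
    return len(text.split())


def recursive_split(text: str, max_len: int,
                    length: Callable[[str], int] = count_words) -> list[str]:
    """Cut by paragraph, then by sentence, then by word, merging neighbouring
    parts back together while they still fit within max_len."""
    # Separator levels, coarsest first: blank line, end of sentence, whitespace.
    levels = [r"\n\s*\n", r"(?<=[.!?])\s+", r"\s+"]
    joiners = ["\n\n", " ", " "]

    def split_level(chunk: str, level: int) -> list[str]:
        # Small enough, or nothing finer left to try: keep as one piece.
        if length(chunk) <= max_len or level == len(levels):
            return [chunk]
        parts = [p.strip() for p in re.split(levels[level], chunk) if p.strip()]
        pieces: list[str] = []
        current = ""
        for part in parts:
            if length(part) > max_len:
                # Still too long: flush what we have and go one level finer.
                if current:
                    pieces.append(current)
                    current = ""
                pieces.extend(split_level(part, level + 1))
                continue
            candidate = current + joiners[level] + part if current else part
            if length(candidate) <= max_len:
                current = candidate  # part fits: keep growing this piece
            else:
                pieces.append(current)  # piece is full: start a new one
                current = part
        if current:
            pieces.append(current)
        return pieces

    return split_level(text.strip(), 0)
```

With `max_len=6`, a 6-word paragraph stays whole. A 7-word sentence with no shorter sentence inside it falls through to the word level and becomes a 6-word piece plus a 1-word piece. To plug in a real token counter, you only change the `length` argument.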

What is "boundary repeat" (overlap) and do I need it?

Imagine cutting a document into three pages. An important sentence falls **right on the seam** between pages 1 and 2, so half of it is on one page and half on the other. When the bot looks for the answer, it picks either page 1 or page 2, but **neither holds the full sentence**. A boundary repeat means that **the tail of page 1 also appears at the start of page 2**. Page 2 now holds the full sentence, as long as the repeat is at least as long as the part that was cut off. Typical setting: **10-20%** of piece length. The trade-off is a few more tokens to pay for.
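In code, a boundary repeat is a sliding window whose step is shorter than its size. The sketch below assumes the text has already been turned into a list of tokens, with plain words standing in for real tokens. The function name `chunk_with_overlap` is hypothetical.

```python
def chunk_with_overlap(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Cut a token list into windows of `size` tokens; each window starts
    with the last `overlap` tokens of the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far each new window moves forward
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks
```

For 200-token pieces with a 15% repeat, `overlap` would be about 30 tokens. With ten tokens, `size=4` and `overlap=1`, the windows are `a-d`, `d-g` and `g-j`, so each window starts with the last token of the previous one.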

Why does the bot not find an answer that IS in the document?

Common reasons:

- **(1) The answer got cut** between two pieces. Paste the document here and check visually.
- **(2) Pieces are too small**, so the answer lacks context (*"click here"*, but where?). Increase piece length.
- **(3) Pieces are too big**, so the answer drowns in noise. Shrink them.
- **(4) Character-based cuts** break words in the middle. Switch to *"smart"* or *"by sentence"*.

This tool shows all of these problems on **one screen**.
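Reason (1) can also be checked in code. If you know the exact answer text, test whether it sits whole inside at least one piece. This is a minimal sketch that assumes the pieces are plain strings and the answer is quoted verbatim. `normalize` and `answer_is_whole` are hypothetical helper names.

```python
def normalize(text: str) -> str:
    """Collapse all whitespace so line breaks don't hide a match."""
    return " ".join(text.split())


def answer_is_whole(answer: str, pieces: list[str]) -> bool:
    """True if the full answer text appears inside a single piece."""
    target = normalize(answer)
    return any(target in normalize(piece) for piece in pieces)
```

If this returns `False` even though the answer is in the document, the answer was most likely cut across a boundary. Try a longer piece length or a larger boundary repeat.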

What is the best piece length?

It depends on the document. **150-300 tokens** is a good starting point in most cases, with one piece holding one thought. Typical values:

- **FAQs** (short Q&A): **100-200** tokens.
- **Articles and manuals**: **300-500**.
- **Code**: one function per piece (typically **200-500**).

Rule of thumb: a piece should hold **one complete thought**. Too small, and context is lost. Too big, and meaning gets diluted. Here you can test different settings without writing any code.
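For Python source, the "one function per piece" rule can be sketched with the standard `ast` module. This sketch makes several assumptions. The input must be valid Python, and only top-level functions and classes become pieces. Module-level statements between them, such as imports and constants, are skipped; a fuller splitter might attach them to the nearest piece. The name `split_python_by_function` is hypothetical.

```python
import ast


def split_python_by_function(source: str) -> list[str]:
    """Return one piece per top-level function or class in Python source."""
    tree = ast.parse(source)
    lines = source.splitlines(keepends=True)
    pieces = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Start at the first decorator, if any, so it stays with its function.
            start = min([d.lineno for d in node.decorator_list] + [node.lineno])
            # lineno/end_lineno are 1-based and inclusive (Python 3.8+).
            pieces.append("".join(lines[start - 1:node.end_lineno]))
    return pieces
```

Each returned piece can then be checked against your token budget. A very long function may still need a further split.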

What is LangChain and why does everyone use it?

LangChain is a **popular developer library** that handles a lot of the dirty work for you: reading files, cutting them into pieces, and connecting to databases and to the bot. Its **recursive ("smart") splitter** has become a de facto default, and many RAG projects use this approach. The **"smart"** option in this tool follows the same recursive idea, so you can **test the result** before writing a single line of code.

Can I use this for books or long PDFs?

Yes, but **paste a fragment**, not the whole thing, because the browser may struggle with 500 pages at once. Take **one chapter** or **a dozen typical pages**, test settings here, then apply the same setup programmatically to the full book. For **long books**, a typical setup is 300-500 tokens per piece with a 10% repeat. For **technical PDFs** (tables, lists), paragraph cuts often work better.

Why do different models (GPT vs Claude vs Gemini) show different token counts?

Because **each company uses its own tokenizer** (its own token dictionary). The word *"documentation"* might be 3 tokens for GPT and 5 for Claude. For non-English text the differences can be especially large. That's not a bug; it's simply how the services differ. Suppose you **search documents** with an OpenAI embedding model but **answer** with Claude. Then **both counts matter**: size the pieces by OpenAI's count, and budget each query by Claude's count. You can switch models here and check both.

What does "+ X% extra" in the stats mean?

It shows how many **extra tokens** the boundary repeat added compared to the original text. **0%** means the pieces add up exactly to the original (no repeat), and **+10-20%** is a standard repeat. That extra **costs you**. Each 1% means roughly 1% more to pay for indexing (*"teaching"*) the bot and 1% more storage in the database. So don't push the repeat past about 20%.
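The stat can be read as a simple ratio. This sketch assumes the tool computes it as (sum of piece tokens - tokens in the original) / tokens in the original. The function name is hypothetical.

```python
def overlap_overhead_percent(piece_tokens: list[int], original_tokens: int) -> float:
    """Extra tokens from the boundary repeat, as a percentage of the original."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    extra = sum(piece_tokens) - original_tokens  # tokens counted twice
    return 100.0 * extra / original_tokens
```

For example, `overlap_overhead_percent([300, 300, 300, 250], 1000)` returns `15.0`. With fixed-size windows, a repeat of `o` tokens on `s`-token pieces adds roughly `o / (s - o)` of extra text. A 30-token repeat on 300-token pieces therefore comes to about 11%, slightly more than the nominal 10%.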

Why does "by paragraph" sometimes merge several paragraphs into one piece?

Because your paragraphs are **shorter than the piece length you set**. The algorithm keeps adding paragraphs until it reaches the target length, then *"closes"* the piece. Example: with paragraphs of 50 tokens each and a piece length of 300, 6 paragraphs end up together. **That's a good thing**: a few coherent longer pieces beat many short ones the bot can barely use.
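The merging behaviour can be sketched as greedy packing. This is one plausible reading of the rule above, not necessarily the tool's exact code. Paragraphs are represented only by their token counts, and a new piece starts when the next paragraph would exceed the limit. In this sketch, a single paragraph longer than the limit simply gets a piece of its own. The name `pack_paragraphs` is hypothetical.

```python
def pack_paragraphs(paragraph_tokens: list[int], max_tokens: int) -> list[list[int]]:
    """Greedily group consecutive paragraphs (given by their token counts)
    and return the paragraph indices that end up in each piece."""
    groups: list[list[int]] = []
    current: list[int] = []
    total = 0
    for i, n in enumerate(paragraph_tokens):
        # Close the current piece if this paragraph would push it over the limit.
        if current and total + n > max_tokens:
            groups.append(current)
            current, total = [], 0
        current.append(i)
        total += n
    if current:
        groups.append(current)
    return groups
```

With twelve 50-token paragraphs and a limit of 300, `pack_paragraphs([50] * 12, 300)` puts paragraphs 0-5 in the first piece and 6-11 in the second. That matches the six-per-piece example.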

RAG text chunker - free

See how a bot would slice your document into pieces

Want a bot that answers questions based on your files (manuals, FAQs, terms of service, a book)? The bot doesn't read the whole document at once. First you cut it into pieces (called chunks). Then, for each question, the bot searches those pieces for the best matches.

This tool lets you see for yourself what that split looks like. Paste a text and pick a way to cut. You'll see the pieces, each in a different color and each with its token count. (A *"token"* is roughly a piece of a word; it's the unit used to measure length.)

There are five ways to cut:

- smart (tries not to break paragraphs or sentences; the best default)
- by paragraph
- by sentence
- into equal pieces of N tokens
- into equal pieces of N characters

Each gives a different result, and here you'll see which fits your text.

How to use it

Paste a long text into the field: an article, terms of service, a book chapter, meeting notes, anything.
Pick a way to cut. If you're unsure, leave "smart" (a solid default for most texts).
Use the slider to set piece length in tokens. A reasonable range is 150-300, roughly one thought per piece.
Use the "repeat at the boundary" slider to set how much text overlaps between neighbouring pieces. This helps when an important sentence falls right on the cut line. A typical value is 10-20% of piece length.
Pick a model (GPT, Claude, Gemini). Each counts tokens differently, so the numbers will differ.
Below you'll see the pieces, each in a different color, with its token count and position in the text.
The stats panel shows the number of pieces, the shortest / average / longest piece, total tokens, and how many extra tokens the boundary repeat added.

When this is useful

Six typical situations where this visualization gives you a concrete answer instead of a guess:

Building a bot for company documents. Say you have 200 PDF manuals. Paste one sample document, try the different ways of cutting, and see which best preserves meaning. You can decide in 5 minutes instead of spending an hour reading docs.
The bot can't find the answer, even though it IS in the document. This is a very common problem. Paste a document where you know the answer is. Check whether that part is in one color (whole, coherent) or cut in half between two pieces. If it's cut, increase piece length or turn on the boundary repeat.
**Explaining *"what is chunking"* to a teammate**. Paste anything, show on screen. Five minutes of visual explanation beats an hour of dry theory.
Estimating cost. AI services charge per token, both for indexing your documents and for each query. Here you see exactly how many tokens your text becomes after cutting (with or without overlap). Multiply by the service rate and you have a concrete number.
Picking between GPT, Claude and Gemini. Each model has a different limit on how much text fits in one query (its context window). Here you can check how many of your pieces fit into one query for each. A model with a bigger window (Gemini's is one of the largest) can hold more pieces per query than one with a smaller window.
Testing different piece lengths (150 vs 300 vs 500 tokens). Small pieces = the bot sees less context and gets things wrong more often. Big pieces = each one drowns in unrelated stuff. The visualization shows where the sweet spot for your data sits.
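The cost and capacity checks above come down to simple arithmetic. In this sketch, both the price and the context-window figures are placeholders you supply; check your provider's current pricing and model limits. `reserved` is an assumed budget for instructions, the question and the answer. The function names are hypothetical.

```python
def estimate_cost(total_tokens: int, price_per_million: float) -> float:
    """Cost of processing total_tokens at a rate quoted per 1M tokens."""
    return total_tokens / 1_000_000 * price_per_million


def pieces_per_query(context_window: int, reserved: int, piece_tokens: int) -> int:
    """How many pieces fit in one query after reserving room for the
    instructions, the question and the answer."""
    if piece_tokens <= 0:
        raise ValueError("piece_tokens must be positive")
    return max(0, (context_window - reserved) // piece_tokens)
```

For example, if cutting produces 2,000,000 tokens and a hypothetical rate is $0.02 per million, indexing costs about $0.04. With a hypothetical 8,000-token window, 2,000 tokens reserved and 300-token pieces, `pieces_per_query(8_000, 2_000, 300)` gives 20.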

Questions and answers

What is a chunk?

A chunk is one of the pieces a long document is cut into before it's fed to the bot. The bot doesn't read the whole book at once. It finds the pieces that best match your question and looks only at those when it answers. So how you cut matters. If the answer fits inside one piece, the bot finds it. If it gets cut between two pieces, the bot might miss it entirely.

Related tools

Embedding cost calculator

LLM token counter

LLM cost calculator
