See how a bot would slice your document into pieces
Want a bot that answers questions based on your files (manuals, FAQs, terms, a book)? The bot doesn't read the whole document at once. First you have to cut it into pieces (called chunks), and the bot searches those pieces one at a time.
This tool shows you what that split looks like. Paste a text, pick a way to cut, and see the pieces, each in a different color and each with a token count (a *"token"* is roughly a fragment of a word, used to measure length).
Five ways to cut: smart (tries not to break paragraphs or sentences; the best default), by paragraph, by sentence, into equal pieces of N tokens, or into equal pieces of N characters. Each gives a different result, and here you can see which one fits your text.
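To make the simpler cutting ways concrete, here is a minimal sketch of four of them in Python. It is not the tool's actual code: the function names are made up for illustration, the sentence splitter is deliberately naive, and whitespace-separated words stand in for real model tokens.

```python
import re

def split_paragraphs(text):
    """Cut on blank lines and drop empty pieces."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def split_sentences(text):
    """Naive sentence cut: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def split_fixed_chars(text, n):
    """Equal pieces of n characters (the last piece may be shorter)."""
    return [text[i:i + n] for i in range(0, len(text), n)]

def split_fixed_tokens(text, n):
    """Equal pieces of n tokens.

    Assumption: whitespace-separated words stand in for tokens.
    """
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(0, len(words), n)]

sample = "First thought. It has two sentences.\n\nSecond thought here."
print(split_paragraphs(sample))       # 2 pieces
print(split_sentences(sample))        # 3 pieces
print(split_fixed_chars(sample, 20))  # cuts mid-word, ignoring meaning
print(split_fixed_tokens(sample, 4))  # cuts mid-sentence, ignoring meaning
```

The two fixed-size ways are the easiest to implement, but they cut wherever the count runs out, which is why a boundary-aware way is usually the safer default.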
How to use it
- Paste a long text into the field. An article, terms of service, book chapter, meeting notes, anything.
- Pick a way to cut. If you don't know which, leave "smart" (a solid default for most texts).
- Use the slider to set piece length in tokens. A reasonable range is 150-300, so that one piece holds roughly one thought.
- Use the "repeat at the boundary" slider to set how many sentences neighbouring pieces should share. This helps when an important sentence falls right on the cut line. A typical overlap is 10-20% of piece length.
- Pick a model (GPT, Claude, Gemini). Each counts tokens differently, so the numbers will differ.
- Below you'll see the colored pieces, each with a token count and its position in the text.
- The stats panel shows the number of pieces, the shortest / average / longest piece, total tokens, and how many extra tokens the boundary repeat added.
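The piece length, the boundary repeat and the stats panel can be sketched together in a few lines of Python. This is an illustration under assumptions, not the tool's implementation: `count_tokens`, `pack_sentences` and `stats` are hypothetical names, and whitespace-separated words stand in for model tokens (a real tokenizer would give different numbers per model).

```python
def count_tokens(text):
    # Assumption: whitespace-separated words stand in for tokens.
    return len(text.split())

def pack_sentences(sentences, max_tokens, overlap_sentences=0):
    """Greedily pack sentences into pieces of at most max_tokens.

    The last overlap_sentences of each piece are repeated at the start
    of the next one. A single sentence longer than max_tokens, or a
    large overlap, can still push a piece over the limit.
    """
    pieces, current = [], []
    for sentence in sentences:
        if current and count_tokens(" ".join(current + [sentence])) > max_tokens:
            pieces.append(" ".join(current))
            current = current[-overlap_sentences:] if overlap_sentences else []
        current.append(sentence)
    if current:
        pieces.append(" ".join(current))
    return pieces

def stats(pieces, source_text):
    """The numbers a stats panel like the one described above could show."""
    counts = [count_tokens(p) for p in pieces]
    total = sum(counts)
    return {
        "pieces": len(pieces),
        "shortest": min(counts),
        "average": total / len(counts),
        "longest": max(counts),
        "total_tokens": total,
        # Extra tokens that exist only because of the boundary repeat.
        "overlap_tokens": total - count_tokens(source_text),
    }

sentences = ["A b c.", "D e f.", "G h i.", "J k l."]
pieces = pack_sentences(sentences, max_tokens=6, overlap_sentences=1)
print(pieces)
print(stats(pieces, " ".join(sentences)))
```

With one sentence of overlap, the four sample sentences become three pieces instead of two, and the repeat adds 6 tokens on top of the original 12. That is the trade-off the overlap slider controls.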
When this is useful
Six typical situations where this visualization gives you a concrete answer instead of a guess:
- Building a bot for company documents. You have 200 PDF manuals. Paste one sample doc, click through three cutting ways, see which best preserves meaning. Decision in 5 minutes instead of an hour reading docs.
- The bot can't find the answer, even though it IS in the document. A very common problem. Paste the doc where you know the answer is. Check whether that part is in one color (whole, coherent), or whether it got cut in half between two pieces. If cut, increase piece length or turn on the boundary repeat.
- Explaining *"what is chunking"* to a teammate. Paste anything, show on screen. Five minutes of visual explanation beats an hour of dry theory.
- Estimating cost. Services that run a bot over your documents charge per token. Here you see exactly how many tokens your text becomes after cutting (with or without overlap). Multiply by the service rate and you have a concrete number.
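- Picking between GPT, Claude and Gemini. Each has a different limit on how much text fits in one query (the context window). Here you can check how many of your pieces fit into one query for each model. For example, a large window such as Gemini's might hold 30 pieces where a smaller one holds 5-10; the exact numbers depend on the model version and your piece length.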
- Testing different piece lengths (150 vs 300 vs 500 tokens). Small pieces = the bot sees less context and gets things wrong more often. Big pieces = each one drowns in unrelated stuff. The visualization shows where the sweet spot for your data sits.
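Three of the situations above come down to simple arithmetic or a simple check, sketched here in Python. The function names are hypothetical, and every number in the example (price, context window, reserved tokens) is made up for illustration; take real values from your provider's pricing and model documentation.

```python
def estimate_cost(total_tokens, price_per_million_tokens):
    """Cost of processing total_tokens at a given price per million tokens."""
    return total_tokens / 1_000_000 * price_per_million_tokens

def pieces_per_query(context_window, piece_tokens, reserved_tokens=0):
    """How many whole pieces fit in one query.

    reserved_tokens is room kept free for the question and the answer.
    """
    return max(0, (context_window - reserved_tokens) // piece_tokens)

def pieces_containing(pieces, passage):
    """Indexes of pieces that hold the passage whole.

    An empty list means the passage was cut across a boundary.
    """
    return [i for i, piece in enumerate(pieces) if passage in piece]

# All numbers below are invented for the example.
print(estimate_cost(250_000, 2.0))                          # 0.5
print(pieces_per_query(8_000, 300, reserved_tokens=1_000))  # 23
print(pieces_containing(
    ["The warranty lasts", "two years from purchase."],
    "lasts two years",
))                                                          # []
```

The last call returns an empty list because the passage straddles the cut between the two pieces, which is the "bot can't find the answer" case: a longer piece or a boundary repeat would keep it whole.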