Stochastic Tokenisation Improves Robustness

Sophie Steger1 · Rui Li2 · Sofiane Ennadir3 · Anya Sims4 · Arno Solin2 · Franz Pernkopf1 · Martin Trapp5

1Institute of Signal Processing and Speech Communication, Graz University of Technology, Graz, Austria, 2ELLIS Institute Finland & Aalto University, Espoo, Finland, 3King AI Labs, Microsoft Gaming, 4University of Oxford, Oxford, United Kingdom, 5KTH Royal Institute of Technology, Stockholm, Sweden

TL;DR

We analyse how training with stochastic tokenisations affects robustness to adversarial attacks and random perturbations, and show that uniformly sampled stochastic tokenisations improve robustness without increasing inference cost.

Abstract

The widespread adoption of large language models (LLMs) has raised concerns about their robustness. Vulnerability to perturbations of the input tokenisation indicates that models trained with a single deterministic canonical tokenisation can be brittle under adversarial attacks. Recent studies suggest that stochastic tokenisation can yield internal representations that are less sensitive to such perturbations. In this paper, we analyse how stochastic tokenisation affects robustness to adversarial attacks and random perturbations, studying it systematically across learning regimes (pre-training, supervised fine-tuning, and in-context learning), data sets, and model architectures. We show that pre-training and fine-tuning with uniformly sampled stochastic tokenisations improve robustness to random and adversarial perturbations. Evaluating on uniformly sampled non-canonical tokenisations reduces the accuracy of a canonically trained Llama-1b model by 29.3%, whereas training with stochastic tokenisation preserves accuracy without increasing inference cost.

Contributions

Problem: Tokenisation Brittleness

In subword tokenisation, strings are mapped to sequences of tokens. Text is typically encoded with a deterministic function that returns the canonical tokenisation. However, many other token sequences reconstruct the same string; these are referred to as non-canonical tokenisations.

Same string, multiple valid tokenisations

Canonical: "revolution" → [revolution]
Non-canonical: "revolution" → [re][volution]
Non-canonical: "revolution" → [rev][olution]

In standard training, models see only canonical tokenisations and become brittle when evaluated on non-canonical tokenisations.
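To make the ambiguity concrete, here is a minimal sketch (not the paper's code) that enumerates every token sequence over a small hypothetical vocabulary reconstructing the same string:

```python
# Toy illustration: enumerate all tokenisations of a string under a tiny
# hypothetical vocabulary. Real BPE vocabularies admit far more alternatives.
def all_tokenisations(text, vocab):
    """Return every sequence of vocabulary tokens whose concatenation is `text`."""
    if text == "":
        return [[]]
    results = []
    for tok in vocab:
        if text.startswith(tok):
            for rest in all_tokenisations(text[len(tok):], vocab):
                results.append([tok] + rest)
    return results

vocab = {"revolution", "re", "rev", "volution", "olution"}
for seq in all_tokenisations("revolution", vocab):
    print(seq)
```

Under this toy vocabulary, "revolution" admits one canonical and two non-canonical tokenisations; a deterministic encoder only ever emits one of them.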

Canonical models collapse under non-canonical tokenisation while stochastic schemes remain robust
As the level of stochasticity increases (increasing normalised edit distance), the accuracy of Llama-1b trained with canonical tokenisation (CANON) sharply drops while the same model fine-tuned with any stochastic tokenisation scheme (STOK, STOK-UNI, or UNI-K) remains robust to perturbations during testing.
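The x-axis metric above can be sketched as follows, assuming the standard Levenshtein distance between token sequences normalised by the longer sequence's length (the paper's exact normalisation may differ):

```python
# Normalised edit distance between two token sequences: Levenshtein distance
# divided by the length of the longer sequence, so 0 = identical, 1 = maximal.
def normalised_edit_distance(a, b):
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,      # delete
                           dp[i][j - 1] + 1,      # insert
                           dp[i - 1][j - 1] + cost)  # substitute
    return dp[m][n] / max(m, n, 1)

print(normalised_edit_distance(["revolution"], ["re", "volution"]))
```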

This motivates our core question: Does training with stochastic tokenisation improve robustness to non-canonical tokenisations (both random and adversarial), and if so, which sampling strategy works best?

Question 1 Does stochastic tokenisation improve robustness?

If we train LLMs with stochastic tokenisations instead of only the canonical one, do they become more robust to non-canonical (random) tokenisations?

Experiments

  • Stochastic tokenisation scheme: STOCHASTOK, with a tunable parameter α controlling the level of stochasticity.
  • We evaluate pre-training, fine-tuning, and in-context learning (ICL).
  • Benchmarks: LANGUAGE GAME, CUTE, and standard MCQ datasets.
  • Models: Tiny-LLM (from scratch), Llama-1b (LoRA fine-tuning), Llama-8b (ICL).
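A simplified sketch of a STOCHASTOK-style perturbation follows; the actual algorithm and its hyperparameters may differ. The idea: with probability α, replace a token by a random valid split into two in-vocabulary sub-tokens.

```python
import random

# Hedged sketch of STOCHASTOK-style stochastic tokenisation (simplified):
# each token is, with probability alpha, split into two sub-tokens that both
# exist in the vocabulary; otherwise it is kept as-is.
def stochastic_tokenise(tokens, vocab, alpha, rng=random):
    out = []
    for tok in tokens:
        splits = [(tok[:i], tok[i:]) for i in range(1, len(tok))
                  if tok[:i] in vocab and tok[i:] in vocab]
        if splits and rng.random() < alpha:
            out.extend(rng.choice(splits))
        else:
            out.append(tok)
    return out

vocab = {"revolution", "re", "rev", "volution", "olution"}
print(stochastic_tokenise(["revolution"], vocab, alpha=1.0, rng=random.Random(0)))
```

With α = 0 this reduces to the canonical tokenisation, so α directly controls the level of stochasticity seen during training.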

Results

  • Canonically trained models are brittle: accuracy drops sharply under non-canonical tokenisations.
  • Pre-training with stochastic tokenisation helps, but gains are moderate alone.
  • Fine-tuning with stochastic tokenisation improves robustness, even with a small α.
  • ICL with stochastic tokenisation gives mild robustness gains, but less than fine-tuning.
Figure 2: Tiny-LLM robustness under stochastic pre-training (left) and stochastic fine-tuning (right).
Pre-training a tiny LLM from scratch. Left: stochastic pre-training improves accuracy and yields mild robustness. Right: additional stochastic fine-tuning (αfine > 0) after αpre = 0.1 substantially improves robustness.
Figure 3: Llama-1b robustness after stochastic fine-tuning alone
Starting from a canonically pre-trained Llama-1b checkpoint, stochastic fine-tuning of a LoRA adapter is sufficient to improve robustness against non-canonical tokenisations.

Training with stochastic tokenisations improves robustness while keeping clean accuracy intact. Using publicly available pre-trained checkpoints, stochastic fine-tuning alone achieves substantial robustness gains. ICL with stochastic tokenisation also helps, but to a lesser extent than fine-tuning.

Question 2 Can uniform sampling of tokenisations improve robustness further?

Does the distribution over stochastic tokenisations matter for robustness, and can we do better than STOCHASTOK?

Experiments

RQ2: Uniform distribution comparison
Histogram over tokenisations with edit distance 4 from the canonical tokenisation of "revolution". STOCHASTOK induces a biased distribution over a subset of segmentations, whereas STOCHASTOK-UNI samples uniformly with full support.

Results

Higher uniformity and larger support lead to better robustness.
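Uniform sampling with full support, the idea behind a STOCHASTOK-UNI-like scheme (the paper's algorithm may differ), can be sketched with dynamic programming: count the tokenisations of every suffix, then sample each next token proportionally to the count of what remains.

```python
import random

# Hedged sketch: draw a tokenisation uniformly at random from ALL valid
# tokenisations of `text`. count[i] = number of ways to tokenise text[i:].
def sample_uniform_tokenisation(text, vocab, rng=random):
    n = len(text)
    count = [0] * (n + 1)
    count[n] = 1
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n + 1):
            if text[i:j] in vocab:
                count[i] += count[j]
    if count[0] == 0:
        raise ValueError("no valid tokenisation")
    tokens, i = [], 0
    while i < n:
        r = rng.randrange(count[i])          # pick one of count[i] completions
        for j in range(i + 1, n + 1):
            if text[i:j] in vocab:
                if r < count[j]:             # next token is text[i:j]
                    tokens.append(text[i:j])
                    i = j
                    break
                r -= count[j]
    return tokens

vocab = {"revolution", "re", "rev", "volution", "olution"}
print(sample_uniform_tokenisation("revolution", vocab, rng=random.Random(1)))
```

Because every valid segmentation has positive probability, this sampler has full support, unlike a biased splitting heuristic that only reaches a subset of segmentations.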

Question 3 Does stochastic tokenisation improve robustness against adversarial tokenisations?

Beyond random tokenisations, does training with stochastic tokenisation improve robustness to adversarial tokenisation attacks, and why?

Experiments

  • We compare canonical and stochastic tokenisation during fine-tuning and report accuracy under canonical and adversarial tokenisation.
  • We analyse distances between non-canonical tokenisation representations to understand robustness mechanisms.
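An adversarial tokenisation attack can be sketched as a search over same-string tokenisations for the one that maximises the model's loss. This is a hypothetical illustration, not the paper's attack: `toy_loss` stands in for querying the LLM, and the exhaustive search would be replaced by a greedy or sampled search at scale.

```python
# Hypothetical sketch of an adversarial tokenisation attack: among all token
# sequences that decode to the same string, choose the one maximising a loss.
def enumerate_tokenisations(text, vocab):
    if not text:
        yield []
        return
    for j in range(1, len(text) + 1):
        if text[:j] in vocab:
            for rest in enumerate_tokenisations(text[j:], vocab):
                yield [text[:j]] + rest

def adversarial_tokenisation(text, vocab, loss_fn):
    """Exhaustive search; a real attack would search greedily over an LLM's loss."""
    return max(enumerate_tokenisations(text, vocab), key=loss_fn)

# Stand-in loss: a brittle model that is most confused by finely split inputs.
toy_loss = lambda tokens: len(tokens)
vocab = {"revolution", "re", "rev", "volution", "olution"}
print(adversarial_tokenisation("revolution", vocab, toy_loss))
```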

Results

  • Canonical fine-tuning accuracy collapses under attack (≈94% → ≈6%).
  • Stochastic schemes show massive robustness gains. More uniform sampling leads to stronger adversarial robustness.
  • Representation analysis shows reduced sensitivity to tokenisation changes.
  • Theory: Stochasticity smooths embedding space, reducing Lipschitz constants and adversarial vulnerability.
RQ3: Adversarial robustness comparison (original)
Accuracy under canonical and adversarial tokenisation for Llama-1b across fine-tuning strategies; canonical training collapses under attack, while stochastic schemes remain substantially more robust.
RQ3: Adversarial robustness comparison (new)
Normalised distances between representations of non-canonical tokenisations of the 1k most frequent English words. Canonical fine-tuning does not affect the distances of alternative tokenisations compared to the zero-shot Llama-1b. Stochastic tokenisations reduce distances in deeper layers.

Stochastic fine-tuning reduces brittleness under adversarial tokenisation. Theoretically and empirically, stochasticity smooths representations and reduces adversarial vulnerability.

BibTeX

@article{steger2026stochastic,
  title   = {Stochasticity in Tokenisation Improves Robustness},
  author  = {Sophie Steger and Rui Li and Sofiane Ennadir and Anya Sims and Arno Solin and Franz Pernkopf and Martin Trapp},
  journal = {arXiv preprint arXiv:xxxx.xxxxx},
  year    = {2026}
}