Stochastic Tokenisation Improves Robustness

Sophie Steger1 · Rui Li2 · Sofiane Ennadir3 · Anya Sims4 · Arno Solin2 · Franz Pernkopf1 · Martin Trapp5

1Institute of Signal Processing and Speech Communication, Graz University of Technology, Graz, Austria, 2ELLIS Institute Finland & Aalto University, Espoo, Finland, 3King AI Labs, Microsoft Gaming, 4University of Oxford, Oxford, United Kingdom, 5KTH Royal Institute of Technology, Stockholm, Sweden

TL;DR

We analyse how training with stochastic tokenisations affects robustness to adversarial attacks and random perturbations, and show that uniformly sampled stochastic tokenisations improve robustness without increasing inference cost.

Abstract

The widespread adoption of large language models (LLMs) has raised concerns about their robustness. Vulnerability to perturbations of the input tokenisation indicates that models trained with a single deterministic canonical tokenisation can be brittle under adversarial attacks. Recent studies suggest that stochastic tokenisation can yield internal representations that are less sensitive to such perturbations. In this paper, we analyse how stochastic tokenisation affects robustness to adversarial attacks and random perturbations, studying it systematically across learning regimes (pre-training, supervised fine-tuning, and in-context learning), data sets, and model architectures. We show that pre-training and fine-tuning with uniformly sampled stochastic tokenisations improve robustness to random and adversarial perturbations. Evaluating on uniformly sampled non-canonical tokenisations reduces the accuracy of a canonically trained Llama-1b model by 29.3%, whereas training with stochastic tokenisation preserves accuracy without increasing inference cost.

Contributions

Problem: Tokenisation Brittleness

In subword tokenisation, strings are mapped to sequences of tokens. Text is typically encoded with a deterministic function that returns the canonical tokenisation. However, many other token sequences reconstruct the same string; these are referred to as non-canonical tokenisations.

Same string, multiple valid tokenisations

Canonical: "revolution" → [revolution]
Non-canonical: "revolution" → [re][volution]
Non-canonical: "revolution" → [rev][olution]

In standard training, models see only canonical tokenisations and become brittle when evaluated on non-canonical tokenisations.
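To make the ambiguity concrete, here is a minimal sketch (not the paper's code) that enumerates every token sequence over a small hypothetical vocabulary reconstructing the same string:

```python
# Toy illustration: enumerate all tokenisations of a string under a tiny
# hypothetical vocabulary. Real BPE vocabularies admit far more alternatives.
def all_tokenisations(text, vocab):
    """Return every sequence of vocabulary tokens whose concatenation is `text`."""
    if text == "":
        return [[]]
    results = []
    for tok in vocab:
        if text.startswith(tok):
            for rest in all_tokenisations(text[len(tok):], vocab):
                results.append([tok] + rest)
    return results

vocab = {"revolution", "re", "rev", "volution", "olution"}
for seq in all_tokenisations("revolution", vocab):
    print(seq)
```

Under this toy vocabulary, "revolution" admits one canonical and two non-canonical tokenisations; a deterministic encoder only ever emits one of them.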

Canonical models collapse under non-canonical tokenisation while stochastic schemes remain robust
As the level of stochasticity increases (increasing normalised edit distance), the accuracy of Llama-1b trained with canonical tokenisation (CANON) sharply drops while the same model fine-tuned with any stochastic tokenisation scheme (STOK, STOK-UNI, or UNI-K) remains robust to perturbations during testing.
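The x-axis metric above can be sketched as follows, assuming the standard Levenshtein distance between token sequences normalised by the longer sequence's length (the paper's exact normalisation may differ):

```python
# Normalised edit distance between two token sequences: Levenshtein distance
# divided by the length of the longer sequence, so 0 = identical, 1 = maximal.
def normalised_edit_distance(a, b):
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,      # delete
                           dp[i][j - 1] + 1,      # insert
                           dp[i - 1][j - 1] + cost)  # substitute
    return dp[m][n] / max(m, n, 1)

print(normalised_edit_distance(["revolution"], ["re", "volution"]))
```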

This motivates our core question: Does training with stochastic tokenisation improve robustness to non-canonical tokenisations (both random and adversarial), and if so, which sampling strategy works best?

Question 1 Does stochastic tokenisation improve robustness?

If we train LLMs with stochastic tokenisations instead of only the canonical one, do they become more robust to non-canonical (random) tokenisations?

Experiments

  • Stochastic tokenisation scheme: STOCHASTOK, with a tunable parameter α controlling the level of stochasticity.
  • We evaluate pre-training, fine-tuning, and in-context learning (ICL).
  • Benchmarks: LANGUAGE GAME, CUTE, and standard MCQ datasets.
  • Models: Tiny-LLM (from scratch), Llama-1b (LoRA fine-tuning), Llama-8b (ICL).
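A simplified sketch of a STOCHASTOK-style perturbation follows; the actual algorithm and its hyperparameters may differ. The idea: with probability α, replace a token by a random valid split into two in-vocabulary sub-tokens.

```python
import random

# Hedged sketch of STOCHASTOK-style stochastic tokenisation (simplified):
# each token is, with probability alpha, split into two sub-tokens that both
# exist in the vocabulary; otherwise it is kept as-is.
def stochastic_tokenise(tokens, vocab, alpha, rng=random):
    out = []
    for tok in tokens:
        splits = [(tok[:i], tok[i:]) for i in range(1, len(tok))
                  if tok[:i] in vocab and tok[i:] in vocab]
        if splits and rng.random() < alpha:
            out.extend(rng.choice(splits))
        else:
            out.append(tok)
    return out

vocab = {"revolution", "re", "rev", "volution", "olution"}
print(stochastic_tokenise(["revolution"], vocab, alpha=1.0, rng=random.Random(0)))
```

With α = 0 this reduces to the canonical tokenisation, so α directly controls the level of stochasticity seen during training.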

Results

  • Canonically trained models are brittle: accuracy drops sharply under non-canonical tokenisations.
  • Pre-training with stochastic tokenisation helps, but gains are moderate alone.
  • Fine-tuning with stochastic tokenisation improves robustness, even with a small α.
  • ICL with stochastic tokenisation gives mild robustness gains, but less than fine-tuning.
Figure 2: Tiny-LLM robustness under stochastic pre-training (left) and stochastic fine-tuning (right).
Pre-training a tiny LLM from scratch. Left: stochastic pre-training improves accuracy and yields mild robustness. Right: additional stochastic fine-tuning (αfine > 0) after αpre = 0.1 substantially improves robustness.
Figure 3: Llama-1b robustness after stochastic fine-tuning alone
Starting from a canonically pre-trained Llama-1b checkpoint, stochastic fine-tuning of a LoRA adapter is sufficient to improve robustness against non-canonical tokenisations.

Training with stochastic tokenisations improves robustness while keeping clean accuracy intact. Using publicly available pre-trained checkpoints, stochastic fine-tuning alone achieves substantial robustness gains. ICL with stochastic tokenisation also helps, but to a lesser extent than fine-tuning.

Question 2 Can uniform sampling of tokenisations improve robustness further?

Does the distribution over stochastic tokenisations matter for robustness, and can we do better than STOCHASTOK?

Experiments

RQ2: Uniform distribution comparison
Histogram over tokenisations with edit distance 4 from the canonical tokenisation of "revolution". STOCHASTOK induces a biased distribution over a subset of segmentations, whereas STOCHASTOK-UNI samples uniformly with full support.

Results

Higher uniformity and larger support lead to better robustness.
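Uniform sampling with full support, the idea behind a STOCHASTOK-UNI-like scheme (the paper's algorithm may differ), can be sketched with dynamic programming: count the tokenisations of every suffix, then sample each next token proportionally to the count of what remains.

```python
import random

# Hedged sketch: draw a tokenisation uniformly at random from ALL valid
# tokenisations of `text`. count[i] = number of ways to tokenise text[i:].
def sample_uniform_tokenisation(text, vocab, rng=random):
    n = len(text)
    count = [0] * (n + 1)
    count[n] = 1
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n + 1):
            if text[i:j] in vocab:
                count[i] += count[j]
    if count[0] == 0:
        raise ValueError("no valid tokenisation")
    tokens, i = [], 0
    while i < n:
        r = rng.randrange(count[i])          # pick one of count[i] completions
        for j in range(i + 1, n + 1):
            if text[i:j] in vocab:
                if r < count[j]:             # next token is text[i:j]
                    tokens.append(text[i:j])
                    i = j
                    break
                r -= count[j]
    return tokens

vocab = {"revolution", "re", "rev", "volution", "olution"}
print(sample_uniform_tokenisation("revolution", vocab, rng=random.Random(1)))
```

Because every valid segmentation has positive probability, this sampler has full support, unlike a biased splitting heuristic that only reaches a subset of segmentations.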

Question 3 Does stochastic tokenisation improve robustness against adversarial tokenisations?

Beyond random tokenisations, does training with stochastic tokenisation improve robustness to adversarial tokenisation attacks, and why?

Experiments

  • We compare canonical and stochastic tokenisation during fine-tuning and report accuracy under canonical and adversarial tokenisation.
  • We analyse distances between non-canonical tokenisation representations to understand robustness mechanisms.
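An adversarial tokenisation attack can be sketched as a search over same-string tokenisations for the one that maximises the model's loss. This is a hypothetical illustration, not the paper's attack: `toy_loss` stands in for querying the LLM, and the exhaustive search would be replaced by a greedy or sampled search at scale.

```python
# Hypothetical sketch of an adversarial tokenisation attack: among all token
# sequences that decode to the same string, choose the one maximising a loss.
def enumerate_tokenisations(text, vocab):
    if not text:
        yield []
        return
    for j in range(1, len(text) + 1):
        if text[:j] in vocab:
            for rest in enumerate_tokenisations(text[j:], vocab):
                yield [text[:j]] + rest

def adversarial_tokenisation(text, vocab, loss_fn):
    """Exhaustive search; a real attack would search greedily over an LLM's loss."""
    return max(enumerate_tokenisations(text, vocab), key=loss_fn)

# Stand-in loss: a brittle model that is most confused by finely split inputs.
toy_loss = lambda tokens: len(tokens)
vocab = {"revolution", "re", "rev", "volution", "olution"}
print(adversarial_tokenisation("revolution", vocab, toy_loss))
```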

Results

  • Canonical fine-tuning accuracy collapses under attack (≈94% → ≈6%).
  • Stochastic schemes show massive robustness gains. More uniform sampling leads to stronger adversarial robustness.
  • Representation analysis shows reduced sensitivity to tokenisation changes.
  • Theory: Stochasticity smooths embedding space, reducing Lipschitz constants and adversarial vulnerability.
RQ3: Adversarial robustness comparison (original)
Accuracy under canonical and adversarial tokenisation for Llama-1b across fine-tuning strategies; canonical training collapses under attack, while stochastic schemes remain substantially more robust.
RQ3: Adversarial robustness comparison (new)
Normalised distances between representations of non-canonical tokenisations of the 1k most frequent English words. Canonical fine-tuning does not affect the distances of alternative tokenisations compared to the zero-shot Llama-1b. Stochastic tokenisations reduce distances in deeper layers.

Stochastic fine-tuning reduces brittleness under adversarial tokenisation. Theoretically and empirically, stochasticity smooths representations and reduces adversarial vulnerability.

BibTeX

@article{steger2026stochastic,
  title   = {Stochasticity in Tokenisation Improves Robustness},
  author  = {Sophie Steger and Rui Li and Sofiane Ennadir and Anya Sims and Arno Solin and Franz Pernkopf and Martin Trapp},
  journal = {arXiv preprint arXiv:xxxx.xxxxx},
  year    = {2026}
}