namespace-guard

Cross-script confusable detection, slug safety, and LLM Denial of Spend defence in one zero-dependency package.

/neym-speys gahrd/ (noun)

A zero-dependency utility that prevents slug collisions across users, organisations, and reserved routes. The world's first library to detect confusable characters across non-Latin scripts, backed by 4,174 character pairs measured across 245 fonts using vector-outline raycasting (RaySpace).

How it works

Format Validation

Validates identifiers against a configurable regex pattern. Default: 2–30 lowercase alphanumeric characters and hyphens.

"a" → invalid

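A minimal sketch of the default rule, assuming it can be written as a single regular expression. The pattern and the isValidFormat name are illustrative assumptions, not the package's actual implementation.

```typescript
// Hypothetical default pattern: 2 to 30 characters drawn from a-z, 0-9 and "-".
// The real default may add further rules (for example about leading hyphens).
const DEFAULT_PATTERN = /^[a-z0-9-]{2,30}$/;

function isValidFormat(identifier: string): boolean {
  return DEFAULT_PATTERN.test(identifier);
}

console.log(isValidFormat("a"));         // false: too short
console.log(isValidFormat("acme-corp")); // true
console.log(isValidFormat("Acme"));      // false: uppercase is not allowed
```
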
Reserved Names

Blocks system routes, brand names, and other reserved identifiers. Supports categorised lists with custom error messages.

"admin" → reserved (system)

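One way a categorised reserved list could be modelled is sketched here. The data shape, the messages, and the findReserved name are assumptions made for illustration.

```typescript
// Hypothetical shape: category name -> reserved identifiers in that category.
const RESERVED: Record<string, string[]> = {
  system: ["admin", "api", "settings"],
  brand: ["namespace-guard"],
};

// Hypothetical per-category error messages.
const MESSAGES: Record<string, string> = {
  system: "That name is reserved for system use.",
  brand: "That name is a protected brand.",
};

interface ReservedHit {
  category: string;
  message: string;
}

// Returns the first category that reserves the identifier, or null.
function findReserved(identifier: string): ReservedHit | null {
  const needle = identifier.toLowerCase();
  for (const [category, names] of Object.entries(RESERVED)) {
    if (names.includes(needle)) {
      return { category, message: MESSAGES[category] };
    }
  }
  return null;
}

console.log(findReserved("admin")?.category); // "system"
console.log(findReserved("sarah"));           // null
```
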
Collision Detection

Checks multiple database tables in parallel. Users, organisations, teams - one call covers them all, with alternative suggestions.

"sarah" → taken by user

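The parallel lookup can be sketched with Promise.all. The in-memory sources below stand in for database tables, and every name in the block is hypothetical.

```typescript
// Hypothetical in-memory stand-ins for database tables.
type Source = { name: string; lookup: (slug: string) => Promise<boolean> };

const makeSource = (name: string, rows: string[]): Source => ({
  name,
  lookup: async (slug) => rows.includes(slug),
});

const sources: Source[] = [
  makeSource("user", ["sarah", "bob", "charlie"]),
  makeSource("organisation", ["acme-corp", "github", "vercel"]),
];

// Query every source concurrently and report the first one that owns the slug.
async function findCollision(slug: string): Promise<string | null> {
  const hits = await Promise.all(
    sources.map(async (s) => ((await s.lookup(slug)) ? s.name : null)),
  );
  return hits.find((hit) => hit !== null) ?? null;
}

findCollision("sarah").then((owner) => console.log(owner)); // "user"
```
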
Anti-Spoofing

Detects visually deceptive Unicode across 22+ scripts, including cross-script confusable pairs between non-Latin scripts that no existing standard covers.

"аdmin" → Cyrillic "а" looks like "a"
"ᅵ丨" → Hangul/Han cross-script (ray distance 0.004)

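Mixed-script detection can be approximated with Unicode script property escapes, as in the sketch below. It lists only four scripts and does not use the measured confusable pairs; the function names are illustrative.

```typescript
// Script detectors built on Unicode property escapes (ES2018+).
// Only four scripts are listed here, far fewer than the package covers.
const SCRIPTS: Array<[string, RegExp]> = [
  ["Latin", /\p{Script=Latin}/u],
  ["Cyrillic", /\p{Script=Cyrillic}/u],
  ["Hangul", /\p{Script=Hangul}/u],
  ["Han", /\p{Script=Han}/u],
];

// Returns the sorted names of the listed scripts that appear in the text.
function scriptsIn(text: string): string[] {
  const found = new Set<string>();
  for (const ch of text) {
    for (const [name, pattern] of SCRIPTS) {
      if (pattern.test(ch)) found.add(name);
    }
  }
  return Array.from(found).sort();
}

const isMixedScript = (text: string): boolean => scriptsIn(text).length > 1;

console.log(scriptsIn("\u0430dmin"));      // ["Cyrillic", "Latin"]
console.log(scriptsIn("\u1175\u4E28"));    // ["Han", "Hangul"]
console.log(isMixedScript("admin"));       // false
```
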
Profanity & Evasion

Blocks common profanity obfuscation (Unicode confusables + substitutions) with conservative defaults, and supports external moderation libraries via validators.

"5h1t" → blocked

Invisible Character Guard

Catches zero-width joiners, direction overrides, and other invisible Unicode characters that make identifiers look clean while hiding extra bytes.

"a‍dmin" → contains a zero-width joiner

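A minimal sketch of an invisible-character check. The character ranges are an illustrative selection (zero-width characters, bidi controls, and the byte order mark), not the package's actual list.

```typescript
// Zero-width characters, direction controls, invisible operators, and the BOM.
const INVISIBLE = /[\u200B-\u200F\u202A-\u202E\u2060-\u2064\u2066-\u2069\uFEFF]/;

// Returns the code points of any invisible characters found, in order.
function findInvisible(text: string): string[] {
  const hits: string[] = [];
  for (const ch of text) {
    if (INVISIBLE.test(ch)) {
      const hex = ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0");
      hits.push("U+" + hex);
    }
  }
  return hits;
}

console.log(findInvisible("a\u200Ddmin")); // ["U+200D"]
console.log(findInvisible("admin"));       // []
```
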
Domain Spoof Detection

Detects realistic IDN homograph attacks. Only flags single-script spoofs that could actually be registered as domains; mixed-script labels are excluded because ICANN registrars reject them.

"раураӏ" spoofs "paypal" (all-Cyrillic)

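The single-script rule can be sketched as follows: a label is only considered when every character is Cyrillic, and it is flagged when its Latin look-alike form matches a protected name. The seven-entry table and the protected list are hypothetical stand-ins for the shipped data.

```typescript
// Tiny illustrative Cyrillic -> Latin look-alike table (not the shipped map).
const CYRILLIC_TO_LATIN: Record<string, string> = {
  "\u0430": "a", // а
  "\u0435": "e", // е
  "\u043E": "o", // о
  "\u0440": "p", // р
  "\u0441": "c", // с
  "\u0443": "y", // у
  "\u04CF": "l", // ӏ (palochka)
};

const ALL_CYRILLIC = /^\p{Script=Cyrillic}+$/u;
const PROTECTED = ["paypal"]; // hypothetical list of protected labels

// Returns the protected label that a single-script label imitates, or null.
function spoofTarget(label: string): string | null {
  if (!ALL_CYRILLIC.test(label)) return null; // mixed-script or Latin: not flagged here
  const skeleton = Array.from(label, (ch) => CYRILLIC_TO_LATIN[ch] ?? ch).join("");
  return PROTECTED.includes(skeleton) ? skeleton : null;
}

console.log(spoofTarget("\u0440\u0430\u0443\u0440\u0430\u04CF")); // "paypal"
console.log(spoofTarget("p\u0430ypal"));                          // null (mixed-script)
```
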
Research Behind The Pipeline

We measured 4,174 character pairs across 245 system fonts using vector-outline raycasting (RaySpace) and discovered 3,525 cross-script confusables that no standard covers.

Visual measurement pipeline

  • 4,174 confusable pairs scored across 245 system fonts via vector-outline raycasting (RaySpace)
  • Each pair carries a danger score (0–1) representing geometric similarity across fonts. The shipped dataset uses a 0.5 floor; for higher precision, filter at 0.7 (574 pairs)
  • 3,525 novel cross-script pairs between non-Latin scripts: Hangul/Han, Cyrillic/Greek, Greek/Han, Cyrillic/Arabic, and more
  • Glyph-reuse detection to distinguish font fallback from true visual similarity
  • Full dataset published as confusable-vision (CC-BY-4.0)

Normalization composability

  • 31 documented NFKC vs TR39 divergence vectors, shipped as nfkc-tr39-divergence-v1
  • Two production map modes: CONFUSABLE_MAP (NFKC-first) and CONFUSABLE_MAP_FULL (TR39/raw-input)
  • Published artifacts: composability vectors + benchmark corpus (confusable-bench.v1)
  • Accepted into Unicode public review (PRI #540)

Built-in Profiles

Pick a profile, override what you need. Profiles are defaults, not lock-in.

consumer-handle

For public usernames and handles. Strict anti-impersonation defaults.

  • 2-30 chars
  • lowercase letters, numbers, hyphens
  • purely numeric names blocked
Best for: social/community handles

org-slug

For teams, workspaces, and company slugs. Slightly longer and conservative by default.

  • 2-40 chars
  • lowercase letters, numbers, hyphens
  • purely numeric names blocked
Best for: org/workspace URLs

developer-id

For technical IDs and package-style names. Allows more length and numeric-only IDs.

  • 2-50 chars
  • lowercase letters, numbers, hyphens
  • purely numeric names allowed
Best for: internal/dev tooling identifiers

Override example

Most setups only need one or two overrides.

const guard = createNamespaceGuardWithProfile("org-slug", {
  allowPurelyNumeric: true,
}, adapter);
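
Conceptually, an override is a shallow merge over the profile defaults. The sketch below uses the limits from the profile cards above; the Profile field names other than allowPurelyNumeric, and the resolveProfile helper, are hypothetical.

```typescript
interface Profile {
  minLength: number;
  maxLength: number;
  allowPurelyNumeric: boolean;
}

// Defaults transcribed from the profile cards; minLength/maxLength are assumed names.
const PROFILES: Record<string, Profile> = {
  "consumer-handle": { minLength: 2, maxLength: 30, allowPurelyNumeric: false },
  "org-slug": { minLength: 2, maxLength: 40, allowPurelyNumeric: false },
  "developer-id": { minLength: 2, maxLength: 50, allowPurelyNumeric: true },
};

// Overrides win over profile defaults, key by key.
function resolveProfile(name: string, overrides: Partial<Profile> = {}): Profile {
  const base = PROFILES[name];
  if (!base) throw new Error(`Unknown profile: ${name}`);
  return { ...base, ...overrides };
}

console.log(resolveProfile("org-slug", { allowPurelyNumeric: true }));
// { minLength: 2, maxLength: 40, allowPurelyNumeric: true }
```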

Scope

What you get and where the boundary is.

What it does

  • Normalizes and validates slug format
  • Checks collisions across multiple data sources
  • Blocks reserved names
  • Scores and enforces Unicode confusable risk
  • Prevents confusable-driven token inflation in LLM pipelines (isClean/canonicalise/scan)
  • Flags invisible and direction-control characters
  • Lets you plug in moderation or policy validators

Out of scope

  • No bundled third-party moderation datasets (bring your own or use the built-in list)
  • Not a replacement for account security (2FA, verification, abuse ops)
  • No guarantee of perfect abuse detection

Try it out

Type a slug to run claimability checks against simulated data, including reserved names, collisions, spoofing risk, invisible-character checks, and optional profanity evasion checks.

Simulated database

Users

sarah bob charlie

Organisations

acme-corp github vercel

Reserved (system)

admin api settings dashboard login signup help support billing

Reserved (brand)

namespace-guard

Moderation list (demo)

English list (~2.7k terms), custom additions included

Integration (9 adapters)

Prisma

import { createNamespaceGuardWithProfile } from "namespace-guard";
import { createPrismaAdapter } from "namespace-guard/adapters/prisma";
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

const guard = createNamespaceGuardWithProfile("consumer-handle", {
  reserved: ["admin", "api", "settings"],
  sources: [
    { name: "user", column: "handle" },
    { name: "organization", column: "slug" },
  ],
}, createPrismaAdapter(prisma));

await guard.assertClaimable("acme-corp");
// throws if invalid, reserved, taken, or too confusable

Drizzle

import { createNamespaceGuard } from "namespace-guard";
import { createDrizzleAdapter } from "namespace-guard/adapters/drizzle";
import { eq } from "drizzle-orm";
import { db } from "./db";
import { users, organizations } from "./schema";

const adapter = createDrizzleAdapter(db, { users, organizations }, eq);

const guard = createNamespaceGuard({
  reserved: ["admin", "api", "settings"],
  sources: [
    { name: "users", column: "handle" },
    { name: "organizations", column: "slug" },
  ],
  suggest: { strategy: ["sequential", "random-digits"], max: 3 },
}, adapter);

const result = await guard.check("acme-corp");

if (result.available) {
  // Safe to create
} else {
  console.log(result.message);      // "That name is already in use."
  console.log(result.suggestions); // ["acme-corp-1", "acme-corp-4821", "acme-corp1"]
}

Kysely

import { createNamespaceGuard } from "namespace-guard";
import { createKyselyAdapter } from "namespace-guard/adapters/kysely";
import { Kysely, PostgresDialect } from "kysely";

const db = new Kysely({ dialect: new PostgresDialect({ pool }) });

const guard = createNamespaceGuard({
  reserved: ["admin", "api", "settings"],
  sources: [
    { name: "users", column: "handle" },
    { name: "organizations", column: "slug" },
  ],
  suggest: { strategy: ["sequential", "random-digits"], max: 3 },
}, createKyselyAdapter(db));

const result = await guard.check("acme-corp");

if (result.available) {
  // Safe to create
} else {
  console.log(result.message);      // "That name is already in use."
  console.log(result.suggestions); // ["acme-corp-1", "acme-corp-4821", "acme-corp1"]
}

Knex

import { createNamespaceGuard } from "namespace-guard";
import { createKnexAdapter } from "namespace-guard/adapters/knex";
import Knex from "knex";

const knex = Knex({ client: "pg", connection: process.env.DATABASE_URL });

const guard = createNamespaceGuard({
  reserved: ["admin", "api", "settings"],
  sources: [
    { name: "users", column: "handle" },
    { name: "organizations", column: "slug" },
  ],
  suggest: { strategy: ["sequential", "random-digits"], max: 3 },
}, createKnexAdapter(knex));

const result = await guard.check("acme-corp");

if (result.available) {
  // Safe to create
} else {
  console.log(result.message);      // "That name is already in use."
  console.log(result.suggestions); // ["acme-corp-1", "acme-corp-4821", "acme-corp1"]
}

TypeORM

import { createNamespaceGuard } from "namespace-guard";
import { createTypeORMAdapter } from "namespace-guard/adapters/typeorm";
import { DataSource } from "typeorm";
import { User, Organization } from "./entities";

const dataSource = new DataSource({ /* ... */ });
const adapter = createTypeORMAdapter(dataSource, { user: User, organization: Organization });

const guard = createNamespaceGuard({
  reserved: ["admin", "api", "settings"],
  sources: [
    { name: "user", column: "handle" },
    { name: "organization", column: "slug" },
  ],
  suggest: { strategy: ["sequential", "random-digits"], max: 3 },
}, adapter);

const result = await guard.check("acme-corp");

MikroORM

import { createNamespaceGuard } from "namespace-guard";
import { createMikroORMAdapter } from "namespace-guard/adapters/mikro-orm";
import { MikroORM } from "@mikro-orm/core";
import { User, Organization } from "./entities";

const orm = await MikroORM.init(config);
const adapter = createMikroORMAdapter(orm.em, { user: User, organization: Organization });

const guard = createNamespaceGuard({
  reserved: ["admin", "api", "settings"],
  sources: [
    { name: "user", column: "handle" },
    { name: "organization", column: "slug" },
  ],
  suggest: { strategy: ["sequential", "random-digits"], max: 3 },
}, adapter);

const result = await guard.check("acme-corp");

Sequelize

import { createNamespaceGuard } from "namespace-guard";
import { createSequelizeAdapter } from "namespace-guard/adapters/sequelize";
import { User, Organization } from "./models";

const adapter = createSequelizeAdapter({ user: User, organization: Organization });

const guard = createNamespaceGuard({
  reserved: ["admin", "api", "settings"],
  sources: [
    { name: "user", column: "handle" },
    { name: "organization", column: "slug" },
  ],
  suggest: { strategy: ["sequential", "random-digits"], max: 3 },
}, adapter);

const result = await guard.check("acme-corp");

Mongoose

import { createNamespaceGuard } from "namespace-guard";
import { createMongooseAdapter } from "namespace-guard/adapters/mongoose";
import { User, Organization } from "./models";

const adapter = createMongooseAdapter({ user: User, organization: Organization });

const guard = createNamespaceGuard({
  reserved: ["admin", "api", "settings"],
  sources: [
    { name: "user", column: "handle", idColumn: "_id" },
    { name: "organization", column: "slug", idColumn: "_id" },
  ],
  suggest: { strategy: ["sequential", "random-digits"], max: 3 },
}, adapter);

const result = await guard.check("acme-corp");

Raw SQL

import { createNamespaceGuard } from "namespace-guard";
import { createRawAdapter } from "namespace-guard/adapters/raw";
import { Pool } from "pg";

const pool = new Pool();
const adapter = createRawAdapter((sql, params) => pool.query(sql, params));

const guard = createNamespaceGuard({
  reserved: ["admin", "api", "settings"],
  sources: [
    { name: "users", column: "handle" },
    { name: "organizations", column: "slug" },
  ],
  suggest: { strategy: ["sequential", "random-digits"], max: 3 },
}, adapter);

const result = await guard.check("acme-corp");

if (result.available) {
  // Safe to create
} else {
  console.log(result.message);      // "That name is already in use."
  console.log(result.suggestions); // ["acme-corp-1", "acme-corp-4821", "acme-corp1"]
}

CLI workflow

From red-team attacks to production CI gates in a few commands.

1. Red-team with attack-gen

Generate realistic variants to test your policy. Default mode is evasion (Unicode confusables + substitutions). Use --mode impersonation when you only want Unicode spoofing analysis.

$ npx namespace-guard attack-gen paypal --json
$ npx namespace-guard attack-gen shit --mode evasion --json

2. Recommend in one command

Get recommended warn/block thresholds and a suggested CI command from your dataset.

$ npx namespace-guard recommend ./risk-dataset.json
# emits recommended risk config + CI gate command

3. Tune thresholds for your domain

Set the relative cost of blocking a legitimate user vs letting a bad actor through. The calibrator picks optimal warn/block thresholds from your data.

$ npx namespace-guard calibrate ./risk-dataset.json \
  --cost-block-benign 8 --cost-allow-malicious 12 --malicious-prior 0.05
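
One plausible reading of the calibration step is an expected-cost minimisation. The sketch below is an assumption about how such a calibrator could work, not the CLI's actual algorithm: the dataset shape, the 0 to 100 score range, and the demo rows are all made up for illustration.

```typescript
interface Sample { score: number; malicious: boolean } // hypothetical dataset row
interface Costs { blockBenign: number; allowMalicious: number; maliciousPrior: number }

// Expected cost per request if everything scoring >= threshold is blocked.
// Assumes the samples contain at least one benign and one malicious row.
function expectedCost(samples: Sample[], threshold: number, c: Costs): number {
  const benign = samples.filter((s) => !s.malicious);
  const malicious = samples.filter((s) => s.malicious);
  const falsePositiveRate = benign.filter((s) => s.score >= threshold).length / benign.length;
  const falseNegativeRate = malicious.filter((s) => s.score < threshold).length / malicious.length;
  return (
    (1 - c.maliciousPrior) * falsePositiveRate * c.blockBenign +
    c.maliciousPrior * falseNegativeRate * c.allowMalicious
  );
}

// Try each integer threshold from 0 to 100 and keep the first cheapest one.
function pickBlockThreshold(samples: Sample[], c: Costs): number {
  let best = 0;
  let bestCost = Infinity;
  for (let t = 0; t <= 100; t++) {
    const cost = expectedCost(samples, t, c);
    if (cost < bestCost) {
      best = t;
      bestCost = cost;
    }
  }
  return best;
}

// Made-up demo data; the costs mirror the flags in the command above.
const demoSamples: Sample[] = [
  { score: 10, malicious: false }, { score: 20, malicious: false }, { score: 60, malicious: false },
  { score: 55, malicious: true }, { score: 80, malicious: true }, { score: 90, malicious: true },
];
const demoCosts: Costs = { blockBenign: 8, allowMalicious: 12, maliciousPrior: 0.05 };

console.log(pickBlockThreshold(demoSamples, demoCosts)); // 61
```

With a 5% malicious prior, blocking a benign user is expensive relative to how rarely an attacker appears, so the chosen threshold sits just above the highest benign score even though one malicious sample slips through.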

4. Audit canonical data before migration

Detect canonical collisions in exported records before adding database unique constraints on canonical columns.

$ npx namespace-guard audit-canonical ./users-export.json --json
# checks identifier fields + optional stored canonical fields
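
The idea behind a canonical audit can be sketched by grouping identifiers on a canonical key. This sketch assumes the key is simply NFKC plus lowercasing; the package's own canonical form may do more, and findCanonicalCollisions is a hypothetical name.

```typescript
// Assumed canonical form: NFKC-normalised and lowercased.
const canonical = (id: string): string => id.normalize("NFKC").toLowerCase();

// Group identifiers by canonical form and keep groups with more than one member.
function findCanonicalCollisions(ids: string[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const id of ids) {
    const key = canonical(id);
    groups.set(key, [...(groups.get(key) ?? []), id]);
  }
  return new Map(Array.from(groups).filter(([, members]) => members.length > 1));
}

// "\uFF53arah" starts with a fullwidth "s", which NFKC folds to ASCII "s".
const collisions = findCanonicalCollisions(["Sarah", "sarah", "bob", "\uFF53arah"]);
console.log(collisions.get("sarah")?.length); // 3
```

Any group returned here would violate a unique constraint on the canonical column, so it needs resolving before the migration runs.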

5. Compare map behaviour

See how results differ between the NFKC-filtered and full confusable maps, with a built-in 31-vector regression baseline.

$ npx namespace-guard drift --json
# dataset: builtin:composability-vectors (31 vectors)
# includes actionFlips, averageScoreDelta, maxAbsScoreDelta

6. Enforce in CI

Fail pull requests when changes exceed the limits you set.

$ npm run ci:drift-gate -- \
  --max-action-flips 29 --max-average-score-delta 95 --max-abs-score-delta 100
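
The gate logic amounts to comparing drift metrics against limits. In this sketch the report field names follow the drift output listed in step 5, while the DriftLimits shape, the driftViolations helper, and the report values are hypothetical.

```typescript
interface DriftReport { actionFlips: number; averageScoreDelta: number; maxAbsScoreDelta: number }
interface DriftLimits { maxActionFlips: number; maxAverageScoreDelta: number; maxAbsScoreDelta: number }

// Returns one message per violated limit; an empty list means the gate passes.
function driftViolations(report: DriftReport, limits: DriftLimits): string[] {
  const violations: string[] = [];
  if (report.actionFlips > limits.maxActionFlips) {
    violations.push(`actionFlips ${report.actionFlips} > ${limits.maxActionFlips}`);
  }
  if (report.averageScoreDelta > limits.maxAverageScoreDelta) {
    violations.push(`averageScoreDelta ${report.averageScoreDelta} > ${limits.maxAverageScoreDelta}`);
  }
  if (report.maxAbsScoreDelta > limits.maxAbsScoreDelta) {
    violations.push(`maxAbsScoreDelta ${report.maxAbsScoreDelta} > ${limits.maxAbsScoreDelta}`);
  }
  return violations;
}

// Limits taken from the command above; the report values are made up.
const gateLimits: DriftLimits = { maxActionFlips: 29, maxAverageScoreDelta: 95, maxAbsScoreDelta: 100 };
console.log(driftViolations({ actionFlips: 31, averageScoreDelta: 40, maxAbsScoreDelta: 100 }, gateLimits));
// ["actionFlips 31 > 29"]
```

A CI step would exit non-zero whenever the returned list is non-empty.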

Advanced API primitives

Low-level helpers for custom scoring, pairwise checks, and cross-script risk analysis.

Explainable pairwise checks

Use skeleton() and areConfusable() for fast binary checks, and confusableDistance() when you need graded similarity with explainable steps.

import { skeleton, areConfusable, confusableDistance } from "namespace-guard";
import { CONFUSABLE_WEIGHTS } from "namespace-guard/confusable-weights";

skeleton("pa\u0443pal"); // "paypal"
areConfusable("paypal", "pa\u0443pal"); // true
confusableDistance("paypal", "pa\u0443pal"); // similarity + chainDepth + steps
confusableDistance("paypal", "pa\u0443pal", { weights: CONFUSABLE_WEIGHTS }); // measured visual costs

Measured visual weights + cross-script detection

4,174 confusable pairs measured across 245 system fonts using vector-outline raycasting (RaySpace), including 3,525 cross-script pairs between non-Latin scripts (Hangul/Han, Thai/Devanagari, Cyrillic/Greek, and more). The world's first cross-script confusable dataset.

import { areConfusable, detectCrossScriptRisk } from "namespace-guard";
import { CONFUSABLE_WEIGHTS } from "namespace-guard/confusable-weights";
areConfusable("ᅵ", "丨", { weights: CONFUSABLE_WEIGHTS }); // true
detectCrossScriptRisk("ᅵ丨", { weights: CONFUSABLE_WEIGHTS }); // high risk

LLM Pipeline Preprocessing

Confusable characters encode as multi-byte UTF-8 sequences that BPE tokenisers split into more tokens than their ASCII look-alikes, inflating API costs up to 5.2x per document. The model reads them correctly. The invoice does not care. We tested this across 4 models and 130+ API calls, and we call the result Denial of Spend.

Denial of Spend: the attack vector

Confusable characters render as near-identical to Latin letters but encode as multi-byte UTF-8 sequences that fragment into extra BPE tokens. A 95-line contract that costs 881 tokens in clean ASCII costs 4,567 tokens when flooded with confusables: 5.2x the API bill.

Unlike volumetric DDoS, there is no traffic spike. The requests are normal-sized HTTP payloads, one document each, from legitimate accounts. The inflation is invisible until the invoice arrives. A contract review SaaS processing thousands of documents per day could see its costs quintupled.
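
The byte-level effect is easy to see without a tokeniser. The sketch below counts UTF-8 bytes per string as a rough proxy, assuming a byte-level BPE tokeniser; it does not reproduce the token counts reported here, and utf8Length is an illustrative helper.

```typescript
// UTF-8 length of a string, computed per code point.
function utf8Length(text: string): number {
  let bytes = 0;
  for (const ch of text) {
    const cp = ch.codePointAt(0)!;
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return bytes;
}

const latinWord = "paypal";
const cyrillicWord = "\u0440\u0430\u0443\u0440\u0430\u04CF"; // раураӏ

console.log(utf8Length(latinWord));    // 6
console.log(utf8Length(cyrillicWord)); // 12
```

The two strings have the same number of characters, but the Cyrillic one carries twice the bytes for a tokeniser to split.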

Tested against frontier models

The models tested were GPT-5.2, Claude Sonnet 4.6, GPT-5.2-instant, and Claude Haiku 4.5. The eight attack types included single-character substitution, novel confusables absent from any standard, meaning-flip attacks that reverse legal clauses, a 57% character flood, and medical text with 28 safety-critical negations.

Zero meaning flips across all variants. Every substituted clause was correctly interpreted in every run. But token costs inflated 1.1x to 5.2x depending on density, and the billing attack scales with document length.

Variant                 GPT-5.2    Sonnet    Cost
Clean                   881        975       1.0x
Flip (1.5% of doc)      961        1,070     ~1.1x
Flood (57% of chars)    4,567      5,209     ~5.2x

The Denial of Spend defence

canonicalise() recovered every substituted term across all 12 attack variants. Zero confusable findings remained after canonicalisation. The 5.2x billing multiplier drops to 1.0x because multi-byte confusable tokens map back to single-byte ASCII. Processing a 10,000-character document takes under 1ms.

isClean() gates in microseconds: fast boolean, short-circuits on the first confusable found. scan() returns per-character detail with codepoints, scripts, and risk level for audit. Use strategy: "all" for known-Latin documents (English contracts, medical text).

import { canonicalise, scan, isClean } from "namespace-guard";

// Gate: fast boolean check (microseconds)
if (!isClean(document)) {
  const report = scan(document); // per-character detail + risk level
}

// Fix: rewrite confusables to Latin (5.2x tokens -> 1.0x)
const clean = canonicalise(document, { strategy: "all" });

Profanity & Evasion

Zero-dependency by default. Plug in any external moderation library when you need it.

Built-in profanity validator

createProfanityValidator catches common obfuscation while staying conservative by default: variantProfile: "balanced" and minSubstringLength: 4.

const validator = createProfanityValidator(["word"], { mode: "evasion" });
// catches common evasion like leet and confusables
// avoids broad matches on very short tokens
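
The interaction between de-obfuscation and minSubstringLength can be sketched as follows. The substitution table, the matching rule, and the isBlocked helper are illustrative assumptions rather than the package's implementation, and "badword" is a neutral stand-in term.

```typescript
// Illustrative leet substitutions (not the package's actual table).
const LEET: Record<string, string> = {
  "0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s",
};

const deobfuscate = (id: string): string =>
  Array.from(id.toLowerCase(), (ch) => LEET[ch] ?? ch).join("");

// Whole-identifier matches always count; substring matches only count for
// blocked terms of at least minSubstringLength characters.
function isBlocked(id: string, blocked: string[], minSubstringLength = 4): boolean {
  const plain = deobfuscate(id);
  return blocked.some(
    (term) => plain === term || (term.length >= minSubstringLength && plain.includes(term)),
  );
}

console.log(isBlocked("b4dw0rd", ["badword"]));   // true: leet variant of a blocked term
console.log(isBlocked("class", ["ass"]));         // false: 3-letter term is not substring-matched
console.log(isBlocked("class", ["ass"], 3));      // true: the shorter limit creates a false positive
```

The last two calls show why lowering the substring length deserves a false-positive review first.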

Use a default list or your own

For quick setup, use the default list from namespace-guard/profanity-en. To bring your own moderation library, wrap it with createPredicateValidator.

// one-liner default list
validators: [createEnglishProfanityValidator()]
// external library path also supported
validators: [createPredicateValidator((id) => profanity.exists(id))]

Increase strictness carefully

Only switch to variantProfile: "aggressive" or shorter substring lengths after checking false positives.

// safer default
{ variantProfile: "balanced", minSubstringLength: 4 }
// stricter mode
{ variantProfile: "aggressive", minSubstringLength: 3 }

Anti-spoofing pipeline

Three clear steps for Unicode spoof detection.

  • Stage 1: NFKC normalize. Collapses compatibility forms.
  • Stage 2: Confusable map. 4,174 scored character pairs.
  • Stage 3: Mixed-script reject. 22+ script families.

NFKC-aware confusable map

Unicode's confusables.txt and NFKC normalisation disagree on 31 characters. namespace-guard ships two maps: CONFUSABLE_MAP for NFKC-first pipelines (the common case) and CONFUSABLE_MAP_FULL (~1,400 entries) for raw-input pipelines that skip normalisation.

// TR39 says Long S (‹ſ›) → “f”
// NFKC says Long S → “s” ← correct
// TR39 says Math Bold I (‹𝐈›) → “l”
// NFKC says Math Bold I → “i” ← correct
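
The NFKC side of both examples can be checked with the standard String.prototype.normalize method. The fold helper is only an illustrative wrapper that adds lowercasing.

```typescript
const longS = "\u017F";        // ſ LATIN SMALL LETTER LONG S
const mathBoldI = "\u{1D408}"; // 𝐈 MATHEMATICAL BOLD CAPITAL I

// NFKC folds both to plain ASCII letters before any confusable map runs.
const fold = (ch: string): string => ch.normalize("NFKC").toLowerCase();

console.log(fold(longS));     // "s"
console.log(fold(mathBoldI)); // "i"
```

In an NFKC-first pipeline these characters never reach the confusable map, which is why the conflicting TR39 entries can be dropped from it.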

22+ script families

Latin, Cyrillic, Greek, Armenian, Hebrew, Arabic, Devanagari, Thai, Georgian, Ethiopic, Hangul, Han, and more. Mixed-script detection blocks identifiers that combine characters from different scripts.

аdmin → Cyrillic “a” + Latin
հello → Armenian “h” + Latin
ɑdmin → IPA “a” + Latin
Ꭺdmin → Cherokee “A” + Latin

Reproducible from source

Generated from Unicode's official confusables.txt, not hand-curated. Regenerate for new Unicode versions with a single command. NFKC conflicts are excluded automatically.

Defence in depth

The default slug pattern already blocks non-ASCII. The confusable map and mixed-script detection add a second layer for apps that allow Unicode identifiers, or as a safety net if a format regex is misconfigured.