How it works
Format Validation
Validates identifiers against a configurable regex pattern. Default: 2–30 lowercase alphanumeric characters and hyphens.
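A minimal sketch of the default format rule. It assumes the rule is exactly "2–30 characters from [a-z0-9-]" as stated. The library's real regex may add constraints. `isValidFormat` and `STRICT_PATTERN` are hypothetical names. The stricter pattern also rejects leading, trailing, or doubled hyphens.

```typescript
// Assumed default: 2-30 chars of lowercase letters, digits, and hyphens.
const DEFAULT_PATTERN = /^[a-z0-9-]{2,30}$/;

// Hypothetical stricter variant: same length, but hyphens only between alphanumeric runs.
const STRICT_PATTERN = /^(?=.{2,30}$)[a-z0-9]+(?:-[a-z0-9]+)*$/;

function isValidFormat(id: string, pattern: RegExp = DEFAULT_PATTERN): boolean {
  return pattern.test(id);
}

console.log(isValidFormat("acme-corp"));              // true
console.log(isValidFormat("Acme"));                   // false (uppercase)
console.log(isValidFormat("a"));                      // false (too short)
console.log(isValidFormat("-acme-", STRICT_PATTERN)); // false (edge hyphens)
```

Keep the pattern configurable instead of hard-coding it. That way a stricter or looser rule is a one-line change.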
Reserved Names
Blocks system routes, brand names, and other reserved identifiers. Supports categorised lists with custom error messages.
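One way to model categorised reserved lists, sketched under assumptions. The category names, messages, and the `reservedMessage` helper are hypothetical and do not reflect the library's config shape. Flattening the categories into a single `Map` gives O(1) lookups and keeps each category's message.

```typescript
interface ReservedCategory {
  names: string[];
  message: string;
}

// Hypothetical categories and messages.
const reservedCategories: Record<string, ReservedCategory> = {
  system: { names: ["admin", "api", "settings"], message: "That name is reserved for the system." },
  brand: { names: ["acme"], message: "That name is reserved for a brand." },
};

// Flatten once: lowercase name -> category message.
const reservedLookup = new Map<string, string>();
for (const { names, message } of Object.values(reservedCategories)) {
  for (const name of names) reservedLookup.set(name.toLowerCase(), message);
}

// Returns the matching category's message, or null when the name is not reserved.
function reservedMessage(id: string): string | null {
  return reservedLookup.get(id.toLowerCase()) ?? null;
}

console.log(reservedMessage("API"));     // That name is reserved for the system.
console.log(reservedMessage("acme"));    // That name is reserved for a brand.
console.log(reservedMessage("my-team")); // null
```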
Collision Detection
Checks multiple database tables in parallel: users, organisations, teams. One call covers them all and returns alternative suggestions when a name is taken.
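A sketch of the parallel lookup. It assumes each source can answer "does this name exist?" asynchronously. `Source`, `inMemorySource`, and `findCollisions` are hypothetical stand-ins for real per-table queries. `Promise.all` starts every lookup before awaiting any of them, so total latency tracks the slowest source rather than the sum of all of them.

```typescript
// Hypothetical source abstraction: one async existence check per table.
type Source = { name: string; exists: (id: string) => Promise<boolean> };

// In-memory stand-in for a database table.
function inMemorySource(name: string, taken: string[]): Source {
  const set = new Set(taken);
  return { name, exists: async (id) => set.has(id) };
}

// Returns the names of every source that already holds `id`.
async function findCollisions(id: string, sources: Source[]): Promise<string[]> {
  const hits = await Promise.all(sources.map((s) => s.exists(id)));
  return sources.filter((_, i) => hits[i]).map((s) => s.name);
}

const sources = [
  inMemorySource("users", ["alice", "acme-corp"]),
  inMemorySource("organizations", ["acme-corp"]),
  inMemorySource("teams", ["core"]),
];

findCollisions("acme-corp", sources).then((taken) => console.log(taken));
// resolves to ["users", "organizations"]
```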
Anti-Spoofing
Detects visually deceptive Unicode across 22+ scripts, including cross-script confusable pairs between non-Latin scripts that no existing standard covers.
"ᅵ丨" → Hangul/Han cross-script (ray distance 0.004)
Profanity & Evasion
Blocks common profanity obfuscation (Unicode confusables + substitutions) with conservative defaults, and supports external moderation libraries via validators.
Invisible Character Guard
Catches zero-width joiners, direction overrides, and other invisible Unicode characters that make identifiers look clean while hiding extra bytes.
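A rough approximation of an invisible-character check using ES2018 Unicode property escapes. `findInvisible` is hypothetical, and the library's exact character set may differ. `Default_Ignorable_Code_Point` covers zero-width characters such as U+200B and U+200D. `Bidi_Control` covers direction overrides such as U+202E.

```typescript
// No "g" flag, so test() carries no lastIndex state between calls.
const INVISIBLE = /[\p{Default_Ignorable_Code_Point}\p{Bidi_Control}]/u;

// Returns the code points (as "U+XXXX") of every invisible or direction-control character.
function findInvisible(id: string): string[] {
  const hits: string[] = [];
  for (const ch of id) {
    if (INVISIBLE.test(ch)) {
      hits.push("U+" + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0"));
    }
  }
  return hits;
}

console.log(findInvisible("admin"));       // []
console.log(findInvisible("ad\u200Bmin")); // ["U+200B"] zero-width space
console.log(findInvisible("\u202Eadmin")); // ["U+202E"] right-to-left override
```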
Domain Spoof Detection
Detects realistic IDN homograph attacks. Only flags single-script spoofs that could actually be registered as domains — mixed-script labels are excluded because ICANN registrars reject them.
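A sketch of the single-script rule. It uses a tiny illustrative Cyrillic-to-Latin table and a hypothetical list of protected names; the shipped data is far larger and covers more scripts. A label is flagged only when every character is Cyrillic and its Latin skeleton matches a protected name. Mixed-script labels are skipped, mirroring the rule above.

```typescript
// Illustrative subset of Cyrillic letters that look like Latin ones.
const CYRILLIC_TO_LATIN = new Map<string, string>([
  ["\u0430", "a"], // а
  ["\u0435", "e"], // е
  ["\u043E", "o"], // о
  ["\u0440", "p"], // р
  ["\u0441", "c"], // с
  ["\u0443", "y"], // у
  ["\u04CF", "l"], // ӏ (palochka)
]);
const PROTECTED_NAMES = new Set(["apple", "paypal"]); // hypothetical

const ALL_CYRILLIC = /^\p{Script=Cyrillic}+$/u;

function isSingleScriptSpoof(label: string): boolean {
  // Mixed-script (and genuine Latin) labels are not flagged by this rule.
  if (!ALL_CYRILLIC.test(label)) return false;
  const skeleton = [...label].map((ch) => CYRILLIC_TO_LATIN.get(ch) ?? ch).join("");
  return PROTECTED_NAMES.has(skeleton);
}

console.log(isSingleScriptSpoof("\u0430\u0440\u0440\u04CF\u0435"));       // true  (looks like "apple")
console.log(isSingleScriptSpoof("\u0440\u0430\u0443\u0440\u0430\u04CF")); // true  (looks like "paypal")
console.log(isSingleScriptSpoof("\u0430pple"));                           // false (mixed-script)
console.log(isSingleScriptSpoof("apple"));                                // false (genuine Latin)
```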
Research Behind The Pipeline
We measured 4,174 character pairs across 245 system fonts using vector-outline raycasting (RaySpace) and discovered 3,525 cross-script confusables that no standard covers.
Visual measurement pipeline
- 4,174 confusable pairs scored across 245 system fonts via vector-outline raycasting (RaySpace)
- Each pair carries a danger score (0–1) representing geometric similarity across fonts. The shipped dataset uses a 0.5 floor; for higher precision, filter at 0.7 (574 pairs)
- 3,525 novel cross-script pairs between non-Latin scripts: Hangul/Han, Cyrillic/Greek, Greek/Han, Cyrillic/Arabic, and more
- Glyph-reuse detection to distinguish font fallback from true visual similarity
- Full dataset published as confusable-vision (CC-BY-4.0)
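If you consume the dataset directly, choosing a precision level is a filter on the danger score. The record shape `{ a, b, danger }` and the sample values are made up for illustration. Check the published confusable-vision schema for the real field names.

```typescript
// Hypothetical record shape.
interface ConfusablePair {
  a: string;      // first character
  b: string;      // second character
  danger: number; // 0-1, higher = more visually similar across fonts
}

function filterByDanger(pairs: ConfusablePair[], floor: number): ConfusablePair[] {
  return pairs.filter((p) => p.danger >= floor);
}

// Placeholder strings and made-up scores, not real dataset entries.
const sample: ConfusablePair[] = [
  { a: "x1", b: "y1", danger: 0.92 },
  { a: "x2", b: "y2", danger: 0.71 },
  { a: "x3", b: "y3", danger: 0.55 },
];

console.log(filterByDanger(sample, 0.5).length); // 3 (shipped floor)
console.log(filterByDanger(sample, 0.7).length); // 2 (higher precision)
```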
Normalization composability
- 31 documented NFKC vs TR39 divergence vectors, shipped as nfkc-tr39-divergence-v1
- Two production map modes: CONFUSABLE_MAP (NFKC-first) and CONFUSABLE_MAP_FULL (TR39/raw-input)
- Published artifacts: composability vectors + benchmark corpus (confusable-bench.v1)
- Accepted into Unicode public review (PRI #540)
RaySpace methodology · Launch write-up · Technical details · Benchmark corpus · confusable-vision · Unicode PRI #540
Built-in Profiles
Pick a profile, override what you need. Profiles are defaults, not lock-in.
consumer-handle
For public usernames and handles. Strict anti-impersonation defaults.
- 2-30 chars
- lowercase letters, numbers, hyphens
- purely numeric names blocked
org-slug
For teams, workspaces, and company slugs. Slightly longer and conservative by default.
- 2-40 chars
- lowercase letters, numbers, hyphens
- purely numeric names blocked
developer-id
For technical IDs and package-style names. Allows more length and numeric-only IDs.
- 2-50 chars
- lowercase letters, numbers, hyphens
- purely numeric names allowed
Override example
Most setups only need one or two overrides.
allowPurelyNumeric: true,
}, adapter);
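The "defaults, not lock-in" idea can be sketched as plain object spread, where overrides win over profile defaults. The length and numeric values mirror the profile lists above. `minLength`, `maxLength`, and `resolveProfile` are hypothetical names; only `allowPurelyNumeric` appears in the original example.

```typescript
interface ProfileDefaults {
  minLength: number;           // hypothetical field
  maxLength: number;           // hypothetical field
  allowPurelyNumeric: boolean; // option name taken from the example above
}

const PROFILES: Record<"consumer-handle" | "org-slug" | "developer-id", ProfileDefaults> = {
  "consumer-handle": { minLength: 2, maxLength: 30, allowPurelyNumeric: false },
  "org-slug": { minLength: 2, maxLength: 40, allowPurelyNumeric: false },
  "developer-id": { minLength: 2, maxLength: 50, allowPurelyNumeric: true },
};

// Later spreads win, so any override replaces the profile's default.
function resolveProfile(
  name: keyof typeof PROFILES,
  overrides: Partial<ProfileDefaults> = {},
): ProfileDefaults {
  return { ...PROFILES[name], ...overrides };
}

const resolved = resolveProfile("consumer-handle", { allowPurelyNumeric: true });
console.log(resolved); // { minLength: 2, maxLength: 30, allowPurelyNumeric: true }
```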
Scope
What you get and where the boundary is.
What it does
- Normalizes and validates slug format
- Checks collisions across multiple data sources
- Blocks reserved names
- Scores and enforces Unicode confusable risk
- Prevents confusable-driven token inflation in LLM pipelines (isClean/canonicalise/scan)
- Flags invisible and direction-control characters
- Lets you plug in moderation or policy validators
Out of scope
- No bundled third-party moderation datasets (bring your own or use the built-in list)
- Not a replacement for account security (2FA, verification, abuse ops)
- No guarantee of perfect abuse detection
Integration (9 adapters)
Prisma
import { createNamespaceGuardWithProfile } from "namespace-guard";
import { createPrismaAdapter } from "namespace-guard/adapters/prisma";
import { PrismaClient } from "@prisma/client";
const prisma = new PrismaClient();
const guard = createNamespaceGuardWithProfile("consumer-handle", {
reserved: ["admin", "api", "settings"],
sources: [
{ name: "user", column: "handle" },
{ name: "organization", column: "slug" },
],
}, createPrismaAdapter(prisma));
await guard.assertClaimable("acme-corp");
// throws if invalid, reserved, taken, or too confusable
Drizzle
import { createNamespaceGuard } from "namespace-guard";
import { createDrizzleAdapter } from "namespace-guard/adapters/drizzle";
import { eq } from "drizzle-orm";
import { db } from "./db";
import { users, organizations } from "./schema";
const adapter = createDrizzleAdapter(db, { users, organizations }, eq);
const guard = createNamespaceGuard({
reserved: ["admin", "api", "settings"],
sources: [
{ name: "users", column: "handle" },
{ name: "organizations", column: "slug" },
],
suggest: { strategy: ["sequential", "random-digits"], max: 3 },
}, adapter);
const result = await guard.check("acme-corp");
if (result.available) {
// Safe to create
} else {
console.log(result.message); // "That name is already in use."
console.log(result.suggestions); // ["acme-corp-1", "acme-corp-4821", "acme-corp1"]
}
Kysely
import { createNamespaceGuard } from "namespace-guard";
import { createKyselyAdapter } from "namespace-guard/adapters/kysely";
import { Kysely, PostgresDialect } from "kysely";
import { Pool } from "pg";
const pool = new Pool({ connectionString: process.env.DATABASE_URL });
const db = new Kysely({ dialect: new PostgresDialect({ pool }) });
const guard = createNamespaceGuard({
reserved: ["admin", "api", "settings"],
sources: [
{ name: "users", column: "handle" },
{ name: "organizations", column: "slug" },
],
suggest: { strategy: ["sequential", "random-digits"], max: 3 },
}, createKyselyAdapter(db));
const result = await guard.check("acme-corp");
if (result.available) {
// Safe to create
} else {
console.log(result.message); // "That name is already in use."
console.log(result.suggestions); // ["acme-corp-1", "acme-corp-4821", "acme-corp1"]
}
Knex
import { createNamespaceGuard } from "namespace-guard";
import { createKnexAdapter } from "namespace-guard/adapters/knex";
import Knex from "knex";
const knex = Knex({ client: "pg", connection: process.env.DATABASE_URL });
const guard = createNamespaceGuard({
reserved: ["admin", "api", "settings"],
sources: [
{ name: "users", column: "handle" },
{ name: "organizations", column: "slug" },
],
suggest: { strategy: ["sequential", "random-digits"], max: 3 },
}, createKnexAdapter(knex));
const result = await guard.check("acme-corp");
if (result.available) {
// Safe to create
} else {
console.log(result.message); // "That name is already in use."
console.log(result.suggestions); // ["acme-corp-1", "acme-corp-4821", "acme-corp1"]
}
TypeORM
import { createNamespaceGuard } from "namespace-guard";
import { createTypeORMAdapter } from "namespace-guard/adapters/typeorm";
import { DataSource } from "typeorm";
import { User, Organization } from "./entities";
const dataSource = new DataSource({ /* ... */ });
await dataSource.initialize(); // TypeORM requires an initialized DataSource before querying
const adapter = createTypeORMAdapter(dataSource, { user: User, organization: Organization });
const guard = createNamespaceGuard({
reserved: ["admin", "api", "settings"],
sources: [
{ name: "user", column: "handle" },
{ name: "organization", column: "slug" },
],
suggest: { strategy: ["sequential", "random-digits"], max: 3 },
}, adapter);
const result = await guard.check("acme-corp");
MikroORM
import { createNamespaceGuard } from "namespace-guard";
import { createMikroORMAdapter } from "namespace-guard/adapters/mikro-orm";
import { MikroORM } from "@mikro-orm/core";
import { User, Organization } from "./entities";
const orm = await MikroORM.init(config); // `config`: your MikroORM options (not shown)
const adapter = createMikroORMAdapter(orm.em, { user: User, organization: Organization });
const guard = createNamespaceGuard({
reserved: ["admin", "api", "settings"],
sources: [
{ name: "user", column: "handle" },
{ name: "organization", column: "slug" },
],
suggest: { strategy: ["sequential", "random-digits"], max: 3 },
}, adapter);
const result = await guard.check("acme-corp");
Sequelize
import { createNamespaceGuard } from "namespace-guard";
import { createSequelizeAdapter } from "namespace-guard/adapters/sequelize";
import { User, Organization } from "./models";
const adapter = createSequelizeAdapter({ user: User, organization: Organization });
const guard = createNamespaceGuard({
reserved: ["admin", "api", "settings"],
sources: [
{ name: "user", column: "handle" },
{ name: "organization", column: "slug" },
],
suggest: { strategy: ["sequential", "random-digits"], max: 3 },
}, adapter);
const result = await guard.check("acme-corp");
Mongoose
import { createNamespaceGuard } from "namespace-guard";
import { createMongooseAdapter } from "namespace-guard/adapters/mongoose";
import { User, Organization } from "./models";
const adapter = createMongooseAdapter({ user: User, organization: Organization });
const guard = createNamespaceGuard({
reserved: ["admin", "api", "settings"],
sources: [
{ name: "user", column: "handle", idColumn: "_id" },
{ name: "organization", column: "slug", idColumn: "_id" },
],
suggest: { strategy: ["sequential", "random-digits"], max: 3 },
}, adapter);
const result = await guard.check("acme-corp");
Raw SQL (pg)
import { createNamespaceGuard } from "namespace-guard";
import { createRawAdapter } from "namespace-guard/adapters/raw";
import { Pool } from "pg";
const pool = new Pool();
const adapter = createRawAdapter((sql, params) => pool.query(sql, params));
const guard = createNamespaceGuard({
reserved: ["admin", "api", "settings"],
sources: [
{ name: "users", column: "handle" },
{ name: "organizations", column: "slug" },
],
suggest: { strategy: ["sequential", "random-digits"], max: 3 },
}, adapter);
const result = await guard.check("acme-corp");
if (result.available) {
// Safe to create
} else {
console.log(result.message); // "That name is already in use."
console.log(result.suggestions); // ["acme-corp-1", "acme-corp-4821", "acme-corp1"]
}
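The `suggest: { strategy: ["sequential", "random-digits"], max: 3 }` option can be approximated by interleaving candidate generators and skipping taken names. This is a sketch only: `suggestNames` and the candidate formats are assumptions. The injectable `random` parameter exists only to make the output deterministic.

```typescript
// Hypothetical candidate formats: "base-N" and "base-NNNN".
function* sequential(base: string): Generator<string> {
  for (let n = 1; ; n++) yield `${base}-${n}`;
}

function* randomDigits(base: string, random: () => number): Generator<string> {
  // Four digits in the range 1000-9999.
  for (;;) yield `${base}-${1000 + Math.floor(random() * 9000)}`;
}

function suggestNames(
  base: string,
  isTaken: (id: string) => boolean,
  max = 3,
  random: () => number = Math.random,
  maxAttempts = 50,
): string[] {
  const generators = [sequential(base), randomDigits(base, random)];
  const out: string[] = [];
  // Round-robin across strategies so each contributes; maxAttempts bounds the loop.
  for (let attempt = 0; out.length < max && attempt < maxAttempts; attempt++) {
    const candidate: string = generators[attempt % generators.length].next().value;
    if (!isTaken(candidate) && !out.includes(candidate)) out.push(candidate);
  }
  return out;
}

const taken = new Set(["acme-corp", "acme-corp-1"]);
console.log(suggestNames("acme-corp", (id) => taken.has(id), 3, () => 0.5));
// ["acme-corp-5500", "acme-corp-2", "acme-corp-3"]
```

The attempt cap matters: if every candidate is taken, the function returns a short list instead of looping forever.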
CLI workflow
From red-team attacks to production CI gates in a few commands.
1. Red-team with attack-gen
Generate realistic variants to test your policy. Default mode is evasion (Unicode confusables + substitutions). Use --mode impersonation when you only want Unicode spoofing analysis.
$ npx namespace-guard attack-gen shit --mode evasion --json
2. Recommend in one command
Get recommended warn/block thresholds and a suggested CI command from your dataset.
# emits recommended risk config + CI gate command
3. Tune thresholds for your domain
Set the relative cost of blocking a legitimate user vs letting a bad actor through. The calibrator picks optimal warn/block thresholds from your data.
--cost-block-benign 8 --cost-allow-malicious 12 --malicious-prior 0.05
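A sketch of cost-weighted threshold selection. It assumes you have risk scores labelled benign or malicious. The three parameters mirror the `--cost-block-benign`, `--cost-allow-malicious`, and `--malicious-prior` flags. `pickBlockThreshold` is a hypothetical illustration, not the calibrator's actual algorithm. It picks the observed score that minimises expected cost, computed as costBlockBenign × P(benign) × false-block rate + costAllowMalicious × P(malicious) × miss rate.

```typescript
interface LabelledScore {
  score: number;      // risk score from the guard
  malicious: boolean; // ground-truth label from your dataset
}

// Block when score >= threshold; returns the cheapest threshold (Infinity = never block).
function pickBlockThreshold(
  samples: LabelledScore[],
  costBlockBenign: number,
  costAllowMalicious: number,
  maliciousPrior: number,
): number {
  const benign = samples.filter((s) => !s.malicious);
  const malicious = samples.filter((s) => s.malicious);
  const candidates = [...new Set(samples.map((s) => s.score))].sort((a, b) => a - b);
  candidates.push(Infinity);
  let best = Infinity;
  let bestCost = Infinity;
  for (const t of candidates) {
    const falseBlockRate = benign.filter((s) => s.score >= t).length / Math.max(benign.length, 1);
    const missRate = malicious.filter((s) => s.score < t).length / Math.max(malicious.length, 1);
    const expected =
      costBlockBenign * (1 - maliciousPrior) * falseBlockRate +
      costAllowMalicious * maliciousPrior * missRate;
    if (expected < bestCost) {
      bestCost = expected;
      best = t;
    }
  }
  return best;
}

const data: LabelledScore[] = [
  { score: 10, malicious: false }, { score: 20, malicious: false },
  { score: 30, malicious: false }, { score: 40, malicious: false },
  { score: 70, malicious: false }, { score: 60, malicious: true },
  { score: 80, malicious: true }, { score: 90, malicious: true },
];
console.log(pickBlockThreshold(data, 8, 12, 0.05)); // 80
```

With a low malicious prior, the cheapest choice lets the score-60 attack through rather than block the score-70 legitimate user. This is the kind of trade-off the cost flags express.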
4. Audit canonical data before migration
Detect canonical collisions in exported records before adding database unique constraints on canonical columns.
// checks identifier fields + optional stored canonical fields
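The core of a canonical-collision audit is grouping records by canonical form and reporting every group with more than one member. This sketch assumes canonicalisation is just NFKC plus lowercasing. The library's canonical form also folds confusables, so it would catch more collisions. `findCanonicalCollisions` is a hypothetical name.

```typescript
// Assumed canonical form: NFKC + lowercase (no confusable folding).
function canonicalForm(id: string): string {
  return id.normalize("NFKC").toLowerCase();
}

// Returns canonical key -> original records, keeping only keys shared by 2+ records.
function findCanonicalCollisions(ids: string[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const id of ids) {
    const key = canonicalForm(id);
    const group = groups.get(key);
    if (group) group.push(id);
    else groups.set(key, [id]);
  }
  return new Map([...groups].filter(([, members]) => members.length > 1));
}

// "\uFF41\uFF43\uFF4D\uFF45" is fullwidth "ａｃｍｅ", which NFKC folds to "acme".
const collisions = findCanonicalCollisions(["Acme", "acme", "\uFF41\uFF43\uFF4D\uFF45", "beta"]);
console.log(collisions.get("acme")); // ["Acme", "acme", "ａｃｍｅ"]
console.log(collisions.has("beta")); // false
```

Run this before adding a unique constraint. Otherwise the migration fails on the first duplicate it hits, and you don't get the full list.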
5. Compare map behaviour
See how results differ between the NFKC-filtered and full confusable maps, with a built-in 31-vector regression baseline.
// dataset: builtin:composability-vectors (31 vectors)
// includes actionFlips, averageScoreDelta, maxAbsScoreDelta
6. Enforce in CI
Fail pull requests when changes exceed the limits you set.
--max-action-flips 29 --max-average-score-delta 95 --max-abs-score-delta 100
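Steps 5 and 6 boil down to diffing two runs and comparing the drift against limits. The metric names match the output fields listed above. Their definitions here are assumptions: actionFlips counts changed actions, and the two score metrics are the average and maximum of absolute score deltas on a 0–100 scale. `compareRuns` and `gate` are hypothetical.

```typescript
type Action = "allow" | "warn" | "block";
interface VectorResult { id: string; action: Action; score: number } // score assumed 0-100
interface Drift { actionFlips: number; averageScoreDelta: number; maxAbsScoreDelta: number }

// Diff two runs over the same vectors (e.g. one per confusable map).
function compareRuns(baseline: VectorResult[], candidate: VectorResult[]): Drift {
  const byId = new Map(candidate.map((r): [string, VectorResult] => [r.id, r]));
  let flips = 0;
  let total = 0;
  let max = 0;
  let compared = 0;
  for (const before of baseline) {
    const after = byId.get(before.id);
    if (!after) continue; // vector missing from one run: skipped in this sketch
    if (before.action !== after.action) flips++;
    const delta = Math.abs(before.score - after.score);
    total += delta;
    max = Math.max(max, delta);
    compared++;
  }
  return {
    actionFlips: flips,
    averageScoreDelta: compared ? total / compared : 0,
    maxAbsScoreDelta: max,
  };
}

// CI gate: pass only if every metric is within its limit.
function gate(drift: Drift, limits: Drift): boolean {
  return (
    drift.actionFlips <= limits.actionFlips &&
    drift.averageScoreDelta <= limits.averageScoreDelta &&
    drift.maxAbsScoreDelta <= limits.maxAbsScoreDelta
  );
}

const drift = compareRuns(
  [{ id: "v1", action: "block", score: 90 }, { id: "v2", action: "allow", score: 10 }],
  [{ id: "v1", action: "warn", score: 60 }, { id: "v2", action: "allow", score: 10 }],
);
console.log(drift); // { actionFlips: 1, averageScoreDelta: 15, maxAbsScoreDelta: 30 }
console.log(gate(drift, { actionFlips: 29, averageScoreDelta: 95, maxAbsScoreDelta: 100 })); // true
```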
Advanced API primitives
Low-level helpers for custom scoring, pairwise checks, and cross-script risk analysis.
Explainable pairwise checks
Use skeleton() or areConfusable() for fast binary checks, and confusableDistance() when you need graded similarity with explainable steps.
skeleton("pa\u0443pal"); // "paypal"
areConfusable("paypal", "pa\u0443pal"); // true
confusableDistance("paypal", "pa\u0443pal"); // similarity + chainDepth + steps
confusableDistance("paypal", "pa\u0443pal", { weights: CONFUSABLE_WEIGHTS }); // measured visual costs
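For intuition, here is a sketch of a TR39-style skeleton: decompose (NFD), map each character through a confusable table, decompose again, then compare skeletons. The three-entry table is an illustrative subset. The `...Sketch` functions are not the library's `skeleton()` or `areConfusable()`.

```typescript
// Illustrative subset; real implementations load the full confusables.txt data.
const CONFUSABLES = new Map<string, string>([
  ["\u0443", "y"], // Cyrillic у
  ["\u0430", "a"], // Cyrillic а
  ["\u0440", "p"], // Cyrillic р
]);

function skeletonSketch(s: string): string {
  const mapped = [...s.normalize("NFD")].map((ch) => CONFUSABLES.get(ch) ?? ch).join("");
  return mapped.normalize("NFD");
}

// Two strings are confusable when their skeletons are equal.
function areConfusableSketch(a: string, b: string): boolean {
  return skeletonSketch(a) === skeletonSketch(b);
}

console.log(skeletonSketch("pa\u0443pal"));               // paypal
console.log(areConfusableSketch("paypal", "pa\u0443pal")); // true
```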
Measured visual weights + cross-script detection
4,174 confusable pairs measured across 245 system fonts using vector-outline raycasting (RaySpace), including 3,525 cross-script pairs between non-Latin scripts (Hangul/Han, Thai/Devanagari, Cyrillic/Greek, and more). To our knowledge, this is the first published dataset of cross-script confusables between non-Latin scripts.
import { CONFUSABLE_WEIGHTS } from "namespace-guard/confusable-weights";
areConfusable("ᅵ", "丨", { weights: CONFUSABLE_WEIGHTS }); // true
detectCrossScriptRisk("ᅵ丨", { weights: CONFUSABLE_WEIGHTS }); // high risk
confusable-vision (CC-BY-4.0 data)
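Graded similarity with measured weights can be illustrated with a weighted Levenshtein distance, where a near-identical substitution costs almost nothing. The weight-table shape is hypothetical. Its single entry (U+1175 HANGUL JUNGSEONG I vs U+4E28 丨) borrows the 0.004 ray distance from the earlier example purely for illustration.

```typescript
// Hypothetical substitution costs keyed by "a|b"; unknown pairs cost 1.
const SUBSTITUTION_COST = new Map<string, number>([["\u1175|\u4E28", 0.004]]);

function subCost(a: string, b: string): number {
  if (a === b) return 0;
  return SUBSTITUTION_COST.get(`${a}|${b}`) ?? SUBSTITUTION_COST.get(`${b}|${a}`) ?? 1;
}

function weightedDistance(x: string, y: string): number {
  const a = [...x];
  const b = [...y];
  // dp[i][j] = cheapest cost to turn a[0..i) into b[0..j); row/column 0 are pure inserts/deletes.
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)),
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                               // deletion
        dp[i][j - 1] + 1,                               // insertion
        dp[i - 1][j - 1] + subCost(a[i - 1], b[j - 1]), // weighted substitution
      );
    }
  }
  return dp[a.length][b.length];
}

console.log(weightedDistance("\u1175", "\u4E28")); // 0.004
console.log(weightedDistance("ab", "ac"));          // 1
```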
LLM Pipeline Preprocessing
Confusable characters are multi-byte in UTF-8 and split into extra BPE tokens, inflating API costs by up to 5.2x per document. The model reads them correctly. The invoice does not care. We tested this across 4 models and 130+ API calls, and we call the result Denial of Spend.
Denial of Spend: the attack vector
Confusable characters look identical, or nearly so, to Latin letters, but each one is multi-byte in UTF-8 and typically splits into extra BPE tokens. A 95-line contract that costs 881 tokens in clean ASCII costs 4,567 tokens when flooded with confusables: 5.2x the API bill.
Unlike volumetric DDoS, there is no traffic spike. The requests are normal-sized HTTP payloads, one document each, from legitimate accounts. The inflation is invisible until the invoice arrives. A contract review SaaS processing thousands of documents per day could see its costs quintupled.
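Token counts depend on each model's tokenizer, but UTF-8 byte length shows where the inflation starts. Byte-level BPE tokenizers, such as those used by GPT models, build tokens from bytes. Two-byte Cyrillic look-alikes rarely form the long merged tokens that common ASCII words do. This sketch measures bytes only, not tokens.

```typescript
const utf8Length = (s: string): number => new TextEncoder().encode(s).length;

const clean = "paypal";
const spoofed = "\u0440\u0430\u0443\u0440\u0430l"; // Cyrillic р, а, у, р, а + Latin l

console.log(utf8Length(clean));   // 6
console.log(utf8Length(spoofed)); // 11: five 2-byte letters + one ASCII byte
```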
Tested against frontier models
GPT-5.2, Claude Sonnet 4.6, GPT-5.2-instant, and Claude Haiku 4.5. Eight attack types, including single-character substitution, novel confusables absent from any standard, meaning-flip attacks that reverse legal clauses, a 57% character flood, and medical text with 28 safety-critical negations.
Zero meaning flips across all variants. Every substituted clause was correctly interpreted in every run. But token costs inflated 1.1x to 5.2x depending on density, and the billing attack scales with document length.
| Variant | GPT-5.2 (tokens) | Sonnet 4.6 (tokens) | Cost vs clean |
|---|---|---|---|
| Clean | 881 | 975 | 1.0x |
| Flip (1.5% of doc) | 961 | 1,070 | ~1.1x |
| Flood (57% of chars) | 4,567 | 5,209 | ~5.2x |
The Denial of Spend defence
canonicalise() recovered every substituted term across all 12 attack variants. Zero confusable findings remained after canonicalisation. The 5.2x billing multiplier drops to 1.0x because multi-byte confusable tokens map back to single-byte ASCII. Processing a 10,000-character document takes under 1ms.
isClean() gates in microseconds: it returns a boolean and short-circuits on the first confusable found. scan() returns per-character detail (codepoints, scripts, and risk level) for audit. For known-Latin documents (English contracts, medical text), pass strategy: "all" to canonicalise().
// Gate: fast boolean check (microseconds)
if (!isClean(document)) {
const report = scan(document); // per-character detail + risk level
}
// Fix: rewrite confusables to Latin (5.2x tokens -> 1.0x)
const clean = canonicalise(document, { strategy: "all" });
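The gate/fix split can be sketched with one lookup table. The check returns on the first hit, while the rewrite walks the whole string. `isCleanSketch`, `canonicaliseSketch`, and the six-entry Cyrillic table are hypothetical stand-ins, not the library's `isClean()` and `canonicalise()`.

```typescript
// Illustrative Cyrillic -> Latin subset.
const TO_LATIN = new Map<string, string>([
  ["\u0430", "a"], ["\u0435", "e"], ["\u043E", "o"],
  ["\u0440", "p"], ["\u0441", "c"], ["\u0443", "y"],
]);

// Gate: stop at the first confusable.
function isCleanSketch(text: string): boolean {
  for (const ch of text) {
    if (TO_LATIN.has(ch)) return false;
  }
  return true;
}

// Fix: rewrite every mapped character, leave everything else untouched.
function canonicaliseSketch(text: string): string {
  let out = "";
  for (const ch of text) out += TO_LATIN.get(ch) ?? ch;
  return out;
}

const doc = "The \u0441ontr\u0430\u0441t is v\u0430lid.";
console.log(isCleanSketch(doc));      // false
console.log(canonicaliseSketch(doc)); // The contract is valid.
```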
Profanity & Evasion
Zero-dependency by default. Plug in any external moderation library when you need it.
Built-in profanity validator
createProfanityValidator catches common obfuscation while staying conservative by default: variantProfile: "balanced" and minSubstringLength: 4.
// catches common evasion like leet and confusables
// avoids broad matches on very short tokens
Use a default list or your own
For quick setup, use namespace-guard/profanity-en. To bring your own moderation library instead, wrap it with createPredicateValidator.
validators: [createEnglishProfanityValidator()]
// external library path also supported
validators: [createPredicateValidator((id) => profanity.exists(id))]
Increase strictness carefully
Only switch to variantProfile: "aggressive" or shorter substring lengths after checking false positives.
{ variantProfile: "balanced", minSubstringLength: 4 }
// stricter mode
{ variantProfile: "aggressive", minSubstringLength: 3 }
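A sketch of why `minSubstringLength` keeps matching conservative. Substitutions are folded first. List entries shorter than the minimum must then match the whole identifier rather than any substring. The fold table, the placeholder word list, and `containsBlocked` are assumptions, not the library's `variantProfile` logic.

```typescript
// Placeholder substitution table: leet digits/symbols plus two Cyrillic look-alikes.
const FOLD = new Map<string, string>([
  ["0", "o"], ["1", "i"], ["3", "e"], ["4", "a"], ["5", "s"], ["@", "a"], ["$", "s"],
  ["\u0430", "a"], ["\u043E", "o"], // Cyrillic а, о
]);

function fold(id: string): string {
  let out = "";
  for (const ch of id.toLowerCase()) out += FOLD.get(ch) ?? ch;
  return out;
}

// Short entries must equal the whole folded id; long entries may match anywhere inside it.
function containsBlocked(id: string, words: string[], minSubstringLength = 4): boolean {
  const folded = fold(id);
  return words.some(
    (w) => folded === w || (w.length >= minSubstringLength && folded.includes(w)),
  );
}

const words = ["badword", "bad"]; // placeholder list
console.log(containsBlocked("b4dw0rd-fan", words));  // true: folds to "badword-fan"
console.log(containsBlocked("badminton", words));    // false: "bad" is under the substring minimum
console.log(containsBlocked("b4d", words));          // true: whole-identifier match after folding
console.log(containsBlocked("badminton", words, 3)); // true: stricter setting, more false positives
```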
Anti-spoofing pipeline
Three clear steps for Unicode spoof detection.
NFKC-aware confusable map
Unicode's confusables.txt and NFKC normalisation disagree on 31 characters. namespace-guard ships two maps: CONFUSABLE_MAP for NFKC-first pipelines (the common case) and CONFUSABLE_MAP_FULL (~1,400 entries) for raw-input pipelines that skip normalisation.
// TR39 says Long S (ſ) → “f”
// NFKC says Long S → “s” ← correct
// TR39 says Math Bold I (𝐈) → “l”
// NFKC says Math Bold I → “i” ← correct
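The ordering problem can be reproduced with the built-in `String.prototype.normalize`. The two map entries are the confusables.txt targets for ſ (→ f) and 𝐈 (→ l). `mapRaw` and the lowercasing step are simplifying assumptions about a slug pipeline.

```typescript
// confusables.txt targets for these two characters.
const TR39_SUBSET = new Map<string, string>([
  ["\u017F", "f"],    // ſ LATIN SMALL LETTER LONG S
  ["\u{1D408}", "l"], // 𝐈 MATHEMATICAL BOLD CAPITAL I
]);

// Apply the confusable map code point by code point.
const mapRaw = (s: string): string => [...s].map((ch) => TR39_SUBSET.get(ch) ?? ch).join("");

// Raw-input pipeline: map with no normalisation.
console.log(mapRaw("\u017F"));    // f
console.log(mapRaw("\u{1D408}")); // l

// NFKC-first pipeline: NFKC already rewrites both characters, so the map entries never fire.
console.log(mapRaw("\u017F".normalize("NFKC")).toLowerCase());    // s
console.log(mapRaw("\u{1D408}".normalize("NFKC")).toLowerCase()); // i
```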
22+ script families
Latin, Cyrillic, Greek, Armenian, Hebrew, Arabic, Devanagari, Thai, Georgian, Ethiopic, Hangul, Han, and more. Mixed-script detection blocks identifiers that combine characters from different scripts.
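- հello: Armenian “h” + Latin
- ɑdmin: IPA “ɑ” + Latin (U+0251 is Script=Latin in Unicode, so a confusable-map check, not a pure script check, is what flags it)
- Ꭺdmin: Cherokee “A” + Latin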
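A sketch of mixed-script detection using `\p{Script=Name}` property escapes. The script list is a small assumed subset of the 22+ families. Characters in the Common script (digits, hyphens) and in unlisted scripts are ignored, so `acme-2` counts as single-script. `scriptsOf` and `isMixedScript` are hypothetical names.

```typescript
const SCRIPTS = ["Latin", "Cyrillic", "Greek", "Armenian", "Cherokee", "Hangul", "Han"] as const;
const SCRIPT_RE = SCRIPTS.map((name) => [name, new RegExp(`\\p{Script=${name}}`, "u")] as const);

// Collect the (listed) scripts present in an identifier.
function scriptsOf(id: string): Set<string> {
  const found = new Set<string>();
  for (const ch of id) {
    for (const [name, re] of SCRIPT_RE) {
      if (re.test(ch)) {
        found.add(name);
        break;
      }
    }
  }
  return found;
}

const isMixedScript = (id: string): boolean => scriptsOf(id).size > 1;

console.log(isMixedScript("hello"));      // false
console.log(isMixedScript("\u0570ello")); // true  (Armenian հ + Latin)
console.log(isMixedScript("\u13AAdmin")); // true  (Cherokee Ꭺ + Latin)
console.log(isMixedScript("\u0251dmin")); // false (U+0251 has Script=Latin)
```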
Reproducible from source
Generated from Unicode's official confusables.txt, not hand-curated. Regenerate for new Unicode versions with a single command. NFKC conflicts are excluded automatically.
Defence in depth
The default slug pattern already blocks non-ASCII. The confusable map and mixed-script detection add a second layer for apps that allow Unicode identifiers, or as a safety net if a format regex is misconfigured.