User Dictionary
Add custom words to improve analysis for your domain.
Runtime Loading
Load dictionary entries at runtime using loadUserDictionary():
const suzume = await Suzume.create()
// Add a single word
suzume.loadUserDictionary('ChatGPT,NOUN')
// Add multiple words
suzume.loadUserDictionary(`
スカイツリー,NOUN
ポケモン,NOUN
DeepL,NOUN
`)
Format
Basic Format
surface,pos
| Field | Required | Description |
|---|---|---|
| surface | Yes | The word as it appears in text |
| pos | Yes | Part of speech |
Full Format
surface,pos,cost,lemma
| Field | Required | Description |
|---|---|---|
| surface | Yes | The word as it appears in text |
| pos | Yes | Part of speech |
| cost | No | Word cost (lower = more likely to be selected) |
| lemma | No | Base/dictionary form |
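When entries come from structured data rather than hand-written text, a small builder can keep the field order straight. The sketch below is only an illustration: `DictEntry` and `formatEntry` are hypothetical names, not part of Suzume. It assumes a lemma can only be written when a cost is also given (because cost precedes lemma in the line), and it does not attempt any escaping of commas inside a surface form.

```typescript
// Hypothetical helper (not part of Suzume): builds one dictionary line
// in the surface,pos[,cost[,lemma]] order described above.
interface DictEntry {
  surface: string
  pos: string
  cost?: number
  lemma?: string
}

function formatEntry({ surface, pos, cost, lemma }: DictEntry): string {
  const fields = [surface, pos]
  if (cost !== undefined) {
    fields.push(String(cost))
    if (lemma !== undefined) fields.push(lemma)
  } else if (lemma !== undefined) {
    // Assumption: an empty cost field is not supported, so refuse to guess.
    throw new Error('lemma requires cost (surface,pos,cost,lemma)')
  }
  return fields.join(',')
}

const lines = [
  formatEntry({ surface: 'ChatGPT', pos: 'NOUN' }),
  formatEntry({ surface: 'ググる', pos: 'VERB', cost: 5000, lemma: 'ググる' }),
]
console.log(lines.join('\n'))
```

The joined string can then be passed to loadUserDictionary() in a single call.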
Part of Speech Values
| Value | Description | Japanese |
|---|---|---|
| NOUN | Nouns, proper nouns | 名詞 |
| VERB | Verbs | 動詞 |
| ADJ | Adjectives | 形容詞 |
| ADV | Adverbs | 副詞 |
| PARTICLE | Particles | 助詞 |
| AUX | Auxiliary verbs | 助動詞 |
| PRON | Pronouns | 代名詞 |
| DET | Adnominal adjectives | 連体詞 |
| CONJ | Conjunctions | 接続詞 |
| INTJ | Interjections | 感動詞 |
| PREFIX | Prefixes | 接頭辞 |
| SUFFIX | Suffixes | 接尾辞 |
| SYMBOL | Symbols | 記号 |
Japanese POS names
You can also use Japanese POS names (e.g., 名詞, 動詞, 形容詞) instead of English values.
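If entries arrive from user input or spreadsheets, it may help to check the POS field before loading. The following is a minimal sketch, not Suzume code: `normalizePos` is a hypothetical helper, and the mapping is copied from the table above. It uppercases English values, translates Japanese names, and rejects anything else.

```typescript
// Mapping copied from the Part of Speech table; the helper is hypothetical.
const POS_BY_JAPANESE_NAME = new Map<string, string>([
  ['名詞', 'NOUN'],
  ['動詞', 'VERB'],
  ['形容詞', 'ADJ'],
  ['副詞', 'ADV'],
  ['助詞', 'PARTICLE'],
  ['助動詞', 'AUX'],
  ['代名詞', 'PRON'],
  ['連体詞', 'DET'],
  ['接続詞', 'CONJ'],
  ['感動詞', 'INTJ'],
  ['接頭辞', 'PREFIX'],
  ['接尾辞', 'SUFFIX'],
  ['記号', 'SYMBOL'],
])
const POS_VALUES = new Set<string>(POS_BY_JAPANESE_NAME.values())

function normalizePos(pos: string): string {
  const trimmed = pos.trim()
  const upper = trimmed.toUpperCase()
  if (POS_VALUES.has(upper)) return upper
  const mapped = POS_BY_JAPANESE_NAME.get(trimmed)
  if (mapped !== undefined) return mapped
  throw new Error(`Unknown part of speech: ${pos}`)
}

console.log(normalizePos('noun'), normalizePos('動詞'))
```

Failing early on an unknown value makes typos visible before they reach the analyzer.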
Examples
Tech Terms
ChatGPT,NOUN
GitHub,NOUN
TypeScript,NOUN
WebAssembly,NOUN
Kubernetes,NOUN
Brand Names
スカイツリー,NOUN
ポケモン,NOUN
任天堂,NOUN
ソニー,NOUN
Compound Words
形態素解析,NOUN
機械学習,NOUN
自然言語処理,NOUN
Verbs with Conjugation
ググる,VERB,5000,ググる
バズる,VERB,5000,バズる
Cost Tuning
The cost parameter controls word selection priority:
- Lower cost = More likely to be selected
- Default cost = ~8000
- Common words = 5000-7000
- Rare words = 9000+
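To see why a lower cost makes a word more likely to be selected, consider a simplified model in which the analyzer picks the segmentation with the lowest total cost. This is an assumption borrowed from how lattice-based analyzers commonly work, not a description of Suzume's actual scoring: real analyzers usually add connection costs between tokens, and the costs given here for 東京 and 都 are invented for illustration.

```typescript
// Illustrative only: the costs for 東京 and 都 are made-up numbers, and
// connection costs between tokens are ignored.
type Token = { surface: string; cost: number }

function totalCost(path: Token[]): number {
  return path.reduce((sum, token) => sum + token.cost, 0)
}

function pickCheapest(paths: Token[][]): Token[] {
  if (paths.length === 0) throw new Error('no candidate paths')
  return paths.reduce((best, path) => (totalCost(path) < totalCost(best) ? path : best))
}

const asOneToken: Token[] = [{ surface: '東京都', cost: 5000 }]
const asTwoTokens: Token[] = [
  { surface: '東京', cost: 3000 },
  { surface: '都', cost: 3000 },
]

const chosen = pickCheapest([asTwoTokens, asOneToken])
console.log(chosen.map((token) => token.surface)) // single-token path: 5000 < 6000
```

Under this model, lowering the cost of a user entry until it undercuts the competing split is what makes the compound win.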
# Prefer "東京都" over "東京" + "都"
東京都,NOUN,5000
# Less common compound
超電磁砲,NOUN,9000
Use Cases
Search Indexing
// Add domain-specific terms for better tokenization
suzume.loadUserDictionary(`
React,NOUN
Next.js,NOUN
Tailwind,NOUN
`)
const tags = suzume.generateTags('Next.jsでReactアプリを作成')
// ['Next.js', 'React', 'アプリ', '作成']
Chat Applications
// Add slang and neologisms
suzume.loadUserDictionary(`
草,INTJ
ワロタ,INTJ
エモい,ADJ
`)
E-commerce
// Add product names and brands
suzume.loadUserDictionary(`
iPhone,NOUN
MacBook,NOUN
AirPods,NOUN
`)
Best Practices
- Keep entries minimal - Only add words that are mis-tokenized
- Use uppercase POS - NOUN, not noun
- Test incrementally - Add a few words and verify results
- Consider compounds - Add 東京都 if you want it as one token
Binary Dictionary
For faster loading, dictionaries can be pre-compiled to binary format (.dic) using the suzume-cli tool:
# Compile TSV to binary
suzume-cli dict compile user.tsv # → user.dic
Then load the binary dictionary at runtime:
// Node.js
import { readFile } from 'fs/promises'
const dictData = new Uint8Array(await readFile('user.dic'))
suzume.loadBinaryDictionary(dictData)

// Browser (use instead of the Node.js snippet)
const response = await fetch('/dictionaries/user.dic')
const dictData = new Uint8Array(await response.arrayBuffer())
suzume.loadBinaryDictionary(dictData)
Performance
Binary dictionaries load significantly faster than CSV format, making them ideal for production deployments with large custom vocabularies.
.dic Format Overview
The binary dictionary is a compact format with the following layout:
[Header (40 bytes, magic: "SZMD")]
[Double-Array Trie]
[Entry Array (12 bytes each)]
[String Pool (UTF-8)]
- Double-array trie - Enables fast common-prefix lookup of surface forms (O(m) per query)
- Entry array - Each entry stores string pool offsets for surface/lemma, POS, and flags
- String pool - Concatenated, deduplicated UTF-8 strings
During compilation, verbs and adjectives are expanded into their conjugated forms, and all entries are sorted before being packed into the trie.
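Given the header described above, a cheap sanity check before calling loadBinaryDictionary() can catch a wrong or truncated file. The sketch below is hedged: `looksLikeSuzumeDictionary` is a hypothetical helper, and it assumes the 4-byte magic "SZMD" sits at offset 0 of the 40-byte header, which the layout suggests but does not state. It validates nothing beyond the length and the magic.

```typescript
// Assumption: the magic "SZMD" occupies the first 4 bytes of the 40-byte header.
const HEADER_SIZE = 40
const MAGIC = 'SZMD'

function looksLikeSuzumeDictionary(data: Uint8Array): boolean {
  // A file shorter than the header cannot be a complete dictionary.
  if (data.length < HEADER_SIZE) return false
  for (let i = 0; i < MAGIC.length; i++) {
    if (data[i] !== MAGIC.charCodeAt(i)) return false
  }
  return true
}

// Build a fake 40-byte header purely to exercise the check.
const fakeHeader = new Uint8Array(HEADER_SIZE)
fakeHeader.set([0x53, 0x5a, 0x4d, 0x44]) // "SZMD"
console.log(looksLikeSuzumeDictionary(fakeHeader))
```

A check like this is useful when the .dic file is fetched over the network, where an HTML error page could otherwise be handed to the loader.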
Persistence
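Dictionary entries are stored in memory and are lost when the instance is destroyed. To persist them, save the entries yourself and reload them on startup, for example with localStorage in the browser: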
// Load from your storage on init
const savedDict = localStorage.getItem('myDictionary')
if (savedDict) {
  suzume.loadUserDictionary(savedDict)
}
// Save when adding new words
function addWord(word: string, pos: string) {
  const entry = `${word},${pos}`
  suzume.loadUserDictionary(entry)
  // Append to storage, avoiding a leading blank line on the first save
  const current = localStorage.getItem('myDictionary') || ''
  localStorage.setItem('myDictionary', current ? `${current}\n${entry}` : entry)
}
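Appending on every call means the stored text can accumulate the same entry many times. One possible refinement, sketched here with a hypothetical `mergeDictionaryText` helper that is not part of Suzume, is to merge new lines into the stored text while dropping blank lines and exact duplicates.

```typescript
// Hypothetical helper: merges new lines into the stored dictionary text,
// skipping blank lines and exact duplicates while keeping the original order.
function mergeDictionaryText(stored: string, addition: string): string {
  const seen = new Set<string>()
  const merged: string[] = []
  for (const line of `${stored}\n${addition}`.split('\n')) {
    const entry = line.trim()
    if (entry === '' || seen.has(entry)) continue
    seen.add(entry)
    merged.push(entry)
  }
  return merged.join('\n')
}

const stored = 'ChatGPT,NOUN\nGitHub,NOUN'
console.log(mergeDictionaryText(stored, 'ChatGPT,NOUN'))
console.log(mergeDictionaryText(stored, 'DeepL,NOUN'))
```

In addWord, the result of `mergeDictionaryText(current, entry)` could be written back with localStorage.setItem in place of the plain concatenation. Note that this only catches identical lines; two entries for the same surface with different costs are both kept.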