API Reference

Suzume Class

The main class for Japanese tokenization.

Suzume.create(options?)

Creates a new Suzume instance.

typescript
static async create(options?: SuzumeOptions & { wasmPath?: string }): Promise<Suzume>

SuzumeOptions:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| wasmPath | string | undefined | Custom path to WASM file |
| preserveVu | boolean | true | Preserve ヴ (don't normalize to ビ etc.) |
| preserveCase | boolean | true | Preserve case (don't lowercase ASCII) |
| preserveSymbols | boolean | false | Preserve symbols/emoji in output |

Returns: Promise<Suzume>

Example:

typescript
// Default usage
const suzume = await Suzume.create()

// Custom WASM path
const suzumeCustomWasm = await Suzume.create({ wasmPath: '/path/to/suzume.wasm' })

// With options
const suzumeWithOptions = await Suzume.create({
  preserveSymbols: true,
  preserveVu: false,
})

analyze(text)

Analyzes Japanese text and returns an array of tokens.

typescript
analyze(text: string): Morpheme[]

| Parameter | Type | Description |
| --- | --- | --- |
| text | string | Japanese text to analyze |

Returns: Morpheme[]

Example:

typescript
const result = suzume.analyze('東京に行きました')

// Result:
// [
//   { surface: '東京', pos: 'NOUN', posJa: '名詞', ... },
//   { surface: 'に', pos: 'PARTICLE', posJa: '助詞', ... },
//   { surface: '行き', pos: 'VERB', posJa: '動詞', ... },
//   { surface: 'まし', pos: 'AUX', posJa: '助動詞', ... },
//   { surface: 'た', pos: 'AUX', posJa: '助動詞', ... }
// ]
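
As a usage sketch, the returned array can be post-processed with ordinary array methods. The snippet below works on a hand-written sample shaped like the result above; the baseForm values are assumptions, since the example elides them, and contentLemmas is a hypothetical helper, not part of the Suzume API.

```typescript
// Minimal subset of the Morpheme fields used here (see "Morpheme Interface").
interface MorphemeLike {
  surface: string
  pos: string
  baseForm: string
}

// Hand-written sample shaped like the analyze() result above.
// The baseForm values are assumed for illustration, not captured output.
const sample: MorphemeLike[] = [
  { surface: '東京', pos: 'NOUN', baseForm: '東京' },
  { surface: 'に', pos: 'PARTICLE', baseForm: 'に' },
  { surface: '行き', pos: 'VERB', baseForm: '行く' },
  { surface: 'まし', pos: 'AUX', baseForm: 'ます' },
  { surface: 'た', pos: 'AUX', baseForm: 'た' },
]

// Hypothetical helper: dictionary forms of the tokens whose pos is in `keep`.
function contentLemmas(
  tokens: MorphemeLike[],
  keep: ReadonlySet<string> = new Set(['NOUN', 'VERB']),
): string[] {
  return tokens.filter((t) => keep.has(t.pos)).map((t) => t.baseForm)
}

const lemmas = contentLemmas(sample)
```

For this sample, lemmas contains the noun 東京 and the verb's dictionary form 行く; particles and auxiliaries are dropped.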

generateTags(text, options?)

Extracts meaningful tags from text. Useful for search indexing, classification, and content analysis. By default extracts content words (nouns, verbs, adjectives, adverbs) while filtering out particles, auxiliaries, formal nouns, and low-information words.

typescript
generateTags(text: string, options?: TagOptions): Tag[]

| Parameter | Type | Description |
| --- | --- | --- |
| text | string | Japanese text to extract tags from |
| options | TagOptions | Optional tag generation settings |

Tag:

| Property | Type | Description |
| --- | --- | --- |
| tag | string | Tag text (surface or lemma depending on useLemma) |
| pos | string | Part of speech (Noun, Verb, Adjective, Adverb, etc.) |

TagOptions:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| pos | ('noun' \| 'verb' \| 'adjective' \| 'adverb')[] | all | POS categories to include |
| excludeBasic | boolean | false | Exclude basic verbs/words with hiragana-only lemma |
| useLemma | boolean | true | Use lemma (dictionary form) instead of surface form |
| minLength | number | 2 | Minimum tag length in characters |
| maxTags | number | 0 | Maximum number of tags (0 = unlimited) |

Returns: Tag[]

Examples:

typescript
// Basic usage
const tags = suzume.generateTags('東京スカイツリーに行きました')
// [{ tag: '東京', pos: 'Noun' },
//  { tag: 'スカイツリー', pos: 'Noun' },
//  { tag: '行く', pos: 'Verb' }]

// Nouns only
const nouns = suzume.generateTags('美しい花が静かに咲いている', {
  pos: ['noun']
})
// [{ tag: '花', pos: 'Noun' }]

// excludeBasic: false (the default) keeps basic verbs with a hiragana-only lemma (する, いる, ある, なる, ...)
const tags2 = suzume.generateTags('新しいプロジェクトを開始して管理する', {
  excludeBasic: false
})
// [{ tag: '新しい', pos: 'Adjective' },
//  { tag: 'プロジェクト', pos: 'Noun' },
//  { tag: '開始', pos: 'Noun' },
//  { tag: '管理', pos: 'Noun' },
//  { tag: 'する', pos: 'Verb' }]

const tags3 = suzume.generateTags('新しいプロジェクトを開始して管理する', {
  excludeBasic: true
})
// [{ tag: '新しい', pos: 'Adjective' },
//  { tag: 'プロジェクト', pos: 'Noun' },
//  { tag: '開始', pos: 'Noun' },
//  { tag: '管理', pos: 'Noun' }]
// 'する' is excluded (lemma is hiragana-only)

// Limit results
const top3 = suzume.generateTags('東京タワーと東京スカイツリーを見学しました', {
  maxTags: 3
})
// [{ tag: '東京タワー', pos: 'Noun' },
//  { tag: '東京スカイツリー', pos: 'Noun' },
//  { tag: '見学', pos: 'Noun' }]

excludeBasic

Setting excludeBasic: true filters out words whose lemma (dictionary form) is written entirely in hiragana. This removes common basic verbs such as する, いる, ある, なる, いく, and くる, while keeping words whose lemma contains kanji, such as 開始, 管理, and 確認. This is useful when you want only content-bearing tags.

Filter pipeline

The tag generator applies these filters in order:

  1. Particles — excluded (は, が, を, に, etc.)
  2. Auxiliaries — excluded (です, ます, た, etc.)
  3. Formal nouns — excluded (こと, もの, ため, etc.)
  4. Low-info words — excluded (words marked as low information)
  5. Conjunctions — always excluded
  6. Symbols — always excluded
  7. POS filter — if pos is set, only matching categories pass
  8. Basic words — if excludeBasic: true, words with hiragana-only lemma are excluded
  9. Min length — tags shorter than minLength characters are excluded
  10. Deduplication — duplicate tags are removed
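
The steps above can be sketched as a single pass over a token list. This is an illustrative reimplementation, not Suzume's internal code: the SketchToken shape, the formalNoun and lowInfo flags, the hiragana regular expression, and the function name are all assumptions, and the checks run per token rather than as separate passes.

```typescript
// Hypothetical token shape for this sketch. `formalNoun` and `lowInfo` stand
// in for whatever internal markers the real analyzer uses.
type SketchCategory =
  | 'noun' | 'verb' | 'adjective' | 'adverb'
  | 'particle' | 'auxiliary' | 'conjunction' | 'symbol'

interface SketchToken {
  surface: string
  lemma: string
  category: SketchCategory
  formalNoun?: boolean
  lowInfo?: boolean
}

interface SketchOptions {
  pos?: Array<'noun' | 'verb' | 'adjective' | 'adverb'>
  excludeBasic?: boolean
  useLemma?: boolean
  minLength?: number
}

// Hiragana block only (ぁ to ゖ); an assumption about what "hiragana-only" means.
const HIRAGANA_ONLY = /^[\u3041-\u3096]+$/
const ALWAYS_EXCLUDED = new Set<SketchCategory>(['particle', 'auxiliary', 'conjunction', 'symbol'])

function sketchGenerateTags(tokens: SketchToken[], options: SketchOptions = {}): string[] {
  const { pos, excludeBasic = false, useLemma = true, minLength = 2 } = options
  const allowed = pos ? new Set<string>(pos) : null
  const seen = new Set<string>()
  const tags: string[] = []
  for (const token of tokens) {
    if (ALWAYS_EXCLUDED.has(token.category)) continue // steps 1, 2, 5, 6
    if (token.formalNoun || token.lowInfo) continue // steps 3, 4
    if (allowed && !allowed.has(token.category)) continue // step 7
    if (excludeBasic && HIRAGANA_ONLY.test(token.lemma)) continue // step 8
    const tag = useLemma ? token.lemma : token.surface
    if (Array.from(tag).length < minLength) continue // step 9 (counts code points)
    if (seen.has(tag)) continue // step 10
    seen.add(tag)
    tags.push(tag)
  }
  return tags
}

// Hand-written tokens for '新しいプロジェクトを開始して管理する' (not analyzer output).
const sketchTokens: SketchToken[] = [
  { surface: '新しい', lemma: '新しい', category: 'adjective' },
  { surface: 'プロジェクト', lemma: 'プロジェクト', category: 'noun' },
  { surface: 'を', lemma: 'を', category: 'particle' },
  { surface: '開始', lemma: '開始', category: 'noun' },
  { surface: 'し', lemma: 'する', category: 'verb' },
  { surface: 'て', lemma: 'て', category: 'particle' },
  { surface: '管理', lemma: '管理', category: 'noun' },
  { surface: 'する', lemma: 'する', category: 'verb' },
]

const sketchTags = sketchGenerateTags(sketchTokens, { excludeBasic: true })
```

With excludeBasic: true, the sketch keeps 新しい, プロジェクト, 開始, and 管理 and drops both occurrences of する. The order of tags produced by the real library may differ from this sketch.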

loadUserDictionary(data)

Loads custom words into the analyzer at runtime.

typescript
loadUserDictionary(data: string): boolean

| Parameter | Type | Description |
| --- | --- | --- |
| data | string | Dictionary entries in CSV format |

Returns: boolean - true on success

Format: surface,pos[,cost][,lemma]

Example:

typescript
// Single entry
suzume.loadUserDictionary('ChatGPT,NOUN')

// Multiple entries
suzume.loadUserDictionary(`
ChatGPT,NOUN
スカイツリー,NOUN
DeepL,NOUN
`)

// With optional fields
suzume.loadUserDictionary('走る,VERB,5000,走る')
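
When entries come from application data, it may be convenient to build the CSV string programmatically. The sketch below is a hypothetical helper, not part of Suzume. It assumes the optional fields are positional, so a lemma cannot be written without a cost before it, and it rejects commas and line breaks inside fields because this page does not describe any quoting rules.

```typescript
// Hypothetical entry shape mirroring the format surface,pos[,cost][,lemma].
interface UserDictEntry {
  surface: string
  pos: string
  cost?: number
  lemma?: string
}

// Builds one CSV line. Assumption: fields are positional, so this sketch
// refuses a lemma that is not preceded by a cost.
function toCsvLine(entry: UserDictEntry): string {
  if (entry.lemma !== undefined && entry.cost === undefined) {
    throw new Error(`"${entry.surface}": a lemma requires a cost in this sketch`)
  }
  const fields: string[] = [entry.surface, entry.pos]
  if (entry.cost !== undefined) fields.push(String(entry.cost))
  if (entry.lemma !== undefined) fields.push(entry.lemma)
  for (const field of fields) {
    // No quoting rules are documented, so separators inside a field are rejected.
    if (field.includes(',') || field.includes('\n')) {
      throw new Error(`"${entry.surface}": fields must not contain commas or line breaks`)
    }
  }
  return fields.join(',')
}

// Joins entries into the multi-line string accepted by loadUserDictionary().
function toUserDictionaryCsv(entries: UserDictEntry[]): string {
  return entries.map(toCsvLine).join('\n')
}

const csv = toUserDictionaryCsv([
  { surface: 'ChatGPT', pos: 'NOUN' },
  { surface: '走る', pos: 'VERB', cost: 5000, lemma: '走る' },
])
```

The resulting string can then be passed to suzume.loadUserDictionary(csv); checking the boolean return value tells you whether loading succeeded.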

version

Gets the Suzume version string.

typescript
get version(): string

Example:

typescript
console.log(suzume.version) // "0.1.0"

loadBinaryDictionary(data)

Loads a pre-compiled binary dictionary (.dic format) at runtime.

typescript
loadBinaryDictionary(data: Uint8Array): boolean

| Parameter | Type | Description |
| --- | --- | --- |
| data | Uint8Array | Binary dictionary data (.dic format) |

Returns: boolean - true on success

Example:

typescript
// Load from file (Node.js)
import { readFile } from 'fs/promises'
const fileDict = new Uint8Array(await readFile('custom.dic'))
suzume.loadBinaryDictionary(fileDict)

// Load from URL (Browser)
const response = await fetch('/dictionaries/custom.dic')
const fetchedDict = new Uint8Array(await response.arrayBuffer())
suzume.loadBinaryDictionary(fetchedDict)
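
Because loadBinaryDictionary() reports failure through its boolean return value rather than by throwing, a small wrapper can make failures harder to miss. The helper below is a hypothetical sketch: the interface and function names are not part of Suzume, and treating empty input as an error is an assumption of the sketch.

```typescript
// Structural type: anything exposing loadBinaryDictionary, such as a Suzume instance.
interface BinaryDictionaryTarget {
  loadBinaryDictionary(data: Uint8Array): boolean
}

// Hypothetical helper: turns a `false` result into a thrown error.
function loadBinaryDictionaryOrThrow(
  target: BinaryDictionaryTarget,
  data: Uint8Array,
  label: string,
): void {
  // Assumption of this sketch: an empty buffer is never a valid dictionary.
  if (data.byteLength === 0) {
    throw new Error(`Dictionary "${label}" is empty`)
  }
  if (!target.loadBinaryDictionary(data)) {
    throw new Error(`Failed to load binary dictionary "${label}"`)
  }
}
```

With a real instance this could be called as loadBinaryDictionaryOrThrow(suzume, bytes, 'custom.dic').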

Binary vs CSV dictionaries

Binary dictionaries (.dic) load faster than CSV format. Use the suzume-cli dict compile command to compile a TSV dictionary into binary format.


destroy()

Frees WASM memory and resources. Call this when done using the instance.

typescript
destroy(): void

Automatic cleanup via FinalizationRegistry

Suzume registers a FinalizationRegistry callback, so resources will be freed automatically when the instance is garbage collected. However, calling destroy() explicitly is recommended for immediate cleanup — especially in Node.js where GC timing is unpredictable and WASM memory is not visible to the GC's heap pressure heuristics.

Example:

typescript
const suzume = await Suzume.create()
// ... use suzume ...
suzume.destroy() // Free resources immediately

Morpheme Interface

Represents a single linguistic token.

typescript
interface Morpheme {
  surface: string      // Surface form (as appears in text)
  pos: string          // Part of speech (English)
  baseForm: string     // Base/dictionary form
  posJa: string        // Part of speech (Japanese)
  conjType: string | null  // Conjugation type
  conjForm: string | null  // Conjugation form
  extendedPos: string  // Extended POS subcategory (English)
}

Properties

| Property | Type | Description | Example |
| --- | --- | --- | --- |
| surface | string | Surface form as it appears in text | "食べ" |
| pos | string | Part of speech in English | "VERB" |
| baseForm | string | Dictionary/base form | "食べる" |
| posJa | string | Part of speech in Japanese | "動詞" |
| conjType | string \| null | Conjugation type (for verbs/adjectives) | "一段" |
| conjForm | string \| null | Conjugation form | "連用形" |
| extendedPos | string | Extended POS subcategory | "VerbRenyokei" |

Part of Speech Values

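| pos | posJa | Description |
| --- | --- | --- |
| NOUN | 名詞 | Nouns |
| VERB | 動詞 | Verbs |
| ADJ | 形容詞 | Adjectives |
| ADV | 副詞 | Adverbs |
| PARTICLE | 助詞 | Particles |
| AUX | 助動詞 | Auxiliary verbs |
| PRON | 代名詞 | Pronouns |
| DET | 連体詞 | Adnominal adjectives |
| CONJ | 接続詞 | Conjunctions |
| INTJ | 感動詞 | Interjections |
| PREFIX | 接頭辞 | Prefixes |
| SUFFIX | 接尾辞 | Suffixes |
| SYMBOL | 記号 | Symbols |
| OTHER | その他 | Other/Unknown |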
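
Since Morpheme.pos is typed as a plain string, callers who want compile-time checking can narrow it themselves. The following is a sketch built directly from the table above; the names POS_JA, Pos, isPos, and posToJa are hypothetical and not exported by Suzume as far as this page shows.

```typescript
// pos -> posJa mapping copied from the table above.
const POS_JA = {
  NOUN: '名詞',
  VERB: '動詞',
  ADJ: '形容詞',
  ADV: '副詞',
  PARTICLE: '助詞',
  AUX: '助動詞',
  PRON: '代名詞',
  DET: '連体詞',
  CONJ: '接続詞',
  INTJ: '感動詞',
  PREFIX: '接頭辞',
  SUFFIX: '接尾辞',
  SYMBOL: '記号',
  OTHER: 'その他',
} as const

// Union of the documented pos values: 'NOUN' | 'VERB' | ...
type Pos = keyof typeof POS_JA

// Type guard that narrows an arbitrary string to a documented pos value.
function isPos(value: string): value is Pos {
  return Object.prototype.hasOwnProperty.call(POS_JA, value)
}

// Looks up the Japanese label, or returns undefined for unknown values.
function posToJa(pos: string): string | undefined {
  return isPos(pos) ? POS_JA[pos] : undefined
}
```

The guard uses hasOwnProperty so that inherited keys such as toString are not mistaken for pos values.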

Extended POS Values

The extendedPos property provides fine-grained subcategories beyond the basic pos tag. This is useful when you need to distinguish conjugation forms, particle roles, auxiliary functions, or noun subtypes.

Verb forms:

| Value | Description | Example |
| --- | --- | --- |
| VerbShuushikei | 終止形: dictionary form | 食べる, 書く |
| VerbRenyokei | 連用形: continuative form | 食べ, 書き |
| VerbMizenkei | 未然形: irrealis form | 食べ-, 書か- |
| VerbOnbinkei | 音便形: euphonic change | 書い-, 泳い- |
| VerbTeForm | て形 | 食べて, 書いて |
| VerbKateikei | 仮定形: conditional | 食べれば, 書けば |
| VerbMeireikei | 命令形: imperative | 食べろ, 書け |
| VerbRentaikei | 連体形: attributive | (same as shuushi in modern Japanese) |
| VerbTaForm | た形: past | 食べた, 書いた |
| VerbTaraForm | たら形: conditional past | 食べたら, 書いたら |

Adjective forms:

| Value | Description | Example |
| --- | --- | --- |
| AdjBasic | 終止形: basic form | 美しい, 高い |
| AdjRenyokei | 連用形(く): adverbial | 美しく, 高く |
| AdjStem | 語幹: stem (ガル接続) | 美し-, 高- |
| AdjKatt | かっ形: past stem | 美しかっ-, 高かっ- |
| AdjKeForm | け形: conditional stem | 美しけれ- |
| AdjNaAdj | ナ形容詞: na-adjective stem | 静か, 綺麗 |

Auxiliaries:

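| Value | Description | Example |
| --- | --- | --- |
| AuxTenseTa | 過去: past tense | た, だ |
| AuxTenseMasu | 丁寧: polite | ます, まし, ませ |
| AuxNegativeNai | 否定: negation | ない, なかっ |
| AuxNegativeNu | 否定(古語): negation (archaic) | ぬ, ん |
| AuxDesireTai | 願望: desire | たい, たかっ |
| AuxVolitional | 意志/推量: volition/conjecture | う, よう |
| AuxPassive | 受身: passive | れる, られる |
| AuxCausative | 使役: causative | せる, させる |
| AuxPotential | 可能: potential | れる, られる |
| AuxAspectIru | 継続: continuous | いる, い, おる |
| AuxAspectShimau | 完了: completion | しまう, ちゃう |
| AuxAspectOku | 準備: preparation | おく, とく |
| AuxAspectMiru | 試行: attempt | みる |
| AuxAspectIku | 進行方向: direction of movement (going) | いく |
| AuxAspectKuru | 接近: approach (coming) | くる |
| AuxAppearanceSou | 様態: appearance | そう |
| AuxConjectureRashii | 推定: conjecture | らしい |
| AuxConjectureMitai | 推定: conjecture | みたい |
| AuxCopulaDa | 断定: copula | だ, で, な, なら |
| AuxCopulaDesu | 丁寧断定: polite copula | です, でし |
| AuxHonorific | 尊敬: honorific | れる, られる |
| AuxGozaru | 丁重: courteous | ござる |
| AuxExcessive | 過度: excessive | すぎる |
| AuxGaru | ガル接続: -garu attachment | がる |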

Particles:

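| Value | Description | Example |
| --- | --- | --- |
| ParticleCase | 格助詞: case particle | が, を, に, で, へ, と, から, まで, より |
| ParticleTopic | 係助詞: topic particle | は, も |
| ParticleFinal | 終助詞: sentence-final particle | ね, よ, わ, な, か |
| ParticleConj | 接続助詞: conjunctive particle | て, で, ば, ながら, たり, けど |
| ParticleQuote | 引用助詞: quotative particle | と (quotative) |
| ParticleAdverbial | 副助詞: adverbial particle | ばかり, だけ, ほど, しか, など |
| ParticleNo | 準体助詞: nominalizing particle | |
| ParticleBinding | 係結び: binding particle | こそ, さえ, すら |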

Nouns:

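| Value | Description | Example |
| --- | --- | --- |
| Noun | 普通名詞: common noun | 東京, 天気 |
| NounFormal | 形式名詞: formal noun | こと, もの, ところ, わけ |
| NounVerbal | 連用形転成名詞: noun derived from a verb's continuative form | 読み, 書き |
| NounProper | 固有名詞: proper noun | |
| NounProperFamily | 固有名詞(姓): proper noun (family name) | 田中, 鈴木 |
| NounProperGiven | 固有名詞(名): proper noun (given name) | 太郎 |
| NounNumber | 数詞: numeral | 一, 100 |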
Other:

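| Value | Description |
| --- | --- |
| Pronoun | 代名詞: pronoun |
| PronounInterrogative | 疑問詞: interrogative (何, 誰, どこ) |
| Adverb | 副詞: adverb |
| AdverbQuotative | 引用副詞: quotative adverb (そう, こう) |
| Conjunction | 接続詞: conjunction |
| Determiner | 連体詞: adnominal |
| Prefix | 接頭辞: prefix |
| Suffix | 接尾辞: suffix |
| Symbol | 記号: symbol |
| Interjection | 感動詞: interjection |
| Other | その他: other |
| Unknown | 不明: unknown |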
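
The extendedPos values listed above share a prefix per family (Verb, Adj, Aux, Particle, Noun), which makes coarse grouping straightforward. The sketch below relies only on that naming pattern as it appears in these tables; the type and function names are hypothetical, and values outside the five families fall back to 'other'.

```typescript
type ExtendedPosFamily = 'verb' | 'adjective' | 'auxiliary' | 'particle' | 'noun' | 'other'

// Prefixes taken from the value names in the tables above.
const FAMILY_PREFIXES: ReadonlyArray<[string, ExtendedPosFamily]> = [
  ['Verb', 'verb'],
  ['Adj', 'adjective'],
  ['Aux', 'auxiliary'],
  ['Particle', 'particle'],
  ['Noun', 'noun'],
]

// Maps an extendedPos value to its family by prefix.
// Values such as 'Adverb' or 'Pronoun' match no prefix and return 'other'.
function extendedPosFamily(extendedPos: string): ExtendedPosFamily {
  for (const [prefix, family] of FAMILY_PREFIXES) {
    if (extendedPos.startsWith(prefix)) return family
  }
  return 'other'
}
```

For finer distinctions, such as telling AuxPassive from AuxPotential, compare the full extendedPos string instead of the prefix.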

Error Handling

typescript
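try {
  const suzume = await Suzume.create()
  const result = suzume.analyze('テスト')
} catch (error) {
  // In strict TypeScript the caught value is `unknown`, so narrow it first
  if (error instanceof Error && error.message === 'Failed to create Suzume instance') {
    console.error('WASM initialization failed')
  } else {
    // Unrelated errors are rethrown instead of being silently swallowed
    throw error
  }
}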

Memory Management

Suzume uses WebAssembly, which allocates memory outside the JavaScript heap. A FinalizationRegistry callback frees that memory when an instance is garbage collected, but calling destroy() explicitly is strongly recommended, especially in Node.js, where GC timing is unpredictable and WASM memory is invisible to the GC's heap pressure heuristics.

typescript
// Good: Clean up when done
const suzume = await Suzume.create()
try {
  const result = suzume.analyze(text)
  // process result...
} finally {
  suzume.destroy()
}

// For long-running apps: reuse the instance
class MyApp {
  private suzume: Suzume | null = null

  async init() {
    this.suzume = await Suzume.create()
  }

  analyze(text: string) {
    return this.suzume?.analyze(text) ?? []
  }

  dispose() {
    this.suzume?.destroy()
    this.suzume = null
  }
}

Node.js

In Node.js, WASM memory is not tracked by V8's heap size. If you create many instances without calling destroy(), memory usage will grow even though the GC sees no pressure. Always call destroy() explicitly in server-side code.
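
For server-side code, the create/use/destroy pattern can be wrapped in one helper so that destroy() cannot be forgotten. This is a minimal sketch: withInstance and Destroyable are hypothetical names, and the helper only assumes the instance has a destroy() method as documented above.

```typescript
// Anything with a destroy() method, such as a Suzume instance.
interface Destroyable {
  destroy(): void
}

// Creates an instance, runs `use`, and always calls destroy() afterwards,
// even when `use` throws or returns a rejected promise.
async function withInstance<T extends Destroyable, R>(
  create: () => Promise<T>,
  use: (instance: T) => R | Promise<R>,
): Promise<R> {
  const instance = await create()
  try {
    return await use(instance)
  } finally {
    instance.destroy()
  }
}
```

With the real class this could be called as withInstance(() => Suzume.create(), (s) => s.analyze(text)). For long-running applications, reusing a single instance as shown in the MyApp example avoids repeated WASM initialization.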