Name: Suzume
Author: libraz

Question 1

What is Suzume?

Accepted Answer

Suzume is a lightweight, feature-driven Japanese tokenizer that runs on WebAssembly. Unlike dictionary-based analyzers such as MeCab, it works without large dictionary files and is robust to unknown words. It works in browsers, Node.js, Deno, and Bun.

Question 2

How does Suzume handle unknown words?

Accepted Answer

Suzume generates candidate words from character patterns (kanji sequences, katakana sequences, alphanumeric compounds) and evaluates them alongside dictionary entries, using the Viterbi algorithm to find the best-scoring segmentation. This makes it robust to neologisms and domain-specific terms.
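The TypeScript sketch below illustrates this kind of lattice search. It is a simplified model, not Suzume's implementation. The tiny dictionary (`DICT`), the character classes, the cost constants, and the function names `candidatesAt` and `segment` are all hypothetical. Candidates come from three sources: dictionary lookups, the longest run of kanji, katakana, or alphanumeric characters, and a single-character fallback. A Viterbi-style dynamic program then keeps the lowest-cost path. To keep the sketch short it uses word costs only, so the search reduces to a shortest path over the lattice. Real analyzers usually also score the connection between adjacent tokens.

```typescript
// Simplified lattice segmentation sketch. The dictionary, character classes,
// and costs are hypothetical and are not Suzume's internal data.

type CharType = "kanji" | "katakana" | "hiragana" | "alnum" | "other";

function charType(ch: string): CharType {
  const c = ch.codePointAt(0)!;
  if (c >= 0x4e00 && c <= 0x9fff) return "kanji"; // CJK Unified Ideographs
  if (c >= 0x30a0 && c <= 0x30ff) return "katakana";
  if (c >= 0x3040 && c <= 0x309f) return "hiragana";
  if (/^[A-Za-z0-9]$/.test(ch)) return "alnum";
  return "other";
}

interface Candidate { end: number; surface: string; cost: number; }

// Hypothetical dictionary: surface form -> word cost (lower is better).
const DICT = new Map<string, number>([
  ["東京", 3], ["に", 1], ["行く", 3], ["で", 1], ["を", 1], ["使う", 3],
]);
const MAX_DICT_LEN = Math.max(...Array.from(DICT.keys(), (w) => Array.from(w).length));
const UNKNOWN_RUN_COST = 5; // assumed cost of a same-type character run
const SINGLE_CHAR_COST = 8; // assumed cost of the one-character fallback

// All candidate words that start at position i.
function candidatesAt(chars: string[], i: number): Candidate[] {
  const out: Candidate[] = [];
  // 1. Dictionary entries.
  for (let j = i + 1; j <= Math.min(chars.length, i + MAX_DICT_LEN); j++) {
    const surface = chars.slice(i, j).join("");
    const cost = DICT.get(surface);
    if (cost !== undefined) out.push({ end: j, surface, cost });
  }
  // 2. Pattern candidate: the longest run of kanji, katakana, or alphanumerics.
  const t = charType(chars[i]);
  if (t === "kanji" || t === "katakana" || t === "alnum") {
    let j = i + 1;
    while (j < chars.length && charType(chars[j]) === t) j++;
    out.push({ end: j, surface: chars.slice(i, j).join(""), cost: UNKNOWN_RUN_COST });
  }
  // 3. Single-character fallback keeps every position reachable.
  out.push({ end: i + 1, surface: chars[i], cost: SINGLE_CHAR_COST });
  return out;
}

// Viterbi-style best path: best[k] is the lowest total cost of any
// segmentation of the first k characters.
function segment(text: string): string[] {
  const chars = Array.from(text); // split by code point
  const n = chars.length;
  const best: number[] = new Array(n + 1).fill(Infinity);
  const back: { start: number; surface: string }[] = new Array(n + 1);
  best[0] = 0;
  for (let i = 0; i < n; i++) {
    if (best[i] === Infinity) continue;
    for (const c of candidatesAt(chars, i)) {
      if (best[i] + c.cost < best[c.end]) {
        best[c.end] = best[i] + c.cost;
        back[c.end] = { start: i, surface: c.surface };
      }
    }
  }
  // Follow back-pointers from the end of the text to recover the words.
  const words: string[] = [];
  for (let k = n; k > 0; k = back[k].start) words.unshift(back[k].surface);
  return words;
}

console.log(segment("ChatGPTで東京に行く").join(" | "));
// ChatGPT | で | 東京 | に | 行く
```

The unknown word ChatGPT survives as one token because the alphanumeric-run candidate is cheaper than a chain of single characters. 東京 is chosen as a dictionary entry over the identical kanji-run candidate only because its assumed cost is lower.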

Question 3

Can I use Suzume in the browser?

Accepted Answer

Yes. Suzume runs entirely in the browser via WebAssembly, so no server is required. You can install it from npm or load it directly from a CDN such as esm.sh. The entire package is under 450KB gzipped.

Question 4

How do I add custom words to Suzume?

Accepted Answer

Use loadUserDictionary() to add custom words at runtime. Each entry uses the format "word,pos" (e.g., "ChatGPT,noun"). This lets you add brand names, technical terms, or domain-specific vocabulary without rebuilding the dictionary.
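Because entries are plain "word,pos" text, it is easy to build and check them before loading. The helpers below (`formatEntry`, `buildUserDictionary`, `parseUserDictionary`) are hypothetical and not part of Suzume. They rest on three assumptions: one entry per line, no commas or line breaks inside a field, and blank lines ignored. Check the Suzume API documentation for the exact argument loadUserDictionary() expects before passing the result to it.

```typescript
// Hypothetical helpers, NOT part of Suzume. Assumptions: one "word,pos"
// entry per line, no commas or line breaks inside a field, blank lines ignored.

interface UserEntry { word: string; pos: string; }

// Validates one entry and renders it as "word,pos".
function formatEntry({ word, pos }: UserEntry): string {
  for (const field of [word, pos]) {
    if (field.trim() === "" || /[,\r\n]/.test(field)) {
      throw new Error(`invalid field: ${JSON.stringify(field)}`);
    }
  }
  return `${word.trim()},${pos.trim()}`;
}

// Builds the dictionary text from a list of entries.
function buildUserDictionary(entries: UserEntry[]): string {
  return entries.map(formatEntry).join("\n");
}

// Parses dictionary text back into entries, reporting the 1-based line of any error.
function parseUserDictionary(text: string): UserEntry[] {
  const entries: UserEntry[] = [];
  text.split(/\r?\n/).forEach((raw, i) => {
    const line = raw.trim();
    if (line === "") return; // skip blank lines
    const parts = line.split(",").map((p) => p.trim());
    if (parts.length !== 2 || parts[0] === "" || parts[1] === "") {
      throw new Error(`line ${i + 1}: expected "word,pos", got ${JSON.stringify(line)}`);
    }
    entries.push({ word: parts[0], pos: parts[1] });
  });
  return entries;
}

const dictText = buildUserDictionary([
  { word: "ChatGPT", pos: "noun" },
  { word: "スズメ", pos: "noun" },
]);
console.log(dictText); // "ChatGPT,noun" and "スズメ,noun" on separate lines
// Assumed next step (verify the real signature first):
// suzume.loadUserDictionary(dictText);
```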

Question 5

What is the difference between Suzume and MeCab?

Accepted Answer

MeCab requires large dictionary files (50MB+) and a server-side installation. Suzume uses feature-based analysis with a minimal dictionary, runs on WebAssembly in the browser, and handles unknown words by generating candidates from character patterns. Choose Suzume when you need client-side processing without server infrastructure.

Question 6

How is Suzume different from kuromoji.js?

Accepted Answer

kuromoji.js requires downloading a 20MB+ dictionary on first load, which slows the initial page load. Suzume is under 450KB gzipped, so it loads much faster. Suzume also handles unknown words better and has a simpler API.

Question 7

Can I use Suzume for SEO keyword extraction?

Accepted Answer

Yes. Suzume can extract nouns and compound words from Japanese text, which makes it well suited to auto-tagging blog posts, generating hashtags, or building keyword analysis tools, all without server infrastructure.
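Here is a minimal keyword-extraction sketch. It assumes the analyzer output has already been mapped to tokens with hypothetical `surface` and `pos` fields and a "noun" label. Suzume's actual result shape may differ, and the function name `extractKeywords` is illustrative. The function joins adjacent nouns into one compound keyword, counts each keyword, and returns the most frequent ones as tag candidates. The sample tokens are written by hand, not produced by Suzume.

```typescript
// Assumed token shape; map Suzume's real analyze() output onto it.
interface Token { surface: string; pos: string; }

// Joins runs of adjacent nouns into compound keywords, counts them, and
// returns up to `limit` keywords, most frequent first.
function extractKeywords(tokens: Token[], limit = 5): string[] {
  const counts = new Map<string, number>();
  let compound = "";
  const flush = () => {
    if (compound !== "") counts.set(compound, (counts.get(compound) ?? 0) + 1);
    compound = "";
  };
  for (const tok of tokens) {
    if (tok.pos === "noun") compound += tok.surface; // extend the current compound
    else flush(); // any non-noun ends it
  }
  flush();
  // Array.prototype.sort is stable (ES2019+), so ties keep first-seen order.
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([word]) => word);
}

// Hand-written sample tokens (not real Suzume output) for
// 機械学習は便利。機械学習で画像を分類する
// ("Machine learning is handy. Classify images with machine learning.")
const tok = (surface: string, pos: string): Token => ({ surface, pos });
const tokens: Token[] = [
  tok("機械", "noun"), tok("学習", "noun"), tok("は", "particle"),
  tok("便利", "adjective"), tok("。", "symbol"),
  tok("機械", "noun"), tok("学習", "noun"), tok("で", "particle"),
  tok("画像", "noun"), tok("を", "particle"), tok("分類", "noun"), tok("する", "verb"),
];
console.log(extractKeywords(tokens).join(", ")); // 機械学習, 画像, 分類
```

Merging adjacent nouns is what turns 機械 + 学習 into the single tag 機械学習 (machine learning) rather than two generic words.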

Question 8

Is Suzume suitable for production use?

Accepted Answer

Yes, Suzume is production-ready. It is compiled from C++ to WebAssembly for near-native performance, includes full TypeScript support, and works in all modern browsers, Node.js, Deno, and Bun.

Question 9

Does Suzume work offline?

Accepted Answer

Yes. Once loaded, Suzume works completely offline. All processing happens locally in the browser or runtime, and no API calls or internet connection are required after the initial load.

Question 10

How do I install Suzume?

Accepted Answer

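Install via npm with `npm install @libraz/suzume`. Then import the module, create an instance, and analyze text (inside an async function or an ES module, since the code uses `await`):

`const { Suzume } = await import("@libraz/suzume");`
`const suzume = await Suzume.create();`
`const result = suzume.analyze("日本語テキスト");`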

Testing Guide

Test Architecture

Layer	Framework	Files	Description
C++ Unit/Integration	Google Test 1.12.1	31 files	Core library, dictionary, grammar, normalization
Data-Driven	JSON + Google Test	77 JSON files	Tokenization correctness (auto-discovered)
WASM	Vitest	3 files	JS/C API, memory layout, struct compatibility
CLI	Built-in	`test` command	Single/batch tests, benchmarks

Running Tests

C++ Tests

WASM Tests

CLI Test Command

Adding Tests

Data-Driven Tokenization Tests (Recommended)

Optional Fields

Existing Test Files

Category	Files	Description
`basic.json`	1	Basic tokenization, single words
`adjective*.json`	5	i-adjectives, na-adjectives, compounds
`verb*.json`	10	Ichidan, godan, suru, passive, causative
`particle*.json`	7	Case, topic, binding particles
`usecase_*.json`	9	Real-world texts: news, business, casual
`pattern_*.json`	4	Linguistic patterns

C++ Unit Tests

WASM Tests

CLI Test Files (TSV)

Benchmarks

Debug Builds

With Sanitizers

With Coverage

CI

Makefile Targets

Target	Description
`make test`	Build dictionaries + run all C++ tests
`make build`	Build the project
`make dict`	Build dictionaries only
`make wasm-test`	Build WASM + run WASM tests
`make format`	Format C++ code with clang-format
`make format-check`	Check C++ code formatting
