Suzume is a lightweight, feature-driven Japanese tokenizer that runs on WebAssembly. Unlike dictionary-based analyzers like MeCab, it works without large dictionary files and is robust to unknown words. It runs in browsers, Node.js, Deno, and Bun.

How does Suzume handle unknown words?

Suzume generates candidates from character patterns (kanji sequences, katakana sequences, alphanumeric compounds) and evaluates them alongside dictionary entries using the Viterbi algorithm. This makes it robust to neologisms and domain-specific terms.
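The following is a minimal sketch of how a Viterbi search picks the cheapest segmentation from a set of candidates. The `Candidate` shape, the `viterbi` function, and the cost values are illustrative assumptions, not Suzume's actual data structures or scoring. Costs are assumed to be non-negative and every candidate is assumed to span at least one character.

```typescript
// Hypothetical candidate: a span [start, end) of the input with a cost.
interface Candidate {
  start: number;
  end: number;
  surface: string;
  cost: number; // lower is better
}

// Find the minimum-cost sequence of candidates covering positions 0..length.
function viterbi(length: number, candidates: Candidate[]): string[] {
  const best: number[] = [];
  const back: (Candidate | null)[] = [];
  for (let i = 0; i <= length; i++) {
    best.push(i === 0 ? 0 : Infinity);
    back.push(null);
  }
  // Positions are visited left to right, so best[pos] is final
  // before any candidate starting at pos is relaxed.
  for (let pos = 0; pos < length; pos++) {
    if (best[pos] === Infinity) continue;
    for (const c of candidates) {
      if (c.start !== pos || c.end <= c.start || c.end > length) continue;
      const total = best[pos] + c.cost;
      if (total < best[c.end]) {
        best[c.end] = total;
        back[c.end] = c;
      }
    }
  }
  // Follow the back-pointers from the end to recover the path.
  const path: string[] = [];
  let pos = length;
  while (pos > 0) {
    const c = back[pos];
    if (c === null) return []; // no full segmentation exists
    path.unshift(c.surface);
    pos = c.start;
  }
  return path;
}
```

In this sketch, pattern-generated candidates and dictionary entries would simply be entries in the same `candidates` array, so an unknown word can win whenever its path is cheapest.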

Can I use Suzume in the browser?

Yes, Suzume runs entirely in the browser via WebAssembly. No server required. You can load it from npm or directly from a CDN like esm.sh. The entire package is under 450KB gzipped.

How do I add custom words to Suzume?

Use loadUserDictionary() to add custom words at runtime. Format: "word,pos" (e.g., "ChatGPT,noun"). You can add brand names, technical terms, or domain-specific vocabulary without rebuilding the dictionary.
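As a sketch of the "word,pos" format described above, the helper below validates a single entry before it is handed to `loadUserDictionary()`. The `parseEntry` function and the `DictEntry` type are hypothetical and not part of Suzume's API; the sketch assumes exactly two comma-separated fields.

```typescript
// Hypothetical shape of one parsed user dictionary entry.
interface DictEntry {
  word: string;
  pos: string;
}

// Split "word,pos" at the first comma; return null for malformed lines.
function parseEntry(line: string): DictEntry | null {
  const comma = line.indexOf(",");
  // Reject a missing comma, an empty word, or an empty POS field.
  if (comma <= 0 || comma === line.length - 1) return null;
  const word = line.slice(0, comma).trim();
  const pos = line.slice(comma + 1).trim();
  if (word.length === 0 || pos.length === 0) return null;
  return { word, pos };
}
```

Checking entries up front like this makes it easier to spot a malformed line before it reaches the tokenizer.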

What is the difference between Suzume and MeCab?

MeCab requires large dictionary files (50MB+) and server-side installation. Suzume uses feature-based analysis with a minimal dictionary, runs on WebAssembly in the browser, and handles unknown words gracefully. Choose Suzume for client-side processing without server infrastructure.

How is Suzume different from kuromoji.js?

kuromoji.js requires downloading a 20MB+ dictionary on first load, which slows the initial page load. Suzume is under 450KB gzipped, so it loads far faster. Suzume also handles unknown words better and has a simpler API.

Can I use Suzume for SEO keyword extraction?

Yes, Suzume can extract nouns and compound words from Japanese text, making it ideal for auto-tagging blog posts, generating hashtags, or building keyword analysis tools - all without server infrastructure.

Is Suzume suitable for production use?

Yes, Suzume is production-ready. It is compiled from C++ to WebAssembly for near-native performance, includes full TypeScript support, and works in all modern browsers, Node.js, Deno, and Bun.

Does Suzume work offline?

Yes, once loaded, Suzume works completely offline. All processing happens locally in the browser or runtime. No API calls or internet connection required after initial load.

How do I install Suzume?

Install via npm: `npm install @libraz/suzume`. Then import and use it: `const { Suzume } = await import("@libraz/suzume"); const suzume = await Suzume.create(); const result = suzume.analyze("日本語テキスト");`

Differences from MeCab

Name: Suzume
Author: libraz

Suzume takes a fundamentally different approach to Japanese tokenization than MeCab. This page documents the intentional design differences and known constraints.

Design Philosophy

Suzume is a Tokenizer, Not a Morphological Analyzer

MeCab is a morphological analyzer — its goal is to decompose text into morphemes with detailed grammatical information. Suzume is a tokenizer — its goal is to split text into meaningful units for practical applications like search, display, and text processing. Morphological analysis is not a goal of Suzume. This fundamental difference in purpose explains many of the behavioral differences described below.

| Aspect | MeCab | Suzume |
| --- | --- | --- |
| Approach | Dictionary-driven | Feature-driven |
| Dictionary | 50MB+ (required) | Minimal (~400KB total with WASM) |
| Unknown words | Falls back to character types | Pattern-based candidate generation |
| Compound handling | Splits per dictionary | Merges by character type heuristics |
| Target | Server-side, academic | Browser, real-time, client-side |

Core Trade-off

MeCab knows every word in its dictionary and can split them precisely. Suzume uses character-type patterns instead of an exhaustive dictionary, so it merges sequences it cannot reliably split.

Intentional Differences

These behaviors are by design — Suzume deliberately differs from MeCab in these cases.

1. Kanji Compound Merging

Consecutive kanji sequences are merged into a single noun token.

Input: 経済成長
MeCab:  経済 / 成長     (2 tokens)
Suzume: 経済成長         (1 token)

Input: 開始予定
MeCab:  開始 / 予定     (2 tokens)
Suzume: 開始予定         (1 token)

Why: Without a full dictionary, Suzume cannot determine where to split kanji compounds. Merging is the safer choice — an over-merged token still captures the correct text span. User dictionaries can be used to define specific split points.

2. Katakana Compound Merging

Consecutive katakana sequences are merged into a single noun token.

Input: データベース
MeCab:  データ / ベース  (2 tokens)
Suzume: データベース      (1 token)

Input: セットリスト
MeCab:  セット / リスト  (2 tokens)
Suzume: セットリスト      (1 token)

Why: As with kanji, splitting would be unreliable without dictionary entries for each loanword. Katakana words are almost always loanword nouns, so treating the sequence as a single noun is usually correct.
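The merging described in sections 1 and 2 can be pictured as grouping consecutive characters of the same class. The sketch below is an illustration under simplified assumptions (fixed Unicode ranges, BMP characters only); the `classify` and `groupRuns` names are hypothetical and this is not Suzume's implementation.

```typescript
type CharClass = "kanji" | "katakana" | "hiragana" | "other";

// Classify one UTF-16 code unit by Unicode block (simplified ranges).
function classify(ch: string): CharClass {
  const code = ch.charCodeAt(0);
  // CJK Unified Ideographs plus the iteration mark 々 (U+3005).
  if ((code >= 0x4e00 && code <= 0x9fff) || code === 0x3005) return "kanji";
  // Katakana block, which also contains the prolonged sound mark ー.
  if (code >= 0x30a0 && code <= 0x30ff) return "katakana";
  if (code >= 0x3040 && code <= 0x309f) return "hiragana";
  return "other";
}

// Split text into maximal runs of characters sharing one class.
function groupRuns(text: string): string[] {
  const runs: string[] = [];
  let current = "";
  let currentClass: CharClass | null = null;
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    const cls = classify(ch);
    if (cls !== currentClass && current.length > 0) {
      runs.push(current);
      current = "";
    }
    current += ch;
    currentClass = cls;
  }
  if (current.length > 0) runs.push(current);
  return runs;
}
```

With this grouping, 経済成長 and データベース each come out as one run, which mirrors the single-token results shown above.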

3. Number + Unit Merging

Numbers followed by counters or units are merged into a single token. This includes large number units (万, 億, 兆), decimal numbers, percentages, and alphabetic units.

Input: 3人
MeCab:  3 / 人          (2 tokens)
Suzume: 3人              (1 token)

Input: 100円
MeCab:  100 / 円        (2 tokens)
Suzume: 100円            (1 token)

Input: 3.14
MeCab:  3 / . / 14      (3 tokens)
Suzume: 3.14             (1 token)

Why: In most application contexts (search, tagging, display), "3人" as a unit is more useful than splitting into number and counter.
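A rough way to picture this rule is a single regular expression that captures a number together with its unit. The pattern and the `findNumberUnits` helper below are assumptions for illustration only: they handle half-width digits, treat any single kanji after the number as a counter, and do not reflect Suzume's actual rules.

```typescript
// Digits with an optional decimal part, an optional large-number unit,
// then an optional unit: %, a run of ASCII letters, or one kanji.
const NUMBER_UNIT = /[0-9]+(?:\.[0-9]+)?[万億兆]?(?:%|[A-Za-z]+|[\u4E00-\u9FFF])?/g;

// Return every number-plus-unit span found in the text.
function findNumberUnits(text: string): string[] {
  const found = text.match(NUMBER_UNIT);
  return found === null ? [] : found.slice();
}
```

Limiting the counter to one kanji is a deliberate simplification that keeps the sketch from swallowing the word after the counter.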

4. Date Merging

Full date expressions are merged into a single token.

Input: 2024年12月23日
MeCab:  2024 / 年 / 12 / 月 / 23 / 日  (6 tokens)
Suzume: 2024年12月23日                    (1 token)

Why: Dates are atomic units of meaning. Splitting them provides no practical benefit for tokenization.

5. Nai-Adjective Handling

Certain adjectives ending in ない are treated as single lexical units rather than being split.

Input: だらしない
MeCab:  だらし / ない    (noun + adjective)
Suzume: だらしない        (single adjective)

Input: もったいない
MeCab:  もったい / ない  (noun + adjective)
Suzume: もったいない      (single adjective)

Applies to: だらしない, つまらない, しょうがない, もったいない, くだらない, せわしない, やるせない, いたたまれない, あどけない, おぼつかない, はしたない, みっともない, ろくでもない, どうしようもない, ものたりない, こころもとない

Why: These words function as indivisible adjectives in modern Japanese. The "stem + ない" split is etymologically correct but not useful for NLP applications.

6. Slang/Modern Word Support

Modern colloquial adjectives and verbs are recognized natively.

Input: エモい
MeCab:  (unknown or split)
Suzume: エモい  (ADJ)

Supported adjectives: エモい, キモい, ウザい, ダサい, イタい, エロい

Supported verbs: バズる, ググる, パクる

Why: MeCab's dictionary does not include modern slang. Suzume recognizes common slang adjective and verb patterns, including their conjugated forms (エモかった, バズった, etc.).

7. Compound Verb Merging

Verb stems in 連用形 followed by subsidiary verbs are merged into compound verbs.

Input: 走り込む
MeCab:  走り / 込む      (2 tokens)
Suzume: 走り込む          (1 token)

Input: 食べ続ける
MeCab:  食べ / 続ける    (2 tokens)
Suzume: 食べ続ける        (1 token)

Supported V2 elements include: 込む, 出す, 続く, 返す, 合う, 直す, 切る, 上がる, 抜く, 続ける, つける, 替える, 合わせる, 上げる, 下げる, 掛ける, 入れる, etc. (40+ patterns)

Why: Compound verbs function as single lexical units in Japanese. Splitting them loses the compound meaning.

8. タリ活用副詞 Merging

Tari-conjugation adverb stems followed by と are merged into a single adverb.

Input: 堂々と
MeCab:  堂々 / と        (2 tokens)
Suzume: 堂々と            (1 token, ADV)

Applies to: 泰然, 堂々, 悠々, 淡々, 粛々, 颯爽, 毅然, 漫然, 茫然, 呆然, 唖然, 愕然, 断然, 俄然, 歴然, 整然, 雑然, 騒然, 憮然, 黙然, 昂然, 凛然, 厳然

Why: These stem+と combinations are conventionally used as adverbs and are more useful as single tokens.

9. お/ご Prefix Handling

Suzume splits お/ご honorific prefixes from nouns but keeps them merged when they form inseparable lexemes.

Input: お茶
Suzume: お(PREFIX) / 茶(NOUN)    (split — separable prefix)

Input: お金
Suzume: お金(NOUN)                (kept — inseparable lexeme)

Input: お母さん
Suzume: お母さん(NOUN)            (kept — family term)

Inseparable exceptions include: お金, お前, おかず, おでん, おもちゃ, おすすめ, おいら, おっぱい, おしっこ, おもらし, おっさん, お疲れ様, お出で/おいで, and family terms (お母さん, お父さん, お兄ちゃん, お姉さん, おじさん, おばさん, おじいさん, おばあさん, etc.)

Why: In most contexts, お/ご are grammatical prefixes that should be separated. But some words have lexicalized with the prefix and splitting them would be incorrect.

10. Honorific Suffix Splitting

Honorific suffixes are split from names.

Input: 田中さん
MeCab:  田中さん          (1 token)
Suzume: 田中 / さん       (2 tokens)

Applies to suffixes: さん, ちゃん, 様, 君, 殿, さま

Exceptions: Family terms like お兄ちゃん, お母さん are kept as single tokens.

Why: For search and display, separating names from honorifics is more useful.
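The split-with-exceptions behavior can be sketched as a suffix check guarded by an exception list. The `splitHonorificSuffix` helper and the abbreviated `FAMILY_TERMS` list below are hypothetical; the sketch assumes the input is already a single name-like unit and is not Suzume's implementation.

```typescript
// Suffixes listed in this section.
const HONORIFIC_SUFFIXES: string[] = ["ちゃん", "さん", "さま", "様", "君", "殿"];
// Abbreviated exception list for illustration.
const FAMILY_TERMS: string[] = ["お兄ちゃん", "お母さん", "お父さん", "お姉さん"];

// Split a trailing honorific from a name unless the word is an exception.
function splitHonorificSuffix(word: string): string[] {
  if (FAMILY_TERMS.indexOf(word) !== -1) return [word];
  for (let i = 0; i < HONORIFIC_SUFFIXES.length; i++) {
    const suffix = HONORIFIC_SUFFIXES[i];
    const cut = word.length - suffix.length;
    // The suffix must end the word and leave a non-empty name before it.
    if (cut > 0 && word.lastIndexOf(suffix) === cut) {
      return [word.slice(0, cut), suffix];
    }
  }
  return [word];
}
```

Checking the exception list first is what keeps lexicalized family terms intact while ordinary names are split.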

11. URL / Mention / Hashtag Handling

URLs, @mentions, and #hashtags are merged into single tokens.

Input: https://example.com にアクセス
Suzume: https://example.com / に / アクセス

Input: @user_name に送信
Suzume: @user_name / に / 送信

Input: #推し活 について
Suzume: #推し活 / について

Why: These are atomic identifiers in modern text. Splitting them provides no benefit.
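One way to approximate this behavior is to match such identifiers with a regular expression before any other segmentation runs. The pattern and the `extractAtomicTokens` helper below are illustrative assumptions, deliberately simplified, and not Suzume's actual rules.

```typescript
// URL: scheme followed by printable ASCII (stops at spaces and Japanese text).
// Mention: @ plus ASCII word characters.
// Hashtag: # plus ASCII word characters, kana, or kanji.
const ATOMIC = /https?:\/\/[\x21-\x7E]+|@[A-Za-z0-9_]+|#[A-Za-z0-9_\u3040-\u30FF\u4E00-\u9FFF]+/g;

// Return the URLs, mentions, and hashtags found in the text, in order.
function extractAtomicTokens(text: string): string[] {
  const found = text.match(ATOMIC);
  return found === null ? [] : found.slice();
}
```

Restricting the URL body to ASCII means a URL ends where Japanese text begins, even without a separating space.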

12. Prolonged Sound Mark Handling

Prolonged sound marks (ー) are merged with the preceding token, and consecutive marks are normalized to one.

Input: あのー
MeCab:  あの / ー          (2 tokens)
Suzume: あのー              (1 token)

Input: すごーーい
Suzume: すごーい            (normalized)

Why: Prolonged sounds are part of the word they modify.
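The normalization step on its own can be sketched as a one-line replacement. The `normalizeProlonged` name is hypothetical, and the sketch covers only the collapsing of repeated marks, not the merge with the preceding token.

```typescript
// Collapse two or more consecutive prolonged sound marks into one.
function normalizeProlonged(text: string): string {
  return text.replace(/ー{2,}/g, "ー");
}
```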

13. Colloquial Pronoun Merging

Colloquial pronouns that MeCab splits are merged.

Input: こいつは
MeCab:  こ / いつ / は    (3 tokens)
Suzume: こいつ / は        (2 tokens)

Applies to: どいつ, こいつ, そいつ, あいつ

14. Split Rules

Suzume splits certain sequences that MeCab emits as a single token but that are better treated as separate units.

ったら topic particle:

Input: あなたったら
MeCab:  あなたったら      (1 token)
Suzume: あなた / ったら    (pronoun + particle)

ってば emphatic particle:

Input: もうってば
MeCab:  もうってば        (1 token)
Suzume: もう / ってば      (adverb + particle)

Plural suffix ら:

Input: 彼ら
MeCab:  彼ら              (1 token)
Suzume: 彼 / ら            (pronoun + suffix)

Kanji adverb + に:

Input: 次に
MeCab:  次に(副詞)        (1 token)
Suzume: 次(NOUN) / に(PARTICLE)  (2 tokens)

Kanji + katakana compound nouns:

Input: 量子コンピュータ (without user dictionary)
Suzume: 量子 / コンピュータ  (2 tokens)

15. Causative-Passive Split

MeCab sometimes merges godan verb 未然形 + causative さ into one token. Suzume normalizes this inconsistency.

Input: 飲まされた
MeCab:  飲まさ / れ / た    (merged causative)
Suzume: 飲ま / さ / れ / た  (split: verb + causative + passive + past)

Why: MeCab is inconsistent — it splits some causative-passive forms (読ま + さ + れた) but merges others (飲まさ + れた). Suzume normalizes all cases.

16. Kango + として Adverb Split

MeCab treats kango + として as a single adverb. Suzume splits it into the adverb form + する conjugation.

Input: 依然として
MeCab:  依然として          (1 token, adverb)
Suzume: 依然と / し / て     (adverb + verb + auxiliary)

Why: These are taru-adjective adverb forms (漢語 + と) followed by する conjugation. Splitting provides more accurate grammatical structure.

17. Prefecture + City Split

Prefecture-city compound nouns are split at administrative boundaries.

Input: 神奈川県横浜市 (when MeCab produces single token)
Suzume: 神奈川県 / 横浜市     (split at 県/市 boundary)

Note: This split rule applies only to the 県+市 pattern. Other combinations like 都+区 (東京都新宿区) or 府+市 (大阪府大阪市) are merged into single tokens by the proper noun merging rule in §22.

Why: Prefecture and city are distinct administrative levels, and splitting at their boundary is useful for search and geocoding.
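The 県+市 boundary rule can be sketched as a pattern that only fires when both suffixes are present. The `splitPrefectureCity` helper is hypothetical and assumes its input is a single place-name unit; it is not Suzume's implementation.

```typescript
// Split "X県Y市" at the 県 boundary; leave every other pattern untouched.
function splitPrefectureCity(name: string): string[] {
  const m = /^(.+?県)(.+市)$/.exec(name);
  return m === null ? [name] : [m[1], m[2]];
}
```

Because the pattern requires 県 followed by a name ending in 市, inputs such as 東京都新宿区 and 大阪府大阪市 pass through unchanged, consistent with the note above.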

18. Copula Negation Split

MeCab sometimes treats じゃない as a single auxiliary. Suzume splits it.

Input: じゃない
MeCab:  じゃない            (1 token, auxiliary)
Suzume: じゃ / ない          (auxiliary + auxiliary)

Why: Splitting copula and negation allows for more granular grammatical analysis.

19. Technical Text Merging

Technical identifiers are merged into single tokens.

Snake_case identifiers:

Input: user_name
Suzume: user_name            (1 token)

Version numbers:

Input: v1.2.3
Suzume: v1.2.3               (1 token)

Brand + number:

Input: iPhone15
Suzume: iPhone15              (1 token)

ASCII dot notation:

Input: console.log
Suzume: console.log           (1 token)

Why: These are atomic identifiers in technical text. Splitting them provides no benefit.
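The four identifier shapes above share a simple structure: an ASCII word optionally extended by underscore- or dot-joined segments. The pattern and the `findIdentifiers` helper below are a sketch under that assumption, not Suzume's actual rule set.

```typescript
// An ASCII letter, then letters or digits, then any number of
// "_segment" or ".segment" continuations.
const IDENTIFIER = /[A-Za-z][A-Za-z0-9]*(?:[._][A-Za-z0-9]+)*/g;

// Return every technical identifier found in the text, in order.
function findIdentifiers(text: string): string[] {
  const found = text.match(IDENTIFIER);
  return found === null ? [] : found.slice();
}
```

A dot or underscore is only consumed when another segment follows, so a sentence-final period is not pulled into the token.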

20. Noun + Suffix Merging

Nouns followed by single-character suffixes are merged.

Input: 報告書
Suzume: 報告書               (1 token)

Input: 成功率
Suzume: 成功率               (1 token)

Applies to suffixes: 書, 誌, 時, 率, 性

Why: These noun + suffix combinations function as single lexical units.

21. Particle Merging

Certain multi-particle sequences are merged.

か + も → かも:

Input: 行くかもしれない
Suzume: 行く / かも / しれ / ない

の + に → のに (after past tense):

Input: 行ったのに
Suzume: 行っ / た / のに

ず + に → ずに:

Input: 食べずに
Suzume: 食べ / ずに

Why: These particle combinations function as compound particles and are more useful as single units.

22. Proper Noun + Region Merging

Consecutive proper nouns with region suffixes are merged.

Input: 東京都新宿区
Suzume: 東京都新宿区         (1 token — place name)

Why: Place names consisting of multiple geographic components should be treated as single entities for search and display.

POS Classification Differences

MeCab and Suzume use different POS classification strategies, resulting in different labels for the same words. Suzume applies 150+ rules for its own POS classification system.

Adjective Continuative Form (連用形)

MeCab classifies adjective 連用形 (〜く form) as adverbs. Suzume treats these as adjectives:

Input: よくある質問
MeCab:  よく(副詞) / ある / 質問
Suzume: よく(ADJ) / ある / 質問

Input: 美しく咲く花
MeCab:  美しく(副詞) / 咲く / 花
Suzume: 美しく(ADJ) / 咲く / 花

Any word ending in く whose lemma ends in い, or whose surface contains kanji, is recognized as an adjective continuative form rather than an adverb.

Pronoun Recognition

MeCab classifies many pronouns as plain nouns. Suzume treats these as Pronoun:

Input: みんなで行こう
MeCab:  みんな(名詞) / で / 行こう
Suzume: みんな(PRONOUN) / で / 行こう

Applies to: あなた, あんた, みんな, みな, 皆, 某, 拙者, 我輩, 彼女, 彼氏, 奴, 我, わし, いくら (interrogative)

Na-Adjective Recognition

MeCab classifies na-adjective stems as nouns (形容動詞語幹). Suzume recognizes them as adjectives:

Input: きれいな花
MeCab:  きれい(名詞) / な / 花
Suzume: きれい(ADJ) / な / 花

Applies to: きれい, しずか, おだやか, げんき, しんちょう, ありきたり, 無限, 滅多

Note: Some 形容動詞語幹 are intentionally kept as Noun: マジ, 不安, 不要, 乙, 不便, 公式, 可能, 容易, 積極, 健康, 傍若無人

じゃ Conjugation

MeCab classifies じゃ (colloquial copula) inconsistently across contexts. Suzume consistently treats it as auxiliary (助動詞) and classifies the following ない/なかっ/な accordingly:

Input: そうじゃない
MeCab:  そう / じゃ(助詞) / ない(形容詞)
Suzume: そう / じゃ(AUX) / ない(AUX)

て-Form Auxiliary Classification

After て/で, subsidiary verbs like いる are classified as Auxiliary:

Input: 食べている
MeCab:  食べ / て / いる(動詞)
Suzume: 食べ / て(AUX) / いる(AUX)

Additionally, て/で after verbs and adjectives are classified as Auxiliary rather than Particle.

Context-Dependent POS

Suzume applies context-aware POS classification for several ambiguous words:

そう: Classified as Adjective before copula (そうだ = hearsay), Auxiliary after Auxiliary (しまいそう = appearance), Adverb otherwise.

でも: Classified as Particle after interrogatives (何でも), Conjunction at sentence/clause boundaries (でも、...).

いかが: Classified as Adverb when not before copula, Pronoun before copula (いかがですか).

大変: Classified as Adjective before な (大変な), Adverb otherwise (大変良い).
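Two of the rules above depend only on the following token, which makes them easy to sketch. The `classifyInContext` function, the `ContextPos` type, and the `COPULA_FORMS` list are hypothetical; in particular, treating only だ and です as copula forms is an assumption made for this illustration.

```typescript
type ContextPos = "ADJ" | "ADV" | "PRONOUN";

// Assumed copula surfaces for this sketch.
const COPULA_FORMS: string[] = ["だ", "です"];

// Classify 大変 and いかが from the surface of the next token (null at end).
function classifyInContext(surface: string, next: string | null): ContextPos | null {
  if (surface === "大変") {
    return next === "な" ? "ADJ" : "ADV";
  }
  if (surface === "いかが") {
    const beforeCopula = next !== null && COPULA_FORMS.indexOf(next) !== -1;
    return beforeCopula ? "PRONOUN" : "ADV";
  }
  return null; // words not covered by this sketch
}
```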

Slang and Modern Words

MeCab's dictionary does not include modern slang, often producing incorrect tokenization:

Input: エモい曲
MeCab:  エモ(noun) / い(unknown) / 曲
Suzume: エモい(ADJ) / 曲(NOUN)

Suzume recognizes modern adjective patterns (エモい, キモい, ウザい, ダサい, イタい, エロい) and handles their conjugated forms correctly (エモかった, エモくない, etc.). Modern verb patterns (バズる, ググる, パクる) are also supported.

Particle Classification

MeCab classifies certain particles as nouns in some contexts. Suzume applies context-aware classification for 30+ particles:

Input: 行くのは大変
MeCab:  行く / の(名詞,非自立) / は / 大変
Suzume: 行く / の(PARTICLE) / は / 大変

The nominalizer の functions as a particle here, not a noun. Suzume classifies such cases as particles.

Katakana Onomatopoeia

MeCab classifies katakana onomatopoeia (reduplication patterns) as nouns. Suzume recognizes them as adverbs:

Input: ワクワクする
MeCab:  ワクワク(名詞) / する
Suzume: ワクワク(ADV) / する

Onomatopoeia + っと patterns are also merged:

Input: どきっとする
MeCab:  どき / っと / する
Suzume: どきっと(ADV) / する

いい Classification

MeCab sometimes classifies いい as Verb (lemma: いう). Suzume treats it as Adjective when not followed by another verb:

Input: いい天気
MeCab:  いい(動詞,いう) / 天気
Suzume: いい(ADJ) / 天気

で+ある Copula Handling

Suzume applies context-aware classification for the copula である pattern:

Input: 重要である
Suzume: 重要 / で(AUX,だ) / ある(VERB)

Input: 問題であった
Suzume: 問題 / で(PARTICLE) / あっ(VERB) / た

ない Context-Dependent Classification

Suzume classifies ない/なく/なかっ as Adjective (rather than Auxiliary) when functioning as an existence adjective:

Input: 時間がない
Suzume: 時間 / が / ない(ADJ)    ← existence negation

Input: 食べない
Suzume: 食べ / ない(AUX)         ← negation auxiliary

なら + ない Classification

When なら is followed by ない/なく/なかっ, Suzume classifies it as Verb (なる):

Input: ならない
Suzume: なら(VERB,なる) / ない(AUX)

Per-Word POS Differences

The following words are classified differently between MeCab and Suzume:

| Word | MeCab | Suzume | Reason |
| --- | --- | --- | --- |
| なら | 助動詞 | PARTICLE | Conditional particle |
| 違い | 名詞 | VERB | Noun form of 違う |
| 推し | 動詞 | NOUN | Modern noun usage |
| 嫌い | 動詞 | ADJ | Na-adjective |
| 大変 | 名詞 | ADV | Adverb usage |
| 超 | 接頭詞 | NOUN | Modern usage |
| びっくり | 名詞 | ADV | Adverb |
| なるほど | - | ADV | Adverb |
| たくさん | - | ADV | Adverb |
| いずれ | - | ADV | Adverb |
| お疲れ様 | - | INTJ | Interjection |
| よろしく | - | ADV | Adverb |
| おめでとう | 感動詞 | ADV | Adverb |
| じゃん | 助動詞 | PARTICLE | Sentence-final particle |
| や | 助動詞 | PARTICLE | Kansai copula → particle |
| よう | 助動詞 | AUX | Auxiliary (様) |
| 時々 | 副詞 | NOUN | Noun usage |
| 遥か | 副詞 | ADJ | Na-adjective |
| どう | 副詞 | ADJ | Na-adjective |
| まじ | 助動詞 | ADJ | Adjective (katakana マジ stays NOUN) |
| なんて | 副詞 | PARTICLE | Particle |
| っていう | 助詞 | DET | Determiner |
| という | 助詞 | DET | Determiner |
| まして | 副詞 | CONJ | Conjunction |
| いわば | 副詞 | CONJ | Conjunction (言わば) |
| 寒し | 形容詞 | NOUN | Archaic noun form |
| 付け | 接尾 | NOUN | Noun |
| 得 (before する) | 名詞 | VERB | Ichidan verb 得る |
| むしろ | 副詞 | OTHER | Other |
| その後 | - | ADV | Adverb |
| しどろもどろ | - | ADV | Adverb |
| 無い/無く | 助動詞 | ADJ | Kanji ない adjective |

Constraints

These are known limitations arising from Suzume's feature-based architecture.

Cannot Split Merged Compounds

Since Suzume uses character-type features rather than a dictionary for compound word boundaries, it cannot split kanji or katakana sequences that should be separate words.

Input: 東京都庁前
Suzume: 東京都庁前  (1 token — cannot determine internal boundaries)
MeCab:  東京 / 都庁 / 前  (3 tokens — dictionary-driven)

Workaround: Use the user dictionary to register specific words that need to be recognized as separate tokens.

```typescript
// Register 東京都庁 as a NOUN entry in the user dictionary.
suzume.loadUserDictionary('東京都庁,NOUN')
```

Context-Dependent POS Classification

Suzume's feature-based model sometimes cannot resolve a POS that requires dictionary knowledge or deep context. The major patterns:

Auxiliary vs Main Verb

When subsidiary verbs follow て-form, Suzume may classify them as main verbs:

Input: 確認してあります
MeCab:  確認 / し / て / あり(AUX) / ます
Suzume: 確認 / し / て / あり(VERB) / ます

Affects: ある, おく, みる, いく, くる after て-form. Suzume cannot always determine whether these function as subsidiary (auxiliary) or main verbs.

で: Copula vs Particle

The form で has multiple grammatical roles that Suzume cannot always distinguish:

Input: マジで驚いた
MeCab:  マジ / で(AUX=copula) / 驚い / た
Suzume: マジ / で(PARTICLE) / 驚い / た

After na-adjective stems like マジ, で is the copula (断定の助動詞). Suzume may misclassify it as a particle since this requires dictionary-level knowledge of na-adjective stems.

ない: Adjective vs Auxiliary

ない can be a standalone adjective or a negation auxiliary:

Input: 仕方ない
MeCab:  仕方 / ない(ADJ)     ← lexical adjective
Suzume: 仕方 / ない(AUX)     ← misclassified as negation

Verb Renyokei vs Noun

Verb stems used as nouns (nominalization) can be ambiguous:

Input: 東京行きは何番線ですか
MeCab:  東京 / 行き(NOUN) / は / ...
Suzume: 東京 / 行き(VERB) / は / ...

なければ Conditional

The conditional form なければ is classified differently:

Input: 行かなければ
MeCab:  行か / なけれ(AUX) / ば
Suzume: 行か / なけれ(VERB) / ば

POS Granularity

Suzume's basic POS (pos) uses a simpler tag set than MeCab's detailed subcategories.

| MeCab | Suzume `pos` |
| --- | --- |
| 名詞,一般 | NOUN |
| 名詞,固有名詞,地域 | NOUN |
| 名詞,サ変接続 | NOUN |
| 名詞,副詞可能 | NOUN |
| 動詞,自立 | VERB |
| 動詞,非自立 | VERB |
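When comparing output from both tools, the collapse shown in the table amounts to keeping only the first field of MeCab's comma-separated POS. The `toCoarsePos` helper and the `COARSE_POS` map below are hypothetical and cover only the two top-level categories listed in the table.

```typescript
// Top-level MeCab category to coarse tag, limited to the rows in the table.
const COARSE_POS: { [mecabTop: string]: string } = {
  "名詞": "NOUN",
  "動詞": "VERB",
};

// Map a MeCab POS string such as "名詞,固有名詞,地域" to a coarse tag.
function toCoarsePos(mecabPos: string): string | null {
  const top = mecabPos.split(",")[0];
  const known = Object.prototype.hasOwnProperty.call(COARSE_POS, top);
  return known ? COARSE_POS[top] : null; // null for categories not listed here
}
```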

However, the extendedPos field provides finer-grained subcategories:

| MeCab subcategory | Suzume `extendedPos` |
| --- | --- |
| 名詞,固有名詞 | `NounProper` |
| 名詞,固有名詞,人名 | `NounProperGiven` / `NounProperFamily` |
| 名詞,数 | `NounNumber` |
| 名詞,サ変接続 | `NounVerbal` |
| 名詞,形式名詞 | `NounFormal` |
| 動詞,連用形 | `VerbRenyokei` |
| 動詞,未然形 | `VerbMizenkei` |
| 形容詞,連用形 | `AdjRenyokei` |
| 形容動詞語幹 | `NaAdjectiveStem` |
| 助詞,格助詞 | `ParticleCase` |
| 助詞,係助詞 | `ParticleBinding` |

When MeCab-level subcategories are needed, extendedPos covers many of these cases. See the API Reference ExtendedPOS section for the full list.

When to Use Which

| Use Case | Recommendation |
| --- | --- |
| Browser / client-side apps | Suzume: no server required |
| Search indexing / tag extraction | Suzume: compound merging is often desirable |
| Academic research / corpus analysis | MeCab: maximum accuracy and POS detail |
| Real-time UI (input-as-you-type) | Suzume: fast, no network latency |
| Precise compound word splitting | MeCab: dictionary-driven boundaries |
| Handling unknown / modern words | Suzume: robust to unseen vocabulary |
