Regexflux

Unicode and Property Escapes

Match characters by their Unicode properties (letters, numbers, scripts, and categories) instead of hard-coded ranges.

Concept

Traditional character classes like [a-zA-Z] only cover ASCII letters. For internationalized text, you need Unicode-aware matching.

The u flag (Unicode mode) enables full Unicode support in JavaScript. Without it, patterns may mishandle characters outside the Basic Multilingual Plane (such as emoji or rare CJK characters) because they are encoded as surrogate pairs.
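A minimal sketch of the surrogate-pair issue, using "." as the simplest pattern that shows the difference. The variable names are illustrative only.

```javascript
// "😀" (U+1F600) lies outside the Basic Multilingual Plane, so a
// JavaScript string stores it as two UTF-16 code units (a surrogate pair).
const face = "😀";

// Without the u flag, "." consumes one code unit, so it cannot cover
// the whole character between ^ and $.
const withoutU = /^.$/.test(face);

// With the u flag, "." consumes one full code point.
const withU = /^.$/u.test(face);

console.log(face.length, withoutU, withU); // 2 false true
```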

Unicode property escapes, written as \p{Property}, match characters by their Unicode category:

- \p{Letter} or \p{L}: any Unicode letter (Latin, Cyrillic, CJK, Arabic, etc.)
- \p{Number} or \p{N}: any Unicode digit or numeric character
- \p{Punctuation} or \p{P}: any punctuation mark
- \p{Emoji}: emoji characters (note: also matches the digits 0-9, #, and * because they carry the Unicode Emoji property; use \p{Emoji_Presentation} for pictographic emoji only)
- \p{Script=Greek}: characters from the Greek script
- \p{Script=Han}: CJK characters (Chinese, Japanese kanji, Korean hanja)

Negated forms use \P{...} (uppercase P) to match characters NOT in the category.
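The escapes above and their negated form can be tried on one mixed-script string. This is a small sketch; the sample text and variable names are made up for illustration.

```javascript
const sample = "Привет, мир! 你好 42 αβγ";

// \p{L}+ collects runs of letters from any script.
const letters = sample.match(/\p{L}+/gu);
// ["Привет", "мир", "你好", "αβγ"]

// \p{N}+ collects runs of numeric characters.
const numbers = sample.match(/\p{N}+/gu);
// ["42"]

// \p{P} picks out individual punctuation marks.
const punctuation = sample.match(/\p{P}/gu);
// [",", "!"]

// \p{Script=Greek}+ keeps only the Greek run.
const greekRun = sample.match(/\p{Script=Greek}+/gu);
// ["αβγ"]

// \P{L}+ (uppercase P) is the negation: runs of anything that is not a letter.
const nonLetters = sample.match(/\P{L}+/gu);
// [", ", "! ", " 42 "]

console.log({ letters, numbers, punctuation, greekRun, nonLetters });
```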

**The u flag is required** for \p{...} to work in JavaScript. Without it, \p is treated as a literal "p", so /\p{L}/ looks for the literal text "p{L}" instead of a letter.
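A short sketch of what goes wrong when the flag is missing. The test strings are arbitrary; the point is that the same pattern source means two different things.

```javascript
// No u flag: \p is an ordinary escaped "p" and {L} is literal text.
const noFlag = /\p{L}/;

// With the u flag: \p{L} is a property escape for "any letter".
const withFlag = /\p{L}/u;

const results = {
  noFlagOnLetter: noFlag.test("é"),      // false: there is no "p{L}" in "é"
  noFlagOnLiteral: noFlag.test("p{L}"),  // true: matches the literal text
  withFlagOnLetter: withFlag.test("é"),  // true: "é" is a letter
  withFlagOnDigits: withFlag.test("123") // false: no letters present
};

console.log(results);
```

Because the flagless version fails silently rather than throwing, a quick check like this can be a useful sanity test when a \p{...} pattern unexpectedly matches nothing.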

**Language support:** Unicode property escapes (\p{...}) are supported in JavaScript (ES2018+ with u flag), Java, .NET, PCRE, Ruby, and Perl. Python's re module does not support \p{...} — use the third-party regex module instead. Go's regexp supports Unicode categories with the same \p{L} syntax natively, no flag needed.

**Note:** \w in JavaScript matches only [A-Za-z0-9_] even with the u flag. To match word characters from any script, use [\p{L}\p{N}_] with the u flag.
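The difference between \w and the Unicode-aware class can be seen side by side. This sketch assumes the input is in composed (NFC) form, so it normalizes first; decomposed accents are combining marks, which \p{L} does not cover.

```javascript
// Normalize so that accented letters are single code points (assumption:
// the input may otherwise contain decomposed accents).
const text = "naïve café_42 日本語".normalize("NFC");

// \w stays ASCII-only even with the u flag, so accented letters split
// the words and the Japanese text is skipped entirely.
const asciiWords = text.match(/\w+/gu);
// ["na", "ve", "caf", "_42"]

// Letters and numbers from any script, plus underscore.
const unicodeWords = text.match(/[\p{L}\p{N}_]+/gu);
// ["naïve", "café_42", "日本語"]

console.log(asciiWords, unicodeWords);
```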

/\p{L}+/gu

Matches one or more Unicode letters from any script — works with Latin, Cyrillic, CJK, Arabic, and more

Hello World
Привет мир
你好世界
123 !@#

/\p{Emoji}/gu

Matches emoji characters — requires the u flag. Note: \p{Emoji} also matches digits (0-9) and a few symbols (#, *) because they have the Unicode Emoji property. Use \p{Emoji_Presentation} for pictographic emoji only.

Hello 👋 World 🌍
No emoji here
🎉🎊🎈
text only

/\p{Script=Greek}+/gu

Matches one or more Greek script characters

alpha is α, beta is β
Ωmega and Δelta
no greek here
π is approximately 3.14
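
To reproduce the three examples above outside the lesson page, a small helper can run each pattern over its sample lines. The helper name matchesPerLine is hypothetical, and the extra "Room 7 #1" strings are added here only to show the digit and symbol behavior of \p{Emoji}.

```javascript
// Return the matches found on each line; lines without a match give [].
// The pattern needs the g flag so that match() returns every hit.
function matchesPerLine(pattern, lines) {
  return lines.map((line) => line.match(pattern) ?? []);
}

const letterRuns = matchesPerLine(/\p{L}+/gu, [
  "Hello World", "Привет мир", "你好世界", "123 !@#",
]);
// [["Hello", "World"], ["Привет", "мир"], ["你好世界"], []]

const emoji = matchesPerLine(/\p{Emoji}/gu, [
  "Hello 👋 World 🌍", "No emoji here", "🎉🎊🎈", "text only",
]);
// [["👋", "🌍"], [], ["🎉", "🎊", "🎈"], []]

const greek = matchesPerLine(/\p{Script=Greek}+/gu, [
  "alpha is α, beta is β", "Ωmega and Δelta", "no greek here",
  "π is approximately 3.14",
]);
// [["α", "β"], ["Ω", "Δ"], [], ["π"]]

// Digits, # and * carry the Emoji property, so they match as well.
const digitsAsEmoji = "Room 7 #1".match(/\p{Emoji}/gu);
// ["7", "#", "1"]

// Emoji_Presentation skips them and keeps the pictographic emoji.
const pictographic = "Room 7 #1 🎉".match(/\p{Emoji_Presentation}/gu);
// ["🎉"]

console.log(letterRuns, emoji, greek, digitsAsEmoji, pictographic);
```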

Exercise

Write a pattern using \p{L} with the u flag to match words that contain Unicode letters from any script.

Must match

café has accents
日本語 text
München is a city
naïve résumé

Must not match

123 456
!@# $%^
--- === ---
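
One way to check a candidate answer locally is a small harness over the lists above. This is only a sketch: checkPattern is a hypothetical helper, and /\p{L}+/u is used as one possible candidate, not as the only accepted answer.

```javascript
const mustMatch = [
  "café has accents",
  "日本語 text",
  "München is a city",
  "naïve résumé",
];
const mustNotMatch = ["123 456", "!@# $%^", "--- === ---"];

// True when the pattern finds a match in every "must match" line and in
// none of the "must not match" lines.
function checkPattern(pattern) {
  const found = (line) => {
    pattern.lastIndex = 0; // test() is stateful when the g flag is set
    return pattern.test(line);
  };
  return mustMatch.every(found) && !mustNotMatch.some(found);
}

const candidatePasses = checkPattern(/\p{L}+/u);  // true
const digitsOnlyPasses = checkPattern(/\d+/);     // false: finds no digits in "café has accents"

console.log(candidatePasses, digitsOnlyPasses);
```

Note that this harness only checks whether each line contains a match. It does not verify that whole words such as "café" are captured intact, so it is worth inspecting the actual matches as well.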
