Unicode and Property Escapes
Match characters by their Unicode properties (letters, numbers, scripts, and categories) instead of hard-coded ranges.
Concept
Traditional character classes like [a-zA-Z] only cover ASCII letters. For internationalized text, you need Unicode-aware matching.
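The u flag (Unicode mode) makes JavaScript regular expressions operate on whole code points instead of UTF-16 code units. Without it, patterns may mishandle characters outside the Basic Multilingual Plane (such as emoji or rare CJK characters), because those characters are encoded as surrogate pairs and constructs like . or a character class see each half separately.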
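A minimal sketch of the surrogate-pair problem, assuming a runtime with ES2015+ regular expressions (for example Node.js). The variable names are illustrative only:

```javascript
// U+1F600 (grinning face) lies outside the BMP, so a JavaScript string
// stores it as two UTF-16 code units (a surrogate pair).
const face = "\u{1F600}";

// Without the u flag, "." matches a single code unit, so it cannot
// cover the whole character between ^ and $.
const matchesWithoutU = /^.$/.test(face);

// With the u flag, "." matches a whole code point.
const matchesWithU = /^.$/u.test(face);

console.log(face.length);     // 2
console.log(matchesWithoutU); // false
console.log(matchesWithU);    // true
```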
Unicode property escapes, written as \p{Property}, match characters by their Unicode category:
- \p{Letter} or \p{L} — any Unicode letter (Latin, Cyrillic, CJK, Arabic, etc.)
- \p{Number} or \p{N} — any Unicode digit or numeric character
- \p{Punctuation} or \p{P} — any punctuation mark
- \p{Emoji} — emoji characters (note: also matches digits 0-9, #, and * due to Unicode properties — use \p{Emoji_Presentation} for pictographic emoji only)
- \p{Script=Greek} — characters from the Greek script
- \p{Script=Han} — CJK characters (Chinese, Japanese kanji, Korean hanja)
Negated forms use \P{...} (uppercase P) to match characters NOT in the category.
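**The u flag is required** for \p{...} to work in JavaScript (the newer v flag, added in ES2024, also enables it). Without either flag, \p is treated as a literal "p", so /\p{L}/ matches the text "p{L}" instead of a letter.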
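A small sketch of what happens when the flag is forgotten, assuming a standard JavaScript engine with the legacy (web-compatible) regex syntax. The variable names are illustrative:

```javascript
// No u flag: \p is just an escaped "p", and "{L}" is literal text.
const legacy = /\p{L}/;

// u flag: \p{L} is a Unicode property escape for "any letter".
const unicodeAware = /\p{L}/u;

const legacyMatchesLiteral = legacy.test("p{L}");      // true
const legacyMatchesLetter = legacy.test("\u00E9");     // false ("é" is not "p{L}")
const unicodeMatchesLetter = unicodeAware.test("\u00E9"); // true

console.log(legacyMatchesLiteral, legacyMatchesLetter, unicodeMatchesLetter);
```

Because the flagless pattern is still valid, the mistake produces no error, only silently wrong matches.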
**Language support:** Unicode property escapes (\p{...}) are supported in JavaScript (ES2018+ with u flag), Java, .NET, PCRE, Ruby, and Perl. Python's re module does not support \p{...} — use the third-party regex module instead. Go's regexp supports Unicode categories with the same \p{L} syntax natively, no flag needed.
**Note:** \w in JavaScript matches only [A-Za-z0-9_] even with the u flag. To match word characters from any script, use [\p{L}\p{N}_] with the u flag.
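To illustrate the note, here is a hedged comparison of \w against the suggested character class. The sample text is invented, and the accented letters are written as escapes so the precomposed code points are unambiguous:

```javascript
// "naïve café", with ï (U+00EF) and é (U+00E9) as single code points.
const text = "na\u00EFve caf\u00E9";

// \w stays ASCII-only even with the u flag, so accented letters
// break each word apart.
const asciiWords = text.match(/\w+/gu);

// [\p{L}\p{N}_] treats letters and numbers from any script as
// word characters.
const unicodeWords = text.match(/[\p{L}\p{N}_]+/gu);

console.log(asciiWords);   // [ 'na', 've', 'caf' ]
console.log(unicodeWords); // [ 'naïve', 'café' ]
```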
/\p{L}+/gu: matches one or more Unicode letters from any script (Latin, Cyrillic, CJK, Arabic, and more).
/\p{Emoji}/gu: matches emoji characters and requires the u flag. Note that \p{Emoji} also matches digits (0-9) and a few symbols (#, *) because they have the Unicode Emoji property. Use \p{Emoji_Presentation} for pictographic emoji only.
/\p{Script=Greek}+/gu: matches one or more Greek script characters.
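The emoji caveat and the script match can be checked with a sketch along these lines. The inputs are arbitrary examples, and the code assumes an ES2018+ runtime:

```javascript
// Digits and "#" carry the Unicode Emoji property, so \p{Emoji}
// matches them even though they do not look like emoji.
const digitIsEmoji = /\p{Emoji}/u.test("5");
const hashIsEmoji = /\p{Emoji}/u.test("#");

// \p{Emoji_Presentation} only covers characters that render as
// emoji by default, such as U+1F600 (grinning face).
const digitIsPresentation = /\p{Emoji_Presentation}/u.test("5");
const faceIsPresentation = /\p{Emoji_Presentation}/u.test("\u{1F600}");

// \p{Script=Greek}+ picks out runs of Greek letters and skips Latin ones.
const greekRuns = "alpha αβγ beta δε".match(/\p{Script=Greek}+/gu);

console.log(digitIsEmoji, hashIsEmoji);               // true true
console.log(digitIsPresentation, faceIsPresentation); // false true
console.log(greekRuns);                               // [ 'αβγ', 'δε' ]
```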
Exercise
Write a pattern using \p{L} with the u flag to match words that contain Unicode letters from any script.