What is RegEx?
RegEx (regular expression) is a syntax for describing search patterns in text, and it is widely used in programming languages. In Vulgar, RegEx can be used in the custom spelling option to mimic the spelling idiosyncrasies of natural languages. It can also be used in the illegal combinations field to prohibit combinations of phonemes, and in custom affix rules to create more complex sound changes.
This page covers the basics of RegEx. Also try our RegEx Builder tool.
Matching at the beginning or end of a word
Input | Rule | Output |
---|---|---|
This also works at the end of the word:
Input | Rule | Output |
---|---|---|
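As a rough illustration of anchoring, here is a minimal sketch using Python's `re` module as a stand-in. It assumes the standard RegEx anchors `^` (start) and `$` (end); the sample word and replacements are made up, and Vulgar's own rule syntax may differ.

```python
import re

# ^ anchors the pattern to the start of the word, $ to the end.
start = re.sub(r"^a", "e", "ana")  # only the word-initial "a" changes
end = re.sub(r"a$", "e", "ana")    # only the word-final "a" changes

print(start)  # ena
print(end)    # ane
```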
Match A or B
The bar symbol
Alternatively, square brackets
Input | Rule | Output |
---|---|---|
Take home point: If any of your patterns are more than one symbol, don't use square brackets! Some people erroneously think that
A caret symbol
Input | Rule | Output |
---|---|---|
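The difference between the bar and square brackets can be sketched with Python's `re` module. This is an illustration under standard RegEx semantics, not Vulgar's own rule syntax, and the sample word is hypothetical. The bar chooses between whole patterns, square brackets match exactly one symbol from a set, and a caret at the start of the brackets negates the set.

```python
import re

word = "shachi"

# The bar matches either whole pattern: "sh" or "ch".
with_bar = re.sub(r"sh|ch", "x", word)

# Square brackets match ONE symbol from the set, so [shch] means
# "s or h or c", not "sh or ch".
with_brackets = re.sub(r"[shch]", "x", word)

# A caret inside the brackets matches any one symbol NOT in the set.
negated = re.sub(r"[^aeiou]", "C", "pato")

print(with_bar)       # xaxi
print(with_brackets)  # xxaxxi
print(negated)        # CaCo
```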
Lookahead
Lookahead allows you to match a pattern but only replace it if it comes before another pattern. Example: you want to change
Input | Rule | Output |
---|---|---|
Notice how when
Input | Rule | Output |
---|---|---|
Negative lookahead is the same principle, but the rule is applied only if there is no match ahead of it. It uses the
Input | Rule | Output |
---|---|---|
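A minimal sketch of both kinds of lookahead, using Python's `re` module and the standard `(?=...)` and `(?!...)` syntax. The word and the replacements are invented for illustration; Vulgar's rule notation may look different.

```python
import re

word = "kiku"

# Positive lookahead: replace "k" only when "i" follows.
# The "i" is checked but not consumed, so it stays in the output.
positive = re.sub(r"k(?=i)", "ch", word)

# Negative lookahead: replace "k" only when "i" does NOT follow.
negative = re.sub(r"k(?!i)", "c", word)

print(positive)  # chiku
print(negative)  # kicu
```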
Lookbehind
Lookbehind is the same principle as lookahead, but it checks for a pattern behind the main pattern. It uses
Input | Rule | Output |
---|---|---|
Negative lookbehinds use
Input | Rule | Output |
---|---|---|
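Lookbehind can be sketched the same way in Python's `re` module, assuming the standard `(?<=...)` and `(?<!...)` syntax. The sample word is hypothetical, and this is only an illustration of the idea rather than Vulgar's exact notation.

```python
import re

word = "kanka"

# Positive lookbehind: replace "k" only when "n" comes directly before it.
after_n = re.sub(r"(?<=n)k", "g", word)

# Negative lookbehind: replace "k" only when "n" does NOT come before it.
not_after_n = re.sub(r"(?<!n)k", "c", word)

print(after_n)      # kanga
print(not_after_n)  # canka
```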
Shorthand symbols
Vulgar uses various shorthand abbreviations for classes of phonemes, such as
Input | Rule | Output |
---|---|---|
Here is a complete list:
Shorthand code | Category |
---|---|
A | Affricates |
B | Back vowels |
C | Consonants |
D | Any IPA letter (does not match diacritics) |
ᴰ (superscript D) | Any diacritic symbol |
E | Front vowels |
F | Fricatives |
H | Laryngeals |
K | Velars |
L | Liquids |
ʟ (small capital L) | Any IPA letter (does not match diacritics) |
M | Diphthongs |
N | Nasal consonants |
O | Obstruent |
P | Labials |
Q | Uvulars |
R | Sonorant/resonant |
S | Stops |
U or σ | Syllable |
V | Vowels, including diphthongs |
W | Semivowels |
X | Any phoneme |
Z | Continuant |
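One way to think about a shorthand code is as an abbreviation for a set of symbols. The sketch below is not how Vulgar implements shorthands; it is a hypothetical Python helper (`expand`) with a made-up inventory (`SHORTHAND`) that rewrites `C` and `V` into ordinary square-bracket sets before applying the pattern.

```python
import re

# Hypothetical inventory for illustration only; a real language
# would have its own consonants and vowels.
SHORTHAND = {"C": "ptkmnsl", "V": "aeiou"}

def expand(pattern: str) -> str:
    """Replace each shorthand letter with a square-bracket set."""
    return "".join(
        "[" + SHORTHAND[ch] + "]" if ch in SHORTHAND else ch
        for ch in pattern
    )

expanded = expand("(?<=V)k(?=V)")        # "k" between two vowels
result = re.sub(expanded, "g", "akapka")

print(expanded)  # (?<=[aeiou])k(?=[aeiou])
print(result)    # agapka
```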
Backreferences
Numbers refer back to whatever was captured inside brackets. The number
Input | Rule | Output |
---|---|---|
The following pattern matches two consonants in a row and swaps them:
Input | Rule | Output |
---|---|---|
Zero refers to the entire match:
Input | Rule | Output |
---|---|---|
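Backreferences can be sketched in Python's `re` module as follows. Note the notation is Python's, which is an assumption about presentation only: Python writes a captured group as `\1`, `\2` and the entire match as `\g<0>`, while Vulgar's rule fields may spell these differently. The consonant set and sample words are invented.

```python
import re

# Two captured groups, written back in the opposite order (a swap).
swapped = re.sub(r"([ptk])([rl])", r"\2\1", "patra")

# Group zero is the entire match; writing it twice doubles the vowel.
doubled = re.sub(r"a", r"\g<0>\g<0>", "pat")

print(swapped)  # parta
print(doubled)  # paat
```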
Replace with nothing
Creating a rule with nothing on the right side of the
Input | Rule | Output |
---|---|---|
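In a general-purpose RegEx library, the same effect comes from an empty replacement. A minimal Python sketch with a made-up word:

```python
import re

# An empty replacement string deletes every match.
deleted = re.sub(r"h", "", "aha")

print(deleted)  # aa
```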
Replace any character
The dot symbol
Input | Rule | Output |
---|---|---|
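As an illustration with Python's `re` module, the dot stands for any single character, so the pattern below matches "a" plus whatever symbol follows it. The word and replacement are hypothetical.

```python
import re

# "a." matches "a" followed by any one character ("at" in this word).
any_after_a = re.sub(r"a.", "X", "pato")

print(any_after_a)  # pXo
```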
Dealing with stress symbols
If you want to make spelling rules that are sensitive to stress, you first need to check the Make spelling rules sensitive to stress symbol option. (The default setting is to apply the RegEx patterns with the stress symbol already removed, so that you don't have to worry about the stress symbol making your patterns more complicated.) Let's say you want stressed
Input | Rule | Output |
---|---|---|
Finally, you will need a second rule to replace stress symbols with nothing.
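The two-step approach can be sketched in Python's `re` module. The example is hypothetical: it assumes the IPA primary stress mark `ˈ` sits before the stressed syllable, uses a simple five-vowel set, and spells stressed "a" as "á". The first rule marks the stressed vowel, and the second removes the stress symbol.

```python
import re

word = "paˈta"  # hypothetical IPA input with stress on the second syllable

# Rule 1: after the stress mark and any non-vowels, spell "a" as "á".
# The group keeps the stress mark and onset so only the vowel changes.
marked = re.sub(r"(ˈ[^aeiou]*)a", r"\1á", word)

# Rule 2: replace the stress symbol with nothing.
spelled = re.sub(r"ˈ", "", marked)

print(marked)   # paˈtá
print(spelled)  # patá
```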
Non-Latin alphabets
Custom orthography also supports all Unicode alphabets and scripts, such as Japanese, Chinese, Cyrillic, Georgian and even Unicode Emojis.
Order of rules
The order of your custom spelling rules matters. Vulgar applies the first spelling rule to a word as a find-and-replace, then applies the next rule to the result of the previous one. This can be a problem if an IPA symbol appears again in a consonant cluster in a later rule. For instance, the following rules are problematic:
The intent here is for
Another solution is to use the single-character Unicode versions, if they exist, such as
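The ordering problem can be reproduced with a small Python sketch. The rule pairs here are invented for illustration (they are not the rules from this page), and `apply_rules` is a hypothetical helper that applies each find-and-replace in sequence to the output of the previous one.

```python
import re

def apply_rules(word: str, rules: list[tuple[str, str]]) -> str:
    """Apply each (pattern, replacement) rule in order to the running result."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word

# Problematic order: "ʃ" is rewritten first, so the later cluster
# rule "tʃ" never finds anything to match.
bad = apply_rules("tʃa", [("ʃ", "sh"), ("tʃ", "ch")])

# Putting the longer cluster first lets both rules work as intended.
good = apply_rules("tʃa", [("tʃ", "ch"), ("ʃ", "sh")])

print(bad)   # tsha
print(good)  # cha
```

Listing rules for longer clusters before rules for the single symbols they contain is one straightforward way to avoid the clash.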