What is RegEx?
RegEx (regular expression) is a system for doing search patterns in text, used widely in programming languages. In Vulgar, RegEx can be used in the custom spelling option to mimic spelling idiosyncrasies of natural languages. It can also be used in the illegal combinations field to prohibit combinations of phonemes, and in custom affix rules to create more complex sound changes.
This page covers the basics of RegEx. Also try our RegEx Builder tool.
Matching at the beginning or end of a word
This also works at the end of the word:
Match A or B
The bar symbol
Alternatively, square brackets
Take home point: If any of your patterns are more than one symbol, don't use square brackets! Some people erroneously think that
A carat symbol
Lookahead allows you to match a pattern but only replace it if it comes before another pattern. Example: you want to change
Notice how when
Negative lookahead is the same principle, but the rule is applied if the there is no match ahead of it. It uses the
Lookbehind is same is same principle as lookahead, but checking for a pattern behind the main pattern. It uses
Negative lookbehinds use
Vulgar uses various shorthand abbreviations for classes of phonemes, such as
Here is a complete list:
|D||Any IPA letter (does not match diacritics)|
|ᴰ (superscript D)||Any diacritic symbol|
|ʟ (small capital L)||Any IPA letter (does not match diacritics)|
|U or σ||Syllable|
|V||Vowels, including diphthongs|
Numbers refer back to whatever was captured inside brackets. The number
The following pattern matches two consonants in a row and swaps them:
Zero refers to the entire match:
Replace with nothing
Creating a rule with nothing on the right side of the
Replace any character
The dot symbol
Dealing with stress symbols
If you want to make spelling rules that are sensitive to stress, you first need to check the Make spelling rules sensitive to stress symbol option. (The default setting is to apply the RegEx patterns with the stress symbol already removed, so that you don't have to worry about the stress symbol making your patterns more complicated.) Let's say you want stressed
Finally, you will need a second rule to replace stress symbols with nothing.
Custom orthography also supports all Unicode alphabets and scripts, such as Japanese, Chinese, Cyrillic, Georgian and even Unicode Emojis.
Order of rules
The order of your custom spelling rules matter. Vulgar will find-and-replace the first spelling rule to a word, then apply the next rule over the top of what it just did. This can be a problem if an IPA symbol appears again in a consonant cluster in a later rule. For instance, the following rules are problematic:
The intent here is for
Another solution is to use the single Unicode character versions if it exists, such as