What is RegEx?
RegEx (short for regular expression) is a notation for describing search patterns in text, and is widely supported in programming languages. In Vulgar, RegEx can be used in the custom spelling option to mimic spelling idiosyncrasies of natural languages. It can also be used in the illegal combinations field to prohibit combinations of phonemes, and in custom affix rules to create more complex sound changes.
This page covers the basics of RegEx. Also try our RegEx Builder tool.
Replace a symbol at the beginning or end of a word
Replace multiple symbols
The bar symbol
Alternatively, square brackets
Remember: The bar symbol
A caret symbol inside the square brackets
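As a rough illustration of the anchors, the bar and square brackets described above, here is a minimal sketch using Python's `re` module as a stand-in for a standard RegEx engine. The word `kakik` and the replacement spellings are made up for the example, and Vulgar's own rule syntax may differ from Python's function calls.

```python
import re

# Hypothetical word, invented for this example (not taken from Vulgar).
word = "kakik"

# ^ anchors the pattern to the start of the word, $ to the end.
start_only = re.sub(r"^k", "c", word)   # only the first k changes
end_only = re.sub(r"k$", "ck", word)    # only the last k changes

# The bar means "or": match either alternative.
bar = re.sub(r"a|i", "e", word)

# Square brackets match any single character from the listed set.
brackets = re.sub(r"[ai]", "e", word)

# A caret inside square brackets negates the set: anything except a or i.
negated = re.sub(r"[^ai]", "x", word)

print(start_only, end_only, bar, brackets, negated)
```

For single characters the bar and the square brackets give the same result; the bar becomes necessary when an alternative is longer than one character.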
Replace a symbol with itself & something else
Let's say you wanted to put a
To mimic French spelling we can modify this rule to be applied to the end of a word only:
Lookahead allows you to match a pattern and then only replace it if a pattern ahead of it is also matched. Example: you want to change
Negative lookahead is the same principle, but the rule is applied only if there is no match ahead of it. It uses the
Lookbehind is the same principle as lookahead, but it checks before the main pattern. It uses
To do negative lookbehinds, replace the
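The four lookaround forms can be compared side by side in standard RegEx. The sketch below uses Python's `re` module and an invented word, `sasi`; the replacements are arbitrary and only show which occurrence of `s` each pattern selects. It assumes the usual RegEx syntax, which may not match Vulgar's rule fields character for character.

```python
import re

# Hypothetical word, invented for this example.
word = "sasi"

# Lookahead (?=...): replace s only when i follows it.
lookahead = re.sub(r"s(?=i)", "sh", word)

# Negative lookahead (?!...): replace s only when i does NOT follow it.
neg_lookahead = re.sub(r"s(?!i)", "z", word)

# Lookbehind (?<=...): replace s only when a precedes it.
lookbehind = re.sub(r"(?<=a)s", "z", word)

# Negative lookbehind (?<!...): replace s only when a does NOT precede it.
neg_lookbehind = re.sub(r"(?<!a)s", "z", word)

print(lookahead, neg_lookahead, lookbehind, neg_lookbehind)
```

In every case only the `s` is replaced; the text inside the lookaround is checked but left untouched.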
Replace with nothing
Creating a rule with nothing on the right side of the
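In standard RegEx terms, replacing with nothing means substituting an empty string. A minimal sketch in Python, with an invented word and rule (a silent word-final `h`), assuming nothing about Vulgar's own rule format:

```python
import re

# Hypothetical word, invented for this example.
word = "hahah"

# An empty replacement deletes whatever the pattern matched.
# Here only the h at the end of the word is removed.
without_final_h = re.sub(r"h$", "", word)

print(without_final_h)
```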
Replace any character
The dot symbol
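A short sketch of how the dot behaves in standard RegEx, again using Python's `re` module with invented words. One assumption worth flagging: a literal full stop (used as a syllable separator in this example) has to be escaped as `\.` in standard RegEx, otherwise it is read as "any character".

```python
import re

# The dot matches any single character, so k.t matches "kat", "kit", "kut"...
any_middle = re.sub(r"k.t", "cat", "kit")

# Every character matches a bare dot.
all_replaced = re.sub(r".", "x", "kat")

# To match a literal full stop, escape it with a backslash.
no_separators = re.sub(r"\.", "", "ka.ta")

print(any_middle, all_replaced, no_separators)
```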
Dealing with stress symbols
If you want to make spelling rules that are sensitive to stress, you first need to check the
Finally, you will need a second rule to replace stress symbols with nothing.
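The two-step approach can be sketched as follows. This is an illustration under stated assumptions, not Vulgar's actual behaviour: it assumes the word contains the IPA primary stress mark `ˈ` before the stressed syllable, and the rule itself (spelling a stressed `a` as `á`) is invented for the example.

```python
import re

# Hypothetical word with the stress mark before the stressed syllable.
word = "kaˈtana"

# Rule 1: after the stress mark, skip any onset consonants and
# spell the stressed a as á. The group \1 puts the consonants back.
marked = re.sub(r"ˈ([^aeiou]*)a", r"ˈ\1á", word)

# Rule 2: replace the stress mark with nothing so it does not
# appear in the final spelling.
spelled = re.sub(r"ˈ", "", marked)

print(marked, spelled)
```

The stress mark has to survive until every stress-sensitive rule has run, which is why the rule that removes it comes last.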
These shorthand symbols are borrowed from phonological rule notation, and can be used anywhere RegEx can be used. However, be aware they are NOT a part of standard RegEx, and some bugs may arise using them in more complex rules.
|A or C[+affricate]||Affricates|
|B or V[+back]||Back vowels|
|E or V[+front]||Front vowels|
|F or C[+fricative]||Fricatives|
|H or C[+laryngeal]||Laryngeals|
|K or C[+velar]||Velars|
|L or C[+liquid]||Liquids|
|M||Diphthongs|
|N or C[+nasal]||Nasal consonants|
|P or C[+labial]||Labials|
|Q or C[+uvular]||Uvulars|
|S or C[+stop]||Stops|
|V||Vowels, including diphthongs|
|V[+round]||Rounded vowels|
|V[-round]||Unrounded vowels|
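Because these shorthands are not standard RegEx, an ordinary RegEx engine treats a letter such as `N` as a literal capital N. One way to picture what a shorthand stands for is as a character class built from the language's phonemes. The sketch below is purely illustrative: the `CATEGORIES` inventory and the `expand` helper are hypothetical, it handles only single-letter shorthands (not forms like `C[+nasal]`), and it is not a description of how Vulgar implements them.

```python
import re

# Hypothetical phoneme inventory; a real language's categories will differ.
CATEGORIES = {
    "V": "aeiou",  # vowels
    "N": "mn",     # nasal consonants
    "S": "ptk",    # stops
}

def expand(pattern: str) -> str:
    """Rewrite each single-letter shorthand as an ordinary character class."""
    return "".join(
        f"[{CATEGORIES[ch]}]" if ch in CATEGORIES else ch
        for ch in pattern
    )

# "A nasal followed by a stop" written with shorthands, then expanded.
standard = expand("N(?=S)")

# Invented rule: spell any nasal before a stop as n.
result = re.sub(standard, "n", "kampa")

print(standard, result)
```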
Custom orthography also supports all Unicode alphabets and scripts, such as Japanese, Chinese, Cyrillic, Georgian and even Unicode Emojis.
Order of rules
The order of your custom spelling rules matters. Vulgar applies the first spelling rule to a word as a find-and-replace, then applies the next rule to the result of the previous one. This can be a problem if an IPA symbol appears again in a consonant cluster in a later rule. For instance, the following rules are problematic:
The intent here is for
Another solution is to use the single Unicode character version, if it exists, such as
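The ordering problem can be reproduced with a small sketch. The rules and words here are hypothetical, chosen only to show the effect, and the `spell` helper is an assumption about sequential find-and-replace rather than Vulgar's actual code. The last case uses the single-character ligature `ʧ` in place of the two-character cluster `tʃ`.

```python
import re

def spell(word, rules):
    """Apply each (pattern, replacement) rule in order to the running result."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word

# Problematic order: ʃ is rewritten first, so tʃ no longer exists
# by the time the second rule runs.
bad_order = [("ʃ", "sh"), ("tʃ", "ch")]

# Fix 1: put the longer cluster before the single symbol.
good_order = [("tʃ", "ch"), ("ʃ", "sh")]

# Fix 2: use the single-character ligature, so the rules cannot collide.
single_char = [("ʃ", "sh"), ("ʧ", "ch")]

broken = spell("tʃaʃ", bad_order)
reordered = spell("tʃaʃ", good_order)
ligature = spell("ʧaʃ", single_char)

print(broken, reordered, ligature)
```

With the ligature the two rules never share a symbol, so their relative order stops mattering.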