RegEx (regular expression) is a programming syntax for matching patterns in text. In Vulgar, RegEx can be implemented in the custom spelling option to mimic the spelling idiosyncrasies of natural languages. It can also be used in the illegal combinations field to prohibit combinations of phonemes, and in custom affix rules to create more complex sound changes.
Separate sequences of characters with spaces!
Replace a symbol at the beginning or end of a word
# signifies a word boundary. The pattern #dʒ will match dʒ at the beginning of a word, but not the middle or end.
The rule dʒ# > dge changes dʒudʒ to dʒudge. You could then combine this with a word-initial rule such as #dʒ > j to turn dʒudʒ into judge.
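The two word-boundary rules can be tried outside Vulgar with a rough Python sketch. It assumes that Vulgar's # behaves like the standard RegEx anchors ^ (start of word) and $ (end of word); the variable names are only for illustration.

```python
import re

# Assumption: "#" at the end of a pattern is modelled with "$",
# and "#" at the start of a pattern with "^".
word = "dʒudʒ"

# dʒ# > dge : replace dʒ only at the end of the word
after_final_rule = re.sub(r"dʒ$", "dge", word)

# #dʒ > j : replace dʒ only at the beginning of the word
after_initial_rule = re.sub(r"^dʒ", "j", after_final_rule)

print(after_final_rule)    # dʒudge
print(after_initial_rule)  # judge
```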
Replace multiple symbols
The bar symbol | matches whatever is on either side of it. For example, if you want both əʊ and ɔ to turn into o, use ɔ|əʊ > o. Multiple bar symbols can also be used: ɔ|əʊ|ɒ > o. If you want to change ɔ or əʊ to o only at the end of a word, you will need to group the OR section in round brackets: (ɔ|əʊ)# > o. Without the round brackets, the pattern matches ɔ anywhere in the word OR əʊ at the end of a word.
Alternatively, square brackets [] match any single character listed inside them. For instance, [ɔəʊ] > o changes any of those characters to o. The disadvantage of this method is that it treats everything inside the brackets as individual characters, so it cannot properly match a pair of phonemes: [ɔəʊ] > o turns əʊ into oo, whereas ɔ|əʊ > o turns it into o.
Remember: The bar symbol | should never be wrapped in square brackets []. A rule such as [ɔ|əʊ] says: match ɔ, |, ə or ʊ. Whereas (ɔ|əʊ) matches ɔ or əʊ.
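The difference between the bar, round brackets and square brackets can be checked with a small Python sketch. The sample words səʊ and ɔtəʊ are made up for illustration, and # is again assumed to correspond to the $ anchor.

```python
import re

# ɔ|əʊ > o : the pair əʊ is matched as a unit
with_bar = re.sub(r"ɔ|əʊ", "o", "səʊ")

# [ɔəʊ] > o : ə and ʊ are matched one character at a time
with_class = re.sub(r"[ɔəʊ]", "o", "səʊ")

# (ɔ|əʊ)# > o : either alternative, but only at the end of the word
grouped = re.sub(r"(ɔ|əʊ)$", "o", "ɔtəʊ")

# ɔ|əʊ# > o : ɔ anywhere, OR əʊ at the end of the word
ungrouped = re.sub(r"ɔ|əʊ$", "o", "ɔtəʊ")

print(with_bar, with_class)  # so soo
print(grouped, ungrouped)    # ɔto oto
```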
A caret symbol at the start of the square brackets, as in [^aeiou], matches any character that is not listed inside the brackets.
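As a quick illustration of a negated set, the following Python sketch replaces every character that is not one of the five listed vowels with a placeholder C. The placeholder and the sample word are arbitrary choices, not something Vulgar prescribes.

```python
import re

# [^aeiou] > C : anything that is NOT a, e, i, o or u becomes C
skeleton = re.sub(r"[^aeiou]", "C", "drama")

print(skeleton)  # CCaCa
```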
Replace a symbol with itself & something else
Let's say you wanted to put an x after every vowel. This means matching a pattern without replacing it. You can carry the matched pattern over to the replacing side using $&. For example, [aeiou] > $&x will find any vowel and replace it with whichever vowel was matched + x.
To mimic French spelling we can modify this rule so that it applies to the end of a word only, by adding # after the pattern: [aeiou]# > $&x.
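The "whole match" placeholder can be sketched in Python, with one caveat: Python's re module does not understand $&, so the sketch uses its equivalent \g<0>. The word bato is a made-up example, and # is assumed to map to $.

```python
import re

# [aeiou] > $&x : every vowel is kept and followed by x
# (\g<0> is Python's spelling of "the whole match", like $&)
every_vowel = re.sub(r"[aeiou]", r"\g<0>x", "bato")

# [aeiou]# > $&x : only a word-final vowel gets the x
final_vowel = re.sub(r"[aeiou]$", r"\g<0>x", "bato")

print(every_vowel)  # baxtox
print(final_vowel)  # batox
```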
Lookahead allows you to match a pattern and then only replace it if a pattern ahead of it is also matched. Example: you want to change k to c but only if there is an a after it. The lookahead pattern is placed inside round brackets with ?= at the beginning, like this: k(?=a) > c.
Negative lookahead is the same principle, but the rule is applied only if there is no match ahead of it. It uses ?! inside the round brackets: k(?!a) > c changes k to c only when it is not followed by an a.
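Lookahead and negative lookahead use the same syntax in Python's re module, so the two rules can be compared directly. The word kaki is just a convenient test case with one k before a and one k before i.

```python
import re

# k(?=a) > c : change k only when an a follows; the a itself is untouched
positive = re.sub(r"k(?=a)", "c", "kaki")

# k(?!a) > c : change k only when an a does NOT follow
negative = re.sub(r"k(?!a)", "c", "kaki")

print(positive)  # caki
print(negative)  # kaci
```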
Lookbehind is the same principle as lookahead, but it checks before the main pattern. It uses ?<= inside the round brackets. The following example combines several rules we have learned so far. In plain English it says: find an l or an m [lm], look behind for a vowel (?<=[aeiou]), look ahead for a vowel (?=[aeiou]), and then change it to itself twice $&$&. The full rule is (?<=[aeiou])[lm](?=[aeiou]) > $&$&.
To do negative lookbehinds, replace the = with ! inside the brackets, as in (?<![aeiou]). Note: negative lookbehinds are only supported in Chrome.
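The doubling rule can be reproduced in Python as a sketch, using \g<0> where Vulgar uses $&. The second rule, (?<!s)k > c, is a hypothetical negative lookbehind added only to show the syntax; the sample words are invented.

```python
import re

# (?<=[aeiou])[lm](?=[aeiou]) > $&$&
# Double an l or m that sits between two vowels.
doubled = re.sub(r"(?<=[aeiou])[lm](?=[aeiou])", r"\g<0>\g<0>", "salami")

# A word-initial l has no vowel behind it, so it is left alone.
initial_l = re.sub(r"(?<=[aeiou])[lm](?=[aeiou])", r"\g<0>\g<0>", "lama")

# (?<!s)k > c : hypothetical rule, change k unless an s comes right before it
not_after_s = re.sub(r"(?<!s)k", "c", "skak")

print(doubled)      # sallammi
print(initial_l)    # lamma
print(not_after_s)  # skac
```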
Replace with nothing
Creating a rule with nothing on the right side of the > symbol simply deletes whatever the left side matches. For example, [aeiou] > replaces every vowel listed inside the brackets with nothing. Arabic and Hebrew are examples of languages whose everyday spelling leaves most vowels unwritten.
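In Python terms, a rule with an empty right-hand side corresponds to substituting the empty string. A minimal sketch with an invented sample word:

```python
import re

# [aeiou] >  : delete every listed vowel
consonants_only = re.sub(r"[aeiou]", "", "salam")

print(consonants_only)  # slm
```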
Replace any character
The dot symbol . matches any character. The rule . > x would change every character in the word to an x. While this is probably not useful in isolation, it can be useful as part of larger patterns.
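A short Python sketch of the dot, first on its own and then inside a slightly larger pattern. The second rule (.# > nothing, which drops the last character of a word) is a hypothetical example, with # assumed to map to $.

```python
import re

# . > x : every character becomes x
all_x = re.sub(r".", "x", "drama")

# .# >  : hypothetical rule that deletes whatever character ends the word
trimmed = re.sub(r".$", "", "drama")

print(all_x)    # xxxxx
print(trimmed)  # dram
```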
Dealing with stress symbols
If you want to make spelling rules that are sensitive to stress, you first need to check the Make spelling rules sensitive to stress symbol option. (The default setting is to apply the RegEx patterns with the stress symbol already removed, so that you don't have to worry about the stress symbol making your patterns more complicated.) Let's say you want stressed a to turn into á, like in Spanish spelling. The stress symbol could come right before an a, as in ˈama; however, it could also come before any number of consonants and then an a, as in ˈdrama. To capture any number of consonants you can put all consonants in square brackets and use the star symbol after them: [mdr]*. The star symbol means match any number of whatever is before it, including zero instances. The consonants will need to be wrapped inside a lookbehind group (?<=) so that you don't replace them, and the a will go outside the lookbehind so that you do replace it. And don't forget the stress symbol: (?<=ˈ[mdr]*). Putting it together gives (?<=ˈ[mdr]*)a > á.
Finally, you will need a second rule to replace stress symbols with nothing, for example ˈ > with an empty right-hand side.
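The two stress rules can be approximated in Python, with one substitution worth flagging: Python's re module rejects variable-width lookbehinds such as (?<=ˈ[mdr]*), so this sketch captures the stress mark and consonants in a group and writes them back with \1 instead. The function name and the consonant set [mdr] are illustrative only.

```python
import re

def spell_stressed_a(word: str) -> str:
    """Hypothetical helper: accent a stressed a, then drop stress marks."""
    # Rule 1: stress mark, zero or more of the listed consonants, then a.
    # The captured prefix (group 1) is kept; only the a becomes á.
    accented = re.sub(r"(ˈ[mdr]*)a", r"\1á", word)
    # Rule 2: replace the stress symbol with nothing.
    return accented.replace("ˈ", "")

print(spell_stressed_a("ˈama"))    # áma
print(spell_stressed_a("ˈdrama"))  # dráma
print(spell_stressed_a("aˈma"))    # amá
```

Because the capture group is written back unchanged, the result should match what the lookbehind version produces for these words.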
These shorthand symbols are borrowed from phonological rule notation, and can be used anywhere RegEx can be used. However, be aware that they are NOT a part of standard RegEx, and some bugs may arise when using them in more complex rules.
A or C[+affricate]
B or V[+back]
E or V[+front]
F or C[+fricative]
H or C[+laryngeal]
K or C[+velar]
L or C[+liquid]
N or C[+nasal]
P or C[+labial]
Q or C[+uvular]
S or C[+stop]
Vowels, including diphthongs
Custom orthography also supports all Unicode alphabets and scripts, such as Japanese, Chinese, Cyrillic, Georgian and even Unicode Emojis.
Order of rules
The order of your custom spelling rules matters. Vulgar applies the first spelling rule to a word, then applies the next rule to the result of the first, and so on. This can be a problem if an IPA symbol appears again as part of a consonant cluster in a later rule. For instance, the following rules are problematic:
ʃ > sh
tʃ > ch
The intent here is for /tʃ/ to change to ch. However, in a word such as /tʃar/, the first rule will find /ʃ/ and change the orthography to tshar. Then when it moves to the second rule it will fail to find /tʃ/. The easiest solution is to reverse the order of the rules:
tʃ > ch
ʃ > sh
Another solution is to use the single Unicode character version of the phoneme if one exists, such as ʧ. Lookahead or lookbehind patterns may be another option.
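The effect of rule order can be simulated with a few lines of Python. This is only a sketch of sequential find-and-replace, not Vulgar's actual code; apply_rules and the rule lists are hypothetical names, and the third list shows one possible lookbehind-based variant.

```python
import re

def apply_rules(word, rules):
    """Apply each (pattern, replacement) pair in order to the running result."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word

problematic = [("ʃ", "sh"), ("tʃ", "ch")]          # ʃ is consumed before tʃ is tried
reordered = [("tʃ", "ch"), ("ʃ", "sh")]            # longer cluster first
with_lookbehind = [(r"(?<!t)ʃ", "sh"), ("tʃ", "ch")]  # skip ʃ when it follows t

print(apply_rules("tʃar", problematic))      # tshar
print(apply_rules("tʃar", reordered))        # char
print(apply_rules("tʃar", with_lookbehind))  # char
print(apply_rules("ʃatʃ", reordered))        # shach
```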