What is RegEx?

RegEx (regular expression) is a notation for describing search patterns in text, widely used in programming languages. In Vulgar, RegEx can be used in the custom spelling option to mimic the spelling idiosyncrasies of natural languages. It can also be used in the illegal combinations field to prohibit combinations of phonemes, and in custom affix rules to create more complex sound changes.

This page covers the basics of RegEx. Also try our RegEx Builder tool.

Matching at the beginning or end of a word

# signifies a word boundary. The pattern #dʒ will match at the beginning of a word, but not the middle or end.

This also works at the end of a word: dʒ# matches dʒ only when it is the last sound in the word.
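Both patterns can be tried outside Vulgar with a short sketch. It assumes Vulgar's patterns run on a standard JavaScript regex engine, and that # can be translated into a position at the start or end of the text or next to a space. `WORD_BOUNDARY` and `toJsPattern` are hypothetical names, not part of Vulgar.

```typescript
// Hypothetical translation of Vulgar's "#" into a JavaScript regex.
// "#" becomes: start of text, end of text, just after a space, or just before one.
// JavaScript's \b is not used because it treats IPA letters such as ʒ as non-letters.
const WORD_BOUNDARY = "(?:^|$|(?<=\\s)|(?=\\s))";

function toJsPattern(vulgarPattern: string): RegExp {
  // split/join swaps every "#" without String.replace's special "$" handling
  return new RegExp(vulgarPattern.split("#").join(WORD_BOUNDARY), "g");
}

const wordInitial = toJsPattern("#dʒ");
const wordFinal = toJsPattern("dʒ#");

console.log("dʒem edʒ".replace(wordInitial, "j")); // "jem edʒ"
console.log("dʒem edʒ".replace(wordFinal, "j"));   // "dʒem ej"
```

The same translation also works for # in the middle of a longer pattern, because each alternative only succeeds at a real word edge.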


Match A or B

The bar symbol | matches either the pattern on its left or the pattern on its right. For example, if you want both əʊ and ɔ to turn into o, use əʊ|ɔ > o. You can use multiple bar symbols: əʊ|ɔ|ɒ > o. If you want to match əʊ or ɔ only at the end of a word, you need to group the OR section in round brackets: (əʊ|ɔ)#. Without the brackets, the pattern əʊ|ɔ# matches ɔ at the end of a word, or əʊ anywhere in the word.
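The effect of the round brackets can be checked with an ordinary JavaScript regex, used here as an assumed stand-in for Vulgar's engine. Each test string is a single made-up word, so $ stands in for a word-final #.

```typescript
const grouped = /(əʊ|ɔ)$/g; // (əʊ|ɔ)#  : either sound, but only at the end of the word
const ungrouped = /əʊ|ɔ$/g; // əʊ|ɔ#    : əʊ anywhere, OR ɔ at the end of the word

console.log("gəʊtəʊ".replace(grouped, "o"));   // "gəʊto"
console.log("gəʊtəʊ".replace(ungrouped, "o")); // "goto"
```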

Alternatively, square brackets [ ] match any single character inside them. For instance, [ɔəʊ] > o changes any of those characters to o. The disadvantage of this method is that it treats everything inside the brackets as individual characters. For example, əʊ|ɔ > o turns /bəʊt/ into bot, whereas [əʊɔ] > o replaces ə and ʊ one at a time and gives boot.


Take-home point: if any of your patterns are more than one symbol long, don't use square brackets! Some people mistakenly think that [əʊ ɔ] works; however, it matches ə or ʊ or a space or ɔ.
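A quick comparison, using plain JavaScript regex as an assumed stand-in for Vulgar and the made-up phrase /bəʊt kɔt/, shows how the three patterns differ:

```typescript
const phrase = "bəʊt kɔt";

const alternation = phrase.replace(/əʊ|ɔ/g, "o");   // əʊ as one unit, or ɔ
const charSet = phrase.replace(/[əʊɔ]/g, "o");      // ə, ʊ and ɔ each replaced separately
const withSpace = phrase.replace(/[əʊ ɔ]/g, "o");   // the space is a member of the set too

console.log(alternation); // "bot kot"
console.log(charSet);     // "boot kot"
console.log(withSpace);   // "bootokot"
```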

A caret symbol ^ at the start of square brackets matches any character that is not inside the brackets: [^aeiou] matches anything except a, e, i, o or u.
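In plain JavaScript regex (an assumed stand-in for Vulgar's engine), a negated set matches exactly one character that is not listed, and that includes spaces and stress marks. The words are made up.

```typescript
const notVowel = /[^aeiou]/g; // one character that is not a, e, i, o or u

console.log("kata".replace(notVowel, "_"));  // "_a_a"
console.log("ˈkata".replace(notVowel, "_")); // "__a_a" (the stress mark matches too)
```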



Lookahead and lookbehind

Lookahead lets you match a pattern but replace it only if it comes before another pattern. For example, suppose you want to change k to c, but only when an a follows it. The lookahead pattern goes inside round brackets with ?= at the beginning, like (?=a), so the rule is k(?=a) > c. This turns /kat/ into cat and leaves /kit/ unchanged.


If you write ka > c instead, the a is part of the match and gets replaced too, so /kat/ becomes ct. We don't want this!
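The difference is easy to see with plain JavaScript regex, used here as an assumed stand-in for Vulgar's engine, on the made-up words /kat kit/:

```typescript
const withLookahead = /k(?=a)/g; // k(?=a) > c : a is checked but not replaced
const withoutLookahead = /ka/g;  // ka > c     : a is part of the match and is lost

console.log("kat kit".replace(withLookahead, "c"));    // "cat kit"
console.log("kat kit".replace(withoutLookahead, "c")); // "ct kit"
```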


Negative lookahead works on the same principle, but the rule applies only if the pattern in brackets does not come next. It uses ?! inside the brackets: k(?!a) > c changes k to c everywhere except before a.
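A sketch in plain JavaScript regex (assumed to behave like Vulgar here), with made-up words. Note that the end of a word also counts as "not followed by a":

```typescript
const notBeforeA = /k(?!a)/g; // k(?!a) > c

console.log("kat kit k".replace(notBeforeA, "c")); // "kat cit c"
```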



Lookbehind works on the same principle as lookahead, but checks for a pattern before the main pattern. It uses ?<= inside the brackets. For example, (?<=[ptkmns])[aeiou] replaces a vowel only when it comes right after one of the consonants p, t, k, m, n or s; the consonant itself is not replaced.


Negative lookbehind uses ?<! inside the brackets: (?<!s)k matches k except when it comes after s. (Note: lookbehind may not work in some older browsers. Try the latest version of Firefox, Chrome or Edge.)
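Both kinds of lookbehind, sketched with JavaScript regex as an assumed stand-in for Vulgar's engine. Lookbehind was added to JavaScript in ES2018, so older browsers may lack it; the patterns are built with new RegExp so that TypeScript compilers targeting older JavaScript do not reject the syntax. The words are made up.

```typescript
const afterStop = new RegExp("(?<=[ptk])[aeiou]", "g"); // vowel right after p, t or k
const notAfterS = new RegExp("(?<!s)k", "g");           // k that does not follow s

console.log("pata sia".replace(afterStop, "ə")); // "pətə sia"
console.log("ska ka".replace(notAfterS, "g"));   // "ska ga"
```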


Shorthand symbols

Vulgar uses various shorthand abbreviations for classes of phonemes, such as C for "any consonant" or V for "any vowel". These let us simplify some of the previous examples; for instance, the rule for a vowel after a consonant can be written (?<=C)V.


Here is a complete list:

Shorthand code         Category
B                      Back vowels
C                      Consonants
D                      Any IPA letter (does not match diacritics)
ᴰ (superscript D)      Any diacritic symbol
E                      Front vowels
ʟ (small capital L)    Any IPA letter (does not match diacritics)
N                      Nasal consonants
U or σ                 Syllable
V                      Vowels, including diphthongs
X                      Any phoneme
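This page doesn't say how Vulgar implements these codes internally. One plausible approach, shown only as a sketch, is to expand each code into an alternation built from the language's phoneme inventory. The inventory and the names `toAlternation` and `expandShorthand` are hypothetical. An alternation rather than square brackets matters for the reason given earlier: phonemes such as tʃ or aɪ are more than one symbol.

```typescript
// Hypothetical phoneme inventory; Vulgar would take these from your language settings.
const consonants = ["p", "t", "k", "m", "n", "s", "tʃ"];
const vowels = ["a", "e", "i", "o", "u", "aɪ"];

// Escape regex metacharacters so each phoneme is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build "(?:...|...)" with longer phonemes first, so "tʃ" or "aɪ" wins over its first symbol.
function toAlternation(phonemes: string[]): string {
  const sorted = phonemes.slice().sort((a, b) => b.length - a.length);
  return "(?:" + sorted.map(escapeRegExp).join("|") + ")";
}

// Hypothetical expansion: every C and V in the pattern becomes its alternation.
function expandShorthand(pattern: string): RegExp {
  const expanded = pattern
    .split("C").join(toAlternation(consonants))
    .split("V").join(toAlternation(vowels));
  return new RegExp(expanded, "g");
}

console.log("tʃaɪpa".replace(expandShorthand("(?<=C)V"), "ə")); // "tʃəpə"
```

The diphthong aɪ is replaced as a single unit; with a character set it would have become əɪ. A real implementation would also need to leave alone any C or V meant as a literal letter.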


Backreferences

Numbers refer back to whatever was captured inside round brackets: 1 refers to whatever the first pair of brackets matched, 2 to the second, and so on. For example, capturing a word-final vowel with (V)# lets the replacement write that vowel twice, doubling it.


Similarly, (C)(C) captures two consonants in a row, and writing them back in the order 2, 1 swaps them.


Zero refers to the entire match, whether or not it is inside brackets.
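JavaScript replacement strings, used here as an assumed stand-in, write the same references as $1, $2 and so on; JavaScript has no $0 and uses $& for the entire match instead. Vulgar's own notation is the one described above. The words are made up, and [^aeiou] is a rough stand-in for C.

```typescript
// ([aeiou])$ captures a word-final vowel as group 1; "$1$1" writes it twice.
const doubled = "kata".replace(/([aeiou])$/, "$1$1");
// Two non-vowels in a row, written back as group 2 then group 1.
const swapped = "aksa".replace(/([^aeiou])([^aeiou])/g, "$2$1");
// $& is the entire match, whether or not it was in brackets.
const wholeMatch = "ka".replace(/k/g, "$&$&");

console.log(doubled);    // "kataa"
console.log(swapped);    // "aska"
console.log(wholeMatch); // "kka"
```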


Replace with nothing

Creating a rule with nothing on the right side of the > symbol deletes whatever the left side matches; for example, [aeiou] > removes every vowel listed inside the brackets. This can mimic scripts such as Arabic and Hebrew, which usually leave short vowels unwritten.
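In JavaScript terms (an assumed stand-in for Vulgar), an empty right-hand side corresponds to replacing with an empty string. The word is made up.

```typescript
// [aeiou] >   : every matched vowel is replaced with nothing
const unvowelled = "kataba".replace(/[aeiou]/g, "");

console.log(unvowelled); // "ktb"
```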


Replace any character

The dot symbol . matches any character. The rule . > x would change every character in the word to an x. While this is probably not useful in isolation, it can be useful as part of larger patterns.
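A sketch in plain JavaScript regex, assumed to behave like Vulgar here, with made-up words. The dot matches exactly one character, so a two-symbol sound such as əʊ needs two dots.

```typescript
// . > x : every character becomes x
const everything = "kat".replace(/./g, "x");
// Inside a larger pattern: p, then any one character, then t.
const pAnyT = "pit pat pot pəʊt".replace(/p.t/g, "X");

console.log(everything); // "xxx"
console.log(pAnyT);      // "X X X pəʊt"
```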


Dealing with stress symbols

If you want spelling rules that are sensitive to stress, you first need to check the "Make spelling rules sensitive to stress symbol" option. (By default, the RegEx patterns are applied with the stress symbol already removed, so that you don't have to worry about it making your patterns more complicated.)

Let's say you want stressed a to turn into á, as in Spanish spelling. The stress symbol could come right before the a, as in ˈama, but it could also come before any number of consonants followed by the a, as in ˈdrama. To capture any number of consonants, put all the consonants in square brackets followed by the star symbol: [mdr]*. The star means "match any number of whatever comes before it, including zero". The consonants need to be wrapped in a lookbehind group (?<=) so that they are not replaced, while the a goes outside the lookbehind so that it is. Don't forget the stress symbol itself, at the start of the lookbehind: (?<=ˈ[mdr]*)a > á.


Finally, you will need a second rule to replace stress symbols with nothing.
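The two rules can be sketched in order with JavaScript regex, assumed here as a stand-in for Vulgar's engine; `spell` is a hypothetical helper. The lookbehind has a variable length because of the star; JavaScript allows this, but not every regex engine does (Python's built-in re module, for example, requires fixed-width lookbehind).

```typescript
// Rule 1: stressed a -> á. The lookbehind allows ˈ plus any run of m, d, r before the a.
const stressedA = new RegExp("(?<=ˈ[mdr]*)a", "g");
// Rule 2: delete the stress marks.
const stressMark = /ˈ/g;

// Hypothetical helper applying the two rules in order.
function spell(ipa: string): string {
  return ipa.replace(stressedA, "á").replace(stressMark, "");
}

console.log(spell("ˈama"));   // "áma"
console.log(spell("ˈdrama")); // "dráma"
```

Only the first a in each word is changed: the second a has another a between it and the stress mark, so the lookbehind fails.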

Non-Latin alphabets

Custom orthography also supports all Unicode alphabets and scripts, such as Japanese, Chinese, Cyrillic and Georgian, and even emoji.

Order of rules

The order of your custom spelling rules matters. Vulgar applies the first spelling rule to a word as a find-and-replace, then applies the next rule to the result, and so on. This causes problems when a symbol changed by an earlier rule is also part of a longer sequence, such as a consonant cluster, that a later rule is meant to match. For instance, the following rules are problematic:

ʃ > sh
tʃ > ch

The intent here is for /tʃ/ to become ch. However, in a word such as /tʃar/, the first rule finds /ʃ/ and changes the spelling to tshar. When the second rule runs, there is no /tʃ/ left to find. The easiest solution is to reverse the order of the rules:

tʃ > ch
ʃ > sh

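Another solution is to use a single Unicode character for the sound, if one exists, such as ʧ (U+02A7). Lookahead or lookbehind patterns may be another option: for example, with (?<!t)ʃ > sh as the first rule, ʃ is only changed when it does not follow t.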
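The ordering problem can be reproduced with a small sketch. It assumes rules are applied one after another as JavaScript find-and-replace operations; `Rule` and `applyRules` are hypothetical names, and the words are made up.

```typescript
type Rule = [RegExp, string];

// Hypothetical rule runner: each rule works on the output of the previous one.
function applyRules(word: string, rules: Rule[]): string {
  let result = word;
  for (const [pattern, replacement] of rules) {
    result = result.replace(pattern, replacement);
  }
  return result;
}

const wrongOrder: Rule[] = [[/ʃ/g, "sh"], [/tʃ/g, "ch"]];
const rightOrder: Rule[] = [[/tʃ/g, "ch"], [/ʃ/g, "sh"]];
// Negative lookbehind keeps the first rule away from the ʃ in tʃ.
const withLookbehind: Rule[] = [[new RegExp("(?<!t)ʃ", "g"), "sh"], [/tʃ/g, "ch"]];

console.log(applyRules("tʃar", wrongOrder));         // "tshar"
console.log(applyRules("tʃar", rightOrder));         // "char"
console.log(applyRules("tʃar ʃip", withLookbehind)); // "char ship"
```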