VÜlgÅr. A language generator

What is RegEx?

RegEx (regular expression) is a syntax for matching patterns in text. In Vulgar, RegEx can be used in the custom spelling option to mimic the spelling idiosyncrasies of natural languages. It can also be used in the illegal combinations field to prohibit combinations of phonemes in the word generation process.

Replace a symbol at the beginning of a word only

RegEx uses the caret symbol ^ to signify the beginning of a line of text. The pattern ^dʒ will match /dʒ/ at the beginning of a word, but not in the middle or at the end. Thus, the rule ^dʒ > j would change /dʒudʒ/ to judʒ.


Replace a symbol at the end of a word only

The dollar sign $ is used to match characters at the end of a line. The rule dʒ$ > dge changes /dʒudʒ/ to dʒudge. You could then combine this with the first rule to turn /dʒudʒ/ into judge.
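The two anchored rules above can be tried outside Vulgar with a short TypeScript sketch. It assumes that a rule "pattern > replacement" behaves like a global JavaScript regex replacement; how Vulgar applies rules internally is not stated here, and applyAnchoredRule is a hypothetical helper name.

```typescript
// Assumption: a Vulgar rule "pattern > replacement" acts like a global
// JavaScript regex replacement. applyAnchoredRule is a hypothetical helper.
function applyAnchoredRule(word: string, pattern: string, replacement: string): string {
  return word.replace(new RegExp(pattern, "g"), replacement);
}

// ^ anchors the match to the start of the word.
const startOnly = applyAnchoredRule("dʒudʒ", "^dʒ", "j"); // "judʒ"

// $ anchors the match to the end of the word.
const endOnly = applyAnchoredRule("dʒudʒ", "dʒ$", "dge"); // "dʒudge"

// Applying the second rule to the output of the first gives "judge".
const startThenEnd = applyAnchoredRule(startOnly, "dʒ$", "dge"); // "judge"

console.log(startOnly, endOnly, startThenEnd);
```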


Replace multiple symbols

Square brackets [] match any single character listed inside them. For instance, [ɐɑ] > a means both /ɐ/ and /ɑ/ turn into a. You can put as many characters as you want inside the square brackets, but each one is treated as an individual character. To match multi-character patterns, use the vertical line symbol | to form an 'or' expression. For example, if you want both /ɔ/ and /əʊ/ to turn into o: ɔ|əʊ > o.
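The difference between square brackets and the vertical line can be sketched in TypeScript. This assumes rules behave like global JavaScript regex replacements; applyClassRule is a hypothetical helper, and the input words are made-up samples.

```typescript
// Assumption: rules behave like global JavaScript regex replacements.
// applyClassRule is a hypothetical helper; the words are invented samples.
function applyClassRule(word: string, pattern: string, replacement: string): string {
  return word.replace(new RegExp(pattern, "g"), replacement);
}

// [ɐɑ] matches either single character.
const classResult = applyClassRule("pɐtɑ", "[ɐɑ]", "a"); // "pata"

// ɔ|əʊ matches ɔ, or the two-character sequence əʊ.
const alternationResult = applyClassRule("bɔtəʊ", "ɔ|əʊ", "o"); // "boto"

// Inside brackets, ə and ʊ are separate characters, so each is replaced.
const bracketedDiphthong = applyClassRule("bəʊ", "[əʊ]", "o"); // "boo"

console.log(classResult, alternationResult, bracketedDiphthong);
```

The last call shows why a diphthong such as /əʊ/ belongs in an 'or' expression rather than in square brackets.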


Negative matching

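A caret symbol placed first inside the square brackets, as in [^aeiou], matches any character except those listed inside the brackets. Here, [^aeiou] matches any character that is not one of the five vowel letters.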
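A small TypeScript sketch of negative matching, assuming rules behave like global JavaScript regex replacements. The helper applyNegatedRule, the sample word, and the replacement letter C are all illustrative choices, not taken from Vulgar.

```typescript
// Assumption: rules behave like global JavaScript regex replacements.
// applyNegatedRule is a hypothetical helper; "banana" is an arbitrary sample.
function applyNegatedRule(word: string, pattern: string, replacement: string): string {
  return word.replace(new RegExp(pattern, "g"), replacement);
}

// [^aeiou] matches every character that is NOT one of the listed vowels,
// so each consonant becomes C while the vowels are left alone.
const nonVowelsMarked = applyNegatedRule("banana", "[^aeiou]", "C"); // "CaCaCa"

console.log(nonVowelsMarked);
```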


Replace a symbol with itself & something else

Say you wanted to put an x after every vowel. This means matching multiple symbols and adding to them rather than replacing them. You can retain the matched pattern by using $&. For example, [aeiou] > $&x will find any vowel and put an x after it.


To mimic French spelling, we can modify this rule so that it applies at the end of a word only, by adding $ after the brackets: [aeiou]$ > $&x.
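The $& token has the same meaning in JavaScript replacement strings, so both versions of the rule can be sketched in TypeScript. This assumes rules behave like global JavaScript regex replacements; applyKeepRule is a hypothetical helper and the words are made-up samples.

```typescript
// Assumption: rules behave like global JavaScript regex replacements.
// applyKeepRule is a hypothetical helper; the words are invented samples.
function applyKeepRule(word: string, pattern: string, replacement: string): string {
  return word.replace(new RegExp(pattern, "g"), replacement);
}

// $& stands for the matched text, so every vowel is kept and followed by x.
const xAfterEveryVowel = applyKeepRule("bato", "[aeiou]", "$&x"); // "baxtox"

// With $ after the brackets, only a word-final vowel is affected.
const xAfterFinalVowel = applyKeepRule("bato", "[aeiou]$", "$&x"); // "batox"

// A word ending in a consonant is left unchanged by the anchored rule.
const consonantFinal = applyKeepRule("bat", "[aeiou]$", "$&x"); // "bat"

console.log(xAfterEveryVowel, xAfterFinalVowel, consonantFinal);
```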


Lookahead

Lookahead allows you to match a pattern and then look ahead of that match to see if it matches another condition, but only replace the first pattern. Example: you want to change /k/ to c but only if there is an /a/, /o/ or /u/ after it. So you want to find /k/, look ahead for /a/, /o/ or /u/, then go back and change the /k/ only. The lookahead pattern is placed inside brackets with ?= at the beginning, like this: (?=[aou]). The full rule is k(?=[aou]) > c.


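Negative lookahead follows the same principle, but the rule is applied only if the lookahead pattern does not match. It uses ?! inside the brackets, as in k(?![aou]), which matches /k/ only when it is not followed by /a/, /o/ or /u/.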
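Both kinds of lookahead can be compared side by side in a TypeScript sketch. It assumes rules behave like global JavaScript regex replacements; applyLookaheadRule is a hypothetical helper, the words are made-up samples, and the replacement letter q in the negative case is an arbitrary choice for illustration.

```typescript
// Assumption: rules behave like global JavaScript regex replacements.
// applyLookaheadRule is a hypothetical helper; the words are invented samples.
function applyLookaheadRule(word: string, pattern: string, replacement: string): string {
  return word.replace(new RegExp(pattern, "g"), replacement);
}

// Positive lookahead: k becomes c only when a, o or u follows.
// The following vowel is checked but not consumed or replaced.
const positiveResult = applyLookaheadRule("kakiko", "k(?=[aou])", "c"); // "cakico"

// Negative lookahead: k is replaced only when a, o or u does NOT follow.
const negativeResult = applyLookaheadRule("kakiko", "k(?![aou])", "q"); // "kaqiko"

// A word-final k also satisfies the negative lookahead, since nothing follows it.
const wordFinalResult = applyLookaheadRule("kak", "k(?![aou])", "q"); // "kaq"

console.log(positiveResult, negativeResult, wordFinalResult);
```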


Lookbehind

Lookbehind follows the same principle as lookahead, but looks behind the match. It uses ?<= inside the brackets. The following example combines several of the rules covered so far. In plain English it says: find an /l/ or an /m/ [lm], look behind for a vowel (?<=[aeiou]) and ahead for a vowel (?=[aeiou]), and then change it to itself twice $&$&. Written as a rule: (?<=[aeiou])[lm](?=[aeiou]) > $&$&.
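The doubling rule can be sketched in TypeScript as follows. This assumes rules behave like global JavaScript regex replacements and that the runtime supports lookbehind (ES2018 or later); applyDoublingRule is a hypothetical helper and the words are made-up samples.

```typescript
// Assumption: rules behave like global JavaScript regex replacements, and the
// runtime supports lookbehind (ES2018+). applyDoublingRule is a hypothetical helper.
function applyDoublingRule(word: string): string {
  // Lookbehind for a vowel, match l or m, lookahead for a vowel.
  const betweenVowels = new RegExp("(?<=[aeiou])[lm](?=[aeiou])", "g");
  // $&$& writes the matched consonant twice.
  return word.replace(betweenVowels, "$&$&");
}

// Both l and m sit between vowels, so both are doubled.
const doubledBoth = applyDoublingRule("kalama"); // "kallamma"

// Initial l has no vowel before it and final l has no vowel after it,
// so only the m is doubled.
const doubledMiddleOnly = applyDoublingRule("lamal"); // "lammal"

console.log(doubledBoth, doubledMiddleOnly);
```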


Note: negative lookbehind is not supported.

Replace with nothing

Creating a rule with nothing on the right side of the > symbol deletes whatever the left side matches; [aeiou] > replaces every vowel inside the brackets with nothing. Arabic and Hebrew are examples of languages whose everyday spelling leaves most vowels unwritten.


Replace everything

All orthography can be replaced using the dot symbol . which matches any character. The rule . > x means: take any character and replace it with x.
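An empty right-hand side and the dot symbol can both be sketched in TypeScript. This assumes rules behave like global JavaScript regex replacements; applySimpleRule is a hypothetical helper and "kitab" is a made-up sample word.

```typescript
// Assumption: rules behave like global JavaScript regex replacements.
// applySimpleRule is a hypothetical helper; "kitab" is an invented sample.
function applySimpleRule(word: string, pattern: string, replacement: string): string {
  return word.replace(new RegExp(pattern, "g"), replacement);
}

// An empty replacement deletes every match: the vowels disappear.
const vowelsDeleted = applySimpleRule("kitab", "[aeiou]", ""); // "ktb"

// The dot matches any character, so every letter becomes x.
const everythingReplaced = applySimpleRule("kitab", ".", "x"); // "xxxxx"

console.log(vowelsDeleted, everythingReplaced);
```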


Dealing with stress symbols

If you want to make spelling rules that are sensitive to stress, you first need to check the "Make spelling rules sensitive to stress symbol" option. (By default, the RegEx patterns are applied with the stress symbol already removed, because stress symbols just get in the way if you don't care about stress.) Let's say you want stressed /a/ to turn into á, as in Spanish spelling. The stress symbol could come right before the /a/, as in /ˈama/, but it could also come before any number of consonants, as in /ˈdrama/. To capture any number of consonants, put all the consonants in square brackets and add the star symbol after them: [mdr]*. The star symbol means match any number of whatever is before it, including zero instances. The consonants need to be wrapped inside a lookbehind group (?<=) so that you don't replace them, and the /a/ goes outside the lookbehind so that you do replace it. Don't forget the stress symbol either: (?<=ˈ[mdr]*). The full rule is (?<=ˈ[mdr]*)a > á. Finally, you will need a second rule, with ˈ on the left and nothing on the right, to replace stress symbols with nothing.
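The two stress rules can be sketched in TypeScript. This assumes rules behave like global JavaScript regex replacements applied one after the other, and that the runtime supports variable-length lookbehind (JavaScript does from ES2018; some other regex engines do not). The helper spellStressedA is hypothetical, and /daˈma/ is a made-up word added to show stress on the second syllable.

```typescript
// Assumption: rules behave like global JavaScript regex replacements applied
// in sequence, on words that still contain the stress symbol ˈ.
// spellStressedA is a hypothetical helper.
function spellStressedA(word: string): string {
  // Rule 1: (?<=ˈ[mdr]*)a > á
  // The lookbehind requires ˈ plus zero or more consonants right before the a.
  const accented = word.replace(new RegExp("(?<=ˈ[mdr]*)a", "g"), "á");
  // Rule 2: replace the stress symbol with nothing.
  return accented.replace(new RegExp("ˈ", "g"), "");
}

const stressedNoOnset = spellStressedA("ˈama");    // "áma"
const stressedCluster = spellStressedA("ˈdrama");  // "dráma"
const stressedSecond = spellStressedA("daˈma");    // "damá"

console.log(stressedNoOnset, stressedCluster, stressedSecond);
```

Only the /a/ in the stressed syllable changes, because an intervening vowel breaks the ˈ[mdr]* sequence that the lookbehind requires.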


Non-Latin alphabets

Custom orthography also supports all Unicode alphabets and scripts, such as Japanese, Chinese, Cyrillic, Georgian and even Unicode Emojis.

Create katakana orthography:

ka > カ
ki > キ
ku > ク
ke > ケ
ko > コ
sa > サ
si > シ
su > ス
se > セ
so > ソ
ta > タ
ti > チ
tu > ツ
te > テ
to > ト
na > ナ
ni > ニ
nu > ヌ
ne > ネ
no > ノ
ha > ハ
hi > ヒ
hu > フ
he > ヘ
ho > ホ
ma > マ
mi > ミ
mu > ム
me > メ
mo > モ
ja > ヤ
ju > ユ
jo > ヨ
ra > ラ
ri > リ
ru > ル
re > レ
ro > ロ
wa > ワ
wi > ヰ
we > ヱ
wo > ヲ
a > ア
i > イ
u > ウ
e > エ
o > オ
n > ン
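A short TypeScript sketch of how a rule list like this one could be applied. It uses only a subset of the rules above, kept in the same relative order, and assumes each rule is a global find-and-replace applied in sequence to the output of the previous rule. The names katakanaSubset and toKatakana are hypothetical.

```typescript
// Assumption: each rule is a global find-and-replace, applied in list order
// to the output of the previous rule. katakanaSubset and toKatakana are
// hypothetical names; the rules are a subset of the list above.
const katakanaSubset: ReadonlyArray<readonly [string, string]> = [
  ["ka", "カ"],
  ["ku", "ク"],
  ["sa", "サ"],
  ["na", "ナ"],
  ["ra", "ラ"],
  ["a", "ア"], // bare vowels come after the consonant + vowel rules
  ["n", "ン"], // syllable-final n comes last
];

function toKatakana(word: string): string {
  let result = word;
  katakanaSubset.forEach(([pattern, replacement]) => {
    result = result.replace(new RegExp(pattern, "g"), replacement);
  });
  return result;
}

const sakura = toKatakana("sakura"); // "サクラ"
const kana = toKatakana("kana");     // "カナ"
const anka = toKatakana("anka");     // "アンカ"

console.log(sakura, kana, anka);
```

Placing the single-vowel rules and n after the consonant plus vowel rules matters: if a > ア ran first, ka would already have become kア before the ka rule was tried.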

Order of rules

The order of your custom spelling rules matters. Vulgar applies the first spelling rule to a word as a find-and-replace, then applies the next rule to the result. This can be a problem if an IPA symbol that one rule replaces also appears inside a consonant cluster matched by a later rule. For instance, the following rules are problematic:

ʃ > sh
tʃ > ch

The intent is for /tʃ/ to change to ch. However, in a word such as /tʃar/, the first rule will find /ʃ/ and change the orthography to tshar. When Vulgar then moves to the second rule, it will fail to find /tʃ/. One solution is to flip the rules:

tʃ > ch
ʃ > sh
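The effect of rule order can be reproduced with a TypeScript sketch. It assumes each rule is a global find-and-replace applied to the output of the previous rule, as described above; applyRulesInOrder and the SpellingRule type are hypothetical names.

```typescript
// Assumption: each rule is a global find-and-replace applied to the output
// of the previous rule. SpellingRule and applyRulesInOrder are hypothetical.
type SpellingRule = readonly [pattern: string, replacement: string];

function applyRulesInOrder(word: string, rules: ReadonlyArray<SpellingRule>): string {
  let result = word;
  rules.forEach(([pattern, replacement]) => {
    result = result.replace(new RegExp(pattern, "g"), replacement);
  });
  return result;
}

// Problematic order: ʃ is rewritten first, so tʃ no longer exists for rule 2.
const problematicOrder = applyRulesInOrder("tʃar", [["ʃ", "sh"], ["tʃ", "ch"]]); // "tshar"

// Flipped order: the longer pattern is handled before the shorter one.
const flippedOrder = applyRulesInOrder("tʃar", [["tʃ", "ch"], ["ʃ", "sh"]]); // "char"

// A plain ʃ is still spelled sh under the flipped order.
const plainSh = applyRulesInOrder("ʃar", [["tʃ", "ch"], ["ʃ", "sh"]]); // "shar"

console.log(problematicOrder, flippedOrder, plainSh);
```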

Created and designed in Sydney, Australia.
Vulgarlang.com © 2018.