What is RegEx?

RegEx (regular expression) is a programming syntax for matching patterns in text. In Vulgar, RegEx can be used in the custom orthography setting to create more complex spelling idiosyncrasies for your language.

Matching a character at the beginning of a word

RegEx uses the caret symbol ^ to signify the beginning of a line of text. The pattern ^d will match a 'd' phoneme only at the beginning of a word. An orthography rule such as ^d > D would change 'dog' into 'Dog', while 'god' would remain unchanged.
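
As a rough illustration outside Vulgar, the same pattern can be tried with Python's re module. This is only a sketch: the helper name apply_rule is hypothetical, and Vulgar's own engine may differ in detail.

```python
import re

def apply_rule(pattern: str, replacement: str, word: str) -> str:
    """Apply one 'pattern > replacement' rule to a single word (hypothetical helper)."""
    return re.sub(pattern, replacement, word)

# ^d only matches a 'd' at the very start of the word.
print(apply_rule(r"^d", "D", "dog"))  # Dog
print(apply_rule(r"^d", "D", "god"))  # god
```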

Matching a character at the end of a word

The dollar sign symbol $ is used to match characters at the end of a line. A rule such as dʒ$ > ge would change 'dʒ' at the end of a word into 'ge', but not at the beginning or middle of a word.
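
The end-of-word anchor can be checked the same way in Python. This is a sketch under the assumption that each word is processed on its own; the function name end_rule and the sample words are made up for illustration.

```python
import re

def end_rule(word: str) -> str:
    """Sketch of the rule 'dʒ$ > ge': only a word-final dʒ is rewritten."""
    return re.sub(r"dʒ$", "ge", word)

print(end_rule("bædʒ"))  # bæge   (dʒ at the end is replaced)
print(end_rule("dʒæb"))  # dʒæb   (dʒ at the beginning is left alone)
print(end_rule("ædʒə"))  # ædʒə   (dʒ in the middle is left alone)
```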

Matching a set of characters

Square brackets [] are used to match any single character inside them. For instance, [ɐɑ] > a means both 'ɐ' and 'ɑ' turn into 'a'. You can put as many characters as you want inside the square brackets, but each one is treated as an individual character, so a two-character sequence such as 'əʊ' cannot be matched as a unit this way. If you want both 'ɔ' and 'əʊ' to turn into 'o', you can use the vertical line symbol | to form an 'or' expression, for example ɔ|əʊ > o. Alternatively, you can simply list the rules separately:

ɔ > o
əʊ > o
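
The difference between a character set and an 'or' expression can be seen in a short Python sketch. The function names and sample words are hypothetical, and Python's re module is only standing in for Vulgar's rule engine.

```python
import re

def merge_with_set(word: str) -> str:
    """[ɐɑ] > a : each listed character is matched on its own."""
    return re.sub(r"[ɐɑ]", "a", word)

def merge_with_alternation(word: str) -> str:
    """ɔ|əʊ > o : either 'ɔ' or the two-character sequence 'əʊ'."""
    return re.sub(r"ɔ|əʊ", "o", word)

def merge_with_wrong_set(word: str) -> str:
    """[ɔəʊ] > o : treats ə and ʊ as separate characters, which is not intended."""
    return re.sub(r"[ɔəʊ]", "o", word)

print(merge_with_set("bɐtɑ"))          # bata
print(merge_with_alternation("gəʊ"))   # go
print(merge_with_wrong_set("gəʊ"))     # goo  (ə and ʊ each became 'o')
```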

Order of rules

The order of your custom orthography rules matters. Vulgar will apply the first orthography rule to every word in your language, replacing every match it can find, before moving on to the next rule. This can be a problem if an IPA symbol appears again in a later rule. For instance, the following rules are problematic:

ʃ > sh
tʃ > ch

In a word such as /tʃar/, the first rule will find ʃ and change the orthography to tshar. Then, when it moves to the second rule, it will fail to find the tʃ that you intended to be replaced with 'ch'. A solution to this is to flip the rules:

tʃ > ch
ʃ > sh
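
The effect of rule order can be reproduced with a small Python sketch that applies each rule to the whole word before moving on to the next, as described above. The helper apply_rules is hypothetical and is not Vulgar's actual implementation.

```python
import re

def apply_rules(word: str, rules: list[tuple[str, str]]) -> str:
    """Apply each (pattern, replacement) rule in order, one full pass per rule."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word

problematic = [("ʃ", "sh"), ("tʃ", "ch")]
flipped = [("tʃ", "ch"), ("ʃ", "sh")]

print(apply_rules("tʃar", problematic))  # tshar  (tʃ is gone before rule 2 runs)
print(apply_rules("tʃar", flipped))      # char
print(apply_rules("ʃar", flipped))       # shar
```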

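Alternatively, you could use negative matching syntax, which is signified by a caret inside square brackets. The rule [^t]ʃ > sh is meant to translate to: replace ʃ with sh, unless there's a t before it. Be aware that in standard RegEx the set [^t] itself matches one character, so the character before ʃ is replaced along with it, and a ʃ at the very start of a word is not matched; flipping the rules is usually the safer fix.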
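
How a negated set behaves can be checked with Python's re module, used here as a stand-in for a standard RegEx engine. The sketch also shows a negative lookbehind, (?<!t), which tests the preceding character without including it in the match. Whether Vulgar's custom orthography accepts lookbehind is an assumption and is not confirmed here; the function names and sample words are hypothetical.

```python
import re

def negated_set(word: str) -> str:
    """[^t]ʃ > sh : the non-t character is part of the match and gets replaced too."""
    return re.sub(r"[^t]ʃ", "sh", word)

def lookbehind(word: str) -> str:
    """(?<!t)ʃ > sh : only ʃ is replaced, provided no t comes directly before it."""
    return re.sub(r"(?<!t)ʃ", "sh", word)

print(negated_set("tʃar"))  # tʃar  (t before ʃ, so no match)
print(negated_set("aʃa"))   # sha   (the 'a' before ʃ was replaced as well)
print(negated_set("ʃar"))   # ʃar   (nothing before ʃ, so no match)

print(lookbehind("tʃar"))   # tʃar
print(lookbehind("aʃa"))    # asha
print(lookbehind("ʃar"))    # shar
```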

Other tricks

Replace with nothing

Creating a rule with nothing on the right side of the > symbol will simply delete whatever the left side of the rule matches. Thus, the rule [aeiou] > will replace all vowels inside the brackets with nothing. All orthography can be deleted using the dot symbol . which, in RegEx, signifies any character. The rule . > translates to: take any character and replace it with nothing.
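
Deleting with an empty right-hand side corresponds to replacing with an empty string. A minimal Python sketch, with hypothetical function names and a made-up sample word:

```python
import re

def drop_vowels(word: str) -> str:
    """[aeiou] > (nothing): every listed vowel is removed."""
    return re.sub(r"[aeiou]", "", word)

def drop_everything(word: str) -> str:
    """. > (nothing): every character is removed."""
    return re.sub(r".", "", word)

print(drop_vowels("banana"))            # bnn
print(repr(drop_everything("banana")))  # ''
```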

Non-Latin alphabets

Custom orthography also supports non-Latin alphabets and scripts, such as Japanese, Chinese, Cyrillic and Georgian symbols, just to name a few.

Create katakana orthography:

ka > カ
ki > キ
ku > ク
ke > ケ
ko > コ
sa > サ
si > シ
su > ス
se > セ
so > ソ
ta > タ
ti > チ
tu > ツ
te > テ
to > ト
na > ナ
ni > ニ
nu > ヌ
ne > ネ
no > ノ
ha > ハ
hi > ヒ
hu > フ
he > ヘ
ho > ホ
ma > マ
mi > ミ
mu > ム
me > メ
mo > モ
ja > ヤ
ju > ユ
jo > ヨ
ra > ラ
ri > リ
ru > ル
re > レ
ro > ロ
wa > ワ
wi > ヰ
we > ヱ
wo > ヲ
a > ア
i > イ
u > ウ
e > エ
o > オ
n > ン
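
The list above puts consonant-plus-vowel rules before the single-vowel rules and the final n rule, which matters because each rule is applied to the whole word before the next one runs. A Python sketch with a small subset of the rules shows the effect; the helper to_katakana and the sample words are hypothetical, and Python's re module only approximates Vulgar's behaviour.

```python
import re

# A subset of the katakana rules, kept in the same relative order as the full list.
RULES = [
    ("ka", "カ"), ("ki", "キ"), ("sa", "サ"),
    ("na", "ナ"), ("ni", "ニ"), ("no", "ノ"),
    ("ho", "ホ"), ("mo", "モ"),
    ("a", "ア"), ("i", "イ"), ("o", "オ"),
    ("n", "ン"),
]

def to_katakana(word: str, rules=RULES) -> str:
    """Apply each rule in order, one full pass over the word per rule."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word

print(to_katakana("kimono"))  # キモノ
print(to_katakana("sakana"))  # サカナ
print(to_katakana("nihon"))   # ニホン

# With the order reversed, 'n' and the bare vowels are consumed first,
# so the syllable rules no longer find anything to match.
print(to_katakana("nihon", list(reversed(RULES))))  # ンイhオン
```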

Created and designed in Sydney, Australia.
Vulgarlang.com © 2017.