Vulgar: Language generator

Vulgar is capable of generating 100 quadrillion unique and usable conlangs. How does it achieve this? Using pseudo-randomness!

Vulgar begins with a random seed number of 17 digits*. This number is then run through a formula that generates many thousands of other random numbers. These numbers are called pseudo-random because they only appear random: they are fully determined by an exact mathematical formula, yet every value between 0 and 1 is equally likely to be produced, and there is no pattern obvious to the human eye.

These numbers drive tens of thousands of decisions. Each one is compared against pre-defined thresholds to decide which phonemes to select when building words and which grammar rules to generate.

*A 17-digit seed allows 10 to the power of 17 = 100 quadrillion possible values.
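The seed-and-threshold idea can be sketched in a few lines of Python. This is a minimal illustration, not Vulgar's actual code: Python's built-in Mersenne Twister generator stands in for Vulgar's formula, and the phonemes and their thresholds in PHONEME_THRESHOLDS are hypothetical values chosen for the example.

```python
import random

# Hypothetical inclusion thresholds: the chance that each candidate phoneme
# ends up in a language's inventory. These are NOT Vulgar's real values.
PHONEME_THRESHOLDS = {"p": 0.9, "t": 0.95, "k": 0.9, "ʂ": 0.2, "y": 0.15, "ɔ": 0.4}

# A 17-digit seed (leading zeros allowed) has 10**17 = 100 quadrillion possible values.
NUM_SEEDS = 10 ** 17


def choose_phonemes(seed: int) -> list[str]:
    """Pick a phoneme inventory deterministically from a seed (illustrative sketch)."""
    # random.Random is Python's Mersenne Twister generator; it stands in for
    # whatever formula Vulgar actually uses. The same seed always yields the
    # same sequence of numbers, and therefore the same decisions.
    rng = random.Random(seed)
    inventory = []
    for phoneme, threshold in PHONEME_THRESHOLDS.items():
        # rng.random() returns a float uniformly distributed in [0.0, 1.0),
        # so each phoneme is kept with probability equal to its threshold.
        if rng.random() < threshold:
            inventory.append(phoneme)
    return inventory


seed = random.randrange(NUM_SEEDS)
print(seed, choose_phonemes(seed))
```

Because every decision is derived from the seed, re-running with the same seed reproduces the same language, which is what makes a seed a compact identifier for a whole conlang.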

What does it generate?

The Pro version of Vulgar generates about 4000 unique words and matches them to a list of the 4000 most common English words. Shorter, more common English words are more likely to be matched to a shorter conlang word. However, the program is more than a one-to-one mapping of conlang words to English words: Vulgar also assigns polysemy (more than one related meaning for a single word), inspired by examples from real-world languages. For example:

- There is a 50% chance in every generated language that the word for ‘tongue’ also means ‘language’
- There is a 60% chance in every generated language that the word for ‘white’ is also the word for ‘blank’
- There is a 10% chance in every generated language that the word for ‘air’ is also the word for ‘wind’
- There is a 30% chance in every generated language that the word for ‘girl’ is also the word for ‘girlfriend’
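Each polysemy rule amounts to one weighted coin flip per language. The sketch below applies the four rules listed above to a lexicon. It assumes (this is not confirmed by Vulgar's documentation) that when a rule fires, the second meaning simply reuses the first meaning's word; the conlang words in the example lexicon are invented.

```python
import random

# (meaning, extra meaning, probability), taken from the examples above.
POLYSEMY_RULES = [
    ("tongue", "language", 0.50),
    ("white", "blank", 0.60),
    ("air", "wind", 0.10),
    ("girl", "girlfriend", 0.30),
]


def apply_polysemy(lexicon: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Return a copy of an English -> conlang lexicon in which some meanings share a word."""
    result = dict(lexicon)
    for meaning, extra, probability in POLYSEMY_RULES:
        # One uniform draw per rule: the rule fires with the listed probability.
        if meaning in result and rng.random() < probability:
            # Assumption: the extra meaning reuses the first meaning's word
            # instead of keeping a separate word of its own.
            result[extra] = result[meaning]
    return result


# Hypothetical conlang words, invented for this example.
lexicon = {"tongue": "zel", "language": "moru", "white": "ka", "blank": "tiv",
           "air": "hoa", "wind": "fes", "girl": "ani", "girlfriend": "beso"}
print(apply_polysemy(lexicon, random.Random(7)))
```

Across many generated languages, about half will end up with ‘language’ sharing the word for ‘tongue’, matching the 50% figure.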

There are about 150 polyseme possibilities in the current version of Vulgar.

Vulgar also creates related derived words with affix rules. For instance:

utu /ˈʏtyː/ adj. violent
utua /ˈʏtyːa/ n. violence (suffix -a changes adjectives into nouns)

pson /pʂon/ n. paint; v. paint
psonru /ˈpʂonru/ n. painter (suffix -ru turns a noun into a word for the person who does it)

ootsui /ˈɔotsui/ v. divide
ootsuilb /ˈɔotsuilb/ n. division (suffix -lb changes verbs into nouns)
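The three affix rules above can be modelled as a small table of (input part of speech, suffix, output part of speech). This is a hypothetical sketch: the rule names and the Entry type are invented for illustration, only the spelling is derived (pronunciation changes such as stress are ignored), and in a generated language the suffixes themselves would be chosen randomly rather than fixed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    form: str   # conlang spelling
    pos: str    # part of speech: "adj", "n" or "v"
    gloss: str  # English meaning


# Hypothetical rule table mirroring the examples above:
# rule name -> (required input part of speech, suffix, output part of speech).
RULES = {
    "adj_to_noun": ("adj", "a", "n"),   # utu 'violent' -> utua 'violence'
    "agent_noun": ("n", "ru", "n"),     # pson 'paint' -> psonru 'painter'
    "verb_to_noun": ("v", "lb", "n"),   # ootsui 'divide' -> ootsuilb 'division'
}


def derive(entry: Entry, rule: str, new_gloss: str) -> Entry:
    """Attach a rule's suffix to a root after checking its part of speech."""
    required_pos, suffix, result_pos = RULES[rule]
    if entry.pos != required_pos:
        raise ValueError(f"{rule} needs a {required_pos}, got {entry.pos}")
    return Entry(entry.form + suffix, result_pos, new_gloss)


utua = derive(Entry("utu", "adj", "violent"), "adj_to_noun", "violence")
psonru = derive(Entry("pson", "n", "paint"), "agent_noun", "painter")
ootsuilb = derive(Entry("ootsui", "v", "divide"), "verb_to_noun", "division")
print(utua, psonru, ootsuilb)
```

The English gloss is passed in explicitly because English derivations are irregular (‘violent’ to ‘violence’, ‘divide’ to ‘division’), while the conlang side stays perfectly regular.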

There are also about 150 derived words in the current version of Vulgar.

What is the default 4000-word list?

Vulgar's default word list comes from an English word frequency list by linguist Mark Davies at Wordfrequency.info. Davies’ research groups inflected English words under their uninflected dictionary forms: for example, ‘is’, ‘was’, and ‘were’ are counted as ‘be’, and ‘dogs’ is counted as ‘dog’.
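Counting by dictionary form (lemma) rather than by surface form can be illustrated with a toy example. This is only a sketch of the general idea: the hand-written LEMMAS table and the sample sentence are invented, whereas Davies’ actual lists are built from a large annotated corpus.

```python
from collections import Counter

# A tiny hand-made lemma table; real lemmatisation uses a full dictionary
# or part-of-speech tagging. Words not listed count as their own lemma.
LEMMAS = {"is": "be", "was": "be", "were": "be", "be": "be",
          "dogs": "dog", "dog": "dog"}


def lemma_frequencies(tokens: list[str]) -> Counter:
    """Count tokens under their dictionary form, as a lemma-based frequency list does."""
    return Counter(LEMMAS.get(token, token) for token in tokens)


tokens = "the dog was here and the dogs were there and it is a dog".split()
freqs = lemma_frequencies(tokens)
print(freqs.most_common(3))
```

Here ‘was’, ‘were’ and ‘is’ all add to the count for ‘be’, so a lemma list ranks ‘be’ once instead of splitting its frequency across three entries.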

Because this research comes from a corpus of contemporary American English, some artistic licence has been taken to tailor the vocabulary towards a more ‘fantasy fiction’ genre. Highly culture-specific words have been removed (e.g. ‘Catholic’, ‘Republican’), as have some highly modern terms (e.g. ‘internet’).

Both the 2000-word and 4000-word versions cover the Swadesh list, which is often used in the conlang community as a starting point for basic vocabulary.
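Checking that a word list covers the Swadesh list is a set difference. The sketch below uses only a small excerpt of Swadesh meanings (the full list exists in 100-item and 207-item versions), and the word list being checked is a hypothetical one with two basic meanings deliberately removed.

```python
# A small excerpt of Swadesh-list meanings, not the full list.
SWADESH_EXCERPT = {"I", "you", "we", "this", "that", "who", "what", "not",
                   "all", "many", "one", "two", "big", "small", "woman",
                   "man", "fish", "bird", "dog", "tree", "water", "tongue"}


def missing_swadesh(word_list: set[str], swadesh: set[str] = SWADESH_EXCERPT) -> list[str]:
    """Return, in sorted order, the Swadesh meanings a word list does not cover."""
    return sorted(swadesh - word_list)


# Hypothetical word list that lacks two basic meanings but has extra words.
word_list = (SWADESH_EXCERPT - {"bird", "tongue"}) | {"cure", "cellar"}
print(missing_swadesh(word_list))
```

An empty result means every Swadesh meaning in the reference set has an entry.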

Fun fact: The 2000th word is ‘cure’ and the 4000th word is ‘cellar’!

Grammar

The grammar output of Vulgar draws on statistics from real-world languages. Much of this data comes from the excellent research at the World Atlas of Language Structures (WALS).
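One way to turn typological statistics into grammar choices is a weighted random pick, so that common patterns appear often and rare ones occasionally. The sketch below does this for basic word order. The weights in WORD_ORDER_WEIGHTS are placeholders that only follow the well-known cross-linguistic ranking SOV > SVO > VSO > VOS > OVS > OSV; they are not actual WALS counts, and this is not necessarily how Vulgar weights its choices.

```python
import random

# Placeholder weights following the usual frequency ranking of basic word orders.
# These are NOT WALS figures; substitute real counts if available.
WORD_ORDER_WEIGHTS = {"SOV": 45, "SVO": 40, "VSO": 9, "VOS": 3, "OVS": 2, "OSV": 1}


def choose_word_order(rng: random.Random) -> str:
    """Pick a basic word order with probability proportional to its weight."""
    orders = list(WORD_ORDER_WEIGHTS)
    weights = list(WORD_ORDER_WEIGHTS.values())
    # Random.choices performs weighted sampling with replacement; k=1 draws one item.
    return rng.choices(orders, weights=weights, k=1)[0]


rng = random.Random(42)
print([choose_word_order(rng) for _ in range(10)])
```

With these weights, roughly 45% of generated languages would be SOV and only about 1% OSV, so the generated set mirrors the skew of the input statistics.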