Projects
After you have acquired basic competencies and skills in Python programming, it may be a good exercise to tackle slightly more complicated language-related problems. Here are some possible small projects for (possibly collaborative) programming in Python.
- Spelling Bee solver. Read a large word list from the web. Find all words of at least four letters containing one obligatory letter and six optional letters.
- Wordle helper. Read a large word list from the web. Search the list for words that fit a pattern and that contain certain letters but do not contain some other letters.
- Anagrams. Given a word, find all its anagrams in a large word list.
- Alphabetized words: In a large word list, find all words of at least five letters whose letters occur in alphabetical order, such as accent and glossy. Variation: find all words whose letters occur in reverse alphabetical order, such as zonked and toffee.
- Morse code. Read this table of Morse code equivalents into a dataframe. Convert it to a dict for encoding and another for decoding. Make a program that transliterates strings to and from Morse code.
- Poetry generator. Compute n-grams from some existing poetry. Then generate poems from random new combinations of the n-grams. Trigrams should work well.
- Phonological rules. Implement some phonological rules for a language. Some examples are forward and backward assimilation, final devoicing, intervocalic lenition, epenthesis, etc.
- Naive translation. Make a dict with translations for a number of words. Provide word-by-word translations of sentences with those words. A slightly better but more challenging approach uses translation of n-grams.
- Weather forecast. Make a weather forecast for a place, in plain English or another language, based on weather data obtained for that place through a weather service such as the Met API (may be complicated).
- Braille. Make a simple Braille encoder and decoder based on the mapping between the strings a and b below. A straightforward transliteration can be done really easily with str.translate (see String operations). Standard Braille is however a bit more complicated because there are also characters for frequent words and there are contractions, so you can try to handle some of those features, if you like a challenge.
a = " A1B'K2L@CIF/MSP\"E3H9O6R^DJG>NTQ,*5<-U8V.%[$+X!&;:4\\0Z7(_?W]#Y)="
b = "⠀⠁⠂⠃⠄⠅⠆⠇⠈⠉⠊⠋⠌⠍⠎⠏⠐⠑⠒⠓⠔⠕⠖⠗⠘⠙⠚⠛⠜⠝⠞⠟⠠⠡⠢⠣⠤⠥⠦⠧⠨⠩⠪⠫⠬⠭⠮⠯⠰⠱⠲⠳⠴⠵⠶⠷⠸⠹⠺⠻⠼⠽⠾⠿"
- Using the following list of numerals for ten languages, make a function that computes the distance between two languages as the average Levenshtein distance between corresponding numerals. Then make a 10 x 10 distance matrix with the distance for every pair of languages. Finally present the matrix as a heatmap. NB. The use of orthography has limited value. A more advanced project would take into account phonetic similarity between words.
# adapted (with small corrections) from Folgert Karsdorp
numerals = [
    ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"],
    ["een", "twee", "drie", "vier", "vijf", "zes", "zeven", "acht", "negen", "tien"],
    ["ien", "twa", "trije", "fjouwer", "fiif", "seis", "sân", "acht", "njoggen", "tsien"],
    ["eins", "zwei", "drei", "vier", "fünf", "sechs", "sieben", "acht", "neun", "zehn"],
    ["én", "to", "tre", "fire", "fem", "seks", "sju", "åtte", "ni", "ti"],
    ["én", "to", "tre", "fire", "fem", "seks", "syv", "otte", "ni", "ti"],
    ["ett", "två", "tre", "fyra", "fem", "sex", "sju", "åtta", "nio", "tio"],
    ["uno", "dos", "tres", "cuatro", "cinco", "seis", "siete", "ocho", "nueve", "diez"],
    ["un", "deux", "trois", "quatre", "cinq", "six", "sept", "huit", "neuf", "dix"],
    ["uno", "due", "tre", "quattro", "cinque", "sei", "sette", "otto", "nove", "dieci"],
]
languages = ['English', 'Dutch', 'Frisian', 'German', 'Norwegian', 'Danish', 'Swedish', 'Spanish', 'French', 'Italian']
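As a possible starting point for the Spelling Bee solver and the Wordle helper above, both can be written as filters over a word list. The sketch below is only one way to do it: the function names, the parameters and the tiny inline word list are made up for illustration (a real project would read a large word list from the web), and the Wordle pattern is assumed to be a regular expression with "." for unknown positions.

```python
import re


def spelling_bee(words, center, others):
    """Words of at least four letters that contain `center` and use only the seven given letters."""
    allowed = set(center + others)
    return [w for w in words
            if len(w) >= 4 and center in w and set(w) <= allowed]


def wordle_candidates(words, pattern, required="", excluded=""):
    """Words matching `pattern` that contain all `required` and none of the `excluded` letters."""
    regex = re.compile(pattern)
    return [w for w in words
            if regex.fullmatch(w)
            and all(ch in w for ch in required)
            and not any(ch in w for ch in excluded)]


# Hypothetical stand-in for a large word list
words = ["crane", "crate", "chase", "cable", "lace", "clean", "acne", "ace"]

print(spelling_bee(words, "a", "celnrt"))
# ['crane', 'crate', 'lace', 'clean', 'acne']
print(wordle_candidates(words, "c.a.e", required="r", excluded="t"))
# ['crane']
```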
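For the anagram and alphabetized-word projects above, sorting the letters of a word does most of the work: two words are anagrams if their sorted letters are equal, and a word is alphabetized if it equals its own sorted form. The following is a minimal sketch under those assumptions; the function names and the short word list are hypothetical.

```python
from collections import defaultdict


def anagram_index(words):
    """Group words by their sorted letters, so that anagrams share a key."""
    index = defaultdict(list)
    for word in words:
        index["".join(sorted(word))].append(word)
    return index


def anagrams(word, index):
    """All words in the index with the same letters as `word`, except the word itself."""
    return [w for w in index.get("".join(sorted(word)), []) if w != word]


def is_alphabetized(word, reverse=False):
    """True if the letters of `word` are in (reverse) alphabetical order."""
    return list(word) == sorted(word, reverse=reverse)


# Hypothetical stand-in for a large word list
words = ["listen", "silent", "enlist", "tinsel", "accent",
         "glossy", "zonked", "toffee", "python"]
index = anagram_index(words)

print(anagrams("listen", index))
# ['silent', 'enlist', 'tinsel']
print([w for w in words if len(w) >= 5 and is_alphabetized(w)])
# ['accent', 'glossy']
print([w for w in words if len(w) >= 5 and is_alphabetized(w, reverse=True)])
# ['zonked', 'toffee']
```

Building the index once keeps each anagram lookup cheap, which matters with a large word list.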
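The poetry generator above could start from a trigram model that maps each pair of consecutive words to the words observed after that pair. This is only a rough sketch: the function names are invented, tokenization is a plain whitespace split, and the one-line toy text stands in for a real poetry corpus.

```python
import random
from collections import defaultdict


def trigram_model(tokens):
    """Map each pair of consecutive words to the list of words that follow it."""
    model = defaultdict(list)
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        model[(w1, w2)].append(w3)
    return model


def generate(model, length=12, seed=None):
    """Start from a random word pair and keep sampling a follower of the last two words."""
    rng = random.Random(seed)
    w1, w2 = rng.choice(sorted(model))
    output = [w1, w2]
    while len(output) < length:
        followers = model.get((w1, w2))
        if not followers:  # no trigram starts with this pair: stop early
            break
        w3 = rng.choice(followers)
        output.append(w3)
        w1, w2 = w2, w3
    return " ".join(output)


# Toy text as a stand-in for a real poetry corpus
text = "the rose is red the violet is blue the honey is sweet and so are you"
tokens = text.split()
model = trigram_model(tokens)
print(generate(model, length=12, seed=1))
```

With such a small text the output mostly repeats the source; a larger corpus gives more room for new combinations.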
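For the Braille project above, the straightforward transliteration with str.translate might look like the sketch below. It reuses the strings a and b from the project description; the function names are made up, and upper-casing the input is an assumption made here because a contains only upper-case letters. Contractions and word signs are not handled.

```python
a = " A1B'K2L@CIF/MSP\"E3H9O6R^DJG>NTQ,*5<-U8V.%[$+X!&;:4\\0Z7(_?W]#Y)="
b = "⠀⠁⠂⠃⠄⠅⠆⠇⠈⠉⠊⠋⠌⠍⠎⠏⠐⠑⠒⠓⠔⠕⠖⠗⠘⠙⠚⠛⠜⠝⠞⠟⠠⠡⠢⠣⠤⠥⠦⠧⠨⠩⠪⠫⠬⠭⠮⠯⠰⠱⠲⠳⠴⠵⠶⠷⠸⠹⠺⠻⠼⠽⠾⠿"

# Translation tables in both directions; str.maketrans needs strings of equal length
to_braille = str.maketrans(a, b)
from_braille = str.maketrans(b, a)


def braille_encode(text):
    """Replace each character found in a by the Braille cell at the same position in b."""
    # Assumption: lower-case input is upper-cased first, since a has no lower-case letters
    return text.upper().translate(to_braille)


def braille_decode(cells):
    """Replace each Braille cell found in b by the character at the same position in a."""
    return cells.translate(from_braille)


encoded = braille_encode("Hello world")
print(encoded)
print(braille_decode(encoded))  # HELLO WORLD
```

Characters that do not occur in the table are passed through unchanged by str.translate.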
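For the language distance project above, one possible outline is given below. It implements Levenshtein distance directly with dynamic programming instead of relying on a library, and the function names are hypothetical. To keep the example short it uses only the first three rows of numerals (English, Dutch, Frisian); the full list can be passed in the same way.

```python
def levenshtein(s, t):
    """Minimum number of insertions, deletions and substitutions that turn s into t."""
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def language_distance(words_a, words_b):
    """Average Levenshtein distance between corresponding numerals."""
    distances = [levenshtein(x, y) for x, y in zip(words_a, words_b)]
    return sum(distances) / len(distances)


def distance_matrix(numerals):
    """Square matrix (list of lists) with the distance for every pair of languages."""
    return [[language_distance(x, y) for y in numerals] for x in numerals]


# First three rows of the numerals list, to keep the example short
sample_languages = ["English", "Dutch", "Frisian"]
sample_numerals = [
    ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"],
    ["een", "twee", "drie", "vier", "vijf", "zes", "zeven", "acht", "negen", "tien"],
    ["ien", "twa", "trije", "fjouwer", "fiif", "seis", "sân", "acht", "njoggen", "tsien"],
]

matrix = distance_matrix(sample_numerals)
for name, row in zip(sample_languages, matrix):
    print(f"{name:10}", " ".join(f"{d:5.2f}" for d in row))
```

The resulting list of lists can then be handed to a plotting function, for example matplotlib's imshow, to draw the heatmap.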