Python
Python is an interactive programming language. Python 3.7 or later is recommended for this course. Python is relatively easy to learn but powerful, and it is probably the most popular programming language for data science. The materials offered in this course should provide a sufficient introduction for getting started with elementary text processing and basic analyses and visualizations of datasets. Good reasons for choosing Python include the following.
- Python has suitable basic operations on strings that represent text.
- Current versions of Python support the Unicode standard and use UTF-8 by default when encoding text strings.
- Many available packages provide Python modules for different text processing tasks, such as tokenization, word frequencies, lemmatization, document classification, word vectors and more.
- Python has good support for working with tabular data (dataframes), including sorting, counting, selecting, visualization, import and export, etc.
These aspects are exploited in this course. Several advanced topics, such as objects and classes, are not covered. If you want to know in detail how something in Python works, you can look it up in the online documentation of the Python language.
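As a minimal sketch of the first two points in the list above, the following example uses only built-in string methods and UTF-8 encoding. The sample sentence and the variable names are made up for illustration.

```python
# Made-up sample sentence containing non-ASCII letters.
text = "Blåbærsyltetøy er godt på brød."

# Basic string operations: case conversion, splitting, searching, replacing.
lowered = text.lower()
words = lowered.rstrip(".").split()
has_bread = "brød" in lowered
replaced = text.replace("godt", "veldig godt")

# A Python 3 string is a sequence of Unicode characters,
# so len() counts characters, not bytes.
n_chars = len(text)

# Encoding to UTF-8 yields bytes; each non-ASCII letter here takes two bytes.
utf8_bytes = text.encode("utf-8")
n_bytes = len(utf8_bytes)

print(words)
print(has_bread, replaced)
print(n_chars, n_bytes)
```

The byte count is larger than the character count because the letters å, æ and ø each occupy two bytes in UTF-8.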
Jupyter notebooks and platforms
We will mainly use Python in the form of Jupyter notebooks, which combine programs and explanatory text.
The notebooks linked from this course are provided on the online platform Google Colaboratory (Colab), which offers free editing and running of notebooks. Using Colab requires a Google account. This free service has some limitations on runtime and memory use, but the notebooks here should run fine.
There are other external services for editing and running notebooks, such as Kaggle, Binder or Deepnote, so you can download a Colab notebook, import it into another platform, and run it there. These free services also have some usage limitations. If you need more power, you can apply to NIRD.
Notebooks and Python code can also be downloaded, edited and run locally in an integrated development environment (IDE) such as the following:
- Visual Studio Code is an IDE for many languages including Python; it supports Jupyter notebooks.
- Jupyter.org provides several IDE versions to run notebooks in a web browser, including JupyterLab, Voilà and others.
- Spyder is an IDE for Python code; it does not currently support notebooks.
- Anaconda is an environment and package manager which provides Spyder and other applications.
Currently (spring 2025) the easiest option is probably Google Colab, but be aware that this is a free service with resource limitations. If you require technical assistance with the installation of software on your own computer, please contact your IT support team. If you are a student or staff member at UiB, use hjelp.uib.no.
Packages with modules
A good reason for programming in Python is that many modules are available for natural language processing (NLP); they provide extra functionality that makes the task easier. Many modules are available in Colab. If you install Python on your own machine, the following are often included in Python distributions:
- re, for regular expressions
- numpy, for arrays and numerical processing
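To give an idea of what the re module can do for text processing, here is a minimal sketch of tokenization and word counting using only the standard library. The sample text is made up, and the pattern is a rough approximation rather than a full tokenizer.

```python
import re
from collections import Counter

# Made-up sample text for illustration.
text = "The cat sat on the mat. The mat was flat."

# A simple tokenizer: lowercase the text and extract runs of word characters.
# This is a rough approximation; real tokenizers handle many more cases.
tokens = re.findall(r"\w+", text.lower())

# Count how often each token occurs.
frequencies = Counter(tokens)

print(tokens)
print(frequencies.most_common(2))  # [('the', 3), ('mat', 2)]
```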
We will also use the following modules, which are not always included in Python distributions but are available in Google Colab:
- pandas, for dataframes
- nltk, the natural language toolkit
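As a hedged illustration of working with dataframes, the sketch below builds a small pandas dataframe and applies sorting, selecting, counting and export. The data, column names and tags are invented for the example, and it assumes pandas is installed.

```python
import pandas as pd

# Made-up word counts for illustration.
df = pd.DataFrame(
    {
        "word": ["the", "cat", "mat", "sat", "on"],
        "pos": ["DET", "NOUN", "NOUN", "VERB", "ADP"],
        "count": [3, 1, 2, 1, 1],
    }
)

# Sorting: most frequent words first.
by_count = df.sort_values("count", ascending=False)

# Selecting: keep only the rows tagged as nouns.
nouns = df[df["pos"] == "NOUN"]

# Counting: number of rows per part-of-speech tag.
pos_counts = df["pos"].value_counts()

# Export to CSV text (pass a filename instead to write a file).
csv_text = df.to_csv(index=False)

print(by_count)
print(nouns)
print(pos_counts)
```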
There are several more specialized libraries, such as scikit-learn (imported as sklearn) for machine learning and lingpy for historical linguistics, but these will not be used in this course.
If you run Python locally, packages with Python modules can be installed locally, for instance with pip, or with an integrated package manager such as Anaconda. Again, if you want to install things on your own machine and you require technical assistance, please contact your IT support team, not the teacher.
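Before installing anything, it can be useful to check which of the modules mentioned above are already available in your local Python. The following is a minimal sketch using the standard importlib module; the helper name is_available is hypothetical, and the pip command in the final comment is one common way to install a package, not the only one.

```python
import importlib.util

# Module names mentioned in this course; adjust the list as needed.
module_names = ["re", "numpy", "pandas", "nltk"]


def is_available(name):
    """Return True if a module with this name can be found for import."""
    return importlib.util.find_spec(name) is not None


for name in module_names:
    status = "available" if is_available(name) else "missing"
    print(f"{name}: {status}")

# A missing package can typically be installed from a terminal, e.g.:
#   python -m pip install pandas
```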