Course Data

Course name: High-Performance Computing with Python
Course length: 5 days
Remote: Yes
Open course: Yes
In-house: Yes
Course ID: HPC

Course Dates

Location    Date
Leipzig     October 10 - 14, 2022
Remote      October 10 - 14, 2022

Combining Topics

For company courses, you can assemble a training that combines topics from different courses.

Course Topics Overview as PDF

You can download our flyer. It has an overview of all our course topics.

High-Performance Computing with Python

Intended Audience

Programmers, scientists and engineers with basic to intermediate knowledge of Python. This course can be combined with introductory courses (see Recommended Module Combinations) to achieve appropriate Python skills.


Python is a great programming language. It has the reputation of being slow for computational tasks. While this can be true for pure Python programs, there are many tools and libraries that can help you get very close to the speed of programs written in C or other compiled languages. Because you still get most of the advantages of Python, you can spend more time experimenting with different algorithms and may end up with a faster program than if you had spent the same time in a lower-level language.

This course gives an overview of some tools and libraries for fast computations in Python. It covers the most common tools and helps you get started with HPC in Python.

Course Content

Day 1: Profiling, Algorithms and Parallel Computation


One of the most important steps toward a fast program is profiling to find out where your program spends its time. There are several tools for Python that help you quantify the run times of your program. The course gives an introduction to this topic.
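As a minimal sketch of what profiling looks like in practice, the following example uses cProfile and pstats from the standard library. The function slow_sum_of_squares and the helper profile_call are made up for illustration and are not taken from the course material.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive: builds a list first, then sums it."""
    squares = []
    for i in range(n):
        squares.append(i * i)
    return sum(squares)


def profile_call(func, *args):
    """Run func(*args) under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and print only the five most expensive entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


if __name__ == "__main__":
    value, report = profile_call(slow_sum_of_squares, 100_000)
    print(value)
    print(report)
```

The report lists the number of calls and the time per function, which shows where optimization effort is likely to pay off.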


Often the best speed improvements can be achieved by finding a better algorithm. Python offers several data structures that come with efficient algorithms. The course gives an overview of common Python data structures and their run-time complexity.
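To illustrate how much the choice of data structure can matter, here is a small sketch that times a membership test on a list (linear scan) against the same test on a set (hash lookup). The function name compare_membership and the sizes are arbitrary choices for this example.

```python
import timeit


def compare_membership(n=100_000, repeats=200):
    """Time `needle in container` for a list and a set holding the same items."""
    as_list = list(range(n))
    as_set = set(as_list)
    needle = n - 1  # last element: the worst case for the linear list scan

    list_seconds = timeit.timeit(lambda: needle in as_list, number=repeats)
    set_seconds = timeit.timeit(lambda: needle in as_set, number=repeats)
    return list_seconds, set_seconds


if __name__ == "__main__":
    list_seconds, set_seconds = compare_membership()
    print(f"list: {list_seconds:.6f} s   set: {set_seconds:.6f} s")
```

The absolute numbers depend on the machine, but the set lookup should be faster by several orders of magnitude for containers of this size.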


Python comes with the multiprocessing module, which lets you distribute calculations over several processes and in this way parallelize applications. Its API is closely modeled after that of the threading module. You will learn how to use multiprocessing.
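A minimal sketch of distributing work with multiprocessing.Pool might look as follows. The functions square and parallel_squares are illustrative names only, and the pool size of two processes is an arbitrary assumption.

```python
import multiprocessing


def square(x):
    """Work done in a worker process; must be defined at module level."""
    return x * x


def parallel_squares(numbers, processes=2):
    """Distribute square() over a pool of worker processes."""
    with multiprocessing.Pool(processes=processes) as pool:
        # map() splits the input into chunks and preserves the input order.
        return pool.map(square, numbers)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block on import.
    print(parallel_squares(range(10)))
```

For a function as cheap as square, the overhead of starting processes and transferring data outweighs any gain. Process-based parallelism pays off when each task does substantial computation.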


There are many Python libraries for distributed programming on clusters or networked computers. Pyro is a very mature solution. The course introduces version 4 with some examples.

Day 2: Beyond Pure Python

Python is a very good glue language for connecting existing systems. There is a long tradition of writing modules in other languages. There are also some newer developments that increase the usefulness of Python for HPC. The course presents some of them.


Numba is a new module that is still undergoing considerable changes. It compiles pure Python code into machine code via LLVM. This means that many pure Python algorithms can run as fast as if they had been written in C. The course shows how Numba works and presents some implementations of algorithms.
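The following sketch shows the general idea with Numba's njit decorator applied to a plain Python loop. The function sum_of_squares is a made-up example. The fallback decorator is an assumption added here so the snippet also runs, uncompiled, where Numba is not installed.

```python
try:
    from numba import njit  # compiles the decorated function via LLVM
except ImportError:
    # Assumption for this sketch: without Numba, fall back to running the
    # same function as ordinary, uncompiled Python.
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    """A pure Python loop, the kind of code that Numba can compile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    # With Numba, the first call includes compilation time; later calls
    # reuse the compiled machine code.
    print(sum_of_squares(1_000_000))
```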


PyPy is a different implementation of Python that has a just-in-time (JIT) compiler. It is a fully Python 2.7 compliant implementation with several very innovative features. The course introduces working with PyPy.


Wrapping existing Fortran programs becomes much simpler with f2py. You will learn how to use f2py to wrap Fortran programs. Furthermore, the course covers accessing common memory in Fortran modules and calling Python functions from wrapped Fortran.

Day 3: NumPy for Fast Computations

The NumPy library is the de facto standard for working with arrays. You will get a solid introduction to NumPy and learn some of its more advanced features.

Introduction to NumPy

  • Array construction and array properties

  • Data types

  • Slicing and broadcasting

  • Universal functions
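The topics listed above can be sketched in a few lines. The array names and values are arbitrary and only serve as an illustration.

```python
import numpy as np

# Array construction and array properties
a = np.arange(12, dtype=np.float64).reshape(3, 4)
print(a.shape, a.ndim, a.dtype, a.size)

# Data types: astype() converts and returns a new, independent array
as_int = a.astype(np.int32)

# Slicing returns a view on the same memory, not a copy
second_column = a[:, 1]
second_column[0] = 100.0       # also changes a[0, 1]

# Broadcasting: the (4,) array of column means is stretched over all 3 rows
column_means = a.mean(axis=0)
centered = a - column_means

# Universal functions operate element-wise on whole arrays
roots = np.sqrt(np.abs(centered))
print(roots)
```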

Advanced NumPy

  • Masked arrays

  • Customizing error handling

  • Testing NumPy programs
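As a rough sketch of these three topics, the example below uses np.ma for masked arrays, np.errstate for error handling, and np.testing for comparisons. The sentinel value -999.0 and the variable names are assumptions chosen for this example.

```python
import numpy as np

# Masked arrays: exclude invalid measurements from computations
readings = np.ma.masked_values([1.0, -999.0, 3.0, 5.0], -999.0)
mean_valid = readings.mean()   # averages only 1.0, 3.0 and 5.0

# Customized error handling: raise an exception on division by zero
# instead of emitting a warning and returning inf
numerator = np.array([1.0, 2.0])
denominator = np.array([0.0, 1.0])
try:
    with np.errstate(divide="raise"):
        numerator / denominator
    division_failed = False
except FloatingPointError:
    division_failed = True

# Testing: compare floating-point arrays with a tolerance, not with ==
np.testing.assert_allclose(np.array([0.1]) * 3, np.array([0.3]), rtol=1e-12)

print(mean_valid, division_failed)
```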

NumPy and C

  • A look into the implementation of ndarrays

  • Working with ndarrays from C
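Part of the ndarray implementation is visible from Python before any C code is written. The sketch below inspects the strides and flags of an array. It is only an illustration of the memory layout, not the course's own example.

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)

# strides: number of bytes to step to reach the next element along each axis.
# A row holds 4 elements of 8 bytes, so the next row is 32 bytes away.
print(a.strides)

# Transposing swaps shape and strides; no data is copied.
t = a.T
print(t.strides)
print(np.shares_memory(a, t))

# The same buffer is C-ordered when seen through a, Fortran-ordered through t.
print(a.flags["C_CONTIGUOUS"], t.flags["F_CONTIGUOUS"])
```

C code that works with ndarrays relies on exactly this information (data pointer, shape, strides, and element type) to locate elements in memory.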


Numexpr can evaluate numerical expressions such as 5 * a + 3 * b - 2 * c. Evaluating such expressions with Numexpr, especially complex ones, is faster and uses less memory than computing them with NumPy.

Numexpr can run evaluation in parallel using multiple cores. It also supports the Math Kernel Library (MKL) for even more speed improvements.
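A small sketch of the expression mentioned above, evaluated once with NumPy and once with numexpr.evaluate, could look like this. The array size and the guarded import are assumptions of this example. If Numexpr is not installed, only the NumPy result is computed.

```python
import numpy as np

try:
    import numexpr as ne
except ImportError:  # assumption: numexpr may not be installed
    ne = None

rng = np.random.default_rng(0)
a, b, c = (rng.random(1_000_000) for _ in range(3))

# NumPy evaluates the expression step by step and allocates a temporary
# array for each intermediate result.
with_numpy = 5 * a + 3 * b - 2 * c

if ne is not None:
    # Numexpr receives the whole expression as a string and evaluates it
    # in one pass; the operands are passed explicitly via local_dict.
    with_numexpr = ne.evaluate(
        "5 * a + 3 * b - 2 * c", local_dict={"a": a, "b": b, "c": c}
    )
    print(np.allclose(with_numpy, with_numexpr))
```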

Algorithms and SciPy

Examples of algorithms implemented in NumPy, together with ready-made routines from SciPy, show how to solve common numerical problems.

Day 4: Cython for Speed

My first Cython extension

  • using pyximport to quickly (re-)build extension modules

  • using cython.inline() to compile code at runtime

  • building extension modules with distutils

Speeding up Python code with Cython

  • fast access to Python’s builtin types

  • fast looping over Python iterables and C types

  • string processing

  • fast arithmetic

  • incrementally optimizing Cython code

  • multi-threading outside of the GIL (Global Interpreter Lock)

Interfacing with external C code

  • calling into external C libraries

  • building against C libraries

  • writing Python wrapper APIs

  • calling C functions across extension module boundaries

Day 5: Cython and NumPy

Use of Python’s buffer interface from Cython code

  • directly accessing data buffers of other Python extensions

  • retrieving metadata about the buffer layout

  • setting up efficient memory views on external buffers

Implementing fast Cython loops over NumPy arrays

  • looping over NumPy exported buffers

  • implementing a simple image processing algorithm

  • using “fused types” (simple templating) to implement an algorithm once and run it efficiently on different C data types

Use of parallel loops to make use of multiple processing cores

  • building modules with OpenMP

  • processing data in parallel

  • speeding up an existing loop using OpenMP threads

Note: the part on parallel processing requires a C compiler that supports OpenMP, e.g. gcc starting with 4.2, preferably 4.4 or later. It should be readily available in recent installations of both Linux and Mac OS X. Note that recent versions of Xcode use the “clang” compiler, which does not support OpenMP. On these systems, please install gcc separately and make sure it can be used from your CPython installation. Users of Microsoft Windows must install the C compiler that was used to build their Python installation, e.g. VS2008 Express or MinGW for Python 2.7.

Case studies

The participants are encouraged to send in short code examples from their own experience that they would like to see running faster by using Cython. Based on general interest and practicality, one or two of these examples will be examined as a case study. These examples must be available to the teacher at least one week before the course, and must be short but complete executable examples, including sufficient input data for benchmarking. Please be aware that example code that requires a substantial amount of explanation or background knowledge about a specific application domain will not be accepted.


The participants can follow all steps directly on their computers. There are exercises at the end of each unit providing ample opportunity to apply the freshly learned knowledge.


We use our online programming system, which contains all the software needed. There is no need to install any additional software. A modern internet browser and a decent internet connection are enough.

Hardware for Open In-Person Trainings

For open trainings at our teaching center you can use your own laptop. Alternatively, we provide teaching computers. Please let us know in your registration form if you need one.

Course Material

Every participant receives comprehensive materials in PDF format that cover the whole course content as well as all source code.

How to contact us:
Python Academy & Co. KG
Zur Schule 20
04158 Leipzig / Germany
Tel:+49 341 260 3370
Fax:+49 341 520 4495