- 1. Process vs Thread
- 2. Process
- 3. Thread
- 4. Python’s GIL problem
- 5. When to use threads vs processes?
- 6. Reference
Intro to Threads and Processes in Python
A process is an instance of a running program (e.g. a Jupyter notebook server or a Python interpreter). A process can spawn threads, which are lightweight units of execution, to handle subtasks like reading keystrokes, loading HTML pages, or saving files. Threads live inside a process and share its memory space.
Example: Microsoft Word
When you open Word, you create a process. When you start typing, the process spawns threads: one to read keystrokes, another to display text, one to autosave your file, and yet another to highlight spelling mistakes. By spawning multiple threads, Word takes advantage of idle CPU time (for example, while waiting for keystrokes or for files to load) and makes you more productive.
- Created by the operating system to run programs
- Processes can have multiple threads
- Two processes can execute code simultaneously in the same Python program
- Processes have more overhead than threads because creating and tearing down a process takes more time
- Sharing information between processes is slower than sharing between threads because processes do not share a memory space. In Python, processes exchange data by pickling (serializing) objects such as arrays, and that serialization and copying takes time.
- Threads are like mini-processes that live inside a process
- They share memory space and efficiently read and write to the same variables
- Two threads cannot execute Python code simultaneously in the same Python program when running on CPython (although there are workarounds)
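The two lists above contrast threads, which read and write the same variables, with processes, which exchange pickled copies. A minimal sketch of that difference follows. The names `counter`, `add_many`, `run_threads`, and `send_like_a_process` are hypothetical, and the pickle round trip only imitates what happens to data crossing a process boundary; it does not start a real process.

```python
import pickle
import threading

# Hypothetical shared state: every thread in this process sees the same dict.
counter = {"value": 0}
lock = threading.Lock()


def add_many(n):
    """Increment the shared counter n times, guarding each update with a lock."""
    for _ in range(n):
        with lock:
            counter["value"] += 1


def run_threads(num_threads=4, n=1000):
    """Start several threads that all update the same counter, then return it."""
    counter["value"] = 0
    threads = [threading.Thread(target=add_many, args=(n,)) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]


def send_like_a_process(obj):
    """Serialize and deserialize obj, as happens when data is sent to another process."""
    return pickle.loads(pickle.dumps(obj))


if __name__ == "__main__":
    print(run_threads())  # 4000: all four threads wrote to the same variable

    original = [1, 2, 3]
    received = send_like_a_process(original)
    received.append(4)
    print(original)  # [1, 2, 3]: the receiver only ever had a copy
    print(received)  # [1, 2, 3, 4]
```

The lock is there because shared memory cuts both ways: it makes sharing cheap, but concurrent updates to the same variable need coordination.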
The CPU, or processor, manages the fundamental computational work of the computer. CPUs have one or more cores, and having multiple cores allows the CPU to execute several tasks at the same time.
With a single core, multitasking gives no speedup for CPU-intensive tasks (e.g. loops, arithmetic): the OS switches back and forth between tasks, executing each one a little bit at a time. Launching and maintaining multiple tasks also carries overhead, which is why multitasking can sometimes hurt performance for small operations (e.g. downloading a few images).
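One way to check the overhead claim on your own machine is to time the same trivial work done serially and through a thread pool. This is a rough sketch, not a benchmark: `square`, `serial`, `pooled`, and `seconds_taken` are hypothetical names, and the numbers will vary by machine and Python version.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """A deliberately tiny task, so scheduling overhead dominates the real work."""
    return x * x


def serial(values):
    """Run every task one after another in the calling thread."""
    return [square(v) for v in values]


def pooled(values, max_workers=4):
    """Run the same tasks through a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(square, values))


def seconds_taken(fn, values):
    """Return the wall-clock seconds fn(values) takes."""
    start = time.perf_counter()
    fn(values)
    return time.perf_counter() - start


if __name__ == "__main__":
    # os.cpu_count() reports logical CPUs and may return None if undetermined.
    print("logical CPUs:", os.cpu_count())
    values = list(range(1000))
    print("serial:", seconds_taken(serial, values))
    print("pooled:", seconds_taken(pooled, values))
```

For work this small, the pooled version is often the slower of the two, since starting workers and handing out tasks costs more than the tasks themselves.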
CPython (the standard Python implementation) has something called the GIL (Global Interpreter Lock), which prevents two threads from executing Python code simultaneously in the same program. Some people are upset by this, while others fiercely defend it. There are workarounds, however, and libraries like NumPy bypass this limitation by running external code in C.
Processes speed up Python operations that are CPU intensive because they benefit from multiple cores and avoid the GIL.
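As an illustration, here is a small sketch that spreads CPU-bound work over a pool of processes with the standard library's `concurrent.futures.ProcessPoolExecutor`. The functions `count_primes` and `count_primes_parallel` and the input sizes are hypothetical, chosen only to give each worker something to compute.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """CPU-bound work: count the primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, max_workers=None):
    """Run count_primes once per limit, each call in a worker process."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # Arguments and results are pickled to cross the process boundary.
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    # The guard matters: some start methods re-import this module in each worker.
    print(count_primes_parallel([20_000, 20_000, 20_000, 20_000]))
```

Each worker has its own interpreter and its own GIL, so the four calls can run on four cores at once. Whether that beats a plain loop depends on how heavy each task is relative to the cost of starting the workers.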
Threads are best for IO-bound tasks or tasks involving external systems: the GIL is released while a thread waits on IO, and because threads share memory they can combine their results cheaply. Processes need to pickle their results to combine them, which takes time.
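A minimal sketch of the IO-bound case, using `time.sleep` as a stand-in for a network or disk wait. `fake_download`, `download_all`, and the 0.05 second delay are hypothetical; a real program would make an actual request where the sleep is.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(item, delay=0.05):
    """Stand-in for a blocking IO call; other threads can run while this one sleeps."""
    time.sleep(delay)
    return f"{item}: done"


def download_all(items, max_workers=8):
    """Overlap the waiting by giving each item to a thread in the pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(fake_download, items))


if __name__ == "__main__":
    start = time.perf_counter()
    results = download_all(["a", "b", "c", "d", "e", "f", "g", "h"])
    print(results)
    print("elapsed seconds:", time.perf_counter() - start)
```

With eight workers the eight waits overlap, so the total should be close to one delay rather than eight. The results come back as ordinary Python objects in shared memory, with no pickling step.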
Threads provide no benefit in Python for CPU-intensive tasks because of the GIL.
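You can test this claim by timing pure-Python arithmetic run sequentially and then in a thread pool. The sketch below assumes a standard CPython build with the GIL enabled; `busy_sum`, `run_sequential`, `run_threaded`, and `timed` are hypothetical names, and exact timings will differ between machines.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def busy_sum(n):
    """Pure-Python arithmetic loop, which holds the GIL while it runs."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_sequential(jobs):
    """Run each job one after another."""
    return [busy_sum(n) for n in jobs]


def run_threaded(jobs, max_workers=4):
    """Run the same jobs in a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(busy_sum, jobs))


def timed(fn, jobs):
    """Return (result, elapsed seconds) for fn(jobs)."""
    start = time.perf_counter()
    result = fn(jobs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    jobs = [2_000_000] * 4
    _, sequential_seconds = timed(run_sequential, jobs)
    _, threaded_seconds = timed(run_threaded, jobs)
    print("sequential:", sequential_seconds)
    print("threaded:  ", threaded_seconds)
```

On a GIL-enabled interpreter the two timings are typically close, because only one thread executes Python code at any moment.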
For certain operations such as the dot product, NumPy works around Python’s GIL and can execute code in parallel.