Questions on blocking vs non-blocking, and how to handle?
GET puts queries on the URL
POST sends them in the body of the HTTP request (the URL can still carry a query string as well).
We'll talk about how to deal with that next week.
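To make the GET/POST difference above concrete, here is a minimal sketch using the standard library (Python 3 assumed). The endpoint and parameter names are made up for illustration, and nothing is actually sent over the network; the code only builds the two request objects.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and parameters, for illustration only.
BASE_URL = "http://example.com/search"
params = {"q": "threads", "page": "2"}

encoded = urlencode(params)  # "q=threads&page=2"

# GET: the query travels in the URL itself.
get_request = Request(BASE_URL + "?" + encoded)

# POST: the same data travels in the request body (must be bytes).
post_request = Request(BASE_URL, data=encoded.encode("ascii"))

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

Request picks the method from whether a body is present, which is why the second object reports POST.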
How do you do more than one thing at a time on a computer!?
The bottom line is that as soon as you start using multitasking, you lose predictability & gain stochasticity (randomness) in your execution.
Atomicity:
'atomic' operations are indivisible operations, i.e. they (look like they) take precisely one CPU tick.
Do basic operations get interrupted? Imagine a simple algorithm for inserting into a list:
- get slot number (slot = length)
- add one element slot to list (array.add_one_slot() -> length++)
- fill slot with data (array[slot] = data)
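This is not guaranteed to work correctly as written in the presence of threading! Two threads can read the same slot number before either one extends the list, and then one write clobbers the other. Need to use locks: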
- grab the lock
- get slot number (slot = length)
- add one element slot to list (array.add_one_slot() -> length++)
- fill slot with data (array[slot] = data)
- release lock
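The locked insert above could be sketched in Python roughly as follows. LockedArray, insert, snapshot and worker are hypothetical names invented for this example; the three steps are spelled out separately to mirror the notes, even though a real list would simply use append.

```python
import threading

class LockedArray:
    """Toy container mirroring the three insert steps from the notes."""

    def __init__(self):
        self._slots = []
        self._lock = threading.Lock()

    def insert(self, data):
        with self._lock:                 # grab the lock
            slot = len(self._slots)      # get slot number
            self._slots.append(None)     # add one empty slot (length++)
            self._slots[slot] = data     # fill slot with data
        # the lock is released on leaving the 'with' block

    def snapshot(self):
        with self._lock:
            return list(self._slots)


def worker(array, start, count):
    for value in range(start, start + count):
        array.insert(value)


array = LockedArray()
threads = [
    threading.Thread(target=worker, args=(array, i * 1000, 1000))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

items = array.snapshot()
print(len(items), None in items)
```

With the lock held across all three steps, no thread can observe a slot that has been added but not yet filled.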
Locking of resources:
Locks guarantee that a piece of code runs non-concurrently: only one task at a time can hold the lock, so the protected code looks atomic to everyone else.
Deadlock - two tasks are waiting for each other to release a lock.
(always release the lock; try/finally!)
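A small sketch of the try/finally advice, assuming a made-up shared balance and a hypothetical withdraw function. It covers only the "lock never released" failure, not the two-tasks-waiting-on-each-other case.

```python
import threading

lock = threading.Lock()
balance = {"amount": 100}  # hypothetical shared state

def withdraw(amount):
    lock.acquire()
    try:
        if amount > balance["amount"]:
            raise ValueError("insufficient funds")
        balance["amount"] -= amount
    finally:
        lock.release()  # runs on success and on exception

try:
    withdraw(500)  # fails, but must not leave the lock held
except ValueError:
    pass

# The lock is free again, so later callers are not stuck waiting forever.
print(lock.locked())
withdraw(30)
print(balance["amount"])
```

Writing "with lock:" around the body is equivalent to the acquire/try/finally pattern and is harder to get wrong.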
This can get arbitrarily complicated. There be dragons. Avoid at all costs until you know what you're doing (-> never use).
Processes:
- heavyweight concurrency
- less shared state between processes (disk, etc., but not memory)
os.fork() on UNIX; also see 'multiprocessing' in Python 2.6.
How do you communicate between processes? Through the OS. But it's slow:
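As a rough UNIX-only illustration (Python 3 assumed), the sketch below forks a child and gets a value back through an OS pipe. The counter variable and the function name are invented for the example. Note that the child's change to counter never shows up in the parent, because the two processes do not share memory.

```python
import os

counter = 0  # after the fork, each process has its own copy of this

def run_child_and_collect():
    """Fork a child, let it modify its copy of counter, read the result back."""
    global counter
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: change our private copy and report it via the pipe.
        status = 1
        try:
            os.close(read_fd)
            counter += 41
            os.write(write_fd, str(counter).encode("ascii"))
            os.close(write_fd)
            status = 0
        finally:
            os._exit(status)  # leave immediately, skipping parent cleanup code
    # Parent process: read until the child closes its end of the pipe.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe_in:
        payload = pipe_in.read()
    os.waitpid(pid, 0)
    return int(payload.decode("ascii"))

child_value = run_child_and_collect()
print("child saw:", child_value, "parent still has:", counter)
```

Everything the parent learns from the child has to be serialized to bytes and copied through the kernel, which is where the cost comes from.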
Threads:
- lighterweight concurrency
- shared memory space DANGER WILL ROBINSON
- atomicity suddenly becomes important
'threading' module on Python 2.2+.
create, start, do stuff, 'join'
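The create/start/join life cycle, as a minimal sketch (Python 3 assumed). The square function and results dict are placeholders for real work.

```python
import threading

results = {}

def square(n):
    # Each thread writes to its own key, so no two threads touch the same entry.
    results[n] = n * n

# create
threads = [threading.Thread(target=square, args=(n,)) for n in range(5)]

# start
for t in threads:
    t.start()

# ... the main thread is free to do other stuff here ...

# join: wait for every thread to finish before reading the results
for t in threads:
    t.join()

print(sorted(results.items()))
```

Reading results before the join loop completes would be a race: some entries might not be there yet.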
Concurrency and stochasticity will only bite you when you need to communicate either explicitly (IPC) or implicitly.
Any writeable, shared state must be protected: "threadsafe".
Module globals, class globals, shared namespaces... anything that can be accessed outside of the function that creates it is bad. Local variables are isolated and hence safe, until you say otherwise.
(This is why reentrancy came up in iterators: only the calling process could get a handle on the iterator object!)
In standard CPython, Python->C calls are atomic unless the C code explicitly releases the interpreter lock, so most "low-level" Python data type operations are atomic.
All other Python calls are not, so e.g. functions in threads may execute in parallel (that's the point!)
...well, not really. It's complicated.
Digression: I/O vs CPU-intensive multitasking
- blocking or slow I/O calls are one reason to use multitasking
- executing CPU-intensive tasks in parallel is another
Python multithreading is good for the first, NOT good for the second: CPython's global interpreter lock (GIL) lets only one thread run Python bytecode at a time, so only specially written C code actually runs in parallel.
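A rough way to see the I/O case, using time.sleep as a stand-in for a blocking I/O call (Python 3 assumed). The function names are hypothetical and the timings will vary from machine to machine.

```python
import threading
import time

def fake_io(seconds):
    # time.sleep stands in for a blocking I/O call (network, disk, ...)
    time.sleep(seconds)

def run_serial(n, seconds):
    start = time.perf_counter()
    for _ in range(n):
        fake_io(seconds)
    return time.perf_counter() - start

def run_threaded(n, seconds):
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io, args=(seconds,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

serial = run_serial(5, 0.1)
threaded = run_threaded(5, 0.1)
print("serial: %.2fs  threaded: %.2fs" % (serial, threaded))
```

The waits overlap in the threaded version, so it finishes in roughly the time of one call. Replacing fake_io with a pure-Python number-crunching loop would not be expected to show the same gain.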
---
Bottom line: concurrency is not for the faint-hearted and is best done as simply as possible.