Lecture 4 Outline / September 16th, 2008 ======================================== Homework stuff ============== Questions on blocking vs non-blocking, and how to handle? GET vs POST =========== GET puts queries on the URL POST puts queries on the URL **and** sends them in the body of the HTTP request. We'll talk about how to deal with that *next* week. Multitasking, concurrency, and pre-emptiveness ============================================== How do you do more than one thing at a time on a computer!? The friendly approach: - say, are you done yet? excellent! may I have some CPU? (Windows 3.1) - "non-preemptive" - doesn't work well with "bad actors" (WANT CPU!) or bad programmers - actually *more* complicated to do right, in some ways - e.g. think about your nonblocking code due Thursday... The less friendly approach: the traffic cop (scheduler) - pre-emptive - task scheduler coordinates things - less up-front work for you, but **no guarantees** on order of execution - how almost everything works nowadays The bottom line is that as soon as you start using multitasking, you lose predictability & gain stochasticity (randomness) in your execution. Atomicity: 'atomic' operations are indivisible operations, i.e. they (look like they) take precisely one CPU tick. Do basic operations get interrupted? Imagine a simple algorithm for inserting into a list: :: get slot number (slot = length) add one element slot to list (array.add_one_slot() -> length++) fill slot with data (array[slot] = data) This is not guaranteed to perform correctly as written, in presence of threading! Need to use locks: :: grab the lock: get slot number (slot = length) add one element slot to list (array.add_one_slot() -> length++) fill slot with data (array[slot] = data) release lock Locking of resources: Locks provide a guaranteed atomic way to execute a piece of code nonconcurrently. Fine grained vs coarse-grained locks: - more locks are better, because they let you subdivide tasks and do more concurrently - fewer locks are better because they are easier to manage and understand Deadlock - two tasks are waiting for each other to release a lock. (*always release the lock*; try/finally!) This can get arbitrarily complicated. There be dragons. Avoid at all costs until you know what you're doing (-> never use). Processes vs Threads ==================== Processes: - heavyweight concurrency - less shared state between processes (disk, etc., but not memory) os.fork() on UNIX; also see 'multiprocessing' in Python 2.6. How do you communicate between processes? Through the OS. But it's slow: Interprocess Communication (IPC) techniques: - shared memory - pipes - domain or network sockets - files Threads: - lighterweight concurrency - shared memory space **DANGER WILL ROBINSON** - *atomicity* suddenly becomes important 'threading' module on Python 2.2+. create, start, do stuff, 'join' Avoiding shared state ===================== Concurrency and stochasticity will only bite you when you need to *communicate* either explicitly (IPC) or implicitly. Any **writeable**, **shared state** must be protected: "threadsafe". Module globals, class globals, shared namespaces... anything that can be accessed outside of the function that creates it is *bad*. Local variables are isolated and hence safe, until you say otherwise. (This is why reentrancy came up in iterators: only the calling process could get a handle on the iterator object!) Python, concurrency, atomicity, and locking =========================================== Python->C calls are automatically atomic, so most "low-level" Python data type operations are atomic. All other Python calls are not, so e.g. functions in threads may execute in parallel (that's the point!) ...well, not really. It's complicated. Digression: I/O vs CPU-intensive multitasking - blocking or slow I/O calls are one reason to use multitasking - executing CPU-intensive tasks in parallel are another Python multithreading is good for the first, NOT good for the second, because only specially written C code actually runs in parallel. --- Bottom line: concurrency is not for the faint-hearted and is best done as simply as possible.