Lec 27: Synchronization II - Cornell University

Synchronization II
Hakim Weatherspoon
CS 3410, Spring 2012
Computer Science, Cornell University
P&H Chapter 2.11

Administrivia
- Pizza party: PA3 Games Night
  Friday, April 27th, 5:00-7:00pm. Location: Upson B17
- Prelim 3 Review
  Today, Tuesday, April 24th, 5:30-7:30pm. Location: Hollister 110
- Prelim 3
  Thursday, April 26th, 7:30pm. Location: Olin 155
- PA4: Final project out next week
  Demos: May 14-16
  Will not be able to use slip days

Goals for Today
Synchronization
- Threads and processes
- Critical sections, race conditions, and mutexes
- Atomic instructions
- HW support for synchronization
- Using sync primitives to build concurrency-safe data structures
- Cache coherency causes problems
- Locks + barriers
- Language level synchronization

Synchronization
Two processors sharing an area of memory
- P1 writes, then P2 reads
- Data race if P1 and P2 don't synchronize
  - Result depends on the order of accesses

Hardware support required
- Atomic read/write memory operation
- No other access to the location allowed between the read and write
Could be a single instruction
- E.g., atomic swap of register and memory (e.g. ATS, BTS; x86)
Or an atomic pair of instructions (e.g. LL and SC; MIPS)

Synchronization in MIPS
Load linked: LL rt, offset(rs)
Store conditional: SC rt, offset(rs)
- Succeeds if location not changed since the LL
  - Returns 1 in rt
- Fails if location is changed
  - Returns 0 in rt
Example: atomic swap (to test/set lock variable)

    try: MOVE $t0,$s4    ;copy exchange value
         LL   $t1,0($s1) ;load linked
         SC   $t0,0($s1) ;store conditional
         BEQZ $t0,try    ;branch if store fails
         MOVE $s4,$t1    ;put load value in $s4

Programming with Threads
Need it
- to exploit multiple processing units
- to provide interactive applications
- to parallelize for multicore
- to write servers that handle many clients
Problem: hard even for experienced programmers
- Behavior can depend on subtle timing differences
- Bugs may be impossible to reproduce
Needed: synchronization of threads

Programming with Threads
Concurrency poses challenges for:
- Correctness: threads accessing shared memory should not interfere with each other
- Liveness: threads should not get stuck, should make forward progress
- Efficiency: program should make good use of available computing resources (e.g., processors)
- Fairness: resources apportioned fairly between threads

Two threads, one counter
Example: Web servers use concurrency
- Multiple threads handle client requests in parallel
- Some shared state, e.g. hit counts: each thread increments a shared counter to track number of hits

    C:     hits = hits + 1;
    MIPS:  LW   R0, hitsloc
           ADDI R0, R0, 1
           SW   R0, hitsloc

What happens when two threads execute concurrently?

Shared counters
Possible result: lost update!

    hits = 0
    time   T1                         T2
      |    lw (0)
      |                               lw (0)
      |    addu/sw: hits = 0 + 1
      v                               addu/sw: hits = 0 + 1
    hits = 1

Timing-dependent failure: race condition
- hard to reproduce
- difficult to debug

Race conditions
Def: timing-dependent error involving access to shared state
- Whether it happens depends on how threads are scheduled: who wins races to the instruction that updates state vs. the instruction that accesses state
- Races are intermittent, may occur rarely
  - Timing dependent = small changes can hide bug
A program is correct only if all possible schedules are safe
- Number of possible schedule permutations is huge
- Need to imagine an adversary who switches contexts at the worst possible time
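The lost update can be replayed deterministically by treating a schedule as an explicit list of which thread runs next. The following is a minimal sketch, not code from the lecture: `sim_thread`, `step`, and `run_schedule` are hypothetical names, and each simulated thread runs the three-instruction load/add/store sequence on a private register.

```c
#include <assert.h>
#include <stddef.h>

/* One simulated thread: a private register plus a program counter. */
typedef struct {
    int reg;   /* private register, like R0 on the slide */
    int pc;    /* 0 = load next, 1 = add next, 2 = store next, 3 = done */
} sim_thread;

/* Execute the next instruction of thread t against the shared counter. */
static void step(sim_thread *t, int *hits) {
    assert(t->pc < 3);                 /* a finished thread must not be scheduled */
    switch (t->pc) {
    case 0: t->reg = *hits; break;     /* LW   R0, hitsloc */
    case 1: t->reg += 1;    break;     /* ADDI R0, R0, 1   */
    case 2: *hits = t->reg; break;     /* SW   R0, hitsloc */
    }
    t->pc++;
}

/* Run two threads under a schedule (an array of thread ids, 0 or 1) and
   return the final value of the shared counter, which starts at 0. */
int run_schedule(const int *schedule, size_t len) {
    sim_thread threads[2] = { {0, 0}, {0, 0} };
    int hits = 0;
    for (size_t i = 0; i < len; i++) {
        assert(schedule[i] == 0 || schedule[i] == 1);
        step(&threads[schedule[i]], &hits);
    }
    return hits;
}
```

With the schedule 0,0,0,1,1,1 each thread finishes before the other starts and both increments survive. With 0,1,0,1,0,1 both threads load 0 before either stores, so one increment is lost.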

Critical sections
To eliminate races: use critical sections that only one thread can be in
- Contending threads must wait to enter

    time -->
    T1: CSEnter(); critical section; CSExit();
    T2:            CSEnter(); critical section; CSExit();

Mutexes
Critical sections typically associated with mutual exclusion locks (mutexes)
Only one thread can hold a given mutex at a time
Acquire (lock) mutex on entry to critical section
- Or block if another thread already holds it
Release (unlock) mutex on exit
- Allow one waiting thread (if any) to acquire & proceed

    pthread_mutex_init(&m, NULL);

    T1:
    pthread_mutex_lock(&m);
    hits = hits+1;
    pthread_mutex_unlock(&m);

    T2:
    pthread_mutex_lock(&m);
    hits = hits+1;
    pthread_mutex_unlock(&m);

Mutexes
Q: How to implement critical section in code?
A: Lots of approaches.
Mutual Exclusion Lock (mutex)
- lock(m): wait till it becomes free, then lock it
- unlock(m): unlock it

    void safe_increment() {
        pthread_mutex_lock(&m);
        hits = hits + 1;
        pthread_mutex_unlock(&m);
    }

Hardware Support for Synchronization

Synchronization in MIPS
Load linked: LL rt, offset(rs)
Store conditional: SC rt, offset(rs)
- Succeeds if location not changed since the LL
  - Returns 1 in rt
- Fails if location is changed
  - Returns 0 in rt
Example: atomic swap (to test/set lock variable)

    try: MOVE $t0,$s4    ;copy exchange value
         LL   $t1,0($s1) ;load linked
         SC   $t0,0($s1) ;store conditional
         BEQZ $t0,try    ;branch if store fails
         MOVE $s4,$t1    ;put load value in $s4
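The same atomic swap can be written portably in C. This is a sketch that assumes a C11 compiler with `<stdatomic.h>`, which the lecture does not use; `atomic_swap` and `try_lock` are hypothetical names. On LL/SC machines a compiler typically lowers `atomic_exchange` to a retry loop much like the one on the slide.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Atomically store new_value into *loc and return the value it replaced. */
int atomic_swap(atomic_int *loc, int new_value) {
    assert(loc != NULL);
    return atomic_exchange(loc, new_value);
}

/* Try once to take a lock word (0 = free, 1 = held).
   Returns 1 if this caller acquired the lock, 0 if it was already held. */
int try_lock(atomic_int *lock) {
    /* Swapping in 1 always leaves the lock held; the old value says
       whether this caller was the one who took it. */
    return atomic_swap(lock, 1) == 0;
}
```

The old value returned by the swap is what makes this usable as a lock: exactly one of several concurrent callers sees 0.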

Mutex from LL and SC
Linked load / Store Conditional

    mutex_lock(int *m) {
        while (test_and_set(m)) {}
    }

    int test_and_set(int *m) {
        old = *m;
        *m = 1;
        return old;
    }

Mutex from LL and SC
Linked load / Store Conditional

    mutex_lock(int *m) {
        while (test_and_set(m)) {}
    }

    int test_and_set(int *m) {
      try:
        LI   $t0, 1
        LL   $t1, 0($a0)
        SC   $t0, 0($a0)
        BEQZ $t0, try            ; store failed: retry so the swap is atomic
        MOVE $v0, $t1
    }

Mutex from LL and SC
Linked load / Store Conditional

    mutex_lock(int *m) {
      test_and_set:
        LI   $t0, 1
        LL   $t1, 0($a0)
        BNEZ $t1, test_and_set   ; lock already held: keep spinning
        SC   $t0, 0($a0)
        BEQZ $t0, test_and_set   ; store failed: retry
    }

    mutex_unlock(int *m) {
        SW $zero, 0($a0)         ; *m = 0;
    }

Alternative Atomic Instructions
Other atomic hardware primitives
- test and set (x86)
- atomic increment (x86)
- bus lock prefix (x86)
- compare and exchange (x86, ARM deprecated)
- linked load / store conditional (MIPS, ARM, PowerPC, DEC Alpha, ...)

Synchronization
Synchronization techniques
- clever code
  - must work despite adversarial scheduler/interrupts
  - used by: hackers
  - also: noobs
- disable interrupts
  - used by: exception handler, scheduler, device drivers, ...
- disable preemption
  - dangerous for user code, but okay for some kernel code
- mutual exclusion locks (mutex)
  - general purpose, except for some interrupt-related cases

Using synchronization primitives to build concurrency-safe datastructures

Broken invariants
Access to shared data must be synchronized
- goal: enforce datastructure invariants

    // invariant:
    // data is in A[h ... t-1]
    char A[100];
    int h = 0, t = 0;

    // producer: add to list tail
    void put(char c) {
        A[t] = c;
        t++;
    }

    // consumer: take from list head
    char get() {
        while (h == t) { };
        char c = A[h];
        h++;
        return c;
    }

(figure: array A holding 1, 2, 3, with head and tail pointers)

Protecting an invariant

    // invariant: (protected by m)
    // data is in A[h ... t-1]
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    char A[100];
    int h = 0, t = 0;

    // producer: add to list tail
    void put(char c) {
        pthread_mutex_lock(&m);
        A[t] = c;
        t++;
        pthread_mutex_unlock(&m);
    }

    // consumer: take from list head
    char get() {
        pthread_mutex_lock(&m);
        while (h == t) {}
        char c = A[h];
        h++;
        pthread_mutex_unlock(&m);
        return c;
    }

Rule of thumb: all updates that can affect invariant become critical sections

Guidelines for successful mutexing
Insufficient locking can cause races
- Skimping on mutexes? Just say no!
Poorly designed locking can cause deadlock

    P1: lock(m1);
        lock(m2);

    P2: lock(m2);
        lock(m1);

- know why you are using mutexes!
- acquire locks in a consistent order to avoid cycles
- use lock/unlock like braces (match them lexically)
  - lock(&m); ...; unlock(&m)

  - watch out for return, goto, and function calls!
  - watch out for exception/error conditions!

Cache Coherency causes yet more trouble

Remember: Cache Coherence
Recall: Cache coherence defined...
Informal: Reads return most recently written value
Formal: For concurrent processes P1 and P2
- P writes X before P reads X (with no intervening writes) => read returns written value
- P1 writes X before P2 reads X => read returns written value
- P1 writes X and P2 writes X => all processors see writes in the same order
  - all see the same final value for X

Relaxed consistency implications
Ideal case: sequential consistency
- Globally: writes appear in interleaved order
- Locally: other core's writes show up in program order
In practice: not so much...
- write-back caches => sequential consistency is tricky
- writes appear in semi-random order
- locks alone don't help
* MIPS has sequential consistency; Intel does not
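One way to make the ordering problem concrete is a writer that fills in data and then raises a flag. Without ordering constraints, a reader on another core could see the flag before the data. The sketch below assumes C11 `<stdatomic.h>`, which the lecture does not use; `payload`, `ready`, `publish`, and `try_consume` are hypothetical names. It pins the order down with a release store and an acquire load.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;        /* ordinary (non-atomic) shared data */
static atomic_int ready;   /* flag that publishes the payload; starts at 0 */

/* Writer: fill in the data, then publish it with a release store.
   The release keeps the payload write from moving after the flag write. */
void publish(int value) {
    payload = value;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Reader: returns 1 and copies the data to *out if it has been published,
   otherwise returns 0. The acquire load keeps the payload read from
   moving before the flag read. */
int try_consume(int *out) {
    assert(out != NULL);
    if (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        return 0;
    *out = payload;
    return 1;
}
```

A reader that observes `ready == 1` through the acquire load is guaranteed to also observe the payload written before the matching release store.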

Acquire/release
Memory Barriers and Release Consistency
- Less strict than sequential consistency; easier to build
One protocol:
- Acquire: lock, and force subsequent accesses after
- Release: unlock, and force previous accesses before

    P1: ...
        acquire(m);
        A[t] = c;
        t++;
        release(m);

    P2: ...
        acquire(m);
        A[t] = c;
        t++;
        release(m);

Moral: can't rely on sequential consistency (so use synchronization libraries)

Are Locks + Barriers enough?

Beyond mutexes
Writers must check for full buffer & Readers must check for empty buffer
- ideal: don't busy wait... go to sleep instead

    char get() {
        while (empty) {}
        acquire(L);
        char c = A[h];
        h++;
        release(L);
        return c;
    }

(figure: ring buffer with head pointer; last==head means empty)

Beyond mutexes

    // checks before taking the lock
    char get() {
        while (h == t) { };
        acquire(L);
        char c = A[h];
        h++;
        release(L);
        return c;
    }

    // checks while holding the lock
    char get() {
        acquire(L);
        while (h == t) { };
        char c = A[h];
        h++;
        release(L);
        return c;
    }

Dilemma: Have to check while holding lock, but cannot wait while holding lock

Beyond mutexes

    char get() {
        char c;
        int empty;
        do {
            acquire(L);
            empty = (h == t);
            if (!empty) {
                c = A[h];
                h++;
            }
            release(L);
        } while (empty);
        return c;
    }

Language-level Synchronization

Condition variables
Use a condition variable [Hoare] to wait for a condition to become true (without holding lock!)
wait(m, c):
- atomically release m and sleep, waiting for condition c
- wake up holding m sometime after c was signaled
signal(c): wake up one thread waiting on c
broadcast(c): wake up all threads waiting on c
POSIX (e.g., Linux): pthread_cond_wait, pthread_cond_signal, pthread_cond_broadcast

Using a condition variable
wait(m, c): release m, sleep until c, wake up holding m
signal(c): wake up one thread waiting on c

    cond_t *not_full = ...;
    cond_t *not_empty = ...;
    mutex_t *m = ...;

    void put(char c) {
        lock(m);
        while ((t + 1) % n == h)    // buffer full
            wait(m, not_full);
        A[t] = c;
        t = (t+1) % n;
        unlock(m);
        signal(not_empty);
    }

    char get() {
        lock(m);
        while (t == h)              // buffer empty
            wait(m, not_empty);
        char c = A[h];
        h = (h+1) % n;
        unlock(m);
        signal(not_full);
        return c;
    }

Monitors
A Monitor is a concurrency-safe datastructure, with
- one mutex
- some condition variables
- some operations
All operations on monitor acquire/release mutex
- one thread in the monitor at a time
Ring buffer was a monitor
Java, C#, etc., have built-in support for monitors

Java concurrency
Java objects can be monitors
- synchronized keyword locks/releases the mutex
- Has one (!) builtin condition variable
  - o.wait() = wait(o, o)
  - o.notify() = signal(o)
  - o.notifyAll() = broadcast(o)
- Java wait() must be called while holding the object's mutex (otherwise it throws IllegalMonitorStateException)
- Mutex is not held at the moment a thread is awoken by notify(); it is reacquired before wait() returns. Useful?
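Java bundles the mutex and condition variable into every object; in C the same monitor has to be assembled by hand. The ring buffer monitor can be sketched with POSIX threads as follows. This is a minimal sketch under assumptions: `ring`, `ring_init`, `ring_put`, `ring_get`, and `RING_SIZE` are hypothetical names, the capacity is fixed, and one slot is kept empty so that a full buffer can be told apart from an empty one.

```c
#include <assert.h>
#include <pthread.h>

#define RING_SIZE 8   /* usable capacity is RING_SIZE - 1 */

/* A monitor: one mutex, two condition variables, and the data they protect. */
typedef struct {
    pthread_mutex_t m;
    pthread_cond_t not_full, not_empty;
    char A[RING_SIZE];
    int h, t;   /* invariant (protected by m): data is in A[h ... t-1], mod RING_SIZE */
} ring;

void ring_init(ring *r) {
    int rc = pthread_mutex_init(&r->m, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&r->not_full, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&r->not_empty, NULL);
    assert(rc == 0);
    (void)rc;
    r->h = r->t = 0;
}

void ring_put(ring *r, char c) {
    pthread_mutex_lock(&r->m);
    while ((r->t + 1) % RING_SIZE == r->h)        /* full: sleep, releasing m */
        pthread_cond_wait(&r->not_full, &r->m);
    r->A[r->t] = c;
    r->t = (r->t + 1) % RING_SIZE;
    pthread_mutex_unlock(&r->m);
    pthread_cond_signal(&r->not_empty);
}

char ring_get(ring *r) {
    pthread_mutex_lock(&r->m);
    while (r->h == r->t)                           /* empty: sleep, releasing m */
        pthread_cond_wait(&r->not_empty, &r->m);
    char c = r->A[r->h];
    r->h = (r->h + 1) % RING_SIZE;
    pthread_mutex_unlock(&r->m);
    pthread_cond_signal(&r->not_full);
    return c;
}
```

The condition is rechecked in a `while` loop rather than an `if` because a woken thread only reacquires the mutex some time after the signal, and the buffer may have changed in between.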

More synchronization mechanisms
Lots of synchronization variations... (can implement with mutex and condition vars.)
Reader/writer locks
- Any number of threads can hold a read lock
- Only one thread can hold the writer lock
Semaphores
- N threads can hold lock at the same time
Message-passing, sockets, queues, ring buffers, ...
- transfer data and synchronize
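As an illustration of building one of these variations from a mutex and a condition variable, here is a sketch of a counting semaphore. The names `csem`, `csem_init`, `csem_acquire`, `csem_try_acquire`, and `csem_release` are hypothetical, chosen to avoid clashing with the POSIX `sem_t` API; this is not the lecture's code.

```c
#include <assert.h>
#include <pthread.h>

/* Counting semaphore: up to `permits` threads may hold it at once. */
typedef struct {
    pthread_mutex_t m;
    pthread_cond_t nonzero;   /* signaled when permits becomes positive */
    int permits;              /* protected by m */
} csem;

void csem_init(csem *s, int n) {
    assert(n >= 0);
    pthread_mutex_init(&s->m, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->permits = n;
}

/* Take one permit, sleeping while none are available. */
void csem_acquire(csem *s) {
    pthread_mutex_lock(&s->m);
    while (s->permits == 0)
        pthread_cond_wait(&s->nonzero, &s->m);
    s->permits--;
    pthread_mutex_unlock(&s->m);
}

/* Non-blocking variant: returns 1 if a permit was taken, 0 otherwise. */
int csem_try_acquire(csem *s) {
    pthread_mutex_lock(&s->m);
    int ok = s->permits > 0;
    if (ok)
        s->permits--;
    pthread_mutex_unlock(&s->m);
    return ok;
}

/* Return one permit and wake one waiter, if any. */
void csem_release(csem *s) {
    pthread_mutex_lock(&s->m);
    s->permits++;
    pthread_mutex_unlock(&s->m);
    pthread_cond_signal(&s->nonzero);
}
```

Initializing with n = 1 gives mutex-like behavior; a reader/writer lock can be built the same way by tracking a reader count and a writer flag under the mutex.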

Summary
Hardware Primitives: test-and-set, LL/SC, barrier, ...
  ... used to build ...
Synchronization primitives: mutex, semaphore, ...
  ... used to build ...
Language Constructs: monitors, signals, ...
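The layering in this summary can be sketched end to end in C. The example assumes C11 `<stdatomic.h>`, whose `atomic_flag` exposes a test-and-set primitive; `spin_mutex`, `spin_lock`, `spin_unlock`, and `record_hit` are hypothetical names, and a production mutex would sleep rather than spin.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Layer 1: hardware primitive, exposed in C11 as atomic_flag test-and-set. */
typedef struct { atomic_flag held; } spin_mutex;

/* Layer 2: a mutex built from that primitive. */
void spin_lock(spin_mutex *m) {
    assert(m != NULL);
    /* test-and-set returns the previous value: true means someone holds it */
    while (atomic_flag_test_and_set_explicit(&m->held, memory_order_acquire)) {
        /* busy-wait until the holder clears the flag */
    }
}

void spin_unlock(spin_mutex *m) {
    assert(m != NULL);
    atomic_flag_clear_explicit(&m->held, memory_order_release);
}

/* Layer 3: a concurrency-safe hit counter built from the mutex. */
static spin_mutex hits_lock = { ATOMIC_FLAG_INIT };
static int hits;   /* protected by hits_lock */

/* Increment the shared counter and return its new value. */
int record_hit(void) {
    spin_lock(&hits_lock);
    int now = ++hits;
    spin_unlock(&hits_lock);
    return now;
}
```

The acquire ordering on lock and release ordering on unlock are what keep accesses to `hits` inside the critical section on machines without sequential consistency.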
