Updating Data Center Networks with Zero Traffic Loss

Slide 1: zUpdate: Updating Data Center Networks with Zero Loss
  Hongqiang Harry Liu (Yale University)
  Xin Wu (Duke University)
  Ming Zhang, Lihua Yuan, Roger Wattenhofer, Dave Maltz (Microsoft)

Slide 2: DCN is constantly in flux
  • Switches: upgrade, reboot, new switch
  • Traffic flows

Slide 3: DCN is constantly in flux
  • Switches
  • Traffic flows
  • Virtual machines

Slide 4: Network updates are painful for operators
Bob, an operator, has a switch upgrade to perform ("Holy C**p").
  • Complex planning. Two weeks before the update, Bob has to:
      • Coordinate with application owners
      • Prepare a detailed update plan
      • Review and revise the plan with colleagues
  • Unexpected performance faults. On the night of the update, Bob executes the plan by hand, but:
      • Application alerts are triggered unexpectedly
      • Switch failures force him to backpedal several times
  • Laborious process. Eight hours later, Bob is still stuck with the update:
      • No sleep overnight
      • Numerous application complaints
      • No quick fix in sight

Slide 5: Congestion-free DCN update is the key
  • Applications want network updates to be seamless:
      • Reachability
      • Low network latency (propagation, queuing)
      • No packet drops (congestion)
  • Congestion-free updates are hard:
      • Many switches are involved
      • Multi-step plan
      • Different scenarios have distinct requirements
      • Interactions between network and traffic demand changes

Slide 6: A Clos network with ECMP
  • All switches: Equal-Cost Multi-Path (ECMP)
  • Link capacity: 1000
[Figure: Clos topology with CORE, AGG and ToR layers. ToR traffic of 600 is split into 300 per AGG uplink and 150 per CORE link; the highlighted link carries 620 + 150 + 150 = 920.]
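As a rough illustration of the ECMP behaviour slide 6 relies on, the sketch below picks a next hop by hashing a flow identifier, which is a common way to realise ECMP. The slides do not say which hash or header fields the switches use, so the SHA-256 hash, the `flow_key` string format and the switch names are assumptions made only for this example. Over many flows the split comes out close to even, which matches the way the figure halves 600 into 300 and then 150.

```python
import hashlib


def ecmp_next_hop(flow_key: str, next_hops: list[str]) -> str:
    """Pick one of the equal-cost next hops by hashing the flow identifier.

    All packets of a flow hash to the same next hop, so a flow is never
    reordered across paths. The hash function is an assumption.
    """
    digest = hashlib.sha256(flow_key.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


def split_counts(num_flows: int, next_hops: list[str]) -> dict[str, int]:
    """Count how many synthetic flows land on each next hop."""
    counts = {hop: 0 for hop in next_hops}
    for i in range(num_flows):
        # Hypothetical 5-tuple rendered as a string; only the port varies.
        key = f"10.0.0.1:{10000 + i}->10.0.1.1:80/tcp"
        counts[ecmp_next_hop(key, next_hops)] += 1
    return counts


if __name__ == "__main__":
    print(split_counts(10000, ["AGG1", "AGG2"]))
```

With two next hops the counts should land near half each, though never exactly, because the split is statistical rather than enforced.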

Slide 7: Switch upgrade: a naive solution triggers congestion
  • Link capacity: 1000
  • Drain AGG1
[Figure: with AGG1 drained, the load on the highlighted link rises from 620 + 150 + 150 = 920 to 620 + 300 + 150 = 1070.]

Slide 8: Switch upgrade: a smarter solution seems to be working
  • Link capacity: 1000
  • Drain AGG1
  • Weighted ECMP
[Figure: with weighted ECMP splitting one ToR's traffic 500/100, the load on the highlighted link drops from 620 + 300 + 150 = 1070 to 620 + 300 + 50 = 970.]

Slide 9: Traffic distribution transition
  • Initial traffic distribution: congestion-free (ToR splits of 300/300)
  • Final traffic distribution: congestion-free (ToR splits of 0/600 and 500/100)
  • Transition: ?
  • Simple? NO! Asynchronous switch updates

Slide 10: Asynchronous changes can cause transient congestion
  • Link capacity: 1000
  • Drain AGG1
  • When ToR1 is changed but ToR5 is not yet: 620 + 300 + 150 = 1070
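The sums on slides 7, 8 and 10 can be reproduced with a few lines of arithmetic. The sketch below is only a reading of the figures: it assumes that ToR1 and ToR5 each send 600, that each ToR splits this traffic over two AGG uplinks by weight, and that the AGG switch on the bottleneck side splits evenly over two CORE links. The function and scenario names are made up for the example.

```python
CAPACITY = 1000     # link capacity stated on the slides
BACKGROUND = 620    # load already on the bottleneck link, per the slides
TOR_TRAFFIC = 600   # traffic sent by each of ToR1 and ToR5 (read off the figure)


def contribution(weights):
    """Load one ToR adds to the bottleneck link.

    `weights` is (away, toward): the weight of the uplink that avoids the
    bottleneck and the weight of the uplink that leads to it. The AGG on
    the 'toward' side is assumed to split evenly over two CORE links.
    """
    away, toward = weights
    via_agg = TOR_TRAFFIC * toward / (away + toward)
    return via_agg / 2


def bottleneck_load(tor1_weights, tor5_weights):
    """Total load on the bottleneck link for the given ToR split weights."""
    return BACKGROUND + contribution(tor1_weights) + contribution(tor5_weights)


# (ToR1 weights, ToR5 weights) for each scenario shown on the slides.
SCENARIOS = {
    "initial, plain ECMP": ((1, 1), (1, 1)),
    "naive drain of AGG1": ((0, 1), (1, 1)),
    "drain plus weighted ECMP": ((0, 1), (5, 1)),
}

if __name__ == "__main__":
    for name, (tor1, tor5) in SCENARIOS.items():
        load = bottleneck_load(tor1, tor5)
        status = "ok" if load <= CAPACITY else "CONGESTED"
        print(f"{name}: {load:.0f} ({status})")
```

The transient state on slide 10, where ToR1 has switched but ToR5 has not, uses the same weights as the naive drain, which is why it also reaches 1070.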

Slide 11: Solution: introducing an intermediate step
  • Initial to intermediate: congestion-free regardless of asynchronization
  • Intermediate to final: congestion-free regardless of asynchronization
[Figure: initial, intermediate and final traffic distributions on the Clos topology. The ToR splits read 300/300 in the initial state, 200/400 and 450/150 in the intermediate state, and 0/600 and 500/100 in the final state.]

Slide 12: How zUpdate performs congestion-free update
[Figure: the operator supplies the update scenario and update requirements to zUpdate; the data center network supplies the current traffic distribution. zUpdate outputs one or more intermediate traffic distributions followed by the target traffic distribution, which are applied to the data center network.]
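To see why the intermediate step on slide 11 helps, one can enumerate every order in which the two ToRs might apply a change and take the worst load on the bottleneck link. The sketch below does that by brute force. The initial and final contributions (150 and 300 for ToR1, 150 and 50 for ToR5) come from the sums on the earlier slides; the intermediate contributions of 200 and 75 are an assumption, obtained by halving the 400 and 150 that appear in the intermediate figure.

```python
from itertools import product

CAPACITY = 1000
BACKGROUND = 620  # load already on the bottleneck link, per the slides

# Load each ToR contributes to the bottleneck link at each stage.
STAGES = {
    "initial": {"ToR1": 150, "ToR5": 150},
    "intermediate": {"ToR1": 200, "ToR5": 75},  # assumed from the figure
    "final": {"ToR1": 300, "ToR5": 50},
}


def worst_transient_load(old, new):
    """Worst bottleneck load while switches move from `old` to `new`.

    Each switch is independently either still on its old split or already
    on its new one, so all 2**n combinations are tried.
    """
    switches = sorted(old)
    worst = 0
    for updated in product((False, True), repeat=len(switches)):
        load = BACKGROUND + sum(
            (new if done else old)[sw] for sw, done in zip(switches, updated)
        )
        worst = max(worst, load)
    return worst


def plan_is_safe(stage_names):
    """True if every consecutive transition stays within capacity."""
    pairs = zip(stage_names, stage_names[1:])
    return all(
        worst_transient_load(STAGES[a], STAGES[b]) <= CAPACITY for a, b in pairs
    )


if __name__ == "__main__":
    print(plan_is_safe(["initial", "final"]))
    print(plan_is_safe(["initial", "intermediate", "final"]))
```

Under these numbers the one-step plan peaks at 1070, while the two transitions of the two-step plan peak at 970 and 995, both under the capacity of 1000.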

Slide 13: Key technical issues
  • Describing traffic distribution
  • Representing update requirements
  • Defining conditions for congestion-free transition
  • Computing an update plan
  • Implementing an update plan

Slide 14: Describing traffic distribution
  • $l^f_{v,u}$: flow f's load on the link from switch v to switch u
  • Traffic distribution: the set of all $l^f_{v,u}$ over flows and links
[Figure: ToR s1, AGG s2 and s3, CORE s4 and s5. Flow f enters at s1 with 600; $l^f_{s_1,s_2} = 300$ and $l^f_{s_2,s_4} = 150$.]

Slide 15: Representing update requirements
  • To upgrade switch s2 (drain s2), constraint: $l^f_{s_1,s_2} = 0$
  • To restore ECMP when s2 recovers, constraint: $l^f_{s_1,s_2} = l^f_{s_1,s_3}$
[Figure: ToR s1, AGG s2 and s3, CORE s4 and s5, with flow f entering at s1.]

Slide 16: Switch asynchronization exponentially inflates the possible load values
Transition from old traffic distribution to new traffic distribution.
[Figure: flow f enters at ingress switch 1, crosses switches 2 through 7, and leaves at egress switch 8.]
  • Asynchronous updates can result in $2^5$ possible load values on link $e_{7,8}$ during the transition.
  • In large networks, it is impossible to check whether the load value exceeds link capacity.

Slide 17: Two-phase commit reduces the possible load values to two
Transition from old traffic distribution to new traffic distribution.
[Figure: the same eight-switch topology; the ingress switch performs the version flip.]
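To make the contrast between slides 16 and 17 concrete, the sketch below enumerates the loads that can appear on link (7, 8) when five switches change their split ratios independently, and compares that with a version flip at the ingress, where every packet sees either all old rules or all new rules. The edges of the topology and the split ratios are assumptions; the slide only shows an ingress switch 1, switches 2 to 7, and an egress switch 8.

```python
from itertools import product

# Assumed layered topology: 1 -> {2,3} -> {4,5} -> {6,7} -> 8.
NEXT_HOPS = {1: (2, 3), 2: (4, 5), 3: (4, 5), 4: (6, 7), 5: (6, 7), 6: (8,), 7: (8,)}

# Fraction sent to the FIRST next hop under old and new rules (made-up values).
OLD_SPLIT = {1: 0.5, 2: 0.5, 3: 0.5, 4: 0.5, 5: 0.5}
NEW_SPLIT = {1: 0.8, 2: 0.3, 3: 0.6, 4: 0.1, 5: 0.9}


def load_on_link(split, link, demand=1.0):
    """Push `demand` from switch 1 through the layers; return load on `link`."""
    inflow = {s: 0.0 for s in range(1, 9)}
    inflow[1] = demand
    loads = {}
    for s in range(1, 8):  # switch ids are already in topological order
        hops = NEXT_HOPS[s]
        shares = (1.0,) if len(hops) == 1 else (split[s], 1.0 - split[s])
        for hop, share in zip(hops, shares):
            loads[(s, hop)] = inflow[s] * share
            inflow[hop] += inflow[s] * share
    return loads[link]


def asynchronous_loads(link=(7, 8)):
    """Loads for every mix of old and new rules across the five switches."""
    switches = sorted(OLD_SPLIT)
    results = []
    for choice in product((False, True), repeat=len(switches)):
        split = {
            s: (NEW_SPLIT[s] if updated else OLD_SPLIT[s])
            for s, updated in zip(switches, choice)
        }
        results.append(round(load_on_link(split, link), 9))
    return results


def two_phase_loads(link=(7, 8)):
    """With a version flip, packets follow all-old or all-new rules only."""
    return {
        round(load_on_link(OLD_SPLIT, link), 9),
        round(load_on_link(NEW_SPLIT, link), 9),
    }


if __name__ == "__main__":
    print(len(asynchronous_loads()), "asynchronous combinations")
    print(sorted(two_phase_loads()), "with two-phase commit")
```

The asynchronous case has 2^5 = 32 rule combinations to consider, whereas the two-phase case leaves only the old and new loads to check against capacity.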

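With two-phase commit, flow f's load on link $e_{v,u}$ has only two possible values throughout a transition: $l^f_{v,u}(\mathrm{old})$ or $l^f_{v,u}(\mathrm{new})$.

Slide 18: Flow asynchronization exponentially inflates the possible load values
[Figure: flows f1 and f2 cross the eight-switch topology and share link $e_{7,8}$, which carries f1 + f2.]
The load on $e_{7,8}$ is one of:
  • $l^{f_1}_{7,8}(\mathrm{old}) + l^{f_2}_{7,8}(\mathrm{old})$
  • $l^{f_1}_{7,8}(\mathrm{old}) + l^{f_2}_{7,8}(\mathrm{new})$
  • $l^{f_1}_{7,8}(\mathrm{new}) + l^{f_2}_{7,8}(\mathrm{old})$
  • $l^{f_1}_{7,8}(\mathrm{new}) + l^{f_2}_{7,8}(\mathrm{new})$
Asynchronous updates to N independent flows can result in $2^N$ possible load values on link $e_{7,8}$.

Slide 19: Handling flow asynchronization
Basic idea: each of the four possible loads on $e_{7,8}$ listed on slide 18 is at most $\max\{l^{f_1}_{7,8}(\mathrm{old}), l^{f_1}_{7,8}(\mathrm{new})\} + \max\{l^{f_2}_{7,8}(\mathrm{old}), l^{f_2}_{7,8}(\mathrm{new})\}$, so it is enough to bound that one sum.
[Congestion-free transition constraint] There is no congestion throughout a transition if and only if:
$\forall e_{v,u}: \sum_{f} \max\{\, l^f_{v,u}(\mathrm{old}),\; l^f_{v,u}(\mathrm{new}) \,\} \le c_{v,u}$, where $c_{v,u}$ is the capacity of link $e_{v,u}$.

Slide 20: Computing congestion-free transition plan
Linear programming: the current traffic distribution is a constant, the intermediate traffic distributions are variables, each transition must satisfy the congestion-free constraint, and the update requirements are a further constraint.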
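The congestion-free transition constraint from slide 19 can be checked mechanically. The sketch below is a minimal version under stated assumptions: traffic distributions are plain dictionaries mapping a link to per-flow loads, and the flow names, loads and capacity are invented for the example. A brute-force helper is included only to show that the sum of per-flow maxima equals the worst case over all 2^N old/new combinations.

```python
from itertools import product


def worst_case_load(old_loads, new_loads):
    """Sum over flows of max(old, new) on a single link."""
    flows = set(old_loads) | set(new_loads)
    return sum(max(old_loads.get(f, 0), new_loads.get(f, 0)) for f in flows)


def is_congestion_free(old, new, capacity):
    """Check the per-link constraint for a whole transition.

    `old` and `new` map link -> {flow: load}; `capacity` maps link -> capacity
    and must contain every link that appears in `old` or `new`.
    """
    links = set(old) | set(new)
    return all(
        worst_case_load(old.get(link, {}), new.get(link, {})) <= capacity[link]
        for link in links
    )


def brute_force_worst(old_loads, new_loads):
    """Worst load over every old/new combination of flows (exponential)."""
    flows = sorted(set(old_loads) | set(new_loads))
    return max(
        sum(
            (new_loads if moved else old_loads).get(f, 0)
            for f, moved in zip(flows, choice)
        )
        for choice in product((False, True), repeat=len(flows))
    )


if __name__ == "__main__":
    # Hypothetical loads on link (7, 8): both end states fit within 10,
    # but f1 moving before f2 would put 6 + 5 = 11 on the link.
    old = {(7, 8): {"f1": 4, "f2": 5}}
    new = {(7, 8): {"f1": 6, "f2": 2}}
    print(is_congestion_free(old, new, {(7, 8): 10}))
```

Because a bound on a sum of maxima can be expressed with one auxiliary variable per flow and link that is at least the old load and at least the new load, the condition stays linear, which is consistent with the slides solving for the plan by linear programming.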

Slide 20 (continued): the target traffic distribution is also a variable, and every distribution must deliver all traffic (flow conservation).

Slide 21: Implementing an update plan
  • Computation time
  • Switch table size limit
  • Update overhead
  • Failure during transition
  • Traffic demand variation
Critical flows (flows traversing bottleneck links) use weighted ECMP; other flows use ECMP.

Slide 22: Evaluations
  • Testbed experiments
  • Large-scale trace-driven simulations

Slide 23: Testbed setup
  • Switch: OpenFlow 1.0
  • Link: 10 Gbps
  • Traffic generator
  • Drain AGG1
[Figure: testbed topology with CORE, AGG and ToR switches. Labelled traffic: ToR6,7: 6.2 Gbps (on four links); ToR5: 6 Gbps; ToR8: 6 Gbps.]

Slide 24: zUpdate achieves congestion-free switch upgrade
[Figure: initial, intermediate and final traffic distributions on the testbed. Labelled rates: 3 Gbps on each split in the initial state; 2, 4, 4.5 and 1.5 Gbps in the intermediate state; 6, 5 and 1 Gbps in the final state.]
[Plot: real-time link utilization against time (sec) for links CORE1-AGG3 and CORE3-AGG4.]

Slide 25: One-step update causes transient congestion
[Figure: initial and final traffic distributions on the testbed. Labelled rates: 3 Gbps on each split in the initial state; 6, 5 and 1 Gbps in the final state.]
[Plot: real-time link utilization against time (sec) for links CORE1-AGG3 and CORE3-AGG4.]

Slide 26: Large-scale trace-driven simulations
  • A production DCN topology (CORE, AGG, ToR)
  • New switch
  • Flows
  • Test flows (1%)

Slide 27: zUpdate beats alternative solutions
[Chart: post-transition loss rate and transition loss rate, in loss rate (%), for each scheme.]
  Scheme             #steps
  zUpdate            2
  zUpdate-OneStep    1
  ECMP-OneStep       1
  ECMP-Planned       300+

Slide 28: Conclusion
  • Switch and flow asynchronization can cause severe congestion during DCN updates
  • We present zUpdate for congestion-free DCN updates:
      • Novel algorithms to compute an update plan
      • Practical implementation on commodity switches
      • Evaluations in a real DCN topology and update scenarios
The End

Slide 29: Thanks & Questions?

Slide 30: Updating DCN is a painful process
[Figure: interactive applications question the operator, Bob ("Uh?"), about a switch upgrade.]
  • Any performance disruption?
  • How bad will the latency be?
  • How long will the disruption last?
  • What servers will be affected?

Slide 31: Network update: a tussle between applications and operators
  • Applications want network updates to be fast and seamless:
      • Update can happen on demand
      • No performance disruption during update
  • Network update is time consuming:
      • Nowadays, an update is planned and executed by hand
      • Rolling back in unplanned cases
  • Network update is risky:
      • Human errors
      • Accidents

Slide 32: Challenges in congestion-free DCN update ("Help!")
  • Many switches are involved:
      • Multi-step plan
  • Different scenarios have distinctive requirements:
      • Switch upgrade/failure recovery
      • New switch on-boarding
      • Load balancer reconfiguration
      • VM migration
  • Coordination between changes in routing (network) and traffic demand (application)

Slide 33: Related work
  • SWAN [SIGCOMM13]:
      • Maximizing the network utilization
      • Tunnel-based traffic engineering
  • Reitblatt et al. [SIGCOMM12]:
      • Control plane consistency during network updates
      • Per-packet and per-flow consistency cannot guarantee the absence of congestion
  • Raza et al. [ToN2011], Ghorbani et al. [HotSDN12]:
      • On a specific scenario (IGP update, VM migration)
      • One link weight change or one VM migration at a time
