Slide 1: zUpdate: Updating Data Center Networks with Zero Loss
Hongqiang Harry Liu (Yale University), Xin Wu (Duke University), Ming Zhang, Lihua Yuan, Roger Wattenhofer, Dave Maltz (Microsoft)

Slide 2: DCN is constantly in flux
- Switches change constantly: upgrades, reboots, new switches coming online
- Traffic flows shift accordingly

Slide 3: DCN is constantly in flux
- Switches, traffic flows, and virtual machines all change over time

Slide 4: Network updates are painful for operators (Bob: an operator; scenario: a switch upgrade)
- Complex planning. Two weeks before the update, Bob has to: coordinate with application owners; prepare a detailed update plan; review and revise the plan with colleagues.
- Unexpected faults. On the night of the update, Bob executes the plan by hand, but application performance alerts are triggered unexpectedly, and switch failures force him to backpedal several times.
- Laborious process. Eight hours later, Bob is still stuck with the update: no sleep overnight, numerous application complaints, no quick fix in sight.

Slide 5: Congestion-free DCN update is the key
- Applications want network updates to be seamless:
  - Reachability
  - Low network latency (propagation, queuing)
  - No packet drops (congestion)
- Congestion-free updates are hard:
  - Many switches are involved, requiring a multi-step plan
  - Different scenarios have distinct requirements
  - Changes in the network and in traffic demand interact
Slide 6: A Clos network with ECMP
- All switches use Equal-Cost Multi-Path (ECMP); link capacity: 1000
- [Figure: Clos topology with CORE 1-4, AGG 1-4, ToR 1-5. ToR1 and ToR5 each send 600; ECMP splits each into 300 per AGG and then 150 per CORE link. The busiest CORE-AGG link carries 620 + 150 + 150 = 920, below the 1000 capacity (620 is other traffic).]

Slide 7: Switch upgrade: a naïve solution triggers congestion
- Drain AGG1 so it can be upgraded
- [Figure: with AGG1 drained, ToR1's entire 600 shifts to AGG2, putting 300 on each of AGG2's CORE links. The busiest CORE-AGG link now carries 620 + 300 + 150 = 1070, exceeding the 1000 capacity.]

Slide 8: Switch upgrade: a smarter solution seems to be working
- Use weighted ECMP instead of even splitting
- [Figure: with AGG1 drained and weighted ECMP, ToR5 re-splits its 600 as 500/100, cutting its contribution to the hot CORE link from 150 to 50. That link now carries 620 + 300 + 50 = 970, below the 1000 capacity.]
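To replay the arithmetic in these three slides, a minimal sketch: the 620 background load and the per-flow contributions are the slides' numbers, while the helper and its decomposition are ours.

```python
# Load on the hot CORE-AGG link = background traffic + the contribution
# of the flow moved off AGG1 + the contribution of the re-weighted flow.
CAPACITY, BACKGROUND = 1000, 620

def hot_link_load(moved_flow, other_flow):
    load = BACKGROUND + moved_flow + other_flow
    status = "OK" if load <= CAPACITY else "CONGESTED"
    return f"{BACKGROUND} + {moved_flow} + {other_flow} = {load} ({status})"

print(hot_link_load(150, 150))  # initial ECMP: 920, OK
print(hot_link_load(300, 150))  # naive drain of AGG1: 1070, CONGESTED
print(hot_link_load(300, 50))   # weighted ECMP re-split: 970, OK
```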
Slide 9: Traffic distribution transition
- Initial traffic distribution (congestion-free): [Figure: ToR1 splits its 600 as 300 via AGG1 and 300 via AGG2.]
- Final traffic distribution (congestion-free): [Figure: AGG1 is drained; ToR1 sends 0 via AGG1 and all 600 via AGG2, and ToR5 re-splits its 600 as 500/100 with weighted ECMP.]
- The transition between the two: simple? NO! Asynchronous switch updates get in the way.
Slide 10: Asynchronous changes can cause transient congestion
- When ToR1 has been changed but ToR5 has not yet been:
- [Figure: link capacity 1000; AGG1 drained. ToR1 already sends all 600 via AGG2 (300 per CORE link), but ToR5 still uses its old 300/300 split, so the hot CORE-AGG link carries 620 + 300 + 150 = 1070 and is congested.]

Slide 11: Solution: introducing an intermediate step
- Split the transition in two: initial -> intermediate -> final traffic distribution
- Initial -> intermediate: congestion-free regardless of the asynchronization
- Intermediate -> final: congestion-free regardless of the asynchronization
- [Figure: the intermediate distribution moves both flows partway (splits such as 200/400 and 450/150 instead of jumping straight to 0/600 and 500/100), so every interleaving of switch updates keeps every link within its 1000 capacity.]
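To see why the intermediate step helps, the sketch below enumerates every updated/not-yet-updated combination of the two ToRs for each transition and reports the worst-case hot-link load. The per-flow contributions are inferred from the example's figures, so treat the exact values as illustrative.

```python
from itertools import product

CAPACITY, BACKGROUND = 1000, 620

# Each flow's contribution to the hot CORE-AGG link in the three
# distributions (inferred from the slides' example).
initial      = {"ToR1": 150, "ToR5": 150}
intermediate = {"ToR1": 200, "ToR5": 75}
final        = {"ToR1": 300, "ToR5": 50}

def worst_case(before, after):
    """Max hot-link load over all 2^N updated/not-yet-updated states."""
    flows = list(before)
    return max(
        BACKGROUND + sum((after if done else before)[f]
                         for f, done in zip(flows, bits))
        for bits in product([False, True], repeat=len(flows))
    )

print(worst_case(initial, final))         # 1070 > 1000: one-step congests
print(worst_case(initial, intermediate))  # 970 <= 1000: step 1 is safe
print(worst_case(intermediate, final))    # 995 <= 1000: step 2 is safe
```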
Slide 12: How zUpdate performs congestion-free update
- The operator gives zUpdate an update scenario and its update requirements.
- zUpdate computes a chain of traffic distributions: current -> intermediate -> ... -> intermediate -> target.
- Every transition in the chain is congestion-free; zUpdate pushes them to the data center network one at a time.
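A minimal sketch of this pipeline, with hypothetical interface names (find_plan, solve_lp, apply_distribution are ours, not zUpdate's API); we assume the planner adds intermediate distributions until the LP of slide 20 becomes feasible, matching the chained boxes on the slide.

```python
def find_plan(current, requirements, solve_lp, max_steps=4):
    """Try plans with more and more intermediate distributions until
    the LP (slide 20) finds a congestion-free chain, if one exists."""
    for n_intermediates in range(max_steps):
        plan = solve_lp(current, requirements, n_intermediates)
        if plan is not None:
            return plan            # [intermediate, ..., target]
    raise RuntimeError("no congestion-free plan found")

def execute(plan, apply_distribution):
    """Push each distribution in turn; each hop is applied atomically
    per flow via two-phase commit (slide 17)."""
    for distribution in plan:
        apply_distribution(distribution)
```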
Slide 13: Key technical issues
- Describing traffic distribution
- Representing update requirements
- Defining conditions for congestion-free transition
- Computing an update plan
- Implementing an update plan

Slide 14: Describing traffic distribution
- l^f_{v,u}: flow f's load on the link from switch v to switch u
- [Figure: flow f enters ToR s1 with 600, splits into 300 on each link to AGG s2 and s3 (l^f_{s1,s2} = 300), then 150 on each link to CORE s4 and s5 (l^f_{s2,s4} = 150).]
- Traffic distribution: the set of l^f_{v,u} over all flows f and links (v,u)
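One natural in-memory encoding of this notation, a sketch with names of our choosing, is a mapping from (flow, link) pairs to loads:

```python
# Traffic distribution as {(flow, (v, u)): load}, mirroring the
# slide's l^f_{v,u} notation for the example flow f.
distribution = {
    ("f", ("s1", "s2")): 300,  # ToR s1 -> AGG s2
    ("f", ("s1", "s3")): 300,  # ToR s1 -> AGG s3
    ("f", ("s2", "s4")): 150,  # AGG s2 -> CORE s4
    ("f", ("s2", "s5")): 150,  # AGG s2 -> CORE s5
}

def link_load(distribution, link):
    """Total load on a link: sum of l^f_{v,u} over all flows f."""
    return sum(load for (f, l), load in distribution.items() if l == link)

assert link_load(distribution, ("s1", "s2")) == 300
```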
Slide 15: Representing update requirements
- [Figure: flow f from ToR s1 toward CORE s4, s5 via AGG s2 (being drained) and s3.]
- To upgrade switch s2, drain it, i.e. constrain flow f to avoid it: l^f_{s1,s2} = 0
- When s2 recovers, restore ECMP: l^f_{s1,s2} = l^f_{s1,s3}

Slide 16: Switch asynchronization exponentially inflates the possible load values
- Transition from an old traffic distribution to a new one
- [Figure: flow f enters at switch 1 (ingress), exits at switch 8 (egress), and fans out over switches 2-7 onto link (7,8).]
- Asynchronous updates to the switches can result in exponentially many (2^N) possible load values on link (7,8) during the transition.
- In large networks, checking every possible load value against link capacity is intractable.
Slide 17: Two-phase commit reduces the possible load values to two
- [Figure: same topology; the ingress switch performs a version flip, moving flow f from its old paths to its new paths atomically.]
- With two-phase commit, f's load on link (v,u) has only two possible values throughout a transition: l^f_{v,u}(old) or l^f_{v,u}(new).
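A toy model of the version-stamping idea behind two-phase commit (not a real OpenFlow API; names and structure are ours): internal switches hold rules for both versions, and the ingress decides which version each packet carries, so the flow switches paths atomically.

```python
# (switch, version) -> next hop for flow f; both versions coexist.
rules = {
    ("s2", "old"): "s4",
    ("s2", "new"): "s5",
}

class Ingress:
    def __init__(self, version="old"):
        self.version = version

    def stamp(self, packet):
        packet["version"] = self.version   # phase 2: flip this field
        return packet

def forward(switch, packet):
    """Internal switch: match on the packet's version stamp."""
    return rules[(switch, packet["version"])]

ingress = Ingress()
print(forward("s2", ingress.stamp({})))   # 's4' (old path)
ingress.version = "new"                   # the version flip
print(forward("s2", ingress.stamp({})))   # 's5' (new path)
```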
Slide 18: Flow asynchronization exponentially inflates the possible load values
- [Figure: two flows f1 and f2 share link (7,8); each flips from its old paths to its new paths at its own ingress, at its own time.]
- With two flows, the load on link (7,8) during the transition can be any of:
  l_{7,8} = l^{f1}_{7,8}(old) + l^{f2}_{7,8}(old),
  or l^{f1}_{7,8}(old) + l^{f2}_{7,8}(new),
  or l^{f1}_{7,8}(new) + l^{f2}_{7,8}(old),
  or l^{f1}_{7,8}(new) + l^{f2}_{7,8}(new)
- Asynchronous updates to N independent flows can result in 2^N possible load values on link (7,8).

Slide 19: Handling flow asynchronization
- Basic idea: bound all 2^N combinations at once. Whatever the interleaving, each flow contributes at most max{l^f_{7,8}(old), l^f_{7,8}(new)}, so it suffices to check the sum of these per-flow maxima.
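The sketch below spells out the basic idea on made-up numbers: it enumerates all 2^N interleavings for three flows and checks that the sum of per-flow maxima bounds every one of them.

```python
from itertools import product

# Hypothetical old/new loads of three flows on link (7,8).
old = {"f1": 300, "f2": 150, "f3": 80}
new = {"f1": 100, "f2": 250, "f3": 120}

flows = list(old)

# Every interleaving: each flow is independently old or new (2^N states).
all_loads = [
    sum((new if flipped else old)[f] for f, flipped in zip(flows, bits))
    for bits in product([False, True], repeat=len(flows))
]

# The per-flow maximum bound from slide 19.
bound = sum(max(old[f], new[f]) for f in flows)

assert max(all_loads) <= bound   # the bound covers every interleaving
print(len(all_loads), max(all_loads), bound)   # 8 states, 670, 670
```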
- [Congestion-free transition constraint] There is no congestion throughout a transition if and only if, for every link (v,u) with capacity c_{v,u}:
  sum over flows f of max{ l^f_{v,u}(old), l^f_{v,u}(new) } <= c_{v,u}

Slide 20: Computing a congestion-free transition plan
- Formulated as a linear program:
  - Constant: current traffic distribution
  - Variables: intermediate traffic distribution(s), target traffic distribution
  - Constraints: congestion-free transitions between consecutive distributions; update requirements; deliver all traffic (flow conservation)
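A toy instance of this LP using PuLP (pip install pulp); the two-flow, two-link setup and every number are ours, and the real formulation ranges over all flows, links, and hops in the chain. The max{l(old), l(new)} term is linearized with one auxiliary variable per flow-link pair.

```python
from pulp import LpProblem, LpVariable, LpMinimize, lpSum

CAP = {"A": 1000, "B": 1000}
demand = {"f1": 600, "f2": 300}
# l^f_link(old): the current (constant) traffic distribution.
old = {("f1", "A"): 300, ("f1", "B"): 300,
       ("f2", "A"): 150, ("f2", "B"): 150}

prob = LpProblem("congestion_free_transition", LpMinimize)

# Variables: l^f_link(new), plus m linearizing max{l(old), l(new)}.
new = {(f, l): LpVariable(f"new_{f}_{l}", 0) for (f, l) in old}
m = {(f, l): LpVariable(f"max_{f}_{l}", 0) for (f, l) in old}

for k in old:
    prob += m[k] >= old[k]   # m >= l(old)
    prob += m[k] >= new[k]   # m >= l(new)

# Flow conservation: each flow delivers all of its traffic.
for f in demand:
    prob += lpSum(new[(f, l)] for l in CAP) == demand[f]

# Congestion-free transition: per-flow maxima fit on every link.
for l in CAP:
    prob += lpSum(m[(f, l)] for f in demand) <= CAP[l]

# Update requirement (drain example): f1 must leave link A entirely.
prob += new[("f1", "A")] == 0

# Objective: any feasible point works; minimize worst-case load.
prob += lpSum(m.values())

prob.solve()
print({k: v.value() for k, v in new.items()})
```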
Slide 21: Implementing an update plan
- Computation time
- Switch table size limit: weighted ECMP only for critical flows (those traversing bottleneck links); plain ECMP for other flows
- Update overhead
- Failure during transition
- Traffic demand variation

Slide 22: Evaluations
- Testbed experiments
- Large-scale trace-driven simulations

Slide 23: Testbed setup
- Switches: OpenFlow 1.0; links: 10Gbps
- Scenario: drain AGG1
- [Figure: testbed Clos topology (CORE, AGG, ToRs 1-12) driven by a traffic generator; offered loads: ToR6,7: 6.2Gbps; ToR5: 6Gbps; ToR8: 6Gbps.]
Slide 24: zUpdate achieves congestion-free switch upgrade
- [Figure: initial, intermediate, and final traffic distributions; initial splits of 3Gbps are moved through intermediate splits (2, 3, 4, 4.5, 1.5 Gbps) to final splits (6, 5, 1 Gbps).]
- [Plot: real-time utilization of links CORE1-AGG3 and CORE3-AGG4 over 0-25 sec (y-axis 0.8-1.05); utilization stays at or below 1.0 throughout the transition.]

Slide 25: One-step update causes transient congestion
- [Figure: same initial distribution (3Gbps splits) and final distribution (6, 5, 1 Gbps), but applied in a single step.]
- [Plot: real-time utilization of links CORE1-AGG3 and CORE3-AGG4 over -1 to 15 sec (y-axis 0.7-1.1); utilization transiently exceeds 1.0, i.e., congestion and packet loss during the transition.]

Slide 26: Large-scale trace-driven simulations
- A production DCN topology (CORE/AGG/ToR) with a new switch being on-boarded
- Test flows: 1% of flows

Slide 27: zUpdate beats alternative solutions
- [Chart: transition and post-transition loss rates (%, 0-15) for each scheme.]
- Number of steps per scheme:
  zUpdate          2
  zUpdate-OneStep  1
  ECMP-OneStep     1
  ECMP-Planned     300+
- zUpdate achieves a lossless update in only 2 steps; the one-step alternatives suffer loss, and ECMP-Planned needs 300+ steps.
Slide 28: Conclusion
- Switch and flow asynchronization can cause severe congestion during DCN updates
- We present zUpdate for congestion-free DCN updates:
  - Novel algorithms to compute update plans
  - A practical implementation on commodity switches
  - Evaluations on a real DCN topology and update scenarios

Slide 29: Thanks & Questions?

Slide 30 (backup): Updating DCN is a painful process
- Scenario: a switch upgrade affecting interactive applications; Bob is the operator
- Application owners ask: Any performance disruption? How bad will the latency be? How long will the disruption last? What servers will be affected?

Slide 31 (backup): Network update: a tussle between applications and operators
- Applications want network updates to be fast and seamless: updates can happen on demand, with no performance disruption during the update
- Network updates are time-consuming: today an update is planned and executed by hand, with rollbacks in unplanned cases
- Network updates are risky: human errors, accidents

Slide 32 (backup): Challenges in congestion-free DCN update
- Many switches are involved, requiring a multi-step plan
- Different scenarios have distinct requirements: switch upgrade/failure recovery, new switch on-boarding, load balancer reconfiguration, VM migration
- Coordination between changes in routing (network) and traffic demand (applications)

Slide 33 (backup): Related work
- SWAN [SIGCOMM'13]: tunnel-based traffic engineering that maximizes network utilization
- Reitblatt et al. [SIGCOMM'12]: control-plane consistency during network updates; per-packet and per-flow consistency cannot guarantee freedom from congestion
- Raza et al. [ToN 2011], Ghorbani et al. [HotSDN'12]: each targets one specific scenario (IGP update, VM migration), changing one link weight or migrating one VM at a time