A Model for Application Slowdown Estimation in On-Chip Networks and Its Use for Improving System Fairness and Performance

Xiyue Xiang*, Saugata Ghose, Onur Mutlu, Nian-Feng Tzeng*
* University of Louisiana at Lafayette; Carnegie Mellon University; ETH Zürich

Executive Summary

Problem: inter-application interference in on-chip networks (NoCs)
  • In a multicore processor, interference can occur due to NoC contention
  • Interference causes applications to slow down unfairly
Goal: estimate NoC-level slowdown at runtime, and use slowdown information to improve system fairness and performance
Our Approach
  • NoC Application Slowdown Model (NAS): first online model to quantify inter-application interference in NoCs
  • Fairness-Aware Source Throttling (FAST): throttle network injection rate of processor cores based on slowdown estimate from NAS
Results

  • NAS is very accurate and scalable: 4.2% error rate on average (8×8 mesh)
  • FAST improves system fairness by 9.5%, and performance by 5.2% (compared to a baseline without source throttling on an 8×8 mesh)

Motivation: Interference in NoCs

[Figure: slowdown of lbm, leslie3d, mcf, and Gems... when 16 copies of each application run concurrently on a 64-core processor; bar labels 1.6 and 2.7 are visible]
  • Root cause: NoC bandwidth is shared
  • Interference slows down applications and increases system unfairness

NAS: NoC Application Slowdown Model

slowdown = T_shared / T_alone
  • T_shared: measured directly
  • T_alone: unknown at runtime, so NAS relies on online estimation of application stall time due to interference
Challenges:
  • Flit-level delay ≠ slowdown
    - Each request involves multiple packets
    - A packet is formed by multiple flits
  • Random and distributive
  • Overlapped delay
Basic idea: track delay and calculate t_stall

Flit-Level Interference

[Figure: 4×4 mesh with nodes numbered 0 to 15; each router node connects a core, an L1 cache with MSHRs, and a shared LLC slice]
Three interference events:
  • Injection
  • Virtual channel arbitration
  • Switch arbitration
Each flit carries an additional field, t_flit
  • If the flit loses arbitration, t_flit = t_flit + 1
  • Sum up arbitration delays due to interference

Packet-Level Interference

[Figure: arrival timelines of flits f1 to f5 in an alone run and in a shared run]
Alone run (M = 5): t_reassembly = M cycles
  • A packet's flits arrive consecutively (cycles 1 to 5) when there is no interference
Shared run: T_first_arrival = 3, T_last_arrival = 11
  • Arrivals spread beyond the M-cycle reassembly window
Track the increase in packet reassembly time
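
The flit-level and packet-level bookkeeping above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' hardware design. The names `Flit`, `on_arbitration`, and `reassembly_increase` are hypothetical, and two points are assumptions: a lost arbitration counts as interference only when the winner belongs to another application, and reassembly time is counted inclusively (last arrival minus first arrival plus one), which agrees with the alone run on the slide, where arrivals in cycles 1 to 5 give M = 5.

```python
from dataclasses import dataclass


@dataclass
class Flit:
    """Flit carrying the extra interference-delay field from the slide."""
    app_id: int       # application that injected the flit (hypothetical field)
    t_flit: int = 0   # cycles lost so far to arbitration against other applications


def on_arbitration(flit: Flit, won: bool, winner_app_id: int) -> None:
    """Update the delay field after one injection, VC, or switch arbitration.

    Assumption: a lost cycle counts as interference only when the winning
    flit belongs to a different application.
    """
    if not won and winner_app_id != flit.app_id:
        flit.t_flit += 1


def reassembly_increase(t_first_arrival: int, t_last_arrival: int, num_flits: int) -> int:
    """Extra cycles needed to reassemble a packet compared with an alone run.

    In an alone run the M flits arrive back to back, so reassembly takes M
    cycles. Assumption: arrival cycles are counted inclusively.
    """
    shared_reassembly = t_last_arrival - t_first_arrival + 1
    return max(0, shared_reassembly - num_flits)


# Flit-level example: one loss to another application, one loss to the same one.
flit = Flit(app_id=0)
on_arbitration(flit, won=False, winner_app_id=1)  # counted as interference
on_arbitration(flit, won=False, winner_app_id=0)  # same application, not counted
on_arbitration(flit, won=True, winner_app_id=0)   # arbitration won, nothing added
print(flit.t_flit)

# Packet-level example with the slide's numbers: M = 5, first arrival 3, last 11.
print(reassembly_increase(3, 11, 5))
```

With the slide's shared-run numbers, the reassembly window stretches from 5 to 9 cycles under this counting convention, an increase of 4 cycles.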

Request-Level Interference

[Figure: a request travels from Node S to Node D and its response returns, shown in five steps]
  1. The request packet is delayed by 5 cycles due to inter-application interference
  2. The request packet info is registered in the NI inheritance table (t_packet = 5)
  3. Cache access at the LLC slice
  4. The response packet is generated, inheriting t_packet from the table
  5. The response packet is delayed by 3 cycles due to inter-application interference
Final value of t_packet is 8 cycles

Inheritance Table (in the NI)
  reqID   mshrID   t_packet
  ...     ...      5

  • Leverage closed-loop packet behavior to accumulate t_packet
  • Inheritance table: lump sum of t_packet for associated packets
  • Sum up delays of all associated packets (here, 5 + 3 = 8 cycles)

Application Stall Time

[Figure: execution timeline labeled with T_critical and T_service. When ILP and MLP hide a request's latency, it is ignored; when the application stalls, the latency of the critical request counts]
A memory request becomes critical if
  1) it is the oldest instruction in the ROB and the ROB is full, and/or
  2) it is the oldest instruction in the LSQ and the LSQ is full when the next instruction is a memory instruction
For all critical requests: [equation not recoverable from the extracted slide text]
Count only request delays on the critical path of execution
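
The request-level accumulation and the stall-time step can be sketched as follows. This is a hedged illustration, not the authors' formulation: the stall-time equation is not legible in the slide text, so the sketch assumes that each critical request contributes its interference delay capped by the cycles the core actually stalled on it, and that T_alone is T_shared minus the accumulated stall time. The names `InheritanceTable`, `Request`, and `estimate_slowdown` are hypothetical.

```python
from dataclasses import dataclass


class InheritanceTable:
    """NI-side table that carries a request's delay over to its response."""

    def __init__(self) -> None:
        self._entries: dict[tuple[int, int], int] = {}

    def register(self, req_id: int, mshr_id: int, t_packet: int) -> None:
        """Record the delay accumulated by an arriving request packet."""
        self._entries[(req_id, mshr_id)] = t_packet

    def inherit(self, req_id: int, mshr_id: int) -> int:
        """Hand the recorded delay to the response packet and free the entry."""
        return self._entries.pop((req_id, mshr_id), 0)


@dataclass
class Request:
    delay: int         # summed interference delay of its packets, in cycles
    critical: bool     # True if the request stalled the core (ROB or LSQ full)
    stall_cycles: int  # cycles the core was stalled on this request


def estimate_slowdown(t_shared: int, requests: list[Request]) -> float:
    """Estimate slowdown as T_shared / T_alone.

    Assumptions: only critical requests count, each contribution is capped
    by the observed stall, and T_alone = T_shared - t_stall.
    """
    t_stall = sum(min(r.delay, r.stall_cycles) for r in requests if r.critical)
    t_alone = t_shared - t_stall
    if t_alone <= 0:
        raise ValueError("estimated stall time must be smaller than T_shared")
    return t_shared / t_alone


# Slide example: the request is delayed 5 cycles, the response 3 more.
table = InheritanceTable()
table.register(req_id=7, mshr_id=2, t_packet=5)
response_delay = table.inherit(req_id=7, mshr_id=2) + 3
print(response_delay)

# Made-up requests: only the two critical ones contribute (8 + 12 cycles).
requests = [
    Request(delay=response_delay, critical=True, stall_cycles=20),
    Request(delay=10, critical=False, stall_cycles=0),
    Request(delay=30, critical=True, stall_cycles=12),
]
slowdown = estimate_slowdown(t_shared=1000, requests=requests)
print(round(slowdown, 4))
```

The request IDs, cycle counts, and the 1000-cycle interval are made-up inputs; only the 5 + 3 = 8 cycle accumulation comes from the slide.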

Using NAS to Improve Fairness

NAS provides online estimation of slowdown
  • Sum up flit-level arbitration delays due to interference
  • Track increase in packet reassembly time
  • Sum up delays of all associated packets
  • Determine which request delays cause application stall
Goal: use NAS to improve system fairness and performance
  • FAST: Fairness-Aware Source Throttling

A New Metric: NoC Stall-Time Criticality

[Figure: slowdown (left axis, 0.0 to 3.0) and network intensity in MPKI (right axis, 0 to 120) for lbm, leslie3d, mcf, and GemsFDTD]
  • Interference in NoCs has uneven impact
  • NoC Stall-Time Criticality (STCnoc): [definition not recoverable from the extracted slide text]
  • Lower STCnoc <==> less sensitive to NoC-level interference, so a good candidate to be throttled down
  • FAST utilizes STCnoc to proactively estimate the expected impact of each L1 miss

Key Knobs of FAST

  • Rank based on slowdown
  • Classification based on network intensity
    - Latency-sensitive: spends more time in the core
    - Throughput-sensitive: network intensive
  • Throttle up
    - Latency-sensitive applications: improve system performance
    - Slower applications: optimize system fairness
  • Throttle down
    - Throughput-sensitive applications with lower STCnoc: reduce interference with lower negative impact on performance
    - Avoid throttling down the slowest application

Methodology

  • Processor: out-of-order, ROB / instruction window = 128
  • Caches: L1: 64KB, 16 MSHRs; L2: perfect shared
  • NoCs
    - Topology: 4×4 and 8×8 mesh
    - Router: conventional VC router with 8 VCs, 4 flits/VC

  • Workloads: multiprogrammed SPEC CPU2006
    - 90 randomly-chosen workloads
    - Categorized by network intensity (i.e., MPKI)

NAS is Accurate

[Figure: slowdown estimation error for 4×4 and 8×8 meshes, with bars labeled 2.6% and 4.2%; a "Network saturation" annotation reads 31.7%]
  • Slowdown estimation error: 4.2% (2.6%) for 8×8 (4×4)
  • Low estimated slowdown error consistently
  • Good scalability
  • NAS is highly accurate and scalable

FAST Improves Performance

[Figure: normalized weighted speedup for (a) mixed workloads and (b) heavy workloads; annotations read +5.0% and +5.2%]
  • FAST has better performance than both HAT and NoST

  • Inter-application interference is reduced
  • Only throttles applications with low negative impact (i.e., lower STCnoc)

FAST Reduces Unfairness

[Figure: normalized unfairness for (a) mixed workloads and (b) heavy workloads; annotations read -4.7% and -9.5%]
  • FAST can improve fairness
  • Source throttling allows slower applications to catch up
  • Uses runtime slowdown to identify and avoid throttling the slowest application

Conclusion

Problem: inter-application interference in on-chip networks (NoCs)
  • In a multicore processor, interference can occur due to NoC contention
  • Interference causes applications to slow down unfairly
Goal: estimate NoC-level slowdown at runtime, and use slowdown information to improve system fairness and performance
Our Approach
  • NoC Application Slowdown Model (NAS): first online model to quantify inter-application interference in NoCs
  • Fairness-Aware Source Throttling (FAST): throttle network injection rate of processor cores based on slowdown estimate from NAS

Results
  • NAS is very accurate and scalable: 4.2% error rate on average (8×8 mesh)
  • FAST improves system fairness by 9.5%, and performance by 5.2% (compared to a baseline without source throttling on an 8×8 mesh)

Backup Slides

Related Works

  • Slowdown modeling
    - Fine grained: [Mutlu+ MICRO 07], [Ebrahimi+ ASPLOS 10], [Bois+ TACO 13]
    - Coarse grained: [Subramanian+ HPCA 13], [Subramanian+ MICRO 15]
  • Source throttling: [Chang+ SBAC-PAD 12], [Nychis+ SIGCOMM 12], [Nychis+ HotNets 10]
  • Application mapping: [Chou+ ICCD 08], [Das+ HPCA 13]
  • Prioritization: [Das+ MICRO 09], [Das+ ISCA 10]
  • Scheduling: [Kim+ MICRO 10]
  • QoS: [Grot+ MICRO 09], [Grot+ ISCA 11], [Lee+ ISCA 08]

Hardware Cost of NAS

  Location   Component                                                   Cost
  Router     Interference delay of each flit                             5.3% wider data path
  NI         Timestamp of the first and last arrival flit of a packet    (16+16)×16 bits
             Inheritance table                                           (6+4+8)×20 bits
             Interference delay of the request                           8 bits
             Timestamp when processor stalls                             16 bits
  Core       Estimated application stall time                            16 bits

Total cost of NAS per node: 114 Bytes + 5.3% router area

NAS Error Distribution

[Figure: fraction of application instances (0% to 50%) versus binned slowdown estimation error (10% to 100%), plotted over 7,200 application instances]
  • 66.0% of application instances with < 10% error
  • 84.3% of application instances with < 20% error
  • 5.6% of application instances with 40% error [comparison symbol missing in the extracted text]
  • NAS exhibits high accuracy most of the time
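
The per-node storage total on the Hardware Cost of NAS slide can be cross-checked with a short calculation. The sketch below assumes that the two parenthesized costs are products (bits per entry times number of entries, the operator having been lost in extraction) and that the 114-byte total covers the NI and core fields but not the wider router data path. The dictionary and function names are hypothetical.

```python
# Storage fields listed on the Hardware Cost of NAS slide, in bits.
# Assumption: (16+16)16 and (6+4+8)20 on the slide are products.
NAS_STORAGE_BITS = {
    "first/last flit arrival timestamps": (16 + 16) * 16,
    "inheritance table": (6 + 4 + 8) * 20,
    "interference delay of the request": 8,
    "timestamp when processor stalls": 16,
    "estimated application stall time": 16,
}


def total_storage_bytes(fields: dict[str, int]) -> int:
    """Sum the field sizes and round up to whole bytes."""
    total_bits = sum(fields.values())
    return (total_bits + 7) // 8


for name, bits in NAS_STORAGE_BITS.items():
    print(f"{name}: {bits} bits")
print(f"total: {total_storage_bytes(NAS_STORAGE_BITS)} bytes")
```

Under these assumptions the fields add up to 912 bits, which is 114 bytes and matches the total stated on the slide.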
