Sparse Computations: Better, Faster, Cheaper!
Padma Raghavan
Department of Computer Science and Engineering, The Pennsylvania State University
[email protected]
Challenges in Processing
High Dimensional Sparse Data
User: Will my application speed up and compute high-quality solutions?
Developer: Can I make sparse codes better, faster, cheaper?
One sparse dataset, 3 forms:
Matrix (algebraic)
Graph (combinatorial)
Mesh (geometric)
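These views are interchangeable. A toy sketch (hypothetical data) showing the same sparse dataset in its matrix (coordinate) and graph (adjacency) forms; a mesh view would attach geometry to the same connectivity:

```python
# One sparse dataset, two of its three forms: matrix (algebraic) and
# graph (combinatorial). Toy data for illustration only.

# Nonzeros of a symmetric 4x4 sparse matrix in coordinate (COO) form.
coo = {(0, 1): 2.0, (1, 0): 2.0, (1, 3): 5.0, (3, 1): 5.0, (2, 2): 7.0}

# The same dataset as a graph: rows/columns are vertices,
# off-diagonal nonzeros are weighted edges.
graph = {}
for (i, j), w in coo.items():
    if i != j:
        graph.setdefault(i, []).append((j, w))

print(graph)  # {0: [(1, 2.0)], 1: [(0, 2.0), (3, 5.0)], 3: [(1, 5.0)]}
```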
WHAT IS SPARSITY?
Consider N interacting variables with N×N pair-wise interactions.
Dense: N² elements in an N×N matrix/array.
Sparse: cN elements, where c is a small constant.
SPARSITY PROPERTIES
Compact representation.
Memory and computational costs scale as O(N) per sweep through the data.
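Both properties are visible in a minimal compressed sparse row (CSR) sketch with toy data: storage is proportional to the nonzeros rather than N², and a matrix-vector product is one sweep over the nonzeros:

```python
# Compressed sparse row (CSR) storage for a 4x4 matrix with 5 nonzeros:
# memory is O(nnz + N), not O(N^2).
vals = [2.0, 2.0, 5.0, 7.0, 5.0]   # nonzero values, row by row
cols = [1,   0,   3,   2,   1  ]   # column index of each value
ptr  = [0, 1, 3, 4, 5]             # row i occupies vals[ptr[i]:ptr[i+1]]

def smv(ptr, cols, vals, x):
    """y = A @ x in one sweep over the nonzeros: O(nnz) work."""
    y = [0.0] * (len(ptr) - 1)
    for i in range(len(ptr) - 1):
        for k in range(ptr[i], ptr[i + 1]):
            y[i] += vals[k] * x[cols[k]]
    return y

print(smv(ptr, cols, vals, [1.0, 1.0, 1.0, 1.0]))  # [2.0, 7.0, 7.0, 5.0]
```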
WHERE DOES SPARSITY COME FROM?
Discretization yields meshes, data mining yields graphs, and approximation yields sparse matrices.
Issue: Can we transform data for BETTER classification?
Focus: Quality of classification.
Method: Feature subspace transformation (FST) by force-directed graph embedding; apply K-Means on the transformed data (FST-K-Means).
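A minimal sketch of the pipeline's shape, not the authors' algorithm: embed a feature-similarity graph with a simple spring model, project documents into the embedded subspace, then cluster. The toy data, force law, and iteration schedule here are all assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
A = rng.random((20, 6))        # toy document-term matrix (docs x features)

S = A.T @ A                    # feature-feature similarity (co-occurrence)
np.fill_diagonal(S, 0.0)

n = S.shape[0]
pos = rng.standard_normal((n, 2))   # initial 2-D feature coordinates
for _ in range(200):                # force-directed embedding iterations
    disp = np.zeros_like(pos)
    for i in range(n):
        d = pos[i] - pos                      # vectors from each j to i
        dist = np.linalg.norm(d, axis=1)
        dist[i] = 1.0                         # self term contributes zero
        disp[i] = ((d / dist[:, None] ** 2).sum(axis=0)          # repulsion
                   - 0.05 * (S[i][:, None] * d).sum(axis=0))     # attraction
    pos += 0.01 * disp

X = A @ pos                    # documents in the transformed feature subspace
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels)
```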
Issue: Can we make sparse matrix-vector multiplication (SMV) FASTER on multicores?
Focus: Performance.
Method: Identify and use smaller dense submatrices hidden in the sparse matrix -- sparsity-exploiting effectively dense SMV (SpEED-SMV).
[Figures: obtaining a supernodal ordered matrix using the RCM ordering scheme; decreasing memory loads by reducing index overheads]
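The core idea can be illustrated with a toy sketch (not the authors' implementation; the block detection and ordering steps are omitted): each effectively dense block carries one anchor position, so the inner loop computes column indices instead of loading one index per nonzero:

```python
# After reordering, many nonzeros cluster into small dense blocks.
# Each block: (top-left row, top-left col, dense r x c values).
blocks = [
    (0, 0, [[4.0, 1.0],
            [1.0, 4.0]]),     # 2x2 effectively dense block
    (2, 3, [[3.0, 2.0]]),     # 1x2 block
]

def speed_like_smv(blocks, x, n_rows):
    """y = A @ x over dense blocks: fewer index loads than per-element CSR."""
    y = [0.0] * n_rows
    for r0, c0, vals in blocks:
        for di, row in enumerate(vals):
            acc = 0.0
            for dj, v in enumerate(row):
                acc += v * x[c0 + dj]   # column index computed, not loaded
            y[r0 + di] += acc
    return y

print(speed_like_smv(blocks, [1.0, 2.0, 1.0, 1.0, 1.0], 3))  # [6.0, 9.0, 5.0]
```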
Issue: Can we schedule for energy efficiency (CHEAPER)?
Focus: Reduce the energy-delay product (EDP).
Method: Dynamic scheduling with helper threads that learn the application profile to select the number of cores, threads, and voltage/frequency levels.
Energy-delay product (EDP = energy × time) is lower for the faster system A.
Challenges on multicores: runaway leakage on idle cores, thermal emergencies, transient errors.
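Worked example with illustrative numbers: if system A finishes in 10 s drawing 50 W, then E = 500 J and EDP = 5,000 J·s; if a slower system B takes 20 s at 30 W, then E = 600 J and EDP = 12,000 J·s. The faster system wins on EDP despite its higher power draw.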
[Figure: classification accuracy P (percent) of K-Means, GraClus, and FST-K-Means on the benchmark data sets]
P = 100 × (number of correctly classified documents) / (total number of documents).
FST-K-Means improves the classification accuracy on 5 out of 8 datasets.
Cluster cohesiveness: $J(M_1, M_2, \ldots, M_k) = \sum_{i=1}^{k} \sum_{m_{ij} \in M_i} \lVert m_{ij} - w_i \rVert^2$, where $m_{ij}$ is the $j$th document of cluster $M_i$ and $w_i$ is the centroid of cluster $M_i$ (lower $J$ means tighter clusters).
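For concreteness, a tiny computation of J on a hypothetical 2-cluster partition:

```python
# Cohesiveness J: sum of squared distances from each document to its
# cluster centroid (lower = tighter). Toy 2-D documents.
clusters = [
    [[0.0, 0.0], [0.0, 2.0]],   # cluster M_1
    [[5.0, 5.0], [7.0, 5.0]],   # cluster M_2
]

J = 0.0
for M in clusters:
    w = [sum(col) / len(M) for col in zip(*M)]          # centroid w_i
    J += sum(sum((a - b) ** 2 for a, b in zip(m, w)) for m in M)

print(J)  # 2.0 + 2.0 = 4.0
```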
Helper threads recipe (see the sketch below):
Collect performance/energy statistics as the execution progresses.
Calculate EDP values.
Use data fitting on the available data and interpolate to predict the best configuration.
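A minimal sketch of this monitor-model-adapt loop, reduced to a single voltage/frequency knob with illustrative numbers (the paper's scheme also searches over core and thread counts):

```python
import numpy as np

samples = []  # (voltage/frequency level, measured EDP)

def monitor(level, energy_joules, time_seconds):
    """Record one observed configuration and its EDP = energy * time."""
    samples.append((level, energy_joules * time_seconds))

def adapt(candidate_levels):
    """Fit a quadratic EDP model to the samples and pick the best level."""
    lv, edp = zip(*samples)
    coeffs = np.polyfit(lv, edp, deg=2)               # data fitting
    predicted = np.polyval(coeffs, candidate_levels)  # interpolation
    return candidate_levels[int(np.argmin(predicted))]

# Three trial runs at different DVFS levels (illustrative numbers):
monitor(1, energy_joules=600, time_seconds=20)   # slow, low power
monitor(3, energy_joules=550, time_seconds=11)
monitor(5, energy_joules=800, time_seconds=8)    # fast, high power
print(adapt([1, 2, 3, 4, 5]))  # -> 4, the level minimizing predicted EDP
```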
[Figure: performance comparison of SpEED-SMV vs. RxC-SMV on the benchmark data sets]
Best EDP adaptivity: Monitor-Model-Adapt.
[Figure: cluster cohesiveness J of K-Means, GraClus, and FST-K-Means on the benchmark data sets; lower is better]
FST-K-Means improves cluster cohesiveness over K-Means and is competitive with GraClus on the benchmark datasets.
Anirban Chatterjee, Sanjukta Bhowmick, Padma Raghavan: Feature subspace transformations
for enhancing k-means clustering. CIKM 2010: 1801-1804
SpEED-SMV improves the performance of SMV by as much as 59.35% and 50.32%, and on average by 37.35% and 6.07%, compared to traditional compressed sparse row and blocked compressed sparse row schemes, respectively.
Manu Shantharam, Anirban Chatterjee, Padma Raghavan: Exploiting dense substructures for fast
sparse matrix vector multiplication. IJHPCA 25(3): 328-341 (2011)
Helper-thread adaptation achieves up to 66.3% and 83.3% savings in EDP when all the parameters are adjusted properly in the FFT and MG applications, respectively.
Yang Ding, Mahmut T. Kandemir, Padma Raghavan, Mary Jane Irwin: A helper thread based EDP
reduction scheme for adapting application execution in CMPs. IPDPS 2008: 1-14
Software for sparse computations can deliver high-quality solutions that are scalable and energy efficient by exploiting latent structure in the data and adapting to match hardware attributes.
This research was funded in part by NSF grants CSR-SMA 0720769, CCF 0830679, and OCI 0821527.