10th DIMACS Implementation Challenge
Updated July 2012
http://www.cc.gatech.edu/dimacs10/index.shtml
http://www.cise.ufl.edu/research/sparse/dimacs10
As stated on their main website (
http://dimacs.rutgers.edu/Challenges/ ), the "DIMACS Implementation
Challenges address questions of determining realistic algorithm
performance where worst case analysis is overly pessimistic and
probabilistic models are too unrealistic: experimentation can provide
guides to realistic algorithm performance where analysis fails."
For the 10th DIMACS Implementation Challenge, the two related
problems of graph partitioning and graph clustering were chosen.
Graph partitioning and graph clustering are among the aforementioned
questions or problem areas where theoretical and practical results
deviate significantly from each other, so that experimental outcomes
are of particular interest.
Problem Motivation
Graph partitioning and graph clustering are ubiquitous subtasks in
many application areas. Generally speaking, both techniques aim at
the identification of vertex subsets with many internal and few
external edges. To name only a few, problems addressed by graph
partitioning and graph clustering algorithms are:
* What are the communities within an (online) social network?
* How do I speed up a numerical simulation by mapping it
efficiently onto a parallel computer?
* How must components be organized on a computer chip such that
they can communicate efficiently with each other?
* What are the segments of a digital image?
* Which functions are certain genes (most likely) responsible
for?
Challenge Goals
* One goal of this Challenge is to create a reproducible picture
of the state-of-the-art in the area of graph partitioning
(GP) and graph clustering (GC) algorithms. To this end we
are identifying a standard set of benchmark instances and
generators.
* Moreover, after initiating a discussion with the community, we
would like to establish the most appropriate problem
formulations and objective functions for a variety of
applications.
* Another goal is to enable current researchers to compare their
codes with each other, in hopes of identifying the most
effective algorithmic innovations that have been proposed.
* The final goal is to publish proceedings containing results
presented at the Challenge workshop, and a book containing
the best of the proceedings papers.
Problems Addressed
The precise problem formulations need to be established in the course
of the Challenge. The descriptions below serve as a starting point.
* Graph partitioning:
The most common formulation of the graph partitioning problem
for an undirected graph G = (V,E) asks for a division of V into
k pairwise disjoint subsets (partitions) such that all
partitions are of approximately equal size and the edge-cut,
i.e., the total number of edges having their incident nodes in
different subdomains, is minimized. The problem is known to be
NP-hard.
* Graph clustering:
Clustering is an important tool for investigating the
structural properties of data. Generally speaking, clustering
refers to the grouping of objects such that objects in the same
cluster are more similar to each other than to objects of
different clusters. The similarity measure depends on the
underlying application. Clustering graphs usually refers to the
identification of vertex subsets (clusters) that have
significantly more internal edges (to vertices of the same
cluster) than external ones (to vertices of another cluster).
There are 12 data sets in the DIMACS10 collection:
clustering: real-world graphs commonly used as benchmarks
coauthor: citation and co-author networks
Delaunay: Delaunay triangulations of random points in the plane
dyn-frames: frames from a 2D dynamic simulation
Kronecker: synthetic graphs from the Graph500 benchmark
numerical: graphs from numerical simulation
random: random geometric graphs (random points in the unit square)
streets: real-world street networks
Walshaw: Chris Walshaw's graph partitioning archive
matrix: graphs from the UF collection (not added here)
redistrict: census networks
star-mixtures : artificially generated from sets of real graphs
Some of the graphs already exist in the UF Collection. In some cases,
the original graph is unsymmetric, with values, whereas the DIMACS
graph is the symmetrized pattern of A+A'. Rather than add duplicate
patterns to the UF Collection, a MATLAB script is provided at
http://www.cise.ufl.edu/research/sparse/dimacs10 which downloads
each matrix from the UF Collection via UFget, and then performs whatever
operation is required to convert the matrix to the DIMACS graph problem.
Also posted at that page is a MATLAB code (metis_graph) for reading the
DIMACS *.graph files into MATLAB.
--------------------------------------------------------------------------------
clustering: Clustering Benchmarks
--------------------------------------------------------------------------------
These real-world graphs are often used as benchmarks in the graph
clustering and community detection communities. All but 4 of the 27
graphs already appear in the UF collection in other groups. The
DIMACS10 version is always symmetric, binary, and with zero-free
diagonal. The version in the UF collection may not have those
properties, but in those cases, if the pattern of the UF matrix
is symmetrized and the diagonal removed, the result is the DIMACS10
graph.
DIMACS10 graph: new? UF matrix:
--------------- ---- -------------
clustering/adjnoun Newman/adjoun
clustering/as-22july06 Newman/as-22july06
clustering/astro-ph Newman/astro-ph
clustering/caidaRouterLevel * DIMACS10/caidaRouterLevel
clustering/celegans_metabolic Arenas/celegans_metabolic
clustering/celegansneural Newman/celegansneural
clustering/chesapeake * DIMACS10/chesapeake
clustering/cnr-2000 LAW/cnr-2000
clustering/cond-mat-2003 Newman/cond-mat-2003
clustering/cond-mat-2005 Newman/cond-mat-2005
clustering/cond-mat Newman/cond-mat
clustering/dolphins Newman/dolphins
clustering/email Arenas/email
clustering/eu-2005 LAW/eu-2005
clustering/football Newman/football
clustering/hep-th Newman/hep-th
clustering/in-2004 LAW/in-2004
clustering/jazz Arenas/jazz
clustering/karate Arenas/karate
clustering/lesmis Newman/lesmis
clustering/netscience Newman/netscience
clustering/PGPgiantcompo Arenas/PGPgiantcompo
clustering/polblogs Newman/polblogs
clustering/polbooks Newman/polbooks
clustering/power Newman/power
clustering/road_central * DIMACS10/road_central
clustering/road_usa * DIMACS10/road_usa
the following graphs were added on July 2012:
G_n_pin_pout
preferentialAttachment
smallworld
uk-2002 was 'added' on July 2012 to the dimacs10 MATLAB interface,
but it already appears as the LAW/uk-2002 matrix.
uk-2007-05 is in the DIMACS10 collection but is not yet added here,
because it's too large for the file format of the UF collection.
--------------------------------------------------------------------------------
coauthor: Citation Networks
--------------------------------------------------------------------------------
These graphs are examples of social networks used in R. Geisberger, P.
Sanders, and D. Schultes. Better approximation of betweenness
centrality. In 10th Workshop on Algorithm Engineering and
Experimentation, pages 90-108, San Francisco, 2008. SIAM.
--------------------------------------------------------------------------------
Delaunay: Delaunay Graphs
--------------------------------------------------------------------------------
These files have been generated as Delaunay triangulations of random
points in the unit square.
Engineering a scalable high quality graph partitioner,
M. Holtgrewe, P. Sanders, C. Schulz, IPDPS 2010
http://dx.doi.org/10.1109/IPDPS.2010.5470485
--------------------------------------------------------------------------------
dyn-frames: Frames from 2D Dynamic Simulations
--------------------------------------------------------------------------------
These files have been created with the generator described in Oliver
Marquardt, Stefan Schamberger: Open Benchmarks for Load Balancing
Heuristics in Parallel Adaptive Finite Element Computations. In Proc.
International Conference on Parallel and Distributed Processing
Techniques and Applications (PDPTA 2005), Volume 2, pp. 685-691. CSREA
Press 2005, ISBN 1-932415-59-9685-691.
The graphs are meshes taken from indivudual frames of a dynamic
sequence that resembles two-dimensional adaptive numerical simulations.
Smaller versions of these files (and their dynamic sequences as videos)
can be found on Stefan Schamberger's website (
http://www.upb.de/cs/schaum/benchmark.html ) dedicated to these
benchmarks. The files presented here are the frames 0, 10, and 20 of
the sequences, respectively.
--------------------------------------------------------------------------------
Kronecker: Kronecker Generator Graphs
--------------------------------------------------------------------------------
The original Kronecker files contain self-loops and multiple edges.
These properties are also present in real-world data sets. However,
some tools cannot handle these "artifacts" at the moment. That is why
we present "cleansed" versions of the data sets as well. For the
Challenge you should expect to be confronted with the original data
with self-loops and multiple edges. However, the final decision on
this issue will be made based on participant feedback.
All files have been generated with the R-MAT parameters A=0.57, B=0.19,
C=0.19, and D=1-(A+B+C)=0.05 and edgefactor=48, i.e., the number of
edges equals 48*n, where n is the number of vertices. Details about the
generator and the parameter meanings can be found on the Graph500
website. ( http://www.graph500.org/Specifications.html )
There are 12 graphs in the DIMACS10 test set at
http://www.cc.gatech.edu/dimacs10/index.shtml . Them come in 6 pairs.
One graph in each pair is a multigraph, with self-edges. The other
graph is the nonzero pattern of the first (binary), with self-edges
removed. MATLAB cannot directly represent multigraph, so in the UF
Collection the unweighted multigraph G is represented as a matrix A
where A(i,j) is an integer equal to the number edges (i,j) in G.
The binary graphs include the word 'simple' in their
name In the UF Collection, only the multigraph is included,
since the simple graph can be constructed from the multigraph.
If A is the multigraph, the simple graph S can be computed as:
S = spones (tril (A,-1)) + spones (triu (A,1)) ;
DIMACS10 graph: UF matrix:
--------------- -------------
kronecker/kron_g500-logn16 DIMACS10/kron_g500-logn16
kronecker/kron_g500-simple-logn16
kronecker/kron_g500-logn17 DIMACS10/kron_g500-logn17
kronecker/kron_g500-simple-logn17
kronecker/kron_g500-logn18 DIMACS10/kron_g500-logn18
kronecker/kron_g500-simple-logn18
kronecker/kron_g500-logn19 DIMACS10/kron_g500-logn19
kronecker/kron_g500-simple-logn19
kronecker/kron_g500-logn20 DIMACS10/kron_g500-logn20
kronecker/kron_g500-simple-logn20
kronecker/kron_g500-logn21 DIMACS10/kron_g500-logn21
kronecker/kron_g500-simple-logn21
References: "Introducing the Graph 500," Richard C. Murphy, Kyle B.
Wheeler, Brian W. Barrett, James A. Ang, Cray User's Group (CUG), May
5, 2010.
D.A. Bader, J. Feo, J. Gilbert, J. Kepner, D. Koester, E. Loh, K.
Madduri, W. Mann, Theresa Meuse, HPCS Scalable Synthetic Compact
Applications #2 Graph Analysis (SSCA#2 v2.2 Specification), 5
September 2007.
D. Chakrabarti, Y. Zhan, and C. Faloutsos, R-MAT: A recursive model
for graph mining, SIAM Data Mining 2004.
Section 17.6, Algorithms in C (third edition). Part 5 Graph
Algorithms, Robert Sedgewick (Programs 17.7 and 17.8)
P. Sanders, Random Permutations on Distributed, External and
Hierarchical Memory, Information Processing Letters 67 (1988) pp
305-309.
"DFS: A Simple to Write Yet Difficult to Execute Benchmark," Richard C.
Murphy, Jonathan Berry, William McLendon, Bruce Hendrickson, Douglas
Gregor, Andrew Lumsdaine, IEEE International Symposium on Workload
Characterizations 2006 (IISWC06), San Jose, CA, 25-27 October 2006.
---- sample code for generating these matrices:
function ij = kronecker_generator (SCALE, edgefactor)
%% Generate an edgelist according to the Graph500
%% parameters. In this sample, the edge list is
%% returned in an array with two rows, where StartVertex
%% is first row and EndVertex is the second. The vertex
%% labels start at zero.
%%
%% Example, creating a sparse matrix for viewing:
%% ij = kronecker_generator (10, 16);
%% G = sparse (ij(1,:)+1, ij(2,:)+1, ones (1, size (ij, 2)));
%% spy (G);
%% The spy plot should appear fairly dense. Any locality
%% is removed by the final permutations.
%% Set number of vertices.
N = 2^SCALE;
%% Set number of edges.
M = edgefactor * N;
%% Set initiator probabilities.
[A, B, C] = deal (0.57, 0.19, 0.19);
%% Create index arrays.
ij = ones (2, M);
%% Loop over each order of bit.
ab = A + B;
c_norm = C/(1 - (A + B));
a_norm = A/(A + B);
for ib = 1:SCALE,
%% Compare with probabilities and set bits of indices.
ii_bit = rand (1, M) > ab;
jj_bit = rand (1, M) > ( c_norm * ii_bit + a_norm * not (ii_bit) );
ij = ij + 2^(ib-1) * [ii_bit; jj_bit];
end
%% Permute vertex labels
p = randperm (N);
ij = p(ij);
%% Permute the edge list
p = randperm (M);
ij = ij(:, p);
%% Adjust to zero-based labels.
ij = ij - 1;
function G = kernel_1 (ij)
%% Compute a sparse adjacency matrix representation
%% of the graph with edges from ij.
%% Remove self-edges.
ij(:, ij(1,:) == ij(2,:)) = [];
%% Adjust away from zero labels.
ij = ij + 1;
%% Find the maximum label for sizing.
N = max (max (ij));
%% Create the matrix, ensuring it is square.
G = sparse (ij(1,:), ij(2,:), ones (1, size (ij, 2)), N, N);
%% Symmetrize to model an undirected graph.
G = spones (G + G.');
--------------------------------------------------------------------------------
numerical: graphs from numerical simulations
--------------------------------------------------------------------------------
For the graphs adaptive and venturiLevel3, please refer to the preprint:
Hartwig Anzt, Werner Augustin, Martin Baumann, Hendryk Bockelmann,
Thomas Gengenbach, Tobias Hahn, Vincent Heuveline, Eva Ketelaer,
Dimitar Lukarski, Andrea Otzen, Sebastian Ritterbusch, Bjo"rn Rocker,
Staffan RonnĂ¥s, Michael Schick, Chandramowli Subramanian, Jan-Philipp
Weiss, and Florian Wilhelm. Hiflow3 - a flexible and hardware-aware
parallel Finite element package. In Parallel/High-Performance Object-
Oriented Scientific Computing (POOSC'10).
For the graphs channel-500x100x100-b050 and packing-500x100x100-b050,
please refer to:
Markus Wittmann, Thomas Zeiser. Technical Note: Data Structures of
ILBDC Lattice Boltzmann Solver.
http://www.cc.gatech.edu/dimacs10/archive/numerical-overview-Erlangen.pdf
The instances NACA0015, M6, 333SP, AS365, and NLR are 2-dimensional FE
triangular meshes with coordinate information. 333SP and AS365 are actually
converted from existing 3-dimensional models to 2D places, while the rest
are created from geometry. The corresponding coordinate files have been
assembled in one archive. They have been created and contributed by Chan
Siew Yin with the help of Jian Tao Zhang, Department of Mechanical
Engineering, University of New Brunswick, Fredericton, Canada.
--------------------------------------------------------------------------------
random: Random Geometric Graphs
--------------------------------------------------------------------------------
rggX is a random geometric graph with 2^X vertices. Each vertex is a
random point in the unit square and edges connect vertices whose
Euclidean distance is below 0.55 (ln n)/n. This threshold was choosen
in order to ensure that the graph is almost connected.
Note: the UF Collection is a collection of matrices primarily from
real applications. The only random matrices I add to the collection
are those used in established benchmarks (such as DIMACS10).
Engineering a scalable high quality graph partitioner,
M. Holtgrewe, P. Sanders, C. Schulz, IPDPS 2010.
http://dx.doi.org/10.1109/IPDPS.2010.5470485
--------------------------------------------------------------------------------
steets: Street Networks
--------------------------------------------------------------------------------
The graphs Asia, Belgium, Europe, Germany, Great-Britain, Italy,
Luxemburg and Netherlands are (roughly speaking) undirected and
unweighted versions of the largest strongly connected component
of the corresponding Open Street Map road networks. The Open
Street Map road networks have been taken from
http://download.geofabrik.de and have been converted for DIMACS10
by Moritz Kobitzsch (kobitzsch at kit.edu) as follows:
First, we took the corresponding graph and extracted all routeable
streets. Routable streets are objects in this file that are tagged
using one of the following tags: motorway, motorway_link, trunk
trunk_link, primary, primary_link, secondary, secondary_link,
tertiary, tertiary_link, residential, unclassified, road,
living_street, and service. Next, we now compute the largest
strongly connected component of this extracted open street map graph.
Self-edges and parallel edges are removed afterwards.
The DIMACS 10 graph is now the undirected and unweighted version
of the constructed graph, i.e. if there is an edge (u,v) but the
reverse edge (v,u) is missing, we insert the reverse edge into the
graph. The XY coordinates of each node in the graph are preserved.
--------------------------------------------------------------------------------
Walshaw: Chris Walshaw's graph partitioning archive
--------------------------------------------------------------------------------
Chris Walshaw's graph partitioning archive contains 34 graphs that
have been very popular as benchmarks for graph partitioning algorithms
( http://staffweb.cms.gre.ac.uk/~wc06/partition/ ).
17 of them are already in the UF Collection. Only the 17 new graphs
not yet in the collection are added here in the DIMACS10 set.
DIMACS10 graph: new? UF matrix:
--------------- ---- -------------
walshaw/144 * DIMACS10/144
walshaw/3elt AG-Monien/3elt
walshaw/4elt Pothen/barth5
walshaw/598a * DIMACS10/598a
walshaw/add20 Hamm/add20
walshaw/add32 Hamm/add32
walshaw/auto * DIMACS10/auto
walshaw/bcsstk29 HB/bcsstk29
walshaw/bcsstk30 HB/bcsstk30
walshaw/bcsstk31 HB/bcsstk31
walshaw/bcsstk32 HB/bcsstk32
walshaw/bcsstk33 HB/bcsstk33
walshaw/brack2 AG-Monien/brack2
walshaw/crack AG-Monient/crack
walshaw/cs4 * DIMACS10/cs4
walshaw/cti * DIMACS10/cti
walshaw/data * DIMACS10/data
walshaw/fe_4elt2 * DIMACS10/fe_4elt2
walshaw/fe_body * DIMACS10/fe_body
walshaw/fe_ocean * DIMACS10/fe_ocean
walshaw/fe_pwt Pothen/pwt
walshaw/fe_rotor * DIMACS10/fe_rotor
walshaw/fe_sphere * DIMACS10/fe_sphere
walshaw/fe_tooth * DIMACS10/fe_tooth
walshaw/finan512 Mulvey/finan512
walshaw/m14b * DIMACS10/m14b
walshaw/memplus Hamm/memplus
walshaw/t60k * DIMACS10/t60k
walshaw/uk * DIMACS10/uk
walshaw/vibrobox Cote/vibrobox
walshaw/wave AG-Monien/wave
walshaw/whitaker3 AG-Monien/whitaker3
walshaw/wing * DIMACS10/wing
walshaw/wing_nodal * DIMACS10/wing_nodal
--------------------------------------------------------------------------------
redistrict: census networks
--------------------------------------------------------------------------------
These graphs represent US states. They are used for solving the
redistricting problem. All data have been provided by Will Zhao. As stated
on the project website, The nodes are Census2010 blocks. Two nodes have an
edge linking them if they share a line segment on their borders, i.e.
rook-style neighboring. The nodes weights are the POP100 variable of
Census2010-PL94, and the area of eache district.
--------------------------------------------------------------------------------
star-mixtures : artificially generated from sets of real graphs
--------------------------------------------------------------------------------
Each graph in this benchmark represents a star-like structure of different
graphs S0 , . . . , St. Graphs S1 , . . . , St are weakly connected to the
center S0 by random edges. The total number of edges between each Si and S0
was less than 3% out of the total number of edges in Si . The graphs are
mixtures of the following structures: social networks, finite-element
graphs, VLSI chips, peer-to-peer networks, and matrices from optimization
solvers.
More info can be found in the paper I. Safro, P. Sanders, C. Schulz:
Advanced Coarsening Schemes for Graph Partitioning, SEA 2012.
Communicated by Christian Schulz, uploaded on March 30, 2012.