GAP/GAP-web

GAP benchmark: web

Name	GAP-web
Group	GAP
Matrix ID	2853
Num Rows	50,636,151
Num Cols	50,636,151
Nonzeros	1,930,292,948
Pattern Entries	1,930,292,948
Kind	Directed Weighted Graph
Symmetric	No
Date	2017
Author	Laboratory for Web Algorithmics (LAW), Universita degli Studi di Milano, http://law.di.unimi.it/index.php
Editor	S. Beamer, K. Asanovic, D. Patterson

Structural Rank
Structural Rank Full
Num Dmperm Blocks
Strongly Connect Components	8,815,054
Num Explicit Zeros	0
Pattern Symmetry	12.5%
Numeric Symmetry	0%
Cholesky Candidate	no
Positive Definite	no
Type	integer

Download	MATLAB Rutherford Boeing Matrix Market
Notes	GAP/GAP-web matrix: this is the 'web' graph from the GAP benchmark suite, as described in "The GAP Benchmark Suite", by Scott Beamer, Krste Asanovic', and David Patterson, https://arxiv.org/abs/1508.03619 . In that paper, the Web graph is described as follows: Web (\|V\|=50.6M, \|E\|=1,949.4M, directed) is a web-crawl of the .sk domain (sk-2005) [9]. Despite its large size, it exhibits substantial locality due to its topology and high average degree. [9] Timothy Davis and Yifan Hu. The University of Florida sparse matrix collection. ACM Transactions on Mathematical Software, 38:1:1 – 1:25, 2011. The graph comes from the following source: Laboratory for Web Algorithmics (LAW), Universita degli Studi di Milano, http://law.di.unimi.it/index.php. When using matrices in the LAW/ group in the collection, please follow the citation instructions at http://law.di.unimi.it/datasets.php. If you publish results based on these graphs, please acknowledge the usage of WebGraph and LLP by quoting the following papers: [1] "The WebGraph Framework I: Compression Techniques," Paolo Boldi and Sebastiano Vigna, Proc. of the Thirteenth International World Wide Web Conference (WWW 2004), 2004, Manhattan, USA, pp. 595--601, ACM Press. [2] "Layered Label Propagation: A MultiResolution Coordinate-Free Ordering for Compressing Social Networks," Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna, Proceedings of the 20th international conference on World Wide Web, 2011, ACM Press. The graph was gathered by UbiCrawler; please acknowledge the usage of UbiCrawler by quoting the following paper: [3] "UbiCrawler: A Scalable Fully Distributed Web Crawler", Paolo Boldi, Bruno Codenotti, Massimo Santini, and Sebastiano Vigna, Software: Practice & Experience, 2004, vol 34, no. 8, pp. 711--726 "This graph has been obtained from a 2005 crawl of the .sk domain performed by UbiCrawler for a group of Slovakian researchers. An interesting feature of this crawl is that we were provided a very large seed." For additional graph properties and statistics, including node labels, see http://law.di.unimi.it/webdata/sk-2005 The graph already appears in the SuiteSparse Matrix Collection as LAW/sk-2005, but this GAP/GAP-web version is slightly different. The number of nodes differs by three. In the original LAW/sk-2005 graph, with 50,636,154 nodes, the last three nodes of the graph have no edges at all. These have been dropped from the GAP/GAP-web graph. The LAW graph includes 19,119,653 self-edges, but the GAP/GAP-web graph discards these. Finally, the LAW/sk-2005 graph is binary, but the GAP/GAP-web graph adds random edge weights. That is: Prob1 = ssget ('LAW/sk-2005') ; Prob2 = ssget ('GAP/GAP-web') ; A1 = Prob1.A ; A2 = Prob2.A ; % the LAW/sk-2005 graph is binary assert (isequal (A1, spones (A1))) ; % discard diagonal entries from LAW/sk-2005, and last 3 nodes S1 = tril (A1, -1) + triu (A1,1) ; S1 = S1 (1:end-3, 1:end-3) ; % discard the edge weights from GAP/GAP-web S2 = spones (A2) ; % now the graphs are identical: assert (isequal (S1, S2)) The original graph is from the Laboratory for Web Algorithmics (LAW), Universita degli Studi di Milano, http://law.di.unimi.it/index.php . It was added to the SuiteSparse Matrix Collection by Tim Davis, from the original LAW data, as the LAW/sk-2005 matrix. It was then incorporated from there into the GAP benchmark suite by Scott Beamer, Krste Asanovic', and David Patternson, with random edge weights added, based on the version of the graph at https://sparse.tamu.edu/LAW/sk-2005. Finally, the new GAP graph has been re-imported back into the SuiteSparse Matrix Collection, as the GAP/GAP-web graph. This is to preserve the random edge weights and source nodes added by S. Beamer et al., so that the GAP benchmark can be fully replicated in the SuiteSparse Matrix Collection. The GAP breadth-first-search (BFS) benchmark generates 64 random source nodes and evaluates the time to compute the BFS from each of the 64 sources. The betweenness-centrality (BC) runs 16 trials with 4 source nodes each. These source nodes are the same, in the same order. That is, the first 4 BFS source nodes are the same 4 source nodes for the first trial of BC. In one-based notation (where nodes are numbered 1 to n), the 64 source nodes are: 10219453 44758212 890672 13843757 14168063 20906931 12189585 26352336 43500687 8987025 5699763 41436456 5030728 40735219 16533564 28700167 64712 39634751 16037780 27152740 16404062 20491964 5322424 21420954 26622110 5882876 18091041 10665897 18634423 18138716 2355536 32885206 40657441 35196168 45544427 6175520 40058319 50626231 36571020 49397053 23434266 2299445 32873824 25978283 2461716 22787315 30759948 7428895 39173871 43194210 26361510 39747212 30670030 41483034 9358667 9945009 3355245 33831270 45124745 16137878 11235449 37509145 27402415 39546084 The first row are the 4 source nodes for the first BC trial, the 2nd row is for the second BC trial, and so on. These node ids also appear as Problem.aux.sources, in one-based notation. One-based notation is used because the Matrix Market format is one-based, as is MATLAB. If you use a zero-based method (such as the GAP benchmark itself, or the C API to GraphBLAS), be sure to subtract one from the node ids above (and in Problem.aux.sources) to obtain the right source nodes.

Download

MATLAB Rutherford Boeing Matrix Market

Notes

GAP/GAP-web matrix:  this is the 'web' graph from the GAP            
benchmark suite, as described in "The GAP Benchmark Suite", by Scott 
Beamer, Krste Asanovic', and David Patterson,                        
https://arxiv.org/abs/1508.03619 .                                   
                                                                     
In that paper, the Web graph is described as follows:                
                                                                     
    Web (|V|=50.6M, |E|=1,949.4M, directed) is a web-crawl of the    
    .sk domain (sk-2005) [9].  Despite its large size, it exhibits   
    substantial locality due to its topology and high average        
    degree.                                                          
                                                                     
    [9] Timothy Davis and Yifan Hu. The University of Florida sparse 
    matrix collection. ACM Transactions on Mathematical Software,    
    38:1:1 – 1:25, 2011.                                             
                                                                     
The graph comes from the following source:                           
                                                                     
    Laboratory for Web Algorithmics (LAW), Universita degli Studi di 
    Milano, http://law.di.unimi.it/index.php.  When using matrices   
    in the LAW/ group in the collection, please follow the citation  
    instructions at http://law.di.unimi.it/datasets.php.  If you     
    publish results based on these graphs, please acknowledge the    
    usage of WebGraph and LLP by quoting the following papers:       
                                                                     
    [1] "The WebGraph Framework I: Compression Techniques," Paolo    
    Boldi and Sebastiano Vigna, Proc. of the Thirteenth              
    International World Wide Web Conference (WWW 2004), 2004,        
    Manhattan, USA, pp.  595--601, ACM Press.                        
                                                                     
    [2] "Layered Label Propagation: A MultiResolution                
    Coordinate-Free Ordering for Compressing Social Networks," Paolo 
    Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna,        
    Proceedings of the 20th international conference on World Wide   
    Web, 2011, ACM Press.                                            
                                                                     
    The graph was gathered by UbiCrawler; please acknowledge the     
    usage of UbiCrawler by quoting the following paper:              
                                                                     
    [3] "UbiCrawler: A Scalable Fully Distributed Web Crawler",      
    Paolo Boldi, Bruno Codenotti, Massimo Santini, and Sebastiano    
    Vigna, Software: Practice & Experience, 2004, vol 34, no. 8, pp. 
    711--726                                                         
                                                                     
    "This graph has been obtained from a 2005 crawl of the .sk       
    domain performed by UbiCrawler for a group of Slovakian          
    researchers. An interesting feature of this crawl is that we     
    were provided a very large seed."                                
                                                                     
    For additional graph properties and statistics, including node   
    labels, see http://law.di.unimi.it/webdata/sk-2005               
                                                                     
The graph already appears in the SuiteSparse Matrix Collection as    
LAW/sk-2005, but this GAP/GAP-web version is slightly different.     
                                                                     
The number of nodes differs by three.  In the original LAW/sk-2005   
graph, with 50,636,154 nodes, the last three nodes of the graph have 
no edges at all.  These have been dropped from the GAP/GAP-web       
graph.  The LAW graph includes 19,119,653 self-edges, but the        
GAP/GAP-web graph discards these.  Finally, the LAW/sk-2005 graph is 
binary, but the GAP/GAP-web graph adds random edge weights.  That is:
                                                                     
    Prob1 = ssget ('LAW/sk-2005') ;                                  
    Prob2 = ssget ('GAP/GAP-web') ;                                  
    A1 = Prob1.A ;                                                   
    A2 = Prob2.A ;                                                   
                                                                     
    % the LAW/sk-2005 graph is binary                                
    assert (isequal (A1, spones (A1))) ;                             
                                                                     
    % discard diagonal entries from LAW/sk-2005, and last 3 nodes    
    S1 = tril (A1, -1) + triu (A1,1) ;                               
    S1 = S1 (1:end-3, 1:end-3) ;                                     
                                                                     
    % discard the edge weights from GAP/GAP-web                      
    S2 = spones (A2) ;                                               
                                                                     
    % now the graphs are identical:                                  
    assert (isequal (S1, S2))                                        
                                                                     
The original graph is from the Laboratory for Web Algorithmics       
(LAW), Universita degli Studi di Milano,                             
http://law.di.unimi.it/index.php .                                   
                                                                     
It was added to the SuiteSparse Matrix Collection by Tim Davis,      
from the original LAW data, as the LAW/sk-2005 matrix.  It was       
then incorporated from there into the GAP benchmark suite by         
Scott Beamer, Krste Asanovic', and David Patternson, with random     
edge weights added, based on the version of the graph at             
https://sparse.tamu.edu/LAW/sk-2005.  Finally, the new GAP graph     
has been re-imported back into the SuiteSparse Matrix Collection,    
as the GAP/GAP-web graph.  This is to preserve the random edge       
weights and source nodes added by S. Beamer et al., so that the GAP  
benchmark can be fully replicated in the SuiteSparse Matrix          
Collection.                                                          
                                                                     
The GAP breadth-first-search (BFS) benchmark generates 64 random     
source nodes and evaluates the time to compute the BFS from each of  
the 64 sources.  The betweenness-centrality (BC) runs 16 trials with 
4 source nodes each.  These source nodes are the same, in the same   
order.  That is, the first 4 BFS source nodes are the same 4 source  
nodes for the first trial of BC.  In one-based notation (where nodes 
are numbered 1 to n), the 64 source nodes are:                       
                                                                     
    10219453    44758212      890672    13843757                     
    14168063    20906931    12189585    26352336                     
    43500687     8987025     5699763    41436456                     
     5030728    40735219    16533564    28700167                     
       64712    39634751    16037780    27152740                     
    16404062    20491964     5322424    21420954                     
    26622110     5882876    18091041    10665897                     
    18634423    18138716     2355536    32885206                     
    40657441    35196168    45544427     6175520                     
    40058319    50626231    36571020    49397053                     
    23434266     2299445    32873824    25978283                     
     2461716    22787315    30759948     7428895                     
    39173871    43194210    26361510    39747212                     
    30670030    41483034     9358667     9945009                     
     3355245    33831270    45124745    16137878                     
    11235449    37509145    27402415    39546084                     
                                                                     
The first row are the 4 source nodes for the first BC trial,         
the 2nd row is for the second BC trial, and so on.                   
                                                                     
These node ids also appear as Problem.aux.sources, in one-based      
notation.  One-based notation is used because the Matrix Market      
format is one-based, as is MATLAB.  If you use a zero-based method   
(such as the GAP benchmark itself, or the C API to GraphBLAS), be    
sure to subtract one from the node ids above (and in                 
Problem.aux.sources) to obtain the right source nodes.