GAP/GAP-web

GAP benchmark: web
Name GAP-web
Group GAP
Matrix ID 2853
Num Rows 50,636,151
Num Cols 50,636,151
Nonzeros 1,930,292,948
Pattern Entries 1,930,292,948
Kind Directed Weighted Graph
Symmetric No
Date 2017
Author Laboratory for Web Algorithmics (LAW), Universita degli Studi di Milano, http://law.di.unimi.it/index.php
Editor S. Beamer, K. Asanovic, D. Patterson
Structural Rank
Structural Rank Full
Num Dmperm Blocks
Strongly Connect Components 8,815,054
Num Explicit Zeros 0
Pattern Symmetry 12.5%
Numeric Symmetry 0%
Cholesky Candidate no
Positive Definite no
Type integer
Download MATLAB Rutherford Boeing Matrix Market
Notes
GAP/GAP-web matrix:  this is the 'web' graph from the GAP            
benchmark suite, as described in "The GAP Benchmark Suite", by Scott 
Beamer, Krste Asanovic', and David Patterson,                        
https://arxiv.org/abs/1508.03619 .                                   
                                                                     
In that paper, the Web graph is described as follows:                
                                                                     
    Web (|V|=50.6M, |E|=1,949.4M, directed) is a web-crawl of the    
    .sk domain (sk-2005) [9].  Despite its large size, it exhibits   
    substantial locality due to its topology and high average        
    degree.                                                          
                                                                     
    [9] Timothy Davis and Yifan Hu. The University of Florida sparse 
    matrix collection. ACM Transactions on Mathematical Software,    
    38:1:1 – 1:25, 2011.                                             
                                                                     
The graph comes from the following source:                           
                                                                     
    Laboratory for Web Algorithmics (LAW), Universita degli Studi di 
    Milano, http://law.di.unimi.it/index.php.  When using matrices   
    in the LAW/ group in the collection, please follow the citation  
    instructions at http://law.di.unimi.it/datasets.php.  If you     
    publish results based on these graphs, please acknowledge the    
    usage of WebGraph and LLP by quoting the following papers:       
                                                                     
    [1] "The WebGraph Framework I: Compression Techniques," Paolo    
    Boldi and Sebastiano Vigna, Proc. of the Thirteenth              
    International World Wide Web Conference (WWW 2004), 2004,        
    Manhattan, USA, pp.  595--601, ACM Press.                        
                                                                     
    [2] "Layered Label Propagation: A MultiResolution                
    Coordinate-Free Ordering for Compressing Social Networks," Paolo 
    Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna,        
    Proceedings of the 20th international conference on World Wide   
    Web, 2011, ACM Press.                                            
                                                                     
    The graph was gathered by UbiCrawler; please acknowledge the     
    usage of UbiCrawler by quoting the following paper:              
                                                                     
    [3] "UbiCrawler: A Scalable Fully Distributed Web Crawler",      
    Paolo Boldi, Bruno Codenotti, Massimo Santini, and Sebastiano    
    Vigna, Software: Practice & Experience, 2004, vol 34, no. 8, pp. 
    711--726                                                         
                                                                     
    "This graph has been obtained from a 2005 crawl of the .sk       
    domain performed by UbiCrawler for a group of Slovakian          
    researchers. An interesting feature of this crawl is that we     
    were provided a very large seed."                                
                                                                     
    For additional graph properties and statistics, including node   
    labels, see http://law.di.unimi.it/webdata/sk-2005               
                                                                     
The graph already appears in the SuiteSparse Matrix Collection as    
LAW/sk-2005, but this GAP/GAP-web version is slightly different.     
                                                                     
The number of nodes differs by three.  In the original LAW/sk-2005   
graph, with 50,636,154 nodes, the last three nodes of the graph have 
no edges at all.  These have been dropped from the GAP/GAP-web       
graph.  The LAW graph includes 19,119,653 self-edges, but the        
GAP/GAP-web graph discards these.  Finally, the LAW/sk-2005 graph is 
binary, but the GAP/GAP-web graph adds random edge weights.  That is:
                                                                     
    Prob1 = ssget ('LAW/sk-2005') ;                                  
    Prob2 = ssget ('GAP/GAP-web') ;                                  
    A1 = Prob1.A ;                                                   
    A2 = Prob2.A ;                                                   
                                                                     
    % the LAW/sk-2005 graph is binary                                
    assert (isequal (A1, spones (A1))) ;                             
                                                                     
    % discard diagonal entries from LAW/sk-2005, and last 3 nodes    
    S1 = tril (A1, -1) + triu (A1,1) ;                               
    S1 = S1 (1:end-3, 1:end-3) ;                                     
                                                                     
    % discard the edge weights from GAP/GAP-web                      
    S2 = spones (A2) ;                                               
                                                                     
    % now the graphs are identical:                                  
    assert (isequal (S1, S2))                                        
                                                                     
The original graph is from the Laboratory for Web Algorithmics       
(LAW), Universita degli Studi di Milano,                             
http://law.di.unimi.it/index.php .                                   
                                                                     
It was added to the SuiteSparse Matrix Collection by Tim Davis,      
from the original LAW data, as the LAW/sk-2005 matrix.  It was       
then incorporated from there into the GAP benchmark suite by         
Scott Beamer, Krste Asanovic', and David Patternson, with random     
edge weights added, based on the version of the graph at             
https://sparse.tamu.edu/LAW/sk-2005.  Finally, the new GAP graph     
has been re-imported back into the SuiteSparse Matrix Collection,    
as the GAP/GAP-web graph.  This is to preserve the random edge       
weights and source nodes added by S. Beamer et al., so that the GAP  
benchmark can be fully replicated in the SuiteSparse Matrix          
Collection.                                                          
                                                                     
The GAP breadth-first-search (BFS) benchmark generates 64 random     
source nodes and evaluates the time to compute the BFS from each of  
the 64 sources.  The betweenness-centrality (BC) runs 16 trials with 
4 source nodes each.  These source nodes are the same, in the same   
order.  That is, the first 4 BFS source nodes are the same 4 source  
nodes for the first trial of BC.  In one-based notation (where nodes 
are numbered 1 to n), the 64 source nodes are:                       
                                                                     
    10219453    44758212      890672    13843757                     
    14168063    20906931    12189585    26352336                     
    43500687     8987025     5699763    41436456                     
     5030728    40735219    16533564    28700167                     
       64712    39634751    16037780    27152740                     
    16404062    20491964     5322424    21420954                     
    26622110     5882876    18091041    10665897                     
    18634423    18138716     2355536    32885206                     
    40657441    35196168    45544427     6175520                     
    40058319    50626231    36571020    49397053                     
    23434266     2299445    32873824    25978283                     
     2461716    22787315    30759948     7428895                     
    39173871    43194210    26361510    39747212                     
    30670030    41483034     9358667     9945009                     
     3355245    33831270    45124745    16137878                     
    11235449    37509145    27402415    39546084                     
                                                                     
The first row are the 4 source nodes for the first BC trial,         
the 2nd row is for the second BC trial, and so on.                   
                                                                     
These node ids also appear as Problem.aux.sources, in one-based      
notation.  One-based notation is used because the Matrix Market      
format is one-based, as is MATLAB.  If you use a zero-based method   
(such as the GAP benchmark itself, or the C API to GraphBLAS), be    
sure to subtract one from the node ids above (and in                 
Problem.aux.sources) to obtain the right source nodes.