SNAP/com-Amazon

SNAP network: Amazon product co-purchasing network and ground-truth communities
Name com-Amazon
Group SNAP
Matrix ID 2778
Num Rows 334,863
Num Cols 334,863
Nonzeros 1,851,744
Pattern Entries 1,851,744
Kind Undirected Graph With Communities
Symmetric Yes
Date 2012
Author J. Yang, J. Leskovec
Editor J. Leskovec
Structural Rank
Structural Rank Full
Num Dmperm Blocks
Strongly Connect Components 1
Num Explicit Zeros 0
Pattern Symmetry 100%
Numeric Symmetry 100%
Cholesky Candidate no
Positive Definite no
Type binary
Download MATLAB Rutherford Boeing Matrix Market
Notes
SNAP (Stanford Network Analysis Platform) Large Network Dataset Collection,
Jure Leskovec and Anrej Krevl, http://snap.stanford.edu/data, June 2014.   
email: jure at cs.stanford.edu                                             
                                                                           
Amazon product co-purchasing network and ground-truth communities          
                                                                           
https://snap.stanford.edu/data/com-Amazon.html                             
                                                                           
Dataset information                                                        
                                                                           
Network was collected by crawling the Amazon.com website. It is based on   
Customers Who Bought This Item Also Bought feature of the Amazon website.  
If a product i is frequently co-purchased with product j, the graph        
contains an undirected edge from i to j. Each product category provided by 
Amazon defines each ground-truth community.                                
                                                                           
We regard each connected component in a product category as a separate     
ground-truth community. We remove the ground-truth communities which have  
less than 3 nodes. We also provide the top 5,000 communities with highest  
quality which are described in our paper (http://arxiv.org/abs/1205.6233). 
As for the network, we provide the largest connected component.            
                                                                           
Dataset statistics                                                         
Nodes   334863                                                             
Edges   925872                                                             
Nodes in largest WCC    334863 (1.000)                                     
Edges in largest WCC    925872 (1.000)                                     
Nodes in largest SCC    334863 (1.000)                                     
Edges in largest SCC    925872 (1.000)                                     
Average clustering coefficient  0.3967                                     
Number of triangles 667129                                                 
Fraction of closed triangles    0.07925                                    
Diameter (longest shortest path)    44                                     
90-percentile effective diameter    15                                     
                                                                           
Source (citation) J. Yang and J. Leskovec. Defining and Evaluating Network 
Communities based on Ground-truth. ICDM, 2012.                             
http://arxiv.org/abs/1205.6233                                             
                                                                           
Files                                                                      
File    Description                                                        
com-amazon.ungraph.txt.gz   Undirected Amazon product co-purchasing network
com-amazon.all.dedup.cmty.txt.gz    Amazon communities                     
com-amazon.top5000.cmty.txt.gz  Amazon communities (Top 5,000)             
                                                                           
---------------------------------------------------------------------------
Notes on inclusion into the SuiteSparse Matrix Collection, July 2018:      
---------------------------------------------------------------------------
                                                                           
The graph in the SNAP data set is 1-based, with nodes numbered 1 to        
548,551.                                                                   
                                                                           
In the SuiteSparse Matrix Collection, Problem.A is the undirected Amazon   
product co-purchasing network, a matrix of size n-by-n with n=334,863,     
which is the number of unique product id's appearing in any edge.          
Problem.aux.nodeid is a list of the node id's that appear in the SNAP data 
set.  A(i,j)=1 if the product nodeid(i) is co-purchased with product       
nodeid(j).  The node id's are the same as the SNAP data set (1-based).     
                                                                           
C = Problem.aux.Communities_all is a sparse matrix of size n by 75,149,    
which holds the 75,149 categories in the com-amazon.all.dedup.cmty.txt     
file.  The kth line in that file defines the kth community, and is the     
column C(:,k), where C(i,k)=1 if product nodeid(i) is in the kth           
community.  Row C(i,:) and row/column i of the A matrix thus refer to the  
same product, nodeid(i).                                                   
                                                                           
Ctop = Problem.aux.Communities_top5000 is n-by-5000, with the same         
structure as the C array above, with the content of the                    
com-amazon.top5000.cmty.txt.