SNAP/higgs-twitter

SNAP network: Higgs Twitter dataset
Name higgs-twitter
Group SNAP
Matrix ID 2786
Num Rows 456,626
Num Cols 456,626
Nonzeros 14,855,842
Pattern Entries 14,855,842
Kind Directed Temporal Multigraph
Symmetric No
Date 2015
Author M. De Domenico, A. Lima, P. Mougel and M. Musolesi
Editor J. Leskovec
Structural Rank
Structural Rank Full
Num Dmperm Blocks
Strongly Connect Components 91,664
Num Explicit Zeros 0
Pattern Symmetry 31.6%
Numeric Symmetry 31.6%
Cholesky Candidate no
Positive Definite no
Type binary
Download MATLAB Rutherford Boeing Matrix Market
Notes
SNAP (Stanford Network Analysis Platform) Large Network Dataset Collection,
Jure Leskovec and Anrej Krevl, http://snap.stanford.edu/data, June 2014.   
email: jure at cs.stanford.edu                                             
                                                                           
Higgs Twitter Dataset                                                      
                                                                           
https://snap.stanford.edu/data/higgs-twitter.html                          
                                                                           
Dataset information                                                        
                                                                           
The Higgs dataset has been built after monitoring the spreading processes  
on Twitter before, during and after the announcement of the discovery of a 
new particle with the features of the elusive Higgs boson on 4th July 2012.
The messages posted in Twitter about this discovery between 1st and 7th    
July 2012 are considered.                                                  
                                                                           
The four directional networks made available here have been extracted from 
user activities in Twitter as:                                             
                                                                           
    1. re-tweeting (retweet network)                                       
    2. replying (reply network) to existing tweets                         
    3. mentioning (mention network) other users                            
    4. friends/followers social relationships among user involved          
       in the above activities                                             
    5. information about activity on Twitter during the discovery of       
       Higgs boson                                                         
                                                                           
It is worth remarking that the user IDs have been anonimized, and the same 
user ID is used for all networks. This choice allows to use the Higgs      
dataset in studies about large-scale interdependent/interconnected         
multiplex/multilayer networks, where one layer accounts for the social     
structure and three layers encode different types of user dynamics.        
                                                                           
For more information about data collection, please refer to our paper.     
                                                                           
Dataset statistics are calculated for the graph with the highest number of 
nodes and edges:                                                           
                                                                           
Social Network statistics                                                  
Nodes   456,626                                                            
Edges   14,855,842                                                         
Nodes in largest WCC    456290 (0.999)                                     
Edges in largest WCC    14855466 (1.000)                                   
Nodes in largest SCC    360210 (0.789)                                     
Edges in largest SCC    14102605 (0.949)                                   
Average clustering coefficient  0.1887                                     
Number of triangles 83023401                                               
Fraction of closed triangles    0.002901                                   
Diameter (longest shortest path)    9                                      
90-percentile effective diameter    3.7                                    
                                                                           
Retweet Network statistics                                                 
Nodes   256,491                                                            
Edges   328,132                                                            
Nodes in largest WCC    223833 (0.873)                                     
Edges in largest WCC    308596 (0.940)                                     
Nodes in largest SCC    984 (0.004)                                        
Edges in largest SCC    3850 (0.012)                                       
Average clustering coefficient  0.0156                                     
Number of triangles 21172                                                  
Fraction of closed triangles    0.0001085                                  
Diameter (longest shortest path)    19                                     
90-percentile effective diameter    6.8                                    
                                                                           
Reply Network statistics                                                   
Nodes   38,918                                                             
Edges   32,523                                                             
Nodes in largest WCC    12839 (0.330)                                      
Edges in largest WCC    14944 (0.459)                                      
Nodes in largest SCC    322 (0.008)                                        
Edges in largest SCC    708 (0.022)                                        
Average clustering coefficient  0.0058                                     
Number of triangles 244                                                    
Fraction of closed triangles    0.0001561                                  
Diameter (longest shortest path)    29                                     
90-percentile effective diameter    10                                     
                                                                           
Mention Network statistics                                                 
Nodes   116,408                                                            
Edges   150,818                                                            
Nodes in largest WCC    91606 (0.787)                                      
Edges in largest WCC    132068 (0.876)                                     
Nodes in largest SCC    1801 (0.015)                                       
Edges in largest SCC    7069 (0.047)                                       
Average clustering coefficient  0.0825                                     
Number of triangles 23068                                                  
Fraction of closed triangles    0.0002417                                  
Diameter (longest shortest path)    18                                     
90-percentile effective diameter    6.5                                    
                                                                           
Data format - higgs-activity_time.txt                                      
                                                                           
    userA userB timestamp interaction                                      
                                                                           
    Interaction can be RT (retweet), MT (mention) or RE (reply). Each link 
    is directed. The user IDs in this dataset corresponds to the ones      
    adopted to anonymize the social structure, thus the datasets (1) - (5) 
    can be used together for complex analysis involving structure and      
    dynamics.                                                              
                                                                           
    Note 1: the direction of links depends on the application, in general. 
    For instance, if one is interested in building a network of how        
    information flows, then the direction of RT should be reversed when    
    used in the analysis. Nevertheless, the choice is left to the          
    researcher and his/her own interpretation of the data, whereas we just 
    provide the observed actions, i.e., who                                
    retweets/mentions/replies/follows whom.                                
                                                                           
    Note 2: users mentioned in retweeted tweets are considered as mentions.
    For instance, if @A retweets the tweet “hello @C @D" sent by @B, then  
    the following links are created: @A @B timeX RT, @A @C timeX MT, @A @D 
    timeX MT, because @C and @D can be notified that they have been        
    mentioned in a retweet. Similarly in the case of a reply. If for some  
    reason the researcher does not agree with this choice, he/she can      
    easily identify this type of links and remove the mentions, for        
    instance.                                                              
                                                                           
Source (citation)                                                          
M. De Domenico, A. Lima, P. Mougel and M. Musolesi. The Anatomy of a       
Scientific Rumor. (Nature Open Access) Scientific Reports 3, 2980 (2013).  
http://www.nature.com/srep/2013/131018/srep02980/full/srep02980.html       
                                                                           
Files                                                                      
File    Description                                                        
social_network.edgelist.gz  Friends/follower graph (directed)              
retweet_network.edgelist.gz                                                
    Graph of who retweets whom (directed and weighted)                     
reply_network.edgelist.gz                                                  
    Graph of who replies to who (directed and weighted)                    
mention_network.edgelist.gz                                                
    Graph of who mentions whom (directed and weighted)                     
higgs-activity_time.txt.gz                                                 
    The dataset provides information about activity on                     
    Twitter during the discovery of Higgs boson                            
                                                                           
---------------------------------------------------------------------------
Notes on inclusion into the SuiteSparse Matrix Collection, July 2018:      
---------------------------------------------------------------------------
                                                                           
The SNAP data set is 1-based, with all nodes in all graphs numbered 1      
to n=456,626.                                                              
                                                                           
In the SuiteSparse Matrix Collection, each matrix is the same size, n-by-n 
where n=456,626, so that row/column i in each matrix refers to the same    
person i across all matrices.  This means that some rows and columns of    
the Retweet, Mention, and Reply matrices are empty, but these are left in  
so all four matrices can be compared with each other.                      
                                                                           
Problem.A is the primary social network, and is a directed graph           
with no edge weights (an unsymmetric binary matrix).  A(i,j)=1 if          
person i follows person j.  It is not a multigraph.                        
                                                                           
Retweet = Problem.aux.retweet is the Retweet network, where Retweet(i,j)   
is the number of times that person i retweets a tweet of person j.         
                                                                           
Mention = Problem.aux.mention is the Mention network, where Mention(i,j)   
is the number of times that person i mentions person j.                    
                                                                           
Reply = Problem.aux.reply is the Reply network, where Reply(i,j)           
is the number of times that person i replies to person j.                  
                                                                           
The Retweet, Mention, and Reply matrices represent multigraphs since each  
(i,j,t) with the same i and j but different timestamp t is considered a    
separate edge.  The timestamps do not appear in these matrices, however.   
                                                                           
The higgs-activity_time.txt is a set of labeled temporal edges.  Each edge 
in the SNAP data set has the form (i,j,time,interaction) where interaction 
is string (RT, MT, or RE).  In the SuiteSparse Matrix collection, these    
edges are stored as a dense matrix, Problem.aux.temporal_edges, where the  
kth row of the matrix holds the kth line of the higgs-activity_time.txt    
file as the temporal edge [i j interaction time].  The interaction is      
converted to an integer, where 1=RT (retweet), 2=MT (mention), and 3=RE    
(reply).