SNAP/wiki-RfA

SNAP network: Wikipedia requests for adminship
Name wiki-RfA
Group SNAP
Matrix ID 2797
Num Rows 11,380
Num Cols 11,380
Nonzeros 188,077
Pattern Entries 188,077
Kind Directed Weighted Temporal Multigraph
Symmetric No
Date 2014
Author R. West, H. S. Paskov, J. Leskovec, C. Potts
Editor J. Leskovec
Structural Rank
Structural Rank Full
Num Dmperm Blocks
Strongly Connect Components 8,931
Num Explicit Zeros 0
Pattern Symmetry 7.5%
Numeric Symmetry 6.5%
Cholesky Candidate no
Positive Definite no
Type integer
Download MATLAB Rutherford Boeing Matrix Market
Notes
SNAP (Stanford Network Analysis Platform) Large Network Dataset Collection,
Jure Leskovec and Andrej Krevl, http://snap.stanford.edu/data, June 2014.
email: jure at cs.stanford.edu                                             
                                                                           
Wikipedia Requests for Adminship (with text)                               
                                                                           
https://snap.stanford.edu/data/wiki-RfA.html                               
                                                                           
Dataset information                                                        
                                                                           
For a Wikipedia editor to become an administrator, a request for adminship 
(RfA) (http://en.wikipedia.org/wiki/Wikipedia:RfA) must be submitted,      
either by the candidate or by another community member. Subsequently, any  
Wikipedia member may cast a supporting, neutral, or opposing vote.         
                                                                           
We crawled and parsed all votes since the adoption of the RfA process in   
2003 through May 2013. The dataset contains 11,381 users (voters and       
votees) forming 189,004 distinct voter/votee pairs, for a total of 198,275 
votes (this is larger than the number of distinct voter/votee pairs        
because, if the same user ran for election several times, the same         
voter/votee pair may contribute several votes).                            
                                                                           
This induces a directed, signed network in which nodes represent Wikipedia 
members and edges represent votes. In this sense, the present dataset is a 
more recent version of the Wikipedia adminship election data               
(https://snap.stanford.edu/data/wiki-Elec.html). However, there is also a  
rich textual component in RfAs, which was not included in the older        
version: each vote is typically accompanied by a short comment             
(median/mean: 19/34 tokens). A typical positive comment reads, "I've no    
concerns, will make an excellent addition to the admin corps", while an    
example of a negative comment is, "Little evidence of collaboration with   
other editors and limited content creation."                               
                                                                           
Network statistics                                                         
Nodes    10,835                                                            
Edges   159,388                                                            
Triangles 956,428                                                          
                                                                           
[The above statistics were computed after transforming the data into a     
directed network. The number of edges (159,388) is smaller than the number 
of voter/votee pairs (189,004) because neutral votes were discarded in the 
network we used for computing the statistics (but they are included in the 
dataset).]                                                                 
                                                                           
Sources (citations)                                                        
Robert West, Hristo S. Paskov, Jure Leskovec, and Christopher Potts:       
Exploiting Social Network Structure for Person-to-Person Sentiment         
Analysis. Transactions of the Association for Computational Linguistics,   
2(Oct):297–310, 2014.
http://infolab.stanford.edu/~west1/pubs/West-Paskov-Leskovec-Potts_TACL-14.pdf
                                                                           
Files                                                                      
File Description                                                           
wiki-RfA.txt.gz     Wikipedia Requests for Adminship (RfA) 2003–2013       
                                                                           
Data format                                                                
                                                                           
    SRC:Guettarda                                                          
    TGT:Lord Roem                                                          
    VOT:1                                                                  
    RES:1                                                                  
    YEA:2013                                                               
    DAT:19:53, 25 January 2013                                             
    TXT:'''Support''' per [[WP:DEAL]]: clueful,                            
        and unlikely to break Wikipedia.                                   
                                                                           
where entries are separated by a single empty line and                     
                                                                           
    SRC: user name of source, i.e., voter                                  
    TGT: user name of target, i.e., the user running for election          
    VOT: the source's vote on the target                                   
         (-1 = oppose; 0 = neutral; 1 = support)                           
    RES: the outcome of the election                                       
         (-1 = target was rejected as admin; 1 = target was accepted)      
    YEA: the year in which the election was started                        
    DAT: the date and time of this vote                                    
    TXT: the comment written by the source, in wiki markup                 
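
The record layout above can be read with a short parser. The following is
a minimal sketch, not the SNAP authors' code: the function name parse_votes
is hypothetical, and it assumes that any line without one of the seven
known prefixes continues the previous field (as the wrapped TXT: line in
the example suggests).

```python
from typing import Dict, Iterable, Iterator, Optional

# The seven field prefixes described in the data format above.
FIELDS = ("SRC", "TGT", "VOT", "RES", "YEA", "DAT", "TXT")


def parse_votes(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per vote; records are separated by an empty line."""
    record: Dict[str, str] = {}
    last_key: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            # An empty line closes the current record.
            if record:
                yield record
            record, last_key = {}, None
            continue
        key, sep, value = line.partition(":")
        if sep and key in FIELDS and key not in record:
            record[key] = value.strip()
            last_key = key
        elif last_key is not None:
            # Assumption: an unprefixed line continues the previous field.
            record[last_key] += " " + line
    if record:
        yield record


# The example record from the data format description.
sample = [
    "SRC:Guettarda",
    "TGT:Lord Roem",
    "VOT:1",
    "RES:1",
    "YEA:2013",
    "DAT:19:53, 25 January 2013",
    "TXT:'''Support''' per [[WP:DEAL]]: clueful,",
    "    and unlikely to break Wikipedia.",
    "",
]
votes = list(parse_votes(sample))
print(votes[0]["SRC"], "->", votes[0]["TGT"], "vote", int(votes[0]["VOT"]))
```

To read the real file, the same function could be given the lines of
wiki-RfA.txt.gz opened in text mode with Python's gzip module; the text
encoding (likely UTF-8) is an assumption and should be checked.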
                                                                           
---------------------------------------------------------------------------
Notes on inclusion into the SuiteSparse Matrix Collection, July 2018:      
---------------------------------------------------------------------------
                                                                           
In the SuiteSparse Matrix Collection, the 7 data items are split into
7 different arrays in the MATLAB struct, and 7 different files in the
MatrixMarket and Rutherford-Boeing formats.  Each has 198,275 rows
(one per vote).  The votes appear in the same order as they do in the
wiki-RfA.txt file in the SNAP data set.
                                                                           
Problem.aux.Source      source user name (a string, length <= 48)          
Problem.aux.Target      target user name                                   
Problem.aux.Vote        -1 (oppose), 0 (neutral), or 1 (support)           
Problem.aux.Result      result of the election (-1:reject, 1:accept)       
Problem.aux.Year        year the election started (a number, 2003 to 2013)
Problem.aux.Date        time and date (a string: HH:MM, day Month Year)    
Problem.aux.Text        text of the comments (a string, len <= 7609)       
                                                                           
In addition, the user names from the SRC: and TGT: fields of the votes have
been collected in a single list, of size 11,380, and appear in
Problem.aux.UserId (each of length <= 48).  This list of user names has
been sorted alphabetically.
                                                                           
In 1,661 votes, the SRC: field is blank; for these votes, the DAT: and
TXT: fields are also empty.  These votes appear in the aux fields listed
above, but are excluded from the graphs described below.  The blank user
name is likewise excluded from the list of user ids, giving 11,380 total
users instead of 11,381.
                                                                           
The user names in Problem.aux.Source and Problem.aux.Target have been      
translated into node numbers for the graph, into the two arrays            
Problem.aux.SourceNode and Problem.aux.TargetNode.  For the kth vote, the  
string Source(k,:) is equal to UserId (SourceNode (k),:), and likewise for 
the target nodes.  For votes with empty SRC: fields, however,              
SourceNode(k)=0.                                                           
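
The name-to-node translation can be sketched as follows.  This is an
illustration only, not the script used to build the collection: the
function name build_node_ids is hypothetical, and Python's default string
ordering is assumed, which may differ from the ordering used for
Problem.aux.UserId.

```python
from typing import List, Tuple


def build_node_ids(
    sources: List[str], targets: List[str]
) -> Tuple[List[str], List[int], List[int]]:
    """Return (user_id, source_node, target_node) with 1-based node numbers."""
    # Collect every non-blank name seen as SRC or TGT, then sort.
    user_id = sorted({name for name in sources + targets if name})
    # 1-based numbering, matching the MATLAB-style indexing in the notes.
    index = {name: k + 1 for k, name in enumerate(user_id)}
    # A blank name is not in the index, so it maps to node 0.
    source_node = [index.get(s, 0) for s in sources]
    target_node = [index.get(t, 0) for t in targets]
    return user_id, source_node, target_node


# Toy data: the second vote has a blank SRC: field.
sources = ["Guettarda", "", "Lord Roem"]
targets = ["Lord Roem", "Lord Roem", "Guettarda"]
user_id, source_node, target_node = build_node_ids(sources, targets)
print(user_id)      # ['Guettarda', 'Lord Roem']
print(source_node)  # [1, 0, 2]
print(target_node)  # [2, 2, 1]
```

For every vote k with a non-blank source, user_id[source_node[k] - 1]
recovers sources[k], which mirrors the relation between Source, UserId,
and SourceNode stated above.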
                                                                           
Four graphs are constructed from the votes, after pruning votes with
empty SRC: fields.  The primary graph, Problem.A, counts the number of
votes, where A(i,j) is the number of times the ith user voted on the
jth user, regardless of whether the vote was positive, neutral, or
negative.
                                                                           
The other three graphs each capture one kind of vote.  For the source i
and target j:
                                                                           
Problem.aux.Positive (i,j) is the number of VOT:+1 votes that i cast on j
Problem.aux.Neutral  (i,j) is the number of VOT:0 (neutral) votes that
    i cast on j.  Note that a neutral vote is not the same thing as
    no vote at all.
Problem.aux.Negative (i,j) is the number of VOT:-1 votes that i cast on j
                                                                           
Thus, each graph is a multigraph, where the value of the (i,j) edge
is the number of edges, or votes, of the form (i,j) where
i=source and j=target.  The overall graph, A = Problem.A, is equal
to the sum A = Positive + Neutral + Negative.
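
The construction of the four count graphs can be sketched with plain
dictionaries keyed by (i,j).  This is a minimal illustration under stated
assumptions, not the code used for the collection: build_graphs is a
hypothetical name, and the input is assumed to be (source node, target
node, vote) triples in which source node 0 marks a blank SRC: field.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def build_graphs(votes: Iterable[Tuple[int, int, int]]) -> Dict[str, Counter]:
    """Count votes per (source, target) pair, overall and per vote kind."""
    kind = {1: "Positive", 0: "Neutral", -1: "Negative"}
    graphs: Dict[str, Counter] = {name: Counter() for name in kind.values()}
    graphs["A"] = Counter()
    for i, j, vote in votes:
        if i == 0:
            continue  # prune votes with an empty SRC: field
        graphs[kind[vote]][(i, j)] += 1
        graphs["A"][(i, j)] += 1
    return graphs


# Toy votes: user 1 votes on user 2 three times; one vote has a blank source.
toy = [(1, 2, 1), (1, 2, -1), (1, 2, 1), (0, 2, 1), (3, 2, 0)]
g = build_graphs(toy)
print(g["A"][(1, 2)])  # 3
print(len(g["A"]))     # 2 distinct (i,j) pairs, i.e. nnz of A
print(g["A"] == g["Positive"] + g["Neutral"] + g["Negative"])  # True
```

Each Counter plays the role of a sparse matrix: the number of keys
corresponds to nnz, and the sum of the values corresponds to the number of
votes.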
                                                                           
    graph A nnz 188077 sum 196614                                          
    vote max multiplicity: 6                                               
    # votes: 1  : count 180284                                             
    # votes: 2  : count 7153                                               
    # votes: 3  : count 550                                                
    # votes: 4  : count 77                                                 
    # votes: 5  : count 12                                                 
    # votes: 6  : count 1                                                  
                                                                           
    graph Positive nnz 139326 sum 143879                                   
    vote max multiplicity: 6                                               
    # votes: 1  : count 135128                                             
    # votes: 2  : count 3884                                               
    # votes: 3  : count 278                                                
    # votes: 4  : count 32                                                 
    # votes: 5  : count 3                                                  
    # votes: 6  : count 1                                                  
                                                                           
    graph Neutral  nnz 11846 sum 12068                                     
    vote max multiplicity: 3                                               
    # votes: 1  : count 11627                                              
    # votes: 2  : count 216                                                
    # votes: 3  : count 3                                                  
                                                                           
    graph Negative nnz 39206 sum 40667                                     
    vote max multiplicity: 5                                               
    # votes: 1  : count 37840                                              
    # votes: 2  : count 1289                                               
    # votes: 3  : count 63                                                 
    # votes: 4  : count 10                                                 
    # votes: 5  : count 4                                                  
                                                                           
That is, the full graph A includes all 196,614 votes with non-empty SRC:
fields (198,275 - 1,661).  The graphs do not capture the temporal nature
of the edges; this data is preserved in the votes themselves, in the
Problem.aux.[Source Target Vote Result Year Date Text] components.
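
The totals quoted above can be cross-checked from the multiplicity
histograms: nnz is the sum of the counts, and the vote total is the sum of
multiplicity times count.  The short script below does only this
arithmetic, using the numbers listed in these notes; the helper name
nnz_and_sum is hypothetical.

```python
# Multiplicity histograms copied from the notes: {votes per pair: count}.
histograms = {
    "A": {1: 180284, 2: 7153, 3: 550, 4: 77, 5: 12, 6: 1},
    "Positive": {1: 135128, 2: 3884, 3: 278, 4: 32, 5: 3, 6: 1},
    "Neutral": {1: 11627, 2: 216, 3: 3},
    "Negative": {1: 37840, 2: 1289, 3: 63, 4: 10, 5: 4},
}


def nnz_and_sum(hist):
    """Return (number of nonzero entries, total number of votes)."""
    nnz = sum(hist.values())
    total = sum(mult * count for mult, count in hist.items())
    return nnz, total


totals = {name: nnz_and_sum(hist) for name, hist in histograms.items()}
for name, (nnz, total) in totals.items():
    print(f"graph {name} nnz {nnz} sum {total}")

# The three per-kind graphs account for every vote in A.
parts = sum(totals[name][1] for name in ("Positive", "Neutral", "Negative"))
print(parts == totals["A"][1] == 198275 - 1661)  # True
```

The nnz values of the three per-kind graphs add up to 190,378, more than
the 188,077 nonzeros of A, because the same (i,j) pair can carry votes of
more than one kind.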