## GAP/GAP-twitter

GAP benchmark: twitter

Name | GAP-twitter |
---|---|

Group | GAP |

Matrix ID | 2852 |

Num Rows | 61,578,415 |

Num Cols | 61,578,415 |

Nonzeros | 1,468,364,884 |

Pattern Entries | 1,468,364,884 |

Kind | Directed Weighted Graph |

Symmetric | No |

Date | 2017 |

Author | H. Kwak, C. Lee, H. Park, S. Moon |

Editor | S. Beamer, K. Asanovic, D. Patterson |

Download | MATLAB Rutherford Boeing Matrix Market |
---|---|

Notes |
GAP/GAP-twitter matrix: this is the 'twitter' graph from the GAP benchmark suite, as described in "The GAP Benchmark Suite", by Scott Beamer, Krste Asanovic', and David Patterson, https://arxiv.org/abs/1508.03619 . In that paper, the Twitter graph is described as follows: Twitter (|V|=61.6M, |E|=1,468.4M, directed) is an example of a social network topology [18]. This particular crawl of Twitter has been commonly used by researchers and thus eases comparisons with prior work. By virtue of it coming from real-world data, it has interesting irregularities and the skew in its degree distribution can be a challenge for some implementations. [18] Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. What is Twitter, a social network or a news media? International World Wide Web Conference (WWW), 2010. The graph already appears in the SuiteSparse Matrix Collection as SNAP/twitter7, but with different numbers of nodes and edges. The GAP/GAP-twitter graph has 61,578,415 and 1,468,364,884 edges. The SNAP/twitter7 graph has 41,652,230 nodes and 1,468,365,182 edges. The edge counts differ by 298, which are 298 self-edges in SNAP/twitter7 that do not appear in the GAP-twitter graph. The node counts differ because of how nodes with no incident edges are treated. Every node in SNAP/twitter is incident on at least one edge (incoming or outgoing). The GAP/GAP-twitter7 matrix has 19,926,185 nodes that have no incident edges at all (no incoming nor outgoing edges, nor self edges). The original data at http://an.kaist.ac.kr/traces/WWW2010.html lists edges with node ids that correspond to active user ids. User ids that did not exist at the time the data was collected, do not appear in the edge lists. These are the nodes that appear in GAP-twitter but not in SNAP/twitter7. The other difference between GAP-twitter and SNAP/twitter7 is that the former has been given random integer edge weights (in range 1 to 255), whereas SNAP/twitter7 is a binary matrix. The following shows the relationship between the two versions of this graph in this collection: Prob1 = ssget ('SNAP/twitter7') Prob2 = ssget ('GAP/GAP-twitter') A1 = GrB.offdiag (Prob1.A) ; % remove 298 diagonal entries id = Prob1.aux.nodeid + 1 ; A2 = Prob2.A ; S2 = spones (A2 (id, id)) ; % pattern of submatrix of A2 assert (nnz (S2) == nnz (A2)) % S2 has all entries from A2 assert (isequal (A1, S2)) ; % now they are equal The GAP breadth-first-search (BFS) benchmark generates 64 random source nodes and evaluates the time to compute the BFS from each of the 64 sources. The betweenness-centrality (BC) runs 16 trials with 4 source nodes each. These source nodes are the same, in the same order. That is, the first 4 BFS source nodes are the same 4 source nodes for the first trial of BC. In one-based notation (where nodes are numbered 1 to n), the 64 source nodes are: 12441073 54488258 25451916 57714474 14839495 32081105 52957358 50444381 49590702 20127817 34939334 48251002 19524254 43676727 33055509 15244688 24946739 6479473 26077683 22023876 22081916 40034163 49496015 42847508 52409558 55445389 22028098 48766649 44521242 60135543 28528672 9678013 40020307 31625736 37446893 51788953 52584256 20346697 48387910 37337428 50501085 30130062 41185894 56495704 45663306 33359461 48143059 33291514 53461446 29340611 34148499 49171807 35550697 14521508 51633219 46823383 19396274 19871751 36862678 49539127 34016453 36567396 55487794 14391371 The first row are the 4 source nodes for the first BC trial, the 2nd row is for the second BC trial, and so on. These node ids also appear as Problem.aux.sources, in one-based notation. One-based notation is used because the Matrix Market format is one-based, as is MATLAB. If you use a zero-based method (such as the GAP benchmark itself, or the C API to GraphBLAS), be sure to subtract one from the node ids above (and in Problem.aux.sources) to obtain the right source nodes. The original twitter problem was currated by Haewoon Kwak (http://an.kaist.ac.kr/~haewoon), Changhyun Lee (http://an.kaist.ac.kr/~chlee), Hosung Park (http://an.kaist.ac.kr/~hosung), and Sue Moon (http://an.kaist.ac.kr/~sbmoon). It was added to the GAP benchmark by Scott Beamer, Krste Asanovic', and David Patternson based on the original data at: http://an.kaist.ac.kr/~haewoon/release/twitter_social_graph/ twitter_rv.tar.gz. Random integer edge weights (1 to 255) were then added to the graph in the GAP benchmark. It was then included into the SuiteSparse Matrix Collection, from the GAP benchmark graph, by Tim Davis. Note that the SNAP/twitter7 problem in the SuiteSparse Matrix Collection preserves all the meta-data, including information about 6,499 celebrities in the graph. This metadata does not appear in this GAP/GAP-twitter version of the graph. |