Group Kamvar
Group Description
Web matrices from Sep Kamvar, Stanford University.

* Stanford Web Matrix: 281903 pages, ~2.3 million links, from a September 2002 crawl.
* Stanford-Berkeley Web Matrix: 683446 pages, ~7.6 million links, from a December 2002 crawl.

The data comes from the Stanford WebBase project, http://www-diglib.stanford.edu/~testbed/doc2/WebBase . The original data is posted at http://www.stanford.edu/~sdkamvar/research.html . In the data posted there, one of the matrices is normalized and the other is not, although MATLAB code to produce the normalized matrix is included. Both have some non-binary values in their connectivity, which represent pages with more than one link to another specific page. The matrices posted here are purely binary. They correspond to the matrix G discussed in Chapter 2 of "Numerical Computing with MATLAB" by Cleve Moler, http://www.mathworks.com/moler .

In these matrices, if G(i,j)=1 then there is a link from page j to page i. Thus, column j of G lists the links that you can click on when you are visiting page j.

A basic statement of the PageRank algorithm is given at http://www.mathworks.com/moler/ncm/pagerank.m , but that code is not suited to problems this large. The methods posted at http://www.mathworks.com/moler/ncm/pagerankpow.m or http://www.stanford.edu/~sdkamvar/research.html are better suited to these problems.

The related eigenvalue problem is to find the vector x such that x = A*x, where A = p*G*D + delta*e*e', and:

* p = a scalar damping factor less than one (typically 0.85)
* G = the binary connectivity matrix (Problem.A in the MATLAB .mat files, or the .pua files). Don't confuse Problem.A with the matrix A = p*G*D + delta*e*e'. Problem.A is the matrix G, not A.
* D = a diagonal matrix with D(j,j) = 1/c(j), where c(j) is the sum of column j of G (the number of out-links of page j); D(j,j) = 0 if c(j) = 0
* delta = (1-p)/n, where n is the number of pages
* e = the column vector of all ones

Pages with no out-links (empty columns of G) need one adjustment. In the Chapter 2 formulation, the term delta*e*e' is replaced by e*z', where z(j) = delta if c(j) > 0 and z(j) = 1/n if c(j) = 0. This makes every column of A sum to one.

The URLs of the Stanford_Berkeley matrix are given in a cell array, Problem.colname. The URL associated with column j is Problem.colname{j}. Only a few of these are given, corresponding to root URLs. No URLs are given for the Stanford matrix.
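The eigenvalue problem above is normally solved with the power method, without ever forming A, which would be a dense n-by-n matrix. The following is a minimal pure-Python sketch of that idea. It is not a port of pagerankpow.m or of Kamvar's code. It assumes G is stored by columns as lists of 0-based row indices, so `cols[j]` holds the out-links of page j. The helper names `columns_from_links` and `pagerank`, and the tolerance and iteration cap, are hypothetical choices made for illustration. Empty columns (pages with no out-links) are handled with the z vector described above.

```python
def columns_from_links(n, links):
    """Build G by columns from (j, i) pairs meaning "page j links to page i".

    Follows the convention G(i,j) = 1 when page j links to page i, so
    cols[j] holds the pages reachable by clicking a link on page j.
    Repeated links collapse to a single entry because G is binary.
    """
    cols = [set() for _ in range(n)]
    for j, i in links:
        cols[j].add(i)
    return [sorted(c) for c in cols]


def pagerank(cols, p=0.85, tol=1e-10, max_iter=1000):
    """Power method for x = A*x that never forms the dense matrix A.

    cols[j] lists the rows i with G(i,j) = 1 (0-based). Each sweep computes
    p*G*D*x with D(j,j) = 1/c(j), then adds the rank-one term e*(z'*x):
    rank held by pages with no out-links is spread uniformly (weight 1/n),
    and every page also receives delta = (1-p)/n.
    """
    n = len(cols)
    delta = (1.0 - p) / n
    x = [1.0 / n] * n                      # uniform start vector, sums to 1
    for _ in range(max_iter):
        y = [0.0] * n
        dangling = 0.0                     # total rank on pages with no out-links
        for j, links in enumerate(cols):
            if links:
                share = x[j] / len(links)  # (D*x)(j) = x(j)/c(j)
                for i in links:
                    y[i] += share          # accumulate G*D*x column by column
            else:
                dangling += x[j]
        # y = p*(G*D*x) + e*(z'*x); uses sum(x) == 1, which each sweep preserves.
        base = p * dangling / n + delta
        y = [p * v + base for v in y]
        if max(abs(a - b) for a, b in zip(y, x)) < tol:
            return y
        x = y
    return x                               # tolerance not reached within max_iter


if __name__ == "__main__":
    # Toy web of 4 pages. Page 3 has no out-links, so column 3 of G is empty.
    cols = columns_from_links(4, [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3)])
    # Page 2, linked from pages 0 and 1, ends up with the largest rank.
    print([round(v, 3) for v in pagerank(cols)])
```

The explicit loops are there to make the structure of p*G*D*x visible. For the full 683,446-page matrix, a vectorized sparse representation such as SciPy's CSC matrices would be far faster than this loop form.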
The Kamvar group contains 2 matrices:
Id | Name | Group | Rows | Cols | Nonzeros | Kind | Date | Download Formats |
---|---|---|---|---|---|---|---|---|
979 | Stanford | Kamvar | 281,903 | 281,903 | 2,312,497 | Directed Graph | 2003 | MATLAB, Rutherford-Boeing, Matrix Market |
980 | Stanford_Berkeley | Kamvar | 683,446 | 683,446 | 7,583,376 | Directed Graph | 2003 | MATLAB, Rutherford-Boeing, Matrix Market |
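The Matrix Market download stores G in coordinate format. The file has a `%%MatrixMarket` header, optional `%` comment lines, and a size line `rows cols nnz`, followed by one `i j` entry per line with 1-based indices. The following is a minimal sketch of a reader for that layout. `read_mm_columns` is a hypothetical helper, not part of any SuiteSparse tool. It assumes an unsymmetric (`general`) coordinate file and ignores any value column, so it accepts both pattern and numeric entries. It returns per-column lists of 0-based row indices, which are the out-links of each page under the G(i,j) convention.

```python
def read_mm_columns(lines):
    """Read a Matrix Market coordinate file into per-column row lists.

    `lines` is any iterable of text lines, e.g. an open file object.
    Returns (n_rows, nnz, cols) where cols[j] holds the 0-based rows i
    with G(i,j) nonzero, i.e. the pages that page j links to.
    """
    it = iter(lines)
    header = next(it).lower()
    if not header.startswith("%%matrixmarket") or "coordinate" not in header:
        raise ValueError("expected a Matrix Market coordinate file")
    if "general" not in header:
        raise ValueError("this sketch handles only 'general' (unsymmetric) files")
    # Tokenize the remaining lines, skipping comment and blank lines.
    rows = (ln.split() for ln in it if ln.strip() and not ln.startswith("%"))
    n_rows, n_cols, nnz = (int(t) for t in next(rows))
    cols = [[] for _ in range(n_cols)]
    for _ in range(nnz):
        parts = next(rows)
        i, j = int(parts[0]), int(parts[1])  # 1-based row and column
        cols[j - 1].append(i - 1)            # 0-based: page j-1 links to page i-1
    return n_rows, nnz, cols
```

For example, with an assumed local path after extracting the archive, you would call `with open("Stanford/Stanford.mtx") as f: n, nnz, cols = read_mm_columns(f)`. The returned `n` and `nnz` should then agree with the Rows and Nonzeros columns of the table.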