Group Marini

Group Description
Economic Time Series, T. Di Fonzo (Univ Padua) and M. Marini, (ISTAT, Italy)

Description of the Sparse Linear System arising from the 
Reconciliation Problem of a System of Economic Time Series 

Notes for Professor Tim Davis 

by Tommaso Di Fonzo and Marco Marini 
July 3, 2008 

Dear Prof. Davis, 
we have read your message Sparse QR mexFunction on the Matlab Central 
website, in which you invite the community to provide sparse problems 
arising from real applications for testing a new sparse QR function. In 
particular, you seem particularly interested in rank deficient problems. We 
have a problem with such characteristics, so we have decided to upload in the 
CISE website the elements of this linear system (coefficient matrix and right-
hand side). This problem is generally known as the reconciliation problem 
in the statistics and economics literature, which we intend to describe you 
very briefly in this note for your information. 

Economic statistics are often published in the form of time series, as a collec
tion of observations sampled at equally-spaced time periods (months, quar
ters). Economic concepts behind such statistics are often linked by a system 
of linear relationships, deriving from the economic theory. However, these 
restrictions are rarely met by the original time series for various reasons. 
Then, data sets of real-world variables generally show discrepancies with 
respect to prior restrictions on their values. Such discrepancies are rarely 
accepted because their existence generally causes confusion among users and 
embarrassment to data producers. The adjustment of a set of data in order 
to satisfy a number of accounting restrictions -and thus to remove any dis
crepancy -is generally known as the reconciliation problem. 

The reconciliation problem can be seen as the solution of a linear system. 
The dimension of the system increases rapidly as the number of time pe
riods, the number of variables and the number of constraints grow. Since 
it is convenient to solve the system simultaneously (that is, considering all 
periods, all variables, all constraints), the use of sparse matrices and algo
rithms is strongly suggested for solving practical applications. However, as 
far as we know, such problems are generally solved by data producers by 
using dense matrices and/or iterative methods (i.e. the conjugate gradi
ent method). Our intention is to encourage the use of direct methods with 
sparse matrices for solving this kind of problems. 

A formalization of the problem is helpful to understand the features of this 
linear system. Let p be an n-dimensional vector containing all the obser-
vations of the system to be reconciled (where n is the number of variables 
multiplied by the number of periods in which such variables are observed). 
Let H be the matrix defining the relationships between such observations. 
Supposing there are k constraints, the matrix H has dimension k × n: for 
each constraint we have a known total that must be satisfied by the obser
vations, collected together in the k-dimensional vector w. Notice that (i) 
k<n, and (ii) rank(H) ≤ k. In other words, some constraints could be 
redundant. The reconciliation problem can be summarized by the inequality 

    H*p = w. (1) 

The vector p is called preliminary, because it contains data that do not 
satisfy the given constraints and must be modified accordingly. The solution 
to the problem is given by y, an n-dimensional vector which satisfies the 
given constraints, that is 

    H*y = w. (2) 

and is close enough to the preliminary vector p. 
An optimal solution of the problem (2) can be derived following a least-
squares adjustment criterion. First of all, a metric must be defined to mea
sure the distance between p and y. Denote by Q this metric, with Q a 
positive semi-definite (we stress this fact, Q can be singular) matrix of 
dimension n. Then, the vector y can be obtained as the minimand of the 
quadratic loss function 

    (y − p)'*Q*(y − p) (3) 

subject to 

    H*y = w. (4) 

Writing down the Lagrangian function and calculating derivates of it with 
respect to y and lambda, we finally obtain the block linear system 

    [ Q H' ; H 0 ] * [ y ; lambda ] = [ Q*p ; w ]           (5)

which can be expressed, according to a standard and compact notation, as 

    A*x = b (6) 

System 5 is an "augmented system", where the coefficient matrix A is sparse, 
symmetric and 2 × 2 block partitioned. The matrix A is also indefinite, 
due to the zeroes in the (2, 2) block. Moreover, the matrix might present 
singularities for two reasons: 


    * The block (1, 1), that is matrix Q, might be singular (for example 
    when Q is positive semi-definite) 

    * The block (1, 2) might have reduced row rank, due to redundancy in 
    the constraints of the system. 

The latter condition is crucial, because if H has reduced rank, than matrix A
is also singular. In Matlab R2008a, a symmetric indefinite sparse system is
efficiently solved by the algorithm MA57. According to Duff (2004)1, this
algorithm is able to flush any singularities to the end of the factorization
and perform an LDLT factorization on the nonsingular block. We would like to
understand better how MA57 is able to solve the system in so little time; on
the other hand, if the singularities might be discovered and eliminated with an
efficient rank-revealing sparse QR method (which is the objective of your
request), we wonder if we would be able to solve the system without recurring
to MA57. 

Now we provide some more details about the elements of the system up
loaded on the CISE website. It refers to a real application composed of 183 
quarterly time series observed over 28 quarters, which form the system of 
European national accounts by institutional sectors (EURQSA). Then, the 
number of observations to be reconciled is n = 28 × 183 = 5124. The vari
ables are connected by a system of 30 linear relationships. Moreover, each 
quarterly time series must be in line with the same variables observed yearly 
(due to different compilation practices quarterly and annual estimates might 
differ). The total number of constraints of the system is k = 2121. On the 
whole, matrix A has dimension 7245, with block (1,1) of dimension 5124. 
Figure 1 shows the partitioned form of the matrix, including the LDLT fac
torization provided by MA57. 


Figure 1: Matrix A and LDLT tranformation for the EURQSA problem 

Duff (2004), MA57|a code for the solution of sparse symmetric definite and
indefinite systems, ACM Transactions on Mathematical Software, Volume 30, Issue
2, p. 122. 


A single Matlab *.mat file has been uploaded, named eurqsa-davis.mat. 
The archive contains the coefficient matrix A, the right-hand side vector b, 
and the solution achieved with MA57 x. Moreover, for your convenience we 
also saved the blocks (1, 1) and (1, 2) of matrix A, named A11 and A12 
respectively. 

We hope we have been sufficiently able to explain our problem and to pro
vide you with an interesting real application for your activity. We are still 
novel to sparse algorithms, we are sorry if we missed some important infor
mation about the system. If so, please do not hesitate to contact us. Just 
for your knowledge, both of us have installed the SuiteSparse package in 
Matlab and are currently studying on your book Direct methods for Sparse 
Linear System, which is very clear and helpful for our research. 

We really look forward to hearing from you soon. Thank you very much for 
your attention. 

Yours sincerely 

Prof. Tommaso Di Fonzo (University of Padua, Italy) e-mail:
difonzo@stat.unipd.it 

Dr. Marco Marini (ISTAT, Rome, Italy) e-mail: marco.marini@istat.it
Displaying 1 collection matrix
Id Name Group Rows Cols Nonzeros Kind Date Download File
1891 eurqsa Marini 7,245 7,245 46,142 Economic Problem 2008 MATLAB Rutherford Boeing Matrix Market