My blog on quantitative financial analysis, artificial intelligence for stock investment & trading, and the latest progress in signal processing and machine learning

## Wednesday, February 23, 2011

### Sparse Signal Recovery with Temporally Correlated Source Vectors Using Sparse Bayesian Learning

Finally, I finished the revision of this paper, which has been submitted to the IEEE Journal of Selected Topics in Signal Processing for the second-round review. The paper can be downloaded from my homepage or from http://arxiv.org/abs/1102.3949

This paper addresses a problem in the multiple measurement vector (MMV) model in compressed sensing. In the MMV model, i.e.,
Y = AX + V,
most algorithms apply an Lq norm to each row of the solution matrix X, and then apply an Lp norm to the resulting row norms. For example, the MMV extension of the classic Basis Pursuit uses the L2 norm to evaluate each row of X, and uses the L1 norm over the row norms to enforce row sparsity. In fact, most algorithms use the L2 norm or the L_infinity norm to evaluate the rows of X. Clearly, these norm operations are blind to data structure, i.e., the correlation among the elements within each nonzero row of X. I call this temporal correlation (the term emphasizes the difference from correlation among different rows, which is called spatial correlation in much of the literature). In this paper, by deriving two sparse Bayesian learning algorithms, I showed that algorithms can achieve improved recovery performance if the Lq norm imposed on each row is replaced with a Mahalanobis distance measure. Actually, the improvement is very surprising, as the experiments in the paper show.
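To make the contrast concrete, here is a minimal numerical sketch (Python/NumPy; the AR(1) correlation matrix B and all variable names are my illustrative assumptions, not the paper's notation) comparing a data-structure-blind mixed norm with a Mahalanobis-type row measure:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy solution matrix X (N rows, L measurement vectors) with 3 nonzero rows.
N, L = 10, 4
X = np.zeros((N, L))
X[[1, 4, 7], :] = rng.standard_normal((3, L))

# Classic MMV penalty: an inner Lq norm per row, an outer Lp norm across rows.
# Basis-pursuit-style choice: q = 2 (row energy), p = 1 (row sparsity).
row_l2 = np.linalg.norm(X, ord=2, axis=1)    # L2 norm of each row
mixed_norm = np.sum(row_l2)                  # L1 of the row norms (the L2,1 norm)

# Correlation-aware alternative: replace the inner L2 norm with a
# Mahalanobis-type measure x B^{-1} x^T, where B models the temporal
# correlation within a row (here an AR(1) structure with coefficient 0.9,
# purely illustrative -- the actual algorithms learn B from the data).
beta = 0.9
B = beta ** np.abs(np.subtract.outer(np.arange(L), np.arange(L)))
B_inv = np.linalg.inv(B)
maha = np.array([x @ B_inv @ x for x in X])  # one value per row
maha_penalty = np.sum(np.sqrt(maha))

print(mixed_norm, maha_penalty)
```

Note that when B is the identity matrix, the Mahalanobis measure reduces to the squared L2 norm of the row, so the classic penalty is the special case that ignores temporal correlation.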

However, the development of the new algorithms is only part of the story. Computer simulations also revealed some interesting behaviors of the algorithms.

First, let me ask you a question:
When temporal correlation exists, say with a correlation coefficient of 0.9, does the recovery performance of algorithms become worse?
It seems that the answer should be YES, because one may intuitively think that when temporal correlation exists, each individual solution vector (i.e., X_i, where X = [X_1, ..., X_L]) provides less information about the locations of the nonzero rows. For example, consider an extreme case: suppose the temporal correlation coefficient is 0.999999. In this case, the individual solution vectors X_i (i = 1, ..., L) are almost identical. Consequently, an algorithm's performance should be very close to its performance when only one measurement vector is available, namely L = 1 (let's consider a noiseless environment for simplicity).
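The intuition behind this answer can be checked numerically. Below is a small sketch (Python/NumPy; the AR(1) generator is my own illustrative construction) showing that as the correlation coefficient approaches 1, the entries within a nonzero row of X, and hence the individual solution vectors, become nearly identical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Generate one nonzero row of X as an AR(1) process with coefficient beta,
# scaled so that every entry has unit variance.
def correlated_row(beta, L, rng):
    x = np.empty(L)
    x[0] = rng.standard_normal()
    for t in range(1, L):
        x[t] = beta * x[t - 1] + np.sqrt(1 - beta**2) * rng.standard_normal()
    return x

L = 5
mild = correlated_row(0.9, L, rng)
extreme = correlated_row(0.999999, L, rng)

# With beta extremely close to 1, the entries of the row barely change over
# t, so the L columns of X (the individual solution vectors) carry almost
# the same information about the support.
print(np.ptp(mild), np.ptp(extreme))  # spread of entries within each row
```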

However, my paper showed that the answer is NO. Here is one experimental result:

This is Fig. 7 in my paper. M/N is the ratio of the number of columns to the number of rows of the dictionary matrix. T-MSBL is one of the algorithms derived in the paper; it exploits temporal correlation. MSBL is a benchmark algorithm proposed by David Wipf in 2007. $\beta$ indicates the temporal correlation. MSBL's performance behaves as intuition predicts: the higher the temporal correlation, the worse the performance. However, one can see that T-MSBL's performance actually improves as the temporal correlation increases.

Next, let's look at another extreme experiment. In this noiseless experiment, the temporal correlation approaches 1 or -1 arbitrarily closely (the correlation coefficient is given by $\beta = \mathrm{sign}(C)(1-10^{-|C|})$). Just imagine the value of $\beta$ when C = 10, and how similar the individual solution vectors must be! However, from the figure below, we again see the different behaviors of T-MSBL and MSBL. For MSBL, when |C| becomes large, its performance approaches its performance in the case of L = 1. In contrast, T-MSBL maintains the same performance no matter how large |C| is.
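For reference, here is a quick computation (Python; the helper name `beta_of` is my own) of $\beta$ for a few values of C, showing how fast the correlation approaches 1:

```python
import numpy as np

# beta = sign(C) * (1 - 10^{-|C|}): the temporal correlation coefficient
# used in the extreme experiment above.
def beta_of(C):
    return np.sign(C) * (1.0 - 10.0 ** (-abs(C)))

for C in [1, 2, 5, 10, -10]:
    print(C, beta_of(C))
# At C = 10 the correlation is 1 - 1e-10, so the individual solution
# vectors are virtually identical.
```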

As a reminder, high temporal correlation means that the solution matrix X is ill-conditioned.
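As a sanity check on this remark, the following sketch (Python/NumPy; the AR(1) correlation model and function names are my illustrative assumptions) compares the condition number of the nonzero rows of X with and without strong temporal correlation:

```python
import numpy as np

rng = np.random.default_rng(2)

# Build a K x L matrix of nonzero rows whose columns share temporal
# correlation beta (imposed via the Cholesky factor of an AR(1)
# correlation matrix), and inspect its conditioning.
def nonzero_rows(beta, K, L, rng):
    B = beta ** np.abs(np.subtract.outer(np.arange(L), np.arange(L)))
    chol = np.linalg.cholesky(B)
    return rng.standard_normal((K, L)) @ chol.T

K, L = 6, 4
cond_uncorr = np.linalg.cond(nonzero_rows(0.0, K, L, rng))       # beta = 0
cond_corr = np.linalg.cond(nonzero_rows(0.999999, K, L, rng))    # beta near 1
print(cond_uncorr, cond_corr)
```

With beta near 1 the columns are nearly identical, so the matrix is nearly rank one and its condition number blows up.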

The two experiments clearly show the importance of exploiting the temporal correlation among different solution vectors, which is largely ignored in the field of compressed sensing. So far, I have not found any existing algorithm with performance similar to that of my T-MSBL and T-SBL, nor any theory that can explain the behavior of T-MSBL/T-SBL in the above two figures. If you find an algorithm with similar performance, or a theory that explains these phenomena, please let me know.

The third point I want to emphasize: I strongly suggest using the Mahalanobis distance measure instead of Lq norms on the nonzero rows of X. There has been a lot of work (both theoretical and algorithmic) analyzing the benefits of Lq norms, such as the L2 norm or the L_infinity norm. However, these data-structure-blind norms cannot deliver satisfactory performance in some practical applications, since practical signals and image sequences generally have strong temporal correlation. At least, using the Mahalanobis distance measure can lead to better performance. But I have to admit, this work is just a starting point. Further study is needed.