My blogs reporting quantitative financial analysis, artificial intelligence for stock investment & trading, and latest progress in signal processing and machine learning

Tuesday, June 28, 2011

Neuroscience software survey: What is popular, what has problems?

There is an excellent survey on Neuroscience software, carried out by Yaroslav Halchenko and his colleague. Here is the result link:
The primary analyses have been published in: Hanke, M. & Halchenko, Y. O. (2011). Neuroscience runs on GNU/Linux. Frontiers in Neuroinformatics, 5:8.

I picked several figures from their result according to my research interests:

Thursday, June 23, 2011

ICAtoolbox 3.8 is available now for download

Previously, in my homepage I only provided the content file of the toolbox, and I promised that once I translated the code descriptions into English, I would release it. However, since 2009 when I switched my interest to sparse signal recovery/compressed sensing, I had no time to do the translation job. So I decide to release it now and I apologize for some codes with descriptions/comments written in Chinese. However, if some body has questions, feel free to contact me.

The toolbox can be found in my software page:

PS: Please note that there may be several algorithm codes written by other people. The authors' names are written in the code descriptions.

You can call me "Zorro" instead of "Zhilin" if you like

During my study in US, I find many people don't know how to pronounce my first name "Zhilin", or have difficulty to remember my name. So I decide to give myself a nickname such that people can easily remember or pronounce it. And I choose "Zorro" as my nickname, since I very like Zorro during my childhood and my wife said that I have several obvious characteristics in common with Zorro (except that I don't know how to fight :) ).

Open problems in Bayesian statistics

Mike Jordan wrote a report on his interesting survey on 50 statisticians by asking them what are the open problems in Bayesian statistics. Here is his report:

The top open problems in his report are as follows:

No.1. Model selection and hypothesis testing.

No.2. Computation and statistics.

No.3. Bayesian/frequentist relationships

No.4. Priors

No.5. Nonparametrics and semiparametrics

Andew Gelman made excellent comments on these problems. Here is the link:

Wednesday, June 22, 2011

Speeding up Latent Dirichlet Allocation (LDA)

Alex has wrote a blog entry on the speed issue of LDA and released his fast LDA algorithm that performs on many computers. Here is the blog entry:

In Purdue's MLSS 2011, I was lucky to attend his lectures on graphical models. I like his lectures. The only regret is that I really hope he could give more details on LDA and its variants; for example, giving a detailed development on a variant of LDA models. But I understand the time was limited. After all, the lectures serve for a seminar, not for a college class.

 (Image credit to Alex's Lecture Slides)

Friday, June 17, 2011

Scientific American: How Simple Photos Could Be Used as a Test for a Conscious Machine

The latest issue of Scientific American has an article: How Simple Photos Could Be Used as a Test for a Conscious Machine by  Christof Koch and Giulio Tononi (you need to access the issue to see the full-text. The website just provides you an introduction to the article). It is very interesting. If I have time this summer, I will attend the competition. In fact, I have several ideas to cheat their smart computer. But I don't want to say much right now. However, I'd to say, the picture that the authors provide (see below) is easily recognized by computer algorithms as a unreasonable picture. Their algorithm just needs to do object recognization and then do some semantic reasoning, then it can know this picture is not reasonable. So, don't be misled by the authors' picture.

(The picture's credit to Scientific American)

Thursday, June 16, 2011

The T-SBL/T-MSBL paper has been revised

I just uploaded the revised version of the paper "Sparse Signal Recovery with Temporally Correlated Source Vectors Using Sparse Bayesian Learning" accepted by IEEE Journal of Selected Topics in Signal Processing. Several errors in the local analysis have been corrected. This is the final version of the paper.

Saturday, June 11, 2011

Misunderstandings on Sparse Bayesian Learning (SBL) for Compressed Sensing (2)

I realized that I've almost forgot to continue this topic. Well, let's continue now. Particularly, I want to put more words on the regularization parameter lambda.

In the first blog entry of this topic (see Misunderstandings on Sparse Bayesian Learning (SBL) for Compressed Sensing (1)), I emphasized that

(1) for a given SBL algorithm, the optimal lambda is different under different experiment settings.
(2) Different SBL algorithms have different optimal lambda under the same experiment settings.
(3) For most SBL algorithms, noise variance is not the optimal lambda value.

Now I'll talk somethings on the lambda learning rules. Generally, most SBL algorithms have their own learning rules for lambda. However, as I emphasized previously, these learning rules basically cannot lead to the best performance for their associated SBL algorithms. To give a clear understanding on this, I'll show you a simulation result below.

The simulation is a comparison of 4 SBL algorithms for the SMV model (single measurement vector case) in a noisy environment (noise variance = 0.01). The 4 SBL algorithms are as follows:
    (1) Wipf & Rao's EM-SBL (published in IEEE TSP, 2004, title: Sparse Bayesian Learning for Basis Selection) using the basic EM updating rule, denoted by EM-SBL
    (2) Qiu & Dogandzic's SBL (published in IEEE TSP, 2010, title: Variance-component based sparse signal reconstruction and model selection), denoted by ExCov
    (3) Ji, Xue & Carin's Bayesian Compressive Sensing (published in IEEE TSP, 2008, title: Bayesian Compressive Sensing), denoted by BCS
    (4) My T-MSBL for the SINGLE measurement vector model (accepted by IEEE Journal of Selected Topics in Signal Processing, 2011, title: Sparse Signal Recovery with Temporally Correlated Source Vectors Using Sparse Bayesian Learning), denoted by T-MSBL for SMV. Although T-MSBL is derived for the MMV cases, it can be also used for SMV cases. In these cases, T-MSBL is essentially the same as EM-SBL, but T-MSBL uses an efficient learning rule for the lambda.
First, let's see how the lambda value affects their recovery performance. I plotted their performance as a function of the lambda (Note in SMV cases, EM-SBL and T-MSBL have the same performance curve as a function of lambda), i.e. all the algorithms estimated the solution when fed with a fixed lambda value, and the lambda value varied from 0.001 to 0.33. The performance curve of EM-SBL (T-MSBL for SMV), ExCov, and BCS are plotted as a red solid line, a blue solid line, and a green solid line, respectively, in the following figure.
As we can see, if we can obtain the optimal lambda for each algorithm, then EM-SBL(T-MSBL for SMV) and ExCov have the similar performance, while BCS has the poorest performance. However, if we choose a wrong lambda, say lambda = 0.0381 (this value was calculated according to the suggestion in the Basis Pursuit Denoising paper), then we may conclude that the performance rank is: EM-SBL better than BCS better than ExCov. But if we choose lambda = 0.0042 (this value was calculated according to the suggestion in the BCS code), then we may conclude that the performance rank is: ExCov better than EM-SBL better than BCS. So, unthoughtful choices of lambda may lead to wrong conclusions. Again, we've seen the noise variance (0.01) is not the optimal lambda values of all the SBL algorithms. However, it can be seen as a good estimate for the lambda for ExCov  in this case.

Second, let's see the how lambda learning rules affect the recovery performance. In fact, if we use lambda learning rules to estimate the lambda, we can find the conclusion on performance comparison will change again. I plotted the performance of EM-SBL, ExCov, and my T-MSBL when they used their own lambda leaning rules, shown in red dashed line with square marks, blue dashed line with circle marks, and red dashed line with star marks (for clear display, I omitted the performance of BCS in this case).

You can see, the lambda learning rule of EM-SBL leads to very very poor performance, so poor that even a random guess on lambda may leads to better performance. I've said that in SMV cases the only difference between EM-SBL and T-MSBL is basically the lambda learning rules. Now you can see, the lambda learning rule of my T-MSBL leads to the best performance among the algorithms.

When I read papers in the past one year, I did find that some people heavily relied on the lambda learning rules of SBL. An obvious example is, in terms of recovery performance, SBL algorithms are much better than Lasso, Basis Pursuit, Matching Pursuit etc (this is basically a "truth" well known in our lab). However, in some published simulation results, we found SBL had poorer performance than those algorithms. This is probably due to wrong choices of lambda values, or the use of some lambda learning rules that behaved very poorly.

Here I used the same experiment settings as above to compare the SBL algorithm with Lasso (fed with the optimal regularization value) and Subspace Pursuit (fed with the true number of nonzero elements in the solution vector). Lasso and Subspace Pursuit are plotted as black line with different marks in the following figure:
Clearly, if we allow EM-SBL to use its lambda learning rule, its performance is poorer than Lasso and Subspace Pursuit. However, EM-SBL actually has much better performance than Lasso and Subspace Pursuit, if it uses its optimal lambda value, or uses the noise variance as its lambda value, or uses other lambda values obtained from cross-validation etc.

So, the lambda learning rule is also a crucial factor when evaluating a SBL algorithm's performance. All the lambda learning rules cannot lead to the best performance for their associated SBL algorithms. But some are more effective than others.

Another conclusion is, the work to derive a more effective lambda learning rule is the same valuable as the work to derive a new SBL algorithm using other computation frameworks or models. For example, ExCov is actually an extension of EM-SBL, which treats the nonzero elements of large variance and small variance with different ways. Due to this, the performance curve of ExCov as a function of lambda values is different to that of EM-SBL. When both algorithms choose their optimal lambda value, ExCov has slightly better performance than EM-SBL. However, if both algorithms use their lambda learning rules, ExCov has much better performance than EM-SBL. But if EM-SBL uses the lambda learning rule of T-MSBL (again, note that EM-SBL and T-MSBL is basically the same in SMV cases, except the lambda learning rules), EM-SBL can have better performance than ExCov (see the performance curve denoted by T-MSBL for SMV case). In this sense, it may be better to derive an effective lambda learning rule based on old models than to derive algorithms based on new models.

Actually, lambda learning rules are more important for SBL algorithms in practical applications. This is because in practice, you cannot obtain the optimal lambda value. Although you can use cross-validation, modified L-curve methods or other methods to choose a value for the lambda, for a large-scale dataset, the computational load is heavy.

In simulations you may find some algorithms exhibit excellent performance when use the noise variance as the lambda values, while others behave poorly when use the noise variance as their lambda values. If you conclude that the former algorithms should be better than the latter, you are wrong. This is because it is difficult to accurately estimate the noise variance in practical applications in most cases;  errors in estimating the noise variance cannot be avoided. And you should note that the optimal lambda values of SBL algorithms are not far from each other, and not far from the true noise variance. So, the conclusion you get in simulations when use the true noise variance as the lambda values can be totally different to the conclusion you get in practice when you use the estimated noise variance as the lambda values.

In the next blog entry, I will discuss the issue on the threshold to prune out small gamma_i.

For reproducibility, the experiment settings are as follows:

Gaussian random dictionary matrix of the size 30 x 80
nonzero element number: D = 5
The nonzero elements are generated as: nonzeroW = sign(randn(D,1)).* ( rand(D,1)*0.5 + 0.5 );
And their locations are chosen randomly.
noise variance: 0.01


Zhilin Zhang, Bhaskar D. Rao, Clarify Some Issues on the Sparse Bayesian Learning for Sparse Signal Recovery, Technical Report, University of California, San Diego, September, 2011