Two days ago, Kaiming He made their ResNet public, including various pretrained models. I was excited, downloaded the models immediately, and tried to use the loadcaffe module to load the ResNet. Sadly, loadcaffe does not seem to support Batch Normalization.

However, this morning I found that Facebook has released their own ResNet implementation (fb.resnet.torch), whose deepest model has 101 layers but achieves a better validation error. I think Facebook released it just yesterday, so it's pretty fresh. The Facebook version is now also linked from Kaiming's ResNet page, along with some other versions.

So an immediate idea is to use the ResNet in neuraltalk2, because it's really easy to plug into the original framework. Facebook provides clear code for extracting features from an image.

In principle, it's just removing the VGG net and putting the ResNet in (less than a 100-line modification). The code is here: neuraltalk2-resnet. I am still training it, so there may be some bugs.
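For illustration, here is a minimal sketch of the feature-extraction side, not the exact code from either repo: load a pretrained fb.resnet.torch model, strip the final classification layer, and forward an image to get pooled features. The model file name, image path, and preprocessing values are assumptions on my part.

```lua
require 'nn'
require 'cunn'
require 'cudnn'
require 'image'

-- Assumed path to a pretrained fb.resnet.torch model saved as a .t7 file.
local resnet = torch.load('resnet-101.t7')

-- Drop the final 1000-way classification layer so the network outputs the
-- pooled 2048-dimensional features instead of class scores.
resnet:remove(#resnet.modules)
resnet:evaluate()

-- Load and resize an example image, then normalize with the ImageNet
-- mean/std commonly used for these models (assumed here).
local img = image.scale(image.load('example.jpg', 3, 'float'), 224, 224)
local mean = { 0.485, 0.456, 0.406 }
local std  = { 0.229, 0.224, 0.225 }
for i = 1, 3 do
   img[i]:add(-mean[i]):div(std[i])
end

-- features is a 1x2048 tensor; in neuraltalk2 it would take the place of
-- the VGG fc7 features fed into the language model.
local features = resnet:forward(img:cuda():view(1, 3, 224, 224))
```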

Enjoy, and please point out any bugs.

In addition

I forgot to give the link to the post about the Facebook ResNet: the blog post.

I found it interesting that they use shared memory for the repeated modules; this may cut the intermediate variables roughly in half. At first it seems to require the weight update to be done during back-propagation (this won't hurt optimization, right?), but the shared memory is only on gradInput, which is not used in the update.

Here is what the original post says:

We used a few tricks to fit the larger ResNet-101 and ResNet-152 models on 4 GPUs, each with 12 GB of memory, while still using batch size 256. In a backwards pass, the gradInput buffers can be reused once the module’s gradWeight has been computed. In Torch, an easy way to achieve this is to modify modules of the same type to share their underlying storages. We also used the in-place variants of the ReLU and CAddTable modules.

Adding these memory optimizations only amount to an extra 10 lines of code.
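As a rough sketch of the storage-sharing idea (simplified, and an assumption on my part rather than Facebook's exact code), one could walk the model and point every module of the same type at a single underlying gradInput storage:

```lua
require 'nn'
require 'cunn'

-- Rough sketch of the gradInput-sharing trick. Modules of the same type are
-- pointed at one shared CUDA storage, so their gradInput buffers reuse the
-- same memory and get overwritten as the backward pass proceeds.
local function shareGradInput(model)
   local cache = {}
   model:apply(function(m)
      local moduleType = torch.type(m)
      -- ConcatTable keeps a table of gradInputs rather than a tensor, so skip it.
      if torch.isTensor(m.gradInput) and moduleType ~= 'nn.ConcatTable' then
         if cache[moduleType] == nil then
            cache[moduleType] = torch.CudaStorage(1)
         end
         -- Re-point this module's gradInput at the shared storage; the storage
         -- grows as needed when the tensor is resized during backward.
         m.gradInput = torch.CudaTensor(cache[moduleType], 1, 0)
      end
   end)
end

shareGradInput(model)
```

The in-place ReLU and CAddTable variants mentioned in the quote are, as far as I understand, just the same modules constructed with their in-place flag, e.g. nn.ReLU(true).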