Last week I attended the thirtieth annual Conference on Neural Information Processing Systems (NIPS), a single-track machine learning and computational neuroscience conference. The program includes invited talks, demonstrations, and oral and poster presentations of refereed papers. The venue was crowded (attendance roughly doubled), though I can’t compare it with NIPS 2015, since this was my first time at the conference. It was incredible to meet the authors of this year’s notable publications in person, talk to them, and ask about opportunities in the field and its future prospects. Interestingly, the papers most discussed during the breaks were those submitted to ICLR 2017. You could also observe the NIPS trends (here I agree with Tomasz Malisiewicz):

  • Learning-to-learn
  • GANification of X
  • Reinforcement learning
  • RNNs
  • Creating/Selling AI companies.

Below you can find all accepted papers and implementations.
Accepted papers
All Code Implementations for NIPS 2016 papers

Here I will try to highlight the most interesting papers, talks, and news from the conference. There are also a few useful notes about the NIPS conference; I added links where relevant.


I attended the following tutorials:

Arpit Mohan wrote a nice and detailed post about the first day at the conference (tutorials and invited talk by Yann LeCun).

Papers and Highlights

GANs and dialogue systems:

  • Generating Text via Adversarial Training
    Introduces a generic framework employing a Long Short-Term Memory (LSTM) network and a convolutional neural network (CNN) for adversarial training to generate realistic text. Instead of using the standard GAN objective, feature distributions are matched when training the generator.
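The feature-matching idea can be sketched very compactly. Below is a minimal numpy illustration of a feature-matching objective (matching the mean of discriminator features on real vs. generated batches); this is a generic sketch, not the paper's exact implementation:

```python
import numpy as np

def feature_matching_loss(f_real, f_fake):
    """Squared L2 distance between the mean discriminator features
    of a real batch and a generated batch, each of shape (batch, features)."""
    return float(np.sum((f_real.mean(axis=0) - f_fake.mean(axis=0)) ** 2))

# Toy example: real features all ones, fake features all zeros
loss = feature_matching_loss(np.ones((4, 2)), np.zeros((4, 2)))  # → 2.0
```

The generator is trained to minimize this statistic instead of fooling the discriminator directly, which tends to stabilize training on discrete data like text.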

  • GANS for Sequences of Discrete Elements with the Gumbel-softmax Distribution
    “Generative Adversarial Networks (GAN) have limitations when the goal is to generate sequences of discrete elements. The reason for this is that samples from a distribution on discrete objects such as the multinomial are not differentiable with respect to the distribution parameters. This problem can be avoided by using the Gumbel-softmax distribution, which is a continuous approximation to a multinomial distribution parameterized in terms of the softmax function. In this work, we evaluate the performance of GANs based on recurrent neural networks with Gumbel-softmax output distributions in the task of generating sequences of discrete elements.”
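The Gumbel-softmax trick the paper relies on is easy to sketch: add Gumbel noise to the logits and apply a temperature-scaled softmax, yielding a continuous (hence differentiable) relaxation of a categorical sample. A minimal numpy version, with an arbitrary temperature chosen for illustration:

```python
import numpy as np

def gumbel_softmax_sample(logits, temperature=0.5, rng=np.random.default_rng(0)):
    """Continuous relaxation of a categorical sample over the last axis."""
    # Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1)
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + gumbel) / temperature
    # Softmax gives a point on the probability simplex instead of a hard one-hot
    e = np.exp(y - y.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

logits = np.log(np.array([0.1, 0.2, 0.7]))
sample = gumbel_softmax_sample(logits, temperature=0.1)
```

As the temperature approaches zero, samples approach one-hot vectors (true categorical draws); higher temperatures give smoother, more uniform vectors, trading bias for usable gradients.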

  • Adversarial Evaluation of Dialogue Models
    “The recent application of RNN encoder-decoder models has resulted in substantial progress in fully data-driven dialogue systems, but evaluation remains a challenge. An adversarial loss could be a way to directly evaluate the extent to which generated dialogue responses sound like they came from a human. This could reduce the need for human evaluation, while more directly evaluating on a generative task. In this work, we investigate this idea by training an RNN to discriminate a dialogue model’s samples from human-generated samples. Although we find some evidence this setup could be viable, we also note that many issues remain in its practical application. We discuss both aspects and conclude that future work is warranted.”

RNN modifications:

  • Using Fast Weights to Attend to the Recent Past
    Introduces “fast weights” that “can be used to store temporary memories of the recent past and they provide a neurally plausible way of implementing the type of attention to the past that has recently proved very helpful in sequence-to-sequence models. By using fast weights we can avoid the need to store copies of neural activity patterns.”
  • Sequential Neural Models with Stochastic Layers
    This paper introduces stochastic recurrent neural networks, which glue together a deterministic recurrent neural network and a state space model to form a stochastic, sequential neural generative model. Essentially, it combines RNNs and HMM-style state space models.
  • Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences
    Extends the LSTM unit by adding a new time gate. This gate is controlled by a parametrized oscillation with a frequency range that produces updates of the memory cell only during a small percentage of the cycle. Even with the sparse updates imposed by the oscillation, the Phased LSTM network achieves faster convergence than regular LSTMs on tasks that require learning long sequences.
  • Quasi-Recurrent Neural Networks (QRNNs)
    The paper is under review as a conference paper at ICLR 2017. It is an approach to neural sequence modeling that alternates convolutional layers, which apply in parallel across timesteps, with a minimalist recurrent pooling function that applies in parallel across channels. QRNNs are faster and sometimes more accurate than standard LSTMs, and also more interpretable.
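The fast-weights mechanism from the first bullet above can be sketched in a few lines: the fast-weight matrix A decays and accumulates outer products of recent hidden states, and an inner loop lets A repeatedly "attend" to that recent activity. A simplified numpy version (no layer normalization, arbitrary toy dimensions and hyperparameters):

```python
import numpy as np

def fast_weights_step(h, x, A, W, C, lam=0.95, eta=0.5, inner_steps=3):
    """One recurrent step with a fast-weight matrix A (simplified: no layer norm)."""
    # Decay old fast weights and write the current hidden state as a temporary memory
    A = lam * A + eta * np.outer(h, h)
    # Preliminary state from the ordinary ("slow") recurrent transition
    pre = W @ h + C @ x
    hs = np.tanh(pre)
    # Inner loop: fast weights repeatedly attend to recently stored activity patterns
    for _ in range(inner_steps):
        hs = np.tanh(pre + A @ hs)
    return hs, A

rng = np.random.default_rng(0)
h, x = rng.normal(size=4), rng.normal(size=3)
A = np.zeros((4, 4))
W, C = 0.1 * rng.normal(size=(4, 4)), 0.1 * rng.normal(size=(4, 3))
h, A = fast_weights_step(h, x, A, W, C)
```

Because the recent past lives in A rather than in stored activity vectors, no explicit copies of past hidden states are needed, which is the neurally plausible part of the argument.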
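The Phased LSTM time gate is a simple piecewise function of a per-neuron phase: the gate ramps open and closed during a small fraction r_on of each period τ and otherwise stays (almost) shut, with a small leak α. A sketch of the gate's openness, with illustrative default parameters:

```python
import numpy as np

def phased_lstm_time_gate(t, tau=50.0, shift=0.0, r_on=0.05, alpha=1e-3):
    """Openness of the Phased LSTM time gate at time t (one oscillating neuron)."""
    phi = ((t - shift) % tau) / tau      # phase within the cycle, in [0, 1)
    if phi < 0.5 * r_on:                 # rising half of the open phase
        return 2.0 * phi / r_on
    if phi < r_on:                       # falling half of the open phase
        return 2.0 - 2.0 * phi / r_on
    return alpha * phi                   # closed phase: small leak
```

The cell state is then only interpolated through this gate, c_t = k_t · c_proposed + (1 − k_t) · c_prev, so most of the time the memory is simply carried over unchanged, which is what makes the updates sparse.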
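In a QRNN, the candidate, forget, and output activations (Z, F, O) come from causal convolutions that run in parallel over timesteps; the only sequential part is a cheap elementwise pooling. A minimal numpy sketch of the fo-pooling recurrence:

```python
import numpy as np

def qrnn_fo_pool(Z, F, O, c0=None):
    """QRNN fo-pooling: c_t = f_t * c_{t-1} + (1 - f_t) * z_t,  h_t = o_t * c_t.
    Z, F, O are (T, d) gate activations produced by causal convolutions
    (tanh for Z, sigmoids for F and O), computed in parallel over time."""
    T, d = Z.shape
    c = np.zeros(d) if c0 is None else c0
    H = np.empty((T, d))
    for t in range(T):
        c = F[t] * c + (1.0 - F[t]) * Z[t]   # elementwise recurrent pooling
        H[t] = O[t] * c                      # output gate
    return H

Z = np.ones((2, 1))
F = np.full((2, 1), 0.5)
O = np.ones((2, 1))
H = qrnn_fo_pool(Z, F, O)  # → [[0.5], [0.75]]
```

Unlike an LSTM, the recurrence contains no matrix multiplications, so almost all the compute sits in the parallel convolutions; that is the source of the speedup.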


  • Can Active Memory Replace Attention?
    “Active memory has not improved over attention for most natural language processing tasks, in particular for machine translation. We analyze this shortcoming in this paper and propose an extended model of active memory that matches existing attention models on neural machine translation and generalizes better to longer sentences. We investigate this model and explain why previous active memory models did not succeed. Finally, we discuss when active memory brings most benefits and where attention can be a better choice.”

  • Differentiable Neural Computer
    This is a well-known paper. Alex Graves promised that the code will be published (as part of the deal with the Nature journal).



  • Hierarchical Object Detection with Deep Reinforcement Learning
    It’s a method for performing hierarchical object detection in images guided by a deep reinforcement learning agent. The key idea is to focus on the parts of the image that contain richer information and zoom in on them. The authors train an intelligent agent that, given an image window, decides where to focus attention among five different predefined candidate regions (smaller windows). This procedure is iterated, providing a hierarchical image analysis. They compare two candidate proposal strategies for guiding the object search: with and without overlap.


  • Workshop on Reliable ML in the Wild.
    “Adversarial Examples and Adversarial Training” by Ian Goodfellow, OpenAI Research Scientist.

Insightful take-aways

If you have any questions or remarks, or if you found a mistake, please contact me.