Last week I attended the thirtieth annual conference on Neural Information Processing Systems (NIPS), a single-track machine learning and computational neuroscience conference. The conference includes invited talks, demonstrations, and oral and poster presentations of refereed papers. The venue was crowded (attendance roughly doubled), though I can’t compare it with NIPS 2015, since this was my first time at the conference. It was incredible to meet the authors of this year’s notable publications in person, talk to them, and ask questions about opportunities in the field and future perspectives. Interestingly, the most discussed papers during the breaks were the ones submitted to ICLR 2017. You could also observe the NIPS trends (here I agree with Tomasz Malisiewicz):
 Learning-to-learn
 GANification of X
 Reinforcement learning
 RNNs
 Creating/Selling AI companies.
Below you can find all accepted papers and implementations.
Accepted papers
All Code Implementations for NIPS 2016 papers
Here I will try to highlight the most interesting papers, talks, and news from the conference. There are a few useful notes about the NIPS conference; I have added links where relevant.
Tutorials
I attended the following tutorials:

An amazing, authoritative tutorial called “Deep Reinforcement Learning through Policy Optimization” by Pieter Abbeel and John Schulman. This tutorial covered in depth the history of and the latest techniques for policy optimization.

“Nuts and Bolts of Applying Deep Learning” by Andrew Ng. A handout from the tutorial is here.

Generative Adversarial Networks (GANs) tutorial by Ian Goodfellow. The tutorial described the concepts around GANs and recent advances in training and applications. Ian presented results from the Plug & Play Generative Networks paper, which is worth checking out.
Arpit Mohan wrote a nice and detailed post about the first day at the conference (tutorials and invited talk by Yann LeCun).
Papers and Highlights
GANs and dialogue systems:

Generating Text via Adversarial Training
Introduces a generic framework employing long short-term memory (LSTM) and convolutional neural networks (CNNs) for adversarial training to generate realistic text. Instead of using the standard GAN objective, feature distributions are matched when training the generator.
GANS for Sequences of Discrete Elements with the Gumbel-softmax Distribution
“Generative Adversarial Networks (GAN) have limitations when the goal is to generate sequences of discrete elements. The reason for this is that samples from a distribution on discrete objects such as the multinomial are not differentiable with respect to the distribution parameters. This problem can be avoided by using the Gumbel-softmax distribution, which is a continuous approximation to a multinomial distribution parameterized in terms of the softmax function. In this work, we evaluate the performance of GANs based on recurrent neural networks with Gumbel-softmax output distributions in the task of generating sequences of discrete elements.”
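To make the abstract concrete, here is a minimal NumPy sketch of the Gumbel-softmax trick the paper builds on (the function name and constants are mine, not the paper’s): Gumbel noise is added to the logits and a temperature-scaled softmax yields a continuous, differentiable relaxation of a one-hot sample.

```python
import numpy as np

def gumbel_softmax_sample(logits, temperature=1.0, rng=None):
    """Continuous relaxation of sampling from a multinomial.

    Adds Gumbel(0, 1) noise to the logits and applies a temperature-scaled
    softmax; as temperature -> 0 the output approaches a one-hot sample
    from the underlying categorical distribution.
    """
    rng = rng or np.random.default_rng(0)
    # Gumbel(0, 1) noise via inverse transform sampling
    u = rng.uniform(low=1e-9, high=1.0, size=np.shape(logits))
    g = -np.log(-np.log(u))
    y = (np.asarray(logits, dtype=float) + g) / temperature
    y = y - y.max()          # numerical stability
    e = np.exp(y)
    return e / e.sum()

probs = gumbel_softmax_sample([2.0, 0.5, -1.0], temperature=0.5)
print(probs, probs.sum())    # a point on the simplex, near one-hot at low temperature
```

Unlike a hard `argmax` sample, this output has a well-defined gradient with respect to the logits, which is exactly what lets the generator be trained adversarially on discrete sequences.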
Adversarial Evaluation of Dialogue Models “The recent application of RNN encoder-decoder models has resulted in substantial progress in fully data-driven dialogue systems, but evaluation remains a challenge. An adversarial loss could be a way to directly evaluate the extent to which generated dialogue responses sound like they came from a human. This could reduce the need for human evaluation, while more directly evaluating on a generative task. In this work, we investigate this idea by training an RNN to discriminate a dialogue model’s samples from human-generated samples. Although we find some evidence this setup could be viable, we also note that many issues remain in its practical application. We discuss both aspects and conclude that future work is warranted.”
RNN modifications:
 Using Fast Weights to Attend to the Recent Past
Introduce “fast weights” that “can be used to store temporary memories of the recent past and they provide a neurally plausible way of implementing the type of attention to the past that has recently proved very helpful in sequence-to-sequence models. By using fast weights we can avoid the need to store copies of neural activity patterns.”
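The core mechanism is a decaying outer-product memory. A minimal sketch of the fast-weight update from the paper (the decay and learning-rate values here are illustrative):

```python
import numpy as np

def fast_weight_update(A, h, lam=0.95, eta=0.5):
    """One step of the fast-weight memory update:
        A <- lam * A + eta * h h^T
    A accumulates a decaying outer-product trace of recent hidden states,
    so the network can attend to the recent past without storing explicit
    copies of neural activity patterns."""
    h = np.asarray(h, dtype=float).reshape(-1, 1)
    return lam * A + eta * (h @ h.T)

d = 4
A = np.zeros((d, d))
for h in [np.ones(d), np.arange(d, dtype=float)]:
    A = fast_weight_update(A, h)
# Reading from the fast memory: A @ h retrieves a blend of recent states,
# weighted by their similarity to the query h.
print(A.shape)   # (4, 4)
```

Because `A` is just a matrix updated on the fly, this acts like a form of attention over recent history with constant memory cost, rather than storing every past hidden state.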
 Sequential Neural Models with Stochastic Layers
This paper introduces stochastic recurrent neural networks, which glue together a deterministic recurrent neural network and a state space model to form a stochastic, sequential neural generative model. Basically, it combines RNNs and HMMs.
 Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences
Extends the LSTM unit by adding a new time gate. This gate is controlled by a parametrized oscillation with a frequency range that produces updates of the memory cell only during a small percentage of the cycle. Even with the sparse updates imposed by the oscillation, the Phased LSTM network achieves faster convergence than regular LSTMs on tasks that require learning long sequences.
 Quasi-Recurrent Neural Networks (QRNNs)
The paper is under review as a conference paper at ICLR 2017. It presents an approach to neural sequence modeling that alternates convolutional layers, which apply in parallel across timesteps, with a minimalist recurrent pooling function that applies in parallel across channels. QRNNs are faster and sometimes better than standard LSTMs, and also more interpretable.
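To see why the QRNN recurrence is so cheap, here is a sketch of its “f-pooling” step in NumPy (my simplification; the paper also describes variants with output and input gates). The candidate vectors and forget gates are assumed to have been produced in parallel by the convolutional layers; only this elementwise mixing is sequential.

```python
import numpy as np

def qrnn_f_pooling(z, f):
    """QRNN 'f-pooling': given candidate vectors z_t and forget gates f_t
    (each of shape (T, d), computed in parallel by convolutions), mix them
    sequentially:
        h_t = f_t * h_{t-1} + (1 - f_t) * z_t
    There are no matrix multiplies inside the loop, only elementwise ops,
    which is what makes the recurrent part of a QRNN so fast."""
    h = np.zeros_like(z[0])
    out = []
    for z_t, f_t in zip(z, f):
        h = f_t * h + (1.0 - f_t) * z_t
        out.append(h)
    return np.stack(out)

T, d = 5, 3
z = np.random.default_rng(0).standard_normal((T, d))
f = np.full((T, d), 0.9)     # gates near 1: the state changes slowly
print(qrnn_f_pooling(z, f).shape)   # (5, 3)
```

Interpretability comes from the same structure: each channel of `f` directly tells you how much of the past that channel is keeping at each timestep.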
Memory:

Can Active Memory Replace Attention?
“Active memory has not improved over attention for most natural language processing tasks, in particular for machine translation. We analyze this shortcoming in this paper and propose an extended model of active memory that matches existing attention models on neural machine translation and generalizes better to longer sentences. We investigate this model and explain why previous active memory models did not succeed. Finally, we discuss when active memory brings most benefits and where attention can be a better choice.” 
Differentiable Neural Computer
It is a well-known paper. Alex Graves promised that they are going to publish the code (part of the deal with the Nature journal).
Others:

Generating Videos with Scene Dynamics
The model learns to generate tiny videos using adversarial networks.
Github 
Fast and Provably Good Seedings for k-Means
Introduces a fast seeding algorithm that can obtain good centroid seeds orders of magnitude faster than the previous state of the art, k-means++.
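For context, here is a sketch of the classic k-means++ seeding that the paper accelerates (the paper itself replaces the exact D² sampling below with an MCMC approximation so it avoids full passes over the data; this code shows only the baseline):

```python
import numpy as np

def kmeans_pp_seeds(X, k, rng=None):
    """Classic k-means++ seeding: pick each new center with probability
    proportional to its squared distance to the nearest center chosen so
    far (D^2 sampling). Each round requires a full pass over X, which is
    the cost the NIPS paper's MCMC-based seeding avoids."""
    rng = rng or np.random.default_rng(0)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # squared distance from every point to its nearest chosen center
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum()
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.stack(centers)

# three well-separated clusters; the seeds should land near their means
X = np.concatenate([np.random.default_rng(1).normal(m, 0.1, size=(50, 2))
                    for m in (0.0, 5.0, 10.0)])
print(kmeans_pp_seeds(X, 3).shape)   # (3, 2)
```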
Learning What and Where to Draw
Impressive presentation of a new image generation model. 
A very useful talk, “How to Train a GAN,” with tips and tricks.

Congratulations to Magenta for winning the Best Demo Award at NIPS 2016 with the demo “Interactive musical improvisation with Magenta”. They also have a really interesting paper submitted to ICLR 2017.
Tuning Recurrent Neural Networks with Reinforcement Learning
Blog post 
Marc Raibert of Boston Dynamics gave a very nice talk at the conference on robots, including a swearing-robot spoof.
Surprisingly, Marc said that they use almost no ML in their robot control algorithms.
Apple is going to start publishing its research. Probably the most tweeted news :) You can read more via the following links: Business Insider and Bloomberg.

Uber bought Geometric Intelligence and launched Uber AI Labs!

Open-sourcing the Embedding Projector: a tool for visualizing high-dimensional data. This project has been released by Google.
More in the announcement
See Live Demo: http://projector.tensorflow.org/ 
Release of DeepMind Lab and OpenAI Universe.
Control systems and simulators seem to be the next target, and without question they are proliferating. The most discussed ones during the conference:
 DeepMind Lab
 OpenAI Universe
 Project Malmo
 CommAI-env
There are probably even more of them that are worth attention.
Workshops

Hierarchical Object Detection with Deep Reinforcement Learning
It is a method for performing hierarchical object detection in images guided by a deep reinforcement learning agent. The key idea is to focus on the parts of the image that contain richer information and zoom in on them. The authors train an intelligent agent that, given an image window, is capable of deciding where to focus attention among five different predefined region candidates (smaller windows). This procedure is iterated, providing a hierarchical image analysis. They compare two candidate-proposal strategies to guide the object search: with and without overlap.
Github 
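A small sketch of the zoom step as I understood it from the poster: each window is split into five predefined sub-windows (this is my reading of the five candidates as four corner quadrants plus a central window, and the `overlap` parameter is my illustration of the with/without-overlap variants, not the authors’ code):

```python
def candidate_regions(x, y, w, h, overlap=0.0):
    """Five predefined sub-windows for one zoom step of the agent:
    four corner quadrants plus a central window, each as (x, y, w, h).
    With overlap > 0 the quadrants grow so that neighbours share pixels."""
    hw, hh = w / 2, h / 2
    gw, gh = hw * (1 + overlap), hh * (1 + overlap)
    return [
        (x, y, gw, gh),                       # top-left
        (x + w - gw, y, gw, gh),              # top-right
        (x, y + h - gh, gw, gh),              # bottom-left
        (x + w - gw, y + h - gh, gw, gh),     # bottom-right
        (x + hw / 2, y + hh / 2, hw, hh),     # center
    ]

print(len(candidate_regions(0, 0, 100, 100)))   # 5
```

The agent picks one of the five regions, which becomes the window for the next iteration, so each step quarters the search area.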
Workshop on Reliable ML in the Wild.
“Adversarial Examples and Adversarial Training” by Ian Goodfellow, OpenAI Research Scientist.
Insightful takeaways
 Brad Neuberg’s NIPS 2016 Notes. I recommend reading the poster section!
 Andrew L. Beam NIPS 2016
 NIPS 2016 — Day 1 Highlights
 NIPS 2016 — Day 2 Highlights: Platform wars, RL and RNNs
 NIPS 2016 — Final Highlights Days 4–6: Likelihood-free inference, Dessert analogies, and much more.
 Post NIPS Reflections
 An exhibition of rejected papers in NIPS 2016 can be found here
If you have any questions or remarks, or you have found a mistake, please contact me at info@taraslehinevych.me.