Constrained de-noising AutoEncoder

This post discusses an implementation of a simple constrained de-noising autoencoder using TensorFlow. An autoencoder is an unsupervised learning model. It takes some input, runs it through an “encoder” to obtain an encoding of the input, and then attempts to reconstruct the original input from that encoding alone. The autoencoder is called constrained if its decoder uses the transposed matrices from the encoder (instead of learning them from scratch). It is called de-noising if, during training, it randomly sets parts of its input to 0 but still attempts to reconstruct the original, uncorrupted input. This can help against over-fitting and, potentially, encourage the model to capture only the most important parts of the data.

The idea is that encodings will encode the most important information in the data.
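To make the de-noising idea concrete, here is a minimal plain-Python sketch (the function name and signature are made up for illustration, not taken from the post's code) of corrupting an input vector by zeroing a random fraction of its entries:

```python
import random

def corrupt(x, noise_level, rng=random.Random(0)):
    # Hypothetical helper: zero each entry with probability `noise_level`.
    # The model sees the corrupted vector, but the reconstruction target
    # stays the clean x.
    return [0.0 if rng.random() < noise_level else v for v in x]

x = [0.5, 0.1, 0.9, 0.3]
x_noisy = corrupt(x, noise_level=0.5)
```

With `noise_level=0.0` the input passes through unchanged, which matches the example run below.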

Example run:

python ~/repos/autoencoder/ --encoder_network=784,128,10 --noise_level=0.0 --batch_size=64 --num_epochs=60 --logdir=LD_784_128_10_N0
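The `--encoder_network=784,128,10` flag presumably specifies the encoder layer widths (784 inputs, a 128-unit hidden layer, a 10-dimensional encoding). A plausible way such a spec could be parsed, sketched in plain Python (the function name is made up):

```python
def parse_network(spec):
    # "784,128,10" -> [784, 128, 10]: layer widths from input to encoding
    return [int(s) for s in spec.split(",")]
```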

Then start Tensorboard:

tensorboard --logdir=LD_784_128_10_N0

Comments on the implementation

I intentionally wanted to keep it as simple as possible. The class AutoEncoder is not application specific. Currently, the only supported cost function is the root of the mean squared error:

self._loss = tf.sqrt(tf.reduce_mean(tf.square(tf.sub(self._x, self._z))))

Notice that it does not contain any regularization terms. You can regularize the model by corrupting the input, i.e. by setting the noise level > 0. You might still find yourself in situations where adding explicit regularization terms to the cost function is useful.
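The TensorFlow expression above is the square root of the mean of the squared differences, i.e. the root-mean-square reconstruction error. As a plain-Python illustration, here is that quantity together with a hypothetical L2-regularized variant (`lam` and the function names are made up for this sketch):

```python
import math

def rmse(x, z):
    # root-mean-square reconstruction error, the same quantity as the
    # tf expression above, computed on plain Python lists
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, z)) / len(x))

def loss_with_l2(x, z, weights, lam):
    # hypothetical variant: RMSE plus an L2 penalty on the weights
    return rmse(x, z) + lam * sum(w * w for w in weights)
```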

When building an autoencoder model, it is natural to first create the encoding part of the network and then add the decoding part on top of it. Since we are building a constrained autoencoder, the matrices in the decoder part of the network are not learned, but are simply the transposes of the corresponding encoding matrices. However, we still need to learn biases for the decoding layers. Hence, all free parameters of the model are stored in these lists (order is important):

self._encoding_matrices = []
self._encoding_biases = []
self._decoding_biases = []

The network is fully defined by its activations, weight matrices, and biases in the encoding and decoding layers. All we need to do to build a proper TensorFlow graph is to connect them using operations. We then add the loss function and an optimizer that we can use during training.

    def encode(self, x):
        inpt = x
        for i in range(len(self._encoding_matrices)):
            W = self._encoding_matrices[i]
            b = self._encoding_biases[i]
            logits = tf.nn.bias_add(tf.matmul(inpt, W), b)
            # apply the activation and feed the result into the next layer
            # (self._activation is assumed to hold the chosen activation)
            inpt = self._activation(logits)
        return inpt

    def decode(self, encoding):
        inpt = encoding
        for i in range(len(self._encoding_matrices) - 1, -1, -1):
            # the decoder reuses the encoder matrices, transposed
            Wd = tf.transpose(self._encoding_matrices[i])
            bd = self._decoding_biases[i]
            logits = tf.nn.bias_add(tf.matmul(inpt, Wd), bd)
            inpt = self._activation(logits)
        return inpt

Note that in the snippet above, the decoder traverses the same encoding matrices, but in reverse order, and uses their transposes.
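To see the weight tying end to end, here is a plain-Python sketch (not the post's TensorFlow code) of an autoencoder whose decoder reuses the encoder matrices transposed, with its own biases. The matrix helpers, names, and a linear activation are assumptions made for this illustration:

```python
def matmul(x, W):
    # multiply row vector x by matrix W (given as a list of rows)
    return [sum(x[i] * W[i][j] for i in range(len(x)))
            for j in range(len(W[0]))]

def transpose(W):
    return [list(col) for col in zip(*W)]

def tied_autoencoder(x, matrices, enc_biases, dec_biases, act):
    # encode: apply each weight matrix and encoding bias in order
    h = x
    for W, b in zip(matrices, enc_biases):
        h = [act(v + bi) for v, bi in zip(matmul(h, W), b)]
    # decode: the same matrices in reverse order, transposed,
    # with the separately learned decoding biases
    for W, b in zip(reversed(matrices), reversed(dec_biases)):
        h = [act(v + bi) for v, bi in zip(matmul(h, transpose(W)), b)]
    return h
```

With identity weights and a linear activation, the reconstruction equals the input, which is a quick sanity check that the dimensions of the tied decoder line up.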

Example usage:

    dA = AutoEncoder(FLAGS.encoder_network, FLAGS.noise_level,
        True, FLAGS.acitivation_kind, FLAGS.optimizer_kind, FLAGS.learning_rate)

and then feed input x for training and eval:

feed_dict = {dA.x: batch_data}
tloss, _ = sess.run([dA.loss, dA.train_op], feed_dict=feed_dict)
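The post does not show the surrounding loop; a plausible shape for it, with a small made-up minibatch helper, might be:

```python
def iterate_minibatches(data, batch_size):
    # yield consecutive batches, dropping the final partial batch
    for start in range(0, len(data) - batch_size + 1, batch_size):
        yield data[start:start + batch_size]

# hypothetical training loop, mirroring the feed_dict pattern above:
# for epoch in range(num_epochs):
#     for batch_data in iterate_minibatches(train_data, batch_size):
#         feed_dict = {dA.x: batch_data}
#         tloss, _ = sess.run([dA.loss, dA.train_op], feed_dict=feed_dict)
```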

For examples, see the post “Compressing MNIST using AutoEncoders”.