Tuesday, June 4, 2019

ftps.space released!

  I started doing website development some time ago.

  After fiddling with Flask, nginx, JavaScript, HTML, CSS, WebSocket, and so on, I have finished a site, https://ftps.space, that can transfer files between two Chrome browsers, whether they are on an Android device or a desktop PC. It's a small but useful tool.

  If you're reading this, give it a try. Any suggestions are welcome.

  Okay~ So, why did I build this site? There are times when you want to send some 'BIG' files to someone, but they're too big to send through email, or you're in an area where Dropbox or Google Drive are blocked, or you are reluctant (or too lazy) to set up a local FTP server just to transfer a few files... I think a service that provides a one-time-only channel to transfer files is useful, and a pure web app is easy to use, so FTPS was developed. Currently only Chrome supports the features ftps requires to run, which is a pity. But at least your Android device can use it too, so it's not limited to PCs: you can transfer between Android and PC as well.




Sunday, March 10, 2019

py-scm released!

Good news! I've finished porting my scm lib to Python: project home
You can use pip to install it:

    pip install py-scm

or

    pip3 install py-scm

Since Python is dynamic, I should be able to support things like registering slots automatically, or at least more easily. Well, that will have to wait until the next time I find some time to enhance it. Bye.

Saturday, August 25, 2018

How to run an external program and interact with it from Python?

Python and its community are great.

Recently, I developed an acceptance-test framework using Python. I am not an expert in Python, but with the help of Google search I can build almost anything I want, except for one thing: how to run an external program and interact with it from Python?

At first, it seems easy. Python's subprocess module lets you run an external command and get its output. But it turns out you only get the output after the external program has ended. In my application, I wanted to capture the external program's output as soon as it was produced, so that as soon as certain keywords appeared (or a timeout expired) I could make a control-flow decision.
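For a concrete picture of the problem, here is a minimal sketch (my own illustration, not taken from the recipe below) of the blocking behaviour using the standard subprocess API:

import subprocess

# communicate() waits for the child process to exit, so a long-running or
# interactive program gives you nothing until it terminates.
child = subprocess.Popen(
    ['python3', '-c', 'import time; print("hi"); time.sleep(5); print("bye")'],
    stdout=subprocess.PIPE)
out, _ = child.communicate()   # blocks for ~5 seconds
print(out)                     # both lines arrive together, only at the end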

I finally found the answer at http://code.activestate.com/recipes/440554-module-to-allow-asynchronous-subprocess-use-on-win/
The code there was written for Python 2. With some amendments it is listed below. Enjoy!



#!/usr/bin/env python3

import os
import subprocess
import errno
import time
import sys


if sys.platform == 'win32':
    from win32file import ReadFile, WriteFile
    from win32pipe import PeekNamedPipe
    import msvcrt
    import pywintypes
else:
    import select
    import fcntl


class AsyncPopen(subprocess.Popen):
    def recv(self, maxsize=None):
        return self._recv('stdout', maxsize)
    
    def recv_err(self, maxsize=None):
        return self._recv('stderr', maxsize)

    def send_recv(self, input='', maxsize=None):
        return self.send(input), self.recv(maxsize), self.recv_err(maxsize)

    def get_conn_maxsize(self, which, maxsize):
        if maxsize is None:
            maxsize = 1024
        elif maxsize < 1:
            maxsize = 1
        return getattr(self, which), maxsize
    
    def _close(self, which):
        getattr(self, which).close()
        setattr(self, which, None)
    
    if sys.platform == 'win32':
        def send(self, input):
            if not self.stdin:
                return None

            try:
                x = msvcrt.get_osfhandle(self.stdin.fileno())
                (errCode, written) = WriteFile(x, input.encode())
            except ValueError:
                return self._close('stdin')
            except (pywintypes.error, Exception) as why:
                print(why)
                if why.args[0] in (109, errno.ESHUTDOWN):
                    return self._close('stdin')
                raise

            return written

        def _recv(self, which, maxsize):
            conn, maxsize = self.get_conn_maxsize(which, maxsize)
            if conn is None:
                return None
            
            try:
                x = msvcrt.get_osfhandle(conn.fileno())
                (read, nAvail, nMessage) = PeekNamedPipe(x, 0)
                if maxsize < nAvail:
                    nAvail = maxsize
                if nAvail > 0:
                    (errCode, read) = ReadFile(x, nAvail, None)
            except ValueError:
                return self._close(which)
            except (pywintypes.error, Exception) as why:
                print(why)
                if why.args[0] in (109, errno.ESHUTDOWN):
                    return self._close(which)
                raise
            
            if self.universal_newlines:
                read = self._translate_newlines(read)
            return read

    else:
        def send(self, input):
            if not self.stdin:
                return None

            if not select.select([], [self.stdin], [], 0)[1]:
                return 0

            try:
                written = os.write(self.stdin.fileno(), input.encode())
            except OSError as why:
                if why.errno == errno.EPIPE: # broken pipe
                    return self._close('stdin')
                raise

            return written

        def _recv(self, which, maxsize):
            conn, maxsize = self.get_conn_maxsize(which, maxsize)
            if conn is None:
                return None
            
            flags = fcntl.fcntl(conn, fcntl.F_GETFL)
            if not conn.closed:
                fcntl.fcntl(conn, fcntl.F_SETFL, flags| os.O_NONBLOCK)
            
            try:
                if not select.select([conn], [], [], 0)[0]:
                    return ''
                
                r = conn.read(maxsize)
                if not r:
                    return self._close(which)
    
                if self.universal_newlines:
                    r = self._translate_newlines(r)
                return r
            finally:
                if not conn.closed:
                    fcntl.fcntl(conn, fcntl.F_SETFL, flags)


message = "Other end disconnected!"


def async_recv(p, t=.1, e=1, tr=5, stderr=0):
    if tr < 1:
        tr = 1
    x = time.time()+t
    y = []
    r = ''
    pr = p.recv
    if stderr:
        pr = p.recv_err
    while time.time() < x or r:
        r = pr()
        if r is None:
            if e:
                raise Exception(message)
            else:
                break
        elif r:
            y.append(r)
        else:
            time.sleep(max((x-time.time())/tr, 0))
    return b''.join(y)

    
def async_send(p, data):
    while len(data):
        sent = p.send(data)
        if sent is None:
            raise Exception(message)
        data = data[sent:]


if __name__ == '__main__':
    if sys.platform == 'win32':
        shell, commands, tail = ('cmd', ('dir /w', 'echo HELLO WORLD'), '\r\n')
    else:
        shell, commands, tail = ('sh', ('ls', 'echo HELLO WORLD'), '\n')
    
    p = AsyncPopen(shell, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # decode the raw bytes so the shell output prints as text
    print(async_recv(p).decode(errors="replace"), end="")
    for cmd in commands:
        async_send(p, cmd + tail)
        print(async_recv(p).decode(errors="replace"), end="")
    async_send(p, 'exit' + tail)
    print(async_recv(p, e=0).decode(errors="replace"))
    p.wait()

Wednesday, April 4, 2018

My convolution neural network library

Finally, we've come to this. Today I present to you my convolutional neural network library, https://github.com/zen747/conv-neural-network. What's special about a CNN is that every layer is composed not of 1-dimensional neurons but of 3-dimensional volumes. You also have to implement the weight-sharing mechanism. It's much more complicated than a plain neural network implementation. Still, the code should be easy enough to follow; check it out if you are interested.
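To illustrate those two points, here is a toy sketch (my own illustration with numpy, not the library's code): the outputs of a convolution layer form a 3-dimensional volume, and one small kernel is reused, i.e. its weights are shared, at every spatial position.

import numpy as np

def conv_layer_forward(x, kernels, bias, stride=1, pad=1):
    """x: input volume (depth, height, width); kernels: (n_filters, depth, k, k)."""
    d, h, w = x.shape
    n_f, _, k, _ = kernels.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out_h = (h + 2 * pad - k) // stride + 1
    out_w = (w + 2 * pad - k) // stride + 1
    out = np.zeros((n_f, out_h, out_w))      # the output is again a 3-D volume
    for f in range(n_f):
        for i in range(out_h):
            for j in range(out_w):
                patch = xp[:, i*stride:i*stride+k, j*stride:j*stride+k]
                # the same kernel weights are applied at every (i, j): weight sharing
                out[f, i, j] = np.sum(patch * kernels[f]) + bias[f]
    return out

x = np.random.randn(3, 32, 32)                  # e.g. a CIFAR-10-style image
out = conv_layer_forward(x, np.random.randn(32, 3, 3, 3), np.zeros(32))
print(out.shape)                                # (32, 32, 32)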

The results seem decent enough.
With a configuration as follows:

Input 32 32 3 0
Conv 32 3 1 1 0.1
Pool 2 2
Conv 64 3 1 1 0.1
Pool 2 2
Full 72 0.2
Full 72 0.2
Output 10
LearningRate 0.0004
RegularizationFactor 0.003
MaxNorm   1e4
BatchSize 150

I can obtain an accuracy of about 70% after days of training. If my library supported GPU acceleration, it would take hours instead of days. But I will stop enhancing the library at this point.
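For reference, here is my guess at how to read that configuration (the field meanings are my assumption, not documented here): Conv lines look like filters / kernel / stride / padding / dropout, and Pool lines like size / stride. Under that assumption, a small Python sketch traces the volume sizes through the network:

h, w, d = 32, 32, 3                     # Input 32 32 3

def conv(h, w, d, filters, k, s, p):
    return (h + 2*p - k)//s + 1, (w + 2*p - k)//s + 1, filters

def pool(h, w, d, k, s):
    return (h - k)//s + 1, (w - k)//s + 1, d

h, w, d = conv(h, w, d, 32, 3, 1, 1)    # Conv 32 3 1 1 -> 32 x 32 x 32
h, w, d = pool(h, w, d, 2, 2)           # Pool 2 2      -> 16 x 16 x 32
h, w, d = conv(h, w, d, 64, 3, 1, 1)    # Conv 64 3 1 1 -> 16 x 16 x 64
h, w, d = pool(h, w, d, 2, 2)           # Pool 2 2      ->  8 x  8 x 64
print(h, w, d)                          # fed into Full 72 -> Full 72 -> Output 10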

Saturday, March 10, 2018

AI III -- classification

Today we are solving problems that are a little more complex. Unlike the last article, where we were looking for a neural network that can do the XOR operation, this time the objective is to "classify". The first example classifies a spiral dataset, as described at http://cs231n.github.io/neural-networks-case-study/; the second one is fizz buzz, http://joelgrus.com/2016/05/23/fizz-buzz-in-tensorflow/. Along the way, we implemented features like regularization, dropout, max-norm, etc. I'm not sure every feature was implemented correctly though XD; please leave a comment if you find something suspicious.
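As a quick illustration of two of those features (a toy numpy sketch of the general idea, not the C++ code below): dropout zeroes random neurons during training, and max-norm rescales a weight vector whose length exceeds a cap.

import numpy as np
rng = np.random.default_rng(0)

def dropout(activations, rate):
    # during training: zero each activation with probability `rate`
    mask = rng.random(activations.shape) >= rate
    return activations * mask

def max_norm(weights, cap):
    # after an update: if the weight vector grew too long, scale it back to `cap`
    norm = np.linalg.norm(weights)
    return weights * (cap / norm) if norm > cap else weights

print(dropout(np.ones(8), 0.25))          # some entries dropped to 0
print(max_norm(np.array([3.0, 4.0]), 2))  # [1.2, 1.6], norm clipped to 2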

As always, to avoid wasting time, please learn your machine-learning fundamentals the way I suggested at http://zen747.blogspot.tw/2018/03/i.html.


/* 
 * This sample use gradient descend to classify a spiral dataset
 * http://cs231n.github.io/neural-networks-case-study/
 */

#include <vector>
#include <deque>
#include <algorithm>
#include <iostream>
#include <cmath>
#include <cassert>
#include <ctime>
#include <cfloat>
#include "MesenneTwister.h"

#define USE_SDL 1

#if USE_SDL
#include <SDL2/SDL.h>
#include <SDL2/SDL_image.h>
#include <SDL2/SDL_ttf.h>
#endif

using namespace std;

// The regularization scale the weights toward 0
#define REGULARIZATION_FACTOR  1e-2
#define DROPOUT_RATE           0.25
#define INPUT_DROPOUT_RATE     0.001
#define MAX_NORM               10
#define LEARNING_RATE          0.02
#define BATCH_SIZE             20

#define LEAKY_RELU_CONSTANT    0.005
const double epsilon = 1e-8;

MTRand g_rng(time(NULL));

// used in adam algorithm
int g_generation = 0;
    
//--------------------------------

template <int ACTIVATE_FUNC_TYPE>
double activate (double x)
{
    return 0;
}

template <int ACTIVATE_FUNC_TYPE>
double activate_drv (double y)
{
    return 0;
}

enum {
    SIGMOID,
    TANH,
    RELU
};

template<>
double activate<SIGMOID>(double x)
{
    return 1.0 / (1.0 + exp(-x));
}

template<>
double activate_drv<SIGMOID> (double y)
{
    return y * (1.0 - y);
}

template<>
double activate<TANH>(double x)
{
    return tanh (x);
}

template<>
double activate_drv<TANH> (double y)
{
    return 1.0 - y * y;
}

template<>
double activate<RELU>(double x)
{
    return (x > 0) ? x : LEAKY_RELU_CONSTANT * x;
}

template<>
double activate_drv<RELU>(double y)
{
    return (y > 0) ? 1.0 : LEAKY_RELU_CONSTANT;
}

//---------
double activate_func (double x)
{
    return activate<RELU>(x);
}

double activate_drv_func (double y)
{
    return activate_drv<RELU>(y);
}

//-----------
class Layer;

struct NetworkConfig
{
    bool training_;
    
    NetworkConfig()
    : training_(false)
    {}
};

class Neuron
{
public:
    Neuron (int numInputs=0);

    void init_weights (double mean, double variance, double magnitude);
    double calculate_output_from (vector<double> const&inputs);
    void calculate_pdrv_of_error_wrt_input_weights(vector<double> const&inputs, double drvLoss);
    void calculate_pdrv_of_error_wrt_input_weights(vector<double> const&inputs, const Layer &output_layer);
    const vector<double> & weights() const;
    size_t set_input_weights(const double *weights);
    void clear_pdrvs ();
    void update_weights (double learning_rate, double momentum, bool reguon);
    
public:
    int            idx_;
    vector<double> inputWeights_;
    double         value_;
    bool           droped_out_;
    
    vector<double> deltaWeight_;
    vector<double> momentum_;
    double         pdrvErrorWrtInput_;
    vector<double> pdrvErrorWrtWeights_;
};

class Layer
{
public:
    Layer(NetworkConfig *netconfig, int numNeurons, int numInputs);
    ~Layer();

    void init_weights (double mean, double variance, double magnitude);
    void forward_pass(const Layer &last_layer);
    void calculate_output_layer_pdrv_of_error_wrt_input_weights (vector<double> const&inputs,
                                                                 vector<double> const&outputsExp,
                                                                 double totalExp,
                                                                 int right_ans
                                                                );
    void calculate_middle_layer_pdrv_of_error_wrt_input_weights (vector<double> const&inputs, const Layer &output_layer);
    void save_weights_to(vector<double> &weights) const;
    size_t set_input_weights(const double *weights);
    const vector<double> & get_outputs() const;
    void set_outputs_of_input_layer(const vector<double> &value);
    size_t num_of_neurons () const;
    void clear_pdrvs ();
    void update_weights (double learning_rate, double momentum, bool reguon);
    void mark_dropout (double dropout_rate);
    void mark_no_dropout ();
    
public:
    vector<Neuron> nodes_;
    vector<double> outputs_;
    NetworkConfig *netconfig_;
    bool           output_layer_;
};


Neuron::Neuron(int numInputs)
: droped_out_(false)
{
    inputWeights_.resize(numInputs+1); // plus weight for bias
}

void Neuron::init_weights(double mean, double variance, double magnitude)
{
    // This initialization technique helps a lot
    double sqrtn = sqrt(2.0 * inputWeights_.size());
    for (size_t i=0; i < inputWeights_.size()-1; ++i) {
        inputWeights_[i] = magnitude * g_rng.randNorm(mean, variance) / sqrtn;
    }
    inputWeights_.back() = 0;
}

double Neuron::calculate_output_from (const vector< double >& inputs)
{
    assert (inputs.size()+1 == inputWeights_.size() && "number of inputs wrong");
    if (droped_out_) {
        value_ = 0;
        return 0;
    }
    
    double v=0;
    for (size_t i=0; i < inputs.size(); ++i) {
        v += inputs[i] * inputWeights_[i];
    }
    v += inputWeights_.back(); // bias
    value_ =  activate_func(v);
    return value_;
}

void Neuron::calculate_pdrv_of_error_wrt_input_weights(vector<double> const&inputs, double drvLoss)
{
    pdrvErrorWrtInput_ = drvLoss * activate_drv_func(value_);
    size_t i=0;
    for (; i < pdrvErrorWrtWeights_.size() - 1; ++i) {
        pdrvErrorWrtWeights_[i] += pdrvErrorWrtInput_ * inputs[i];
    }
    pdrvErrorWrtWeights_[i] += pdrvErrorWrtInput_ * 1.0; // and bias which is 1.0
}

void Neuron::calculate_pdrv_of_error_wrt_input_weights(const vector< double >& inputs, const Layer& output_layer)
{
    if (droped_out_) {
        pdrvErrorWrtInput_ = 0;
        return;
    }
    
    double d = 0;
    for (size_t i=0; i < output_layer.num_of_neurons(); ++i) {
        d += output_layer.nodes_[i].inputWeights_[idx_] * output_layer.nodes_[i].pdrvErrorWrtInput_;
    }
    
    pdrvErrorWrtInput_ = d * activate_drv_func(value_);
    size_t i=0;
    for (; i< pdrvErrorWrtWeights_.size() - 1; ++i) {
        pdrvErrorWrtWeights_[i] += pdrvErrorWrtInput_ * inputs[i];
    }
    pdrvErrorWrtWeights_[i] += pdrvErrorWrtInput_ * 1.0; // and bias which is 1.0
}


const vector< double >& Neuron::weights() const
{
    return inputWeights_;
}

size_t Neuron::set_input_weights(const double* weights)
{
    inputWeights_.assign(weights, weights + inputWeights_.size());
    pdrvErrorWrtWeights_.resize(inputWeights_.size());
    deltaWeight_.resize(inputWeights_.size());
    momentum_.resize(inputWeights_.size());
    return inputWeights_.size();
}

void Neuron::clear_pdrvs()
{
    pdrvErrorWrtWeights_.clear();
    pdrvErrorWrtWeights_.resize(inputWeights_.size());
}

void Neuron::update_weights(double learning_rate, double momentum, bool reguon)
{
    double weight_norm = 0;
    for (size_t i=0; i < inputWeights_.size(); ++i) {
        double dx = pdrvErrorWrtWeights_[i];
        double regularization = 0;
        if (reguon) {
            regularization = learning_rate * REGULARIZATION_FACTOR * inputWeights_[i];
        }
        /* RMSprop
        double decay_rate = 0.99;
        deltaWeight_[i] = decay_rate * deltaWeight_[i] + (1.0 - decay_rate) * dx * dx;
        inputWeights_[i] += -learning_rate * dx / (sqrt(deltaWeight_[i]) + epsilon) - regularization;
        //*/
        
        //* Adam
        double beta1=0.9;
        double beta2=0.999;
        momentum_[i] = beta1*momentum_[i] + (1.0 - beta1) * dx;
        double mt = momentum_[i] / (1.0 - pow(beta1, g_generation));
        //double mt = momentum_[i];
        deltaWeight_[i] = beta2 * deltaWeight_[i] + (1.0 - beta2) * dx * dx;
        double vt = deltaWeight_[i] / (1.0 - pow(beta2, g_generation));
        //double vt = deltaWeight_[i];
        inputWeights_[i] += -learning_rate * mt / (sqrt(vt) + epsilon) - regularization;
        //*/
        
        weight_norm += inputWeights_[i] * inputWeights_[i];
    }
    
    weight_norm = sqrt(weight_norm);
    if (weight_norm > MAX_NORM) {
        double r = MAX_NORM / weight_norm;
        for (size_t i=0; i < inputWeights_.size(); ++i) {
            inputWeights_[i] *= r;
        }
    }
}



Layer::Layer(NetworkConfig *netconfig, int numNeurons=0, int numInputsPerNeuron=0)
: netconfig_(netconfig)
, output_layer_(false)
{
    nodes_.resize(numNeurons);
    for (size_t i=0; i < nodes_.size(); ++i) {
        nodes_[i] = Neuron(numInputsPerNeuron);
        nodes_[i].idx_ = i;
    }
    outputs_.resize(numNeurons);
}

Layer::~Layer()
{
    nodes_.clear();
}

void Layer::init_weights(double mean, double variance, double magnitude)
{
    for (size_t i=0; i < nodes_.size(); ++i) {
        nodes_[i].init_weights (mean, variance, magnitude);
    }
}

void Layer::forward_pass(const Layer &last_layer)
{
    double rate = 1.0 - DROPOUT_RATE;
    
    const vector<double> & inputs = last_layer.get_outputs();
    for (size_t i=0; i < nodes_.size(); ++i) {
        if (nodes_[i].droped_out_) {
            outputs_[i] = nodes_[i].value_ = 0;
        } else {
            nodes_[i].calculate_output_from (inputs);
            if (!output_layer_ && !netconfig_->training_) {
                nodes_[i].value_ *= rate;
            }
            outputs_[i] = nodes_[i].value_;
        }
    }
}

void Layer::calculate_output_layer_pdrv_of_error_wrt_input_weights(vector<double> const&inputs,
                                                                   vector<double> const&outputsExp,
                                                                   double totalExp,
                                                                   int right_ans
                                                                  )
{
    assert (outputsExp.size() == nodes_.size());
    
    for (size_t i=0; i < nodes_.size(); ++i) {
        double drvLoss;
        if (i == right_ans) {
            drvLoss = outputsExp[i] / totalExp -1.0;
        } else {
            drvLoss = outputsExp[i] / totalExp;
        }
        nodes_[i].calculate_pdrv_of_error_wrt_input_weights (inputs, drvLoss);
    }
}

void Layer::calculate_middle_layer_pdrv_of_error_wrt_input_weights(const vector< double >& inputs, const Layer& output_layer)
{
    for (size_t i=0; i < nodes_.size(); ++i) {
        nodes_[i].calculate_pdrv_of_error_wrt_input_weights (inputs, output_layer);
    }
}


void Layer::save_weights_to(vector< double >& weights) const
{
    for (size_t i=0; i < nodes_.size(); ++i) {
        const vector<double> &w = nodes_[i].weights();
        weights.insert(weights.end(), w.begin(), w.end());
    }

}

size_t Layer::set_input_weights(const double* weights)
{
    size_t pos = 0;
    for (size_t i=0; i < nodes_.size(); ++i) {
        pos += nodes_[i].set_input_weights(weights + pos);
    }
    return pos;
}

void Layer::set_outputs_of_input_layer(const vector< double >& value)
{
    outputs_ = value;
    for (size_t i=0; i < nodes_.size(); ++i) {
        if (netconfig_->training_ && nodes_[i].droped_out_) {
            nodes_[i].value_ = outputs_[i] = 0;
        } else {
            nodes_[i].value_ = outputs_[i];
        }
    }
}

const vector< double >& Layer::get_outputs() const
{
    return outputs_;
}


size_t Layer::num_of_neurons() const
{
    return nodes_.size();
}

void Layer::clear_pdrvs()
{
    for (size_t i=0; i < nodes_.size(); ++i) {
        nodes_[i].clear_pdrvs();
    }
}

void Layer::update_weights(double learning_rate, double momentum, bool reguon)
{
    for (size_t i=0; i < nodes_.size(); ++i) {
        nodes_[i].update_weights(learning_rate, momentum, reguon);
    }
}

void Layer::mark_dropout(double dropout_rate)
{
    for (size_t i=0; i < nodes_.size(); ++i) {
        nodes_[i].droped_out_ = (g_rng.rand() < dropout_rate);
    }
}

void Layer::mark_no_dropout()
{
    for (size_t i=0; i < nodes_.size(); ++i) {
        nodes_[i].droped_out_ = false;
    }
}



class NeuralNetwork
{
public:
    NeuralNetwork(int numInputs, vector<int> const& hiddenLayerTopology, int numOutpus);
    ~NeuralNetwork();
    void init_weights(double mean_value, double variance_value, double magnitude);
    size_t number_of_weights () const;
    void run (vector<double> const &inputs, vector<double> &outputs);
    void train (vector<double> const &inputs, vector<double> &outputs);
    void forward_pass (vector<double> const &inputs);
    void backward_pass (vector<double> &outputs, int right_ans);
    void save_weights_to(vector<double> &weights) const;
    void set_weights(vector<double> const &weights);
    void clear_pdrvs ();
    void update_weights (double learning_rate, double momentum);
    double error () const { return last_error_; }
    bool regularization_on () const { return regularization_on_; }
    void set_regularization_on () { regularization_on_ = true; }
    
protected:
    void feed_forward ();
    void mark_dropout ();
    void mark_no_dropout ();
    
private:
    vector<Layer *> layers_;
    double last_error_;
    NetworkConfig *netconfig_;
    bool regularization_on_;
    
    NeuralNetwork(const NeuralNetwork &rhs) {}
    NeuralNetwork &operator=(const NeuralNetwork &rhs) { return *this; }
};

NeuralNetwork::NeuralNetwork(int nInputs, const vector< int >& hiddenLayerTopology, int nOutputs)
: last_error_(0)
, netconfig_(new NetworkConfig)
, regularization_on_(false)
{
    layers_.resize(hiddenLayerTopology.size() + 2); // hidden + inputs + outputs
    int nNeuron = nInputs;
    layers_[0] = new Layer(netconfig_, nNeuron, 0); // layer zero is inputs
    for (size_t i=0; i < hiddenLayerTopology.size(); ++i) {
        layers_[i+1] = new Layer(netconfig_, hiddenLayerTopology[i], layers_[i]->num_of_neurons());
    }
    nNeuron = nOutputs;
    layers_.back() = new Layer(netconfig_, nNeuron, hiddenLayerTopology.back());
}

NeuralNetwork::~NeuralNetwork()
{
    delete netconfig_;
    for (size_t i=0; i < layers_.size(); ++i) {
        delete layers_[i];
    }
    layers_.clear();
}

void NeuralNetwork::init_weights(double mean, double variance, double magnitude)
{
    for (size_t i=1; i < layers_.size(); ++i) {
        layers_[i]->init_weights(mean, variance, magnitude);
    }    
}

size_t NeuralNetwork::number_of_weights() const
{
    vector<double> weights;
    this->save_weights_to(weights);
    return weights.size();
}

void NeuralNetwork::run(const vector< double >& inputs, vector< double >& outputs)
{
    if (netconfig_->training_) {
        netconfig_->training_ = false;
        mark_no_dropout ();
    }
    forward_pass(inputs);
    outputs = layers_.back()->get_outputs();    
}

void NeuralNetwork::train(const vector< double >& inputs, vector< double >& outputs)
{
    if (!netconfig_->training_) {
        netconfig_->training_ = true;
        mark_dropout ();
    }
    forward_pass(inputs);
    outputs = layers_.back()->get_outputs();    
}

void NeuralNetwork::forward_pass(const vector< double >& inputs)
{
    // feed forward
    // set inputs as outputs of first layer
    layers_[0]->set_outputs_of_input_layer(inputs);
    feed_forward ();
}

void NeuralNetwork::backward_pass(vector< double >& outputs, int right_ans)
{
    // calculate errors/loss
    double denom = 0;
    double numer = 0;
    last_error_ = 0;

    double max_out=-DBL_MAX;
    for (size_t i=0; i < outputs.size(); ++i) {
        if (outputs[i] > max_out) {
            max_out = outputs[i];
        }
    }
    
    vector<double> outputsExp; outputsExp.reserve(outputs.size());
    double totalExp = 0;
    for (size_t i=0; i < outputs.size(); ++i) {
        outputsExp.push_back(exp(outputs[i] - max_out));
        if (i == right_ans) {
            numer = outputsExp.back();
        }
        denom += outputsExp.back();
    }
    totalExp = denom;
    last_error_ = -log(numer / denom);
    
    // calculate partial derivatives
    // first the last layer(output layer)
    vector< double > layer_inputs = layers_[layers_.size()-2]->get_outputs();
    layers_.back()->calculate_output_layer_pdrv_of_error_wrt_input_weights(layer_inputs, outputsExp, totalExp, right_ans);
    
    // back propagate
    for (int i=layers_.size()-2; i > 0; --i) {
        vector< double > layer_inputs = layers_[i-1]->get_outputs();
        layers_[i]->calculate_middle_layer_pdrv_of_error_wrt_input_weights(layer_inputs, *layers_[i+1]);
    }
    
}


void NeuralNetwork::feed_forward()
{
    for (size_t i=1; i < layers_.size(); ++i) {
        layers_[i]->forward_pass(*layers_[i-1]);
    }    
}

void NeuralNetwork::mark_dropout()
{
    for (size_t i=0; i < layers_.size()-1; ++i) {
        double r = DROPOUT_RATE;
        if (i==0) r = INPUT_DROPOUT_RATE;
        layers_[i]->mark_dropout(r);
    }    

}

void NeuralNetwork::mark_no_dropout()
{
    for (size_t i=0; i < layers_.size()-1; ++i) {
        layers_[i]->mark_no_dropout();
    }    
}

void NeuralNetwork::clear_pdrvs()
{
    for (size_t i=1; i < layers_.size(); ++i) {
        layers_[i]->clear_pdrvs();
    }
}

void NeuralNetwork::update_weights(double learning_rate, double momentum)
{
    assert (netconfig_->training_ && "why update weights when not training?");
    
    // update weights
    for (size_t i=1; i < layers_.size(); ++i) {
        layers_[i]->update_weights(learning_rate, momentum, regularization_on_);
    }
    
    mark_dropout();
    clear_pdrvs();        
}

void NeuralNetwork::save_weights_to(vector< double >& weights) const
{
    weights.clear();
    for (size_t i=1; i < layers_.size(); ++i) {
        layers_[i]->save_weights_to (weights);
    }

}

void NeuralNetwork::set_weights(const vector< double >& weights)
{
    int pos = 0;
    for (size_t i=1; i < layers_.size(); ++i) {
        pos += layers_[i]->set_input_weights (&weights[pos]);
    }
}

struct Genome
{
    double fitness_;
    vector<double> genes_;
    
    Genome(size_t numGenes=0);
    void copy_genes_from (Genome const&source, size_t from_index, size_t to_index, size_t replace_start_index, double mutate_rate);
    void report () const;
    
    bool operator<(const Genome &rhs) const
    {
        return fitness_ > rhs.fitness_;
    }
};

Genome::Genome(size_t numGenes)
{
    assert (numGenes && "number of genes can't be zero");
    genes_.resize(numGenes);
}


void Genome::copy_genes_from(const Genome& source, size_t from_index, size_t to_index, size_t replace_start_index, double mutate_rate)
{
    for (size_t i=from_index,cgene=0; i < to_index; ++i,++cgene) {
        genes_[replace_start_index+cgene] = source.genes_[i];
    }
}

void Genome::report() const
{
    cout << "weights ";
    for (size_t i=0; i < genes_.size(); ++i) {
        cout << genes_[i] << " ";
    }
}

struct Point {
    double x_, y_;
    int class_;
};

const int nClass = 3;

struct DataSet
{
    vector<Point> samples_;
    void randomize ()
    {
        vector<Point> sam;
        while (!samples_.empty()) {
            int idx = g_rng.randInt() % samples_.size();
            sam.push_back(samples_[idx]);
            samples_[idx] = samples_.back();
            samples_.pop_back();
        }
        sam.swap(samples_);
    }
};

#if USE_SDL

#define RENDER_WINDOW_WIDTH 600
#define RENDER_WINDOW_HEIGHT 600

struct Color {
    Uint8 r_, g_, b_;
    Color(Uint8 r,Uint8 g, Uint8 b)
    : r_(r), g_(g), b_(b)
    {
    }
};

class Renderer {
    SDL_Window *win_;
    SDL_Renderer *ren_;
    SDL_Texture *tex_;
    
    int init_sdl ()
    {
        if (SDL_Init(SDL_INIT_VIDEO) != 0){
            std::cout << "SDL_Init Error: " << SDL_GetError() << std::endl;
            return 1;
        }
        
        win_ = SDL_CreateWindow("Hello World!", 100, 100, RENDER_WINDOW_WIDTH, RENDER_WINDOW_HEIGHT, SDL_WINDOW_SHOWN);
        if (win_ == 0){
            std::cout << "SDL_CreateWindow Error: " << SDL_GetError() << std::endl;
            SDL_Quit();
            return 1;
        }
            
        ren_ = SDL_CreateRenderer(win_, -1, SDL_RENDERER_ACCELERATED | SDL_RENDERER_PRESENTVSYNC);
        if (ren_ == 0){
            SDL_DestroyWindow(win_);
            std::cout << "SDL_CreateRenderer Error: " << SDL_GetError() << std::endl;
            SDL_Quit();
            return 1;
        }    

        std::string imagePath = "circle.png";
        tex_ = IMG_LoadTexture(ren_, imagePath.c_str());
        if (tex_ == 0) {
            SDL_DestroyRenderer(ren_);
            SDL_DestroyWindow(win_);
            std::cout << "SDL_CreateTextureFromSurface Error: " << SDL_GetError() << std::endl;
            SDL_Quit();
            return 1;
        }


        return 0;
    }
        

public:
    Renderer ()
    : win_(0), ren_(0), tex_(0)
    {
    }
    
    ~Renderer ()
    {
        if (tex_) SDL_DestroyTexture(tex_);
        if (ren_) SDL_DestroyRenderer(ren_);
        if (win_) SDL_DestroyWindow(win_);
        SDL_Quit();

    }
    
    int start ()
    {
        return init_sdl ();
    }
    
    int render (DataSet const&sData)
    {
        SDL_SetRenderDrawColor(ren_, 255, 255, 255, 255);
        SDL_RenderClear(ren_);
        SDL_Rect dst;
        dst.w = dst.h = 16;
        int offX = dst.w/2, offY=dst.h/2;
        vector<Color> cvClass;
        cvClass.push_back(Color(255,0,0));
        cvClass.push_back(Color(255,255,0));
        cvClass.push_back(Color(0,255,0));
        cvClass.push_back(Color(0,0,255));
        for (size_t i=0; i < sData.samples_.size(); ++i) {
            const Color &color = cvClass[sData.samples_[i].class_];
            SDL_SetTextureColorMod(tex_, color.r_, color.g_, color.b_);
            dst.x = RENDER_WINDOW_WIDTH * (0.5 + sData.samples_[i].x_) - offX;
            dst.y = RENDER_WINDOW_HEIGHT * (0.5 + sData.samples_[i].y_) - offY;
            SDL_RenderCopy(ren_, tex_, NULL, &dst);
        }
        //Update the screen
        SDL_RenderPresent(ren_);
            
        //SDL_Delay(3000);
        return 0;
    }
};
#endif

void generate_data_set (DataSet &sData, int nPointsPerClass)
{
    int nC = nClass + 1;
    for (int iC=0; iC < nClass; ++iC) {
        for (int iP=0; iP < nPointsPerClass; ++iP) {
            Point p;
            p.class_ = iC;
            double radius = 0.001 + double(iP) / nPointsPerClass;
            double angle = nC * iC + 1.0 * nC * radius + g_rng.randNorm() * 0.2;
            p.x_ = 0.5 * radius * sin(angle); p.y_ = 0.5 * radius * cos(angle);
            sData.samples_.push_back(p);
        }
    }
}

bool answer_correct (vector<double> const&outputs, int right_ans)
{
    double max_term=-DBL_MAX;
    int ans = -1;
    for (size_t i=0; i < outputs.size(); ++i) {
        if (outputs[i] > max_term) {
            max_term = outputs[i];
            ans = i;
        }
    }
    return ans == right_ans;
}

const int numInputPoints = 2;
const int numOutputPoints = 4;

int test_check (NeuralNetwork &network)
{
    vector<double> inputs(numInputPoints);
    vector<double> outputs(numOutputPoints);

    cout << "test... ";
    DataSet sDataTest;
    generate_data_set (sDataTest, 100);
    int correct_answer = 0;
    for (size_t i=0; i < sDataTest.samples_.size(); ++i) {
        inputs.clear();
        inputs.push_back(sDataTest.samples_[i].x_);
        inputs.push_back(sDataTest.samples_[i].y_);
        network.run(inputs, outputs);
        //expects.clear();
        //load_expects (expects, sDataTest.samples_[i].class_);
        //network.backward_pass(outputs, sDataTest.samples_[i].class_);
        if (answer_correct(outputs, sDataTest.samples_[i].class_)) {
            ++correct_answer;
        } else {
            //cout << i <<  " at (" << sDataTest.samples_[i].x_ << "," <<sDataTest.samples_[i].x_ << ") ";
            //cout << "value " << outputs[0] << "," << outputs[1] << "," << outputs[2] << " vs " <<  sDataTest.samples_[i].class_ << "\n";
        }
    }
    cout << "precision: " << (100.0 * double(correct_answer) / sDataTest.samples_.size()) << "%" << endl;
    return correct_answer;
}

int main()
{
    int iSeed = time(NULL)%1000000;
    //iSeed = 7;
    g_rng.seed(iSeed);
    cout << "use seed " << iSeed << endl;
       
    DataSet sData;
    generate_data_set (sData, 100);
    sData.randomize();
#if USE_SDL
    Renderer renderer;
    
    if (renderer.start() != 0) {
        cout << "SDL2 initialization failed." << endl;
        return 1;
    }

    renderer.render(sData);
    
#endif

    vector<int> hidden_layers_topology; // hidden layers, from input to output
    hidden_layers_topology.push_back(160);
    /*
    hidden_layers_topology.push_back(30);
    hidden_layers_topology.push_back(30);
    //*/
    
    NeuralNetwork network(numInputPoints, hidden_layers_topology, numOutputPoints);
    network.init_weights(0, 1.0, 1.0);
    
    const size_t numGenes = network.number_of_weights();

    Genome entity(numGenes);
    network.save_weights_to(entity.genes_);
    
    vector<double> inputs(numInputPoints);
    vector<double> outputs(numOutputPoints);
        
    network.set_weights(entity.genes_);
    
    const double learning_rate = LEARNING_RATE;
    const double momentum = 0.9;

    const int batch_size = BATCH_SIZE;
    
    while (g_generation < 500) {
//        cout << "generation " << g_generation << "    ";
        ++g_generation;
        int run_count = 0;
        double max_error = 0;
        int correct_answer = 0;
        
        for (size_t i=0; i < sData.samples_.size(); ++i) {
            inputs.clear();
            inputs.push_back(sData.samples_[i].x_);
            inputs.push_back(sData.samples_[i].y_);
            network.train(inputs, outputs);
            if (answer_correct(outputs, sData.samples_[i].class_)) {
                ++correct_answer;
            }
            network.backward_pass(outputs, sData.samples_[i].class_);
            max_error = network.error() > max_error ? network.error() : max_error;
            ++run_count;
            if (run_count >= batch_size) {
                network.update_weights(learning_rate, momentum);
                run_count = 0;
            }            
        }
        if (run_count != 0) {
            network.update_weights(learning_rate, momentum);
            run_count = 0;
        }            
                
        if (g_generation % 10 == 0) {
            int correct = test_check(network);
            if (!network.regularization_on () && double(correct_answer - correct) / correct_answer > 0.05 ) {
                network.set_regularization_on ();
                cout << "\n\t\t**** regularization on ****\n" << endl;
            }
        }
        
    }
    
    test_check(network);
    
    network.save_weights_to(entity.genes_);
    double weights = 0;
    for (size_t i=0; i < entity.genes_.size(); ++i) {
        weights += fabs(entity.genes_[i]);
    }
    cout << "(" << entity.genes_.size() << ") weights\n";
    cout << "|totalWeights| = " << weights << "\n";
    entity.report();
    cout << "\n";

    return 0;
}

This toy example needs only a few epochs to achieve good precision. #define USE_SDL 0 to build it if you don't have SDL installed; on a Linux system installing SDL should be very simple.



/* 
 * a fizz buzz example
 * http://joelgrus.com/2016/05/23/fizz-buzz-in-tensorflow/
 */

#include <vector>
#include <deque>
#include <algorithm>
#include <iostream>
#include <cmath>
#include <cassert>
#include <ctime>
#include <cfloat>
#include "MesenneTwister.h"

using namespace std;

// The regularization scale the weights toward 0
#define REGULARIZATION_FACTOR  1e-2
#define DROPOUT_RATE           0.1
#define INPUT_DROPOUT_RATE     0.0
#define MAX_NORM               10
#define LEARNING_RATE          0.001
#define BATCH_SIZE             32

#define LEAKY_RELU_CONSTANT    0.005
const double epsilon = 1e-8;

MTRand g_rng(time(NULL));

// used in adam algorithm
int g_generation = 0;
    
//--------------------------------

template <int ACTIVATE_FUNC_TYPE>
double activate (double x)
{
    return 0;
}

template <int ACTIVATE_FUNC_TYPE>
double activate_drv (double y)
{
    return 0;
}

enum {
    SIGMOID,
    TANH,
    RELU
};

template<>
double activate<SIGMOID>(double x)
{
    return 1.0 / (1.0 + exp(-x));
}

template<>
double activate_drv<SIGMOID> (double y)
{
    return y * (1.0 - y);
}

template<>
double activate<TANH>(double x)
{
    return tanh (x);
}

template<>
double activate_drv<TANH> (double y)
{
    return 1.0 - y * y;
}

template<>
double activate<RELU>(double x)
{
    return (x > 0) ? x : LEAKY_RELU_CONSTANT * x;
}

template<>
double activate_drv<RELU>(double y)
{
    return (y > 0) ? 1.0 : LEAKY_RELU_CONSTANT;
}

//---------
double activate_func (double x)
{
    return activate<RELU>(x);
}

double activate_drv_func (double y)
{
    return activate_drv<RELU>(y);
}

//-----------
class Layer;

struct NetworkConfig
{
    bool training_;
    
    NetworkConfig()
    : training_(false)
    {}
};

class Neuron
{
public:
    Neuron (int numInputs=0);

    void init_weights (double mean, double variance, double magnitude);
    double calculate_output_from (vector<double> const&inputs);
    void calculate_pdrv_of_error_wrt_input_weights(vector<double> const&inputs, double drvLoss);
    void calculate_pdrv_of_error_wrt_input_weights(vector<double> const&inputs, const Layer &output_layer);
    const vector<double> & weights() const;
    size_t set_input_weights(const double *weights);
    void clear_pdrvs ();
    void update_weights (double learning_rate, double momentum, bool reguon);
    
public:
    int            idx_;
    vector<double> inputWeights_;
    double         value_;
    bool           droped_out_;
    
    vector<double> deltaWeight_;
    vector<double> momentum_;
    double         pdrvErrorWrtInput_;
    vector<double> pdrvErrorWrtWeights_;
};

class Layer
{
public:
    Layer(NetworkConfig *netconfig, int numNeurons, int numInputs);
    ~Layer();

    void init_weights (double mean, double variance, double magnitude);
    void forward_pass(const Layer &last_layer);
    void calculate_output_layer_pdrv_of_error_wrt_input_weights (vector<double> const&inputs,
                                                                 vector<double> const&outputsExp,
                                                                 double totalExp,
                                                                 int right_ans
                                                                );
    void calculate_middle_layer_pdrv_of_error_wrt_input_weights (vector<double> const&inputs, const Layer &output_layer);
    void save_weights_to(vector<double> &weights) const;
    size_t set_input_weights(const double *weights);
    const vector<double> & get_outputs() const;
    void set_outputs_of_input_layer(const vector<double> &value);
    size_t num_of_neurons () const;
    void clear_pdrvs ();
    void update_weights (double learning_rate, double momentum, bool reguon);
    void mark_dropout (double dropout_rate);
    void mark_no_dropout ();
    
public:
    vector<Neuron> nodes_;
    vector<double> outputs_;
    NetworkConfig *netconfig_;
    bool           output_layer_;
};


Neuron::Neuron(int numInputs)
: droped_out_(false)
{
    inputWeights_.resize(numInputs+1); // plus weight for bias
}

void Neuron::init_weights(double mean, double variance, double magnitude)
{
    // This initialization technique helps a lot
    double sqrtn = sqrt(2.0 * inputWeights_.size());
    for (size_t i=0; i < inputWeights_.size()-1; ++i) {
        inputWeights_[i] = magnitude * g_rng.randNorm(mean, variance) / sqrtn;
    }
    inputWeights_.back() = 0;
}

double Neuron::calculate_output_from (const vector< double >& inputs)
{
    assert (inputs.size()+1 == inputWeights_.size() && "number of inputs wrong");
    if (droped_out_) {
        value_ = 0;
        return 0;
    }
    
    double v=0;
    for (size_t i=0; i < inputs.size(); ++i) {
        v += inputs[i] * inputWeights_[i];
    }
    v += inputWeights_.back(); // bias
    value_ =  activate_func(v);
    return value_;
}

void Neuron::calculate_pdrv_of_error_wrt_input_weights(vector<double> const&inputs, double drvLoss)
{
    pdrvErrorWrtInput_ = drvLoss * activate_drv_func(value_);
    size_t i=0;
    for (; i < pdrvErrorWrtWeights_.size() - 1; ++i) {
        pdrvErrorWrtWeights_[i] += pdrvErrorWrtInput_ * inputs[i];
    }
    pdrvErrorWrtWeights_[i] += pdrvErrorWrtInput_ * 1.0; // and bias which is 1.0
}

void Neuron::calculate_pdrv_of_error_wrt_input_weights(const vector< double >& inputs, const Layer& output_layer)
{
    if (droped_out_) {
        pdrvErrorWrtInput_ = 0;
        return;
    }
    
    double d = 0;
    for (size_t i=0; i < output_layer.num_of_neurons(); ++i) {
        d += output_layer.nodes_[i].inputWeights_[idx_] * output_layer.nodes_[i].pdrvErrorWrtInput_;
    }
    
    pdrvErrorWrtInput_ = d * activate_drv_func(value_);
    size_t i=0;
    for (; i< pdrvErrorWrtWeights_.size() - 1; ++i) {
        pdrvErrorWrtWeights_[i] += pdrvErrorWrtInput_ * inputs[i];
    }
    pdrvErrorWrtWeights_[i] += pdrvErrorWrtInput_ * 1.0; // and bias which is 1.0
}


const vector< double >& Neuron::weights() const
{
    return inputWeights_;
}

size_t Neuron::set_input_weights(const double* weights)
{
    inputWeights_.assign(weights, weights + inputWeights_.size());
    pdrvErrorWrtWeights_.resize(inputWeights_.size());
    deltaWeight_.resize(inputWeights_.size());
    momentum_.resize(inputWeights_.size());
    return inputWeights_.size();
}

void Neuron::clear_pdrvs()
{
    pdrvErrorWrtWeights_.clear();
    pdrvErrorWrtWeights_.resize(inputWeights_.size());
}

void Neuron::update_weights(double learning_rate, double momentum, bool reguon)
{
    double weight_norm = 0;
    for (size_t i=0; i < inputWeights_.size(); ++i) {
        double dx = pdrvErrorWrtWeights_[i];
        double regularization = 0;
        if (reguon) {
            regularization = learning_rate * REGULARIZATION_FACTOR * inputWeights_[i];
        }
        /* RMSprop
        double decay_rate = 0.99;
        deltaWeight_[i] = decay_rate * deltaWeight_[i] + (1.0 - decay_rate) * dx * dx;
        inputWeights_[i] += -learning_rate * dx / (sqrt(deltaWeight_[i]) + epsilon) - regularization;
        //*/
        
        //* Adam
        double beta1=0.9;
        double beta2=0.999;
        momentum_[i] = beta1*momentum_[i] + (1.0 - beta1) * dx;
        double mt = momentum_[i] / (1.0 - pow(beta1, g_generation));
        //double mt = momentum_[i];
        deltaWeight_[i] = beta2 * deltaWeight_[i] + (1.0 - beta2) * dx * dx;
        double vt = deltaWeight_[i] / (1.0 - pow(beta2, g_generation));
        //double vt = deltaWeight_[i];
        inputWeights_[i] += -learning_rate * mt / (sqrt(vt) + epsilon) - regularization;
        //*/
        
        weight_norm += inputWeights_[i] * inputWeights_[i];
    }
    
    weight_norm = sqrt(weight_norm);
    if (weight_norm > MAX_NORM) {
        double r = MAX_NORM / weight_norm;
        for (size_t i=0; i < inputWeights_.size(); ++i) {
            inputWeights_[i] *= r;
        }
    }
}



Layer::Layer(NetworkConfig *netconfig, int numNeurons=0, int numInputsPerNeuron=0)
: netconfig_(netconfig)
, output_layer_(false)
{
    nodes_.resize(numNeurons);
    for (size_t i=0; i < nodes_.size(); ++i) {
        nodes_[i] = Neuron(numInputsPerNeuron);
        nodes_[i].idx_ = i;
    }
    outputs_.resize(numNeurons);
}

Layer::~Layer()
{
    nodes_.clear();
}

void Layer::init_weights(double mean, double variance, double magnitude)
{
    for (size_t i=0; i < nodes_.size(); ++i) {
        nodes_[i].init_weights (mean, variance, magnitude);
    }
}

void Layer::forward_pass(const Layer &last_layer)
{
    double rate = 1.0 - DROPOUT_RATE;
    
    const vector<double> & inputs = last_layer.get_outputs();
    for (size_t i=0; i < nodes_.size(); ++i) {
        if (nodes_[i].droped_out_) {
            outputs_[i] = nodes_[i].value_ = 0;
        } else {
            nodes_[i].calculate_output_from (inputs);
            if (!output_layer_ && !netconfig_->training_) {
                nodes_[i].value_ *= rate;
            }
            outputs_[i] = nodes_[i].value_;
        }
    }
}

void Layer::calculate_output_layer_pdrv_of_error_wrt_input_weights(vector<double> const&inputs,
                                                                   vector<double> const&outputsExp,
                                                                   double totalExp,
                                                                   int right_ans
                                                                  )
{
    assert (outputsExp.size() == nodes_.size());
    
    for (size_t i=0; i < nodes_.size(); ++i) {
        double drvLoss;
        if (i == right_ans) {
            drvLoss = outputsExp[i] / totalExp -1.0;
        } else {
            drvLoss = outputsExp[i] / totalExp;
        }
        nodes_[i].calculate_pdrv_of_error_wrt_input_weights (inputs, drvLoss);
    }
}

void Layer::calculate_middle_layer_pdrv_of_error_wrt_input_weights(const vector< double >& inputs, const Layer& output_layer)
{
    for (size_t i=0; i < nodes_.size(); ++i) {
        nodes_[i].calculate_pdrv_of_error_wrt_input_weights (inputs, output_layer);
    }
}


void Layer::save_weights_to(vector< double >& weights) const
{
    for (size_t i=0; i < nodes_.size(); ++i) {
        const vector<double> &w = nodes_[i].weights();
        weights.insert(weights.end(), w.begin(), w.end());
    }

}

size_t Layer::set_input_weights(const double* weights)
{
    size_t pos = 0;
    for (size_t i=0; i < nodes_.size(); ++i) {
        pos += nodes_[i].set_input_weights(weights + pos);
    }
    return pos;
}

void Layer::set_outputs_of_input_layer(const vector< double >& value)
{
    outputs_ = value;
    for (size_t i=0; i < nodes_.size(); ++i) {
        if (netconfig_->training_ && nodes_[i].droped_out_) {
            nodes_[i].value_ = outputs_[i] = 0;
        } else {
            nodes_[i].value_ = outputs_[i];
        }
    }
}

const vector< double >& Layer::get_outputs() const
{
    return outputs_;
}


size_t Layer::num_of_neurons() const
{
    return nodes_.size();
}

void Layer::clear_pdrvs()
{
    for (size_t i=0; i < nodes_.size(); ++i) {
        nodes_[i].clear_pdrvs();
    }
}

void Layer::update_weights(double learning_rate, double momentum, bool reguon)
{
    for (size_t i=0; i < nodes_.size(); ++i) {
        nodes_[i].update_weights(learning_rate, momentum, reguon);
    }
}

void Layer::mark_dropout(double dropout_rate)
{
    for (size_t i=0; i < nodes_.size(); ++i) {
        nodes_[i].droped_out_ = (g_rng.rand() < dropout_rate);
    }
}

void Layer::mark_no_dropout()
{
    for (size_t i=0; i < nodes_.size(); ++i) {
        nodes_[i].droped_out_ = false;
    }
}



class NeuralNetwork
{
public:
    NeuralNetwork(int numInputs, vector<int> const& hiddenLayerTopology, int numOutpus);
    ~NeuralNetwork();
    void init_weights(double mean_value, double variance_value, double magnitude);
    size_t number_of_weights () const;
    void run (vector<double> const &inputs, vector<double> &outputs);
    void train (vector<double> const &inputs, vector<double> &outputs);
    void forward_pass (vector<double> const &inputs);
    void backward_pass (vector<double> &outputs, int right_ans);
    void save_weights_to(vector<double> &weights) const;
    void set_weights(vector<double> const &weights);
    void clear_pdrvs ();
    void update_weights (double learning_rate, double momentum);
    double error () const { return last_error_; }
    bool regularization_on () const { return regularization_on_; }
    void set_regularization_on () { regularization_on_ = true; }
    
protected:
    void feed_forward ();
    void mark_dropout ();
    void mark_no_dropout ();
    
private:
    vector<Layer *> layers_;
    double last_error_;
    NetworkConfig *netconfig_;
    bool regularization_on_;
    
    NeuralNetwork(const NeuralNetwork &rhs) {}
    NeuralNetwork &operator=(const NeuralNetwork &rhs) { return *this; }
};

NeuralNetwork::NeuralNetwork(int nInputs, const vector< int >& hiddenLayerTopology, int nOutputs)
: last_error_(0)
, netconfig_(new NetworkConfig)
, regularization_on_(false)
{
    layers_.resize(hiddenLayerTopology.size() + 2); // hidden + inputs + outputs
    int nNeuron = nInputs;
    layers_[0] = new Layer(netconfig_, nNeuron, 0); // layer zero is inputs
    for (size_t i=0; i < hiddenLayerTopology.size(); ++i) {
        layers_[i+1] = new Layer(netconfig_, hiddenLayerTopology[i], layers_[i]->num_of_neurons());
    }
    nNeuron = nOutputs;
    layers_.back() = new Layer(netconfig_, nNeuron, hiddenLayerTopology.back());
}

NeuralNetwork::~NeuralNetwork()
{
    delete netconfig_;
    for (size_t i=0; i < layers_.size(); ++i) {
        delete layers_[i];
    }
    layers_.clear();
}

void NeuralNetwork::init_weights(double mean, double variance, double magnitude)
{
    for (size_t i=1; i < layers_.size(); ++i) {
        layers_[i]->init_weights(mean, variance, magnitude);
    }    
}

size_t NeuralNetwork::number_of_weights() const
{
    vector<double> weights;
    this->save_weights_to(weights);
    return weights.size();
}

void NeuralNetwork::run(const vector< double >& inputs, vector< double >& outputs)
{
    if (netconfig_->training_) {
        netconfig_->training_ = false;
        mark_no_dropout ();
    }
    forward_pass(inputs);
    outputs = layers_.back()->get_outputs();    
}

void NeuralNetwork::train(const vector< double >& inputs, vector< double >& outputs)
{
    if (!netconfig_->training_) {
        netconfig_->training_ = true;
        mark_dropout ();
    }
    forward_pass(inputs);
    outputs = layers_.back()->get_outputs();    
}

void NeuralNetwork::forward_pass(const vector< double >& inputs)
{
    // feed forward
    // set inputs as outputs of first layer
    layers_[0]->set_outputs_of_input_layer(inputs);
    feed_forward ();
}

void NeuralNetwork::backward_pass(vector< double >& outputs, int right_ans)
{
    // calculate errors/loss
    double denom = 0;
    double numer = 0;
    last_error_ = 0;

    double max_out=-DBL_MAX;
    for (size_t i=0; i < outputs.size(); ++i) {
        if (outputs[i] > max_out) {
            max_out = outputs[i];
        }
    }
    
    vector<double> outputsExp; outputsExp.reserve(outputs.size());
    double totalExp = 0;
    for (size_t i=0; i < outputs.size(); ++i) {
        outputsExp.push_back(exp(outputs[i] - max_out));
        if (i == right_ans) {
            numer = outputsExp.back();
        }
        denom += outputsExp.back();
    }
    totalExp = denom;
    last_error_ = -log(numer / denom);
    
    // calculate partial derivatives
    // first the last layer(output layer)
    vector< double > layer_inputs = layers_[layers_.size()-2]->get_outputs();
    layers_.back()->calculate_output_layer_pdrv_of_error_wrt_input_weights(layer_inputs, outputsExp, totalExp, right_ans);
    
    // back propagate
    for (int i=layers_.size()-2; i > 0; --i) {
        vector< double > layer_inputs = layers_[i-1]->get_outputs();
        layers_[i]->calculate_middle_layer_pdrv_of_error_wrt_input_weights(layer_inputs, *layers_[i+1]);
    }
    
}


void NeuralNetwork::feed_forward()
{
    for (size_t i=1; i < layers_.size(); ++i) {
        layers_[i]->forward_pass(*layers_[i-1]);
    }    
}

void NeuralNetwork::mark_dropout()
{
    for (size_t i=0; i < layers_.size()-1; ++i) {
        double r = DROPOUT_RATE;
        if (i==0) r = INPUT_DROPOUT_RATE;
        layers_[i]->mark_dropout(r);
    }    

}

void NeuralNetwork::mark_no_dropout()
{
    for (size_t i=0; i < layers_.size()-1; ++i) {
        layers_[i]->mark_no_dropout();
    }    
}

void NeuralNetwork::clear_pdrvs()
{
    for (size_t i=1; i < layers_.size(); ++i) {
        layers_[i]->clear_pdrvs();
    }
}

void NeuralNetwork::update_weights(double learning_rate, double momentum)
{
    assert (netconfig_->training_ && "why update weights when not training?");
    
    // update weights
    for (size_t i=1; i < layers_.size(); ++i) {
        layers_[i]->update_weights(learning_rate, momentum, regularization_on_);
    }
    
    mark_dropout();
    clear_pdrvs();        
}

void NeuralNetwork::save_weights_to(vector< double >& weights) const
{
    weights.clear();
    for (size_t i=1; i < layers_.size(); ++i) {
        layers_[i]->save_weights_to (weights);
    }

}

void NeuralNetwork::set_weights(const vector< double >& weights)
{
    int pos = 0;
    for (size_t i=1; i < layers_.size(); ++i) {
        pos += layers_[i]->set_input_weights (&weights[pos]);
    }
}

struct Genome
{
    double fitness_;
    vector<double> genes_;
    
    Genome(size_t numGenes=0);
    void copy_genes_from (Genome const&source, size_t from_index, size_t to_index, size_t replace_start_index, double mutate_rate);
    void report () const;
    
    bool operator<(const Genome &rhs) const
    {
        return fitness_ > rhs.fitness_;
    }
};

Genome::Genome(size_t numGenes)
{
    assert (numGenes && "number of genes can't be zero");
    genes_.resize(numGenes);
}


void Genome::copy_genes_from(const Genome& source, size_t from_index, size_t to_index, size_t replace_start_index, double mutate_rate)
{
    for (size_t i=from_index,cgene=0; i < to_index; ++i,++cgene) {
        genes_[replace_start_index+cgene] = source.genes_[i];
    }
}

void Genome::report() const
{
    cout << "weights ";
    for (size_t i=0; i < genes_.size(); ++i) {
        cout << genes_[i] << " ";
    }
}

const int numInputPoints = 11;
const int numOutputPoints = 4;


struct Data {
    int value_;
    vector<double> inputs_;        // binray representation
    
    int class_;// classify into 4 classes. 0 - no-change, 1 - fizz, 2 - buzz, 3 - fizzbuzz
};

struct DataSet
{
    vector<Data> samples_;
    void randomize ()
    {
        vector<Data> sam;
        while (!samples_.empty()) {
            int idx = g_rng.randInt() % samples_.size();
            sam.push_back(samples_[idx]);
            samples_[idx] = samples_.back();
            samples_.pop_back();
        }
        sam.swap(samples_);
    }
};

DataSet sData;
DataSet sDataTest;

Data encode_binary (int value)
{
    Data d;
    
    d.value_ = value;
    
    d.inputs_.reserve(numInputPoints);
    for (int i=0; i < numInputPoints; ++i) {
        int bin = (1 << i);
        if (value & bin) {
            d.inputs_.push_back(1.0);
        } else {
            d.inputs_.push_back(0);
        }
    }
    
    bool fizz = (value % 3 == 0);
    bool buzz = (value % 5 == 0);
    if (fizz && buzz) {
        d.class_ = 3;
    } else if (buzz) {
        d.class_ = 2;
    } else if (fizz) {
        d.class_ = 1;
    } else {
        d.class_ = 0;
    }
    
    return d;
    
}

void generate_data_set (DataSet &sData, int iFrom, int iTo)
{
    for (int iC=iFrom; iC <= iTo; ++iC) {
        sData.samples_.push_back(encode_binary(iC));
    }
}

int answer_from_outputs (vector<double> const&outputs)
{
    double max_term=-DBL_MAX;
    int ans = -1;
    for (size_t i=0; i < outputs.size(); ++i) {
        if (outputs[i] > max_term) {
            max_term = outputs[i];
            ans = i;
        }
    }
    return ans;
}

bool answer_correct (vector<double> const&outputs, int right_class)
{
    return answer_from_outputs(outputs) == right_class;
}

int test_check (NeuralNetwork &network, bool print=false)
{
    vector<double> outputs(numOutputPoints);

    cout << "test...  ";
    if (sDataTest.samples_.empty()) generate_data_set (sDataTest, 1, 100);
    int correct_answer = 0;
    for (size_t i=0; i < sDataTest.samples_.size(); ++i) {
        const Data &d = sDataTest.samples_[i];
        network.run(d.inputs_, outputs);
        int ans = answer_from_outputs(outputs);
        if (ans == d.class_) {
            ++correct_answer;
        }
        if (print) {
            if (ans == 0) {
                cout << d.value_ << " ";
            } else if (ans == 1) {
                cout << "fizz" << " ";
            } else if (ans == 2) {
                cout << "buzz" << " ";
            } else if (ans == 3) {
                cout << "fizzbuzz" << " ";
            } else {
                assert (0 && "what happened?");
            }
        }
    }
    cout << "precision: " << (100.0 * double(correct_answer) / sDataTest.samples_.size()) << "%" << endl;
    
    return correct_answer;
}

int main()
{
    int iSeed = time(NULL)%1000000;
    //iSeed = 587983;
    g_rng.seed(iSeed);
    cout << "use seed " << iSeed << endl;
       
    generate_data_set (sData, 101, 1000);
    sData.randomize();

    vector<int> hidden_layers_topology; // hidden layers, from input to output
    hidden_layers_topology.push_back(700);
    /*
    hidden_layers_topology.push_back(100);
    hidden_layers_topology.push_back(100);
    //*/
    
    NeuralNetwork network(numInputPoints, hidden_layers_topology, numOutputPoints);
    network.init_weights(0, 1.0, 1.0);
    
    const size_t numGenes = network.number_of_weights();

    Genome entity(numGenes);
    network.save_weights_to(entity.genes_);
    
    vector<double> outputs(numOutputPoints);
        
    network.set_weights(entity.genes_);
    
    const double learning_rate = LEARNING_RATE;
    const double momentum = 0.9;

    const int batch_size = BATCH_SIZE;
    
    while (g_generation < 2000) {
//        cout << "generation " << g_generation << "    ";
        ++g_generation;
        int run_count = 0;
        double max_error = 0;
        int correct_answer = 0;
        
        for (size_t i=0; i < sData.samples_.size(); ++i) {
            network.train(sData.samples_[i].inputs_, outputs);
            if (answer_correct(outputs, sData.samples_[i].class_)) {
                ++correct_answer;
            }
            network.backward_pass(outputs, sData.samples_[i].class_);
            max_error = network.error() > max_error ? network.error() : max_error;
            ++run_count;
            if (run_count >= batch_size) {
                network.update_weights(learning_rate, momentum);
                run_count = 0;
            }            
        }
        if (run_count != 0) {
            network.update_weights(learning_rate, momentum);
            run_count = 0;
        }            
                
        if (g_generation % 10 == 0) {
            cout << "generation " << g_generation << " ";
            int correct = test_check(network);
            if (!network.regularization_on () && double(correct_answer)/sData.samples_.size() - double(correct)/sDataTest.samples_.size() > 0.05 ) {
                network.set_regularization_on ();
                cout << "\t\t**** regularization on ****\n" << endl;
            }
            
            if (correct == sDataTest.samples_.size()) {
                cout << "done!\n";
                break;
            }
            
            sData.randomize();
        }
        
    }
    
    test_check(network, true);
    
    network.save_weights_to(entity.genes_);
    cout << "(" << entity.genes_.size() << ") weights\n";
    double weights = 0;
    for (size_t i=0; i < entity.genes_.size(); ++i) {
        weights += fabs(entity.genes_[i]);
    }
    cout << "|totalWeights| = " << weights << "\n";
    entity.report();
    cout << "\n";

    return 0;
}

One thing I found interesting in this example is that a network with only one hidden layer outperforms one with multiple hidden layers (the deep one). We are tempted to think that a 'deep' neural network is better; the original blog author suggested trying a deep one at the end of his post, too. But is it true? The fact that the shallow one outperformed the deep one here suggests it is not necessarily so; it really depends on the problem at hand. What deep learning excels at is modularization. The typical example is image recognition, where a well-trained network exhibits a hierarchical structure: basic features are learned in the front layers and more complex features in the rear layers. On the other hand, a problem like fizz buzz, where modularization does not help, doesn't suit a deep network. In fact, I bet a genetic algorithm would handle this problem more easily; please try it yourself if you are interested.

Here are some observations:
  • These hyperparameters have a huge influence on training a network; you can't train a network with 'wrong' parameters. The values in the sample code were obtained by trial and error. I believe there are studies on how to choose better initial parameters; I guess that will be an important task if I ever make a living in machine learning.
  • The paper that introduced dropout suggests using 0.5 as the dropout rate. Maybe there's an error in my implementation, but I can't train a network with a dropout rate of 0.5. Orz With a dropout rate as low as the one in the sample code, however, the network trains and dropout helps. So I guess a bigger dropout rate only helps if you have a huge network; with a small one you should lower it.
  • You decide the magnitude of the learning rate based on the size of the network: the bigger the network, the smaller the learning rate you should use.
  • There are L2 regularization and L1 regularization. With L2 regularization you multiply each weight by a value smaller than 1.0 when updating; with L1 regularization you subtract a fixed small value from it. What you see in my code is L2 regularization (see the sketch after this list). I find a regularization factor between 0.01 and 0.001 preferable, which means multiplying by roughly 0.99 ~ 0.999 when updating. If it is too big, your network will never train, because any change introduced by gradient descent gets cancelled; if it is too small, it only affects the network like noise, and the weights don't necessarily become smaller.
  • Batch size has a big influence. Set it too small or too large and you will have difficulty training the network.
  • So... is there a more scientific method to set these hyperparameters? Please enlighten me and leave a comment. Orz
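To make the L2-versus-L1 point above concrete, here is a tiny Python sketch (my own illustration; note that in the C++ code above the L2 shrink term is additionally scaled by the learning rate):

reg_factor = 0.01   # L2: multiply weights by (1 - 0.01) = 0.99 each update
l1_amount  = 0.0005 # L1: subtract a fixed small amount, pushing weights toward 0

def l2_step(w, grad, lr=0.02):
    # gradient step, then a proportional shrink toward zero
    return (w - lr * grad) * (1.0 - reg_factor)

def l1_step(w, grad, lr=0.02):
    # gradient step, then a constant pull toward zero
    sign = 1.0 if w > 0 else -1.0
    return (w - lr * grad) - l1_amount * sign

w = 0.8
print(l2_step(w, 0.0))  # 0.792  -- multiplied by 0.99
print(l1_step(w, 0.0))  # 0.7995 -- moved a fixed 0.0005 toward zero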


