This is the first post of a series about understanding Deep Neural Networks. We will start with the core component of artificial neural networks - the neuron. We use a single artificial neuron to learn a simple dataset.

The Machine Learning task is to predict the net income \(y\) of a given year based on two inputs: the gross salary \(x_1\) and the heritage \(x_2\) in that year. We assume a relationship
$$y = a \cdot x_1 + b \cdot x_2$$
between the inputs and the target \(y\), but we don't know the values of \(a\) and \(b\). That makes our problem a perfect fit for a single artificial neuron.

### Task

Let's say we want to predict how much money we will probably earn in 2016. We take a look at our financial data and see that there seems to be some kind of correlation between our gross salary, our heritage and how much money we actually earn in that year. But instead of googling tax rates, we decide to throw Machine Learning at the problem and hope that it does all the thinking for us.

Year | Gross Salary [$] | Heritage [$] | Net Income [$] |
---|---|---|---|
2011 | 80,000 | 10,000 | 53,100 |
2012 | 85,000 | 5,000 | 52,650 |
2013 | 85,000 | 0 | 49,300 |
2014 | 120,000 | 30,000 | 89,700 |
2015 | 140,000 | 0 | 81,200 |

### Biological Context

The neuron is the core unit of an artificial neural network. It simulates the behavior of neurons in brains. A neuron in the human brain receives input from other neurons via synapses, combines that input and forwards it to subsequent neurons. Over time, it adapts the way of combining the input, which is called learning.

### Structure

A neuron in an artificial neural network receives some numbers as input and outputs a weighted sum of these numbers. For our problem, we need one neuron with two weights for the two input variables. An artificial neuron can learn by adapting its weights to a given training dataset. In our case, that is the table of financial data. But first, let's implement the generation of predictions.
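
As a minimal sketch of the weighted sum described above, the snippet below computes the output of a neuron for the 2011 row of the table. The weight values and the helper name `neuron_output` are hypothetical and chosen only for illustration, and the inputs are scaled to units of 100,000 $ (an assumption of this sketch).

```python
import numpy as np


def neuron_output(x, weights):
    """Return the weighted sum of the inputs (hypothetical helper)."""
    return float(np.dot(x, weights))


# Inputs for 2011 in units of 100,000 $: gross salary and heritage
x = np.array([0.8, 0.1])
# Placeholder weights, not learned from the data
example_weights = np.array([0.5, 0.5])

example_prediction = neuron_output(x, example_weights)
print(example_prediction)  # computes 0.8 * 0.5 + 0.1 * 0.5
```

With placeholder weights the prediction is of course far from the real net income; learning means finding better weights.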

### Implementation

Let us start directly with implementing and feeding the neuron. We store our weights in a vector \(w\) and multiply it with the transposed input vector \(x\). Thus, we get the prediction \(\hat{y} = x^T w\). To test the performance of our neuron, we initialize the weights with random values in \([0,1)\). We measure the error as \((y-\hat{y})^2\).

```
import numpy as np

# Define the data in units of 100,000 $ per year
inputs = [(0.8, 0.1), (0.85, 0.05), (0.85, 0), (1.2, 0.3), (1.4, 0)]
targets = [0.531, 0.5265, 0.493, 0.897, 0.812]

# Start with a vector of random weights
weights = np.random.random_sample((2,))

error = 1
data_index = 0
while error > 1e-6:
    # Feed the next input to the neuron
    x = np.array(inputs[data_index])
    y = targets[data_index]
    prediction = np.dot(x, weights)

    # Calculate the squared error
    error = np.square(y - prediction)
    print('input: %s target: %s prediction: %s error: %s' % (
        x, y, prediction, error))

    # Move to the next year
    data_index = (data_index + 1) % len(inputs)
```

### Learning

So far, the neuron only makes random predictions. The program above stops only when a prediction happens to be correct by accident, not because we found the correct weights. To find them, we need to modify the weights. For this, we use Stochastic Gradient Descent. This means that, for the current input, we calculate how the error \(e\) changes if we alter a weight \(w_i\). For example,
$$\frac{\partial e}{\partial w_1} = 20$$
means that if we increase weight \(w_1\) by a small amount \(\epsilon\), the error for the current input increases by roughly \(20 \epsilon\). So, we should alter the weight in the opposite direction of the gradient. Or, as Wikipedia puts it: "To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point."

As an update rule for the weights we get
$$w_{i(new)} = w_{i(old)} - \frac{\partial e}{\partial w_{i(old)}}.$$
The only thing left to calculate is the derivative of the error with respect to the weights. Remembering the chain rule from high school, we can quickly calculate it on paper:
$$\hat{y} = w_1 \cdot x_1 + w_2 \cdot x_2$$
$$e = (y-\hat{y})^2$$
$$\frac{\partial e}{\partial w_1} = \frac{\partial e}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial w_1} = 2 (y-\hat{y}) \cdot (-1) \cdot \frac{\partial \hat{y}}{\partial w_1}$$
$$ = -2 (y-\hat{y}) \cdot x_1$$
The calculation for the second weight is analogous. Inserting the derivative into the update rule, the minus signs cancel, so we have to add
$$2 (y-\hat{y}) \cdot x$$
to the weights. In the code, we add the weight adjustment at the end of each iteration of the while loop:

`weights += 2 * (y - prediction) * x.T`
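
To double-check the derivative, one option is to compare it with a finite-difference approximation. The sketch below does this for the 2011 row of the table; the weight values, the step size `eps`, and the function names `squared_error` and `analytic_gradient` are assumptions made for this illustration.

```python
import numpy as np


def squared_error(weights, x, y):
    """Error e = (y - y_hat)^2 for a single training example."""
    return (y - np.dot(x, weights)) ** 2


def analytic_gradient(weights, x, y):
    """Gradient from the derivation: de/dw = -2 * (y - y_hat) * x."""
    return -2 * (y - np.dot(x, weights)) * x


x = np.array([0.8, 0.1])        # 2011 inputs in units of 100,000 $
y = 0.531                       # 2011 net income in the same units
weights = np.array([0.3, 0.9])  # arbitrary weights for the check
eps = 1e-6                      # small step for the finite difference

# Central differences: nudge one weight at a time in both directions
numeric_gradient = np.zeros_like(weights)
for i in range(len(weights)):
    step = np.zeros_like(weights)
    step[i] = eps
    numeric_gradient[i] = (squared_error(weights + step, x, y)
                           - squared_error(weights - step, x, y)) / (2 * eps)

print('analytic:', analytic_gradient(weights, x, y))
print('numeric: ', numeric_gradient)
```

If both lines agree up to rounding, the sign and the factor 2 in the update are consistent with the error definition.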

Now, if we print the weights at the end of the program, we will see values close to 0.58 and 0.67, that is 58% and 67%, which are exactly the values that I used to generate the dataset above. You can find the whole code for this post here.
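
For reference, here is a minimal sketch of how the prediction loop and the update rule could fit together; it may differ from the linked code. Two things are assumptions of this sketch: it runs for a fixed number of passes over the table (`num_passes`, a hypothetical name) instead of stopping at the first small error, and it seeds the random number generator so that runs are repeatable.

```python
import numpy as np

# Data in units of 100,000 $ per year
inputs = np.array([(0.8, 0.1), (0.85, 0.05), (0.85, 0), (1.2, 0.3), (1.4, 0)])
targets = np.array([0.531, 0.5265, 0.493, 0.897, 0.812])

np.random.seed(0)  # assumption: fixed seed for repeatable runs
weights = np.random.random_sample((2,))

num_passes = 200  # assumption: fixed number of passes over the table
for _ in range(num_passes):
    for x, y in zip(inputs, targets):
        prediction = np.dot(x, weights)
        # Step against the gradient -2 * (y - prediction) * x
        weights += 2 * (y - prediction) * x

print('learned weights:', weights)
print('largest absolute error:', np.max(np.abs(inputs.dot(weights) - targets)))
```

Running a fixed number of passes avoids stopping early just because one year happens to be predicted well while the others are not.
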
### Conclusion

We enabled our neuron to learn. However, the task was quite simple. For more complicated tasks, the learning of our neuron becomes inefficient and ineffective. Read the next post to see for which problems our current neuron fails and how we can tackle them.

Next Post: Flexible Neurons