In the paper *Multilayer feedforward networks are universal approximators* written by Kurt Hornik, Maxwell Stinchcombe and Halbert White in 1989, it was argued that neural networks can approximate “quite well nearly any function”.

This made the authors wonder what neural networks can achieve, since pretty much anything can be translated into a model and, by consequence, into mathematical formulae.

When reading the paper, I felt like experimenting a little with this property of neural networks, to find out whether, with sufficient data, functions such as \(x^2\), \(\sin(x)\) and \(1/x\) can be approximated.

Let’s see if we can!

## The experiment

For the experiment, I used the following code for approximating \(x^2\):

```
# Imports
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
# Load training data
x = -50 + np.random.random((25000,1))*100
y = x**2
# Define model
model = Sequential()
model.add(Dense(40, input_dim=1, activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
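# Train for 15 epochs
model.fit(x, y, epochs=15, batch_size=50)
# Predict for four inputs, passed as a column array like the training data;
# note that x = 200 lies outside the training range
predictions = model.predict(np.array([[10], [5], [200], [13]]))
print(predictions) # Expected: approximately 100, 25, 40000, 169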
```

Let’s take the code above apart first, before we move on to the results.

First, I’m importing the Python packages that I need for running the experiment. The first is `numpy`, the numerical processing package that is the de facto standard in data science today.

Second, I’m using `keras`, a deep learning framework for Python that runs on top of TensorFlow, Theano and CNTK. It abstracts much of the pain away and allows one to create a deep learning model in only a few lines of code.

And it runs on GPU, which is very nice.

Specifically, for Keras, I’m importing the `Sequential` model type and the `Dense` layer type. The Sequential model type requires the engineer to ‘stack’ the individual layers on top of each other (as you will see next), while a Dense or densely-connected layer means that each neuron in the layer is connected to all neurons in the previous layer.
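To make "densely connected" a bit more concrete, here is a minimal NumPy sketch of what a single Dense layer with ReLU computes. The helper name `dense_relu` and all weights, biases and inputs are made-up illustrative values, not ones learned by the model above.

```python
import numpy as np

def dense_relu(inputs, weights, biases):
    """Forward pass of one fully connected layer followed by ReLU."""
    # The matrix product lets every neuron see every input feature.
    return np.maximum(0.0, inputs @ weights + biases)

# Hypothetical values: 2 samples, 1 input feature, 3 neurons.
inputs = np.array([[2.0], [-1.0]])
weights = np.array([[1.0, -1.0, 0.5]])  # shape (1, 3)
biases = np.array([0.0, 0.0, 1.0])      # shape (3,)

activations = dense_relu(inputs, weights, biases)
print(activations)  # one row per sample, one column per neuron
```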

Next, I load the training data. Rather simply, I’m generating 25,000 random numbers in the range [-50, 50]. Subsequently, I’m generating the targets for the individual numbers by applying `x**2`, or \(x^2\).
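The sampling step can be checked in isolation. The sketch below reproduces it and, as an assumption that is not part of the original code, also shows an equivalent call on a seeded generator so that runs are repeatable.

```python
import numpy as np

# Same construction as in the experiment: random() is uniform on [0, 1),
# so scaling by 100 and shifting by -50 spreads the values over [-50, 50).
x = -50 + np.random.random((25000, 1)) * 100
y = x ** 2

# Assumed alternative (not in the original code): a seeded generator.
rng = np.random.default_rng(seed=42)
x_seeded = rng.uniform(-50, 50, size=(25000, 1))

print(x.shape, y.shape)  # (25000, 1) (25000, 1)
print(x.min(), x.max())  # both should lie within the interval
```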

Then, I define the model. It’s a Sequential one with three hidden layers, all of them Dense, with 40, 20 and 10 neurons, respectively. The input is a single value (every `x` is just a number) and the output layer has only one neuron as well (since we regress to `y`, which is also just a number). Note that all layers use `ReLU` as an activation function except for the last one, which is linear, as is standard for regression.
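The size of this architecture can be worked out by hand: a Dense layer with n inputs and m neurons has n × m weights plus m biases. A small sketch of that arithmetic follows; the helper name `dense_param_count` is mine, not part of Keras.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (one per input-neuron pair) plus one bias per neuron."""
    return n_inputs * n_units + n_units

layer_sizes = [1, 40, 20, 10, 1]  # input, three hidden layers, output
per_layer = [dense_param_count(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)  # [80, 820, 210, 11] 1121
```

So the whole model has only about a thousand trainable parameters, which should match the total that `model.summary()` reports.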

Mean squared error is used as the loss function and Adam as the optimizer, both pretty much standard options for deep neural networks today.

Next, we fit the data in 15 epochs and generate predictions for 4 values. Let’s see what it outputs under ‘The results’.

### The two other functions

I used the same code for \(\sin(x)\) and \(1/x\); however, I did change the assignment of `y` as follows, together with the expected values for the predictions:

- **sin(x):** `y = np.sin(x)`; expected values approximately -0.544, -0.959, -0.873 and 0.420.
- **1/x:** `y = 1/x`; expected values approximately 0.10, 0.20, 0.005 and 0.077.
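These expected values can be reproduced with the standard library alone, using the same four inputs that are passed to `model.predict`. A quick sketch (the variable names are mine):

```python
import math

inputs = [10, 5, 200, 13]
sin_expected = [round(math.sin(v), 3) for v in inputs]
inv_expected = [round(1 / v, 3) for v in inputs]
print(sin_expected)  # [-0.544, -0.959, -0.873, 0.42]
print(inv_expected)  # [0.1, 0.2, 0.005, 0.077]
```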

## The results

For \(x^2\), these were the expected results: `100, 25, 40000, 169`.

These are the actual results:

```
[[ 101.38112 ]
[ 25.741158]
[11169.604 ]
[ 167.91489 ]]
```

Pretty close for most of them. Only for `40000` did the model generate a wholly wrong prediction. That’s not strange, though: the training data was generated in the interval [-50, 50]. The inputs 10, 5 and 13 (with targets 100, 25 and 169) lie inside that interval and are regressed properly, while the input 200 (with target 40000) lies far outside it. That makes intuitive sense.
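To put numbers on "pretty close", the relative error of each reported prediction can be computed directly. This sketch only reuses the four outputs printed above; nothing is retrained.

```python
expected = [100, 25, 40000, 169]
predicted = [101.38112, 25.741158, 11169.604, 167.91489]  # outputs reported above

relative_errors = [abs(output - target) / target
                   for target, output in zip(expected, predicted)]

for target, output, error in zip(expected, predicted, relative_errors):
    print(f"target {target:>6}: prediction {output:>10.2f}, off by {error:.1%}")
```

The three in-range inputs land within about 3% of their targets, while the out-of-range one is off by roughly 72%.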

Let’s now generate predictions for all the `x` values once training has finished and plot the results:

```
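import matplotlib.pyplot as plt
# Predict for every training input so both plots cover the same x values
predictions = model.predict(x)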
plt.subplot(2, 1, 1)
plt.scatter(x, y, s = 1)
plt.title('y = $x^2$')
plt.ylabel('Real y')
plt.subplot(2, 1, 2)
plt.scatter(x, predictions, s = 1)
plt.xlabel('x')
plt.ylabel('Approximated y')
plt.show()
```

When you plot the functions, you get pretty decent results for \(x^2\):

For \(\sin(x)\), results are worse:

What you see is that it approximates the sine function quite well on a *very small domain*, e.g. [-5, +3], but then loses track. We might improve the estimation by feeding it *more* samples, so we increase the number of random samples to 100,000, still on the interval [-50, 50]:

That’s already much better, but still insufficient. Perhaps the cause is different, e.g. we may achieve better results if we used something like sin(x) as an activation function. However, that’s something for a future post.

And finally, this is what \(1/x\) looks like:

That one’s getting closer again, but you can see that it is not yet *highly accurate.*

## My observations

The experiment was quite interesting, actually.

First, I noticed that you need more training data than I expected. For example, with only 1000 samples in my training set, the approximation gets substantially worse:

Second, not all the functions could be approximated properly. Particularly, the sine function was difficult to approximate.

Third, I did not account for overfitting whatsoever. I just let the models run, possibly introducing severe overfitting to the function at hand. But – to some extent – that was precisely what we wanted.

Fourth, perhaps as a result of (3), the models seem to perform quite well *around* the domain of the training data (i.e. the [-50, +50] interval), but generalization beyond it remains difficult. On the other hand, that could be expected; the input \(x = 200\) behind the `40000` target in the first \(x^2\) experiment lies far outside \(-50 < x < 50\).
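One way to see why this is expected: a network built from ReLU units with a linear output computes a piecewise-linear function, so beyond its last "kink" it can only continue in a straight line. The sketch below is not the trained Keras model. It is a hand-built sum of ReLU terms, with knots that I chose every 5 units on [-50, 50], that interpolates \(x^2\) on the training interval.

```python
def relu(z):
    return max(0.0, z)

# Assumed knots: every 5 units across the training interval [-50, 50].
knots = [float(k) for k in range(-50, 51, 5)]
values = [k ** 2 for k in knots]
# Slope of the straight segment between each pair of neighbouring knots.
slopes = [(values[i + 1] - values[i]) / (knots[i + 1] - knots[i])
          for i in range(len(knots) - 1)]

def relu_approx(x):
    """Piecewise-linear interpolant of x^2 written as a sum of ReLU terms."""
    result = values[0] + slopes[0] * (x - knots[0])
    # Each ReLU term switches on at its knot and bends the line to the next slope.
    for i in range(1, len(slopes)):
        result += (slopes[i] - slopes[i - 1]) * relu(x - knots[i])
    return result

for x in [10, 12.5, 200]:
    print(x, relu_approx(x), x ** 2)
```

Inside the interval the error stays small (at most 6.25 with these knots), but at x = 200 the sketch returns 16750 instead of 40000, because it simply continues along the last segment's slope of 95.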

Altogether, this was a nice experiment for an evening, showing that you can use neural networks for approximating mathematical functions, as long as you take into account that it’s slightly more complex than you’d imagine at first.

I have very poor results with the five-variable function \(x_1 x_2 x_3 + x_4 x_5\). Very bad. Almost impossible for a NN to approximate or learn.

Hi there,

You’re trying to learn a polynomial function of the fifth degree, which has this form: https://i.imgur.com/tmHx4Sz.png. As you’ve seen with sin(x) in my post, these functions – which fluctuate a lot – are difficult to learn with regular neural networks.

My post was written to identify whether neural networks can be used to learn these functions. The answer was yes, but only sometimes – with simple functions. It’s likely better to use a polyfit, e.g. the numpy polyfit (https://docs.scipy.org/doc/numpy/reference/generated/numpy.polyfit.html) to learn functions based on data, and then create the plot yourself.
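As a minimal sketch of that suggestion, assuming noise-free \(x^2\) data like in the post (the sample count and seed are arbitrary illustrative choices), `numpy.polyfit` recovers the coefficients and, because the fitted model family matches the function, also extrapolates to x = 200:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, for repeatability only
x = rng.uniform(-50, 50, size=1000)
y = x ** 2

# Fit a degree-2 polynomial; coefficients come back highest power first.
coefficients = np.polyfit(x, y, deg=2)
prediction = np.polyval(coefficients, 200)

print(np.round(coefficients, 6))
print(prediction)
```

This only works so well because the degree is chosen to match the true function; with real measurements neither the right degree nor noise-free targets can be taken for granted.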

Regards,

Chris

To get a better result I think you can make this ‘experiment’ more often. You can add one more column with random data.

```
# Load training data
x = -50 + np.random.random((25000,2))*100
y = x**2
```

If I want to do this I have to change my code. I already changed the following line:

`model.add(Dense(40, input_dim=2, activation='relu'))`

because I have one dimension more now. I somehow have the opinion that’s not enough and that this additional information is not taken into account when calculating.

Hi Kim,

I follow your line of thinking – but it slightly deviates from the x-y mapping in mathematical terms in the sense that the original f(x) = x² function doesn’t have the extra dimension. Its domain is a line rather than a plane.

The reason the models have difficulty approximating the functions is that they would have to learn the y values for inputs that are not represented in the training domain, i.e. the values for x in the training data. It’s going to be hard to overcome this, I believe.

Regards,

Chris

Hello Chris,

I can’t see an easy solution either. Because in my case this additional data would come from measurements, the idea was to use all data from all the experiments I did, to get a better approximated neural network result in the end. The only idea I can think of right now is to preprocess the original data and make some kind of average out of it.

Kim

I also found this response to a StackOverflow question which nicely explains why neural networks are not extrapolation methods and hence why the results above don’t hold for data outside the training data subspace: https://stackoverflow.com/a/18310746

Good luck!

Regards,

Chris