Can neural networks approximate mathematical functions?

In the paper Multilayer feedforward networks are universal approximators written by Kurt Hornik, Maxwell Stinchcombe and Halbert White in 1989, it was argued that neural networks can approximate "quite well nearly any function".

...and it made the authors wonder about what neural networks can achieve, since pretty much anything can be translated into models and by consequence mathematical formulae.

When reading the paper, I felt like experimenting a little with this property of neural networks, and to try and find out whether with sufficient data functions such as \(x^2\), \(sin(x)\) and \(1/x\) can be approximated.

Let's see if we can!

Update 02/Nov/2020: made code compatible with TensorFlow 2.x.

Update 02/Nov/2020: added Table of Contents.

The experiment

For the experiment, I used the following code for approximating \(x^2\):

# Imports
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Load training data
x = -50 + np.random.random((25000,1))*100
y = x**2

# Define model
model = Sequential()
model.add(Dense(40, input_dim=1, activation='relu'))
model.add(Dense(20, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(x, y, epochs=15, batch_size=50)

predictions = model.predict([10, 5, 200, 13])
print(predictions) # Approximately 100, 25, 40000, 169

Let's take the code above apart first, before we move on to the results.

First, I'm importing the Python packages that I need for successfully running the experiment. First, I'm using numpy, which is the numerical processing package that is the de facto standard in data science today.

Second, I'm using keras, which is a deep learning framework for Python and runs on TensorFlow, Theano and CNTK. It simply abstracts much of the pain away and allows one to create a deep learning model in only a few lines of code.

And it runs on GPU, which is very nice.

Specifically, for Keras, I'm importing the Sequential model type and the Dense layer type. The Sequential model type requires the engineer to 'stack' the individual layers on top of each other (as you will see next), while the Dense or Densely-connected layer means that each individual neuron is connected to all neurons in the following layer.

Next, I load the training data. Rather simply, I'm generating 25.000 numbers in the range [-50, 50]. Subsequently, I'm also generating the targets for the individual numbers by applying x**2 or \(x^2\).

Then, I define the model - it's a Sequential one with three hidden layers: all of them are Dense with 40, 20 and 10 neurons, respectively. The input layer has simply one neuron (every x is just a number) and the output layer has only one as well (since we regress to y, which is also just a number). Note that all layers use ReLU as an activation function except for the last one, standard with regression.

Mean squared error is used as a loss function, as well as Adam for optimization, all pretty much standard options for deep neural networks today.

Next, we fit the data in 15 epochs and generate predictions for 4 values. Let's see what it outputs under 'The results'.

The two other functions

I used the same code for \(sin(x)\) and \(1/x\), however I did change the assignment of \(y\) as follows, together with the expected values for the predictions:

sin(x): \(y = np.sin(x)\); expected values approximately -0.544, -0.959, -0.873 and 0.420.
1/x: \(y = 1/x\); expected values approximately 0.10, 0.20, 0.005 and 0.077.

The results

For \(x^2\), these were the expected results: 100, 25, 40000, 169.

Those are the actual results:

[[  101.38112 ]
 [   25.741158]
 [11169.604   ]
 [  167.91489 ]]

Pretty close for most ones. Only for 40000, the model generated a wholly wrong prediction. That's not strange, though: the training data was generated in the interval [-50, 50]; apparently, 100, 25 and 169 are close enough to be properly regressed, while 40000 is not. That makes intuitive sense.

Let's now generate predictions for all the xs when the model finishes and plot the results:

import matplotlib.pyplot as plt
plt.subplot(2, 1, 1)
plt.scatter(x, y, s = 1)
plt.title('y = $x^2$')
plt.ylabel('Real y')

plt.subplot(2, 1, 2)
plt.scatter(x, predictions, s = 1)
plt.xlabel('x')
plt.ylabel('Approximated y')

plt.show()

When you plot the functions, you get pretty decent results for \(x^2\):

For \(sin(x)\), results are worse:

What you see is that it approximates the sine function quite appropriately for a very small domain, e.g. [-5, +3], but then loses track. We might improve the estimation by feeding it with more samples, so we increase the number of random samples to 100.000, still at the interval [-50, 50]:

That's already much better, but still insufficient. Perhaps, the cause is different - e.g. we may achieve better results if we used something like sin(x) as an activation function. However, that's something for a next blog.

And finally, this is what \(1/x\) looks like:

That one's getting closer again, but you can stee that it is not yet highly accurate.

My observations

The experiment was quite interesting, actually.

First, I noticed that you need more training data than I expected. For example, with only 1000 samples in my training set, the approximation gets substantially worse:

Second, not all the functions could be approximated properly. Particularly, the sine function was difficult to approximate.

Third, I did not account for overfitting whatsoever. I just let the models run, possibly introducing severe overfitting to the function at hand. But - to some extent - that was precisely what we wanted.

Fourth, perhaps as a result of (3), the models seem to perform quite well around the domain of the training data (i.e. the [-50, +50] interval), but generalization remains difficult. On the other hand, that could be expected; the 40000 value for the first \(x^2\) was anything but \(
-50 < x < 50\).

Altogether, this was a nice experiment for during the evening, showing that you can use neural networks for approximating mathematical functions - if you take into account that it's slightly more complex than you imagine at first, it can be done.

Hi, I'm Chris!

I know a thing or two about AI and machine learning. Welcome to MachineCurve.com, where machine learning is explained in gentle terms.

Getting started

Foundation models

Learn how large language models and other foundation models are working and how you can train open source ones yourself.

Keras

Keras is a high-level API for TensorFlow. It is one of the most popular deep learning frameworks.

TensorFlow

TensorFlow is the most popular deep learning framework. It is is used by many companies.

PyTorch

PyTorch is a deep learning framework which is popular for its ease of use and flexibility.

Machine learning theory

Read about the fundamentals of machine learning, deep learning and artificial intelligence.

Transformer architectures

Emerging since 2017, Transformer architectures are part of the state of the art in deep learning.

Most recent articles

January 8, 2024

LLM in a Flash: improving memory requirements of large language models

January 2, 2024

What is Retrieval-Augmented Generation?

December 27, 2023

Building a zero-shot image classifier with CLIP and HuggingFace Transformers

December 27, 2023

In-Context Learning: what it is and how it works

December 22, 2023

CLIP: how it works, how it's trained and how to use it

Article tags

function

mathematics

neural network

Connect on social media

Connect with me on LinkedIn

To get in touch with me, please connect with me on LinkedIn. Make sure to write me a message saying hi!

See my work on GitHub

My work is available on GitHub. Feel free to check it out and see if it can be of use to you!

Side info

The content on this website is written for educational purposes. In writing the articles, I have attempted to be as correct and precise as possible. Should you find any errors, please let me know by creating an issue or pull request in this GitHub repository.

All text on this website written by me is copyrighted and may not be used without prior permission. Creating citations using content from this website is allowed if a reference is added, including an URL reference to the referenced article.

If you have any questions or remarks, feel free to get in touch.

TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.

PyTorch, the PyTorch logo and any related marks are trademarks of The Linux Foundation.

Montserrat and Source Sans are fonts licensed under the SIL Open Font License version 1.1.

Mathjax is licensed under the Apache License, Version 2.0.

Can neural networks approximate mathematical functions?

July 18, 2019 by Chris

The experiment

The two other functions

The results

My observations

Hi, I'm Chris!

I know a thing or two about AI and machine learning. Welcome to MachineCurve.com, where machine learning is explained in gentle terms.

Getting started

Foundation models

Keras

TensorFlow

PyTorch

Machine learning theory

Transformer architectures

Most recent articles

January 8, 2024

LLM in a Flash: improving memory requirements of large language models

January 2, 2024

What is Retrieval-Augmented Generation?

December 27, 2023

Building a zero-shot image classifier with CLIP and HuggingFace Transformers

December 27, 2023

In-Context Learning: what it is and how it works

December 22, 2023

CLIP: how it works, how it's trained and how to use it

Article tags

Most popular articles

February 18, 2020

How to use K-fold Cross Validation with TensorFlow 2 and Keras?

December 28, 2020

Introduction to Transformers in Machine Learning

December 27, 2021

StyleGAN, a step-by-step introduction

July 17, 2019

This Person Does Not Exist - how does it work?

October 26, 2020

Your First Machine Learning Project with TensorFlow 2.0 and Keras

Connect on social media

Connect with me on LinkedIn

See my work on GitHub

Side info

Getting started

Foundation models

Keras

TensorFlow

PyTorch

Machine learning theory

Transformer architectures

Most popular articles

February 18, 2020

How to use K-fold Cross Validation with TensorFlow 2 and Keras?

December 28, 2020

Introduction to Transformers in Machine Learning

December 27, 2021

StyleGAN, a step-by-step introduction

July 17, 2019

This Person Does Not Exist - how does it work?

October 26, 2020

Your First Machine Learning Project with TensorFlow 2.0 and Keras

Side info

Connect with me on LinkedIn

See my work on GitHub