1 Answer

Best Answer

The **Fréchet Inception Distance**, or FID, is a method for comparing the statistics of two distributions by computing the distance between them. In GANs, FID is used to measure how closely the distribution of samples produced by the Generator matches the distribution of the real data. It is therefore a metric of GAN performance – the lower the FID, the better the GAN.

It is named *Inception* Distance because you’re using an Inception neural network (say, InceptionV3) for computing the distance. Here’s how you’ll do that, technically:

- You take the InceptionV3 network and strip off the last layer, which is Softmax. This means that you’ll end up with 2048 activations and hence features (see https://arxiv.org/pdf/1512.00567.pdf, page 6, Table 1).
- Note that technically, any network will do as long as it produces a sufficiently rich set of features, but it is common practice in the GAN community to use InceptionV3 for this purpose.

- You take a set of N samples generated by the Generator and feed these to the adapted InceptionV3 network. You thus get an N×2048 matrix of activations.
- You do the same for the real data – you take M samples from the real dataset and feed them to the adapted InceptionV3 network. You then have an M×2048 matrix of activations.
- In other words, you now have a list of length N with vectors having 2048 features each, and a list of length M with vectors having 2048 features each.
- For both lists, you now compute the feature-wise mean. In other words, you take the mean of feature 1 across all vectors, the mean of feature 2, and so on. You end up with two vectors of 2048 values, each of which represents the feature-wise mean – the “average” vector – of either the Generator distribution or the real data distribution.
- For the two lists with feature vectors, you can also compute the covariance matrix. In other words, you’ll then end up with a matrix suggesting which features change jointly – and hence are related to each other.
- Now that you know the “average vector” and how the vector contents relate internally, it’s time to compute the actual FID.
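The steps above can be sketched in NumPy. Note this is an illustrative sketch: the random arrays stand in for the 2048-dimensional activations you would get from a truncated InceptionV3, and the variable names (`gen_features`, `real_features`) are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# N generated samples, M real samples, 2048 features each.
# Random data stands in for the activations of a truncated InceptionV3.
N, M, D = 64, 64, 2048
gen_features = rng.normal(size=(N, D))   # stand-in for activations of generated images
real_features = rng.normal(size=(M, D))  # stand-in for activations of real images

# Feature-wise means: one 2048-dimensional "average" vector per distribution
mu_gen = gen_features.mean(axis=0)
mu_real = real_features.mean(axis=0)

# Covariance matrices over features (rowvar=False: rows are samples, columns are features)
sigma_gen = np.cov(gen_features, rowvar=False)
sigma_real = np.cov(real_features, rowvar=False)

print(mu_gen.shape, sigma_gen.shape)  # (2048,) (2048, 2048)
```

These two means and two covariance matrices are all the FID formula below needs.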

This is the formula of the Fréchet Inception Distance, or FID:

d²((m, C), (m_w, C_w)) = ‖m − m_w‖₂² + Tr(C + C_w − 2(C·C_w)^(1/2))

Let’s break it apart into its components:

- **d²** suggests that we are talking about a *squared distance*.
- Its arguments are **(m, C)** and **(m_w, C_w)**. These are the means and covariance matrices of the two lists – and thus of the Generator and real data distributions – that we computed above.
- Inside the FID function, the following two terms are summed:
  - First, the squared norm of the **feature-wise difference** between the two mean vectors, ‖m − m_w‖². This suggests how far the center points of the two distributions lie from each other at feature level. Big differences are ‘penalized’ quadratically and hence significantly impact the distance metric.
  - Then, the **Trace** (Tr) of the sum of both covariance matrices minus 2 times the matrix square root of their product. If we break this down into multiple steps:
    - We take the first covariance matrix and add the second covariance matrix. This yields another matrix with the same shape as both matrices, with element-wise additions for all values.
    - Then, we multiply both matrices and compute the matrix square root of the product. We subtract twice this result from the covariance matrix sum.
    - Intuitively, we are creating a “mixture” of the two covariances across features (by first adding the covariances at feature level together) and then subtracting the covariance “shared” by the two matrices (because large covariances in both yield a large shared covariance in the product). What remains is the covariance that cannot be explained by both matrices – the “difference” between both distributions at feature level.
    - Subsequently, the Trace is taken – the sum of the main diagonal of the resulting matrix – effectively summing the remaining variances of the *mixture* of features from both distributions. If this variance is large, the distance increases, which makes sense, because then the features differ significantly. If it is small, the distributions look alike.
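The breakdown above can be written as a short function. This is a minimal sketch, assuming NumPy and SciPy are available; `scipy.linalg.sqrtm` computes the matrix square root, and the function name `frechet_distance` is just illustrative.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Squared Fréchet distance between Gaussians (mu1, sigma1) and (mu2, sigma2)."""
    # ||m - m_w||^2: squared distance between the feature-wise means
    mean_term = np.sum((mu1 - mu2) ** 2)
    # Matrix square root of the product of the covariance matrices;
    # sqrtm can return tiny imaginary parts due to numerical error, so drop them
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    # Tr(C + C_w - 2 * (C C_w)^(1/2))
    cov_term = np.trace(sigma1 + sigma2 - 2.0 * covmean)
    return mean_term + cov_term

# Sanity check with toy moments: identical distributions give a distance near 0
mu = np.zeros(4)
sigma = np.eye(4)
print(frechet_distance(mu, sigma, mu, sigma))
```

Shifting one mean while keeping the covariances identical makes the covariance term vanish, so the distance reduces to the squared distance between the means – a quick way to check the implementation behaves as the formula says.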

**In other words**, if the “center points” of the two distributions are close to each other, and the “non-shared differences at feature level” are small, the distributions closely resemble each other and the FID score is close to zero. When this happens, the Generator of your GAN has learned to capture the underlying data distribution well and will likely generate samples that mimic it.

