
Easy Table Parsing with TAPAS, Machine Learning and HuggingFace Transformers

March 10, 2021 by Chris

Big documents often contain quite a few tables. Tables are useful: they provide a structured overview of data that supports or contradicts a particular statement in the accompanying text. Especially if your goal is to analyze reports, tables are useful because they provide more raw data. But analyzing tables takes a lot of effort, as one has to reason over them to answer one's questions.

But what if that process can be partially automated?

The Table Parser Transformer, or TAPAS, is a machine learning model that is capable of precisely that. Given a table and a question related to that table, it can provide the answer in a short amount of time.

In this tutorial, we will take a closer look at using Machine Learning for Table Parsing. Earlier approaches relied on manually engineered logical forms, while Transformer-based approaches have simplified table parsing considerably. We'll then look at the TAPAS Transformer for table parsing and how it works. This is followed by implementing a table parsing model yourself, using a pretrained and finetuned variant of TAPAS with HuggingFace Transformers.

After reading this tutorial, you will understand...

- What a Transformer is and why it is so useful in NLP.
- How TAPAS extends BERT so that it can be used for table parsing.
- How to implement a TAPAS-based table parser for question answering with HuggingFace Transformers.

Let's take a look! 🚀

Machine Learning for Table Parsing: TAPAS

Ever since Vaswani et al. (2017) introduced the Transformer architecture, the field of NLP has been on fire. Transformers have removed the need for recurrent segments, thus avoiding the drawbacks of recurrent neural networks and LSTMs when creating sequence-based models. By relying on a mechanism called self-attention, built in with multiple so-called attention heads, models learn by themselves which parts of an input sequence are relevant to each other.

As a consequence, Transformers widely use the pretraining-finetuning paradigm: models are first pretrained on a massive but unlabeled dataset, acquiring general language capabilities, after which they are finetuned on a smaller, labeled and hence task-focused dataset.

The results are incredible: through subsequent improvements like GPT and BERT and a variety of finetuned models, Transformers can now be used for a wide variety of tasks, ranging from text summarization and machine translation to speech recognition. And today, we can also add table parsing to that list.


BERT for Table Parsing

The BERT family of language models is a varied but very powerful family of models that relies on the encoder segment of the original Transformer. Invented by Google, BERT is pretrained with Masked Language Modeling and slightly adapts the architecture and embeddings of the original Transformer in order to add more context to the processed representations. After pretraining, it can be finetuned for a wide range of downstream tasks.
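To get a feel for what Masked Language Modeling does, here is a minimal sketch using the fill-mask pipeline from HuggingFace Transformers; the checkpoint name and example sentence are illustrative assumptions, not part of the TAPAS setup.

from transformers import pipeline

# Load a BERT checkpoint with a Masked Language Modeling head (illustrative choice)
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# Ask BERT to fill in the masked token; it returns the most likely candidates with their scores
for prediction in unmasker("Paris is the [MASK] of France."):
  print(prediction["token_str"], round(prediction["score"], 3))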

TAPAS, which stands for Table Parser, is an extension of BERT proposed by Herzig et al. (2020), who are affiliated with Google. It is specifically tailored to table parsing - not surprising, given its name. TAPAS takes tables as input after they are flattened and thus essentially converted into a 1D sequence of tokens.

By adding a variety of additional embeddings, however, table-specific structure and context can be harnessed during training. The model outputs a prediction for an aggregation operator (i.e., what to do with the selected cells) and cell selection coordinates (i.e., which cells to do it with).
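If you are curious what this flattening looks like in practice, the small sketch below feeds a tiny table to the TAPAS tokenizer (using the same WTQ-finetuned checkpoint that we use later on) and inspects the shapes of its outputs; treat it as an exploratory peek rather than part of the final model code.

from transformers import TapasTokenizer
import pandas as pd

# Load the TAPAS tokenizer and build a minimal table
tokenizer = TapasTokenizer.from_pretrained("google/tapas-base-finetuned-wtq")
table = pd.DataFrame({'Cities': ["Paris, France"], 'Inhabitants': ["2.161"]})

# Tokenize the question together with the flattened table
encoding = tokenizer(table=table, queries=["Which city has most inhabitants?"], return_tensors="pt")

# input_ids: one flattened sequence of question + table tokens
print(encoding["input_ids"].shape)
# token_type_ids: additional per-token ids (segment, column, row, rank, ...) that feed the extra embeddings
print(encoding["token_type_ids"].shape)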

TAPAS is covered in another article on this website, and I recommend going there if you want to understand how it works in great detail. For now, a visualization of its architecture will suffice - as this is a practical tutorial :)

Figure: the TAPAS architecture. Source: Herzig et al. (2020)

Implementing a Table Parsing model with HuggingFace Transformers

Let's now take a look at how you can implement a Table Parsing model yourself with HuggingFace Transformers. We'll first focus on the software requirements that you must install into your environment. You will then learn how to code a TAPAS-based table parser for question answering. Finally, we will show you the results that we got when running the code.

Software requirements

HuggingFace Transformers is a Python library that was created to democratize the application of state-of-the-art NLP models: Transformers. It can easily be installed with pip, by means of pip install transformers. You will also need PyTorch or TensorFlow as the backend, installed into the same environment (or vice-versa: install HuggingFace Transformers into your existing PyTorch/TensorFlow environment).

The code in this tutorial was created with PyTorch, but it may be relatively easy (possibly with a few adaptations) to run it with TensorFlow as well.

To run the code, you will need to install the following things into your environment:

- HuggingFace Transformers: pip install transformers.
- PyTorch, which we use as the backend: pip install torch.
- torch-scatter, which TAPAS depends on. It must be installed against your PyTorch and CUDA versions, for example:

pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.6.0+${CUDA}.html

Here, ${CUDA} should be replaced with the identifier matching your setup, e.g. cpu, cu101 or cu102.
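After installing these, a quick sanity check - just an optional sketch - is to verify that all three packages can be imported from the same environment:

# Optional sanity check: all imports should succeed in the same environment
import torch
import torch_scatter
import transformers

print("PyTorch version:", torch.__version__)
print("Transformers version:", transformers.__version__)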

Model code

Compared to using Pipelines and other pretrained models, running TAPAS requires you to do a few more things. Below, you can find the code for the TAPAS-based model as a whole. But don't worry! The docstrings and comments explain what each step does.

from transformers import TapasTokenizer, TapasForQuestionAnswering
import pandas as pd

# Define the table: cities and their number of inhabitants (in millions), all values as strings
data = {'Cities': ["Paris, France", "London, England", "Lyon, France"], 'Inhabitants': ["2.161", "8.982", "0.513"]}

# Define the questions
queries = ["Which city has most inhabitants?", "What is the average number of inhabitants?", "How many French cities are in the list?", "How many inhabitants live in French cities?"]

def load_model_and_tokenizer():
  """
    Load the pretrained tokenizer and model.
  """
  # Load pretrained tokenizer: TAPAS finetuned on WikiTable Questions
  tokenizer = TapasTokenizer.from_pretrained("google/tapas-base-finetuned-wtq")

  # Load pretrained model: TAPAS finetuned on WikiTable Questions
  model = TapasForQuestionAnswering.from_pretrained("google/tapas-base-finetuned-wtq")

  # Return tokenizer and model
  return tokenizer, model


def prepare_inputs(data, queries, tokenizer):
  """
    Convert dictionary into data frame and tokenize inputs given queries.
  """
  # Prepare inputs
  table = pd.DataFrame.from_dict(data)
  inputs = tokenizer(table=table, queries=queries, padding='max_length', return_tensors="pt")

  # Return things
  return table, inputs


def generate_predictions(inputs, model, tokenizer):
  """
    Generate predictions for some tokenized input.
  """
  # Generate model results
  outputs = model(**inputs)

  # Convert logit outputs into predictions for table cells and aggregation operators
  predicted_table_cell_coords, predicted_aggregation_operators = tokenizer.convert_logits_to_predictions(
          inputs,
          outputs.logits.detach(),
          outputs.logits_aggregation.detach()
  )

  # Return values
  return predicted_table_cell_coords, predicted_aggregation_operators


def postprocess_predictions(predicted_aggregation_operators, predicted_table_cell_coords, table):
  """
    Compute the predicted operation and nicely structure the answers.
  """
  # Process predicted aggregation operators
  aggregation_operators = {0: "NONE", 1: "SUM", 2: "AVERAGE", 3:"COUNT"}
  aggregation_predictions_string = [aggregation_operators[x] for x in predicted_aggregation_operators]

  # Process predicted table cell coordinates
  answers = []
  for coordinates in predicted_table_cell_coords:
    if len(coordinates) == 1:
      # 1 cell
      answers.append(table.iat[coordinates[0]])
    else:
      # > 1 cell
      cell_values = []
      for coordinate in coordinates:
        cell_values.append(table.iat[coordinate])
      answers.append(", ".join(cell_values))

  # Return values
  return aggregation_predictions_string, answers


def show_answers(queries, answers, aggregation_predictions_string):
  """
    Visualize the postprocessed answers.
  """
  for query, answer, predicted_agg in zip(queries, answers, aggregation_predictions_string):
    print(query)
    if predicted_agg == "NONE":
      print("Predicted answer: " + answer)
    else:
      print("Predicted answer: " + predicted_agg + " > " + answer)


def run_tapas():
  """
    Invoke the TAPAS model.
  """
  tokenizer, model = load_model_and_tokenizer()
  table, inputs = prepare_inputs(data, queries, tokenizer)
  predicted_table_cell_coords, predicted_aggregation_operators = generate_predictions(inputs, model, tokenizer)
  aggregation_predictions_string, answers = postprocess_predictions(predicted_aggregation_operators, predicted_table_cell_coords, table)
  show_answers(queries, answers, aggregation_predictions_string)


if __name__ == '__main__':
  run_tapas()
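If you want to run the same pipeline against your own data, only the table and query definitions change. Below is a small sketch that assumes you have a CSV file (the file name and query are hypothetical) and reuses the functions defined above:

import pandas as pd

# Hypothetical CSV file; TAPAS expects every table cell to be a string
table = pd.read_csv("my_table.csv").astype(str)
queries = ["How many rows does this table contain?"]

# Reuse the helper functions from the listing above
tokenizer, model = load_model_and_tokenizer()
inputs = tokenizer(table=table, queries=queries, padding='max_length', return_tensors="pt")
predicted_table_cell_coords, predicted_aggregation_operators = generate_predictions(inputs, model, tokenizer)
aggregation_predictions_string, answers = postprocess_predictions(predicted_aggregation_operators, predicted_table_cell_coords, table)
show_answers(queries, answers, aggregation_predictions_string)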

Results

Running the TAPAS model finetuned on WikiTable Questions (WTQ) against the questions specified above gives the following results:

Which city has most inhabitants?
Predicted answer: London, England
What is the average number of inhabitants?
Predicted answer: AVERAGE > 2.161, 8.982, 0.513
How many French cities are in the list?
Predicted answer: COUNT > Paris, France, Lyon, France
How many inhabitants live in French cities?
Predicted answer: SUM > 2.161, 0.513

This is great!

Really cool! 😎
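One thing to note: for SUM, AVERAGE and COUNT, TAPAS only predicts which cells to aggregate and which operator to apply; computing the final number is left to you. A small, hypothetical helper for that last step (it assumes the selected cells can be parsed as floats) could look like this:

def execute_aggregation(operator, cell_values):
  """
    Apply a predicted aggregation operator to the selected cell values (sketch).
  """
  if operator == "COUNT":
    return len(cell_values)
  if operator in ("SUM", "AVERAGE"):
    numbers = [float(value) for value in cell_values]
    return sum(numbers) if operator == "SUM" else sum(numbers) / len(numbers)
  # NONE: simply return the selected cell(s) as-is
  return ", ".join(cell_values)

# For example, for the last query above:
print(execute_aggregation("SUM", ["2.161", "0.513"]))  # ~2.674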

Summary

Transformers have really changed the world of language models. Harnessing the self-attention mechanism, they have removed the need for recurrent segments and hence sequential processing, allowing bigger and bigger models to be created that every now and then show human-like behavior - think GPT, BERT and DALL-E.

In this tutorial, we focused on TAPAS, an extension of BERT that can be used for table parsing. We specifically focused on the practical part: implementing this model for real-world usage by means of HuggingFace Transformers.

Reading it, you have learned...

- What a Transformer is and why it is so useful in NLP.
- How TAPAS extends BERT so that it can be used for table parsing.
- How to implement a TAPAS-based table parser for question answering with HuggingFace Transformers.

I hope that this tutorial was useful for you! 🚀 If it was, please let me know in the comments section below 💬 Please do the same if you have any questions or other comments. I'd love to hear from you.

Thank you for reading MachineCurve today and happy engineering! 😎

References

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30, 5998-6008.

Herzig, J., Nowak, P. K., Müller, T., Piccinno, F., & Eisenschlos, J. M. (2020). TaPas: Weakly supervised table parsing via pre-training. arXiv preprint arXiv:2004.02349.

GitHub. (n.d.). Google-research/tapas. https://github.com/google-research/tapas

Google. (2020, April 30). Using neural networks to find answers in tables. Google AI Blog. https://ai.googleblog.com/2020/04/using-neural-networks-to-find-answers.html

HuggingFace. (n.d.). TAPAS — transformers 4.3.0 documentation. Hugging Face – On a mission to solve NLP, one commit at a time. https://huggingface.co/transformers/model_doc/tapas.html
