List of Transformer tutorials for Deep Learning

Machine Learning, and especially Deep Learning, plays an increasingly important role in the field of Natural Language Processing. Over the past few years, Transformer architectures have become the state-of-the-art (SOTA) approach and the de facto preferred route for language-related tasks.

While the architecture is not too difficult once you are familiar with Transformers, the learning curve for getting started is steep. What’s more, the complexity of Transformer-based architectures makes it challenging to build them on your own using libraries like TensorFlow and PyTorch.

Fortunately, today we have HuggingFace Transformers, a library that democratizes Transformers by providing a variety of Transformer architectures (think BERT and GPT) for both understanding and generating natural language. What’s more, with a variety of pretrained models across many languages and interoperability between TensorFlow and PyTorch, using Transformers has never been easier.

Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between TensorFlow 2.0 and PyTorch.

HuggingFace (n.d.)

At MachineCurve, we offer a variety of articles for getting started with HuggingFace. This page structures all these articles around the question “How to get started with HuggingFace Transformers?”. It offers a go-to page for people who are just getting started with HuggingFace Transformers. In fact, I learned to use the Transformers library by writing the articles linked on this page. Going from intuitive understanding to advanced topics through easy, few-line implementations with Python, this should be a great place to start.

Have fun! 🚀🤗

What are Transformers?

Source: Introduction to Transformers in Machine Learning, based on Vaswani et al. (2017)

I’m a big fan of castle building. It means that when you want to understand something in great detail, it’s best to first take a helicopter view rather than diving straight into a large number of details. Castles are built brick by brick and on a great foundation. On this website, my goal is to allow you to do the same, through the Collections series of articles. That’s why, when you want to get started, I advise you to start with a brief history of NLP-based Machine Learning and an introduction to the original Transformer architecture.

Saying hello to HuggingFace Transformers

Now that you understand the basics of Transformers, you have the knowledge to understand how a wide variety of Transformer architectures has emerged. Let’s first look at the individual architectures. After that, we’ll implement a few pretrained and fine-tuned Transformer-based models using HuggingFace Pipelines. Slowly but surely, we’ll then dive into more advanced topics.

Looking at Transformer Architectures

  • BERT
    • ALBERT
    • BART
      • MBart
    • BARThez
    • BertGeneration
    • CamemBERT
    • ConvBERT
    • DeBERTa
    • FlauBERT
    • MobileBERT
    • RetriBERT
    • RoBERTa
      • XLM-RoBERTa
    • SqueezeBERT
  • Blenderbot
  • CTRL
  • DialoGPT
  • DPR
  • FSMT
  • Funnel Transformer
  • GPT
  • LayoutLM
  • Longformer
  • MarianMT
  • MT5
  • Pegasus
  • ProphetNet
    • XLM-ProphetNet
  • RAG
  • Reformer
  • T5
  • Transformer XL
  • Wav2Vec2
  • XLM
    • XLM-ProphetNet
    • XLM-RoBERTa
  • XLNet

Getting started with Transformer based Pipelines

Now that you know a bit more about the Transformer architectures that can be used in the HuggingFace Transformers library, it’s time to get started writing some code. Pipelines are a great place to start, because they allow you to use language models with just a few lines of code. They use pretrained and fine-tuned Transformers under the hood, allowing you to get started really quickly. In the articles, we’ll build an even better understanding of the specific Transformers, and then show you how a Pipeline can be created.

Running other pretrained and fine-tuned models

The pipelines above are the easiest implementations of pretrained Transformer models. All you have to do is import the pipeline and initialize it. It’s then ready to start converting data into summaries, translations, and more.
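To make that import-and-initialize flow concrete, here is a minimal sketch for the summarization case. It assumes the transformers library and a backend such as PyTorch are installed. The helper names build_summarizer and summarize, and the length defaults, are illustrative choices of mine rather than anything from the linked articles. On first use, the pipeline downloads a default pretrained model.

```python
def build_summarizer():
    """Create a summarization pipeline with the library's default model."""
    # Imported lazily: loading transformers (and downloading weights on
    # first use) is slow, so it only happens when a pipeline is requested.
    from transformers import pipeline
    return pipeline("summarization")


def summarize(text, summarizer=None, max_length=60, min_length=10):
    """Return a summary string for `text`.

    `summarizer` is any callable with the pipeline interface; passing one
    in avoids rebuilding the pipeline on every call. The length limits are
    illustrative defaults (assumptions), measured in tokens.
    """
    if summarizer is None:
        summarizer = build_summarizer()
    # A summarization pipeline returns a list with one dict per input,
    # each holding the generated text under the "summary_text" key.
    result = summarizer(text, max_length=max_length, min_length=min_length)
    return result[0]["summary_text"]
```

In practice you would call build_summarizer() once, keep the result around, and pass it to summarize for each text, since initializing the pipeline is by far the most expensive step.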

However, there are more pretrained models out there. The HuggingFace Model Hub contains many other pretrained and fine-tuned models, and their weights are shared. This means that you can also use these models in your own applications. Now that you understand a pipeline in more detail, it’s time to dive into the tokenizers (PreTrainedTokenizer and PreTrainedTokenizerFast) and the pretrained model classes (PreTrainedModel for PyTorch and TFPreTrainedModel for TensorFlow). Let’s do that now.
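As a rough sketch of what working with a tokenizer and a pretrained model directly can look like, the snippet below loads both from the Model Hub and classifies a sentence. It uses the library's Auto classes, which return concrete tokenizer and model subclasses for a given checkpoint. The function names and the example checkpoint (a sentiment model from the Hub) are my own assumptions for illustration, and the PyTorch backend is assumed to be installed.

```python
def load_tokenizer_and_model(checkpoint):
    """Load a tokenizer and a sequence classification model from the Hub."""
    # Imported lazily so that the heavy library is only loaded when needed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
    return tokenizer, model


def classify(text, tokenizer, model):
    """Return the label the model considers most likely for `text`."""
    # Convert the text into input ids and an attention mask, as PyTorch
    # tensors ("pt"), truncating to the model's maximum input length.
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    # One row of logits per input; we passed a single text, so take row 0.
    # For larger workloads, wrapping this call in torch.no_grad() saves memory.
    scores = model(**inputs).logits[0].tolist()
    best = max(range(len(scores)), key=scores.__getitem__)
    # The model config maps class indices to human-readable labels.
    return model.config.id2label[best]


if __name__ == "__main__" and False:  # flip to True to run (downloads weights)
    # Example checkpoint name: an assumption, any classification model works.
    checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
    tokenizer, model = load_tokenizer_and_model(checkpoint)
    print(classify("Transformers make NLP a lot more approachable.", tokenizer, model))
```

This is essentially what a pipeline does under the hood: tokenize, run the model, and map the output back to something readable. In versions of the library that ship TensorFlow support, the same idea should apply with the TF-prefixed model classes and return_tensors="tf".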

Preprocessing data

  • Coming later.

Pretraining Transformers

  • Coming later.

Sharing HuggingFace Transformers

  • Coming later.

Advanced topics


HuggingFace. (n.d.). Transformers — transformers 4.1.1 documentation. Hugging Face – On a mission to solve NLP, one commit at a time.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 5998-6008.
