BlogPost5 Natural Language Processing (NLP) using libraries from the Hugging Face ecosystem

1 minute read

Published:

Motivation

It has been a few years since I last seriously coded an NLP product, and it turns out that I am a bit out of date. However, I am planning to come back in 2023. This is a side note from my revision of NLP using the Hugging Face platform. I recall that in 2019 Hugging Face was just a GitHub page providing BERT in PyTorch. Now it has grown into a giant ecosystem for NLP. Time really flies.

There will be a Jupyter notebook associated with each chapter.

My notes and train of thought while working through the course

Table of contents

A course from Hugging Face Hub.

Introduction | Diving in | Advanced
Setup and Introduction | The HF dataset library | Building and sharing demos
Using HF Transformers | The HF tokenizer library | Transformers can hear
Fine-tuning a pretrained model | Main NLP tasks | Transformers can see
Sharing Models and Tokenizers | Helps from HF | Optimizing for production

Appendix

Setup and Introduction

Although I am already experienced with NLP, it is worth taking a short recap of the theory.

  • NLP is a field of study that uses machine learning to understand human language: not only individual words, but also the context in which those words appear.

Common NLP tasks include:

  • Classifying whole sentences: e.g. detecting spam email

  • Classifying each word in a sentence: identifying the grammatical components of a sentence, or the named entities (person, location, organization)

  • Extracting an answer from a text (a.k.a. the question answering task): given a question and a context, extract the answer to the question based on the information provided in the context.

  • Generating a new sentence from an input text: translating a text into another language, summarizing a text
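
To make the list above more concrete, here is a minimal sketch of how these tasks line up with the task names accepted by the `pipeline` function in the Hugging Face `transformers` library. The dictionary keys, the `build_pipeline` helper and the English-to-French language pair are my own choices for illustration, not something from the course. Building a pipeline assumes `transformers` and a backend such as PyTorch are installed, and it downloads a default model on first use.

```python
# Illustrative mapping from the tasks listed above to `transformers` pipeline task names.
# The keys are informal labels chosen for this note (hypothetical, not library names).
PIPELINE_TASKS = {
    "sentence classification": "text-classification",
    "word classification": "token-classification",
    "question answering": "question-answering",
    "translation": "translation_en_to_fr",  # language pair picked only as an example
    "summarization": "summarization",
}


def build_pipeline(task_key: str):
    """Create a pipeline for one of the informal task labels above.

    The import is done lazily so that this module can be loaded
    even when `transformers` is not installed.
    """
    from transformers import pipeline

    return pipeline(PIPELINE_TASKS[task_key])


if __name__ == "__main__":
    # Listing the mapping needs no model download.
    for label, task in PIPELINE_TASKS.items():
        print(f"{label:>25} -> {task}")

    # Uncomment to try a real model (downloads weights on first run):
    # qa = build_pipeline("question answering")
    # print(qa(question="Where do I work?", context="I work at a bakery in Lyon."))
```

The lazy import keeps the mapping usable as plain documentation, while the commented-out call shows how a question and a context would be passed to a question answering pipeline.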

Transformers, what can they do?

Using HF Transformers

Fine-tuning a pretrained model

Sharing Models and Tokenizers

The HF dataset library

The HF tokenizer library

Main NLP tasks

Helps from HF

Building and sharing demos

Transformers can hear

Transformers can see

Optimizing for production

Appendix