Category: Tutorial, Tools, Machine Learning
Tags: #HuggingFace #Transformers #MachineLearning #Tutorial #NLP
---
Hugging Face has emerged as the central hub of the open-source AI community, often described as the “GitHub of machine learning.” With over 700,000 models, 150,000 datasets, and a vibrant community of researchers and practitioners, Hugging Face has become essential infrastructure for modern AI development. Understanding how to use this platform effectively is now a core skill for anyone working with machine learning.
This comprehensive tutorial covers everything you need to get started with Hugging Face, from basic model usage to advanced features like model training, dataset management, and deployment. Whether you’re a newcomer to machine learning or an experienced practitioner expanding your toolkit, this guide provides practical, hands-on knowledge for leveraging the Hugging Face ecosystem.
What Is Hugging Face?
Hugging Face began as a chatbot company in 2016 but pivoted to become the leading platform for sharing and collaborating on machine learning. Today, it encompasses several core components:
The Hub
The Hugging Face Hub is a central repository for sharing models, datasets, and Spaces (interactive applications). Think of it as GitHub for ML—anyone can upload their work, and anyone can use or build upon it.
Transformers Library
The Transformers library provides a unified API for working with state-of-the-art pre-trained models. It supports PyTorch, TensorFlow, and JAX, making it framework-agnostic while maintaining consistency.
Datasets Library
The Datasets library provides efficient access to thousands of datasets, with features optimized for machine learning workflows: memory mapping, streaming, and built-in preprocessing.
Spaces
Spaces allows users to host and share interactive ML applications using frameworks like Gradio and Streamlit. This enables easy demonstration and deployment of models.
Other Libraries
The ecosystem includes additional libraries like Tokenizers (fast tokenization), Accelerate (distributed training), PEFT (parameter-efficient fine-tuning), TRL (reinforcement learning from human feedback), and many more.
Getting Started: Installation and Setup
Let’s begin with practical setup.
Installation
Install the core libraries using pip:
```bash
# Install transformers (the core library)
pip install transformers

# Install with PyTorch support (recommended)
pip install transformers[torch]

# Or with TensorFlow
pip install transformers[tf]

# Install additional useful libraries
pip install datasets evaluate accelerate
```
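To confirm the install worked and check whether a GPU backend is visible, a quick sanity check (assuming the PyTorch variant was installed):

```python
import transformers
import torch

print(transformers.__version__)       # installed transformers version
print(torch.cuda.is_available())      # True if a CUDA GPU is usable
```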
Hugging Face Account
Create a free account at huggingface.co. This enables:
- Saving and sharing your own models
- Accessing gated models (some models require accepting terms)
- Using the Inference API
- Creating Spaces
Authentication
For operations requiring authentication (pushing models, accessing gated resources), use the CLI:
```bash
# Login via browser
huggingface-cli login

# Or set token directly
huggingface-cli login --token YOUR_TOKEN
```
You can find your token at huggingface.co/settings/tokens.
Using Pre-Trained Models
The most common use of Hugging Face is running pre-trained models, and the pipeline abstraction makes this remarkably easy.
Pipelines: The Simplest Interface
Pipelines provide high-level abstractions for common tasks:
```python
from transformers import pipeline

# Text generation
generator = pipeline("text-generation", model="gpt2")
result = generator("The future of AI is", max_length=50)
print(result[0]['generated_text'])

# Sentiment analysis
classifier = pipeline("sentiment-analysis")
result = classifier("I love using Hugging Face!")
print(result)  # [{'label': 'POSITIVE', 'score': 0.9998}]

# Question answering
qa = pipeline("question-answering")
result = qa(
    question="What is Hugging Face?",
    context="Hugging Face is a platform for sharing machine learning models and datasets."
)
print(result)  # {'answer': 'a platform for sharing machine learning models and datasets', ...}

# Summarization
summarizer = pipeline("summarization")
text = """
Hugging Face has become the central hub for open-source AI development.
The platform hosts hundreds of thousands of models and datasets, making
it easy for researchers and practitioners to share their work and build
upon others' contributions. The company has raised significant funding
and continues to expand its offerings.
"""
summary = summarizer(text, max_length=50, min_length=10)
print(summary[0]['summary_text'])
```
Available Pipeline Tasks
Common pipeline tasks include:
- `text-generation`: Generate text from a prompt
- `text-classification`/`sentiment-analysis`: Classify text into categories
- `token-classification`/`ner`: Named entity recognition
- `question-answering`: Extract answers from context
- `summarization`: Summarize longer texts
- `translation`: Translate between languages
- `fill-mask`: Fill in masked tokens
- `image-classification`: Classify images
- `object-detection`: Detect objects in images
- `automatic-speech-recognition`: Transcribe audio
- `text-to-speech`: Generate audio from text
- `zero-shot-classification`: Classify without task-specific training
- `feature-extraction`: Extract embeddings
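One task from this list deserves a concrete example: zero-shot classification lets you supply your own candidate labels at inference time, with no task-specific training. The input text and labels below are made up, and the model shown is the one this pipeline commonly defaults to:

```python
from transformers import pipeline

# NLI-based model; downloaded on first use
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "I need to book a flight to Paris next week.",
    candidate_labels=["travel", "cooking", "finance"],
)
print(result["labels"][0])  # highest-scoring label
```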
Choosing Models
A pipeline falls back to a default model if none is specified. To use a specific model:
```python
# Use a specific model for text generation
generator = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf")

# Use a specific model for sentiment analysis
classifier = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")
```
Find models on the Hub by browsing or searching huggingface.co/models.
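The Hub can also be searched programmatically with the `huggingface_hub` client; the filter below is just an example query:

```python
from huggingface_hub import HfApi

api = HfApi()

# The five most-downloaded text-classification models
models = api.list_models(
    filter="text-classification",
    sort="downloads",
    direction=-1,
    limit=5,
)
for m in models:
    print(m.id)
```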
Working with Models and Tokenizers Directly
For more control, work with models and tokenizers directly.
Loading Models and Tokenizers
```python
from transformers import AutoTokenizer, AutoModel, AutoModelForSequenceClassification

# Load tokenizer and model
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# For classification tasks, use the appropriate model class
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english"
)
```
Tokenization
Tokenizers convert text to model-ready input:
```python
# Basic tokenization
text = "Hello, how are you?"
tokens = tokenizer(text)
print(tokens)
# {'input_ids': [101, 7592, 1010, 2129, 2024, 2017, 1029, 102],
#  'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1]}

# Tokenize with options
tokens = tokenizer(
    text,
    padding="max_length",  # Pad to max length
    truncation=True,       # Truncate if too long
    max_length=128,
    return_tensors="pt"    # Return PyTorch tensors
)

# Batch tokenization
texts = ["Hello!", "How are you?", "I'm using Hugging Face."]
tokens = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
```
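Tokenizers also go the other way: `decode` turns token IDs back into text, and `convert_ids_to_tokens` shows the individual subword pieces. A small sketch with the bert-base-uncased tokenizer:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

ids = tokenizer("Hello, how are you?")["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids))             # subword tokens, including [CLS]/[SEP]
print(tokenizer.decode(ids, skip_special_tokens=True))  # "hello, how are you?" (uncased model)
```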
Model Inference
```python
import torch

# Prepare input
text = "I really enjoyed this movie!"
inputs = tokenizer(text, return_tensors="pt")

# Run inference
with torch.no_grad():
    outputs = model(**inputs)

# For classification, get predictions
logits = outputs.logits
predictions = torch.nn.functional.softmax(logits, dim=-1)
print(predictions)  # Probability distribution over classes
```
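For intuition, here is what that softmax call computes, written out in plain Python. The logits and the two-class label mapping below are made up for illustration (real models expose theirs as `model.config.id2label`):

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

id2label = {0: "NEGATIVE", 1: "POSITIVE"}  # hypothetical mapping
logits = [-2.1, 3.4]                       # made-up logits
probs = softmax(logits)
print(probs)                               # probabilities summing to 1
print(id2label[probs.index(max(probs))])   # prints "POSITIVE"
```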
Using the Datasets Library
The Datasets library provides efficient access to machine learning datasets.
Loading Datasets
```python
from datasets import load_dataset
# Load a dataset
dataset = load_dataset("imdb")
print(dataset)
# DatasetDict({
# train: Dataset({features: ['text', 'label'], num_rows: 25000})
# test: Dataset({features: ['text', 'label'], num_rows: 25000})
# })
# Access splits
train_data = dataset["train"]
print(train_data[0]) # First example
# Load specific splits
train_only = load_dataset("imdb", split="train")
partial = load_dataset("imdb", split="train[:1000]") # First 1000 examples
```
Working with Data
```python
# Iterate over examples
for example in train_data:
    print(example["text"][:100])
    break

# Filter data
positive = train_data.filter(lambda x: x["label"] == 1)

# Map transformations
def tokenize_function(examples):
    return tokenizer(examples["text"], padding=True, truncation=True)

tokenized_data = train_data.map(tokenize_function, batched=True)

# Select columns
tokenized_data = tokenized_data.remove_columns(["text"])
tokenized_data = tokenized_data.rename_column("label", "labels")
```
Streaming Large Datasets
For datasets too large to fit in memory:
```python
# Stream without downloading entirely
dataset = load_dataset("oscar", "unshuffled_deduplicated_en", streaming=True)
for example in dataset["train"].take(10):
    print(example["text"][:100])
```
Creating Custom Datasets
```python
from datasets import Dataset

# From Python dictionaries
data = {
    "text": ["Hello world", "How are you?", "Fine thanks!"],
    "label": [0, 1, 0]
}
dataset = Dataset.from_dict(data)

# From pandas DataFrame
import pandas as pd
df = pd.DataFrame(data)
dataset = Dataset.from_pandas(df)

# From local files
dataset = load_dataset("csv", data_files="my_data.csv")
dataset = load_dataset("json", data_files="my_data.json")
```
Fine-Tuning Models
Fine-tuning adapts pre-trained models to specific tasks.
Basic Fine-Tuning with Trainer
```python
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer
)
from datasets import load_dataset
import numpy as np
from evaluate import load as load_metric

# Load data
dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Tokenize
def tokenize_function(examples):
    return tokenizer(examples["text"], padding=True, truncation=True, max_length=512)

tokenized_datasets = dataset.map(tokenize_function, batched=True)
tokenized_datasets = tokenized_datasets.remove_columns(["text"])
tokenized_datasets = tokenized_datasets.rename_column("label", "labels")
tokenized_datasets.set_format("torch")

# Load model
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2
)

# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    push_to_hub=False,  # Set True to upload to Hub
)

# Define metrics
metric = load_metric("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels)

# Create Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"].select(range(1000)),  # Subset for demo
    eval_dataset=tokenized_datasets["test"].select(range(200)),
    compute_metrics=compute_metrics,
)

# Train
trainer.train()

# Evaluate
results = trainer.evaluate()
print(results)
```
Parameter-Efficient Fine-Tuning (PEFT)
For large models, full fine-tuning is expensive. PEFT methods like LoRA (Low-Rank Adaptation) update only a small subset of parameters:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType

# Load base model
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Configure LoRA
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=8,  # LoRA rank
    lora_alpha=32,
    lora_dropout=0.1,
)

# Create PEFT model
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # Shows only ~0.1% of parameters are trainable

# Now train as usual with Trainer
```
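The parameter savings are easy to sanity-check by hand. For a weight matrix of shape d_out × d_in, LoRA trains two low-rank factors, B (d_out × r) and A (r × d_in), instead of the full matrix. A rough calculation for a single 4096 × 4096 projection (sizes chosen for illustration):

```python
def lora_trainable_params(d_in, d_out, r):
    # B contributes d_out * r parameters, A contributes r * d_in
    return r * (d_in + d_out)

full = 4096 * 4096                             # ~16.8M params in the frozen weight
lora = lora_trainable_params(4096, 4096, r=8)
print(lora, f"{lora / full:.2%}")              # 65536, about 0.39% of that one matrix
```

Since LoRA is typically applied only to a few projection matrices per layer, the trainable fraction of the whole model ends up smaller still, consistent with the ~0.1% reported by `print_trainable_parameters()`.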
Sharing Models on the Hub
Share your trained models with the community.
Push to Hub
```python
# After training, push to Hub. Trainer gets the repo name from
# TrainingArguments (hub_model_id / output_dir), not from this call;
# the first argument here is the commit message.
trainer.push_to_hub()

# Or push model and tokenizer separately
model.push_to_hub("my-awesome-model")
tokenizer.push_to_hub("my-awesome-model")

# With specific settings
model.push_to_hub(
    "my-awesome-model",
    private=True,  # Private repository
    commit_message="Initial model upload"
)
```
Model Cards
Every model should have a model card (README.md) describing:
- What the model does
- Intended use cases
- Training data and procedure
- Evaluation results
- Limitations and biases
Create a model card in your repository:
```markdown
---
language: en
tags:
- sentiment-analysis
- text-classification
license: apache-2.0
datasets:
- imdb
metrics:
- accuracy
---

# My Sentiment Analysis Model

## Model Description
This model classifies movie reviews as positive or negative.

## Training Data
Trained on the IMDB dataset (25,000 reviews).

## Evaluation Results
Accuracy: 92.5%

## Limitations
Trained only on movie reviews; may not generalize to other domains.
```
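Model cards can also be read programmatically through `huggingface_hub`, which is useful for inspecting a model's license and metadata before adopting it. Fetching gpt2's card here purely as an example:

```python
from huggingface_hub import ModelCard

card = ModelCard.load("gpt2")  # downloads the repo's README.md
print(card.data.license)       # metadata parsed from the YAML front matter
print(card.text[:200])         # start of the markdown body
```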
Using Hugging Face Spaces
Spaces host interactive ML applications.
Creating a Space with Gradio
```python
# app.py for your Space
import gradio as gr
from transformers import pipeline

# Load model
classifier = pipeline("sentiment-analysis")

# Define interface function
def classify(text):
    result = classifier(text)[0]
    return f"{result['label']} (confidence: {result['score']:.2%})"

# Create interface
demo = gr.Interface(
    fn=classify,
    inputs=gr.Textbox(placeholder="Enter text to analyze..."),
    outputs="text",
    title="Sentiment Analysis",
    description="Analyze the sentiment of your text!"
)

# Launch
demo.launch()
```
Deploying to Spaces
1. Create a new Space at huggingface.co/new-space
2. Choose the Gradio or Streamlit SDK
3. Upload your app.py and requirements.txt
4. The Space builds and deploys automatically
Requirements.txt
```
transformers
torch
gradio
```
Advanced Topics
Quantization for Efficient Inference
Reduce model size and improve speed:
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization (requires the bitsandbytes package)
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=quantization_config,
    device_map="auto"
)
```
Flash Attention
Enable Flash Attention for faster transformer inference:
```python
import torch
from transformers import AutoModelForCausalLM

# Requires the flash-attn package and a supported GPU
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.float16,
    device_map="auto"
)
```
Distributed Training with Accelerate
Train across multiple GPUs:
```python
from accelerate import Accelerator

accelerator = Accelerator()
model, optimizer, train_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader
)

for batch in train_dataloader:
    outputs = model(**batch)
    loss = outputs.loss
    accelerator.backward(loss)
    optimizer.step()
    optimizer.zero_grad()
```
Inference API
Use Hugging Face's hosted inference:
```python
import requests

API_URL = "https://api-inference.huggingface.co/models/gpt2"
headers = {"Authorization": "Bearer YOUR_TOKEN"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

result = query({"inputs": "The future of AI is"})
print(result)
```
Best Practices
Model Selection
- Check model cards for task suitability
- Consider model size vs. available resources
- Check license compatibility with your use case
- Look at community ratings and downloads
Performance Optimization
- Use appropriate batch sizes for your hardware
- Enable mixed precision training (`fp16=True`)
- Use gradient checkpointing for large models
- Consider quantization for inference
Reproducibility
- Set random seeds
- Document training configuration
- Version your datasets
- Use model cards comprehensively
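For seeds specifically, `transformers.set_seed` covers Python's `random`, NumPy, and PyTorch in one call, so repeated runs draw the same values:

```python
import torch
from transformers import set_seed

set_seed(42)  # seeds random, numpy, and torch (including CUDA if present)
a = torch.rand(3)

set_seed(42)  # re-seeding reproduces the same draw
b = torch.rand(3)
print(torch.equal(a, b))  # True
```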
Community Engagement
- Contribute model cards and documentation
- Report issues and bugs
- Share your fine-tuned models
- Engage in discussions
Troubleshooting Common Issues
CUDA Out of Memory
```python
# Reduce batch size
training_args = TrainingArguments(per_device_train_batch_size=4)

# Use gradient accumulation
training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4  # Effective batch size = 16
)

# Enable gradient checkpointing
model.gradient_checkpointing_enable()

# Or load the model quantized (see the quantization section above)
```
Tokenizer Warnings
```python
# Set padding token for GPT-style models
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = model.config.eos_token_id
```
Slow Downloads
```python
# Cache models locally
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased", cache_dir="./model_cache")

# Use offline mode once everything is cached
# (set this environment variable before importing transformers)
import os
os.environ["TRANSFORMERS_OFFLINE"] = "1"
```
Conclusion
Hugging Face has democratized access to state-of-the-art machine learning models and datasets. The platform’s combination of powerful libraries, vast model repositories, and collaborative features makes it indispensable for modern AI development.
This tutorial covered the essentials: using pipelines for quick inference, working with models and tokenizers directly, leveraging the Datasets library, fine-tuning models, sharing work on the Hub, and deploying with Spaces. These skills form the foundation for productive work with Hugging Face.
The ecosystem continues to expand with new libraries, features, and models appearing regularly. The best way to stay current is to engage with the community, follow the Hugging Face blog, and experiment with new releases.
Whether you’re building a quick prototype or training production models, Hugging Face provides the tools and community to accelerate your work. The platform has made “democratizing good machine learning” more than a slogan—it’s a reality that’s changing how AI is developed worldwide.
---
*Master the Hugging Face ecosystem. Subscribe to our newsletter for tutorials, tips, and updates on the latest tools and models. Join thousands of ML practitioners building with Hugging Face.*
*[Subscribe Now] | [Share This Article] | [Explore More Tutorials]*