What is Text Generation
Text generation is a process in which a computer program or algorithm produces text autonomously. This technology falls under the umbrella of natural language processing (NLP) and artificial intelligence (AI). The aim is to create written content that is coherent, contextually relevant, and, depending on the application, either informative or creative. Here's an overview of key aspects:
How It Works
Algorithm Types: Text generation commonly employs machine learning algorithms, particularly deep learning models. Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs), and Transformer-based models like GPT (Generative Pre-trained Transformer) are popular choices; encoder models such as BERT (Bidirectional Encoder Representations from Transformers) are closely related, though they are geared more toward understanding text than generating it.
Training: These models are trained on large datasets of text. During training, they learn patterns in language, such as grammar, syntax, and style, allowing them to generate coherent and contextually appropriate text.
Input: Many text generation models work by taking an input (often called a "prompt") and expanding on it. For instance, if you provide the first sentence of a story, the model can continue writing the story.
Applications
Content Creation: From writing articles and reports to creating narratives and poetry, text generation can be used for a variety of content creation tasks.
Chatbots and Virtual Assistants: It enables chatbots and virtual assistants to generate human-like responses in conversations.
Email and Messaging: Some applications can draft emails or messages based on brief user inputs.
Programming and Coding: Advanced models can generate code snippets or even full programs based on a user's specifications.
Challenges and Considerations
Quality and Coherence: While effective, these models can sometimes produce text that is nonsensical or not entirely relevant to the context.
Bias and Ethics: Models can inadvertently learn and reproduce biases present in their training data. There are also ethical considerations around the use of generated text, particularly in areas like journalism or academic writing.
Computational Resources: Advanced models require significant computational power, especially for training.
Evolution
The field of text generation is rapidly evolving, with new models and techniques emerging regularly. This advancement is not only improving the quality and efficiency of text generation but also expanding its potential applications. As these technologies develop, they are likely to become an even more integral part of various digital platforms and services.
Text generation using models available through Hugging Face, such as GPT-2 and Mistral, is an advanced application of natural language processing (NLP). Hugging Face provides a platform that offers various models for tasks such as text generation, language translation, summarization, and more.
What is Huggingface
Hugging Face is a company and community known for its work in the field of artificial intelligence (AI), particularly focusing on natural language processing (NLP) and machine learning. They have gained significant attention and popularity for several reasons:
AI Models and Libraries
Transformers Library: Hugging Face is most renowned for its open-source "transformers" library. This library provides a vast collection of pre-trained models for a variety of NLP tasks like text classification, information extraction, question answering, summarization, translation, and text generation. It includes popular models like BERT, GPT, T5, and many others.
Model Hub: They offer a platform where researchers and developers can share and collaborate on AI models. This hub features a wide range of models trained on diverse datasets for various tasks, making it easier for users to find a model suited to their specific needs.
Easy Integration and Accessibility
Hugging Face's tools are designed to be easily integrable into existing projects, lowering the barrier to entry for using advanced AI models. Their APIs facilitate straightforward integration of models into applications.
Community and Collaboration
Hugging Face has a strong community focus, providing a space for AI researchers and enthusiasts to collaborate, share knowledge, and contribute to the development of NLP technology. They contribute to education in the AI field by providing resources, documentation, and tutorials, making NLP more accessible to a broader audience. Their platform is also widely used in academic research for experimenting with and deploying AI models.
Business Solutions
Beyond their open-source contributions, Hugging Face also offers enterprise solutions, providing tools and support for businesses to integrate AI into their operations.
In summary, Hugging Face plays a pivotal role in democratizing access to cutting-edge NLP technology, fostering a community for AI innovation, and promoting ethical AI development. Their contributions have significantly impacted both the academic and industrial landscapes of AI and NLP.
Text Generation using Huggingface
Using Large Language Models (LLMs) for text generation with Hugging Face involves several steps and considerations to harness the power of some of the most advanced NLP models available. Here's a breakdown of the process:
1. Choosing the Right Model
Hugging Face provides access to a variety of LLMs, such as GPT-2, Mistral, and many newer models, each with varying capabilities.
The choice depends on the complexity of the task and the resources available (larger models require more computational power); the Model Hub can also be browsed programmatically, as sketched below.
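If you are unsure which models exist, here is a minimal sketch of querying the Model Hub with the huggingface_hub package; the filter value and the limit of five results are just illustrative choices:
from huggingface_hub import list_models
# Print a few models tagged for text generation on the Hub
for model_info in list_models(filter="text-generation", limit=5):
    print(model_info.id)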
2. Setting Up the Environment
Install the Hugging Face transformers library, which is the primary toolkit for interacting with these models. Also install the torch library to get PyTorch. PyTorch's capabilities in handling neural networks, its flexibility, and its compatibility with GPU acceleration make it an ideal choice for working with the advanced NLP models provided by Hugging Face.
!pip install transformers
!pip install torch
Some models may require additional setup, such as an access token, especially for gated or private models on the Hugging Face Hub.
3. Initializing the Model and Tokenizer
In Python, import the necessary classes and initialize the model and its tokenizer.
The tokenizer converts text to and from a format the model can understand.
from transformers import AutoTokenizer, AutoModelForCausalLM, GPT2LMHeadModel, GPT2Tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
AutoTokenizer and AutoModelForCausalLM can be used with any causal language model, whereas there are also dedicated classes for each model family, for example:
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
4. Generating Text
Provide a prompt to the model. This is the starting point for text generation.
Use the tokenizer to encode this prompt and pass it to the model.
Control the generation using parameters like max_length, temperature, and top_k.
inputs = tokenizer.encode("What is a cat", return_tensors='pt')
outputs = model.generate(inputs, max_length=50, num_return_sequences=1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
What is a cat?
A cat is a "human" creature. A cat uses its tail to carry, feed, and carry more than one person and to run or run from any part of the home. That means that cats have many
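The warnings above appear because GPT-2 has no padding token and we did not pass an attention mask. Here is a small sketch that sets both explicitly and also uses the sampling parameters mentioned earlier; the specific values are arbitrary examples:
# Encode the prompt and keep the attention mask
inputs = tokenizer("What is a cat", return_tensors='pt')
outputs = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token, so reuse EOS
    max_length=50,
    do_sample=True,    # sample instead of greedy decoding
    temperature=0.8,   # lower values make the output more predictable
    top_k=50,          # restrict sampling to the 50 most likely tokens
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))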
Use the Text Generation Pipeline
The Hugging Face transformers library has another module, pipeline, that can be used for text generation. Here's a simple example in Python to generate text using a pipeline:
from transformers import pipeline
# Create a text generation pipeline
generator = pipeline('text-generation', model='gpt2') # You can replace 'gpt2' with any other model
# Use the pipeline to generate text
prompt = "Once upon a time"
generated_text = generator(prompt, max_length=50, num_return_sequences=1)
# Output the generated text
for text in generated_text:
    print(text['generated_text'])
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Once upon a time, he wanted out, the great white whale would get in the way, he would fall into the sea, or even drown.
This article is the first of a four part series in an upcoming series on the history of
In this example, I've used GPT-2 as the model, but you can replace 'gpt2' with any other text generation model available on Hugging Face. The max_length parameter controls the maximum length of the generated text, and num_return_sequences determines how many different text sequences you want to generate.
Customizing the Output
You can customize the behavior of the pipeline by tweaking its parameters:
Model: Choose different models for different styles and capabilities of text generation.
Max Length: Adjust the length of the generated text.
Temperature: Affects the randomness of the generated text. Higher values lead to more creative and less predictable text.
Top-k and Top-p: Parameters that control the sampling strategy, influencing how the model picks the next word (a short sketch follows this list).
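As a rough sketch, these parameters can be passed directly when calling the pipeline; the values below are arbitrary examples, not recommendations:
generated_text = generator(
    "Once upon a time",
    max_length=80,            # allow a longer continuation
    do_sample=True,           # sampling must be on for temperature/top-k/top-p to matter
    temperature=1.2,          # higher temperature, more creative and less predictable output
    top_k=50,                 # consider only the 50 most likely next tokens
    top_p=0.95,               # nucleus sampling over the top 95% of probability mass
    num_return_sequences=2,   # generate two alternative continuations
)
for text in generated_text:
    print(text['generated_text'])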
Note on Usage
The first time you run the pipeline for a specific model, it downloads and caches the model, which requires an internet connection.
Larger models may consume significant computational resources.
Always consider ethical implications and potential biases in the generated text.
The pipeline function is a versatile tool within the Hugging Face ecosystem, designed to simplify the usage of complex models. It supports not only text generation but also other tasks like sentiment analysis, question answering, and more. Let us also look at an example of question answering. For this we need to set up Hugging Face access, since the model we will use is not directly available in the way GPT-2 is.
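Before moving on to that setup, here is a quick sketch of one of those other tasks, sentiment analysis, using the same pipeline interface; no model name is passed, so the pipeline falls back to its default sentiment model:
from transformers import pipeline
# Sentiment analysis with the pipeline's default model
sentiment = pipeline("sentiment-analysis")
result = sentiment("I really enjoyed working with the transformers library!")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]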
Huggingface Setup
To use certain features of Hugging Face, such as uploading models to the Hugging Face Model Hub or using their Inference API, you need to log in to your Hugging Face account. Here's how you can do it:
Online Login (Web Interface)
Visit Hugging Face Website: Go to the Hugging Face website.
Sign Up/Log In: If you don't already have an account, you'll need to sign up. Otherwise, log in with your credentials.
Accessing Features: Once logged in, you can access various features such as browsing models, datasets, and using the Inference API.
Using Hugging Face in Code (API Login)
If you're planning to use Hugging Face within your code, especially for tasks like uploading models to the Hugging Face Model Hub, you'll need to authenticate using an API token.
Get Your API Token:
After logging into the Hugging Face website, go to your profile settings.
Find the section for API tokens and generate a new token if you don’t have one.
Installing Hugging Face CLI:
The Hugging Face command line interface (CLI) tool can be installed via pip if it's not already installed.
!pip install huggingface_hub
Login via CLI:
Open your command line tool.
Run the following command and enter your API token when prompted:
huggingface-cli login
Using the Token in Your Code:
You can also manually use the token in your Python scripts, especially when using the Hugging Face API for tasks like model uploading or private model downloading.
It’s crucial to keep this token secure and not expose it in publicly shared code.
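As a minimal sketch of programmatic login (the token string below is a placeholder, not a real token):
from huggingface_hub import login
# Authenticate this Python session with your Hugging Face account
login(token="hf_xxxxxxxxxxxxxxxx")  # placeholder token, keep real tokens out of shared code
# Many loading functions also accept the token directly, for example:
# model = AutoModelForCausalLM.from_pretrained("org/private-model", token="hf_xxxxxxxxxxxxxxxx")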
Note on Security
Your Hugging Face API token is like a password. Do not share it publicly or check it into version control systems like Git.
Additional Features After Login
Once logged in, you can contribute to the community, upload models, collaborate on projects, and access private models (depending on your plan).
Logging in to Hugging Face allows you to fully leverage the community and technical resources of the platform, making it a crucial step for anyone looking to contribute to or utilize advanced features of the Hugging Face ecosystem.
Now that the setup is done, let us get back to the question-answering functionality of the pipeline:
# Create a question-answering pipeline
qa_pipeline = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad")
# Define the context and the question
context = """
The Great Wall of China is one of the world's most famous landmarks. It is a series of fortifications made of stone, brick, tamped earth, wood, and other materials, generally built along an east-to-west line across the historical northern borders of China to protect the Chinese states and empires against the raids and invasions of the various nomadic groups of the Eurasian Steppe.
"""
question = "What was the Great Wall of China built to protect against?"
# Use the pipeline to answer the question
answer = qa_pipeline(question=question, context=context)
# Output the answer
print(f"Question: {question}")
print(f"Answer: {answer['answer']}")
Question: What was the Great Wall of China built to protect against?
Answer: raids and invasions of the various nomadic groups of the Eurasian Steppe
Text generation using Mistral
Now let's try a bigger model: Mistral. Load the model with transformers using the following code snippet.
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # the device to load the model onto
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
messages = [
{"role": "user", "content": "What is your favourite condiment?"},
{"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
{"role": "user", "content": "Do you have mayonnaise recipes?"}
]
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")
model_inputs = encodeds.to(device)
model.to(device)
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
What happened? If you have a GPU with enough memory, this runs without any issues; otherwise you will most likely get an out-of-memory error. So how can we use bigger LLMs like Mistral, which has 7 billion parameters? We will discuss this in our upcoming blog.