What is Text Generation? - Hugging Face

Text Generation

Text generation is the task of producing new text from an input prompt. Text generation models can, for example, complete an unfinished passage or paraphrase an existing one.

Input

Once upon a time,

Text Generation Model

Output

Once upon a time, we knew that our ancestors were on the verge of extinction. The great explorers and poets of the Old World, from Alexander the Great to Chaucer, are dead and gone. A good many of our ancient explorers and poets have

About Text Generation

Use Cases

Code Generation

A text generation model, also known as a causal language model, can be trained on code from scratch to help programmers with repetitive coding tasks.

Story Generation

A story generation model, for example one built on GPT-2, could receive an input like "Once upon a time" and continue it into story-like text based on those first words (an example model is available on the Hub).

If the data your generative model was trained on differs from your use case, you can train a causal language model from scratch. The free Transformers course explains how to do it.

Task Variants

Completion Generation Models

A popular variant of text generation models predicts the next word given the preceding words. Applied repeatedly, word by word, this builds a longer text, which supports tasks such as:

Given an incomplete sentence, complete it.
Continue a story given the first sentences.
Provided a code description, generate the code.

The most popular models for this task are GPT-based models (such as GPT-2). These models are trained on data that has no labels, so you just need plain text to train your own model. You can train GPT models to generate a wide variety of documents, from code to stories.
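The word-by-word process described above can be sketched with a toy model. The example below is only an illustration under simplifying assumptions: it replaces the neural network with a table of bigram counts built from a tiny made-up corpus, and always picks the most frequent next word (greedy decoding). The function names `train_bigram_counts` and `complete` are hypothetical and not part of any library.

```python
from collections import Counter, defaultdict

def train_bigram_counts(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.split()
    counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        counts[current_word][next_word] += 1
    return counts

def complete(prompt, counts, max_new_words=5):
    """Greedy completion: repeatedly append the most frequent next word.

    Assumes the prompt contains at least one word.
    """
    words = prompt.split()
    for _ in range(max_new_words):
        followers = counts.get(words[-1])
        if not followers:  # the last word never had a successor in training
            break
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)

# Unlabeled plain text is all the "training data" this toy model needs.
corpus = "the cat sat on the mat . the cat sat on the rug ."
counts = train_bigram_counts(corpus)
completion = complete("the cat", counts, max_new_words=3)
print(completion)  # prints: the cat sat on the
```

A GPT-style model follows the same loop, but conditions each prediction on the whole preceding text rather than only the last word, and predicts subword tokens rather than whole words.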

Text-to-Text Generation Models

These models are trained to learn the mapping between a pair of texts (e.g. translation from one language to another). The most popular variants of these models are T5, T0 and BART. Text-to-text models are trained on multiple tasks at once, so a single model can handle a wide range of tasks, including summarization, translation, and text classification.

Inference

You can use the text-generation pipeline of the 🤗 Transformers library to do inference with text generation models. It takes an incomplete text and returns one or more candidate completions.

from transformers import pipeline

# Load GPT-2 behind the text-generation pipeline
generator = pipeline('text-generation', model='gpt2')
# Request three completions, each at most 30 tokens long (prompt included)
generator("Hello, I'm a language model", max_length=30, num_return_sequences=3)
## [{'generated_text': "Hello, I'm a language modeler. So while writing this, when I went out to meet my wife or come home she told me that my"},
##  {'generated_text': "Hello, I'm a language modeler. I write and maintain software in Python. I love to code, and that includes coding things that require writing"}, ...

Text-to-Text generation models have a separate pipeline called text2text-generation. This pipeline takes an input containing the sentence including the task and returns the output of the accomplished task.

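from transformers import pipeline

# No model is specified, so the pipeline loads its default checkpoint
text2text_generator = pipeline("text2text-generation")

# Question answering: the task is spelled out inside the input string
text2text_generator("question: What is 42 ? context: 42 is the answer to life, the universe and everything")
## [{'generated_text': 'the answer to life, the universe and everything'}]

# Translation: the task prefix names the language pair
text2text_generator("translate from English to French: I'm very happy")
## [{'generated_text': 'Je suis très heureux'}]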
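Because the task is encoded in the input string itself, it can help to build these strings with small helper functions. The sketch below only formats text and does not load a model; the helper names `qa_prompt` and `translation_prompt` are hypothetical, and the prefix wording simply mirrors the two inputs shown above. The exact phrasing a given checkpoint responds to best depends on how it was trained.

```python
def qa_prompt(question, context):
    """Format a question-answering input in the 'question: ... context: ...' style."""
    return f"question: {question} context: {context}"

def translation_prompt(text, source="English", target="French"):
    """Format a translation input with the language pair as a prefix."""
    return f"translate from {source} to {target}: {text}"

qa_input = qa_prompt(
    "What is 42 ?",
    "42 is the answer to life, the universe and everything",
)
translation_input = translation_prompt("I'm very happy")

print(qa_input)
print(translation_input)
```

The resulting strings can be passed directly to a text2text-generation pipeline such as the `text2text_generator` used above.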

The T0 model is even more robust and flexible with respect to how the task prompt is phrased.

text2text_generator = pipeline("text2text-generation", model = "bigscience/T0")

text2text_generator("Is the word 'table' used in the same meaning in the two previous sentences? Sentence A: you can leave the books on the table over there. Sentence B: the tables in this book are very hard to read.")
## [{"generated_text": "No"}]

text2text_generator("A is the son's of B's brother. What is the family relationship between A and B?")
## [{"generated_text": "brother"}]

text2text_generator("Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy")
## [{"generated_text": "positive"}]

text2text_generator("Reorder the words in this sentence: justin and name bieber years is my am I 27 old.")
## [{"generated_text": "Justin Bieber is my name and I am 27 years old"}]

Useful Resources

Would you like to learn more about the topic? Awesome! Here you can find some curated resources that you may find helpful!

Notebooks

Scripts for training

Compatible libraries

Transformers

Models for Text Generation

gpt2 (Text Generation, updated May 19, 2021)

Note: The model from OpenAI that helped usher in the Transformer revolution.

bigscience/T0pp (Text2Text Generation, updated Jun 21)

Note: A special Transformer model that can generate high-quality text for various tasks.

Datasets for Text Generation

mc4 (updated Jul 27)

Note: A large multilingual dataset of text crawled from the web.

the_pile (updated Jul 1)

Note: Diverse open-source data consisting of 22 smaller high-quality datasets. It was used to train GPT-Neo.

Metrics for Text Generation

Cross Entropy: Cross entropy measures the difference between two probability distributions. Here, one is the distribution over next words predicted by the model and the other is the distribution of the words that actually occur in the reference text.

Perplexity: Perplexity is the exponential of the cross-entropy loss. It evaluates the probabilities the model assigns to the next word. Lower perplexity indicates better performance.
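The relationship between the two metrics can be checked on a small example. The sketch below assumes that the probabilities the model assigned to each correct next word are already available as a plain list (the values are made up for illustration) and that natural logarithms are used. The function names are illustrative and not part of any library.

```python
import math

def cross_entropy(token_probabilities):
    """Average negative log-probability (in nats) assigned to the correct tokens."""
    total = sum(math.log(p) for p in token_probabilities)
    return -total / len(token_probabilities)

def perplexity(token_probabilities):
    """Perplexity is the exponential of the cross-entropy."""
    return math.exp(cross_entropy(token_probabilities))

# Made-up probabilities that two hypothetical models gave to the correct next words.
confident_model = [0.5, 0.5, 0.5, 0.5]
uncertain_model = [0.1, 0.1, 0.1, 0.1]

confident_ppl = perplexity(confident_model)
uncertain_ppl = perplexity(uncertain_model)
print(confident_ppl, uncertain_ppl)
```

A model that gives every correct word a probability of 0.5 has a perplexity of about 2, which can be read as the model being as uncertain as if it were choosing between two equally likely words at each step. The model that gives 0.1 has a perplexity of about 10.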
