[llama3/llama/generation.py][class Llama] def text_completion

ma-kjh · August 30, 2024
def text_completion(
    self,
    prompts: List[str],
    temperature: float = 0.6,
    top_p: float = 0.9,
    max_gen_len: Optional[int] = None,
    logprobs: bool = False,
    echo: bool = False,
) -> List[CompletionPrediction]:  # a List of dicts: each has generation (str), plus optional tokens (List[str]) and logprobs (List[float])
    """
    Perform text completion for a list of prompts using the language generation model.

    Args:
        prompts (List[str]): List of text prompts for completion.
        temperature (float, optional): Temperature value for controlling randomness in sampling. Defaults to 0.6.
        top_p (float, optional): Top-p probability threshold for nucleus sampling. Defaults to 0.9.
        max_gen_len (Optional[int], optional): Maximum length of the generated completion sequence. If not provided, it's set to the model's maximum sequence length minus 1.
        logprobs (bool, optional): Flag indicating whether to compute token log probabilities. Defaults to False.
        echo (bool, optional): Flag indicating whether to include prompt tokens in the generated output. Defaults to False.

    Returns:
        List[CompletionPrediction]: List of completion predictions, each containing the generated text completion.

    Note:
        This method generates text completions for the provided prompts, employing nucleus sampling to introduce controlled randomness.
        If logprobs is True, token log probabilities are computed for each generated token.
    """
    if max_gen_len is None:
        max_gen_len = self.model.params.max_seq_len - 1
    prompt_tokens = [self.tokenizer.encode(x, bos=True, eos=False) for x in prompts]
    generation_tokens, generation_logprobs = self.generate(
        prompt_tokens=prompt_tokens,
        max_gen_len=max_gen_len,
        temperature=temperature,
        top_p=top_p,
        logprobs=logprobs,
        echo=echo,
    )
    if logprobs:
        return [
            {
                "generation": self.tokenizer.decode(t),
                "tokens": [self.tokenizer.decode([x]) for x in t],
                "logprobs": logprobs_i,
            }
            for t, logprobs_i in zip(generation_tokens, generation_logprobs)
        ]
    return [{"generation": self.tokenizer.decode(t)} for t in generation_tokens]

class CompletionPrediction(TypedDict, total=False):
    generation: str
    tokens: List[str]  # not required
    logprobs: List[float]  # not required
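
For context, here is a minimal sketch of how this method is typically called from outside the class, mirroring the pattern of the repo's example_text_completion.py (the checkpoint directory and tokenizer path below are placeholders):

from llama import Llama

generator = Llama.build(
    ckpt_dir="Meta-Llama-3-8B/",                       # placeholder checkpoint directory
    tokenizer_path="Meta-Llama-3-8B/tokenizer.model",  # placeholder tokenizer path
    max_seq_len=128,
    max_batch_size=4,
)

prompts = ["The weather today is"]
results = generator.text_completion(
    prompts,
    temperature=0.6,
    top_p=0.9,
    max_gen_len=64,
)
for prompt, result in zip(prompts, results):
    print(prompt + result["generation"])

With logprobs=True, the same call would also populate the optional tokens and logprobs fields of each CompletionPrediction.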

In text generation models like LLaMA (Large Language Model Meta AI), BOS (Beginning of Sequence) and EOS (End of Sequence) tokens are crucial for controlling the model's behavior during generation. The specific choice of bos=True and eos=False in the code above matches how the model is expected to handle input prompts and generate text.

Explanation:

  1. BOS (Beginning of Sequence) Token (bos=True):

    • The BOS token is used to indicate the start of a sequence. When you set bos=True, you tell the tokenizer to prepend a special token at the beginning of the sequence. This helps the model recognize the start of the input so it can begin generating text from that point.
    • In the context of text completion or generation, you typically want the model to understand where the input prompt starts, so the model can generate appropriate completions or responses.
  2. EOS (End of Sequence) Token (eos=False):

    • The EOS token is used to indicate the end of a sequence. When generating text, you usually don't want to include an EOS token in the input prompts because the model might interpret it as the end of the input, and thus stop generating any further text.
    • By setting eos=False, you ensure that the input prompt is treated as an incomplete sequence that needs to be continued or completed by the model. The model will then generate text that logically follows the input prompt.

Summary:

In the code snippet above:

prompt_tokens = [self.tokenizer.encode(x, bos=True, eos=False) for x in prompts]
  • bos=True: Ensures the BOS token is added, marking the start of each prompt.
  • eos=False: Ensures the EOS token is not added, allowing the model to continue generating text beyond the provided prompt.

This setup is crucial for text completion tasks where the model needs to generate the continuation of the input rather than treating it as a complete sequence.
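
To make the effect of these two flags concrete, here is a small sketch using the repo's Tokenizer class (the tokenizer.model path is a placeholder, and the exact IDs depend on the tokenizer):

from llama.tokenizer import Tokenizer

tokenizer = Tokenizer(model_path="Meta-Llama-3-8B/tokenizer.model")  # placeholder path

prompt = "The weather today is"
with_bos = tokenizer.encode(prompt, bos=True, eos=False)  # BOS prepended, no EOS appended
plain = tokenizer.encode(prompt, bos=False, eos=False)    # raw prompt tokens only
closed = tokenizer.encode(prompt, bos=True, eos=True)     # BOS ... EOS: reads as a finished sequence

print(with_bos[0] == tokenizer.bos_id)  # True: the first ID is the BOS token
print(len(with_bos) - len(plain))       # 1: only the BOS token was added
print(closed[-1] == tokenizer.eos_id)   # True: the EOS token closes the sequence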

Let's go through an example to illustrate how the bos=True and eos=False settings affect text generation in a model like LLaMA.

Example Scenario:

Suppose we have a prompt "The weather today is" and we want the model to complete this sentence.

Without BOS and EOS Tokens:

  1. Input Prompt: "The weather today is"
  2. Encoding Without BOS and EOS:
    tokens = self.tokenizer.encode("The weather today is")
    • This might produce a sequence of token IDs corresponding to the words in the prompt, e.g. [101, 2215, 2650, 2154, 2003] (the IDs here are purely illustrative).
    • When passed to the model, the model may not know this is the beginning of a new sequence, and it might generate unpredictable or less coherent continuations.

With BOS=True and EOS=False:

  1. Input Prompt: "The weather today is"
  2. Encoding With BOS=True and EOS=False:
    tokens = self.tokenizer.encode("The weather today is", bos=True, eos=False)
    • This would produce something like [2, 101, 2215, 2650, 2154, 2003], where the leading 2 stands in for the BOS token (in the actual Llama 3 tokenizer the BOS token is <|begin_of_text|>; these IDs are again purely illustrative).
    • The BOS token informs the model that this is the start of a new sequence, so it treats the input as the beginning of a coherent thought.
    • The absence of the EOS token signals that the sequence isn't finished, encouraging the model to generate a continuation.

Text Generation:

  • Model Input: [2, 101, 2215, 2650, 2154, 2003]
  • Model Output: The model might generate something like [2024, 1005, 1996, 2203, 2284, 102], which could correspond to the text "sunny and warm with a light breeze".

Final Output:

  • Concatenated Result: "The weather today is sunny and warm with a light breeze."
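
Expressed in code, this assembly step looks roughly as follows (reusing the placeholder tokenizer from the sketch above; the "generated" IDs are fabricated by encoding the continuation, since the point here is the decode-and-concatenate step, not a real model run):

prompt = "The weather today is"
prompt_ids = tokenizer.encode(prompt, bos=True, eos=False)

# Stand-in for the token IDs the model would return for this prompt;
# in the real code these come from self.generate(...):
generated_ids = tokenizer.encode(" sunny and warm with a light breeze.", bos=False, eos=False)

completion = tokenizer.decode(generated_ids)
print(prompt + completion)  # The weather today is sunny and warm with a light breeze.

Note that with echo=True the prompt tokens would already be included in the returned sequence, so no manual concatenation would be needed.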

Comparison:

  • Without BOS: The model might not understand that it needs to generate a continuation specifically related to the input, leading to less relevant or coherent output.
  • With BOS=True, EOS=False: The model is primed to generate a meaningful continuation that directly follows from the input prompt.

Summary:

Using bos=True helps the model understand that it's starting a new thought or sentence, which is critical in generating coherent text that aligns with the input prompt. Setting eos=False ensures that the model knows the input is not complete, so it continues generating additional text.
