def _text_completion(
    self,
    prompts: List[str],
    temperature: float = 0.6,
    top_p: float = 0.9,
    max_gen_len: Optional[int] = None,
    logprobs: bool = False,
    echo: bool = False,
) -> List[CompletionPrediction]:  # a List filled with Dicts: generation (str), plus optional tokens (List[str]) and logprobs (List[float])
"""
Performance text completion for a list of prompts using the language generation model.
Args:
prompts (List[str]): List of text prompts for completion.
temperature (float, optional): Temperature value for controlling randomness in sampling. Default to 0.6.
top_p (float, optional): Top-p probability threshold for nucleus sampling. Default to 0.9.
max_gen_len (Optional[int], optional): Maximum length of the generated completion sequence. If not provided, it's set to the model's maximum sequence length minus 1.
logprobs (bool, optional): Flag indicating whether to compute token log probabilities. Default to False.
echo (bool, optional): Flag indicating whether to include prompt tokens in the generated output. Defaults to False.
Returns:
List[CompletionPrediction]: List of completion predictions, each containing the generated text completion.
Note:
This method generates text completions for the provided prompts, employing nucleus sampling to introduce controlled randomness.
If logprobs is True, token log probabilities are computed for each generated token.
"""
    if max_gen_len is None:
        max_gen_len = self.model.params.max_seq_len - 1
    prompt_tokens = [self.tokenizer.encode(x, bos=True, eos=False) for x in prompts]
    generation_tokens, generation_logprobs = self.generate(
        prompt_tokens=prompt_tokens,
        max_gen_len=max_gen_len,
        temperature=temperature,
        top_p=top_p,
        logprobs=logprobs,
        echo=echo,
    )
    if logprobs:
        return [
            {
                "generation": self.tokenizer.decode(t),
                "tokens": [self.tokenizer.decode([x]) for x in t],
                "logprobs": logprobs_i,
            }
            for t, logprobs_i in zip(generation_tokens, generation_logprobs)
        ]
    return [{"generation": self.tokenizer.decode(t)} for t in generation_tokens]
class CompletionPrediction(TypedDict, total=False):
    generation: str
    tokens: List[str]  # not required
    logprobs: List[float]  # not required
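A minimal usage sketch follows, assuming the method is exposed on a generator built with Llama.build as in Meta's reference llama repository; the checkpoint path, tokenizer path, and sampling values are placeholders.

# Usage sketch (paths and sampling values are placeholders, not from the original text).
generator = Llama.build(
    ckpt_dir="llama-2-7b/",
    tokenizer_path="tokenizer.model",
    max_seq_len=512,
    max_batch_size=4,
)
prompts = ["The weather today is"]
results = generator._text_completion(prompts, temperature=0.6, top_p=0.9, max_gen_len=64)
for result in results:
    # Each result is a CompletionPrediction dict; "generation" is always present.
    print(result["generation"])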
In text generation models like LLaMA (Large Language Model Meta AI), the BOS (Beginning of Sequence) and EOS (End of Sequence) tokens are crucial for controlling the model's behavior during text generation. The choices of bos=True and eos=False in the code above reflect how the model is expected to handle input prompts and generate text.

BOS (Beginning of Sequence) token, bos=True: the tokenizer prepends a special token at the start of the sequence. This helps the model recognize where the input begins and start generating text from that point.

EOS (End of Sequence) token, eos=False: the input prompt is treated as an incomplete sequence that needs to be continued or completed by the model. The model then generates text that logically follows the input prompt.

In the code snippet above:

prompt_tokens = [self.tokenizer.encode(x, bos=True, eos=False) for x in prompts]

bos=True ensures the BOS token is added, marking the start of each prompt. eos=False ensures the EOS token is not added, allowing the model to continue generating text beyond the provided prompt. This setup is crucial for text completion tasks, where the model needs to generate a continuation of the input rather than treating it as a complete sequence. A toy sketch of this behavior follows.
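To see mechanically what the two flags do, here is a toy sketch of an encode method in the same spirit; the ToyTokenizer class, its word-to-ID scheme, and the special-token IDs are stand-ins for illustration, not the real LLaMA SentencePiece tokenizer.

from typing import List

class ToyTokenizer:
    # Made-up special-token IDs, for illustration only.
    bos_id = 2
    eos_id = 3

    def encode(self, s: str, bos: bool, eos: bool) -> List[int]:
        # Pretend each word maps to a stable ID; a real tokenizer uses a learned subword vocabulary.
        ids = [10 + (sum(ord(c) for c in w) % 1000) for w in s.split()]
        if bos:
            ids = [self.bos_id] + ids   # mark the start of the sequence
        if eos:
            ids = ids + [self.eos_id]   # mark the sequence as complete
        return ids

tok = ToyTokenizer()
print(tok.encode("The weather today is", bos=True, eos=False))  # starts with 2, no trailing 3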
Let's go through an example to illustrate how the bos=True and eos=False settings affect text generation in a model like LLaMA. Suppose we have the prompt "The weather today is" and we want the model to complete this sentence.

Encoding without special tokens, tokens = self.tokenizer.encode("The weather today is"), might produce something like [101, 2215, 2650, 2154, 2003].

Encoding with the flags used above, tokens = self.tokenizer.encode("The weather today is", bos=True, eos=False), produces [2, 101, 2215, 2650, 2154, 2003], where 2 represents the BOS token.

Given [2, 101, 2215, 2650, 2154, 2003], the model generates a continuation such as [2024, 1005, 1996, 2203, 2284, 102], which could correspond to the text "sunny and warm with a light breeze." Decoding the full sequence yields "The weather today is sunny and warm with a light breeze."

Using bos=True helps the model understand that it is starting a new thought or sentence, which is critical for generating coherent text that aligns with the input prompt. Setting eos=False ensures that the model knows the input is not complete, so it continues generating additional text.
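For completeness, the echo flag in the method above controls whether those prompt tokens are also included in the decoded output; a sketch of the two cases, with made-up completion text, might look like this.

# Hypothetical outputs for the same prompt (completion text is illustrative only).
# echo=False (default): "generation" holds only the continuation.
[{"generation": "sunny and warm with a light breeze."}]

# echo=True: the prompt tokens are included in the generated output as well.
[{"generation": "The weather today is sunny and warm with a light breeze."}]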