Deep Learning - 12

CYSSSSSSSSS · September 8, 2023

OpenAI

A company that provides a variety of language models as well as ChatGPT.

Hands-on practice

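import os
import openai

# Written against the legacy openai<1.0 Python SDK (openai.ChatCompletion).
# Read the key from the environment; "API_key" is only a placeholder fallback.
openai.api_key = os.environ.get("OPENAI_API_KEY", "API_key")

response = openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[
    {
      "role": "user",
      "content": "Long time no see"
    },
    {
      "role": "user",
      "content": "How have you been lately?"
    }
  ],
  temperature=1,   # sampling randomness
  max_tokens=256,  # upper bound on the length of the reply, in tokens
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0
)

# Print the assistant's reply
print(response.choices[0].message["content"])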

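The service is built on a model that has already been trained, so you simply use it as the documentation describes.
You can adjust the text-generation settings (number of tokens, response length, and so on).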
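
As a rough sketch of what "adjusting the settings" looks like in code, the helper below only collects the request settings into a dictionary and makes no API call. `build_chat_params` is a hypothetical name introduced here, and the 0 to 2 range check for `temperature` reflects my understanding of OpenAI's documented bounds, so treat it as an assumption.

```python
def build_chat_params(user_text, temperature=1.0, max_tokens=256, model="gpt-3.5-turbo"):
    """Collect request settings in one dict (hypothetical helper, makes no API call)."""
    # Assumed bounds: temperature between 0 and 2, max_tokens positive.
    if not 0 <= temperature <= 2:
        raise ValueError("temperature is expected to be between 0 and 2")
    if max_tokens <= 0:
        raise ValueError("max_tokens must be a positive integer")
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "temperature": temperature,  # higher values give more varied replies
        "max_tokens": max_tokens,    # upper bound on the reply length, in tokens
    }


params = build_chat_params("How have you been lately?", temperature=0.2, max_tokens=64)
print(params)
```

With the legacy SDK used in this post, such a dictionary could presumably be passed along as `openai.ChatCompletion.create(**params)`.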

Text generator

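import gradio as gr
import openai

openai.api_key = "API-key"  # placeholder; replace with your own key

def 텍스트생성(prompt):
    """Generate text: send the prompt to the model and return prompt + continuation."""
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        temperature=0.7,
        max_tokens=1024,
    )
    return prompt + response['choices'][0]['text'].strip()

# Simple web UI with one text input and one text output
demo = gr.Interface(fn=텍스트생성, inputs="text", outputs="text")
demo.launch(share=True)  # share=True creates a temporary public link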

English translation app

import gradio as gr
import openai

openai.api_key = "Apikey"  # placeholder; replace with your own key

def 텍스트생성(prompt):
    """Wrap the user's text in a translation instruction and return the model's answer."""
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=f"Translate the following text into English: {prompt}",
        temperature=0.7,
        max_tokens=1024,
    )
    return response['choices'][0]['text'].strip()

demo = gr.Interface(fn=텍스트생성, inputs="text", outputs="text")
demo.launch(share=True)

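By changing the prompt you can build many different apps on top of the same model.
The prompt has to be written carefully, because it can change the language model's output substantially.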
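
To illustrate the idea that only the prompt changes between apps, here is a small sketch with a few prompt templates. The template wording, the `TEMPLATES` table and the `build_prompt` helper are all made up for this example; nothing here calls the API.

```python
# Hypothetical prompt templates; each one turns the same model into a different "app".
TEMPLATES = {
    "translate": "Translate the following text into English:\n{text}",
    "summarize": "Summarize the following text in one sentence:\n{text}",
    "keywords": "List three keywords for the following text:\n{text}",
}


def build_prompt(task, text):
    """Fill the template for `task` with the user's text."""
    if task not in TEMPLATES:
        raise KeyError(f"unknown task: {task}")
    return TEMPLATES[task].format(text=text.strip())


print(build_prompt("translate", "오랜만이야"))  # Korean for "Long time no see"
print(build_prompt("summarize", "  Gradio builds a web UI from a Python function.  "))
```

The string returned by `build_prompt` would take the place of the `prompt` argument in the completion call.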

Chatbot + Gradio

import gradio as gr

import os
import openai

openai.api_key = "API_key"  # placeholder; replace with your own key

# Conversation sent to the model; the system message assigns the assistant its role
messages = [
    {"role": "system", "content": "From now on, you are my old friend."}
]

def chat(msg, history):
    messages.append({"role": "user", "content": msg})  # message sent by the user
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,  # whole conversation so far; the model answers based on it
        temperature=1,
        max_tokens=256
    )
    reply = response.choices[0].message['content']
    messages.append({"role": "assistant", "content": reply})  # the chatbot's reply
    print(messages)

    history.append((msg, reply))  # (user, bot) pairs that Gradio renders in the chat window
    return "", history

with gr.Blocks() as demo:
    chatbot = gr.Chatbot()  # chat window showing the conversation
    msg = gr.Textbox()  # text box for the user's message
    send = gr.Button("Send")  # button that sends the message
    clear = gr.ClearButton([msg, chatbot])  # button that clears the text box and chat window

    msg.submit(chat, [msg, chatbot], [msg, chatbot])
    send.click(chat, [msg, chatbot], [msg, chatbot])

demo.launch(share=True)

Pizza order chatbot

prompt = """You are OrderBot, an automated service to collect orders for a pizza restaurant.
You first greet the customer, then collect the order, and then ask if it's a pickup or delivery.
You wait to collect the entire order, then summarize it and check for a final time if the customer wants to add anything else.
If it's a delivery, you ask for an address. Finally you collect the payment.
Make sure to clarify all options, extras and sizes to uniquely identify the item from the menu.
You respond in a short, very conversational friendly style.
The menu includes
 pepperoni pizza 12.95, 10.00, 7.00
 cheese pizza 10.95, 9.25, 6.50
 eggplant pizza 11.95, 9.75, 6.75
 fries 4.50, 3.50
 greek salad 7.25
Toppings:
 extra cheese 2.00
 mushrooms 1.50
 sausage 3.00
 canadian bacon 3.50
 AI sauce 1.50
 peppers 1.00
Drinks:
 coke 3.00, 2.00, 1.00
 sprite 3.00, 2.00, 1.00
 bottled water 5.00
"""

import gradio as gr

import os
import openai

openai.api_key = "API_key"  # placeholder; replace with your own key

# Running conversation (user and assistant turns only; the system prompt is added per request)
messages = [
]

def chat(msg, history):
    messages.append({"role": "user", "content": msg})  # message sent by the user
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        # System prompt plus only the 10 most recent messages, to keep the request within the token limit
        messages=[{"role": "system", "content": prompt}, *messages[-10:]],
        temperature=1,
        max_tokens=256
    )
    reply = response.choices[0].message['content']
    messages.append({"role": "assistant", "content": reply})  # the chatbot's reply
    print(messages)

    history.append((msg, reply))  # (user, bot) pairs that Gradio renders in the chat window
    return "", history

with gr.Blocks() as demo:
    chatbot = gr.Chatbot()  # chat window showing the conversation
    msg = gr.Textbox()  # text box for the user's message
    send = gr.Button("Send")  # button that sends the message
    clear = gr.ClearButton([msg, chatbot])  # button that clears the text box and chat window

    msg.submit(chat, [msg, chatbot], [msg, chatbot])
    send.click(chat, [msg, chatbot], [msg, chatbot])

demo.launch(share=True)
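
The `messages[-10:]` slice in the request is what keeps the conversation from growing without bound. The sketch below isolates that step so it can be run without an API key; `build_request_messages` is a hypothetical helper name and the conversation is fake data generated for the example.

```python
def build_request_messages(system_prompt, history, window=10):
    """Return the system message followed by at most `window` recent messages."""
    recent = history[-window:] if window > 0 else []
    return [{"role": "system", "content": system_prompt}, *recent]


# Fake conversation of 12 alternating messages (illustrative data only).
history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"message {i}"}
    for i in range(12)
]

request = build_request_messages("You are OrderBot.", history, window=10)
print(len(request))           # 11: one system message plus the 10 most recent messages
print(request[1]["content"])  # "message 2": the oldest message still included
```

One consequence of this design is that anything said more than ten messages ago is no longer visible to the model, so an early part of a long order could be forgotten.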

GPT

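A function that predicts the next word.
A next-token probability function.

"From today, my dream is" + Next
Next: "you" (5%), "president" (2%), "Superman" (1%)

GPT is a probability function: given the text so far, it predicts the probability of each possible next token.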
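
A minimal sketch of how raw scores become next-token probabilities, assuming the usual softmax step. The candidate tokens and their scores are invented for illustration and are not output from any model.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


# Made-up scores for candidate next tokens (illustrative, not model output).
logits = {"you": 2.0, "president": 1.1, "Superman": 0.4, "pizza": -1.0}

probs = softmax(logits)
for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{tok}: {p:.3f}")

next_token = max(probs, key=probs.get)
print("most likely next token:", next_token)
```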

Tokens and embeddings

A token is the smallest unit (piece) of text that carries meaning.

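Tokenization

Find the smallest units that carry meaning.
Split the sentence into tokens.
Assign a label (an integer ID) to each token.
The labeled tokens are fed as input to the function that predicts the next token.
The token with the highest probability is chosen as the next token.
This is repeated until the sentence ends.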
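
The steps above can be sketched as a toy loop. Everything in it is hypothetical: the vocabulary, the whitespace tokenizer, and the hard-coded probability table that stands in for the model. A real GPT conditions on the whole sequence, whereas this toy only looks at the last token.

```python
# Toy vocabulary: token -> integer ID (hypothetical, for illustration only).
VOCAB = {"<eos>": 0, "my": 1, "dream": 2, "is": 3, "you": 4}
ID_TO_TOKEN = {i: t for t, i in VOCAB.items()}

# Stand-in for the model: last token ID -> {candidate ID: probability}.
NEXT_TOKEN_PROBS = {
    1: {2: 0.9, 4: 0.1},  # after "my"
    2: {3: 0.8, 0: 0.2},  # after "dream"
    3: {4: 0.6, 0: 0.4},  # after "is"
    4: {0: 1.0},          # after "you" the sentence ends
}


def tokenize(text):
    """Split on whitespace and map each piece to its ID."""
    return [VOCAB[piece] for piece in text.split()]


def generate(text, max_new_tokens=10):
    """Greedy decoding: repeatedly append the most probable next token."""
    ids = tokenize(text)
    for _ in range(max_new_tokens):
        candidates = NEXT_TOKEN_PROBS[ids[-1]]
        best = max(candidates, key=candidates.get)
        if best == VOCAB["<eos>"]:  # end-of-sentence token: stop generating
            break
        ids.append(best)
    return " ".join(ID_TO_TOKEN[i] for i in ids)


print(generate("my"))  # my dream is you
```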

from transformers import GPT2LMHeadModel, PreTrainedTokenizerFast

# Tokenizer (text -> token IDs)
tokenizer = PreTrainedTokenizerFast.from_pretrained("taeminlee/kogpt2")
# Next-token model
model = GPT2LMHeadModel.from_pretrained("taeminlee/kogpt2")

text = "아들아 너는 계획이"  # Korean input for the Korean model, roughly "Son, you [have] a plan"
tokens = tokenizer.encode(text, return_tensors="pt")
tokens

# Logits (unnormalized scores, not yet probabilities) for the next token over the whole vocabulary
outputs = model(tokens)[0][0, -1, :]
outputs

token = outputs.argmax(-1)  # ID of the token with the highest score
decoded = tokenizer.decode(token)
print(token, decoded)

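Representing relationships between tokens

An embedding is a set of numbers that captures the relationships between words.
If each word is placed at a well-chosen point in a space (embedding), the relationships between words can be expressed by the positions of those points.
The numeric values produced by the embedding function are what the model uses as its input.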
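
One common way to measure how close two such points are is cosine similarity. The sketch below uses hand-picked 3-dimensional vectors that are invented for illustration; real embeddings are learned and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-picked vectors (made up for illustration, not learned embeddings).
embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "pizza": [0.1, 0.05, 0.95],
}

sim_king_queen = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_king_pizza = cosine_similarity(embeddings["king"], embeddings["pizza"])
print(f"king vs queen: {sim_king_queen:.3f}")
print(f"king vs pizza: {sim_king_pizza:.3f}")
```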

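Embedding layer

The model designer chooses the number of columns, that is, the dimension n of the vectors, and this produces a table with one n-dimensional row per token.
At first the points are placed randomly.
The points are then moved little by little (the weights are updated)
so that the loss goes down (is minimized).
After training, each point has moved to where it belongs (the model builds the vectors on its own).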
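
A toy version of this process, under a loud simplification: a real model gets its loss from next-token prediction, whereas here the loss is just the squared distance between two rows that I arbitrarily decided should end up close. The table size, learning rate and `train_step` helper are all assumptions for the sketch.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

VOCAB_SIZE, DIM = 5, 3  # 5 tokens, each mapped to a 3-dimensional vector

# Embedding table: one randomly initialized row per token ID.
table = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB_SIZE)]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def train_step(table, id_a, id_b, lr=0.1):
    """One gradient step on loss = squared distance between rows id_a and id_b."""
    row_a, row_b = table[id_a], table[id_b]
    for k in range(len(row_a)):
        grad = 2 * (row_a[k] - row_b[k])  # d(loss)/d(row_a[k]); the sign flips for row_b
        row_a[k] -= lr * grad
        row_b[k] += lr * grad


loss_before = squared_distance(table[1], table[2])
for _ in range(20):
    train_step(table, 1, 2)
loss_after = squared_distance(table[1], table[2])

print("loss before:", loss_before)
print("loss after: ", loss_after)
```

Looking up a token's embedding is then just indexing the table by its ID, for example `table[1]`.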

Embedding

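A method used in data work to process data effectively.
The process of turning various kinds of data into numbers.
Data is converted into numeric vectors.
Typically a mapping from a high-dimensional space to a lower-dimensional one.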

Vector DB

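Deep learning is a tool for building good embedding functions.
Data is converted into embedding vectors so that similar items end up close together.
Good embedding functions used to be hard to build, which made vector DBs hard to use.
Now that deep learning has matured, embedding functions can be learned for many domains, which makes vector DBs practical.
The embedding function turns each item into an n-dimensional vector,
e.g. in image search, images whose vectors are close to the query's vector are returned.
In a vector DB you can search for data that is similar to a given item.

content + application -> embedding model -> vector embedding -> vector DB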
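
The pipeline above can be sketched with a tiny in-memory store. This is not how Chroma or any real vector DB is implemented: `TinyVectorStore` is a hypothetical class doing brute-force search, and `letter_count_embedding` is a deliberately crude stand-in for a learned embedding model.

```python
import math


class TinyVectorStore:
    """In-memory list of (text, vector) pairs with brute-force similarity search."""

    def __init__(self, embed):
        self.embed = embed  # function: text -> list of floats
        self.items = []

    def add(self, text):
        self.items.append((text, self.embed(text)))

    def search(self, query, k=1):
        q = self.embed(query)
        scored = [(self._cosine(q, vec), text) for text, vec in self.items]
        scored.sort(reverse=True)  # highest similarity first
        return [text for _, text in scored[:k]]

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0


def letter_count_embedding(text):
    """Toy embedding (assumption): 26 letter counts, not a learned model."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


store = TinyVectorStore(letter_count_embedding)
for sentence in ["cats chase mice", "dogs chase cats", "pizza with extra cheese"]:
    store.add(sentence)

print(store.search("cheese pizza", k=1))
```

Swapping `letter_count_embedding` for a real sentence-embedding model is what turns this kind of lookup into a search by meaning rather than by spelling.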

Chroma, LangChain

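from langchain.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings

# "jhgan/ko-sbert-sts" -> model that converts a sentence into a vector
# `examples` must be a list of sentences (strings) prepared beforehand; it is not defined in this post.

db = Chroma.from_texts(
    collection_name = 'sample',
    texts = examples,
    embedding = HuggingFaceEmbeddings(model_name = "jhgan/ko-sbert-sts")
)

question = '메리 볼 워싱턴의 딸은 누구인가요?'  # "Who is Mary Ball Washington's daughter?"
doc = db.similarity_search(question , k=1)  # return the single most similar stored sentence
print(doc)
print(doc[0].page_content)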

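Searching with a question works. Why? The examples (sentences) we prepared are stored in the vector DB, and when a new question comes in, the stored sentence whose embedding vector is most similar to the question's can be found.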
