2. Implementing a Switch ~ Case Statement with Prompts

dasd412 · October 24, 2023

switch statement, prompt

SW Maestro

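The Problem

I wanted to solve problems along the lines of 'Can if ~ else if ~ else or switch ~ case be expressed with a prompt alone?' and 'Can I branch on the result of a prompt?'

The reason is that interview questions demand higher prompt quality than ordinary conversation, and I wanted to guard against unexpected situations such as Prompt Injection.

Solution

Prompt

In fact, the answer to this problem already came up in the previous post of this series. This time, let's look at Python code that also applies langchain.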

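# Imports assume the langchain package layout at the time of writing (2023).
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)


class AnswerFilter:
    def check_answer_appropriate(self, job_group: str, question: str, answer: str) -> str:
        # job_group is filled in by the f-string below; question and answer
        # are filled in later by the chain.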
        prompt = ChatPromptTemplate(
            messages=[
                SystemMessagePromptTemplate.from_template(
                        f"""
                        Let's think step by step.

                        [Situation]
                        You are interviewers working at company in job position {job_group}.
                        You are evaluating the interview candidate's response to the previous interview question.

                        [Evaluation]
                        After analyzing the interviewee's response, please select the option with the highest relevance among the following items.

                        1. Direct request or Repeated requests to repeat the question
                        2. Uncertain responses
                        3. Silence or minimal response
                        4. Off-topic responses
                        5. A standard/interview-appropriate response.
                        
                        (HINT!)The criteria of evaluation:
                        - Direct request: The most definitive criterion is when the candidate explicitly requests further explanation or clarification, such as "I did not understand the question" or "Could you please explain that again?".

                        - Uncertain responses: If a candidate responds to a question hesitantly or ambiguously, or uses uncertain language like 'maybe', 'possibly', 'it could be', the interviewer may infer that they did not fully understand the question.

                        - Repeated requests to repeat the question: If a candidate repeatedly asks for the question to be repeated, they likely did not understand the question.

                        - Off-topic responses: If a candidate's answer is not related to the question or strays from the topic, the interviewer may suspect that the candidate did not properly understand the question.

                        - Silence or minimal response: If the candidate provides no or minimal response to a question, the interviewer may infer that they did not understand the question or are unsure about how to respond.

                        [Conclusion]
                        (This part must not be omitted from the output.)
                        (The beginning of output format must be as follows)
                        "I think it is (only number)"

                        """),
                HumanMessagePromptTemplate.from_template(
                        """
                        Previous interview question: {question}

                        Candidate's response : {answer}

                        """)
            ],
            input_variables=["question","answer"],
        )
        
        llm = ChatOpenAI(
            temperature=0.3,
            model_name='gpt-3.5-turbo',
            verbose=True,
            streaming=True,
            callbacks=[StreamingStdOutCallbackHandler()],
        )
        chain = LLMChain(llm=llm, prompt=prompt)

        return chain.run({"question": question, "answer": answer})

The parts to focus on this time are the following two.

                        After analyzing the interviewee's response, please select the option with the highest relevance among the following items.

                        1. Direct request or Repeated requests to repeat the question
                        2. Uncertain responses
                        3. Silence or minimal response
                        4. Off-topic responses
                        5. A standard/interview-appropriate response.

and

                        [Conclusion]
                        (This part must not be omitted from the output.)
                        (The beginning of output format must be as follows)
                        "I think it is (only number)"

The first part says, in effect, "analyze the interview candidate's answer and pick the most relevant option."

The second part forces the output format. Of course, this is not 100% safe either. For higher quality, handle exceptions on this prompt's result thoroughly as well.

Code

Exception Handling

class InappropriateAnswerError(Exception):
    def __init__(self, message="Inappropriate answer provided."):
        self.message = message
        super().__init__(self.message)

class ResubmissionRequestError(Exception):
    def __init__(self, message="A resubmission has been requested."):
        self.message = message
        super().__init__(self.message)

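Let's look at the exception handling code first. InappropriateAnswerError is raised when the answer is inappropriate: if the result from the AnswerFilter class is 2, 3, or 4, this first exception applies. If it is 1, ResubmissionRequestError is raised. If it is 5, the answer is normal, so no exception is raised.

Regular Expression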

import re


def find_first_number(text):
    # Return the first run of digits in text as a string, or None if there is none.
    match = re.search(r'\d+', text)
    if match:
        return match.group()
    else:
        return None

This regular expression pulls the substring that corresponds to a number out of the prompt result.

Flow Control
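One caveat is worth illustrating. Because the prompt opens with "Let's think step by step", the model may print other digits before its verdict, and `re.search(r'\d+', ...)` returns whichever number comes first. Below is a minimal sketch, assuming the reply contains the phrase "I think it is" required by the [Conclusion] section. `find_choice_number` and the sample replies are hypothetical and not part of the original code.

```python
import re


def find_first_number(text):
    # Same behavior as the helper above: first run of digits, or None.
    match = re.search(r'\d+', text)
    return match.group() if match else None


def find_choice_number(text):
    # Hypothetical stricter variant: accept only a single digit 1-5 that
    # directly follows the required phrase "I think it is".
    match = re.search(r'I think it is\s*\(?([1-5])\b', text)
    return match.group(1) if match else None


clean = 'I think it is 5'
noisy = 'Step 1: the answer is on topic. I think it is 5'

print(find_first_number(clean))   # 5
print(find_first_number(noisy))   # 1  (picks up "Step 1", not the verdict)
print(find_choice_number(noisy))  # 5
print(find_choice_number('no verdict here'))  # None
```

Anchoring on the phrase trades some leniency for safety: a reply that drops the phrase yields None and has to be handled as its own case.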

class FollowUpQuestionManager:
    def __init__(self):
        self.filter = AnswerFilter()
        # rest omitted

    def manage_followup_question(self, job_group: str, question: str, answer: str) -> str:
        # Filter: have the LLM classify the answer first
        check = self.filter.check_answer_appropriate(job_group=job_group, question=question, answer=answer)

        number = find_first_number(check)

        if number == "1":
            raise ResubmissionRequestError()
        elif number in ("2", "3", "4"):
            raise InappropriateAnswerError()
        # rest omitted

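The code above is part of the logic used to generate follow-up interview questions. First, the LLM judges whether the candidate's answer is appropriate, using the AnswerFilter class. Then an if ~ else if ~ else statement branches on that result and raises the matching exception.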
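To show the branching from the caller's side, here is a minimal sketch that runs without an LLM. The exception classes are simplified stand-ins for the ones defined earlier, and `CASES`, `branch_on`, `handle_answer`, and the returned action strings are all hypothetical names made up for illustration.

```python
class InappropriateAnswerError(Exception):
    pass


class ResubmissionRequestError(Exception):
    pass


# The "switch ~ case" table: classification number -> exception to raise.
# "5" is intentionally absent because it is the normal case.
CASES = {
    "1": ResubmissionRequestError,
    "2": InappropriateAnswerError,
    "3": InappropriateAnswerError,
    "4": InappropriateAnswerError,
}


def branch_on(number):
    # Raise the exception mapped to this number, if any.
    error_cls = CASES.get(number)
    if error_cls is not None:
        raise error_cls()


def handle_answer(number):
    # Hypothetical caller: turns each branch into a next action.
    try:
        branch_on(number)
    except ResubmissionRequestError:
        return "repeat the question"
    except InappropriateAnswerError:
        return "ask the candidate to answer again"
    return "generate a follow-up question"


for n in ["1", "3", "5"]:
    print(n, "->", handle_answer(n))
```

Note that, just like the if ~ elif version, any value outside 1 to 4 (including None) falls through to the normal path here.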

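In truth, exceptions need to be handled much, much more thoroughly.

LLMs are non-deterministic. Can you guarantee that only 1, 2, 3, 4, or 5 will come out? No.
If you write a unit test with assertEqual, an LLM will produce a test failure roughly once every few dozen runs.

So, beyond 1, 2, 3, and 4 as in the example above, other exceptional cases need to be caught as well.
(The example above is actually a test example I wrote in a Jupyter notebook ^^;;)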
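One way to catch those extra cases is to reject anything that is not exactly one of the five options and ask again. The sketch below is only an illustration under that assumption: `UnexpectedFilterOutputError`, `parse_choice`, and `classify_with_retry` are hypothetical names, and a fake reply sequence stands in for the real model call.

```python
import re

VALID_CHOICES = {"1", "2", "3", "4", "5"}


class UnexpectedFilterOutputError(Exception):
    # Hypothetical exception for replies that are not one of the five options.
    pass


def parse_choice(reply):
    # Accept the reply only if its first number is exactly one of 1-5.
    match = re.search(r'\d+', reply)
    if match is None or match.group() not in VALID_CHOICES:
        raise UnexpectedFilterOutputError(f"unexpected reply: {reply!r}")
    return match.group()


def classify_with_retry(ask_llm, max_attempts=3):
    # ask_llm is any zero-argument callable returning the model's reply text.
    last_error = None
    for _ in range(max_attempts):
        try:
            return parse_choice(ask_llm())
        except UnexpectedFilterOutputError as error:
            last_error = error  # bad format: ask again
    raise last_error


# Fake model for demonstration: fails twice, then answers properly.
replies = iter(["Sorry, I cannot decide.", "I think it is 7", "I think it is 4"])
print(classify_with_retry(lambda: next(replies)))  # 4
```

If every attempt fails, the last error is re-raised, so the caller can decide whether to fall back to a default or surface the problem.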

Doesn't langchain already have this? Why not use it?

To be honest, I don't like langchain very much. I am even stripping langchain out of code that is currently in production and replacing it with openai.

The reasons are as follows.

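The documentation is confusing and poorly organized. When a problem occurs, I don't know where to look.
Sequential Chain is nothing more than prompts connected one after another.
Router Chain is complicated. Catching the cases with switch ~ case, as in the prompt here, was more concise.
Catching exceptions is even worse. With openai, a RateLimitError was easy to catch with try ~ except. With langchain, where exactly am I supposed to catch it?
Memory is really just a matter of putting the previous context into the prompt... Even summary memory is not much different from summarizing with the llm and then inserting that context...
Even after reading the examples, they are hard to understand.
There are many Chains, and I can't tell what the differences are. So I opened them up. Comparing LLMChain and ConversationChain, the only difference I found was that the latter adds one prompt along the lines of "answer in a more friendly way" haha ;;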
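As a rough illustration of the try ~ except point above, here is a minimal retry sketch that runs without any network call. `call_with_backoff` and `fake_completion` are hypothetical, and the local `RateLimitError` class is only a stand-in: in real code you would pass the client library's own rate limit exception as `retry_on` (its import path depends on the library version).

```python
import time


class RateLimitError(Exception):
    # Stand-in for the client library's rate limit exception (assumption).
    pass


def call_with_backoff(call, retry_on=RateLimitError, max_attempts=3,
                      base_delay=1.0, sleep=time.sleep):
    # Retry `call` with exponential backoff when it raises `retry_on`.
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)


# Demonstration with a fake call that is rate limited twice.
attempts = {"count": 0}


def fake_completion():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError()
    return "I think it is 5"


# sleep is replaced with a no-op so the demo finishes immediately.
print(call_with_backoff(fake_completion, sleep=lambda seconds: None))
```

Because the call site is an ordinary function, the place to catch the error is obvious, which is the convenience the list above is describing.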

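I have never used the Agent features in a project, so I can't speak to that side.
In any case, setting that part aside, langchain seems hard to learn, hard to use, and hard to work with. My personal guess is that it is better to just use openai.

Well... unless you need to use several models together, such as Llama and GPT, in which case you might have to use it...?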

Notes

All examples are based on GPT 3.5. This post is also published on Disquiet.

