Chapter 4. Conversational Memory

2023. 11. 15. 00:33 | Posted by 솔웅

https://www.pinecone.io/learn/series/langchain/langchain-conversational-memory/

Conversational Memory for LLMs with Langchain

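Conversational memory is how a chatbot can respond to multiple queries in a chat-like manner. It enables a coherent conversation; without it, every query would be treated as an entirely independent input, without considering past interactions.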

The LLM with and without conversational memory. The blue boxes are user prompts and the grey boxes are the LLM's responses. Without conversational memory (right), the LLM cannot respond using knowledge of previous interactions.

The memory allows a Large Language Model (LLM) to remember previous interactions with the user. By default, LLMs are stateless — meaning each incoming query is processed independently of other interactions. The only thing that exists for a stateless agent is the current input, nothing else.

There are many applications where remembering previous interactions is very important, such as chatbots. Conversational memory allows us to do that.

There are several ways to implement conversational memory. In the context of LangChain, they are all built on top of the ConversationChain.

https://youtu.be/X05uK0TZozM?si=fvoIMy8W8ZtPueGO

ConversationChain

We can start by initializing the ConversationChain. We will use OpenAI’s text-davinci-003 as the LLM, but other models like gpt-3.5-turbo can be used.

from langchain import OpenAI
from langchain.chains import ConversationChain

# first initialize the large language model
llm = OpenAI(
	temperature=0,
	openai_api_key="OPENAI_API_KEY",
	model_name="text-davinci-003"
)

# now initialize the conversation chain
conversation = ConversationChain(llm=llm)

This code initializes a conversation chain using the OpenAI and ConversationChain classes from the langchain package. Here is what each part does:

from langchain import OpenAI: imports the OpenAI class from the langchain package. This class provides an interface to OpenAI's language models.
from langchain.chains import ConversationChain: imports the ConversationChain class, which implements a conversation chain.
llm = OpenAI(...):
- Creates an instance of the OpenAI class.
- temperature=0: sets the temperature to 0. Temperature controls how random the model's output is, and at 0 the output is as deterministic as possible.
- openai_api_key="OPENAI_API_KEY": the key used to access the OpenAI API. Replace the placeholder with your real API key.
- model_name="text-davinci-003": the name of the OpenAI model to use, here "text-davinci-003".
conversation = ConversationChain(llm=llm): creates an instance of the ConversationChain class.
- llm=llm: passes the OpenAI instance as the language model to use.

With llm and conversation initialized this way, you can run conversational tasks on top of the OpenAI language model.

Locally, I read the API key from an external file and used the gpt-3.5-turbo-instruct model.
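
Below is a minimal sketch of one way to keep the key out of the source code. The helper `load_openai_api_key` and the file name `openai_api_key.txt` are hypothetical and are not the blogger's actual setup. The function prefers the `OPENAI_API_KEY` environment variable and falls back to the file.

```python
import os
from pathlib import Path


def load_openai_api_key(path: str = "openai_api_key.txt") -> str:
    """Return the OpenAI API key without hard-coding it in the script.

    Hypothetical helper: checks the OPENAI_API_KEY environment variable first,
    then falls back to reading the key from a local text file.
    """
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        return key.strip()
    # Fall back to the file and strip the trailing newline most editors add.
    return Path(path).read_text(encoding="utf-8").strip()
```

You can then pass the result as `openai_api_key=load_openai_api_key()` together with `model_name="gpt-3.5-turbo-instruct"` when constructing `OpenAI(...)`. If the environment variable is set, LangChain's OpenAI wrapper can also read it on its own when `openai_api_key` is omitted.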

We can see the prompt template used by the ConversationChain like so:

print(conversation.prompt.template)

The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
{history}
Human: {input}
AI:

This is the default prompt used by ConversationChain().

Here, the prompt primes the model by telling it that the following is a conversation between a human (us) and an AI (text-davinci-003). The prompt attempts to reduce hallucinations (where a model makes things up) by stating:

"If the AI does not know the answer to a question, it truthfully says it does not know."

This can help but does not solve the problem of hallucinations — but we will save this for the topic of a future chapter.

Following the initial prompt, we see two parameters: {history} and {input}. The {input} is where we place the latest human query, i.e. the input entered into a chatbot text box.

The {history} is where conversational memory is used. Here, we feed in information about the conversation history between the human and AI.

These two parameters — {history} and {input} — are passed to the LLM within the prompt template we just saw, and the output that we (hopefully) return is simply the predicted continuation of the conversation.
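
The plain-Python sketch below shows the mechanics by substituting the two variables into the default template printed above. It uses `str.format` as a stand-in for LangChain's prompt template, which uses f-string-style placeholders by default. `build_prompt` is a hypothetical helper and is not part of LangChain.

```python
# Default ConversationChain template, copied from the output printed above.
TEMPLATE = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
{history}
Human: {input}
AI:"""


def build_prompt(history: str, user_input: str) -> str:
    """Fill {history} and {input} the way the chain's prompt template does."""
    return TEMPLATE.format(history=history, input=user_input)


prompt = build_prompt(
    history="Human: Good morning AI!\nAI: Good morning! How can I help you?",
    user_input="What did I just say?",
)
print(prompt)
```

The prompt ends with an open `AI:` line. The model completes that line, which is why its response is simply the predicted continuation of the conversation.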

Forms of Conversational Memory

We can use several types of conversational memory with the ConversationChain. They modify the text passed to the {history} parameter.

ConversationBufferMemory

(Follow along with our Jupyter notebooks)

https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/langchain/handbook/03-langchain-conversational-memory.ipynb#scrollTo=uZR3iGJJtdDE

==> Go here to run the notebook directly in Colab.

The ConversationBufferMemory is the most straightforward conversational memory in LangChain. As described above, the past conversation between the human and AI is passed, in its raw form, to the {history} parameter.

from langchain.chains.conversation.memory import ConversationBufferMemory

conversation_buf = ConversationChain(
    llm=llm,
    memory=ConversationBufferMemory()
)

conversation_buf("Good morning AI!")

{'input': 'Good morning AI!',
 'history': '',
 'response': " Good morning! It's a beautiful day today, isn't it? How can I help you?"}

This code uses the ConversationBufferMemory class from the langchain package to initialize a conversation chain that stores the conversation history. It then uses that chain to process the sentence "Good morning AI!". Here is what each part does:

from langchain.chains.conversation.memory import ConversationBufferMemory: imports the ConversationBufferMemory class. It implements a memory that stores the conversation history for use in a conversation chain.
conversation_buf = ConversationChain(llm=llm, memory=ConversationBufferMemory()): creates a ConversationChain instance.
- llm=llm: passes the OpenAI instance as the language model to use.
- memory=ConversationBufferMemory(): passes a ConversationBufferMemory instance as the memory that stores the conversation history, so earlier turns are remembered during the conversation.
conversation_buf("Good morning AI!"): passes a sentence to the conversation_buf object for processing. The exchange is added to the conversation history, and the language model can use the earlier history to generate an appropriate response.

With conversation_buf initialized this way, you can use the language model while keeping the earlier turns of the conversation.

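Here is the result of running it locally.

Next, let's remove the memory argument and run it again.

The result is the same. When no memory is passed, ConversationChain falls back to a default one (a ConversationBufferMemory).

For now, let's follow the handbook and pass the memory explicitly in the code that follows.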

We return the first response from the conversational agent. Let’s continue the conversation, writing prompts that the LLM can only answer if it considers the conversation history. We also add a count_tokens function so we can see how many tokens are being used by each interaction.

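from langchain.callbacks import get_openai_callback

def count_tokens(chain, query):
    # get_openai_callback() tracks token usage of OpenAI calls made inside the with-block
    with get_openai_callback() as cb:
        result = chain.run(query)
        # total_tokens = prompt tokens + completion tokens for this call
        print(f'Spent a total of {cb.total_tokens} tokens')

    return result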

count_tokens(
    conversation_buf, 
    "My interest here is to explore the potential of integrating Large Language Models with external knowledge"
)

Spent a total of 179 tokens

' Interesting! Large Language Models are a type of artificial intelligence that can process natural language and generate text. They can be used to generate text from a given context, or to answer questions about a given context. Integrating them with external knowledge can help them to better understand the context and generate more accurate results. Is there anything else I can help you with?'

This code defines and calls a function that uses get_openai_callback from the langchain package to count and print the tokens used while running a conversation chain. Here is what each part does:

from langchain.callbacks import get_openai_callback: imports the get_openai_callback function. It creates a callback handler for OpenAI API calls, which tracks information about the calls made while a block of code runs.
def count_tokens(chain, query): ...: defines a function named count_tokens. It takes a chain and a query, runs the query, and prints the number of tokens used.
with get_openai_callback() as cb: ...: calls get_openai_callback and binds the handler to cb. Inside the with block, information about API calls is tracked.
result = chain.run(query): runs the chain on the given query.
print(f'Spent a total of {cb.total_tokens} tokens'): prints the total number of tokens (prompt plus completion) used during the API calls, read from cb.total_tokens.
return result: returns the chain's output.
count_tokens(conversation_buf, "..."): calls count_tokens with the conversation_buf chain and a query, and prints the total tokens used. Here the query is "My interest here is to explore the potential of integrating Large Language Models with external knowledge".

This is a simple example of counting and printing the tokens used while running a conversation chain.

Here is the result of running it locally.

Now we pass a new input.

count_tokens(
    conversation_buf,
    "I just want to analyze the different possibilities. What can you think of?"
)

Spent a total of 268 tokens

' Well, integrating Large Language Models with external knowledge can open up a lot of possibilities. For example, you could use them to generate more accurate and detailed summaries of text, or to answer questions about a given context more accurately. You could also use them to generate more accurate translations, or to generate more accurate predictions about future events.'

The token count keeps growing because the history keeps accumulating.
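
The sketch below shows roughly why the count climbs. With buffer memory, each new prompt contains every previous exchange plus the new input. It makes two simplifying assumptions. First, word counts stand in for tokens, and OpenAI's tokenizer counts differently. Second, the three exchanges are shortened, made-up versions of the conversation above.

```python
def approx_tokens(text: str) -> int:
    """Crude token proxy: number of whitespace-separated words."""
    return len(text.split())


# Shortened, illustrative exchanges (not the real model outputs).
turns = [
    ("Good morning AI!", "Good morning! How can I help you?"),
    ("I want to explore LLMs with external knowledge", "Interesting! Tell me more."),
    ("What is my aim again?", "To explore LLMs with external knowledge."),
]

history_lines = []
prompt_sizes = []
for human, ai in turns:
    # With buffer memory, the prompt holds all earlier exchanges plus the new input.
    prompt = "\n".join(history_lines + [f"Human: {human}", "AI:"])
    prompt_sizes.append(approx_tokens(prompt))
    # After the model answers, the full exchange is appended to the history.
    history_lines += [f"Human: {human}", f"AI: {ai}"]

print(prompt_sizes)
```

This prints `[5, 22, 33]`. Each prompt is larger than the last by the size of the previous exchange, which matches the pattern in the measured token counts.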

Let's look at the result of running it locally.

Let's continue the conversation.

count_tokens(
    conversation_buf, 
    "Which data source types could be used to give context to the model?"
)

Spent a total of 360 tokens

'  There are a variety of data sources that could be used to give context to a Large Language Model. These include structured data sources such as databases, unstructured data sources such as text documents, and even audio and video data sources. Additionally, you could use external knowledge sources such as Wikipedia or other online encyclopedias to provide additional context.'

count_tokens(
    conversation_buf, 
    "What is my aim again?"
)

Spent a total of 388 tokens

' Your aim is to explore the potential of integrating Large Language Models with external knowledge.'

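Here are the results from running it locally.

An LLM can give slightly different answers even to the same question, so the results differ somewhat. The model used in the handbook also differs from the one I used locally, which changes the answers as well.

In any case, the number of input tokens grows with every question.

That is because the earlier conversation keeps piling up in the history.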

The LLM can clearly remember the history of the conversation. Let’s take a look at how this conversation history is stored by the ConversationBufferMemory:

print(conversation_buf.memory.buffer)

Human: Good morning AI!
AI:  Good morning! It's a beautiful day today, isn't it? How can I help you?
Human: My interest here is to explore the potential of integrating Large Language Models with external knowledge
AI:  Interesting! Large Language Models are a type of artificial intelligence that can process natural language and generate text. They can be used to generate text from a given context, or to answer questions about a given context. Integrating them with external knowledge can help them to better understand the context and generate more accurate results. Is there anything else I can help you with?
Human: I just want to analyze the different possibilities. What can you think of?
AI:  Well, integrating Large Language Models with external knowledge can open up a lot of possibilities. For example, you could use them to generate more accurate and detailed summaries of text, or to answer questions about a given context more accurately. You could also use them to generate more accurate translations, or to generate more accurate predictions about future events.
Human: Which data source types could be used to give context to the model?
AI:   There are a variety of data sources that could be used to give context to a Large Language Model. These include structured data sources such as databases, unstructured data sources such as text documents, and even audio and video data sources. Additionally, you could use external knowledge sources such as Wikipedia or other online encyclopedias to provide additional context.
Human: What is my aim again?
AI:  Your aim is to explore the potential of integrating Large Language Models with external knowledge.
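
Conceptually, the buffer printed above is just every exchange appended verbatim. The class below is a simplified, hypothetical sketch of that idea. Its method names echo LangChain's `save_context` and `buffer`, but the signatures are simplified and it is not the library's implementation.

```python
class ToyBufferMemory:
    """Stores every human/AI exchange verbatim and replays it as {history}."""

    def __init__(self, human_prefix: str = "Human", ai_prefix: str = "AI"):
        self.human_prefix = human_prefix
        self.ai_prefix = ai_prefix
        self.turns = []  # list of (human, ai) tuples, oldest first

    def save_context(self, user_input: str, ai_output: str) -> None:
        # Nothing is dropped or condensed: the raw exchange is kept as-is.
        self.turns.append((user_input, ai_output))

    @property
    def buffer(self) -> str:
        lines = []
        for human, ai in self.turns:
            lines.append(f"{self.human_prefix}: {human}")
            lines.append(f"{self.ai_prefix}: {ai}")
        return "\n".join(lines)


memory = ToyBufferMemory()
memory.save_context("Good morning AI!", "Good morning! How can I help you?")
memory.save_context("What is my aim again?", "You haven't told me yet.")
print(memory.buffer)
```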

We can see that the buffer saves every interaction in the chat history directly. There are a few pros and cons to this approach. In short, they are:

Pros	Cons
Storing everything gives the LLM the maximum amount of information	More tokens mean slower response times and higher costs
Storing everything is simple and intuitive	Long conversations cannot be remembered as we hit the LLM token limit (4096 tokens for text-davinci-003 and gpt-3.5-turbo)

The ConversationBufferMemory is an excellent option to get started with but is limited by the storage of every interaction. Let’s take a look at other options that help remedy this.

ConversationSummaryMemory

With ConversationBufferMemory, we use a lot of tokens very quickly and can eventually exceed the context window limit of even the most advanced LLMs available today.

To avoid excessive token usage, we can use ConversationSummaryMemory. As the name would suggest, this form of memory summarizes the conversation history before it is passed to the {history} parameter.

We initialize the ConversationChain with the summary memory like so:

from langchain.chains.conversation.memory import ConversationSummaryMemory

conversation_sum = ConversationChain(
	llm=llm,
	memory=ConversationSummaryMemory(llm=llm)
)

This code uses the ConversationSummaryMemory class from the langchain package to initialize a conversation chain that stores a summary of the conversation history. Here is what each part does:

from langchain.chains.conversation.memory import ConversationSummaryMemory: imports the ConversationSummaryMemory class. It implements a memory that summarizes and stores the conversation history for use in a conversation chain.
conversation_sum = ConversationChain(llm=llm, memory=ConversationSummaryMemory(llm=llm)): creates a ConversationChain instance.
- llm=llm: passes the OpenAI instance as the language model to use.
- memory=ConversationSummaryMemory(llm=llm): passes a ConversationSummaryMemory instance as the memory. It also receives the language model (llm), which it uses to write the summaries.

As the chain runs, this object stores a summary of the conversation, so earlier turns remain available in condensed form.

print(conversation_sum.memory.prompt.template)

Progressively summarize the lines of conversation provided, adding onto the previous summary returning a new summary.

EXAMPLE
Current summary:
The human asks what the AI thinks of artificial intelligence. The AI thinks artificial intelligence is a force for good.

New lines of conversation:
Human: Why do you think artificial intelligence is a force for good?
AI: Because artificial intelligence will help humans reach their full potential.

New summary:
The human asks what the AI thinks of artificial intelligence. The AI thinks artificial intelligence is a force for good because it will help humans reach their full potential.
END OF EXAMPLE

Current summary:
{summary}

New lines of conversation:
{new_lines}

New summary:

This is the default prompt provided by ConversationSummaryMemory().

It instructs the LLM to summarize the conversation and even includes a worked example.

Here is the result of running it locally. As of November 14, 2023, this prompt is still in use unchanged.

Using this, we can summarize every new interaction and append it to a “running summary” of all past interactions. Let’s have another conversation utilizing this approach.
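
Here is a minimal sketch of this running-summary loop. It assumes a pluggable `summarize` function in place of the LLM call. `ToySummaryMemory` and `fake_summarizer` are hypothetical. The stub only records what the human said, so the example is deterministic. The real memory sends the full prompt, including the worked EXAMPLE omitted here, to the LLM.

```python
from typing import Callable

# Summarization prompt from above, with the worked EXAMPLE section omitted for brevity.
SUMMARY_TEMPLATE = (
    "Progressively summarize the lines of conversation provided, "
    "adding onto the previous summary returning a new summary.\n\n"
    "Current summary:\n{summary}\n\n"
    "New lines of conversation:\n{new_lines}\n\n"
    "New summary:"
)


class ToySummaryMemory:
    """Keeps one running summary instead of the raw transcript."""

    def __init__(self, summarize: Callable[[str], str]):
        self.summarize = summarize  # stands in for the summarization LLM call
        self.summary = ""  # this string is what would be passed as {history}

    def save_context(self, user_input: str, ai_output: str) -> None:
        new_lines = f"Human: {user_input}\nAI: {ai_output}"
        prompt = SUMMARY_TEMPLATE.format(summary=self.summary, new_lines=new_lines)
        self.summary = self.summarize(prompt)


def fake_summarizer(prompt: str) -> str:
    """Deterministic stub: appends the human's line to the current summary."""
    current = prompt.split("Current summary:\n", 1)[1].split("\n\nNew lines", 1)[0]
    new_lines = prompt.split("New lines of conversation:\n", 1)[1]
    human = new_lines.splitlines()[0].removeprefix("Human: ")
    return f"{current} The human said '{human}'.".strip()


memory = ToySummaryMemory(fake_summarizer)
memory.save_context("Good morning AI!", "Good morning!")
memory.save_context("My aim is to explore LLMs with external knowledge", "Sounds interesting!")
print(memory.summary)
```

In the real memory, every `save_context` triggers an extra LLM request. This is why summary memory spends more tokens than the buffer on short conversations.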

# without count_tokens we'd call `conversation_sum("Good morning AI!")`
# but let's keep track of our tokens:
count_tokens(
    conversation_sum, 
    "Good morning AI!"
)

Spent a total of 290 tokens

" Good morning! It's a beautiful day today, isn't it? How can I help you?"

count_tokens(
    conversation_sum, 
    "My interest here is to explore the potential of integrating Large Language Models with external knowledge"
)

Spent a total of 440 tokens

" That sounds like an interesting project! I'm familiar with Large Language Models, but I'm not sure how they could be integrated with external knowledge. Could you tell me more about what you have in mind?"

count_tokens(
    conversation_sum, 
    "I just want to analyze the different possibilities. What can you think of?"
)

Spent a total of 664 tokens

' I can think of a few possibilities. One option is to use a large language model to generate a set of candidate answers to a given query, and then use external knowledge to filter out the most relevant answers. Another option is to use the large language model to generate a set of candidate answers, and then use external knowledge to score and rank the answers. Finally, you could use the large language model to generate a set of candidate answers, and then use external knowledge to refine the answers.'

count_tokens(
    conversation_sum, 
    "Which data source types could be used to give context to the model?"
)

Spent a total of 799 tokens

' There are many different types of data sources that could be used to give context to the model. These could include structured data sources such as databases, unstructured data sources such as text documents, or even external APIs that provide access to external knowledge. Additionally, the model could be trained on a combination of these data sources to provide a more comprehensive understanding of the context.'

count_tokens(
    conversation_sum, 
    "What is my aim again?"
)

Spent a total of 853 tokens

' Your aim is to explore the potential of integrating Large Language Models with external knowledge.'

Here is the result of running it locally.

In this case, the summary contains enough information for the LLM to “remember” our original aim. We can see this summary in its raw form like so:

print(conversation_sum.memory.buffer)

The human greeted the AI with a good morning, to which the AI responded with a good morning and asked how it could help. The human expressed interest in exploring the potential of integrating Large Language Models with external knowledge, to which the AI responded positively and asked for more information. The human asked the AI to think of different possibilities, and the AI suggested three options: using the large language model to generate a set of candidate answers and then using external knowledge to filter out the most relevant answers, score and rank the answers, or refine the answers. The human then asked which data source types could be used to give context to the model, to which the AI responded that there are many different types of data sources that could be used, such as structured data sources, unstructured data sources, or external APIs. Additionally, the model could be trained on a combination of these data sources to provide a more comprehensive understanding of the context. The human then asked what their aim was again, to which the AI responded that their aim was to explore the potential of integrating Large Language Models with external knowledge.

Result of the local run.

The number of tokens being used for this conversation is greater than when using the ConversationBufferMemory, so is there any advantage to using ConversationSummaryMemory over the buffer memory?

Token count (y-axis) for the buffer memory vs. summary memory as the number of interactions (x-axis) increases.

For longer conversations, yes. The comparison above uses a longer conversation. The summary memory initially uses far more tokens, but as the conversation progresses, its token count grows more slowly. In contrast, the buffer memory keeps growing linearly with the number of tokens in the chat.

Pros	Cons
Shortens the number of tokens for long conversations	Can result in higher token usage for smaller conversations
Enables much longer conversations	Memorization of the conversation history is wholly reliant on the summarization ability of the intermediate summarization LLM
Relatively straightforward implementation, intuitively simple to understand	Also requires token usage for the summarization LLM; this increases costs (but does not limit conversation length)

Conversation summarization is a good approach for cases where long conversations are expected. Yet, it is still fundamentally limited by token limits. After a certain amount of time, we still exceed context window limits.

ConversationBufferWindowMemory

The ConversationBufferWindowMemory acts in the same way as our earlier “buffer memory” but adds a window to the memory. This means we only keep a given number of past interactions before “forgetting” them. We use it like so:

from langchain.chains.conversation.memory import ConversationBufferWindowMemory

conversation_bufw = ConversationChain(
	llm=llm,
	memory=ConversationBufferWindowMemory(k=1)
)

This code uses the ConversationBufferWindowMemory class from the langchain package to initialize a conversation chain with a windowed memory, one that keeps only the most recent exchanges. Here is what each part does:

from langchain.chains.conversation.memory import ConversationBufferWindowMemory: imports the ConversationBufferWindowMemory class. It implements a memory that windows the conversation history, keeping only the most recent exchanges, for use in a conversation chain.
conversation_bufw = ConversationChain(llm=llm, memory=ConversationBufferWindowMemory(k=1)): creates a ConversationChain instance.
- llm=llm: passes the OpenAI instance as the language model to use.
- memory=ConversationBufferWindowMemory(k=1): passes a ConversationBufferWindowMemory instance as the memory. k is the window size; with k=1, only the single most recent exchange is kept.

As the chain runs, this object keeps the conversation in a windowed memory, so only the most recent exchanges remain available.

In this instance, we set k=1 — this means the window will remember the single latest interaction between the human and AI. That is the latest human response and the latest AI response. We can see the effect of this below:
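
A bounded queue makes the windowing easy to picture. The sketch below is hypothetical and uses `collections.deque(maxlen=k)`. It is not LangChain's implementation, but the effect on `{history}` is the same: older exchanges simply drop out.

```python
from collections import deque


class ToyWindowMemory:
    """Keeps only the last k human/AI exchanges."""

    def __init__(self, k: int = 1):
        # A deque with maxlen discards its oldest item once it is full.
        self.turns = deque(maxlen=k)

    def save_context(self, user_input: str, ai_output: str) -> None:
        self.turns.append((user_input, ai_output))

    def history(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


memory = ToyWindowMemory(k=1)
memory.save_context("My aim is to explore LLMs with external knowledge", "Interesting!")
memory.save_context("Which data sources could give context?", "Databases, documents and APIs.")
print(memory.history())
```

With k=1, the exchange that stated the aim is already gone after the second turn. The same thing happens in the conversation below.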

count_tokens(
    conversation_bufw, 
    "Good morning AI!"
)

Spent a total of 85 tokens

" Good morning! It's a beautiful day today, isn't it? How can I help you?"

count_tokens(
    conversation_bufw, 
    "My interest here is to explore the potential of integrating Large Language Models with external knowledge"
)

Spent a total of 178 tokens

' Interesting! Large Language Models are a type of artificial intelligence that can process natural language and generate text. They can be used to generate text from a given context, or to answer questions about a given context. Integrating them with external knowledge can help them to better understand the context and generate more accurate results. Do you have any specific questions about this integration?'

count_tokens(
    conversation_bufw, 
    "I just want to analyze the different possibilities. What can you think of?"
)

Spent a total of 233 tokens

' There are many possibilities for integrating Large Language Models with external knowledge. For example, you could use external knowledge to provide additional context to the model, or to provide additional training data. You could also use external knowledge to help the model better understand the context of a given text, or to help it generate more accurate results.'

count_tokens(
    conversation_bufw, 
    "Which data source types could be used to give context to the model?"
)

Spent a total of 245 tokens

' Data sources that could be used to give context to the model include text corpora, structured databases, and ontologies. Text corpora provide a large amount of text data that can be used to train the model and provide additional context. Structured databases provide structured data that can be used to provide additional context to the model. Ontologies provide a structured representation of knowledge that can be used to provide additional context to the model.'

count_tokens(
    conversation_bufw, 
    "What is my aim again?"
)

Spent a total of 186 tokens

' Your aim is to use data sources to give context to the model.'

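Here is the result of running it locally.

Locally, the model answers that it does not know my aim. In the handbook, it instead inferred an answer from the most recent exchange.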

By the end of the conversation, when we ask "What is my aim again?", the answer to this was contained in the human response three interactions ago. As we only kept the most recent interaction (k=1), the model had forgotten and could not give the correct answer.

We can see the effective “memory” of the model like so:

# load_memory_variables expects a dict of inputs; the window memory needs none
bufw_history = conversation_bufw.memory.load_memory_variables(
    inputs={}
)['history']

This code loads the conversation history currently held by the chain's ConversationBufferWindowMemory. Here is what each part does:

conversation_bufw.memory.load_memory_variables(...)['history']:
- conversation_bufw: the ConversationChain object initialized earlier.
- memory: the ConversationBufferWindowMemory instance used by the chain. It holds the most recent exchanges.
- load_memory_variables(inputs=...): calls the method that loads the memory variables. This memory needs no inputs to return its history, so an empty value is passed. Because the parameter is typed as a dict, {} is the idiomatic choice.
- ['history']: selects the variable to retrieve, here 'history', the conversation history.

As a result, bufw_history holds the conversation history kept so far, which you can inspect or reuse.

print(bufw_history)

Human: What is my aim again?
AI:  Your aim is to use data sources to give context to the model.

Result of the local run.

Although this method isn’t suitable for remembering distant interactions, it is good at limiting the number of tokens being used — a number that we can increase/decrease depending on our needs. For the longer conversation used in our earlier comparison, we can set k=6 and reach ~1.5K tokens per interaction after 27 total interactions:

Token count including the ConversationBufferWindowMemory at k=6 and k=12.

If we only need memory of recent interactions, this is a great option. However, for a mix of both distant and recent interactions, there are other options.
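
Simple arithmetic illustrates the bound. Assume, purely for illustration, that every exchange costs the same 100 tokens. The buffer's prompt then grows with every turn, while a window of k exchanges caps the history at k × 100 tokens. `prompt_sizes` is a hypothetical helper, and its numbers are not measurements.

```python
from typing import List, Optional


def prompt_sizes(exchange_tokens: List[int], k: Optional[int] = None) -> List[int]:
    """History tokens + current exchange tokens for each turn.

    k=None models buffer memory (keep everything); an integer k models a
    window that keeps only the last k exchanges.
    """
    sizes = []
    for i, current in enumerate(exchange_tokens):
        history = exchange_tokens[:i] if k is None else exchange_tokens[max(0, i - k):i]
        sizes.append(sum(history) + current)
    return sizes


exchanges = [100] * 27  # assumed: 27 exchanges of 100 tokens each
buffer_sizes = prompt_sizes(exchanges)
window_sizes = prompt_sizes(exchanges, k=6)
print(buffer_sizes[-1], window_sizes[-1])
```

This prints `2700 700`. The two strategies are identical for the first k+1 turns. After that, the window version stays flat while the buffer keeps climbing.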

ConversationSummaryBufferMemory

The ConversationSummaryBufferMemory is a mix of the ConversationSummaryMemory and the ConversationBufferWindowMemory. It summarizes the earliest interactions in a conversation while keeping the most recent interactions, up to max_token_limit tokens, in their raw form. It is initialized like so:

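from langchain.chains.conversation.memory import ConversationSummaryBufferMemory

conversation_sum_bufw = ConversationChain(
    llm=llm,
    memory=ConversationSummaryBufferMemory(
        llm=llm,
        # recent exchanges are kept raw up to this many tokens; older ones get summarized
        max_token_limit=650
    )
)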

When applying this to our earlier conversation, we can set max_token_limit to a small number and yet the LLM can remember our earlier “aim”.

This is because that information is captured by the “summarization” component of the memory, despite being missed by the “buffer window” component.
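
The hypothetical sketch below shows that combination. It assumes word counts as the token measure and a stub summarizer. Recent exchanges stay verbatim until the buffer exceeds `max_token_limit`, and the oldest ones are then folded into the summary. This only illustrates the idea and is not LangChain's `ConversationSummaryBufferMemory` implementation.

```python
from typing import Callable, List, Tuple


class ToySummaryBufferMemory:
    """Raw recent exchanges plus a running summary of older ones."""

    def __init__(self, summarize: Callable[[str, str], str], max_token_limit: int):
        self.summarize = summarize  # (old_summary, pruned_lines) -> new summary
        self.max_token_limit = max_token_limit
        self.summary = ""
        self.turns: List[Tuple[str, str]] = []

    @staticmethod
    def _tokens(text: str) -> int:
        return len(text.split())  # word count as a crude token proxy

    def _buffer_text(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)

    def save_context(self, user_input: str, ai_output: str) -> None:
        self.turns.append((user_input, ai_output))
        pruned = []
        # Move the oldest exchanges out of the raw buffer until it fits the limit
        # (this sketch always keeps at least the latest exchange).
        while len(self.turns) > 1 and self._tokens(self._buffer_text()) > self.max_token_limit:
            h, a = self.turns.pop(0)
            pruned.append(f"Human: {h}\nAI: {a}")
        if pruned:
            self.summary = self.summarize(self.summary, "\n".join(pruned))

    def history(self) -> str:
        parts = [f"Summary: {self.summary}"] if self.summary else []
        return "\n".join(parts + [self._buffer_text()])


def keep_human_lines(old_summary: str, pruned_lines: str) -> str:
    """Stub summarizer: keeps only what the human said in the pruned exchanges."""
    said = [line.removeprefix("Human: ") for line in pruned_lines.splitlines()
            if line.startswith("Human: ")]
    return "; ".join(([old_summary] if old_summary else []) + said)


memory = ToySummaryBufferMemory(keep_human_lines, max_token_limit=15)
memory.save_context("My aim is to explore LLMs with external knowledge", "Interesting!")
memory.save_context("Which data sources could give context?", "Databases and documents.")
memory.save_context("What is my aim again?", "To explore LLMs with external knowledge.")
print(memory.history())
```

The exchange that stated the aim has left the raw buffer, but the aim still reaches `{history}` through the summary line.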

Naturally, the pros and cons of this component are a mix of the earlier components on which this is based.

Pros	Cons
Summarizer means we can remember distant interactions	Summarizer increases token count for shorter conversations
Buffer prevents us from missing information from the most recent interactions	Storing the raw interactions, even if just the most recent interactions, increases token count

Although requiring more tweaking on what to summarize and what to maintain within the buffer window, the ConversationSummaryBufferMemory does give us plenty of flexibility and is the only one of our memory types (so far) that allows us to remember distant interactions and store the most recent interactions in their raw — and most information-rich — form.

Token count comparisons including the ConversationSummaryBufferMemory type with max_token_limit values of 650 and 1300.

We can also see that despite including a summary of past interactions and the raw form of recent interactions — the increase in token count of ConversationSummaryBufferMemory is competitive with other methods.

Other Memory Types

The memory types we have covered here are great for getting started and give a good balance between remembering as much as possible and minimizing tokens.

However, we have other options — particularly the ConversationKnowledgeGraphMemory and ConversationEntityMemory. We’ll give these different forms of memory the attention they deserve in upcoming chapters.

That’s it for this introduction to conversational memory for LLMs using LangChain. As we’ve seen, there are plenty of options for helping stateless LLMs interact as if they were in a stateful environment — able to consider and refer back to past interactions.

As mentioned, there are other forms of memory we can cover. We can also implement our own memory modules, use multiple types of memory within the same chain, combine them with agents, and much more. All of which we will cover in future chapters.
