Conversational memory is an essential ingredient when building robust, interactive chatbots. In this article, we will learn how to implement conversational memory in LangChain and explore the different types of memory.
What is Conversational Memory?
Conversational memory refers to an AI agent's ability to recall and use information from previous exchanges in the current conversation. It is critical for maintaining coherence and context in a dialogue.
Conversational agents widely utilize short-term memory, also known as episodic memory. This memory stores information from previous exchanges and allows the agent to refer back to earlier points in the dialogue. It helps the agent maintain a consistent understanding of the discussion and avoid repetitive or inconsistent answers.
Conversational bots utilize long-term, or semantic, memory as another form of memory. This entails storing and retrieving information and facts that go beyond the scope of a single conversation. The agent enhances the dialogue by recalling user preferences, personal information, and general knowledge.
Long-term memory, which can be structured as a knowledge graph or unstructured like a collection of documents or a neural network-based representation, enables the agent to store and retrieve this information.
Conversational memory also includes the capacity to resolve coreferences: the task of finding the referent of a pronoun or noun phrase in a discussion. For example, if a user asks "What are vector databases?" and then follows up with "How are they different from traditional ones?", the agent must resolve "they" to vector databases. To resolve coreferences correctly and offer meaningful replies, a good conversational agent must track and remember the entities and concepts mentioned during the interaction.
Practical Implementation with LangChain
We will utilize the LangChain library in conjunction with the OpenAI and tiktoken libraries. These libraries will enable us to implement and experiment with different conversational memory techniques.
Let’s start with the basic imports:
import inspect
from getpass import getpass

import tiktoken
from langchain import OpenAI
from langchain.chains import LLMChain, ConversationChain
from langchain.chains.conversation.memory import (
    ConversationBufferMemory,
    ConversationSummaryMemory,
    ConversationBufferWindowMemory,
    ConversationKGMemory,
)
from langchain.callbacks import get_openai_callback

llm = OpenAI(
    openai_api_key="Your OpenAI API Key",
    model_name="text-davinci-003",
    temperature=0,
)
Then, as always, we need a function to manage and track our token usage, just like the one we used in the article about Chains in LangChain.
def count_tokens(chain, query):
    # Run the chain inside LangChain's OpenAI callback context so we can
    # report how many tokens the call consumed
    with get_openai_callback() as cb:
        result = chain.run(query)
        print(f"spent a total of {cb.total_tokens} tokens\n")
    return result
First we define an LLM agent:
conversation = ConversationChain(
    llm=llm,
)
So how does this agent work? To find out, let's inspect its prompt:
print(conversation.prompt.template)
Output:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.
Current conversation:
{history}
Human: {input}
AI:
Interesting!
The prompt instructs the chain to engage in conversation with the user and make a genuine attempt to provide truthful responses. Upon closer examination, we notice a new element in the prompt that was not present when interacting with the LLMBashChain: {history}. This is where our memory comes into play.
If you want to build chatbots for your website, we have created a tool called ChatClient for building conversational AI chatbots like ChatGPT.
Role of Memory in LLMs
Large language models, such as ChatGPT, are stateless by default. The model treats each incoming query as an independent request and retains no memory of past interactions; it only sees the current input and lacks the context of previous exchanges.
However, we can enhance their memory capabilities by implementing conversational memory strategies, which we will delve into in more detail in this article.
In essence, each memory strategy transforms the conversation in some way and injects the result into the {history} placeholder of the prompt.
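To make this concrete, here is a rough sketch of what happens under the hood. This is illustrative, not LangChain's actual internals: the memory renders past turns as text, and the chain substitutes that text into the {history} slot before calling the LLM.

# Illustrative only: how a memory-rendered transcript ends up in the prompt
history = (
    "Human: Welcome to LancerNinja!\n"
    "AI: Hi there! How can I help you today?"
)
new_input = "What did I just say?"

prompt = (
    "The following is a friendly conversation between a human and an AI. "
    "The AI is talkative and provides lots of specific details from its "
    "context. If the AI does not know the answer to a question, it "
    "truthfully says it does not know.\n\n"
    f"Current conversation:\n{history}\n"
    f"Human: {new_input}\nAI:"
)
print(prompt)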
Types of Memories
Conversational Buffer Memory
The conversational buffer memory works by preserving the complete text of previous conversations in its original, unprocessed format. It is the most fundamental form of conversational memory.
This implies that without any preprocessing or summarization, all inputs and outputs from past interactions are retained and included in the model’s history parameter. The conversational buffer memory technique guarantees that all information exchanged during the conversation is saved and available for the model to utilize.
The capacity of conversational buffer memory to record the entire context of the dialogue is its key benefit. By adding all previous interactions, the model may get a thorough picture of the existing conversation and develop replies that take the whole conversation history into consideration. Because the model has access to all relevant information, it can provide more coherent and contextually appropriate replies.
However, there are a number of drawbacks to employing conversational buffer memory. One constraint is the possible impact on response times: the model's processing time increases as the conversation history grows longer, resulting in slower responses. This can be an issue in real-time conversational applications where latency is critical.
Additionally, retaining the entire chat history requires additional processing resources and storage space, leading to increased costs. Conversational buffer memory models require more memory and processing capacity to accommodate larger input sizes, which can be a significant constraint in contexts with limited resources. Furthermore, there are token limitations to consider.
Most language models have a maximum token limit for input sequences, which can be exceeded when including the entire conversation history. As a result, certain portions of the dialogue may need to be truncated or omitted, potentially resulting in the loss of crucial context or valuable information.
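Since we imported tiktoken earlier, we can sanity-check such limits ourselves. Below is a rough sketch, assuming text-davinci-003's roughly 4,097-token context window (shared between prompt and completion); the fits_in_context helper is purely illustrative:

# Illustrative helper: check whether a prompt still fits in the model's
# context window while leaving room for the completion
encoding = tiktoken.encoding_for_model("text-davinci-003")
MAX_CONTEXT = 4097  # approximate prompt + completion budget for text-davinci-003

def fits_in_context(prompt: str, completion_budget: int = 256) -> bool:
    return len(encoding.encode(prompt)) + completion_budget <= MAX_CONTEXT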
Let’s observe this in practice.
conversation_buf = ConversationChain(
    llm=llm,
    memory=ConversationBufferMemory()
)
print(conversation_buf("Welcome to LancerNinja!"))
Output:
{'input': 'Welcome to LancerNinja!', 'history': '', 'response': " Hi there! I'm LancerNinja, your friendly AI assistant. How can I help you today?"}
This one call used 90 tokens, although we can't tell that from the output above. If we want to count the number of tokens used, we simply pass our conversation chain object and the message to the count_tokens function we defined earlier. To see how conversational memory works, let's talk with the agent a bit:
count_tokens(
    conversation_buf,
    "My interest here is to Learn all about vector databases"
)
Output:
spent a total of 170 tokens
' Sure thing! Vector databases are a type of database that stores data in the form of vectors. They are used to store and query large amounts of data quickly and efficiently. Vector databases are often used in machine learning applications, as they can quickly process large amounts of data. Do you have any other questions about vector databases?'
count_tokens(
    conversation_buf,
    "How are they different from traditional databases?"
)
count_tokens(
    conversation_buf,
    "what are we talking about again?"
)
Output:
spent a total of 246 tokens
' Vector databases are different from traditional databases in that they store data in the form of vectors, rather than in the form of tables. Vector databases are also optimized for fast data processing, which makes them ideal for machine learning applications. Traditional databases, on the other hand, are optimized for data storage and retrieval.'
spent a total of 306 tokens
' We are talking about vector databases, which are a type of database that stores data in the form of vectors. They are used to store and query large amounts of data quickly and efficiently, and are often used in machine learning applications.'
Now let’s see the memory buffer:
print(conversation_buf.memory.buffer)
Human: Welcome to LancerNinja!
AI: Hi there! I'm LancerNinja, your friendly AI assistant. How can I help you today?
Human: My interest here is to Learn all about vector databases
AI: Sure thing! Vector databases are a type of database that stores data in the form of vectors. They are used to store and query large amounts of data quickly and efficiently. Vector databases are often used in machine learning applications, as they can quickly process large amounts of data. Do you have any other questions about vector databases?
Human: How are they different from traditional databases?
AI: Vector databases are different from traditional databases in that they store data in the form of vectors, rather than in the form of tables. Vector databases are also optimized for fast data processing, which makes them ideal for machine learning applications. Traditional databases, on the other hand, are optimized for data storage and retrieval.
Human: what are we talking about again?
AI: We are talking about vector databases, which are a type of database that stores data in the form of vectors. They are used to store and query large amounts of data quickly and efficiently, and are often used in machine learning applications.
As we can see, this memory simply stores the conversation exactly as it happened, without any modification.
Now let’s move to another type of memory!
Conversation Summary Memory
The conversation summary memory approach offers a way to overcome the constraints of conversational buffer memory. Rather than keeping the complete conversation history in its raw form, this method summarizes previous interactions before passing them to the model's history parameter.
The conversation summary memory strategy seeks to prevent excessive token consumption, which can result in longer response times and even premature conversation termination. By summarizing the dialogue, the model can retain the key information while greatly reducing the number of tokens required.
Typically, the summary captures the essential points, critical background, and pertinent facts from the conversation history. This condensed representation preserves the most important components of the dialogue, allowing the model to provide relevant replies.
As new interactions occur, the summary is gradually updated to include the most recent information. This guarantees that the model gets access to the most recent context and may adjust its answers as needed.
It is, therefore, critical to strike a balance between summary length and token efficiency. While shorter summaries use fewer tokens, they may miss the important context, resulting in less accurate replies. Longer summaries, on the other hand, may approach or surpass the token limit, necessitating truncation and risking the loss of critical information.
Let’s understand it with code:
conversation_sum = ConversationChain(
    llm=llm,
    memory=ConversationSummaryMemory(llm=llm)
)
We pass the same LLM instance to the memory so it can generate summaries of the conversation. If we inspect its prompt:
print(conversation_sum.memory.prompt.template)
Output:
Progressively summarize the lines of conversation provided, adding onto the previous summary returning a new summary.
EXAMPLE
Current summary:
The human asks what the AI thinks of artificial intelligence. The AI thinks artificial intelligence is a force for good.
New lines of conversation:
Human: Why do you think artificial intelligence is a force for good?
AI: Because artificial intelligence will help humans reach their full potential.
New summary:
The human asks what the AI thinks of artificial intelligence. The AI thinks artificial intelligence is a force for good because it will help humans reach their full potential.
END OF EXAMPLE
Current summary:
{summary}
New lines of conversation:
{new_lines}
New summary:
So each new interaction is summarized and appended to a running summary as the memory of our chain. Let’s see how this works in practice:
count_tokens(conversation_sum, "Welcome to LancerNinja!")
spent a total of 293 tokens
" Hi there! I'm LancerNinja, your friendly AI assistant. How can I help you today?"
count_tokens(
    conversation_sum,
    "My interest here is to Learn all about vector databases"
)
spent a total of 531 tokens
" Hi there! I'm LancerNinja, your AI assistant. I'm here to help you learn all about vector databases. Vector databases are a type of database that stores data in the form of vectors. They are used to store and query large amounts of data quickly and efficiently. Vector databases are often used in machine learning applications, as they can quickly process large amounts of data. Do you have any other questions about vector databases?"
count_tokens(
    conversation_sum,
    "How are they different from traditional databases?"
)
spent a total of 717 tokens
' Vector databases are different from traditional databases in that they store data in the form of vectors, which are mathematical objects that can represent a variety of data types. Vector databases are also optimized for fast retrieval and storage of large amounts of data, making them ideal for machine learning applications. Traditional databases, on the other hand, are optimized for more general-purpose data storage and retrieval.'
count_tokens(
    conversation_sum,
    "what are we talking about again?"
)
spent a total of 791 tokens
' We are talking about vector databases, which are a type of database that stores data in the form of vectors. Vector databases are often used in machine learning applications and are optimized for fast retrieval and storage of large amounts of data.'
Now let's see the memory:
print(conversation_sum.memory.buffer)
The human welcomed the AI assistant, LancerNinja, and asked how it could help. LancerNinja explained that vector databases are a type of database that stores data in the form of vectors, which are used to store and query large amounts of data quickly and efficiently. Vector databases are often used in machine learning applications and are optimized for fast retrieval and storage of large amounts of data, making them different from traditional databases which are optimized for more general-purpose data storage and retrieval. The AI asked if the human had any other questions about vector databases, to which the human asked what they were talking about again.
You may be wondering why we should use this form of memory at all, since the total token count is higher in each call here than in the buffer example. But if we look at the stored history, we'll notice that although each call consumes more tokens, our overall history is much shorter. This allows us to have many more interactions before reaching the prompt's token limit, making our chatbot more resilient in longer conversations.
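We can make this trade-off concrete with tiktoken. Here is an illustrative check on the two chains we built above:

# Illustrative: compare the token footprint of the raw transcript with
# that of the running summary after the same four exchanges
encoding = tiktoken.encoding_for_model("text-davinci-003")
print(len(encoding.encode(conversation_buf.memory.buffer)))  # full raw history
print(len(encoding.encode(conversation_sum.memory.buffer)))  # condensed summary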
ConversationBufferWindowMemory
Another memory type that can be used to improve chatbot interactions is the ConversationBufferWindowMemory. It keeps only the k most recent exchanges in the conversation memory and intentionally drops the oldest ones, behaving as a short-term memory. The retained exchanges are stored in their raw form.
We can use the following code to demonstrate this memory type:
conversation_bufw = ConversationChain(
    llm=llm,
    memory=ConversationBufferWindowMemory(k=1)
)
count_tokens(
    conversation_bufw,
    "Welcome to LancerNinja!"
)
spent a total of 90 tokens
" Hi there! I'm LancerNinja, your friendly AI assistant. How can I help you today?"
count_tokens(
    conversation_bufw,
    "My interest here is to Learn all about vector databases"
)
spent a total of 170 tokens
' Sure thing! Vector databases are a type of database that stores data in the form of vectors. They are used to store and query large amounts of data quickly and efficiently. Vector databases are often used in machine learning applications, as they can quickly process large amounts of data. Do you have any other questions about vector databases?'
count_tokens(
    conversation_bufw,
    "How are they different from traditional databases?"
)
spent a total of 224 tokens
' Vector databases are different from traditional databases in that they store data in the form of vectors, rather than in the form of tables. Vector databases are also optimized for fast data processing, which makes them ideal for machine learning applications. Traditional databases are optimized for data storage and retrieval, which makes them better suited for applications that require large amounts of data to be stored and retrieved quickly.'
count_tokens(
    conversation_bufw,
    "what are matrix databases?"
)
spent a total of 219 tokens
' Matrix databases are a type of vector database that stores data in the form of matrices. Matrix databases are optimized for fast data processing, making them ideal for machine learning applications. They are also capable of handling large amounts of data, making them suitable for applications that require large amounts of data to be stored and retrieved quickly.'
Now let's try to get the original context back from the chatbot:
count_tokens(
    conversation_bufw,
    "what are we talking about again?"
)
spent a total of 213 tokens
' We are talking about matrix databases, which are a type of vector database that stores data in the form of matrices. Matrix databases are optimized for fast data processing, making them ideal for machine learning applications. They are also capable of handling large amounts of data, making them suitable for applications that require large amounts of data to be stored and retrieved quickly.'
As we can see, it has forgotten about vector databases because we set k to 1. Let's look at its history:
bufw_history = conversation_bufw.memory.load_memory_variables(
    inputs=[]
)['history']
print(bufw_history)
Human: what are we talking about again?
AI: We are talking about matrix databases, which are a type of vector database that stores data in the form of matrices. Matrix databases are optimized for fast data processing, making them ideal for machine learning applications. They are also capable of handling large amounts of data, making them suitable for applications that require large amounts of data to be stored and retrieved quickly.
As expected, it only contains the most recent exchange.
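To see the windowing behavior in isolation, here is a minimal standalone sketch (using the same ConversationBufferWindowMemory import as before; save_context records one human/AI exchange at a time):

# Illustrative: the window memory keeps only the k most recent exchanges
window = ConversationBufferWindowMemory(k=1)
window.save_context({"input": "Hi"}, {"output": "Hello! How can I help?"})
window.save_context(
    {"input": "Tell me about vector databases"},
    {"output": "They store data as vectors."}
)
# With k=1, only the last exchange is returned:
print(window.load_memory_variables({})['history'])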
You can read further about these memory types in the LangChain documentation on memory!
Takeaways
This covers the three major types of memory. There are even more, such as ConversationSummaryBufferMemory (sketched below), ConversationKGMemory, and ConversationEntityMemory. Overall, conversational memory allows an LLM to remember previous interactions and behave in a chat-like manner.
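For instance, ConversationSummaryBufferMemory combines the two approaches we compared above: recent exchanges are kept verbatim, while older ones are folded into a running summary. Here is a minimal sketch; the max_token_limit value is just an illustrative choice:

from langchain.chains.conversation.memory import ConversationSummaryBufferMemory

conversation_sum_buf = ConversationChain(
    llm=llm,
    memory=ConversationSummaryBufferMemory(
        llm=llm,              # summarizes exchanges that fall out of the buffer
        max_token_limit=650   # recent exchanges kept verbatim up to this token budget
    )
)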