AI in Java: Building a ChatGPT Clone With Spring Boot and LangChain
Learn how to build a ChatGPT clone with Spring Boot, LangChain, and Hilla in Java, covering both simple synchronous chat completions and more advanced streaming completions.
Many libraries for AI app development are primarily written in Python or JavaScript. The good news is that several of these libraries have Java APIs as well. In this tutorial, I'll show you how to build a ChatGPT clone using Spring Boot, LangChain, and Hilla.
The tutorial covers simple synchronous chat completions and more advanced streaming completions for a better user experience.
Completed Source Code
You can find the source code for the example in my GitHub repository.
Requirements
- Java 17+
- Node 18+
- An OpenAI API key in an OPENAI_API_KEY environment variable
Create a Spring Boot and React Project and Add LangChain
First, create a new Hilla project using the Hilla CLI. This will create a Spring Boot project with a React frontend.
npx @hilla/cli init ai-assistant
Open the generated project in your IDE. Then, add the LangChain4j dependency to the pom.xml file:
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j</artifactId>
    <version>0.22.0</version> <!-- TODO: use latest version -->
</dependency>
Simple OpenAI Chat Completions With Memory Using LangChain
We'll begin exploring LangChain4j with a simple synchronous chat completion. In this case, we want to call the OpenAI chat completion API and get a single response. We also want to keep track of up to 1,000 tokens of the chat history.
In the com.example.application.service package, create a ChatService.java class with the following content:
@BrowserCallable
@AnonymousAllowed
public class ChatService {

    @Value("${openai.api.key}")
    private String OPENAI_API_KEY;

    private Assistant assistant;

    interface Assistant {
        String chat(String message);
    }

    @PostConstruct
    public void init() {
        var memory = TokenWindowChatMemory.withMaxTokens(1000, new OpenAiTokenizer("gpt-3.5-turbo"));
        assistant = AiServices.builder(Assistant.class)
                .chatLanguageModel(OpenAiChatModel.withApiKey(OPENAI_API_KEY))
                .chatMemory(memory)
                .build();
    }

    public String chat(String message) {
        return assistant.chat(message);
    }
}
- @BrowserCallable makes the class available to the front end.
- @AnonymousAllowed allows anonymous users to call the methods.
- @Value injects the OpenAI API key from the OPENAI_API_KEY environment variable.
- Assistant is the interface that we will use to call the chat API.
- init() initializes the assistant with a 1,000-token memory and the gpt-3.5-turbo model.
- chat() is the method that we will call from the front end.
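To see the token-window memory at work, you can call the service twice and refer back to the first message. This is a minimal, hypothetical sketch (assuming a ChatService instance named chatService, called for example from a test or a CommandLineRunner), not part of the tutorial's code:

// Hypothetical usage: the assistant remembers earlier messages
// as long as they still fit in the 1,000-token window.
String first = chatService.chat("Hi, my name is Anna.");
String followUp = chatService.chat("What is my name?");
// followUp should mention "Anna", because the first message is still in the chat memory.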
Start the application by running Application.java in your IDE, or with the default Maven goal:
mvn
This will generate TypeScript types and service methods for the front end.
Next, open App.tsx in the frontend folder and update it with the following content:
// Import paths may differ slightly depending on your Hilla version.
import { useState } from "react";
import { MessageInput } from "@hilla/react-components/MessageInput.js";
import { MessageList, MessageListItem } from "@hilla/react-components/MessageList.js";
import { ChatService } from "Frontend/generated/endpoints";

export default function App() {
  const [messages, setMessages] = useState<MessageListItem[]>([]);

  async function sendMessage(message: string) {
    setMessages((messages) => [
      ...messages,
      {
        text: message,
        userName: "You",
      },
    ]);

    const response = await ChatService.chat(message);
    setMessages((messages) => [
      ...messages,
      {
        text: response,
        userName: "Assistant",
      },
    ]);
  }

  return (
    <div className="p-m flex flex-col h-full box-border">
      <MessageList items={messages} className="flex-grow" />
      <MessageInput onSubmit={(e) => sendMessage(e.detail.value)} />
    </div>
  );
}
- We use the MessageList and MessageInput components from the Hilla UI component library.
- sendMessage() adds the message to the list of messages and calls the chat() method on the ChatService class. When the response is received, it is added to the list of messages.
You now have a working chat application that uses the OpenAI chat API and keeps track of the chat history. It works great for short messages, but it is slow for long answers. To improve the user experience, we can use a streaming completion instead, displaying the response as it is received.
Streaming OpenAI Chat Completions With Memory Using LangChain
Let's update the ChatService class to use a streaming completion instead:
@BrowserCallable
@AnonymousAllowed
public class ChatService {

    @Value("${openai.api.key}")
    private String OPENAI_API_KEY;

    private Assistant assistant;

    interface Assistant {
        TokenStream chat(String message);
    }

    @PostConstruct
    public void init() {
        var memory = TokenWindowChatMemory.withMaxTokens(1000, new OpenAiTokenizer("gpt-3.5-turbo"));
        assistant = AiServices.builder(Assistant.class)
                .streamingChatLanguageModel(OpenAiStreamingChatModel.withApiKey(OPENAI_API_KEY))
                .chatMemory(memory)
                .build();
    }

    public Flux<String> chatStream(String message) {
        // Bridge the LangChain4j TokenStream callbacks to a Reactor Flux
        // so Hilla can stream the tokens to the browser.
        Sinks.Many<String> sink = Sinks.many().unicast().onBackpressureBuffer();

        assistant.chat(message)
                .onNext(sink::tryEmitNext)
                .onComplete(sink::tryEmitComplete)
                .onError(sink::tryEmitError)
                .start();

        return sink.asFlux();
    }
}
The code is mostly the same as before, with some important differences:
- Assistant now returns a TokenStream instead of a String.
- init() uses streamingChatLanguageModel() instead of chatLanguageModel().
- chatStream() returns a Flux<String> instead of a String.
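If you want to sanity-check the streaming endpoint on the server before touching the UI, you can subscribe to the returned Flux directly. This is a minimal, hypothetical sketch (for example, in a test), not part of the application code:

// Hypothetical check: print tokens to the console as they are emitted.
chatService.chatStream("Tell me a joke about Java")
        .doOnNext(System.out::print)
        .blockLast(); // blocking is fine in a quick test, but not in production code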
Update App.tsx with the following content:
// Import paths may differ slightly depending on your Hilla version.
import { useState } from "react";
import { MessageInput } from "@hilla/react-components/MessageInput.js";
import { MessageList, MessageListItem } from "@hilla/react-components/MessageList.js";
import { ChatService } from "Frontend/generated/endpoints";

export default function App() {
  const [messages, setMessages] = useState<MessageListItem[]>([]);

  function addMessage(message: MessageListItem) {
    setMessages((messages) => [...messages, message]);
  }

  function appendToLastMessage(chunk: string) {
    setMessages((messages) => {
      // Replace the last message with an updated copy instead of mutating state in place.
      const lastMessage = messages[messages.length - 1];
      return [...messages.slice(0, -1), { ...lastMessage, text: (lastMessage.text ?? "") + chunk }];
    });
  }

  async function sendMessage(message: string) {
    addMessage({
      text: message,
      userName: "You",
    });

    let first = true;
    ChatService.chatStream(message).onNext((chunk) => {
      if (first && chunk) {
        addMessage({
          text: chunk,
          userName: "Assistant",
        });
        first = false;
      } else {
        appendToLastMessage(chunk);
      }
    });
  }

  return (
    <div className="p-m flex flex-col h-full box-border">
      <MessageList items={messages} className="flex-grow" />
      <MessageInput onSubmit={(e) => sendMessage(e.detail.value)} />
    </div>
  );
}
The template is the same as before, but the way we handle the response is different. Instead of waiting for the response to be received, we start listening for chunks of the response. When the first chunk is received, we add it as a new message. When subsequent chunks are received, we append them to the last message.
Re-run the application, and you should see that the response is displayed as it is received.
Conclusion
As you can see, LangChain makes it easy to build LLM-powered AI applications in Java and Spring Boot.
With the basic setup in place, you can extend the functionality by chaining operations, adding external tools, and more, following the examples on the LangChain4j GitHub page linked earlier in this article. Learn more about Hilla in the Hilla documentation.
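As a small taste of extending the assistant with tools, LangChain4j can expose plain Java methods as functions the model may call. Here is a minimal sketch; the exact annotations and builder methods may differ between LangChain4j versions, so treat it as an illustration rather than a drop-in snippet:

// Hypothetical example: register a Java method as a tool for the assistant.
class Calculator {
    @Tool("Adds two numbers")
    double add(double a, double b) {
        return a + b;
    }
}

assistant = AiServices.builder(Assistant.class)
        .chatLanguageModel(OpenAiChatModel.withApiKey(OPENAI_API_KEY))
        .chatMemory(memory)
        .tools(new Calculator())
        .build();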