Power Up Java with AI: Integrate LLaMA 3 Using Spring Boot

Originally published at medium.com · 2 min read

In today’s world of artificial intelligence, creating a personal AI assistant has become surprisingly accessible thanks to the wide range of available LLMs. With Spring Boot and open-source AI models, we can build conversational AI into Java applications. Let’s explore how this works.

In this example, we’ll install and use Ollama locally, and create an endpoint that accepts a user’s prompt and returns a response using AI.

Step 1: Install Ollama
To get started, install Ollama from ollama.com.
Once the installation is complete, download one of the available models. For this example, we’ll use the LLaMA 3 model:

ollama run llama3   # or any other model

After the model is downloaded, you can also interact with it directly through the terminal:

>>> hello
Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?
>>> 
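You can check which models are installed locally at any time:

ollama list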

Step 2: Connect to Ollama from a Spring Boot Application
Now it’s time to connect Ollama with a Spring Boot application.
Let’s quickly generate a Spring Boot project using start.spring.io.
Here’s an example of the build.gradle file with the necessary dependencies:

plugins {
 id 'java'
 id 'org.springframework.boot' version '3.5.0'
 id 'io.spring.dependency-management' version '1.1.7'
}

group = 'com.example'
version = '0.0.1-SNAPSHOT'

java {
 toolchain {
  languageVersion = JavaLanguageVersion.of(21)
 }
}

repositories {
 mavenCentral()
}

ext {
 set('springAiVersion', "1.0.0")
}

dependencies {
 implementation 'org.springframework.ai:spring-ai-starter-model-ollama'
 implementation 'org.springframework.boot:spring-boot-starter-web' // web MVC for the REST endpoint
 compileOnly 'org.projectlombok:lombok' // provides @RequiredArgsConstructor
 annotationProcessor 'org.projectlombok:lombok'
 testImplementation 'org.springframework.boot:spring-boot-starter-test'
}

dependencyManagement {
 imports {
  mavenBom "org.springframework.ai:spring-ai-bom:${springAiVersion}"
 }
}

tasks.named('test') {
 useJUnitPlatform()
}
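By default, the Ollama starter connects to a local Ollama instance at http://localhost:11434. If you want to change that, or set a default model and temperature instead of passing options on every request, Spring AI exposes configuration properties, for example in application.properties:

spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.chat.options.model=llama3
spring.ai.ollama.chat.options.temperature=0.4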

Step 3: Create a REST Endpoint
After setting up the dependencies, all we need is a simple REST endpoint that receives a user prompt, sends it to the LLaMA 3 model, and returns the response.
For this, we can use the ChatModel interface, which the Ollama starter auto-configures as a bean.

import lombok.RequiredArgsConstructor;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.ollama.api.OllamaModel;
import org.springframework.ai.ollama.api.OllamaOptions;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequiredArgsConstructor
public class ChatController {
    private final ChatModel chatModel;

    @GetMapping("/message")
    public String handleMessage(@RequestParam String message) {
        var response = chatModel.call(
                new Prompt(
                        message,
                        OllamaOptions.builder()
                                .model(OllamaModel.LLAMA3)
                                .temperature(0.4)
                                .build()
                ));
        return response.getResult().getOutput().getText();
    }
}

Key Components:
ChatModel — Interface that calls the AI model to generate responses.
Prompt — Wraps the user’s message and AI configuration into a single request.
OllamaOptions — Configures model settings such as temperature and model selection.
OllamaModel.LLAMA3 — Specifies the LLaMA 3 model for processing the input.
temperature — Controls the randomness of the response (values near 0 are focused and deterministic, values near 1 are more random and creative).

It’s time to test:
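With the application running (Spring Boot’s default port is 8080), we can hit the endpoint with curl; the exact reply will vary from run to run:

curl "http://localhost:8080/message?message=Hello"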

Conclusion
Using Ollama locally with Spring AI makes it easy to build AI-powered APIs without relying on external cloud services. Without spending anything on the LLM itself, an AI-powered chat endpoint is ready. With a little prompt engineering, we can make it more specialized.
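For example, a minimal way to specialize the endpoint is to prepend a system message to the prompt. A small sketch of the controller method body; the system text is purely illustrative:

import java.util.List;

import org.springframework.ai.chat.messages.SystemMessage;
import org.springframework.ai.chat.messages.UserMessage;

// inside handleMessage, instead of wrapping only the user message:
var response = chatModel.call(new Prompt(
        List.of(
                new SystemMessage("You are a concise assistant for Java developers."), // illustrative persona
                new UserMessage(message)),
        OllamaOptions.builder()
                .model(OllamaModel.LLAMA3)
                .temperature(0.4)
                .build()));
return response.getResult().getOutput().getText();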


Great article, Mandeep! I really appreciate how clearly you broke down integrating LLaMA 3 with Spring Boot; it makes the whole thing feel so approachable. How would you suggest handling model updates or switching models without downtime in a live Spring Boot app?

That's an interesting question. To switch or update models without downtime in a live Spring Boot app, we can combine a config-driven approach with hot-reload, feature flags, model warm-up, and graceful fallbacks to ensure seamless transitions and high availability. Here is what I would take into consideration (a rough sketch of the config-driven part follows the list):

Config: The app reads the model name (e.g., llama3) from a config file or database, so you don’t hardcode it.

Hot-Reload: When the config changes, an admin hits a /reload-model API and the app loads the new model without restarting; alternatively, a scheduled job can poll for changes.

Feature Flags: Gradually enable the new model (e.g., LLaMA 3) for 10% of users while others still use the old one.

Model Warm-Up: The new model is downloaded and initialized in the background before it starts handling any real traffic.

Graceful Fallback: If the new model fails to respond or crashes, the system automatically falls back to the previous stable model.
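A rough sketch of config-driven model selection with a graceful fallback, assuming hypothetical app.model.primary and app.model.fallback properties (the class and model names are likewise illustrative; for true hot-reload you could pair this with something like Spring Cloud's @RefreshScope):

import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.ollama.api.OllamaOptions;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

@Service
public class ModelRouter {

    private final ChatModel chatModel;

    // hypothetical properties; the defaults are just illustrative
    @Value("${app.model.primary:llama3}")
    private String primaryModel;

    @Value("${app.model.fallback:llama2}")
    private String fallbackModel;

    public ModelRouter(ChatModel chatModel) {
        this.chatModel = chatModel;
    }

    public String chat(String message) {
        try {
            return call(message, primaryModel);
        } catch (RuntimeException e) {
            // graceful fallback: retry on the previous stable model
            return call(message, fallbackModel);
        }
    }

    private String call(String message, String model) {
        var response = chatModel.call(new Prompt(
                message,
                OllamaOptions.builder().model(model).build()));
        return response.getResult().getOutput().getText();
    }
}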
