Spring AI: Integrate LLMs into Your Spring Boot Application (2026)

A year ago, adding AI features to a Java application meant choosing between heavyweight Python microservices or poorly-documented SDKs that didn't fit the Spring programming model. Spring AI changes this. It brings the same familiar abstractions — auto-configuration, dependency injection, declarative templates — to LLM integration. You write Spring code. Spring AI handles the AI plumbing.

This article walks through the most important Spring AI features and how to use them in a real application.

What Spring AI Actually Does

Spring AI is not a wrapper around a single provider. It's an abstraction layer that supports OpenAI, Azure OpenAI, Anthropic, Ollama (local models), Google Vertex AI, and others — all through the same Java interface. If you start with OpenAI and later want to switch to a self-hosted model, you change configuration, not code.
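The "change configuration, not code" idea can be illustrated without any framework. The sketch below is plain Java, not Spring AI: `ChatProvider`, `fromConfig`, and the stubbed replies are hypothetical names invented for illustration. Calling code depends only on the interface, and a configuration value picks the implementation.

```java
final class ProviderSwitchSketch {

    // Hypothetical stand-in for a provider-neutral chat interface
    interface ChatProvider {
        String complete(String prompt);
    }

    // The provider is chosen by a configuration value, so calling code never changes.
    // Both branches are stubs; real implementations would call a remote or local model.
    static ChatProvider fromConfig(String providerName) {
        return switch (providerName) {
            case "openai" -> prompt -> "[openai] " + prompt;
            case "ollama" -> prompt -> "[ollama] " + prompt;
            default -> throw new IllegalArgumentException("Unknown provider: " + providerName);
        };
    }
}
```

In Spring AI the same role is played by the starter on the classpath and the `spring.ai.*` properties, with the framework doing the selection for you.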

The core abstractions are:

ChatClient — for conversational AI and text generation
EmbeddingModel — for converting text to vector representations
VectorStore — for semantic search over your own documents
PromptTemplate — for structured, reusable prompts
Tool/Function calling — for giving the LLM access to your services

Setup

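<!-- Starter name as of Spring AI 1.0; milestone releases used spring-ai-openai-spring-boot-starter -->
<!-- The version is normally managed by importing the spring-ai-bom -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>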

spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o
          temperature: 0.7

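Spring AI auto-configures the model beans for the starter on the classpath (here the OpenAI ChatModel and EmbeddingModel) along with a ChatClient.Builder. You obtain a ChatClient by calling build() on that builder; the services in this article that inject a ChatClient directly assume you have exposed one as a bean built this way.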

Basic Chat Integration

The simplest use case is generating text. Spring AI's ChatClient uses a fluent builder API that will feel immediately familiar to any Spring developer:

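@Service
public class ProductDescriptionService {

    private final ChatClient chatClient;

    // Spring AI auto-configures a ChatClient.Builder; build the client once at construction
    public ProductDescriptionService(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }
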
    public String generateDescription(String productName, String category, List<String> features) {
        return chatClient.prompt()
            .user(u -> u.text("""
                Write a compelling product description for a {category} product called "{name}".
                Key features: {features}.
                The description should be 2-3 sentences, professional, and focus on customer benefits.
                """)
                .param("category", category)
                .param("name", productName)
                .param("features", String.join(", ", features)))
            .call()
            .content();
    }
}

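Keeping the prompt text in a parameterized template, separate from the values passed through param(), keeps prompts clean and testable. You can version them, unit test the surrounding code against a mocked ChatClient, and update the wording without touching business logic. For prompts reused outside a ChatClient call, Spring AI's PromptTemplate class offers the same placeholder substitution.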
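To show what placeholder substitution amounts to, here is a minimal plain-Java sketch. It is not Spring AI's renderer: `PromptTemplateSketch` and `render` are hypothetical names, and the only behavior assumed is that `{name}` tokens are replaced from a map. A helper like this is also easy to unit test on its own.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Spring AI: fills {name} placeholders from a map
final class PromptTemplateSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    static String render(String template, Map<String, String> params) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String key = matcher.group(1);
            String value = params.get(key);
            if (value == null) {
                // Fail fast so a missing parameter never reaches the model as a raw "{key}"
                throw new IllegalArgumentException("Missing prompt parameter: " + key);
            }
            // quoteReplacement keeps characters such as '$' in the value literal
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Failing on a missing parameter is a design choice of this sketch; it surfaces template and caller drift in tests instead of in production prompts.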

Structured Output — Getting Java Objects Back

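One of the most practical Spring AI features is structured output. Instead of parsing a string response, you declare the Java type you want and Spring AI handles the conversion: it appends format instructions describing the type to the prompt and deserializes the model's JSON reply into that type.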

public record OrderAnalysis(
    String sentiment,           // positive, negative, neutral
    List<String> issues,        // identified problems
    int urgencyScore,           // 1-10
    String recommendedAction    // what to do next
) {}

@Service
@RequiredArgsConstructor
public class OrderReviewService {

    private final ChatClient chatClient;

    public OrderAnalysis analyzeCustomerFeedback(String feedback) {
        return chatClient.prompt()
            .user("Analyze this customer feedback and return a structured assessment: " + feedback)
            .call()
            .entity(OrderAnalysis.class);  // Spring AI converts the response to your type
    }
}

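This matters for integration: downstream code works with a typed Java object rather than an unstructured string, so validation, error handling, and testing become straightforward. The model can still return JSON that does not match the type, so treat a conversion failure as an error case you handle.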
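One way to make the validation concrete is to put the constraints from the field comments into the record itself. The sketch below is an illustrative variant of OrderAnalysis (renamed `ValidatedOrderAnalysis`, a hypothetical name), using a compact constructor that runs whenever the record is instantiated. Whether a given converter reports the exception cleanly is an assumption you should verify in your setup.

```java
import java.util.List;
import java.util.Set;

// Illustrative variant of the OrderAnalysis record with its documented constraints enforced
record ValidatedOrderAnalysis(
        String sentiment,
        List<String> issues,
        int urgencyScore,
        String recommendedAction) {

    private static final Set<String> SENTIMENTS = Set.of("positive", "negative", "neutral");

    // Compact constructor: validates and normalizes every instance at creation time
    ValidatedOrderAnalysis {
        if (sentiment == null || !SENTIMENTS.contains(sentiment)) {
            throw new IllegalArgumentException("Unexpected sentiment: " + sentiment);
        }
        if (urgencyScore < 1 || urgencyScore > 10) {
            throw new IllegalArgumentException("urgencyScore out of range: " + urgencyScore);
        }
        // Treat a missing list as empty and make the stored list immutable
        issues = issues == null ? List.of() : List.copyOf(issues);
    }
}
```

With this in place, an out-of-range score or an unknown sentiment label fails at construction instead of propagating into downstream logic.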

RAG — Retrieval-Augmented Generation

RAG is the pattern that makes LLMs useful for domain-specific applications. The problem with using a general LLM directly is that it doesn't know your data — your product catalog, your documentation, your customer history. RAG solves this by:

At ingestion time: splitting your documents into chunks, converting each chunk to a vector embedding, and storing them in a vector database
At query time: converting the user's question to an embedding, finding the most similar document chunks, and injecting them into the LLM prompt as context

The LLM then answers based on your data, not just its training data.
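The ingestion-time splitting step can be sketched in plain Java. This is a word-based illustration with a configurable overlap, which is an assumption of the sketch: Spring AI's TokenTextSplitter works on token counts and has its own parameters. `ChunkingSketch` and `chunk` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ChunkingSketch {

    // Splits text into windows of chunkSize words; consecutive windows share `overlap` words
    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("Require 0 <= overlap < chunkSize");
        }
        List<String> words = Arrays.asList(text.strip().split("\\s+"));
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < words.size(); start += step) {
            int end = Math.min(start + chunkSize, words.size());
            chunks.add(String.join(" ", words.subList(start, end)));
            if (end == words.size()) {
                break; // the last window reached the end of the text
            }
        }
        return chunks;
    }
}
```

Overlap trades storage for recall: a sentence that straddles a chunk boundary still appears whole in at least one chunk.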

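@Configuration
public class RagConfig {

    @Bean
    public VectorStore vectorStore(JdbcTemplate jdbcTemplate, EmbeddingModel embeddingModel) {
        // PgVectorStore uses PostgreSQL with the pgvector extension
        // Works with your existing PostgreSQL database
        // Builder API as of Spring AI 1.0; milestone releases used new PgVectorStore(jdbcTemplate, embeddingModel)
        return PgVectorStore.builder(jdbcTemplate, embeddingModel).build();
    }
}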

@Service
@RequiredArgsConstructor
public class DocumentIngestionService {

    private final VectorStore vectorStore;
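    // TokenTextSplitter is not auto-configured: declare it as a @Bean (for example new TokenTextSplitter())
    private final TokenTextSplitter splitter;

    public void ingestDocumentation(String documentContent, String source) {
        // Split large documents into token-bounded chunks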
        List<Document> chunks = splitter.apply(
            List.of(new Document(documentContent,
                Map.of("source", source, "ingestedAt", LocalDate.now().toString())))
        );
        // Store as vector embeddings
        vectorStore.add(chunks);
    }
}

@Service
@RequiredArgsConstructor
public class SupportChatService {

    private final ChatClient chatClient;
    private final VectorStore vectorStore;

    public String answerQuestion(String userQuestion) {
        // Find relevant documentation chunks
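        // API as of Spring AI 1.0; milestone releases used SearchRequest.query(...).withTopK(4) and getContent()
        List<Document> context = vectorStore.similaritySearch(
            SearchRequest.builder().query(userQuestion).topK(4).build()
        );

        String contextText = context.stream()
            .map(Document::getText)
            .collect(Collectors.joining("\n\n"));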

        return chatClient.prompt()
            .system("""
                You are a helpful support assistant.
                Answer questions based on the documentation provided.
                If the answer is not in the documentation, say so clearly.
                Documentation context:
                """ + contextText)
            .user(userQuestion)
            .call()
            .content();
    }
}

The pgvector store is particularly convenient because it uses PostgreSQL — no new infrastructure required if you're already running Postgres.
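To make the query-time step less of a black box, the sketch below shows what "finding the most similar chunks" means: rank stored embeddings by cosine similarity to the query embedding and keep the top k. It is plain Java with toy vectors; `SimilaritySearchSketch`, `cosine`, and `topK` are hypothetical names, and a real vector store does this inside the database, typically with an index.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class SimilaritySearchSketch {

    // Cosine similarity: 1.0 for vectors pointing the same way, 0.0 for orthogonal vectors
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the ids of the k chunks whose embeddings are most similar to the query
    static List<String> topK(double[] query, Map<String, double[]> chunks, int k) {
        return chunks.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> -cosine(query, e.getValue())))
                .limit(k)
                .map(Map.Entry::getKey)
                .toList();
    }
}
```

This brute-force scan is linear in the number of chunks, which is fine for understanding the idea but is exactly the work a vector index is meant to avoid.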

Function Calling — Giving the LLM Access to Your Services

Function calling lets the LLM invoke your Spring beans when it needs real-time data. The LLM decides when to call a function and what parameters to pass — you implement the function:

@Component
@RequiredArgsConstructor
@Description("Get the current order status for a customer order")
public class OrderStatusFunction
        implements Function<OrderStatusFunction.Request, OrderStatusFunction.Response> {

    private final OrderRepository orderRepo;

    public record Request(String orderId) {}
    public record Response(String status, String estimatedDelivery, String trackingNumber) {}

    @Override
    public Response apply(Request request) {
        Order order = orderRepo.findByOrderNumber(request.orderId())
            .orElseThrow(() -> new IllegalArgumentException("Order not found"));
        return new Response(
            order.getStatus().name(),
            order.getEstimatedDelivery().toString(),
            order.getTrackingNumber()
        );
    }
}

// Wire it into the chat
@Service
@RequiredArgsConstructor
public class CustomerAssistantService {

    private final ChatClient chatClient;

    public String handleCustomerQuery(String query) {
        return chatClient.prompt()
            .user(query)
            // Register the function by bean name (Spring AI 1.0; earlier milestones used .functions(...))
            .toolNames("orderStatusFunction")
            .call()
            .content();
        // The LLM can request a call to orderStatusFunction if it needs order data
    }
}

When a customer asks "Where is my order #12345?", the LLM responds with a request to call orderStatusFunction with {"orderId": "12345"}. Spring AI executes the function, sends the result back, and the model incorporates it into its answer. Your business logic stays in Spring beans; the LLM only decides when to call them.
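The division of labor can be sketched in plain Java: the model only names a tool and supplies arguments, and application code looks the tool up and runs it. Everything here is hypothetical and simplified (`ToolDispatchSketch`, `ToolCall`, string-only arguments); it illustrates the pattern rather than Spring AI's internals.

```java
import java.util.Map;
import java.util.function.Function;

final class ToolDispatchSketch {

    // Hypothetical stand-in for the tool call a model returns: a tool name plus arguments
    record ToolCall(String toolName, Map<String, String> arguments) {}

    private final Map<String, Function<Map<String, String>, String>> tools;

    ToolDispatchSketch(Map<String, Function<Map<String, String>, String>> tools) {
        this.tools = Map.copyOf(tools);
    }

    // Runs the requested tool; the returned text is what would be sent back to the model
    String execute(ToolCall call) {
        Function<Map<String, String>, String> tool = tools.get(call.toolName());
        if (tool == null) {
            // The model can only invoke tools that were explicitly registered
            throw new IllegalArgumentException("Unknown tool: " + call.toolName());
        }
        return tool.apply(call.arguments());
    }
}
```

Because only registered tools can run, the registry doubles as an allow-list: the model never gets arbitrary access to your beans.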

Observability and Cost Control

AI features can get expensive fast. Two levers help: cap the response length in configuration, and track token usage through the Micrometer metrics that Spring AI exposes:

spring:
  ai:
    openai:
      chat:
        options:
          max-tokens: 500      # Hard limit on response length
          temperature: 0.3     # Lower = more deterministic (does not affect cost)

// With Spring Boot Actuator on the classpath, Spring AI registers token metrics:
// gen_ai.client.token.usage (tagged by model, operation, and token type)
// Use Grafana to alert when daily token spend exceeds budget

// Add a caching layer for repeated identical prompts.
// Key on the prompt itself: distinct prompts can share a hashCode and would collide.
@Cacheable(value = "ai-responses", key = "#prompt")
public String generateWithCache(String prompt) {
    return chatClient.prompt().user(prompt).call().content();
}
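What such a cache does can be sketched without Spring. The version below keys on the full prompt text, because two different strings can share a hashCode ("Aa" and "BB" both hash to 2112 in Java). `PromptCacheSketch` is a hypothetical name, the `model` function stands in for the real LLM call, and the whitespace normalization is an optional extra of this sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class PromptCacheSketch {

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> model; // stand-in for the real LLM call

    PromptCacheSketch(Function<String, String> model) {
        this.model = model;
    }

    String generate(String prompt) {
        // Normalize whitespace so trivially different prompts share one cache entry
        String key = prompt.strip().replaceAll("\\s+", " ");
        // The model is only invoked when the normalized prompt has not been seen before
        return cache.computeIfAbsent(key, model);
    }
}
```

An unbounded map is fine for a demonstration; a production cache would also need a size limit and an expiry policy so that stale answers age out.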

When to Use Spring AI (and When Not To)

Spring AI is the right choice when you're building AI features into an existing Spring Boot application and want to stay in the Java ecosystem. It's particularly strong for RAG applications, customer-facing chat features, and structured data extraction from unstructured text.

It's not the right choice for:

Fine-tuning models — use Python tooling for this
Complex ML pipelines — stay with Python/MLflow
Real-time streaming at scale — works but Kafka + Python is more mature for high-volume AI pipelines

For the majority of Java teams adding AI features to existing applications, Spring AI eliminates the need for a separate Python microservice and brings AI into the same codebase, CI pipeline, and deployment model as the rest of the application.

Common Mistakes to Avoid

Sending entire database records to the LLM: context windows are finite and every token is billed, so filter and summarize data before sending it, and never send PII unless your AI provider agreement covers it
No fallback on LLM errors: LLM APIs fail, rate-limit, and return unexpected formats; wrap calls with retries, timeouts, and a fallback path, for example Resilience4j circuit breakers
Prompts mixed with business logic: treat prompts like SQL queries by externalizing them into templates, versioning them, and testing them independently
Ignoring token costs: a single RAG query can cost $0.01-0.10, which adds up quickly at scale; add caching for repeated queries and set max-tokens limits
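The fallback advice above can be sketched in a few lines of plain Java. This is a deliberately simplified stand-in for what a resilience library provides (no backoff, no circuit state); `LlmFallbackSketch` and `callWithFallback` are hypothetical names.

```java
import java.util.function.Supplier;

final class LlmFallbackSketch {

    // Tries the call up to maxAttempts times, then returns the fallback instead of failing
    static String callWithFallback(Supplier<String> llmCall, int maxAttempts, String fallback) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return llmCall.get();
            } catch (RuntimeException e) {
                // A real implementation would log the failure and back off before retrying
            }
        }
        return fallback;
    }
}
```

A call site might pass `() -> chatClient.prompt().user(question).call().content()` as the supplier and a canned "please try again later" message as the fallback, so that a provider outage degrades the feature instead of breaking the request.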

Summary

Spring AI brings LLM integration into the Spring Boot programming model with auto-configuration, familiar abstractions, and multi-provider support. The most impactful use cases are RAG for domain-specific Q&A, structured output extraction, and function calling for real-time data access. The programming model is clean, the observability is built in, and the migration path between providers is a configuration change.
