Spring AI brings first-class LLM integration to the Spring ecosystem. Learn how to build AI-powered features — chat, RAG, embeddings — without leaving the Spring Boot programming model.
JOptimize Team
A year ago, adding AI features to a Java application meant choosing between heavyweight Python microservices and poorly documented SDKs that didn't fit the Spring programming model. Spring AI changes this. It brings the same familiar abstractions (auto-configuration, dependency injection, declarative templates) to LLM integration. You write Spring code. Spring AI handles the AI plumbing.
This article walks through the most important Spring AI features and how to use them in a real application.
Spring AI is not a wrapper around a single provider. It's an abstraction layer that supports OpenAI, Azure OpenAI, Anthropic, Ollama (local models), Google Vertex AI, and others — all through the same Java interface. If you start with OpenAI and later want to switch to a self-hosted model, you change configuration, not code.
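The core abstractions used in this article are ChatClient for talking to a model, EmbeddingModel for turning text into vectors, and VectorStore for storing and searching those vectors.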
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    </dependency>
    spring:
      ai:
        openai:
          api-key: ${OPENAI_API_KEY}
          chat:
            options:
              model: gpt-4o
              temperature: 0.7
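Spring AI auto-configures a ChatClient.Builder, an EmbeddingModel, and the other beans you need, so no boilerplate setup is required. The examples in this article inject a ChatClient directly, which assumes you have exposed one as a bean, for example by calling build() on the injected ChatClient.Builder.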
The simplest use case is generating text. Spring AI's ChatClient uses a fluent builder API that will feel immediately familiar to any Spring developer:
    import java.util.List;

    import lombok.RequiredArgsConstructor;
    import org.springframework.ai.chat.client.ChatClient;
    import org.springframework.stereotype.Service;

    @Service
    @RequiredArgsConstructor
    public class ProductDescriptionService {

        private final ChatClient chatClient;

        public String generateDescription(String productName, String category, List<String> features) {
            return chatClient.prompt()
                    // {category}, {name} and {features} are filled in by the param(...) calls below
                    .user(u -> u.text("""
                            Write a compelling product description for a {category} product called "{name}".
                            Key features: {features}.
                            The description should be 2-3 sentences, professional, and focus on customer benefits.
                            """)
                            .param("category", category)
                            .param("name", productName)
                            .param("features", String.join(", ", features)))
                    .call()
                    .content();
        }
    }
Parameterized prompt templates keep your prompts clean and testable. You can version them, unit test them with mock responses, and update them without touching business logic.
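To make the testability point concrete, here is a minimal sketch of placeholder substitution in plain Java. It is not Spring AI's renderer: PromptTemplateSketch and render are hypothetical names, and the sketch only assumes that placeholders look like {name}. The point is that a template plus a parameter map is a pure function you can assert on without calling a model.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; Spring AI ships its own template rendering.
final class PromptTemplateSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    private PromptTemplateSketch() {}

    /** Replaces each {key} in a single pass; placeholders without a value are left unchanged. */
    static String render(String template, Map<String, String> params) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        return matcher.replaceAll(match ->
                Matcher.quoteReplacement(params.getOrDefault(match.group(1), match.group())));
    }

    public static void main(String[] args) {
        String template = "Write a description for a {category} product called \"{name}\".";
        System.out.println(render(template, Map.of("category", "audio", "name", "EchoBuds")));
    }
}
```

A unit test can then pin the rendered prompt text, so a wording change shows up as a reviewed diff rather than a silent behavior change.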
One of the most practical Spring AI features is structured output. Instead of parsing a string response, you declare the Java type you want and Spring AI handles the marshalling:
    import java.util.List;

    import lombok.RequiredArgsConstructor;
    import org.springframework.ai.chat.client.ChatClient;
    import org.springframework.stereotype.Service;

    public record OrderAnalysis(
            String sentiment,          // positive, negative, neutral
            List<String> issues,       // identified problems
            int urgencyScore,          // 1-10
            String recommendedAction   // what to do next
    ) {}

    @Service
    @RequiredArgsConstructor   // generates the constructor that injects chatClient
    public class OrderReviewService {

        private final ChatClient chatClient;

        public OrderAnalysis analyzeCustomerFeedback(String feedback) {
            return chatClient.prompt()
                    .user("Analyze this customer feedback and return a structured assessment: " + feedback)
                    .call()
                    .entity(OrderAnalysis.class); // Spring AI converts the response to your type
        }
    }
This simplifies integration considerably. Your downstream code works with a typed Java object, not an unstructured string, so validation, error handling, and testing all become straightforward.
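A typed result is also a natural place to enforce the constraints that the prompt only requests. The sketch below is one way to do that in plain Java, assuming the converter creates the record through its canonical constructor. ValidatedOrderAnalysis is a hypothetical variant of the record above, and the allowed sentiment values are taken from its comments.

```java
import java.util.List;
import java.util.Set;

// Hypothetical variant of OrderAnalysis that rejects out-of-range model output.
record ValidatedOrderAnalysis(
        String sentiment,
        List<String> issues,
        int urgencyScore,
        String recommendedAction) {

    private static final Set<String> SENTIMENTS = Set.of("positive", "negative", "neutral");

    // Compact constructor: runs on every instantiation, including deserialization.
    ValidatedOrderAnalysis {
        if (sentiment == null || !SENTIMENTS.contains(sentiment)) {
            throw new IllegalArgumentException("Unexpected sentiment: " + sentiment);
        }
        if (urgencyScore < 1 || urgencyScore > 10) {
            throw new IllegalArgumentException("urgencyScore must be 1-10 but was " + urgencyScore);
        }
        // Treat a missing list as empty and take a defensive, immutable copy.
        issues = issues == null ? List.of() : List.copyOf(issues);
    }

    public static void main(String[] args) {
        System.out.println(new ValidatedOrderAnalysis("negative", List.of("late delivery"), 8, "Offer a refund"));
    }
}
```

Failing fast here means a malformed model response surfaces as an exception at the boundary, where you can retry or fall back, instead of leaking into business logic.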
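RAG (retrieval-augmented generation) is the pattern that makes LLMs useful for domain-specific applications. The problem with using a general LLM directly is that it doesn't know your data: your product catalog, your documentation, your customer history. RAG solves this by splitting your documents into chunks and storing them as vector embeddings, retrieving the chunks most relevant to each question, and adding those chunks to the prompt as context.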
The LLM then answers based on your data, not just its training data.
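    // Package and method names follow the pre-1.0 milestone releases used in this article.
    import java.time.LocalDate;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    import lombok.RequiredArgsConstructor;
    import org.springframework.ai.chat.client.ChatClient;
    import org.springframework.ai.document.Document;
    import org.springframework.ai.embedding.EmbeddingModel;
    import org.springframework.ai.transformer.splitter.TokenTextSplitter;
    import org.springframework.ai.vectorstore.PgVectorStore;
    import org.springframework.ai.vectorstore.SearchRequest;
    import org.springframework.ai.vectorstore.VectorStore;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.stereotype.Service;

    @Configuration
    public class RagConfig {

        @Bean
        public VectorStore vectorStore(JdbcTemplate jdbcTemplate, EmbeddingModel embeddingModel) {
            // PgVectorStore uses PostgreSQL with the pgvector extension
            // Works with your existing PostgreSQL database
            return new PgVectorStore(jdbcTemplate, embeddingModel);
        }

        @Bean
        public TokenTextSplitter tokenTextSplitter() {
            // Exposed as a bean so DocumentIngestionService can inject it
            return new TokenTextSplitter();
        }
    }

    @Service
    @RequiredArgsConstructor
    public class DocumentIngestionService {

        private final VectorStore vectorStore;
        private final TokenTextSplitter splitter;

        public void ingestDocumentation(String documentContent, String source) {
            // Split large documents into smaller chunks
            List<Document> chunks = splitter.apply(
                    List.of(new Document(documentContent,
                            Map.of("source", source, "ingestedAt", LocalDate.now().toString()))));

            // Store as vector embeddings
            vectorStore.add(chunks);
        }
    }

    @Service
    @RequiredArgsConstructor
    public class SupportChatService {

        private final ChatClient chatClient;
        private final VectorStore vectorStore;

        public String answerQuestion(String userQuestion) {
            // Find relevant documentation chunks
            // (newer releases use SearchRequest.builder().query(...).topK(4).build() and Document::getText)
            List<Document> context = vectorStore.similaritySearch(
                    SearchRequest.query(userQuestion).withTopK(4));

            String contextText = context.stream()
                    .map(Document::getContent)
                    .collect(Collectors.joining("\n\n"));

            return chatClient.prompt()
                    .system("""
                            You are a helpful support assistant. Answer questions based on the documentation provided.
                            If the answer is not in the documentation, say so clearly.

                            Documentation context:
                            """ + contextText)
                    .user(userQuestion)
                    .call()
                    .content();
        }
    }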
The pgvector store is particularly convenient because it uses PostgreSQL — no new infrastructure required if you're already running Postgres.
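For intuition about what a similarity search does, the following is a minimal in-memory sketch in plain Java. It is not how PgVectorStore is implemented. SimilaritySearchSketch and its methods are hypothetical names, the two-dimensional vectors stand in for real embeddings, and cosine similarity is assumed as the distance measure.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Illustrative brute-force retrieval; a real vector store would use an index instead of a full scan.
final class SimilaritySearchSketch {

    private SimilaritySearchSketch() {}

    /** Cosine similarity of two equal-length, non-zero vectors. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the ids of the k stored vectors most similar to the query, best match first. */
    static List<String> topK(double[] query, Map<String, double[]> store, int k) {
        return store.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> cosine(query, e.getValue())).reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        Map<String, double[]> store = Map.of(
                "refund-policy", new double[] {1.0, 0.0},
                "shipping-times", new double[] {0.0, 1.0},
                "return-labels", new double[] {0.9, 0.1});
        System.out.println(topK(new double[] {1.0, 0.0}, store, 2));
    }
}
```

The top-K value plays the same role as the 4 passed to the search request above: it caps how many chunks end up in the prompt, which trades answer coverage against token cost.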
Function calling lets the LLM invoke your Spring beans when it needs real-time data. The LLM decides when to call a function and what parameters to pass — you implement the function:
    import java.util.function.Function;

    import lombok.RequiredArgsConstructor;
    import org.springframework.ai.chat.client.ChatClient;
    import org.springframework.context.annotation.Description;
    import org.springframework.stereotype.Component;
    import org.springframework.stereotype.Service;

    @Component
    @RequiredArgsConstructor   // injects the repository
    @Description("Get the current order status for a customer order")
    public class OrderStatusFunction
            implements Function<OrderStatusFunction.Request, OrderStatusFunction.Response> {

        private final OrderRepository orderRepo;

        public record Request(String orderId) {}
        public record Response(String status, String estimatedDelivery, String trackingNumber) {}

        @Override
        public Response apply(Request request) {
            Order order = orderRepo.findByOrderNumber(request.orderId())
                    .orElseThrow(() -> new IllegalArgumentException("Order not found"));
            return new Response(
                    order.getStatus().name(),
                    order.getEstimatedDelivery().toString(),
                    order.getTrackingNumber());
        }
    }

    // Wire it into the chat
    @Service
    @RequiredArgsConstructor
    public class CustomerAssistantService {

        private final ChatClient chatClient;

        public String handleCustomerQuery(String query) {
            return chatClient.prompt()
                    .user(query)
                    .functions("orderStatusFunction") // Register the function by bean name
                    .call()
                    .content();
            // The LLM will call orderStatusFunction if it needs order data
        }
    }
When a customer asks "Where is my order #12345?", the LLM automatically calls orderStatusFunction with {"orderId": "12345"} and incorporates the response into its answer. Your business logic stays in Spring beans — the LLM just orchestrates when to call them.
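The dispatch step can be pictured as a lookup from function name to handler. The sketch below shows that idea in plain Java and is not Spring AI's internal implementation. FunctionDispatchSketch and ToolCall are hypothetical names, and arguments are simplified to a string map instead of JSON.

```java
import java.util.Map;
import java.util.function.Function;

// Illustrative registry: the model names a function, the application looks it up and runs it.
final class FunctionDispatchSketch {

    /** Hypothetical stand-in for a model's tool call: a function name plus its arguments. */
    record ToolCall(String functionName, Map<String, String> arguments) {}

    private final Map<String, Function<Map<String, String>, String>> registry;

    FunctionDispatchSketch(Map<String, Function<Map<String, String>, String>> registry) {
        this.registry = Map.copyOf(registry);
    }

    /** Runs the requested function, rejecting names that were never registered. */
    String dispatch(ToolCall call) {
        Function<Map<String, String>, String> handler = registry.get(call.functionName());
        if (handler == null) {
            throw new IllegalArgumentException("Unknown function: " + call.functionName());
        }
        return handler.apply(call.arguments());
    }

    public static void main(String[] args) {
        Function<Map<String, String>, String> orderStatus =
                arguments -> "SHIPPED (order " + arguments.get("orderId") + ")";
        FunctionDispatchSketch dispatcher =
                new FunctionDispatchSketch(Map.of("orderStatusFunction", orderStatus));
        System.out.println(dispatcher.dispatch(
                new ToolCall("orderStatusFunction", Map.of("orderId", "12345"))));
    }
}
```

Only registered names are callable, which is why registering functions per request keeps the model limited to the operations you intend to expose.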
AI features can get expensive fast. Spring AI integrates with Micrometer to expose token usage metrics:
    spring:
      ai:
        openai:
          chat:
            options:
              max-tokens: 500    # Hard limit on response length, which caps output cost
              temperature: 0.3   # Lower = more deterministic (does not reduce token cost by itself)
    // Spring AI publishes token usage metrics through Micrometer, tagged by model and operation.
    // Recent releases expose them as gen_ai.client.token.usage; check the metric names for your version.
    // Use Grafana to alert when daily token spend exceeds budget

    // Add a caching layer for repeated identical prompts.
    // Key on the prompt itself: hashCode() values can collide and return the wrong cached answer.
    @Cacheable(value = "ai-responses", key = "#prompt")
    public String generateWithCache(String prompt) {
        return chatClient.prompt().user(prompt).call().content();
    }
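To show why the cache key matters, here is a minimal plain-Java sketch of prompt caching that does not use Spring's cache abstraction. PromptCacheSketch is a hypothetical name, and the model is represented by an ordinary Function so the number of paid calls can be counted.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Illustrative cache keyed by the full prompt text rather than its hash code.
final class PromptCacheSketch {

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final AtomicInteger modelCalls = new AtomicInteger();
    private final Function<String, String> model;

    PromptCacheSketch(Function<String, String> model) {
        this.model = model;
    }

    /** Returns the cached response for an identical prompt, calling the model only on a miss. */
    String generate(String prompt) {
        return cache.computeIfAbsent(prompt, p -> {
            modelCalls.incrementAndGet();
            return model.apply(p);
        });
    }

    /** Number of times the underlying model was actually invoked. */
    int modelCalls() {
        return modelCalls.get();
    }

    public static void main(String[] args) {
        PromptCacheSketch cached = new PromptCacheSketch(prompt -> "response to: " + prompt);
        cached.generate("Summarize order 12345");
        cached.generate("Summarize order 12345");
        System.out.println("Model calls: " + cached.modelCalls());
    }
}
```

The strings "Aa" and "BB" share the same hashCode() in Java, so a cache keyed on the hash alone would hand one prompt the other's response; keying on the full text avoids that.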
Spring AI is the right choice when you're building AI features into an existing Spring Boot application and want to stay in the Java ecosystem. It's particularly strong for RAG applications, customer-facing chat features, and structured data extraction from unstructured text.
It's not the right choice for:
For the majority of Java teams adding AI features to existing applications, Spring AI eliminates the need for a separate Python microservice and brings AI into the same codebase, CI pipeline, and deployment model as the rest of the application.
Spring AI brings LLM integration into the Spring Boot programming model with auto-configuration, familiar abstractions, and multi-provider support. The most impactful use cases are RAG for domain-specific Q&A, structured output extraction, and function calling for real-time data access. The programming model is clean, the observability is built in, and the migration path between providers is a configuration change.
Adding AI features often introduces new performance patterns — repeated database lookups for RAG context, missing caches on expensive embedding operations, unoptimized vector search queries. JOptimize helps you catch these issues early.