Microservices Timeout Budgeting in Spring Boot
Microservice architecture offers real benefits in scalability and performance, but it can quickly grow into a spaghetti mess: one service calls another, which in turn calls two more services before a response finally comes back. This works in a perfect world where everything succeeds, but when timeouts happen because of network, database, or processing issues, things can turn ugly quickly.
Some time ago I watched a presentation by Adrian Cockcroft titled "Microservices Retrospective: What We Learned (and Didn't Learn) from Netflix" [1]. Everyone who has worked with microservices knows that Netflix paved the way for the wide adoption of microservices architecture, and Adrian was instrumental on that journey.
One topic in his talk interested me the most, because I had faced a similar issue at work that resulted in request timeouts and retry problems.
Here is the problem as Adrian presented it:
In this scenario, the edge, middle, and sink services are each configured with a 2-second timeout and two attempts (the original call plus one retry). If the middle service fails to respond within 2 seconds, the edge service gives up on that request and tries once more.
However, the middle service is still calling (and retrying) the sink service on behalf of the first request, which the edge service has already abandoned. While the edge service waits for the response to its retry, the middle service keeps working on a request nobody is waiting for, wasting processing time and network traffic.
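To put numbers on the waste, here is a small back-of-the-envelope calculation. It reads the scenario as two attempts per hop, which matches the explanation above, and it ignores backoff and processing time; the `RetryAmplification` class and its methods are illustrative names only.

```java
/** Worst-case fan-out when every hop retries independently (numbers from the scenario above). */
final class RetryAmplification {

    /**
     * attemptsPerHop[i] is how many attempts the caller at hop i makes per inbound request.
     * Returns how many requests reach each downstream service in the worst case.
     */
    static long[] worstCaseCallsPerTier(int[] attemptsPerHop) {
        long[] calls = new long[attemptsPerHop.length];
        long multiplier = 1;
        for (int i = 0; i < attemptsPerHop.length; i++) {
            multiplier *= attemptsPerHop[i];
            calls[i] = multiplier;
        }
        return calls;
    }

    /** Upper bound (ignoring backoff) on how long a caller works on one inbound request. */
    static long busySeconds(int attempts, long timeoutSeconds) {
        return attempts * timeoutSeconds;
    }

    public static void main(String[] args) {
        // Edge -> middle: 2 attempts; middle -> sink: 2 attempts per inbound request
        long[] calls = worstCaseCallsPerTier(new int[] {2, 2});
        System.out.println("middle receives " + calls[0] + ", sink receives " + calls[1]); // middle receives 2, sink receives 4
        // The edge drops its first attempt after 2 s, but middle may keep working on it for up to 4 s
        System.out.println("middle busy for up to " + busySeconds(2, 2) + " s"); // middle busy for up to 4 s
    }
}
```

The effect compounds with depth: with three hops of three attempts each, `worstCaseCallsPerTier(new int[] {3, 3, 3})` gives 27 calls at the deepest service for a single user request.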
According to Adrian, the solution is a Timeout Budget. The edge service should have a larger timeout, and the downstream services should stop retrying once the edge timeout has been exceeded; if the budget is already exceeded when a request arrives, they should fail fast right at the start.
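A minimal sketch of the budget check itself, assuming the edge service hands down a start timestamp and a budget in seconds. The `TimeoutBudget` class is hypothetical; it injects a `java.time.Clock` so the check can be tested against a fixed time, and it compares in milliseconds because truncating 2,900 ms of elapsed time to whole seconds would report 2 s and miss the overrun against a 2-second budget.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;

/** Hypothetical budget check: edge start time + budget, evaluated against an injectable clock. */
final class TimeoutBudget {
    private final long requestStartMillis; // REQUEST_START_MILLIS, set by the edge service
    private final long budgetSeconds;      // TIMEOUT_BUDGET_SECONDS, set by the edge service
    private final Clock clock;             // Clock.systemUTC() in production, Clock.fixed(...) in tests

    TimeoutBudget(long requestStartMillis, long budgetSeconds, Clock clock) {
        this.requestStartMillis = requestStartMillis;
        this.budgetSeconds = budgetSeconds;
        this.clock = clock;
    }

    /** Milliseconds left before the edge gives up; negative once the budget is exceeded. */
    long remainingMillis() {
        long elapsedMillis = clock.millis() - requestStartMillis;
        return budgetSeconds * 1_000 - elapsedMillis;
    }

    /** Same rule as "elapsed > budget", without truncating to whole seconds. */
    boolean isExceeded() {
        return remainingMillis() < 0;
    }

    public static void main(String[] args) {
        long start = 1_700_000_000_000L; // hypothetical edge timestamp
        Clock later = Clock.fixed(Instant.ofEpochMilli(start + 2_900), ZoneOffset.UTC);
        TimeoutBudget budget = new TimeoutBudget(start, 2, later);
        System.out.println(budget.remainingMillis()); // -900
        System.out.println(budget.isExceeded());      // true
    }
}
```

A negative remainder is the fail-fast signal: a service receiving such a request can reject it immediately instead of doing work the edge will discard.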
I wanted to implement this in Spring Boot, since it is one of the most widely used Java microservices frameworks. My solution uses two headers:
- REQUEST_START_MILLIS: the time at which the edge service started handling the request, in milliseconds since the UNIX epoch (January 1, 1970 00:00:00 UTC).
- TIMEOUT_BUDGET_SECONDS: the total timeout budget, in seconds, assigned by the edge service.
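As an illustration of how an edge service might originate these headers, here is a sketch using the JDK's `java.net.http.HttpRequest` builder; the `EdgeRequestFactory` name, the URL, and the 5-second budget are hypothetical, and setting the edge's own request timeout equal to the budget is an assumption. One practical caveat with these header names: nginx, for example, ignores request headers containing underscores unless `underscores_in_headers on` is set, so any proxy between services should be checked.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

/** Hypothetical edge-side helper that stamps a downstream request with the budget headers. */
final class EdgeRequestFactory {
    static final String REQUEST_START_MILLIS_HEADER = "REQUEST_START_MILLIS";
    static final String TIMEOUT_BUDGET_SECONDS_HEADER = "TIMEOUT_BUDGET_SECONDS";

    static HttpRequest userRequest(URI uri, long startMillis, long budgetSeconds) {
        return HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(budgetSeconds)) // assumption: edge waits exactly its budget
                .header(REQUEST_START_MILLIS_HEADER, Long.toString(startMillis))
                .header(TIMEOUT_BUDGET_SECONDS_HEADER, Long.toString(budgetSeconds))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Start time is taken once, when the edge first receives the user's request
        HttpRequest request = userRequest(URI.create("http://user-service/users/42"), System.currentTimeMillis(), 5);
        System.out.println(request.headers().map());
    }
}
```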
Using Spring Retry and Spring Web within Spring Boot, I wrote the following components to implement this:
1. TimeoutBudgetingContext: stores the values of these headers per HTTP request, in ThreadLocal fields.
```java
public final class TimeoutBudgetingContext {
    public static final String TIMEOUT_BUDGET_SECONDS_HEADER = "TIMEOUT_BUDGET_SECONDS";
    public static final String REQUEST_START_MILLIS_HEADER = "REQUEST_START_MILLIS";

    private static final TimeoutBudgetingContext CONTEXT = new TimeoutBudgetingContext();

    // One value per request-handling thread
    private final ThreadLocal<Long> timeoutBudgetSeconds = new ThreadLocal<>();
    private final ThreadLocal<Long> requestStartMillis = new ThreadLocal<>();

    private TimeoutBudgetingContext() {
    }

    public static TimeoutBudgetingContext getContext() {
        return CONTEXT;
    }

    public void setTimeoutBudgetSeconds(Long timeoutBudgetSeconds) {
        this.timeoutBudgetSeconds.set(timeoutBudgetSeconds);
    }

    public Long getTimeoutBudgetSeconds() {
        return this.timeoutBudgetSeconds.get();
    }

    public void setRequestStartMillis(Long requestStartMillis) {
        this.requestStartMillis.set(requestStartMillis);
    }

    public Long getRequestStartMillis() {
        return this.requestStartMillis.get();
    }

    public boolean isTimeoutBudgetExceeded() {
        Long start = requestStartMillis.get();
        Long budget = timeoutBudgetSeconds.get();
        if (start == null || budget == null) {
            return false; // no budget supplied: never fail fast
        }
        // Compare in milliseconds; dividing by 1000 first would truncate 2.9 s to 2 s
        return System.currentTimeMillis() - start > budget * 1000;
    }
}
```
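Because servlet containers reuse worker threads, a value left in a `ThreadLocal` by one request is still visible to the next request handled by the same thread. The plain-Java sketch below (all names hypothetical, with a single-thread executor standing in for a reused container thread) shows that leak, and how calling `remove()` once the request finishes avoids it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical demo: a pooled thread keeps ThreadLocal values unless they are removed. */
final class ThreadLocalReuseDemo {
    private static final ThreadLocal<Long> BUDGET_SECONDS = new ThreadLocal<>();

    /** Simulates one request that sets the value only when its "header" is present. */
    static Long handle(Long headerValue, boolean removeAfter) {
        try {
            if (headerValue != null) {
                BUDGET_SECONDS.set(headerValue);
            }
            return BUDGET_SECONDS.get(); // what this request's code would see
        } finally {
            if (removeAfter) {
                BUDGET_SECONDS.remove();
            }
        }
    }

    /** Runs two requests on the same worker thread and returns what the second one sees. */
    static Long secondRequestSees(boolean removeAfter) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            worker.submit(() -> handle(2L, removeAfter)).get();          // request 1 carries a budget
            return worker.submit(() -> handle(null, removeAfter)).get(); // request 2 carries none
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("without remove(): " + secondRequestSees(false)); // without remove(): 2
        System.out.println("with remove():    " + secondRequestSees(true));  // with remove():    null
    }
}
```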
2. TimeoutBudgetingFilter: a servlet filter that extracts the headers and stores them in TimeoutBudgetingContext for each HTTP request. If the timeout budget is already exceeded, the filter fails fast without ever reaching a resource endpoint, saving network, memory, and processing resources.
```java
import jakarta.servlet.Filter;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletRequest;
import jakarta.servlet.ServletResponse;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.http.HttpStatus;
import org.springframework.http.MediaType;
import org.springframework.stereotype.Component;

// Jakarta Servlet imports assume Spring Boot 3; Spring Boot 2 uses javax.servlet instead
@Component
public class TimeoutBudgetingFilter implements Filter {
    private static final Logger LOGGER = LoggerFactory.getLogger(TimeoutBudgetingFilter.class);
    private static final String TIMEOUT_BUDGET_FILTER_DISABLED_HEADER = "TIMEOUT_BUDGET_FILTER_DISABLED";

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        TimeoutBudgetingContext context = TimeoutBudgetingContext.getContext();
        boolean filterEnabled = true;
        try {
            if (request instanceof HttpServletRequest httpServletRequest) {
                // Always overwrite (null if absent) so a pooled thread never sees a previous request's values
                context.setTimeoutBudgetSeconds(readLong(httpServletRequest, TimeoutBudgetingContext.TIMEOUT_BUDGET_SECONDS_HEADER));
                context.setRequestStartMillis(readLong(httpServletRequest, TimeoutBudgetingContext.REQUEST_START_MILLIS_HEADER));
                filterEnabled = httpServletRequest.getHeader(TIMEOUT_BUDGET_FILTER_DISABLED_HEADER) == null;
            }
            if (filterEnabled && context.isTimeoutBudgetExceeded()
                    && response instanceof HttpServletResponse httpServletResponse) {
                // Fail fast: the edge service has already given up on this request
                LOGGER.info("Timeout Budget Exceeded");
                httpServletResponse.setStatus(HttpStatus.REQUEST_TIMEOUT.value());
                httpServletResponse.setContentType(MediaType.APPLICATION_JSON_VALUE);
                httpServletResponse.getOutputStream()
                        .write("{\"message\":\"Timeout Budget Exceeded\"}".getBytes(StandardCharsets.UTF_8));
                httpServletResponse.getOutputStream().flush();
                return;
            }
            chain.doFilter(request, response);
        } finally {
            // Reset per-request state before the thread returns to the container's pool
            context.setTimeoutBudgetSeconds(null);
            context.setRequestStartMillis(null);
        }
    }

    // Returns null when the header is absent or not a valid long
    private static Long readLong(HttpServletRequest request, String name) {
        String input = request.getHeader(name);
        if (input == null) {
            return null;
        }
        try {
            return Long.parseLong(input.trim());
        } catch (NumberFormatException e) {
            LOGGER.warn("Ignoring non-numeric value for {}: {}", name, input);
            return null;
        }
    }
}
```
3. TimeoutBudgetRestTemplateInterceptor: a RestTemplate interceptor that carries the headers received from the edge service forward on every outgoing downstream request.
```java
import java.io.IOException;
import org.springframework.http.HttpRequest;
import org.springframework.http.client.ClientHttpRequestExecution;
import org.springframework.http.client.ClientHttpRequestInterceptor;
import org.springframework.http.client.ClientHttpResponse;

// Must be registered on each RestTemplate, e.g. restTemplate.getInterceptors().add(new TimeoutBudgetRestTemplateInterceptor())
public class TimeoutBudgetRestTemplateInterceptor implements ClientHttpRequestInterceptor {
    @Override
    public ClientHttpResponse intercept(HttpRequest request, byte[] body, ClientHttpRequestExecution execution)
            throws IOException {
        TimeoutBudgetingContext context = TimeoutBudgetingContext.getContext();
        // set() rather than add() so a header is never sent twice
        if (context.getRequestStartMillis() != null) {
            request.getHeaders().set(TimeoutBudgetingContext.REQUEST_START_MILLIS_HEADER,
                    String.valueOf(context.getRequestStartMillis()));
        }
        if (context.getTimeoutBudgetSeconds() != null) {
            request.getHeaders().set(TimeoutBudgetingContext.TIMEOUT_BUDGET_SECONDS_HEADER,
                    String.valueOf(context.getTimeoutBudgetSeconds()));
        }
        return execution.execute(request, body);
    }
}
```
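4. TimeoutBudgetingRetryInterceptor: a Spring Retry interceptor that avoids unnecessary retries when the edge service has already abandoned the request because its timeout was exceeded. Note that @Retryable(interceptor = ...) replaces Spring Retry's default retry interceptor, so the named bean is responsible for performing the retries as well as the budget check.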
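```java
import org.aopalliance.intercept.MethodInterceptor;
import org.aopalliance.intercept.MethodInvocation;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.retry.RetryContext;
import org.springframework.retry.interceptor.RetryInterceptorBuilder;
import org.springframework.retry.policy.SimpleRetryPolicy;
import org.springframework.stereotype.Component;

// Bean name must match @Retryable(interceptor = "timeoutBudgetingRetryInterceptor")
@Component("timeoutBudgetingRetryInterceptor")
public class TimeoutBudgetingRetryInterceptor implements MethodInterceptor {
    private static final Logger LOGGER = LoggerFactory.getLogger(TimeoutBudgetingRetryInterceptor.class);
    private static final int MAX_ATTEMPTS = 2; // assumption: matches the 2 attempts per hop in the scenario

    // Spring Retry's stateless interceptor, with a policy that also checks the timeout budget
    private final MethodInterceptor retryDelegate = RetryInterceptorBuilder.stateless()
            .retryPolicy(new SimpleRetryPolicy(MAX_ATTEMPTS) {
                @Override
                public boolean canRetry(RetryContext context) {
                    // The first attempt is gated in invoke(); later attempts also need budget left
                    boolean budgetLeft = context.getRetryCount() == 0
                            || !TimeoutBudgetingContext.getContext().isTimeoutBudgetExceeded();
                    return budgetLeft && super.canRetry(context);
                }
            })
            .build();

    @Override
    public Object invoke(MethodInvocation invocation) throws Throwable {
        if (TimeoutBudgetingContext.getContext().isTimeoutBudgetExceeded()) {
            LOGGER.info("Timeout Budget Exceeded, skipping {}", invocation.getMethod().getName());
            return null; // the caller treats null as "no data" and returns a partial response
        }
        // If the budget runs out between attempts, the last exception is rethrown to the caller
        return retryDelegate.invoke(invocation);
    }
}
```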
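The retry decision can also be seen without Spring. The hypothetical `BudgetAwareRetry.call` helper below is a plain-Java sketch, not Spring Retry's API: before every attempt it checks whether the budget is spent and, if so, returns a fallback instead of calling the downstream service again; if all attempts fail within budget, it rethrows the last failure. In Spring Retry itself, the equivalent hook is a `RetryPolicy` whose `canRetry(RetryContext)` also consults the budget.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/** Hypothetical helper: retry up to maxAttempts, but never start an attempt once the budget is spent. */
final class BudgetAwareRetry {

    static <T> T call(Supplier<T> action, int maxAttempts, BooleanSupplier budgetExceeded, T fallback) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (budgetExceeded.getAsBoolean()) {
                return fallback; // the edge has given up: skip the call entirely
            }
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e; // e.g. a downstream timeout; try again if attempts remain
            }
        }
        throw last; // attempts exhausted within budget: surface the last failure
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        Supplier<String> paymentCall = () -> {
            attempts[0]++;
            throw new IllegalStateException("payment service timed out");
        };
        // Pretend the edge's budget is spent as soon as the first attempt has failed
        BooleanSupplier budgetExceeded = () -> attempts[0] >= 1;
        String result = call(paymentCall, 2, budgetExceeded, "no payment history");
        System.out.println(result + " after " + attempts[0] + " attempt(s)"); // no payment history after 1 attempt(s)
    }
}
```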
5. TimeoutBudgetingRetryable: a custom annotation that marks methods to be retried only while the edge service's timeout budget has not been exceeded.
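```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import org.springframework.retry.annotation.Retryable;

// Requires @EnableRetry on a configuration class so that @Retryable is processed
@Target({ ElementType.METHOD, ElementType.TYPE })
@Retention(RetentionPolicy.RUNTIME)
@Documented
@Retryable(interceptor = "timeoutBudgetingRetryInterceptor")
public @interface TimeoutBudgetingRetryable {
}
```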
Finally, with everything in place, I tested it with two microservices: users and payments. The user service calls the payment service to get a user's payment history, but should only do so if the edge service's timeout budget has not been exceeded.
Demo
The edge service calls the user service with the REQUEST_START_MILLIS and TIMEOUT_BUDGET_SECONDS headers. If the time elapsed since REQUEST_START_MILLIS is greater than TIMEOUT_BUDGET_SECONDS, the user service rejects the request outright with 408 Request Timeout.
Conversely, if the request reaches the user service within the timeout budget but the initial call to the payment service times out, the retry must be skipped when the edge service has already given up on the request. In that case, the user service stops retrying and sends a partial success (without payment history) back to the edge service.
If everything completes within the timeout budget, the complete response is sent back to the edge service.
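On the user-service side, the partial response can be modelled by treating payment history as optional. The sketch below is hypothetical: the record and method names, and catching `RuntimeException` broadly, are assumptions. With `RestTemplate`, timeouts typically surface as `ResourceAccessException`, which would be the narrower exception to catch in a real service.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical user-service response; partial is true when payment history could not be fetched. */
record UserResponse(String userId, String name, List<String> paymentHistory, boolean partial) {}

final class UserResponseAssembler {

    static UserResponse assemble(String userId, String name, Supplier<List<String>> paymentLookup) {
        List<String> payments;
        try {
            payments = paymentLookup.get(); // null when the budget check skipped the call
        } catch (RuntimeException e) {
            payments = null;                // payment call failed and no retry was made
        }
        return payments == null
                ? new UserResponse(userId, name, List.of(), true)   // partial success
                : new UserResponse(userId, name, payments, false);  // complete response
    }

    public static void main(String[] args) {
        System.out.println(assemble("42", "Alice", () -> List.of("payment-1")));
        System.out.println(assemble("42", "Alice", () -> { throw new IllegalStateException("payment service timed out"); }));
    }
}
```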
The source code is available on GitHub.