How to implement observability with Elasticsearch

3 steps for planning an observable Java application deployment and how to implement it with Elastic Observability


The concept of observability has been around for decades, but it’s a relative newcomer to the world of IT infrastructure. So what is observability in this context? It’s the state of having all of the information about the internals of a system so when an issue occurs you can pinpoint the problem and take the right action to resolve it.

Notice that I said state. Observability is not a tool or a set of tools — it’s a property of the system that we are managing. In this article, I will walk through how to plan and implement an observable deployment including API testing and the collection of logs, metrics, and application performance monitoring (APM) data. I’ll also direct you to a number of free, self-paced training courses that help you develop the skills needed for achieving observable systems with the Elastic Stack.

Three steps to observability

These are the three steps toward observability presented in this article:

  1. Plan for success
    1. Collect requirements
    2. Identify data sources and integrations
  2. Deploy Elasticsearch and Kibana
  3. Collect data from systems and your services
    1. Logs
    2. Metrics
    3. Application performance monitoring (APM)
    4. API synthetic testing

Plan for success

I have been doing fault and performance management for the past twenty years. In my experience, to reliably reach a state of observability, you have to do your homework before getting started. Here’s a condensed list of a few steps I take to set up my deployments for success:

Goals: Talk to everyone and write the goals down

Talk to your stakeholders and identify the goals: “We will know if the user is having a good or bad experience using our service;” “The solution will improve root cause analysis by providing distributed traces;” “When you page me in the middle of the night you will give me the info I need to find the problem;” etc.

Data: Make a list of what data you need and who has it

Make a list of the necessary information (data and metadata) needed to support the goals. Think beyond IT information — include whatever data you need to understand what is happening. For example, if Ops is checking the Weather Channel during their workflow, then consider adding weather data to your list of required information. Snoop around the best problem solver’s desk and find out what they’re looking at during an outage (and how they like their coffee). If your organization does postmortems, take a look at the data that the people bring into the room; if it’s valuable to determine the root cause at a finger-pointing session, then it’s so much more valuable in Ops before an outage.

Fix: Think about the solution and information that can speed it up

If Ops needs a hostname, a runbook, some asset info, and a process name to fix the problem, then have that data available in your observability solution and send it over when you page them. Add the required bits of information to the list you started in the previous step.
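As a rough sketch of that idea, an alert payload can be enriched with runbook and asset details before the page goes out, so the on-call engineer gets everything in one place. The field names and the inline asset inventory here are hypothetical examples, not part of any particular alerting product:

```python
# Sketch: enrich an alert with the context Ops needs to act on it.
# The asset inventory and all field names are hypothetical examples.

ASSET_INVENTORY = {
    "web-01": {
        "runbook": "https://wiki.example.com/runbooks/web-tier",
        "owner": "platform-team",
        "location": "us-central1",
    }
}

def enrich_alert(alert: dict) -> dict:
    """Attach runbook and asset info so the page is actionable."""
    asset = ASSET_INVENTORY.get(alert["hostname"], {})
    return {**alert, **asset}

alert = {"hostname": "web-01", "process": "java", "message": "heap usage > 90%"}
page = enrich_alert(alert)
print(page["runbook"])
```

In practice the inventory lookup would hit your CMDB or an enrich pipeline rather than an in-memory dict, but the principle is the same: attach the fix-it data at alert time, not during the outage.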

A good starting point

At this point, you have a list of data that you need so that when an issue occurs you can pinpoint the problem and take the right action to resolve it. That list might look something like this:

Service data

  • User experience data for my service
    • Response time of the application per transaction and the components that make up the application (e.g., the front end and the database)
    • Proper API functionality via synthetic testing
  • Performance data for my infrastructure
    • Operating system metrics
    • Database metrics
  • Logs from servers and apps

Inbound integrations

  • History of past incidents
  • Runbooks
  • Asset info
  • Weather or other “non-IT” data

Outbound integrations

  • Incident management integration for alerting

Elastic Observability

The Elastic Stack — Elasticsearch, Kibana, Beats, and Logstash; formerly known as the ELK Stack — is a set of powerful open source tools for searching, analyzing, and visualizing data in real time. The Elastic Stack is widely used to centralize logs from operational systems. Over time, Elastic has added products for metrics, APM, and uptime monitoring — this is the Elastic Observability solution.

The value of Elastic Observability is that it brings together all the types of data you need to help you make the right operational decisions and achieve a state of observability. Let’s jump into a scenario to demonstrate how to put Elastic Observability into action.


Scenario

I have a simple application to manage. It consists of a Spring Boot application running on a Linux VM in Google Cloud Platform. The application exposes two API endpoints and has a MariaDB back end. You can find the application in the Spring Guides. I have created an Elasticsearch Service deployment in Elastic Cloud and I will follow the agent install tutorials right in Kibana, the Elasticsearch analysis and management UI. The open source agents that will be used are:

  • Filebeat for logs
  • Metricbeat for metrics
  • Heartbeat for API testing and response time monitoring
  • Elastic APM Java Agent for distributed tracing of the application

Note: This guide is written for a specific application based on Spring Boot and MySQL. If you have something else that you want to collect logs, metrics, and APM traces from, then you should be able to modify these instructions to do what you want. When you open up Kibana you will be greeted with a long list of out-of-the-box observability integrations. 

Implementation

In this article I will go over the steps to get the basics done, and then in future articles I’ll dive into best practices and some of the integrations. Let’s walk through a simple deployment. 

Hosted Elasticsearch Service

To follow along in this guide, create a deployment in Elasticsearch Service on Elastic Cloud (a trial account is free). Once you sign up, watch and follow the steps in the Deploy Elasticsearch in 3 minutes or less video. A few minutes later you will have a cluster that you can use to follow along with the rest of this article. Download the password that is presented to you; you will use that to log in to Kibana and to configure the Beats. The screenshots are from version 7.6 of the Elastic Stack — your UI may look slightly different based on your version.


If you forget the password, it can be reset from the Elasticsearch Service console.

Kibana

Kibana is the visualization and management tool of the Elastic Stack. Kibana will guide us through installing and configuring the Beats and Elastic APM Java Agent. 

Launch Kibana from the deployment details and log in with the elastic username and password.

The instructions for everything that you need to install can be found right in your Kibana instance. Throughout this guide I will direct you to Kibana Home; you can get there by clicking on the Kibana icon in the top left of any Kibana page.


Add integrations

This is the list of what will be collected:

  • Logs from the infrastructure and MariaDB
  • Metrics from the infrastructure and MariaDB
  • API test results and response time measurements
  • Distributed tracing of the application including the database

Kibana guides you through adding logs, metrics, and APM. This video shows how to add MySQL logs, and once you know how to do that you can follow the same process to add metric and APM data. 

Logs from my infrastructure and MariaDB

Both MariaDB and MySQL provide logs. I am interested in the error log and the slow log. By default the slow log is not produced. To configure these logs, have a look in the MariaDB docs. For my deployment the configuration file is /etc/mysql/mariadb.conf.d/50-server.cnf. Here are the relevant parts:

# This group is only read by MariaDB servers, not by MySQL.
# If you use the same .cnf file for MySQL and MariaDB,
# you can put MariaDB-only options here
[mariadb]
slow_query_log
#
# * Logging and Replication
#
# Both location gets rotated by the cronjob.
# Be aware that this log type is a performance killer.
# As of 5.1 you can enable the log at runtime!
#general_log_file        = /var/log/mysql/mysql.log
#general_log             = 1
#
# Error log - should be very few entries.
#
log_error = /var/log/mysql/error.log
#
# Enable the slow query log to see queries with especially long duration
slow_query_log_file = /var/log/mysql/mariadb-slow.log
long_query_time = 0.5
log_slow_rate_limit = 1
log_slow_verbosity = query_plan
#log-queries-not-using-indexes

To enable the slow query log, uncomment the lines in the slow query section and adjust the long query time as desired (the default is ten seconds). 

A quick test of the configuration is to force a slow query with a SELECT SLEEP():

$ sudo -- sh -c 'echo "select sleep(2);" | mysql'

sleep(2)
0

This results in a record being added to the slow log:

# Time: 200427 15:19:59
# User@Host: root[root] @ localhost []
# Thread_id: 13  Schema:   QC_hit: No
# Query_time: 2.000173  Lock_time: 0.000000  Rows_sent: 1  Rows_examined: 0
# Rows_affected: 0
SET timestamp=1588000799;
select sleep(2);
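For reference, this is roughly the kind of field extraction Filebeat's mysql module performs on each slow-log entry before shipping it to Elasticsearch. This is a simplified sketch for illustration, not the module's actual implementation:

```python
import re

# The slow-log entry produced by the SELECT SLEEP(2) test above.
entry = """\
# Time: 200427 15:19:59
# User@Host: root[root] @ localhost []
# Thread_id: 13  Schema:   QC_hit: No
# Query_time: 2.000173  Lock_time: 0.000000  Rows_sent: 1  Rows_examined: 0
# Rows_affected: 0
SET timestamp=1588000799;
select sleep(2);
"""

def parse_slowlog(text: str) -> dict:
    """Pull a few structured fields out of one slow-log entry (sketch only)."""
    fields = {}
    m = re.search(r"# User@Host: (\S+)", text)
    if m:
        fields["user"] = m.group(1)
    m = re.search(r"# Query_time: ([\d.]+)\s+Lock_time: ([\d.]+)", text)
    if m:
        fields["query_time"] = float(m.group(1))
        fields["lock_time"] = float(m.group(2))
    # The statement itself follows the SET timestamp line.
    m = re.search(r"SET timestamp=\d+;\n(.+)", text, re.S)
    if m:
        fields["query"] = m.group(1).strip()
    return fields

print(parse_slowlog(entry))
# e.g. {'user': 'root[root]', 'query_time': 2.000173, ...}
```

The point of using the Filebeat module is that this parsing, plus index templates and dashboards, is done for you.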

Install Filebeat

Follow the directions in Kibana Home > Add log data > MySQL logs. When you are instructed to enable and configure the mysql module, refer to these details for additional information:

  • The filebeat modules enable command takes a list of modules, so save some steps and add system and auditd to the list:
    sudo filebeat modules enable mysql system auditd
  • When you are instructed to modify the settings in the modules.d/mysql.yml file, note that the slow log I added is not in the default location, so edit the file modules.d/mysql.yml and specify the location of the slow log as an entry in the var.paths array:
- module: mysql
  # Error logs
  error:
    enabled: true

    # Set custom paths for the log files. If left empty,
    # Filebeat will choose the paths depending on your OS.
    #var.paths:

  # Slow logs
  slowlog:
    enabled: true

    # Set custom paths for the log files. If left empty,
    # Filebeat will choose the paths depending on your OS.
    var.paths:
      - /var/log/mysql/mariadb-slow.log

Run the setup command and start Filebeat as directed in Kibana Home > Add log data > MySQL logs. At the bottom of that page is a link to the MySQL dashboard. You should also look at the [Filebeat System] Syslog dashboard ECS and [Filebeat System] Sudo commands ECS dashboards, which you can find by searching the dashboard list.

API test results and response time measurements

In order to measure proper functionality of the API endpoints we need to POST some URL encoded data, read the response, and verify it. This is often done manually by using curl or the Postman API Client. By automating the testing with Heartbeat, the response time and test results are available alongside the logs, APM, and other metrics for the service. Heartbeat monitors the availability of services by testing API endpoints for proper responses, checking websites for content and response codes, verifying ICMP pings, etc. 
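The manual check that Heartbeat automates looks like this: POST url-encoded form data, then verify the status code and that the body contains the expected text. In this sketch a throwaway local HTTP server stands in for the real /demo/add endpoint, so the example is self-contained:

```python
# Sketch of the API check Heartbeat will automate. The handler below is a
# hypothetical local stand-in for the Spring Boot /demo/add endpoint.
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class DemoHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Consume the request body, then answer like the demo app would.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Saved")

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

data = urllib.parse.urlencode(
    {"name": "first", "email": "someemail@someemailprovider.com"}
).encode()
req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}/demo/add",
    data=data,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)
with urllib.request.urlopen(req) as resp:
    # Match either "Saved" or "saved", like the body list in the monitor config.
    ok = resp.status == 200 and b"aved" in resp.read()
server.shutdown()
print("check passed" if ok else "check failed")
```

Heartbeat runs exactly this kind of check on a schedule and records both the pass/fail result and the response time.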

Install Heartbeat

Follow the instructions in Kibana Home > Add metric data > Uptime monitors. When you are instructed to edit the heartbeat.monitors setting in the heartbeat.yml file, replace the existing monitor with this API test:

# Configure monitors inline
heartbeat.monitors:
- type: http
  name: SpringToDoApp
  schedule: '@every 5s'
  urls: ["http://roscigno-obs:8080/demo/add"]
  check.request:
    method: POST
    headers:
      'Content-Type': 'application/x-www-form-urlencoded'
    body: "name=first&email=someemail%40someemailprovider.com"
  check.response:
    status: 200
    body:
      - Saved
      - saved
  response.include_body: 'always'

Run the setup command and start Heartbeat as directed in Kibana Home > Add metric data > Uptime monitors. At the bottom of that page is a link to the Uptime app, where the test results and response times appear.

Distributed tracing of the application including the database

Elastic APM instruments your applications to ship performance metrics to Elasticsearch for visualization in Kibana with the APM app. By adding the APM jar file to the command used to launch the application I get distributed tracing so I can see where my app is spending time (whether it is in the Java code or in the calls to MariaDB).

The process is provided in Kibana Home > Add APM > Java and consists of downloading the jar file and using the Java instrumentation API to start the agent.


I prefer to use environment variables, so I take the details provided and set the environment variables:

$ cat environment
export ELASTIC_APM_SERVER_URL=https://1530f7c8afdf402eb281750f0b127bc4.apm.us-central1.gcp.cloud.es.io:443
export ELASTIC_APM_SECRET_TOKEN=WjyW67R0eSWDhILWDD
export ELASTIC_APM_SERVICE_NAME=winter-mysql
export ELASTIC_APM_APP_PACKAGES=com.example

I am launching the app via ./mvnw spring-boot:run and sourcing the environment variables in the Maven Wrapper:

exec "$JAVACMD" \
  -javaagent:./elastic-apm-agent.jar \
  -Delastic.apm.service_name=${ELASTIC_APM_SERVICE_NAME:-demo-app} \
  -Delastic.apm.server_url=${ELASTIC_APM_SERVER_URL} \
  -Delastic.apm.secret_token=${ELASTIC_APM_SECRET_TOKEN} \
  -Delastic.apm.application_packages=${ELASTIC_APM_APP_PACKAGES:-com.example} \
  $MAVEN_OPTS \
  -classpath "$MAVEN_PROJECTBASEDIR/.mvn/wrapper/maven-wrapper.jar" \
  "-Dmaven.home=${M2_HOME}" \
  "-Dmaven.multiModuleProjectDirectory=${MAVEN_PROJECTBASEDIR}" \
  ${WRAPPER_LAUNCHER} "$@"
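The `${VAR:-default}` shell syntax above substitutes a default when the variable is unset or empty, which is what lets the wrapper run even without the environment file. As a sketch, the same fallback pattern in code looks like this:

```python
import os

# Shell's ${ELASTIC_APM_SERVICE_NAME:-demo-app} falls back when the variable
# is unset *or* empty; `or` reproduces that, since "" is falsy in Python.
service_name = os.environ.get("ELASTIC_APM_SERVICE_NAME") or "demo-app"
print(service_name)
```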

As soon as the application is started, the API tests set up earlier with Heartbeat will result in traces in Elasticsearch, visible in the APM app in Kibana.