Configuring EKS Observability with Grafana Loki and OpenEBS LVM Local PV

This blog guides users through setting up persistent storage for logs generated in Grafana Loki. The demo relies on the OpenEBS Logical Volume Manager (LVM) to collect the logs and metrics of an EKS cluster, using a Local PV volume for persistent storage.

Overview

Modern applications are designed to generate system logs to help with event analysis, debugging, and troubleshooting. Besides tracking defects in the application, logs offer insight into a user’s experience while using the software. Since applications and underlying platforms generate log data continuously, operating log management systems at scale can quickly become overwhelming. Some systems address this by indexing only a small portion of log metadata, which keeps log management highly horizontally scalable.

Grafana Loki is a horizontally scalable, multi-tenant log management system that relies on log metadata indexing to simplify observability. With Loki, log data is compressed and stored in object or file storage systems, after which the log manager indexes various low-cardinality fields. When deployed as a StatefulSet, multiple Loki replicas can sustain the failure or reduced availability of a single replica.

OpenEBS includes a Logical Volume Manager (LVM) CSI driver that allows you to create, manage, and attach storage devices directly on the nodes of your cluster without relying on external volumes. OpenEBS LVM Local PV provides a persistent storage solution for Loki that retains the benefits of local volumes while letting you efficiently add a disk to the LVM volume group and scale the provisioned persistent volume.
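As a sketch of how such storage might be exposed to Loki, the following StorageClass binds claims to an LVM volume group managed by the OpenEBS driver. The names `openebs-lvmpv` and `lvmvg` are assumptions for illustration; the volume group must already exist on the node.

```yaml
# Hypothetical StorageClass for OpenEBS LVM Local PV
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-lvmpv
provisioner: local.csi.openebs.io
allowVolumeExpansion: true   # allows resizing provisioned volumes in place
parameters:
  storage: "lvm"
  volgroup: "lvmvg"          # must match a volume group created on the node
```

A PVC referencing this StorageClass would then be provisioned from the node-local volume group rather than from external storage.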

This guide explores how OpenEBS LVM Local PV Volumes can be used to store logs collected by Grafana Loki for an EKS cluster. 

Deep Dive Into Log Aggregation with Loki

Log aggregation allows the centralized collection of log files from numerous services and applications. This sets the foundation for further log management functions, such as searching, tagging, and parsing log data. Log aggregation also makes it possible to compare performance across systems and applications, extract insightful metrics, and identify potential defects before they cause an outage.

Some benefits of implementing effective log aggregation mechanisms include:

  • Makes it easier to identify patterns across disparate systems
  • Enables teams to monitor logs in real time for immediate insights
  • Simplifies the search for specific logs/events, helping to optimize queries through granular control
  • Simplifies auditing for compliance
  • Enables faster root cause analysis

The following section explores how Grafana Loki simplifies log aggregation by collecting logs and metrics of Kubernetes resources. 

Benefits of Integrating Grafana Loki for Observability 

The Loki datastore is optimized to hold log data through metadata indexing. Unlike other logging platforms that index the original log message, Loki builds indexes from labels. The Loki log aggregation platform offers several features out of the box that make it cost-effective and easy to operate. These include:

Efficient Usage of Memory

Since Loki indexes labels rather than entire log messages, the log index is significantly smaller than in other log management platforms. This means that the platform requires less memory for indexing, making it less costly to operate. 
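To make the intuition concrete, here is a toy Python sketch (not Loki's actual implementation) contrasting a label index with a full-text index over the same log lines:

```python
from collections import defaultdict

# A handful of sample log entries: each has low-cardinality labels
# plus a free-form message line.
logs = [
    {"labels": {"app": "nginx", "level": "error"},
     "line": "upstream timed out while reading response"},
    {"labels": {"app": "nginx", "level": "info"},
     "line": "GET /healthz 200"},
    {"labels": {"app": "api", "level": "error"},
     "line": "db connection refused"},
]

# Label index: one entry per unique (label, value) pair.
label_index = defaultdict(list)
# Full-text index: one entry per unique word across every message.
text_index = defaultdict(list)

for i, log in enumerate(logs):
    for pair in log["labels"].items():
        label_index[pair].append(i)
    for word in log["line"].split():
        text_index[word].append(i)

print(len(label_index))  # 4 distinct label pairs
print(len(text_index))   # 12 distinct words, and growing with every message
```

Label sets repeat across millions of log lines, so the label index stays small, while a full-text index grows with the vocabulary of the messages themselves.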

Flexible Query Language

Loki uses LogQL to help generate queries against collected logs. Similar to Prometheus’ PromQL, LogQL allows accessing metrics from log data, thereby helping to seamlessly perform multiple functions beyond log aggregation.
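For illustration, two hedged LogQL examples (the `app` label is an assumption about how the streams were labeled): the first filters a stream for error lines, and the second derives a per-second rate metric from the same filter, mirroring PromQL's `rate` function.

```logql
{app="nginx"} |= "error"

rate({app="nginx"} |= "error" [5m])
```

The second query is what lets Loki act as a metrics source as well as a log store.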

Platform Flexibility

Loki supports integration with most popular platforms through plugins. This makes it easy to integrate Grafana Loki log aggregation into the observability stack without having to switch existing platforms. 

Scalability

Loki works well in single-process mode, in which all of its microservice components run inside a single process. While single-process mode is appropriate for small-scale deployments such as test environments, the platform is also built to handle large-scale deployments by scaling out. In large installations, the microservice components can be split into separate processes so that each can be scaled individually.
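In Loki's configuration, this choice surfaces as the `target` setting; a minimal sketch (file name and comments are illustrative):

```yaml
# loki.yaml (excerpt)
# Single-process mode: run every component inside one binary.
target: all
# For large-scale installations, run one component per process instead,
# e.g. target: ingester, target: distributor, or target: querier,
# and scale each deployment independently.
```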

Multi-tenancy

The Grafana Loki platform is built to let multiple tenants share a single Loki instance. Loki completely isolates the data of distinct tenants by requiring each request to carry a tenant ID, which the shipping agent assigns to incoming streams.
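In practice, multi-tenancy hinges on Loki's `auth_enabled` setting and a tenant ID header on each request; a minimal sketch (endpoint and tenant name are assumptions):

```yaml
# loki.yaml (excerpt)
# When true, every request must carry an X-Scope-OrgID header naming the
# tenant; Loki stores and queries each tenant's data in isolation, e.g.:
#   curl -H "X-Scope-OrgID: tenant-a" http://loki:3100/loki/api/v1/labels
auth_enabled: true
```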

Grafana Loki Architecture

Grafana Loki is broken down into subcomponents that are internally known as modules. Each module spawns one server for internal traffic (gRPC) and another server for external API requests (HTTP/1). The modules connect with each other through the gRPC server and expose health, readiness and metrics endpoints through the HTTP/1 server. 

Grafana Loki components include:

  1. Distributor - handles incoming client streams and validates each for correctness, ensuring it conforms to tenant limits. 
  2. Ingester - writes log data to a persistent file or object storage system.
  3. Query Frontend - an optional service that accelerates the read path while providing API endpoints for queries. This component is responsible for queueing, splitting, and caching queries.
  4. Chunk Store - the persistent storage backend for log data.
  5. Read & write paths - for committing and accessing log data

Loki Architecture

A typical Grafana Loki log collection engine (Image Source: https://grafana.com/)

Loki Log Aggregation for Kubernetes Clusters

The PLG (Promtail, Loki, Grafana) stack can be used for efficient performance monitoring of Kubernetes workloads and clusters. The monitoring essentially works in the following order:

  1. Promtail is installed on Kubernetes nodes, where it discovers targets such as log files, attaches labels to log streams, and then ships them to Loki. 
  2. The data is then stored and indexed by Loki. 
  3. The log details from Loki are then pulled into the Grafana visualization platform. 
  4. Grafana processes the log data from Loki and makes it accessible on a web dashboard for deeper analysis. 
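Step 1 is driven by Promtail's scrape configuration; a minimal sketch, assuming the Loki service URL and label names shown:

```yaml
# promtail.yaml (excerpt)
clients:
  - url: http://loki:3100/loki/api/v1/push   # where labeled streams are shipped
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod                            # discover pod log targets
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        target_label: app                    # attach an "app" label to streams
```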

Persisting Kubernetes Resource Logs in Loki using OpenEBS LVM

The OpenEBS LVM CSI driver combines individual physical storage disks into a volume group. The volume group can be used as a single logical volume or partitioned into multiple logical volumes. LVM uses these storage management structures to simplify the management of dynamic client storage needs, such as log files.

OpenEBS Storage for Grafana Loki

OpenEBS for Grafana Loki Persistent Storage

The LVM CSI driver partitions volume groups into extents, which determine the smallest amount of storage space that can be allocated. This makes storage orchestration flexible, since logical volume extents don’t need to map directly to corresponding physical extents: the driver can rearrange and duplicate physical extents within a physical volume without having to migrate logical extents. As a result, resizing logical volumes does not require migrating users or applications.
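At the node level, the volume group underpinning this flow can be prepared with standard LVM tooling; a sketch, assuming the device names shown are the attached EBS disks:

```shell
# Register a disk with LVM as a physical volume (device name is an assumption)
sudo pvcreate /dev/nvme1n1

# Combine one or more physical volumes into a volume group named "lvmvg"
sudo vgcreate lvmvg /dev/nvme1n1

# Later, extend the group with another disk -- no data migration needed
sudo vgextend lvmvg /dev/nvme2n1
```

The volume group name used here must match the one referenced by the OpenEBS StorageClass on the cluster.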

Persisting Loki Logs on OpenEBS LVM Volume

This solution guide outlines detailed steps to set up the monitoring tool Grafana Loki for collecting and viewing logs and metrics of Kubernetes resources, using OpenEBS LVM Local PV as the persistent storage solution. To persist the logs of Kubernetes resources, the Loki application will be provisioned with an OpenEBS volume using the Local PV storage engine. Once the Loki application is running with an OpenEBS Local PV volume, we will configure alert rules to trigger warning messages on a configured Slack channel if disk utilization reaches the threshold limit.

The solution guide includes end-to-end steps, including:

  • Cluster pre-configuration
  • Configuring LVM
  • Installing Loki
  • Setting up alerts & monitoring in Grafana Loki
  • Resizing a persistent volume already in use by the Loki pod
  • Increasing PVC size across multiple disks
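The resizing steps above boil down to editing the PVC's storage request, provided the StorageClass sets `allowVolumeExpansion: true`. A sketch, where the claim name, StorageClass name, and sizes are assumptions:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: loki-storage
spec:
  storageClassName: openebs-lvmpv      # assumed expandable StorageClass
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 20Gi   # raise this value to expand the volume in place
```

If the underlying volume group runs out of free extents, adding a disk with `vgextend` makes the extra capacity available to the same claim.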

By following this guide, you can deploy a Loki pod to collect logs from an EKS cluster, then visualize them in the Grafana dashboard.

Conclusion

The Loki log aggregation platform relies on log metadata indexing to simplify log management. By persisting log data on OpenEBS LVM LocalPV devices, you can extend the flexibility of Loki through dynamic resizing. 

This article explored the steps to set up OpenEBS LVM devices for persisting Loki log data for observability. Using the solution guide, you can configure your EKS cluster to generate logs that are stored on Local PV devices, processed by Loki, and then visualized in Grafana. 

To understand how OpenEBS can be used to help in log aggregation and management, you can schedule a call with one of our experts. If you need professional help to decide, we can also connect you with one of our trusted partners. 

Don Williams
Don is the CEO of MayaData and has been leading the company for the past year. He has an exceptional record of accomplishments leading technology teams for organizations ranging from private equity-backed start-ups to large, global corporations. He has deep experience in engineering, operations, and product development in highly technical and competitive marketplaces. His extensive professional network in several industries, large corporations and government agencies is a significant asset to early stage businesses, often essential to achieve product placement, growth and position for potential exit strategies.
Kiran Mova
Kiran evangelizes open culture and open-source execution models and is a lead maintainer and contributor to the OpenEBS project. He is passionate about Kubernetes and storage orchestration, and is a co-founder and Chief Architect at MayaData Inc.
Murat Karslioglu
VP @OpenEBS & @MayaData_Inc. Murat Karslioglu is a serial entrepreneur, technologist, and startup advisor with over 15 years of experience in storage, distributed systems, and enterprise hardware development. Prior to joining MayaData, Murat worked at Hewlett Packard Enterprise / 3PAR Storage in various advanced development projects including storage file stack performance optimization and the storage management stack for HPE’s Hyper-converged solution. Before joining HPE, Murat led virtualization and OpenStack integration projects within the Nexenta CTO Office. Murat holds a Bachelor’s Degree in Industrial Engineering from the Sakarya University, Turkey, as well as a number of IT certifications. When he is not in his lab, he loves to travel, advise startups, and spend time with his family. Lives to innovate! Opinions my own!