Over 20 years ago, I envisioned the future of Application Performance Management (APM). We're not quite there yet, but the recent LLM developments in AI have moved us to within sight: what I envisaged 20 years ago should become available within the next 5 years. In this article I cover where we came from in observability, where we are now, where we'll go next, and how we'll get to what I envisaged.

Published November 2025, Author Jack Shirazi

The imagined JFantasticAPM auto-managing an issue with human authorization needed to proceed, illustrating next generation auto-remediation
The tools and techniques used to achieve observability have progressed through significant transformations over the last few decades. Each generation of observability solutions has reduced the level of technical expertise required to identify and resolve issues. This article traces the evolution of observability, focusing on how each generation has lowered the barrier to entry for users, and explains what you can expect for the future of observability.
The first generations, generations 0-2, extending from the 1980s until 2025
Before the first generation of observability tools was created, users detecting and analysing issues needed deep system-level knowledge. Expertise in command-line tools like ps, vmstat, and netstat was required, along with the ability to analyse server logs, application crash reports, and activity logs. Identifying and resolving issues needed in-depth OS knowledge, understanding of kernel metrics, memory management, disk I/O, and network protocols. Analysis required proficiency in shell scripting, awk, sed, and other command-line utilities for data analysis and manipulation, as well as manual correlation of many separate pieces of information from different systems to understand the root cause of problems.
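To make that manual workflow concrete, here is a minimal sketch of the kind of cross-system correlation an engineer had to script by hand. It uses Python rather than the awk/sed pipelines of the era, and the log formats, hostnames, and timestamps are invented for illustration:

```python
from datetime import datetime

# Invented sample data standing in for a web server error log
# and a vmstat-style CPU sample file from a different host.
error_log = """\
2004-03-01 14:02:11 ERROR timeout talking to db01
2004-03-01 14:02:15 ERROR timeout talking to db01
2004-03-01 14:30:02 ERROR disk full on /var
"""

cpu_samples = """\
2004-03-01 14:02:00 97
2004-03-01 14:10:00 35
2004-03-01 14:30:00 99
"""

def parse_times(text):
    """Split each line into (timestamp, remainder of line)."""
    rows = []
    for line in text.strip().splitlines():
        ts = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        rows.append((ts, line[20:]))
    return rows

errors = parse_times(error_log)
cpu = parse_times(cpu_samples)

# Manual correlation: for each error, find the nearest CPU sample
# and eyeball whether the box was saturated at the time.
for ts, msg in errors:
    nearest = min(cpu, key=lambda s: abs((s[0] - ts).total_seconds()))
    print(f"{ts}  {msg}  ~CPU {nearest[1]}%")
```

Every investigation meant writing, running, and mentally joining the output of throwaway scripts like this one, across every system involved.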
The first generation of observability tools, such as Nagios and Zabbix, automated infrastructure metric collection and alerting. This first generation of tools made metric data available in a centralized location, reducing the expertise and time required to create and manage scripts across systems.
These tools still required significant technical expertise. Users needed to understand monitoring concepts, define meaningful metrics, set appropriate thresholds, and configure alerts. Implementing these aspects required learning the specific configuration syntax and features of each monitoring tool. While these tools provided notifications of problems, understanding the relationships between the metrics and identifying the root cause still relied on manual analysis of data correlated across systems.
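The core logic these first generation tools centralized is threshold checking over collected metrics. Here is a minimal sketch of that idea; the metric names, thresholds, and hosts are invented, and real deployments express this in each tool's own configuration syntax rather than code:

```python
# Centralized threshold alerting: the essence of first generation
# monitoring. Metric names and limits below are illustrative only.
THRESHOLDS = {
    "cpu_percent": 90.0,
    "disk_used_percent": 85.0,
    "load_average_1m": 8.0,
}

def check_host(host, metrics):
    """Compare one host's collected metrics against the thresholds."""
    alerts = []
    for name, value in metrics.items():
        limit = THRESHOLDS.get(name)
        if limit is not None and value > limit:
            alerts.append(f"ALERT {host}: {name}={value} exceeds {limit}")
    return alerts

# One polling cycle over metrics gathered from several hosts.
fleet = {
    "web01": {"cpu_percent": 95.2, "disk_used_percent": 40.0},
    "db01": {"cpu_percent": 55.0, "load_average_1m": 11.5},
}
for host, metrics in fleet.items():
    for alert in check_host(host, metrics):
        print(alert)
```

Note what is missing: nothing here explains why a threshold was crossed or how the alerts on different hosts relate, which is exactly the analysis that remained manual.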
Second generation APM tools like Elastic APM, Dynatrace, and Datadog were a significant shift from the first generation tools. The added features included automatic code-level instrumentation, distributed tracing of requests across services, service dependency maps, and unified dashboards bringing metrics, traces, and logs together.
These tools lowered the barrier to entry by providing a more intuitive user experience and automating some of the more complex tasks for developers and operations teams. APM tools reduced the technical burden by instrumenting applications automatically, correlating traces, metrics, and logs in one place, and presenting service health visually rather than as raw data.
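As an illustration of the shift, here is a minimal sketch of the kind of span data an APM agent records automatically. The field names are generic illustrations, not any particular vendor's data model:

```python
import time
import uuid

# A minimal sketch of the span data an APM agent captures per operation.
def start_span(name, trace_id=None, parent_id=None):
    return {
        "trace_id": trace_id or uuid.uuid4().hex,  # shared across services
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent_id,                    # links the call tree
        "name": name,
        "start": time.time(),
    }

def end_span(span):
    span["duration_ms"] = (time.time() - span["start"]) * 1000
    return span

# One request flowing through two "services": the shared trace_id is
# what lets the APM backend stitch a distributed trace together.
request = start_span("GET /checkout")
db_call = start_span("SELECT orders", request["trace_id"], request["span_id"])
time.sleep(0.01)  # stand-in for real work
end_span(db_call)
end_span(request)
print(request["trace_id"], request["duration_ms"], db_call["duration_ms"])
```

The point is that second generation agents generate and propagate this correlation glue for you; with first generation tools you had to build any equivalent yourself.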
However, identifying and resolving issues even with these second generation tools still requires a depth of technical knowledge that only comes with significant training or experience.
The current generation of observability tools, such as Elastic Observability 9, integrates AI technologies to democratize the user experience. This enables people with less specialized knowledge to gain valuable insights from the tools. This generation adds natural language interfaces for querying the system, AI assistants that can guide investigations, and automated correlation and root cause analysis across traces, metrics, and logs.
Third generation observability is not just the second generation boosted with extra capabilities. It fundamentally alters who can use the solution effectively. Anyone in the organization familiar with the system being monitored can now simply ask "what changed between yesterday and today that could have caused that alert?" to start a directed investigation. Previously you could only usefully ask this question of someone whose job was monitoring, managing, or developing the system. Now your third generation observability solution replaces the need to mentally correlate data across systems, providing answers! Instead of searching dashboards and linking traces manually, you can ask: "Why did latency spike in checkout yesterday around 2pm?" The system inspects traces, detects an upstream dependency slowdown, identifies a code deployment coinciding with the spike, and explains the causal chain in natural language.
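Part of what the AI automates here is correlation that people used to do by hand. A minimal sketch of one such step, with invented sample data and service names, is finding the deployment event closest to a latency spike:

```python
from datetime import datetime, timedelta

# Invented sample data: per-minute p95 latency for a checkout
# service, plus deployment events from a change log.
base = datetime(2025, 11, 3, 13, 50)
latency_p95_ms = [(base + timedelta(minutes=i), v) for i, v in
                  enumerate([210, 215, 220, 940, 980, 955, 930])]
deploys = [
    (datetime(2025, 11, 3, 10, 12), "payments v1.4.2"),
    (datetime(2025, 11, 3, 13, 52), "checkout v2.7.0"),
]

# Step 1: find the spike (first point far above the initial baseline).
baseline = latency_p95_ms[0][1]
spike_at = next(ts for ts, v in latency_p95_ms if v > 3 * baseline)

# Step 2: find the most recent deploy at or before the spike.
candidates = [(ts, name) for ts, name in deploys if ts <= spike_at]
deploy_ts, deploy_name = max(candidates)

print(f"Latency spiked at {spike_at:%H:%M}; "
      f"nearest prior deploy: {deploy_name} at {deploy_ts:%H:%M}")
```

A third generation system runs many such correlations across traces, metrics, logs, and change events, then summarizes the resulting causal chain in natural language instead of leaving you to assemble it.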

Observability generations 0 to 4
Current third generation observability solutions are still mainly reactive. They can monitor for anomalies and alert you, provide useful answers to your questions, guide you to the root cause of an issue, and even suggest actions you can carry out if you ask for suggestions. However, the AI is not yet something you would rely on to act autonomously in any role other than monitoring (e.g., it is fine for anomaly detection leading to alerts).
The next generation will be more proactive. While it's unlikely that your fourth generation system will be given general autonomy to carry out remediation actions, it will be able to execute specific types of safe actions, such as auto-rolling back a bad rollout that has led to significant degradation in the system. It will be able to analyse performance trends, predict that system degradations may occur based on those trends, autonomously analyse the cause of the predicted degradation, search for solutions that would remediate the trend, and suggest specific remedial actions to a human supervisor, which it can then be authorized to carry out. This generation will hugely reduce the time people need to spend analysing the system and carrying out actions. But safety will be built in. Fourth generation systems will operate within defined "safe action scopes" and require human authorization for changes that could impact customer experience, ensuring predictability and governance.
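Here is a minimal sketch of what such a gate might look like. The scope names, actions, and approval mechanism are hypothetical illustrations of the idea, not a description of any existing product:

```python
# Hypothetical sketch of a fourth generation remediation gate:
# actions inside the safe scope run autonomously; everything else
# is proposed to a human supervisor for authorization.
SAFE_ACTION_SCOPES = {"rollback_last_deploy", "restart_stateless_pod"}

def request_human_approval(action, rationale):
    """Stand-in for a real approval workflow (chat, ticket, page)."""
    answer = input(f"Approve '{action}'? ({rationale}) [y/N] ")
    return answer.strip().lower() == "y"

def remediate(action, rationale):
    if action in SAFE_ACTION_SCOPES:
        print(f"Auto-executing safe action: {action} ({rationale})")
        return True
    if request_human_approval(action, rationale):
        print(f"Executing authorized action: {action}")
        return True
    print(f"Action '{action}' declined; logging recommendation only.")
    return False

# The system predicts degradation, proposes a fix, and the gate decides
# whether it may act alone or must wait for a human.
remediate("rollback_last_deploy", "error rate 12x baseline since v2.7.0")
remediate("scale_database_cluster", "disk forecast: full in 36 hours")
```

The design choice that matters is that the boundary between autonomous and supervised actions is declared up front, so the system's behavior stays predictable and auditable as its remediation abilities grow.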
Note that at the time of writing this article, the author Jack Shirazi is employed within the Elastic Observability organization.