Purpose of This Tutorial
This tutorial is a practical, easy-to-understand guide to span links in OpenTelemetry.
It aims to help developers, especially those working with complex and asynchronous systems, understand what span links are, how they differ from traditional parent-child relationships in tracing, and why they are valuable for better trace correlation.
By the end of this guide, you will gain the skills needed to effectively use span links to track interactions within distributed systems, leading to improved observability and debugging.
A Brief Introduction to Distributed Tracing
In the past, applications were typically monolithic, meaning every process or feature was executed as a single unit on one server. Monitoring such applications was straightforward.
For example, if something went wrong, you could look at logs from that server to identify the problem. However, the rise of microservices changed this simplicity.
Now, modern applications are often made up of dozens or even hundreds of smaller, independent services that work together. For example, when you use a mobile app to place an order, there might be separate services to handle user authentication, process payments, manage inventory, and send confirmation emails.
These services don’t always live on the same server and can even communicate over the internet, which adds complexity to tracking what happens when you interact with an application.
This is where distributed tracing comes in. Think of distributed tracing as a way to follow a single request as it travels through the various services of a complex application, each of which may run on a different machine.
Distributed tracing helps us visualize this journey, making it easier to identify bottlenecks and errors.
It’s like a detective’s map that connects the dots between each step of the process, showing you how long each part took and where any issues occurred. When you look at a trace, you see a timeline of how your request moved through different services.
Code Repo
Here’s the code repo for this tutorial:
[https://github.com/Noibisjunior/Span-Links-in-OpenTelemetry]
The Role of OpenTelemetry in Modern Observability
OpenTelemetry is a key player in enabling this kind of visibility. It’s an open-source observability framework that allows developers to collect data like logs, metrics, and traces from their applications. It serves as a toolset for capturing detailed information about what’s happening inside your services.
In the world of modern observability, OpenTelemetry helps you understand the performance and health of your distributed applications. It acts like a bridge that gathers data from various services and sends it to tools like SigNoz, where you can visualize what’s going on. This makes OpenTelemetry invaluable for identifying bottlenecks, tracking down errors, and ensuring that your applications run smoothly.
By using OpenTelemetry with distributed tracing, you can get a full picture of how your applications behave, making it easier to diagnose issues and improve the user experience.
The Significance of Spans as Building Blocks for Tracing in OpenTelemetry
As software, especially distributed systems, grows in complexity, understanding its inner workings becomes a challenging task. That's where OpenTelemetry's spans come in.
What Are Spans?
A span is a fundamental unit of work in OpenTelemetry’s tracing system. It is a single operation or event that occurs within your application.
It captures what happened during that operation, how long it took, and any relevant details, like whether it succeeded or failed.
For example, imagine your application processes a user request:
- When the request comes in, OpenTelemetry creates a span that represents the request being received.
- If the request then triggers a database query, another span is created to represent that database interaction.
- If the app calls another service, another span tracks that.
Key attributes of a span:
- Name: A descriptive label for the operation (e.g., "Get User Data").
- Start and End Timestamps: The time the operation began and ended.
- Parent Span: The span that initiated this operation.
- Attributes: Additional metadata (e.g., HTTP status code, error messages). Some other tracing systems call these tags.
How Spans Work Together to Create Traces
Individually, spans are useful, but they are most effective when they work together to form a trace.
A trace is a collection of spans that represents the entire journey of a request or operation as it flows through your system.
Let’s go back to our user request example:
The trace begins when the request enters the system, and a root span is created. As the request triggers the database query, the database interaction span is linked to the root span, showing that it’s part of the same process.
Additional spans for calling other services get added to the trace. By looking at this trace, you can see the big picture of how the request traveled through different parts of your system. It helps you understand not just what happened, but how different parts of your application are connected.
Why Are Spans Important?
- Pinpointing Problems: Spans help you zoom in on where things go wrong. If a request is slow, spans can tell you whether it’s the database query, the network call, or some other part of the process that’s causing the delay. You can see which span took longer than expected, making it easier to find bottlenecks.
- Building Context: Each span contains contextual information like start time, end time, and custom labels (attributes). This data provides insights into what was happening at a particular moment in your system, like the specific user ID involved in a request or the query that was executed.
- Creating Relationships: Spans have relationships with one another, often in a parent-child structure. The root span is the parent, and subsequent spans are its children. This structure helps you see the order in which events occurred and how they depend on one another. It’s like looking at a family tree of operations in your app.
- Debugging Distributed Systems: For applications with microservices (where different services handle different parts of a request), spans are especially crucial. They help you track a request as it moves between services, even if those services are running on different servers or in different data centers. This is key for understanding complex interactions between services.
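As a rough illustration of pinpointing a slow step, the sketch below picks the longest-running child span from a list of span records. SpanRecord is a simplified stand-in written for this example, not an OpenTelemetry class, and the timings are made up.

```python
from dataclasses import dataclass


@dataclass
class SpanRecord:
    """Simplified stand-in for an exported span (not an OpenTelemetry class)."""
    name: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


# Made-up timings for one request, for illustration only.
spans = [
    SpanRecord("handle_request", 0, 480),
    SpanRecord("authenticate_user", 5, 35),
    SpanRecord("query_database", 40, 410),
    SpanRecord("send_confirmation", 415, 470),
]

# Skip the root span: it covers the whole request, so it is always the longest.
children = spans[1:]
slowest = max(children, key=lambda s: s.duration_ms)
print(f"Slowest step: {slowest.name} ({slowest.duration_ms} ms)")
```

With real data you would read the start and end timestamps from your exported spans, but the comparison stays the same.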
Understanding Span Links in OpenTelemetry
What Are Span Links?
In the world of distributed systems, where multiple services work together to handle a user request, tracing is like a detective's map: it shows the path a request takes as it moves through these services. Each activity in this journey is called a span, and a complete journey is called a trace.
Traditionally, spans are connected using parent-child relationships. Imagine these like a family tree: a parent span initiates a process (like making a request to another service), and child spans represent the activities that happen as a result (like the service processing the request). This is a straightforward way to represent a request’s flow.
But what happens when two spans are related, yet they don’t fit perfectly into that parent-child hierarchy? This is where span links come in.
A span link allows you to connect two spans that are related but don’t have a direct parent-child relationship. It is like a “reference” or “shortcut” between two activities in a distributed system.
For example, let’s say you have a user making a request that triggers multiple independent processes, like sending an email and writing to a database. These processes aren’t child activities of each other; they happen side by side. Using a span link, you can indicate that the email sending span and the database writing span are related to the same initial user request, even though they aren’t directly connected in the parent-child concept.
How Span Links Differ from Parent-Child Relationships
Parent-Child Relationship: This is a straightforward chain of events. A user sends a request (parent), which triggers the creation of a record in a database (child). The child span wouldn’t exist without the parent span, making it a direct consequence.
Span Links: These are more like drawing dotted lines between activities that are related in some context but don’t follow a direct chain of actions. They provide a way to say, “These things are related, even though one didn’t directly cause the other.” Span links are ideal for representing parallel activities or events that interact but aren’t strictly hierarchical.
Importance of Span Links in Complex and Asynchronous Systems
Span links are particularly valuable in complex and asynchronous systems, where the flow of events doesn’t always follow a clear parent-child path. Here are some practical scenarios:
Asynchronous Workflows:
Imagine a user request that starts a background job (like generating a report). The initial request finishes, but the report generation continues in the background.
With span links, you can relate the background job span to the initial request span, even though they don’t follow a direct parent-child pattern.
Microservice Communication:
In a microservices architecture, services often communicate with each other in ways that aren’t strictly hierarchical.
For instance, a user action could trigger multiple services to process different parts of the data simultaneously. Span links allow you to track these independent and related spans as part of a broader workflow.
Batch Processing:
If you’re processing batches of data where each item in the batch generates its own spans, you can use span links to connect these spans back to the original batch process.
This makes it easier to trace the entire lifecycle of a batch and understand how individual items relate back to the main process.
Prerequisites: Tools and Libraries Required to Configure Span Links Using OpenTelemetry and SigNoz
(1) OpenTelemetry SDK: The OpenTelemetry SDK (Software Development Kit) is your toolkit for gathering observability data like traces, metrics, and logs from your application.
It acts as a bridge between your code and observability systems, making it possible to collect detailed information about how your application is running.
Imagine OpenTelemetry as a “camera” that captures snapshots of your application's operations. By integrating the SDK into your app, you’re positioning this camera to record what’s happening behind the scenes.
You’ll need to install the SDK for your application’s programming language (e.g., Python, Java, JavaScript).
(2) SigNoz Setup: SigNoz is an open-source observability tool that allows you to visualize and analyze the data you collect with OpenTelemetry.
Think of SigNoz as the “control room” where you view the footage captured by your OpenTelemetry setup. It’s where you have a clear picture of traces and metrics in your application.
You’ll need to set up a SigNoz instance, which involves deploying it on your local machine or on a server, usually using Docker or Kubernetes.
SigNoz helps transform the raw data into visualizations, like graphs and charts, making it easier to understand what's happening inside your application.
(3) Basic Knowledge of Traces, Spans, and Instrumenting Code:
Traces:
In simple terms, a trace is like a “story” of what happens when a user or a request interacts with your application. It captures all the actions that occur as a result of that interaction, from the initial request to all the services and databases that might be involved.
Imagine a user clicking a button on your website. A trace would record every step of what happens next.
Spans:
Spans are the “chapters” within a trace's story. Each span represents a specific operation or task that takes place as part of a trace.
For instance, if the trace captures the entire process of a user request, a span could represent a single step, like querying the database or calling an external API.
Each span has a start and end time, giving you precise details about how long each step took. This makes it easier to pinpoint any slowdowns or errors.
Instrumenting Code with OpenTelemetry:
Instrumentation is the process of adding code to your application to collect observability data. With OpenTelemetry, this typically involves adding a few lines of code where you want to create traces and spans.
For example, you might instrument a database query to see how long it takes or instrument a user login process to track its performance.
The OpenTelemetry SDK makes this easier by providing libraries and functions that you can integrate directly into your code. Think of it like attaching trackers to parts of a machine to monitor how they work together.
Creating Span Links in Python: Step-by-Step Example
Let’s look at a basic example in Python. We’ll use the OpenTelemetry SDK to create two spans and link them together.
from opentelemetry import trace
from opentelemetry.trace import Link
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

# Set up the tracer provider and span exporter
trace.set_tracer_provider(TracerProvider())
tracer = trace.get_tracer(__name__)
span_processor = SimpleSpanProcessor(ConsoleSpanExporter())
trace.get_tracer_provider().add_span_processor(span_processor)

# Create the first span, simulating some work being done
with tracer.start_as_current_span("span_one") as span_one:
    span_one.add_event("Processing order")
    order_id = "12345"  # Imagine this as an order ID we're processing

# Create a second span with a link to the first span.
# It starts after span_one's block has exited, so it is not a child of span_one;
# the link is what records the relationship.
with tracer.start_as_current_span("span_two", links=[Link(span_one.get_span_context())]) as span_two:
    span_two.add_event("Updating order status")
    # Simulate some additional work here
    print(f"Order ID: {order_id} has been updated.")

print("Tracing complete")
Explanation of the Above Python Code Snippet
(1) Set Up the Tracer Provider:
The code begins by setting up a tracer provider, which manages the creation of spans.
This is essential for OpenTelemetry to know how to handle spans. We also configure a SimpleSpanProcessor and a ConsoleSpanExporter to print span data to the console. This helps us see which spans are being created and how they’re linked.
(2) Create the First Span (span_one):
Using the tracer.start_as_current_span method, we create a span called span_one. This could represent any action, like processing an order.
Inside this span, we add an event, Processing order, to indicate what’s happening at that particular point in time.
We also set an order ID (order_id = "12345") that is used in the next span.
(3) Create the Second Span with a Link (span_two):
Here, we start another span called span_two to represent a different but related action, like updating the status of the order.
Notice the links parameter. We use Link(span_one.get_span_context()) to create a link from span_two to span_one.
This tells OpenTelemetry, "While these actions aren't parent-child, they are related."
Inside span_two, we add another event, Updating order status, and simulate some work like updating an order status in a database.
(4) Output:
When you run this code, the ConsoleSpanExporter prints both spans to the console, along with the link between them. In the data printed for span_two, look for the links entry, which carries the trace ID and span ID of span_one. This helps visualize how these two spans relate to each other.
Common Errors to Watch Out For and How to Troubleshoot Them
(1) Missing Span Contexts:
Error: Link expects a span context, not a span. If you pass the span itself instead of calling span_one.get_span_context(), you’ll run into an error, because OpenTelemetry requires a valid span context to create a link.
Solution: Always ensure that you are passing a span context when creating a link. Use the .get_span_context() method of the span you want to link to.
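(2) Linking Unstarted or Ended Spans:
Error: A link is built from a span context, so it needs a context that belongs to a span that was actually started. If you take the context from a placeholder span (for example, the one trace.get_current_span() returns when no span is active), the context is invalid and the link might not be recognized. A span that has already ended is less of a concern: its span context stays valid, which is what lets a background job link back to a request that finished earlier.
Solution: Make sure that the span you’re linking to has been started before you read its context, and check the is_valid property of the span context if you are unsure.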
(3) Performance Considerations:
Performance Issue: Linking too many spans can increase the overhead of trace data, leading to performance degradation in high-traffic systems.
Solution: Use links selectively. Only link spans when there is a meaningful relationship that you need to visualize or analyze. For high-traffic environments, you can use OpenTelemetry’s sampling options to reduce the amount of trace data being captured.
Next Steps
In this tutorial, we learned how to use span links to track interactions within distributed systems.
In the next tutorial, we will cover best practices for using span links and advanced use cases.