As Large Language Models (LLMs) become widely used in real-world applications, developers need better ways to monitor, debug, and improve them. Understanding and debugging LLM applications, from prompt engineering to model outputs, requires specialized observability platforms. Two prominent players in this space are LangSmith by LangChain and Phoenix by Arize AI. While both aim to provide observability for LLMs, they cater to slightly different needs and offer distinct features. This article covers the purpose, features, installation, and code examples of both tools, helping you determine which is best suited for your specific LLM project.
1. Purpose:
LangSmith: Best suited for developers building with LangChain, helping them trace, debug, and optimize chains and agents step by step. It acts as a centralized hub to visualize, understand, and improve the performance of complex LLM workflows built using the LangChain framework. LangSmith is deeply integrated with LangChain, making it a natural choice for developers already invested in this ecosystem. It emphasizes end-to-end tracing of LangChain components, from prompt templates to final outputs.
Phoenix by Arize AI: Covers a wider range of observability needs, from monitoring model accuracy and performance to detecting bias and data quality issues, for any LLM, not just LangChain. While it can be integrated with LangChain, it is not limited to it. Phoenix provides a more holistic view of your LLM application, focusing on understanding model behavior in production and identifying potential issues related to data drift, prompt quality, and fairness. It aims to provide insights into model performance metrics and data distributions.
2. Features:
While both tools fall under the category of AI observability platforms, their strengths are different:
LangSmith focuses on tracing and debugging LangChain workflows. It includes a prompt playground, run tracing, dataset management, and team collaboration tools. It is tightly integrated into the LangChain ecosystem, requiring little setup.
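To make the run-tracing point concrete: the langsmith SDK can record runs for any Python function, not only LangChain chains. Here is a minimal sketch, assuming the package is installed and the environment variables from the Installation section below are set (the function and its logic are placeholders):
from langsmith import traceable

@traceable(name="summarize")  # records each call as a run in your LangSmith project
def summarize(text: str) -> str:
    # Placeholder logic standing in for a real LLM call
    return text[:50]

summarize("LangSmith traces this call, including inputs, outputs, and timing.")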
Phoenix provides a wider set of observability features. It offers performance monitoring (latency, accuracy, cost), data drift detection, embedding visualization, and bias analysis. This makes it especially useful for teams that care about fairness, data quality, and large-scale monitoring across different LLMs.
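To illustrate the drift-detection side, Phoenix can be launched with a primary dataset (recent production traffic) and a reference dataset (a trusted baseline) and will compare their distributions in the UI. A rough sketch, using toy data and the same Schema/Inferences API shown in the code examples later in this article (class names vary slightly across Phoenix versions; older releases used px.Dataset instead of px.Inferences):
import phoenix as px
import pandas as pd

# Two toy datasets with the same columns: a trusted baseline and recent
# production traffic (hypothetical data, for illustration only)
baseline_df = pd.DataFrame({"latency": [0.4, 0.5, 0.6], "sentiment": [0.9, 0.8, 0.85]})
production_df = pd.DataFrame({"latency": [0.9, 1.1, 1.3], "sentiment": [0.4, 0.5, 0.3]})

schema = px.Schema(feature_column_names=["latency", "sentiment"])
primary = px.Inferences(dataframe=production_df, schema=schema, name="production")
reference = px.Inferences(dataframe=baseline_df, schema=schema, name="baseline")

# With both datasets loaded, the Phoenix UI can surface distribution drift
session = px.launch_app(primary=primary, reference=reference)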
3. Installation:
LangSmith:
pip install langchain
pip install langsmith
After installation, you’ll need to set a few environment variables, including your LangSmith API key, to enable tracing:
export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY="YOUR_LANGSMITH_API_KEY"
export LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
export LANGCHAIN_PROJECT="your-project-name" # Optional: If you want to use a specific project
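If you prefer to configure this from Python (for example, inside a notebook), the same variables can be set with os.environ before any chains run; replace the placeholder values with your own:
import os

# Equivalent setup from Python; set these before creating or running chains
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "YOUR_LANGSMITH_API_KEY"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_PROJECT"] = "your-project-name"  # optional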
Phoenix by Arize AI:
pip install arize-phoenix
No API key is strictly required for local development, but you’ll need one to use Arize AI’s cloud platform.
4. Code Examples:
LangSmith:
from langchain.llms import OpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
# Note: these are the classic LangChain imports; newer releases move the OpenAI
# wrapper into the langchain-openai package and favor LCEL over LLMChain.

# A simple prompt template with a single input variable
template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])

# Requires OPENAI_API_KEY to be set in your environment
llm = OpenAI(temperature=0)
llm_chain = LLMChain(prompt=prompt, llm=llm)

question = "What is the capital of France?"
print(llm_chain.run(question))
By setting the environment variables as shown in the installation section, LangSmith will automatically trace the execution of this LangChain chain. You can then view the trace in the LangSmith UI.
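Beyond the UI, runs can also be queried programmatically through the langsmith client. A small sketch, assuming the environment variables above are set and using whatever project name you chose for LANGCHAIN_PROJECT:
from itertools import islice
from langsmith import Client

client = Client()  # picks up LANGCHAIN_API_KEY / LANGCHAIN_ENDPOINT from the environment

# Print basic details for a few recent runs in the project
for run in islice(client.list_runs(project_name="your-project-name"), 5):
    print(run.name, run.run_type, run.start_time, run.error)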
Phoenix by Arize AI:
import phoenix as px
import pandas as pd
import numpy as np
# Create some dummy data
data = {
"prompt": ["Translate to French: Hello", "What is the capital of Germany?"],
"response": ["Bonjour", "Berlin"],
"latency": [0.5, 0.7],
"sentiment": [0.8, 0.9],
"embedding": [np.random.rand(128), np.random.rand(128)]
}
df = pd.DataFrame(data)
# Describe how the DataFrame columns map onto Phoenix's data model.
# (Class names vary by version: recent releases use px.Inferences,
# older ones used px.Dataset.)
schema = px.Schema(
    embedding_feature_column_names={
        "prompt_embedding": px.EmbeddingColumnNames(
            vector_column_name="embedding",
            raw_data_column_name="prompt",
        )
    },
    feature_column_names=["latency", "sentiment"],
)
dataset = px.Inferences(dataframe=df, schema=schema, name="llm_logs")
# Start Phoenix with the dataset loaded; the session exposes a local URL
session = px.launch_app(dataset)
This code loads a DataFrame containing prompts, responses, latencies, sentiment scores, and embeddings into Phoenix. You can then use the Phoenix UI to explore the data, visualize the embeddings, and analyze model behavior; the UI runs in your browser at the URL returned by px.launch_app().
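Phoenix can also trace LangChain applications directly, rather than receiving logged DataFrames. The exact import has moved between releases (newer versions rely on the openinference instrumentation packages), so treat the snippet below as a sketch rather than the canonical API:
import phoenix as px
from phoenix.trace.langchain import LangChainInstrumentor  # location in older Phoenix releases

px.launch_app()                       # start the local Phoenix collector and UI
LangChainInstrumentor().instrument()  # subsequent LangChain runs emit traces to Phoenix

# Any chain invoked after this point (such as the LangSmith example above)
# appears as a trace in the Phoenix UI.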
5. Choosing the Right Tool:
Choose LangSmith if:
- You are heavily invested in the LangChain ecosystem.
- Your primary need is to debug and trace complex LangChain chains and agents.
- You want detailed visibility into the inner workings of your LangChain workflows.
Choose Phoenix by Arize AI if:
- You need broader LLM observability, including model performance monitoring, data quality assessment, and bias detection.
- You are working with various LLM frameworks or custom models.
- You need tools for identifying data drift and biases in your LLM applications.
- You need to visualize embeddings and understand the semantic space of your data.
Conclusion:
Both LangSmith and Phoenix help you understand and improve LLM applications, but they serve different purposes. LangSmith excels at tracing and debugging LangChain chains and agents, making it the natural choice for developers deep in the LangChain ecosystem, while Phoenix offers a broader view of LLM performance, fairness, and data quality in production, across any framework. Understanding these strengths will help you choose the right tool for your specific project and ensure its success in production. Consider evaluating both tools with a small proof-of-concept to determine which best fits your team's workflow and requirements.