Build a Data Science Query Language in Python using Lark

Originally published at youngtechnologist.hashnode.dev

What if you could write something like this:

```
DATA [1, 2, 3, 4, 5]
SUM
MEAN
STD
```

…and have it behave like a mini data science engine?

In this tutorial, we’ll build a **Domain-Specific Language (DSL)** for data analysis using:

- Python   
- Lark (parser library)   
- NumPy   

---

# What Are We Building?

We are creating a **custom query language** that:

- Accepts a dataset
- Runs statistical commands
- Prints results

---

# Step 1: Install Dependencies

```bash
pip install lark numpy
```

# Step 2: Define the Grammar

The grammar defines how our language looks.

```python
from lark import Lark, Transformer
import numpy as np

grammar = """
start: data command+

data: "DATA" list

command: "SUM" -> sum
       | "MEAN" -> mean
       | "STD" -> std
       | "MAX" -> max
       | "MIN" -> min

list: "[" NUMBER ("," NUMBER)* "]"

%import common.NUMBER
%import common.WS
%ignore WS
"""
```

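## Explanation

`start: data command+`

- A program must start with a `DATA` statement
- It is followed by one or more commands

`data: "DATA" list`

- Defines the dataset input
- Example: `DATA [1, 2, 3]`

## Commands

- `SUM` → `sum`
- `MEAN` → `mean`
- `STD` → `std`
- `MAX` → `max`
- `MIN` → `min`

These map each keyword to a method name. `-> sum` is an alias: it labels the matched rule `sum`, and Lark then calls the Transformer method with that name, `sum()`.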

## List Rule

`list: "[" NUMBER ("," NUMBER)* "]"`

- Accepts `[1]` and `[1, 2, 3]`
- `("," NUMBER)*` means zero or more further numbers, each preceded by a comma

## Ignore Spaces

`%ignore WS`

- Whitespace (spaces, tabs, newlines) between tokens is skipped, which allows flexible formatting

# ⚙️ Step 3: Build the Interpreter

Now we turn the parsed input into actions.

```python
class DLangInterpreter(Transformer):

    def data(self, items):
        # items[0] is the list of NUMBER tokens returned by the list rule
        self.data = np.array([float(x) for x in items[0]])
        return self.data
```

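## Explanation

- `items[0]` → the list of number tokens produced by the `list` rule
- Each token is converted to a float and the result becomes a NumPy array
- The array is stored in `self.data` so later commands can reuse it. This assignment shadows the `data` method on the instance, so a different attribute name is safer if the same interpreter object is reused.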

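# Step 4: Add Operations

Each of these methods goes inside the `DLangInterpreter` class.

## SUM

```python
def sum(self, _):
    print(np.sum(self.data))
```

## MEAN

```python
def mean(self, _):
    print(np.mean(self.data))
```

## STD

```python
def std(self, _):
    print(np.std(self.data))
```

## MAX

```python
def max(self, _):
    print(np.max(self.data))
```

## MIN

```python
def min(self, _):
    print(np.min(self.data))
```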

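## Explanation

- Each method name matches an alias in the grammar
- `_` is the list of child nodes, which is empty for these commands and therefore unused
- NumPy does the computation
- The result is printed as soon as the command is parsed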
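
One detail about `STD`: `np.std` computes the population standard deviation by default (`ddof=0`), while many statistics tools report the sample standard deviation (`ddof=1`). The sketch below shows the two side by side on illustrative data; whether the DSL should expose this choice is left open.

```python
import numpy as np

values = np.array([10.0, 20.0, 30.0, 40.0])

# NumPy's default: divide the summed squared deviations by N.
population_std = np.std(values)

# Sample standard deviation: divide by N - 1 instead.
sample_std = np.std(values, ddof=1)

print(population_std)
print(sample_std)
```

The sample value is always the larger of the two, and the gap shrinks as the dataset grows.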

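# Step 5: Parse List

This method also belongs inside `DLangInterpreter`.

```python
def list(self, items):
    return items
```

## Explanation

- Returns the number tokens of the list unchanged
- Lark processes child rules first, so this result arrives in `data()` as `items[0]`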

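# Step 6: Create the Parser

```python
parser = Lark(grammar, parser="lalr", transformer=DLangInterpreter())
```

## Explanation

- `parser="lalr"` → selects the LALR(1) algorithm, which is fast and sufficient for this grammar
- `transformer=...` → Lark calls the Transformer's methods while it parses, so no separate tree-walking step is needed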

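# Step 7: Read Input File

```python
with open("example.dl") as f:
    code = f.read()

parser.parse(code)
```

## Example `example.dl`

```
DATA [10, 20, 30, 40]
SUM
MEAN
MAX
```

## ✅ Output

```
100.0
25.0
40.0
```

The values print as floats because `data()` converts every number with `float()`.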

# How It Works (Flow)

```
Text Input
   ↓
Parser (Lark)
   ↓
Grammar Rules Match
   ↓
Transformer Methods Trigger
   ↓
NumPy Executes
   ↓
Output Printed
```

# ✨ Why This Is Powerful

- You built a mini programming language
- Syntax (the grammar) is cleanly separated from execution (the Transformer)
- It is easily extensible: a new command needs one grammar alternative and one method

# Next Features You Can Add

## 1. Filtering

```
FILTER > 10
```

## 2. Sorting

```
SORT ASC
```

## 3. CSV Support

```
DATA file.csv
```

## 4. Chaining

```
DATA [1,2,3,4]
FILTER > 2
MEAN
```

# Final Thought

Real systems such as:

- SQL
- Pandas query engine
- Spark

…rest on the same basic idea: parse a query into a structure, then execute it.

You just built the foundation of a data query engine.


If You Liked This

Drop a like ❤️
Follow for more AI + Systems content
And try extending this DSL yourself!
