Tried Scraping, AI Agents, and APIs… None Worked. So I Built a Chrome Extension.

Question

Tried Scraping, AI Agents, and APIs… None Worked. So I Built a Chrome Extension.

Knihal

calendar_todayApr 9 • schedule3 min read

I Disappeared for 15 Days… Because I Was Building This Chrome Extension

For the past couple of weeks, I completely stopped posting about LeadIt.

Not because I quit.
Not because the project died.

But because I hit one of the most frustrating problems I’ve faced while building a product.

Getting good lead data.

The Problem I Hit While Building LeadIt

LeadIt’s UI and UX were already finished.

But the real challenge wasn’t the product.

It was the data.

I needed things like:

company name
founder name
founder LinkedIn
company website
company size
location
YC batch

And the biggest pain?

Founder emails are hidden almost everywhere.

Platforms like Y Combinator don’t expose them easily.
So I decided to ignore emails for now and add them later.

But then another problem appeared.

Manual Data Collection Was Killing Me

To build the LeadIt database, I started collecting data manually from Y Combinator companies.

Open company page → copy data → paste → move to next.

Simple, right?

Wrong.

Even a single lead took around 5 minutes.

After doing this repeatedly, I started hating the process.

It was slow.
It was boring.
And honestly… it felt stupid.

There had to be a better way.

My First Attempt: Scraping

So I tried Playwright.

The idea was simple:

Automate the browser → scrape YC pages → collect the data.

But there was one big problem.

Platforms don’t like scrapers.

Pages started blocking the requests.

So scraping wasn’t going to scale.

My Second Attempt: AI Agents

Then I tried something more experimental.

I built an AI agent using LangChain and used Serper API as a tool for searching data.

The plan was:

AI agent finds companies → extracts founder info → structures the data.

In theory it sounded amazing.

In reality?

The data was terrible.

Instead of startup founders, it returned things like:

random big tech companies
irrelevant sites
incomplete information

The data quality was so bad that it was unusable.

So I stopped everything.

I Took a Break for Two Days

For two days, I didn’t touch the project.

I just kept thinking:

“How do tools actually collect data from websites without scraping or APIs?”

Because most platforms don’t give you APIs.

LinkedIn?
You need partnerships.

Apollo?
Paid access.

Hunter?
Paid.

Everything was locked.

Then suddenly the idea hit me.

The Idea: What If the Browser Reads the Page?

Instead of scraping servers…

What if the browser itself reads the page?

Right when the user opens it.

That’s when the idea of building a Chrome extension came.

The extension would:

read the page DOM
extract the data
structure it instantly

No scraping servers.
No blocked requests.

Just reading the page already loaded in the browser.

So I started building it.

Another Problem: Every Page Has Different CSS

When I started building the extension, I hit another challenge.

Each page had different CSS selectors.

So relying on simple selectors like .class-name wouldn’t work.

At first it looked messy.

But after spending a full day analyzing the YC page DOM, I discovered something interesting.

Even though CSS classes were different…

The underlying DOM structure patterns were consistent.

So instead of targeting CSS classes, I started targeting DOM structure patterns.

And suddenly…

It worked.

Introducing: Kallector

That’s how Kallector was born.

Kallector is a Chrome extension that reads startup data directly from the page DOM.

No scraping servers.
No API access needed.

Just open the page.

Kallector reads it.

And structures the data instantly.

Current Features

Right now Kallector can collect data from Y Combinator company pages, including:

Company name
Company website
Founder name
Founder LinkedIn
Company size
Location
YC batch

The extension reads the page DOM and shows the structured data inside its panel.

What Kallector Solves

Kallector turns 5 minutes of manual work into seconds.

Instead of copying data manually across multiple tabs…

You open the page and the data is already structured.

This is especially useful for:

startup research
building lead databases
founder outreach
sales prospecting

What’s Next

Right now Kallector only supports Y Combinator company pages.

But the plan is to expand to:

LinkedIn profiles
startup directories
other founder databases

Originally I built Kallector only to power the LeadIt database.

But now I’m starting to see it as something bigger.

It might become a separate SaaS product on its own.

If you’ve ever tried collecting startup data manually…

You know exactly why I built this.

And this is just the beginning.

7 Comments

🔥 Join developers growing publicly

Share your knowledge, build in public, and grow your developer presence with a global community.

Join CoderLegion

chevron_left

kumar nihal

1.8k Points • 33 Badges

Jharkhand, India • linkedin.com/in/kumar-nihal-260b7a351

5Posts

16Comments

6Connections

Indie developer building practical AI-powered SaaS products and sharing the journey in public.

Curr... Show more

Commenters (This Week)

Contribute meaningful comments to climb the leaderboard and earn badges!

Sergey C Kryukov · Answer 1 · 2026-04-09T08:19:36+0000

Sergey C Kryukov • Apr 9

Chrome extension makes total sense for quick data extraction. What was the nastiest scraping blocker you ran into?

Knihal • Apr 9

@[Sergey C Kryukov] The biggest blocker I ran into was platforms detecting automated scraping — rate limits, blocked requests, and sometimes even dynamic content that Playwright couldn’t reliably extract.

That’s actually what pushed me toward the Chrome extension approach. Since the page is already loaded in the user’s browser, Kallector just reads the DOM directly instead of scraping from a server.

Gavin Cettolo · Answer 2 · 2026-04-09T13:36:43+0000

Gavin Cettolo • Apr 9

This resonates a lot—especially the part about “nothing worked” before building your own thing.

What stood out to me is that you didn’t just switch tools, you changed the abstraction. Most scraping tools (AI or not) still think in terms of selectors, flows, or brittle rules. But what people actually want is: “give me the data I mean,” not “tell me how to click the DOM.”

That gap is exactly where things usually break.

I’ve seen the same pattern others mention too: the hard part isn’t extraction logic anymore, it’s reliability. Once you move beyond demos, you start hitting all the messy realities—JS timing, layout shifts, auth, anti-bot, etc.

Your Chrome extension approach is interesting because it shifts the problem closer to where context actually exists—the browser. That aligns with a broader trend: treating the browser less like a script target and more like an execution environment.

Also appreciate that you built something instead of over-optimizing the stack. A lot of people get stuck trying 10 tools instead of validating what actually works for their use case.

Curious how you’re thinking about durability over time:

Do you expect the extraction logic to adapt automatically as pages change?
Or is the goal more “fast iteration + break visibly” rather than “never break”?

Either way, this feels like a more honest direction than pretending current AI agents can reliably scrape anything out of the box.

Knihal • Apr 9

@[Gavin Cettolo] Really appreciate this perspective — you actually articulated the abstraction shift better than I did in the post.

You’re right that most scraping tools still operate around selectors and brittle flows. What I realized while building this is that the real challenge isn’t extraction logic anymore, it’s reliability once things move beyond demos.

Right now Kallector is intentionally simple. It reads consistent DOM patterns instead of relying purely on CSS selectors, which reduces breakage, but I’m not assuming it will never break. My current approach is more fast iteration + visible breakage so the extraction logic can be adjusted quickly if pages change.

Longer term I’m thinking about adding a small abstraction layer so it can adapt across similar page structures. But for now the goal is validating that the browser-native approach actually works in practice.

Also curious about your perspective — do you think a browser-native extraction tool like this could realistically evolve into a useful SaaS product? If yes, what direction would you prioritize expanding first?

Gavin Cettolo • Apr 9

@[Knihal]
This is a great direction, and yes, I do think this can realistically evolve into a solid SaaS. But only if it leans into what makes it different, instead of competing head-on with traditional scrapers.

To me, the key insight is this: you’re not building a “better scraper,” you’re building a browser-native data extraction layer.

That opens up a few interesting directions

1. Reliability as a product (not a feature)
Most tools sell “we can scrape anything.” In reality, users care about: “will this still work tomorrow?”
Your current approach, fast iteration + visible breakage, is actually honest and powerful. If you wrap that with:

versioning of extractors
change detection (DOM drift alerts)
quick re-training / fixing flows

…you’re already solving a much bigger pain than extraction itself.

2. Human-in-the-loop by design
Fully autonomous scraping is still fragile. But assisted extraction? That’s viable today.
Think:

user highlights data once
system generalizes pattern
user validates / corrects when it breaks

That feedback loop could become your moat.

3. “Context-first” extraction
Being in the browser is a huge advantage:

authenticated sessions
rendered JS state
user intent (what they’re looking at)

APIs and headless tools constantly fight to recreate that. You already have it.

4. Narrow before broad
If I had to prioritize expansion, I wouldn’t go horizontal (“scrape anything”). I’d go vertical first:

lead generation (LinkedIn, directories, marketplaces)
e-commerce monitoring (prices, competitors)
internal tools (ops teams extracting from dashboards)

Pick one where:

data is semi-structured
breakage is painful
users are willing to pay for reliability

5. Positioning matters a lot
If you market this as:
→ “AI scraper” → crowded, low trust
→ “no-code scraping tool” → commoditized

But if you position it as:
→ “extract structured data from any page you can see, reliably”

…that’s much clearer and closer to the real value.

If I had to summarize:
The opportunity isn’t in making scraping smarter, it’s in making it usable and dependable in the real world.

Have you seen any specific use cases where people kept coming back to use Kallector? That might be your wedge.

Knihal • Apr 9

@[Gavin Cettolo] That’s a great question. Since Kallector is still very early, I don’t have real usage data yet.

The main use case I’m building it for right now is founder lead collection from startup directories (starting with YC) while building outreach lists for LeadIt. My hypothesis is that people doing founder outreach, sales prospecting, or startup research might keep coming back to it for that workflow.

Still validating that, though — so I’m curious to see where it naturally ends up being most useful.

Gavin Cettolo • Apr 9

Thank you @[Knihal], keep us updated :)

	How I Built a React Portfolio in 7 Days That Landed ₹1.2L in Freelance Work Dharanidharan - Feb 9
	I’m a Senior Dev and I’ve Forgotten How to Think Without a Prompt Karol Modelskiverified - Mar 19
	Everyone says DeepSeek is cheaper, but I got tired of guessing the exact math. So I built a calculat abarth23 - Apr 27
	Why Email-Only Contact Forms Are Failing in 2026 (And What Developers Should Do Instead) JayCode - Mar 2
	TypeScript Complexity Has Finally Reached the Point of Total Absurdity Karol Modelskiverified - Apr 23

Tried Scraping, AI Agents, and APIs… None Worked. So I Built a Chrome Extension.

I Disappeared for 15 Days… Because I Was Building This Chrome Extension

The Problem I Hit While Building LeadIt

Manual Data Collection Was Killing Me

My First Attempt: Scraping

My Second Attempt: AI Agents

I Took a Break for Two Days

The Idea: What If the Browser Reads the Page?

Another Problem: Every Page Has Different CSS

Introducing: Kallector

Current Features

What Kallector Solves

What’s Next

7 Comments

Please log in to add a comment.

Please log in to add a comment.

Please log in to comment on this post.

More Posts

How I Built a React Portfolio in 7 Days That Landed ₹1.2L in Freelance Work

I’m a Senior Dev and I’ve Forgotten How to Think Without a Prompt

Everyone says DeepSeek is cheaper, but I got tired of guessing the exact math. So I built a calculat

Why Email-Only Contact Forms Are Failing in 2026 (And What Developers Should Do Instead)

TypeScript Complexity Has Finally Reached the Point of Total Absurdity

More From Knihal

Is it Me or the AI? How I almost lost my ability to think.

I kept losing my best ideas mid-thought. So I built the fix.

The Day Google OAuth Taught Me an MVP Lesson

Related Jobs

Commenters (This Week)

Welcome to Coder Legion

Connect with 4,659 amazing developers

Don't have an account? Sign up

OR

Tried Scraping, AI Agents, and APIs… None Worked. So I Built a Chrome Extension.

I Disappeared for 15 Days… Because I Was Building This Chrome Extension

The Problem I Hit While Building LeadIt

Manual Data Collection Was Killing Me

My First Attempt: Scraping

My Second Attempt: AI Agents

I Took a Break for Two Days

The Idea: What If the Browser Reads the Page?

Another Problem: Every Page Has Different CSS

Introducing: Kallector

Current Features

What Kallector Solves

What’s Next

7 Comments

Please log in to add a comment.

Please log in to add a comment.

Please log in to comment on this post.

More Posts

More From Knihal

Related Jobs

Commenters (This Week)