How I Mapped Brain Cell Changes in Alzheimer's Disease Using Single-Cell RNA Sequencing

— Originally published at dev.to

Alzheimer's disease affects over 55 million people worldwide, yet the precise molecular changes happening inside individual brain cells remain poorly understood. I wanted to dig into that question at single-cell resolution rather than at the tissue level.

So I built a full scRNA-seq analysis pipeline in Python using Scanpy, working with a publicly available dataset of 63,608 nuclei from human prefrontal cortex tissue (sourced from CZ CELLxGENE). The donors spanned three Braak stages: 0 (cognitively normal), 2 (early Alzheimer's), and 6 (severe Alzheimer's).

Here's what I found and how I found it.


The Dataset

The data came from a study on the molecular characterisation of selectively vulnerable neurons in AD. It covers the superior frontal gyrus, a prefrontal region known to be hit hard by neurodegeneration, and includes seven major brain cell types:

  • Glutamatergic neurons
  • GABAergic neurons
  • Oligodendrocytes
  • OPCs (oligodendrocyte precursor cells)
  • Astrocytes
  • Microglia
  • Endothelial cells

31,997 genes. 63,608 cells. Three disease stages. A lot to work with.


The Pipeline

1. Quality Control

No dataset is clean out of the box. I filtered cells to keep only those with between 200 and 6,000 detected genes, and excluded anything with more than 20% mitochondrial gene content (high mitochondrial reads usually signal a dying or damaged cell). This removed around 2,809 low-quality cells.
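
The filter above can be sketched in plain NumPy so it runs without the dataset. This is a minimal illustration, not the notebook's code: the function name `qc_mask`, the "MT-" prefix convention for mitochondrial genes, and the inclusive bounds are assumptions. In Scanpy the same step would typically go through `sc.pp.calculate_qc_metrics` followed by boolean indexing on `adata.obs`.

```python
import numpy as np

def qc_mask(counts, gene_names, min_genes=200, max_genes=6000, max_pct_mt=20.0):
    """Return a boolean mask of cells that pass QC (hypothetical helper).

    counts: dense (cells x genes) array of raw counts.
    gene_names: gene symbols, assumed to mark mitochondrial genes with "MT-".
    """
    counts = np.asarray(counts)
    is_mt = np.array([g.upper().startswith("MT-") for g in gene_names])

    n_genes = (counts > 0).sum(axis=1)      # genes detected per cell
    total = counts.sum(axis=1)              # total counts per cell
    mt_total = counts[:, is_mt].sum(axis=1)

    # Percentage of counts from mitochondrial genes; empty cells get 0.
    pct_mt = np.divide(
        100.0 * mt_total, total,
        out=np.zeros(len(total), dtype=float),
        where=total > 0,
    )
    return (n_genes >= min_genes) & (n_genes <= max_genes) & (pct_mt <= max_pct_mt)
```

With a real AnnData object, the mask would be used as `adata = adata[mask].copy()`, and the number of removed cells is simply `(~mask).sum()`.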

2. Normalisation

Library sizes were normalised to 10,000 counts per cell and then log1p-transformed, standard practice that makes cells comparable regardless of how deeply they were sequenced. I then identified 5,607 highly variable genes to focus the downstream analysis.
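
To make the arithmetic concrete, here is a small NumPy sketch of the two transformations. The function name `normalize_log1p` is made up for illustration; in Scanpy the equivalent calls would most likely be `sc.pp.normalize_total(adata, target_sum=1e4)` followed by `sc.pp.log1p(adata)`, though whether the notebook uses exactly those arguments is an assumption.

```python
import numpy as np

def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell to target_sum total counts, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0               # avoid dividing empty cells by zero
    scaled = counts / totals * target_sum
    return np.log1p(scaled)

# Two cells with the same composition but a 10x difference in depth
# end up with identical normalised profiles.
shallow_and_deep = np.array([[1, 3], [10, 30]])
profiles = normalize_log1p(shallow_and_deep)
```

The example shows why this step matters: after scaling, sequencing depth no longer drives the differences between cells.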

3. Dimensionality Reduction

PCA (50 components) → neighbourhood graph (10 neighbours, 20 PCs) → UMAP embedding.
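
The first two steps of that chain can be illustrated with a brute-force NumPy sketch: PCA via SVD, then a k-nearest-neighbour lookup in PC space. The helper names are hypothetical, and the all-pairs distance matrix is only practical for small inputs; for 60K+ cells Scanpy's `sc.pp.pca`, `sc.pp.neighbors` and `sc.tl.umap` use approximate methods and are the realistic choice. UMAP itself is left to the library here.

```python
import numpy as np

def pca_scores(X, n_comps):
    """Project cells onto the top principal components using SVD."""
    Xc = X - X.mean(axis=0)                 # centre each gene
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_comps] * S[:n_comps]

def knn_indices(scores, n_neighbors):
    """Indices of each cell's nearest neighbours (excluding itself)."""
    sq = (scores ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * scores @ scores.T
    np.fill_diagonal(d2, np.inf)            # a cell is not its own neighbour
    return np.argsort(d2, axis=1)[:, :n_neighbors]

# Toy data: two well-separated groups of 20 "cells" with 30 "genes" each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 30)), rng.normal(10, 1, (20, 30))])
labels = np.repeat([0, 1], 20)

scores = pca_scores(X, n_comps=5)
neighbours = knn_indices(scores, n_neighbors=3)
```

In the pipeline described above, the graph would be built from the first 20 of the 50 PCs with 10 neighbours per cell.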

The UMAP is where the biology starts to become visible. All seven cell types separated into distinct clusters, with clear separation between neuronal subtypes and glial populations.

4. Differential Expression

For the microglial analysis, I used a Wilcoxon rank-sum test comparing AD vs normal microglia, with Benjamini-Hochberg multiple testing correction to control the false discovery rate.
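
As a rough sketch of what that test involves, the snippet below runs a per-gene rank-sum test with SciPy and applies the Benjamini-Hochberg adjustment by hand. The function names and the dense (cells x genes) inputs are assumptions for illustration; in Scanpy this is normally a single call to `sc.tl.rank_genes_groups(..., method="wilcoxon")`, which applies Benjamini-Hochberg correction by default.

```python
import numpy as np
from scipy.stats import ranksums

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

def wilcoxon_de(expr_a, expr_b):
    """Rank-sum test per gene between two groups of cells.

    expr_a, expr_b: (cells x genes) arrays with the same gene order.
    Returns raw and BH-adjusted p-values.
    """
    expr_a = np.asarray(expr_a, dtype=float)
    expr_b = np.asarray(expr_b, dtype=float)
    pvals = np.array([
        ranksums(expr_a[:, g], expr_b[:, g]).pvalue
        for g in range(expr_a.shape[1])
    ])
    return pvals, benjamini_hochberg(pvals)
```

Here `expr_a` would hold the AD microglia and `expr_b` the normal microglia. The adjustment matters because testing tens of thousands of genes at once would otherwise produce many false positives.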


The Findings

Glutamatergic Neurons Are Selectively Depleted

One of the most striking results: glutamatergic (excitatory) neurons dropped from ~34% of cells in normal tissue to ~30% in AD tissue. This might sound like a small shift, but at the scale of 60,000+ cells it's biologically meaningful, and it's consistent with what the literature already tells us about the selective vulnerability of excitatory neurons in AD.
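
Proportions like these come from a simple cross-tabulation of the per-cell metadata. The sketch below assumes the metadata is a pandas DataFrame (as `adata.obs` is) with columns named `condition` and `cell_type`; those column names and the toy values are placeholders, not the dataset's actual fields or results.

```python
import pandas as pd

def cell_type_fractions(obs, group_col="condition", type_col="cell_type"):
    """Fraction of each cell type within each group (rows sum to 1)."""
    return pd.crosstab(obs[group_col], obs[type_col], normalize="index")

# Toy metadata standing in for adata.obs.
toy_obs = pd.DataFrame({
    "condition": ["normal"] * 4 + ["AD"] * 4,
    "cell_type": [
        "Glutamatergic", "Glutamatergic", "Astrocyte", "Microglia",
        "Glutamatergic", "Astrocyte", "Astrocyte", "Microglia",
    ],
})
fractions = cell_type_fractions(toy_obs)
```

Passing a Braak stage column as `group_col` would give the same breakdown per disease stage instead of per condition.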

Alzheimer's Leaves a Clear Signature in Microglia

Microglia are the brain's resident immune cells, and they showed the most dramatic transcriptomic shifts between AD and normal tissue. The differential expression analysis revealed:

Upregulated in AD microglia:

  • MALAT1 - a long non-coding RNA strongly linked to neuroinflammation
  • FTH1 - ferritin heavy chain, pointing to iron dysregulation
  • B2M - beta-2 microglobulin, a known AD biomarker reflecting immune activation
  • FOXP1 - a transcription factor tied to microglial activation states

Downregulated in AD microglia:

  • MT-CO3, MT-CO1, MT-ATP6, MT-ND2 - mitochondrial complex genes, suggesting impaired energy metabolism in AD-affected microglia

This pattern is consistent with what's described as disease-associated microglia (DAM) in the literature, a distinct activation state that emerges in neurodegeneration.

Disease Progression Captured Across Braak Stages

Cells from all three Braak stages were distributed across every cluster in the UMAP. This suggests that AD-associated transcriptomic changes are not confined to one cell type; they propagate across the whole cellular ecosystem as the disease progresses.


What I Learned

  • Memory management matters. 60K+ cells × 30K+ genes is a big matrix. Working with sparse AnnData objects and being deliberate about which steps you checkpoint to disk makes a real difference.
  • Cell type annotation is an art. The dataset came with pre-annotated cell types, but validating them against canonical marker genes (the dotplot step) is essential, and it's satisfying when the biology confirms itself.
  • Volcano plots are still one of the most readable ways to communicate differential expression. They give you significance and fold change in one glance.
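
On the volcano plot point, the two axes are just log2 fold change and -log10 of the adjusted p-value, and each gene gets a label from a pair of cut-offs. The sketch below shows that bookkeeping; the function name and the cut-offs (|log2FC| >= 1, adjusted p < 0.05) are illustrative assumptions, not the thresholds used in the notebook.

```python
import numpy as np

def volcano_table(log2fc, padj, fc_cut=1.0, padj_cut=0.05):
    """Return the volcano y-axis values and an up/down/ns label per gene."""
    log2fc = np.asarray(log2fc, dtype=float)
    padj = np.asarray(padj, dtype=float)

    # Clip to avoid log10(0) when a p-value underflows to zero.
    neg_log10_padj = -np.log10(np.clip(padj, 1e-300, None))

    label = np.full(log2fc.shape, "ns", dtype=object)
    significant = padj < padj_cut
    label[significant & (log2fc >= fc_cut)] = "up"
    label[significant & (log2fc <= -fc_cut)] = "down"
    return neg_log10_padj, label

y, label = volcano_table(
    log2fc=[2.0, -1.5, 0.2, 3.0],
    padj=[0.001, 0.01, 0.001, 0.5],
)
```

Plotting `log2fc` against `y` and colouring by `label` gives the familiar volcano shape.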

The Code

Everything is in a fully annotated Jupyter Notebook. If you want to reproduce the analysis, download the H5AD file from CZ CELLxGENE and drop it in the data/ folder. The repository is at https://github.com/Farhan89082/alzheimers-scrna-analysis


If you're working with single-cell data or have questions about the pipeline, I'd love to hear from you in the comments. There's something fascinating about watching biology emerge from a matrix of gene counts.
