5 Months Later: What Actually Happened (SRE/DevOps Self-Taught)


Five months ago I wrote a post called "My First Steps in a Tough World". I was one month into this journey, I had just built a basic monitoring script, and I was equal parts excited and terrified.

I want to tell you what happened next. Not the polished version. The real one.


Where I was then

When I wrote that first post, my biggest achievement was a Python script that checked CPU and memory usage and sent a notification. I called it a "production-grade monitoring system" because I genuinely didn't know what production-grade meant yet.

I had WSL2, some basic Linux commands, and a lot of determination. That was it.


What the last 5 months actually looked like

I'm not going to pretend it was a clean upward line. There were weeks where I felt like I was drowning in concepts I didn't understand. There were moments where I questioned whether any of this was going anywhere.

But I kept building. And somewhere in between the confusion and the commits, things started to click.

Here's what I actually shipped:


Project 1: SRE Expense Tracker — from local to production on AWS

This started as a FastAPI + PostgreSQL application running on my local machine with Docker Compose. By the end, it was a production deployment on AWS EC2 with:

  • CI/CD pipeline via GitHub Actions + AWS SSM (no open SSH ports)
  • Database migrations with Alembic, applied cleanly in production
  • Automated S3 backups of PostgreSQL via a cron job inside the container
  • CloudWatch log streaming using the Docker awslogs driver
  • Prometheus metrics exposed and scraped
  • API key authentication on all endpoints
  • Elastic IP, IAM roles with least privilege, security groups locked down
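
To make the API key bullet concrete, here is a minimal, framework-agnostic sketch of the check itself. It is not the Expense Tracker's actual code: the header name `x-api-key` and the function name `is_authorized` are assumptions chosen for illustration.

```python
import hmac

# Hypothetical header name; the real application may use a different one.
API_KEY_HEADER = "x-api-key"


def is_authorized(headers: dict[str, str], expected_key: str) -> bool:
    """Return True only if the request headers carry the expected API key."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    supplied = normalised.get(API_KEY_HEADER, "")
    # An empty configured key must never authorise anything.
    if not expected_key:
        return False
    # compare_digest takes time independent of where the strings differ,
    # which avoids leaking how many leading characters matched.
    return hmac.compare_digest(supplied.encode(), expected_key.encode())
```

The constant-time comparison is the part worth copying; a plain `==` works functionally but can leak timing information about the key.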

I hit real production problems. At one point my FastAPI request-level logs weren't appearing in CloudWatch even though container startup logs were coming through fine — a uvicorn stdout routing issue I had to dig into and debug against a live system.
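
I'm not reproducing the exact root cause here, but one thing worth ruling out when request logs go missing is a logger with no handler writing to the container's stdout. The sketch below uses only the standard library; `uvicorn.error` and `uvicorn.access` are uvicorn's logger names, while the format string and the `configure_logging` helper are assumptions for illustration.

```python
import logging
import logging.config

# Route uvicorn's loggers and the root logger to stdout, which is what
# Docker log drivers such as awslogs read from.
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "plain": {"format": "%(asctime)s %(levelname)s %(name)s %(message)s"},
    },
    "handlers": {
        "stdout": {
            "class": "logging.StreamHandler",
            "stream": "ext://sys.stdout",
            "formatter": "plain",
        },
    },
    "loggers": {
        "uvicorn.error": {"handlers": ["stdout"], "level": "INFO", "propagate": False},
        "uvicorn.access": {"handlers": ["stdout"], "level": "INFO", "propagate": False},
    },
    "root": {"handlers": ["stdout"], "level": "INFO"},
}


def configure_logging() -> None:
    """Apply the stdout-only logging configuration."""
    logging.config.dictConfig(LOGGING_CONFIG)
```

A dictionary like this can also be handed to uvicorn through its `log_config` option instead of being applied by hand. Setting `PYTHONUNBUFFERED=1` in the container is a separate, common fix for output that is produced but arrives late or not at all.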

That kind of problem doesn't exist in tutorials. You only find it when something is actually deployed and you're staring at empty log streams at midnight wondering what's wrong.


Project 2: DevSecOps Security Pipeline — security as an independent layer

After getting the Expense Tracker into production, I realized I had real credentials in a live environment. A Docker image deployed on AWS. Python dependencies with specific versions. Code that could have insecure patterns. And no systematic way to know about any of it.

So I built a separate repository that implements a continuous security pipeline targeting the Expense Tracker as its subject.

The pipeline runs four independent layers in sequence:

  1. Gitleaks — scans the full Git history for exposed secrets
  2. Bandit — static analysis of the Python source code for insecure patterns
  3. Trivy — scans the Docker image on Docker Hub for CVEs
  4. pip-audit — scans declared Python dependencies for known vulnerabilities

Each job blocks the next. If Gitleaks finds a secret, nothing else runs.
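
The gating behaviour can be sketched in a few lines of Python. This is a model of the idea, not the workflow itself: in GitHub Actions the same effect normally comes from chaining jobs with `needs`, and the `run_gated` helper and the stage callables here are hypothetical.

```python
from typing import Callable

# A stage is a name plus a check that returns True when the scan is clean.
Stage = tuple[str, Callable[[], bool]]


def run_gated(stages: list[Stage]) -> list[tuple[str, str]]:
    """Run stages in order; after the first failure, skip everything else."""
    results: list[tuple[str, str]] = []
    blocked = False
    for name, check in stages:
        if blocked:
            # An earlier gate failed, so this stage never executes.
            results.append((name, "skipped"))
            continue
        passed = check()
        results.append((name, "passed" if passed else "failed"))
        blocked = not passed
    return results
```

For example, `run_gated([("gitleaks", lambda: False), ("bandit", lambda: True)])` reports `gitleaks` as failed and `bandit` as skipped, which is the "nothing else runs" behaviour described above.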

The pipeline runs on push, on pull requests, and on a nightly schedule at 2AM UTC. That last part matters more than it sounds. Push triggers only catch what you broke. The nightly schedule catches new CVEs published in vulnerability databases even when your code hasn't changed. Security is proactive, not just reactive.

It found real things.

Trivy detected three CVEs in pip 25.0.1 inside the Docker image — with available fixes. I patched it. pip-audit found HIGH and MEDIUM severity CVEs in setuptools and pytest. I patched those too. Seven HIGH CVEs in the Debian base image had no available fix at the time — I documented them with a formal risk acceptance document covering attack surface analysis and review criteria. Then the nightly schedule detected newly published fixes for some of them weeks later. I patched within hours.
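
The split between "patch now" and "document and accept" can be automated from the scanner output. Below is a minimal sketch that assumes a Trivy JSON report (`trivy image --format json`) with `Results`, `Vulnerabilities`, `VulnerabilityID`, `PkgName`, `Severity` and `FixedVersion` fields; check those names against your Trivy version. The `triage` function is a hypothetical helper, not part of the pipeline described here.

```python
import json


def triage(report_json: str, severities: tuple[str, ...] = ("HIGH", "CRITICAL")):
    """Split findings into (fixable, unfixed) lists of (id, package, fix) tuples."""
    report = json.loads(report_json)
    fixable, unfixed = [], []
    for result in report.get("Results", []):
        # A target with no findings may omit the key or set it to null.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") not in severities:
                continue
            fix = vuln.get("FixedVersion", "")
            entry = (vuln["VulnerabilityID"], vuln["PkgName"], fix)
            # A non-empty fixed version means an upgrade is available.
            (fixable if fix else unfixed).append(entry)
    return fixable, unfixed
```

Entries in `fixable` are candidates for an immediate upgrade; entries in `unfixed` are the ones that need a risk acceptance record and a re-check on the next scheduled run.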

Total infrastructure cost: €0. GitHub Actions ephemeral runners on a public repo.

The architectural decision I'm most proud of: I kept the security pipeline in a separate repository from the application. It would have been simpler to put it inside the Expense Tracker. But that would make security look like a feature of the application. It isn't. It's an independent operational concern. Making that separation explicit was a deliberate engineering decision, and I documented why in an Architecture Decision Record.


Open Source Contributions: two merged PRs

This is the part that still feels surreal.

I submitted a pull request to bluewave-labs/Checkmate, an open source monitoring platform. The fix addressed a Docker non-root container permission issue. The maintainers gave feedback, I iterated, all CI checks passed. It's merged.

Then I submitted a PR to henrygd/beszel — a server monitoring hub with 22,000 stars on GitHub.

The issue: the hub image is distroless. No shell, no wget, no curl. Standard Docker healthcheck patterns fail because there's nothing in the image to run them with. But the binary already exposes a health subcommand that performs an HTTP self-check. I added a healthcheck block to the Docker Compose files that calls the binary itself:

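healthcheck:
  # Exec form ("CMD") runs the binary directly, so no shell is needed.
  # Compose requires a list-style test to start with CMD, CMD-SHELL or NONE.
  test: ["CMD", "/beszel", "health", "--url", "http://127.0.0.1:8090"]
  interval: 30s   # time between checks
  timeout: 10s    # a check that takes longer than this counts as failed
  retries: 3      # consecutive failures before the container is "unhealthy"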

Without this, containers running the distroless hub show as "running" indefinitely even when wedged — no orchestrator liveness signal, no automatic restart trigger. I also noticed the same file was included in a second Compose configuration and added the healthcheck there too, without being asked.
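
To show what an HTTP self-check of this kind does, here is a minimal Python sketch of the pattern. It is not Beszel's implementation (the hub is a compiled binary); the `is_healthy` name and the default timeout are assumptions for illustration.

```python
import urllib.error
import urllib.request

# A self-check talks to the local process, so bypass any configured proxy.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to url answers with a 2xx status in time."""
    try:
        with _OPENER.open(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts and 4xx/5xx responses
        # (urllib raises HTTPError, a URLError subclass, for the latter).
        return False


def exit_code(url: str) -> int:
    """Map health to the 0 / 1 exit code a Docker healthcheck expects."""
    return 0 if is_healthy(url) else 1
```

A command-line wrapper would pass `exit_code(...)` to `sys.exit`, which is all Docker needs: exit status 0 marks the container healthy, any other status counts as a failed check.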

It was merged today.


AWS Certified Cloud Practitioner

I passed the AWS CCP exam. It unlocks a 50% discount voucher for the Solutions Architect Associate, which I'm currently studying for. If you're on the same path, I recommend Stephane Maarek's course and the Tutorials Dojo practice exams.

The certification matters less than what it represents in terms of learning sequence. You can't meaningfully study IAM, VPCs, security groups, and EC2 instance profiles in the abstract. Doing it while having a real application deployed in AWS makes every concept land differently.


What I actually learned

Not a skills list. The real things.

Production is a different planet. Everything works in development. Production has IAM roles that are slightly wrong, log drivers that need specific permissions, database connections that time out in ways that never happened locally. The only way to learn production is to deploy something to production and watch it break.

Documentation is engineering. I write Architecture Decision Records for every significant choice I make. Not because someone told me to — because six weeks later I genuinely couldn't remember why I made a decision, and having the reasoning written down is the difference between understanding your own system and being confused by it.

Security is not a checklist. It's an automated layer that runs independently, finds real problems, and forces you to make documented decisions about risk. The moment you have something deployed with real credentials, this stops being theoretical.

Open source is learnable. I was intimidated by the idea of contributing to real projects. The first PR felt impossible. The second one I found by using a tool, hitting a problem, tracing it to the root cause, and fixing it. That's just debugging. It's the same skill.


Where I'm going

AWS Solutions Architect Associate is next. Then more open source contributions. The Expense Tracker deployment still has open problems I want to solve. And somewhere in the longer arc, MLOps — which is where I wanted to be from the beginning.

Five months ago I was scared and excited in roughly equal measure. Today I'm still both. But the fear is smaller and the excitement is backed by things I actually built and shipped.

That's the only honest way I know to measure progress.


I'm Iriome Santana, a self-taught SRE/DevOps learner based in Gran Canaria, Spain. You can find my projects on GitHub and connect with me on LinkedIn.
