A personal automation that turns Reddit’s daily top posts into a concise, well-formatted email digest so I can stay informed without endless scrolling.
Snapshot
Field
Details
Type
Personal automation / digest pipeline
Context
Personal project
Role
Solo developer
Year
2025
Status
Personal project (scheduled daily runs)
Main focus
Reddit scraping, AI summarization, and reliable email delivery
Overview
Mailius is a Python pipeline I built to solve a simple habit problem: Reddit is one of the fastest places to spot news and trends, but checking it every day takes more time than I wanted to spend. The project fetches top posts from a curated list of subreddits, summarizes each one with a small language model, and sends a single digest email with both HTML and plain-text versions.
I designed it as a modular script I could run locally or on a schedule. A GitHub Actions workflow runs it daily so the digest arrives without manual intervention. It is not a product for other users, but it reflects how I think about small automations: clear boundaries between scraping, intelligence, rendering, and delivery.
The problem
Staying current on Reddit means opening many communities, skimming long threads, and deciding what is worth your attention. That works for occasional browsing, but it is a poor daily routine when you only want a quick sense of what changed.
Manual browsing scales poorly across multiple subreddits.
Top posts vary in length and quality, so titles alone are often misleading.
There is no single, skimmable view of “what mattered today” across the topics I care about.
Who it was for
Me as the primary reader, receiving a daily inbox digest on topics I follow.
Anyone who wants a similar pattern, since the subreddit list and recipients are configuration-driven rather than hard-coded to a single account.
Future me maintaining the pipeline, with separated modules for scraping, summarization, templates, and transport.
My role
I owned the project end to end: Reddit integration with PRAW, OpenAI summarization prompts, Jinja2 email templates, Mailgun delivery, environment-based configuration, and the GitHub Actions schedule with failure notification. I also chose which subreddits to include and how aggressively to filter posts so the digest stays short and readable.
What the project does
Each run collects today’s top posts from configured subreddits, generates short summaries, renders a structured email, and sends it to one or more recipients through Mailgun.
Fetches up to three posts per subreddit that were created on the current UTC day.
Summarizes each post with GPT-4o Mini under strict length and style rules.
Groups content by subreddit with emoji labels for quick scanning.
Adds a daily affirmation fetched from an external JSON API, with a safe fallback string.
Delivers multipart HTML and text email via Mailgun.
Runs on a daily cron in GitHub Actions, with optional failure alerts.
Key features
Curated subreddit digest
The scraper walks a fixed list of communities spanning news, regional topics, and tech interests (for example news, mexico, python, uxdesign). Posts are limited to the day’s top results so the email reflects what is actually trending now, not an arbitrary historical mix.
Same-day post filtering
I filter by created_utc so only posts from the current UTC date appear, even when Reddit’s “top today” query returns a wider candidate set. That keeps the digest focused and avoids stale threads slipping in.
python
for post in subreddit.top(time_filter=config["time_filter"], limit=config["posts_per_subreddit"] * 3):
post_date = datetime.fromtimestamp(post.created_utc, tz=timezone.utc).date()
if post_date == today:
posts_data.append(format_post(post, config["default_fields"]))
if len(posts_data) >= config["posts_per_subreddit"]:
break
I over-fetch then trim because Reddit’s API ordering does not guarantee three same-day hits on the first page; this pattern trades a few extra API reads for predictable digest size.
Reporter-style AI summaries
Summaries are capped at roughly forty words, written in a direct newsroom tone, and instructed to prefer post body over title when both exist. Temperature stays low so output stays stable day to day.
I chose a small model because this runs every day on many posts; cost and latency matter more than creative flair.
Dual-format email rendering
HTML and plain-text bodies share the same data model but separate Jinja2 templates. Subreddit emoji mapping lives in one place and is exposed to templates as a global, which keeps presentation tweaks out of Python logic.
Scheduled, credential-safe execution
GitHub Actions injects secrets at runtime (Reddit, OpenAI, Mailgun, recipients). The workflow uses concurrency control so overlapping runs do not stack, and a failure step can send a minimal alert email when Mailgun credentials are present.
Technical approach
The architecture is a straight pipeline with four layers: configuration, services, emailing, and orchestration in main.py.
Scraping uses PRAW with client credentials from the environment. Summarization calls the OpenAI Chat Completions API once per post. Rendering uses Jinja2 with auto-escaping for HTML safety. Transport is a thin MailgunTransport class that posts multipart messages over HTTP with basic auth.
I kept Mailgun behind a small wrapper so swapping providers later would not touch template or summarization code.
Configuration is entirely environment-driven via python-dotenv locally and GitHub Secrets in CI. There is no database; each run is stateless apart from optional artifact upload hooks in the workflow. The project does not include automated tests, which is acceptable for a personal script but would be the first thing I would add if the pipeline grew.
Design decisions
The digest is meant to be read on a phone in under a minute, so I optimized for scanability over completeness.
Subreddit sections with emoji headers so readers can jump to the topics they care about.
Inline CSS in the HTML template for predictable rendering in common mail clients.
Separate text template for clients that strip HTML or for plain inbox previews.
One affirmation per run, fetched once and shared across HTML and text, to add a human touch without cluttering each post.
Per-recipient sends rather than a single BCC batch, which simplifies Mailgun logging and future per-user customization.
Challenges and tradeoffs
Reddit API quirks: “Top today” does not always yield three same-day posts; over-fetching and date filtering adds complexity but stabilizes output.
API cost and rate limits: Summarizing every post on every run scales linearly with subreddit count; I mitigated this with a small model and a low post cap.
Email client inconsistency: Rich HTML still needs defensive, simple layout choices; fancy components were intentionally avoided.
Secret management across environments: Local .env and GitHub Secrets must stay in sync manually; there is no centralized config service.
No deduplication across days: If a post stays on top, it could reappear; I accepted that for v1 simplicity.
What I learned
Building Mailius reminded me that useful automations are less about framework choice and more about crisp stage boundaries. When scraping, summarization, and delivery are isolated, I could change the model or email provider without rewiring the whole flow. I also learned to treat LLM prompts like product requirements: explicit length, tone, and “output only the summary” rules matter as much as the API call itself.
How to structure a small Python project for scheduled CI execution with secrets.
How to write summarization prompts that stay factual and skimmable under token limits.
How separating Jinja2 templates from transport code keeps email experiments cheap.
Current status
This is an active personal utility I run on a daily schedule. It is not packaged as a public SaaS and does not have multi-tenant auth or a settings UI. I keep it in my portfolio because it shows pragmatic integration work across external APIs, templated email, and CI automation rather than a tutorial-scale script.
If I revisited this today
Add unit tests around date filtering, summarization input shaping, and template rendering with fixture data.
Persist the last sent post IDs to avoid duplicate summaries across consecutive days.
Introduce a configuration file or CLI for subreddit lists instead of editing Python constants.