how the blog was built, python edition

posted 11 Nov 20 tagged: python

There are many excellent blog engines out there, but customising any of them takes so much understanding of how they work, and of the template and theme engines they use, that it's easier to just use them exactly as is with an existing theme.

I wanted my own custom static blog, which played well with jupyter notebooks and markdown files, as well as a reason to do some python coding, so here goes yet another python blogging engine.

This post documents the process of building this blog. The main goal is to put markdown and jupyter notebooks in a folder, and build a static site which gets autoupdated on github pages or netlify. Just like hugo, jekyll, gatsby and so many others!

blog engine

It's straightforward to read a set of markdown posts and convert them to html. I am using python to read the posts, with python-markdown to parse them into html, complete with inline syntax highlighting.

Key tools used:

write: markdown docs using any editor and jupyter notebooks having yaml front matter.
- obsidian to edit markdown
- vs code for jupyter notebooks. Jupyter lab is ok in a pinch but it causes me more problems than not. My fav cloud alternative is Deepnote.
make the blog:
- nbconvert to parse jupyter to markdown.
- tried nbdev but had too many problems, though it has a lot more blog friendly features.
- python to read all the markdown files using python markdown and yaml.
- finally, writing html pages for index, tags and posts using mako for python friendly templates.
search: fusejs to make an in-browser search engine
build site: every time I commit to my blog repo, a github action is triggered which rebuilds the site and saves the output to a public folder.

hosting: the site is hosted on the gh-pages of my blog repository, which github pages auto republishes. I am using a github action to deploy output files from the public folder to gh-pages on every commit to the main branch.
local server: running the script with the --serve flag starts a local python server.

Below are notes for the specifics used.

parsing markdown

Python markdown
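Before a post can be converted, its yaml front matter has to be split from the markdown body. A minimal stdlib-only sketch of that step, assuming the header is delimited by `---` lines and only holds flat `key: value` pairs; the function name `split_front_matter` is hypothetical. In a real build the header would more likely go through PyYAML's `yaml.safe_load` and the body through `markdown.markdown`.

```python
def split_front_matter(text):
    """Return (meta, body) for a post that may start with a '---' delimited header."""
    lines = text.splitlines()
    # No opening delimiter: treat the whole file as the body.
    if not lines or lines[0].strip() != "---":
        return {}, text
    # Find the closing delimiter; give up if the header is never closed.
    end = next((i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"), None)
    if end is None:
        return {}, text
    meta = {}
    for line in lines[1:end]:
        if ":" not in line:
            continue  # this simplified parser skips anything that is not key: value
        key, value = line.split(":", 1)
        meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body


post = "---\ntitle: how the blog was built\ntags: python\n---\n\n# hello\n"
meta, body = split_front_matter(post)
```

Here `meta` holds the title and tags as plain strings and `body` is the markdown that still needs converting to html.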

Start a server from cli:

python -m http.server
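The same server can be started from inside the build script behind a --serve flag. This is only a sketch using the standard library: the `--port` and `--directory` options, the `public` default and the helper names are assumptions, not necessarily what the blog script does.

```python
import argparse
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def parse_args(argv=None):
    """Parse the (assumed) command line options of the build script."""
    parser = argparse.ArgumentParser(description="build the blog")
    parser.add_argument("--serve", action="store_true", help="serve the built site locally")
    parser.add_argument("--port", type=int, default=8000)
    parser.add_argument("--directory", default="public")
    return parser.parse_args(argv)


def make_server(directory, port):
    """Create an http server that serves files from `directory` on localhost."""
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


def main(argv=None):
    args = parse_args(argv)
    # ... build the site here ...
    if args.serve:
        server = make_server(args.directory, args.port)
        print(f"serving {args.directory} at http://127.0.0.1:{args.port}")
        server.serve_forever()
```

Calling `main()` at the bottom of the script and running it with --serve would then build the site and keep serving it until interrupted.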

notebooks to markdown

I started with nbdev to convert notebooks to markdown, but it slowed down rebuilding the blog a lot, and it's pretty complex. So in the end I’ve stuck with nbconvert. Some useful tips:

specify templates

I need to customize nbconvert so it implements some of the features from fastpages, namely:

renders output differently
collapses code cells if #hide is at the top
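The #hide behaviour can be sketched without nbconvert at all, since a notebook is just json. The function below works on a single code cell dict and wraps its source in a collapsible `<details>` element when the first line is #hide. The names are hypothetical and this is not the fastpages implementation; with nbconvert itself the same logic would more likely live in a custom template or a `Preprocessor` subclass.

```python
FENCE = "`" * 3  # markdown code fence, built this way so it can be shown inside a fenced block


def cell_source(cell):
    """Notebook json stores source either as one string or as a list of lines."""
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def render_code_cell(cell):
    """Render a code cell as markdown, collapsed if its first line is #hide."""
    lines = cell_source(cell).splitlines()
    hide = bool(lines) and lines[0].replace(" ", "").startswith("#hide")
    if hide:
        lines = lines[1:]  # drop the marker itself from the output
    block = "\n".join([FENCE + "python", *lines, FENCE])
    if hide:
        return f"<details>\n<summary>show code</summary>\n\n{block}\n\n</details>"
    return block


cell = {"cell_type": "code", "source": ["#hide\n", "x = 1\n"]}
markdown_text = render_code_cell(cell)
```

The blank lines around the fenced block inside `<details>` are there because many markdown parsers only treat the inner text as markdown when it is separated from the html tags.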

code highlight

python markdown has pygments built in, which has a bunch of styles. To generate the css:

pygmentize -S default -f html -a .codehilite > codestyles.css

But on second thought I decided to can this and go with highlightjs for now, as it speeds up builds and keeps the html clean (at the cost of loading more javascript).

One thing to investigate is how to make the output cells of jupyter notebooks blend into the main website. This seems to require some css trickery.

html

I’m no longer familiar with html, even though I built my first weblog on geocities way back in 1997/8. So we have now reached html5.

css

CSS is hard. So I want a simple to use framework, ended up looking at:

https://tachyons.io/
https://tailwindcss - at first sight it looked horrible, with style mixed in with html, but once I thought about it some more, it's beautiful. Everything is there, visible in the one file, and I hate css files anyways. So leaning towards using this, the only downside being that you need npm to generate the final production tailwind css file. More to follow once I actually implement it…
Pure.css
water.css
milligram
newcss - awesome, simple, super easy to use - basically just write html and it makes it look nice and clean. Best for simple things like this blog. The only reason to switch to a more complex css file is that even simple posts like this need small text and slide-outs for post meta-data like tags and date info etc.
lit
concrete.css - very minimal
https://csslayout.io/ - examples of using css directly

todo: decide on one.

Things to implement using css:

a floating toc, like so many websites have it these days. I like tocs.

Search

Search. I want search. This is pretty straightforward, we need a list of content and some javascript to do the searching. Jupyter notebooks mess this up as the converted markdown files have a bunch of js and other cruft.

Searching just the post titles is pretty easy, with options ranging from plain js to Algolia's autocomplete library.

Ideally I want to search across all the content as well, which takes some thinking as the output of jupyter notebooks can be huge, with all kinds of js embedded.

Some things to look at:

fusejs - blog post implementing this in hugo - used this first. It works and is pretty straightforward but has no python integration and I would like better examples as a js newbie.
minisearch
lunr.js as well as lunr.py to pre generate the index.

So step one is to build a search index - which my script does as a json file containing all the post attributes I want searched.
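A rough sketch of what building that index could look like, assuming each post is available as a dict and that the searchable attributes are `title`, `tags`, `date`, `url` and `content`. Both the field names and the function names are illustrative assumptions.

```python
import json
from pathlib import Path


def build_search_index(posts, fields=("title", "tags", "date", "url", "content")):
    """Keep only the attributes that should be searchable, one dict per post."""
    return [{field: post.get(field, "") for field in fields} for post in posts]


def write_search_index(posts, path):
    """Write the index as a json file the browser can fetch."""
    index = build_search_index(posts)
    Path(path).write_text(json.dumps(index, ensure_ascii=False, indent=2), encoding="utf-8")
    return index


posts = [
    {"title": "how the blog was built", "tags": ["python"], "url": "/blog.html",
     "content": "markdown and notebooks in a folder", "template": "post.html"},
]
index = build_search_index(posts)
```

The result is a flat list of objects, which is the shape fusejs takes together with a list of keys to search. For notebooks it probably makes sense to strip the js and html cruft from `content` before it goes into the index.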

Github actions

Github actions add superpowers to a repo - they can be triggered at a time interval or on every code push to a branch. To make a github action, save a yaml file in the format github expects to the .github/workflows folder and it should run on every push. For this blog my action:

copies the contents of the repo to the github runner
sets up python - python versions available on github actions
installs dependencies as defined in requirements.txt

Misc stuff

embed media

The oEmbed spec defines how sites like youtube and twitter provide embed information about their content.

Calling youtube like so: https://www.youtube.com/oembed?url=https://youtu.be/48A-7GBxZco returns a json like:

{
  "title": "Anthony Hyman Memorial Lecture 2022: The Refugee Crisis and Afghanistan",
  "author_name": "SOAS University of London",
  "author_url": "https://www.youtube.com/c/SoasAcUk",
  "type": "video",
  "height": 113,
  "width": 200,
  "version": "1.0",
  "provider_name": "YouTube",
  "provider_url": "https://www.youtube.com/",
  "thumbnail_height": 360,
  "thumbnail_width": 480,
  "thumbnail_url": "https://i.ytimg.com/vi/48A-7GBxZco/hqdefault.jpg",
  "html": "\u003ciframe width=\u0022200\u0022 height=\u0022113\u0022 src=\u0022https://www.youtube.com/embed/48A-7GBxZco?feature=oembed\u0022 frameborder=\u00220\u0022 allow=\u0022accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\u0022 allowfullscreen\u003e\u003c/iframe\u003e"
}

The html key is the html code to embed the video.

Twitter also has an oembed api, called like so:

curl --request GET --url 'https://publish.twitter.com/oembed?url=https%3A%2F%2Ftwitter.com%2FInterior%2Fstatus%2F507185938620219395'

and this too returns a json, with the html key being the embed code.

Sadly all the python wrappers are old and abandoned, so I need a tiny wrapper to do this myself.
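Such a wrapper does not need much more than the standard library. A minimal sketch using the two endpoints shown above; the function names and the `OEMBED_ENDPOINTS` table are hypothetical, and there is no caching or error handling.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# oEmbed endpoints taken from the examples above
OEMBED_ENDPOINTS = {
    "youtube": "https://www.youtube.com/oembed",
    "twitter": "https://publish.twitter.com/oembed",
}


def oembed_url(provider, url):
    """Build the oEmbed request url, percent-encoding the content url."""
    query = urlencode({"url": url})
    return f"{OEMBED_ENDPOINTS[provider]}?{query}"


def fetch_embed_html(provider, url, timeout=10):
    """Call the provider and return the value of the html key (needs network access)."""
    with urlopen(oembed_url(provider, url), timeout=timeout) as response:
        data = json.load(response)
    return data["html"]


request_url = oembed_url("twitter", "https://twitter.com/Interior/status/507185938620219395")
```

`request_url` is the same url as in the curl call above. Since the embed code rarely changes, the build could store the returned html next to the post rather than calling the api on every rebuild.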

emoji

Some unicode fonts have emojis built in, so the ways to enable emoji are:

use a unicode font and investigate python-markdown emoji parser to deal with them properly
use a script to replace :smile: with emoji icons - nope, don’t want images scattered all over the shop.

Emoji test:

smiley face test copy pasted from some website: 😅
smiley face written using markdown code :smile: :smile:
- dang, this works in the markdown editor but not in the browser
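One way to get :smile: working in the browser without images is to swap the shortcodes for the unicode characters themselves before the markdown is converted. A small sketch, where the `EMOJI` table only has a few hand-picked entries and the function name is made up:

```python
import re

# a tiny hand-written table; a real one would have many more entries
EMOJI = {
    "smile": "\U0001F604",
    "sweat_smile": "\U0001F605",
}

SHORTCODE = re.compile(r":([a-z0-9_+-]+):")


def replace_shortcodes(text, table=EMOJI):
    """Replace known :shortcodes: with unicode emoji, leave unknown ones untouched."""
    return SHORTCODE.sub(lambda m: table.get(m.group(1), m.group(0)), text)


converted = replace_shortcodes("smiley face :smile: and :not_an_emoji:")
```

Note that this naive version also rewrites shortcodes inside code blocks, so it would need to run only on the prose parts of a post.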

twitter embed

This should show a twitter tweet embedded inside the page:

Sunsets don't get much better than this one over @GrandTetonNPS. #nature #sunset pic.twitter.com/YuKy2rcjyU
— US Department of the Interior (@Interior) May 5, 2014

testing syntax highlight

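from pathlib import Path
from mako.template import Template
from mako.lookup import TemplateLookup
lookup = TemplateLookup(directories=["templates"])
# assumed here: output goes to the public folder, and `posts` is the list of parsed posts built earlier
path_publish = Path("public")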

# make the big picture templates
for tmpl in ["index.html"]:
    template = lookup.get_template(tmpl)
    html = template.render(posts=posts).strip()
    path = path_publish / tmpl
    path.write_text(html)
    print(f"wrote {tmpl} to {path}")

# write all the posts
template = lookup.get_template("post.html")
for post in posts:
    html = template.render(post=post).strip()
    path = path_publish / f"{post.slug}.html"
    path.write_text(html)
    print(f"wrote {post.slug} to {path}")