Getting my feed under the table

For my difficult second post I've decided to add a feature to my blog, straight out of the good ol' days of blogging: an RSS feed.

❤️ ❤️ ❤️ I love feeds! ❤️ ❤️ ❤️

I was an avid Google Reader user and, since Google put a bullet in Reader's head, I've been a paid user of Feedbin. I use Reeder on my phone and one of these days I'll get around to installing NetNewsWire.

For reasons I'll hopefully explain in a future post, I've decided to make this blog by hand and to do my best to make it straight-up standards-compliant, so the feed will need to work this way too (I won't have a WordPress plug-in doing it for me).

As well as this, I want to make sure that if I add a feed it'll actually work in feed readers properly. Finally, I want it to include the whole post including all semantic mark-up for the best end-user reading experience possible.

So, my requirements boil down to a feed that:

  1. can be updated by hand without too much pain
  2. is valid and standards compliant
  3. works
  4. contains the whole blog post, including mark-up.

RSS 1.0, RSS 2.0, Atom or JSON Feed?

Because I'm doing this by hand, I don't want to have to maintain several different outputs — I need to pick a single format.

A bit of research yields three XML-based formats, plus the more recent JSON Feed.

Unfortunately most of the advice is about a decade old so it needs to be taken with a grain of salt, but at least we're dealing with a settled technology I guess?

The first choice is between using Atom or one of the two versions of RSS. In terms of support they seem to be equal these days, but only Atom is an official standard (it's "maintained" by the Internet Engineering Task Force (IETF) as RFC 4287, though there haven't been updates in years). Additionally, Atom's structure is a bit clearer, which will help with authoring by hand.

Next I considered Atom vs JSON Feed. JSON is the data format of choice these days, and it seems to be supported by feed readers, but for now I'm going to stick with Atom, largely due to the reasons above.

Creating the feed

With the format decided, I went about creating my feed in a new file called, imaginatively, atom.xml.

I found the Introduction to Atom from the W3C to be the easiest way into putting my feed together. It explains all the required, recommended and optional elements for both the feed as a whole and the individual entries, with some good examples to boot.

You can see the result in the feed itself; it was no more difficult than writing an HTML page by hand (like this one). To save time later (assuming this isn't the last ever post on this blog) I also included a commented-out entry template for future use.
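As a rough illustration of what "required" means here, the following Python sketch checks a feed for the three elements Atom requires on both the feed and every entry (id, title and updated). It is not how this blog is built; the sample feed, its example.com URLs and the function name are all placeholders, and it ignores other rules such as when an author is needed.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical minimal feed; the ids, titles and dates are placeholders.
SAMPLE_FEED = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blog</title>
  <id>https://example.com/</id>
  <updated>2019-11-20T09:00:00+13:00</updated>
  <entry>
    <title>First post</title>
    <id>https://example.com/first-post</id>
    <updated>2019-11-20T09:00:00+13:00</updated>
  </entry>
</feed>"""

# Elements that must appear on the feed and on each entry.
REQUIRED = ("id", "title", "updated")


def missing_required(feed_xml: str) -> list[str]:
    """List required Atom elements missing from the feed and its entries."""
    root = ET.fromstring(feed_xml.encode("utf-8"))
    problems = []
    for name in REQUIRED:
        if root.find(ATOM + name) is None:
            problems.append(f"feed: {name}")
    for index, entry in enumerate(root.findall(ATOM + "entry"), start=1):
        for name in REQUIRED:
            if entry.find(ATOM + name) is None:
                problems.append(f"entry {index}: {name}")
    return problems


print(missing_required(SAMPLE_FEED))
```

An empty list means nothing required is missing; deleting an entry's title from the sample would report it by entry number.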

That said, there are two potential snags that a CMS would usually take care of for me: text encoding and time stamps.

Encoding

As a person who works on the web every day, I really should get encoding, but really, I don't. I know that UTF-8 has solved most of our problems in HTML, but sadly for me this ain't HTML, it's XML.

I'll start with what I'd like to be able to do: I'd like to be able to cut and paste the substance of my blog post, in HTML, straight into the feed.

If I specify the feed's encoding as UTF-8 like so: <?xml version="1.0" encoding="utf-8"?> I'm halfway there: "special" characters like macrons and emoji will pass through fine (🤘). But, unfortunately, because XML uses the same reserved characters as HTML (e.g. < >), pasting HTML in as-is isn't possible without confusing the computer that will be trying to understand the feed.

And then I found this post from 2005: Handling Atom Text and Content Constructs. It runs through all the options, the ultimate winner for me being:

<content type="xhtml">
  <div xmlns="http://www.w3.org/1999/xhtml">
    <p>Blog content goes <strong>here</strong> with <a href="https://www.w3schools.com/html/">normal HTML markup</a></p>
  </div>
</content>

(Ironically I had to swap out all the reserved characters in this code to get it to display right here).

This did exactly what I wanted, with only the minor inconvenience of that extra <div>. No need to transform every < into &lt; and so on, or to use CDATA, which is a bit funky when it comes to the document structure of XML.
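To make the trade-off concrete, here is a small Python sketch comparing the escaped type="html" route with the type="xhtml" route. It is only an illustration using the standard library; the variable names and the shortened sample paragraph are made up for the example.

```python
from html import escape
import xml.etree.ElementTree as ET

# A shortened stand-in for a pasted blog post.
post_html = "<p>Blog content goes <strong>here</strong></p>"

# type="html": the markup must be escaped so XML sees it as plain text.
escaped_entry = (
    '<content type="html">' + escape(post_html, quote=False) + "</content>"
)

# type="xhtml": the markup is pasted as-is inside a namespaced <div>.
xhtml_entry = (
    '<content type="xhtml">'
    '<div xmlns="http://www.w3.org/1999/xhtml">' + post_html + "</div>"
    "</content>"
)

# Both forms parse as well-formed XML.
html_node = ET.fromstring(escaped_entry)
xhtml_node = ET.fromstring(xhtml_entry)

print(escaped_entry)              # every < and > has become &lt; and &gt;
print(html_node.text == post_html)  # the reader gets the original markup back
print(xhtml_node[0].tag)          # the wrapping div, as a real XML element
```

In the escaped form the post is one opaque string; in the xhtml form it is part of the XML tree, which is why it has to be well-formed.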

Time stamps

Remember how hard it was to learn to tell the time? Remember those irritating maths questions about adding and subtracting time? Remember the last time you tried to work out what time it is in another part of the world in your head? I just had that feeling all over again.

Clearly a feed of recent content isn't useful unless it tells us when the things in the feed were published. In fact the specification requires a timestamp (the updated element) on the feed and on every entry, and it needs to be easily interpreted by computers. Our benevolent standards overlords have defined a format for this called ISO 8601 (Atom uses the RFC 3339 profile of it), which looks like this: YYYY-MM-DDThh:mm:ssTZD. For my first post that comes out as: 2019-11-20T09:00:00+13:00.

To get this I take my local time and tack the (currently) 13 hours of difference from UTC onto the end as the time zone adjustment. The alternative is to work backwards from local time, removing those 13 hours and writing the result in UTC with a Z on the end. I am not looking forward to doing either manually, but needs must.
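The arithmetic can be handed to a computer. This is a minimal Python sketch, assuming a fixed +13:00 offset as in the example timestamp (a fixed offset knows nothing about daylight saving, so it would need changing when the clocks do). The function names are invented for the example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +13:00 offset, matching the example timestamp.
LOCAL_OFFSET = timezone(timedelta(hours=13))


def atom_timestamp(year, month, day, hour, minute, tz=LOCAL_OFFSET):
    """Format a local wall-clock time as an RFC 3339 string with its offset."""
    return datetime(year, month, day, hour, minute, tzinfo=tz).isoformat()


def as_utc(timestamp):
    """Convert an offset timestamp to the equivalent UTC time."""
    return datetime.fromisoformat(timestamp).astimezone(timezone.utc).isoformat()


local = atom_timestamp(2019, 11, 20, 9, 0)
print(local)          # 2019-11-20T09:00:00+13:00
print(as_utc(local))  # 2019-11-19T20:00:00+00:00
```

Both strings name the same instant: 9am on the 20th at +13:00 is 8pm on the 19th in UTC.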

Linking it all together

Finally, I needed to add the feed to the index.html so feed readers can discover it. This was as simple as adding <link type="application/atom+xml" rel="alternate" title="Left Align full feed" href="https://leftalign.cc/atom.xml" /> to the page's <head>.
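Roughly speaking, discovery means a feed reader fetches the page and looks for that link tag. The Python sketch below imitates the idea with the standard library's HTML parser; the class name is hypothetical, the page is a cut-down stand-in for the real index.html, and real feed readers may well do this differently.

```python
from html.parser import HTMLParser

# Media types treated as feeds in this sketch.
FEED_TYPES = {"application/atom+xml", "application/rss+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect href values of <link rel="alternate"> tags that point at feeds."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        link_type = (attributes.get("type") or "").lower()
        if "alternate" in rel and link_type in FEED_TYPES:
            self.feeds.append(attributes.get("href"))


# Stand-in page containing the link tag from the post.
page = """<html><head>
<title>Left Align</title>
<link type="application/atom+xml" rel="alternate" title="Left Align full feed" href="https://leftalign.cc/atom.xml" />
</head><body></body></html>"""

finder = FeedLinkFinder()
finder.feed(page)
print(finder.feeds)
```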

So that's it. This blog now has a feed and as soon as I stick it up we'll find out if it works!


Postscript

Once I put it up I ran it through the official validator and I got a lot of errors. Almost all were XML being upset about incorrectly closed tags (the browser interpreted them fine, but there were some missing closing tags dotted about), but I got there in the end.
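This kind of error can be caught before uploading. The sketch below is a Python well-formedness check using the standard library XML parser, with a hypothetical helper name and two made-up snippets. It only checks that tags are properly closed; it says nothing about whether the feed is valid Atom, so it is no substitute for the validator.

```python
import xml.etree.ElementTree as ET


def well_formed(xml_text):
    """Return (True, None) if the text parses as XML, else (False, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return False, str(error)
    return True, None


# Every <p> is closed: fine in both HTML and XML.
good = '<div xmlns="http://www.w3.org/1999/xhtml"><p>One</p><p>Two</p></div>'

# The first <p> is never closed: browsers cope, an XML parser does not.
bad = '<div xmlns="http://www.w3.org/1999/xhtml"><p>One<p>Two</p></div>'

print(well_formed(good))
print(well_formed(bad))
```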

[Valid Atom 1.0]