The ultimate Weblogging system?

May 2, 2003 by Steven Garrity in Site Updates, Usability & Design, Web Development

User interface critic and man-about-New Zealand, Matthew Thomas has outlined his view of The ultimate Weblogging system. We’ve run our system against his checklist.

Good points all around. We ran our weblogging system against his list – citing where we’ve met his requirements and where we have not. While we are missing a fair number, I think we’ve done very well considering that our system was not built with the intention to be marketed or used by others. For example, you can pretty quickly rule out a Blogger import system by asking the three eventual end-users of your system, “Does anyone here use Blogger?”.

Some of the points are things I hadn’t considered and will be implemented thanks to Thomas’ suggestion (RSS feeds for categories, in particular).

I would encourage those with weblogging systems intended for wider use to run their own systems against this checklist as we have done here.

Here’s my annotated version of Matthew Thomas’ list:
(Note: each of Matthew Thomas’ original points is followed by our own comments in parentheses. When I say “nope”, I’m not disagreeing; I’m saying that we don’t meet that given requirement.)

Forward compatibility
- License under the GPL (minimizing lock-in, architecture rot, and wasted development effort). (nope – we’re a bit tied down here due to our reliance on proprietary code)
- Work with at least one Free database (e.g. MySQL). (nope, but PostgreSQL support may be coming)
- In case of emergencies, allow entries to be exported to XML. (nope)
- Use entirely non-crufty URIs. (check)
  - Give individual entries URIs (permalinks) of the form http://base/2003/05/02/oneMeaningfulWordFromTheTitle. (close; we have the word “archives” in there, but otherwise I think we’ve hit the mark, 6 for 6 on this list. Down with linkrot! Also, we’re generating an automatic “shortname” for the URL based on the title, but it is human-editable in case the robot’s suggestion doesn’t work out.)
    - No irrelevant system-specific cruft (e.g. mt-static/, msgReader$, or weblog.cgi). (check)
    - No ? characters, so all entries get indexed by search engines. (check)
    - No irrelevant filetype-specific cruft (e.g. .html, .php, or .xml). (check)
    - Every entry is on its own page, not just an internal anchor on a daily/weekly archive (which makes search engines and statistics tools less useful). (check)
    - Net effect: Even with a stupidly worded inbound link (e.g. “I came across this”), a reader can tell a lot about an entry (host, date, and hint at subject) from glancing at its URI. (check)
    - Something Thomas didn’t mention: support for the old URLs of imported content. We’re handling the old ColdFusion-based URLs from two years of Acts of Volition transparently.
  - Give daily archives URIs of the form http://base/2003/05/02/. (we don’t have daily archives)
  - Give monthly archives URIs of the form http://base/2003/05/. (close enough)
  - Give yearly archives URIs of the form http://base/2003/. (close enough)
  - Give category archives URIs of the form http://base/name-of-category/2003/05/, etc. (close enough)
  - Theory: URL as UI, Cool URIs don’t change.
  - Practice: Making clean URLs with Apache and PHP.
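The permalink scheme and the automatic “shortname” described above fit in a few lines. This is a minimal Python sketch, not the code behind this site: the stop-word list, the three-word limit, and the names `suggest_shortname` and `permalink` are assumptions for illustration. It folds the title to ASCII, drops filler words, and builds a /YYYY/MM/DD/shortname URI with no file extension and no `?`.

```python
import re
import unicodedata
from datetime import date

# Hypothetical list of filler words the robot skips when suggesting a shortname.
STOP_WORDS = {"a", "an", "and", "at", "for", "in", "of", "on", "or", "the", "to"}

def suggest_shortname(title: str, max_words: int = 3) -> str:
    """Suggest a human-editable URL shortname from an entry title."""
    # Fold accented letters to ASCII so the URI needs no percent-encoding.
    folded = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    words = re.findall(r"[A-Za-z0-9]+", folded)
    # Drop filler words, but keep everything if the title is nothing but filler.
    meaningful = [w for w in words if w.lower() not in STOP_WORDS] or words
    return "-".join(w.lower() for w in meaningful[:max_words])

def permalink(base: str, posted: date, shortname: str) -> str:
    """Build a /YYYY/MM/DD/shortname URI: no extension, no '?', no system cruft."""
    return f"{base.rstrip('/')}/{posted:%Y/%m/%d}/{shortname}"

print(permalink("http://base/", date(2003, 5, 2), suggest_shortname("The ultimate Weblogging system?")))
# http://base/2003/05/02/ultimate-weblogging-system
```

Because the suggestion is only a default, the shortname would be stored with the entry and stay editable, so later tweaks to the heuristic never move an existing URI.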
Metadata
- Each entry has a title, a category string, contents, time posted (auto-generated), and one or more objects (e.g. images). (not quite)
- Invite (but do not require) the author to provide a summary for any item longer than n words, for use in mobile editions and RSS feeds. (no, but we are generating relatively good automated summaries with a brief excerpt and a word count; human-written summaries are coming soon)
- Categories are faceted. I may categorize an entry by subject, by current location (integrating with GPS devices), by mood, and so on. (nope)
- Each category facet can be hierarchical. (For example, an “interface design” subject category could be subdivided into “desktop application design”, “Web design”, “appliance design”, and “signage and artifact design”.) (we didn’t think this was necessary for our humble blog, but it is a good point)
- Invite (but do not require) an author to subdivide a category whenever it collects more than n entries (rather than forcing them to be architecture astronauts specifying all their categories at the beginning). (nope)
- An entry may have multiple values for each category facet. (For example, one post might be about both CSS specifications and buggy Web browsers.) (check)
- Why does all this need to scale so deeply? Because when you’ve been keeping a Weblog for twenty or thirty years, and you can’t remember any semi-unique words you used in a particular entry, finding it will be horribly difficult, and you’ll need all the semantic help you can get. (interesting; Stewart Brand would be proud).
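One way to model multi-valued facets and the “invite the author to subdivide” rule, as a hedged sketch: the `Entry` class, the facet names, and the threshold `n` are hypothetical, and hierarchy within a facet is left out to keep it short. Each entry maps a facet name to a set of values, so one post can be about both CSS and buggy browsers, and any value used by more than `n` entries is flagged as a candidate for subdivision.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Entry:
    title: str
    # Facet name -> set of values; every facet may hold several values at once.
    facets: dict = field(default_factory=dict)

def entries_with(entries, facet, value):
    """All entries that carry `value` in the given facet."""
    return [e for e in entries if value in e.facets.get(facet, set())]

def subdivision_candidates(entries, facet, n):
    """Values of `facet` used by more than n entries: worth inviting the author to split."""
    counts = Counter(v for e in entries for v in e.facets.get(facet, set()))
    return sorted(v for v, c in counts.items() if c > n)

entries = [
    Entry("Floats in IE", {"subject": {"CSS", "buggy browsers"}, "mood": {"annoyed"}}),
    Entry("Box model", {"subject": {"CSS"}}),
    Entry("Halifax trip", {"subject": {"travel"}, "location": {"Halifax"}}),
]
print([e.title for e in entries_with(entries, "subject", "CSS")])  # ['Floats in IE', 'Box model']
print(subdivision_candidates(entries, "subject", 1))               # ['CSS']
```

The invitation stays an invitation: the function only reports candidates, and the author decides whether “CSS” really needs splitting.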
Syndication
- Provide an RSS feed for the Weblog as a whole. (check)
- Provide an RSS feed for any category. (great idea! – coming soon)
  - Because of the faceting, category feeds will need to be dynamically generated, but they should still send correct caching responses. (we are not caching)
- Automatically ping Weblogs.com. (check)
- Automatically convert Slashdotted entries to static pages, and switch back to dynamic generation once the traffic subsides. (oh, to be slashdotted… nope)
- Integrate support for Creative Commons licenses. (not yet)
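For the caching point under category feeds, here is a minimal sketch of conditional GET handling, assuming the feed is generated on every request. The function name, the SHA-1-based ETag, and the five-minute `Cache-Control` value are illustrative choices, not something this system does (we are not caching). The server sends `ETag` and `Last-Modified`, and answers 304 Not Modified with an empty body when the client’s `If-None-Match` header names the current tag.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Optional

def feed_response(body: bytes, last_modified: datetime, if_none_match: Optional[str] = None):
    """Return (status, headers, body) for a dynamically generated category feed."""
    # Validator derived from the generated bytes: same feed, same ETag.
    etag = '"' + hashlib.sha1(body).hexdigest() + '"'
    headers = {
        "ETag": etag,
        "Last-Modified": format_datetime(last_modified, usegmt=True),  # needs a UTC-aware datetime
        "Cache-Control": "max-age=300",  # hypothetical five-minute freshness window
    }
    if if_none_match is not None:
        # If-None-Match may list several tags, possibly weak (W/"..."), or be "*".
        tags = [t.strip() for t in if_none_match.split(",")]
        if "*" in tags or etag in (t[2:] if t.startswith("W/") else t for t in tags):
            return 304, headers, b""  # the client's cached copy is still good
    return 200, headers, body
```

Hashing the finished feed still costs a full generation on every hit; a cheaper variant would derive the ETag from the category’s newest entry id and timestamp before building the feed at all.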
Management
- Web interface. (check)
- Native LinuxSTEP interface. (huh?)
- Accept entries from software on any other platform or device, using the metaWeblog API. (not yet)
- Accept entries sent by e-mail. (nope)
- Make it easy to send entries from a mobile phone (e.g. by replicating the features of Textile). (nope)
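Accepting entries over the metaWeblog API means answering XML-RPC calls such as `metaWeblog.newPost(blogid, username, password, struct, publish)`, which returns the new post’s id. The sketch below uses only Python’s standard `xmlrpc.client` to decode a request body and encode the response, so it could be wired into whatever web server is already in place. The credentials, fault code 801, and the in-memory `POSTS` store are hypothetical stand-ins.

```python
import xmlrpc.client

POSTS = {}  # hypothetical in-memory store; a real system would write to its database

def new_post(blog_id, username, password, struct, publish):
    """metaWeblog.newPost(blogid, username, password, struct, publish) -> postid string."""
    if (username, password) != ("steven", "secret"):  # hypothetical credentials check
        raise xmlrpc.client.Fault(801, "Invalid username or password")  # hypothetical fault code
    post_id = str(len(POSTS) + 1)
    POSTS[post_id] = {
        "blog": blog_id,
        "title": struct.get("title", ""),
        "body": struct.get("description", ""),  # metaWeblog carries the entry text in "description"
        "categories": list(struct.get("categories", [])),
        "published": bool(publish),
    }
    return post_id

HANDLERS = {"metaWeblog.newPost": new_post}

def handle_xmlrpc(request_xml: str) -> str:
    """Decode an XML-RPC request body, dispatch it, and encode the response."""
    params, method = xmlrpc.client.loads(request_xml)
    handler = HANDLERS.get(method)
    if handler is None:
        return xmlrpc.client.dumps(xmlrpc.client.Fault(-32601, f"Unknown method {method}"))
    try:
        return xmlrpc.client.dumps((handler(*params),), methodresponse=True)
    except xmlrpc.client.Fault as fault:
        return xmlrpc.client.dumps(fault)  # a Fault is always encoded as a method response
```

On the client side, `xmlrpc.client.ServerProxy(url).metaWeblog.newPost(...)` sends the same request over HTTP.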
Backward compatibility
- Import entries from Blogger, Radio, Manila, Movable Type etc. (not automatically, but if you are a database wiz…)
- Keep URLs the same for legacy entries, while still allowing control over their appearance. (check)
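Keeping legacy URLs working can be as simple as a lookup table built during the import plus a permanent redirect. This is a sketch under two hypothetical assumptions: that the old ColdFusion pages looked like `entry.cfm?id=N` (the post doesn’t say what they looked like), and that `LEGACY_IDS` maps each old id to its new permalink path. Serving the entry directly at the old URL would be the other option; a 301 tells search engines to move their index to the new address.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit

# Hypothetical table filled in during import: old ColdFusion entry id -> new permalink path.
LEGACY_IDS = {"42": "/archives/2001/06/14/redesign"}

def legacy_redirect(url: str) -> Optional[Tuple[int, Optional[str]]]:
    """Map an old *.cfm?id=N URL to its new home; None means 'not a legacy URL'."""
    parts = urlsplit(url)
    if not parts.path.lower().endswith(".cfm"):
        return None  # let the normal router handle it
    entry_id = parse_qs(parts.query).get("id", [None])[0]
    new_path = LEGACY_IDS.get(entry_id)
    if new_path is None:
        return 404, None  # looked like a legacy URL, but no such entry was imported
    return 301, new_path  # permanent: bookmarks and search engines update once
```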

24 thoughts on “The ultimate Weblogging system?”

Dave says:

May 2, 2003 at 11:45 am

I guess I’d better get our blog platform running on PostgreSQL.
steven says:

May 2, 2003 at 11:59 am

There goes my facade that I actually do this stuff myself.
mpt says:

May 2, 2003 at 6:38 pm
1. Does “our weblogging system” have a name?
2. What’s the proprietary code you’re relying on?
3. Well done with your URIs! I hummed and haahed about archives/, and eventually decided against it, because it makes posts look old even if they were posted five minutes ago. (The flip side of that decision would be that root-level categories couldn’t have a wholly numeric name, because it might clash with the number of a year.)
4. I did indeed mention “support for the old URLs of imported content” — “Keep URLs the same for legacy entries”.
steven says:

May 3, 2003 at 11:05 am

mpt, the weblog system we’re referring to is one developed with my friends and co-workers at silverorange. It was developed without any intention for commercialization. The main goal was to set up a weblogging system that we could use for ourselves (mainly Acts of Volition and a few other sites).

The “proprietary code” that is used is a web-application development platform that we use at silverorange.

We’ve got five weblogs running on the system (see the list). We’ll probably have a page up soon with a description and feature list.
Lou Quillio says:

May 3, 2003 at 1:28 pm

MPT’s all-in-one-place thinking on this subject is a boon, collecting some of my thoughts on a durable CMS schema, adding others and saving me some brain cells. I’ll be referring to it plenty.

What I’m still not seeing clearly is category faceting, with an eye toward multiple category assignment _plus_ hierarchy … plus future category reassignment, plus avoiding architecture astronautics.

Any thoughts here? I might, for instance, start out with the top-level categories WebDev and CSS2, and rightly assign a post to both. Later I might decide that CSS2 is subordinate to WebDev. I don’t want to manually reassign posts, rather just rearrange my hierarchy, so category assignment at the post-level can’t be slave to the changing hierarchy.

I suppose this can be solved by assigning categories to posts as a delimited string of equivalent category names, without regard to hierarchy. The string is parsed, and archives are served by considering the current state of the hierarchy — finest to coarsest granularity. Where once WebDev and CSS2 were peers, CSS2 has become a subordinate (more fine-grained than WebDev by a factor of 1) — but that fact is transparent to the post records themselves.

This would allow rich, free-form assignment of multiple categories *and* free-form hierarchy tweaking (really alternate “views”) without welding categories as posts understand them to categories as the hierarchy does.

Hmmm … okay, never mind. I think I figured it out. I think. Thanks.

LQ
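Lou’s idea, where posts keep a flat, delimited string of category names and the hierarchy lives in a separate structure that is consulted only when an archive is built, can be sketched like this. The semicolon delimiter, the `parent_of` mapping (child name to parent name), and the function names are assumptions. Making CSS2 a child of WebDev changes one entry in `parent_of`; no post record is touched.

```python
def parse_categories(field: str, sep: str = ";") -> set:
    """Split a post's delimited category string into a set of flat names."""
    return {name.strip() for name in field.split(sep) if name.strip()}

def descendants(category: str, parent_of: dict) -> set:
    """The category plus everything nested under it in the *current* hierarchy."""
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)
    found, stack = {category}, [category]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:  # also guards against accidental cycles
                found.add(child)
                stack.append(child)
    return found

def archive(posts: list, category: str, parent_of: dict) -> list:
    """Titles of posts filed under `category` or any of its current subcategories."""
    wanted = descendants(category, parent_of)
    return [p["title"] for p in posts if parse_categories(p["categories"]) & wanted]

posts = [
    {"title": "Floats", "categories": "CSS2"},
    {"title": "Doctypes", "categories": "WebDev; CSS2"},
    {"title": "Jaguar", "categories": "Apple"},
]
print(archive(posts, "WebDev", {}))                  # ['Doctypes']
print(archive(posts, "WebDev", {"CSS2": "WebDev"}))  # ['Floats', 'Doctypes']
```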
steven says:

May 3, 2003 at 1:57 pm

Right on Lou.

Epinions does this very nicely. When you browse something like “Laptops”, you are presented with “sub-categories” like Brand, Screen Size, and Price Range. If you click “Brand”, you’ll see “Screen Size” and “Price Range” as “sub-categories”, but if you had clicked “Screen Size”, you’d see “Brand” and “Price Range” as “sub-categories”.

They are really more like “attributes” than “categories”. We had an internal debate about what to call them (though we aren’t nesting, yet). We settled on “categories” because it is a recognized convention in weblog archives. However, “attributes” is more accurate.

It’s not flat. Taking Epinions for example, the category of laptops with 15″ screens is obviously a child of the “laptops” group, but you can narrow down sibling categories in a way that makes them appear as though they were a tree-structure.

For examples of this in practice see the Epinions Laptop section or my friend Nick’s photo gallery.

silverorange will be using this technique in some of our future e-commerce sites as well.

Make sense?
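The Epinions-style narrowing described above comes down to two steps: filter by the attributes picked so far, then count the values of every attribute not yet picked and offer those as the next “sub-categories”. A small sketch, with the laptop data and the `narrow` function invented for illustration:

```python
from collections import Counter

def narrow(items: list, selected: dict):
    """Filter items by the chosen attribute values and count what is left to choose.

    Returns (matching items, {attribute: Counter(value -> count)}) for every
    attribute not yet selected, so the UI can offer them as "sub-categories".
    """
    matches = [i for i in items if all(i.get(a) == v for a, v in selected.items())]
    remaining = {}
    for item in matches:
        for attr, value in item.items():
            if attr not in selected:
                remaining.setdefault(attr, Counter())[value] += 1
    return matches, remaining

laptops = [
    {"brand": "Apple", "screen": '15"', "price": "high"},
    {"brand": "Dell", "screen": '15"', "price": "mid"},
    {"brand": "Dell", "screen": '14"', "price": "low"},
]
matches, remaining = narrow(laptops, {"brand": "Dell"})
print(len(matches), sorted(remaining))  # 2 ['price', 'screen']
```

Whichever attribute is picked first, the others remain on offer, which is what makes the flat set of attributes feel like a tree.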
Daniel Burka says:

May 3, 2003 at 3:05 pm
Good post Steve. Perhaps some sort of spell-checking tool (I’m aware how difficult this is with current technologies) should be involved in any good web blogging tool. Of note, you made about 5 typing errors in this post alone.
- end-users of you system
- and will be implementing thanks for Thomas’ suggestion
- the robot’ suggestion does’ work out (2 errors here)
- Something Thomas didn#8217;t mention, support for the old URLs of imported content – we#8217;re handling
steven says:

May 3, 2003 at 3:18 pm

Thanks Daniel. Perhaps we should implement the Distributed-Remote-Daniel-Burka-Weblog-Proofreading-Protocol (DRDBWPP).
Lou Quillio says:

May 4, 2003 at 12:18 am

SG: What I’m thinking of, though, is a case where post (item) records contain a string of delimited category names (better “attributes,” as you say) that category-interpreting code applies a set of heuristics to. Posts don’t belong to categories, categories belong to them. And categories don’t own posts.

I’m sure we’re talking the same language, but what I’m chewing on now are the programmatic nuts and bolts, and database schemas. The category-interpreting module (and its UI, and its very identity) is the real challenge. Instructions to the user are to slap as many categories on an item as are germane, and to tier categories if desired. Some will make a real mess.

And the category-interpreting module must impose order. It must control category creation and insist that they are distinct. It must perform all global name changes. It must be air-tight against anomalies. Hardest, it must make sense. I’ve never seen a UI like this done well in GUI, far less in browser-limited controls. Jesus, that’d be a fun challenge.

Yeah, so that’s it: Items are dumb and have moods and topical concerns — and dates and titles, let’s don’t forget. But they also don’t need to think about their “recent posts” browsability. A separate control structure interprets and serves categories to the category-minded user.

Hold it.

Am I the only one thinking that this whole business of CMS item categorization will mean nothing once the average user learns to construct a focused site-search string? Never mind. Different subject.

How is this done:

http://gallery.whitelands.com/photos/galleryinfo

I’m interested to know if the rich hierarchy is embedded in the database schema or the code.

LQ
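On the schema-versus-code question, one possibility (not necessarily how that gallery does it) is to put the hierarchy in the schema as a `parent_id` column and let a recursive query walk it. The SQLite sketch below assumes hypothetical tables `category`, `post`, and `post_category`. It also covers the housekeeping listed above: `UNIQUE COLLATE NOCASE` keeps category names distinct, and because posts reference category ids rather than names, a global rename is a single `UPDATE`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE category (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE COLLATE NOCASE,  -- distinct names, ignoring case
    parent_id INTEGER REFERENCES category(id)        -- NULL means top level
);
CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE post_category (                          -- many-to-many link
    post_id     INTEGER NOT NULL REFERENCES post(id),
    category_id INTEGER NOT NULL REFERENCES category(id),
    PRIMARY KEY (post_id, category_id)
);
"""

# Walk the hierarchy at query time, so re-parenting a category never touches post rows.
POSTS_UNDER = """
WITH RECURSIVE subtree(id) AS (
    SELECT id FROM category WHERE name = ?
    UNION
    SELECT c.id FROM category AS c JOIN subtree AS s ON c.parent_id = s.id
)
SELECT p.title FROM post AS p
WHERE p.id IN (
    SELECT pc.post_id FROM post_category AS pc
    WHERE pc.category_id IN (SELECT id FROM subtree)
)
ORDER BY p.id
"""

def posts_under(db: sqlite3.Connection, category: str) -> list:
    """Titles of posts in `category` or any of its subcategories."""
    return [title for (title,) in db.execute(POSTS_UNDER, (category,))]

def rename_category(db: sqlite3.Connection, old: str, new: str) -> None:
    # One UPDATE renames the category everywhere, because posts store ids, not names.
    db.execute("UPDATE category SET name = ? WHERE name = ?", (new, old))
```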
Alan says:

May 4, 2003 at 8:22 am

I don’t want spell checking in a blog. A blog is often not formal writing, and if there are spelling errors, they are usually of the dashed-off-note sort. Spell check also introduces errors based on using an improper but properly spelled word. It also fails on the standardization of linguistic variation, i.e. Canadian English is not recognized. It would also take away all the “fun” for people who count alleged spelling errors in other people’s posts.
Alan says:

May 4, 2003 at 8:24 am

Also, it falsely places spelling errors above poor grammar and lack of clarity, both of which are more important obstacles to conveying meaning.
steven says:

May 4, 2003 at 12:28 pm

Lou, I think this type of categorization can be relatively simple. Also, having the capability to nest attributes doesn’t require you to nest them. You can always just have one flat set instead.

Here’s a screenshot of the administration system for the photo gallery that might help. On the left, you see a hierarchy of the “categories”. On the right, you see an “Add New Photos Page” scrolled down to the area to select attributes (called “galleries” in this system).

In the example in the screenshot, I’ve selected “Halifax Trip” under “Trips & Travel” and “Dan”, “Isaac”, and “Steven” under “People”. This would indicate that this photo was from our Halifax trip and includes Dan, Isaac, and Steven.
Lou Quillio says:

May 4, 2003 at 3:45 pm

Steven: Screenshot’s worth 1,064 words. Thanks.
abhi says:

May 6, 2003 at 2:25 am

My blogging system is in ASP and uses an Access database as of now.
Supports – Archives, RSS, Calendar, etc

I’ll be making it public soon; right now I am busy working on a Generic Database Editor.
Lou Quillio says:

May 6, 2003 at 2:50 pm

abhi: I’ll be making it public soon; right now I am busy working on a Generic Database Editor.

I quit ASP a while back, but check James Shaw’s work on an ASP CMS at CoverYourASP.com. Mature code, full source, well documented, and (bonus) a generic table editor.

LQ
abhi says:

May 8, 2003 at 1:04 am

heh.

Nice. But it feels better to use your own script/program. And then I can just put in a scaled-down version of the same for other scripts/projects that I make public. 🙂

Regards,
Abhi
Stephen DesRoches says:

May 10, 2003 at 1:43 pm

Spell checking makes more sense at the browser level than in the blogging system.
steven says:

May 11, 2003 at 2:49 pm

Re: Stephen DesRoches – I totally agree that spell checking makes far more sense at the browser level. However, it makes even more sense at the operating system level, as illustrated by the use of OS X’s system-wide spell check in Safari. One interface and dictionary across all programs.
Alan says:

May 11, 2003 at 3:46 pm

Just make sure the dictionary and the rules of usage are good, unlike MS Word now. You either spend time correcting the errors in MS Word’s dictionary or, worse, you come to believe it to be a superior source of good usage and spelling.
Keir says:

May 21, 2003 at 3:05 am

Having recently finished reading We Blog: Publishing Online with Weblogs, I went about setting one up for myself. After reading the book, I thought Movable Type was the best bet, but my limited experience of PHP and MySQL put me off attempting to install it. I may revisit this in time, as its feature list is impressive. Eventually I decided to write my own basic system using ASP. Although you are able to replicate the common features of popular weblogs, the one thing I really struggled with is URLs.

I believe Apache has a built-in feature (mod_rewrite?) enabling you to pass parameters through the URL without the use of the ?. This feature is available in IIS, but only as a purchasable add-on which needs installing on your live server, which in my case is an external web host. As a result I have to make do with ?postID=XYZ, which is a major compromise. If anyone has used ASP and solved this problem, any pointers would be gladly received.

In terms of scripting languages, I noticed that this blog used to appear in CFM but is now in PHP. As ASP is on its way out, I wonder whether to begin learning ASP.NET, which probably has less of a learning curve, or to move over to PHP, which is more open source. I like the idea of ASP.NET separating the code from the HTML a lot, but PHP seems to be on the rise. Steven, out of interest, what made you move to PHP?

Finally – Any chance of the feature list promised above?
abhi says:

May 21, 2003 at 11:23 pm

hey Keir,
that ?id=1239 in place of /2003/10/13 bugged me for some time too.

If your server admin lets you set a custom 404 page, you can modify it to suit you.

I’m currently giving my CMS a makeover, and I’ll be using querystrings like:

?d=2003/05/22 – for the posts and
?d=2003/05 – for the monthly archives

then use RegExps to break it up and do the stuff. This obviously looks better than ?id=50.

Good luck.
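The `?d=` approach above can be sketched with a regular expression over the parsed query string. This is a minimal sketch; the return shape (`"day"` or `"month"` plus the numbers) and the range checks are assumptions:

```python
import re
from typing import Optional, Tuple
from urllib.parse import parse_qs

# d=YYYY/MM for a monthly archive, d=YYYY/MM/DD for a day's posts.
DATE_PARAM = re.compile(r"(\d{4})/(\d{2})(?:/(\d{2}))?")

def parse_date_param(query: str) -> Optional[Tuple[str, int, int, Optional[int]]]:
    """Turn 'd=2003/05/22' or 'd=2003/05' into something the archive code can query on."""
    values = parse_qs(query).get("d")
    if not values:
        return None
    m = DATE_PARAM.fullmatch(values[0])
    if m is None:
        return None
    year, month = int(m.group(1)), int(m.group(2))
    day = int(m.group(3)) if m.group(3) else None
    if not 1 <= month <= 12 or (day is not None and not 1 <= day <= 31):
        return None  # reject obviously bad dates before they reach the database
    return ("day" if day else "month", year, month, day)
```

`parse_qs` also decodes a percent-encoded value such as `d=2003%2F05%2F22`, so either form reaches the regex as `2003/05/22`.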
Lou Quillio says:

June 11, 2003 at 8:09 pm

Keir: If anyone has used ASP and solved [the messy querystring] problem any pointers would be gladly received.

The way to do it on a hosted IIS domain is with a custom 404 script.

You decide on a virtual directory structure that describes the hierarchy of your data. These “clean URLs” don’t actually exist on your site, so requests for them are kicked to the 404 script. But before displaying the 404 message, your script grabs the originally requested URL and examines the virtual path/filename. If it can convert the path/filename requested into a meaningful querystring-style URL, it redirects to that page; if not, it goes ahead and throws the 404.

It’s actually pretty easy to do (though there will be performance issues), just takes some planning.

Conceive the virtual path/filename scheme so that it includes everything you’d need to construct a query from the parts. If you get a request for, say

http://base/2003/05/02/oneMeaningfulWordFromTheTitle

your 404 handler could field-strip that into

SELECT id FROM tablename WHERE yr='2003' AND mo='5' AND dy='2' AND title LIKE '%oneMeaningfulWordFromTheTitle%'

Empty result set? Throw the 404. Found it? Redirect to the querystring-style URL that your CMS understands.

The other side of the coin is that your CMS should reverse the process when it forms internal links. So when it’s outputting the permalink for an item, say, it shouldn’t set an href of

http://base/index.asp?id=7

but rather it should do a lookup on the id #7 record and explode the date into a virtual path and the title (perhaps) into the OneMeaningfulWord/filename. This way your messy querystring URLs aren’t released into the wild.

Did I mention there are performance issues? Obviously there are, but if you think things through this method will work, and nearly every IIS host allows custom 404 scripts.
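The two-way scheme above can be sketched in Python with SQLite standing in for the CMS database. It is illustrative only: `entries` replaces `tablename`, `/index.asp?id=N` stands in for the real querystring URL, and picking the longest title word as the “meaningful word” is a hypothetical heuristic. Unlike the string-built query in the comment, it uses bound parameters, since the path comes straight from the visitor. On IIS the custom 404 script can usually read the originally requested URL from its query string; pulling the path out of that is left out here.

```python
import re
import sqlite3
from typing import Optional, Tuple

# /YYYY/MM/DD/OneMeaningfulWord, with or without a trailing slash.
CLEAN_PATH = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/([A-Za-z0-9]+)/?")

def create_schema(db: sqlite3.Connection) -> None:
    # Hypothetical table standing in for "tablename" in the comment above.
    db.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, yr INTEGER, mo INTEGER, dy INTEGER, title TEXT)")

def resolve_404(db: sqlite3.Connection, path: str) -> Tuple[int, Optional[str]]:
    """Turn a clean virtual path into a redirect to the querystring URL, or a real 404."""
    m = CLEAN_PATH.fullmatch(path)
    if m is None:
        return 404, None
    yr, mo, dy, word = m.groups()
    # Bound parameters rather than string concatenation; SQLite's LIKE ignores ASCII case.
    row = db.execute(
        "SELECT id FROM entries WHERE yr = ? AND mo = ? AND dy = ? AND title LIKE ? ORDER BY id LIMIT 1",
        (int(yr), int(mo), int(dy), f"%{word}%"),
    ).fetchone()
    return (404, None) if row is None else (302, f"/index.asp?id={row[0]}")

def permalink_for(db: sqlite3.Connection, entry_id: int) -> Optional[str]:
    """The reverse step: explode a record into the clean path the 404 handler understands."""
    row = db.execute("SELECT yr, mo, dy, title FROM entries WHERE id = ?", (entry_id,)).fetchone()
    if row is None:
        return None
    yr, mo, dy, title = row
    # Hypothetical heuristic: the longest word of the title serves as the meaningful word.
    word = max(re.findall(r"[A-Za-z0-9]+", title), key=len, default="entry")
    return f"/{yr:04d}/{mo:02d}/{dy:02d}/{word}"
```

If two entries from the same day share the chosen word, `ORDER BY id LIMIT 1` quietly picks the older one; storing an explicit shortname per entry avoids that ambiguity.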
Brad Pineau says:

June 12, 2003 at 6:05 pm

I’m standing behind MySQL, all the way. Steven, I say you should consider making the blogging system work with MySQL, as it is the only database supported by a lot of cheap web hosting companies. You’d be surprised what it can handle.
Keir says:

June 13, 2003 at 4:23 am

Lou – Thanks a lot for your entry relating to the custom 404 error scripts. I had never thought of using the 404 for that reason but it seems so obvious after reading your outline. I will need to redo the way I handle dates but that is not a major problem with so few entries.
