gregory david collins: Building a website with Haskell, part 4

Introduction

In the last installment of this series, I described a basic filesystem-based content management system (called “Blaaargh”) I built to power my homepage. In this (tremendously overdue) post, the final one of the series, I’ll give you a quick walkthrough of Blaaargh’s basic design and code. You can also view the Blaaargh API documentation and manual, which describes how Blaaargh is configured and used.

About Blaaargh

Blaaargh is a minimalist program with a stupid name, called after the sound I want to make whenever I hear or read “blog”: the clumsiest new word in the English language. (Canonized by Webster’s, we’re stuck with it now, just like “utilize” — and part of me dies inside.) I wrote it to make it quick and easy to push posts and pages to my web site while cleanly separating content from presentation. (It’s similar in philosophy to the venerable bloxsom, if you’ve seen that.)

I should warn you: because I’m “O.G. Slackware 1994”, “quick and easy” for me means writing with God’s own text editor and publishing with “git push”.

Some of Blaaargh’s design choices:

the content database is a directory on the filesystem.
files on the disk with a .md extension are treated as “posts”, and get syndicated/indexed/served as such. Other files are considered to be “static” and are served as a typical webserver would.
posts have metadata (e.g. “title”, “author”, “summary”, publication dates, etc) — I decided to encode these in a delimited header within the posts themselves.
posts are read in once and served from RAM; I don’t have gigabytes of text content to serve.
posts are styled with templates, using the HStringTemplate library

Let’s walk through the code to see how this all plays together.

Starting with types

If you want to follow along, you can pull up the source to src/Blaaargh/Internal/Types.hs from the Blaaargh repo on github.

In order to make feed generation extra-brainless (and because it was a reasonable internal representation for post data), Posts are just a newtype wrapper around the feed package’s Atom Entry datatype:

newtype Post = Post { unPost :: Atom.Entry }
  deriving (Show)

HStringTemplate defines a typeclass called ToSElem which allows you to expose an arbitrary datatype to a template; you can map a String key to a ToSElem value and then refer to the value by name. Values can be exposed to templates in three ways; as a String, as a list of ToSElem values, or as a key-value mapping (allowing you to write things like $foo.bar$ in your templates.)

The Post datatype will be exposed to templates as a mapping containing fields like “date”, “url”, “content”, etc.:

instance ToSElem Atom.EntryContent where
    toSElem (Atom.TextContent s) = toSElem s
    toSElem (Atom.HTMLContent s) = toSElem s
    toSElem _ = toSElem (""::String)

...

instance ToSElem Post where
    toSElem post@(Post p) = SM $ Map.fromList attrs
      where
        url = Atom.entryId p
        body = fromMaybe (Atom.TextContent "") $ Atom.entryContent p
        summary = fromMaybe (Atom.HTMLString "") $ Atom.entrySummary p

        attrs = [ ("id", toSElem url)
                , ("date", toSElem $ friendlyTime $ getPostTime post)
                , ("url", toSElem url)
                , ("title", toSElem $ Atom.entryTitle p)
                , ("content", toSElem body)
                , ("summary", toSElem summary)
                , ("authors", toSElemList $ Atom.entryAuthors p) ]

We also define a ToMessage instance (from Happstack) for Atom.Feed so that we can convert it directly into a response:

instance ToMessage Atom.Feed where
    toContentType _ = "application/atom+xml"
    toMessage f = L.pack $ XML.showElement $ Atom.xmlFeed f

Next we define a datatype for describing the content directory:

type ContentMap = Map ByteString ContentItem

data ContentItem =
       ContentPost Post                        -- ^ a post
     | ContentDirectory ByteString ContentMap  -- ^ a path prefix + content
                                               -- mapping
     | ContentStatic FilePath                  -- ^ a static file
  deriving (Show)

We’ll walk this structure when we serve HTTP requests.

Next there is a datatype we use to hold the application state; it bundles together all of the information we need to serve content.

data BlaaarghState = BlaaarghState
    { blaaarghPath      :: FilePath      -- ^ path on disk
    , blaaarghSiteURL   :: String        -- ^ site URL, minus slash
                                         --   (e.g. http://foo.com)
    , blaaarghBaseURL   :: String        -- ^ base URL of content section,
                                         --   e.g. "/posts"
    , blaaarghPostMap   :: ContentMap    -- ^ content
    , blaaarghTemplates :: TemplateDirs  -- ^ templates
    , blaaarghFeedInfo  :: Atom.Feed     -- ^ feed info

    , blaaarghFeedExcludes :: ExcludeList -- ^ these URLs won't appear in
                                          -- feeds or in post listings

    , blaaarghExtraTmpl :: Template -> Template
                                         -- ^ extra template variables get
                                         --   inserted here
    }

Some of the fields of this record deserve some special mention:

blaaarghFeedInfo contains information that is served with the top-level atom feed, like “feed title”. When we serve atom feeds we’ll just be tacking some Posts and a little bit of metadata onto this structure.
blaaarghFeedExcludes enumerates posts/directories in the content area that we don’t wish to traverse when building atom feeds/post indices. For example, my content area for my homepage contains some “static” pages (“about”, “contact”) which would appear in the syndication feed for the site if they weren’t specifically excluded. (Note that we always skip posts called “index”, because we use those when serving directory indices).
blaaarghExtraTmpl is a function that transforms string templates, which allows the user to fill in extra template variables. We also provide a function to help with this:

addExtraTemplateArguments :: ToSElem a =>
                             [(String,a)]
                          -> BlaaarghMonad ()
addExtraTemplateArguments args = do
    modify $ \t ->
        t { blaaarghExtraTmpl = foldl f (blaaarghExtraTmpl t) args }

  where
    f :: ToSElem a => (Template -> Template) -> (String, a) -> (Template -> Template)
    f xtmpl (k,v) = (setAttribute k v) . xtmpl

Finally, we define the happstack monad we’ll use to handle Blaaargh requests, as well as a function to run it:

type BlaaarghMonad   = StateT BlaaarghState IO
type BlaaarghHandler = ServerPartT BlaaarghMonad Response

runBlaaarghHandler :: BlaaarghState
                   -> ServerPartT BlaaarghMonad a
                   -> ServerPartT IO a
runBlaaarghHandler s = mapServerPartT $ \m -> do
                         (a,_) <- runStateT m s
                         return a

The content area

Blaaargh expects to be given a data directory with the following contents:

config           -- a configuration file
content/         -- the content area, contains posts & files
templates/       -- contains templates

When Blaaargh is initialized, the content area gets read in from disk and dumped into a BlaaarghState record. Reading in the content area is pretty straightforward, if you’d like to examine the details you can read src/Blaaargh/Internal/Post.hs.

Templates

Before we dive into the details of how templates are looked up, I’ll call your attention to the file src/Blaaargh/Internal/Util/Templates.hs, which is an adaptation of some routines from HStringTemplateHelpers for reading in directory trees containing template files. The difference between this code and the version from HStringTemplateHelpers is that templates in subdirectories here can refer to templates from their parent directories by name. It introduces a TemplateDirs type which we’ll use in the “business logic”.

The meat of the templating code resides in src/Blaaargh/Internal/Templates.hs. I’ll briefly take you through the details of the findTemplateForPost function, the other public functions (findTemplateForDirectory and findFourOhFourTemplate) are similar.

Let’s say we’ve received a request for the post “foo/bar/baz/quux”. We need to decide which master template we’re going to use to present the post contents. We’ll search the data directory for the following template files, in order, and use the first one that matches:

templates/foo/bar/baz/quux.st
templates/foo/bar/baz/post.st
templates/foo/bar/post.st
templates/foo/post.st
templates/post.st

In other words, we do a cascading template search. We’ll rely on a couple of helper functions:

lookupTmpl :: TemplateDirs          -- ^ templates
           -> (String, ByteString)  -- ^ (dir, template), where "dir"
                                    -- starts with "./"
           -> Maybe (StringTemplate ByteString)
lookupTmpl tmpls (d,t) =
    lookupDirgroup d tmpls >>= getStringTemplate (B.unpack t)

The lookupDirgroup function is one of the ones we cribbed from HStringTemplateHelpers (albeit rewritten). It looks up a template group from the TemplateDirs type and pulls a named template out of it, if it exists.

Next we provide a little function which, when given a list of templates to search, will map lookupTmpl across the list until it finds one that matches (using the First monoid):

findFirstMatchingTemplate :: [(String,ByteString)]
                          -> BlaaarghMonad (Maybe (StringTemplate ByteString))
findFirstMatchingTemplate templatesToSearch = do
    templates <- liftM blaaarghTemplates get

    return . getFirst . mconcat $
      map (First . lookupTmpl templates) templatesToSearch

Next we can define the code to do a cascading template search within our Blaaargh monad:

cascadingTemplateFind :: [ByteString]
                      -> ByteString
                      -> BlaaarghMonad (Maybe (StringTemplate ByteString))
cascadingTemplateFind directories templateName = do
    assert (not $ null directories) (return ())

    findFirstMatchingTemplate templatesToSearch

  where
    -- if requested "foo/bar/baz", then containingDirs contains
    -- [["foo","bar"], ["foo"], []]
    containingDirs    = tail . reverse . inits $ directories

    templatesToSearch = map (\d -> (listToPath d, templateName))
                            containingDirs

Finally, we’re ready to lookup the template for a given post:

findTemplateForPost :: [ByteString]   -- ^ path to the post, relative
                                      -- to the "content/" directory;
                                      -- if the file is in
                                      -- "content/foo/bar/baz.md" then
                                      -- this list will contain
                                      -- ["foo", "bar", "baz"]
                    -> BlaaarghMonad (Maybe (Template))
findTemplateForPost pathList = do
    xformTmpl <- liftM blaaarghExtraTmpl get
    templates <- liftM blaaarghTemplates get
    assert (not $ null pathList) (return ())

    let ft  = First $ lookupTmpl templates firstTmpl
    st     <- cascadingTemplateFind pathList "post" >>= return . First
    let mbT = getFirst (ft `mappend` st)

    return $ xformTmpl `fmap` mbT

  where
    postName = last pathList

    -- search for a template specific to this post first, then walk up
    -- the directory structure looking for a template named "post"
    firstTmpl = (listToPath $ init pathList, postName)

This looks up a template for the post first by name (i.e. requesting “foo/bar/baz” results in a lookup for “templates/foo/bar/baz.st”), then it does a cascading lookup for a template called “post”, and if there’s a template that matches it transforms it using the blaaarghExtraTmpl function we described earlier.

Handling requests

Go ahead and open src/Blaaargh/Internal/Handlers.hs. The serveBlaaargh handler is the toplevel handler for Blaaargh pages:

serveBlaaargh :: BlaaarghHandler
serveBlaaargh = do
    methodOnly GET
    compressedResponseFilter

    cm    <- lift get >>= return . blaaarghPostMap
    paths <- askRq >>= return . map B.pack . rqPaths

    serve [] paths cm `mappend` fourohfour

  where
    --------------------------------------------------------------------------
    serve :: [ByteString] -> [ByteString] -> ContentMap -> BlaaarghHandler
    serve soFar paths content = do
        case paths of
          []      -> serveIndex soFar content
          (a:[])  -> serveFile soFar a content
          (a:b)   -> serveDir soFar a b content


    --------------------------------------------------------------------------
    serveFile :: [ByteString] -> ByteString -> ContentMap -> BlaaarghHandler
    serveFile soFar a content = do
        if a == "feed.xml" then
            lift $ serveFeed soFar content
          else
            maybe mzero
                  (\f -> case f of
                           (ContentStatic fp)     -> serveStatic fp
                           (ContentPost post)     -> lift $ servePost (soFar ++ [a]) post
                           (ContentDirectory _ d) -> serveIndex (soFar ++ [a]) d)
                  (Map.lookup a content)


    --------------------------------------------------------------------------
    serveDir :: [ByteString]
             -> ByteString
             -> [ByteString]
             -> ContentMap
             -> BlaaarghHandler
    serveDir soFar d rest content = do
        let mbD = Map.lookup d content

        maybe mzero
              (\f -> case f of
                       (ContentDirectory _ mp) -> serve (soFar ++ [d]) rest mp
                       _                       -> mzero)
              mbD

In short, we use the request path to walk down the ContentMap until we find an applicable object to serve, and failing that we throw up a 404 error.

Conclusion

It’s a very simplistic package that doesn’t do much, but Blaaargh meets my (very basic) needs and unless someone comes along with some complaints or feature requests, I’ll be putting it aside for now. I’m hoping others find it useful as an example of a self-contained “module” that you can easily plug into other happstack apps; there aren’t many of those.

gregory david collins

Building a website with Haskell, part 4