diary at Telent Netowrks

The trouble with triples

Sat, 26 Mar 2016 16:29:15 +0000

The other day I had occasion to write

(defn triples-to-map [triples]
  (reduce (fn [m row]
            (update-in m (butlast row)
                       (fn [old new] (if old (conj old new) [new]))
                       (last row)))
          {}
          triples))

and be surprised and delighted that it ran first time with the expected result. As witness:

foo.search=> (clojure.pprint/pprint triples_)
([:bnb:016691109 :published "2014"]
 [:bnb:016691109 :title "The Seven Streets of Liverpool"]
 [:bnb:016691109 :publisher "Orion"]
 [:bnb:016691109 :schema :shlv:Book]
 [:bnb:016691109 :author "Lee, Maureen"]
 [:bnb:016594932 :published "2013"]
 [:bnb:016594932 :title "Stephen Guy's forgotten Liverpool"]
 [:bnb:016594932 :publisher "Trinity Mirror"]
 [:bnb:016594932 :schema :shlv:Book]
 [:bnb:016594932 :author "Guy, Stephen"]
 [:bnb:016242841 :published "2012"]
 [:bnb:016242841
  :title
  "Robbed : my Liverpool life : the Rob Jones story"]
 [:bnb:016242841 :publisher "Kids Academy Publishing"]
 [:bnb:016242841 :schema :shlv:Book]
 [:bnb:016242841 :author "Jones, Rob, 1971-"]
 [:bnb:016744037 :published "2012"]
 [:bnb:016744037 :title "Steven Gerrard : my Liverpool story"]
 [:bnb:016744037 :publisher "Headline"]
 [:bnb:016744037 :schema :shlv:Book]
 [:bnb:016744037 :author "Gerrard, Steven, 1980-"])
foo.search=> (clojure.pprint/pprint (triples-to-map triples_))
{:bnb:016691109
 {:published ["2014"],
  :title ["The Seven Streets of Liverpool"],
  :publisher ["Orion"],
  :schema [:shlv:Book],
  :author ["Lee, Maureen"]},
 :bnb:016594932
 {:published ["2013"],
  :title ["Stephen Guy's forgotten Liverpool"],
  :publisher ["Trinity Mirror"],
  :schema [:shlv:Book],
  :author ["Guy, Stephen"]},
 :bnb:016242841
 {:published ["2012"],
  :title ["Robbed : my Liverpool life : the Rob Jones story"],
  :publisher ["Kids Academy Publishing"],
  :schema [:shlv:Book],
  :author ["Jones, Rob, 1971-"]},
 :bnb:016744037
 {:published ["2012"],
  :title ["Steven Gerrard : my Liverpool story"],
  :publisher ["Headline"],
  :schema [:shlv:Book],
  :author ["Gerrard, Steven, 1980-"]}}
nil

(Now I write that code down for the second time I wonder whether using update-in is slightly overkill when I know the map will only ever be two levels deep. But that's not something I'm interested in right now.)
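A minimal sketch of what that two-level version might look like, assuming every row is exactly [subject predicate object]. The name triples-to-map-2 is made up for illustration, and update needs Clojure 1.7 or later.

```clojure
;; Hypothetical alternative to triples-to-map for rows of exactly three items.
(defn triples-to-map-2 [triples]
  (reduce (fn [m [subject predicate object]]
            ;; nested update: the outer one works on the subject key, the
            ;; inner one on the predicate key; fnil supplies [] when the
            ;; predicate has no values yet
            (update m subject update predicate (fnil conj []) object))
          {}
          triples))

(triples-to-map-2 [[:book1 :title "A"]
                   [:book1 :author "B"]
                   [:book1 :author "C"]])
;; => {:book1 {:title ["A"], :author ["B" "C"]}}
```

Whether this reads better than update-in with a path is a matter of taste; it mainly makes the "two levels, always" assumption explicit.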

What I'm interested in right now is that the input list for this function is itself the output of some other code which - mostly thanks to Instaparse - was unexpectedly easy to write. I've been playing around lately with RDF and the Semantic Web, and needed a way of parsing N-Triples - which looks superficially simple enough that Awk could do it, until you start thinking about comments and strings with spaces in them and escaped special characters and ...
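To make the "strings with spaces in them" problem concrete, here is a small sketch of the naive approach going wrong. The sample line and its example.org IRIs are invented for illustration, and naive-fields is a hypothetical name.

```clojure
(require '[clojure.string :as str])

;; Made-up sample line in N-Triples syntax (the IRIs are placeholders).
(def sample-line
  "<http://example.org/book/1> <http://example.org/title> \"The Seven Streets of Liverpool\" .")

;; Naive Awk-style field splitting on runs of whitespace.
(defn naive-fields [line]
  (str/split line #"\s+"))

(count (naive-fields sample-line))
;; => 8, not the 4 tokens (subject, predicate, object, ".") we wanted
```

The title literal is chopped into five pieces, and that is before comments or escaped quotes enter the picture.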

Anyway, Instaparse steps in to save the day again. I believe I have written previously to give my opinion that Instaparse is awesome and I will go on record to say that this fresh experience merely serves to cement my first impression.

N-Triples has a published EBNF grammar. I had to monkey with this a bit to get it into Instaparse.

Here's the final result:

ntriplesDoc 	::= line*
line ::= WS* triple? EOL
triple 	::= 	subject WS* predicate WS* object WS* '.' WS*
subject 	::= 	IRIREF | BLANK_NODE_LABEL
predicate 	::= 	IRIREF
object 	::= 	IRIREF | BLANK_NODE_LABEL | literal
literal 	::= 	STRING_LITERAL_QUOTED ('^^' IRIREF | LANGTAG)?
LANGTAG 	::= 	'@' #"[a-zA-Z]"+ ('-' #"[a-zA-Z0-9]"+)*
EOL 	::= 	#"[\n\r]"+ 
WS 	::= 	#"[ \t]" | #"#.*"
IRIREF 	::= 	'<' IRI '>'
IRI ::= (#"[^\u0000-\u0020<>\"{}|^`\\]" | UCHAR)*
STRING_LITERAL_QUOTED 	::= 	'"' STRING_LITERAL  '"'
STRING_LITERAL ::= ( #"[^\u0022\u005C\u000A\u000D]" | ECHAR | UCHAR)*
BLANK_NODE_LABEL 	::= 	'_:' (PN_CHARS_U | #"[0-9]") ((PN_CHARS | '.')* PN_CHARS)?
UCHAR 	::= 	'\\u' HEX HEX HEX HEX | '\\U' HEX HEX HEX HEX HEX HEX HEX HEX
ECHAR ::= "\\" #"[tbnrf\"\'\\]"

HEX ::= #"[0-9A-Fa-f]"

PN_CHARS_BASE ::= #"[A-Z]" | #"[a-z]" | #"[\u00C0-\u00D6]" | #"[\u00D8-\u00F6]" | #"[\u00F8-\u02FF]" | #"[\u0370-\u037D]" | #"[\u037F-\u1FFF]" | #"[\u200C-\u200D]" | #"[\u2070-\u218F]" | #"[\u2C00-\u2FEF]" | #"[\u3001-\uD7FF]" | #"[\uF900-\uFDCF]" | #"[\uFDF0-\uFFFD]" | #"[\x{10000}-\x{EFFFF}]"

PN_CHARS_U ::= PN_CHARS_BASE | ":" | "_"

PN_CHARS ::= PN_CHARS_U | "-" | #"[0-9]" | "\u00B7" | #"[\u0300-\u036F]" | #"[\u203F-\u2040]"

Calling insta/parse with this grammar on a sample line gets you something looking like

[:ntriplesDoc
 [:line
  [:triple
   [:subject
    [:IRIREF "<" [:IRI  "h" "t" "t" "p" ":" "/" "/" "b" "n" "b" "."
                        "d" "a" "t" "a" "."  "b" "l" "."  "u" "k" "/" "i" "d"
                        "/" "r" "e" "s" "o" "u" "r" "c" "e" "/" "0" "1" "6" "7"
                        "0" "6" "8" "5" "5"] ">"]]
   [:WS " "]
   [:predicate
    [:IRIREF "<" [:IRI "h" "t" "t" "p" ":" "/" "/" "l" "o" "c" "a" "l"
                       "h" "o" "s" "t" ":" "3" "0" "3" "0" "/" "p" "u"
                       "b" "l" "i" "s" "h" "e" "d"] ">"]]
   [:WS " "] [:object [:literal [:STRING_LITERAL_QUOTED
      "\"" [:STRING_LITERAL "2" "0" "1" "4"] "\""]]] [:WS " "]
   "."]
  [:EOL "\n"]]]

which clearly is going to need some more attention before it's usable. We do this in two passes: first we visit the entire tree node-by-node to do things like turn literal node values into strings and IRI nodes into URI objects.

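;; Assumes clojure.string is aliased as str and that URI is java.net.URI;
;; prefixize is a helper defined elsewhere (not shown here).
(defn visit-node [branch]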
  (if (vector? branch)
    (case (first branch)
      :IRIREF
      (let [[_< [_iri_tok & letters] _>] (rest branch)
            iri (str/join letters)]
        (or (prefixize iri)
            (URI. iri)))
      :STRING_LITERAL (str/join (rest branch))
      :STRING_LITERAL_QUOTED (let [[_ string _] (rest branch)] string)
      :literal (second branch)
      :WS ""
      :UCHAR (let [[_ & hexs] (rest branch)]
               (String.
                (Character/toChars
                 (Integer/parseInt (str/join (map second hexs)) 16))))
      :triple (let [m (reduce (fn [m [k v]] (assoc m k v)) {}
                              (rest branch))]
                [:triple [(:subject m) (:predicate m) (:object m)]])
      branch)
    branch))
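The reason a bottom-up walk suits this job can be seen on a cut-down tree. The following is a sketch only: collapse is a hypothetical, much-reduced stand-in for visit-node, and sample-tree is a hand-written fragment shaped like the parser output above.

```clojure
(require '[clojure.string :as str]
         '[clojure.walk :as walk])

;; Hand-written fragment shaped like the :object part of the parse tree.
(def sample-tree
  [:object
   [:literal
    [:STRING_LITERAL_QUOTED
     "\"" [:STRING_LITERAL "2" "0" "1" "4"] "\""]]])

;; Reduced stand-in for visit-node: only the string-related cases.
(defn collapse [node]
  (if (vector? node)
    (case (first node)
      :STRING_LITERAL        (str/join (rest node))   ; chars -> one string
      :STRING_LITERAL_QUOTED (second (rest node))     ; drop the two quotes
      :literal               (second node)            ; unwrap
      node)
    node))

(walk/postwalk collapse sample-tree)
;; => [:object "2014"]
```

Because postwalk visits children before parents, by the time the :STRING_LITERAL_QUOTED case runs its middle element is already a plain string, so each case only has to deal with one level of nesting.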

Then we transform the tree into a seq and filter the seq to get only the :triple nodes. Putting it all together:

(defn parse-n-triples [in-string]
  (->> in-string
       (insta/parse n-triple-parser)
       (walk/postwalk visit-node)
       (tree-seq #(and (vector? %)
                       (keyword? (first %))
                       (not (= (first %) :triple)))
                 #(rest %))
       (filter #(= (first %) :triple))
       (map second)))
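The tree-seq step is easier to follow on a small, already-walked tree. This sketch assumes the shape that visit-node leaves behind (a :triple node wrapping a three-element vector); walked and extract-triples are illustrative names, not part of the code above.

```clojure
;; Hand-written stand-in for the tree after the postwalk pass.
(def walked
  [:ntriplesDoc
   [:line [:triple [:s1 :p1 "o1"]] [:EOL "\n"]]
   [:line [:EOL "\n"]]                              ; blank line, no triple
   [:line [:triple [:s2 :p2 "o2"]] [:EOL "\n"]]])

(defn extract-triples [tree]
  (->> tree
       ;; descend into any keyword-tagged vector except :triple nodes,
       ;; which are treated as leaves so their contents stay intact
       (tree-seq #(and (vector? %)
                       (keyword? (first %))
                       (not= (first %) :triple))
                 rest)
       (filter #(and (vector? %) (= (first %) :triple)))
       (map second)))

(extract-triples walked)
;; => ([:s1 :p1 "o1"] [:s2 :p2 "o2"])
```

Stopping the descent at :triple matters: otherwise a subject that happened to be a keyword-led vector would itself be walked into.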

I'm reasonably confident that the grammar is correct: I pushed all the official N-Triples Test Suite through it without error. My post-parsing massage passes, though, are possibly not correct and certainly not complete, which is one reason I'm just blogging about it instead of publishing it as a standalone library somewhere. Things I already know it doesn't do: blank node support, language tags, datatypes, escaped characters. Things I don't know it doesn't do: don't know. But it seems to work for my use case - of which, more later.
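As an illustration of what filling one of those gaps might involve, here is a hedged sketch of decoding escaped characters. It assumes an ECHAR node arrives as [:ECHAR "\\" "n"] (tag, backslash, letter), which is a guess from the grammar rather than something checked against real parser output; echar->string and decode-echar are made-up names.

```clojure
;; Lookup from the letter after the backslash to the string it stands for.
(def echar->string
  {"t"  "\t"
   "b"  "\b"
   "n"  "\n"
   "r"  "\r"
   "f"  "\f"
   "\"" "\""
   "'"  "'"
   "\\" "\\"})

;; Assumed node shape: [:ECHAR "\\" letter]
(defn decode-echar [[_tag _backslash letter]]
  (get echar->string letter))

(decode-echar [:ECHAR "\\" "n"])
;; => "\n"
```

Something like this could become one more case in the node visitor, so that the :STRING_LITERAL case joins decoded strings rather than raw nodes.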

First steps in NixOS

Mon, 15 Jun 2015 15:10:01 +0000

According to the mtime of /nix on this laptop I've been running NixOS since February, so I should be past "first steps" by now, really. But I decided last week to switch to the Nix package collection on my work Mac as well, and that has prompted me to learn how to package some of the stuff I use that isn't already available.

(tl;dr - It's here: https://github.com/telent/nix-local/ )

Item zero was to find a way of keeping local packages alongside the real nixpkgs collection without a permanently divergent fork of the nixpkgs repo. The approach I eventually decided on was to use packageOverrides to augment the upstream package list with my own packages in an entirely separate repo. See https://github.com/telent/nix-local/blob/master/README.md#installation for details.

With that out of the way, the first thing I needed to package is vault, which is a quite neat program for generating secure passwords given a master secret and a service name - i.e. you can have per-service passwords for each site you use without having to store the passwords anywhere.

It's JavaScript/NPM. NPM is a bad fit for Nix because, as explained by Sander van der Burg, it does dependency management as well as building, and its model of dependencies (semver, in theory) is considerably more lax than the Nix model. So we use npm2nix to produce Nix expressions for all its dependencies from `package.json`:

$ git clone git@github.com:jcoglan/vault
$ cd vault
$ git checkout 0.3.0
$ `nix-build '<nixpkgs>' -A npm2nix`/bin/npm2nix package.json node-packages.generated.nix

then we copy the generated files into our nix-local repo.

$ mkdir -p ~/nix-local/vault/
$ cp node-packages.generated.nix default.nix ~/nix-local/vault/

The generated default.nix then needed significant manual editing:

deps = (filter (v: nixType v == "derivation") (attrValues nodePackages))

Finally the package can be installed with nix-env -iA nixpkgs.vault or nix-env -i nodejs-vault. I don't know which of these is stylistically preferable, but in this case they both have exactly the same effect. As far as I know.

Regaining my compojure

Thu, 19 Feb 2015 11:16:22 +0000

Picking up $secret_project which I put down in November to do Sledge and Yablog, I find that the routing of http requests is a horrendous mess based on substring matching and ad hoc argument parsing, and would really benefit from some Compojure

(Now, that is, that I've actually got my head around how Compojure works)

Because I'm using Hiccup to compose pages out of parts, I thought it would be neat if I could return hiccup page bodies directly as responses so that they get uniformly topped and tailed and turned into response maps. Turns out to be fairly straightforward:

  1. define a type for the hiccup response
  2. extend the compojure.response/Renderable protocol to deal with the new type

(deftype Hiccupage [body])
 
(defn page-surround [page]
  (let [head [:head
              [:link {:rel "stylesheet" :type "text/css"
                      :href "/static/default.css"}]
              [:script {:src "/static/stuff.js"}]
              [:title "Hey you"]]]
    (into head (.body page))))
 
(extend-protocol compojure.response/Renderable
  Hiccupage
  (render [page req]
    {:status  200
     :headers {"Content-Type" "text/html; charset=utf-8"}
     :body   (hiccup.page/html5 (page-surround page))}))

Now we can return Hiccupage (hiccup page? geddit? I'm still looking for a better name, yes) objects directly from our routes

(defroutes app
  (GET "/hello/:name" [name]
    (Hiccupage.
     [[:h1 "hello"]
      [:p "hello " name]
      [:p "This is text. I like text"]]))
  ...
  )
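The type-plus-protocol mechanism can be tried at a REPL without Compojure or Hiccup on the classpath. The sketch below defines its own stand-in protocol, so Renderable*, render* and Page are all hypothetical names, and pr-str stands in for real HTML rendering.

```clojure
;; Stand-in for compojure.response/Renderable (hypothetical, defined locally).
(defprotocol Renderable*
  (render* [this request]))

;; Stand-in for Hiccupage: a type that just carries a page body.
(deftype Page [body])

(extend-protocol Renderable*
  Page
  (render* [page _request]
    {:status  200
     :headers {"Content-Type" "text/html; charset=utf-8"}
     ;; pr-str is a placeholder for hiccup's HTML rendering
     :body    (pr-str (.body page))})
  String
  (render* [s _request]
    {:status 200 :headers {} :body s}))

(:body (render* (Page. [[:h1 "hi"]]) {}))
;; => "[[:h1 \"hi\"]]"
```

The point is only that the route handler can return a bare value and the protocol picks the response-building code by the value's type.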

Yet another blog engine

Mon, 16 Feb 2015 22:35:11 +0000

Yablog is the unimaginatively named "Yet Another Blog engine" and is what's now behind the blog at ww.telent.net - mostly because every third attempt to post with my-way ended up with having to relearn how all the baroque git hooks worked.

It was supposed to be a two-hour Tuesday morning hack, but it's now almost the following Tuesday morning as I did not anticipate that

This entry is mostly just a placeholder to check the new posting process works, but do check out Instaparse if you haven't used it already.

The Invisible AUDIO element

Thu, 22 Jan 2015 19:26:47 +0000

I said this morning that I was going to replace the browser-native audio controls with something which looks (approximately, at least) consistent everywhere. There's another couple of reasons for wanting to revisit the way we render the audio element

The nice thing about Om application state is that it's also a perfectly ordinary Clojure atom and we can call add-watch on it to have a perfectly ordinary Clojure(Script) function called whenever it changes. So what we're going to do is