(With apologies to Samuel.)
Hi, I must be going. I'm running into more and more issues that stop me being happy posting here.
• Data ownership
My other hosted blog is on Tumblr. Unlike Vox, they let you use a custom domain so that you control the URLs. I realise Six Apart make a nice living off doing this with TypePad, and Vox isn't really aimed at the sort of people who care, but I do, so maybe this isn't the right place for me. Related to that:
• Badly documented, badly supported API
To get my content out of Tumblr, I need use only one API call: /api/read/json, with well-documented paging parameters. In contrast, I've spent several hours grappling with the Atom API that Vox supports, finding it inconsistent, barely documented, broken, and otherwise infuriating. I dare say eventually I'll manage to liberate all my data, but damnit, it should be easier than this. (If you're lucky, there'll be a follow-up later, with more technical details, on how I got on.)
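For comparison, the Tumblr export loop really is that simple. Here's a sketch: the URL shape follows the start/num paging parameters from Tumblr's documentation, the page size of 50 is an assumed value, and the actual fetching and JSON parsing are elided.

```javascript
// Sketch of paging through Tumblr's /api/read/json endpoint.
// Builds the sequence of request URLs needed to cover totalPosts
// posts; a real export script would fetch each one in turn.
function tumblrPageUrls(base, totalPosts, pageSize) {
  const urls = [];
  for (let start = 0; start < totalPosts; start += pageSize) {
    urls.push(`${base}/api/read/json?start=${start}&num=${pageSize}`);
  }
  return urls;
}
```

One call shape, well-documented parameters, done: that's the bar the Vox Atom API fails to clear.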
• Lack of one-click export support
... in fact, it should be that easy. I believe Blogger may recently have added this; certainly Pownce had to when they were acquired and shut down. I don't want to wait until a crisis point, though; I want backups of my content as and when. The recent losses of JournalSpace and AOL Homepages show you can never be too paranoid.
• The HTML editor still doesn't work in Safari
Well, it's better than it was; instead of locking up your browser, it does now allow you to post. Unfortunately, it also inserts loads of random tags that mess up formatting when I come to copy the entry anywhere else.
• Even in Camino, the WYSIWYG editor can mangle things
• There's no raw view, which makes fixing the editor's bad HTML even harder
• The best bits are now in Movable Type's UI
When I started using Vox, I still had Movable Type 2.6 on my personal site. I still do, but I have a version of 4.1 or something on my laptop, and at some point (probably sooner rather than later, now) I'll deploy it to husk.org proper. A lot of the niceties of the interface in Vox are replicated over there.
(Meanwhile, Six Apart still shuffle backwards and forwards on whether MT is free or not. I think for my uses, it's definitely free as in beer, but I can choose between whether I have a copy that's free as in speech or not. Sigh.)
• Lack of control over page design
Editing a header image and choosing from some (admittedly pretty) backgrounds is a bit poor when you compare it to Tumblr. Sorry.
• No stats/analytics
Even Flickr has stats now, and Tumblr lets you add in Google Analytics to your HTML.
• The UI feels too "heavy"
When I started using Vox, it felt nice and simple compared to that MT 2.6 editing screen. Since then, however, we've seen Tumblr, Twitter and ffffound, where the posting interface is a simple text box, bookmarklet, or similarly stripped-down. Editing on Vox feels like it's a battle far too often.
• There's no feeling of community / not a one-stop shop
Vox feels like it was intended to fix some of the issues with LiveJournal and the isolated blogging of MT and TypePad, but sadly it never hit critical mass. Similarly, allowing users to upload all their stuff was a nice idea, but it doesn't seem to have worked out, for me anyway. (Once again, Tumblr does both of these right.)
• It's not going anywhere
I don't know what Six Apart's focus is, but Vox definitely doesn't feel like it's part of it. While I'll continue to watch them with interest, it feels like a lot of the work that's been going on hasn't really had any useful impact over the last couple of years.
I'm not going to give up posting; as I said above, I do have a Movable Type installation I'll be reverting to, and I'll continue reading what my friends and neighbours have to post. However, I don't feel comfortable posting here any more. Sorry.
When I posted about lib-flickr-minimal, I noted that the newly-launched flickr.places.placesForUser method made a more interesting demo of data you could fetch when authenticated than, say, showing a user's most recent private photos. Evidently the developers at Flickr agreed it was an interesting concept, because over the last couple of months that area of the API has been extended considerably. As a result, I've expanded the demo into an AppJet application of its own.
Where? What? When? is the result. It shows you, on a map, the locations with the most photos according to a given criterion: by default, that's a tag, but it can also show your photos, or those from your friends and family, or your contacts. You can then inspect a place and see the most recent relevant photos, or the most popular tags, for that location.
How did that evolve from the initial demo app? Instead of simply printing a table based on Flickr's response into the document, I directly plotted the results on the map. I added a small form to enable the choice of criteria, and when Flickr added the placesForTags method, I added that as a choice. Belatedly, I realised that would also work for users without authentication, so I removed the requirement to authenticate, and made tags the logged-out default. (The image above shows a slight change to the initial results: it's the same tag, London, but at the neighbourhood, not locality, level. All of the locations are within the greater city's area, which probably won't be a surprise, but that's not true for Paris. Evidently, what happens in Vegas doesn't always stay there.)
The design of the application isn't quite settled, but I knew I wanted to replace the standard Google Maps pushpins with partially-transparent circles. Initially, I went with red, but when I showed it to colleagues, they said it reminded them of maps of bomb blast radii, so I spent a while looking around for the right colour, before settling on a yellow. The circles themselves are scaled according to the natural log of the number of photos for that location; I played with square roots as well, but I feel that logarithms give the right sense of scale.
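The scaling itself is easy to sketch. The base radius and multiplier below are made-up values for illustration, not the ones the app actually uses:

```javascript
// Circle radius scaled by the natural log of the photo count.
// With a log scale, a location with ten times the photos gets a
// modestly larger circle, not one ten times the size.
function circleRadius(photoCount, base = 5, scale = 4) {
  return base + scale * Math.log(photoCount);
}
```

Compare `Math.sqrt(photoCount)` in place of the log: square roots grow much faster, which is why the busiest locations ended up dominating the map.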
The last piece of work I did was adding tag display for locations, using the tagsForPlace method. These tags can be surfed: clicking on one will load a new search for the given tag. It's noticeable that the first few tags for most places are almost always place names, while common tags seem to share a familiar pattern of scattered, similarly-sized circles across the US, Europe, south-east Asia and coastal Australia.
There are still a few things I could add: tag persistence in URLs (to make it easier to share pages), better loading indicators (especially initially), options on which photos are shown, and links to view the search on Flickr itself, for example. There's also a missing question: while the API methods support maximum and minimum times, I haven't yet added options to allow you to show When? However, for now I think I've done enough (and I'll note that the site has a link to view the source of the application, if you fancy hacking on it yourself). Enjoy.
A few weeks ago, when I was finally prompted to write up my EXIF to machine tags script, I parenthetically remarked that
ways of getting all predicates for a namespace, and values for a namespace (at least within a given user's photos), would have made my list for 'things you'd like to see in Flickr' if I'd felt able to get away with being so demanding
Funnily enough, a mere week after posting that, Aaron Straup Cope posted to the yws-flickr group, announcing exactly what I'd obliquely asked for: methods to work with the parts of all machine tags on Flickr. I set to work, and by that weekend had produced a machine tag browser.
Thanks to some coding help from Tom Insam and suggestions by Ryan Gallagher, the currently live version is a fair bit nicer than the initial one. The code is still a bit of a mess internally (there's far too much repetition), there are some bugs (in particular with values containing full stops, or decimal points), and I still have three items on the TODO list.
Despite this, it's still sufficient for users to see that the astrometry.net system has been able to solve about 85% of the images it's processed; that three images have had an ImageMagick Lomo effect applied before upload; the names of Len Peralta's monsters by mail; and where people take screenshots in Second Life. In fact, I've been pleasantly surprised to note that the code.flickr blog mentioned it when Aaron launched machine tag hierarchies to the wider world.
As it says on the browser itself, the source code (all the clever stuff is in JavaScript) is available on github, and I'd love to receive fixes, changes, or requests. In the meantime, have fun looking around.
Aral Balkan is trying to run the website for his conference on Google App Engine, the same platform that snaptrip uses. In October, he posted twice on Twitter:

"Great, you have no control over how Google App Engine caches data requests. Pulling in RSS feeds? Forget about it! (It uses Google Proxy and you can't tell it not to cache a feed or set the cache duration.)"

I'd also noticed this, because the snaptrip login page (which does double-duty as an FAQ and news page - maybe I should rename it?) pulls in entries tagged 'snaptrip' from the Atom feed of this very weblog, and after my third post it failed to update for a good half a day. I wasn't that bothered, and didn't bother to double-check the documentation, which does clearly state that "App Engine uses a HTTP/1.1 compliant proxy to fetch the result."

Evidently this is the same problem as Aral has, and as usual, Tom Insam had an answer. It's from a slightly different direction (working with Google's Open Social containers), and as he said, it's what "everyone has done for years to bust caches you don't control": append an incrementing (or random) parameter to each request, which should mean that you're not hitting the cache. Having finally written a new blog post about snaptrip, I can confirm that this approach works. I'm not sure I'll leave it in - it seems a bit rude - but if timeliness is important, you might want to do the same.
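In sketch form, the cache-busting trick looks like this. The parameter name is arbitrary (anything the origin server ignores will do), and a timestamp stands in for the incrementing counter:

```javascript
// Append an extra, ignored query parameter that changes on every
// request, so the caching proxy never sees the same URL twice.
function bustCache(url, value = Date.now()) {
  // use '&' if the URL already has a query string, '?' otherwise
  const sep = url.includes('?') ? '&' : '?';
  return `${url}${sep}cachebust=${value}`;
}
```

The cost, of course, is that every fetch misses the cache, which is exactly why it feels a bit rude to the server on the other end.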
It also occurs to me that, if every call to urlfetch is cached for some time, then you may find that repeated calls via API libraries might give somewhat unexpected results (although they're more likely to have changing arguments, anyway). Be careful out there.
A decade ago, when the Jubilee Line was extended from Green Park to Stratford, there were plenty of glossy books published, examining the design and architecture of the twelve stations that made up the extension. Deservedly so, too; one, Foster's Canary Wharf, has become iconic in that time. There's still plenty you can find about the philosophy of the designers, and the way they wanted a commonality but individuality for each of the stations.
By contrast, it's almost impossible to find out about the thinking behind the Victoria Line. This was the only all-new Underground line built in the last century¹, and it's forty years old this year. Most people, if they think of its design at all, consider it dull at best.
However, I've been using it for my commute for a year now, and as a primary line for half a decade, and I think that does it a disservice. First, consider the station layouts. This is, I'll admit, more commonly thought of as engineering, but even so, someone had to think about it. There are sixteen stations on the line; five have cross-platform interchanges with either Tube or British Rail lines, far more than any other line², while all but one station offers interchanges with either Underground or British Rail lines.
Admittedly, partly this is due to politics: during the "tube boom" from 1898 to 1908, the organisations building lines were in competition with one another, whereas the Victoria was the first line designed by London Underground, a single company responsible for all lines. Even so, it's a boon to people who use the line - ask anyone who changes to the Piccadilly at Finsbury Park, or the Bakerloo at Oxford Street.
Beyond the engineering, though, I think the stations are also designed well. Unlike the aforementioned Jubilee Line, most stations follow the same basic look, with three escalators³ down to a main hallway between the two platforms. Unlike some earlier lines (the Central Line springs to mind), these are almost always straight, and I can't think of a station with steps from the central section to either platform. As I've said before, there are also cross-platform interchanges, which complicate things, but even there, consistency leaps out in other ways.
All of the Victoria Line platforms are tiled in a light, almost blue-tinted, grey, with simple wooden benches. Each also has a mural; there's a lovely set on Flickr by Chutney Bannister collecting them all. Recently, the southbound Oxford Street tiling was refurbished as part of the station's PPP makeover, and I was impressed by the lovely, modern design that replaced the snakes-and-ladders mural you can still see on the northbound platform. It turns out that this was the original design, removed in the 1980s after the Oxford Circus fire, but now re-instated, and it doesn't look at all dated - in fact, it's positively modern.
For now, the original 1967 tube stock is still used on the line. However, next year should see the introduction of the new 2009 stock, which, to be honest, I'm somewhat dreading. As with the stations, these are nicely consistent and minimal, with a quirky use of circular glass panels dividing vestibules from seating areas, and standard seating. The new stock will introduce more fold-up seats, and more room to stand, at the cost of fixed seats. I suppose I should wait and see how it turns out, but my gut feeling is that I'll dislike them.
That's not to say the line is without problems. As part of the engineering work to get the line ready for the new trains, its previously solid reliability seems to have taken a knock. More seriously, the above-ground buildings are generally appalling, with far too many of the stations lumbered with unpleasant subway complexes or buildings that look like glorified portakabins. This is particularly shameful at Highbury and Islington, where a damaged but glorious old station was demolished in favour of the current single-storey shed.
Despite this, I think the effort that went into the line has been unfairly neglected. The design work for the Victoria Line seems to be largely lost, on the Internet at least. Misha Black was in charge of the overall design effort, leading the Design Research Unit, but I can easily imagine how the utilitarian style lent itself to concealing the identities of the others who contributed. I think it's a shame; the line, while perhaps understated, deserves more attention than it gets. I can't imagine my London without it.
¹ Parts of the Jubilee line were inherited from the Metropolitan line in 1977, and of course the extension in 1999, while needing new tracks, was not a new line end-to-end. Amazingly, the Central London tube network we know today - with the exception of the Victoria and Jubilee lines - was completed by 1907.
² I believe the Central, District and Piccadilly each have two, excluding Victoria Line interchanges, but none are within zone 1 (I'm thinking of Stratford, Mile End, and Hammersmith).
³ Annoyingly, cost-cutting sometimes (as at my home station, Blackhorse Road) led to the central escalator being replaced by a fixed staircase, which means that any failure results in people having to walk or, in extreme cases, station closures.
Last week Safari 3.2 was released, with the usual minimal release notes: "This update includes stability improvements and is recommended for all Safari users." The security notes were somewhat more forthcoming, but even there, not everything is covered, for as well as bug fixes, 3.2 quietly added support for two big security features: EV SSL, and Google Safe Browsing.
Neither of these changes, obviously, is covered in the release information, but since the (very good) MacJournals writeup of details of the anti-phishing features was reposted at Macworld, there's been a small whirl of further commentary, especially as the latter includes data collection for Google. Most of the (sensible*) concern has been raised because Apple's terms and conditions, unlike those of Firefox (who also use the Google Safe Browsing API), allow Google to make use of the data sent as a result of surfing using this plugin for any purpose, not merely enhancing that particular service. This might not be so bad if it wasn't also for the fact that the Safe Browsing checks fetch and send data by default.
Personally, though, I can't say I'm bothered by either of these. I'm sure Google get far more useful information from searches and opt-in service usage than they get from partial hashes returned when browsing to potentially hacked sites. As for defaulting to using the service, well, both Chrome and Mozilla also do that, and as with Firefox, Safari offers a preference to disable phishing detection.
What is more surprising to me is that so few people have connected the release of 3.2, and its emphasis on security over features, to the removal of Safari as a "safe" browser from Paypal's list in February:
"Apple, unfortunately, is lagging behind what they need to do, to protect their customers," [PayPal security chief] Barrett said in an interview.
I have little doubt that behind-the-scenes back and forth with PayPal, and with similar organisations pushing these changes, led Apple to release this sooner rather than later, in the 3.0 branch (rather than waiting for Mac OS X 10.6 and Safari 4). Perhaps a more sensible question for people to raise is whether EV SSL and Safe Browsing are actually useful, or merely security theatre? Now there's a well-researched comment piece I'd like to see.
* There's also a lot of kneejerk "OMG Google haz my datorz!" nonsense, but reading the article makes it clear that only hashes of URLs are checked, and even that's only when a partial hash is matched against a hash of your current URL.
Last week Kellan from Flickr published my interview on code.flickr. I'm still somewhat amazed that they chose me to ask, but then I'm also pleased at how much people are liking snaptrip, and I'm happy to see my words in print, as it were.
I actually compiled my answers a couple of weeks before it was posted, hence the reference to groupr as a "lost project". Now, of course, it's back, but I've already posted a couple of times about that. What I would like to do is - finally, and belatedly - document (and update the released version of) my EXIF machine tagger.
Why bother with such a thing? Flickr will extract EXIF metadata, but it won't allow you to do any aggregate queries on it. (Well, that's not quite true; at dConstruct 2007 Tom Coates leaked some URLs which I picked over, but they don't cover all the useful things I'd like. Plus, it's not documented.) By extracting all the data from my photos into machine tags (and a local SQLite database), it becomes possible to point people at all the photos taken at the wide end of my widest lens, or those taken with a particular make of camera (and to do more complex queries locally).
With that out of the way, how do you go about such a thing? Well, as usual, it's actually a fairly simple joining operation. Get a list of photos, and for each of them, get the EXIF data (using flickr.photos.getExif), then store the data locally, and add tags back to Flickr. There's not much munging involved - I convert spaces in the EXIF field names to underscores, and some things get put in the "file:" or "camera:" namespace, rather than "exif:" - so it's all pretty straightforward. (I do preserve spaces in the EXIF values, though, by quoting my arguments to the addTags method.)
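In sketch form, the munging might look like this. The namespace table is a guess at the shape of the rules (the script's actual list isn't reproduced here):

```javascript
// Hypothetical table of EXIF fields that belong in a namespace
// other than "exif:"; everything else defaults to "exif:".
const NAMESPACE_OVERRIDES = { Make: 'camera', Model: 'camera', 'File Size': 'file' };

function exifMachineTag(field, value) {
  const ns = NAMESPACE_OVERRIDES[field] || 'exif';
  // spaces in field names become underscores in the predicate
  const predicate = field.toLowerCase().replace(/ /g, '_');
  // quote the value so spaces survive tag parsing
  return `${ns}:${predicate}="${value}"`;
}
```

So an "Exposure Bias" of "-1/3 EV" would become the machine tag `exif:exposure_bias="-1/3 EV"`, quotes and all.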
I also add a meta:exif field with either "none" or the epoch seconds of the time of tagging, so that it's easy to exclude previously-tagged images from being examined again. Another minor niggle is that, to add tags, a script has to be authorised. I copied the code chunk from the flickr_upload script in a Perl module, and it seems to work for me.
However, the fact that users need to get an API key, secret, and then a token, is naturally going to limit the audience for such a script. A few other users have metadata in the "exif:" namespace, but it's not exactly common. It's hard to turn the script into a web app, too, since it needs about a second per image to run, and the first run has to examine your entire library, which these days is typically thousands of images. I may still do it, but I haven't bothered for months, so I wouldn't count on it.
Another drawback is that machine tags are normalised at Flickr. This means that when I query on exposure bias, both -1/3EV and +1/3EV show as just "exif:exposure_bias=13ev". I've been thinking about ways around this - by querying raw tags - but it's not straightforward. (Ways around this normalising, and ways of getting all predicates for a namespace, and values for a namespace (at least within a given user's photos), would have made my list for "things you'd like to see in Flickr" if I'd felt able to get away with being so demanding.)
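My guess at the normalisation, which at least reproduces the collapse described above (lowercase everything and strip anything that isn't a letter or digit from the value):

```javascript
// A guess at how Flickr normalises machine tag values: lowercased,
// with non-alphanumeric characters stripped. This is why -1/3EV and
// +1/3EV both collapse to the same queryable tag.
function normaliseMachineTag(ns, predicate, value) {
  const clean = value.toLowerCase().replace(/[^a-z0-9]/g, '');
  return `${ns}:${predicate}=${clean}`;
}
```

Both signs of exposure bias end up as `exif:exposure_bias=13ev`, so the sign (and the slash) can only be recovered by going back to the raw tags.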
One final observation is that the script's in Perl, and uses XML (which is, apparently, sometimes compressed at Flickr's end; at least, I had to add Compress::Zlib at one point for some reason). If I was to redo it, either in Python or Ruby, the data would all be fetched as JSON, and it'd probably get a few more users. Ah well. Installing the prereqs shouldn't be too hard.
That said, of course the script, as is, proved useful. I run it manually after an upload, while Tom, who is (as ever) a bit more sensible, has his fork running as a cron job. Either way, please download it, play, and feel free to let me know what you think.
I've finally launched a new version of groupr that includes a view that I've wanted for ages: recent photos in your groups. Like this, in fact:
The obvious inspiration for this is Flickr's own recent photos from your contacts page, which, as the name suggests, shows photos that your friends, family and others have posted to the site, in reverse chronological order. It's a great leaping off point if you want to follow other people's work, and I'm sure that without it I wouldn't use Flickr nearly as much.
It's pretty self-evident that a similar page for your groups would be a good idea, then, and it has been much requested over the years. The problem is that building it isn't easy. You have to deal with the API join: to get information on the photos in each group you have to make an API call, so for 140 groups, that's 140 calls, each taking up to a second.
For a while, I thought this was an insurmountable problem for groupr: nobody's going to wait for over two minutes for a page to load. However, this problem also exists on the groups page, and so I solved it the same way: by using client-side calls (AJAX, if you like, although technically I use JSON by preference), cached by the server for later use in generated pages.
Last week I started putting the theory into practice, and the final conceptual leap was in the order I made the group calls. Initially, I queried the groups in alphabetical order, but it occurred to me that the more photos were in a group, the more likely it was to have recent updates. Of course, this isn't perfectly reliable, but having made the change, it seems to work; typically, the "recent photos" page fills up with the most recent photos fairly quickly.
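The reordering itself is trivial; a sketch, with the group objects simplified stand-ins for what Flickr's API actually returns:

```javascript
// Query the biggest groups first, on the theory that they're the
// most likely to contain recently-posted photos.
function queryOrder(groups) {
  // sort a copy descending by photo count, leaving the input intact
  return groups.slice().sort((a, b) => b.photos - a.photos);
}
```

The heuristic can't be perfect (a tiny, active group loses to a huge, dormant one), but it only affects how quickly the page fills in, not what eventually appears.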
(I use a similar technique on the page that lists your groups, and lets you sort them in various ways. The list page needs a little more work, but I expect it's still handy.)
Having finally implemented this, I'm pleased at how easy it was now I've laid the groundwork, and also a bit surprised that nobody's ever done this before. Hopefully the recent view page will prove useful to some of you. Feel free to leave comments here if you're seeing errors or have suggestions on how to improve it.
groupr (my little JavaScript application that gives users an overview of their Flickr group membership) needs to be able to communicate with Flickr. That's really not hard; getting the most recent public photos posted by a user can be done trivially, either using feeds or the API proper.
However, most of the calls that you need to write really interesting applications require authentication, so that they can see private data. Rather than use the password antipattern, Flickr uses a well-thought-out multi-step system. Unfortunately, this can be a bit tricky to wrap your head around, and harder still to debug. It was certainly something I spent a while grappling with for groupr. That's the main reason I've split out the parts of groupr that talk to Flickr into a library on AppJet called lib-flickr-minimal.
As the name suggests, the library doesn't actually do that much. There are methods to handle the steps of authentication, and there's a generic function to call any Flickr method. However, it's more than enough for me to write both groupr, and a little demo application that guides other users through the process of handling authentication.
(A little on that demo application. I spent a few minutes trying to think of a method that required read privileges that would not be too obvious and dull ("you have 500 private photos", for example). Thankfully I remembered the recently-launched flickr.places.placesForUser method, and so I decided to use that as my example call. A bit more work meant I could plot the places returned onto a Google map, so now you can see where you've taken (or at least, geotagged) the most photos.
Ideally I'd rewrite this to produce something prettier, like Dopplr's lovely raumzeitgeist images, but for now, it's a nice little one-page example.)
Philosophically, I prefer this style of library. There seem to be two schools of thought when it comes to building such things. You can tell from the source of the library that I'm in the "least possible work" camp: provide helpers for the functions that are tricky, but for most calls, let the user consult Flickr's documentation to figure out what to call, and use JSON as a return format to make everything that you get back an object (or at least, a rich data structure).
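A sketch of what that minimal style boils down to: one generic function that can address any Flickr method, shown here as URL construction only (the fetch, and request signing for authenticated calls, are elided; the endpoint is the current REST one, not necessarily what lib-flickr-minimal used at the time).

```javascript
// Build a Flickr REST request URL for any method, asking for JSON
// back; nojsoncallback=1 asks for plain JSON rather than a JSONP
// function wrapper.
function flickrCallUrl(method, params) {
  const query = Object.entries({
    method, format: 'json', nojsoncallback: 1, ...params,
  }).map(([k, v]) => `${k}=${encodeURIComponent(v)}`).join('&');
  return `https://api.flickr.com/services/rest/?${query}`;
}
```

One function, every method: when Flickr adds a new call, nothing in the library needs to change; you just pass a new method name.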
The other camp, which I think of as being influenced by Java and other less dynamic languages, wants to provide a method for everything. As a result, their implementations tend to have lots of boilerplate code for handling every single Flickr method (there are about a hundred now), more for parsing the returned XML (rarely, if ever, JSON), and convenience methods for such things as constructing URLs.
While the latter style is probably superficially appealing (you get documentation in one place, and the library can error-check locally), it also has significant drawbacks. When Flickr add a method, or extend the returned data, the library has to be patched and re-released. Many libraries only implement the methods of interest to the author, leaving chunks of the API unimplemented. (Those partial libraries are particularly annoying to me; they tend to implement flickr.photos.search, which seems to be the cornerstone of the Flickr API, but ignore the interesting methods around the edges, which I seem to be drawn to.)
There is a nice middle way, which is to use metaprogramming and the API's own reflection methods to construct a list of allowed calls and arguments, giving error-checking but also updating automatically when Flickr add methods. The libraries I prefer for both Python and Ruby do this, and very nice they are too.
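A sketch of that reflection-driven approach, with a stubbed method list standing in for a call to flickr.reflection.getMethods, and a caller-supplied transport:

```javascript
// Build a client object from a list of discovered method names, so
// flickr.photos.search becomes client.photos.search(params). In a
// real library the list would come from flickr.reflection.getMethods,
// and per-method argument checks from flickr.reflection.getMethodInfo.
function buildClient(methodNames, call) {
  const client = {};
  for (const name of methodNames) {
    const parts = name.split('.').slice(1); // drop the 'flickr' prefix
    let node = client;
    for (const part of parts.slice(0, -1)) {
      node = node[part] = node[part] || {};
    }
    node[parts[parts.length - 1]] = (params) => call(name, params);
  }
  return client;
}
```

The wrappers are generated, not hand-written, so a freshly-added Flickr method appears on the client the next time the reflection list is fetched.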
To be honest, this is probably where I want lib-flickr-minimal to end up, but for now, I'll happily take a library that stays out of my way rather than one that aims to do everything but only implements a few things. Hopefully others on AppJet, or those looking to implement Flickr authentication, will find it useful too.
Long-time readers here may remember groupr. (If you don't, it was a small web application that loaded the photos in your Flickr groups, something that, oddly, you can't do on Flickr itself.) I wrote it at the beginning of 2007 for Fotango's Zimki platform. Of course, when that died at the end of last year, groupr vanished, but not before I took a backup of the code and templates underlying it, in the hope that one day I might be able to revive it.
For a few different reasons, I've been considering bringing groupr back recently. I could use Google's App Engine, as I've done for snaptrip, but that was from scratch, and for this project, I didn't fancy porting both the code and templates. I had a quick look at Helma and Trimpath, but I didn't get on with either of them. There's also the fact that they're not hosted solutions, and part of the joy of server-side JavaScript (SSJS) is not having to worry about finding a server. I also tried Reasonably Smart, but you have to be pretty clever to get git working, and I couldn't, so that was out.
Eventually I found AppJet, and after a quick look I was convinced that this was probably a good place to end up. After about eight hours to port what I had, and another five or so to fix up some things I never quite polished off on the old version, you can now use groupr.appjet.net.
So, how does it compare to Zimki, and how hard was it to port the code? (After all, big names are now talking about portability in the cloud). Well, AppJet may be closed source, but they offer a downloadable JAR which ran without any effort for me on Mac OS X, meaning both that I could develop locally (even offline, with cached data), and that if AppJet vanishes (which, after all, happened to Zimki) I can take groupr and run it on a server of my own. In this case, practicality trumps theoretical openness.
AppJet's IDE feels a lot nicer than Zimki's did (although I barely use (or used) either, preferring BBEdit with AppJet's JAR, or Trawler for Zimki). The way libraries are handled (they're just apps whose names include the 'lib-' prefix) is pretty nice, too: you can see what is using a library, and there's provision for inline documentation. The community feels bigger than Zimki's ever did (although that might just be because the idea of SSJS is taking off), and I was able to find a few useful libraries (such as a TrimPath template port) pretty easily. Speaking of libraries, AppJet's 'storage' is oddly non-core, but it's a pretty nice row-style store with nice querying facilities. It lacks Zimki's handy "expires:+2h" syntax, but that wasn't too hard to fit in myself.
One definite annoyance I have with AppJet is that they don't keep all their libraries out of the global namespace. Zimki's functionality was all hidden in a zimki object, but AppJet has a few top-level standard libraries, and 'page' and 'response' both clashed with names I was using in groupr's previous version. Another is that there's no way of handling non-JavaScript files, so both static files and templates are tricky. I've ended up with the former being hosted on my main server, and the latter as a hash of triple-quoted strings (a Python-ism that AppJet has imported into their JS runtime). Proper file support, like Zimki had, would be a boon there. However, both of these were pretty easy to overcome, and it turned out Zimki did very little that AppJet couldn't replicate. (Replacing the (Mojo, I believe) API calls was four lines of jQuery; replacing the server-side API cleverness, for my needs, was a few lines of JSON.)
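The triple-quoted-strings workaround can be sketched in plain JavaScript, with template literals standing in for AppJet's triple quotes and a trivial substitution function standing in for the real TrimPath engine (the template names and markup here are invented for illustration):

```javascript
// With no support for non-JavaScript files, templates live in one
// object keyed by name, which also keeps them out of the global
// namespace (avoiding clashes like 'page' and 'response').
const templates = {
  group: `<h2>{title}</h2><p>{count} photos</p>`,
  error: `<p class="error">{message}</p>`,
};

// Minimal placeholder substitution: {name} is replaced by vars.name.
function render(name, vars) {
  return templates[name].replace(/\{(\w+)\}/g, (_, key) => String(vars[key]));
}
```

It's inelegant compared to proper file support, but it keeps everything inside the one .js source that AppJet will accept.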
Overall, then, I think I'm pretty happy with my experience so far. I've managed to revive the project without too much hair-pulling, and, as I said, even extended it from the state it was in on Zimki. Maybe server-side JavaScript has a future after all?