Futuretalk: ThruDB

Posted by tobi — 12:36 PM Dec 28

Igvita shares the details on ThruDB, another take on the document storage paradigm.

Article: ThruDB, faster and cheaper than SimpleDB

The architecture sounds incredible. It draws on the rank and file of high-scalability open source software: memcached, Spread, CLucene and Facebook’s newcomer Thrift for wire protocols.

ThruDB is able to use Amazon S3 as a permanent data store, which makes it an ideal fit for EC2 installations. For quick document access it’s able to utilize the local disks and even memcached.

Everything about ThruDB’s design is genius. The innovation here is that it separates the concerns of permanent document storage and queries. For querying documents it uses the Lucene full-text search engine. CLucene is naturally more suited to the requirements of a web application. SQL cannot compete with the quality and features of Lucene for query and lookup.

In the end you have something safer, more scalable and much faster than a traditional RDBMS, with the added benefit of world-class full-text search.

Cheers Jake.

1 comment

Startup help

Posted by tobi — 11:11 AM Dec 27

Fantastic article from readwriteweb: 36 Startup tips

There is no way of saying where Shopify would be by now if we had had this article as rocket fuel when we incorporated 3 years ago.

2 comments

Google Chart API

Posted by tobi — 04:47 PM Dec 06

Google made their internal chart library available to the public:

The kicker? It all works using URLs. You simply embed an image pointing to their chart engine at http://chart.apis.google.com/chart and pass parameters with your data. This has got to be one of the cleverest internet APIs yet.
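To make the "it is all just a URL" idea concrete, here is a minimal Python sketch that assembles such an image URL. Only the endpoint comes from the post. The parameter names cht (chart type), chs (size) and chd (data), and the t: text-encoding prefix, are assumptions that the post does not list, so check them against the API documentation before relying on them.

```python
from urllib.parse import urlencode

# Endpoint quoted in the post.
CHART_ENDPOINT = "http://chart.apis.google.com/chart"


def chart_url(values, width=250, height=100, chart_type="lc"):
    """Build an image URL for a simple chart.

    The parameter names below are assumptions, not taken from the post.
    """
    params = {
        "cht": chart_type,                                # assumed: "lc" = line chart
        "chs": f"{width}x{height}",                       # assumed: size in pixels
        "chd": "t:" + ",".join(str(v) for v in values),   # assumed: text-encoded data
    }
    # Keep ":" and "," readable instead of percent-encoding them.
    return CHART_ENDPOINT + "?" + urlencode(params, safe=":,")


print(chart_url([10, 40, 25, 60]))
```

The resulting string can go straight into the src attribute of an img tag, which is the whole trick.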

Update: Thanks to reddit for those:

12 comments

ActiveShipping: call for contributions

Posted by tobi — 10:30 AM Oct 16

At Shopify we are shifting our attention to shipping in the near future. ActiveMerchant supports an impressive 30+ payment gateways and it’s time for the library to become more than ActivePayments: we want ActiveShipping.

To this end I’d like to collect all the shipping-related code you may have floating around. Have you integrated with UPS? Canada Post? Any shipping service? Do you have a script to generate shipping labels? Bar codes? Implemented a First Fit / Best Fit Descending packing algorithm?
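For anyone wondering what a first-fit descending packer looks like, here is a minimal Python sketch (the algorithm is usually called first-fit decreasing in the literature). It is a one-dimensional illustration that packs by weight only. The function name and the single capacity parameter are hypothetical and not part of ActiveShipping or any existing library.

```python
def first_fit_descending(weights, capacity):
    """Pack item weights into boxes of a fixed capacity.

    Items are sorted heaviest first and each one goes into the first box
    that still has room. This is a heuristic, not an optimal packing.
    """
    boxes = []  # each box is a list of item weights
    for weight in sorted(weights, reverse=True):
        if weight > capacity:
            raise ValueError(f"item of weight {weight} exceeds capacity {capacity}")
        for box in boxes:
            if sum(box) + weight <= capacity:
                box.append(weight)
                break
        else:
            # No existing box had room, so open a new one.
            boxes.append([weight])
    return boxes


print(first_fit_descending([4, 8, 1, 4, 2, 1], capacity=10))
```

Real parcels also have dimensions and carrier-specific limits, which is exactly the kind of code the call above is asking for.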

If you have anything to contribute under your copyright, or if you can persuade your employer to relicense such snippets of code under the terms of the MIT license for our new project, please send an email to tobi jadedpixel.com to work out the details.

The more code we can scrape together from contributions the sooner we can release another unified access layer for such services and the more likely it is that we get it right.

6 comments (closed)

RailsConf Europe

Posted by tobi — 12:46 PM Sep 13

Flying to RailsConf Europe today. Despite growing up in Germany I only made it to the capital once before, so I’m really stoked to get such a good opportunity to enjoy the town for a few days. The travel guides indicate that the fun to be had in Berlin is off the charts.

I’ll speak on the 19th on outsourcing to open source. PDI checking it out.

(If anyone has the capacity to pick up an iPhone for me before flying over the pond, please send me an email at tobi@leetsoft.com.)

7 comments (closed)

Futuretalk: CouchDB

Posted by tobi — 11:42 AM Sep 02

I have to confess: I really don’t like relational databases. I can’t wait for the day we can ditch them.

Think about it for a second: Databases store data to disk. That’s all 90% of us use them for. They are essentially elaborate hash tables backed by a disk drive. Why are they more lines of code than some operating systems?

Despite that, unless you have a really well thought out setup, a disk failure is still a major disaster. Even if you have backups, even if you have replication, there will be downtime and manual labor while a new master server is established. Databases never put your 10-20 commodity server boxes with all their spare disk space to use. They always sit on these really expensive ivory tower IBM boxes outside of your cheap cluster.

Despite millions of man-years of research, databases are actually pretty dumb. You have to tell them about every nuance of your schema, you have to tell them about indexes and so on. If you forget an index they are perfectly happy to run sequentially through all the data you ever inserted into them, many times a second.

Replication is generally a nightmare and every machine involved in the replication needs to have enough disk space to store the entire content of the database.

There are several interesting projects which try to re-invent the database as we know it. Yesterday I found out about a particularly interesting one: CouchDB, a contender for “The next generation web storage” as their website proclaims. The project started out using C++ but eventually changed to Erlang, which is a perfect choice for highly parallel server software.

CouchDB has no tables, it just has a flat global namespace for documents. A document is a simple JSON record.


POST /shopify/
{
  "value":
  {
    "title": "Arbor Draft",
    "type": "Product",
    "price": 299.00,
    "tags": ["snowboarding", "freestyle", "wintersport"],
    "description": "...."
  }
}

Instead of defining the schema we simply add arbitrary records. There are no tables.

So how do we retrieve all the records again? CouchDB uses the concept of views, which are essentially JavaScript functions. It uses map/reduce to find matching records in its global namespace ahead of time, so that at query time the results are available instantaneously. This is a huge performance boost for web applications, which generally have many more queries than updates/inserts.

Let’s install some useful views under /shopify/all:


PUT /shopify/all
{
  "_view_documents": "function(doc) { return doc; }",
  "_view_products": "function(doc) { if (doc.type == 'Product') { return doc; } }"
}

GET http://couchserver/shopify/all:products
returns:
  {
    "_id": "all:products",
    "rows":
    [
      {
        "_id": "64ACF01B05F53ACFEC48C062A5D01D89",
        "_rev": "62D22746",
        "title": "Arbor Draft",
        "type": "Product",
        "price": 299.00,
        "tags": ["snowboarding", "freestyle", "wintersport"],
        "description": "...."
      }
    ]
  }
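To illustrate the idea behind views, here is a small in-memory imitation in Python. It is not CouchDB's API or implementation. The helper names build_view and products_only are made up for this sketch, and it shows only the map step: run a function over every document once, keep what it returns, and answer later queries from the stored result.

```python
def build_view(documents, map_fn):
    """Run map_fn over every document once and keep the non-None results."""
    rows = []
    for doc in documents:
        result = map_fn(doc)
        if result is not None:
            rows.append(result)
    return rows


def products_only(doc):
    # Plays the role of the "_view_products" function. The comparison is
    # case-sensitive, so it has to match the stored type value exactly.
    return doc if doc.get("type") == "Product" else None


documents = [
    {"_id": "1", "type": "Product", "title": "Arbor Draft", "price": 299.00},
    {"_id": "2", "type": "Page", "title": "About us"},
]

# Computed once up front; reading the view afterwards is just a lookup.
products_view = build_view(documents, products_only)
print([doc["title"] for doc in products_view])
```

Because the filtering work happens when the view is built and not when it is read, reads stay cheap no matter how many there are.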

There are a lot more cool things in CouchDB. Notice that the returned document has a _rev? Older revisions of documents are only deleted if you say so. If you are working on a wiki you just got your historical data for free. Unfortunately CouchDB is still in alpha, but I think the fundamentals are sound. It’s a lot more aligned with the way a modern web application works and needs its data represented. Its replication system is already much more powerful than that of other database systems and in fact is very similar to the way Google works with its Bigtable and map/reduce infrastructure.

For further information head to the project’s wiki.

31 comments (closed) Filed under: Code

Seam Carving

Posted by tobi — 12:34 PM Aug 23

Impressive photo resizing algorithm, via dpreview

7 comments (closed)

Static DNS despite of it all

Posted by tobi — 02:58 PM Aug 15

When external services have to access your code from the outside world development setups often become complicated.

Take for example writing a PayPal IPN-based application. PayPal’s sandbox wants a callback address to which it can post the test IPNs. Writing a Facebook application? Many public URLs have to be exposed to make it work.

Your options here are either to forward the required ports in your firewall or deploy your code to a publicly accessible area and perform the trial & deploy & repeat dance.

Worse, if you actually develop from a laptop, maybe from coffee shops around town, all bets are off.

However, there is help: a little-known feature of SSH called reverse tunnels can be used to our advantage here. To enable it you have to edit /etc/ssh/sshd_config on the server:


# sshd_config
GatewayPorts yes

Unfortunately the feature is disabled by default and requires OpenSSH 4 or newer.

Once enabled you can tell the server to forward any traffic arriving on one of its ports through the tunnel to one of your local ports.


ssh server -R '*:5555:127.0.0.1:3000' -vv

Read: Traffic from any IP (*) arriving on port 5555 of the server goes through the tunnel and is released on the other side to 127.0.0.1:3000, your local Rails application.

Now you can simply create a new script in your Rails folder called script/tunnel and you can work from everywhere!


#!/bin/sh
# Expose the local Rails app (port 3000) on port 5555 of the server.
echo "Listening on port 5555"
echo "Forwarding to localhost:3000"
ssh server -R '*:5555:127.0.0.1:3000' -vv
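If you would rather drive the tunnel from a script with configurable ports, the same command can be assembled in Python. This is only a sketch. The function name reverse_tunnel_command and its defaults are made up here, and "server" is the same placeholder host as above.

```python
import shlex


def reverse_tunnel_command(server, remote_port=5555, local_port=3000,
                           local_host="127.0.0.1"):
    """Return the argv for an ssh reverse tunnel like the one in script/tunnel."""
    # "*" asks the server to listen on all of its interfaces.
    forward = f"*:{remote_port}:{local_host}:{local_port}"
    return ["ssh", server, "-R", forward, "-vv"]


argv = reverse_tunnel_command("server")
# shlex.join shows the command as it would have to be typed into a shell.
print(shlex.join(argv))
```

Passing the list to subprocess.call(argv) starts the tunnel without going through a shell, so the * needs no quoting there.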

6 comments (closed)

There's no one programmer who does the work of ten other programmers.

Posted by tobi — 09:07 AM Aug 08

A great Slashdot comment regarding the difference between programmers and great programmers:

There’s no one programmer who does the work of ten other programmers. One uber-programmer does just as much work as one ordinary programmer. It’s just that the results solve ten times as many problems. Programming is fundamentally a design problem. A great bridge designer doesn’t do the work of ten lousy bridge designers; the great one designs one great bridge in the time it takes the ten lousy ones to design ten lousy bridges.

The best approximation is that each problem has a certain complexity and a certain size. The size determines how long it will take, and it doesn’t matter how good the developers are. The complexity determines how good a developer is needed to make progress at all. If you’ve got only easy problems, an uber-programmer doesn’t help you much (unless the programmer can find a smaller, harder problem that replaces the big easy one). If you’ve got a hard problem, ten average programmers will work on it forever without getting any results.

And there’s one last thing specific to computers: the computer can solve easy problems for you, but making it do so is a hard problem. But solving that one hard problem (plus some processor time) resolves a lot of easy problems. Another type of hard problem is writing a magic library function that makes a range of moderately hard problems easy enough for average programmers to solve.
If you’ve got ten people essentially doing data entry, an uber-programmer may be able to eliminate the need for them to do that at all. If you’ve got ten developers working on some problem, an uber-programmer may be able to double their productivity. In either of these cases, the uber-programmer directly produces something that isn’t part of the actual project, but the benefit to the project is on the order of ten average programmers’ work. And, if the uber-programmer reduces the complexity of the problem to put it in reach of the rest of the team, no amount of ordinary programmers’ work would benefit the project as much as the uber-programmer’s contribution. Of course, if you require an uber-programmer to literally do the work of average programmers, there’s no benefit at all.

2 comments (closed)

Married

Posted by tobi — 10:11 AM Aug 07

On sunny August the 4th I got married to my love Fiona.

I love you Fi and I’m looking forward to spending my life with you!

I would like to thank everyone who was there and made it work, especially Bruce, Dale and Kim who organized the lion’s share of the occasion. I’d also like to thank everyone from my old and new family who traveled from Europe, Central America and Asia just to attend. You guys rock.

In best geek fashion we crowdsourced the wedding photography with the following challange (German pronunciation of challenge):

The results are slowly arriving at flickr tag tobifionawedding

18 comments (closed)

Nginx Gzip SSL

Posted by tobi — 02:13 PM Jul 25

We switched our production server farm to Nginx when we moved it to Toronto a few months back and it has been working admirably well.

A few days ago we ran into an issue though. Gzip-compressed responses which were requested through SSL were cut off after 32kb, which was problematic because our biggest js file compresses down to 62kb.

After some upgrading and experimentation the culprit was found to be the following config line:

gzip_buffers

syntax: gzip_buffers number size
default: gzip_buffers 4 4k/8k
context: http, server, location

Assigns the number and the size of the buffers into which to store the compressed response. If unset, the size of one buffer is equal to the size of a page; depending on the platform this is either 4K or 8K.

4 * 8kb == 32768 bytes, exactly where the transfer was cut off once you account for the size of the HTTP response headers.

Adjusting the config to gzip_buffers 16 8k; saves the day here.
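The arithmetic is easy to check. The Python sketch below takes the behaviour we observed at face value, namely that the compressed response had to fit into number * size bytes. That is an assumption drawn from this incident and not a statement about how nginx manages these buffers in general, and the helper name is made up.

```python
KIB = 1024


def gzip_buffer_capacity(number, size_kib):
    """Total bytes available across all gzip buffers."""
    return number * size_kib * KIB


largest_asset = 62 * KIB  # our biggest compressed js file

for number, size_kib in [(4, 8), (16, 8)]:
    capacity = gzip_buffer_capacity(number, size_kib)
    verdict = "fits" if capacity >= largest_asset else "too small"
    print(f"gzip_buffers {number} {size_kib}k -> {capacity} bytes ({verdict})")
```

With 16 buffers of 8k there are 131072 bytes available, roughly twice what the 62kb file needs.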

2 comments (closed)

Ostrava Slides & Video

Posted by tobi — 01:10 PM Jul 04

Robert Cigán from rails.cz taped the talks of the recent Ostrava on Rails conference.

Here is my talk as well as the slides:

Via Google video :

5 comments (closed)

Ostrava on Rails

Posted by tobi — 10:14 AM Jun 26

I just returned from the most exciting 4 days in Europe, where I attended the Ostrava on Rails conference. After about 17 hours of travel I finally made it to Prague, where I was supposed to have a stopover of several hours. It was all planned in advance: I was going to wait an hour in the airport for Jamis Buck to arrive and we were going to take the bus downtown to do some sightseeing. Well, in theory, theory is just like practice. In reality Frankfurt airport shut down due to a lightning storm.

With my plane canceled I got right in line at the ticket counter to get a seat assigned for the next flight. The usual airport madness ensued. After I made it to the front they told me that they could only assign seats about half an hour before the plane loads.

No problem, except that every time we managed to get within 45 minutes of take-off the plane was delayed another half hour. Of course I could never stop being in line because someone else might get my precious seat. One complete boarding and de-boarding of the plane later, and after about 3 triumphant extra laps around Prague because of more bad weather in the area, we finally touched down with approx. 45 minutes to go until my train left for Ostrava; a 5 hour train ride with no other good trains leaving that day.

Good thing I travel light and don’t have to wait for my luggage at the baggage claim! Or so I thought. The people at the counter told me that the trip downtown to the train station would take about an hour! I couldn’t even make it in time after all this!

One thing immediately became clear to me: Prague’s taxi drivers mean business. After a short exchange that went something like this:

Stranded Rails developer: I need to be at the train station in 30 minutes! (add German accent)

Taxi driver: I’ll do it in 25! (imagine no words at all but him just throwing my luggage in his trunk and flooring the Volkswagen)

This Michael Schumacher of the taxi world managed to make it but had to employ the following strategy:

He drove approx. 90% of the time on the street car tracks (illegal)
He cut from the street car tracks over 2 lanes of traffic into a one way street (insane)
He overtook a police car on the right hand side (illegal, insane)
He drove 80km/h in a 30km/h area with cars parked on both sides of the street. (dear god)

After paying him well I ran into (or almost over) Jamis at the train station and we finally made it to Ostrava.

The conference itself went really well. It took place in a beautiful modern technology park in one of the suburbs of Ostrava. The English talks were well received, the audience asked great questions and was visibly enjoying itself. What really stood out was the perfect organization of the event. At no time were the foreign people allowed to be bored. There was always a program. Great dinner, great pubs, and always a designated driver. The event locations had signs pointing everyone in the right direction and the talks were cut off on time with a subtlety only matched by the Oscars (heh). The team was clearly not properly challenged by the event. I’m sure organizing a conference of 5 times the size would be just as easy for them.

The whole trip was a very enjoyable experience. Ostrava is an ex-coal-mining town which is well on its way to re-inventing itself through events such as this conference, and the Czech Republic is blessed with a lot of very passionate people. I expect Rails and web development in general to hit a tipping point over there at any moment; all the fundamental things are there, all that is needed is a bit of entrepreneurial spirit and incorporation of start-ups.

Here are some thoughts which might help to accelerate this process:

Centralize the Rails community. There should be only one mailing list, one forum and one page which explains what Rails is.
Create a mailing list and launch local unconferences in the style of barcamp.org around the country.
Follow up next year with a Central Europe on Rails conference and get the whole region involved!
Deploy to railshosting.cz, they rock.

Thanks again to everyone, especially Jiří, Lucie and Robert for being such great hosts.

4 comments (closed)

Off to Europe

Posted by tobi — 10:19 PM Jun 19

I’m fortunate enough to have been invited to the Czech Republic for the Ostrava on Rails conference as a speaker.

I’m really looking forward to this trip. I always heard that Prague is a must-see town and I was sad that I never got around to visiting it when I was still living in Germany. I’ll have a stopover of about 5 hours there before boarding a train to Ostrava. Any recommendations?

3 comments (closed)

Cool little http utility

Posted by tobi — 10:40 AM Jun 10

via reddit :

Wbox HTTP testing tool

This little tool, which was just released, is proving to be really useful for setting up our new server farm. It basically works like a ping for HTTP, but you can instruct it to test various aspects of HTTP such as gzip compression and concurrency.

% wbox www.google.it
WBOX www.google.it (64.233.183.99) port 80
0. 200 OK    3774 bytes    407 ms
1. 200 OK    (3767) bytes    273 ms
2. 200 OK    3767 bytes    304 ms
3. 200 OK    3767 bytes    260 ms
user terminated
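The core "ping for HTTP" loop is simple enough to sketch in Python. This is not how Wbox itself is implemented and it covers none of its gzip or concurrency checks. The function names probe, http_fetch and format_results are made up for the example.

```python
import time
import urllib.request


def http_fetch(url):
    """Fetch url and return (status code, body length in bytes).

    urlopen raises an exception for 4xx/5xx responses; this sketch lets it propagate.
    """
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.status, len(response.read())


def probe(url, count=4, fetch=http_fetch):
    """Request url count times and return (status, bytes, milliseconds) tuples."""
    results = []
    for _ in range(count):
        started = time.perf_counter()
        status, size = fetch(url)
        elapsed_ms = (time.perf_counter() - started) * 1000
        results.append((status, size, round(elapsed_ms)))
    return results


def format_results(results):
    """Render results as one numbered line per request."""
    return [f"{i}. {status}    {size} bytes    {ms} ms"
            for i, (status, size, ms) in enumerate(results)]
```

Calling print("\n".join(format_results(probe("http://www.google.it")))) would then print one numbered line per request. The fetch parameter is there so the loop can be tried without touching the network.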

4 comments (closed)

Too-biased

Tobias Lütke's thoughts