Older blog entries for Bram (starting at number 103)

Manual Spam Filters

I noticed the other day that most of the viruses I get have a Message-Id tacked on by the local mailserver. A little bit of messing with procmail and suddenly my junk mail level is under control. I'm now spending a few minutes a day creating new rules throwing out junk mail my old filters missed. I'm sticking with only blocking dead giveaways instead of using complicated scoring rules, and thus far it's working extraordinarily well.

It's too bad that bounces have become effectively useless, since there are so many of them sent in response to forged mail. The only way to fix this would be for the local mail server to record all the Message-Ids it sends out, and only accept bounces which are in response to one of those. Given the quality of current email infrastructure, I'm not holding my breath.

It's amazing what a downer junk mail can be. I'm in much improved spirits now that I'm actually getting all my real mail instead of accidentally deleting it along with the junk.

blinky.at.or.at is sending me huge amounts of junk.

Ebay correction

The method of gaming Ebay I gave earlier isn't nearly as easy as I stated. The problem is that the first out-bid by minimum amount you do might actually win, because you can't tell if the current price is the current second place bid plus the minumum increment or the current first place bid exactly. The attack still works, but to do it you need to do multiple bids increasing by exactly the minumum increment each time, which is a much more flagrant.

Ebay is probably less worried about this than plain old mail fraud. One thing which helps is that auctions are frequently won by snipers, and these tricks don't work when they're bidding. I've taken to sniping everything I really care about, because it saves money.

Trust Metric Correction

The anti-spam trust metric I gave earlier can be easily defeated. A spammer certs a whole lot of fake identities, and from those fake identities, and from those fake identities cert just a few fake identities which they spam from. This will keep the negative certs from propogating back.

Here is a technique which fixes that. For each node, we figure out what the flow of spam looks like viewing that node as the source of everything (how to do this is explained below). If more than half of all such flows label a node as a spammer, then that node is assumed to be a spammer. It's generally impossible for a single node to not be labelled as a spammer in all cases, because when that node is is the source then it gets a very high spam rating, and when any of the nodes which certify it are viewed as the root it also gets a very high spam rating, although not quite so high.

To calculate the flow of spammerness into a single root, calculate flows between nodes along certification lines and though nodes based on the following criteria:

  • Spammerness flows backwards along certification lines, from the node certified to the one which did the certifying.
  • The inflow out outflow of each node sums to zero, except for nodes which have negative certs, which have excess outflow in the amount of their negative certs. The source node is another exception, it can have unlimited excess inflow.
  • The maximum flow through any node is set to be as small as possible. The tiebreak for multiple possible flows with this value the same is that the second greatest maximum flow is as small as possible, then third, etc.

For example, if node A is the source, and A certs B and C, and B and C both cert D, and D has 1 negative cert, then the spammerness ratings for A and D are both 1 and for B and C it's both 1/2.

A threshold is set above which a node is considered to be a spammer. Typically all nodes close to the source wind up looking like spammers and all actual spammer nodes look like spammers.

This technique should of course be combined with distinguishing 'confused' nodes, which are judged as being spammers but are certified by at least one non-spammer node. Confused nodes only get blocked from sending messages if they have directly spammed above the threshold. Without this, very highly connected nodes will sometimes get blocked simply because they're too highly connected, and hence wind being near the source node for all possible sources.

Calculating this exactly might take a while, but there are decent polynomial time approximation algorithms. I'm not sure how small it's possible to get the exponent, but I'm following the policy of designing behavior first, and worrying about efficiency second.

Stamps Against Spam

Ken Birdwell, a coworker of mine, has an interesting idea about how to use stamps against spam. The basic premise of stamps is that a stamp represents some kind of resource, typically a penny, which a sender must have put out in order to get mail accepted. The question is, if the mail wasn't spam, does the penny go back to the sender, get transferred to the receiver, or simply drop into the ether? Ken's observation is that most people send and receive about the same amount of mail, so you could simply have the stamp transfer ownership, without ever turning it back into a penny. This should keep things about at equilibria, and designing a protocol for it is trivial - a client sends a stamp to a server, and the server responds with either a reject or an accept with a new valid stamp.

A few modifications to this scheme can help maintain equilibrium. First, since a lot of mail is sent to dead accounts or at least accounts which don't use this scheme, a sender should try to collect stamps they issued after a month or so to keep them from disappearing into the ether. Second, peers which have an excess of stamps should start probabilistically not caching in (or caching in and sending back) stamps they receive.


I pushed out a new BitTorrent release, but it's seriously borked because of sourceforge. Thanks guys. This is the third release in a row which has gotten garbled - only the last two were nicely garbled, this one is flat-out giving 404 most of the time.


Codecon is coming up this weekend. It's only $95 for three whole days of presentations, everybody should come!


Speaking of which, Codeville, a project by me and my brother, is much more mature than the last time I mentioned it (check the page for details) and will be presented on friday.

The program for CodeCon 2004 has been announced.


CodeCon is the premier showcase of active hacker projects. It is a workshop for developers of real-world applications with working code and active development projects. All presentations will given by one of the active developers, and accompanied by a functional demo.

Highlights of CodeCon 2004 include:

  • PGP Universal - Automatic, transparent email encryption with zero clicks
  • Osiris - A free Host Integrity Monitor designed for large scale server deployments that require auditable security
  • Tor - Second-generation Onion Routing: a TCP-based anonymizing overlay network
  • Vesta - An advanced software configuration management system that handles both versioning source files and building
  • PETmail - Permission-based anti-spam replacement for SMTP
  • FunFS - Fast User Network File System - An advanced network file system designed as a successor for NFS
  • Codeville - Distributed version control system
  • Audacity - A cross-platform multi-track audio editor

The third annual CodeCon takes place February 20 - 22, noon - 6pm, at Club NV (525 Howard Street) in San Francisco. CodeCon registration is $95; a $20 discount is available for attendees who register online prior to February 1, 2004.


Codecon 2004

Just a few days left to submit to Codecon 2004. It only takes a few minutes, get those submissions in!

Is ZF set theory a hack?

Raph links to some discussion whether ZF set theory is a hack. The up shot is that all the different proposed formulations for the foundations of mathematics are several times as large as you would expect.

I've long thought that the one true criterion for acceptability of a basis of mathematics was its intuitive acceptability to a human. Unfortunately there is a long history of humans, including professional logicians, forming bases which later turned out to be inconsistent. Now we know that our intuitions about what is simple are also completely wrong.

Perhaps instead of judging based on untuitive acceptability, a fundamentally subjective and poorly defined criterion, we should instead judge based on number of symbols in representation. That at least is a well-defined and measurable concept. Maybe its possible to make a basis which is based on much more abstract concepts even than set, which requires some work to build even our most basic intuitive concepts, but is much simpler at core.

Formulating a better foundation of mathematics may also help with computer proving. Perhaps we are crippling our theorem provers by forcing them to view mathematics through the lens of human intuition due to our selection of an extremely cumbersome set of base axioms.


clickmazes.com is cool.

Twisty Puzzles

I came up with some interesting twisty puzzle ideas.

Smith Numbers

Smith numbers raise some interesting philosophical questions.

Jumping Champions

Jumping champions are neat.

Gaming Ebay

Ebay's system of minimum bid increments can be gamed slightly. Let us say that I'm selling an item which is currently at $7000, at which point the minimum bid increment is $100. I could now have a shill bid $7100 and have full confidence that this would definitely make the price go up, since if the highest bidder bid less than that the price would have been at their highest bid.

Shill bidding is against ebay's rules, and there is active policing to stop it. However, detecting gaming of this sort done on a small scale is very difficult. In fact, I might not even have known about it - maybe a friend of mine just bid the $7100 to help me make some more money. A technical solution would be much better than a policing solution.

As it turns out, there is a simple technical solution. In the (rare) case where the highest bid is less than a minimum bid increment greater than the second highest bid, the current winning price should be listed as the second bid plus the minimum bid increment, even though if the current high bidder wins they'll get it for their high bid. If someone else bids higher the winning amount should change to what was previously the second-highest bid plus two times the minimum bid increment. Then shills couldn't outbid without running the risk of getting the winning bid and blowing the sale.

Surprisingly, this system doesn't result in the winning amount being dependant on the order in which bids are placed. The winning price is the greater of the second highest bid plus the minimum bid increment or the third highest bid plus two times the minimum bid increment. Or the maximum the winner bid, if it's less than that value.

So this rule causes some extra complexity in calculation of the winning bid, but not very much. It also adds some potential confusion to the interface, but that can be minimized by making the high bidder see their real winning value while showing the fake (and only slightly different) one for everyone else. Whether these problems offset the benefits of stopping such a simple and likely widespread form of shilling is ebay's decision, but I hope they at least read this and consider it seriously.

There are of course much simpler (and potentially lucrative) forms of shilling. For example, one can have a shill bid which is quite high, then later send mail to the real highest bidder claiming that the high bid was a no-show and offering to sell it for their maximum bid. If you get mail like this, don't immediately assume that it's a cheating seller, since bidders are occasionally no-shows. Instead, you should respond by offering to sell it for what the price would have been if the high bid had never happened. I'm about to do that right now.

I have to say I'm impressed with how much improvement ebay has made over the years. I was going to post about some much more serious errors in their bidding system, but it turns out those have been fixed since the last time I studied it.

Codecon 2004

The Codecon 2004 Call For Papers is now out. If you've got an actively maintained cool project, please consider submitting.

Bram's Law

ncm: Sadly, artificially causing difficulty is an invalid loophole. But adding functionality which is inherently more difficult may work. Bloating up what should be a simple spec is mostly just an example of Bram's law in action.

Databases are a good example. MySQL started without transactions at all, has no indexing for joins, and doesn't scale to a decent number of rows even with that ridiculously limited feature set. Postgres, on the other hand, supports full transactions, lots of fancy joins, and scales up extremely well. Unfortunately, Postgres can't be simply pointed at a file and used as a library, which is why people use MySQL.

Image Grouping

I had an interesting idea for a user interface for image grouping. The user is shown three images and asked which one least belongs. Then another three, and another three, etc. I'm not sure what the point of this is, or what one might do with the data, but it seems like an interesting idea.

What Customers Want

The things which will make people love your software, by rapidly plummeting order of importance, are:

  1. ease of use
  2. stability
  3. performance
  4. features

The order of priority many people use when writing software, and, unfortunately, what users generally say they want when asked, are:

  1. features
  2. performance
  3. stability
  4. ease of use

This is a siginificant discrepancy.

Trust Metrics Against Spam

The last trust metric I posted about can be improved significantly both in terms of run time and behavior by switching from number of certs in to number of nodes certed.

Make each position be a float, rather than an int. At each round, lower each node by one unit for each spam it sent out. Then, for each node which is below one which certed it, raise it by ten units and lower lower each of the ones which certed it and are above it by ten units divided by the number of them.

I think further dramatic run time reductions are possible.

Lightning Thermal Energy

The transfer of energy to the outside of a lightning thermal plant doesn't have to be done with anything as fancy as microwaves. A simple pair of tubes, one with air going out, the other with air going in, leaves the energy in completely mechanical form and transfers at high efficiency without acting as a conductor.

ncm: The largest solar plants are actually solar thermal instead of using photovoltaics. This would seem to indicate that solar thermal is just plain cheaper. As for doing anything with lightning, lightning doesn't follow the normal rules you're taught about in electronics class. Any change which can jump through the atmosphere isn't going to much care about a few piddly feet of vacuum trying to act as an insulator. Attempts to change batteries with lightning have gotten a few people blown up, and attempts to charge capacitors with them have mostly resulted in burnt electronics. With regards to your specific idea, if you acatually got the electronics to work (doubtful) it would probably result in the object you were trying to lift being turned into powder or, worse, shrapnel.

Lightning Thermal Energy

Here's my plans for a lightning thermal energy plant -

Go to a place which gets lots of lightning. Dig a deep hole. Extend outwards from the bottom of the hole a bunch of spokes of some highly conductive substance. At the bottom of the hole put a conductive plate attached to all the spokes. Fill the hole with carbon. (Or maybe with tungsten. Carbon is cheaper but more fragile.) On top of the filled hole put a ventilated faraday cage with low heat conductivity. In the faraday cage put a stirling engine. To the top of the faraday cage attach a very long lightning rod.

The idea here is to get the lightning to go through the carbon despite it being a resistor. There are several technical problems here -

  • The lightning might go down the lightning rod but jump around the carbon block
  • There might not be enough energy in the lightning to be useful, or not enough of it might get changed into heat
  • There needs to be some reasonable substance for building the faraday cage out of
  • The lightning may follow the cable carrying the electricity out, despite absurd amounts of shielding. If desperate, this can be fixed by emitting the energy as a focused microwave beam or something along those lines.

Anyone with more engineering knowledge than me know if these problems can be overcome?

Solar Thermal Energy

A bunch of people informed me that the solar enery harvesting technique I came up with is an example of what's called 'solar thermal energy'. Zaitcev has a good analysis of my exact proposal. (The upshot is, it's a lot easier to get the water hot enough to boil it then use a plain old rankine engine. Parabolic dish ones tend to use stirling engines.)

The biggest current solar thermal energy plant is a solar tower. A cheaper design (when done at sufficient scale) is a solar chimney Solar chimneys are a monument to inefficiency but in the desert we have more solar energy than we know what to do with, so the real issue is cost rather than efficiency. The robustness and simplicity of solar chimneys is certainly appealing. After the cost of construction is amortized out after thirty years or so, it produces energy essentilly for free and with hardly any maintenance.

Strangely, the solar thermal plants don't use fancy new kalina cycle engines.

My next question is, could lightning thermal energy be viable?

In other engineering news, there's a new desalinization trick.

94 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!