MailSpace design
Cees de Groot
$Revision: 1.2 $
This article sketches the design for MailSpace, a JavaSpaces based Mail Transfer Agent implementation.
Table of Contents
1. Introduction
2. Outline
3. Environment
4. Data Storage
5. Standard Agents
1. Introduction
The inspiration for MailSpace was born out of frustration after a sendmail rulebase editing session. Yes, this is the software package that touches probably 90% of all mail traffic, but its architecture and configuration language are not exactly modern. For the umpteenth time I considered doing a mail transfer agent myself, when suddenly my memory pointed me to the location in my brain that contained Tim O'Reilly saying at a Jini Community Meeting "you guys need a killer app".
Now, it is extremely hard to go out and design a killer app - at best you can try to tackle a lot of interesting problems and hope that you get a killer app; this could be one of them. MailSpace is nothing but a simple, extensible sendmail replacement; it makes a lot of processing explicit and "user-visible", because it tracks messages in a workflow-like manner in a JavaSpaces tuplespace. My hope is that this will enable MTA admins to completely customize their mail system, by introducing extra stages, adding filters, rewriters, and delivery agents.
Note that this paper uses the term agent for all the processing bits. I am fully aware that you need quite a loose definition of agent for how I use the term, but I think the use is warranted here and the term fits well. My apologies to all the people who do the real mobile agent stuff...
2. Outline
A mail transfer agent is essentially a simple workflow system which is mostly automatic in nature. Mails enter the system, normally by means of the SMTP protocol, are analyzed, processed, filtered, and leave the system again, either through delivery to a user's mailbox, or by forwarding the mail to another MTA. This model fits the concept of JavaSpaces wonderfully well:
A receiving agent receives mails from an SMTP connection and drops them into the MailSpace. This agent (the MailCatcher) will likely implement blacklist processing, but that is all the mail processing that it does;
One or more processing agents process the mail, with the goal of converting the mail's initial state (something like "Received") to an outgoing state (let's say "Processed"), or of deciding that the mail should not be processed at all or should be presented to a human being for further processing;
Finally, a transmission agent takes matching processed messages and removes them from the MailSpace either by forwarding them to the next MTA (via SMTP or maybe even via a private MailSpace RMI protocol), or by delivering them to the user's mailbox.
In effect, this is a simple state machine using the message's datatype as state and proceeding from the start state (incoming message) via an undefined number of intermediate steps to one of the goal states (processed message). The state-transition devices are autonomous agents that act on behalf of various principals (more on that later).
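The workflow above can be sketched in a few lines. This is only an illustration in plain Python: ToySpace is a hypothetical in-memory stand-in for a real JavaSpace, the two agent functions are made-up names, and the state strings are assumptions. Template matching here loosely mimics the JavaSpaces rule that unset (None) fields act as wildcards.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class MailStatusItem:
    # None acts as a wildcard when the item is used as a template.
    message_id: Optional[str] = None
    state: Optional[str] = None


class ToySpace:
    """Hypothetical in-memory stand-in for a JavaSpace."""

    def __init__(self):
        self.items = []

    def write(self, item):
        self.items.append(item)

    def take(self, template):
        """Remove and return the first item matching the template, or None."""
        for item in self.items:
            if isinstance(item, type(template)) and all(
                getattr(template, f.name) in (None, getattr(item, f.name))
                for f in fields(template)
            ):
                self.items.remove(item)
                return item
        return None


def processing_agent(space):
    """Move one mail from RECEIVED to PROCESSED; return False if none waits."""
    status = space.take(MailStatusItem(state="RECEIVED"))
    if status is None:
        return False
    space.write(MailStatusItem(status.message_id, "PROCESSED"))
    return True


def transmission_agent(space):
    """Take one PROCESSED mail out of the space and return its message ID."""
    status = space.take(MailStatusItem(state="PROCESSED"))
    return None if status is None else status.message_id


space = ToySpace()
space.write(MailStatusItem("<1@example.org>", "RECEIVED"))  # receiving agent
processing_agent(space)
delivered = transmission_agent(space)
```

Each agent only knows the state it consumes and the state it produces, so extra stages can be added by introducing new state values without touching the existing agents.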
3. Environment
A MailSpace runs on one or more computers that are equipped with Java 2-compliant virtual machines. An advantage of using JavaSpaces is that distribution of agents is obtained almost for free. It is quite easy to configure a system where one or two machines receive mail, a pool of four machines contains mail processing agents, and a seventh machine contains the actual MailSpace.
MailSpace will offer a scripting language, JPython, with a little shell in order to interact with MailSpace in easy scripts. Short fragments of JPython code can form the bulk of the mail processing agents, and this way of writing agents is probably very effective for site-specific work (due to the advanced string processing in JPython).
Security-wise, MailSpace will rely on the standard security layers of the platform: the JVM security, JAAS, RMI Security, Jini Security and JavaSpaces Security. As three of these layers are still under development, for the time being, MailSpace needs to be run in a trusted network, which does not allow communication from the "outside" to take place via RMI calls. Later on, this restriction can be lifted and it may even be possible to let users write and submit their own processing agents (for vacation filters, etcetera).
4. Data Storage
This is very rough stuff. The biggest trick here will be to come up with something that makes the best of the limited querying power of JavaSpaces, which probably means a bit less abstraction than is sketched below.
4.1. MailItem
This is the most basic type, and contains just a field messageId, which holds the message ID of the mail. Upon reception, mails are split up into a set of MailItems in the MailSpace, all linked together because they have the same message ID.
4.2. MailHeaderItem
Extends MailItem; doesn't add any data to MailItem, just exists to group all the various header items together.
4.3. SimpleMailHeaderItem
Extends MailHeaderItem; contains two fields: name and value, containing the header name ("Subject", "Received", ...) and the header value, respectively. Because JavaSpaces cannot do case-insensitive matching, header names are normalized: every part starts with a capital, parts are separated by a dash ("X-Mailing-List", etcetera).
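The normalization rule could look roughly like this. The function name normalize_header_name is made up for this sketch, and lowercasing the rest of each part is an assumption on my side (the rule above only says that every part starts with a capital).

```python
def normalize_header_name(name):
    """Capitalize every dash-separated part and lowercase the rest."""
    return "-".join(part.capitalize() for part in name.strip().split("-"))


examples = [normalize_header_name(n) for n in ("x-MAILING-list", "subject", "CONTENT-TYPE")]
```

With this in place, a template with name set to "X-Mailing-List" matches no matter how the sending software spelled the header.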
4.4. MimeTypeMailHeaderItem
In order to be able to select better on MIME types, they are described in a special structure which extends MailHeaderItem. The class contains mainType, subType and option fields. For example, the header line
Content-type:
	multipart/signed; boundary="==_Exmh_-600054584P"; micalg=pgp-sha1;
	protocol="application/pgp-signature"
would have a mainType of "multipart", a subType of "signed", and all the rest as a string in the option field.
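A minimal sketch of that split, using the example header above. The function name parse_content_type is hypothetical, and lowercasing the type names is my assumption (MIME type names are case-insensitive, so this keeps exact-match templates workable). Note that collapsing whitespace would also alter whitespace inside quoted parameter values, which a real parser would have to respect.

```python
def parse_content_type(value):
    """Split a Content-Type header value into (mainType, subType, option)."""
    # Unfold the header: collapse line breaks and runs of whitespace.
    unfolded = " ".join(value.split())
    # Everything before the first semicolon is the media type itself.
    media_type, _, option = unfolded.partition(";")
    main_type, _, sub_type = media_type.strip().partition("/")
    return main_type.lower(), sub_type.lower(), option.strip()


header_value = (
    'multipart/signed; boundary="==_Exmh_-600054584P"; micalg=pgp-sha1;\n'
    '\tprotocol="application/pgp-signature"'
)
main_type, sub_type, option = parse_content_type(header_value)
```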
4.5. AddressMailHeaderItem
Mail header lines containing one or more addresses are split up into instances of this structure, one per address. The attributes are name, containing the name of the header line ("To", "Cc", ...); namePart, the part before the at-sign of the mail address; hostPart, the part after the at-sign; and comments, everything else. For example, the mail address
cg@cdegroot.com (Cees de Groot)
would be parsed into "cg", "cdegroot.com", and "Cees de Groot".
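As a rough illustration, the split can lean on the address parser in Python's standard email.utils module. The dataclass below mirrors the attributes described above in Python naming (name_part for namePart, and so on); the helper name split_address_header and the second address in the example are made up for this sketch.

```python
from dataclasses import dataclass
from email.utils import getaddresses


@dataclass
class AddressMailHeaderItem:
    message_id: str
    name: str       # header name, e.g. "To" or "Cc"
    name_part: str  # part before the at-sign
    host_part: str  # part after the at-sign
    comments: str   # everything else


def split_address_header(message_id, header_name, header_value):
    """Return one AddressMailHeaderItem per address in the header value."""
    items = []
    for comments, address in getaddresses([header_value]):
        if "@" in address:
            name_part, _, host_part = address.rpartition("@")
        else:
            name_part, host_part = address, ""  # bare local name, no host
        items.append(
            AddressMailHeaderItem(message_id, header_name, name_part, host_part, comments)
        )
    return items


items = split_address_header(
    "<1@example.org>", "To",
    "cg@cdegroot.com (Cees de Groot), postmaster@example.org",
)
```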
4.6. MailBodyItem
As MTAs normally aren't concerned with the actual contents of mails, we just lump the body parts here. The class has three attributes: seqNo, the sequence number of the part; mimeType, the MIME type string of the part; and data, the actual contents of the part.
4.7. MailStatusItem
This structure contains the mail's state in the MTA's state machine. The state is encoded as a string in the state attribute, and a data attribute may contain any extra information that the state-producing agent wishes to communicate to the state-consuming agent. As mail status values are, conceptually, constants (static final fields in Java terms), they are always written in all-caps.
4.8. Bookkeeping
In order to maximize efficiency and parallel processing, certain operations must be designed to happen concurrently. For example, routing agents may be deciding how a message should be delivered, and delivery agents should be able to start delivering immediately - this can mean that a routing agent is still busy with the MX lookup for the Cc: address, while at the same time a delivery agent processes the result of another routing agent which determined that the To: is to be delivered locally.
This process means that some bookkeeping is necessary in order to decide the termination condition. A DeliveryStatus item is inserted into the space containing the following counters: totalDests contains the number of destination addresses (To, Cc, Bcc) in the mail. determinedDests is incremented when a routing agent has determined the routing for an address. Finally, deliveriesLeft contains the number of successful deliveries that still need to be made and is set initially to the total number of addresses. When the latter field reaches zero, the mail is determined to be successfully delivered and can be removed from the MailSpace.
Furthermore, when a routing agent determines the path for an address, it inserts a special item in the MailSpace for the message containing delivery details; it is this item that triggers a specific delivery agent. When the item is inserted the original address has a "processed" flag set to "true", so that no other agents will look at the address.
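The counters can be sketched as follows. The method names are made up for this illustration; in a real space, every update would be a take, modify and write of the DeliveryStatus item (presumably inside a transaction), which this plain-Python sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class DeliveryStatus:
    message_id: str
    total_dests: int       # number of To, Cc and Bcc addresses
    determined_dests: int  # addresses for which routing is known
    deliveries_left: int   # successful deliveries still to be made

    @classmethod
    def for_mail(cls, message_id, total_dests):
        # deliveriesLeft starts at the total number of addresses.
        return cls(message_id, total_dests, 0, total_dests)

    def routing_determined(self):
        self.determined_dests += 1

    def delivery_succeeded(self):
        self.deliveries_left -= 1

    @property
    def delivered(self):
        """Termination condition: no deliveries are left."""
        return self.deliveries_left == 0


status = DeliveryStatus.for_mail("<1@example.org>", total_dests=2)
status.routing_determined()   # To: turns out to be local
status.delivery_succeeded()   # To: delivered while the Cc: lookup still runs
still_pending = not status.delivered
status.routing_determined()   # Cc: routed via MX
status.delivery_succeeded()   # Cc: delivered
```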
5. Standard Agents
This section describes the standard agents of MailSpace. These agents will be able to handle simple mail configurations.
5.1. SMTP Receiving Agent
This agent keeps an SMTP listener open, and processes incoming mails by splitting them up into the various items described in the previous section. All these items are inserted, in a single transaction, into the MailSpace together with a mail status of "RECEIVED".
Like most Java servers, the agent is multi-threaded in order to make the best use of the available bandwidth; multiple instances of the agent may be run on multiple machines that are all part of the system's MX list.
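A simplified sketch of the splitting step, using Python's standard email parser. It is an assumption-laden illustration: items are plain tuples rather than space entries, all headers are stored as simple header items (the address and MIME type items are left out), the function name split_mail is made up, and a single list extend() stands in for the transaction.

```python
from email import message_from_string


def split_mail(raw):
    """Split a raw message into a batch of (type, messageId, ...) tuples."""
    msg = message_from_string(raw)
    message_id = msg["Message-ID"]
    batch = []
    for name, value in msg.items():
        normalized = "-".join(part.capitalize() for part in name.split("-"))
        batch.append(("SimpleMailHeaderItem", message_id, normalized, value))
    # walk() yields the message itself for non-multipart mail, and every
    # nested part otherwise; only the leaf parts carry body data.
    leaves = [part for part in msg.walk() if not part.is_multipart()]
    for seq_no, part in enumerate(leaves):
        batch.append(
            ("MailBodyItem", message_id, seq_no, part.get_content_type(), part.get_payload())
        )
    batch.append(("MailStatusItem", message_id, "RECEIVED"))
    return batch


RAW = "Message-ID: <1@example.org>\nTo: cg@cdegroot.com\nsubject: Hello\n\nBody text\n"
space = []
space.extend(split_mail(RAW))  # one extend() stands in for one transaction
```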
5.2. Local Routing Agent
This agent reads all unprocessed AddressMailHeaderItems in the space and checks whether they pertain to a local host name. If so, a LocalDeliveryItem is inserted into the MailSpace with the user's name. The AddressMailHeaderItem has its "processed" flag set to true, and the determinedDests count in the bookkeeping record is incremented by one.
This agent could be made to operate in a more sophisticated way if the host part of the mail address were not stored in a single String variable, but rather spread out over n items in the space ("home.cdegroot.com" would be spread out over 3 items, "home", "cdegroot" and "com", linked together corresponding to the hierarchy). With this structure, the agent could look for specific patterns, like an item "cdegroot" linked to "com" indicating "*.cdegroot.com".
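To make the pattern idea concrete, here is a small sketch that compares host names label by label, most significant label first. The function names are made up, and the wildcard semantics are my assumption: "*.cdegroot.com" is taken to match any host strictly below cdegroot.com, but not cdegroot.com itself.

```python
def labels(host):
    """'home.cdegroot.com' -> ('com', 'cdegroot', 'home')."""
    return tuple(reversed(host.lower().split(".")))


def matches(host, pattern):
    """Check a host name against an exact name or a '*.domain' pattern."""
    host_labels, pattern_labels = labels(host), labels(pattern)
    if pattern_labels[-1] == "*":
        suffix = pattern_labels[:-1]
        # The host needs at least one extra label below the suffix.
        return len(host_labels) > len(suffix) and host_labels[: len(suffix)] == suffix
    return host_labels == pattern_labels


is_local = matches("home.cdegroot.com", "*.cdegroot.com")
```

Comparing whole labels instead of raw string suffixes avoids accidental matches such as "evilcdegroot.com".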
5.3. Remote Routing Agent
This agent reads all unprocessed AddressMailHeaderItems in the space and checks whether they pertain to a remote name by doing MX lookups. If the lookup succeeds, an SMTPDeliveryItem is inserted into the MailSpace with the user's address and the MX data. The AddressMailHeaderItem has its "processed" flag set to true, and determinedDests for the message is incremented by one.
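One routing step might look like the sketch below. Everything here is illustrative: items are plain dicts, the space is a list, and the MX lookup is injected as a function (fed from a fake table) so that the sketch runs without DNS access. The dict keys and the name route_remote are assumptions, not part of the design.

```python
def route_remote(address, status, space, lookup_mx):
    """Route one address item; return True when a delivery item was written."""
    if address["processed"]:
        return False
    # lookup_mx is injected: it returns a list of (preference, host) pairs.
    mx_records = lookup_mx(address["host_part"])
    if not mx_records:
        return False  # lookup failed; leave the address unprocessed
    space.append({
        "type": "SMTPDeliveryItem",
        "message_id": address["message_id"],
        "address": address["name_part"] + "@" + address["host_part"],
        # Lower MX preference values are tried first.
        "mx_hosts": [host for _, host in sorted(mx_records)],
    })
    address["processed"] = True
    status["determined_dests"] += 1
    return True


FAKE_DNS = {"example.org": [(20, "mx2.example.org"), (10, "mx1.example.org")]}
address = {"message_id": "<1@example.org>", "name_part": "alice",
           "host_part": "example.org", "processed": False}
status = {"message_id": "<1@example.org>", "determined_dests": 0}
space = []
routed = route_remote(address, status, space, lambda host: FAKE_DNS.get(host, []))
```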
5.4. SMTP Sending Agent
This agent takes the next SMTPDeliveryItem instance in the space and sends out the corresponding message. When the message has been delivered, the SMTPDeliveryItem is removed and the mail's deliveriesLeft count is decremented by one.
A possible optimization: an SMTP agent could keep its connection lingering after the first delivery and check for a couple of seconds whether there are more deliveries to that host. The MX data in the SMTPDeliveryItem should be set up so that such matches are possible.
5.5. Local Delivery Agent
This agent takes the next LocalDeliveryItem instance in the space and delivers the corresponding message to /var/spool/mail. When the message has been delivered, the LocalDeliveryItem is removed and the mail's deliveriesLeft count is decremented by one.
Some obvious enhancements: multiple types of maildrop, and having agents that know about users' filters so that they can deduce the correct folder/directory to deliver to (or forward, or whatever).
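The delivery step itself could be sketched with Python's standard mailbox module, assuming the maildrop is an mbox file per user. The function name deliver_local is made up, and a temporary directory stands in for /var/spool/mail so the sketch can run without special permissions.

```python
import mailbox
import os
import tempfile


def deliver_local(spool_dir, user, raw_message):
    """Append raw_message to the mbox file of user; return the mbox path."""
    path = os.path.join(spool_dir, user)
    box = mailbox.mbox(path)  # creates the file when it does not exist yet
    box.lock()                # keep other writers away during the append
    try:
        box.add(raw_message)
        box.flush()
    finally:
        box.unlock()
        box.close()
    return path


spool_dir = tempfile.mkdtemp()  # stands in for /var/spool/mail
raw = "From: alice@example.org\nTo: cg@cdegroot.com\nSubject: Hi\n\nHello\n"
path = deliver_local(spool_dir, "cg", raw)
delivered_count = len(mailbox.mbox(path))
```

Only after this call returns without error should the agent remove the LocalDeliveryItem and decrement deliveriesLeft.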
5.6. Cleaning Agent
This agent simply removes all messages whose deliveriesLeft has reached zero.
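A sketch of the cleaning pass, again with a plain list of dicts standing in for the space. The dict keys and the function name clean are assumptions for illustration.

```python
def clean(space):
    """Remove every item of each fully delivered mail; return their message IDs."""
    done = {
        item["message_id"]
        for item in space
        if item["type"] == "DeliveryStatus" and item["deliveries_left"] == 0
    }
    # Drop all items, of any type, that belong to a finished message.
    space[:] = [item for item in space if item["message_id"] not in done]
    return done


space = [
    {"type": "DeliveryStatus", "message_id": "<1@example.org>", "deliveries_left": 0},
    {"type": "MailBodyItem", "message_id": "<1@example.org>"},
    {"type": "DeliveryStatus", "message_id": "<2@example.org>", "deliveries_left": 1},
    {"type": "MailBodyItem", "message_id": "<2@example.org>"},
]
removed = clean(space)
```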