MailSpace design
Cees de Groot
$Revision: 1.2 $
This article sketches the design for MailSpace, a JavaSpaces-based
Mail Transfer Agent implementation.
The inspiration for MailSpace was born out of frustration after a
sendmail rulebase editing session. Yes, this is the software package that
probably touches 90% of all mail traffic, but its architecture and
configuration language are not exactly modern. For the umpteenth time I
considered writing a mail transfer agent myself, when suddenly my memory
pointed me to the place in my brain that held Tim O'Reilly saying
at a Jini Community Meeting "you guys need a killer app".

Now, it is extremely hard to set out and design a killer app. At
best you can tackle a lot of interesting problems and hope that one of them
turns into a killer app; this could be one of them. MailSpace is nothing but a
simple, extensible sendmail replacement. It makes a lot of processing
explicit and "user-visible", because it tracks messages in a workflow-like
manner in a JavaSpaces tuplespace. My hope is that this will enable MTA
admins to completely customize their mail system by introducing extra
stages and adding filters, rewriters, and delivery agents.

Note that this paper uses the term agent for all the processing
bits. I am fully aware that my use of the term requires quite a loose
definition of agent, but I think the use is warranted here and the term
fits well. My apologies to all the people who do the real mobile agent
stuff...
A mail transfer agent is essentially a simple workflow system which
is mostly automatic in nature. Mails enter the system, normally by means
of the SMTP protocol. They are analyzed, processed, and filtered, and then
leave the system again, either through delivery to a user's mailbox or by
being forwarded to another MTA. This model fits the concept of JavaSpaces
wonderfully well:

- A receiving agent, the MailCatcher, receives mails from an SMTP
  connection and drops them into the MailSpace. The MailCatcher will
  likely implement blacklist processing, but that is all the mail
  processing it does.
- One or more processing agents process the mail. Their goal is to
  convert the mail's initial state (something like "Received") to an
  outgoing state (let's say "Processed"). Alternatively, they may decide
  that the mail should not be processed at all, or that it should be
  presented to a human being for further processing.
- Finally, a transmission agent takes matching processed messages and
  removes them from the MailSpace. It either forwards them to the next MTA
  (via SMTP, or maybe even via a private MailSpace RMI protocol) or
  delivers them to the user's mailbox.
In effect, this is a simple state machine that uses the message's datatype
as its state. It proceeds from the start state (incoming message), via an
arbitrary number of intermediate steps, to one of the goal states
(processed message). The state-transition devices are autonomous agents
that act on behalf of various principals (more on that later).
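To make the tuplespace model concrete, here is a minimal in-memory stand-in for a space. It assumes JavaSpaces-style template matching: a template matches an entry of the same class or a subclass when every non-null public field of the template equals the corresponding field of the entry, and null fields act as wildcards. Real JavaSpaces compares serialized field values and offers blocking, transactional `take`/`takeIfExists` calls. This sketch uses `equals()` and a single non-blocking `takeIfExists(template)` instead. `MiniSpace`, `MailStatus` and `PassThroughAgent` are hypothetical names; none of them is the JavaSpaces API.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical in-memory stand-in for a JavaSpace; entries are objects with public fields.
class MiniSpace {
    private final List<Object> entries = new ArrayList<>();

    synchronized void write(Object entry) {
        entries.add(entry);
    }

    // Removes and returns the first entry matching the template, or null if none matches.
    synchronized Object takeIfExists(Object template) {
        Iterator<Object> it = entries.iterator();
        while (it.hasNext()) {
            Object candidate = it.next();
            if (matches(template, candidate)) {
                it.remove();
                return candidate;
            }
        }
        return null;
    }

    // Same class or subclass; null template fields are wildcards, others must be equal.
    private static boolean matches(Object template, Object candidate) {
        if (!template.getClass().isInstance(candidate)) {
            return false;
        }
        try {
            for (Field f : template.getClass().getFields()) {
                if (Modifier.isStatic(f.getModifiers())) {
                    continue; // static fields are not part of an entry's state
                }
                Object wanted = f.get(template);
                if (wanted != null && !wanted.equals(f.get(candidate))) {
                    return false;
                }
            }
            return true;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Status entry: the state string is what drives the state machine.
class MailStatus {
    public String messageId;
    public String state;

    public MailStatus() {}

    public MailStatus(String messageId, String state) {
        this.messageId = messageId;
        this.state = state;
    }
}

// A trivial processing agent: one state transition from RECEIVED to PROCESSED.
class PassThroughAgent {
    // Returns false when there is no RECEIVED message to work on.
    boolean step(MiniSpace space) {
        MailStatus received = (MailStatus) space.takeIfExists(new MailStatus(null, "RECEIVED"));
        if (received == null) {
            return false;
        }
        space.write(new MailStatus(received.messageId, "PROCESSED"));
        return true;
    }
}
```

Each agent only knows which state it consumes and which it produces. An extra stage, such as a filter, can be added by introducing a new intermediate state, without touching the other agents.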
A MailSpace runs on one or more computers that are equipped with
Java 2-compliant virtual machines. An advantage of using JavaSpaces is that
distribution of agents comes almost for free. It is quite easy to
configure a system where one or two machines receive mail, a pool of four
machines runs mail processing agents, and a seventh machine hosts the
actual MailSpace.

MailSpace will offer a scripting language, JPython, together with a little
shell for interacting with MailSpace through simple scripts. Short fragments
of JPython code can form the bulk of the mail processing agents. This way
of writing agents is probably very effective for site-specific work,
thanks to JPython's advanced string processing.

Security-wise, MailSpace will rely on the standard security layers
of the platform: JVM security, JAAS, RMI security, Jini security and
JavaSpaces security. Three of these layers are still under development,
so for the time being MailSpace needs to run in a trusted network that
does not allow RMI calls from the "outside". Later on, this restriction
can be lifted, and it may even become possible to let users write and
submit their own processing agents (for vacation filters, etcetera).
Note: This is very rough stuff. The biggest trick here will be to
come up with something that makes the best of the limited querying power
of JavaSpaces, which probably means a bit less abstraction than is
sketched below.
MailItem is the most basic type and contains just a field
messageId, which holds the message ID of
the mail. Upon reception, each mail is split up into a set of MailItems in
the MailSpace. These items are all linked together because they share the
same message ID.
MailHeaderItem extends MailItem. It doesn't add any data and just exists
to group all the various header items together.
The item for a single header line extends MailHeaderItem and contains
two fields: name and value. They hold the header name ("Subject",
"Received", ...) and the header value, respectively. Because JavaSpaces
cannot do case-insensitive matching, header names are normalized: every
dash-separated part starts with a capital ("X-Mailing-List", etcetera).
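A minimal sketch of the header-name normalization. The text only says that every part starts with a capital. This sketch assumes that the remaining letters of each part are lowercased, which is what makes exact-match templates case-insensitive in practice. Under that assumption "Message-ID" becomes "Message-Id". The class name `HeaderNames` is hypothetical.

```java
import java.util.Locale;

class HeaderNames {
    // Normalizes a header name for exact-match templates: each dash-separated part
    // gets an upper-case first letter and (by assumption) a lower-case remainder.
    static String normalize(String rawName) {
        String[] parts = rawName.trim().split("-", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                out.append('-');
            }
            String part = parts[i];
            if (!part.isEmpty()) {
                out.append(part.substring(0, 1).toUpperCase(Locale.ROOT))
                   .append(part.substring(1).toLowerCase(Locale.ROOT));
            }
        }
        return out.toString();
    }
}
```

Agents apply the same function when they build templates. A filter looking for mailing-list traffic would then match on a name of `HeaderNames.normalize("x-mailing-list")`, whatever casing the sender used.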
To allow more selective matching on MIME types, the Content-Type header is
described in a special structure which extends MailHeaderItem. The class
contains mainType, subType and option fields. For example, the header line

    Content-type: multipart/signed; boundary="==_Exmh_-600054584P"; micalg=pgp-sha1; protocol="application/pgp-signature"

would have a mainType of "multipart", a subType of "signed", and all the
rest as a string in the option field.
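The split described above can be sketched as follows. Everything after the first `;` goes into `option` unchanged, as in the example header. Two assumptions are made that the text does not state. First, the type and subtype are lowercased, in the same spirit as the header-name normalization; MIME type names are case-insensitive. Second, `option` is left null when there are no parameters. `ContentTypeFields` is a hypothetical name.

```java
import java.util.Locale;

// Hypothetical holder mirroring the mainType/subType/option fields described above.
class ContentTypeFields {
    public String mainType;
    public String subType;
    public String option;

    // Parses a Content-Type header value; everything after the first ';' goes into option.
    static ContentTypeFields parse(String headerValue) {
        ContentTypeFields f = new ContentTypeFields();
        String value = headerValue.trim();
        int semi = value.indexOf(';');
        String type = semi < 0 ? value : value.substring(0, semi);
        f.option = semi < 0 ? null : value.substring(semi + 1).trim();
        int slash = type.indexOf('/');
        if (slash < 0) {
            f.mainType = type.trim().toLowerCase(Locale.ROOT);
        } else {
            f.mainType = type.substring(0, slash).trim().toLowerCase(Locale.ROOT);
            f.subType = type.substring(slash + 1).trim().toLowerCase(Locale.ROOT);
        }
        return f;
    }
}
```

A template with `mainType = "multipart"` and null `subType` and `option` then matches every multipart mail, which is exactly the kind of selection a plain header-value match cannot express.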
Mail header lines containing one or more addresses are split up
into AddressMailItem instances, one per address. The attributes are
name, containing the name of the header line ("To", "Cc", ...);
namePart, the part before the at-sign of the mail address; hostPart,
the part after the at-sign; and comments, everything else. For example,
the mail address

    cg@cdegroot.com (Cees de Groot)

would be parsed into "cg", "cdegroot.com", and "Cees de Groot".
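The following sketch shows how a receiving agent might turn an address header into AddressMailItems. It first splits the header value on commas that are not inside quotes or parenthesized comments. It then parses each address in either the `local@host (Comment)` or the `Comment <local@host>` form. This is deliberately simplified: backslash escapes, group syntax and source routes are not handled. It also assumes that the host part is lowercased for exact matching, since domain names are case-insensitive. The local part is left alone because it may be case-sensitive. The field layout follows the text; the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of AddressMailItem with the fields described in the text.
class AddressMailItem {
    public String messageId;
    public String name;      // header name, e.g. "To" or "Cc"
    public String namePart;  // part before the at-sign
    public String hostPart;  // part after the at-sign
    public String comments;  // everything else

    public AddressMailItem() {}

    // Splits a header value on commas outside quoted strings and (comments).
    static List<String> splitAddresses(String headerValue) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int parens = 0;
        boolean quoted = false;
        for (char c : headerValue.toCharArray()) {
            if (c == '"' && parens == 0) {
                quoted = !quoted;
            } else if (!quoted && c == '(') {
                parens++;
            } else if (!quoted && c == ')' && parens > 0) {
                parens--;
            }
            if (c == ',' && !quoted && parens == 0) {
                addIfNotBlank(result, current);
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        addIfNotBlank(result, current);
        return result;
    }

    private static void addIfNotBlank(List<String> result, StringBuilder text) {
        String s = text.toString().trim();
        if (!s.isEmpty()) {
            result.add(s);
        }
    }

    // Parses "local@host (Comment)" or "Comment <local@host>" into one item.
    static AddressMailItem parse(String messageId, String headerName, String address) {
        String addrSpec;
        String rest;
        int lt = address.indexOf('<');
        int gt = lt < 0 ? -1 : address.indexOf('>', lt + 1);
        if (lt >= 0 && gt > lt) {
            addrSpec = address.substring(lt + 1, gt);
            rest = address.substring(0, lt) + address.substring(gt + 1);
        } else {
            String trimmed = address.trim();
            int space = trimmed.indexOf(' ');
            addrSpec = space < 0 ? trimmed : trimmed.substring(0, space);
            rest = space < 0 ? "" : trimmed.substring(space + 1);
        }
        AddressMailItem item = new AddressMailItem();
        item.messageId = messageId;
        item.name = headerName;
        int at = addrSpec.lastIndexOf('@');
        item.namePart = at < 0 ? addrSpec.trim() : addrSpec.substring(0, at).trim();
        // Host names are case-insensitive, so they are lower-cased for exact matching.
        item.hostPart = at < 0 ? null : addrSpec.substring(at + 1).trim().toLowerCase(Locale.ROOT);
        item.comments = rest.replace("(", "").replace(")", "").replace("\"", "").trim();
        return item;
    }
}
```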
As MTAs normally aren't concerned with the actual contents of
mails, the body parts are simply stored as-is, one item per part. The
body-part item has three attributes: seqNo, the sequence number of the
part; mimeType, the MIME type string of the part; and data, the actual
contents of the part.
A separate status structure contains the mail's state in the MTA's state
machine. The state is encoded as a string in the state attribute. A
data attribute may contain any extra information that the state-producing
agent wishes to communicate to the state-consuming agent. Conceptually,
mail status values belong to the domain of static final constants, so
they are always written in all-caps.
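The item types above can be sketched as Java classes that follow JavaSpaces entry conventions: public, object-typed fields (`Integer` rather than `int`) and a public no-argument constructor. A real implementation would implement `net.jini.core.entry.Entry`. That interface is left out here so that the sketch compiles without Jini on the classpath. Several things are hypothetical and not taken from the text:

- the names `HeaderLineItem`, `ContentTypeItem`, `BodyPartItem` and `MailStatusItem`;
- the field types `byte[]` and `String` for the two `data` fields;
- AddressMailItem extending MailHeaderItem.

```java
// Base item: all items belonging to one mail share the same messageId.
class MailItem {
    public String messageId;
    public MailItem() {}
}

// Groups all header items; adds no data of its own.
class MailHeaderItem extends MailItem {
    public MailHeaderItem() {}
}

// One header line with a normalized name (hypothetical class name).
class HeaderLineItem extends MailHeaderItem {
    public String name;
    public String value;
    public HeaderLineItem() {}
}

// Content-Type split up for selective matching (hypothetical class name).
class ContentTypeItem extends MailHeaderItem {
    public String mainType;
    public String subType;
    public String option;
    public ContentTypeItem() {}
}

// One address from a To/Cc/Bcc-style header line.
class AddressMailItem extends MailHeaderItem {
    public String name;
    public String namePart;
    public String hostPart;
    public String comments;
    public Boolean processed;   // set to true once a routing agent has handled it
    public AddressMailItem() {}
}

// One body part; entry fields must be objects, hence Integer instead of int.
class BodyPartItem extends MailItem {
    public Integer seqNo;
    public String mimeType;
    public byte[] data;
    public BodyPartItem() {}
}

// The mail's position in the state machine (hypothetical class name).
class MailStatusItem extends MailItem {
    // Static fields are not part of an entry's matchable state.
    public static final String RECEIVED = "RECEIVED";
    public String state;
    public String data;         // optional hand-over information for the next agent
    public MailStatusItem() {}

    public MailStatusItem(String messageId, String state) {
        this.messageId = messageId;
        this.state = state;
    }
}
```

Because subclasses match templates of their superclass, a template `new MailHeaderItem()` with only `messageId` set would pick up every header item of one mail, whatever its concrete type.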
In order to maximize efficiency and parallelism, certain
operations must be designed to happen concurrently. For example, routing
agents may still be deciding how a message should be delivered while
delivery agents are already able to start delivering. A routing agent can
be busy with the MX lookup for the Cc: address, while at the same time a
delivery agent processes the result of another routing agent which
determined that the To: address is to be delivered locally. Consequently,
some bookkeeping is necessary to decide when processing of a message has
finished. A DeliveryStatus item is inserted into the space containing the
following counters:

- totalDests contains the number of destination addresses (To, Cc,
  Bcc) in the mail.
- determinedDests is incremented when a routing agent has determined
  the routing for an address.
- deliveriesLeft contains the number of successful deliveries that
  still need to be made, and is initially set to the total number of
  addresses.

When deliveriesLeft reaches zero, the mail is considered successfully
delivered and can be removed from the MailSpace.

Furthermore, when a routing agent determines the path for an
address, it inserts a special item into the MailSpace for the message,
containing the delivery details; it is this item that triggers a specific
delivery agent. When the item is inserted, the original address item has
its "processed" flag set to true, so that no other agents will look at
the address.
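Because routing and delivery agents update the same DeliveryStatus concurrently, each update has to be an exclusive take, modify and write. Otherwise two agents could read the same count and one decrement would be lost. In JavaSpaces the `take` removes the entry, so competing agents cannot see it until it is written back. A real agent would wrap both steps in a transaction. The sketch below imitates this with a synchronized in-memory map. `Bookkeeper` and its method names are hypothetical, and the methods assume that a record for the message already exists.

```java
import java.util.HashMap;
import java.util.Map;

// Entry-style bookkeeping record, one per message.
class DeliveryStatus {
    public String messageId;
    public Integer totalDests;
    public Integer determinedDests;
    public Integer deliveriesLeft;

    public DeliveryStatus() {}

    public DeliveryStatus(String messageId, int totalDests) {
        this.messageId = messageId;
        this.totalDests = totalDests;
        this.determinedDests = 0;
        this.deliveriesLeft = totalDests;
    }
}

// Hypothetical stand-in for the space: each update is an atomic take-modify-write.
class Bookkeeper {
    private final Map<String, DeliveryStatus> records = new HashMap<>();

    synchronized void write(DeliveryStatus status) {
        records.put(status.messageId, status);
    }

    // Called by a routing agent once it has determined the route for one address.
    synchronized void routeDetermined(String messageId) {
        DeliveryStatus s = records.remove(messageId);   // "take"
        s.determinedDests = s.determinedDests + 1;
        records.put(messageId, s);                      // "write"
    }

    // Called by a delivery agent after a successful delivery; true when the mail is done.
    synchronized boolean delivered(String messageId) {
        DeliveryStatus s = records.remove(messageId);
        s.deliveriesLeft = s.deliveriesLeft - 1;
        records.put(messageId, s);
        return s.deliveriesLeft == 0;
    }

    synchronized DeliveryStatus read(String messageId) {
        return records.get(messageId);
    }
}
```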
This section describes the standard agents of MailSpace. These
agents will be able to handle simple mail configurations.
This agent keeps an SMTP listener open and processes incoming
mails by splitting them up into the various items described in the
previous section. All these items are inserted into the MailSpace in a
single transaction, together with a mail status of "RECEIVED".

Like most Java servers, the agent is multi-threaded in order to
make the best use of the available bandwidth. Multiple instances of the
agent may run on multiple machines that are all part of the system's MX
list.
This agent reads all unprocessed AddressMailItems in the space and
checks whether they pertain to a local host name. If so, a
LocalDeliveryItem containing the user's name is inserted into the
MailSpace. The AddressMailItem has its "processed" flag set to true, and
the determinedDests count in the bookkeeping record is incremented by one.

Note: This agent could be made to operate in a more sophisticated way
if the host part of the mail address were not stored in a single
String variable, but rather spread out over n items in the space
("home.cdegroot.com" would be spread out over 3 items, "home",
"cdegroot" and "com", linked together according to the
hierarchy). With this structure, the agent could look for specific
patterns, like an item "cdegroot" linked to "com", indicating
"*.cdegroot.com".
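The hierarchical host structure in the note is meant to enable a certain kind of pattern matching. As a rough illustration of that matching, here done on plain strings rather than on linked space items, a host can be split into labels from the top level down and compared against a pattern such as "*.cdegroot.com". The sketch assumes that "*." matches only hosts strictly below the given domain, so "cdegroot.com" itself needs its own pattern. `HostPattern` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class HostPattern {
    // "home.cdegroot.com" -> [com, cdegroot, home]: root first, like the proposed item hierarchy.
    static List<String> labels(String host) {
        List<String> parts = Arrays.asList(host.toLowerCase(Locale.ROOT).split("\\."));
        Collections.reverse(parts);
        return parts;
    }

    // "*.cdegroot.com" matches any host strictly below cdegroot.com; other patterns match exactly.
    static boolean matches(String pattern, String host) {
        List<String> h = labels(host);
        if (pattern.startsWith("*.")) {
            List<String> suffix = labels(pattern.substring(2));
            return h.size() > suffix.size() && h.subList(0, suffix.size()).equals(suffix);
        }
        return h.equals(labels(pattern));
    }

    // Local-delivery check: is the host covered by any configured local pattern?
    static boolean isLocal(String host, List<String> localPatterns) {
        for (String p : localPatterns) {
            if (matches(p, host)) {
                return true;
            }
        }
        return false;
    }
}
```

Comparing whole labels rather than string suffixes is what prevents "evilcdegroot.com" from being mistaken for a host under "cdegroot.com".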
This agent reads all unprocessed AddressMailItems in the space and
checks whether they pertain to a remote name by doing MX lookups. If the
lookup succeeds, an SMTPDeliveryItem is
inserted into the MailSpace with the user's address and the MX data.
The AddressMailItem
has its "processed" flag set to true, and
determinedDests for the message is
incremented by one.
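For the MX lookup itself, the JDK ships a JNDI DNS service provider (`com.sun.jndi.dns.DnsContextFactory`), which can return a domain's MX records as strings of the form "preference host". The sketch below uses that provider and then orders the hosts by preference, lowest value first. Falling back to the domain's address record when no MX exists is not shown. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Hashtable;
import java.util.List;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

class MxLookup {
    // Queries DNS for MX records through the JDK's JNDI DNS provider (needs network access).
    static List<String> lookupRaw(String domain) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
        DirContext ctx = new InitialDirContext(env);
        try {
            Attributes attrs = ctx.getAttributes(domain, new String[] {"MX"});
            Attribute mx = attrs.get("MX");
            List<String> records = new ArrayList<>();
            if (mx != null) {
                for (int i = 0; i < mx.size(); i++) {
                    records.add(mx.get(i).toString());   // e.g. "10 mx.example.org."
                }
            }
            return records;
        } finally {
            ctx.close();
        }
    }

    // Orders "preference host" records, lowest preference first, and strips trailing dots.
    static List<String> hostsByPreference(List<String> records) {
        List<String[]> parsed = new ArrayList<>();
        for (String r : records) {
            parsed.add(r.trim().split("\\s+"));
        }
        parsed.sort(Comparator.comparingInt((String[] p) -> Integer.parseInt(p[0])));
        List<String> hosts = new ArrayList<>();
        for (String[] p : parsed) {
            String host = p[1];
            hosts.add(host.endsWith(".") ? host.substring(0, host.length() - 1) : host);
        }
        return hosts;
    }
}
```

The ordered host list is what would go into the SMTPDeliveryItem as its MX data.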
This agent takes the next SMTPDeliveryItem instance in the space and
sends out the corresponding message. When the message has been
delivered, the SMTPDeliveryItem is removed and the mail's deliveriesLeft
count is decremented by one.

Note: A possible optimization is for an SMTP agent to keep its
connection lingering after the first delivery and to check for a couple
of seconds whether there are more deliveries to that host. The MX data
in the SMTPDeliveryItem should be set up so that such matches are
possible.
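The lingering-connection optimization can be sketched as a loop that keeps taking deliveries for the same MX host until none arrives within the linger time. This assumes the SMTPDeliveryItem keeps its chosen MX host in a separate field, here a hypothetical `mxHost`, so a template can match on it. The `DeliverySource` and `SmtpSession` interfaces are hypothetical stand-ins for a timed space `take` and an open SMTP connection. Updating deliveriesLeft after each send is left out.

```java
// Hypothetical delivery item; mxHost is kept separately so templates can match on it.
class SMTPDeliveryItem {
    public String messageId;
    public String address;
    public String mxHost;

    public SMTPDeliveryItem() {}

    public SMTPDeliveryItem(String messageId, String address, String mxHost) {
        this.messageId = messageId;
        this.address = address;
        this.mxHost = mxHost;
    }
}

// Hypothetical: takes an item whose mxHost equals the argument, waiting up to timeoutMillis.
interface DeliverySource {
    SMTPDeliveryItem take(String mxHost, long timeoutMillis);
}

// Hypothetical wrapper around one open SMTP connection.
interface SmtpSession {
    void send(SMTPDeliveryItem item);
    void quit();
}

class LingeringSmtpAgent {
    // Delivers the first item, then reuses the connection while more items for the same
    // host show up within lingerMillis; returns the number of messages sent.
    static int deliverAll(SMTPDeliveryItem first, DeliverySource source,
                          SmtpSession session, long lingerMillis) {
        int delivered = 0;
        SMTPDeliveryItem item = first;
        while (item != null) {
            session.send(item);
            delivered++;
            item = source.take(first.mxHost, lingerMillis);
        }
        session.quit();
        return delivered;
    }
}
```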
This agent takes the next LocalDeliveryItem instance in the space
and delivers the corresponding message to
/var/spool/mail. When the message has been
delivered, the LocalDeliveryItem is removed and the mail's
deliveriesLeft count is decremented by one.

Note: Some obvious enhancements are multiple types of maildrop, and
agents that know about users' filters so that they can deduce the
correct folder or directory to deliver to (or forward, or whatever).
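Delivering to /var/spool/mail usually means appending to a per-user mailbox in the traditional mbox format. The sketch below formats one message for appending. It assumes the classic conventions: each message starts with a "From sender date" line, body lines beginning with "From " are escaped as ">From ", and a blank line separates messages. File locking, which a real local delivery agent must do, is not shown. `MboxFormatter` is a hypothetical name.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class MboxFormatter {
    // asctime-style date as used on mbox separator lines, e.g. "Fri Nov  5 09:03:07 1999".
    private static final DateTimeFormatter ASCTIME =
            DateTimeFormatter.ofPattern("EEE MMM ppd HH:mm:ss yyyy", Locale.ENGLISH);

    // Returns the text to append to /var/spool/mail/<user> for one message.
    static String format(String envelopeSender, ZonedDateTime received, String message) {
        StringBuilder out = new StringBuilder();
        out.append("From ").append(envelopeSender).append(' ')
           .append(ASCTIME.format(received)).append('\n');
        for (String line : message.split("\r?\n")) {   // trailing empty lines are dropped
            if (line.startsWith("From ")) {
                out.append('>');   // escape lines that would look like a message separator
            }
            out.append(line).append('\n');
        }
        out.append('\n');          // blank line before the next message's "From " line
        return out.toString();
    }
}
```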
This agent simply removes all messages whose deliveriesLeft has
reached zero.
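A sketch of the cleanup step, with a plain list standing in for the space. In JavaSpaces this would be a take with a DeliveryStatus template whose deliveriesLeft is 0, followed by takes with a template that only sets messageId, ideally within one transaction. The minimal `MailItem` and `DeliveryStatus` classes repeat only the fields this step needs. `CleanupAgent.sweep` is a hypothetical name.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MailItem {
    public String messageId;
    public MailItem() {}
}

class DeliveryStatus extends MailItem {
    public Integer deliveriesLeft;
    public DeliveryStatus() {}
}

class CleanupAgent {
    // Removes every item of each message whose deliveriesLeft has reached zero;
    // returns the number of items removed.
    static int sweep(List<MailItem> space) {
        Set<String> finished = new HashSet<>();
        for (MailItem item : space) {
            if (item instanceof DeliveryStatus
                    && Integer.valueOf(0).equals(((DeliveryStatus) item).deliveriesLeft)) {
                finished.add(item.messageId);
            }
        }
        int before = space.size();
        space.removeIf(item -> finished.contains(item.messageId));
        return before - space.size();
    }
}
```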