Mailing list providers have to do a bit more. The list itself should be administered via majordomo. It has its problems, but I don't know of anything better. Mailserv can be used to give a WWW interface to subscriptions, and either hypermail or MHonArc will handle archiving the list on the Web. If you want to provide an NNTP service for reading the list as well (no need to export it to the world, just locally), then mail2news and INN are a good combination. Finally, you'll want an efficient mail delivery backend to handle the thousands of people who subscribe to your list: qmail is looking like a good sendmail replacement, and bulk mailer might do as a lightweight mail distributor.
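Just to give a flavor of the glue involved, here's a rough sketch of the mail-to-news leg in Python. The newsgroup name and the assumption of a local news server accepting posts are mine, nntplib only ships with older Pythons (it was dropped in 3.13), and a real gateway like mail2news does considerably more header laundering.

```python
#!/usr/bin/env python3
# Rough sketch of a mail-to-news gateway: pipe a list message in on stdin
# and it posts to a local newsgroup.  Assumes a local news server (e.g. INN)
# on the standard port; nntplib ships with Python up through 3.12.
import io
import sys
import email
import email.utils
import nntplib

NEWSGROUP = "local.mylist"   # assumed local hierarchy for the list

msg = email.message_from_bytes(sys.stdin.buffer.read())

# News servers want a Newsgroups: header and no mail routing clutter.
msg["Newsgroups"] = NEWSGROUP
for header in ("To", "Cc", "Bcc", "Return-Path"):
    del msg[header]
if "Message-ID" not in msg:
    msg["Message-ID"] = email.utils.make_msgid()

with nntplib.NNTP("localhost") as server:
    server.post(io.BytesIO(msg.as_bytes()))
```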
All this software exists, but knowing how to use it is a bit arcane and the actual setup is a lot of work. A good project would be to document it all and somehow bundle the tools together to make it easy to install. Some of these programs could probably use improvement, too.
It seems like there's a lot of poorly understood art to getting realtime traffic through the net subject to bandwidth and latency constraints. I don't really understand the technical issues well enough to write my own realtime net app. It might be interesting to learn, though, with an eye towards creating a library. There's a real question whether a general purpose library is even possible: the tradeoffs for video might be quite different than, say, a game. What I want my library to do for me is guarantee a certain amount of reliability, bandwidth, and latency, and let me tell it to prioritize traffic if I need to exceed those guarantees.
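To make that concrete, here's a purely hypothetical sketch of the interface I'd want. None of these classes or parameters exist anywhere; it's just the shape of the thing.

```python
# Hypothetical API sketch: nothing here exists, it's just the interface
# I wish some library gave me.
from dataclasses import dataclass

@dataclass
class StreamSpec:
    min_bandwidth: int      # bytes/sec the library must guarantee
    max_latency_ms: int     # worst acceptable one-way latency
    reliability: float      # fraction of packets that must arrive

class RealtimeStream:
    def __init__(self, host: str, port: int, spec: StreamSpec):
        ...                 # negotiate the guarantees, or fail loudly

    def send(self, data: bytes, priority: int = 0) -> None:
        """Queue data; higher priority wins when traffic exceeds the spec."""
        ...

# What using it might look like for video:
#   video = RealtimeStream("peer.example.com", 9000,
#                          StreamSpec(min_bandwidth=64_000,
#                                     max_latency_ms=150,
#                                     reliability=0.95))
#   video.send(frame, priority=1)
```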
One thing that would be interesting and useful is a little tool that dynamically displays RAM usage by process: see how much RAM each process is consuming, how much swap, both on its own and shared with other processes. There are various sources of this information: /proc/meminfo, ps -axm, and /proc/self/maps. These data sources are somewhat confusing and undocumented, so it will take some doing to get the needed data. It's also unclear exactly what we want to display.
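As a starting point, here's a rough Linux sketch that pulls per-process resident sizes out of /proc/<pid>/status. The field names are the modern ones, and the hard part, properly splitting shared pages between processes, would need /proc/<pid>/smaps, which this doesn't attempt.

```python
#!/usr/bin/env python3
# Rough sketch: list the top processes by resident set size, straight from
# /proc.  VmRSS/VmSwap are the modern field names; real shared-vs-private
# accounting needs /proc/<pid>/smaps and is not attempted here.
import os

def memory_of(pid):
    """Return the /proc/<pid>/status fields as a dict, or None if it vanished."""
    fields = {}
    try:
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                key, _, value = line.partition(":")
                fields[key] = value.strip()
    except (FileNotFoundError, PermissionError, ProcessLookupError):
        return None
    return fields

rows = []
for pid in filter(str.isdigit, os.listdir("/proc")):
    fields = memory_of(pid)
    if fields and "VmRSS" in fields:
        rss_kb = int(fields["VmRSS"].split()[0])
        rows.append((rss_kb, pid, fields.get("Name", "?")))

for rss_kb, pid, name in sorted(rows, reverse=True)[:20]:
    print(f"{rss_kb:>10} kB  {pid:>6}  {name}")
```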
I do want to be able to extend the thing, though. There are some simple things I want - quick lists of all the hrefs and <h#>s in a document, for instance. In general, I want the ability to make my browser help me read the Web better. But of course, Netscape is a huge piece of code and writing a new web browser, even a small one, would be a nightmare. Maybe GROW will save us all. And NCSA Mosaic is actually pretty slick these days, but its network loading code is still single threaded. If that were fixed, I'd probably use it.
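The href and heading lister, at least, is easy to sketch outside the browser; something like this, reading HTML on stdin:

```python
#!/usr/bin/env python3
# Quick outline of a page: every href and every <h1>-<h6>, in document order.
# A sketch of the kind of "help me read the Web" helper I mean.
import sys
from html.parser import HTMLParser

class Outliner(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_heading = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    print(f"href: {value}")
        elif tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.in_heading = tag

    def handle_endtag(self, tag):
        if tag == self.in_heading:
            self.in_heading = None

    def handle_data(self, data):
        if self.in_heading and data.strip():
            print(f"{self.in_heading}: {data.strip()}")

Outliner().feed(sys.stdin.read())
```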
IRC is designed as a centralized, top-down network of servers. The servers pass traffic around trying to stay synchronized; individual clients connect to servers and listen in. The problem is that the server network is too brittle: servers get out of sync, get lagged or drop out entirely, and the whole net falters. On top of that, EFNet's server operators generally don't seem to understand how to make the network work right. Back in the old days, at least, half the servers were run by vanity ops who wanted to be cool but didn't understand what was going on.
Undernet tries to address the IRC problems by improving the quality of the servers. The protocols have been improved a bit, and the control of the network is even more centralized - a small committee coordinates the network. I haven't spent a lot of time on Undernet (the people I want to talk to aren't there), but it generally looks pretty good. No telling how well they'd handle the same load EFNet has, though.
I believe that at some point, any centralized network like IRC will collapse under its own weight. It would be interesting to try to apply some of the principles of bottom-up systems to redo IRC. One thought people have kicked around for a couple of years is to go serverless: have channels served by a floating system of servers that spontaneously form to handle traffic as need be. This could go anywhere from no server software at all, with clients passing messages among themselves, to something much like the current design but more loosely coupled.
In order to do this redesign effectively, a lot of thought needs to go into it. Simulating how proposed designs respond to heavy traffic would be a big help. But as with all projects like this, the hardest part is finding the time to implement it and convincing the world to use it.
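Here's a toy of the kind of simulation I mean: flood one message through a random graph of clients and count hop counts and redundant deliveries. Nothing about it models real IRC traffic; the graph shape and all the numbers are made up.

```python
#!/usr/bin/env python3
# Toy simulation: flood one message through a random graph of clients and
# measure reach, hop count, and redundant deliveries.  Not a model of any
# real protocol, just the shape of the experiment.
import random
from collections import deque

def flood(n_clients=500, links_per_client=4, seed=1):
    random.seed(seed)
    peers = {i: set() for i in range(n_clients)}
    for i in peers:
        for j in random.sample(range(n_clients), links_per_client):
            if i != j:
                peers[i].add(j)
                peers[j].add(i)

    hops = {0: 0}              # client -> hop count at first delivery
    deliveries = 0             # total copies sent, duplicates included
    queue = deque([0])
    while queue:
        sender = queue.popleft()
        for peer in peers[sender]:
            deliveries += 1
            if peer not in hops:      # first copy: record hop count, forward
                hops[peer] = hops[sender] + 1
                queue.append(peer)

    print(f"reached {len(hops)}/{n_clients} clients, "
          f"max {max(hops.values())} hops, "
          f"{deliveries} deliveries ({deliveries / n_clients:.1f} per client)")

flood()
```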
The main thing I want to see is for Usenet to stop being a store-and-forward network and move more towards servers giving out articles on demand, with sophisticated caching schemes to reduce load. For heavily trafficked groups I'd still expect tens of thousands of copies of the same article to be scattered around, which isn't much different in effect than the current scheme. But for less-read groups fewer copies would be needed, and so there's less overhead in keeping them. If done right, you could have a fairly sophisticated adaptive distribution framework trading off disk space, server CPU usage, and network bandwidth against readership demand. Doing this would marry Usenet to the Internet forever, and that has unfortunate political consequences, but the fact is Usenet as it exists now couldn't exist without the Internet.
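The per-article decision could start out as a simple cost comparison; the weights here are made up, and a real scheme would adapt them per group from observed readership:

```python
# Back-of-envelope caching rule with made-up cost weights: keep a local copy
# of an article when repeatedly fetching it from upstream would cost more
# than storing it.  A real server would tune these numbers per group.
def should_cache(article_bytes, expected_local_reads,
                 bandwidth_cost_per_byte=1.0, storage_cost_per_byte=0.1):
    fetch_cost = article_bytes * bandwidth_cost_per_byte * expected_local_reads
    store_cost = article_bytes * storage_cost_per_byte
    return fetch_cost > store_cost

# A big binary nobody here reads stays upstream; a busy group's articles
# get cached locally.
print(should_cache(200_000, expected_local_reads=0))    # False
print(should_cache(2_000, expected_local_reads=50))     # True
```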
Some other things I'd like to see on Usenet... It'd be terrific if all Usenet articles were permanently archived in some URL-addressable format. Or better yet, use URNs: just specify the message ID and rely on a URN resolution framework to figure out where that article can be found. There are some nice ad hoc ways to do this now; if nothing else, just count on DejaNews to do the resolution.
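The resolution step itself could be close to trivial. Here's a hypothetical sketch; the archive URL templates are placeholders, not real services:

```python
# Hypothetical message-ID resolver: given a Message-ID, produce candidate
# archive URLs to try.  The templates below are placeholders, not real
# services.
from urllib.parse import quote

ARCHIVES = [
    "https://archive.example.org/mid/{mid}",      # placeholder
    "https://mirror.example.net/usenet/{mid}",    # placeholder
]

def resolve(message_id):
    mid = quote(message_id.strip("<>"))
    return [template.format(mid=mid) for template in ARCHIVES]

print(resolve("<199602041234.ABC@host.example.com>"))
```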
alt.* needs to be integrated into the mainstream Usenet in some fashion: alt is currently something like 70% of the net traffic; the anarchical upstart has dwarfed the respectable parent. There are good groups inside of alt (alt.security.pgp, for instance); they need to be lifted out of the noise. Along with mainstreaming the useful stuff in alt.*, people need to stop distributing binaries, especially pictures, on Usenet. It's ludicrous that the same 200k picture of someone naked is stored on 50,000 computers (something like ten gigabytes of disk, network-wide), has been transmitted to all of them, only to be deleted two days later.
One choice would be to hack gcc so that the runtime did what I want. That'd be ok, but it would be even better to design the compiler so that the interface it requires to the runtime is well designed and documented, so people could write their own runtime libraries. I know several people interested in doing this, but no one has the time to sit down and really do it.
What's needed is a sequence format that's human readable, with no built-in arbitrary limitations. I suspect it would be useful to have minimal control structures built in, to allow loops easily and (maybe) some nondeterministic playback. Don't want it to get too complicated, though. This format might exist already, I don't know: maybe MIDI formats are good enough. I definitely know I don't want to take 3 months to learn how to write a player that will interpret my cool format, especially with the knowledge that it's unlikely anyone else would ever use it.
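Just to show how simple I'd want it to be, here's a format invented on the spot and a toy player that expands it to a flat event list. None of it is a real standard, and a real format would need actual timing and instrument information.

```python
# Invented-on-the-spot sequence format: one event per line, "repeat N { ... }"
# for loops, "choose { ... }" picks one line at random.  The "player" just
# expands the score to a flat event list.
import random

SCORE = """
note C4 1/4
repeat 3 {
note E4 1/8
note G4 1/8
}
choose {
note C5 1/2
note A4 1/2
}
"""

def expand(lines):
    events, i = [], 0
    while i < len(lines):
        words = lines[i].split()
        if words and words[0] in ("repeat", "choose"):
            j = lines.index("}", i)       # first closing brace; no nesting here
            block = lines[i + 1:j]
            if words[0] == "repeat":
                events += expand(block) * int(words[1])
            else:
                events.append(random.choice(block))
            i = j + 1
        elif words:
            events.append(lines[i])
            i += 1
        else:
            i += 1
    return events

for event in expand([line.strip() for line in SCORE.splitlines()]):
    print(event)
```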
Considered as a giant directed graph (nodes are keys, edges are signatures), how well connected is the web of trust? My guess is not very: I'd expect to find some 5000 or so disconnected components, with the largest connected component being about 100 people. But I'm not sure, and the data is there; it's just a matter of collecting it.
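The computation itself is easy once you have a keyserver dump. A sketch, with a made-up edge list, treating signatures as undirected links for the connectivity question:

```python
# Connected components of the signature graph, given (signer, signee) key
# pairs.  The edge list here is made up; the real one would come from a
# keyserver dump.  Signatures are treated as undirected links.
from collections import defaultdict

edges = [("keyA", "keyB"), ("keyB", "keyC"), ("keyD", "keyE")]

neighbors = defaultdict(set)
for a, b in edges:
    neighbors[a].add(b)
    neighbors[b].add(a)

seen, components = set(), []
for start in neighbors:
    if start in seen:
        continue
    component, stack = set(), [start]
    while stack:
        key = stack.pop()
        if key not in seen:
            seen.add(key)
            component.add(key)
            stack.extend(neighbors[key])
    components.append(component)

components.sort(key=len, reverse=True)
print(f"{len(components)} components, largest has {len(components[0])} keys")
```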
And as of Feb 4 1996, someone had done it! Neal McBurnett's Analysis of PGP Keyserver Statistics is pretty much what I had in mind. My guess was pretty far off. There's one big connected component of 2775 keys, and then a bunch of little tiny components that don't amount to much. Lots of other interesting data, too.
Fortunately, ssh has done this for me, and much better than I could ever have done it. The one (minor) drawback is I can't use my PGP key for authentication - I have to use some other RSA key that ssh has generated. I can live with that.