A few days ago, JWZ wrote a "fortune" program for xscreensaver that churns out the latest from the LiveJournal RSS feeds. Brilliant, I say! I've got to make my own too. Being lazy, and not wanting to do all the work, I took advantage of rawdog to do the dirty work of RSS aggregation for me, while I feed off its data like a vampire.
Thanks to Python and the way pickles work, I managed to get everything I needed without even prying open rawdog's source code (not that it would need much prying, since rawdog is open source). Here's how it's done:
Rawdog keeps its data in $HOME/.rawdog/state, if you're running on Unix. It becomes a big file if you let rawdog run for a couple of days. I wasn't sure what the data format was, but I figured that it might be a pickle, so here goes:
bash-2.05$ python
Python 2.2.2 (#1, Dec 29 2002, 22:20:22)
[GCC 2.96 20000731 (Linux-Mandrake 8.0 2.96-0.48mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> from pprint import pprint
>>> data = pickle.load(file('state'))
>>> data
<rawdoglib.rawdog.Rawdog instance at 0x819a6c4>
Good, so pickle managed to load the file without errors. What we get back is a rawdoglib.rawdog.Rawdog object. Using Python introspection, we can at least discover what its methods and attributes are:
>>> dir(data)
['__doc__', '__init__', '__module__', '_modified', 'articles',
'feeds', 'is_modified', 'list', 'modified', 'update', 'write']
Interesting indeed. Looking at these, the items of interest are feeds and articles.
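For anyone following along on a current Python, the load-and-inspect step looks much the same. What follows is only a sketch in Python 3, not rawdog's code: a `types.SimpleNamespace` stands in for the Rawdog object so the example runs without rawdog installed, and `public_names` is a helper name made up for this illustration. Two things are worth knowing before trying it on a real state file. Pickle stores instances by class name, so loading rawdog's file only works when rawdoglib is importable. Unpickling can also run arbitrary code, so only load files you trust.

```python
import io
import pickle
from types import SimpleNamespace


def public_names(obj):
    """Return the names from dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


# Stand-in for the Rawdog object (assumption: only the two attribute names
# are borrowed from the session above, and the contents are left empty).
fake_state = SimpleNamespace(feeds={}, articles={})

# Round-trip through an in-memory buffer instead of $HOME/.rawdog/state.
buffer = io.BytesIO()
pickle.dump(fake_state, buffer)
buffer.seek(0)

state = pickle.load(buffer)  # only unpickle data you trust
print(public_names(state))
```

Filtering out the underscore names is just a convenience: it hides the `__doc__`-style entries and leaves the attributes worth poking at.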
>>> pprint(data.feeds)
{'http://roughingit.wari.org/comments.xml':
<rawdoglib.rawdog.Feed instance at 0x82cf9f4>,
'http://roughingit.wari.org/index.xml':
<rawdoglib.rawdog.Feed instance at 0x82cf0d4>,
'http://www.advogato.org/rss/articles.xml':
<rawdoglib.rawdog.Feed instance at 0x82cd1ec>}
>>> dir(data.feeds['http://roughingit.wari.org/index.xml'])
['__doc__', '__init__', '__module__', 'etag',
'get_html_link', 'get_html_name', 'last_update', 'link',
'modified', 'period', 'title', 'update', 'url']
From here, we know that feeds is a dict that maps each feed URL to a Feed instance, and those instances hold useful data that we might need later. Let's look at data.articles now:
>>> len(data.articles)
405
>>> pprint(data.articles)
{'febebd8faf7f483e2b86ad50fcf0a5f045920e7e':
<rawdoglib.rawdog.Article instance at 0x8280d94>,
'ff137f7f0d69c1251657c62a463fb04a2f04057e':
<rawdoglib.rawdog.Article instance at 0x825f71c>,
'ff9befad55e3651a33e7a3029d728caa0a4a8f68':
<rawdoglib.rawdog.Article instance at 0x82519a4>}
There are over 400 of them, and I'm just showing the last three. So, data.articles is a dict of Article instances, keyed by each article's hash. Let's look at the last one:
>>> art = data.articles['ff9befad55e3651a33e7a3029d728caa0a4a8f68']
>>> dir(art)
['__doc__', '__init__', '__module__', 'added', 'can_expire',
'description', 'feed', 'hash', 'last_seen', 'link', 'title']
>>> art.title
'Feedback:"Welcome to my blog" by tjk on 1060785639.53'
>>> art.description
'Just started using pyblosxom and it fits my needs well. However, one think I cannot figure out is how you are creating your "read more" links in your blog files in such a way that in the overview page you only get the blurb before read-more while the actual permalink has all the text w/o the read-more link. Is there some documentation I am missing?'
>>> art.added
1063413726.152036
>>> art.feed
'http://roughingit.wari.org/comments.xml'
>>> data.feeds[art.feed].title
'RoughingIT recent comments'
Nice, that's just about all we need for this screensaver of ours. Our program is going to be very short indeed. What we still need are an HTML stripper and a text wrapper.
As for rawdog, I'm going to pull articles from it at random. I don't care whether the items are new or old; they're just there to fill up the screensaver. random.choice gives me what I need to select an article:
>>> from random import choice
>>> choice(data.articles.items())[1].title
'Goodbye Euro-centric, hello America-centric'
>>> choice(data.articles.items())[1].title
"S'pore delegation in Brunei for visit"
>>> choice(data.articles.items())[1].title
'3-D with animated gifs, who knew?'
>>> choice(data.articles.items())[1].title
'Their 22nd birthday was surely one to remember'
>>> choice(data.articles.items())[1].title
'Fame vs Fortune: Micropayments and Free Content'
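One caveat if you try the same trick on Python 3: items() there returns a view rather than a list, and random.choice refuses it with a TypeError, so the view has to be turned into a list first. Here is a minimal sketch of that, using made-up sample data shaped like data.articles and data.feeds. The `Article` class, the `pick_article` helper, and the example.org URL are all hypothetical stand-ins, not anything from rawdog.

```python
import random


class Article:
    """Hypothetical stand-in carrying only the two fields used here."""

    def __init__(self, title, feed):
        self.title = title
        self.feed = feed


# Made-up sample data: articles keyed by hash, each pointing at a feed URL,
# and a second dict that maps the feed URL to the feed's title.
feed_titles = {"http://example.org/index.xml": "Example feed"}
articles = {
    "hash-one": Article("First post", "http://example.org/index.xml"),
    "hash-two": Article("Second post", "http://example.org/index.xml"),
}


def pick_article(articles):
    """Pick one article at random; the dict view needs list() first."""
    return random.choice(list(articles.values()))


item = pick_article(articles)
print(item.title, "from", feed_titles[item.feed])
```

Choosing from values() skips the [1] indexing that the items() version needs, since the hash key is never used afterwards.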
The HTML stripper that I'm using was actually found on comp.lang.python. It's a simple piece of code that seems to do some magic, so it would take a while to explain, I guess. But it is important to strip out the HTML because the screensaver program does not render HTML. As for text wrapping, the textwrap module only comes with Python 2.3 and later, but you can find a backport for older Pythons here.
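The stripper is less magical than it looks: every tag becomes a space, the text between tags is collected, and the whitespace is collapsed at the end. Since sgmllib was removed in Python 3, here is a rough equivalent built on html.parser, offered as a sketch rather than the comp.lang.python original. The method name `text` and the sample string are my own choices for the illustration.

```python
from html.parser import HTMLParser
from textwrap import wrap


class Stripper(HTMLParser):
    """Collect text content and replace every tag with a space."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(" ")

    def handle_endtag(self, tag):
        self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)

    def text(self):
        # split() followed by join() collapses runs of whitespace
        return " ".join("".join(self.parts).split())


def striphtml(markup):
    """Return markup with its tags removed and whitespace normalized."""
    stripper = Stripper()
    stripper.feed(markup)
    stripper.close()  # flush any text still buffered by the parser
    return stripper.text()


sample = "<p>Just started using <b>pyblosxom</b> and it fits my needs well.</p>"
print("\n".join(wrap(striphtml(sample), 30)))
```

Replacing tags with a space instead of nothing keeps words apart when the markup was the only thing separating them, as in "a<br/>b".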
So, in about 10-20 minutes, I was able to whip up a program that outputs this:
bash-2.05$ ./get_story.py
14 hours ago:
From: STI Singapore
Title: Should cops have done more to warn of teen rapist?
Link: http://straitstimes.asia1.com.sg/storyprintfriendly/0,1887,209707,00.html?

A TEENAGE rapist and molester was on the loose in the neighbourhood. He preyed on 11 girls, from eight years old to 12, over six weeks before he was caught.
According to tripps, it's like Pointcast, only more pointless.
The source can be downloaded from http://roughingit.wari.org/sourcecode/get_story.py or you can view it here:
import random
import cPickle
import sgmllib
import string
from textwrap import wrap
from time import time

# Change this to your actual rawdog statefile
statefile = '/home/wari/.rawdog/state'
# Depending on your screensaver, you might want to change this.
# 50 is good for a scale of 3 in the phosphor screensaver
wrapping_at = 50

class Stripper(sgmllib.SGMLParser):
    """
    Strips HTML

    An SGMLParser subclass to strip away HTMLs
    """
    def __init__(self):
        self.data = []
        sgmllib.SGMLParser.__init__(self)
    def unknown_starttag(self, tag, attrs): self.data.append(" ")
    def unknown_endtag(self, tag): self.data.append(" ")
    def handle_data(self, data): self.data.append(data)
    def gettext(self):
        text = string.join(self.data, "")
        return string.join(string.split(text)) # normalize whitespace

def striphtml(text):
    """
    Uses the stripper to weed out html
    """
    s = Stripper()
    s.feed(text)
    return s.gettext()

if __name__ == '__main__':
    now = int(time())
    # Read rawdog's statefile
    fp = file(statefile)
    data = cPickle.load(fp)
    fp.close()

    item = data.articles[random.choice(list(data.articles))]

    delta = now - item.added
    unit = 'secs'

    # Damn, I'm lazy :) My sucky version of a fuzzy clock
    if delta > 60:
        delta = delta / 60
        unit = 'mins'
        if delta > 60:
            delta = delta / 60
            unit = 'hours'
            if delta > 24:
                delta = delta / 24
                unit = 'days'
    if int(delta) == 1:
        unit = unit[:-1] # Removes the (s) from the units

    print '%d %s ago:' % (delta, unit)
    print 'From:', data.feeds[item.feed].title
    print 'Title:', item.title
    print 'Link:', item.link
    if item.description:
        print
        print '\n'.join(wrap(striphtml(item.description), wrapping_at))
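The lazy fuzzy clock in the script can also be written as a small loop over the unit sizes. This is only a sketch in Python 3 syntax, and `fuzzy_age` is a name made up for the illustration. It follows the same rules as the script: move up a unit only when the value is greater than the unit's size, and drop the plural "s" when the count is 1.

```python
def fuzzy_age(seconds):
    """Return a rough age such as '14 hours' or '1 day' for an age in seconds."""
    value, unit = seconds, "sec"
    # Each step: (how many of the current unit make one of the next, its name)
    for size, name in ((60, "min"), (60, "hour"), (24, "day")):
        if value <= size:
            break
        value, unit = value / size, name
    count = int(value)
    return "%d %s%s" % (count, unit, "" if count == 1 else "s")


for age in (30, 90, 14 * 3600, 3 * 24 * 3600):
    print(fuzzy_age(age), "ago")
```

The loop stops at the first unit that is big enough, which is what the nested ifs in the script do by hand.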
Posted by MALIK ABDUL WAHAB at Tue Oct 21 23:15:22 2003
SIR I NEED A JOB
IM A POOR MAN
I HAVE ONE MOTHER AND FATHER
AND TWO SONS AND FOUR DAUGHTERS
Posted by amnesia at Tue Nov 4 20:13:15 2003
haha. what the... you have a blog spam?
Posted by Chris Davies at Thu Mar 11 02:09:39 2004
Since the phosphor screensaver is now a terminal emulator to all intents and purposes, it might be easier to just run links displaying your rawdog in phosphor :)