About

Welcome to Panela, Matt Harrison's take on mostly Open Source, Linux, Python, innovation in those areas, other buzzwords and Dick Proenneke. It comes complete with illustrations as needed. Note that the opinions expressed here are merely my opinions and not the opinions of my employer.

Latest Entries

Update of Git Supervisual Cheatsheet

2010.12.16 Thu 12:23 P GMT-07

As I mentioned previously, I wanted a git cheatsheet that made sense to me. Hence the Git Supervisual Cheatsheet project I've started. I've completely overhauled the previous version. In the source, I'm aiming for a legal-sized handout, but I haven't decided yet what to fill all of it with. In the meantime, here's a letter-sized portion that doesn't have empty areas.

I like the feel/look of the graphics/steps here much better. I think the organization into boxes is better. The previous version looked somewhat like a big ball of spaghetti. I'm not super satisfied with the headers/fonts in the headers/curvy boxes/background but it's a step in the right direction. Since people were translating the old version to other languages, I feel I should push out my latest release, though it is still very much a work in progress. Anyone have suggestions for i18n on svg?

I'm open to feedback of any kind (design feedback from designers would be lovely). Am also thinking it might be nice to have some sort of magit overlay, to show how the magit commands correspond. Also might want to have a backside that explains gory details like the internal structure, examples of specs, ranges, etc. I'm not sure these would easily fit into the current sized boxes.

Git Supervisual Cheatsheet

2010.11.25 Thu 12:43 A GMT-07

I've yet to find the git quick reference that I like, so I started making one tonight. Here's a draft release.

Source is at github if anyone with a design bent is interested in collaborating. Let me know if you think something is missing or unclear.

Scott McNealy keynote at PgWest

2010.11.04 Thu 11:04 A GMT-07

Scott McNealy (former Sun CEO) gave a surprisingly refreshing, albeit somewhat rambling, keynote at PgWest 2010. He introduced it by informing everyone that he only lasted 2 days at Oracle before he was given the boot. He is currently not receiving any salary, not serving on any boards, but is "helping" 35 different startups. He indicated that because he was no longer responsible to a board, he could basically say whatever he wanted, but warned that you also get what you pay for (and he wasn't paid to talk).

He mentioned some things he might talk about, such as Social Networking. He referred to it as a "Digital Tattoo" for the next generation, and noted he was "glad I don't have a Facebook account if I ever run for office."

Cloud or "the computer is the network" was another popular buzzwordy topic he could have talked about.

He decided to talk about Open Source (OS). Sun had a big R&D budget and a large % of that went to open source. Scott figures that businesses can be a partner to OS or a problem for OS.

For Sun, not having a DB proved to be an "Achilles Leg", hence the moves to support Postgres before eventually buying MySQL. Oracle and MS were stealing customers because they had a DB.

As an attempt to play Letterman, McNealy had a "Top 10 Reasons your DB Company has you have the 'whatevers'", which included various jabs about the America's Cup. One I remember was "You are still waiting for DB3". Scott mentioned that this was written without a marketing department, so he could say what he wanted.

Scott then went into an interoperability spiel. There are some things that you just need to do, and it doesn't work to try to find the best way to do them. One example is driving. You don't want to let Darwin decide what side of the road you drive on. You just pick a side and stick to it. He likened this to OS. With open technology you tend toward a 0 cost of engineering, but you also move toward a "written" and "spoken language" of tech that people understand.

He also noted that OS is just better, more secure, cleaner, etc (because engineering's pride is on the line since they are exposing their code). According to him, the government is a fast adopter of OS for those reasons, faster even than tech companies. An anecdote was the upper levels asking why certain products weren't released. The managers replied, "it works, but the engineer doesn't want to release the code with his name on it without clean up/touch up/etc."

Now Data is a big deal. It is a CIO's number one problem. They need to manage, store, kill, retrieve and secure it. Note that there is no such thing as 100% security, given that someone has to have access to the data (and another person as well if you don't want a hit-by-a-bus problem). OS code is more secure because everything is exposed. Proprietary code is full of secrets; humans are good at discovering secrets (good for OS, bad for proprietary), but humans are BAD at keeping secrets. DOD data suggest that OS is an order of magnitude "safer".

But the #1 reason to use OS is the "Barrier to Exit". Technology has the "shelflife of a banana": by the time you buy it, it is brown and rotting. Scott explained that there are 3 costs to a technology purchase: the purchase price, the support/ongoing price and the Barrier to Exit (i.e. "how do I move off?") price. No one wants to talk about the 3rd, because it is the largest price, Scott claiming that it is 10X the first 2 costs. Then companies base the price for the next version at "$5 less than the barrier to exit", in effect locking you into their rotting banana technology. There is no contract negotiation, only contract explanation.

Scott claimed that contracts should have the vendor agreeing to pay the Barrier to Exit cost. He made the comparison that switching costs for cars are 3 minutes (you just need to know where the switches for lights, windows and radio are). Switching costs from DB2 to Oracle are prohibitive. So IBM can price the next version at $(prohibitive - 5).

For Open Source communities, Scott advised to take advantage of commercial businesses before Oracle buys them. Private enterprise can be very good for OS and help them in ways and with resources that can be very valuable.

Scott likes the BSD style license of Postgres.

Data is the new hotness. Is Hadoop a friend or enemy of Postgres? Anything that is open should be a friend. MySQL (community) should be a friend of Postgres.

OS devs should make it easier for "normal" people to use their stuff. Engineers are NOT "normal". Make it accessible to mere mortals. If OS is usable, it "will save world a lot of cash that is going to America's Cup races."

He noted that the subscription prices for MySQL had doubled that very morning, and that DB software can cost as much as 35% of an IT department's budget.

One project that he is working on is Curriki, an online community that has 40K learning assets serving 150K teachers. He wants to lower the 8-15 Billion $ spent on curriculum for K-12 every year. Help out if you are interested.

There was time for Q&A.

"Where will PG and MySQL be in 5 years?" Both will be doing well. SkySQL is a new fork that should help since "sharing is not Larry's middle name". PG should reach out to MySQL and share and provide for interop. Both should be partners in "getting out of the data roach motel".

Oracle vs Google lawsuit? Interesting that Oracle told Sun to loosen licensing scheme for Java when Oracle didn't own it. Now that has changed. Scott explained that he is very much in favor of IP and patents. If there were no Pharma patents, R&D would drop massively and we would get sick and die earlier. In favor of capitalism which creates "National Economic Heroes" (wealthy people who pay tons of taxes). Price controls are also bad, make less incentive to develop drugs. "Do I have a problem with Larry exercising IP rights? No. I was a good capitalist, he is a great capitalist. I believe in sharing for fundamental software infrastructure. [Larry] may do better with his model". Potentially a settlement could lead to more money put into Java R&D.

WRT Sun's OS projects: "I wouldn't count on Oracle sponsored releases to the community"

What will happen to Java? "I wasn't a visionary. I was good at telling people what has already happened that they should take advantage of. I think others will fork..." Nexenta is doing that for OpenSolaris, LibreOffice for OpenOffice, and a middleware company in Europe is doing it for Java middleware.

OS is putting your code in escrow. But after Oracle bought BEA, they would only migrate off of BEA to the Oracle app server, and because the code was proprietary, the lock-in cost is very high.

What is underserved in cloud/data/analytics? Biggest issue for those looking to move to the cloud isn't backup or encryption. The issue is transactions. Data could be stolen at the point of creation. People are worried about txn being intercepted by a virus in the cloud. Until cloud txn appear, people will "server hug".

Although it was a talk down a winding path, I found a nice honesty that you wouldn't have seen had he still been an Oracle (or Sun) employee. Thanks for the insight Scott.

Python and emacs (7): Buffer navigation (point history)

2010.10.28 Thu 9:51 A GMT-07

(Not really Python specific, but useful when navigating/editing Python files.)

Emacs has a notion of current location known as the "point". A while back I wanted a way of keeping track of point history (similar to browser history), so if say I jumped to another buffer and location I could come back later. I figured this would be an existing feature, just hidden in the bowels of emacs somewhere. I ended up finding half a solution in John Connors' emacswiki page. I say half because I wanted the forward buttons (to use my browser metaphor), which he didn't have. Alas, point-stack is now on github.

Here's a simple screencast:

With point stack, I can push my current location to a stack with f5. (I can push multiple locations too). To jump back I hit f6 (multiple times if needed). To go back forward, f7.
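The back/forward behaviour described above can be sketched in Python as two stacks. This is only an illustration of the browser-history idea, not point-stack's actual Emacs Lisp; the class name `PointHistory`, its method names, and the `(buffer, position)` tuples are all made up for the example.

```python
class PointHistory:
    """Browser-style history of (buffer, position) locations.

    Hypothetical sketch: not the real point-stack implementation.
    """

    def __init__(self):
        self.back = []      # locations we can jump back to
        self.forward = []   # locations we backed out of

    def push(self, location):
        """Remember a location (f5 in the post); drops any forward history."""
        self.back.append(location)
        self.forward.clear()

    def go_back(self, current):
        """Jump back (f6). Returns the location to visit, or current if none."""
        if not self.back:
            return current
        self.forward.append(current)
        return self.back.pop()

    def go_forward(self, current):
        """Jump forward again (f7). Returns current if there is nothing ahead."""
        if not self.forward:
            return current
        self.back.append(current)
        return self.forward.pop()


history = PointHistory()
history.push(("a.py", 10))
history.push(("b.py", 20))
here = history.go_back(("c.py", 30))   # lands on ("b.py", 20)
```

Clearing the forward stack on a fresh push mirrors what browsers do; whether an editor should behave that way is a design choice, and a different tool might reasonably keep it.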

Dear Lazyweb, best of breed flow based/pipeline programming in Python

2010.10.25 Mon 11:11 A GMT-07

Yes, Python makes it easy to roll your own. I want to use pipeline programming soon (I actually started my own a while back), but upon further investigation there are quite a few existing tools/libraries:

  • Pypes - "Pypes provides a scalable, standards based, extensible platform for building ETL solutions. Most commercial platforms have steep learning curves and try to generalize too much of the process. Pypes provides a simple yet powerful framework for designing custom data processing workflows using components you write. In turn, it takes care of scalability and scheduling semantics."
  • papy - "The papy package provides an implementation of the flow-based programming paradigm in Python that enables the construction and deployment of distributed workflows."
  • Another pype - A simpler module for chaining operations
  • zFlow - "Flow-based Programming Library using Python generators, loosely based on J.P. Morrison's book of the same name."
  • PyF - "PyF is a python open source framework and platform dedicated to large data processing, mining, transforming, reporting and more."

I'm sure there are more. Any good ones I'm missing? I have yet to see any comparison of these libraries/frameworks(?). Pypes and PyF have fancy websites and GUI tools. Pypes requires Stackless. I've created a wiki page about this on python.org. Feel free to comment if you have used one, or update the wiki.
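For a sense of what "roll your own" can look like, here is a minimal generator-based sketch. It does not use any of the libraries listed above, and the stage names (`read_numbers`, `only_even`, `squared`) and the `pipeline` helper are invented for the example.

```python
def read_numbers(lines):
    """Source stage: turn non-blank text lines into integers."""
    for line in lines:
        line = line.strip()
        if line:
            yield int(line)


def only_even(numbers):
    """Filter stage: pass along even numbers only."""
    for n in numbers:
        if n % 2 == 0:
            yield n


def squared(numbers):
    """Transform stage: square each number."""
    for n in numbers:
        yield n * n


def pipeline(source, *stages):
    """Chain stages so each one lazily consumes the previous one's output."""
    stream = source
    for stage in stages:
        stream = stage(stream)
    return stream


result = list(pipeline(["1", "2", "", "3", "4"], read_numbers, only_even, squared))
print(result)  # [4, 16]
```

Because every stage is a generator, nothing runs until the final `list()` pulls items through, which keeps memory use flat for large inputs. It offers none of the scheduling, distribution or GUI features the frameworks above advertise.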

Tuning Postgres

2010.10.14 Thu 1:09 P GMT-07

A few months back I set out to tune Postgres a little bit. After a little bit of searching I ran into pgtune, a Python script that "takes the wimpy default postgresql.conf and expands the database server to be as powerful as the hardware it's being deployed on." Instead of actually tuning my db, I ended up getting sidetracked while writing some patches for the code. (I'm still not doing much tuning, haven't run the script, basically I've been changing work_mem).

Recently at UTOSC, Joshua Drake gave a talk on Dumb Simple Postgres Performance. (I think the title is somewhat of a misnomer, anything that has to do with Postgres tuning is by definition not simple). Needless to say the default configuration assumes an ancient machine and that you are being ultra paranoid about data (JD said of one setting that defaults to on, "don't turn this on unless you are a stock broker"). After he covered some dozen settings or so, I figured I should actually get around to tuning. I should note that many of the settings he suggested are in line with what pgtune does.

So to further get away from actually tweaking my settings, I started another project, PgTweak. This tool does two things: it extracts queries from log files, and it runs those queries with different combinations of settings, allowing you to pick the best settings.
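The two steps can be sketched roughly as follows. This is not PgTweak's code: the function names are hypothetical, the log format is assumed to be single-line "LOG:  statement: ..." entries, and the candidate values for work_mem and random_page_cost are arbitrary. It only builds the SET statements; actually running and timing the queries is left out.

```python
import itertools
import re

# Assumption: statements are logged on one line as "LOG:  statement: <sql>".
STATEMENT_RE = re.compile(r"LOG:\s+statement:\s+(.*)$")


def extract_queries(log_lines):
    """Yield the SQL text of each logged statement line."""
    for line in log_lines:
        match = STATEMENT_RE.search(line)
        if match:
            yield match.group(1).strip()


def setting_combinations(candidates):
    """Yield one dict per combination of candidate setting values."""
    names = sorted(candidates)
    for values in itertools.product(*(candidates[name] for name in names)):
        yield dict(zip(names, values))


def as_set_statements(settings):
    """Render a settings dict as session-level SET statements."""
    return ["SET {} = '{}';".format(name, value)
            for name, value in sorted(settings.items())]


log = [
    "2010-10-14 13:09:00 MDT LOG:  statement: SELECT count(*) FROM orders;",
    "2010-10-14 13:09:01 MDT LOG:  duration: 12.3 ms",
]
candidates = {"work_mem": ["4MB", "64MB"], "random_page_cost": ["2.0", "4.0"]}

queries = list(extract_queries(log))
combos = list(setting_combinations(candidates))
```

Each combination would then be applied in a session before timing every extracted query. Note that the number of combinations multiplies quickly as settings are added.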

I've still got work to do. It'd probably be interesting to have a (google) spreadsheet or table in the wiki page listing the settings and how the different tools/developers say to set them. Obviously it depends on the type of work your database is doing, but I'm not sure that even among postgres developers there is a consensus. Oh yeah and actually change the settings on my db.