About

Welcome to Panela, Matt Harrison's take on mostly Open Source, Linux, Python, innovation in those areas, other buzzwords and Dick Proenneke. It comes complete with the illustrations as needed. Note the opinions expressed here are merely my opinions and not the opinions of my employer. I also tweet.

Python at Google (Greg Stein - SDForum)

posted 2006.02.01 Wed
As promised, here is the writeup of Greg Stein's keynote last week at the SDForum Python Meeting. All text in italics is my own commentary.  This is a repeat of the talk he gave last year at PyCon.  He began with his introduction to Python and how he started using it at eShop (after he saw an early demo of what eventually became Zope).  That led into an interesting timeline of the companies he has worked for and their attitudes toward Python.

Attitudes toward Python

  • 1995 - eShop - What is python?
  • 1996 - eShop - Python is great and making us successful
  • 1997 - MS (acquired eShop) - Shipped embedded python in their product
  • 1998 - MS - We are using python?  That was a "prototype".  Let's re-write it.  (The re-write in .asp was slower)
  • 2001 - Collabnet - We don't use python
  • 2003 - Collabnet - Write that in python
  • 2005 - Google - Of course we use python
Greg said that some forward thinking companies are willing to adopt technologies that make them more productive, while some more pragmatic companies will not adopt any new technology unless they see examples of other large companies using it successfully.  Me:  Interesting that MS is now using python again.  Apparently, they have crossed the chasm!  (or python has)...

Greg claimed that using Python was very advantageous to eShop; in fact, it was a "secret sauce".  Here are some of the ways Python helps businesses:
  • Highly adaptable.  One can deal with changing requirements easily.  One can also deal with changes in computing environments easily.  Both of these can come up in the development lifecycle.
  • Rapid development.
  • Easy to maintain.

Google's Engineering Process

At Google, Python is one of the 3 "official languages", alongside C++ and Java.  Official here means that Googlers are allowed to deploy these languages to production services.  (Internally Google people use many technologies including PHP, C#, Ruby and Perl.)  Python is well suited to the engineering process at Google.  The typical project at Google has a small team (3 people) and a short duration (3 months).  After a project is completed, developers may move to other projects.  Larger projects are sub-divided into 3-month deliverables, and teams get to choose their own language for their project.  Engineers are given their 20% time to work on what they want to at Google.  Many new ideas spring from this 20% work, and "bottom up" seems to be the mantra at Google.  Greg stated that architecture and design were not mandated from the top; rather, the teams working on these projects were given the freedom to suggest and deliver.

SWIG is your friend

Google makes extensive use of SWIG.  Greg indicated that SWIG has improved a lot in recent years.  All C++ projects at Google have SWIG generators created at build time, so Python programmers can take advantage of this work.  Greg said that neither Boost nor ctypes was as direct or clean as using SWIG.

RPC

RPC is the method Google uses to scale horizontally so well.  They have their own internal binary wire format that speaks over HTTP.  Programmers can easily speak this format using Java, C++ or Python.  Using RPC allows Google to divide computing problems up across large numbers of servers.
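As a rough illustration of what "a binary wire format" means in practice, here is a minimal sketch of framing an RPC call as bytes. The layout (a 2-byte method id, a 4-byte payload length, then the payload) and the names `encode_call` and `decode_call` are my own assumptions for the example. Greg did not describe Google's actual format, and this is not it.

```python
import struct

# Hypothetical frame layout, NOT Google's format:
#   2-byte method id, 4-byte payload length (network byte order), payload.
HEADER = struct.Struct("!HI")


def encode_call(method_id, payload):
    """Pack a call into bytes that could travel as an HTTP request body."""
    return HEADER.pack(method_id, len(payload)) + payload


def decode_call(frame):
    """Unpack a frame produced by encode_call, checking its length."""
    method_id, length = HEADER.unpack_from(frame)
    body = frame[HEADER.size:HEADER.size + length]
    if len(body) != length:
        raise ValueError("truncated frame")
    return method_id, body


frame = encode_call(7, b"query=python")
print(decode_call(frame))
```

Any language that can pack and unpack the same bytes can take part, which is presumably why the same services can be reached from Java, C++ or Python.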

Python at Google

Internally Google has been using Python 2.2.  It has been hard for them to move forward to 2.3 or 2.4 because of the large number of machines that they have, and they need compatibility among those machines.  (I'm assuming this is more of an IT issue, since Python is pretty good at backwards compatibility, but I guess if you deploy 2.4 and start using decorators, any machine running 2.2 will choke.)  Greg said that they will soon try to move to 2.4.
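To make the decorator point concrete: the `@` syntax was added in Python 2.4, so a file that uses it is rejected with a syntax error by an older interpreter before any code runs. The sketch below shows the two equivalent spellings. The class and method names are made up for illustration.

```python
# Hypothetical names, for illustration only.

class OldStyle(object):
    # Pre-2.4 spelling: define the function, then rebind it by hand.
    def double(x):
        return 2 * x
    double = staticmethod(double)


class NewStyle(object):
    # 2.4+ spelling: the decorator line is new syntax, so an older
    # interpreter rejects the whole file at parse time.
    @staticmethod
    def double(x):
        return 2 * x


print(OldStyle.double(21), NewStyle.double(21))
```

Both classes behave the same, so code that has to run on mixed interpreter versions can stick to the first spelling until every machine is upgraded.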

Python programmers at Google must follow a strict style guideline (based on PEP 8, with 2-space indentation).  When engineers are first granted commit access to the SCM system, they must pass a style test.  All code must pass through two sets of eyes before being checked in.  That, combined with liberal doses of unittest, pychecker and code coverage, eliminates most non-algorithmic issues that might appear in Python code.

Where is Python used?

  • The Google build system is written in Python.  All of Google's corporate code is checked into a repository, and the dependencies and building of this code are managed by Python.  Greg mentioned that creating code.google.com took about 100 lines of Python code.  But since it has so many dependencies, the build system generated a 3 megabyte makefile for it!
  • Packaging.  Google has an internal packaging format like RPM.  These packages are created using python.
  • Binary Data Pusher.  This is the area where Alex Martelli is working, on optimizing pushing bits between thousands of servers.
  • Production servers.  All monitoring, restarting and data collection functionality is done with Python.
  • Reporting.  Logs are analyzed and reports are generated using Python.
  • A few services, including code.google.com and Google Groups.  Most other front ends are in C++ (google.com) and Java (Gmail).  All web services are built on top of a highly optimized HTTP server wrapped with SWIG.

Open Sourced Code

Greg commented that the code Google has released as Open Source thus far is not too interesting.  But hopefully that will change in the future.  He noted that they will probably release their packaging system.

Performance of Python Code

Greg said that Python is really rarely a bottleneck at Google.  (With bits going over a network and hitting a database, both of these will have an impact before Python even comes into the picture.)  As mentioned previously, MS ported the eShop code to .asp because "Python is interpreted and has to be slow" (not a direct quote, but something like that).  When the port to .asp was done, the code was in fact slower!

When programming in Python, one should design for scalability.  If you design properly, then you should be able to just throw more machines at the problem and your app should scale fine.

Greg's final point on performance was that you can always write in C/C++ and wrap it if you need to.  This is a way to justify the use of Python to higher-ups who may disapprove of its use.  Greg stated (and this is a direct quote), "People have been saying [wrap C/C++] for 10 years.  I've never done this once!"

Questions/Observations

Alex Martelli commented that you can throw more machines to solve the bandwidth problem, but you can't do that to solve the latency problem.
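A back-of-the-envelope sketch of Alex's point, under assumptions that are entirely mine: a 1000 MB payload, 100 MB/s per machine, a fixed 50 ms round trip, and perfect parallelism. The function name `transfer_time` is made up for the example.

```python
def transfer_time(size_mb, latency_s, bandwidth_mb_s, machines):
    """Seconds to move size_mb when the payload is split evenly across machines.

    Assumes perfect parallelism: each machine moves size_mb / machines,
    and every request still pays the full round-trip latency once.
    """
    return latency_s + (size_mb / machines) / bandwidth_mb_s


# Illustrative numbers only: 1000 MB payload, 50 ms latency, 100 MB/s each.
for n in (1, 10, 100, 1000):
    print(n, round(transfer_time(1000.0, 0.05, 100.0, n), 3))
```

Adding machines shrinks the bandwidth term toward zero, but the total never drops below the 0.05 s latency term, no matter how many machines are added.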

Someone asked about the use of MySQL in Google.  Greg said, "We use it specifically because it is open source".  When you buy from a vendor you are subject to the whims of that vendor.  Google had an instance where they were using proprietary software and needed a feature to be added.  The vendor said no.  Google offered to pay for a developer.  The vendor said no!  Google obviously wasn't happy with this, and open source is one solution.  (Obviously not having to pay Oracle a per cpu license for 1000's of machines is probably another reason, but not one that Greg mentioned).

There was talk about testing and code quality in Python.  Greg said that code coverage is what helps with an interpreted language: once you have coverage on a line, you know it will "compile", much as it would in a C or Java program.  So you can get around typos somewhat with that.

Someone asked why Google even used Java at all.  Greg appeared to bite his tongue and then said that there are a lot of good Java programmers out there and Google hires a few of them.

On the issue of catching bugs, Greg mentioned that because they are running code on tens of thousands of machines, they see rarely occurring bugs A LOT more often than most people.  They've run into obscure kernel bugs that other people rarely run into.  So, he said, Google's production code is quite good, because bugs get exposed early and often.

There was a question on the GIL (global interpreter lock) in python.  Greg said he tried to remove the GIL a few years back, but it was a quick hack and slowed some things down.  He said that trying to remove it now would prove very difficult (hopefully PyPy helps here).  But Greg's suggestion was to use RPC instead of threading.

All in all it was a very good talk, providing interesting insight from a man who has a lot of experience using Python.  It was also fun to see where Google is using python inside.





1. Mike A. Owens left...
2006.02.04 Sat 3:08 am :: http://mike.filespanker.com/

Thanks for taking the time to write this summary.

It's refreshing to see such a large tech company not attempting to enforce "one true way" technology mandates on the engineers.


2. GoogleEmployee left...
2006.02.04 Sat 10:18 am

RPC is the method Google uses to scale horizontally so well. They have their own internal binary wire format that speaks over http. Programmers can easy speak this format using Java, C++ or Python. Using RPC allows Google to divide computing problems up across large numbers of servers.

Not true. Google uses 9P, not http.


3. test left...
2006.03.07 Tue 1:19 am

> Not true. Google uses 9P, not http.

I would very much like to know how you know that.


4. Matt left...
2006.03.16 Thu 10:26 am

Re: 9P, take it with a grain of salt.


5. Roman left...
2006.06.19 Mon 11:34 pm :: http://www.language-binding.net/

Hi. I develop a code generator for the boost.python library. This code generator introduces a few ideas not found in others. It keeps support and development time to a minimum.

You can study the following use case, Python bindings for the Ogre engine:

http://lakin.weckers.net/index_ogre_python.html
http://www.ogre3d.org/phpBB2addons/viewtopic.php?t=1478

I hope you will find it interesting and useful.


6. Harish Mallipeddi left...
2007.02.16 Fri 6:38 am

Hi,

How does PyPy solve the GIL problem? I recollect that PyPy is an implementation of Python in Python.