Performance Comparison – C++ / Java / Python / Ruby / Jython / JRuby / Groovy

Posted by – July 8, 2008

Update (READ CAREFULLY): I am starting to see hyperlinks to this post with only some of the findings used as the link title (e.g. X is 100 times faster than Y, X faster than Z). I emphasise once again that I carefully indicated in the original post that this is but one of many possible microbenchmarks and that you should treat the results as one of many data points. Given the comments I’ve received and some of the links I’ve seen to this post, if I were to write this post anew, I would title it “Implementing an identical object oriented solution to the Josephus Problem in Java / C++ / Ruby / JRuby / Python / Jython / Groovy and measuring the performance results thereof.”

This post compares performance across various languages for a specific microbenchmark (actually it isn’t really a microbenchmark; it is simply a benchmark for a specific piece of logic, but that’s the closest word I could think of).

Last week, while preparing for a presentation – Contrasting Java and Dynamic Languages – I came across this interesting Perl/Python/Ruby Comparison, which focused on comparing the code style of different languages. I thought it would be interesting to use the same problem to get some actual benchmarks. Note that you could also use the code segments below to get a feel for different syntactic flavours. However, since I have striven to keep the code in each language as similar as possible, some of the advanced syntactic sugar of the dynamic languages is not on display here.

Problem Statement

Quoting from the post linked to above :

Flavius Josephus was a Roman historian of Jewish origin. During the Jewish-Roman wars of the first century AD, he was in a cave with fellow soldiers, 40 men in all, surrounded by enemy Roman troops. They decided to commit suicide by standing in a ring and counting off each third man. Each man so designated was to commit suicide… Josephus, not wanting to die, managed to place himself in the position of the last survivor.

In the general version of the problem, there are n soldiers numbered from 1 to n and each k-th soldier will be eliminated. The count starts from the first soldier. What is the number of the last survivor?
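
For small inputs the answer can be checked independently of any of the implementations in this post. The sketch below uses the well-known Josephus recurrence rather than the linked-list approach benchmarked here, and the function name `josephus` is only an illustrative choice:

```python
def josephus(n, k):
    """Return the 1-based position of the last survivor among n soldiers
    when every k-th soldier is eliminated, counting from soldier 1."""
    survivor = 0  # 0-based position of the survivor in a circle of size 1
    for size in range(2, n + 1):
        # Growing the circle by one soldier shifts the survivor by k places.
        survivor = (survivor + k) % size
    return survivor + 1  # convert to the 1-based numbering of the problem


print(josephus(40, 3))  # 28
```

With 40 soldiers and every third one eliminated, this gives position 28 when counting from 1, which corresponds to 27 when counting from 0.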


Design

I actually changed the design of the solution as compared to the original post. Instead of using the deeply recursive calls as used in the earlier post, I decided to split the logic into two classes, and use loop iteration instead of recursion. It is my belief that we tend to do loop iterations far more frequently than recursions, and the resultant class design having two classes – one to indicate a Chain and one reflecting a Person seemed more appropriate to me.

Logic
The Chain object contains a reference to one person (first) who is but one member of a circular linked list. Each person object has a reference to its previous (prev) and next (next) person in the circle. When the kill loop starts, it sets a threshold (nth). The count starts with 1 from the first person. Each person, when asked to shout, checks whether the shout count (shout) is less than the threshold (nth). If it is less, the person just returns an incremented count. If the two are the same, the person in effect commits suicide. In doing so, the person updates the next reference of its prev and the prev reference of its next, taking itself off the circle while keeping the circle consistent, and finally returns a shout of 1 (which is what the next person in the list will shout).
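
To make the shout logic concrete, here is a minimal Python 3 sketch that follows the description above on a small circle and records who drops out. The helper `elimination_order` and the 1-based numbering are assumptions added for illustration; the benchmarked listings further down neither record the order nor print anything.

```python
class Person:
    def __init__(self, count):
        self.count = count
        self.prev = None
        self.next = None

    def shout(self, shout, nth):
        # Below the threshold: pass an incremented count to the next person.
        if shout < nth:
            return shout + 1
        # Threshold reached: unlink this person and restart the count at 1.
        self.prev.next = self.next
        self.next.prev = self.prev
        return 1


def elimination_order(size, nth):
    """Return (order in which people drop out, number of the survivor)."""
    people = [Person(i + 1) for i in range(size)]
    for i, person in enumerate(people):
        person.next = people[(i + 1) % size]
        person.prev = people[i - 1]
    order = []
    current, shout = people[0], 1
    while current.next is not current:
        shout = current.shout(shout, nth)
        if shout == 1:  # shout() only returns 1 when the person removed itself
            order.append(current.count)
        current = current.next  # a removed person still points at its old neighbour
    return order, current.count


print(elimination_order(5, 3))  # ([3, 1, 5, 2], 4)
```

Note that a removed person keeps its own next reference, which is what lets the loop step from it to the following person.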

The code does not have any comments (sorry!) and all the console outputs have been removed so that the benchmarking activity is not interfered with by the IO overheads.
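
Each listing below times a fixed number of iterations and divides the elapsed time by the iteration count. A hedged Python 3 sketch of that pattern follows. The `repeats` parameter and taking the best of several passes are additions for illustration (the listings time a single pass), and `time_per_iteration_us` is a hypothetical helper name:

```python
import time


def time_per_iteration_us(fn, iterations=100_000, repeats=3):
    """Call fn() `iterations` times per pass and return the best
    per-call time, in microseconds, seen over `repeats` passes."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(iterations):
            fn()
        elapsed = time.perf_counter() - start  # seconds for the whole pass
        best = min(best, elapsed * 1_000_000 / iterations)
    return best


# Example workload standing in for "build a chain of 40 and run kill(3)".
print(time_per_iteration_us(lambda: sum(range(40)), iterations=10_000))
```

Nothing is printed inside the timed loop, in line with the point above about keeping IO out of the measurement.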

The results

All the results are as observed on my notebook with the following config
OS : Ubuntu Gutsy Gibbon 7.10
Kernel : 2.6.22-15-generic
CPU : Intel(R) Core(TM) Duo CPU T2600 @ 2.16GHz
RAM : 2GB

| Language | Version | Lines of Code | Time per iteration (microseconds) |
|---|---|---|---|
| Java | Sun JDK 1.6.0.03 | 86 (previously 101) | 1.6 |
| C++ | 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2), compiled with optimisation -O3 | 86 | 3 |
| | gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu7), compiled with optimisation -O3, Alberto Bignotti’s modified code with customised memory reuse and management | 124 | approx 0 |
| Ruby | ruby 1.9.0 (2008-04-14 revision 16006) [i686-linux] | 63 | 89 (previously 114) |
| | ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux] | | 380 (previously 372) |
| | jruby : ruby 1.8.6 (2008-05-28 rev 6586) [i386-jruby1.1.2] | | 80 (previously 84) |
| Python | 2.5.1 | 41 | 192 (previously 225) |
| | 2.5.1 with psyco | | 33 |
| | Jython 2.2.1 on JRE 1.6.0.03 | | 632 (previously 884) |
| Groovy | Groovy Version: 1.5.6 JVM: 1.6.0_03-b05, uncompiled | 81 | 363 |
| | Compiled to bytecode and run using java | | 360 |
| | Update: Groovy Version: 1.6-beta-1 JVM: 1.6.0_03 | | 104 |
| PHP | PHP 5.2.3-1ubuntu6.3 (cli) | 85 | 593 |



Updates :

  • Ken suggested syntactic improvements (see comments below) which led to different Ruby execution times: jruby : 80 microseconds, ruby 1.9 : 89 microseconds, ruby 1.8.6 : 380 microseconds (the last one increased). The table above has been updated.
  • Cato requested a run using Groovy 1.6 beta 1 – have updated the same. Big improvement.
  • Nicholas Riley suggested introducing slots and using “is not” and “is” in the if conditions for the Python code. Updated the results to reflect the figures of 192 and 632 microseconds for CPython and Jython respectively. The figures were 182 microseconds for CPython and 131 microseconds for Jython if I did not use new style classes; however, I did not reflect these, since most new code is likely to be using new style classes. This does indicate one possible performance optimisation if your code does not depend upon new style classes. It also makes me really interested in the Jython performance optimisations for new style classes that Nicholas suggests are on their shortlist.
  • Tim Fountain indicates in a comment below that on his hardware (Core 2 Quad) with Ubuntu Hardy Heron, Ruby 1.8.6 (same version as above) performs somewhat faster (15%), whereas upgraded versions of Python and PHP run much faster (63% for Python and 83% for PHP). Another difference in config: he is running 64 bit.
    • python – version 2.5.2: – 138 microseconds
    • ruby – 1.8.6 (2007-09-24 patchlevel 111) [x86_64-linux] :321 microseconds
    • PHP – 5.2.4-2ubuntu5.1 with Suhosin-Patch 0.9.6.2 (cli): 323 microseconds.
  • Peter Lupo requested a reduction in the line count for Java, since the conventional style does not put opening braces on a separate line. Given that it is a fair comment, I have reduced the line count of Java to 86 (didn’t physically change the code – 86 = 101 – 15 opening curly braces).
  • Added another finding of using python with psyco
  • C++ – Added results using Alberto Bignotti’s alternative code with customised memory reuse management
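
The `__slots__` and “is” / “is not” changes mentioned in the updates are not shown in the Python listing further down. A rough Python 3 sketch of what they might look like follows. It is an assumption about the shape of the change, not the exact code that produced the 192 microsecond figure. Note also that in Python 3 every class is a new style class, so the old style comparison no longer applies.

```python
class Person:
    # __slots__ replaces the per-instance __dict__ with fixed attribute slots.
    __slots__ = ("count", "prev", "next")

    def __init__(self, count):
        self.count = count
        self.prev = None
        self.next = None

    def shout(self, shout, deadif):
        if shout < deadif:
            return shout + 1
        self.prev.next = self.next
        self.next.prev = self.prev
        return 1


class Chain:
    __slots__ = ("first",)

    def __init__(self, size):
        self.first = None
        last = None
        for i in range(size):
            current = Person(i)
            if self.first is None:  # identity test instead of == None
                self.first = current
            if last is not None:    # identity test instead of != None
                last.next = current
                current.prev = last
            last = current
        self.first.prev = last
        last.next = self.first

    def kill(self, nth):
        current = self.first
        shout = 1
        while current.next is not current:
            shout = current.shout(shout, nth)
            current = current.next
        self.first = current
        return current


print(Chain(40).kill(3).count)  # 27 (people are numbered from 0 here)
```

Identity comparison avoids going through the equality protocol on each check, which is presumably where the saving comes from.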


Summarisation

The following are the results. Given the long code blocks, I am presenting the summarisation first followed by the code.

Note: This can only be treated as one particular benchmark. The results are a little atypical with respect to my general understanding. I advise caution against drawing broad conclusions based on this benchmark alone, but suggest that you treat this as one data point amongst many. People better versed than me in the details of language runtimes might be able to suggest why some of the results seem surprising or atypical.

  • Java / C++ Rock : The performance of Java and C++ was head and shoulders above the other languages (nearly 100 times faster). My thought is that while a difference of 10x was only to be expected, this difference was just way too massive.
  • Java is faster than C++ : Though I had read about other microbenchmarks reaching the same conclusion, this is the first time I actually ran one where Java was faster. There are many others I have run where C++ beats Java quite handsomely. More importantly, the performance of C++ worsened by almost 40% once I added code which started freeing the memory that was being allocated (there’s still a small memory leak in the code – there is no Chain destructor which will clean up first). I would definitely want to look later at the impact of garbage collection in this context, and at whether the Java garbage collector simply was much faster than the hand crafted new / delete calls in C++. Update: Using the customised memory management written by Alberto (which is not used in any of the other examples) but the same algorithm, C++ is much faster than Java.
  • Ruby 1.9 is twice as fast as Python : While it has been known for a while that Ruby 1.9 is much faster than Ruby 1.8.6, here’s one more supporting data point. I was expecting Ruby 1.9 to give Python a run for its performance money, but at least in this particular context it turned out to be considerably faster.
  • JRuby is faster than Ruby : Even Ruby 1.9. Very interesting indeed.
  • Jython still has some catching up to do : Though in the same ballpark as the other dynamic languages, it was the slowest in the pack.
  • Overhead of dynamism is dominant : I have no idea if JRuby ran much faster because of the Java bytecode or because of its implementation (though its performance was not even remotely close to that of Java). However, even after I compiled the Groovy code to Java bytecode, it still ran much slower than Python and Ruby. It seems the overhead of supporting dynamic constructs is much more dominant than any benefit that one gets out of compilation (whether to Java bytecode or to intermediate compiled files). I think the argument that because something compiles to Java bytecode it is likely to be fast should be looked at a little carefully.
  • PHP stays at the rear end : Though I benchmarked PHP for the first time, I wasn’t completely surprised by the fact that PHP could only manage to be faster than Jython.

Update : There are many comments on this post, including those from cwilbur, who benchmarks Perl using an idiomatic method, Paddy3118, who offers an optimised algorithm for Python, and peter lawrey, who offers an optimised algorithm for Java. Each of their solutions performs better than what has been described here. However, I believe any benchmark comparison should compare apples to apples. Should these contributions be taken into account and reflected in the table above? I certainly believe there is a case for doing so as an exercise using a different algorithm. To ensure that it is a fair comparison, though, one has to modify the code in all the other languages to reflect the same algorithm. Only then can we get an apples to apples comparison. That is probably an exercise for another post. Is the algorithm I have chosen the fastest? No. However, I believe it is a very readable algorithm, and if one ignores the IO with networks, databases and files, it is probably close to the kind of code many programmers write (and maintain) on a day to day basis. It has been consistently implemented in all the languages. Readers should be aware that there are algorithms which will deliver much better performance, but they will do so in all the languages (perhaps to slightly differing extents, and thus possibly with somewhat different results).

The code

For all of you who are interested in running it for yourself or would like to explore this in more detail, here’s the code. Note that I am not equally competent across all languages, so if you believe there is a more appropriate way to code the same, do post a comment. One of the things I have tried to do is to ensure that the code remains more or less similar across all languages. Also, I have used getters and setters or skipped them based on my understanding of the generally accepted convention for users of the language.

Java :

package com.dnene.josephus;

public class Chain
{
	private Person first = null;

	public Chain(int size)
	{
		Person last = null;
		Person current = null;
		for (int i = 0 ; i < size ; i++)
		{
			current = new Person(i);
			if (first == null) first = current;
			if (last != null)
			{
				last.setNext(current);
				current.setPrev(last);
			}
			last = current;
		}
		first.setPrev(last);
		last.setNext(first);
	}

	public Person kill(int nth)
	{
		Person current = first;
		int shout = 1;
		while(current.getNext() != current)
		{
			shout = current.shout(shout, nth);
			current = current.getNext();
		}
		first = current;
		return current;
	}

	public Person getFirst()
	{
		return first;
	}
	public static void main(String[] args)
	{
		int ITER = 100000;
		long start = System.nanoTime();
		for (int i = 0 ; i < ITER ; i++)
		{
			Chain chain = new Chain(40);
			chain.kill(3);
		}
		long end = System.nanoTime();
		System.out.println("Time per iteration = " + ((end - start) / (ITER )) + " nanoseconds.");
	}
}

// Person.java (a separate source file in the same package)
package com.dnene.josephus;

public class Person
{
	int count;
	private Person prev = null;
	private Person next = null;

	public Person(int count)
	{
		this.count = count;
	}

	public int shout(int shout, int deadif)
	{
		if (shout < deadif) return (shout + 1);
		this.getPrev().setNext(this.getNext());
		this.getNext().setPrev(this.getPrev());
		return 1;
	}

	public int getCount()
	{
		return this.count;
	}

	public Person getPrev()
	{
		return prev;
	}

	public void setPrev(Person prev)
	{
		this.prev = prev;
	}

	public Person getNext()
	{
		return next;
	}

	public void setNext(Person next)
	{
		this.next = next;
	}
}

C++

#include <cstdio>      // fprintf
#include <cstddef>     // NULL
#include <sys/time.h>  // gettimeofday, struct timeval

class Person
{

    public:

        Person(int count) : _next(NULL), _prev(NULL) { _count = count; }
        int shout(int shout, int nth)
        {
            if (shout < nth) return (shout + 1);
            _prev->_next = _next;
            _next->_prev = _prev;
            return 1;
        }
        int count() { return _count; }
        Person* next() { return _next; }
        void next(Person* person) { this->_next = person ; }
        Person* prev() { return _prev; }
        void prev(Person* person) { this->_prev = person; }
    private:
        int _count;
        Person* _next;
        Person* _prev;
};

class Chain
{
    public:
        Chain(int size) : _first(NULL)
        {
            Person* current = NULL;
            Person* last = NULL;
            for(int i = 0 ; i < size ; i++)
            {
                current = new Person(i);
                if (_first == NULL) _first = current;
                if (last != NULL)
                {
                    last->next(current);
                    current->prev(last);
                }
                last = current;
            }
            _first->prev(last);
            last->next(_first);
        }
        Person* kill(int nth)
        {
            Person* current = _first;
            int shout = 1;
            while(current->next() != current)
            {
                Person* tmp = current;
                shout = current->shout(shout,nth);
                current = current->next();
                if (shout == 1)
                {
                    delete tmp;
                }
            }
            _first = current;
            return current;
        }
        Person* first() { return _first; }
    private:
        Person* _first;
};

int main(int argc, char** argv)
{
    int ITER = 1000000;
    Chain* chain;
    struct timeval start, end;
    gettimeofday(&start,NULL);
    for(int i = 0 ; i < ITER ; i++)
    {
        chain = new Chain(40);
        chain->kill(3);
        delete chain;
    }
    gettimeofday(&end,NULL);
    fprintf(stdout,"Time per iteration = %ld microseconds\n", (long)((((end.tv_sec - start.tv_sec) * 1000000) + (end.tv_usec - start.tv_usec)) / ITER));
    //fprintf(stdout,"Last man standing is %d\n", (chain->first()->count() + 1));
    return 0;
}

Python

class Person(object):
    def __init__(self,count):
        self.count = count
        self.prev = None
        self.next = None
    def shout(self,shout,deadif):
        if (shout < deadif): return (shout + 1)
        self.prev.next = self.next
        self.next.prev = self.prev
        return 1

class Chain(object):
    def __init__(self,size):
        self.first = None
        last = None
        for i in range(size):
            current = Person(i)
            if self.first == None : self.first = current
            if last != None :
                last.next = current
                current.prev = last
            last = current
        self.first.prev = last
        last.next = self.first
    def kill(self,nth):
        current = self.first
        shout = 1
        while current.next != current:
            shout = current.shout(shout,nth)
            current = current.next
        self.first = current
        return current

import time
ITER = 100000
start = time.time()
for i in range(ITER):
    chain = Chain(40)
    chain.kill(3)
end = time.time()
print 'Time per iteration = %s microseconds ' % ((end - start) * 1000000 / ITER)

Ruby

class Person
    attr_reader :count, :prev, :next
    attr_writer :count, :prev, :next

    def initialize(count)
        #puts 'Initializing person : ' + count.to_s()
        @count = count
        @prev = nil
        @next = nil
    end

    def shout(shout, deadif)
        if shout < deadif
            return shout + 1
        end
        @prev.next = @next
        @next.prev = @prev
        return 1
    end
end      

class Chain
    attr_reader :first
    attr_writer :first

    def initialize(size)
        @first = nil
        last = nil
        for i in (1..size)
            current = Person.new(i)
            if @first == nil
                @first = current
            end
            if last != nil
                last.next = current
                current.prev = last
            end
            last = current
        end
        @first.prev = last
        last.next = @first
    end

    def kill(nth)
        current = @first
        shout = 1
        while current.next != current
            shout = current.shout(shout,nth)
            current = current.next
        end
        @first = current
        return current
    end
end

ITER=100000
start = Time.now
ITER.times { |i|
chain = Chain.new(40)
chain.kill(3)
}
ends = Time.now
puts 'Time per iteration = ' + ((ends - start) * 1000000 / ITER).to_s() + " microseconds"

Groovy

class Chain
{
    def size
    def first

    def init(siz)
    {
        def last
        size = siz
        for(def i = 0 ; i < siz ; i++)
        {
            def current = new Person()
            current.count = i
            if (i == 0) first = current
            if (last != null)
            {
                last.next = current
            }
            current.prev = last
            last = current
        }
        first.prev = last
        last.next = first
    }

    def kill(nth)
    {
        def current = first
        def shout = 1
        while(current.next != current)
        {
            shout = current.shout(shout,nth)
            current = current.next
        }
        first = current
    }
}

class Person
{
    def count
    def prev
    def next

    def shout(shout,deadif)
    {
        if (shout < deadif)
        {
            return (shout + 1)
        }
        prev.next = next
        next.prev = prev
        return 1
    }
}

def main(args)
{
    println "Starting"
    def ITER = 100000
    def start = System.nanoTime()
    for(def i = 0 ; i < ITER ; i++)
    {
        def chain = new Chain()
        chain.init(40)
        chain.kill(3)
    }
    def end = System.nanoTime()
    println "Total time = " + ((end - start)/(ITER * 1000)) + " microseconds"
}

def ITER = 100000
def start = System.nanoTime()
for(def i = 0 ; i < ITER ; i++)
{
    def chain = new Chain()
    chain.init(40)
    chain.kill(3)
}
def end = System.nanoTime()
println "Time per iteration = " + ((end - start)/(ITER * 1000)) + " microseconds"

PHP

<?php
class Person
{
    function __construct($c)
    {
        $this->count = $c;
    }       

    function getPrev()
    {
        return $this->prev;
    }           

    function setPrev($pr)
    {
        $this->prev = $pr;
    }   

    function getNext()
    {
        return $this->next;
    }

    function setNext($nxt)
    {
        $this->next = $nxt;
    }

    function shout($shout, $nth)
    {
        if ($shout < $nth)
        {
            return $shout + 1;
        }
        $this->getPrev()->setNext($this->getNext());
        $this->getNext()->setPrev($this->getPrev());
        return 1;
    }
}

class Chain
{
    var $first;

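    function __construct($size)
    {
        // Initialise $last so the first "!= null" check does not read an undefined variable.
        $last = null;
        for($i = 0; $i < $size ; $i++)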
        {
            $current = new Person($i);
            if ($this->first == null) $this->first = $current;
            if ($last != null)
            {
                $last->setNext($current);
                $current->setPrev($last);
            }
            $last = $current;
        }
        $this->first->setPrev($last);
        $last->setNext($this->first);
    }

    function kill($nth)
    {
        $current = $this->first;
        $shout = 1;
        while($current->getNext() !== $current)
        {
            $shout =  $current->shout($shout,$nth);
            $current = $current->getNext();
        }
        $this->first = $current;
    }
}

$start = microtime(true);
$ITER = 100000;
for($i = 0 ; $i < $ITER ; $i++)
{
    $chain = new Chain(40);
    $chain->kill(3);
}
$end = microtime(true);
printf("Time per iteration = %3.2f microseconds\n",(($end -  $start) * 1000000 / $ITER));

128 Comments on Performance Comparison – C++ / Java / Python / Ruby / Jython / JRuby / Groovy

Closed

  4. cwilbur says:

    Not surprisingly, solving the problem in idiomatic but clean Perl winds up being *extremely* fast:

    #!/usr/bin/perl

    use strict;
    use warnings;
    use diagnostics;

    use Benchmark qw(:all);

    use constant TOTAL => 40;
    use constant INTERVAL => 3;
    use constant DEBUG => 0;

    sub josephus
    {
    my %alive = map { $_, 1 } (0..TOTAL-1);
    my $current = -1;

    while (scalar keys %alive > 1)
    {
    $current += INTERVAL;
    $current %= TOTAL;

    while (!exists $alive{$current})
    {
    $current++;
    $current %= TOTAL;
    }

    delete $alive{$current};
    print "$current just died.\n"
    if DEBUG;
    }

    my ($survivor) = (keys %alive);

    print "last to die: $survivor\n"
    if DEBUG;

    return $survivor;
    }

    my $bm = timeit (10000, \&josephus);
    print timestr ($bm);

    On my 7-year-old Linux box, I get 6172 iterations per second with this code.

  6. cwilbur says:

    It helps to get the algorithm right.

    sub josephus
    {
    my %alive = map { $_, 1 } (0..TOTAL-1);
    my $current = -1;

    while (scalar keys %alive > 1)
    {
    for (1..INTERVAL)
    {
    do
    {
    $current = ($current + 1) % TOTAL;
    } until exists $alive{$current};
    }

    delete $alive{$current};
    print "$current just died.\n"
    if DEBUG;
    }

    my ($survivor) = (keys %alive);

    print "last to die: $survivor\n"
    if DEBUG;

    return $survivor;
    }

  7. Keith Sader says:

    Was there a reason you didn’t use java.util.LinkedList in your example?

  8. Stefano Bagnara says:

    I ran a test on my laptop (Core 2 Duo T5500 1.67GHz) running Windows Vista because I wanted to compare it to Quercus, too.

    Java:1.5 microseconds (1500 nanoseconds)
    PHP 5.2.6 win32: 1238.56 microseconds
    Quercus (Free/Interpreter) = 759,98 microseconds
    Quercus (Pro/Compiling) = 229,25 microseconds

  10. ahmetaa says:

    Interesting. However, your line count is not so accurate. Not that it matters, but you should not have counted opening brackets and spaces.

  11. @Keith

    I wanted a circular list where one wouldn’t need to actually access the list structure too frequently (directly pick off the references and go). So the list had to be circular and extend in both directions.

    The code could certainly have been written using a linked list, but I suspect it would be a lot slower.

  12. @Stefano

    Interesting. Thanks for introducing me to Quercus – I had no idea it existed.

    There was one more interesting observation from your comment (something we all know about but did not seem so much obvious until I read your comment) – your java code benchmark was almost the same as what I got, but PHP seemed half as fast. The difference – you were running it on Windows.

  13. @ahmetaa

    There are different ways to count LOC depending on the intent. What you are suggesting is certainly one way – that’s the one often used to measure coding productivity. That was not my intention – I did this exercise in the context of comparing languages and was really looking at brevity and readability (something I did not cover in this post). From that perspective it made sense to count all the lines – blank lines, braces etc.

    Having said that, there are still some minor errors in the line count (e.g. some extra blank lines in the Python code), but I don’t think that is significant.

  14. One more comment (that my captcha system did not let through) from Paddy. Definitely something I shall try out and report back on.

    Hi Dhananjay,

    Nice problem – but the results show how NOT to write it in Python. I took the problem and messed around a bit in the Python shell, looking at the patterns needed to go once through the loop, then successive times through the loop. I used a Python list as the data structure – there is very seldom a need to create linked lists of classes in Python. I boiled it down to a simple function of SIX lines:

    def findlast(chainlength = 40, kill = 3):
      firstinc, c = 1, range(1,chainlength + 1)
      while len(c)>1:
        #print firstinc, c
        c, firstinc = [x for n,x in enumerate(c) if (firstinc+n) % 3], \
                        (n+1 +firstinc) %3
      #print firstinc, c
      return c

    The relative timings are a speed up of around four times over your solution:

    Time per iteration for Chain = 386.570000648 microseconds
    Time per iteration for findlast = 94.0599989891 microseconds

    The big change is that I only need 6 lines to do it in. If you un-comment the print statements you might better follow the algorithm.

    I note that Chain(40).kill(3).count == 27, whereas findlast() == 28. On looking at your code in lines 29 and thirty you should beware a possible off-by-one error, as the original question has positions counting from 1 whereas you initialize Person’s count from zero.

    I like Python as even if my solution were wrong, I think it gives me the best chance of getting the algorithm right. Correct then fast, but only if you know what is fast enough.

    Thanks again for the interesting problem.

    – Paddy.

  15. ken says:

    Arguably idiomatic Ruby that makes it go faster:
    – instead of “for i in (1..size)”, “(1..size).each do |i|”
    – instead of “if last != nil”, just “if last”
    – slots are nil by default, so remove @prev/@next = nil in the initializer (also, @first=nil)
    – set something if it hasn’t been set with the “@first = current unless @first” form (or @first ||= current), which is also a bit faster (but ||= isn’t, oddly)

    Idiomatic Ruby that makes it shorter, but doesn’t help performance:
    – instead of attr_reader+attr_writer, just use attr_accessor
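    – single-statement ifs are often put on one line, like “return shout+1 if shout < deadif”
    – you don’t need to say “return” on the last value in a function, so strike “return current” (you can change “return 1” to just “1”, but it doesn’t save a line)

    That saves over 10 lines (63 -> 52), almost 25% on time (107 -> 78), and is more idiomatic Ruby to boot.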

  16. ken says:

    Your website ate half of my comment, so here’s the second half again:

    – single-statement ifs are often put on one line, like “return shout+1 if shout LESSTHAN deadif”
    – you don’t need to say “return” on the last value in a function, so strike “return current” (you can change “return 1″ to just “1″, but it doesn’t save a line)

    That saves over 10 lines (63 to 52), almost 25% on time (107 to 78), and is more idiomatic Ruby to boot.

  17. @Ken,

    Thanks for the suggestions. The updated timings are jruby : 80 microseconds, ruby 1.9 : 89 microseconds, ruby 1.8.6 : 380 microseconds (the last one increased)

  19. joe says:

    I never understood why microbenchmarks are run with bad algorithms. If performance was critical, I would not write the code the way it has been written above. Also, for Java, in performance critical situations, you should run the test multiple times to see how the JIT compilation affects the performance.

    Here is the code I would write for this. It works equally well in Java, C or C++.

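    public class Chain
    {
        protected static final String versionID = "@(#) $ Id: $ ";

        private int size;

        public Chain(int size)
        {
            this.size = size;
        }

        public int kill(int nth)
        {
            int[] people = new int[size];
            // NOTE: the array fill, the counter declarations and the loop header were
            // swallowed by the comment form (the text between '<' and '>' was dropped).
            // They are reconstructed here assuming zero-based ids and counters starting at 0.
            for (int i = 0; i < size; i++)
            {
                people[i] = i;
            }
            int curPos = 0;      // next slot to read in the current pass
            int fillPos = 0;     // next slot to copy a survivor into
            int curCount = 0;    // running count across passes
            int realSize = size; // number of people alive at the start of the pass
            while (realSize > 1)
            {
                curCount++;
                if (curCount % nth != 0)
                {
                    // Survivor: compact it towards the front of the array.
                    people[fillPos] = people[curPos];
                    fillPos++;
                }
                curPos++;
                if (curPos == realSize)
                {
                    // End of a pass: shrink the ring by the number killed and wrap around.
                    realSize -= (curPos - fillPos);
                    curPos = 0;
                    fillPos = 0;
                }
            }
            return people[0];
        }

        public static void main(String[] args)
        {
            // Repeat the measurement so that later runs show the effect of JIT compilation.
            int ITER = 100000;
            runTest(ITER);
            runTest(ITER);
            runTest(ITER);
            runTest(ITER);
        }

        private static void runTest(int ITER)
        {
            long start = System.nanoTime();
            for (int i = 0; i < ITER; i++)
            {
                Chain chain = new Chain(40);
                chain.kill(3);
            }
            long end = System.nanoTime();
            System.out.println("Time per iteration = " + ((end - start) / ITER) + " nanoseconds.");
        }
    }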
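
joe's point about repeating the measurement is not specific to Java. As a rough sketch, the same "time several batches and compare them" idea can be written with Python's standard `timeit` module. The names `last_survivor` and `time_batches` are made up for this illustration, and the deque-based workload is only a stand-in, not any of the implementations timed in the post. On plain CPython there is no JIT, so the batches mostly expose measurement noise; on a JIT-backed runtime the first batch would be expected to be the slowest.

```python
import timeit
from collections import deque


def last_survivor(size=40, nth=3):
    """Zero-based id of the last person left (stand-in workload, not the post's code)."""
    people = deque(range(size))
    while len(people) > 1:
        people.rotate(-(nth - 1))  # move the nth - 1 people who are skipped to the back
        people.popleft()           # the nth person is removed
    return people[0]


def time_batches(batches=4, iterations=1000):
    """Time several batches and return microseconds per call for each, in run order."""
    totals = timeit.repeat(last_survivor, repeat=batches, number=iterations)
    return [total / iterations * 1e6 for total in totals]


if __name__ == "__main__":
    for index, micros in enumerate(time_batches(), start=1):
        print("batch %d: %.2f microseconds per iteration" % (index, micros))
```

Reporting every batch, rather than one average, makes any warm-up effect visible instead of hiding it.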

  20. Paddy3118 says:

    Hi again,
    With a simple addition of two lines (OK, and the prior installation of the non-standard package called psyco), the Psyco JIT compiler gave the following faster times on the same box:

    Time per iteration for findlast = 32.1899986267 microseconds
    Time per iteration for Chain = 109.370000362 microseconds

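    So, measured against the original Chain timing without Psyco (386.57 microseconds), findlast under Psyco is roughly an order of magnitude faster.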

    - Paddy..
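
Paddy does not show the two lines. The following is only a guess at what they were, based on the usual way of switching Psyco on: importing the module and calling `psyco.full()`. The `try`/`except` guard and the `PSYCO_ENABLED` flag are additions for this sketch, so that the script still runs where Psyco is not installed (it was only ever available for Python 2).

```python
try:
    import psyco   # third-party package, Python 2 only; assumed to be what Paddy installed
    psyco.full()   # ask Psyco to compile every function it is able to
    PSYCO_ENABLED = True
except ImportError:
    # Psyco is missing: carry on with the plain interpreter.
    PSYCO_ENABLED = False

print("Psyco enabled: %s" % PSYCO_ENABLED)
```

Any timings should record whether the flag was set, since the two configurations are not comparable.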

  21. Paddy3118 says:

    Hi Dhananjay,

    I decided to do a more idiomatic solution in Python than yours. As you yourself state, the Python solution closely follows the static language solutions and so it is handicapped.

    I came up with this:

    ..def findlast(chainlength = 40, kill = 3):
    ....firstinc, c = 1, range(1,chainlength + 1)
    ....while len(c)>1:
    ........#print firstinc, c
    ........c, firstinc = [x for n,x in enumerate(c) if (firstinc+n) % kill], \
    ....................(n+1 +firstinc) % kill
    ....#print firstinc, c
    ....return c[0]

    (Replace initial dots with spaces)

    The run time was around four times faster, and it is only SIX lines of code versus the ~33 lines of your static-like classes.

    Here are my timings:

    Time per iteration for Chain = 386.570000648 microseconds
    Time per iteration for findlast = 94.0599989891 microseconds

    If you remove the # from the print statements you get printouts to help show what is happening.

    To create this I used Python's interactive shell to play around with removing positions in a list, then with how to wrap around and remove more positions from the start of the reduced list. After the experimentation it was straightforward to create the function (not everything needs to be a class).

    I note that your result of Chain(40).kill(3).count == 27, whereas I get findlast() == 28.
    On looking closer at your lines 18 and 19, I see that you are counting Persons from zero when the problem statement says count from 1. You need to make sure that you don't get an off-by-one error when you report the position, as someone's life could be at stake :-)

    All in all I am quite happy with finding a solution with Python, as I can concentrate on getting the algorithm right. If I need it faster then I would probably translate the Python algorithm or use the Psyco JIT compiler.

    - Paddy.
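
The function as posted relies on two Python 2 behaviours: `range` returns a list, and the loop variable `n` of the list comprehension is still visible after the comprehension ends. A possible Python 3 port, offered only as a sketch, is shown below. It keeps Paddy's names `findlast`, `firstinc` and `c`, replaces the leaked `n` with `len(c)`, uses the `kill` argument in place of the literal 3, and returns the position itself rather than a one-element list. It assumes `kill` is at least 2.

```python
def findlast(chainlength=40, kill=3):
    """Return the one-based position of the last survivor (assumes kill >= 2)."""
    firstinc = 1                          # the number the first person in c will count
    c = list(range(1, chainlength + 1))   # one-based positions still in the ring
    while len(c) > 1:
        # Person at index n counts firstinc + n; multiples of kill are removed.
        survivors = [x for n, x in enumerate(c) if (firstinc + n) % kill]
        # The count carries on around the ring into the next pass.
        firstinc = (firstinc + len(c)) % kill
        c = survivors
    return c[0]


if __name__ == "__main__":
    print(findlast())
```

With the defaults this reproduces the value Paddy reports, 28.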

  22. @Paddy,

    Thanks for the interesting solution. I am taking a look at that.

    Your remark on the zero-based vs. one-based index is interesting: the problem statement says you count from one (which is what shout does), but the primary key for each person in my case is indeed zero-based (the keys could just as well be 'a', 'b', 'c' instead of zero or one :) ). Even though I have been coding in Python for the last couple of months, it's kind of difficult to take the 'C' programmer out of me :) .
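
The 27 and the 28 discussed here name the same person under two labelling schemes. One way to check this independently of either implementation is the standard Josephus recurrence, sketched below; the function names are invented for this illustration and appear in neither Paddy's nor the post's code.

```python
def survivor_zero_based(size, nth):
    """Zero-based id of the survivor, built up one ring size at a time."""
    survivor = 0  # with a single person, that person (id 0) survives
    for ring in range(2, size + 1):
        # Growing the ring by one person shifts the survivor's id by nth places.
        survivor = (survivor + nth) % ring
    return survivor


def survivor_one_based(size, nth):
    """The same survivor, reported as a position counted from 1."""
    return survivor_zero_based(size, nth) + 1


if __name__ == "__main__":
    print(survivor_zero_based(40, 3), survivor_one_based(40, 3))
```

For 40 people and every third person removed, the zero-based id is 27 and the one-based position is 28, so the two reported results agree once the labelling is taken into account.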

  25. joe says:

    Sorry, I'd forgotten how slow the mod operator is. Even faster:

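    public class Chain
    {
        private int size;

        public Chain(int size)
        {
            this.size = size;
        }

        public int kill(int nth)
        {
            int[] people = new int[size];
            // NOTE: the array fill, the counter declarations and the loop header were
            // swallowed by the comment form (the text between '<' and '>' was dropped).
            // They are reconstructed here assuming zero-based ids and counters starting at 0.
            for (int i = 0; i < size; i++)
            {
                people[i] = i;
            }
            int curPos = 0;      // next slot to read in the current pass
            int fillPos = 0;     // next slot to copy a survivor into
            int curCount = 0;    // count since the last kill
            int realSize = size; // number of people alive at the start of the pass
            while (realSize > 1)
            {
                curCount++;
                if (curCount != nth)
                {
                    // Survivor: compact it towards the front of the array.
                    people[fillPos] = people[curPos];
                    fillPos++;
                }
                else
                {
                    // Resetting the counter replaces the modulo test.
                    curCount = 0;
                }
                curPos++;
                if (curPos == realSize)
                {
                    // End of a pass: shrink the ring by the number killed and wrap around.
                    realSize -= (curPos - fillPos);
                    curPos = 0;
                    fillPos = 0;
                }
            }
            return people[0];
        }

        public static void main(String[] args)
        {
            int ITER = 100000;
            runTest(ITER);
            runTest(ITER);
            runTest(ITER);
            runTest(ITER);
        }

        private static void runTest(int ITER)
        {
            long start = System.nanoTime();
            for (int i = 0; i < ITER; i++)
            {
                Chain chain = new Chain(40);
                chain.kill(3);
            }
            long end = System.nanoTime();
            System.out.println("Time per iteration = " + ((end - start) / ITER) + " nanoseconds.");
        }
    }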
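
For a side-by-side look in one of the dynamic languages from the post, joe's array-compaction loop with the counter reset could be ported to Python roughly as follows. This is a sketch only: `kill_compact` is a made-up name, the zero-based ids are an assumption chosen to match the 27 reported for the original Chain, and `nth` is assumed to be at least 2.

```python
def kill_compact(size, nth):
    """Zero-based id of the survivor, compacting survivors in place (assumes nth >= 2)."""
    people = list(range(size))
    cur_pos = 0        # next slot to read in the current pass
    fill_pos = 0       # next slot to copy a survivor into
    cur_count = 0      # count since the last removal
    real_size = size   # people alive at the start of the current pass
    while real_size > 1:
        cur_count += 1
        if cur_count != nth:
            people[fill_pos] = people[cur_pos]  # survivor moves towards the front
            fill_pos += 1
        else:
            cur_count = 0                       # reset in place of a modulo test
        cur_pos += 1
        if cur_pos == real_size:
            real_size = fill_pos                # only the compacted prefix is still alive
            cur_pos = 0
            fill_pos = 0
    return people[0]


if __name__ == "__main__":
    print(kill_compact(40, 3))
```

The list is allocated once and never resized, which is the property that separates this approach from building a linked ring of objects.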