Count Lines/Words/Chars

Back to the Language Shootout
Back to Doug's Homepage

[NEWS]   [FAQ]   [Methodology]   [Acknowledgements]   [Scorecard]   [Conclusion]  

[Chart: CPU time minus startup time]
Source Code CPU (sec) Mem (KB) Lines Code Log
gcc0.1529631log
ocaml0.2647222log
bash0.3913481log
mlton0.4138456log
cmucl0.56416832log
mawk0.595286log
gawk0.637966log
bigforth0.7483234log
g++0.7767631log
smlnj1.20348060log
perl1.2612249log
java1.94753233log
python1.98130013log
bigloo2.0067622log
se2.1938840log
ruby2.3665289log
stalin2.475506413log
pike3.63239627log
icon4.13199614log
lua4.3758413log
ocamlb4.7462822log
gforth5.4268834log
ghc8.44164026log
php9.14196413log
tcl9.21110818log
erlang10.0413525219log
njs15.59280027log
guile50.66144818log
Languages that compile to native code are in Bold Italics.

[Note: In the chart, values have been normalized to the range 0-10 for aesthetic reasons. The original value ranges are shown on the X-axis.]


[Results last updated: Tue Oct 9 18:40:32 2001 CDT]


About this test

For this test, each program should be implemented to do the same thing, following the guidelines below:

Each program reads its input from standard input, counts the lines, words (whitespace-delimited tokens), and characters, and outputs each count. The programs should not read more than 4K of input at a time. To give a baseline of expected performance, I allow bash to use an external process (wc). All other solutions should be implemented natively.
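The following C sketch shows one way to meet these rules. It is not the shootout's actual gcc entry, which this page does not reproduce. It reads standard input in 4 KB chunks with POSIX read(2) and counts lines, words, and characters. The struct and function names (wc_counts, count_chunk, count_fd) are hypothetical. Error handling is minimal: for example, it does not retry reads interrupted by a signal.

```c
/* Minimal sketch of a wc-style counter that follows this test's rules.
 * Names here are hypothetical, not taken from the shootout's gcc entry. */
#include <stddef.h>
#include <unistd.h>

#define WC_BUFSIZE 4096 /* the test forbids reading more than 4K at a time */

struct wc_counts {
    long lines, words, chars;
    int inword; /* carried across chunks so a word split by a read counts once */
};

/* Whitespace as defined by this test: space, newline, and tab only. */
int is_test_ws(char ch) {
    return ch == ' ' || ch == '\n' || ch == '\t';
}

/* Update the running counts with one chunk of input. */
void count_chunk(struct wc_counts *c, const char *buf, size_t n) {
    c->chars += (long)n;
    for (size_t i = 0; i < n; i++) {
        char ch = buf[i];
        if (ch == '\n')
            c->lines++;
        if (is_test_ws(ch)) {
            c->inword = 0;
        } else if (!c->inword) {
            c->inword = 1; /* first character of a new word */
            c->words++;
        }
    }
}

/* Read fd to EOF in 4 KB chunks with read(2), bypassing stdio.
 * Returns 0 on success, -1 if read() reports an error. */
int count_fd(int fd, struct wc_counts *c) {
    char buf[WC_BUFSIZE];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        count_chunk(c, buf, (size_t)n);
    return n < 0 ? -1 : 0;
}
```

A complete program only needs a main that declares `struct wc_counts c = {0};`, calls `count_fd(STDIN_FILENO, &c)`, and prints `c.lines`, `c.words`, and `c.chars` with printf. The in-word flag is stored in the struct rather than reset for each buffer. That design lets a word straddle two 4K reads, and it also lets arbitrarily long lines work without any line buffer.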

This test is essentially the same as the wordcount test from "Timing Trials, or, the Trials of Timing: Experiments with Scripting and User-Interface Languages" by Brian W. Kernighan and Christopher J. Van Wyk.

Note that, as in the original version of this test, whitespace is defined as the space, newline, and tab characters. This differs slightly from the Unix wc command, which also treats a few other characters as whitespace (in the C locale: carriage return, vertical tab, and form feed).
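The two whitespace definitions only diverge on input that contains those extra characters. The following small C sketch makes the difference concrete. It counts words with a pluggable separator test. The helper names count_words, test_ws, and c_locale_ws are hypothetical. The isspace() result assumes the default C locale.

```c
#include <ctype.h>

/* Count words in a NUL-terminated string; is_ws decides what separates words. */
long count_words(const char *s, int (*is_ws)(int)) {
    long words = 0;
    int inword = 0;
    for (; *s; s++) {
        if (is_ws((unsigned char)*s)) {
            inword = 0;
        } else if (!inword) {
            inword = 1; /* start of a new word */
            words++;
        }
    }
    return words;
}

/* This test's definition: space, newline, tab. */
int test_ws(int ch) {
    return ch == ' ' || ch == '\n' || ch == '\t';
}

/* wc-style definition: isspace() in the C locale also accepts \r, \v, \f. */
int c_locale_ws(int ch) {
    return isspace(ch);
}
```

For example, "one\rtwo\n" is a single word under this test's rules, but two words when isspace() is used, because the carriage return separates them.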

The programs can assume that the file ends in a newline, and they should be able to handle arbitrarily long lines.

Input file (the test input consists of this file repeated N times).

The correct output for N = 500 (i.e., 500 copies of the input file) looks like this:

  12500 68500 3048000
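The input file ends in a newline, so no line or word straddles two copies, and all three counts scale linearly with N. Dividing the N = 500 output by 500 gives the counts for a single copy. These per-copy figures are derived here and are not stated elsewhere on this page:

```latex
\[
\frac{12500}{500} = 25 \text{ lines}, \qquad
\frac{68500}{500} = 137 \text{ words}, \qquad
\frac{3048000}{500} = 6096 \text{ characters per copy.}
\]
```

For any N, the expected output is therefore 25N, 137N, and 6096N.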

Observations

The original C program is significantly slower than the version here, which bypasses stdio.
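This page does not show the original C program. The following is only a plausible stdio-based counterpart, written under that assumption for contrast. It calls getc() once per character. Each call must check the stream's buffer state, and thread-safe C libraries typically take a stream lock on every call. A read(2) loop over a 4 KB buffer avoids that per-character overhead. The names stdio_counts and count_stream are hypothetical.

```c
#include <stdio.h>

/* Hypothetical stdio-based counter, for contrast with a read(2) loop. */
struct stdio_counts {
    long lines, words, chars;
};

struct stdio_counts count_stream(FILE *fp) {
    struct stdio_counts c = {0, 0, 0};
    int inword = 0;
    int ch;
    /* One library call per character: this is the overhead stdio adds. */
    while ((ch = getc(fp)) != EOF) {
        c.chars++;
        if (ch == '\n')
            c.lines++;
        if (ch == ' ' || ch == '\n' || ch == '\t') {
            inword = 0;
        } else if (!inword) {
            inword = 1; /* start of a new word */
            c.words++;
        }
    }
    return c;
}
```

Both versions produce the same counts. Only the way the bytes reach the counting loop differs.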

Alternates

This section is for displaying alternate solutions that are either slower than the ones above or don't quite meet my criteria for the competition, but are otherwise worthy of comment.
