Count Lines/Words/Chars

CPU times are shown minus startup time.
Language	CPU (sec)	Mem (KB)	Lines of Code	Log
*gcc*	0.15	296	31	log
*ocaml*	0.26	472	22	log
bash	0.39	1348	1	log
*mlton*	0.41	384	56	log
*cmucl*	0.56	4168	32	log
mawk	0.59	528	6	log
gawk	0.63	796	6	log
*bigforth*	0.74	832	34	log
*g++*	0.77	676	31	log
*smlnj*	1.20	3480	60	log
perl	1.26	1224	9	log
*java*	1.94	7532	33	log
python	1.98	1300	13	log
*bigloo*	2.00	676	22	log
se	2.19	388	40	log
ruby	2.36	6528	9	log
*stalin*	2.47	55064	13	log
pike	3.63	2396	27	log
icon	4.13	1996	14	log
lua	4.37	584	13	log
ocamlb	4.74	628	22	log
gforth	5.42	688	34	log
*ghc*	8.44	1640	26	log
php	9.14	1964	13	log
tcl	9.21	1108	18	log
erlang	10.04	135252	19	log
njs	15.59	2800	27	log
guile	50.66	1448	18	log
Languages that compile to native code are shown in bold italics (marked with asterisks here, e.g. *gcc*).

Note: In the graphs, values have been normalized to fall in the range 0-10 for aesthetic reasons; the original value ranges are shown on the X-axis.

[Results last updated: Tue Oct 9 18:40:32 2001 CDT]

About this test

For this test, each program should be implemented to do the same thing, following the guidelines below:

Each program reads its input from standard input, counts the lines, words (whitespace-delimited tokens), and characters, and outputs each count. Programs should read no more than 4K of input at a time. To give a baseline of expected performance, I allow bash to use an external process (wc); all other solutions should be implemented natively.
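
A minimal Python sketch of a solution following these rules (not one of the benchmarked entries). It assumes the input is read as bytes and that each byte is one character, which holds for ASCII input; the names `read_chunks` and `count_chunks` are illustrative. The detail that matters is that the in-word flag survives from one 4K chunk to the next, so a word split across a chunk boundary is still counted once.

```python
CHUNK_SIZE = 4096      # the rules allow reading at most 4K of input at a time
WHITESPACE = b" \t\n"  # the test's definition: space, tab and newline only


def read_chunks(stream, size=CHUNK_SIZE):
    """Yield successive chunks of at most `size` bytes from a binary stream until EOF."""
    while True:
        chunk = stream.read(size)
        if not chunk:
            return
        yield chunk


def count_chunks(chunks):
    """Return (lines, words, chars) over an iterable of byte chunks.

    `in_word` carries over from one chunk to the next, so a word split
    across a chunk boundary is counted once and lines of any length work.
    """
    lines = words = chars = 0
    in_word = False
    for chunk in chunks:
        chars += len(chunk)          # counts bytes; equals characters for ASCII input
        lines += chunk.count(b"\n")  # relies on the input ending in a newline
        for byte in chunk:           # iterating over bytes yields ints
            if byte in WHITESPACE:
                in_word = False
            elif not in_word:
                in_word = True
                words += 1
    return lines, words, chars
```

As a filter, `print(*count_chunks(read_chunks(sys.stdin.buffer)))` (after `import sys`) prints the three counts separated by spaces. Keeping the reading separate from the counting makes it easy to check that the result does not depend on where the chunk boundaries fall.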

This test is essentially the same as the wordcount test from "Timing Trials, or, the Trials of Timing: Experiments with Scripting and User-Interface Languages" by Brian W. Kernighan and Christopher J. Van Wyk.

Note that, as in the original version of this test, whitespace is defined as the space, newline and tab characters. This differs slightly from the Unix wc command, which treats a few more characters as whitespace.
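
A small Python sketch of the difference, with illustrative function names: `words_test_rule` uses the test's three whitespace characters, while `words_ascii_rule` uses `bytes.split()`, which also splits on carriage return, vertical tab and form feed. The wider rule is only an approximation of wc's behavior, which can vary by implementation and locale.

```python
TEST_WHITESPACE = b" \t\n"  # the test's definition: space, tab, newline


def words_test_rule(data):
    """Count words, treating only space, tab and newline as separators."""
    words = 0
    in_word = False
    for byte in data:
        if byte in TEST_WHITESPACE:
            in_word = False
        elif not in_word:
            in_word = True
            words += 1
    return words


def words_ascii_rule(data):
    """Count words with bytes.split(), which splits on all six ASCII whitespace bytes."""
    return len(data.split())


sample = b"one\rtwo\x0cthree four\n"
print(words_test_rule(sample), words_ascii_rule(sample))  # prints: 2 4
```

Under the test's rule the carriage return and form feed are ordinary characters, so `one\rtwo\x0cthree` counts as a single word.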

The programs can assume that the file ends in a newline, and they should be able to handle arbitrarily long lines.

The input file is repeated N times to form the test input.

The correct output (for N = 500, i.e., 500 copies of the input) looks like this:

  12500 68500 3048000

Observations

The original C program is significantly slower than the version here, which bypasses stdio.

Alternates

This section is for alternate solutions that are either slower than the ones above or don't quite meet my criteria for the competition, but are otherwise worthy of comment.
