
Ruby: a new language
Introducing the latest open source gem from Japan

Maya Stodte
Author and researcher
February 2000

Contents:
 Language profile
 Language properties
 Ruby vs. other OOLs
 Ruby vs. Perl and Python
 A brief history of Ruby
 The Ruby community
 Features Ruby is missing
 Interview with Ruby's creator
 Resources
 About the author

Author Maya Stodte looks at Ruby, a pure object-oriented scripting language, which has successfully seduced Python and Perl users in Japan. Ruby is now beginning to make its international debut, touting elegant syntax, single inheritance, straightforward OO features, true closures, and an iterator more extensive than most call-back routines. Below Maya looks at the language's profile in depth, providing code comparisons to Perl, and trying to illuminate some of the features that have so entranced Japanese programmers for the past few years.

Yukihiro Matsumoto (see our interview and Resources later in this article) developed Ruby in answer to the Perl community's proposal that "there's more than one way to do it." Ruby is an absolutely pure object-oriented scripting language written in C and designed with Perl and Python capabilities in mind.

Ruby has been gaining popularity over the past few years, especially in Japan, where it was conceived and born. Its features, like Perl's, are designed to process text files and complete systems management tasks. Ruby is highly portable and easily customized, but primarily draws users because of its purity and readability. In particular, CGI code scripters are increasingly frustrated with Perl's occasionally enigmatic code and Python's inelegant and difficult syntax that requires "too much typing." Neither Python nor Perl was designed as an object-oriented language. Consequently, the OO features often feel "added on" and are not fully integrated into the language core, making for cryptic code.

Ruby, on the other hand, is by definition an object-oriented language. It literally treats every data structure as an object, and offers only single inheritance with its methods in order to reduce confusion and encourage simple, straightforward code. Ruby's interface to objects is therefore completely defined, and allows for extensive code alteration in implementing methods.

Based on the syntax of Eiffel and Ada, the power of C, the functions of Python, and the diversity of Perl, Ruby really is an attempt to combine the best of everything. And rumor has it that Yukihiro Matsumoto, affectionately known as Matz, has interwoven Ruby with a mysterious spell to keep us coming back for more. Ruby has, at the very least, been showing up Python in Japan for quite some time.

Language profile
By Ousterhout's dichotomy, Ruby is a scripting language. However, like Perl, it offers much more than scripting. Ruby is dynamically typed and interpreted, but surpasses even Perl in its handling of complex data structures. Ruby is not, therefore, strictly a glue language. According to Ousterhout, only languages that are strongly typed, able to handle arbitrarily complex data structures, and compiled to run independently of other programs are high-level system languages. But Ruby, like Perl, has proven to be an exception to this artificial distinction: it falls between being interpreted and compiled, and it can handle more complex data structures than strict scripting languages like Tcl. Ruby also offers various GUI toolkits. Other well-known OO languages that straddle this line, albeit with fewer functions than Ruby, include Java, Python, and C++.

Ruby is primarily an interpreted language. This means, of course, that Ruby avoids all of the various problems and annoyances usually associated with compilers and compiled programs and languages. The advantage is that Ruby programs are immediately executable. The disadvantage is that Ruby's execution may be much slower than that of a compiled C program. But since its source code is freely available, and many extensions and modules are already offered on the Ruby home page (http://www.ruby-lang.org/en/), the typical disadvantages of an interpreted language in terms of tedious work are much less severe in Ruby's case.

The fact that Ruby is for the most part an interpreted language is particularly advantageous in the edit-interpret-debug cycle, since it allows you to write and test programs simultaneously. But because it is not strictly an interpreted language, Ruby's interpretive overhead is not as high as that of, say, Tcl. It is always possible to call the interpreted code with a compiler later to increase execution speed.

Because it is object-oriented, Ruby is highly reliable and can write small programs that are reusable and easily modified. Ruby includes all the basic OO features (such as classes, methods, etc.) that other object-oriented languages offer; inheritance, polymorphism, singleton method, and mix-in are all implemented in Ruby. But, unlike other languages offering these OO features, Ruby is a pure object-oriented language. This means that absolutely all things in Ruby are treated as objects. Classes, integers, strings, arrays, and code blocks are all treated as objects. This is what lends Ruby both its power and its simplicity.
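As a quick illustration of the claim that everything is an object, the following sketch asks several values for their class. It assumes a reasonably recent Ruby interpreter; the class name reported for integers differs between Ruby versions, so no output is shown here.

```ruby
# Literals, nil, classes, and code blocks (wrapped as Proc objects)
# all respond to ordinary method calls.
values = [42, "text", [1, 2], nil, String, proc { |x| x * 2 }]

values.each do |v|
  # inspect and class are methods available on every object
  puts "#{v.inspect} is an instance of #{v.class}"
end

# Even an integer literal can be the receiver of a method call.
doubled = 3.times.map { |n| n * 2 }
```

Note that String itself is an object too: asking it for its class returns Class.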

Ruby is also fundamentally an extensible language. You can either compile source code for extensions and integrate them into your local system, or download precompiled extensions from the home page and various other sites; the latter option allows you to avoid building your own extensions. You can also easily extend Ruby using C. The online extension repository is available through the Ruby home page, and the extension library directions are in the file README.EXT in the Ruby package.

Language properties
Let's look at Ruby's running speed, portability, learnability, and readability.

Running speed
As long as your OS is capable, Ruby can load extension libraries dynamically. This means that it can link its library to application programs as they are run or loaded immediately, rather than waiting for compilation to complete. It can thus share blocks of library code between several tasks that are run simultaneously, which increases the run speed significantly. It also minimizes the size of each program, as programs don't need to contain the entire library, or copies of the routines they use, individually. The fact that Ruby can load extension libraries dynamically partially makes up for loss of speed normally associated with an interpreted language.

Portability
Although it was created on NEWS-OS (Sony's BSD variation), then ported to SunOS and then to Linux, Ruby is highly portable across platforms. Its portability is enhanced because Ruby's threading is independent of the OS: multithreading is available on every platform Ruby runs on (including MS-DOS!), regardless of OS support. With multithreading, a single CPU can interleave several tasks, and the time required to switch between threads is kept to a minimum. Each time a thread is changed, as much as possible of the program execution environment is saved and transferred, minimizing changes to the environment, regardless of your OS! Ruby can be run on most versions of UNIX, DOS, Windows 95/98/NT, Mac, BeOS, OS/2, etc.

Learnability
One of Ruby's chief attractions is its learnability. Because of the operator overloading feature (polymorphism), Ruby's operators are easily defined and are syntax sugar for the methods. In accordance with this feature, Ruby uses single symbols to represent operators with different argument types. A typical example of this feature in many OO languages is the use of "-" for both monadic operators, such as negation, and dyadic operators, such as subtraction; another is the use of "+" for addition of both floating-point numbers and integers.
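To make the "operators are syntax sugar for methods" point concrete, here is a minimal sketch. Vec2 is a hypothetical class invented for this illustration; it defines the dyadic "+" and "-" and the monadic "-" as ordinary methods.

```ruby
# Vec2 is a hypothetical two-dimensional vector used only for illustration.
class Vec2
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Dyadic "+": a + b is sugar for a.+(b)
  def +(other)
    Vec2.new(@x + other.x, @y + other.y)
  end

  # Dyadic "-": a - b is sugar for a.-(b)
  def -(other)
    Vec2.new(@x - other.x, @y - other.y)
  end

  # Monadic "-": -a is sugar for a.-@
  def -@
    Vec2.new(-@x, -@y)
  end

  # Value equality, so results can be compared directly
  def ==(other)
    other.is_a?(Vec2) && @x == other.x && @y == other.y
  end
end

a = Vec2.new(1, 2)
b = Vec2.new(3, 5)
sum  = a + b    # same as a.+(b)
diff = b - a
neg  = -a
```

The same "-" symbol dispatches to two different methods depending on whether it is used with one operand or two.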

Ruby's mark and sweep garbage collector, which literally works with all Ruby objects, eliminates the need to maintain reference counts in extension libraries.

Many garbage collection techniques require that each memory cell contain a count of the number of other cells that point to it; if the count reaches zero, the cell is freed and its pointers to other cells are followed to decrement their counts recursively. Such methods of collection cannot therefore handle circular data structures because cells in such structures will never have a zero reference count and would never be reclaimed.

However, Ruby's mark and sweep garbage collection reclaims dynamically allocated storage periodically during execution. Each cell reserves a bit for marking. During collection, every cell that can be traced from the root is marked, and the cells left unmarked are freed. Because reachability rather than a count decides what survives, circular structures that nothing else points to are reclaimed as well. The mark and sweep garbage collector also significantly reduces the memory requirements of any program written in Ruby.
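The difference from reference counting can be sketched with a toy model. The code below is not Ruby's actual collector; Cell, mark, and sweep are names invented for this illustration. It shows that a cycle unreachable from the root is swept away, even though each cell in the cycle is still pointed to by the other.

```ruby
# Toy model of mark-and-sweep. Cell and the heap array are illustrative
# names, not part of Ruby's real garbage collector.
Cell = Struct.new(:name, :refs, :marked)

# Mark phase: flag every cell reachable from the given cell.
def mark(cell)
  return if cell.marked
  cell.marked = true
  cell.refs.each { |child| mark(child) }
end

# Sweep phase: keep marked cells, drop the rest, and clear the marks
# so the next collection starts fresh.
def sweep(heap)
  live = heap.select(&:marked)
  live.each { |cell| cell.marked = false }
  live
end

a = Cell.new("a", [], false)
b = Cell.new("b", [], false)
c = Cell.new("c", [], false)
d = Cell.new("d", [], false)

a.refs << b   # a -> b, reachable from the root
c.refs << d   # c -> d
d.refs << c   # d -> c: a cycle that nothing else points to

heap  = [a, b, c, d]
roots = [a]

roots.each { |root| mark(root) }
heap = sweep(heap)
survivors = heap.map(&:name)
```

Under reference counting, c and d would each keep a count of one forever; here only a and b survive.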

In addition to the garbage collector, Ruby's Application Programming Interface (API) makes writing C extensions easier. The API is written in C, ensuring portability. It provides an interface between Ruby and C and an interpretation of call-by-value and call-by-reference arguments in both directions.

An Application Archive is available through the Ruby home page (http://www.ruby-lang.org/en/raa.html). The archive includes a "What's New" section, a list of applications from the interpreter, text, and mail applications, and a parser generator among other things. Since not all applications are currently stable, the archive lists the condition and update schedule for each application. The archive also offers a library of code with calendars, databases, and GUIs. An English-language mailing list is available. To join it, send an e-mail to ruby-talk-ctl@netlab.co.jp with subscribe First-Name Last-Name in the mail body.

Readability
One of Ruby's main advantages over Perl and Python is its readability. Due to its single inheritance features and pure object orientation, Ruby's code is far less cryptic and confusing than usual. Readability is especially important in the maintenance stages, when tools are expanded or changed. For example, Perl's use of "@", "$", and "%" often causes considerable grief and confusion. The following code in Perl:

 
  @array = (1, 2, 3);
  print $array[1];
  %hash = ('foo1' => 'bar1',
           'foo2' => 'bar2',
           'foo3' => 'bar3');
  print $hash{'foo1'};

translates into Ruby as:


  array = [1,2,3]
  puts array[1]
  hash = {'foo1' => 'bar1',
          'foo2' => 'bar2',
          'foo3' => 'bar3'}
  puts hash['foo1']
 

(Code example from Masaki Suketa, CQN02273@nifty.ne.jp).

Ruby's syntax is also widely known for its simplicity, which makes for code that is user-friendly and readable on the one hand, and which features a highly powerful grammar and semantics on the other. For example, Perl requires the use of the semicolon at the end of virtually all statements; by contrast, Ruby anticipates the end of all statements without such a prompt.

In Perl:


print 'test';
print '1', '2', '3';

translates in Ruby to:


print 'test'
print '1',  '2', '3'

and in C to:


printf("test");   /* (^^; */  

An example of a Ruby script that exemplifies its power and elegance is:


  [ 'string', 2, ['array in array'] ].each do |i|
    ret = case i
          when /regexp/ then 1
          when 'string' then 2
          when Class    then 3
          else               4
          end
  end
  "Content-Transfer-Encoding".split('-').map{|i| i.upcase }.join('-')


(Code examples by Minero Aoki, aamine@dp.u-netsurf.ne.jp)

A final example of readability and power in Ruby's syntax is its CLU-inspired block passing features. Blocks are chunks of code, delimited by { ... } or do ... end, that can be passed to methods or converted to closures. With this feature, an object of a certain class knows how to perform functions itself, rather than the other way around, where a function knows how to handle different types of objects. For example, a sort criterion can be specified as follows under block passing:


    array.sort{|i,j| [i.date, i.name] <=> [j.date, j.name]}

A grammatical element "{|..| ....}" is called a block. "{|..| ....}" can be written as "do |..| .... end".

In this example we are assuming that each member of "array" has the methods "date" and "name". "x <=> y" returns -1, 0 or 1 for "(x < y)", "(x == y)" or "(x > y)" respectively, and "sort" does a quick sort by the block value.
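A runnable version of this idea might look as follows. Entry is a hypothetical record type supplying the "date" and "name" methods the text assumes, and the dates are plain ISO-style strings so that they compare lexically. Each element's own [date, name] pair is compared against the other element's pair.

```ruby
# Entry is a hypothetical record type with the "date" and "name" methods
# the text assumes; the sample data is made up for illustration.
Entry = Struct.new(:date, :name)

array = [
  Entry.new("2000-02-01", "perl"),
  Entry.new("2000-01-15", "ruby"),
  Entry.new("2000-01-15", "python")
]

# Arrays compare element by element, so this sorts by date first
# and falls back to name when the dates are equal.
sorted = array.sort { |i, j| [i.date, i.name] <=> [j.date, j.name] }
names  = sorted.map { |e| e.name }   # => ["python", "ruby", "perl"]
```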

Furthermore, in Ruby it is possible to define a Schwartzian transformation on Array:


    class Array
      def sort_by_map
        collect{|i| [yield(i),i]}.sort.collect{|i| i[1]}
      end
    end

where "yield" returns the result of block evaluation, and Ruby can sort more efficiently with the criterion above:


array.sort_by_map{|i| [i.date, i.name]}


(Code examples by Goto Kentaro)
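The efficiency gain comes from computing each sort key once per element instead of once per comparison. The sketch below restates the decorate, sort, undecorate steps as a stand-alone method (decorate_sort is a hypothetical name) and counts how often the key block runs. For comparison it also calls Enumerable#sort_by, which current Ruby versions provide for the same purpose.

```ruby
# decorate_sort is a hypothetical stand-alone version of the technique.
def decorate_sort(items)
  # decorate: pair each precomputed key with its item
  decorated = items.map { |i| [yield(i), i] }
  # sort on the precomputed key only
  ordered = decorated.sort { |a, b| a[0] <=> b[0] }
  # undecorate: discard the key
  ordered.map { |pair| pair[1] }
end

words = %w[pear fig banana kiwi]
calls = 0

by_length = decorate_sort(words) do |w|
  calls += 1        # count how often the key is computed
  [w.length, w]     # key: length first, then the word itself
end

# The built-in equivalent in current Ruby versions
builtin = words.sort_by { |w| [w.length, w] }
```

With four words the key block runs exactly four times, however many comparisons the sort performs.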

How does Ruby differ from other OO languages?
Ruby differs in pure object orientation, inheritance, and closures.

Pure object orientation
The major advantage of Ruby over languages like Python and Perl is that Ruby is a complete and pure -- but open -- OO scripting language. All data, without exception, are treated as objects. For example, Ruby treats an integer, which is automatically converted, as an instance of class Fixnum or Bignum, depending on its size. (Smalltalk also has this feature, but is far less comprehensive and powerful than Ruby.) In Ruby, integers are used without taking into account their internal representation.

Any program written in Ruby can add methods to both classes and instances of classes at runtime. Consequently, two instances of the same class can behave differently at the same time.
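A small sketch of both mechanisms follows. Greeter is a hypothetical class. One instance receives a singleton method, and the class is then reopened to add a method to every instance.

```ruby
# Greeter is a hypothetical class used only for illustration.
class Greeter
  def greet
    "hello"
  end
end

first  = Greeter.new
second = Greeter.new

# Singleton method: defined on this one instance only.
def first.greet
  "konnichiwa"
end

# Reopen the class at runtime; existing instances gain the method too.
class Greeter
  def shout
    greet.upcase + "!"
  end
end

first_says  = first.shout    # uses the singleton greet
second_says = second.shout   # uses the class's greet
```

The two objects share a class yet answer "greet" differently, which is the behavior the paragraph above describes.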

Inheritance
To avoid complexity and messy syntax, Ruby features only single inheritance. This means that in Ruby, sub-classes can only be derived from one parent. This is in opposition to multiple inheritance features that allow sub-classes to be derived from multiple parent classes, which are not derived from one another. Ruby does, however, understand modules -- collections of methods -- and any class can import any module, getting it for free.
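The following sketch shows single inheritance combined with modules. Describable, Animal, and Dog are hypothetical names. Comparable is a module that ships with Ruby and supplies the comparison operators to any class that defines "<=>".

```ruby
# Describable is a hypothetical module; any class that defines "name"
# can import it with include.
module Describable
  def describe
    "#{self.class.name}: #{name}"
  end
end

class Animal
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# Single inheritance: Dog has exactly one parent class,
# but it can still mix in any number of modules.
class Dog < Animal
  include Describable
  include Comparable   # built-in module; relies on <=> below

  def <=>(other)
    name <=> other.name
  end
end

rex  = Dog.new("Rex")
fido = Dog.new("Fido")
description = rex.describe   # => "Dog: Rex"
```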

Closures and the iterator
The scope of each variable in Ruby is specified by simple naming conventions rather than by variable declarations. The scope of an identifier is the region of the program within which it can be referenced: "var" specifies a local variable, "@var" specifies an instance variable, "$var" specifies a global variable. It is unnecessary to use the "self" variable for every instance member. Ruby uses closures with present variable bindings instead of unnamed functions.
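The prefixes and the closure behavior can be seen together in a short sketch. Counter and make_adder are hypothetical names chosen for this illustration.

```ruby
$count_created = 0          # "$" prefix: global variable

class Counter
  def initialize
    @value = 0              # "@" prefix: instance variable, no "self." needed
    $count_created += 1
  end

  def increment
    step = 1                # no prefix: local variable
    @value += step
  end
end

# A closure keeps the binding of the local variable "total" alive
# after make_adder has returned.
def make_adder
  total = 0
  lambda { |n| total += n }
end

adder = make_adder
adder.call(5)
running = adder.call(10)    # => 15
```

Each call to make_adder creates a fresh "total", so two adders never share state.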

The following tree structures, written in Ruby and Perl, are instructive. In Ruby, the tree class can be written very naturally as:


        class Tree
          attr :parent
          attr :children
          def initialize(parent,children)
            @parent = parent
            @children = children
          end
          ...
        end

Perl does not offer this kind of class definition; rather it performs the tree in many different ways. For example,



	$node = [$parent, \@children];

introduces a node using a reference to an array. However, this notation makes it necessary to remember that $$node[0] is the parent and $$node[1] is the reference to the child nodes. This method can be improved upon in Perl, using anonymous hash:


	$node = {"parent" => $parent, "children" => \@children};

However, this is less efficient than Ruby's version. Perl's equivalent OO notation using the package, on the other hand, looks strange in comparison to the simple Ruby example above:


	package Tree;
	sub new {
	  my $type = shift;
	  my $self = {};
	  $self->{'parent'} = $_[0];
	  $self->{'children'} = $_[1];
	  bless $self;
	} 

Ruby's iterator resembles what many languages call a "callback routine," although it does much more than a typical callback routine. Consider, for example:


  [ 1, 2, 3 ].each do |item|
    print item
  end

This code is equivalent to:


  array = [ 1, 2, 3 ]
  i = 0
  while i < array.size do
    item = array[i]
    print item
    i += 1
  end

The "each" iterator allows you to shorten code in this manner. This method allows access to Array without requiring knowledge of its access method. Ruby also allows you to define your own iterator. In comparison, Perl has the "each" iterator, but you cannot define it. Below is an example of a user-defined "each" method in Ruby:


  class Array                 # redefine Array class (!)
    def my_each
      i = 0
      while i < self.size do
        yield self[i]         # this is key-point
        i += 1
      end
    end
  end
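Once defined, a user-defined iterator is called exactly like a built-in one, and further iterators can be layered on top of it. The sketch below restates my_each so that it runs on its own and adds a second, hypothetical iterator, my_each_with_position, built from the first.

```ruby
class Array
  # Restated from the example above so this sketch is self-contained.
  def my_each
    i = 0
    while i < size
      yield self[i]
      i += 1
    end
    self
  end

  # Hypothetical second iterator built on the first:
  # yields each element together with its position.
  def my_each_with_position
    pos = 0
    my_each do |item|
      yield item, pos
      pos += 1
    end
  end
end

seen = []
[10, 20, 30].my_each_with_position do |item, pos|
  seen << "#{pos}:#{item}"
end
```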

Iterators were originally introduced to abstract loops, but they have proven just as useful for blocks that run zero or one times. A good example of this usage is File.open in the following code:


  File.open( filename ) do |f|
    line = f.readline
    # ...
  end

The File.open method opens the file and passes it to the block (do ... end). When the block finishes, or if an error occurs, the file is automatically closed.
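The same pattern is easy to write for your own resources with yield and ensure. The sketch below is an assumption about how such a method could look, not a description of how File.open is implemented. FakeHandle and with_handle are hypothetical names.

```ruby
# FakeHandle and with_handle are hypothetical names used to show the
# pattern; they are not part of Ruby's File API.
class FakeHandle
  attr_reader :closed

  def initialize
    @closed = false
  end

  def close
    @closed = true
  end
end

def with_handle
  handle = FakeHandle.new
  begin
    yield handle     # run the caller's block once
  ensure
    handle.close     # runs on normal exit and when an error is raised
  end
end

ok_handle = nil
with_handle { |h| ok_handle = h }

failed_handle = nil
begin
  with_handle do |h|
    failed_handle = h
    raise "something went wrong"
  end
rescue RuntimeError
  # the error still propagates, but the handle was closed first
end
```

In both cases the caller never has to remember to call close.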

Using iterator, it is also possible to access data inside an object without exposing instance variables. For example,


	class Tree
	  ...
	  def each_child
	    for c in @children
	      yield c
	    end
	  end
	  ...
	end

allows you to access each child node without accessing the instance variable directly:



	t = Tree.new(p,[c1,c2])
	..
	t.each_child do |c|
	  # do something with c
	end


(Code examples by Akinori Ito, aito@ei5sun.yz.yamagata-u.ac.jp)
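For readers who want to run the tree example, here is a simplified, self-contained variant. It is a sketch under assumptions not found in the original: the constructor takes a value and a list of children rather than a parent, and the class includes Ruby's Enumerable module so that a single "each" method makes map and similar methods available.

```ruby
class Tree
  include Enumerable   # assumption: not in the original example

  attr_reader :value

  def initialize(value, children = [])
    @value = value
    @children = children
  end

  # Visit the direct children without exposing @children.
  def each_child
    @children.each { |c| yield c }
  end

  # Enumerable only needs "each"; this one walks the subtree depth-first.
  def each(&block)
    yield self
    each_child { |c| c.each(&block) }
  end
end

leaf1 = Tree.new("c1")
leaf2 = Tree.new("c2")
root  = Tree.new("p", [leaf1, leaf2])

child_values = []
root.each_child { |c| child_values << c.value }
all_values = root.map { |node| node.value }
```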

How does Ruby compare to Perl and Python?
Ruby has been rumored to be "better Perl than Perl." Although it owes much of its design and many of its features to Perl, ranging from $_ short cuts to extended regular expressions, Ruby is a complete OO language. Perl, on the other hand, is not, and the language's OO features occasionally attach clumsily and ineffectively to straight Perl. Ruby does not have this problem. As mentioned above, Ruby uses prefixes to specify variable scope (as opposed to Perl's specification by data type) and uses far less punctuation than Perl. For these reasons, Ruby script is less cryptic than Perl, and its syntax is more learnable, and so easier to modify.

Although Python is similar to Ruby in design and purpose, there are significant differences. Ruby's statement structure is more conservative than Python's. It is not necessary to write "self" to access the attributes of an object in Ruby. Object attributes are not accessible from outside the object by default in Ruby, as they are in Python. Ruby's functions and methods, unlike Python's, are not first class objects. Ruby converts small integers and long integers automatically; Ruby does not have tuples, and all data in Ruby are class instances. Though it is rarely claimed that Ruby is more powerful than Python, Ruby is faster, more natural, more elegant, and increasingly more popular.

A brief history of Ruby
Ruby was born on Feb. 24, 1993. Matz was talking with a colleague about the possibility of object-oriented scripting languages (as opposed to programming languages that offer OO support). Matz didn't like Perl. "It had the smell of a toy language. It still does." Python, too fell short of Matz' high standards, being a kind of hybrid OO language whose object features feel added on. Matz, "as a language maniac and OO fan for 15 years, really wanted a genuine, easy-to-use, object-oriented scripting language." There was no such language, so he created Ruby.

Since February of 1993, mailing lists have been established, Web pages have formed, and a community has begun to grow around Ruby. The mailing lists, in particular, have been instrumental in creating and stabilizing the language. The oldest Ruby-list has 14,789 messages to date. Most of Ruby's scripters and developers have come from Python and Perl, though a few are fresh young upstarts.

Ruby was written in C and based on Perl; in fact Ruby is like a streamlined version of Perl -- not "too cryptic and weird," in Matz' words -- with an emphasis on correct object orientation. The examples available through the Ruby home page illustrate Ruby's strong ties to Perl. Here, for instance, is an implementation of the finger command:


#! /usr/local/bin/ruby
require "socket"

ARGV.push "" if ARGV.size == 0

ARGV.each{|adr|
  name, dom = adr.split("@")

  if dom
    remote = TCPSocket.open(dom, 'finger')
    print "[#{dom}]\n"
    begin
      remote << if name then "#{name}\n" else "\n" end
      while remote.gets
        # you should use read instead of gets
        # to defend the buffer from overflow.
        print
      end
    ensure
      remote.close
    end
  else
    print `finger #{name}` # I always choose easier way!
  end
}

In Matz' own words, "I [simply] decided to make it. It took several months to make the interpreter run. I put in the features I love to have in my language, such as iterators, exception handling, garbage collection. Then, I reorganized the features in Perl into a class library, and implemented them. I posted Ruby 0.95 to the Japanese domestic newsgroups in Dec. 1995." Ruby 1.0 was released in Dec. 1996. 1.1 was released in Aug. 1997. 1.2 (stable version) and 1.3 (development version) were released in Dec. 1998. The new stable version of Ruby 1.4.3 was released Dec. 1999. It is available from the site along with a reference manual and compiled binary for Win32/DOS as well as four ftp mirrors.

The Ruby community
Ruby users come from all walks of life, including students and professors, researchers in speech recognition and natural language processing, software developers both in the commercial and open source community, specialists in parallel computing, ASIC designers, Debian developers, and network administrators.

They come to Ruby from newsgroups (fj.sources being one of the biggest), online communities (freshmeat.net, NetNews, the Java-House MailingList, Nifty-Serve), magazines (Linux Japan), and of course, by word of mouth.

As Nishikawa (nyasu@osk.3web.ne.jp), a Ruby user, relates, "I was searching for a good scripting language which can deal with databases when I started enjoying the horse races. One of my friends encouraged me to bet, and suggested an algorithm competition of the winning collection amount. I got the result database of the past horse race and put them into MySQL database in my Linux box. C/C++ programming is not suitable for trial-and-error. I started it with Excel95 VBA and ODBC driver. Soon I found this was also a crazy way. I was going to look for a good scripting language by the web search engines, and I found Ruby. Please don't ask me about the result of the competition!"

Ruby attracts users for many reasons. Some users are won over by the ease with which it is possible to code Gtk+ applications. Others praise Ruby for its simplicity, clean syntax, powerful string manipulation and purity of design as an OO language. Some users are drawn to Ruby from Perl, complaining that Perl doesn't handle complicated data structures well, that it doesn't handle multi-dimensional arrays effectively, and that it's difficult to read. Some are drawn to Ruby's Japanese supporting libraries. Still others are drawn because Ruby is smarter than other languages, can write larger programs, has Tk and Gtk interfaces, and most importantly, is well balanced and natural. And, of course, Ruby's iterator is a big attraction.

Most of Ruby's users formerly used C, Perl, C++, Java, Python, sh, csh, and Awk.

Common Ruby programs and applications range from mail readers and schedulers to small and medium-sized applications, text format conversions, data processing and statistical analysis, prototyping of numerical systems, networking, manipulating databases, analyzing and predicting horse races, synchronizing a PalmPilot with a MySQL database, implementing CGI, SMTP client, POP client, and FTP client tools, converting files to other formats, and extracting information from large files.

Features Ruby is still missing
Ruby could benefit from features such as a Qt extension like PyQT, a destructor like C++, a CORBA extension, a better method to arrange its libraries, and more class libraries, especially for GUI applications. In general, Ruby could benefit from more advanced mathematical features. Ruby could also use a compiler to improve execution speed, although there is a Ruby-to-C compiler in the experimental stage as well as a JIT compiler for x86 processors. Both can be found at http://easter.kuee.kyoto-u.ac.jp/~hiwada/ruby/.

Resources

About the author
Maya Stodte is a freelance writer and researcher. Most recently, she has worked for Renmen Publishing, a New York-based research firm. She can be reached at mstodte@pop.rcn.com.


 