September 14, 2006

Programming in the small - removing duplication.

Many people condemn "copy-and-paste programming", know the DRY principal (don't repeat yourself) and advocate removal of duplication, but I still see lots of code that contains plenty of duplication - sometimes by the very same people.

Based on the sorts of duplication I see most often, I think this might be that if the amount of code involved isn't very large, then people don't consider it worth refactoring to remove the duplication. However, removing duplication of even small amounts of code can make the code shorter, easier to read and clarify its meaning. The benefits can be even greater if the amount of duplication is greater - so fix them - but don't ignore the easy to fix, smaller bits of duplication either.

A typical duplication

Consider the following code:

	public void foo(Map configuration){
		String name = (String) (configuration.get("name") == null ? 
					"defaultName" : 
					configuration.get("name"));
		int value = (Integer)(configuration.get("value") == null ? 
					0 : 
					configuration.get("value"));
		//...

There is obvious duplication - but I see code with this level of duplication quite often - even by people whose code is otherwise good. There are several ways to eliminate this duplication and make the code clearer.

Extract method

The obvious candidate for removing the duplication is to extract (at least one) method. I'd probably go for something like:

	public void foo(Map configuration){
		String name = (String) getValueOrDefault(configuration, "name", "defaultName");
		int value = (Integer) getValueOrDefault(configuration, "value", 0);
		//...
	}

	private Object getValueOrDefault(Map configuration, Object key, Object defaultValue) {
		Object value = configuration.get(key);
		return value == null ? defaultValue : value;
	}

that is, both extracting a method for the duplicated code, and extracting a variable for the duplicated expression. The purpose of the code is now clearer - it was to get the configuration value or a default value if it wasn't defined. The name of the extracted method tells you what the purpose of the code is.

Same duplication in other classes

What if I've got similar code to that which has been refactored into the getValueOrDefault method, in some other class, that currently isn't related by inheritance? You could introduce some common superclass and put it there - but that's often a bad idea (that I don't really want to go into here, but have written about, to some extent, in another article).

One way to share this code is to put it in a "Utils" class as a static method, for example:

public class Utils {
	public static Object getValueOrDefault(Map configuration, Object key, Object defaultValue) {
		Object value = configuration.get(key);
		return value == null ? defaultValue : value;
	}
}

which, if you use Java 5.0 static imports, doesn't look any different to the caller than having the method defined in the same class as the caller.

Using a decorator

The problem with the method as defined so far is that I have to pass the Map into it, but what I really want is to for the method to be on Map; however in Java I can't (yet) do that. A way to make the code read more like I want would be to define a decorator that adds the method that I want. This can be defined as:

public class MapWithDefaultableValues implements Map {
	private Map undefaultedValues;

	public MapWithDefaultableValues(Map undefaultedValues) {
		this.undefaultedValues = undefaultedValues;
	}
	
	public Object getValueOrDefault(String key, Object defaultValue) {
		Object value = undefaultedValues.get(key);
		return value == null ? defaultValue : value;
	}

	public void clear() {
		undefaultedValues.clear();
	}

	//... etc for all other methods of Map

The eclipse IDE has a handy feature that generates the code for delegating all the methods of an interface to a field - see "Source > Generate Delegate Methods...".

Then, the calling code can become something like:

	public void foo(Map configurationAsMap){
		MapWithDefaultableValues configuration = new MapWithDefaultableValues(configurationAsMap);
		String name = (String) configuration.getValueOrDefault("name", "defaultName");
		int value = (Integer) configuration.getValueOrDefault("value", 0);
		//...

You might even find that there are other things that you'd like to be able to do with your configuration objects that also removes duplicated code, and have a class called something like Configuration that doesn't necessarily implement Map but instead provides the interface that is simpler for client code.

Look out for duplication - it might be telling you something

Consider the following code:

	public void something(Thing thing){
		//...
		thing.foo(thing.bar());
		thing.baz();
		//...

The duplication here is of thing - you could call it repetition rather than duplication if you like - it's not much like the earlier example - what is the repetition of thing telling you? One possibility is that the code might be better with this code on the class Thing instead of in the method where it is currently. If there are other instances of this code, or maybe just of part of it (like thing.foo(thing.bar())) then that might tell you even more strongly that the code is in the wrong place.

You might want to add a method on Thing:

	public void something() {
		foo(bar());
		baz();
	}

and change the call to:

	public void something(Thing thing){
		//...
		thing.something();
		//...

There are lots of other examples. Keep a look out for any type of duplication at all in both the code base and in what you are doing (e.g. if you are frequently typing something that seems very similar to something you've already done). In some cases, it's a sign that you need to do some refactoring. In other cases it's a sign that you need a better tool (for example, I wrote MockMaker [now obsolete, unless you're on pre 1.3 Java] as the result of noticing that I was constantly writing similar code for hand written mock objects - back in the days before anything like JMock).

I'll tell you what I want, what I really, really want.

I'd like developers to take more notice of duplicated code and it's removal. Not only can it make your code smaller and easier to read, sometimes better design and abstractions materialise as a result of simply removing duplication (essentially the hypothesis behind my PhD thesis).

Posted by ivan at 8:55 PM Copyright (c) 2004-2008 Ivan Moore | Comments (3)

September 4, 2006

Programming in the small - formatting.

Following on from the previous article in the series, another "programming in the small" thing that I look out for is formatting. Good formatting helps make the code easier to read - bad formatting does the opposite.

Formatting should match structure.

Consider the following code:

	public void foo(){
		if(someCondition())
			doSomething();
			doSomethingElse();
	}

At first glance, it looks like doSomethingElse() is only executed if someCondition() is true. However,
it is executed no matter what the value of the condition is. Possibly the code was originally:

	public void foo(){
		if(someCondition())
			doSomething();
	}

and someone had to add doSomethingElse() to be executed after doSomething(). This is why I prefer to always use the {}s even when syntactically redundant.

In Python, the indentation of code is significant - personally, I like this - some people hate it. The reason I like it is because I think code should be formatted with indentation that matches the logical structure of the code, and by making indentation significant, Python enforces this, while at the same time removing the necessity for syntax to delimit code blocks (e.g. {}s or end or equivalent keywords.

Formatting shouldn't waste space

I prefer formatting that doesn't use too much space - in particular, vertical space. Even large screens don't allow for loads of lines vertically. Generally, methods/functions shouldn't be that long anyway, but by using vertical space wisely, you can see more code (e.g. several methods) in one screen without compromising readability.

In particular, multiple blank lines (instead of just one), or blank lines that do nothing for the readability of the code, drive me a bit nuts. I also prefer not to have symmetrical {}s - i.e. not having { on a new line, because I don't think it adds anything to readability (if indentation is correct) but just takes more vertical space. (I just found an article which names different bracing styles - apparently, the style I favour is called "K&R").

When talking about formatting with Romilly Cocking he mentioned about the fact that our vision requires things that we read to be within quite a small region of our field of view - (see macula) - and for that reason it's best to try to keep code reasonably compact. (I hope I've represented what Romilly said - hopefully he'll blog about it.).

Formatting doesn't have to be perfect, but shouldn't be sloppy either

Obviously sloppily formatted code shows a lack of care of code quality - which can be distracting. On the other hand, minor formatting inconsistencies - that is, ones that you have to look for (e.g. using an automated tool) - are probably not worth looking for. In general, if in doubt about formatting, I recommend that the team finds an automated formatter (typically whatever is built in to your IDE) that you can run on your code if you want. However, I wouldn't have code always auto-formatted because that doesn't always do what you want, and I think sometimes you should "get over" minor inconsistencies, but take more important formatting issues seriously.

I'll tell you what I want, what I really, really want.

I'd like developers to take care over the appearance of code, as well as the contents. It's not the most important thing, but it is important.

Posted by ivan at 9:34 PM Copyright (c) 2004-2008 Ivan Moore | Comments (3)

August 6, 2006

Programming in the small - conditionals.

Following on from the previous article in the series, another "programming in the small" style thing that I often see, even by people who understand non-interacting fermions, are conditionals that are more complicated than they need to be.

Can you handle the truth?

Consider the following (Java) code:

	public boolean isWhatever() {
		if(someCondition() == true){
			return true;
		}else{
			return false;
		}
	}

Must control fist of death.

someCondition() == true can be replaced by someCondition(), which is easier to read.

There's something about the existence of the "true" which I think sends my brain (and I guess others) thinking about programming rather than the meaning of the code.

For the opposite value of the condition, some people do someCondition() == false rather than !someCondition() which I have heard justified on the basis that ! is easy to miss - which is a reasonable point but again I find obscures the meaning (if you read the code rather than interpret it as a program). It's not: "if the water to make my tea is boiling hot is false then throw a hissy fit" - no - it's "if the water to make my tea is not boiling hot then throw a hissy fit".

In python the equivalent of ! is not, which I think is much more preferable - but unfortunately my programming in the small articles don't result in changes to languages quite yet. I think ! is typical of the wrong-headed "C" based nonsense that also includes each case of a switch statement needing a break - but maybe that should be another article.

Anyway, back to the plot. This leaves you with:

	public boolean isWhatever() {
		if(someCondition()){
			return true;
		}else{
			return false;
		}
	}

Must still control fist of death.

This can be replaced by:

	public boolean isWhatever() {
		return someCondition();
	}

less is most definitely more. I've also seen someCondition() ? true : false, which is also pointless.

Do two wrongs make a right?

David Peterson suggested another category of over complicated conditional (thanks David) - where the conditional is negatively named, for example:

if (!isTeaNotHot() && isCupNotEmpty()) { bewareOfSpillage(); }

is much easier to read with positively named methods:

if (isTeaHot() && !isCupEmpty()) { bewareOfSpillage(); }

The truth will set you free.

Other conditional related stuff that annoys me (but less obviously - I probably won't spill my tea when faced with such code) is where there are more if statements than necessary, that could be replaced by judicious use of &&, || and ==.

Consider the following code:

	public boolean isWhatever() {
		if (someCondition()) {
			if (someOtherCondition()) {
				return true;
			} else if (yetAnotherCondition()) {
				return true;
			} 
		}
		return false;
	}

this can be replaced by:

	public boolean isWhatever() {
		return someCondition() && (someOtherCondition() || yetAnotherCondition());
	}

which is so much easier to understand (given someCondition(), someOtherCondition() and yetAnotherCondition() mean something). In general, I think fewer if statements lead to a more easily understood meaning.

How true.

Something based on my current reality:

	public boolean successfullJobInterview(boolean candidateIsSuitable, boolean candidateIsOfferedJob) {
		if (!candidateIsSuitable && candidateIsOfferedJob) {
			return false;
		}
		if (candidateIsSuitable && !candidateIsOfferedJob) {
			return false;
		}
		return true;
	}

it looks reasonably logical and simple - how could this be improved?

	public boolean successfullJobInterview(boolean candidateIsSuitable, boolean candidateIsOfferedJob) {
		return candidateIsSuitable == candidateIsOfferedJob;
	}

which is simpler if you think about it.

Posted by ivan at 9:02 AM Copyright (c) 2004-2008 Ivan Moore | Comments (11)

July 24, 2006

Closures for refactoring

Comments about my previous article reminded me (in an indirect sort of way) of another "programming in the small" thing - using closures for refactoring.

Closures - good for collections

Fans of languages with closures often show collection processing sorts of things - like most of the examples in one of Martin Fowler's articles that I showed python translations for, and python's alternative to this sort of thing - called "list comprehensions". However, closures also allow for some neat refactorings independent of anything to do with collections.

Duplication that looks difficult to get rid of

Let's say you've been working with David and Ade and you've produced the definitive way of reading a byte from a file. Later you find you need to read a UTF encoded string from a file. You use the same "pattern" (in a loose sense of the word) in both cases.
	int readByteFromFile(String fileName) throws IOException {
		InputStream inputStream = new FileInputStream(fileName);
		try {
			return inputStream.read();
		} finally {
			try {
				inputStream.close();
			} catch (IOException ex) {
				log("Hi Ade.");
			}
		}
	}
	
	String readStringFromFile(String fileName) throws IOException {
		InputStream inputStream = new FileInputStream(fileName);
		try {
			DataInputStream dataInput = new DataInputStream(inputStream);
			return dataInput.readUTF();
		} finally {
			try {
				inputStream.close();
			} catch (IOException ex) {
				log("Hi Ade.");
			}
		}
	}
You don't want to have to keep repeating this every time you want to do something similar, but slightly different. How do you refactor the similarities between these methods? The fact is that in Java I see lots of this sort of duplication - because in Java, getting rid of this sort of duplication is often verbose and clunky. However, it is possible, and sometimes very worthwhile.

Using closures to get rid of the duplication

Java provides a way to define some sort of closures - using anonymous inner classes. Here's how you could remove the duplication in this example. It's quite painful. My fingers hurt as I typed the code. Then I got a headache.
	interface DoWithFile {
		Object execute(FileInputStream inputStream) throws IOException;
	}

	public Object withFile(String fileName, DoWithFile runThis) throws IOException {
		FileInputStream inputStream = new FileInputStream(fileName);
		try {
			return runThis.execute(inputStream);
		} finally {
			try {
				inputStream.close();
			} catch (IOException ex) {
				DuplicatedCode.log("Hi Ade.");
			}
		}
	}

	int readByteFromFile(String fileName) throws IOException {
		return ((Integer) withFile(fileName,
				new DoWithFile() {
					public Object execute(FileInputStream inputStream)
							throws IOException {
						return inputStream.read();
					}
				})).intValue();
	}

	String readStringFromFile(String fileName) throws IOException {
		return (String) withFile(fileName, new DoWithFile() {
			public Object execute(FileInputStream inputStream)
					throws IOException {
				DataInputStream dataInput = new DataInputStream(inputStream);
				return dataInput.readUTF();
			}
		});
	}
Despite how horrible this looks - it is sometimes worth it.

... and in Python

In languages with better support for closures (or at least support for passing functions around), it can look a whole lot better. A direct translation of the original code in Python (taking the liberty of defining a class DataInputStream) would be:
def readByteFromFile(fileName):
	inputStream = file(fileName)
	try:
		return inputStream.read(1)
	finally:
		try:
			inputStream.close()
		except:
			log("Hi Ade.")

def readStringFromFile(fileName):
	inputStream = file(fileName)
	try:
		dataInput = DataInputStream(inputStream)
		return dataInput.readUTF()
	finally:
		try:
			inputStream.close()
		except:
			log("Hi Ade.")
Passing a function as a parameter (as a closure) gives:
def withFile(fileName, function):
	inputStream = file(fileName)
	try:
		return function(inputStream)
	finally:
		try:
			inputStream.close()
		except:
			log("Hi Ade.")

def readByteFromFile(fileName):
	def execute(inputStream):
		return inputStream.read(1)
	return withFile(fileName, execute)

def readStringFromFile(fileName):
	def execute(inputStream):
		dataInput = DataInputStream(inputStream)
		return dataInput.readUTF()
	return withFile(fileName, execute)
which hurts much less than the Java version (but slightly more than the Smalltalk version would - but that's another story). [Martin Fowler shows in his article how a Ruby caller of similar code would look "File.open(filename) {|f| doSomethingWithFile(f)}".] Python doesn't have great syntax for defining closures - it would be nice to be able to define the two "execute" functions anonymously. You can do this for one of the methods using "lambda" but not the other because of the limitations of "lambda". In Smalltalk or Ruby the syntax for this is better. Here's the "lambda" version of the "readByteFromFile" function.
def readByteFromFile(fileName):
	return withFile(fileName, lambda inputStream: inputStream.read(1))

I'll tell you what I want, what I really, really want, because David wants to know.

I'd like Java to have better support for closures. I'm impressed with C#'s anonymous delegates, although, it's still not as good as Smalltalk. I'd also like developers to put more effort into reducing duplication, even when the language gets in the way.
Posted by ivan at 9:09 PM Copyright (c) 2004-2008 Ivan Moore | Comments (3)

July 18, 2006

Programming in the small - finally.

Following on from the previous article in the series, there are some "programming in the small" style things that I don't see as often as I'd expect.

"Non-structured programming" - using finally

Consider the following (Java) code:

	int readValue() throws IOException {
		InputStream file = new FileInputStream("temp.txt");
		int result;
		try {
			result = file.read();
		} finally {
			file.close();
		}
		return result;
	}

This can be written slightly shorter, and I think slightly better:

	int readValue() throws IOException {
		InputStream file = new FileInputStream("temp.txt");
		try {
			return file.read();
		} finally {
			file.close();
		}
	}

However, I hardly ever see other people using this style. This way of using finally came up when writing a previous article with Bjorn Freeman-Benson - he likes this style too. I don't expect to see this style all the time - just a lot more than I do, which is hardly ever.

Comments:

Comments aren't working at the moment, but here's one by email from David Peterson (and he's right, you know. I've altered the code as David's suggestion - I originally had the code working but a bit wrong due to "brain fart"). (David's "I got rejected (blacklisted)" applies to everyone at the moment).

Hi Ivan,

It's better not to put the creation of the stream within the try.. finally. If the constructor throws an exception then you will end up calling close() on a null reference.

So, an even shorter (and safer) version is...

    int readValue() throws IOException {
        InputStream stream = new FileInputStream("temp.txt");
        try {
            return stream.read();
        } finally {
            stream.close();
        }
    }

David

P.S. I tried posting a comment, but I got rejected (blacklisted). :-(

Another comment, from Adewale Oshineye:

There's a reason people don't write methods like the one in your
example. The IOException that's thrown can be misleading. The call to
read() can throw an exception which then gets buried if the call to
close() also throws an exception.

The smallest safe version of that method looks something like:

int readValue() throws IOException {
       InputStream inputStream = new FileInputStream("temp.txt");
       try {
               return inputStream.read();
       } finally {
               try {
                       inputStream.close();
               } catch (IOException e) {
                       //we can't do anything about this except log it
                       LOG.warn("Problem closing inputStream", e);
               }
       }
}

My response: Thanks for the comment - I should have picked a better example - catching the possible IOException from the close is unrelated to the point of having a return inside a try finally and some code you want to execute after the return and my example has detracted from that point.

Posted by ivan at 8:39 PM Copyright (c) 2004-2008 Ivan Moore | Comments (2)

June 26, 2006

Programming in the small - comments.

Following on from the previous article in the series, another "programming in the small" thing that I look out for is comments.

I've been indecisive about publishing this article because it seems so obvious; but I know from seeing code at dozens of different clients over the years that somehow it isn't. I think it's another example of a cargo cult.

Pointless comments

A comment on a method "getName" that says "returns the name" doesn't tell you much. I've seen many such comments; usually the result of a project standard whereby all methods have to have a comment. You may think that such a comment is harmless - but it isn't because it takes time to write, time to read and makes the source longer without adding anything.

Misleading comments

Even worse than pointless comments are misleading ones. I think it was Weinberg in The Psychology of computer programming who made an observation that when debugging, misleading comments can make it take longer to work out what's happening than if you delete all the comments. Misleading comments can lead you into a truth trap.

Useful comments that shouldn't exist

Sometimes comments can be useful; for example telling you that some lines of code together mean something, for example "estimated journey time". However, it would be better to extract those lines of code into a method called, for example, "estimatedJourneyTime". Often comments can be an indication that the names of a variable should be improved, or a method extracted.

Beck on comments

Kent Beck wrote about comments in Smalltalk best practice patterns - worth reading whether you use Smalltalk or not. I was so pleased to read what he wrote about comments - he expressed clearly what I felt at the time with the code I was working with then.

Not all comments are bad

Sometimes comments are good.

Education, education, education

I think the over-use of comments (and "structured programming") comes from University computer science courses, so maybe I shouldn't be suprised by the prevalence of a belief that comments are good (and many developers, or their bosses, feel they "should" be writing more comments).

Blog Comments

From Malte Finsterwalder:

Hi Ivan,

I always remember a quote from the book "Code Complete" : "Don't comment
tricky code - rewrite it!"

Also an interesting observation: Heavily documented code contains more
errors!

Greetings,
Malte

Posted by ivan at 9:37 PM Copyright (c) 2004-2008 Ivan Moore

April 25, 2006

Programming in the small - warnings.

Following on from the previous article in the series, another "programming in the small" thing that annoys me is when warnings from the IDE are ignored.

This is more a matter of taste than previous articles - but it is based on reasoning rather than following a cargo cult.

The use of warnings

If you look at the warnings from your IDE, they sometimes indicate the chance to do the best refactoring - deleting code. Eclipse, for example, will warn of unused imports, unassigned fields, and uncalled private methods. These are warnings I want developers to act on - and delete unused code. Unused code is code you might accidentally read, keep compiling, or slow you down from seeing the code that's actually used.

Crying wolf

Sometimes IDEs provide warnings for things that aren't really a problem. The result of that is that in a lot of teams, developers just ignore all warnings, as the IDE has "cried wolf" too many times for warnings to be taken seriously any more.

I like to set the warnings of the IDE such that the ones I'm interested in are switched on, and the ones that cause "false positives" are switched off. I prefer to have no warnings at all, with some warnings that might sometimes be useful switched off, than to have more warnings but have to ignore some of them.

It's much easier to tell if a new warning has appeared when there are usually none. If you don't have the "false positive" warnings switched off (because you want some warning that is sometimes useful) the problem is that then you've got to remember which of the warnings you've got are the "false positive" ones, which can waste your time looking through them.

Don't check-in warnings

If the warnings are set such that there are no "false positives", then you know that if there are any warnings, you need to take notice of them, see what they are saying, and take action. Therefore, with warnings set appropriately, the team can follow a simple rule of not checking-in warnings - just like they wouldn't check-in compilation errors or code that failed the automated tests.

I want more warnings

If you have no warnings in your code, it makes it much more amenable to running the splendid "find bugs" Eclipse plug-in. If you start off with no warnings, then you can run "find bugs" and know that all of the warnings are "find bugs" ones. Although "find bugs" is really good - it can produce too many "false positives" for me to want to run it all the time or to have a "don't check in warnings" policy apply to "find bugs" warnings, so I only run and look at "find bugs" warnings occassionally.

I'll tell you what I want, what I really, really want.

It would be great if Eclipse (or other IDEs) (and "find bugs") allowed you to ignore specific warnings - not just types of warnings. Then you could have the warnings set higher, and ignore "false positives" (that you have checked manually once) so you don't have to switch "often useful but sometimes bogus" warnings off.

Posted by ivan at 7:20 AM Copyright (c) 2004-2008 Ivan Moore

March 31, 2006

Programming in the small - structured programming

Following on from the previous article in the series, another programming habit I see frequently is following the structured programming rule of having only one exit point from a function/method.

Cargo Cults

I've been prompted to write this article (which I was previously considering but thought might be seen too much a matter of taste rather than anything more definitive than that) by Dave Thomas's talk at SPA 2006. He explained the phenomina of Cargo Cults - which I won't fully describe here - but in a nutshell, it's doing something because it used to work but not considering whether it's still appropriate.

Dave mentioned structured programming - in particular the GOTO statement as a Cargo Cult. (I couldn't find much of a write-up of his talk - I might write more myself some time ...).

I think that the "single exit point" rule of structured programming is a Cargo Cult (but I don't think that the absence of GOTO statements in languages is).

Structured programming considered harmful

Consider the following function (and pretend the ternary operator isn't available):

	void foo(boolean condition) {
		int returnValue;
		if(condition){
			returnValue = 5;
		}else{
			returnValue = 6;
		}
		return returnValue;
	}

I've seen lots of code like this over the years. I'm sure it's because people have been taught at some point (and I think it's at University) that a function should only have one exit point, and so that's what they do.

The improved version - multiple exit points

Compare the code above with this version:

	void foo(boolean condition) {
		if(condition){
			return 5;
		}else{
			return 6;
		}
	}

It's slightly shorter, and I think, easier to understand.

In the "single exit point" version, the declaration of "returnValue" is unnecessary fluff that I have to read, that is less direct and clear than just doing the returns when you need them. Unless I read the whole function (in this case mercifully short) I don't know whether "returnValue" gets used for other things during the function.

Longer functions that use the "single exit point" rule can get really convoluted and difficult to understand. Having a function return as soon as it can improves the clarity because it improves the "locality" of the code. It's much the same philosophy as behind a previous article in the series - declaring variables as locally as possible.

At a return statement you know what the code is going to do next, you don't need to read through the rest of the function to work out that none of it's relevant for that path (well, there is an exception to that - a finally clause, but that's another article).

I've seen code that has become really convoluted as a result of trying to follow the "single exit point" rule and I just can't see any benefit.

I'll tell you what I want, what I really, really want.

What I'd like is for people to think about whether having a "single exit point" is a good thing rather than just believing the accepted wisdom and taking it as true. It's become such a strong "Cargo Cult" that it'll take many years until it disappears. To me it seems as unchallengable as the idea that water goes down a plug hole clockwise in one hemisphere and anti-clockwise in the other.

Posted by ivan at 7:51 AM Copyright (c) 2004-2008 Ivan Moore | Comments (10)

March 4, 2006

Programming in the small - access levels.

Following on from the previous article, another programming habit I see frequently is using access levels (e.g. in Java, private, package, protected and public) that are less restrictive than possible. For example, declaring a method public that could be private.

In this article, I'll call the access levels that are less restrictive as being "more public" or "less private" etc.

Declaring class members "too public".

Probably the most frequent example of this habit is that of declaring class members as protected when they could be private. I believe that people do this because of past experiences where they wanted to be able to override some method and found they couldn't - I'll address that later.

Something that I've noticed in my travels around many client sites is a suprisingly large number of Java developers who think protected means something different than it actually does (in fact, even that private means something different than it actually does - it's class based not instance based). I won't describe what these access levels actually mean - it's well documented.

The consequences

Class members with "less public" access levels can be referenced in fewer classes than members with "more public" access levels - that's the point of access levels. That means that when reading some code, if you see that it is declared "more public" you know instantly that there are more classes that might reference (hence more dependencies) than if it's "more private".

You might guess that, for example, if you change a public method, it'll probably have an effect on some other code (in a different class) because the chances are that there is some other code (in a different class) calling the method, because it's declared public. Therefore, before changing a public method, you might want to look at references to the method and check that callers of the method are OK with the changes you are making.

If a member is declared private, then you know that only methods of that class can be using the member, so you are free to refactor private methods by only looking at code in that class.

Therefore, if a class member is declared to be "more public" than it could be, you might be put off refactoring it because of possible effects on code in other classes, or you might waste time looking for references, when if it was private you might see immediately the only references.

You could, of course, look for all references to the member, but because it's not immediate you won't do that all the time.

What about my experience where I couldn't override something because it was declared private?

This experience is often a symptom of something else:

Lack of collective code ownership

If the code that you are dealing with is all in-house, then you should be able to make a member more public when you need to. That is, make everything as private as it can be for your current needs - making it more public only if you find you need to. This fails if there is rigid partitioning of a code base. Collective code ownership is needed for this to work well in practice.

Closed source libraries

If you are writing or using a closed source library, there may be times when members have to be declared more public in order to allow for future unforseen uses of the code.

Use of inheritance instead of composition for customisation

One of the reasons libraries may need members to be declared more public than possible is if the way the library is designed is for users to extend classes using inheritance. There are many cases where using composition would be a better design.

I'll tell you what I want, what I really, really want.

It would be great if Eclipse (or other IDEs) spotted that members could be more private than they have been declared and offer to change them for you. I'm sure something like this must exist - please post here telling me. Note that I want a tool to tell me immediately, not one that has to be run separately, because any lack of immediacy will mean it won't be as useful.

A few years ago I thought of writing a tool to do this that I was going to call "Thatcher" - it would privatize as much as possible. I haven't got around to it...

Posted by ivan at 2:49 PM Copyright (c) 2004-2008 Ivan Moore | Comments (2)

February 19, 2006

Programming in the small - variable declarations.

There are some programming habits that I see very frequently that lead to code that's more difficult to understand and maintain than necessary. I'll just cover one in this article and then maybe others in future articles.

Variable declarations in outer scope.

Consider the run method of Graph2Csv from the previous article. I often see code where variables are declared outside of the scope where they are used. For example:

	private void run() throws IOException {
		BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
		String line = input.readLine();
		String[] parts;
		String source;
		String destination;
		while(line != null){
			if(line.contains(" -> ")){
				parts = line.split(" -> ");
				source = parts[0];
				destination = parts[1].substring(0,parts[1].length()-1);
				...

The variables parts, source, destination could have been declared inside the loop (as shown previously).

I think this habit comes from experience with some other languages, or earlier education, in which it was either necessary, or deemed good practice, to declare variables at the beginning of functions/procedures/methods.

The consequences

Delaring these variables outside the loop takes slightly more lines without being any clearer. You don't know, without reading through the whole method, whether the variables are accessed later in the method (if declared within the loop then, of course, they can't possibly be being used outside of the loop).

Extracting methods

It's not as obvious whether the code within the loop can be extracted as a method. Looking at this code, I'm wondering whether:

			if(line.contains(" -> ")){
				parts = line.split(" -> ");
				source = parts[0];
				destination = parts[1].substring(0,parts[1].length()-1);
				node(destination).incoming++;
				node(source).outgoing++;
			}

can be extracted as a method - maybe called "parseLine", but having the declarations outside of this scope makes me unsure. Fortunately, Eclipse is good enough to do the refactoring correctly, even if it doesn't do a fantastic job of it. You end up with the declarations of parts, source, destination being left behind but unused.

Where it gets really ugly

The declaration of variables in an outer scope isn't such a big deal in this example because the code is small and simple enough, but when combined with long methods, particularly if the variable is used later in the method for some other purpose (e.g. if there was another loop later in the method), it can be really difficult to see how to split a long method up into understandable parts.

I'll tell you what I want, what I really, really want.

It would be great if Eclipse (or other IDEs) spotted that these variable declarations could be put into an inner scope and offer to refactor the code to move the declarations for you (if you know of a tool or IDE that does this, please post a comment).

Posted by ivan at 11:07 AM Copyright (c) 2004-2008 Ivan Moore | Comments (9)