Martin Odersky, the creator of Scala, is running an online course on Coursera starting September 15th of this year (2014). It's an advanced course meant for those who already know a programming language like Java or C#, although experience with languages like C/C++, Python, JavaScript or Ruby will also work. Since I am actually in the process of reading Martin Odersky's excellent Scala book, this works out nicely for me. The first edition of that book is available for online reading too.
Scala is an attempt to create a language for the JVM that is completely compatible with existing Java code but pushes further than Java. Scala adds things like lambdas and functional programming concepts while trying to address criticisms of Java like its verbosity. Scala is a language that has been picking up steam recently and it seems to be where the cool JVM people are hanging out (assuming it's possible to be cool with Java).
Ok, one more ridiculous video: is that a Mac Plus? Pff, Java doesn't run on that.
Tuesday, August 26, 2014
final Keyword in Java
I like the "final" keyword in Java. However, I'd like it more if every reference were final by default and "mutable" were the keyword needed to create a mutable reference. This way around is better because if you see "mutable" in a reference declaration then you know that the author took the time to type it because they are mutating the reference. Mutation is the case you want to watch out for and discourage. Today, if a reference isn't marked as final you don't know whether that means it's mutated or the programmer just isn't using final. In my experience mutability is the rarity and final is the common case.
Unlike C++, Java doesn't have a way of making both the reference and the object constant. In Java, the declaration:
public class Foo {
    private final Date someDate = new Date();
    // ....
Doesn't stop you from mutating the object like this:
someDate.setTime(1234);
It just stops you from changing the reference like this:
someDate = new Date(); // not allowed, reference is final
Some programmers use the final keyword to mean that not only shouldn't the reference change but the object shouldn't either. Overloading final in this way is not a good idea. That's simply not what final means, and doing so doesn't actually help anything.
I am generally against using language keywords like final to mean something other than what the compiler can assert from them. Marking a reference final doesn't guarantee that the object won't change. You might as well add a comment to the declaration like this:
public class Foo {
    // someDate is not mutated
    private final Date someDate = new Date();
    // ....
Because it has about the same chance of either being honoured or going out of date.
Additionally, knowing that the reference is final (not the object) is still very useful on its own. This is why C++ has the ability to make both the object and reference constant independently. By using final to mean the object and reference shouldn't change you confuse maintainers with an idiosyncratic style and destroy the utility of final-for-references-only.
If you want to create an object that doesn't change then create an immutable object. (see the link)
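To make that concrete, here's a minimal sketch of an immutable class (a made-up ImmutableEvent, not from any particular library): all fields are final, there are no setters, and mutable inputs like Date are defensively copied.

import java.util.Date;

public final class ImmutableEvent {
    private final String name;
    private final Date when; // Date itself is mutable, so never share it directly

    public ImmutableEvent(String name, Date when) {
        this.name = name;
        this.when = new Date(when.getTime()); // defensive copy in
    }

    public String getName() {
        return name;
    }

    public Date getWhen() {
        return new Date(when.getTime()); // defensive copy out
    }
}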
I have to say that I personally don't make function parameters final because it's too much bother. I've found it looks like noise in the code and other developers resent it. Instead I have the compiler warn me if there's any assignment to a method parameter, which accomplishes the same thing.
If you are using Eclipse you can turn that on by going to Window -> Preferences -> Java -> Compiler -> Errors/Warnings and turning on the "Code Style -> Parameter Assignment" warning. Go ahead, make it an "Error" if you dare.
Marking an object's members as final is much more useful. I've always thought of object members as mini-global variables. Global variables are bad but global constants are less so. When followed religiously, it allows the maintainer to see at a glance which references are being mutated or, ideally, that none are. When combined with extensive use of immutable objects it also allows you to quickly see what is being mutated, if anything.
While I don't bother marking method parameters final, I try to mark every class and object member I can as final. I've gone so far as to rewrite code to mark more members final. I find it helps me understand my own code and other people's code faster, and it helps me avoid errors too.
Friday, August 22, 2014
Evangelism in Software
"Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats." - Howard H. Aiken
Imagine that one day you're sitting in your office coding away and suddenly you hit upon an idea for a glorious new framework that allows you to solve not only your problem but also problems you know others are working on. You write up your framework, make sure it's unit tested, put it in the libraries project for all to use and you're done.
Well, no you're not. While the framework might be available, no one knows about it yet. So, no problem, you send an email to the developers' mailing list and, for whatever reason, people still aren't using it. No problem, you've done enough code reviews to know that people need a bit of background knowledge when using the library, so you create a presentation that explains why the library is awesome and present it to everyone. It seems to go well, but even then, no one uses it. Everyone keeps doing it the long way. What is going on?
Insight is special because it must be earned with experience; long hours getting burnt doing silly things over and over again until you see the light of a better way. The more revolutionary the insight the more likely that people won't understand or appreciate it.
For example, back in the late 1950s, John Backus and a small team at IBM developed Fortran, the first high-level language capable of being used on the limited computers of the time. It was designed to make the act of programming quicker and easier. It was an incredible technical accomplishment. It was also treated dismissively by many in the industry who thought it didn't make programming easier at all, or that they didn't need it because they already found programming easy. They preferred to stick to assembly.
This pattern repeats with structured programming, object oriented programming, revision control, garbage collectors, lambdas and even today with functional programming.
The worst thing about any revolutionary technology is it makes your knowledge obsolete. There's a cost to throwing out what you know, and the more you know the more you need to throw out. If you take a look at the list of things above, they were all heavily hyped as silver bullets when they came out. Not all of them lived up to the hype... and these were the ideas that succeeded. Writing frameworks is hard and very few are done well. It's hardly surprising people are sceptical.
In order to get any change accepted you need:
- Buy-in from those who will be affected. Change isn't always good and you need people to believe that your change is a change for the better.
- Something understandable. The steeper the learning curve the more you have to work to convince people it will be worth the payoff.
- A significant payoff. The advantages of your new framework should be easy to articulate; work on your elevator statement.
Once you've made your first converts things get easier because they will evangelize on your behalf. At least they should. If you can't even get the people who work with you in the same domain to take an interest you might not have the revolutionary new framework you think you do. In this case listen to the feedback they are giving. They might have some insight that will make your framework better.
I can tell you, creating a new API that other people actually use is very difficult. It's as much a political challenge as it is a technical one. That said, sometimes the results of all that effort can be revolutionary.
Wednesday, August 20, 2014
Coding Standards - not blank!
In answer to the question: "Where did the blog post on coding standards go?" Much like the One Ring, it's still secret and it's still safe; I haven't finished writing it yet. I'm not sure why that blank post showed up but I think I accidentally clicked the publish button before I started writing. It does work on a symbolic level, though. The worst coding standard is the blank coding standard.
Every large team should have a coding standard that defines how code should be formatted. These coding standards make code quicker and easier to read because the style is consistent and once you learn it your brain can work via pattern recognition - that is, reading code by processing its shape, not its content. It's the same sort of thing the brain does when reading normal text. That's what makes "italics" and ALL CAPS more difficult to read: your brain has to work harder because it can't just match shapes, it has to do some decoding first.
I don't agree with the theory that one coding standard is any better than another. The most readable coding standard is the most familiar one. The K&R, BSD, or GNOME coding standard is easy to read because it's familiar. The only point I'd make is that it would be nice if everyone used the same coding standard. Come to think of it, why aren't C-like languages defined in such a way that limits the different conventions possible... like the position of the {}? I suppose the original reason was that C was intended to provide a flexible toolkit for creating as many arguments as possible. Using C and its derivatives, it's possible to create lengthy discussions that go on for hours without conclusion. It's also possible to create bitter team divisions as the brain's legacy limbic system switches on and forms tribes around different ideals. Hallelujah! and Amen.
:-)
Thursday, August 14, 2014
Using Multiple Monitors
I've been an advocate of multiple monitors ever since I realized that productivity wasn't decadence. I have three monitors attached to my computer, all of them Dell U2211-H panels. The U2211-H has a 16:9 aspect ratio, which makes them a little too wide. Text is best read in long, thin columns so the extra width is overkill for that. Also, if you have three 16:9 screens you need a really wide desk. 4:3 is the perfect ratio for multiple computer screens. If you're working on text all day, a portrait-oriented 4:3 is even better since you can fit more text on the screen that way. It's also close to the ratio of a standard 8.5 x 11 sheet of paper.
The main problem with 4:3 screens is that they are very expensive new. The rumour is that most computer monitors these days are simply a repackaging of HDTVs. Even the 16:10 aspect ratio is getting hard to find and 16:10 is better than 16:9.
My three 16:9 monitors are laid out with the leftmost one in portrait mode and the other two in landscape. I put my code editor and email client on the portrait screen and use the two landscape monitors for everything else. Having a 16:9 monitor in portrait mode is comically tall but works really well for code and emails. Using a 16:9 monitor in portrait mode allows me to see 95 lines of code on the screen at once (with a nice big font!) without having to scroll. To get the same effect from a landscape 16:9 monitor you'd have to almost double the display size.
I use my computer for both gaming and coding. Having the leftmost monitor in portrait mode is a compromise. It means that whenever I code I move my mouse and keyboard over to the left so that this screen becomes my primary screen. Then, once the day is over, I slide back over to use my centre, landscape monitor for games and web browsing. My monitors are actually on a pivot, so I should try just pivoting the centre monitor while I'm working and then pivoting it back when I'm done. Maybe I should put all three monitors in portrait mode, since the bulk of what I do involves reading code or text. I'll have to try that over the next few days and get back to you.
Recent window management features added in Windows 7 make a super-large display a viable alternative to a multi-monitor setup. Windows 7 has the ability to snap a window to the left or right half of the screen, which makes managing two windows on the screen at once far easier. The keyboard shortcut for this is Windows key + left or right arrow.
That's really the only difference between multiple monitors and one massive monitor - software support. With multiple monitors the OS knows you want to use each monitor as a separate region, so that works already. If you're using one giant monitor the OS has no idea what to do. Do you want one big window or a bunch of regions or what? Back when I bought my three monitors I didn't know about the Windows 7 snap features, so I chose multiple monitors rather than rely on software that might not be there. Oh well... I still love my three monitors.
Wednesday, August 13, 2014
Software Development Estimates
Programmer estimates are notoriously bad. Having been in the position both to give estimates and to be blocked by someone else's faulty estimate, I can say the whole process is deeply frustrating for everyone.
I've been trying to figure out what is going on with task estimation for a while. There are plenty of theories on the internet. Many say programmers are just too optimistic. That's undoubtedly part of it but in terms of a theory with predictive and explanatory power it's like saying that programmers underestimate tasks because their estimates are always too short.
A group of psychologists first proposed a theoretical basis for... I feel I should say "optimism"... in a 1979 paper called "Intuitive prediction: biases and corrective procedures". This pattern of optimistic task estimation has shown up in tax form completion, school work, origami and a bunch of other things. From this we can conclude that programmers don't suck at task estimation, human beings suck at task estimation. A condemnation of humanity but an encouraging result for those who still cling to the ridiculous notion that programmers are people. The phenomenon is called the "Planning Fallacy" and Wikipedia has a great summary if you're interested. I estimate it will take you 5 seconds to read so go ahead.
Optimistic estimates are bad enough, but organizations will often add estimate distortions of their own.
- Fabricate a schedule without consulting anyone who will be taking part in the project.
- Ask developers for estimates and then cut them down because they are too high or those wacky developers are "padding" their estimates.
- Ask for estimates from developers without there being a detailed plan to estimate.
- Use estimates for a plan that's not being used anymore. Sometimes this happens because additional features were added. Sometimes it's because the team you are integrating with hit a technical wall and had to change everything. Sometimes people are still using the pre-project estimates when the task estimates are available.
- Ask a lead developer for an estimate then give the job to an intern.
I've been on more projects than I can count and it didn't seem to matter who was on the project or how many times they'd been through the process: at least one of these things happened. The last project I was on had the last three estimation pathologies show up at one point or another.
Once a project is behind schedule and you have a deadline to meet, all the options start to suck.
- You can't add more people to a project to make it go faster because "Adding more people to a late project makes it later." - Brooks's law.
- If you cut scope (which is normally the best option) then you might not deliver what you need to.
- If you put pressure on the team to deliver functionality you'll get a buggy mess that demos OK but you can't give that to anyone. Once you're in this situation it takes forever to get stability back.
- If you insist on longer hours nothing much happens, because productivity per hour drops off so quickly that it cancels out the extra hours worked. Also, quality of life goes to hell.
- You can't do nothing because you'll get a Death March.
There have been many attempts to fix this estimation mess, some more successful than others. The current best practice is to use some variant of agile/scrum. I say "some variant" because agile programming has many forms and not all of them are understood properly.
Agile software development turns the estimate problem on its head by admitting that estimates are likely to be wrong and that features will be added or removed, so why not deal with it. The first thing agile does is try to compute a fudge factor for the task estimates. The assumption is that the estimates are relatively accurate; they are just off by some constant factor X. If you can figure out this factor you can multiply all estimates by X and get a more accurate number. In practice this helps but isn't a panacea.
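As a rough illustration of the arithmetic (a hypothetical sketch, not taken from any particular agile tool), the fudge factor X can be computed from tasks you've already finished and then applied to new estimates:

import java.util.Arrays;
import java.util.List;

public class FudgeFactor {
    // X = total actual hours / total estimated hours for completed tasks.
    static double compute(List<double[]> completedTasks) {
        double estimated = 0, actual = 0;
        for (double[] task : completedTasks) {
            estimated += task[0]; // hours estimated
            actual += task[1];    // hours actually taken
        }
        return actual / estimated;
    }

    public static void main(String[] args) {
        // (estimated, actual) hours for finished tasks -- made-up numbers
        List<double[]> done = Arrays.asList(
                new double[]{4, 7}, new double[]{8, 13}, new double[]{2, 3});
        double x = compute(done); // about 1.64 with these numbers
        System.out.printf("Fudge factor X = %.2f%n", x);
        System.out.printf("A new 6-hour estimate becomes %.1f hours%n", 6 * x);
    }
}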
The second thing Agile does is it seeks to minimize the problems bad estimates cause. This may seem like giving up because it is. As far as I can tell, no one in the software development industry has managed to solve this problem outside of some very special cases. The best strategy is to plan for bad estimates.
With agile, programming task estimates are re-calculated on a weekly basis so that estimates can include the latest information as it's discovered. The focus is on keeping estimates up to date and communicating them instead of realizing a feature is too large when it's impossible to do anything about it. Additionally, features are done in order of importance; there's an ordered product backlog with the most critical things at the top of the pile. This way, developers know what is important and what isn't and can work with that in mind. When the deadline comes around you're guaranteed to have the important things done instead of whatever was fun to program.
It's way too hard to give a good summary of Agile here so I'm going to point to some resources:
- Wikipedia's Agile Software Development page is a good starting point.
- You can also look at Scrum. Most industry standard agile best practices are some variant of Scrum.
- Joel Spolsky advocates his version which he calls Evidence Based Scheduling. I wouldn't normally include this but the page has a good explanation of where typical task estimation goes wrong.
- There are quite a few consulting groups that can help too.
Developers need a feedback loop for their estimates. At some point after the feature has been implemented (this includes debugging!), developers should get feedback as to the original estimate and the actual time taken. Most agile tools will try to capture that but it might miss important aspects like design time or the time it took to gather requirements. In any case, this information should be explicitly presented to everyone on the team (including managers). Developers rarely consider how long - in real time - things are taking or how it relates to their estimates. Compiling and presenting the numbers to the team at the end of a project, when they are likely to be receptive, is enough to start this thinking process. It also communicates that the organization cares about the accuracy of estimates. If you're curious about why something took so long, this is a good place to have that discussion.
Out of the many projects I have been on, none of them have outright failed despite the usual optimistic estimates and estimation pathologies. This is because I have always worked within organizations that were flexible enough to work with bad estimates. I am completely convinced that protecting your project against bad estimates is a realistic approach to managing estimate risk. That said, better estimates are always welcome, so watch out for organizational estimation pathologies and make sure the developers realize how long their tasks are actually taking vs. their estimates.
Thursday, August 7, 2014
Garbage Collectors
Hurray for garbage collectors!
There's been quite a lot of work put into garbage collectors these last 10 years and they have gotten a great deal better. They now pause less and are more efficient than their forebears, and a good thing too. Common wisdom has it that there's a 2X productivity difference between managing memory yourself and using a garbage-collected system, so it's a good thing they're more usable now.
I've been watching a video about modern garbage collectors online. It's called "Understanding Java Garbage Collection and what you can do about it". Take a look.
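If you want to see roughly how much work your own JVM's collectors are doing, the standard management beans expose the basic numbers. A minimal sketch (the collector names you see will depend on which GC your JVM picked):

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
    public static void main(String[] args) {
        // Create some garbage so the collectors have something to do.
        for (int i = 0; i < 1_000_000; i++) {
            String s = new String("garbage " + i);
        }
        // One bean per collector (typically a young-generation and an old-generation collector).
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time spent=" + gc.getCollectionTime() + " ms");
        }
    }
}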
Tuesday, August 5, 2014
Why do You Always Seem to be Refactoring?
Wikipedia defines refactoring as:
Code refactoring is the process of restructuring existing computer code – changing the factoring – without changing its external behavior. Refactoring improves nonfunctional attributes of the software. Advantages include improved code readability and reduced complexity to improve source code maintainability, and create a more expressive internal architecture or object model to improve extensibility...
This definition is good, but it doesn't capture how refactoring fits into the overall picture of software development on a long-lived project. On a software project that has been going for a while, most of what developers do is refactoring.
When you first start a brand new project there's tons of new code to write. There's the game engine and the code that talks to the website and then there's the website code itself. However, as the project continues you will find yourself transitioning from writing brand new things to reusing existing things in a new way.
What has happened is that over the years your project has built up a toolkit for dealing with problems in your domain. You don't need a "users" database because you already have one. Similarly, you don't need a messaging system because one already exists. If your project has gone on long enough it even has an email client in there somewhere. The project becomes more about refactoring existing code to do new things and less about adding new code.
Why build when you can reuse? As Joel pointed out in his now classic article, re-writing something is surprisingly hard. I'll let him explain it:
Yes, I know, it's just a simple function to display a window, but it has grown little hairs and stuff on it and nobody knows why. Well, I'll tell you why: those are bug fixes. One of them fixes that bug that Nancy had when she tried to install the thing on a computer that didn't have Internet Explorer. Another one fixes that bug that occurs in low memory conditions. Another one fixes that bug that occurred when the file is on a floppy disk and the user yanks out the disk in the middle. That LoadLibrary call is ugly but it makes the code work on old versions of Windows 95.
Each of these bugs took weeks of real-world usage before they were found. The programmer might have spent a couple of days reproducing the bug in the lab and fixing it. If it's like a lot of bugs, the fix might be one line of code, or it might even be a couple of characters, but a lot of work and time went into those two characters.
When you throw away code and start from scratch, you are throwing away all that knowledge. All those collected bug fixes. Years of programming work.
Re-writing something is very time consuming, with the added problem of violating "don't repeat yourself". These are the reasons why you refactor a lot on a long-lived project.
Before I go I want to point out how this fits into the previous article on Technical Debt.
On a long lived project, a new feature will typically be implemented by changing the existing code base. All aspects of the existing codebase that aren't compatible with the new direction become technical debt. You refactor the code to get rid of the technical debt and magically you have the new feature with barely any new code. This last part can confuse people because refactoring isn't supposed to change external behaviour and it doesn't here either. What's changing is the software's organization. You're taking a design that was never intended to have the new feature and turning it into a design that expects the new feature. Once you do that, the feature can sometimes be implemented in just a handful of lines of code.
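As a contrived sketch of what that looks like (hypothetical names, not from any real codebase): suppose a report writer had CSV output baked in and the new feature is JSON output. The refactoring moves the format decision behind an interface without changing what existing callers get; once that's done, the new feature really is just a handful of lines.

class Report {
    final String title;
    Report(String title) { this.title = title; }
}

// Before the refactoring: only CSV was ever anticipated.
class OldReportWriter {
    String write(Report r) {
        return "title," + r.title; // the format is hard-coded
    }
}

// After the refactoring: external behaviour is unchanged (CSV output is still CSV),
// but the design now expects more than one format.
interface ReportFormat {
    String format(Report r);
}

class CsvFormat implements ReportFormat {
    public String format(Report r) { return "title," + r.title; }
}

class ReportWriter {
    private final ReportFormat format;
    ReportWriter(ReportFormat format) { this.format = format; }
    String write(Report r) { return format.format(r); }
}

// The "new feature" is now just a few lines:
class JsonFormat implements ReportFormat {
    public String format(Report r) { return "{\"title\":\"" + r.title + "\"}"; }
}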
If you're curious about whether what you're doing is the sort of refactoring I'm talking about then read this article by Steve Rowe about when to refactor.
Until next time here is a picture of a bunny:
Monday, August 4, 2014
Technical Debt
I'd like to take today's soapbox to explain the concept of technical debt.
During the development of an application a developer faces many decisions where he can choose to do what's right for the long term or what is expedient for the short term. For example, a programmer might fix a bug in a way that is quick to do but difficult to understand or he might re-write the code to fix the bug and also keep the code easy to understand. A typical developer makes these decisions multiple times per day. (I hope you trust your developers!) The path your developers choose determines things like how easy it is to add a feature, whether anything breaks when you do and how long the changes take to stabilize.
Every time a developer chooses to do things the quick way he slows himself (and everyone else) down in the future. The thing is, the harder he pushes the quicker things get done in the short term but the longer things take in the future. Over time he will be working very hard, taking all the shortcuts he can, and not advancing at all.
Aspects of legacy code that slow you down are called technical debt. It works like ordinary, monetary debt. You can take shortcuts and spend more than you have for a while but if you keep doing it the interest will kill you.
Every project needs to hurry at some point. That's perfectly normal and it should be possible to take on technical debt for a short while. However, if you keep doing that your software project will eventually stall; you'll reach project bankruptcy.
Technical debt is usually accrued by taking shortcuts, but this is not the only way. You can also get technical debt by changing the requirements. Every time you add a feature or otherwise change the behaviour of a piece of existing software it requires changing how that software works. If many changes are required then those required changes become technical debt.
On large projects, relatively simple changes can cause a cascade of modifications across the whole software project. For example, a new feature might require a change on the backend, which might require a change to the database, which might require a change to the database update script. A seemingly small change might cause a crisis-level change by the time it hits the database. For example, supporting characters with accents not expressible in CP-1253 would require upgrading the database to UTF-8... which might not be easy if your database engine is very old. Suddenly a system that has worked fine for years becomes a big blob of technical debt.
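To make the character-set example concrete, here's a tiny sketch (the charset name and sample strings are just for illustration) of checking whether a legacy encoding can represent the text users now send you:

import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class CharsetCheck {
    public static void main(String[] args) {
        // windows-1253 is the Greek code page; it can't represent most
        // characters outside ASCII and Greek.
        CharsetEncoder legacy = Charset.forName("windows-1253").newEncoder();
        for (String s : new String[] { "hello", "Ελληνικά", "中文" }) {
            System.out.println("\"" + s + "\" fits in windows-1253? " + legacy.canEncode(s));
        }
    }
}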
On these large, long-running projects, the hardest part of any enhancement is integrating with all the stuff that's already there. Not only that, but changing one thing triggers a cascade of changes everywhere. It's not an exaggeration to say that many large software projects tend to get stuck in the mud due to bad technical debt management. The way to avoid this is to have a plan to address the technical debt pain points. (I've stolen this list from Wikipedia):
Causes for technical debt
- Business pressures, where the business considers getting something released sooner before all of the necessary changes are complete, builds up technical debt comprising those uncompleted changes.
- Lack of process or understanding, where businesses are blind to the concept of technical debt, and make decisions without considering the implications.
- Lack of building loosely coupled components, where functions are not modular, the software is not flexible enough to adapt to changes in business needs.
- Lack of test suite, which encourages quick and risky band-aids to fix bugs.
- Lack of documentation, where code is created without necessary supporting documentation. That work to create the supporting documentation represents a debt that must be paid.
- Lack of collaboration, where knowledge isn't shared around the organization and business efficiency suffers, or junior developers are not properly mentored.
- Parallel development at the same time on two or more branches can cause the buildup of technical debt because of the work that will eventually be required to merge the changes into a single source base. The more changes that are done in isolation, the more debt that is piled up.
- Delayed refactoring – As the requirements for a project evolve, it may become clear that parts of the code have become unwieldy and must be refactored in order to support future requirements. The longer that refactoring is delayed, and the more code is written to use the current form, the more debt that piles up that must be paid at the time the refactoring is finally done.
- Lack of knowledge, when the developer simply doesn't know how to write elegant code.
Have you had this experience recently?
Friday, August 1, 2014
Complex Logic in Unit Tests
There is a rule somewhere in the land of best practices that says you should avoid putting logic in unit tests. For example, if you were making a suite of unit tests that had some common logic then you wouldn't create a method to reuse code, because that would lead to the test becoming harder to read and understand.
The idea is that unit tests should read like a behavioural specification. The test should be so simple and clear that it tells the reader exactly how the function should behave without having to poke around inside different functions or crank a for loop mentally to guess what happens.
Basically,
- Don't worry so much about Don't Repeat Yourself because when you put things into variables, constants or functions you make people hunt down these pieces of code and it makes the test harder to read.
- Don't use fancy flow control like loops if you can possibly avoid it because it means the reader has to work out what the flow control is doing and that hurts readability.
I've tried this approach for several years and I can tell you that it's all crap. The ultimate result is an unmaintainable mess.
I feel I've heard these unit test arguments before. Somewhere in our codebase there's a pile of older C/C++ code. It contains functions that are very long; pages and pages of source code per function. I remember this coding style being considered best practice by some, not just for performance but because it made the code easier to read precisely for the same reasons that are now being used for unit tests. In retrospect, it doesn't make anything easier to read and it makes code really hard to maintain.
Code is code. The reasoning behind Don't Repeat Yourself doesn't magically get suspended because you're writing a unit test. Whenever you're writing code you're constantly building abstractions that help you to solve the problem you're working on. Things like Don't Repeat Yourself allow you to find and leverage these cases of code reuse into tiny frameworks. Unit tests are no exception to this. If your code is maintainable and well factored it will be intrinsically readable because you'll have 5 well chosen lines of code instead of a page of boilerplate.
Here's an article about not putting logic in tests from Google's testing blog. In this article the example the author chooses is the following:
@Test
public void shouldNavigateToPhotosPage() {
    String baseUrl = "http://plus.google.com/";
    Navigator nav = new Navigator(baseUrl);
    nav.goToPhotosPage();
    assertEquals(baseUrl + "/u/0/photos", nav.getCurrentUrl());
}
In this test there is an error. Can you spot it? Ok, let's write the test without the code reuse:
@Test
public void shouldNavigateToPhotosPage() {
    Navigator nav = new Navigator("http://plus.google.com/");
    nav.goToPhotosPage();
    assertEquals("http://plus.google.com//u/0/photos", nav.getCurrentUrl()); // Oops!
}
The error is now obvious. Hurrah for no logic in tests!
However, what if you didn't spot it? What if the error was "photso" instead of a double slash? I don't know about you, but there are a bunch of words I always type wrong. Things like spelling height as "hieght" or "heigth". What if the URL was http://www.heigth.com/? In that case I would be testing a typo... but only sometimes.... If you're not reusing then you're duplicating so this error is going to be in a bunch of places and you have to find and fix them all.
For that matter, aren't there libraries to append a path to a base URL already? Modifying example #1 is easier than example #2 because in the first case we know that the base URL is common text; it's explicitly marked and checked by the compiler right there in the code. In the second example, how do we know what the text in assertEquals() is supposed to represent without doing a tedious, manual character-by-character comparison?
If logic in unit tests has you concerned about making errors then don't worry. You'll make plenty of errors no matter how you do things. In one case it's a single error in one part of the code that has to be code reviewed once and fixed once. If you repeat yourself it's like the 1970s: code is duplicated all over the place with slight differences, it has to be fixed many times, and it has to be code reviewed until your eyes bleed. If logic in unit tests really bothers you, make your complex logic into re-usable functions and unit test those. Yes, it's a little "meta" but at least you're being honest about the complexity instead of copying around madness without worry.
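For example (a hypothetical helper, not something from the original article), the URL-joining logic could be pulled into a tiny function that gets its own test, and every navigation test then leans on it:

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class UrlsTest {
    // In real code this helper would live in its own class and be reused by the
    // navigation tests; it's inlined here to keep the sketch self-contained.
    static String join(String base, String path) {
        if (base.endsWith("/") && path.startsWith("/")) {
            return base + path.substring(1); // avoid the double-slash bug from above
        }
        return base + path;
    }

    @Test
    public void joinAvoidsDoubleSlashes() {
        assertEquals("http://plus.google.com/u/0/photos",
                join("http://plus.google.com/", "/u/0/photos"));
    }
}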
To reiterate, best practices like "Don't Repeat Yourself" came into existence to make reading and maintaining all code easier. I can tell you from personal experience that ignoring them results in difficult-to-maintain tests.
What you really need to do is make what you're doing in your test as clear as possible - don't hide your function's inputs and outputs. I can't go into detail here but here's what the test would look like if I wrote it:
@Test
public void shouldNavigateToPhotosPage() {
    mNav.goToPhotosPage(); // mNav is a member defined in the fixture
    assertEquals(BASE_URL.withPath("/u/0/photos").toUrlString(), mNav.getCurrentUrl());
}
In the example above I've created a test fixture (not displayed). An mNav member variable is built in the test fixture's setUp() and is always initialized with a base URL. I've made the BASE_URL object using an existing library; in that library the object is immutable but has several methods that return a modified (also immutable) copy.
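A rough sketch of what such a fixture might look like (the names and the URL-building library are hypothetical; the actual fixture isn't shown):

import org.junit.Before;

public class NavigatorTest {
    // BASE_URL comes from some immutable URL-building library (hypothetical here).
    private static final BaseUrl BASE_URL = BaseUrl.parse("http://plus.google.com/");

    private Navigator mNav;

    @Before
    public void setUp() {
        // Every test starts with a Navigator pointed at the same base URL.
        mNav = new Navigator(BASE_URL.toUrlString());
    }

    // ... tests such as shouldNavigateToPhotosPage() go here ...
}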
The net result is you can see that calling goToPhotosPage() should go to the "/u/0/photos" path.
If mNav had a getBaseUrl() and it was tested elsewhere then you might use that instead of the BASE_URL constant, since it's more explicit about what the expectation is.
Don't worry about logic in unit tests - that is a red herring - worry about making what you're testing clear to the reader.