Effective Refactoring, Part 4
Rewriting the Code
March 3, 2018This is the final part of a four part series on effective refactoring.
- Part 1: Asking the Right Questions
- Part 2: Formulating a Plan
- Part 3: The Imperative Role of Tests
- Part 4: Rewriting the Code
You put together a plan, selected a task to work on, wrote the tests, and now you're ready to clean up some code. It's about time! You'll be tempted to open up a file and start changing variable names, or clean up the contents of a folder that have been driving you nuts. Before you start digging into the codebase, however, there are a few concepts you should familiarize yourself with to ensure the refactoring goes smoothly.
In the final part of this series, I'm going to cover some common pitfalls that you may encounter over the course of a rewrite. I'll also present techniques for expediting some of the more tedious tasks. I won't be covering the specific mechanics, definitions, or processes associated with refactoring; if you're interested in learning more about that topic, check out these great resources:
- Refactoring: Improving the Design of Existing Code by Martin Fowler
- Refactoring to Patterns by Joshua Kerievsky
- Clean Code: A Handbook of Agile Software Craftsmanship by Robert C. Martin
Identify Opportunities to Automate
There's a very good chance that you'll have some project tasks that will be tedious. One of the biggest culprits I encountered was moving files. For example, the application I refactored had the Redux actions, reducers, and selectors in their own separate folders.
One of my project tasks was to group the Redux files by module (e.g. appActions.js
,
appReducer.js
, and appSelectors.js
). I could have run a git mv
command to move /actions/app.js
to /redux/app/appActions.js
, then do the same for /reducers/app.js
and /selectors/app.js
, but
that only takes care of the app module.
There were 11 modules in this project, so I'd have to type out 33 git mv
commands just for the
Redux elements. You may have to run git mv
150 more times to get the React containers and
components in the right folder location!
You'll find that this situation can quickly become untenable. Rather than type out all those commands, I wrote a script to do it for me using JavaScript and Node.js:
const fs = require('fs');
const path = require('path');
const chalk = require('chalk');
const sh = require('shelljs');
const _ = require('lodash');
const sourcePath = path.resolve(process.cwd(), 'src');
// This is the new /src/redux folder that gets created:
const reduxPath = path.resolve(sourcePath, 'redux');
// I used "entities" instead of "modules", but they represent
// the same thing:
const entities = [
'app',
'projects',
'schedules',
'users',
];
const createReduxFolders = () => {
if (!fs.existsSync(reduxPath)) fs.mkdirSync(reduxPath);
// Code to create entities folders in /src/redux...
};
// Executes a `git mv` command (I omitted some additional code
// that validates if the file already exists for brevity).
const gitMoveFile = (sourcePath, targetPath) => {
console.log(chalk.cyan(`Moving ${sourcePath} to ${targetPath}`));
const command = `git mv ${sourcePath} ${targetPath}`;
sh.exec(command);
console.log(chalk.green('Move successful.'));
};
const moveReduxFiles = () => {
entities.forEach(entity => {
['actions', 'reducers', 'selectors'].forEach(reduxType => {
// Get the file associated with the specified entity for the
// specified reduxType, so the first file might be
// /src/actions/app.js:
const sourceFile = path.resolve(
sourcePath,
reduxType,
`${entity}.js`
);
if (fs.existsSync(sourceFile)) {
// Capitalize the reduxType to append to the file name
// (e.g. appActions.js):
const fileSuffix = _.capitalize(reduxType);
// Build the path to the target file, so this would be
// /src/redux/app/appActions.js:
const targetPath = `${reduxPath}/${entity}`;
const targetFile = `${targetPath}/${entity}${fileSuffix}.js`;
// Execute a `git mv` command for the file:
gitMoveFile(sourceFile, targetFile);
}
});
});
};
moveReduxFiles();
You can either go into the files and update the relative paths after running the script, or you can write a script to update the paths using string manipulation. Automation can get pretty fancy, but know when to draw the line: it wouldn't make sense to spend 20 hours writing a script to save one hour of manual work. Most code bases are unique and have specific structures, which means you probably won't be able to reuse these scripts.
Commit Often
Making significant changes to a file or set of files can break your app and cause tests to fail. This is to be expected. If you've made a bunch of changes and everything works, commit the changes. If you've only made a few minor updates and everything still works, commit the changes. Making commits is cheap. Here are some metrics for the app I refactored:
- 1,747 commits
- 378 closed pull requests
- 42,017 total lines of text across 169 Jest snapshot files
- 13,555 total lines of text across 181 test files
- 24,261 total lines of text across 2 fixture files (
responses.js
andstate.js
) - 16,685 total lines of text across 198 JavaScript files
- 3,823 total lines of text across 111 Sass files
That's 76,080 lines of code across 659 files, and the repository size is only 10MB. Make all the commits you want — git can handle it. You can always squash them later if you want to clean up your history.
I urge you to commit as frequently as possible; when refactoring a large app you'll very quickly realize how difficult it can be to track down small, breaking issues.
Imaging making changes to 10 files, verifying your app still works, and feeling generally happy with the code. You decide not to commit your changes, and after another half-hour of refactoring you break something. You can either spend 3 hours tracking down the bug or just discard all of your changes and start from scratch. Since you didn't commit the changes to those 10 files, you lost perfectly good code!
I'm not saying it's impossible to determine if you need to reset those files, but why waste time and energy when you know you can jump back to the last commit and go from there. Try to limit the changes on commits to only a few files at a time, and commit often, so you have reliable "save points" to fall back on.
Avoid Temptation
Avoid the urge to clean up code that's out of the scope of the task at hand. This is one of the
most difficult aspects of refactoring. Let's say you were working on task called Standardize
Redux Selector Names that involved updating the names of the selectors to ensure all of them
were prepended with the word select
. As a relatively simple task with low risk and well-defined
scope, it should go smoothly; you just need to update the selector names (along with any files
that reference these selectors) to accommodate the change (and don't forget about the tests!)
Let's say you come across a selector that's using Object.assign
,
and one of your future tasks is to update the code to utilize some newer ESNext features like
spread syntax.
You may be tempted to update that code, but don't!
It is imperative that you don't deviate from the current task. Even the most trivial change can
set you on the wrong path. Even though the concept of a slippery slope is
a logical fallacy, a less hyperbolic situation will often occur. You start changing Object.assign()
statements, and before you know it, 4 hours has elapsed, and you've only updated 2 selector names!
As I progressed through my refactoring tasks, I became much less susceptible to this type of
behavior. I added a // REFACTOR: Fix this later
comment in the code and checked my project to
ensure I had a task to cover the change, then I moved on.
Be Methodical with Pull Requests
Pull requests are an excellent tool in refactoring projects for a number of reasons, they:
- provide a record of changes that correspond with a project task,
- define a rollback point if new bugs are introduced,
- enable peers to review your code and look for potential bugs, and
- offer the opportunity to explain why you made certain changes
If you spend a large chunk of time cleaning up a section of the codebase, it can be difficult to step back and evaluate the quality of your changes. Pull requests give your colleagues the opportunity to review your changes and determine whether they add value and improve the existing code.
When you're submitting pull requests, be as descriptive as possible in the summary. Put together a template and include it in your repository. If you're using GitHub, this template will be pulled in automatically for new PRs. Try to include a title, high-level description, a section for important notes, and a bulleted list of changes. The list of changes doesn't have to be extensive and wordy, but it should cover important changes that reviewers should focus on.
One metric that you need to be cognizant of is the quantity of changes associated with a pull request. Try to limit the amount of changes to +/-500 lines of code. Adding Jest snapshot files can easily add thousands of lines to a pull request, so be sure to include a note about this in the summary. If you can't minimize the quantity of changes, strive to reduce the complexity. If you're just reordering import statements, exceeding 2,000 lines of code is acceptable as long as the changes are not overly complex.
Try to isolate that change in a single PR and ensure that you describe the nature of the change in the summary. It helps to include a bold note at the beginning of the summary with something like "No logic changes were made" so the reviewer has proper context.
By writing descriptive summaries in your pull requests and ensuring that the scope of changes corresponds to the project tasks, you'll make life much easier for reviewers. I'll admit that I wasn't always able to follow my own advice (I still owe some of my coworkers a lunch or 10), but I made sure I responded to inquiries and addressed concerns quickly.
Wrap Up
That's it for the refactoring series! I hope the collective nuggets of wisdom I presented will contribute to the success of your refactoring project. Refactoring isn't as simple as rearranging files or renaming functions in your codebase, but if you put together a solid plan, adhere to that plan, and update regularly then you'll be off to a good start. Having a robust testing infrastructure will give you the confidence to make changes without breaking any functionality.
In terms of execution, automate as much as you can, commit often, and avoid temptation. Pull requests are an excellent documentation and review tool, so ensure they're detailed and limited in scope. In case you were wondering, here are the results of my refactoring project:
- 40 tasks completed
- 12 tasks pending
- 1 Jira bug logged (directly related to the refactor)
- 1,473 tests written across 181 test suites (737 of which were snapshot tests)
- 108,450 lines added and 44,241 lines removed
- 38 pull requests
- 99.84% Statement coverage
- 95.07% Branch coverage
The lines added and removed don't really mean much, as most of them represented file moves,
snapshots, and the test fixture files. The high test coverage should be taken with a grain of
salt as well: I'm ignoring some chart components because it's difficult to test
the HTML <canvas>
.
I added several features and fixed miscellaneous bugs over the course of the refactoring, and having high test coverage caught several potential issues before they could be included in the release. Overall, I'd say the rewrite was definitely worth the effort.