Healthy Code

I work on a team of software developers that maintains several large codebases — too much code for any one person to easily know what’s going on in every part of it at any particular time. I found myself thinking a lot about how to keep the code healthy and a while ago I set my thoughts down as a list of good practices. Thanks to my coworkers at Endless for input, editing, and debate.

The good practices in this post differ slightly from the ones we adopted at work, which reflect the opinions of the whole team; these are worded to reflect my personal opinions.

Assumptions

I don’t like rules without a rationale. I believe these six assumptions underlie the rules that I set out below. That is, if you don’t agree with these assumptions then you probably won’t agree with the rules… ☺︎

  1. We can never know that our own code is correct.
  2. Left unchecked, we will believe our own code to be correct.
  3. Even small mistakes can lead to catastrophic data loss.
  4. Non-trivial programs have interconnections too complex to keep entirely in one person’s mind.
  5. Modifying non-trivial programs will break code unrelated to the modifications.
  6. The business value of maintainable code is only visible to developers.

Good practices for code health

Use your judgement

As always, rules apply only in the absence of any overriding reason to ignore them. Breaking them should be by mutual agreement between the writer of the code and the reviewer. (This system only works if everyone agrees about what the rules are in the first place, though.)

Example reason to break this rule: If no agreement can be reached, then the default is to follow the coding standards.

Review code

Code gets reviewed by a developer who didn’t write any part of it, because of assumptions #1, #2, and #3 — and to spread familiarity with different parts of the codebase throughout the team. Develop with ease of reading in mind, as if you are writing a letter to an unfamiliar code reviewer. Review code skeptically and with full attention, as if it came from a malicious agent out to erase your hard drive.

Example reason to break this rule: You are committing a trivial fix for a broken build and your continuous integration system acts as the code reviewer.

Observe the style

Code follows the coding style. Coding style is important because when code looks the same it’s quicker to read and errors jump out more easily. Apply automated tools when possible to save the code reviewer from becoming a parenthesis counter.

Example reason to break this rule: The code reviewer agrees with you that deviating from the style is more readable.

Test your code

Code needs automated tests. The rationale for this is assumptions #2 and #5, but could be the subject of an entire blog post itself. Lack of tests can be by itself a reason to fail code review, or at least start a dialogue between developer and reviewer about why tests are not necessary in this particular case.

Example reasons to break this rule: A one-off script. A component that proxies an external resource which can’t easily be mocked out.

Refactor on write

You will always have to deal with legacy code (code on which development has ceased but which must still be maintained) and rushed code (code that circumstances forced you to check in even though it doesn’t quite work well, works but is difficult to maintain, or is not tested). By assumption #6, you will probably never set aside time to refactor code for its own sake. Therefore, refactor bit by bit to leave the code in a slightly better state each time it’s touched. In this way, code receives refactoring attention roughly proportional to the benefit you receive from refactoring it. If at all possible, add new code with a unit test even if the rest of the code is not written in a testable way.

Example reasons to break this rule: The code is already in good shape. The feature is critical and cannot be delayed. You are contributing your code to an open source project, in which case it is better to work with the upstream community to refactor.

Refactor only on write

Make your diffs per commit no larger than they have to be, in order to make code review easier. Since diffs go line-by-line, do not fix style errors in lines that are not already being touched in the same commit. Use separate commits if there is an opportunity to make other style fixes.

Example reason to break this rule: If it makes more sense to fix lines other than the ones being edited in one shot (e.g. large sections with wrong indentation), do so throughout the whole file in a separate commit.
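
To illustrate the split, here is a minimal sketch in a throwaway git repository. The file name, contents, and commit messages are made up for illustration; the point is only that the functional change and the style-only cleanup end up as two separately reviewable commits.

```shell
# Demonstration in a temporary repository. Everything here (calc.txt, the
# commit messages, the identity) is hypothetical.
demo_separate_commits() (
    set -e
    repo=$(mktemp -d)
    cd "$repo"
    git init -q
    git config user.name "Example Dev"
    git config user.email "dev@example.com"

    printf 'total = a+b\nprint( total )\n' > calc.txt
    git add calc.txt
    git commit -q -m "Add calc"

    # Commit 1: the functional change. The touched line also gets its
    # style fixed; the untouched second line is left alone.
    printf 'total = a + b + c\nprint( total )\n' > calc.txt
    git commit -q -a -m "Include c in total"

    # Commit 2: style-only cleanup of the remaining line, on its own.
    printf 'total = a + b + c\nprint(total)\n' > calc.txt
    git commit -q -a -m "Style: normalize spacing in calc.txt"

    # Print the commit subjects, newest first.
    git log --format=%s
)
```

Calling `demo_separate_commits` prints the three commit subjects. In a real working tree, `git add -p` is one way to stage only the hunks that belong in the first commit.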

Pay down technical debt

Sometimes it’s not possible to build a feature without doing a large refactor first. Determine this as early as possible and include it in the time estimate for the feature. Do not shy away from paying down this debt; it will only compound if you borrow more on top of it. However, keep the changes incremental, and the functionality unimpeded while making these changes.

Example reason to break this rule: Extreme time constraints force you to take out a second mortgage on the code (even then, do this only with a healthy dose of disgust.)

Moving text messages between Android phones

I recently got a new Android phone secondhand, and after resetting it I wanted to move the text message archive over from my old phone. It turns out that you can do this easily if you have root access. Well, technically you can do anything easily if you have root access, but the trick is knowing how. I hope that by putting this out on the internet, other people will be able to know how too.

I had root access on both phones, as they were flashed with CyanogenMod. The new phone is a Nexus 4, and the old phone is an HTC G1 (Android 2.2 is the highest that could run on it.)

On both (and as far as I know, all) versions of Android, all the text messages are stored in this file, which you need root access to read:

/data/data/com.android.providers.telephony/databases/mmssms.db

Getting the file off the G1 was easy; I entered the Terminal Emulator app (I think it’s installed automatically when you flash CyanogenMod) and copied the file to the SD card:

su
cp /data/data/com.android.providers.telephony/databases/mmssms.db /sdcard/

(su requests superuser permissions, which you have to grant.) Then I connected the G1 to my computer with its USB cable and transferred the file off of it.
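
Before going further, it can be worth checking that the file survived the trip. SQLite database files begin with the text "SQLite format 3", so a quick header check catches a truncated or wrong file. This is a minimal sketch; the function name is my own, and it assumes a `head` that supports `-c`.

```shell
# Succeeds if the file starts with the SQLite 3 header string.
# $1 = path to the file copied off the phone.
is_sqlite_db() {
    [ "$(head -c 15 "$1" 2>/dev/null)" = "SQLite format 3" ]
}

# Example (the local path is an assumption):
# is_sqlite_db ./mmssms.db && echo "looks like an SQLite database"
```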

Getting the file onto the Nexus 4 was harder. What I did not know is that the Nexus 4 can’t mount its SD card as USB Mass Storage (see the explanation), so I ended up using my Apple laptop to do the transfer, and had to download a program called Android File Transfer. Still, I got it onto the phone’s SD card.

Since the newer version of CyanogenMod comes with a file manager app, I decided to use that to put the file into the correct place, instead of Terminal Emulator (typing shell commands on a phone is no joke.) The file manager is set to “Safe mode” by default, which means it won’t request root access. I changed it to “Prompt User mode” in the settings, then navigated to the above databases/ folder and made a backup copy of the old (empty? 100 KB? It’s an SQLite DB so maybe there are still deleted records in there, but I don’t care to check) database. Then I copied the G1’s mmssms.db file over the top of it. Unlike on the G1, there was also a mmssms.db-journal file there, which I hoped wouldn’t mess with things…
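
For anyone who would rather do the backup-and-replace step from a root shell than from the file manager, it could look roughly like this. It is a sketch, not what I actually ran: the function name is made up, and I have not checked whether file ownership or permissions need any adjusting afterwards.

```shell
# Back up the existing database, then copy the new one over it.
# $1 = database copied from the old phone
# $2 = database already in place on the new phone
replace_with_backup() {
    src=$1
    dest=$2
    [ -f "$src" ] || { echo "missing source: $src" >&2; return 1; }
    # Keep the old file so the change can be undone.
    if [ -f "$dest" ]; then
        cp "$dest" "$dest.bak" || return 1
    fi
    cp "$src" "$dest"
}

# Example, as root on the phone (paths taken from the text above):
# replace_with_backup /sdcard/mmssms.db \
#     /data/data/com.android.providers.telephony/databases/mmssms.db
```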

I couldn’t see my text messages after going into the messaging app, but after rebooting the phone, they were there.

Geek tip: Malloc debugging on OSX

I’ve been trying to chase down an annoying bug that I suspected to be a case of using uninitialized memory. The problem is, it only shows up about 1 in 30 times (I was lucky to notice it in the first place), and never in a debugger.

Fortunately I found that there’s a library on OSX that tweaks malloc() to help you debug:

DYLD_INSERT_LIBRARIES=/usr/lib/libgmalloc.dylib MallocPreScribble=1 ./myprogram

To do the same in LLDB, set the variables inside the debugger, because System Integrity Protection wipes your linker-affecting environment variables when you execute a system program:

lldb -- ./myprogram
env DYLD_INSERT_LIBRARIES=/usr/lib/libgmalloc.dylib
env MallocPreScribble=1
run

This triggered the bug every time, both inside and outside the debugger.
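
A rough way to compare "1 in 30" with "every time" is to run the program repeatedly and count the failures. The sketch below assumes the bug shows up as a non-zero exit status (a crash or a failed assertion, say), which may not hold for every bug; the function name is my own.

```shell
# Run a command N times and print how many runs exited non-zero.
# $1 = number of runs; remaining arguments = the command to run.
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# Example: plain run versus a run with the malloc debugging variables set.
# count_failures 30 ./myprogram
# ( export DYLD_INSERT_LIBRARIES=/usr/lib/libgmalloc.dylib
#   export MallocPreScribble=1
#   count_failures 30 ./myprogram )
```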

For more information about what you can do with libgmalloc, see this documentation. It only tells how to use that facility in Xcode, though, so the above instructions should help if you’re on the command line.

Wave at the camera

You have probably seen the fake advertisement for Wave, the new way of charging your iOS 8 phone in any standard household microwave. (Although I would venture that some of the responses with fried microwaves and phones are hoaxes as well.)

I admit I did giggle when I first read it — some chump microwaved their expensive phone and blew it up, funny, right? Only I realized that it’s not funny at all.

Why shouldn’t people believe that a new technology would allow them to charge their phone by microwaving it? It’s no more or less magical than any other new technology being invented every day. It just happens not to have been invented yet.

Yes, people need to think critically, check sources, use common sense, and become less science-illiterate. Is microwaving your phone a smart thing to do? No. Could the average person probably have known better? Yes. But if you are lucky enough to be in the minority for whom this is obvious, you don’t have any right to laugh at those for whom it is not.

Nature, why?!

Scientific journals charge subscription fees for access to their content. If you’re an employed scientist, the university or company where you work usually buys an institution-wide subscription to a journal. In that case you don’t have to log in to the journal website because it recognizes your IP address as belonging to a subscribing institution. In fact, you don’t even get an account on the journal website, because it’s impractical to issue an account to every single user at a university, for example.

So what do you do when you have to look up something when you’re away from your office? You use SSH with port forwarding to connect to work, then visit the website using a proxy server on that port. Since you are now browsing through a work computer, you can read the journal. There’s nothing wrong with this, because your employer has already paid for your access to that content; the only barrier was the impracticality of issuing you an individual account.
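
For the record, that setup can be sketched as follows, assuming OpenSSH and curl. The account, host name, and port are placeholders, and the function names are my own; a browser pointed at the same SOCKS proxy works just as well as curl.

```shell
# Hypothetical values: replace with your own account, host, and a free local port.
WORK_HOST="me@office.example.org"
SOCKS_PORT=1080

# Open a SOCKS proxy on localhost:$SOCKS_PORT that tunnels through the work machine.
# -D: dynamic (SOCKS) port forwarding; -N: do not run a remote command.
start_tunnel() {
    ssh -N -D "$SOCKS_PORT" "$WORK_HOST"
}

# Fetch a URL through the tunnel, resolving host names on the far side.
fetch_via_tunnel() {
    curl --socks5-hostname "localhost:$SOCKS_PORT" "$1"
}

# Example: run start_tunnel in one terminal, then in another:
# fetch_via_tunnel https://www.example.org/
```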

So it’s really strange that Nature Publishing Group, which publishes the overrated Nature family of journals, seems to want to discourage this practice. If you visit the site of a Nature journal from a non-subscriber IP address, they set a cookie in your browser that says you are not a subscriber. So even when you turn on your proxy server and revisit the site, it still tells you you’re not a subscriber and can’t access the journal article. Luckily, it is easily remedied by erasing your browser’s cookies. (Easily done, that is, but not easily thought of. Hope this helps someone.)

Why, Nature, why? Why would you do this? Do you have scientists’ best interests at heart and you want to prevent them from working at home? Or do you hope that people are gullible enough to pay twice for the same content?

Discretization, Part II

In this post I described how I encountered the Sell Your Science contest and was entirely fed up with how they perpetuate the myth that scientists are a bunch of timewasters and that marketable research is the only research worth doing. I wrote the organizers, Science Alliance, a letter and urged other people to do the same. Well, it took fewer letters than I expected for something to happen.

My coworker Jelmer Renema wrote them a more strongly worded e-mail than I did. Today he got a telephone call from someone from Science Alliance who wanted to talk about the e-mail. The outcome of the telephone call was that the Science Alliance employee said they didn’t mean that economic gain was the only valid reason for science; social relevance and curiosity from the public are important too. He admitted that the blurb could have been worded differently, although he claimed that there was a large group of scientists opposed to bringing research to market. No, Jelmer told him, nobody’s opposed to that — they’re opposed to the idea that marketable research is the only worthwhile research. In the end, Science Alliance promised to do better next year and Jelmer offered them his assistance in matters of science communication.

By coincidence, an interview appeared in the Delft University newspaper this week. Professor Piet Borst, former scientific director of the Dutch Cancer Institute, says that the whole ‘valorization’ business has gone too far and gets quite angry about it (translation mine):

“We are going about this in such an absurd way. There’s really no other way to put it. [The ministry of] Economic Affairs is living in the 1970s, they think like this: ‘Those wretched university researchers and other academics, busy only with their own hamfisted hobbies, we have to force them to do useful work, and we can only do that by making them dependent on industry financing. They need guidance from our watchful industrialists over what they do.’ They’re delusional. It’s a recipe for how to do it wrong.”

Note that this man isn’t one of those mythical ‘hermit scientists’ either: he says in the interview that those who do research with public money have a duty to allow their findings to be turned into products, which create jobs.

One other important point that Borst makes is that if you, as a researcher, have a significant stake in a spinoff company, then can you really be trusted to publish findings that will cause your shares to plummet? As the interviewer says in the article, “The answer is obvious once you’ve asked the question.”

Discretization is the better part of valorization

V is for Valorization. What’s that? A buzzword coined by the Dutch government that signifies how all scientific research should make money, and lots of it, sooner rather than later. It’s certainly not an English word, as evidenced by the quizzical looks on the faces of physicists who haven’t been working in the Netherlands lately, when some official government delegate gets to make a speech at a Dutch physics conference and says, beaming into the audience, “We are ferry heppy to see so much fellorizable research going on here!”

(UPDATE: Merlijn van Deen reports that valorisation is, in fact, a borrowing from French, where it is used in the same context of scientific research as in Dutch. In English, according to Wikipedia, it is used only as a translation of the German Verwertung, a technical term coined by Marx in Das Kapital meaning to add surplus value to capital by human action.)

I don’t fit the popular caricature of a scientist who thinks all research should be pure and untouched by worldly concerns. On the contrary, I have a Master’s degree in applied physics. One of my current projects is to build a new kind of wavefront sensor that works on a different principle than the commercially available ones. I’m firmly of the opinion that the original reason for this ‘valorization’ policy is quite sound: to get academia and industry interested enough in each other so that academia’s more marketable efforts get passed on to industry instead of dying the death of obscurity in a professor’s filing cabinet, and industry knocks on academia’s door when they have an interesting problem to solve with a longer time-to-market.

But it’s been blown all out of proportion now. The government has declared some research more valuable than other research: fields like high tech systems and energie (energy) are now designated topsectoren (top sectors,) research to which funds should be diverted at the expense of all other research. They are headed by topteams (top teams) each including a captain of science and captain of industry, which draw up innovatiecontracten (innovation contracts) that are required to hit each vertex of the gouden driehoek (golden triangle) of kennis, kunde, kassa (knowledge, expertise, and cash.) It will be successful in making the Netherlands #1 worldwide in the use of buzzwords, which I’ve italicized and translated (only where necessary, since half of them are in English anyway to make them sound more important.) If you read the actual documents, you get the feeling that the government is telling the big companies, “Hey! Want some cheap contract research? We’ve given those scientists free rein for too long and it’s time they worked for you to redeem themselves!”

The thing that spurred me out of lethargy was this, the Sell Your Science contest. You have to make a 90-second video about your research and the winner gets the title “Best Science Communicator of the Netherlands.” Sounds great. But it turns out that you literally have to sell your research: in the description, they treat ‘the audience’ and ‘investors’ as one and the same! I’m sorry, but science communication and sales pitches are two different things. Nothing wrong with a sales pitch contest, but at least call it by its rightful name!

Science crosses borders that politics doesn’t, so it may not have even occurred to their bureaucrat brains that they’re shutting out a large share of the scientists in the Netherlands, who are not Dutch and might not read Dutch well enough to follow the contest rules, which aren’t available in English.

And this part really makes my blood boil (translation mine):

Nowadays, it’s not enough just to write scientific articles and to talk to people in your own field. A broader, open attitude towards society is expected, and valorization sections are required in NWO grant applications. The modern scientist will have to communicate differently and more widely in order to propagate their research.

I explain exactly why this makes my blood boil in the letter that I sent them on May 10. My own English translation is reproduced below. It’s been two weeks and I’ve received no reply. So I’m sharing it:

Dear Sir or Madam, (cc: editorial office of the Leiden University employee newsletter)

I read about the ‘Sell Your Science’ contest in Leiden University’s employee newsletter, and from there I clicked over to the website www.valorisatie.nu. My astonishment was boundless when I read there that this contest is failing to distinguish between the two entirely disparate concepts of ‘science communication’ and ‘science valorization.’ I would like to take a moment of your time to explain why I think this is wrong.

Science communication is, as you say, presenting research to a broad audience in a clear and understandable way. But is that the same as ‘valorization’? Only if one assumes that the broad audience is exclusively interested in marketable research. That is a dangerous fallacy.

The passion that drives a researcher to be good at science communication usually doesn’t spring from the commercialization of research. It’s likely that someone who’s motivated by commercialization won’t choose a career in research. These days, there are those who would rather deny that, but it’s a fact. The description of Sell Your Science, in which scientists are portrayed as hermits, only speaking to their fellow scientists and avoiding contact with society, and in which you say that the ‘modern’ scientist has to start doing things differently, feels like a slap in the face of my profession. There are countless scientists, both in the past and in modern times, who may not necessarily be oriented towards industry, but do stand 100% squarely in society. These people are marginalized by the tendentious introduction on the website. ‘Hermits’ may exist, it’s true, but they are a small minority.

Anyone I’ve ever encountered who’s been good at communicating science was able to captivate their audience using their dedication and passion, no matter what the economic value of the research was. Good science communication makes sure the audience has learned something by the time they leave. Good science communication fans the sparks of curiosity in the audience, so that someone, the day after or the day after that, might just hit upon the idea to ask “How does that work, anyway?” A scientist who can captivate an audience (apparently, a hostile one at that) with ‘unmarketable’ science and at the same time manages to convey its importance despite its unmarketability is a much better candidate for the title of “Best Science Communicator of the Netherlands” than someone who can sell ‘marketable’ science to investors. That’s the difference between ‘science communication’ and ‘science valorization.’

Sincerely,
Philip Chimento
PhD student, physics
Leiden University

Writing letters seems to have had an actual effect — read Part II.