For part of my sophomore year of college, I was a computer science major. When I realized that I loved my CS theory courses while my classmates hated them, I decided to major in math instead. I enjoyed the programming classes well enough, but programming was not what I wanted to spend my time doing.
The summer after my junior year, I was accepted to a math REU at Rochester Institute of Technology. The first thing my adviser Stanislaw Radziszowski asked me was whether or not I could program! I spent the whole summer programming combinatorial graph theory-related algorithms in C.[1]
Now I, like many of my operations research classmates, spend much of my time programming. Despite the importance of writing code for solving operations research problems, I am surprised how little programming is discussed. The admissions page for my program says nothing about programming ability, but it is implicitly assumed that programming is a skill that students have.
Moreover, I suspect the operations research-specific parts of the research behind many journal articles are only a fraction of the actual work done by the authors. Much of the required work is implementation and debugging of their algorithms. Yet articles contain little-to-no discussion of the actual code. Even worse, the code is often not published or reviewed. I can only imagine how many coding errors underlie the results of peer-reviewed papers.
Marc Kuo recently blogged about how operations researchers need to get with the program (pun intended). His post kicked off tons of discussion in its comments, on Google+, on Hacker News, and on OR-Exchange.
This discussion came at a good time for me. I’m in the middle of my first big coding project of my PhD research. Despite completing a computer science minor and spending two summers doing nothing but coding, I never learned good software engineering practices. I decided at the beginning of the summer to force myself not to just write this code to get the job done but to write good code.
First, I finally started using Git and GitHub for version control. I have tried several times before, but I have always found it rather confusing.[2] This git tutorial finally got me over the hump. Now I can easily branch my code into different versions, and I have the ability to go back to old versions when I screw something up.
Second, I started teaching myself about unit testing. Code testing was never mentioned in any of my classes in college, and I never hear operations researchers talk about it. Again, I have no doubt that the code behind much published work is full of mistakes. Operations researchers need good testing practices.[3]
Third, I’m trying to write clean, object-oriented, well-commented code. My intention is to publish this code on GitHub when the corresponding paper is published. I want my results to be easily reproducible by others and open to scrutiny. I would also like my code to be reusable for future research. My design patterns might not be quite there yet, but I’m trying to move in that direction.
I realize that I’ve used the word I as much as a Stephen Wolfram blog post. I have no desire to toot my own horn here; I’m just thankful this conversation is happening, and I want to continue it. Good software is crucial to good operations research (both in the academy and out), and yet academic operations researchers, in my experience, talk very little about good software engineering practices. We can do better.
[1] I’m eternally indebted to my brilliant research partner Evan, who taught me how to use bash, vim, and subversion, among other things.
[2] I feel vindicated by a recent thread on Hacker News.
[3] Incidentally, here’s an interesting Quora thread about testing stochastic algorithms.
Congrats on starting to use GitHub and unit testing. Very nice.
GitHub is great; it’s by far the best way to share and contribute to code. And unit testing is crucial.
“My intention is to publish this code on github when the corresponding paper is published. I want my results to be easily reproducible by others and open to scrutiny.” Doesn’t it take months to publish a paper? Will you keep your codebase untouched after the paper is submitted, or will you evolve the codebase for your next paper?
What stimulates more scientific collaboration? 1) Sharing the code months after the experiments are done (and published), commonly known as a code dump. Or 2) Sharing all code as it is written from day 1 and blogging about the experimentation results as you conduct them.
Open Science Manifest.
Geoffrey – I appreciate the feedback. In theory, I love #2. In practice, at least for now, I need this research to get a PhD. I can’t share it too quickly, even for the good of science.
Even if I do continue to work on the code, my plan is to leave a branch in place that holds the code corresponding to each paper.
I understand; the PhD system (and more generally the academic system) doesn’t allow for #2.
Having one foot in each camp, I think we can adopt both strategies. “Major” results get turned into journal articles, and the code is posted after the article is published (or concurrent with publication). “Minor” results (too small to be worth a paper) get blogged, and the code is published on the fly. The dividing line between “major” and “minor” is partly the author’s judgment, and may move over time. (When you’re trying for tenure, everything is “major” to the author; not necessarily so to the journal.)
Here’s the dilemma:
A) It is hard to tell if a minor result will lead to a major result. By sharing the minor result, you could possibly enable others to find the major result (before you do). Even worse, sharing a major result might enable others to publish it before you do. So the default is to keep everything hidden.
B) On the other hand, sharing minor and major results with the community immediately exposes you to discussions, critique and ideas that help you find other new minor and major results. The more open the community is (sharing results fast), the better this works for all in the community and the faster science evolves.
I follow track B, but I don’t work for an academic institution. I do hope more academics start following track B too.
It’s just a tad ironic that the automation medium (computation) isn’t itself automated. That a person should have to jump through the hoops you describe here simply to make software production less of an arbitrary and opaque process means that computing is attractive to those who like it this way and can’t imagine ways to automate the automator. We use text editors to write code! Insane. Meanwhile the computer just sits there dumb to anything we intend or anything anyone else has done. Trillions of lost computational cycles every minute. The greatest crime of lost potential ever committed. And it is a crime repeated across the world on billions of computers all day every day. Shovels could be forgiven for not knowing anything about the ditch they are used to dig. Computers? It’s nuts.
Nice post. My experience is right along those lines: it feels like I spend 90% of my time implementing and testing and 5% of my time actually coming up with what I’m implementing! (The last 5% is “other”.) Then once I have a workable, finished product, I hate looking at it and would have liked to somewhat about the approach! It’s aggravating because the mathematician in me feels that computer code should always look elegant and fit neatly together.
I feel your pain, Brian; I had the exact same frustrations! Hehe, I decided to just invest time to learn and do it right, just like Tim. Here’s a personal note on my experiences: http://kuomarc.wordpress.com/2012/01/27/why-i-love-common-lisp-and-hate-java/
Ah, but if we’re talking about sharing code so that others can understand and replicate our research, then something (a) mainstream and (b) object-oriented (did I hear Java mentioned?) is pretty much essential.
Hehe, very good point Paul, it’s the network effect. What’s the point in writing elegant code if nobody can read your language? Sure, Java is mainstream and Lisp isn’t. Of course it’s a long haul: Common Lisp will never get there (Clojure is on its way though, and works on the JVM), but perhaps some people might get as curious as I was about “there must be some other way?”
Here’s my humble/wishful thinking:
- As Brian pointed out, even he hated looking at his own code (and so did I with my own Java code)
- Others will not want to see, let alone understand, your code
- In my opinion/experience, it’s quite difficult to write elegant code in Java; it requires lots of discipline and patience
- ORites want to focus on their ideas
So my highly personal take on Common Lisp?
- MOST importantly, it is fun (very subjective)
- Someone on the fence about programming might jump the fence
- Easier to get started
- The nature of the syntax/language forces you to write modular/understandable code
- More difficult to abuse the language
- Perhaps more people will then be willing to share their code
Cons?
- I have yet to meet another Lisper in OR beside Jorge Tavares
- Requires ‘radical change’
- Not as many libraries available
Hope?
- New OR students are taught Lisp
- Frustrated ORites try Lisp as an alternative
- They will enjoy programming
- They will write beautiful code
- They will share their code with pride
- They will collaborate on open-source projects
- They will create a community and libraries for OR
- They will take over the world
Forgive my day dreaming. It’s Friday and the sun is shining. Have a good weekend!
how embarrassing -_-’ I don’t understand the XHTML
It should read:
“I have yet to meet another Lisper in OR beside Jorge Tavares”
Some of your pluses for LISP are also pluses for Python, which is closer to mainstream and does have some common libraries (some of which, I think, are C or C++ with a Python wrapper).
I happen to find my Java code more readable than a lot of what I wrote before Java. I find embedded Javadoc to be a boon (don’t know if LISP has the equivalent).
I’m learning Python, but I actually prefer a more rigid typing system and more rigid class structure (so that if I meant to access the ‘cart’ field of the ‘Shop’ class and typed ‘Shop.cat’ rather than ‘Shop.cart’, I hear about it immediately, rather than after an arduous debugging session).
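To make the ‘Shop.cat’ scenario concrete, here is a minimal Python sketch. The class names `LooseShop` and `StrictShop` are hypothetical, invented only for illustration. It shows that assigning to a misspelled attribute on an ordinary class silently succeeds, and that declaring `__slots__` is one way (among others, such as running a static type checker) to have the typo rejected at the point of assignment.

```python
class LooseShop:
    """Ordinary class: any attribute name can be assigned at runtime."""

    def __init__(self):
        self.cart = []


class StrictShop:
    """Same class, but __slots__ restricts attributes to the declared names."""

    __slots__ = ("cart",)

    def __init__(self):
        self.cart = []


loose = LooseShop()
loose.cat = ["apple"]             # typo: silently creates a new 'cat' attribute
loose_cart_len = len(loose.cart)  # the real cart was never touched

strict = StrictShop()
try:
    strict.cat = ["apple"]        # same typo, now rejected immediately
    typo_caught = False
except AttributeError:
    typo_caught = True
```

This is still a runtime check rather than the compile-time check a language like Java gives, so it narrows the gap described above without closing it.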
Nice to hear from you Brian. That’s the most I’ve heard from you in months… even though your office is right around the corner.
The topic of programming skills for OR types got a heavy workout on OR-Exchange recently. Lack of discussion (forewarning?) of the importance of programming, let alone any coursework to help you with it, is apparently pretty common in OR and probably in a lot of applied math programs. I suspect that many of my brethren on math faculties view programming as (to borrow a phrase used by one of my instructors in a different context) “tedious but brutally straightforward”.
Which it’s not. I won’t attempt to rehash the OR-X discussion, but I can personally attest to the fact that dubious programming (not just indexing a loop with a matrix but subtle things like using a linked list v. a set v. a priority queue) can seriously affect execution time. Other little nasties can affect whether the algorithm even works. (You’d be surprised how often OR people write “if x = y then …” when x and y are floats. Or maybe you wouldn’t.)
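The floating-point pitfall mentioned above can be sketched in a few lines of Python. The helper name `nearly_equal` and the tolerance values are assumptions chosen for illustration; suitable tolerances depend on the problem and on how the values were computed.

```python
import math


def nearly_equal(x, y, rel_tol=1e-9, abs_tol=1e-12):
    """Compare two floats within a tolerance instead of testing exact equality.

    The default tolerances are illustrative assumptions, not recommendations.
    """
    return math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)


x = 0.1 + 0.2
y = 0.3

# Exact comparison fails because 0.1 + 0.2 is not exactly representable as 0.3.
exact_match = (x == y)

# Tolerance-based comparison treats the two values as equal.
tolerant_match = nearly_equal(x, y)
```

An algorithm that branches on `exact_match` would take the wrong path here, which is the kind of bug that can decide whether an algorithm works at all.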
There’s also the question of being able to tap a database (otherwise your code is limited to fairly small instances in many cases) and writing a user-friendly (not to mention user-proof) front end (when you won’t be the sole user of the code, and it’s not a library).
Good post on an important topic.
Hi Tim, good article, especially the part about wanting to write “clean, object-oriented, well-commented code” in order to “be easily reproducible by others”. I’m in the software industry and only recently delved into OR.
When I go to a company I try to write the simplest, most standard code possible so everyone can read it quickly. If you need hyper-efficiency then tighten up your code; otherwise I don’t sweat about eking out that minute performance boost if it makes the code illegible. Object-oriented is mainstream, but procedural coding can be fine sometimes too. Heresy! Anything to allow your target audience to understand and fix it quickly when it’s broken.
I find the posts on the other sites interesting because there seems to be a reluctance to publish code. In the software world you leave your ego at the door when you’re subjected to structured walkthroughs and code reviews. Publish away, each mistake will just raise the bar.
Let me add something I haven’t read in those discussions, unless I’m mistaken.
As long as programming is seen as a necessary evil, it will be a time-consuming pain.
If you’re good at programming, then you spend less time doing it. It leaves you more time to think about ideas and algorithms. The saved time is a significant reward, even if good coding is not valued enough in academic publications.
This should be enough to motivate OR people to learn more about how to efficiently produce efficient code.
Great comment. I think I probably left the impression that I view coding as a necessary evil. I think I used to, but I don’t anymore. But I do feel that the lack of discussion of it in much OR academic literature implies that. Or maybe others don’t view the code underlying the results as being as significant as I do. (Or maybe, as has been suggested, many are embarrassed about their code.)
Tim, it is quite clear that programming is not a necessary evil to you. Moreover, your post will probably convince others to share this view!