Preparing Ruby Packages for Public Consumption
(or, things to do before exposing your code in public)
Daniel J. Berger
I first discovered Ruby around 2000 and I’ve been enamored with the language ever since. However, I have found many of the Ruby packages, whether on the Ruby Application Archive (RAA) or elsewhere, to be lacking in one respect or another.
I came to Ruby with a strong Perl background and with that background comes CPAN, Perl’s module archive. CPAN has proved to be more than just a convenient way to publish your code. It has also come to define a set of standards that have evolved over time. Those standards have yet to be adopted in a consistent and widespread manner by the Ruby community.
The goal of this article is to identify what I consider to be the most common shortcomings of many of the current Ruby packages. Then I’ll explain why I consider them shortcomings and what you can do to address them in your own packages. I’ll also recommend some standards that I believe we, the Ruby Community, should always try to follow.
One quick note. I often use the words “package” and “module” interchangeably. In the latter case, I am referring to a lump of code & files, not the Ruby keyword.
Before you begin your noble quest to write a Ruby package, make sure that something similar, or even identical, doesn’t already exist. First, search the RAA. Second, search RubyForge.net. Lastly, search Google.
If something similar does exist, it’s much better to contribute to that project. There are many folks out there who would likely appreciate help with one aspect or another of their project.
At OSCON 2003, Matz discussed the importance of choosing a good, meaningful name for your application. This is also true for your modules. What do I mean by “meaningful” exactly? First and foremost, the name should describe what the module actually does. A module that monitors your network should probably have “net” in it somewhere, for example.
This can have a domino effect. A good name means people will more easily find your module on the RAA. As more people find it, more people will use it. The more people that use it, the more likely you are to get feedback. The more feedback you get... well, you get the picture.
Publishing code, in my opinion, puts a burden of responsibility on you. If you’re going to publish code, it means that you expect that some folks out there might find it useful. That means there’s a good chance you’ll get feedback and, horror of horrors, they’ll expect an answer.
Seriously, if you don’t plan on responding to feedback, or don’t like the notion of answering questions about your code, DO NOT PUBLISH. If you aren’t willing to maintain your module for at least a year, DO NOT PUBLISH.
My reasoning is simple – I may use your module in actual production code. If I come across an issue that I cannot resolve myself, and you are unwilling to respond to questions, it could make my life very difficult. I will either have to spend hours working out the issue, or abandon your module altogether in favor of something else.
That said, be receptive to feedback, but don’t overindulge requests, either. Don’t be afraid to say no to any suggestions which you don’t agree with or simply feel are not appropriate for your module. I also realize that sometimes “real life” and/or “burnout” happens and you absolutely cannot or will not maintain your module any longer. In such a case, please do not simply abandon your module, especially if it happens to be popular. Please announce on the mailing list (or IRC or wherever) that you are looking for someone to take over maintenance of your code.
I realize this may seem incredibly basic to many of you, but you would be surprised at how many fubar’ed zip files I come across. Here are a couple of rules that you should follow:
a) Make sure your files unzip into their own directory and NOT the current directory. This seems to be more common with .zip files than .tar.gz, but I’ve seen egregious violations with both types.
b) Include a version number in the top level directory. For example, if I untar foo-0.1.tar.gz, I would expect a top level directory called “foo-0.1” that contained the contents, not just “foo”.
The reason for the latter suggestion is that there are times when I wish to keep multiple versions of the same module around. Usually this is in the event I want to roll back to an earlier version. It’s also so that I can compare versions, via “diff” or whatever.
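Both rules are easy to check before you upload anything. The following is only a sketch: the helper names are my own invention, and it assumes you already have the list of entry paths from the archive (for example, from “tar tzf foo-0.1.tar.gz”).

```ruby
# Hypothetical helper: given the entry paths listed in an archive,
# report the distinct top-level names it would create when unpacked.
def top_level_names(entries)
  entries.map { |path| path.sub(%r{\A\./}, "").split("/").first }.compact.uniq
end

# True only when everything unpacks into a single "name-version" directory.
def well_packaged?(entries, name, version)
  top_level_names(entries) == ["#{name}-#{version}"]
end

good = ["foo-0.1/", "foo-0.1/README", "foo-0.1/lib/foo.rb"]
bad  = ["README", "lib/foo.rb"]

puts well_packaged?(good, "foo", "0.1")
puts well_packaged?(bad, "foo", "0.1")
```

The first listing passes because every entry lives under “foo-0.1”; the second fails because it would scatter files into the current directory.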
You should always include a VERSION constant somewhere in your code, either in a module or a top level class. This helps both us, the end users, and you, the coder.
It helps end users by making it easy for us to determine which version of a module we may (or may not) have installed on a given system. A simple ‘ruby -e "require 'foo'; puts Foo::VERSION.to_s"’ can save us the headache of manually inspecting code.
Another reason to include a VERSION is for sanity checking with regards to unit testing. Let’s say you’ve installed foo 0.1. You’ve written foo 0.2 and now you want to run your unit tests against it. How can you guarantee that you’re running your unit tests against 0.2 and not 0.1? You guessed it – by checking the VERSION. In fact, the first unit test I have in all of my own test suites is a VERSION check.
One final reason to include a VERSION constant is for automatic packaging and/or dependency checking for future modules that perform such functions.
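As a rough illustration, here is what a VERSION constant and a simple dependency check might look like. The module Foo and the helper at_least? are hypothetical, and the comparison leans on Gem::Version from RubyGems, which is my assumption rather than anything this article requires.

```ruby
require "rubygems"

# Hypothetical library "foo"; in a real package this module would
# live in lib/foo.rb.
module Foo
  VERSION = "0.2.0"
end

# Gem::Version compares version strings segment by segment, so
# "0.10.0" is treated as newer than "0.9.0" (a plain string
# comparison would get that wrong).
def at_least?(installed, required)
  Gem::Version.new(installed) >= Gem::Version.new(required)
end

puts Foo::VERSION
puts at_least?(Foo::VERSION, "0.1.0")
puts at_least?("0.9.0", "0.10.0")
```

Keeping VERSION as a plain string makes it trivial to print from the command line while still being usable for comparisons like the one above.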
If a tree falls in the middle of the forest and no one sees it, did it ever fall? If you release a package to the public but don’t include any documentation, did you ever release?
If you are going to publish code, you MUST INCLUDE DOCUMENTATION. Failure to include adequate documentation is a virtual guarantee that no one will use your code. Even worse, they’ll probably bug you about it until you finally do include documentation, so you may as well include it.
It is not our job to analyze your code to determine how it works. It’s your job to tell us how things work. Personally, I live by examples. Even a brief synopsis can get me up and running quickly.
I realize that language barriers can sometimes make this difficult. Most native English speakers don’t read Japanese and most native Japanese speakers have a rough time with English. In such cases, the best way to help is to translate the documentation into a more fluent version and politely suggest the changes to the author. You’ll be helping both the author and those of us who use the module.
Provide us with an easy way to install your module. This can be as simple as a homegrown install.rb script or as advanced as Minero Aoki’s “setup” module. Just give us something. Generally, of course, this isn’t an issue if you’re talking about an extension, and there’s an extconf.rb file present.
Don’t expect us to manually copy files into the Ruby lib directory. In addition, if you post a module on the RAA, give us a tarball. Don’t expect us to copy and paste from a web page. If you need hosting, you have SourceForge, Savannah and (recently) RubyForge for all your hosting needs.
One final note – please configure your installation scripts to install in the sitelibdir somewhere (which ‘setup’ can be configured for). This is something that Perl does by default. The reasoning is simple – it allows you to easily distinguish modules that were included as part of the core distributions versus those that were installed manually.
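A homegrown install.rb does not need to be elaborate. The sketch below is one possible shape, not a replacement for “setup”: install_libs is a hypothetical helper name, and it simply copies files into whatever RbConfig reports as the sitelibdir.

```ruby
require "rbconfig"
require "fileutils"

# Hypothetical minimal installer: copies each library file into
# dest_dir, which defaults to Ruby's site library directory.
# Returns the list of installed paths.
def install_libs(files, dest_dir = RbConfig::CONFIG["sitelibdir"])
  FileUtils.mkdir_p(dest_dir)
  files.map do |file|
    target = File.join(dest_dir, File.basename(file))
    FileUtils.install(file, target, mode: 0o644)
    target
  end
end

# Typical use from an install.rb at the top of the package
# (usually needs write permission on sitelibdir):
#   install_libs(Dir.glob("lib/*.rb"))
```

Note that this sketch flattens everything into one directory; a package with nested library directories would need to preserve the relative paths.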
README – Include a synopsis of what your package is. In addition, include installation instructions and documentation, or point me to where I can find instructions and documentation.
MANIFEST – Good for sanity checks, statically built extensions, and scripts that check directory contents based on this file (i.e. auto-packagers).
CHANGES – A change log is a good, quick way for folks to see if it’s truly worth installing the latest and greatest version of your code, or for the merely curious. It’s also a good reminder for yourself, especially if you’re not using any formal version control.
You should include copyright and license information.
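To show the kind of sanity check a MANIFEST makes possible, here is a small sketch. The helper manifest_diff is hypothetical, and it assumes a MANIFEST format of one relative path per line, with blank lines and lines starting with “#” ignored.

```ruby
# Hypothetical checker: compares a package directory against its
# MANIFEST file and reports files that are listed but absent
# (:missing) and files that are present but not listed (:unlisted).
# Hidden files (names starting with ".") are not examined.
def manifest_diff(dir)
  listed = File.readlines(File.join(dir, "MANIFEST"), chomp: true)
               .map(&:strip)
               .reject { |line| line.empty? || line.start_with?("#") }

  actual = Dir.glob("**/*", base: dir)
              .select { |path| File.file?(File.join(dir, path)) }

  { missing: (listed - actual).sort, unlisted: (actual - listed).sort }
end
```

Running something like this just before you build the tarball catches the classic mistake of shipping a release that lacks a file the README promises.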
You should always write unit tests and include them with your package. Tests help you write and maintain bug free code. If you’re not familiar with unit testing, learn NOW.
Another advantage to writing unit tests is that it can help with cross platform support. Most folks tend to develop on only one platform. By providing unit tests, folks on other platforms can more quickly and easily spot any platform specific issues that arise.
Finally, the mere inclusion of unit tests provides me with a bit of psychological comfort. It tells me that you put a little extra time, effort and care into writing your code.
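If you have never written a unit test, a minimal one might look like the sketch below. The library Foo and its greet method are made up for illustration, and I am assuming Minitest here; any test framework you prefer will do.

```ruby
require "minitest/autorun"

# Hypothetical library under test; normally it would be loaded with
# require "foo" rather than defined inside the test file.
module Foo
  VERSION = "0.2.0"

  def self.greet(name)
    "Hello, #{name}!"
  end
end

class TestFoo < Minitest::Test
  # Version check, so a stale installed copy is caught right away.
  def test_version
    assert_equal "0.2.0", Foo::VERSION
  end

  def test_greet
    assert_equal "Hello, Ruby!", Foo.greet("Ruby")
  end
end
```

Ship the test file in the tarball so that users on other platforms can run it themselves.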
Oh, wait, did I mention this one already?
I mention it twice because it’s CRUCIAL!
Create a sample script or code snippet and make sure it runs as you expect (in addition to any unit tests, that is).
Then, run that same code with the -d ($DEBUG) option and watch for warnings.
Then, run that same code again with the -w ($VERBOSE) option and watch for warnings.
Build all extensions with warnings enabled. For you gcc users, that means set your CC environment variable to “gcc -Wall” (or modify the Makefile before building). If you’re feeling exceptionally thorough, add a “-W” as well.
Do your best to eliminate all warnings, though keep in mind that some cannot be eliminated or simply aren’t worth worrying about (unused parameters, for example).
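The -w run is easy to automate. The following sketch uses a hypothetical helper, warnings_for, that runs a script under “ruby -w” and collects whatever warnings land on standard error; the sample script deliberately defines the same method twice to provoke one.

```ruby
require "open3"
require "rbconfig"
require "tempfile"

# Hypothetical helper: runs a script under `ruby -w` with the same
# interpreter that is running this code, and returns the lines
# written to standard error that mention a warning.
def warnings_for(script_path)
  _stdout, stderr, _status = Open3.capture3(RbConfig.ruby, "-w", script_path)
  stderr.lines.map(&:chomp).select { |line| line.include?("warning") }
end

# Sample script with a deliberate problem: hello is defined twice.
sample = Tempfile.new(["sample", ".rb"])
sample.write("def hello; 1; end\ndef hello; 2; end\n")
sample.close

found = warnings_for(sample.path)
puts found
sample.unlink
```

An empty result is the goal; anything printed is worth at least a quick look before you publish.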
Be prepared for platform specific issues.
Premature optimization is the root of all evil. That’s why this topic is last and gets the number 13 to boot.
Put together what you would consider “typical code” and run your code through the profiler via “-r profile” on the command line. This will help you spot any egregious bottlenecks that may exist.
The profiler can also help you spot logic or control flow problems, based on the number of calls to specific methods that you see in the results. This can also help you short circuit loops that are running too often, for example.
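To illustrate the call-count idea on its own, here is a sketch that counts method calls with TracePoint. To be clear, this is a stand-in of my own and not the “profile” library: count_calls and the Calc class are hypothetical, and it reports only how often each Ruby-level method was called, not how long it took.

```ruby
# Hypothetical helper: counts Ruby-level method calls made while the
# block runs. Methods implemented in C are not counted.
def count_calls
  counts = Hash.new(0)
  trace = TracePoint.new(:call) do |tp|
    counts["#{tp.defined_class}##{tp.method_id}"] += 1
  end
  trace.enable { yield }
  counts
end

# Hypothetical "typical code": a naive recursive Fibonacci.
class Calc
  def fib(n)
    n < 2 ? n : fib(n - 1) + fib(n - 2)
  end
end

calc = Calc.new
counts = count_calls { calc.fib(10) }

# Print the most frequently called methods first.
counts.sort_by { |_, n| -n }.each { |name, n| puts "#{n}\t#{name}" }
```

A surprisingly large count for a method you expected to run a handful of times, as with the recursive fib here, is exactly the sort of control flow problem worth investigating.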
All of these practices will not only help you write more professional code, they will also, over time, establish a standard for the Ruby community with regard to publishing code.
Daniel J. Berger