Perlbuzz news roundup for 2010-03-09

These links are collected from the Perlbuzz Twitter feed. If you have suggestions for news bits, please mail me at andy@perlbuzz.com.

CPAN Testers 2.0 end-February update and next steps

It’s never great to post an “end-February” report a week into March, but that’s how things are going lately. I’ve been busy with family and work obligations that have meant less CT2.0 hacking. I’m sorry this is coming late, but I hope it will give anyone interested a sense of where things stand.

I should note that the original deadline for finishing CT2.0 was March 1 and clearly we’re not there yet. I’ve discussed the situation with Robert and Ask at the Perl NOC, and they’ve been willing to extend the deadline for cutting off CT1.0 for a while longer. Thank you, Robert and Ask, for your understanding!

Progress in the last couple weeks:

  • I did some alpha testing of the CT2.0 Metabase hosted on Amazon. Based on that, I revised yet again (sigh) the user registration/credentials approach to minimize the hassle for old and new registrants.
  • Florian Ragwitz wrote a Catalyst app to help distribute new credentials files to legacy CT1.0 users. I haven’t deployed it yet (since I now need to regenerate all the credentials), but greatly appreciate his quick turnaround.
  • I implemented the Metabase search capabilities that Barbie will need to update the CPAN Testers statistics database. This will be based on the excellent SQL::Abstract approach to WHERE clause construction.
  • I wrote several helper modules to simplify configuration of a CPAN Testers metabase in preparation for deployment. The first of these has already been uploaded to CPAN: Net::Amazon::Config
  • I finalized the “version 0” API for the Metabase web service and revised the interface between the Catalyst app and the Metabase to reflect the latest changes to the library.
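
As an illustration of the WHERE-clause construction idea mentioned in the list above, here is a minimal Python sketch of turning a data structure into SQL text plus bind values. It is only loosely inspired by that style: the function name `build_where`, the column names, and the exact rules (lists become `IN`, nested dicts carry an operator) are assumptions made for this example, not SQL::Abstract's or the Metabase's actual API.

```python
# Hypothetical sketch: build a WHERE clause and bind values from a dict.
# This is NOT SQL::Abstract's real behavior, only the general idea.
ALLOWED_OPS = {"=", "!=", "<", "<=", ">", ">=", "LIKE"}

def build_where(criteria):
    """Return (sql_text, bind_values) for a dict of column -> condition."""
    clauses, binds = [], []
    for column, condition in sorted(criteria.items()):
        if isinstance(condition, dict):
            # {"op": value} means "column op ?"
            for op, value in condition.items():
                if op not in ALLOWED_OPS:
                    raise ValueError(f"unsupported operator: {op}")
                clauses.append(f"{column} {op} ?")
                binds.append(value)
        elif isinstance(condition, (list, tuple)):
            # A list means "column is one of these values"
            if not condition:
                raise ValueError(f"empty value list for {column}")
            placeholders = ", ".join("?" for _ in condition)
            clauses.append(f"{column} IN ({placeholders})")
            binds.extend(condition)
        else:
            # A plain value means equality
            clauses.append(f"{column} = ?")
            binds.append(condition)
    where = " AND ".join(clauses)
    return (f"WHERE {where}" if where else ""), binds

# Usage with made-up column names:
sql, binds = build_where({"grade": "pass", "osname": ["linux", "darwin"]})
print(sql)    # WHERE grade = ? AND osname IN (?, ?)
print(binds)  # ['pass', 'linux', 'darwin']
```

Keeping the values separate from the SQL text means the caller can hand both to a database driver's placeholder mechanism instead of interpolating strings.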

In the last several weeks, members of the #catalyst and #moose IRC channels were very, very helpful and patiently answered my many stupid questions. Thank you in particular to confound, perigrin, rafl, rjbs, stevan and t0m.

Coming up:

  • Deploy all my new code onto the server for “beta” testing
  • Release all the code to CPAN that people will need to configure their clients for beta testing
  • Regenerate user profiles and deploy rafl’s app to distribute them to legacy users
  • Whip the server into production shape (e.g. proper boot scripts to auto-start the CT2.0 apps on restart)
  • Get back to work on legacy report migration

It’s at the point now where I suspect the “hard thinking” part is pretty much done and it’s a lot of grotty but straightforward tasks to go. Hopefully, beta testing won’t reveal any major issues and the end-of-March update will be focused on planning an orderly transition from CT1.0 to CT2.0.

Configuration-Free CPAN Installations

Module::Install exists because installing CPAN distributions is not always perfectly easy.

Unfortunately, it didn't help—at least, not entirely. According to the completely unscientific process by which I install CPAN distributions, Module::Install accounts for a greater amount of pain than it should, at least according to its frequency of use. (Again, this is completely unscientific. I could guess that half of the CPAN client sessions which encounter Module::Install require me to fix things manually, but it's probably closer to 20%. It's more memorable because of my severe dislike for M::I prompting to install dependencies during configuration time.)

M::I addresses a real bootstrapping problem. I want to be able to use libraries during configuration, building, testing, and installation. I don't know which versions of those libraries you have available. Bundling known-good versions of those libraries with the distribution itself solves part of that problem...

... except when it doesn't. If I were to use M::I, I would have to re-release all of my distributions for every new release of every bundled library, at least if they contain important bug fixes for the various platforms about which I care. The cheap perfume of static linking leaves its musk heavy in the air.

It's easy to fall into the trap of a false dilemma. "You fool!" you prepare to comment below. "It's either that or the chaos of trying to make do with whatever version of those dependencies users may or may not have installed on their systems!" You're right; those are two possibilities. They're not the only two possibilities.

Part of the real problem is that bootstrapping during configuration is much too late. By the time you're running the configuration system, you're already running the configuration system. If your version of the configuration system is too old or too new, you have a problem. Bail out? Revert? Upgrade? There's no good heuristic for determining this. (The CPAN itself has an opinion. That's part of the problem.)

M::I hackers do deserve credit for helping to develop the META.yml standard. (I think M::I is the wrong approach, but I intend no slight toward its users, advocates, and developers. Invention requires the courage to get things wrong sometimes, even as it requires the courage to abandon false leads.) The META.yml specification is a big step in the right direction. If most CPAN modules have static requirements and follow a standard set of conventions, there's little or no configuration necessary. A sufficiently smart CPAN client can perform the appropriate configuration without running code from the distribution itself.
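
To make the idea of static, upload-time configuration concrete, here is a minimal Python sketch of a client deciding what to install purely from declared metadata, without running any code shipped in the distribution. The `meta` dict stands in for an already-parsed META.yml with a `requires` map; the function names, the sample module names, and the simplified dotted-integer version comparison are all assumptions for illustration (real Perl version semantics are more complicated).

```python
# Hypothetical sketch: resolve static requirements from parsed metadata.
def parse_version(text):
    """Simplified comparison key: '1.10' -> (1, 10). Not Perl's real rules."""
    return tuple(int(part) for part in text.split("."))

def missing_requirements(meta, installed):
    """Return {module: wanted_version} for every unmet static requirement."""
    missing = {}
    for module, wanted in meta.get("requires", {}).items():
        have = installed.get(module)
        if have is None or parse_version(have) < parse_version(wanted):
            missing[module] = wanted
    return missing

# Stand-in for a parsed META.yml; all names are made up.
meta = {
    "name": "Example-Dist",
    "requires": {"Foo::Bar": "1.2", "Baz": "0"},
}
installed = {"Foo::Bar": "1.1"}

print(missing_requirements(meta, installed))
# {'Foo::Bar': '1.2', 'Baz': '0'}
```

Nothing here executes a Makefile.PL or Build.PL, which is the point: if the requirements are static, the client can plan the whole installation from data alone.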

You can't avoid that in all cases; distributions with XS components, for example, need to probe system information. Good luck writing a sufficiently smart CPAN client and getting the community to agree on specific standards that let you find OpenGL headers in a cross-platform fashion, for example. Yet if 80% of CPAN distributions can get by with static, upload-time configuration, a lot of complexity of installation can go away.

Yes, that would make Module::Build and ExtUtils::MakeMaker unnecessary for (probably) most CPAN distributions, at least at the point of configuration, building, and installation. (I'm a recent fan of Dist::Zilla for automating away tedium on behalf of distribution maintainers; there's less need for Module::Install in such a world. If I never write another Build.PL again, so much the better.)

That helps, but the real problem with CPAN installations is that the CPAN itself is merely an uploading, indexing, and mirroring system. Projects such as META.yml attempt to add (and extract) meaning from the system, but they cannot work around one fundamental design feature of the CPAN. That limitation is the source of most woes for end users.

Clever readers (or experienced CPAN users) have already identified this limitation. I'll reveal it in the next installment.

The 99% Rule

David Golden praised Tatsuhiko Miyagawa's excellent new cpanminus CPAN client in The power of not being all things to all people. You should consider using cpanminus.

Don't overlook something else insightful that David wrote:

It's a lot of work to be all things to all people and I keep wondering whether making things simpler and better for 99% of people would be a better choice.

The only reliable way I've ever seen to "make the easy things easy and the hard things possible" is to make the easy things the default without preventing customization of the hard things. That's a design principle for languages, APIs, and tools.

Figure out what's most common (though not necessarily what people think they want, but what they need). Optimize for that. Consider what they might need and don't prevent it.

That's not easy, and what people need will change over time, but if you want to solve problems well, you have to solve the right problems.

The power of not being all things to all people

The Perl community seems abuzz about the cool new cpanminus client by Tatsuhiko Miyagawa. After about two weeks of development, this is a reasonably functional CPAN client with just a fraction of the overhead, complexity and verbosity of the CPAN and CPANPLUS clients that come with the Perl core.

It’s a remarkable achievement, not only technically, but in the reaction it has sparked. As one of the (sometimes reluctant) maintainers of CPAN and CPANPLUS, I’ve realized that I both love and hate cpanminus.

  • I love that Miyagawa has done so much with so little and in such a short span of time
  • I hate that fanboy-types flocked to it and trashed the older clients without noting cpanminus’ limitations
  • I love that Perl toolchain maintainers have rallied around Miyagawa and contributed their wisdom to make cpanminus better instead of rejecting it
  • I hate that one of Perl’s great strengths (CPAN) has legacy clients that are so unwieldy, hated and difficult to maintain

Miyagawa graciously acknowledges standing on the shoulders of giants. Still, I can’t shake the nagging thought that cpanminus should never have been necessary in the first place.

What I’ve come to realize is that cpanminus is another example of the power of not being all things to all people. Miyagawa doesn’t promise that it works for all of CPAN or that it works everywhere that Perl does. He doesn’t have to. Making it work for 99% of CPAN for 99% of people is more than good enough.

I’ve been co-maintaining various parts of the Perl toolchain for a while now. It’s a frustrating challenge needing to make things work everywhere, for everything, and trying as hard as possible not to break backwards compatibility. Plus, I don’t even get to use CPAN to make life easier. I don’t get to use handy tools like Moose or DateTime or Regexp::Common or SQLite or anything in the Config::* namespace or even basic tools like Archive::Zip. Nearly everything is done by hand.

Things have to work with just core Perl on a diverse set of platforms and with an incredibly limited set of assumptions. For example, the Perl core still doesn’t come with an HTTP client, so CPAN has to rely on FTP or command line programs to bootstrap LWP. (This is something I personally plan to tackle during the Perl 5.13 development series later this year.)

I think this is an ongoing challenge for core Perl development in general. It’s a lot of work to be all things to all people and I keep wondering whether making things simpler and better for 99% of people would be a better choice. (Anyone else for use strict by default? I hope that finally comes to pass in Perl 5.14.) chromatic writes about this topic often in his Modern Perl blog and I usually tend to agree with the points he makes. (February 2009 had a particularly good series of posts.)

In the meantime, I look at cpanminus with greed and envy. Miyagawa++

What’s Wrong with Module::Install

I've never liked ExtUtils::MakeMaker. I've liked Module::Build from the beginning. I've never, ever liked Module::Install, even though Ingy sat in my living room and hacked on what would eventually become M::I way back several years ago.

I don't believe people who use Module::Install should be shot on sight, but I do believe that Module::Install has set the usability of the CPAN back by several years.

Ingy did identify a real problem: there's too much code strewn about the configuration and build systems of the CPAN and not enough code shared. When he found himself writing something complicated to configure, compile, and install a new distribution, he'd crib code from someone like Tim Bunce or Graham Barr or Nick Ing-Simmons. In other words, to make the CPAN—perhaps the world's largest repository of redistributable and sharable library code—work, he had to copy and paste code.

M::I did get that right; turning the configuration and build system into reusable, redistributable libraries also available from the CPAN helped reduce the amount of boilerplate code and the amount of copy and paste code in configuration systems. The people behind M::I have also helped push for better configuration of CPAN clients and better tracking of dependencies and versions and types of dependencies (optional, compilation, bundling, testing, et cetera). The CPAN ecosystem is better off for that work, even though M::I itself isn't the answer.

One of M::I's biggest failings, of course, is that it prolongs the lifespan of EU::MM. Unfortunately, I think Ingy missed the big problem when he saw copy and paste code. To do anything reasonably complex with EU::MM, you have to be able to write Perl 5 code which generates (and modifies with regular expressions) cross-platform Makefiles which themselves call into Perl 5 code because Perl 5 has a sane baseline of well-understood and reliable behavior across platforms in a way that Makefiles and shells do not.

If that's not sufficient horror, consider how you customize EU::MM behavior, at least the last time I looked at it. (Two notes here. First, its current maintainer refuses to add new features or change existing features because it's so awful to maintain. Second, I wrote tests for some of those behaviors, so I've read and understood the code.) You write a custom superclass called MY from which EU::MM inherits to change the behavior of the various steps of generating cross-platform Makefiles which may or may not invoke Perl 5 to perform shell functions.

I am not making this up and I did not make any typos. EU::MM inherits from your custom class.
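
For readers who want to see how upside-down that arrangement is, here is a small Python analogy of a framework whose working class inherits from a user-supplied customization class. It is only a sketch of the pattern: `BaseMakefileWriter`, `make_writer`, and the method names are hypothetical, and nothing here reproduces EU::MM's actual code.

```python
# Hypothetical analogy: the framework's class inherits from the user's class.
class BaseMakefileWriter:
    """Framework defaults for each step of Makefile generation."""
    def install_step(self):
        return "install: default rule"

    def test_step(self):
        return "test: default rule"

def make_writer(custom_class=None):
    """Build the class the framework instantiates, with the user's class first among its parents."""
    if custom_class is None:
        bases = (BaseMakefileWriter,)
    else:
        bases = (custom_class, BaseMakefileWriter)
    return type("MakefileWriter", bases, {})

# The user writes this class; the framework then inherits from it.
class MY:
    def install_step(self):
        return "install: customized rule"

writer = make_writer(MY)()
print(writer.install_step())  # install: customized rule
print(writer.test_step())     # test: default rule
```

The more conventional design has the user subclass the framework; here the framework subclasses the user, so method lookup hits the customization first and falls back to the defaults.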

Keeping EU::MM viable long past the point where Module::Build did everything in a saner way is but a little crime. (There's a reason almost no one recommends the use of h2xs to make skeletons for new modules anymore; most new modules aren't mere wrappers around C libraries. The need for a compilation step with a pure-Perl distribution seems more than a little bit superfluous.)

Module::Install reinvoking your current CPAN client recursively to install dependencies when your current CPAN client already has a perfectly good way to install dependencies is a slightly larger crime. (I understand the justification: what if someone is trying to install a distribution from a tarball manually without a CPAN client? But is there a Principle of Least Possible Differentiation at work with that design choice?)

I don't particularly care that using M::I as a distribution maintainer means that you have to keep track of every new release of M::I which could fix bugs or make upgrading impossible and release a new version of every one of your distributions with the new M::I because of its autobundling problems, because if you get your kicks that way, more power to you. (Don't expect me to jump up and down for joy at upgrading all of your distributions on my machines, though.)

I never particularly cared for the FUD about Module::Build from some M::I discussions, especially the nonsense about "Module::Build doesn't support --install so it spews files all over the place!" (as if EU::MM ever worked properly there) or "Module::Build doesn't support uninstalling!" (as if anyone ever used that from the CPAN client to know that it worked with any system.)

I appreciate that Module::Install provided a much nicer interface than EU::MM did, and that that interface worked transparently to hide the nasty details of EU::MM. Those are true benefits, and I don't blame anyone for choosing M::I for that reason.

Even so, Module::Install's greatest crime is that it's been a distraction from identifying and fixing the real problems of the CPAN... but that's another post.

Perl 5, Support, and Bugfixes

I wrote what I understand to be the strategy behind releasing new minor releases of Perl 5. Though the development branch of Perl 5 follows a monthly release cycle, the maintenance branch currently does not. If it's difficult to predict what changes volunteer developers will make in the future, it's doubly difficult to predict which bugs they will fix (or will need to fix).

Thus any support document must explain the responsibilities of users who encounter bugs and what they should expect from developers.

A bug you discover in a new major release of Perl 5 is a candidate for a new minor release if it is:

  • A security or dataloss bug
  • A regression introduced in the new major release
  • A failure to build on a supported platform combination
  • A failing core test on a supported platform

I've organized that list in rough order of decreasing severity. The most likely candidate for a fix is the second; it indicates an undertested aspect of the system. Behavior should not change between Perl 5 releases accidentally. If a patch modified behavior on which your code depends and if that change did not occur as part of a deliberate, communicated plan, a fix is likely.
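
The four criteria can be written down as a simple checklist. The following Python sketch is only an illustration of that list; the `BugReport` fields and the `minor_release_reasons` function are hypothetical names, and matching a criterion makes a bug a candidate, not a guaranteed fix.

```python
# Hypothetical sketch: check a bug report against the four criteria above.
from dataclasses import dataclass

@dataclass
class BugReport:
    security_or_dataloss: bool = False
    regression_in_new_major: bool = False
    build_failure_on_supported_platform: bool = False
    failing_core_test_on_supported_platform: bool = False

# Same order as the list above (rough order of decreasing severity).
CRITERIA = [
    ("security_or_dataloss", "security or dataloss bug"),
    ("regression_in_new_major", "regression introduced in the new major release"),
    ("build_failure_on_supported_platform", "failure to build on a supported platform combination"),
    ("failing_core_test_on_supported_platform", "failing core test on a supported platform"),
]

def minor_release_reasons(report):
    """Return the criteria a report meets; an empty list means it is not a candidate."""
    return [label for field, label in CRITERIA if getattr(report, field)]

report = BugReport(regression_in_new_major=True)
print(minor_release_reasons(report))
# ['regression introduced in the new major release']
```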

Of course, any fix in a minor release needs to maintain binary compatibility within the release family.

The easiest way (at least for developers) to find and fix such bugs is before the release of a new major version of Perl 5. That's one goal of the monthly releases: to encourage you to test all of the code you care about with versions of Perl 5 in development. That's also one reason for the Perl 5 release candidates (though it's likely too late for big fixes by then).

If you can't do that, the next step to reporting a bug is to reproduce it in the smallest example possible. 10-15 lines of Perl 5 is good. 2-3 is ideal. More than 20 is usually too big. If you can provide a test case suitable for adding to the core test suite, so much the better. From there, test your code with multiple releases of Perl 5. It helps to browse the perldelta documentation, but the amount of detail between even minor releases can be daunting. A post on PerlMonks is a good step.

If you've gone through all of that, see perldoc perlbug.

This is all no guarantee that your bug will get fixed in a minor version—you should prepare for the possibility that, given enough time since the release of the corresponding major version, the best approach for p5p is not to backport a fix to a new minor version. Even so, you will likely get one of several options:

  • An explanation of why it's not a bug
  • A suggested workaround
  • A fix in the current version

In the last case, you will have the option of applying the relevant patch yourself or asking someone to backport it to your own custom Perl 5 if you wish. That may not seem like the ideal situation (it isn't!), but at least with free software such as Perl, you always have that option.

The (Unwritten) Rules of Perl 5 Minor Releases

When a new major version of Perl 5 comes out, history suggests a new minor release will follow.

Some of this reasoning is pragmatic. For all of the requests of the Perl 5 Porters for people to test development snapshots (5.11.0 through 5.11.5) and the inevitable release candidates (in this case, 5.12.0 RC1 through... hopefully not RC2 and RC3), nothing gets more testing or bug reports than new major releases. Bugs get reported. Changes get requested. Changes occur.

The traditional view of new major releases is that they're somewhat unstable. "Wait for the patch release," people say. This was true of 5.6.0. (The ancient Camel third edition describes an unreleased version of Perl 5 somewhere between 5.6.0 and 5.6.1). This was true of 5.8.0. This was even more true of 5.10.0, where the CPAN itself suggested that 5.10.0 was a "testing" release for bleeding edge users.

Given the size of the Perl 5 test suite and the daily and weekly test reports produced from the bleeding edge of Perl 5 itself as well as the monthly releases, most of the obvious bugs appear and get corrected quickly. Even so, bugs happen. It's software. Changes occur and people notice only in odd or complex situations. Sometimes a new compiler warning appears, or underlying libraries change. Sometimes a few updates help get Perl 5 building on a platform which itself has changed.

A patch release is inevitable.

However, the Perl 5 Porters make no promises about when a point release will occur. Nor do they promise how many point releases will occur in a family.

The Perl 5.8.0 family had nine point releases between July 2002 and December 2008. If Perl 5.10 hadn't taken five and a half years, Perl 5.8.0 might have had fewer point releases.

22 months passed between the release of Perl 5.10.0 and Perl 5.10.1. There may never be a 5.10.2.

A point release needs two things: a steady stream of bugs fixed in the core without breaking source compatibility, and someone to identify the appropriate patches and to make the release. In other words, one or more people need to be able to cherry-pick patches from the development track of the next release of Perl 5 to the branch which will become the new point release. The weight of history and expectations is sufficient to assume that p5p will be able to find or herd enough volunteer effort to make a Perl 5.12.1 and a Perl 5.14.1 and so on, but if monthly Perl 5 releases continue and produce a new major release every 12-18 months, the likelihood of a Perl 5.12.2 and a Perl 5.14.2 decreases.

The single unpredictable factor is the presence of a major bug discovered in that release family; a major security bug or a data loss bug is one possibility. In that case, a single-patch minor release is likely. Beyond that, minor releases have diminishing returns.

CPAN Testers 2.0: Interim milestone

This is just a quick note to say that I’ve successfully configured a test CT2.0 server hosted entirely on Amazon Web Services. I’ve also successfully sent test reports to it from my regular CPAN::Reporter client using Test::Reporter::Transport::Metabase. It’s the first end-to-end test of the target architecture for CT2.0.

There are some tweaks I need to make before I’m ready to open it up to beta testing, and all the updated modules are still in git and not yet on CPAN, but there is a light at the end of the tunnel and (so far) I don’t think it’s a train.

Perl 5, Version Numbers, and Binary Compatibility

As mentioned in What Perl 5's Version Numbers Mean, the written Perl 5 support policy must explain several guidelines and their implications.

If you've ever upgraded between major versions of Perl 5 on the same machine, you've likely noticed that you have to install new versions of modules. Various resources spread across the Internet suggest the use of CPAN autobundles, but even that's likely enough to make you curse a little bit as you babysit a CPAN shell for an hour or two to get back to where you started.

This is due to the binary compatibility guidelines to which the Perl 5 Porters adhere. While there's a strong desire to ensure that programs and modules written in the past several years will continue to work unmodified on new major versions of Perl 5, it's almost impossible to ensure that compiled XS code remains compatible across major versions of Perl 5.

Certainly the porters attempt to maintain source code compatibility of XS wherever possible, but ensuring that an XS module compiled in 2000 for Perl 5.6.0 will continue to work unchanged on Perl 5.12 in 2010 requires a great deal of foresight, plenty of tolerance for workarounds in the core, and no small amount of luck. There's a limit to what's practical to provide for how long, and the price of reinstalling extensions (and recompiling a few) is worthwhile.

While this choice may seem odd, considering the degree to which Perl 5 retains backwards compatibility, it reflects a philosophy from the earliest days of Unix: source compatibility is important, but everyone should have access to the source code to recompile as necessary.

Within a release family—Perl 5.8.0 through 5.8.9, for example—the porters attempt to maintain binary compatibility. If you installed DBI on a fresh Perl 5.8.0, it should continue to work even if you install 5.8.9 and remove 5.8.0. That's the intent.

Note that the Perl 5 configuration and installation process enforces this by default; minor releases within a major release family reuse the same module installation paths. Major releases have new installation paths. You can reconfigure Perl 5 to use paths of your choosing, but you do so at your own risk.
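
The release-family rule can be stated in a few lines of code. This Python sketch assumes three-part version strings such as "5.8.9"; the function names are hypothetical, and the check expresses only the stated intent of the policy, not a guarantee about any particular compiled module.

```python
# Hypothetical sketch: do two Perl 5 versions belong to the same release family?
def release_family(version):
    """'5.8.9' -> (5, 8): the part shared by every release in a family."""
    revision, major, _minor = (int(part) for part in version.split("."))
    return (revision, major)

def binary_compatible_by_policy(old, new):
    """True when the policy intends compiled XS modules to keep working."""
    return release_family(old) == release_family(new)

print(binary_compatible_by_policy("5.8.0", "5.8.9"))   # True
print(binary_compatible_by_policy("5.8.9", "5.10.0"))  # False
```

A False result is the case in which you should expect to reinstall modules and recompile any XS extensions.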
