Using WWW::Mechanize to get my scratchy 45s

I'm a big fan of WFMU's Beware of the Blog. So much music geekery and arcana in one handy source!

Sometimes there will be a post with lots of MP3s for download, like this one today with long-forgotten 45s of songs paying tribute to Merle Haggard. I don't want to listen to the mp3s in my browser, and I don't want to manually do the Save As dance in the browser.

Perl and WWW::Mechanize to the rescue! If you have WWW::Mechanize installed, you also have the mech-dump utility. mech-dump started as a tool to make it easier to create WWW::Mechanize programs by showing what form fields exist, but it does more than that. By default, mech-dump will fetch a page and display the forms and fields on the web page. If you call it with --links, you'll only get back the links, like so:

alester:~ $ mech-dump --links http://blog.wfmu.org/freeform/2012/04/a-tribute-to-the-hag-mp3s.html 
http://blog.wfmu.org/freeform/styles.css?v=6

http://static.typepad.com/.shared:v20120403.02-0-g1ba1fe9:typepad:en_us/themes/common/print.css

http://blog.wfmu.org/freeform/2012/04/you-cant-put-your-arm-around-a-memory.html?no_prefetch=1
... etc ...

Filter that output through grep and pass it to xargs and wget, and you've got a handy MP3-only downloader.

alester:~ $ mech-dump --links [big URL] | grep mp3$ | xargs wget

--2012-04-04 09:27:32--  http://blogfiles.wfmu.org/GG/Skeeter_Harmon_-_A_Tribute_To_The_Hag.mp3

100%[=======================================================>] 4,935,323   4.08M/s   in 1.2s

... etc ...
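If you'd rather stay inside Perl, the same pipeline fits in a few lines of WWW::Mechanize. This is only a sketch: the sub names mp3_only and page_links are mine, and the filter is slightly stricter than `grep mp3$` because it insists on the dot and ignores case.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Keep only URLs that end in .mp3 (roughly what `grep mp3$` does).
sub mp3_only {
    return grep { /\.mp3$/i } @_;
}

# Fetch a page and return the absolute URL of every link on it.
sub page_links {
    my ($url) = @_;
    require WWW::Mechanize;    # loaded lazily, only when we actually fetch
    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->get($url);
    return map { $_->url_abs->as_string } $mech->links;
}

# With a URL on the command line, print the MP3 links one per line.
if ( my $url = shift @ARGV ) {
    print "$_\n" for mp3_only( page_links($url) );
}
```

Save it under any name you like, pass it the page URL, and pipe the output to xargs wget just as before.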

I suspect very few Mech users are aware of mech-dump, and how handy it can be from the command line. I wish I'd done a better job of publicizing it.

-Ofun for Whom?

A fundamental, rarely-questioned piece of wisdom about free software is that it works best when it scratches the itches of its developers.

(You can tell when someone's passionate about something. You can also tell when that someone has no love for the work; the results often differ. Then again, passionate people made the movies Plan Nine from Outer Space, Avatar, and Manos: The Hands of Fate.)

What could motivate someone to work on something outside of work hours? What could motivate someone to spend time solving a hard problem for the challenge of it? For status? For low pay? For hope of a future benefit?

A few years ago, Audrey Tang described Optimizing for Fun, a project organization strategy for cultivating new contributors by lowering the barriers to contribution and relentlessly encouraging even the smallest progress as valuable, desirable, and sustainable.

More projects could benefit from that, in and out of software, in and out of business, in and out of the world of volunteers, professionals, dilettantes, and amateurs.

The 20th-century author and theologian C. S. Lewis suggested that every vice is a virtue misapplied. (To belabor the point, gluttony, one of the seven deadly sins as borrowed by Pope Gregory I from a fourth-century monk, is the misapplication of the legitimate enjoyment of food. Lewis was no stoic.)

What's -Ofun misapplied? Does this sound familiar?

We must remember that, from what we can see, the $foo project's primary mission in life is providing entertainment for $foo developers, not convenience or stability for $foo users. Which is understandable given that volunteers, who make up the vast majority of $foo developers, tend to do whatever it is they do for fun, not for drudgery.

That could apply to many projects. (Relevant context: a comment by user anselm on LWN.net's story "Free is too expensive".)

Fixing bugs isn't always fun. Keeping an old and crufty API around until users have time to migrate off of it isn't always fun. Making and meeting promises about release dates isn't always fun. Writing documentation isn't always fun. Holding back new features in favor of improving existing ones isn't always fun.

Supporting real actual users (not just hobbyist developers who already think downloading and compiling the new version out of your repository is fun) sometimes takes work and effort.

As a developer, each of us gets to decide the degree to which we pursue things we find enjoyable. If it stops being enjoyable, you have every right (even the responsibility) to change your situation to make it more fun or to leave it for someone else to do. Your obligation is to do the best work you decide you are obligated to do. Nothing more, nothing less.

... but if your desire to do the fun bits exceeds your willingness to put in the hard work to understand what your users want and need and expect, at least do them the courtesy of not acting surprised when they tell you of their disappointment.

Perl QA hackathon wrapup

From mid-air somewhere near Greenland... I'm on my way back from the fifth annual Perl QA Hackathon and I can't believe it's already over. I missed the last two and I'd forgotten what an awesome experience it is.

tl;dr: Stuff I worked on:

Why I love the QA hackathon

If you've been under a rock and still don't know what the QA Hackathon is: it's a sponsored conference in which a small band of dedicated Perl hackers spend three days madly coding to improve the quality of the Perl experience for everyone.

I really enjoyed the chance to meet people in person that I only know from on-line venues or rarely get to see face-to-face. Having so many people working on so many projects in one space made it really easy to benefit serendipitously from the work of others, or to have a chance conversation spark a new way to get things done.

Another thing that makes the hackathon awesome is how quickly blocking issues get fixed. Several times, someone would hit a bug in some related library, walk across the room to the person who could fix it, and it would get fixed and shipped before one could get a coffee.

This year, my work focused mostly on the evolution of CPAN.pm and the CPAN toolchain.

A new way of thinking about CPAN indexes

On the morning of the first day, I convened an informal group of half-a-dozen CPAN client maintainers, installer maintainers and other interested parties [1] to talk about how to re-think CPAN indexing. In particular, I wanted to separate the notion of the "index" from the "repository". The canonical CPAN index is a file on CPAN that maps Perl package names ("Foo::Bar") to a path to a distribution archive file on CPAN ("DAGOLDEN/Foo-Bar-1.23.tar.gz").

Historically, your CPAN client mirrored the index from the same CPAN mirror used to download tarballs. I think that's limiting in a few ways. First, that file keeps growing as CPAN grows and it takes a while to mirror the whole file when you only need the mapping for a few modules.

Some CPAN clients, like cpanminus, don't even use the package index directly, but query a web API that serves up answers from it, which is one way to separate the index from the repository. That would be a nice feature to have in all CPAN clients.

That still doesn't change the model of having only one official index that your client knows about. If you or your company want to manage the mapping, you've got to use various tools to modify the official index in some sort of minicpan or private CPAN repository (aka "DarkPAN"). It's possible, but not user-friendly.

After some debate, the group agreed on a new model. A CPAN client should support an ordered list of index resolvers and should query them in turn. This means you could specify that you want an online web resolver tried first, and only then the traditional index.

More powerfully, you could list a local overlay index as the first resolver. That would let you freeze the mapping to a particular version, or swap in a development release that fixes a critical bug. The overlay index would only need to list the modules you want to change, because your CPAN client will fall back to the canonical index. You could even have an overlay index per-application or per-application-version for total control.
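To make the resolver idea concrete, here is a minimal sketch in plain Perl. It is not the CPAN::Common::Index API; make_hash_resolver, resolve, and the Foo::Baz entry are made up for illustration. Each resolver is a code reference that returns a distribution path or undef, and the first resolver with an answer wins.

```perl
use strict;
use warnings;

# Hypothetical resolver: package name in, distribution path (or undef) out.
sub make_hash_resolver {
    my (%index) = @_;
    return sub { my ($package) = @_; return $index{$package} };
}

# Query the resolvers in order; the first defined answer wins.
sub resolve {
    my ( $package, @resolvers ) = @_;
    for my $resolver (@resolvers) {
        my $dist = $resolver->($package);
        return $dist if defined $dist;
    }
    return;
}

# The overlay pins Foo::Bar to an older release; everything else falls through.
my $overlay   = make_hash_resolver( 'Foo::Bar' => 'DAGOLDEN/Foo-Bar-1.22.tar.gz' );
my $canonical = make_hash_resolver(
    'Foo::Bar' => 'DAGOLDEN/Foo-Bar-1.23.tar.gz',
    'Foo::Baz' => 'DAGOLDEN/Foo-Baz-0.10.tar.gz',
);

print resolve( 'Foo::Bar', $overlay, $canonical ), "\n";    # from the overlay
print resolve( 'Foo::Baz', $overlay, $canonical ), "\n";    # from the canonical index
```

The overlay only lists what it changes, which is the whole point: the fallback does the rest.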

We also agreed that mapping shouldn't just be a distribution path on a CPAN mirror, but should evolve into a URL. This would allow overlay indexes pointing to locally patched distributions, or to the BackPAN, or potentially even to source repositories (if appropriate scheme handlers were written to check out the necessary files).

In summary -- CPAN indexes should become an open, flexible mechanism to give users more control over how module names are mapped to the files that can provide them.

After reaching that agreement, Nick Perez (nperez) volunteered to start working on a common library for index resolvers (to be called CPAN::Common::Index) and to build a resolver for it that uses MetaCPAN to provide the mapping data.

Meanwhile, I started work on a proof of concept for how CPAN.pm could be modified to use the new, common library instead of its traditional index lookup routines.

Evolution of CPAN.pm

It had been a while since I was deep in the guts of CPAN.pm, but after coming back up to speed, I tackled two big projects and found one crazy bug along the way.

New CPAN indexing and reduced memory footprint

A stock CPAN.pm client uses a ton of memory when the indexes are loaded — about 300 MB last time I looked. It's bloated because it keeps a read-only copy of the indexes loaded in data structures in memory and also keeps a mutable object for every index entry as well.

Some time ago, CPAN::SQLite was released to help solve that problem. It kept the indexes in a SQLite database and loaded data into memory on demand. I decided to use that same approach for the interface to the forthcoming CPAN::Common::Index library, with the goal of being able to load all data on demand, even directly from the package index file, using only core Perl modules.

Here's the trick: the package index file is line-oriented and is sorted by package name. Using the Search::Dict core module, I was able to do a binary search as a super-fast way to look up data for a package name.
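A rough sketch of that lookup, assuming a header-free file of sorted "package version path" lines. The sample data and the lookup_package name are made up, and a temp file stands in for the real index.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Search::Dict qw(look);

# Made-up index lines, sorted, with no header in front of them.
my @lines = sort (
    "Acme::Example  0.01  AUTHOR/Acme-Example-0.01.tar.gz\n",
    "Foo::Bar  1.23  DAGOLDEN/Foo-Bar-1.23.tar.gz\n",
    "Foo::Bar::Util  1.23  DAGOLDEN/Foo-Bar-1.23.tar.gz\n",
    "Zed::Last  2.00  AUTHOR/Zed-Last-2.00.tar.gz\n",
);

my ( $fh, $filename ) = tempfile( UNLINK => 1 );
print {$fh} @lines;
close $fh or die "Cannot close $filename: $!";

sub lookup_package {
    my ( $file, $package ) = @_;
    open my $in, '<', $file or die "Cannot open $file: $!";

    # look() binary-searches the file and leaves the handle positioned
    # at the first line that sorts at or after the key.
    my $pos = look( $in, $package );
    return if !defined $pos || $pos < 0;

    my $line = <$in>;
    return unless defined $line;

    my ( $name, $version, $path ) = split ' ', $line;
    return unless $name eq $package;    # nearest line was a different package
    return { version => $version, path => $path };
}

my $hit = lookup_package( $filename, 'Foo::Bar' );
print "Foo::Bar $hit->{version} lives in $hit->{path}\n" if $hit;
```

Nothing is held in memory except the one line that matched.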

The wrinkle in that plan is that Search::Dict wants a filehandle and uses it to seek around in the file, but the package index has an email-style header that confuses it. I could have copied it without the header, but that takes time and memory, too. PAUSE could publish an identical copy without the header, but that's extra work for PAUSE and potentially confusing if they ever get out of sync.

Instead, I wrote Tie::Handle::Offset and Tie::Handle::SkipHeader to hide the email header on a handle, so I could give that directly to Search::Dict. Unfortunately, Search::Dict died unless stat() on the handle gave a valid response, so I patched it to fall back to an alternate method if stat() failed. That revealed a bug in Perl, in which stat() warns when called on tied handles, even if there is a valid filehandle to check (filed as rt#112164).

Since that bug can't get fixed until Perl 5.17 and since we need a working Search::Dict for older Perls anyway, I patched Search::Dict to avoid using stat() on handles, and asked Ricardo Signes (rjbs) to give me a green light to make a dual-life release to CPAN.

After chasing my tail on that for a while, I finally was able to get a proof of concept of on-demand index lookup on the package index file working, saving hundreds of megabytes of memory. It didn't use the CPAN::Common::Index library, since Nick was still writing it, but it expects the same API, so it will be easy to adapt once CPAN::Common::Index is ready (meaning that fast MetaCPAN lookups for CPAN.pm should be easy too).

My POC only covered the package index, but Andreas Koenig (klapperl) and Ricardo created a similarly sorted index of author data and we agreed to consider a similar approach for modlist data once we see how the package indexing works in practice.

I would have been happy if that was all I achieved at the hackathon but I still had some time left to get more done.

CPAN.pm support for 'recommends' and 'suggests' prereqs

The v2 CPAN::Meta::Spec formalized dependency specifications for different phases (configure/build/test/runtime) and for different levels of dependency (requires/recommends/suggests/conflicts). The 'recommends' level is for things that should usually be installed to make a module better, except in really resource-constrained environments. The 'suggests' level is for really optional modules that might make a module better but really aren't necessary for regular use.

Even though those have been specified for a while, none of the CPAN clients supported them -- meaning it was a manual job to look at the META file, see the recommends/suggests and install them yourself. Usually, no one bothers.

Since I was on a roll from the indexing work, I set up another CPAN.pm feature branch and implemented support for a 'recommends_policy' and a 'suggests_policy' to control whether those prereqs should be queued up along with the required ones. Even better, if those optional dependencies fail for any reason, CPAN.pm won't warn about missing dependencies and simply notes them as being optional when it reports the failures after processing a command.
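As an illustration only (this is not the CPAN.pm code), here is roughly what such policies decide. The prereqs hash follows the phase/relationship layout described above. The module names, the versions, prereqs_to_queue, and treating the two policies as simple booleans are all assumptions of mine.

```perl
use strict;
use warnings;

# Made-up prereqs: phase => relationship => module => minimum version.
my %prereqs = (
    runtime => {
        requires   => { 'Foo::Bar'    => '1.23' },
        recommends => { 'Foo::Fast'   => '0.50' },
        suggests   => { 'Foo::Extras' => '0' },
    },
    test => {
        requires => { 'Foo::Tester' => '0.10' },
    },
);

# Return one entry per prereq to install, marking the non-required ones.
sub prereqs_to_queue {
    my ( $prereqs, %policy ) = @_;

    my @levels = ('requires');
    push @levels, 'recommends' if $policy{recommends_policy};
    push @levels, 'suggests'   if $policy{suggests_policy};

    my @queue;
    for my $phase ( sort keys %$prereqs ) {
        for my $level (@levels) {
            my $modules = $prereqs->{$phase}{$level} or next;
            push @queue, map {
                +{
                    module   => $_,
                    phase    => $phase,
                    optional => ( $level eq 'requires' ? 0 : 1 ),
                }
            } sort keys %$modules;
        }
    }
    return @queue;
}

for my $item ( prereqs_to_queue( \%prereqs, recommends_policy => 1 ) ) {
    printf "%-12s %-8s %s\n", $item->{module}, $item->{phase},
        $item->{optional} ? '(optional)' : '';
}
```

Carrying the optional flag along with each entry is what lets a failure be reported as a note rather than a missing dependency.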

Along the way, I found and fixed a CPAN.pm edge-case bug where a module listed in both "build requires" and "runtime requires" and that has a lower prereq in "build requires" would overwrite the higher requirement in "runtime requires". (yikes!) That might explain some bizarre CPAN.pm bug reports I've seen that we could never track down, so it was an extra win.
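The safe behavior is easy to state in code: when the same module shows up in more than one prereq list, keep the higher minimum no matter which list is merged last. A sketch using the core version module follows. The merge_requirements name and the sample data are made up, and this is not the actual CPAN.pm fix.

```perl
use strict;
use warnings;
use version;

# Merge minimum-version requirements, keeping the higher minimum per module.
sub merge_requirements {
    my (@sets) = @_;
    my %merged;
    for my $set (@sets) {
        for my $module ( keys %$set ) {
            my $want = $set->{$module};
            my $have = $merged{$module};
            $merged{$module} = $want
                if !defined $have
                || version->parse($want) > version->parse($have);
        }
    }
    return \%merged;
}

my %build_requires   = ( 'Foo::Bar' => '1.00' );
my %runtime_requires = ( 'Foo::Bar' => '1.23' );

# The lower build requirement comes last, yet it must not win.
my $merged = merge_requirements( \%runtime_requires, \%build_requires );
print "Foo::Bar >= $merged->{'Foo::Bar'}\n";    # prints "Foo::Bar >= 1.23"
```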

Unfortunately, ExtUtils::MakeMaker and Module::Build don't yet preserve 'suggests' dependencies during configuration, so this will only help with 'recommends', but fixes to the installers are in the works (Ricardo was working on EU::MM at the hackathon) and CPAN.pm will be ready whenever they are.

Adding features and fixing bugs

CPAN::Meta got a tiny bit of love. I released a version of Parse::CPAN::Meta with dependencies on the latest (less-buggy) versions of CPAN::Meta::YAML and JSON::PP. (I've already got CPAN Testers fail reports, so the tests apparently need some more work.)

Leon Timmermans (leont) added a new method to CPAN::Meta::Requirements for something he was working on, which was awesome because I wound up needing it for the CPAN.pm work only a couple hours after he sent me the pull request. Then I split out CPAN::Meta::Requirements from CPAN::Meta and released it, so CPAN.pm could depend on it without needing all of CPAN::Meta. CPAN::Meta also got some releases for these various changes.

In a startling display of synchronicity, both Curtis Poe (ovid) and Lars Dɪᴇᴄᴋᴏᴡ (daxim) reported a weird Module::Build bug within about an hour of each other. Apparently, errors in META file creation can result in existing META files being deleted, no new files being created and no error message shown about what happened. Leon and I figured out the problem and offered some workarounds — though we ran out of time at the hackathon to fix it in Module::Build itself.

Various other things I did

Several people, including Leon, Michael Schwern, Olivier Mengué (dolmen), Lars, me, and a few others I now forget (sorry), got together to discuss a draft of a "Build.PL API". It defines what CPAN clients should expect interacting with a Build.PL/Build-based installer, which opens the door to future replacements for Module::Build, like Module::Build::Tiny.

Breno de Oliveira (garu) wanted to add CPAN Testers reporting to cpanminus, and along the way volunteered to write a unified, second-generation CPAN Testers client to replace the disparate behaviors of CPAN::Reporter and the reporting modules of CPANPLUS. I gave a small tutorial on CPAN Testers and the Metabase backing it to Breno and others interested in the topic.

As a minor note, I got annoyed at some Test::Spelling carping during all the releases I was doing, so I released a new Pod::Wordlist::hanekomu. If you use Dist::Zilla and the Test::PodSpelling plugin, check it out!

Cool things other people did

Some things I didn't work on that I thought were notable:

  • To support CPAN::Common::Index, Nick wrote MetaCPAN::API::Tiny — a client for querying MetaCPAN that relies only on core Perl modules, which is exactly what we need for a new CPAN.pm index resolver
  • The CPAN "package index" now updates every five minutes instead of every hour... which means other projects that rely on it, like MetaCPAN, are even closer to real time.
  • I asked around if there was a command-line client for MetaCPAN and there wasn't. Then Chris Nehren (apeiron) asked me what I had in mind, whipped one up, and submitted it as an addition to the MetaCPAN::API distribution
  • Ricardo worked on getting full support for CPAN::Meta::Spec v2 into ExtUtils::MakeMaker, including TEST_REQUIRES and ensuring all prerequisites types are preserved in MYMETA.json files
  • Ricardo also got PAUSE to save package index files into git after each update, so we no longer lose historical information
  • Peter Rabbitson (ribasushi) demonstrated a way to use git to store CPAN Testers reports to achieve massive delta compression (and make it easy for people to get copies of the raw data quickly and cheaply). I didn't have time at the hackathon to do much with it but hope to look into it more soon.
  • Late in the afternoon on Sunday, Nick used his MetaCPAN::API::Tiny client for what was dubbed "CloudPAN", a crazy April-Fools proof-of-concept to hook module loading to load missing modules directly from source on metacpan. You'll never need to install pure-Perl modules again. ;-)

There was a lot more going on and a lot I missed, so if I omitted anyone's project, I mean no offense. (I'll read all the hackathon blogs to catch up.)

Conclusion and Acknowledgments

This was my third hackathon and was just as inspiring (and productive) as the last two. I'm excited about the evolution of CPAN.pm and hope to get my work tested further and then merged into the CPAN.pm master branch before long.

I have nothing but wonderful things to say about Laurent Boivin (elbeho), Philippe Bruhat (BooK) and the French Perl Mongers who organized a great event and provided wonderful hospitality, including an endless supply of food, drink and coffee machines to fuel our hacking.

I would also like to thank the hackathon sponsors whose generosity made the hackathon possible and enabled me to attend. (If you'd like to donate, it's not too late and will help support next year's QA hackathon.)

These companies and organizations support Perl. Please support them: The City of Science and Industry, Diabolo.com, Dijkmat, DuckDuckGo, Dyn, Freeside Internet Services, Hedera Technology, Jaguar Network, Mongueurs de Perl, Shadowcat Systems Limited, SPLIO, TECLIB’, Weborama, and $foo Magazine

These people made individual donations (you rock!): Martin Evans, Mark Keating, Prakash Kailasa, Neil Bowers, 加藤 敦 (Ktat), Karen Pauley, Chad Davis, Franck Cuny, 近藤嘉雪, Tomohiro Hosaka, Syohei Yoshida, 牧 大輔 (lestrrat), and Laurent Boivin

Special thanks also to Torsten Raudssus (getty) and Duck Duck Go for the tee-shirt and Booking for the silly putty. :-)

Finally, thank you to all my fellow hackers! I had a great time and I hope to see you all again next year!

[1] CPAN index discussion group (with some people coming and going): me, Andreas Koenig, Florian Ragwitz, Michael Peters, Michael Schwern, Nick Perez, Olaf Alders, Olivier Mengué, Ricardo Signes, Tatsuhiko Miyagawa and probably even more I don't remember. (Please remind me if you were there and want to share the credit/blame.)

Perlbuzz news roundup for 2012-04-02

These links are collected from the Perlbuzz Twitter feed. If you have suggestions for news bits, please mail me at andy@perlbuzz.com.

  • "The measure of a civilization is how it treats its weakest members." So too it is with how an open source project treats its newbies.
  • A recap of the Israeli Perl Workshop 2012 (blogs.perl.org)
  • Who is a contributor to Perl? (perlmonks.org)
  • RT @OvidPerl Perl version 5.15.9 has 521,047 tests all of which just passed. #software #testing #tdd #perl
  • Perl-based search engine DuckDuckGo is taking off: (duckduckgo.com)
  • Discounts on bulk orders of Modern Perl for user groups (modernperlbooks.com)
  • New beta version of HTML::Lint validates HTML entities (perlbuzz.com)
  • The two most premature optimizations: Optimization of non-working code, and of unmeasured code. (perlmonks.org)
  • Devel::Cover reports in vim (blogs.perl.org)
  • Syntax coloring in the debugger (blogs.perl.org)
  • How to install BerkeleyDB (blogs.perl.org)
  • What I learned teaching Perl for advocacy (blogs.perl.org)

What Testing DSLs Get Wrong

A conversation between the owners of ClubCompy about language design, syntax errors, and testing led to an interesting exchange (lightly edited for coherence):

How do you go about testing order of operations and languages?

You need a minimal test driver that takes an expression or series of expressions and verifies that it either parses or produces a syntax error. The test matches part or all of that error.

Any given input either parses correctly or produces an error.

Our current test framework cannot "see" when there is a syntax error. We set a flag right before the end of our test programs and test that that flag has the right value.

The most robust strategy I've seen is to add a parse-only stage to the compiler such that you feed it code and catch an exception or series of errors or get back a note that everything's okay.

You can inspect a tree structure of some kind to verify that it has all of the leaves and branches you expect, but that's fragile and couples the internals of the parser and compiler and optimizer to the internals of your tests.

Is having a huge battery of little code snippets that run or fail with errors the goal?

Ideally there's as little distance between "Here's some language code" and "Here's my expected results" as possible. The less test scaffolding the better.

I've never been a fan of Behavior Driven Development. I think Ruby's Cucumber is a tower of silly faddishness in software development. (Any time your example walks you through writing regular expressions to parse a subset of English to test the addition of two numbers, close the browser window and ask yourself if slinging coffee for a living is really such a bad idea after all.)

I neither want to maintain nor debug a big wad of cutesy code that exists to force my test suite into "reading like English"—as if the important feature of my test assertions were that they looked like index cards transcribed into code.

Nor do I want to spend my time tweaking a lot of hairy procedural scaffolding to juggle launching a compiler and poking around in its guts for a magic flag so that, a couple of dozen lines of code later, I can finally say yes or no that the 30 characters of line noise I sent to the compiler produced the error message I expected.

I want to write simple test code with minimal scaffolding to highlight the two important attributes of every test assertion:

  • Here's what I did
  • Here's what I expected to happen

That means I want to write something like:

parses_ok 'TOCODE i + 65',
    'precedence of + should be lower than that of TOCODE';

Instead of:

Feature: Precedence of keywords and arithmetic operators
  In order to avoid parse errors between keywords and arithmetic operators
  As an expert on parse errors
  I want to demonstrate that keywords bind more tightly to their operands than do operators

  Scenario: TOCODE versus +
    Given code of "TOCODE i + 65"
    When I parse it
    Then the result should parse correctly without error

Which would you rather read, run, and debug?

All of these "DSLs for $foo" jump too far over the line and try to produce the end goal their users need to make for themselves. I don't want a project that attempts to allow me to write my tests in a pidgin form of English (and I get to parse that mess myself, oh joy, because I'm already testing a parser and the best way to test a parser is to write a custom fragile parser for natural language, because debugging that is clearly contributing to real business value).

Ideally, I want to use a library someone else has written that can launch my little compiler and check its results. I want to use this library in my own test suite and have it integrate with everything else in the test suite flawlessly. It should express no opinion about how I manage and arrange and design the entire test suite. It should neither own the world, nor interfere with other tests.

In short, if it has an opinion, it limits that opinion to just a couple of test assertions I can choose to use or not.

In other words, I still want Test::Builder because T::B lets me decide the abstractions I want or don't want and reuse them as I see fit. After all, good software development means building up the vocabulary and metaphors and abstractions appropriate to the problem you're solving, not adopting a hastily-generalized and overextended pidgin and trying to force your code into the shapes demanded.
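For the sake of illustration, here is about how little code a parses_ok assertion needs on top of Test::Builder. The parse_only sub is a toy stand-in for a real parse-only compiler stage (it knows nothing about precedence), and parse_error_like is a hypothetical companion assertion, not part of any existing library.

```perl
use strict;
use warnings;
use Test::Builder;

my $Test = Test::Builder->new;

# Toy stand-in for a parse-only compiler stage: dies on a syntax error.
sub parse_only {
    my ($code) = @_;
    die "Syntax error: empty expression\n"  unless $code =~ /\S/;
    die "Syntax error: dangling operator\n" if $code =~ /[-+*\/]\s*$/;
    return 1;
}

# Pass if the code parses; show the error as a diagnostic if it does not.
sub parses_ok {
    my ( $code, $description ) = @_;
    my $ok    = eval { parse_only($code) };
    my $error = $@;
    $Test->ok( $ok, $description );
    $Test->diag("Unexpected error: $error") unless $ok;
    return $ok;
}

# Pass if parsing fails with an error matching the pattern.
sub parse_error_like {
    my ( $code, $pattern, $description ) = @_;
    eval { parse_only($code) };
    return $Test->like( $@, $pattern, $description );
}

$Test->plan( tests => 2 );

parses_ok 'TOCODE i + 65',
    'precedence of + should be lower than that of TOCODE';
parse_error_like 'TOCODE i +', qr/dangling operator/,
    'a trailing + should be a syntax error';
```

Because both assertions share the Test::Builder singleton, they mix freely with Test::More or anything else in the same test file.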

If I'm going to have to write code to manage my tests anyway, I'll make the input and expected output prominent—not a boilerplate pattern of repetition I have to parse away anyhow.

New version of HTML::Lint validates HTML entities

I've released a beta of the new version of HTML::Lint, version 2.11_01. (At the time of this writing, this 2.11_01 release has not reached its search.cpan.org page yet.) This version adds HTML entity checking to the tag checking that HTML::Lint has done since the dawn of time. If you're already using HTML::Lint, please help test this beta version!

Entity checking can be a messy business, but can be invaluable for finding little mistakes, especially in static HTML pages sent to you from other sources. For example, if I have this HTML file, filled with HTML entities and ampersands and all sorts of potential problems, HTML::Lint sniffs out the problems and reports them:

<html>
    <head>
        <title>Ace of &spades;: A tribute to Mot&oumlrhead. &#174; &metalhorns;</title>
        <script>
            function foo() {
                if ( 6 == 9 && 25 == 6 ) {
                    x = 14;
                }
            }
        </script>
    </head>
    <body bgcolor="white">
        <p>
        Thanks for visiting Ace of &#9824;
        <!-- Numeric version of &spades; -->
        <p>
        Ace of &#x2660; is your single source for
        everything related to Mot&ouml;rhead.
        <p>
        Here's an icon of my girlfriend Jenny: &#8675309;
        <!-- invalid because we cap at 65536 -->
        <p>
        And here's an icon of a deceased cow: &#xdeadbeef;
        <!-- Invalid because we cap at xFFFF -->
        <p>
        Another <i>deceased cow: &xdeadbeef;
        <!-- Not a valid hex entity, but unknown to our lookup tables -->
        <p>
        Here's an awesome link to
        <!-- here comes the ampersand in the YouTube URL! -->
        <a href="http://www.youtube.com/watch?v=8yLhA0ROGi4&feature=related">"You Better Swim"</a>
        from the SpongeBob movie.
        <!--
        Here in the safety of comments, we can put whatever &invalid; and &malformed entities we want, &
        nobody can stop us.  Except maybe Cheech & Chong.
        -->
    </body>
</html>


$ weblint motorhead.html
motorhead.html (3:9) Entity &ouml; is missing its closing semicolon
motorhead.html (3:9) Entity &oumlrhead. &#174; is unknown
motorhead.html (3:9) Entity &metalhorns; is unknown
motorhead.html (17:9) Entity &#8675309; is invalid
motorhead.html (19:9) Entity &#xdeadbeef; is invalid
motorhead.html (22:17) Entity &xdeadbeef; is unknown
motorhead.html (31:5) <i> at (22:17) is never closed

That last error about the unclosed <i> tag has always been part of HTML::Lint, but all the others are new with this version of HTML::Lint.
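To give a feel for the numeric-entity rule, here is a toy check in plain Perl. It is not HTML::Lint's implementation: the invalid_numeric_entities name is made up, it assumes the cap is U+FFFF as the hex comment in the sample says, and it makes no attempt to skip HTML comments.

```perl
use strict;
use warnings;

# Toy check: return the numeric entities whose value is above U+FFFF.
sub invalid_numeric_entities {
    my ($html) = @_;
    my @invalid;
    while ( $html =~ /(&#(x?)([0-9a-fA-F]+);)/g ) {
        my ( $entity, $is_hex, $digits ) = ( $1, $2, $3 );
        next if !$is_hex && $digits =~ /\D/;    # letters in a decimal entity
        my $value = $is_hex ? hex($digits) : $digits;
        push @invalid, $entity if $value > 0xFFFF;
    }
    return @invalid;
}

my $sample = 'Ace of &#9824; Jenny: &#8675309; Cow: &#xdeadbeef; Spade: &#x2660;';
print "$_ is invalid\n" for invalid_numeric_entities($sample);
```

On that sample it flags only the Jenny and deceased-cow entities, matching the two "is invalid" lines in the weblint output.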

The HTML-Lint distribution includes the HTML::Lint module, which is object based for easy handling, and also includes Test::HTML::Lint so that you can add HTML validation to your test suites.

use Test::More tests => 1;
use Test::HTML::Lint;    # exports html_ok()
my $html = $app->generate_home_page();
html_ok( $html, 'Home page is valid HTML' );

If you're not doing any validation of your HTML in your apps, I suggest you give HTML::Lint a try.

Bulk Orders for User Groups

Our goal at Onyx Neon has always been to publish great books that real people ought to (and want to) read.

We're fortunate to have identified a couple of trends whose times have come, in Modern Perl: the book and recently in Liftoff: Launching Agile Teams & Projects. Yet one of the risks of identifying and publishing about a trend early is that sometimes you have to wait for the broader market to catch up to the early adopters.

Sometimes that means the best marketing strategy (in the case of Modern Perl: the book) is to give away electronic versions of the book (though if you want to purchase it from a brick and mortar store or from an electronic bookstore available on your device, be our guest). Other times, we rely on you, the people who've read it and know one or five or ten other people who'd benefit from reading it.

Maybe you're like me, in that you learn better with something tactile you can hold and touch and remember exactly where on the page you read that one little fact you need right now.

Either way, we know that you—our most devoted readers—are our best source of ideas (what do you want to read about?), our best source of feedback (what did we do right? what should we do better?), and our best evangelists. You've been great about telling others about us.

We'd like to expand that, in two ways.

First, while we can't lower our prices on electronic versions any further (free is free!), we can offer a discount for bulk orders. Any group that can put together an order for at least five books (any of our books!) will get an automatic 35% discount from the cover price. (Modern Perl for $23, Liftoff for $19, and so on).

Second, we'll give a better discount to any group that wants to take a box of books to a conference, seminar, or wherever. We've done this in the US (at Scale 9x in 2011) and in Europe, and it's worked well. The idea at Scale was to show how Perl 5 is vibrant and active and exciting—and to generate a little revenue for the Los Angeles Perl monger groups. (They made a little profit on each book; totally fine with us.)

If you're interested in either program, mail us at orders@onyxneon.com and we'll figure out what works best for both you and us.

Consistency, CPAN, and Captiousness

Once in a while, an innocent looking change to bleadperl (the version of Perl 5 under current development) causes changes which ripple through the CPAN. As the CPAN is a graph of dependencies, any such change which causes tests to fail could have dramatic effects on user applications.

(Once I almost released a change which would have made half of CPAN uninstallable. Then Schwern slapped my hand metaphorically.)

Sometimes the fault isn't in bleadperl.

Consider RT #106538, which laments the inconsistency between the output of the builtin die and Carp's croak(). From the bug report:

$ perl -e 'die'
Died at -e line 1.
$ perl -MCarp -e 'croak Died'
Died at -e line 1

If your eyes don't immediately catch the missing period, you're in good company.

Consistency suggests that the output of both error messages should be identical. After all, Carp exists to enhance Perl 5's core exception mechanisms.

Yet as you might expect, changing error messages breaks buggy code that attempts to parse unstructured text too strictly. Adding a single dot to an error message makes several important CPAN modules fail their tests.

I can't blame CPAN developers for performing exact matches against string error messages—it's quick and easy and unlikely to change, and it's reasonably easy to fix... until you get a fix that looks like:

$pattern .= $Carp::VERSION gt "1.24" ? "." :"";

... which knows that the period is present but persists in hard-coding specific formatting details of the output.

The right solution, of course, is to stop emitting only unstructured text (from the core side) and to stop testing the exact details of unstructured text (on the CPAN side). The interim solution is to stop testing the exact details of unstructured text on the CPAN side.
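
As a rough illustration of what structured errors could look like in user code, here is a minimal sketch. The class My::Error and its fields code and message are hypothetical names invented for this example; nothing in the core or in Carp works this way today. The point is only that the caller inspects a field rather than the punctuation of a string.

```perl
use strict;
use warnings;
use Scalar::Util 'blessed';

package My::Error;    # hypothetical exception class, for illustration only

sub new     { my ( $class, %args ) = @_; return bless {%args}, $class }
sub throw   { my $class = shift; die $class->new(@_) }
sub code    { $_[0]{code} }
sub message { $_[0]{message} }

package main;

# Throw a structured exception and capture it.
my $ok = eval {
    My::Error->throw( code => 'E_NOFILE', message => 'No such file.' );
    1;
};
my $err = $@;

# Decide what happened from the object, not from the text of the message.
if ( !$ok && blessed($err) && $err->isa('My::Error') ) {
    print "caught ", $err->code, "\n";
}
```

The message text can gain or lose a period without breaking any caller that checks code instead.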

Despite all of the effort around what could have been a simple change, the entire process of developing Perl 5 is a huge improvement over its past. Making this change and identifying its effects was reasonably easy, when you consider the size of the task and its consequences. Sure, the entire Perl community has to pay off some of the technical debt for well-established choices and design decisions that turned out to have been less than perfect, but this is a good opportunity to see how much better things are than they were even five years ago and to reflect on how to improve processes and tools to make them better as early as next year.

Do keep in mind, however, that if you're performing exact string matches against the results of things the core has never promised not to change, you are writing risky code.
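
If you must match against message text in the meantime, a looser pattern lowers the risk. The sketch below assumes a hypothetical function fragile() that croaks; the pattern anchors only the start of the message and ignores whatever follows the line number, so it accepts the message with or without the trailing period.

```perl
use strict;
use warnings;
use Carp ();

# Hypothetical function under test; it does nothing but croak.
sub fragile { Carp::croak('Died') }

my $ok  = eval { fragile(); 1 };
my $err = $@;

# Match the stable part of the message and nothing after the line number.
my $tolerant = qr/\ADied at .+? line \d+/;

for my $message ( "Died at -e line 1", "Died at -e line 1.\n", $err ) {
    print $message =~ $tolerant ? "matched\n" : "no match\n";
}
```

This is still parsing unstructured text, so it remains an interim measure rather than a fix.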

Perlbuzz news roundup for 2012-03-26

These links are collected from the Perlbuzz Twitter feed. If you have suggestions for news bits, please mail me at andy@perlbuzz.com.

Inadvertent Inconsistencies: each versus Autoderef

Perl 5.12 allows you to use each, keys, and values on arrays. Perl 5.14 will automatically dereference references used as operands to the aggregate operators. The combination produces a worrisome inconsistency.

Perl 5.12's each had no obvious inconsistency problem; you had to write each @$kittens or each @{ $kittens } when using an array reference as its operand. Sure, you could write each %{ $kittens } when $kittens holds an array reference, but you'd get an error when the program runs, just as you would for dereferencing the wrong type of reference anywhere else.
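
For reference, the explicit Perl 5.12 form looks something like this. The contents of $kittens are made up for the example.

```perl
use 5.012;
use strict;
use warnings;

my $kittens = [qw( Jack Brad Mars )];    # sample data, invented for this sketch

# The explicit @$kittens leaves no doubt that an array is being iterated.
my @pairs;
while ( my ( $index, $name ) = each @$kittens ) {
    push @pairs, "$index=$name";
}

print "@pairs\n";    # prints "0=Jack 1=Brad 2=Mars"
```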

With Perl 5.14, you have the curious situation where it's possible to give one of these polymorphic aggregate operators an operand which can behave both as a hash and as an array. By overloading an object, you can make it respond to array operations, or hash operations, or both.

If you use one of these objects as the operand to each, keys, or values, what is Perl to do? It's easy to test:

use Modern::Perl;

package DestroyerOfHope;

use overload
    '%{}' => \&gethash,
    '@{}' => \&getarray;

sub new
{
    # called as a class method, so the invocant is the class name
    my $class = shift;
    bless [qw( I am an array )], $class;
}

sub gethash  { { I => 'hash' } }
sub getarray { $_[0] }

package main;

my $d = DestroyerOfHope->new;
say each $d;

As of Perl 5.14, you get a runtime error "Type of argument to each on reference must be unblessed hashref or arrayref...". (The rationale was partly "Uh oh, this could go wrong!" and partly "Why would you want to iterate over something blessed?" The latter seems to me to ignore the fact that blessing is the only way to produce this kind of desirable overloading, but that's an argument for another time.)

While that decision certainly closes the door on this type of error, it's hardly the only way to solve this inconsistency. I see five other options:

  • Forbid autodereferencing on operands with any overloading
  • Always choose one overloading over the other (array always wins! hash always wins!), preferably producing a run-time warning
  • Forbid autodereferencing on operands with both types of overloading, giving a run-time error
  • Forbid autodereferencing with each, keys, and values
  • Revert the polymorphism of each, keys, and values

Keeping the existing behavior is probably the easiest, but it has two problems. First, it's inconsistent with Perl's nature. Sure, Perl deserves opaque objects, but what we have now are blessed references. Why are some references autodereferenceable and others not (especially in the presence of overload)? Second, the existing behavior papers over a real problem. The interaction of these two features is inconsistent because one of the features ignores a longstanding design principle of Perl.

The real problem was making each, keys, and values work on arrays as well as hashes.

I understand the desire to make this feature work. It's easy to say "I want something like each that works on arrays!" The obvious next step is to expand that feature to include other hash aggregate operators. (The pursuit and implementation of a small consistency is easy. The pursuit and implementation of a language-wide consistency is very difficult.)

It's also much easier to hang new behavior off of existing keywords than it is both to find the right new keyword and to add a new keyword (adding new keywords is a perilous process). Would you want to type while (my ($index, $value) = arrayeach $kittens) { ... } every time you wanted to iterate over an array and get its index and value? Probably me neither.
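
For comparison, something like arrayeach can be approximated in user space without any new keyword. This is only a sketch: array_iterator is a hypothetical helper name, and the closure-based design is one choice among several.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that yields (index, value) pairs
# and then an empty list once the array is exhausted.
sub array_iterator {
    my ($aref) = @_;
    my $i = 0;
    return sub {
        return if $i > $#$aref;
        my $index = $i++;
        return ( $index, $aref->[$index] );
    };
}

my $kittens = [qw( Jack Brad Mars )];    # sample data, invented for this sketch
my $next    = array_iterator($kittens);

my @seen;
while ( my ( $index, $value ) = $next->() ) {
    push @seen, "$index:$value";
}

print "@seen\n";    # prints "0:Jack 1:Brad 2:Mars"
```

Unlike each, the iterator state lives in the closure rather than in the array itself, so two loops over the same array do not interfere with each other.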

Yet the problem remains. By making each, keys, and values polymorphic with respect to the types of their operands, Perl 5 has removed its ability to provide greater consistency across the language. (It's not just for the compiler; it's for people reading the code.)

The purest response, from the point of view of language design, is to deprecate the use of hash aggregates on anything but hashes and to find new keywords to perform the same functions on arrays. Enabling the feature set of Perl 5.12 or Perl 5.14 (or, by now, Perl 5.16) could re-enable this polymorphic behavior, but p5p could contain the damage to those releases alone and provide better options in the future.

The practical response is to acknowledge yet another wart on the language and keep the existing runtime error.

In user code, the best option is probably to avoid autodereferencing altogether, as tempting as it seems. (This is a controversial statement, but I believe it's better to resist a feature when the human brain's desire for pattern recognition and consistency may lead you down a path to using the inconsistent operators, and then where will you be?)
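
Here is a small sketch of what explicit dereferencing buys you. The class TwoFaced is a hypothetical stand-in modeled on the DestroyerOfHope example above; with an explicit sigil, the reader and the compiler both know which overloading is wanted.

```perl
use strict;
use warnings;

package TwoFaced;    # hypothetical class with both dereference overloads

use overload
    '%{}' => sub { return +{ I => 'hash' } },
    '@{}' => sub { return $_[0] };    # the object is already an array reference

package main;

my $d = bless [qw( I am an array )], 'TwoFaced';

my @as_array = @$d;         # explicitly asks for the array overloading
my @as_hash  = keys %$d;    # explicitly asks for the hash overloading

print "array view: @as_array\n";    # prints "array view: I am an array"
print "hash keys:  @as_hash\n";     # prints "hash keys:  I"
```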

What's the solution in the future to avoid further inconsistencies like this? Always hew to Perl 5's fundamental principles. (Note that the biggest problem with the controversial and soon-to-be-bowdlerized smartmatch operator is that it also is a polymorphic operator and no one can memorize exactly what it does in every common situation, let alone every edge case.)