How About a Shetland Ponie?

Rakudo Star is out, and so begins the next great wave of interest and use of Perl 6. The next several releases will improve performance, fix bugs, add features, port or create more libraries, and—in all likelihood—improve and otherwise clarify the Perl 6 specification.

The Perl ecosystem has room for other projects, however.

For example, one of the clearest benefits Perl 6 has over Perl 5 is its portability to other virtual machines and runtimes. By design Perl 6 encourages multiple implementations. Perl 5 is its own specification; in many places, what Perl 5 is is solely what Perl 5 happens to do. Sometimes that behavior gets enshrined in the specification tests, but other times it's folklore and institutional community knowledge.

Just as Parrot's Lorito project intends to make Parrot at least an order of magnitude faster, so too a reorganization of Perl 5 internals could make amazing things more possible.

What if there were a project to implement a minimal set of Perl 5 on the Parrot virtual machine as a prototype and exploration of how much of Perl 5 you can support, the effort it takes to do so, and what kind of utility you can expect? Parrot's compiler tools let the Rakudo developers write most of Perl 6 in Perl 6; surely it's possible to write Perl 5 in a similar fashion. (Credit to other projects such as Rubinius and PyPy for demonstrating that such things are possible.)

I know other projects have attempted this in the past. Perhaps the best place to steal information is Bradley Kuhn's master's thesis, Considerations on Porting Perl to the Java Virtual Machine.

As Jesse wrote in his comments, bug-for-bug compatibility isn't necessary. Nor is full compliance with the existing Perl 5 test suite. A simple proof of concept to produce the 80% of Perl 5 most people use in most programs should suffice. (Parrot gives you a lot of that anyway.)

As a bonus, you get cheap and easy interoperability with Perl 5, access to Parrot features such as multidispatch, grammars, continuations, and bytecode serialization, and you could even replace some of the uses of Perl 5 within Parrot's and perhaps even Rakudo's configuration and build processes.

It doesn't even have to be a pony of full size.

Perlbuzz news roundup for 2010-07-27


These links are collected from the Perlbuzz Twitter feed. If you have suggestions for news bits, please mail me at andy@perlbuzz.com.

Why roles in Perl are awesome

by Chris Prather

A question came up recently on a mailing list. I was talking about how Roles are an awesome win for Perl 5, considering how few languages implement the concept[1]. Someone asked what the win was with Roles. I had been thinking about this recently, so I dashed off a reply.

When you use Inheritance, especially multiple inheritance, for behavioral re-use you run into several problems quickly.

First, inheritance is an explicit modeling of a relationship that carries semantic meaning. Let's say you're developing a game to teach Biology students about taxonomy. In this game a Dog class is a subclass of Animal. That is, the Dog class inherits specific behaviors and attributes from the Animal class. This probably isn't even a direct relationship: your Dog class may inherit from a Mammal class, which inherits from a Vertebrate class, which inherits from Animalia, which itself inherits from Life. These kinds of hierarchies are common in taxonomy as well as in object-oriented programming. However, when you need to express something that cuts across these concerns, you run into issues.

Say, for example, your marketing department has had trouble selling this product to schools and is attempting to market to parents directly. Their studies show that kids really like pets[2]. So your boss comes to you because the company wants you to add the concept of Pet to your taxonomy model.

Pets don't fit into a taxonomy. Obviously not all Animalia are Pets[3], and some Pets may not be animals at all[4]. In many languages you can use multiple inheritance to describe this new "I'm an Animalia and a Pet" relationship, but often you run into issues there as well. Is a Pet a Life? That would mean our object model would look like:

Life
    Animalia
        Vertebrate
            Canine    Pet
                Dog

Pet stands out like a sore thumb. Obviously we've got issues with this new modeling. We talk to our boss and figure out that the rules for Pet are simple. Pets are always domesticated versions of the Animalia, but not every class in Animalia is a pet. So, for example, Dogs are always Pets; Wolves are not. We can solve this with multiple inheritance now, but it's really not a clean way to express the relationship, and it requires us to document the special relationship the Pet class has with the rest of the inheritance tree. Once you get beyond a few "special cases" like this, it becomes hard to see the model for the exceptions.

This is why some languages disallow multiple inheritance entirely. In Java, for example, Pet could become an Interface.

import java.util.Date;

public interface Pet {
    Date getYearDomesticated();
}

This, however, means that every class we want to be a pet needs the exact same piece of boilerplate code added to it.

class Dog implements Pet {
    ...
    private Date yearDomesticated;
    public Date getYearDomesticated() { return this.yearDomesticated; }
    ...
}

If we instead have the concept of Roles, then we can apply the concept of a Pet once, at any level of the hierarchy we need. An example using modern Perl 5[5]:

package Pet {
    use Moose::Role;
    has year_domesticated => (
        is => 'ro',
        isa => 'DateTime',
        required => 1
    );
}

package Dog {
    use Moose;
    extends qw(Canine);
    with qw(Pet);
}

The Pet Role here implements everything we need for a default implementation, and doesn't add any more boilerplate to our Dog class than the bare minimum needed. It also avoids the ugly issues we saw with multiple inheritance by moving behavior composition onto a different tool. In my opinion, Roles aren't a win for every use of inheritance, nor for every time you want to re-use behavior, but they are an excellent tool to have in the box, and one that the Moose crowd knows to reach for quite often.
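To make "composition instead of inheritance" concrete without Moose, here is a rough sketch in core Perl (5.14 or later, for the package block syntax) of what applying a role amounts to: the role's methods are copied into one consuming class, and the inheritance tree is left alone. The compose_role helper and the is_pet method are hypothetical; this is not how Moose implements with, and it skips Moose's required-attribute checks, conflict detection between roles, and does().

```perl
use strict;
use warnings;

# Hypothetical helper, NOT Moose's implementation of "with": copy each sub
# defined in the role package into the class, unless the class defines it.
sub compose_role {
    my ($role, $class) = @_;
    no strict 'refs';    # symbol-table access by name
    for my $name (keys %{"${role}::"}) {
        my $code = *{"${role}::$name"}{CODE} or next;
        next if defined &{"${class}::$name"};    # class's own method wins
        *{"${class}::$name"} = $code;
    }
    return;
}

package Animal {
    sub new { my ($class, %args) = @_; return bless {%args}, $class }
}
package Canine { our @ISA = ('Animal') }
package Dog    { our @ISA = ('Canine') }
package Wolf   { our @ISA = ('Canine') }

# The role: a default accessor plus a marker method, written once.
package Pet {
    sub year_domesticated { $_[0]{year_domesticated} }
    sub is_pet            { 1 }
}

# Roughly what "with qw(Pet)" does for Dog; Wolf never composes Pet.
compose_role('Pet', 'Dog');

my $dog = Dog->new(year_domesticated => 'prehistoric');
print "Dog is a pet, domesticated: ", $dog->year_domesticated, "\n" if $dog->is_pet;
print "Wolf is a pet: ", (Wolf->can('is_pet') ? 'yes' : 'no'), "\n";   # no
```

Wolf shares Dog's ancestry but never composes Pet, so Wolf->can('is_pet') is false, while Dog gains the Pet methods without acquiring a second parent class.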


  1. Off the top of my head I only know about Perl 5, Perl 6, Scala, JavaScript, and Smalltalk. There may be other implementations out there.

  2. The Marketing guy's daughter plays WebKinz nightly.

  3. Pet sharks would be dangerous, to say the least, and where would you keep a pet blue whale?

  4. Who doesn't love their Pet Rock?

  5. We're using the inline package syntax that will be released in 5.14.

Chris Prather is an Owner at Tamarou LLC, a member of the Moose cabal, and responsible for Task::Kensho.

perlprogramming.org looking for a nice home

I picked up this domain name a while ago since it was available and people were saying how Perl needs better visibility. (I was probably inspired by Tim's post about TIOBE.) Currently, I just redirect to perl.org. The domain is coming up for renewal, and I'll gladly sign it over to someone who can make a good case for what they plan to do with it to benefit the Perl community.

A Checklist for Writing Maintainable Perl

Suppose you want to write a program in Perl. (Suppose you have written a program in Perl.) If the thesis behind what I call Modern Perl is correct, you can write that program well or you can write that program poorly. (For supporting arguments for that thesis, see Piers Cawley's A tale of two languages.)

Likely you've seen examples of Poorly Written Perl on the Internet. They are to good Perl what YouTube comments are to Nabokov's English. In other words, the proper response to a reluctant admission that:

Yes, I know that Perl can be written in an object-oriented and readable way.

— Tim Bray, D.P.H.

... or that:

There's also been a push in some applications to rewrite Perl utilities in Bash to enhance portability between platforms. While Perl exists on just about every platform out there, there are vagaries that can cause issues with differing Perl versions, which then leads to portability problems.

— Paul Venezia, Is it still libelous if you end your titles with question marks?

... the proper response is "Why didn't you write your code with maintainability in mind?"

I know, I know. That's not helpful. Here's a quick checklist to help those of you writing Perl (or those of you trying to hire people to write Perl (or those of you trying to hire people to learn to write Perl)) to determine if you're capable of writing Perl well:

You don't have to answer all of those questions in the correct way to write good and maintainable Perl, but if you answer most of those questions in the wrong way, of course you'll write bad code.

Perl allows people to accomplish their tasks without having to learn much, without having to participate in strange and unfamiliar ceremonies, and without even being much good at programming at all. That's by design, and that's a good thing for very specific circumstances. Yet if you approach programming as if it were merely typing and retyping until something barely working fell out of your typewriter, you're going to make lots of messes, and no language can save you from an unprofessional lack of discipline.

Writing good code requires discipline in any language.

My OSCON talks are online

I've posted my OSCON talks (one regular talk and one lightning talk) in my Talks page. But for those wanting direct links, here they are:

  • Free QA! -- a non-technical talk about the history and social architecture choices of the CPAN Testers project
  • Perl 5, Version 13 -- a lightning talk summarizing notable changes in the Perl 5.13 development series

The Best Art Continues to Surprise

I attended an exhibit about the work of Leonardo da Vinci several months ago. Part of that exhibit was a thorough analysis of his Mona Lisa painting. "It's perhaps the most famous painting in the world," I thought. "I've seen it (or at least replicas) thousands of times before."

Then, at the suggestion of the exhibit, I looked more closely and saw details I had missed, such as the low wall behind the model, the lack of eyebrows and eyelashes, and other small details that had always been there but had somehow failed to catch my attention.

Several years ago, I read an analysis of Roger Zelazny's The Chronicles of Amber series. The analyst admitted that he re-read the series every few years and learned new things each time. (Zelazny's Chandleresque tone in the first five books contributes to the depth of the books, but so does the fact that his characters gladly lie to, backstab, betray, confuse, manipulate, and distrust each other and their own selves.) A reinterpretation of a single line which seemed so innocent during the last reading could cause you to see a character in an entirely different light.

Good art is like that.

Today I understood an underused feature of Perl 5 better.

Paulo Custodio filed a bug on the Modern Perl draft that the explanation of module unimporting was incomplete. I had written that:

no Module::Name qw( arguments );

... is equivalent to:

BEGIN { Module::Name->unimport( qw( arguments ) ) }

In all accuracy (and, upon reflection, obviousness), no Module::Name qw( arguments ) is equivalent to:

BEGIN
{
    require Module::Name;
    Module::Name->unimport( qw( arguments ) );
}

Even though I rarely use module unimporting and have never, to my best recollection, unimported a module I haven't previously used, it's obvious that unimporting through no should imply require. (I have trouble imagining an interface where you'd initially load a pragma with no, unless you use strictperl, but clever people can do clever things.)

You may all now chuckle at how long it took me to realize this (and, yes, I did read the Perl 5 source code to prove to myself that this occurs).
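A small script makes the implicit require visible. It is a sketch that assumes a hypothetical Demo::Pragma package defined inline and registered in %INC, so the require succeeds without a file on disk; a second, nonexistent Missing::Pragma (also hypothetical) shows the require failing at compile time.

```perl
use strict;
use warnings;

# Hypothetical pragma defined inline. Recording it in %INC lets the
# implicit "require Demo::Pragma" succeed without a Demo/Pragma.pm file.
BEGIN {
    package Demo::Pragma;
    our @calls;
    sub unimport { my ($class, @args) = @_; push @calls, "@args" }
    $INC{'Demo/Pragma.pm'} = __FILE__;
}

# Compiles as: BEGIN { require Demo::Pragma; Demo::Pragma->unimport(qw(a b)) }
no Demo::Pragma qw(a b);

print "unimport received: $Demo::Pragma::calls[0]\n";    # a b

# With no module to load, the implicit require dies at compile time.
my $ok = eval "no Missing::Pragma; 1";
print "no Missing::Pragma died: ", ($ok ? 'no' : 'yes'), "\n";
```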

Eliminating Errors with Little Languages

Jamie McCarthy made an interesting point about type safety in embedded SQL on String-Plus:

SQL is a great example for this. Relational databases are more useful with strong typing, so EMPLOYEE_ID is incompatible with PRODUCT_ID even if they are both implemented as INT. It'd be a great idea to see those constraints implemented at the perl level, presumably by giving perl more knowledge of the database schema than even the database engine has.

Imagine that you have, or can write, a little language parser for a SQL-like language. My simple example was:

SQL {{
    UPDATE users SET address = { Address $address } WHERE user = { User $user }
}}

This can decompose into several operations:

  • Get the value of the $address variable.
  • Get the primary key of the $user variable.
  • Prepare a database query with a rewritten query string which uses placeholders for the $address and $user variables to avoid SQL injection and other interpolation errors.
  • Execute the query.

That's a nice interface, but you can do better. As I suggested, you can add error checking if you know the structure of the database:

  • Get the metadata which describes the users table.
  • Verify that the required fields (address and user) exist.
  • Get the value of the $address variable.
  • Get the primary key of the $user variable.
  • Prepare a database query with a rewritten query string which uses placeholders for the $address and $user variables to avoid SQL injection and other interpolation errors.
  • Execute the query.

You can take advantage of type checking too:

  • Get the metadata which describes the users table.
  • Verify that the required fields (address and user) exist.
  • Verify that the type of $address is compatible with the type of the address field. Repeat for $user and user.
  • Get the value of the $address variable.
  • Get the primary key of the $user variable.
  • Prepare a database query with a rewritten query string which uses placeholders for the $address and $user variables to avoid SQL injection and other interpolation errors.
  • Execute the query.
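Here is one way the checked steps might look in plain Perl 5, under several assumptions: the parser has already turned the SQL {{ ... }} block into column => [ declared type, value ] pairs, the schema lives in a hypothetical %schema hash, and the caller passes the user's primary key rather than a user object. build_update is an illustrative name, not an existing API.

```perl
use strict;
use warnings;

# Hypothetical schema metadata: table => { column => declared type }.
my %schema = (
    users => { user => 'User', address => 'Address' },
);

# Illustrative only. $set and $where map column => [ type, value ], as a
# parser for the SQL {{ ... }} block might produce them.
sub build_update {
    my ($table, $set, $where) = @_;

    # Get the metadata which describes the table.
    my $columns = $schema{$table} or die "Unknown table '$table'\n";

    my (@assign, @cond, @bind);
    for my $part ([ $set, \@assign ], [ $where, \@cond ]) {
        my ($fields, $clauses) = @$part;
        for my $col (sort keys %$fields) {
            my ($type, $value) = @{ $fields->{$col} };

            # Verify that the field exists and that the types are compatible.
            die "No column '$col' in '$table'\n" unless exists $columns->{$col};
            die "Type mismatch for '$col': $type vs $columns->{$col}\n"
                unless $type eq $columns->{$col};

            # Use a placeholder instead of interpolating the value.
            push @$clauses, "$col = ?";
            push @bind, $value;
        }
    }

    my $sql = "UPDATE $table SET " . join(', ', @assign)
            . ' WHERE ' . join(' AND ', @cond);
    return ($sql, @bind);
}

# $user_id stands in for "the primary key of the $user variable".
my ($address, $user_id) = ('12 Main St', 42);
my ($sql, @bind) = build_update(
    'users',
    { address => [ Address => $address ] },
    { user    => [ User    => $user_id ] },
);
print "$sql\n";    # UPDATE users SET address = ? WHERE user = ?
print "@bind\n";   # 12 Main St 42
```

In a real program the statement and bind values could go straight to DBI, for example $dbh->do($sql, undef, @bind), so the values never touch the query string.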

If you know the structure of the database when the program starts, you can start to push some of this type checking to the point of compilation. (You may not be able to perform all of the type checking at compilation time, but you can do as much as possible as early as possible to prevent as many errors as possible.)

That's simple and easy. Now imagine something more interesting:

SQL {{
    SELECT name, address FROM users, addresses GIVEN { User $user }
}}

It's obvious from the syntax of the query language that the database needs to perform a join operation, and it's obvious that the primary key of the $user object is the important key of the operation. If the program knows the relationship of the users and addresses tables, it can join them effectively as well.

Don't get caught up in the syntax or the semantics of the remaining examples here; they exist to demonstrate possibilities, not the final form of battle-tested code. Even so, imagine a dynamic query:

SQL {{
    SELECT @fields FROM { Table $table_one }, { Table $table_two }
}}

Again, the structure and intent of the code are obvious. The operations are now:

  • Find the primary keys for $table_one and $table_two.
  • Verify that they're joinable.
  • Verify that all members of @fields are present in either $table_one or $table_two.
  • Construct the query.
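The field check from that list is small enough to sketch directly. This assumes a hypothetical %columns hash standing in for real schema metadata and a hypothetical select_fields helper; it leaves the joinability check (and any join condition) aside.

```perl
use strict;
use warnings;

# Hypothetical column lists; a real version would read the schema.
my %columns = (
    users     => [qw(user_id name)],
    addresses => [qw(address_id user_id address)],
);

sub select_fields {
    my ($t1, $t2, @fields) = @_;
    for my $t ($t1, $t2) {
        die "Unknown table '$t'\n" unless $columns{$t};
    }

    # Every requested field must exist in at least one of the two tables.
    my %known   = map { ($_ => 1) } @{ $columns{$t1} }, @{ $columns{$t2} };
    my @missing = grep { !$known{$_} } @fields;
    die "Unknown fields: @missing\n" if @missing;

    return 'SELECT ' . join(', ', @fields) . " FROM $t1, $t2";
}

my @fields = qw(name address);
print select_fields('users', 'addresses', @fields), "\n";
# SELECT name, address FROM users, addresses
```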

If I were to implement this, I'd make a join_tables multimethod. It takes two arguments (generalizable to more, but follow along with two for now). Imagine that it looks something like this:

multi join_tables( Table $t1, Table $t2 ) { ... }

multi join_tables( Any, Any ) { fail() }

Given two Table arguments, the first multi candidate matches and gets called. Given any other combination of arguments, the second candidate matches and produces an error.

Knowing that you have two Table objects isn't enough, however. The tables might have no relationship to each other. Imagine if you somehow could verify that the tables have an appropriate relationship. If I were to implement this, I might check that the keys of the tables matched types, perhaps with a syntax something like:

multi join_tables ( Table $t1, Table $t2 where { $t1.primary_key eqv $t2.foreign_key( $t1 ) } ) { ... }

That is, the keys must be of equivalent types. If one key is a user_id and the other is an Integer, the where clause won't match for this candidate, so a different multi will get called.
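Perl 5 has no built-in multiple dispatch, but the same decision can be sketched with guard clauses standing in for the two candidates. The Table class, its foreign_key lookup keyed by table name, and the string return value are hypothetical simplifications chosen for illustration, not a proposed API.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical Table class: a primary key type, plus the type of each
# foreign key, indexed by the name of the table it points at.
package Table {
    sub new         { my ($class, %args) = @_; return bless {%args}, $class }
    sub name        { $_[0]{name} }
    sub primary_key { $_[0]{primary_key} }
    sub foreign_key { my ($self, $other) = @_; return $self->{foreign_keys}{ $other->name } }
}

# Guard clauses play the role of the multi candidates.
sub join_tables {
    my ($t1, $t2) = @_;

    # Like "multi join_tables( Any, Any ) { fail() }": reject non-Tables.
    for my $t ($t1, $t2) {
        die "join_tables needs two Table objects\n"
            unless blessed($t) && $t->isa('Table');
    }

    # Like the where clause: the key types must be equivalent.
    my $fk = $t2->foreign_key($t1);
    die 'No compatible key between ', $t1->name, ' and ', $t2->name, "\n"
        unless defined $fk && $fk eq $t1->primary_key;

    # Building the actual ON clause is left out of this sketch.
    return $t1->name . ' JOIN ' . $t2->name;
}

my $users     = Table->new(name => 'users', primary_key => 'user_id');
my $addresses = Table->new(name => 'addresses', primary_key => 'address_id',
                           foreign_keys => { users => 'user_id' });
print join_tables($users, $addresses), "\n";    # users JOIN addresses
```

The trade-off is that every new case means editing one sub, whereas with multis each case is a separate candidate and the dispatcher picks the match.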

Now imagine that for those embedded SQL minilanguage statements where the table name is available at compilation time and sufficient type information exists to verify the statements themselves at compilation time:

SQL {{
    SELECT name, address FROM { User users }, { Address addresses }
}}

... then everyone who uses this minilanguage (and has set up the table information appropriately) gets safety and correctness by default. Some of that can even occur before the program runs. The rest of it can occur as the program runs.

(A really, really good type checker and optimization system could infer that some errors are impossible even if it can't prove the use of a single type in every case.)

Now imagine that you have a language which allows you to build minilanguages like this, to build APIs which specify correct operations and fall back to good error reporting on incorrect operations, and which do so without interfering with other code and other extensions.

Welcome to Perl 6.

Perl 5.13.3 is released

[Reposted from my announcement to the perl5-porters mailing list]

Look at Crowley, doing 110 mph on the M40 heading towards
Oxfordshire. Even the most resolutely casual observer would
notice a number of strange things about him. The clenched teeth,
for example, or the dull red glow coming from behind his
sunglasses. And the car. The car was a definite hint.

Crowley had started the journey in his Bentley, and he was
damned if he wasn't going to finish it in the Bentley as well.
Not that even the kind of car buff who owns his own pair of
motoring goggles would have been able to tell it was a vintage
Bentley. Not any more. They wouldn't have been able to tell
that it was a Bentley. They would only offer fifty-fifty that it
had ever even been a car.

There was no paint left on it, for a start. It might still have
been black, where it wasn't a rusty, smudged reddish-brown, but
this was a dull charcoal black. It traveled in its own ball of
flame, like a space capsule making a particularly difficult
re-entry.

There was a thin skin of crusted, melted rubber left around the
metal wheel rims, but seeing that the wheel rims were still
somehow riding an inch above the road surface this didn't seem to
make an awful lot of difference to the suspension.

It should have fallen apart miles back.

-- Neil Gaiman and Terry Pratchett, "Good Omens"

It gives me great pleasure to announce the release of Perl 5.13.3.

This is the fourth DEVELOPMENT release in the 5.13.x series leading to a stable release of Perl 5.14.0. You can find a list of high-profile changes in this release in the file "perl5133delta.pod" inside the distribution.

You can (or will shortly be able to) download the 5.13.3 release from:

http://search.cpan.org/~dagolden/perl-5.13.3/

The release's SHA1 signatures are:

This release corresponds to commit 414abf8 in Perl's git repository. It is tagged as 'v5.13.3'.

We welcome your feedback on this release.

If Perl 5.13.3 works well for you, please use the 'perlthanks' tool included with this distribution to tell the all-volunteer development team how much you appreciate their work.

If you discover issues with Perl 5.13.3, please use the 'perlbug' tool included in this distribution to report them.

If you write software in Perl, it is particularly important that you test your software against development releases. While we strive to maintain source compatibility with prior stable versions of Perl wherever possible, it is always possible that a well-intentioned change can have unexpected consequences. If you spot a change in a development version which breaks your code, it's much more likely that we will be able to fix it before the next stable release. If you only test your code against stable releases of Perl, it may not be possible to undo a backwards-incompatible change which breaks your code.

Perl 5.13.3 represents approximately one month of development since Perl 5.13.2, and contains 12,184 lines of changes across 575 files from 104 authors and committers.

Notable changes in this release:

  • \o{...} has been added as a string escape for octals.
  • \N{} and charnames::vianame now know about the abbreviated character names listed by Unicode, such as NBSP, SHY, etc.
  • Most dual-life modules have been synchronized with the latest production release on CPAN.
  • There is a new internal function PL_blockhook_register for XS code to hook into Perl's lexical scope mechanism.
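A quick illustration of the first two items, assuming perl 5.14 or later (from 5.16 on, \N{} loads charnames automatically, so the explicit use charnames line only matters on older releases):

```perl
use strict;
use warnings;
use charnames ':full';   # automatic for \N{} from perl 5.16 on

my $letter = "\o{101}";    # octal 101 = decimal 65, the letter "A"
my $nbsp   = "\N{NBSP}";   # Unicode abbreviation for NO-BREAK SPACE

printf "%s U+%04X\n", $letter, ord $nbsp;    # A U+00A0
print charnames::vianame('NBSP'), "\n";      # 160
```

Because \o{} takes its digits inside braces, it avoids the ambiguity of bare octal escapes such as \1, which can also be read as a backreference in a regex.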

There is one major known issue:

  • Bug fixes involving CvGV reference counting break Sub::Name (currently version 0.04). A patch has been sent upstream to the maintainer.

Thank you to the following for contributing to this release:

Abhijit Menon-Sen, Abigail, Alex Davies, Alex Vandiver, Alexandr Ciornii, Andreas J. Koenig, Andrew Rodland, Andy Dougherty, Aristotle Pagaltzis, Arkturuz, Ben Morrow, Bo Borgerson, Bo Lindbergh, Brad Gilbert, Bram, Brian Phillips, Chas. Owens, Chip Salzenberg, Chris Williams, Craig A. Berry, Curtis Jewell, Dan Dascalescu, Daniel Frederick Crisman, Dave Rolsky, David Caldwell, David E. Wheeler, David Golden, David Leadbeater, David Mitchell, Dennis Kaarsemaker, Eric Brine, Father Chrysostomos, Florian Ragwitz, Frank Wiegand, Gene Sullivan, George Greer, Gerard Goossen, Gisle Aas, Goro Fuji, Graham Barr, H.Merijn Brand, Harmen, Hugo van der Sanden, James E Keenan, James Mastros, Jan Dubois, Jerry D. Hedden, Jesse Vincent, Jim Cromie, John Peacock, Jos Boumans, Josh ben Jore, Karl Williamson, Kevin Ryde, Leon Brocard, Lubomir Rintel, Maik Hentsche, Marcus Holland-Moritz, Matt Johnson, Matt S Trout, Max Maischein, Michael Breen, Michael G Schwern, Moritz Lenz, Nga Tang Chan, Nicholas Clark, Nick Cleaton, Nick Johnston, Niko Tyni, Offer Kaye, Paul Marquess, Philip Hazel, Philippe Bruhat, Rafael Garcia-Suarez, Rainer Tammer, Reini Urban, Ricardo Signes, Richard Soderberg, Robin Barker, Ruslan Zakirov, Salvador Fandino, Salvador Ortiz Garcia, Shlomi Fish, Sinan Unur, Sisyphus, Slaven Rezic, Steffen Mueller, Stepan Kasal, Steve Hay, Steve Peters, Sullivan Beck, Tim Bunce, Todd Rinaldo, Tom Christiansen, Tom Hukins, Tony Cook, Vincent Pit, Yuval Kogman, Yves Orton, Zefram, brian d foy, chromatic, kmx, Ævar Arnfjörð Bjarmason

Many of the changes included in this version originated in the CPAN modules included in Perl's core. We're grateful to the entire CPAN community for helping Perl to flourish.

Development versions of Perl are released monthly on or about the 20th of the month by a monthly "release manager". You can expect the following upcoming releases:

  • August 20 - Florian Ragwitz
  • September 20 - Steve Hay
  • October 20 - Tatsuhiko Miyagawa
  • November 20 - Chris Williams

On joining Google

It’s a long time since I posted here, but this seemed worth an announcement.

Yesterday Metaweb, my employer and the creators of Freebase, announced that we’ve been acquired by Google.

The announcement was pretty exciting, not least because I got to be the person to post the official blog post on the Freebase blog.

I’m also excited that we launched this video along with the announcement, explaining what Metaweb/Freebase is all about:

Transcript:
You know what drives me crazy about words? They have a million different meanings.

Like, check this out: someone says, “I love Boston.” Now, they probably mean, “I love Boston, the big city in Massachusetts”, but they could be referring to one of the twenty-six other Bostons that are scattered around the globe. But, if it’s during the playoffs, they’re probably referring to the Celtics [basketball team]. Of course, you and I both hope that they’re talking about the Boston. You know. [Image of rock band, sounds of electric guitar.]

But, I guess there’s really no way of knowing. The problem is that the same word can mean so many different things. Because of that, when it comes to finding, linking, reconciling, or organising multiple layers of information, words are not the best solution. The guys at grocery stores figured this out back in the sixties when they started putting barcodes on everything, so that products with the same name wouldn’t get confused.

So how come on the web, so many sites still try to organise stuff with words? Say you’re a product guy at a big music site and you want to pull in feeds of lyrics and videos and photos from all of your data suppliers. But everyone uses different names for things, and a lot of the feeds don’t even match up, so you’ve got to reconcile them, and pull in updates, and deal with merges and deletes and splits. It’s a nightmare.

But what if there was a better way?

Welcome to Metaweb. Metaweb is a service that helps you build your website around entities, and not just words. Whoa, what’s an entity? Well the simple answer is, it’s a singular person, place, or thing.

OK, well, let’s compare that to text. Did you know that on the web there are more than 50 different ways people write “U. C. Berkeley”? [Examples listed: Cal Berkeley, Berkeley University, UCB, California, U of Cal, etc.] And they’re really just talking about one single place, one entity. By mapping all those words to a single entity, as if it had its own barcode, you can combine all that information about U. C. Berkeley into one place.

But that’s just the beginning. Because entities represent unique, real-life things, we can build a map that shows how they’re related. So, you can look for things that share certain attributes, like “actresses under 20 from New York”. Can you imagine trying to find that with a keyword search? [Shows typical keyword search results, with keywords highlighted: "NY blogger under fire for criticizing actress", "March 3 2004: New! 20 steps to be an actress", "Kid actress eats 20 York peppermints".] Entities are just smarter than words.

So, Metaweb’s been in the process of identifying millions of these entities and mapping out how they’re related, and what words other sites use to refer to them. And it’s really cool because they have a totally collaborative process that involves the online community. This thing will always be expanding and improving.

So, how is this going to help you? Well let’s say you’re that guy writing the movie review. If you tag the review with an entity in Metaweb, it’s like you’re looking at a menu saying, “Hey, Metaweb, give me the movie poster and a trailer and some links and maybe some other information like the release date and who was in it.” And BAM, it’d be right there. And now, your page looks awesome!

Or, say you’re that product guy at the music site. Instead of spending months doing messy integrations and maintaining all those feeds, you can just plug in to Metaweb, and suddenly everything just works. It’s like a switchboard for content on the web. [Various logos related to web content: eg. Twitter, Facebook, Audio Scrobbler, WordPress.] And not only that! When your site’s built on entities, new things get magically connected. Like, if one of your users adds a band to her profile page, or tags them in a comment, that can show up on the band page, because they’re all linked under the hood to the same entity.

Are you kidding me? This stuff sounds impossible! Well, that’s what they said about the barcode.

And it’s not just movies and bands. Metaweb has millions of entities in thousands of categories: twelve million and counting!

Metaweb makes your site smarter. It’s time to connect to the web. Metaweb.com.

I think a lot of friends and family are finally going to be saying, “Oh, so that’s what you do” :)

The other good news with the announcement is that Freebase is going to be staying free and open, and we’ll be working with Google to make it bigger and better (and you know with Google, bigger means bigger). So that’s pretty exciting. I’ll be continuing on over there doing community/developer relations stuff.

I’ll also be at OSCON next week, where I’ll be giving a presentation on Open Source, Open Data where I talk about how we apply open source ideas and processes to open data. Come see my talk!
