My work colleague Mike O'Regan created a policy for the latest version of Perl::Critic.
Now if you have a line of code like this:
my $n += somefunc();
# Should be my $n = somefunc();
Perl::Critic will tell you
Augmented assignment operator '+=' used in declaration at line X,
column Y. Use simple assignment when initializing variables.
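The reason a policy helps here is that Perl itself stays quiet about this typo: per perlsyn, operators such as `+=` applied to an undefined variable are exempt from "uninitialized" warnings, so the mistaken line usually behaves exactly like the intended one. A minimal sketch of that behavior, where `somefunc` is a hypothetical stand-in for the function in the example above:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the function in the example above.
sub somefunc { return 5 }

# Collect any warnings instead of printing them.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $n += somefunc();    # the typo: augmented assignment in a declaration
my $m  = somefunc();    # what was meant

print "n=$n m=$m\n";
print "warnings seen: ", scalar(@warnings), "\n";
```

Because the result is the same and nothing complains, a static check is about the only way to catch the slip before it turns into something like `-=` or `.=` on a line where the difference matters.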
If you haven't let Perl::Critic loose on your code yet, now's a great time to try.
To the loyal Perl::Critic users, what's the nastiest bug Perl::Critic found for you? Let me know in the comments.
4 January, 2012 | Code craft, Perl 5
The YAPC::NA 2012 call for presentations has opened! As with every YAPC I've attended, this is a great opportunity to meet other programmers, to learn more about things you already know as well as things you don't know yet, and to practice your presentation skills.
A few months ago I exchanged emails with JT Smith about my idea for a talk this year. I've mentioned in passing a few times a small side project my business is investing in. It's deliberately minimal and, from the development side, definitely the kind of skunkworks software that you're likely to find: just get it working, maintain it as little as possible, and let it run uninterrupted.
That doesn't mean it's quick or dirty. That doesn't mean it's not tested well, or that it has a slapdash design. All it means is that the most important criterion for any design or implementation decision is "is this the simplest thing that could possibly work" instead of "is this elegant" or "what's the standard modern Perl orthodoxy for this problem".
So far the results have been enlightening.
I don't want to give away too many of the details of my talk (if it's accepted), but here are two small hints which may or may not help you.
First, just because a good ORM such as DBIx::Class makes searching and manipulating existing data easy doesn't make it the best way to insert big batches of new data.
Second, while LWP and especially WWW::Mechanize are great tools for automating the behavior of a web client, sometimes wget or curl in a shell script is quicker, easier to parallelize, and more robust.
(As a bonus, consider also that if you're parsing semi-structured data out of HTML that removing all of the HTML is sometimes even easier than using a real HTML parser or even CSS selectors. Sure, semantic markup helps when you can rely on it, and sure, using a regex to remove HTML tags is a bad idea, but there are ways to turn HTML into plain text quickly and easily without doing anything on your own.)
4 January, 2012 | cpan, modernperl, perl, skunkworks
These links are collected from the
Perlbuzz Twitter feed.
If you have suggestions for news bits, please mail me at
andy@perlbuzz.com.
2 January, 2012 | conferences, cpan, Perl 5
A lot can happen in a year. Think back to 2005 and what we had and didn't
have in Perl compared to now.
In previous jobs, I collected "The Year In Perl" a couple of times for
Perl.com. This required a significant investment of time over a couple of days
for the research and writing.
Perl.com these days is easier to update and to manage (though carving out
editing time is more difficult for me). Is there interest in putting together a
document about the interesting developments in the Perl world in 2011?
In particular, we can concentrate on:
- Community Events (especially significant developments such as the first or second occurrence of an event)
- Important releases (5.14 counts, as well as big new improvements of existing projects)
- Plans and announcements (Jesse Vincent's "Perl 5.16 and Beyond" stake in the ground, for example)
- Products (development products, books, et cetera)
I have a small list on my own and will refine it if there's further
interest. Feel free to reply here as a comment or contact me (chromatic at cpan
dot org) as you prefer.
29 December, 2011 | Uncategorized
The core Perl community—if you care to draw lines around a group of
people who use Perl seriously and call that a community—is like many
other core F/OSS communities. Real work happens on mailing lists and IRC. I
unsubscribed from several mailing lists and deliberately spent as little time
on IRC as possible this year, for various uninteresting reasons. (I haven't
even made it to the Portland Perl Mongers
meetings for several months.)
While that's been good for my productivity, it's also produced an
interesting sense of disconnect, and that makes me wonder. Consider a thought
experiment. Suppose you have six months to build a new green-field project.
Your primary language is Perl. You're the only developer on the project, but
you do have coworkers to do some of the non-coding work. You don't have access
to IRC or mailing lists, but you do have access to the whole of the CPAN. In
other words, your social connections are limited but your technical decisions
are not.
In this situation, how do you find the best libraries and techniques to use
for your requirements and how do you solve problems and get your questions
answered?
Assume you have access to web forums such as PerlMonks and Stack Overflow and
of course Duck Duck Go.
I can answer this partially for me: thank goodness for the degree of
maturity the CPAN and its ecosystem encourages among its best projects. I have
a lot of confidence in the stack I've chosen of Moose, Plack, DBIx::Class, and Catalyst, sprinkled liberally with great
new tools such as perlbrew, cpanm, and Try::Tiny—but even so, the
documentation and community support available without real-time discussion with
contributors and developers isn't always sufficient to solve problems
quickly.
(How interesting to note that all of these tools hail from a post-Perl 6
world, and how any Perl 6 implementation as it stands now only barely obviates
the need for part of two of the named projects and deigns even to consider the
others.)
For example, what's the best way to manage passwords and authentication in a
Perl-based web application? Do you handle it at the Plack level or the Catalyst
level? What if your user table doesn't match the example in the Catalyst
authentication plugin example? How much better is bcrypt than SHA-1 or SHA-256?
What if your business requirements mandate that users verify their accounts
before they can log in? How do you modify/subclass/extend/advise the plugin you
use to meet this requirement?
Anyone who's done a few projects with this stack should be able to give a
good answer to these questions, as should anyone who's spent a few weeks in the
relevant IRC channels or a couple of months reading the right mailing lists.
They're not difficult questions, but they are detailed questions. You could ask
the same questions about the right way to manage DBIC schemas you expect to
deploy frequently while allowing for schema updates and changes.
The interesting question isn't how to accomplish these things, it's
how someone finds this information without mandating access to IRC or
the mailing list.
I make the assumption that it's valuable to have multiple sources of
information. We write copious documentation including ::Manual and
::Tutorial PODs in our top-level distribution namespaces, after
all. We do an admirable job of producing Perl
Advent Calendars (thanks, Andrew Grangaard!), but I'm very glad to see Catalyst retiring its
calendar in favor of monthly articles. Publishing on a schedule is
difficult, but the need for current information is present the other eleven
months of the year.
I wish I could say that Perl and project wikis were more useful, but they
seem neither popular nor currently useful to me. Maybe I looked in the wrong
places. (I know I promised to give Catalyst a list of questions about things
that weren't screechingly obvious; I have a list, but I haven't shown it yet. I
have patched a few parts of the Plack documentation.) Yet it seems to me that
for all of the energy and output of the core Perl community, the practical
non-code results tend to be directed in ephemeral directions. In the past
couple of months, people such as Gabor Szabo and Christian Walde spent a lot of time improving the results for searching for
"Perl tutorial"
by creating a central place to list and evaluate Perl tutorials.
Again, maybe I looked in the wrong places—but I'd like to see a 2012
focus on making the knowledge and experience of core project members available
further, in many other media. Perl.com always welcomes your submissions of
course, but that's not the only persistent and updated medium for project
knowledge.
If we want people to use our code and projects for real work, to solve real
problems, and to accomplish real tasks, we need to continue to provide
practical code and useful documentation at or above the high quality level we
currently enjoy. Yet we also have to work to approach this audience from their
point of view: in particular, in terms of the tasks they want to
accomplish.
That is the resolution I suggest for the Perl community in
2012.
26 December, 2011 | Community, cpan, documentation, perl
A Vanity Fair article asks "Does
Airport Security Really Make Us Safer?" Fortunately, the writer of the
article used Bruce Schneier as a source.
(If you've been to an airport in the US, you know that the answer is "No; why
would you even ask?")
The article's penultimate paragraph makes what should be an obvious point. (At least, it's obvious if you want to prevent terrorism as much as possible. If your goal is to spend lots of taxpayer money in a very flashy, showy way without worrying about efficacy, please continue.) In particular:
What the government should be doing is focusing on the terrorists
when they are planning their plots. "That's how the British caught the liquid
bombers," Schneier says. "They never got anywhere near the plane. That's what
you want--not catching them at the last minute as they try to board the
flight."
I read this article moments after sending an email commiserating about the
silly (lack of) Unicode handling in a programming language which isn't Perl.
Then something clicked.
One of my persistent desires for Parrot
was to simplify the internals by reducing the amount of complexity and
genericity in the core. In terms of Unicode, this means knowing the encoding of
incoming data and the desired encoding of outgoing data, then
transcoding to and from a single internal encoding. This way the core
could operate on a single encoding and push the complexity of transcoding to
the edges.
If Parrot hasn't changed this since I looked at it most recently, its string
system requires each string to carry information about its encoding (which
makes each string structure that much larger, increasing memory pressure) and
each string operation to check for the need to transcode strings to mutually
compatible encodings (which takes time for the comparison in every case, as
well as time and memory for the transcoding in other cases).
Worse yet, string literals encoded in the source code of Parrot itself tend
to have a specific encoding (ASCII or at least Latin-1 in the case of literals
in the C code) and they ought to be constant, so transcoding in place isn't an
option and, if you're working primarily with another encoding, that means
always performing transcoding from that incompatible encoding.
It's not free to perform encoding at the edges, and you sometimes
notice this when working with large chunks of data (though if you're processing
multi-terabyte satellite images, treat them as binary and skip this encoding
altogether), but it's the right thing to do.
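In Perl 5 terms, transcoding at the edges can be sketched with the core Encode module. This is only an illustration of the principle, not Parrot's design; the helper names `to_internal` and `from_internal` are made up for the example, and the sample input is assumed to be UTF-8 that must leave as Latin-1.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Inbound edge: bytes in a known encoding become Perl character strings.
# FB_CROAK makes malformed input fail early instead of slipping through.
sub to_internal {
    my ($bytes, $encoding) = @_;
    return decode($encoding, $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
}

# Outbound edge: character strings become bytes in the requested encoding.
sub from_internal {
    my ($text, $encoding) = @_;
    return encode($encoding, $text, Encode::FB_CROAK() | Encode::LEAVE_SRC());
}

my $bytes_in  = "caf\xC3\xA9";    # UTF-8 bytes: "cafe" with an acute accent
my $text      = to_internal($bytes_in, 'UTF-8');
my $shouted   = uc $text;         # the core only ever sees characters
my $bytes_out = from_internal($shouted, 'ISO-8859-1');

printf "%d bytes in, %d characters inside, %d bytes out\n",
    length($bytes_in), length($text), length($bytes_out);
```

Nothing between the two helpers needs to know which encodings were involved, which is the simplification described above.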
The same principle applies for trusting incoming data. Secure it at
the borders of the application. Don't spread those checks throughout the
system. Harden the edges and don't let nonsense through. Fail early for
suspicious things.
Otherwise you'll go mad trying to track down all of the possible
interactions and possibilities of maliciousnesses that people could perpetrate
if you lack a sane sanity policy. In other words, stop doing a lot of busy work
to make it look like you know what you're doing. Do it right.
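As a small sketch of checking at the border, assume (purely for illustration) that the application accepts stock ticker symbols and that a valid symbol is one to five uppercase ASCII letters. The function name `checked_symbol` and the rule itself are hypothetical; the point is that the check happens once, on entry, and everything past it can rely on the result.

```perl
use strict;
use warnings;

# Hypothetical border check: accept only 1-5 uppercase ASCII letters and
# fail early on anything else. Code past this point never re-validates.
sub checked_symbol {
    my ($raw) = @_;
    die "missing symbol\n" unless defined $raw;
    my ($symbol) = $raw =~ /\A([A-Z]{1,5})\z/
        or die "suspicious symbol\n";
    return $symbol;
}

for my $input ('AA', "AA'; DROP TABLE stocks; --") {
    my $symbol = eval { checked_symbol($input) };
    print defined $symbol ? "accepted $symbol\n" : "rejected: $@";
}
```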
23 December, 2011 | Parrot, perl, security, softwaredesign, unicode
One of the persistent questions which keeps entrepreneurs on the edge is
"Are we building the right thing?"
In the first web bubble, the Silly side of Silicon Valley chased vanity
metrics such as "the number of eyeballs on the site" and "brand awareness" and
"unique visitors". Those numbers are only interesting when you can correlate
them to producing value for customers and bringing in real cash in the form of
revenue.
I've enjoyed the book The
Lean Startup by Eric Ries because he offers a much better mechanism to
track the success or failure of any attempt to produce real value for customers.
While split testing (or A/B testing) is useful to see how small changes lead to
different customer behaviors, Ries recommends cohort
analysis, where you can see the behavior of real customers through the sales funnel
and correlate the X-axis with individual changes to your business or
product.
That means tracking customer behavior. If you're building some sort of
software as a service product, and if the mechanism of delivery of that product
is primarily a web site, you probably already know the punchline.
Assume I already know how to identify and log events for each salient
customer action type. (I've built that kind of system before.) Assume I don't
want to collect personally identifiable information (I don't). Assume I'm using
Plack and its middleware heavily, and
assume I'm happy using Catalyst as
a web framework.
How can I identify unique users (with and without accounts) on a daily
basis, anonymize them, but group their actions across the site such that my
automated daily cohort graphs correspond with reality?
So far I've identified a few points of possible contention. I can rely on
browser cookies for unique identification of users if I know that user
sessions have unique identifiers within a 24 hour period. (I could generate
GUIDs for this, but that may be overdoing things.) I think I also
have to track the transition from anonymous visitor to authenticated user, but
I might be able to convince myself that either replacing the current session or
simple subtraction of successful login events from total number of unique
anonymous visitors would give the right numbers.
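One possible shape for the anonymizing step, sketched with core modules only. Everything here is an assumption made for illustration: the event layout, the `daily_visitor_token` helper, and the idea of keying an HMAC on the calendar day so that tokens group a visitor's actions within one day but cannot be joined across days or traced back to the session cookie.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);
use POSIX qw(strftime);

# Hypothetical helper: derive a per-day token from a session identifier.
# The same session yields the same token all day and a new one tomorrow.
sub daily_visitor_token {
    my ($session_id, $secret, $epoch) = @_;
    my $day = strftime('%Y-%m-%d', gmtime($epoch));
    return hmac_sha256_hex("$day:$session_id", $secret);
}

# Hypothetical event log: [ session id, action, epoch seconds ].
my $secret = 'replace-with-a-real-secret';
my $noon   = 1_325_678_400;    # 2012-01-04 12:00:00 UTC
my @events = (
    [ 's1', 'view',  $noon ],
    [ 's1', 'view',  $noon + 60 ],
    [ 's2', 'view',  $noon ],
    [ 's2', 'login', $noon + 120 ],
    [ 's3', 'view',  $noon ],
);

my (%visitors, %logged_in);
for my $event (@events) {
    my ($session_id, $action, $epoch) = @$event;
    my $token = daily_visitor_token($session_id, $secret, $epoch);
    $visitors{$token}++;
    $logged_in{$token}++ if $action eq 'login';
}

# The subtraction described above: unique visitors minus those who logged in.
my $unique    = keys %visitors;
my $logins    = keys %logged_in;
my $anonymous = $unique - $logins;
print "unique=$unique logins=$logins anonymous=$anonymous\n";
```

The subtraction only holds if logging in keeps the same session identifier; if the framework replaces the session on login, the two tokens would need to be linked at that moment instead.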
(I also haven't dived much into how Catalyst 5.9 and Plack interact in terms
of session and cookie handling. Everything's just worked, so I've ignored the
details until now.)
I don't mind building such a system if necessary, but if all of the pieces
are out there and available—or if someone's already built this and can
give guidance—so much the better.
Have you solved this problem? If so, how did you do it? If not, how would
you do it? Would you handle logging at the Plack level or the application
level? Would you worry about tracking session changes? Does Catalyst need to
know about this?
I have a medium sized project which is effectively a state machine. While I
keep promising to write a reusable modular system which lets you specify the
states and transitions between the states and let behavior manage itself, I
haven't done that yet.
This means that occasionally I have to debug the transition logic.
Suppose I have a series of articles in a publication queue, and suppose each
article has a state() method accessor/mutator. Moving an article
between states (from SOLICIT to EDIT to
PREVIEW to PUBLISHED) means calling
state() and passing a token which represents the appropriate
state.
Because I haven't yet consolidated all of the transitions into a single
place, an article's state may change in any of half a dozen places in the
entire codebase. That's not awful, but if state transitions are not occurring
as I expect, that's multiple places to watch as I debug.
I rarely use the Perl debugger. (I'm a fan of debuggers for compiled
languages such as C, and I've used debuggers in IDEs for languages which
require IDEs to great success, but I've never found Perl's debugger
productive.) I usually annotate my code with log messages and bisect problems
that way.
This seemed easy today; use Moose
advice to surround the state() method and display some logging
information. (Shouldn't this be a pattern already? Certainly there must be
something on the CPAN to accomplish this.)
around state => sub
{
    my ($orig, $self, @values) = @_;
    return $self->$orig() unless @values;
    my $original = $self->$orig();
    my $title = $self->title;
    my @caller = caller(2);
    print STDERR "Setting '$title' from $original to $values[0] " .
        "from $caller[1]:$caller[2]\n";
};
If you already see the bug, you're doing better than I am today. After five
minutes of head scratching, and looking elsewhere, I figured out why my logs
showed the first transition happening successfully but nothing else
happened.
The moral of the story is to be very careful what you measure, lest you
change that which you observe... or in my case, fail to allow that change to occur.
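For anyone still hunting: the wrapper logs the requested change but never calls the original method with the new value, so the state never moves. The sketch below shows the corrected shape. To keep it runnable without Moose it uses a hand-rolled wrapper around a plain accessor, which is a stand-in technique and not the Moose code above; the `Article` class here is hypothetical. With Moose, the same body would go inside `around state => sub { ... }` with `$orig` taken from `@_`.

```perl
use strict;
use warnings;

package Article;

sub new   { my ($class, %args) = @_; return bless {%args}, $class }
sub title { return $_[0]{title} }

# Plain accessor/mutator standing in for the Moose attribute.
sub state {
    my $self = shift;
    $self->{state} = shift if @_;
    return $self->{state};
}

# Hand-rolled stand-in for "around": swap in a wrapper that closes over
# the original method.
{
    no strict 'refs';
    no warnings 'redefine';
    my $orig = \&Article::state;
    *{'Article::state'} = sub {
        my ($self, @values) = @_;
        return $self->$orig() unless @values;

        my $original = $self->$orig();
        my $title    = $self->title;
        my @caller   = caller(0);    # no Moose frames to skip in this sketch
        print STDERR "Setting '$title' from $original to $values[0] "
                   . "from $caller[1]:$caller[2]\n";

        # The missing step: actually perform the change and return its result.
        return $self->$orig(@values);
    };
}

package main;

my $article = Article->new(title => 'Example', state => 'SOLICIT');
$article->state('EDIT');
$article->state('PREVIEW');
print $article->state, "\n";
```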
The Catalyst web framework uses
Perl 5 function
attributes effectively—I've seen few more effective uses of
attributes.
Any modern web framework has to deal with the idea of routes and request
routing somehow. Given a request path (such as /stocks/AA/view_analysis),
how does your application know what to do?
Catalyst solves this elegantly with a feature known as chained actions.
Controller methods can consume zero or more parts of the path but, when
explicitly chained, can combine. Consider the example request path. The
controller is Stocks.pm. The second component of the path
(/AA) is the identifier for a stock (Alcoa, to be specific. I'm
neither long nor short on Alcoa itself, though I probably own some shares as
part of a fund somewhere.) The final component of the path,
/view_analysis, is an action—a verb representing an action the
controller should take on the object representing Alcoa in the system.
You can probably start to see the idea of the chain right away.
The Stocks controller has a controller method called get_stock which
grabs the stock symbol from the request path, looks it up in the database, and
stores the object representing that stock for further processing. If no such
symbol exists, it throws an exception.
The view_analysis method chains off of the get_stock
method such that Catalyst will only dispatch to view_analysis when
it's already successfully dispatched to get_stock. Unless you write a
custom dispatch system which bypasses the dispatch rules, users will never be
able to call view_analysis without a valid stock object
available.
(Further, these methods are part of a chain which requires that users have
successfully logged into the system; they chain off of a user authentication
system.)
In code terms, the relevant attributes look something like:
sub authorized :Chained('/login/required') :PathPart('stocks') :CaptureArgs(0);
sub get_stock :Chained('authorized') :PathPart('') :CaptureArgs(1);
sub view_analysis :Chained('get_stock') :PathPart('view_analysis') :Args(0);
The :Chained attribute is most relevant here.
:PathPart governs how Catalyst's dispatcher makes each method
visible to user requests (get_stock doesn't consume a part of the
path on its own, while authorized consumes the name of the
controller and view_analysis consumes its own name).
:CaptureArgs and :Args control how many other pieces
of the path the methods consume; in the case of get_stock, it's
the single path element between /stocks and any subsequent chained
actions—in this case, /AA. As view_analysis is
the end point of a chain, you use :Args instead of
:CaptureArgs.
With that all explained, request method chaining is fantastic. I can reuse
get_stock() for other request methods and get all of its benefits,
including the fact that only authorized users can even reach this point.
Yet I want to prove these characteristics of my application.
I want to prove these features so definitively that I don't want to write
tests for them. I want my program to fail to compile if these
characteristics are untrue.
I see chaining from get_stock() as supplying an invariant
precondition to view_analysis() such that it proves, to my
satisfaction, that I can always rely on a valid stock object being available
within the analysis method. Always. Similarly, I can always rely on a valid
user being available within both methods. Always always.
The problem comes in that it's easy to make a typo in the name of a chain or
a method, or to use :CaptureArgs instead of :Args or
vice versa.
Here's the thing: all of this metadata is metadata. All of this information
is available at compile time, before Perl has to execute anything.
If I had a really good and extensible type system in Perl 5, I could write a
couple of pieces of predicate logic to say that every chained method should be
a starting point or have a valid predecessor. These are trivial properties of
my program (no matter how large it gets) and they're resolvable with the
information available at the point of compilation. Even with complex controller
construction through the use of roles and parametric roles, this information is
available.
I know how to emulate this behavior by injecting some sort of
CHECK block into the code and schlepping through the symbol table
and inspecting attributes myself, but that's emulating a useful feature we
could exploit in a lot of ways.
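A rough sketch of that emulation, using only core Perl and none of Catalyst's own machinery. `MODIFY_CODE_ATTRIBUTES` is the standard hook Perl calls when a sub is declared with attributes; everything else (the `ChainCheck` package, `chain_errors`, the two demo controllers) is hypothetical. It checks only chains named relative to the same package and treats absolute paths such as `/login/required` as starting points it cannot verify.

```perl
use strict;
use warnings;

package ChainCheck;

our %ATTRS_FOR;    # stringified code reference => raw attribute strings

# Perl calls this at compile time for each sub declared with attributes
# in any package that inherits from ChainCheck.
sub MODIFY_CODE_ATTRIBUTES {
    my ($package, $code, @attrs) = @_;
    push @{ $ATTRS_FOR{$code} }, @attrs;
    return;    # empty list: accept every attribute
}

# Walk $package's symbol table and report broken chains. This could be
# called from a CHECK block in the controller to stop the program early.
sub chain_errors {
    my ($class, $package) = @_;
    my (%chained_to, %ends_chain);

    no strict 'refs';
    for my $name (sort keys %{"${package}::"}) {
        next unless defined &{"${package}::$name"};
        my $attrs = $ATTRS_FOR{ \&{"${package}::$name"} } or next;
        for my $attr (@$attrs) {
            $chained_to{$name} = $2
                if $attr =~ /\AChained\(\s*(['"]?)(.*?)\1\s*\)\z/;
            $ends_chain{$name} = 1 if $attr =~ /\AArgs\b/;
        }
    }

    my @errors;
    for my $name (sort keys %chained_to) {
        my $parent = $chained_to{$name};
        next if $parent =~ m{\A/};    # absolute: defined outside this package
        if (!exists $chained_to{$parent}) {
            push @errors, "$name chains off unknown action '$parent'";
        }
        elsif ($ends_chain{$parent}) {
            push @errors, "$name chains off '$parent', which uses :Args";
        }
    }
    return @errors;
}

package Good::Stocks;
use parent -norequire, 'ChainCheck';

sub authorized    :Chained('/login/required') :PathPart('stocks') :CaptureArgs(0) { }
sub get_stock     :Chained('authorized') :PathPart('') :CaptureArgs(1) { }
sub view_analysis :Chained('get_stock') :PathPart('view_analysis') :Args(0) { }

package Bad::Stocks;
use parent -norequire, 'ChainCheck';

sub authorized    :Chained('/login/required') :PathPart('stocks') :CaptureArgs(0) { }
sub get_stock     :Chained('authorized') :PathPart('') :CaptureArgs(1) { }
sub view_analysis :Chained('get_stok') :PathPart('view_analysis') :Args(0) { }    # typo

package main;

my @good = ChainCheck->chain_errors('Good::Stocks');
my @bad  = ChainCheck->chain_errors('Bad::Stocks');
print "Good::Stocks: ", scalar(@good), " problem(s)\n";
print "Bad::Stocks: $_\n" for @bad;
```

The two rules in `chain_errors` are exactly the kind of predicate described above: every chained method either starts a chain or names a predecessor that exists and does not end a chain.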
Forget the talk about making Perl into Java or C++ by adding a silly
manifest static type system. We could find and fix real errors in
logic—trivial errors, trivially discoverable—if we had an
extensible type system which let us define our own simple predicates.
(Implementing such is left as an exercise for a small army of readers cloned
from a very small army of brilliant p5p hackers with copious spare time and a
habit of reading ACM papers before breakfast.)