<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Perlblogs &#187; perl</title>
	<atom:link href="http://perlblogs.com/category/perl/feed/" rel="self" type="application/rss+xml" />
	<link>http://perlblogs.com</link>
	<description>Posts from selected Perl bloggers</description>
	<lastBuildDate>Fri, 18 May 2012 19:03:44 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/><atom:link rel="hub" href="http://superfeedr.com/hubbub"/>		<item>
		<title>Programming Breaks Things</title>
		<link>http://www.modernperlbooks.com/mt/2012/05/programming-breaks-things.html</link>
		<comments>http://www.modernperlbooks.com/mt/2012/05/programming-breaks-things.html#comments</comments>
		<pubDate>Fri, 18 May 2012 16:38:38 +0000</pubDate>
		<dc:creator>chromatic</dc:creator>
				<category><![CDATA[novices]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[softwaredevelopment]]></category>

		<guid isPermaLink="false">http://perlblogs.com/?guid=b6345d9b3c23b4c3ea0c9e62c41cc9ce</guid>
		<description><![CDATA[Computer scientist Edsger Dijkstra famously said &#34;It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration.&#34; I disagree, in principle and...]]></description>
			<content:encoded><![CDATA[
        <p>Computer scientist Edsger Dijkstra famously said "It is practically
impossible to teach good programming to students that have had a prior exposure
to BASIC: as potential programmers they are mentally mutilated beyond hope of
regeneration."</p>

<p>I disagree, in principle and in practice. (I disagree so strongly that I
work on <a href="http://clubcompy.com/">a project to teach programming to
children</a>.)</p>

<p>I believe it's almost impossible to teach programming to someone who hasn't
experienced what we USians call "Geometry". That's mathematics: not the
specific behavior of triangles and angles and their relationships, but the hard
work and creativity and even beauty of following a set of logical rules to a
desirable conclusion. People who can do that can program effectively. People
who can't do that will struggle.</p>

<p>Before you can solve a big problem, you have to break it.</p>

<p>One of my work projects is a document categorization system. I've written
before that it uses a pipeline processing model, where a document moves through
the pipeline in various named stages. One stage might be "NEW", while another
might be "EXTRACT METADATA". As the system runs, documents make their way
through the pipeline in various stages and eventually enter a search index and
an archive intended for users.</p>

<p>Documents come from various places, and it's possible for identical (or
near-identical) documents to enter the system at various times. I've long had an
exact title match filter as a first approach to remove duplicates, but it's
never filtered out enough duplicates. (Some documents are essentially press
releases barely edited and republished by multiple news organizations. These
documents are almost never interesting and are frustrating in their sameness
within the archive, but into the system they go regardless.)</p>

<p>We've talked about several approaches to finding duplicate and
near-duplicate articles, with everything from heuristics to identify title
similarity to maintaining multiple latent semantic indexes for each unique
category of documents. I dragged my feet on the latter because documents expire
after 90 days, and managing an n-dimensional corpus search space where one of
those dimensions is also <em>time</em> was more work than I wanted.</p>

<p>Wednesday I realized that a na&iuml;ve approach could give really good
results while being easy to code and, more importantly, very quick to run. I
coded and deployed it yesterday, and tuned it and deployed an improved version
as I was writing this very paragraph. I added a new processing stage which
makes a word histogram for every new document entering the system and compares
those histograms to existing articles. If they're similar enough, the new
article gets invalidated before it enters the search index or undergoes any
further processing.</p>

<p>It's silly, but it works. It's 108 lines of code, per sloccount.</p>

<p>I realized something while writing it: <em>programming is breaking
things</em>.</p>

<p>Long years of programming experience have taught me that most problems are
too big. Most functions are too long. Most methods are too long. Most entities
in the system do too much.</p>

<p>If you read much novice code, you see long functions (if you see functions
at all) with deeply nested conditionals and mutable state mutating all over the
place, because a variable at the top of the program gets used all throughout
the entire program. You see a mess, and you see a maintenance burden, and you
see someone flailing to control something that's grown way out of hand.</p>

<p>(You see this in part because people trying to learn <em>how</em> to program
are also learning the syntax and semantics of a programming language, and until
you know the vocabulary rules, you're going to have trouble understanding
nuance of meaning and metaphor and idioms.)</p>

<p>I had no trouble writing this code in the small because I know the tools
Perl provides for me: hashes and arrays and methods.</p>

<p>I had little trouble writing this code, because I understand the pattern of
fetching a document at a time from an iterator and processing it to get a
histogram and putting that histogram in an array for later processing.</p>

<p>I had an easy time testing this code because I know how to write testable
code: each of my methods has a well-defined input and a well-defined output and
I can test only at those boundaries to see what happens.</p>

<p>Even though you don't know the details of this system, if you're a decent
programmer, you can probably write an outline of how the code works just from
how I've described it already:</p>

<ul>

<li>Get a collection of all active extant documents</li>

<li>Iterate over them</li>

<li>Fetch a histogram of each</li>

<li>Get a collection of all new documents</li>

<li>Iterate over them</li>

<li>Fetch a histogram of each</li>

<li>Compare each to every document in the histogram array</li>

<li>Invalidate the document if it matches any histogram too closely</li>

<li>Add the document's histogram to the array</li>

</ul>

<p>You can probably guess the names of my methods. If you're not exactly right, you're close.</p>
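
<p>Here's one guess at that outline as a minimal sketch, not the production
code. The function names, the word-splitting rule, and the choice of cosine
similarity with a caller-supplied threshold are all assumptions for
illustration, and it works on plain strings where the real system handles
document objects.</p>

<pre><code>use strict;
use warnings;

# Hypothetical helper: count the lowercased words in a string.
sub histogram_for
{
    my $text = shift;
    my %count;
    $count{ lc $_ }++ for $text =~ /(\w+)/g;
    return \%count;
}

# Assumed measure: cosine similarity, from 0 (no shared words) to 1.
sub similarity
{
    my ($x, $y) = @_;
    my ($dot, $x_len, $y_len) = (0, 0, 0);

    $dot   += $x-&gt;{$_} * ( $y-&gt;{$_} // 0 ) for keys %$x;
    $x_len += $_ ** 2 for values %$x;
    $y_len += $_ ** 2 for values %$y;

    return 0 unless $x_len and $y_len;
    return $dot / sqrt( $x_len * $y_len );
}

# Returns the new documents which too closely match any earlier document.
sub find_duplicates
{
    my ($threshold, $existing, $new) = @_;
    my @histograms = map { histogram_for( $_ ) } @$existing;
    my @invalid;

    for my $doc (@$new)
    {
        my $hist = histogram_for( $doc );
        push @invalid, $doc
            if grep { similarity( $hist, $_ ) &gt;= $threshold } @histograms;
        push @histograms, $hist;
    }

    return @invalid;
}</code></pre>

<p>Each function has one input and one output, so each is testable at its
boundary without the rest of the pipeline.</p>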

<p><em>This</em> is the discipline and experience that sets a good programmer
apart from a novice. Sure, a novice (or an undisciplined programmer) could
write twice as much code to do the same thing and get it working. Maybe he or
she could write four times as much code. (I don't pretend that my factoring of
this code is the rightest way to do it, but I do know that it passes multiple
tests.)</p>

<p>That's my writing in the small. My writing in the large is even more
interesting.</p>

<p>Each stage in the pipeline is its own self-contained class. I call them app
classes. Every app class conforms to an interface and gets run by a runner.
Every app connects to a defined logger and performs its own registration and
reporting.</p>

<p>Every app has a method to fetch its basic resultset (every app is part of a
processing pipeline; obviously it's going to iterate over documents in a
certain state). Every app has method hooks to fire before this iteration and
after it. Every app has a <code>process()</code> method which performs the
iteration.</p>
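
<p>A rough sketch of that shape might look like the following. Every name
other than <code>process()</code> is hypothetical, and the real classes also
handle logging, registration, and reporting, which this omits.</p>

<pre><code>package Hypothetical::App;

use strict;
use warnings;

sub new
{
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

# Hooks: subclasses override these; the defaults do nothing.
sub before_process {}
sub after_process  {}

# Subclasses return the documents in the state this stage handles...
sub get_resultset    { die 'Subclass must implement get_resultset()' }

# ... and say what to do with each one.
sub process_document { die 'Subclass must implement process_document()' }

sub process
{
    my $self = shift;

    $self-&gt;before_process;
    $self-&gt;process_document( $_ ) for $self-&gt;get_resultset;
    $self-&gt;after_process;
}

1;</code></pre>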

<p>I've extracted and formalized the thirteen app classes over the past several
months. They started as a series of individual scripts. Then they had a common
base class. Now they share code with roles, take configuration out of a common
configuration file, and register themselves when loaded as plugins. They can
run separately (great for testing) or all together (as is normal).</p>

<p>I knew from the start that I was <a
href="http://www.modernperlbooks.com/mt/2012/05/write-the-wrong-code-first.html">writing
suboptimal code I'd eventually have to change</a>, but that's because I didn't
know enough about the problem yet. I'd discover that as the project went on.
I'd gain more insight as I saw what kinds of documents we'd have to handle (and
how very strange some of them are compared to what we expected).</p>

<p>The original concept of refactoring always reminds me of math. We rearrange
things to make them clearer, to prepare us to do other work, or harder work, or
at least further work. It's not change for change's sake, and it's not change
to add or remove or modify behavior. It's nothing more or less than changing
the design of things without changing their behavior.</p>

<p>It's the same skill, from writing functions of the right name and size to
putting modules in the right places with the right contents. It's about
breaking big things into smaller things. It's about breaking things into the
right things.</p>

<p>(Dijkstra <em>is right</em> that BASIC affords few abstraction possibilities
to break programs into effective and distinct components, but for novice
programmers the experience of turning what seems like a simple task into the
steps required to accomplish it is an important experience. That's also one reason why <a href="http://modernperlbooks.com/books/modern_perl/">Modern Perl: The Book</a> uses small test programs to demonstrate language features: working in small steps is too important to ignore.)</p>
        
    ]]></content:encoded>
			<wfw:commentRss>http://perlblogs.com/2012/05/18/programming-breaks-things/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Separating Presentation from Content in Templates</title>
		<link>http://www.modernperlbooks.com/mt/2012/05/separating-presentation-from-content-in-templates.html</link>
		<comments>http://www.modernperlbooks.com/mt/2012/05/separating-presentation-from-content-in-templates.html#comments</comments>
		<pubDate>Mon, 14 May 2012 18:47:11 +0000</pubDate>
		<dc:creator>chromatic</dc:creator>
				<category><![CDATA[modernperl]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[templating]]></category>
		<category><![CDATA[webprogramming]]></category>

		<guid isPermaLink="false">http://perlblogs.com/?guid=4f5d73f0d2b0265d5fdbbecd96a82b97</guid>
		<description><![CDATA[A couple of comments on Simple Attribute-Based Template Exporting have asked for an example. I'll show off more of this code in my YAPC::NA 2012 and Open Source Bridge 2012 talk about how to write the wrong code (along with...]]></description>
			<content:encoded><![CDATA[
        <p>A couple of comments on <a
href="http://www.modernperlbooks.com/mt/2012/05/simple-attribute-based-template-exporting.html">Simple
Attribute-Based Template Exporting</a> have asked for an example. I'll show off
more of this code in my <a href="http://act.yapcna.org/2012/talk/50">YAPC::NA
2012</a> and <a href="http://opensourcebridge.org/proposals/796">Open Source
Bridge 2012</a> talk about how to write the wrong code (along with a handful of
other techniques).</p>

<p>(I assume some knowledge of <a
href="http://search.cpan.org/perldoc?Template">Template Toolkit</a> (besides
far too many books about finance, accounting, and investing, the Template
Toolkit book is always within reach these days); I've set up a wrapper template
which provides the standard look and feel of my application and I
include/process other templates liberally. If you understand that much, you'll
be able to follow along.)</p>

<p>One of the interesting templates in the system displays a list of chapters
of a book in progress. A cron job rebuilds a static page from this template
once a day. The template looks much like:</p>

<pre><code>[% USE Bootstrap -%]
[%- canonical_url = 'http://sitename.example.com/book/' _ link -%]

[%- add_og_properties({
    'fb:admins'      =&gt; '436500086365356',
    'og:title'       =&gt; title _ ' | sitename.example.com',
    'og:type'        =&gt; 'article',
    'og:image'       =&gt; 'http://static.sitename.example.com/images/logo.png',
    'og:url'         =&gt; canonical_url,
    'og:description' =&gt; text.chunk(300).0,
    'og:site_name'   =&gt; 'Sitename: site tag line',
   })
-%]
[%- add_meta(
    'pagetitle'     =&gt; title _ ' | sitename.example.com',
    'feed_url'      =&gt; 'http://static.sitename.example.com/book/atom.xml',
    'canonical_url' =&gt; canonical_url
) -%]

[% article_text = BLOCK -%]
&lt;article&gt;
&lt;h2&gt;[% title | html %]&lt;/h2&gt;
&lt;p&gt;Published: &lt;time datetime="[% date %]"&gt;[% nice_date %]&lt;/time&gt;&lt;/p&gt;
[% text %]
&lt;/article&gt;

&lt;ul class="pager"&gt;
[%- IF prev -%]
    &lt;li&gt;&lt;a href="[% prev.link %].html"&gt;&larr; [% prev.title | html %]&lt;/a&gt;&lt;/li&gt;
[%- END -%]
    &lt;li&gt;&lt;a href="/onehourinvestor"&gt;index&lt;/a&gt;&lt;/li&gt;
[%- IF next -%]
    &lt;li&gt;&lt;a href="[% next.link %].html"&gt;[% next.title | html %] &rarr;&lt;/a&gt;&lt;/li&gt;
[%- END -%]
&lt;/ul&gt;

[% INCLUDE 'components/social_links.tt', title =&gt; title %]
[%- END -%]

<strong>[%- row(
    maincontent( article_text ),
    sidebar(
        sideblock( process( 'components/cached/book_latest_chapters.tt' ) ),
        sideblock( process( 'components/cached/book_drafts.tt'          ) )
    )
) -%]</strong></code></pre>

<p>The emboldened lines are most important; they put all of the
<em>content</em> produced or assembled by this template in the HTML structure
the site needs. That is to say, everything on the site needs to fit into
something I call a <code>row</code>. A <code>row</code> can contain multiple
elements, such as <code>maincontent</code> and a <code>sidebar</code>, or
<code>fullcontent</code> by itself with no <code>sidebar</code>. A
<code>sidebar</code> can contain multiple <code>sideblock</code>s.</p>

<p>(You can ignore the other functions; they put metadata in the right places
to pass to wrapper templates.)</p>

<p>Within my template plugin (called <code>Bootstrap</code>), each of these
elements is a simple Perl function which takes one or more arguments and
interpolates it into some HTML:</p>

<pre><code>sub row :Export
{
    return &lt;&lt;END_HTML;
&lt;div class="row"&gt;
    @_
&lt;/div&gt;
END_HTML
}

sub sidebar :Export
{
    return &lt;&lt;END_HTML;
&lt;div class="span4"&gt;
    @_
&lt;/div&gt;
END_HTML
}</code></pre>

<p>(I initially tried to write these functions as templates within Template
Toolkit itself, but there comes a point at which you want a real language. That
point came very early for me.)</p>

<p>I lose no love over the <code>varname = BLOCK</code> pattern necessary to
populate variables to pass to these plugin functions, but it works for now. In
some of my templates&mdash;usually those with lots of text I might end up
changing later&mdash;I extract that text into a separate template under
<em>components/content/</em> to make it easy to edit. (This idea came up during
a client project where the client wanted to edit the legal clickthrough
arrangement after users create accounts. I didn't want lawyers or anyone to
have the ability to mess up the templating language, so I said "Edit this
single file as plain HTML and you'll be fine." It worked great.)</p>

<p>While my programmer brain says "This is ugly, and you're a horrible person
for committing this hack upon the world&mdash;you're calling Perl from your
template system to generate HTML you're stuffing into a template and that puts
your presentation elements in Perl code, you awful human being!", it keeps the
presentation code in a single place where I can update it infrequently (being
that I don't change the layout of the site dramatically) without having to
change the divs and classes of multiple templates.</p>

<p>I'm not arguing that this technique as expressed here is <em>right</em>.
It's probably not optimal; there may be easier approaches to achieve the same
effects.</p>

<p>I am saying that this currently works very well for me. I'm not typing the
same HTML over and over and over again, and I can tweak it much more easily
than I did before when I was refining the look and feel. In fact, I've even
<em>forgotten</em> the exact details of the layout, from the HTML/CSS point of
view, and now think only in terms of rows, maincontent, and sidebars.</p>

<p>Working abstractions are very nice.</p>
        
    ]]></content:encoded>
			<wfw:commentRss>http://perlblogs.com/2012/05/14/separating-presentation-from-content-in-templates/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Simple Attribute-Based Template Exporting</title>
		<link>http://www.modernperlbooks.com/mt/2012/05/simple-attribute-based-template-exporting.html</link>
		<comments>http://www.modernperlbooks.com/mt/2012/05/simple-attribute-based-template-exporting.html#comments</comments>
		<pubDate>Fri, 11 May 2012 20:29:01 +0000</pubDate>
		<dc:creator>chromatic</dc:creator>
				<category><![CDATA[cpan]]></category>
		<category><![CDATA[modernperl]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[webprogramming]]></category>

		<guid isPermaLink="false">http://perlblogs.com/?guid=8072073cb45a2548172a881d305c3f74</guid>
		<description><![CDATA[If you're like me and your design skills are sufficient to modify something decent to look nice but insufficient to create something from first principles, you can do a lot worse than to play with Twitter Bootstrap for your next...]]></description>
			<content:encoded><![CDATA[
        <p>If you're like me and your design skills are sufficient to modify something
decent to look nice but insufficient to create something from first principles,
you can do a lot worse than to play with <a
href="http://twitter.github.com/bootstrap/">Twitter Bootstrap</a> for your next
web site.</p>

<p>I've used it successfully for a few projects and it's been great.</p>

<p>It's a lot better now that I've written my own silly little <a
href="http://template-toolkit.org/">Template Toolkit</a> plugin to reduce the
need for writing lots of repetitive HTML in my templates. (It's like <a
href="http://haml-lang.com/">Haml</a> but less ugly and more Perlish and easier
to extend.)</p>

<p>Writing a TT2 plugin is relatively easy. Of course I do it the wrong way;
when you initialize your plugin, you have the ability to manipulate TT2's
stash. This is the data structure representing the variables in scope in your
templates. Where a well-behaved template should use object methods to perform
its operations, my code stuffs function references in the stash. Here's the
relevant code:</p>

<pre><code>sub new
{
    my ($class, $context, @params) = @_;

    $class-&gt;add_functions( $context );

    return $class-&gt;SUPER::new( $context, @params );
}

sub add_functions
{
    my ($class, $context) = @_;
    my $stash             = $context-&gt;stash;

    while (my ($name, $ref) = each %exports)
    {
        $stash-&gt;set( $name, $ref );
    }

    $stash-&gt;set( process =&gt; sub { $context-&gt;process( @_ ) } );
}</code></pre>

<p>I'll fix this eventually, but the process of making this work was
interesting.</p>

<p>In my first attempt (see <a
href="http://www.modernperlbooks.com/mt/2012/05/write-the-wrong-code-first.html">Write
the Wrong Code First</a> for the justification), I'd write the function I
needed, like <code>row()</code>, which creates a new Bootstrap row or
<code>maincontent()</code> which creates the main content area of the page.
Then I'd add that function to the <code>%exports</code> hash and everything
would work.</p>
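
<p>That first version's bookkeeping might have looked something like this
sketch. The hash contents are a guess at the shape rather than the original
code, and <code>sidebar()</code> stands in as a second example function.</p>

<pre><code>use strict;
use warnings;

sub row
{
    return &lt;&lt;END_HTML;
&lt;div class="row"&gt;
    @_
&lt;/div&gt;
END_HTML
}

sub sidebar
{
    return &lt;&lt;END_HTML;
&lt;div class="span4"&gt;
    @_
&lt;/div&gt;
END_HTML
}

# Every new function also needs an entry here, added by hand.
my %exports = (
    row     =&gt; \&amp;row,
    sidebar =&gt; \&amp;sidebar,
);</code></pre>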

<p>After the sixth function, keeping that list up to date was tedious. Then I
kept forgetting it. After all, any time you have to update the same data in two
places, you're doing something wrong.</p>

<p>Now the code looks more like:</p>

<pre><code>sub row <strong>:Export</strong>
{
    return &lt;&lt;END_HTML;
&lt;div class="row"&gt;
    @_
&lt;/div&gt;
END_HTML
}</code></pre>

<p>... with a single code attribute marking those functions which I want to
stuff into the template stash. I've used <a
href="http://search.cpan.org/perldoc?Attribute::Handlers">Attribute::Handlers</a>
before, but I always end up reading the manual and playing with things to get
them to work correctly. (Something about the way you have to write another
package and inherit from it to get your attributes to work correctly always
confuses me.)</p>

<p>My second attempt lasted no longer than ten minutes. I switched to <a href="http://search.cpan.org/perldoc?Attribute::Lexical">Attribute::Lexical</a>. This is almost as trivial to use as to explain:</p>

<pre><code>use Attribute::Lexical 'CODE:Export' =&gt; \&amp;export_code;</code></pre>

<p>Whenever any function has the <code>:Export</code> attribute, Perl will call
my <code>export_code()</code> function:</p>

<pre><code>my %exports;

sub export_code
{
    my $referent = shift;
    my $name     = Sub::Identify::sub_name( $referent );

    return unless $name;
    $exports{$name} = $referent;
}</code></pre>

<p>The first argument to this function is a reference to the exported function.
I use <a href="http://search.cpan.org/perldoc?Sub::Identify">Sub::Identify</a>
to get the name of the function reference. (That wouldn't work for anonymous
functions, but I can control that here.) Then I store the name of the function
and the function reference in a hash.</p>

<p>It took as long to write as it does to explain.</p>

<p>A lot of people dislike the use of attributes. Used poorly, they create
weird couplings and plenty of action at a distance.
<code>Attribute::Handlers</code> can be confusing.</p>

<p>I like to think that I'm using attributes well here (even if I'm abusing TT2
more than a little), and that they've simplified my code so that I can avoid
repeating myself and performing manual busywork that I'm likely to forget. Even
better, the code to use them isn't magical at all: it's all hidden behind the
pleasant interfaces of <code>Attribute::Lexical</code> and
<code>Sub::Identify</code>.</p>
        
    ]]></content:encoded>
			<wfw:commentRss>http://perlblogs.com/2012/05/11/simple-attribute-based-template-exporting/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NYTProf, File IO, and an Optimization Gone Awry</title>
		<link>http://www.modernperlbooks.com/mt/2012/05/nytprof-file-io-and-an-optimization-gone-awry.html</link>
		<comments>http://www.modernperlbooks.com/mt/2012/05/nytprof-file-io-and-an-optimization-gone-awry.html#comments</comments>
		<pubDate>Mon, 07 May 2012 21:56:41 +0000</pubDate>
		<dc:creator>chromatic</dc:creator>
				<category><![CDATA[cpan]]></category>
		<category><![CDATA[modernperl]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[profiling]]></category>
		<category><![CDATA[softwaredevelopment]]></category>

		<guid isPermaLink="false">http://perlblogs.com/?guid=ce8793e9ef2fc7b17ac9464159a79cc7</guid>
		<description><![CDATA[One of my projects performs a lot of web scraping. Once every n units of time (where n can be days or weeks), a batch process fetches several web pages and extracts information from them. It's a problem solved very...]]></description>
			<content:encoded><![CDATA[
        <p>One of my projects performs a lot of web scraping. Once every <em>n</em>
units of time (where <em>n</em> can be days or weeks), a batch process fetches
several web pages and extracts information from them. It's a problem solved
very well.</p>

<p>I designed this system around the idea of a pipeline of related processes,
where each component is as independent and idempotent as possible. This has
positives and negatives; it's an abstraction like any other.</p>

<p>I initially wrote the "fetch remote web page" and "analyze data from that
page" as a single step, because I thought "analyze" was the main goal and
"fetch" was a dependent task. I separated them a couple of weeks ago to
simplify the system: analysis now expects data to be there, while fetching can
be parallel on a single machine or across multiple machines. (Testing the analysis step
is also much easier because feeding in dummy data is now trivial.)</p>

<p>I use the filesystem as a cache for these fetched files. That's easy to
manage. I modified the role I use to grab data for the analysis stage to look
in the cache first, then fall back to a network request. That was easy too. The
<code>get_formatted_data_for_analysis()</code> method looked something like:</p>

<pre><code>sub get_formatted_data_for_analysis
{
    my ($self, $type, $key) = @_;

    my $cached_path         = $self-&gt;get_cached_path( $type, $key );
    if (-e $cached_path)
    {
        my $text = read_file( $cached_path );
        return $self-&gt;formatter-&gt;format_string( $text ) if $text;
    }

    return $self-&gt;formatter-&gt;format_string( $self-&gt;fetch_by_url( $type, $key ) );
}</code></pre>

<p>I thought I was done. This trivial caching layer took five minutes to write and gave my project a lot of flexibility.</p>

<p>I thought this would speed up the processing stage, because I was able to
make the fetching stage embarrassingly parallel so that more than one fetch
could block on network IO simultaneously. My rough benchmark didn't show any
speed improvement, but it was fast enough, so I moved on.</p>

<p>On Friday I decided to profile the slowest stage of the application with <a
href="http://search.cpan.org/perldoc?Devel::NYTProf">Devel::NYTProf</a>. The
slowest stage was the processing stage. I isolated it so that it performed no
network fetching. It was still slow.</p>

<p>One of the formatter modules used to extract data from web pages is <a
href="http://search.cpan.org/perldoc?HTML::FormatText::Lynx">HTML::FormatText::Lynx</a>.
It allows me to run <code>lynx --dump</code> to strip out all of the HTML and
other formatting of a document. The formatter allows you to pass in the name of
a file or the contents of a file as a string.</p>

<p>For some reason, most of the time in the processing stage in the profile was
spent in file IO. That wasn't too surprising; these aren't all small files and
there may be thousands of them. I dug deeper.</p>

<p>Most of the time in the processing stage in the profile was spent in reading
the files in my method and reading files in the formatter&mdash;reading files,
even though I was passing the contents of those files to the formatter as
strings.</p>

<p>I poked around at a few other things, but came back to the source code of
the formatter. A comment in <a
href="http://search.cpan.org/perldoc?HTML::FormatExternal">HTML::FormatExternal</a>
says:</p>

<blockquote><code>format_string()</code> takes the easy approach of putting the
string in a temp file and letting <code>format_file()</code> do the real work.
The formatter programs can generally read stdin and write stdout, so could do
that with <code>select()</code> to simultaneously write and read
back.</blockquote>

<p>In other words, all of the work I was doing to read in files was busy work,
duplicating what the formatter was about to do anyway. (Okay, I stared at the
code for a couple of minutes, thinking about various approaches of rewriting it
and submitting a patch or monkey patching it. Then I turned lazier and wiser.)
I rewrote my code:</p>

<pre><code>sub get_formatted_data_for_analysis
{
    my ($self, $type, $key) = @_;

    my $cached_path         = $self-&gt;get_cached_path( $type, $key );
    return $self-&gt;formatter-&gt;format_file( $cached_path ) if -e $cached_path;

    return $self-&gt;formatter-&gt;format_string( $self-&gt;fetch_by_url( $type, $key ) );
}</code></pre>

<p>The result was a 25% performance improvement.</p>

<p>Three things jumped out at me in this process. First, how nice is it to have
a working tool like NYTProf and a community that distributes source code, so
that I could examine the whole stack of my application to isolate performance
problems? Second, how interesting that an assumption and an admitted shortcut
in a dependency could have such an effect on my own code. Third, how much more
I like my new code with all of the file handling gone; pushing that
responsibility elsewhere is a nice simplification even without the performance
improvement.</p>

<p>Perhaps the two tools I miss most from my C programming days are
Valgrind/Callgrind and KCachegrind, but NYTProf goes a long way toward filling
that gap. Besides, I'm at least 20 times more productive with a language like
Perl.</p>

        
    ]]></content:encoded>
			<wfw:commentRss>http://perlblogs.com/2012/05/07/nytprof-file-io-and-an-optimization-gone-awry/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Smoothing the Condescending Onramp</title>
		<link>http://www.modernperlbooks.com/mt/2012/05/smoothing-the-condescending-onramp.html</link>
		<comments>http://www.modernperlbooks.com/mt/2012/05/smoothing-the-condescending-onramp.html#comments</comments>
		<pubDate>Wed, 02 May 2012 21:42:28 +0000</pubDate>
		<dc:creator>chromatic</dc:creator>
				<category><![CDATA[codingstandards]]></category>
		<category><![CDATA[modernperl]]></category>
		<category><![CDATA[novices]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[training]]></category>
		<category><![CDATA[tutorials]]></category>

		<guid isPermaLink="false">http://perlblogs.com/?guid=d0bec1efeae94194d5b907c840f6b5c5</guid>
		<description><![CDATA[If you ever need a dose of humility, solve a non-trivial problem and then watch a Real Actual User try to figure out how to use it. In my second professional job, when I was a system administrator at HP,...]]></description>
			<content:encoded><![CDATA[
        <p>If you ever need a dose of humility, solve a non-trivial problem and then
watch a Real Actual User try to figure out how to use it.</p>

<p>In my second professional job, when I was a system administrator at HP, I
worked in the laser printer group. One afternoon, someone walked by my desk and
asked me to do a user interaction study. I followed her to a little lab area,
where she handed me a list of tasks, and asked me to complete them.</p>

<p>I did, except that I misread the icon on the copier and put in the source
pages upside down, and made ten warm and blank pieces of paper. As soon as that
happened, I <em>understood</em> the icon and why I'd misinterpreted it.</p>

<p>I never heard the results of the study, but I hope my stubborn confusion
ended up improving the product.</p>

<p>User experience (and <em>real</em> user experience, not the fake user
experience stuff that says users are clueless and incapable of all of the
complexity of navigating the cereal aisle of an American grocery store and thus
interfaces must degenerate to a single beveled button which says "DO IT", do
you like my black turtleneck?) is fascinating. What's clear to you, you who
understand the internal model of the software, is perfectly opaque to users. <a
href="http://www.modernperlbooks.com/mt/2011/11/promoting-perls-features-versus-benefits.html">Users
know the results they want</a>, but not necessarily how to achieve them.</p>

<p>Making things easy for novices&mdash;for people who don't have a correct
internal model of the software&mdash;can be compatible with making powerful
software. Consider the <a
href="http://www.perl.com/pub/2012/04/perlunicook-standard-preamble.html">Perl
5 standard Unicode preamble</a> necessary to convince Perl to use the defaults
you probably want to handle anything-but-Latin-1 correctly.</p>
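
<p>For reference, a sketch of roughly what such a preamble looks like. Treat the
linked article as the authoritative version, since the exact set of pragmas
depends on your version of Perl; the sample string at the end is only an
illustration.</p>

<pre><code>use utf8;                        # source code literals are UTF-8
use v5.14;                       # enables strict, say, and unicode_strings
use warnings;
use warnings  qw(FATAL utf8);    # encoding glitches become fatal errors
use open      qw(:std :utf8);    # standard and newly opened handles default to UTF-8
use charnames qw(:full :short);  # allow \N{...} named characters

# an illustrative string: length() counts characters here, not bytes
my $word = "na\N{LATIN SMALL LETTER I WITH DIAERESIS}ve";
say length $word;</code></pre>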

<p>(When user complaints of "My code doesn't work!" get met on PerlMonks and
the Perl Beginners List and elsewhere with "What's the error message?", you
know the languages, libraries, tools, and ecosystem could do more to help
people debug their own code.)</p>

<p>You see the problem when books and other tutorial materials say "Error
checking is left as an exercise for the reader", as if sparing the author the
burden of writing correct code, or keeping the page count down, were far more
important than helping new programmers learn how to code well.</p>

<p>I'm not only talking about better defaults (like <code>strict</code> enabled with <code>use 5.014;</code>). I'm not only talking about writing and collecting <a href="http://perl-tutorial.org/">good Perl tutorials</a>. (Part of the reason <a href="http://modernperlbooks.com/books/modern_perl/">Modern Perl: the Book is available for free online</a> is to continue to cultivate the culture of making great tutorial material available to anyone and everyone.)</p>
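
<p>As a small illustration of what that default buys a novice, here is a sketch
(the variable names are made up for the example) in which <code>use
5.014;</code> alone is enough to refuse a symbolic reference:</p>

<pre><code>use 5.014;    # implicitly enables strict and the 5.14 feature bundle

# under strict refs, using a string as a variable name dies at runtime
# instead of silently creating a global variable
my $name   = 'counter';
my $result = eval { ${$name} = 1; 'allowed' } // 'refused';

say "symbolic reference: $result";</code></pre>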

<p>With that said, I do despise the attitude of "You have to be clueful enough
to use the proper incantation at the start of your programs before you'll get
help on PerlMonks". Sure, those of us who know Perl <em>now</em> had to learn
the hard way that symbolic references and global variables make our code harder
to manage, that a unified testing system can only improve the CPAN, and that
agreeing on an interoperable OO syntax (if not implementation) lets us
concentrate on solving problems, not rebuilding Greenspun frameworks, but
that's no reason to force the same learning curve on novices.</p>

<p>We'll never remove the essential complexity from programming (to do so would
require us to remove the essential complexity from the problems we're trying to
solve). We <em>can</em> smooth out the onramp for new programmers. That
requires us to think like new programmers and to understand what they're trying
to do and why.</p>

<p>Sometimes that requires those of us who see a question and think
"Wow, everyone knows how to use a hash! What's <em>wrong</em> with you for not
understanding this?" to shut up. (Sometimes the best person to help a new
programmer is someone who was recently new.) Often that requires us to
listen and look for the deeper question.</p>

<p>That <em>probably</em> means we should be a little gentler on the audiences
we reach when we publish text and code. As Tom Dale wrote in <a
href="http://tomdale.net/2012/04/best-practices-exist-for-a-reason/">Best
Practices Exist for a Reason</a>:</p>

<blockquote>writing code before you have an expert-level understanding is okay.</blockquote>

<p>(The whole post and its comments are... enlightening.)</p>

<p>Ultimately I expect the real point is to know who you're writing for. If
you're only ever writing for your own amusement and you're willing to cut off
everyone who doesn't share your level of knowledge, that's one thing. If you're
writing to help other people&mdash;even if they have just started using Perl
today&mdash;perhaps there are ways you can smooth the onramp for them a little
bit more. After all, the things we think are easy now are easy because we
understand the intricacies of lexical binding and scope and default
topicalization and eager versus iterative file reading and so on.</p>
        
    ]]></content:encoded>
			<wfw:commentRss>http://perlblogs.com/2012/05/02/smoothing-the-condescending-onramp/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Make a DBIC Schema from DDL</title>
		<link>http://www.modernperlbooks.com/mt/2012/04/make-a-dbic-schema-from-ddl.html</link>
		<comments>http://www.modernperlbooks.com/mt/2012/04/make-a-dbic-schema-from-ddl.html#comments</comments>
		<pubDate>Fri, 27 Apr 2012 20:05:45 +0000</pubDate>
		<dc:creator>chromatic</dc:creator>
				<category><![CDATA[cpan]]></category>
		<category><![CDATA[dbixclass]]></category>
		<category><![CDATA[modernperl]]></category>
		<category><![CDATA[perl]]></category>

		<guid isPermaLink="false">http://perlblogs.com/?guid=89dba20bb9817607a15c78b55adb195f</guid>
		<description><![CDATA[For some reason, creating DBIx::Class schemas by hand has never made sense to me. I like to write my CREATE TABLE statements instead. DBIx::Class::Schema::Loader works really well for this. I keep this schema DDL in version control. I also keep...]]></description>
			<content:encoded><![CDATA[
        <p>For some reason, creating <a
href="http://search.cpan.org/perldoc?DBIx::Class">DBIx::Class</a> schemas by
hand has never made sense to me. I like to write my <code>CREATE TABLE</code>
statements instead. <a
href="http://search.cpan.org/perldoc?DBIx::Class::Schema::Loader">DBIx::Class::Schema::Loader</a>
works really well for this.</p>

<p>I keep this schema DDL in version control. I also keep a SQLite database
around with some test data (but the database isn't in version control).</p>

<p>I usually find myself writing a little shell script or other program to
regenerate the DBIC schema from that test database. That usually requires me to
make manual changes to the test database representing the changes I've just
made to the DDL.</p>

<p>After doing this one too many times, I decided to combine <a
href="http://search.cpan.org/perldoc?DBIx::RunSQL">DBIx::RunSQL</a> with the
schema loader. By creating a SQLite database from my DDL <em>in memory</em>, I
can create a schema without modifying any databases manually.</p>

<p>This was easier than I thought:</p>

<pre><code>#!/usr/bin/env perl

use Modern::Perl;

use DBIx::RunSQL;
use DBIx::Class::Schema::Loader 'make_schema_at';

my $test_dbh = DBIx::RunSQL-&gt;create(
    dsn     =&gt; 'dbi:SQLite:dbname=:memory:',
    sql     =&gt; 'db/schema.sql',
    force   =&gt; 1,
    verbose =&gt; 1,
);

make_schema_at( 'MyApp::Schema',
    {
        components =&gt; [ 'InflateColumn::DateTime', 'TimeStamp' ],
        debug =&gt; 1,
        dump_directory =&gt; './lib' ,
    },
    [ sub { $test_dbh }, {} ]
);</code></pre>

<p>The next step is to connect everything to <a href="http://search.cpan.org/perldoc?DBIx::Class::Migration">DBIx::Class::Migration</a>&mdash;but first things first.</p>
        
    ]]></content:encoded>
			<wfw:commentRss>http://perlblogs.com/2012/04/27/make-a-dbic-schema-from-ddl/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Embrace the Little Conveniences</title>
		<link>http://www.modernperlbooks.com/mt/2012/04/embrace-the-little-conveniences.html</link>
		<comments>http://www.modernperlbooks.com/mt/2012/04/embrace-the-little-conveniences.html#comments</comments>
		<pubDate>Wed, 25 Apr 2012 18:55:09 +0000</pubDate>
		<dc:creator>chromatic</dc:creator>
				<category><![CDATA[codereuse]]></category>
		<category><![CDATA[cpan]]></category>
		<category><![CDATA[modernperl]]></category>
		<category><![CDATA[perl]]></category>

		<guid isPermaLink="false">http://perlblogs.com/?guid=ff38d1c7a63fad035dc6c3c1098e8a60</guid>
		<description><![CDATA[When Perl 6 introduced say (like print, but appends a newline) I had some skepticism. Yes, the Modern::Perl module was as much a polemic as it was a convenience. I know File::Slurp exists, but my fingers by now know how...]]></description>
			<content:encoded><![CDATA[
        <p>When Perl 6 introduced <code>say</code> (like <code>print</code>, but
appends a newline) I had some skepticism.</p>

<p>Yes, the <a href="http://search.cpan.org/perldoc?Modern::Perl">Modern::Perl</a> module was as much a polemic as it was a convenience.</p>

<p>I know <a href="http://search.cpan.org/perldoc?File::Slurp">File::Slurp</a>
exists, but my fingers by now <em>know</em> how to read from a file in a single
line of (impenetrable to the uninitiated) code:</p>

<pre><code>my $text = do { local (@ARGV, $/) = $file; &lt;&gt; };</code></pre>

<p>... and in each case, my initial feeling of "Why bother? What does that
offer? How silly!" was wrong. In every one of these cases, the ability to
write (and the requirement to <em>read</em>) less code has made my code
better.</p>

<p>With <code>say</code> I don't have to worry about single- versus
double-quotes, or even quoting at all sometimes. With <code>use
Modern::Perl;</code>, I don't have to worry about enabling various features and
pragmas. With <code>File::Slurp</code>, all I have to care about when reading
from a file is typing <code>read_file( $path )</code>.</p>
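
<p>To make the comparison concrete, here is a sketch of what a hand-rolled
version might look like using only core Perl. The name <code>slurp</code> and
the scratch file are illustrative; <code>File::Slurp</code>'s
<code>read_file()</code> covers the same ground with more options.</p>

<pre><code>use 5.014;
use warnings;
use File::Temp 'tempfile';

# hypothetical long-hand equivalent of the do { local ... } one-liner
sub slurp
{
    my $path = shift;
    open my $fh, '&lt;', $path or die "Cannot read '$path': $!\n";
    local $/;                 # no record separator: read the whole file
    return scalar &lt;$fh&gt;;
}

# write a scratch file, then read it back in one go
my ($out, $file) = tempfile( UNLINK =&gt; 1 );
print {$out} "first line\nsecond line\n";
close $out;

my $text = slurp( $file );
say 'Read ', length( $text ), ' characters';</code></pre>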

<p>None of these are big deals on their own, but they're little details I don't
have to worry about anymore. The same principle which says that <a
href="http://search.cpan.org/perldoc?Proc::Fork">Proc::Fork</a> is easier to
manage than writing your own forking code (I've written far too much of my own
forking code) applies.</p>

<p>Sometimes getting the little nuisances out of the way makes me more
productive and ready to tackle the big nuisances. Maybe saving my brainpower
for complicated problems (what's the standard deviation from a least squares
fit?) is a better approach than typing my own <code>read_file()</code> function
on every project.</p>

<p>As silly as it once seemed to use a CPAN module for a one-liner, I've
realized that <em>not</em> reusing good code is even sillier.</p>

        
    ]]></content:encoded>
			<wfw:commentRss>http://perlblogs.com/2012/04/25/embrace-the-little-conveniences/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Fund Elbow Grease, not Birthday Cake</title>
		<link>http://www.modernperlbooks.com/mt/2012/04/fund-elbow-grease-not-birthday-cake.html</link>
		<comments>http://www.modernperlbooks.com/mt/2012/04/fund-elbow-grease-not-birthday-cake.html#comments</comments>
		<pubDate>Mon, 23 Apr 2012 13:00:01 +0000</pubDate>
		<dc:creator>chromatic</dc:creator>
				<category><![CDATA[Community]]></category>
		<category><![CDATA[funding]]></category>
		<category><![CDATA[perl]]></category>

		<guid isPermaLink="false">http://perlblogs.com/?guid=6ed55c35259b05ebeff1142ab21a5636</guid>
		<description><![CDATA[My first rule of community-driven software development: volunteers will work on what volunteers want to work on. My second rule of community-driven software development: volunteers are not fungible (see rule #1). My third rule of community-driven software development: things that...]]></description>
			<content:encoded><![CDATA[
        <p>My first rule of community-driven software development: volunteers will work on what volunteers want to work on.</p>

<p>My second rule of community-driven software development: volunteers are not fungible (see rule #1).</p>

<p>My third rule of community-driven software development: things that aren't
fun tend not to get done.</p>

<p>My first rule of birthdays: cake is fun.</p>

<p>I've been reading an ongoing thread on a mailing list about funding the
development and infrastructure of free and open source software projects. If
you've read more than one thread like this, you know the discussion already.
While it's possible to get a job doing what you love, most of us don't get paid
to write software. We get paid to solve problems. Many of us are fortunate
enough to be able to use and contribute back to free software, but few of us
solely write free software.</p>

<p>That's probably okay. Most of the software I write for my businesses isn't
that interesting to anyone else anyhow.</p>

<p>Then you get to the idea that some pieces of software that serve as
community underpinnings&mdash;the infrastructure plumbing that keeps the world
humming&mdash;are so important that they deserve funded developers to ensure
that things just work. From there you set up foundations and boards and run
pledge drives and give out grants and, if you're lucky, sponsor a couple of
developers to work on the software all the time.</p>

<p>The Apache Software Foundation has done this. So has the Linux Foundation.</p>

<p>(Even though many of the rest of us work on projects no less essential to
the global software ecosystem, we're not that fortunate.)</p>

<p>I've been retraining myself to think like a businessman at least half of the
time. Business, done well, addresses the problems of managing limited resources
for the purpose of producing revenue. In programmer speak, I try to solve the
most pressing problem in the most effective way to deliver working software as
soon as feasible.</p>

<p>One of the hardest parts of running a small business is knowing when to pay
someone else to do something you could do yourself. On the publishing side, I'm
glad we did; we've paid people to do editing and design covers and validate
electronic format conversions. I could do all of that, but that's a terrible
waste of my time.</p>

<p>It's also not fun, and I'd keep putting it off&mdash;and that there is the
hook.</p>

<p>Consider TPF's grants. The successful grants, the ones with real
deliverables and real benefit, are those which wouldn't get done without the
lubrication of money. While people like Nick Clark and Dave Mitchell (to name
two names but not to diminish the hard work of many other people) have the
expertise and desire to fix hard bugs in Perl 5, only the generous grants of
tens of thousands of dollars free them to spend the time they need to look into
these bugs and fix them.</p>

<p>After all, if it takes 40 hours to fix a bug in the regular expression
engine or the interaction between string <code>eval</code> and closures, how
much time can you realistically expect Dave or Nick to spend between working a
day job and having some semblance of a social life apart from a computer?</p>

<p>If this were sufficiently fun (for whatever definition matters most) or easy
(even if only in the sense that "I can debug this in five minutes and spend the
next 55 polishing the solution for immediate integration!" is easier than
"After 20 hours of diagnosis, I'm starting to get a handle on how things work.
Now comes the hard part!"), it would have already happened.</p>

<p>Volunteers tend to do the fun things. That's adding features. That's
reindenting code. Sometimes that's fixing easy bugs. That's rarely updating a
web page or writing copious documentation or performing system administration
or bisecting errors or setting up a huge test cluster. (All of those unfun
things happen, occasionally. That "occasionally" proves that they're not fun.
If they were fun, they'd happen more often.)</p>

<p>Volunteers tend to do the fun things. That's rarely maintaining code over a
long period of time. (If you've solved your problem and moved on, what's your
impetus to solve the problems of other people? Noble obligation? A sense of
pride? Shame? Boredom?)</p>

<p>(This suggests that the way the Perl community manages Google Summer of Code
projects is risky, at least if the goal is shipping working software that will
survive even only until next summer.)</p>

<p>This all suggests to me that the best way to think of limited funding for
community-driven software is leverage. It's elbow grease. It's hiring mechanics
in dirty overalls to work hard and take things apart and put them back together
and to get a thousand little details right. It has to be a little unglamorous
and it has to be very, very focused on shipping real software and keeping it
working in the hands of real users.</p>

<p>Sometimes, yes, funding is the best way to get something done sooner than it
would be without funding. Money buys attention and time, of course. Yet if we
apply money to get the fun things done&mdash;to buy birthday cake instead of
elbow grease&mdash;we're only hurting ourselves.</p>
        
    ]]></content:encoded>
			<wfw:commentRss>http://perlblogs.com/2012/04/23/fund-elbow-grease-not-birthday-cake/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Method-Function Equivalence Strikes Again!</title>
		<link>http://www.modernperlbooks.com/mt/2012/04/method-function-equivalence-strikes-again.html</link>
		<comments>http://www.modernperlbooks.com/mt/2012/04/method-function-equivalence-strikes-again.html#comments</comments>
		<pubDate>Wed, 18 Apr 2012 16:51:05 +0000</pubDate>
		<dc:creator>chromatic</dc:creator>
				<category><![CDATA[debugging]]></category>
		<category><![CDATA[modernperl]]></category>
		<category><![CDATA[perl]]></category>

		<guid isPermaLink="false">http://perlblogs.com/?guid=06b7e7b93ed59b8e92c81f1ba367214d</guid>
		<description><![CDATA[One of the satisfying aspects of writing an opinionated book like Modern Perl is writing a section like Avoid Method-Function Equivalence. Explaining to a novice programmer a potential pitfall and how to avoid it always seems to me like reducing...]]></description>
			<content:encoded><![CDATA[
        <p>One of the satisfying aspects of writing an opinionated book like <a href="http://www.modernperlbooks.com/books/modern_perl">Modern Perl</a> is writing a section like <a href="http://modernperlbooks.com/books/modern_perl/chapter_11.html#TWV0aG9kLUZ1bmN0aW9uRXF1aXZhbGVuY2U">Avoid Method-Function Equivalence</a>. Explaining to a novice programmer a potential pitfall and how to avoid it always seems to me like reducing the amount of potential misery in the world.</p>

<p>That's satisfying.</p>

<p>I've been whipping a proof-of-concept document categorization system into
shape for the past year by adding tests and refactoring and cleaning things up
and even adding features. Every week it gets a little bit better, and it's
fascinating to discover the patterns of this style of programming. (It's
related to <a
href="http://www.modernperlbooks.com/mt/2012/04/debuggability-driven-design.html">debuggability-driven
design</a>.) I've enjoyed the experience of watching code get more general and
useful and powerful even as that's meant shuffling around code and concepts far
beyond the initial design. While there are still messes (what working code
doesn't have a mess somewhere?), the code has a goodness to it.</p>

<p>Just when you get a big head, the universe punishes you for your unwarranted
hubris. (Annie Dillard once wrote "I no longer believe in divine playfulness."
Sometimes "divine antiauthoritarianism" is more like it.)</p>

<p>Monday night, my business partner found a bug. We have a categorization
system and several topics into which these documents could find themselves. We
added several new categories last month, and I had to revise the sharing system
such that documents in one cluster of categories never appeared in other
clusters. (Think of it this way: you have a newspaper and want to group
articles about food, television, movies, and books in a Life and Culture
section and articles about basketball, lacrosse, and hockey into a Sports
section, but you never accidentally want an article about food to show up in
the Sports section or an article about the felonious tax evasion of Kenny Mauer
to show up in the Life and Culture section.)</p>

<p>One line of filtering that's easy to explain to users is keyword filtering.
Any article in this topic (food, television, books) must contain one of these
keywords: food, television, cuisine, literature, novel, bestseller, author. You
get the picture.</p>

<p>Monday's bug was that documents in a single cluster which obviously belonged
to a single topic ("Which Television Shows Won't Be Back Next Season", for a
fake example) within a cluster showed up as belonging to the cluster as a whole
("Life and Culture") and not the topic within the cluster ("Television").</p>

<p>Fortunately I had most of the necessary scaffolding to build in debugging support. I expected that the keyword filtering was to blame, whether missing the appropriate keywords or not applying them appropriately. (I wondered if the system used a case-sensitive regular expression match or didn't stem noun phrases for comparison appropriately.)</p>

<p>Turns out it was my silly mistake.</p>

<p>All of this filtering for validity and cross-topic intra-cluster association
is in the single module <code>MyApp::Filter</code>. This started life as a
couple of <em>functions</em> that didn't belong elsewhere. As I moved more and
more code around and defined the filtering behavior more concretely, it grew
until it made more sense to treat these functions as methods. It's not an
object yet. It may never become an object; it manages no state. Yet I changed
its invocation mechanism from:</p>

<pre><code>
=head2 make_bounded_regex

Turns a list of arguments into an optimized, case-insensitive regex which
matches any of them and requires boundaries at their ends.

=cut

sub make_bounded_regex
{
    return unless @_;

    # s///r (Perl 5.14+) returns a copy, leaving the caller's arguments intact
    my @keywords = map { s/\s/./r } @_;
    my $ra       = Regexp::Assemble-&gt;new( flags =&gt; 'i' );
    my $re       = $ra-&gt;add( map { '\b' . $_ . '\b' } @keywords )-&gt;re;

    return qr/$re/;
}</code></pre>

<p>... to:</p>

<pre><code>sub make_bounded_regex
{
    <strong>my $class = shift;</strong>
    return unless @_;
    ...
}</code></pre>

<p>I made all of these functions into methods in one fell refactoring swoop.
(Why not? Be consistent! Do more than the bare minimum! Eat your vegetables!) I missed one place which called <code>make_bounded_regex()</code>:</p>

<pre><code>sub _build_keyword_filter
{
    my $self     = shift;
    my $keywords = $self-&gt;keywords;
    return unless @$keywords;
    return MyApp::Filter::make_bounded_regex( @$keywords );
}</code></pre>

<p>... such that the first keyword (and usually the most important, because
that's what users put in first) becomes the <code>$class</code> parameter to
the method. Because it's a class method, nothing ever uses <code>$class</code>,
so there's no error message about wrong package names.</p>
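
<p>A stripped-down sketch shows the failure mode. <code>Demo::Filter</code> and
its plain alternation regex are stand-ins for the real module (there is no
<code>Regexp::Assemble</code> here), but the calling-convention bug is the
same:</p>

<pre><code>use 5.014;
use warnings;

package Demo::Filter
{
    # class method: the first argument is the invocant, the rest are keywords
    sub make_bounded_regex
    {
        my $class = shift;
        return unless @_;

        my $alternation = join '|', map { quotemeta $_ } @_;
        return qr/\b(?:$alternation)\b/i;
    }
}

my @keywords = qw( television food novel );

# correct: a method call passes 'Demo::Filter' as the invocant
my $good = Demo::Filter-&gt;make_bounded_regex( @keywords );

# buggy: a function call lets 'television' silently become $class
my $bad  = Demo::Filter::make_bounded_regex( @keywords );

my $title = "Which Television Shows Won't Be Back Next Season";
say 'method call matches:   ', ( $title =~ $good ? 'yes' : 'no' );
say 'function call matches: ', ( $title =~ $bad  ? 'yes' : 'no' );</code></pre>

<p>With these keywords, the method call matches the sample headline and the
function call does not, because <code>television</code> never reaches the
regex.</p>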

<p>The tests don't catch this either, because of the distribution of test data.
(Obviously a mistake to rectify.)</p>

<p>(Sure, a language with integrated refactoring support (you don't even need an
early binding language with a static type system to get this) could have shown
me the error right away. That's one thing I <em>do</em> like about Java. Sure,
you need that scaffolding to get anything done, but it does occasionally help
you not write bugs.)</p>

<p>What bothers me most of all is that Perl itself has no means by which it
could even give an <em>optional</em> warning when you treat a method as a
function or vice versa. You don't have even a runtime safety net here.</p>
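
<p>The closest substitute is a hand-rolled check. This is a sketch, not a
feature of Perl: <code>Demo::Guarded</code> and the
<code>_assert_class_method</code> helper are hypothetical names, and the check
adds a little work to every call.</p>

<pre><code>use 5.014;
use warnings;

package Demo::Guarded
{
    use Carp 'croak';

    # hypothetical helper: die unless the invocant names this class or a subclass
    sub _assert_class_method
    {
        my $invocant = shift;
        croak 'make_bounded_regex() must be called as a class method'
            unless defined $invocant
                &amp;&amp; !ref $invocant
                &amp;&amp; UNIVERSAL::isa( $invocant, __PACKAGE__ );
    }

    sub make_bounded_regex
    {
        my $class = shift;
        _assert_class_method( $class );
        return unless @_;

        my $alternation = join '|', map { quotemeta $_ } @_;
        return qr/\b(?:$alternation)\b/i;
    }
}

# the method call succeeds; the function call now fails loudly
my $regex = Demo::Guarded-&gt;make_bounded_regex( qw( food novel ) );
my $ok    = eval { Demo::Guarded::make_bounded_regex( qw( food novel ) ); 1 };
say $ok ? 'function call slipped through' : "caught: $@";</code></pre>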

<p>Warnings will never replace the need for programmer caution, but bugs
happen. Bugs always happen. I keep the error log as squeaky clean as possible,
and warnings have caught a lot of bugs and potential bugs during testing and,
sometimes, in our deployed software.</p>

<p>In lieu of warnings though, the best I can do is document my mistakes and
explain why I made them in the hope that I won't make them again and you'll
be more cautious than I was. (At least this one was easy to fix.)</p>

        
    ]]></content:encoded>
			<wfw:commentRss>http://perlblogs.com/2012/04/18/method-function-equivalence-strikes-again/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

