I wrote this little document to help people who have problems with the coding style I use, and in the hope of finding some followers :-).

The "perlish" coding style

Reviewing this style, the author of perltidy said:
"... it is an advantage to be able get a quick overview of code structure by scanning vertically down the left side, particularly when aided by leading operators and good indentation. It's much easier than having to use a wide field of view, particularly for long lines. And especially if it's not code you wrote yourself. I've seen leading commas used fairly often, but I hadn't seen leading semi-colons, but it does make sense." 

Steve Hancock

This style might seem "bloody evil" if you don't know its principles, but it is probably the most "perlish" style available at the moment, simply because it extends some peculiar perl syntax to coding style. Besides, it minimizes errors, improves readability and adds some other useful features to the code.

Why it's "perlish"

You know the funny perl characters that identify what kind of 'entity' you are dealing with ($ % @ &). Those characters always precede the identifiers, giving you immediate feedback about the type of entity that follows: a feature much appreciated by perl programmers.

This style applies exactly the same concept to multiple 'entities' (entities with multiple elements, such as lists and blocks, usually spanning multiple lines): it puts the operators before the elements that follow. Look at this list (where '___' stands for any variable or value):

( ___
, ___
, ___
)

If you look at it 'globally' (i.e. not line by line, but as if it were a single line), the operators draw an ideal vertical line that precedes the elements of the list and identifies what kind of entity you are dealing with. Indeed you can extend the same concept to code blocks, anonymous hashes and arrays, string concatenations and even iterators, just by using a different combination of operators:

( ___
, ___
, ___
)

{ ___
; ___
; ___
}

{ ___
, ___
, ___
}

[ ___
, ___
, ___
]

( ___
. ___
. ___
)

( ___
; ___
; ___
)
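
As a minimal sketch, here are the six shapes above filled in with concrete values. The variable names and values are invented purely for illustration; only the layout is the point.

```perl
; use strict
; use warnings

# list: leading commas
; my @list = ( 'alpha'
             , 'beta'
             , 'gamma'
             )

# code block: leading semicolons
; my $count = 0
; for my $item ( @list )
   { $count++
   ; print "$count: $item\n"
   }

# anonymous hash
; my $hash = { name  => 'perlish'
             , width => 80
             }

# anonymous array
; my $array = [ 1
              , 2
              , 3
              ]

# string concatenation: leading dots
; my $string = ( 'one'
               . ' two'
               . ' three'
               )

# iterator: the header of a C-style for loop
; my $sum = 0
; for ( my $i = 0
      ; $i < @$array
      ; $i++
      )
   { $sum += $array->[$i]
   }
; print "$string, sum $sum\n"
```

In every case the opening bracket, the leading operators and the closing bracket share one column, so the kind of entity can be read straight down the left edge.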

Advantages

Simpler to read
your eyes don't have to search for the operators at the end of the lines to know what kind of 'entity' you are dealing with
Immediate feedback
operators at the start of the line enforce indentation, making it simpler to understand the code structure
Vertical alignment
this style improves the vertical alignment of related items, thus improving the overall sense of order and reducing chaos
Reduces errors
the vertical alignment makes the placement of operators obvious, making it simpler to find inconsistencies and harder to forget something

Disadvantages

Unconventional
if you use it for public examples, you had better put a link to this writeup or most people will think you are an alien :-).
Unfamiliar
depending on your mental flexibility, it could take you from a couple of hours to a couple of days to become familiar with this style

Important note

I don't use this style for paid jobs (unless the client explicitly wants it). Instead I supply a conventionally re-styled version of all my code, following the client's guidelines. I never use it for public examples.

Example

; sub capture (&;*)
   { my $code = shift
   ; my $fh   = shift || select
   ; local $output     
   ; no strict 'refs'
   ; if ( my $to = tied *$fh )
      { my $tc = ref $to
      ; bless $to, __PACKAGE__
      ; &{$code}()
      ; bless $to, $tc
      }
     else
      { tie *$fh , __PACKAGE__
      ; &{$code}()
      ; untie *$fh
      }
   ; \$output
   }

If you want to see more examples, look at the source of my modules published here.


The starting semicolon trauma

I added this little appendix because semicolons placed at the start of lines seem to be very traumatic for some people :-).

Conventions teach you that semicolons are something that "ends" statements, so this style might appear evil in light of what you learned, because it apparently violates that "rule".

Just put aside conventions for a moment and take a look at this block:

{ A ; B ; C }

For perl syntax it is perfectly OK, and there are just 2 semicolons in it, not 3, because you don't need to put a semicolon after the last statement in a block or in a file. In other words, perl accepts a syntax that appears to use semicolons to separate statements rather than to end them. That doesn't matter too much anyway: what does matter is that you can leave that block on a single line or split it over multiple lines if you find it more organized or readable.

The more popular convention splits the line after the semicolons, thus producing blocks like:

{ A ; 
  B ; 
  C 
}

But you can split the same block just before the semicolon and you will have:

{ A 
; B 
; C 
}

which is exactly the same block, just formatted with the Perlish Coding Style. Once you know that, you have to admit it's not so traumatic ;-)
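
To make the point concrete, here is a small sketch with the same three-statement block written in the three layouts discussed above. The sub names and the arithmetic are hypothetical, chosen only so that the result can be compared.

```perl
; use strict
; use warnings

# the whole block on one line
; sub one_line { my $x = shift ; $x *= 2 ; $x + 1 }

# split after the semicolons (the more popular convention)
; sub split_after
   { my $x = shift ;
     $x *= 2 ;
     $x + 1
   }

# split before the semicolons (Perlish style)
; sub split_before
   { my $x = shift
   ; $x *= 2
   ; $x + 1
   }

; print join( ' ', one_line(5), split_after(5), split_before(5) ), "\n"
```

All three subs contain exactly two semicolons and return the same value, so the script prints "11 11 11": perl sees the same block each time and only the line breaks move.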


Perltidy

Anyway, if you prefer to transform this style into your preferred style, you can use the very cool perltidy utility.


Credits

The creation of this style was inspired by an unconventional detail in Mark Overmeer's coding style that I found very useful: multiline lists are formatted with commas at the start of each line. I just extended that concept to all the possible applications in coding style.


APPENDIX I

More detail about this style

Here you can find all the details about this style. If you decide to use this style, please read this section carefully.

1. No more than 80 column width.

This is a good cross-platform convention, which improves the readability of code.

2. No tab characters, but 2 or 3 spaces of indentation

This might seem quite unconventional as well (especially because I always preferred tabs over spaces :-), but here are the reasons:

  1. when you need to indent a line by e.g. one tab and a half, you must mix tabs and spaces. Since tabs are expanded to an arbitrary width that depends on each editor's preferences (mostly 4 spaces, but not always), the vertical alignment will be compromised by the user's preferences (which you cannot control). Avoiding tabs guarantees the same indentation rendering in any editor, regardless of its preferences.
  2. 3 spaces instead of 4 is a simple way to reduce line width while still maintaining a visible indentation; besides, it allows nice vertical alignment such as this:
# 3 spaces
; if ( something )
   { do_this()
   ; do_that()
   ; if ( something_else )
      { do_something()
      ; do_something_else()
      }
     else
      { baz()
      }
   }
  else
   { do_nothing()
   }

# 4 spaces
; if ( something )
    { do_this()
    ; do_that()
    ; if ( something_else )
        { do_something()
        ; do_something_else()
        }
      else
        { baz()
        }
    }
  else
    { do_nothing()
    }

As you can see, the 4-space indentation does not have any evident advantage over the 3-space one (which still indents blocks clearly enough for them to be distinguished from each other), and it has the disadvantage of making each indentation level a third wider (4 columns instead of 3). With multiple nested blocks you save one space per level, so you will often save 3, 4 or 5 spaces on each line, which can make the difference between splitting a statement and fitting it in the available 80 columns.

Besides, the 'else' statement is aligned with the 'if' statement and is still outdented one space with respect to its own block, so it clearly stands out when you scan the code vertically down the left side. With 3 spaces, it starts exactly in the middle between the previous level and the next level of indentation. With the 4-space version the 'if' block starts exactly under a blank space, which is ugly :-) and does not help to make evident the ideal vertical line which, in the 3-space version, includes the 'f' and is helped by the space itself. Look in rapid succession at the following versions:
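
The arithmetic behind that saving can be sketched in a few lines. The helper name indent_columns is hypothetical, and the sketch assumes the indentation is simply the chosen width multiplied by the nesting depth.

```perl
; use strict
; use warnings

# columns consumed by indentation at a given nesting depth
; sub indent_columns
   { my ( $width, $depth ) = @_
   ; $width * $depth
   }

; for my $depth ( 1 .. 5 )
   { my $three = indent_columns( 3, $depth )
   ; my $four  = indent_columns( 4, $depth )
   ; printf "depth %d: %2d vs %2d columns, %d saved\n"
          , $depth
          , $three
          , $four
          , $four - $three
   }
```

The saving grows by one column per nesting level, so at depth 5 a 3-space indent leaves 5 more columns for the statement itself.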

; if 
   {
   ; 
   }

; if 
    { 
    ; 
    }

The same applies to 'do', whereas 'map' and any other keyword longer than 2 characters are always fine in that respect.

3. Vertical alignment

I always try to develop the code as vertically as possible, because shorter lines are simpler to read and understand.

This:

; qw( something_foo something_bar something_baz something_else )

is harder to read than this:

; qw| something_foo 
      something_bar 
      something_baz
      something_else
    |

Note: using || instead of () is just a very personal convention that tries to distinguish qw from qq and q, which I tend to write as qq() or q()

The reasons are:

  1. the second solution uses 2 dimensions, while the first uses just 1: this makes better use of the same area
  2. open and close operators are vertically aligned, so they are simpler to see/check/write and harder to forget (as with most conventions in the Perlish style)
  3. the items of the list are grouped in an area that your eyes can perceive globally, saving you from scanning to the right
  4. 2D allows you to switch your attention from one element to another by following a straight vertical path: e.g. starting with the '_' character on the first line you can notice the similar prefix and the different suffix of each element down the line, while in the one-line solution your eyes have to jump from one '_' to the next in a more disturbed way, so finding differences and similarities may be a very difficult task.

Anyway this doesn't apply to very short lines, since I find that this 1D (one horizontal dimension):

; qw( a b c )

is better than this fake 2D (more likely just an alternative 1 vertical dimension):

; qw| a
      b
      c
    |
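
Whichever layout you pick in this section, perl builds the same list, because qw splits on any whitespace, newlines included. A short sketch to check that (the variable names are made up for the example):

```perl
; use strict
; use warnings

; my @one_line = qw( something_foo something_bar something_baz something_else )
; my @vertical = qw| something_foo
                     something_bar
                     something_baz
                     something_else
                   |
; print "same list\n" if "@one_line" eq "@vertical"
; print scalar( @vertical ), " elements\n"
```

The choice between the horizontal and the vertical form is therefore purely a matter of readability.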

4. Nested operators

It's better to avoid full indentation for nested parens and brackets; that produces more compact lines without having to give up the vertically aligned open and close operators. The rule I apply is that nested {[( can be separated by just one blank space, so the resulting indentation is reduced to 2 spaces.

I also try to vertically align the '=>' and the '=', leaving everything that follows aligned to the right of '=' and '=>'

This is simpler to read because it is vertically aligned and more compact:

; @struct = ( [ { ___    => [ ___
                            , ___
                            ]
                }
              , { ______ => [ __
                            , __
                            ]
                , __     => ____
                }
              ]
            , [ ___
              , __
              ]
            )

This is harder to read because it's not vertically aligned and is wider without any real benefit:

; @struct = (   [   { ___ => [ ___
                             , ___
                             ]
                    }
                ,   { ______ => [ __
                                , __
                                ]
                    , __ => ____
                    }
                ]
            ,   [ ___
                , __
                ]
            )
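
For reference, here is a sketch of the compact layout applied to a structure with real values. The keys and values are invented for illustration; the nesting mirrors the placeholder structure used in this section.

```perl
; use strict
; use warnings

; my @struct = ( [ { fruits     => [ 'apple'
                                   , 'pear'
                                   ]
                   }
                 , { vegetables => [ 'leek'
                                   , 'kale'
                                   ]
                   , count      => 2
                   }
                 ]
               , [ 'x'
                 , 'y'
                 ]
               )

# every opening bracket has its commas and closing bracket in the same column
; my $second_vegetable = $struct[0][1]{vegetables}[1]
; my $letters          = scalar @{ $struct[1] }
; print "$second_vegetable, $letters letters\n"
```

Each nested bracket costs only 2 columns, and the '=>' operators stay aligned, so the depth of any element can be read from the column where its line starts.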