Dynamically loading modules in Perl at run time


Abstract

It can be very practical —and cleaner— to be able to load modules by referring to them only by name, for instance in a configuration file, so that a program can be extended simply by adding modules, without modifying the core of the application itself.

Many applications implement processing pipelines: there is some input data that needs to flow through different functions or, more generically, processors to produce a final output. Examples are image processing pipelines that automatically resize images, apply some filters and then encode the result in another format; the same applies to text processing pipelines that take as input one or more fragments of text in some format, apply some transformations, extract information and then store that information in some form.

Generically, those applications can be written as a loop —in the case below an infinite one, but it can be bounded somehow— that works on each piece of the input, in the example the variable $pkt, which is retrieved from some queue. The processing of the data is then woven into the body of the loop to produce some output.

 while(1) {
   my $pkt = dequeue();
   # some processing here
   ...
   # some other processing here
   ...
 }

As long as the operations to perform are one or two, and they are small in size, having them inlined in the body of the loop is not a big deal. If the operations grow in number, it could make sense to create one subroutine for each step and use the body of the loop only to define the pipeline of calls.

 while(1) {
   my $pkt = dequeue();
   doSomeProcessing1($pkt);
   doSomeProcessing2($pkt);
   ...
   doSomeProcessingN($pkt);
 } 
 ...
 sub doSomeProcessing1 {
    my $x = shift;
    ...
 }

While better than the inlined code, this forces us to update the code in two places for each new step: once to create the subroutine and once in the loop. A more practical approach could be something like the following one, in which all the actions to be performed are kept in a list and invoked one after the other for each piece of the input.

 while(1) {
   my $pkt = dequeue();
   for my $action (@actions) {
      $action->run($pkt);
   }
 }

In this case, the code would still need to be changed in more than one place, but the "main loop" of the application would be set in stone. The problem is that —usually— all the modules would need to be known in advance and the actions would need to be created like this:

 use Action1;
 use Action2;
 ...
 my @actions = ( Action1->new(),
                 Action2->new(),
                 ...
               );
 ...

For every new action, aside from creating the new ActionN.pm file and placing it somewhere visible to the application, for instance by using the switch -I/path/to/the/action/modules when invoking the Perl interpreter, we would need to add a use statement and create a new object in the @actions array.
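
Incidentally, the -I switch has a pure Perl counterpart, which is also what the rest of this article relies on; as a minimal sketch, with the same example path as above:

 # roughly the same effect as invoking perl with -I/path/to/the/action/modules:
 # the directory is prepended to @INC, the list of paths Perl searches for modules
 use lib '/path/to/the/action/modules';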

The ideal solution would be to leave the main file untouched —or modify it only sporadically— and be able to add modules without modifying the code, only by adding the module names to a configuration file or a list of parameters. This can be done in Perl far more easily than you might initially imagine.

Let's suppose that the variable @modulesToLoad contains the list of the names of the modules to load, and that they are all stored in a directory named modules relative to where the application is deployed, i.e., where the main file is. The following code then loads the modules and creates the @actions dynamically.

 use FindBin;
 use lib "$FindBin::Bin/modules"; # search for modules in the modules/ directory next to the script
 ...
 my @actions = ();
 for my $module (@modulesToLoad) {
   # load the module, referred to only by its name, at run time
   eval "use $module";
   if ($@) {
     die "unable to load module $module: $@";
   }
   push @actions, $module->new();
 }
 ...
 # this is the same as before
 while(1) {
   my $pkt = dequeue();
   for my $action (@actions) {
      $action->run($pkt);
   }
 }

How / Why does this work?

The code works in the following way. The FindBin module is needed just to find the path where the script lives, as we decided to use the directory modules, at the same level as the script, to store our .pm files; this choice eases deployment, since the script and all its modules end up in the same place. The magic really happens in the eval "use $module" statement, where the Perl interpreter evaluates the code contained in the string passed as an argument: that statement loads the module, referred to by name, into the namespace of the current interpreter. The if ($@) {…} just checks for errors in evaluating the string, e.g., if the module does not exist. Once the magic is done, an instance of the module is created, still referring to it by name, and pushed into the @actions array.
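
For the curious, that string eval boils down, more or less, to the following run-time sequence; this is only an illustrative sketch, with Action1 standing in for whatever name $module holds:

 my $module = 'Action1';                       # the name arrives as a plain string
 (my $file = $module) =~ s{::}{/}g;            # Foo::Bar becomes Foo/Bar
 require "$file.pm";                           # search @INC, load and compile the file
 $module->import() if $module->can('import');  # run the module's import, if it has one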

As for the configuration of the modules, if they need different parameters, I usually proceed in the following way. I am a big fan of Config::IniFiles, so I structure my configuration file like this:

 [SomeSection]
 modules=<<EOM
 Action1
 Action2
 ...
 EOM
 [Action1_Config]
 param1=value1
 ...

Then the new method of each module expects to receive a Config::IniFiles object as input and, knowing its own name, looks for its parameters in the right section: the one of the form ${moduleName}_Config. The code then becomes something like:

 use FindBin;
 use Config::IniFiles;
 use lib "$FindBin::Bin/modules"; # search for modules in the modules/ directory next to the script
 ...
 my $cfg = Config::IniFiles->new( -file => $configFile, -handle_trailing_comment => 1 );
 my @modulesToLoad = grep !/^\s*$/,                             # no empty modules' names
                     map { $_ =~ s/^\s*//; $_ =~ s/\s*$//; $_ } # trim names left and right
                     $cfg->val('SomeSection', 'modules');
 ...
 my @actions = ();
 for my $module (@modulesToLoad) {
   # load the module by name, then build an instance passing the whole configuration
   eval "use $module";
   if ($@) {
     die "unable to load module $module: $@";
   }
   push @actions, $module->new($cfg);
 }
 ...
 # this is still the same as before
 while(1) {
   my $pkt = dequeue();
   for my $action (@actions) {
      $action->run($pkt);
   }
 }

After this, hopefully, I only have to add modules and never touch the main file again, focusing on the pure application logic alone.

As a caveat, all the modules that are invoked need to implement the same interface; but since the concept of an interface does not exist in Perl, they just need to nominally implement the same subroutines. For instance, in the case above, each Action class would look like the following one:

 package ActionN;
 use strict;
 use warnings;
 use Config::IniFiles;
 sub new {
   my $class = shift;
   my $cfg = shift; # this is a Config::IniFiles object
   my %config = ( );
   return bless \%config, $class;
 }
 sub run {
   my ($self, $data) = @_;
   # do something interesting here
 }
 1;
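
In practice, the constructor usually pulls its own parameters straight out of the configuration. A minimal sketch, reusing the hypothetical param1 from the configuration example above (just one possible way to wire it up):

 sub new {
   my ($class, $cfg) = @_;
   # derive the section name from the class name, e.g. Action1 -> Action1_Config
   my $section = $class . '_Config';
   my %config = (
     param1 => $cfg->val($section, 'param1'),  # undef if the parameter is missing
   );
   return bless \%config, $class;
 }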

And that's all 😊

Post Scriptum — 12th August 2018

Some readers pointed out two things:

1. Instead of hand-rolling eval "use $module", a safer approach could be to use the module Module::Runtime, which essentially does the same thing but also performs some extra checks. The code would then need to be changed as follows:

    use Module::Runtime qw(use_module);
    use lib "$FindBin::Bin/modules";
    ...
    my @actions = ();
    for my $module (@modulesToLoad) {
      eval {
        push @actions, use_module($module)->new($cfg);
      };  # note the semicolon: eval BLOCK is an expression, not a control structure
      if ($@) {
        die "unable to load module $module: $@";
      }
    }
    ...

2. Instead of using use lib "$FindBin::Bin/modules", use the module lib::relative. On this I disagree, as lib::relative internally uses __FILE__ to determine where the invocation is performed, while I want to make sure that the library directory is a subfolder of where the main program is, and not some other random directory. This is because the loading of the modules could be performed —in a larger application with a more complex deployment— from an arbitrary module that has been installed far away from the main application using such functionality.
