AsyncBatchRunner Design Proposal

I’d like to propose a small PHP library that lets you submit jobs asynchronously and execute them later in batches.

Objective

This small library will provide a generic asynchronous batching interface. The primary use case of this library is our Logging agent on our runtime, but I want it to be generally useful for other use cases too.

Background

On our runtime, we want to provide a logging agent that can ingest logs at high throughput without blocking user requests. We don’t want to issue a Logging API call for every log line that the user emits, so logging should be asynchronous, and logs should be batched into a single API call for high throughput. Unfortunately, with the current versions of PHP, this is somewhat difficult to achieve in a portable manner, partly because PHP doesn’t have thread support for web applications, and partly because there’s no portable, performant way to share data among multiple processes without enabling some extensions.

Requirements

The library needs to provide easy ways to register async jobs and submit data. The data should be batched up to a certain level, and the runner implementation must make every effort to execute the job in batches. In the particular case of our Logging agent, the agent should be able to emit logs asynchronously, and once a submission succeeds, the runner implementation will make every effort to send the logs to the Logging API. The user-land code must run anywhere without modification.

Design Ideas

The whole AsyncBatchRunner system consists of two parts. The first is a user-facing library which allows users to register their jobs and submit data. When registering a job, the user specifies 1) a string identifier, 2) a callable, and 3) options including the batch size and time threshold. When submitting data, the user specifies 1) the string identifier for the job and 2) the target item. The library will store the target items according to the chosen runner implementation and the available extensions (explained below).

The second part is the runner. We will have multi-tier runner implementations depending on the environment. Here are some ideas:

  1. On our runtime
    Because the SystemV IPC extensions are available on our runtime, and we’re already using supervisord to control the nginx and php-fpm processes, we will run a PHP daemon process as the runner, managed by supervisord, and use the kernel message queue to pass the target items. Let’s call this DaemonRunner.
  2. Anywhere else, if you’re willing to run the runner manually
    We can provide a how-to document for running the runner and introduce an environment variable that tells our user-facing library to use the DaemonRunner. We can use SystemV IPC for message passing if available; if not, we’ll use another mechanism (likely file based).
  3. For anyone who doesn’t want to run the daemon manually
    We can provide a fallback runner implementation that executes the jobs at script shutdown. This runner will try to run the jobs in batch within a shutdown hook.
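
The tiering above could be decided by a small selection function. The following is only a sketch: the function name, the returned runner labels, and the environment variable `ASYNC_BATCH_DAEMON_RUNNING` are all hypothetical, since the proposal does not name them. It also assumes that our runtime would set the same variable itself.

```php
<?php
/**
 * Picks a runner tier. All names here are illustrative assumptions.
 *
 * @param bool|null $daemonRequested Whether a daemon is said to be running.
 *                                   Null means "read the environment".
 * @param bool|null $sysvAvailable   Whether SystemV message queues exist.
 *                                   Null means "detect the extension".
 * @return string One of 'daemon-sysv', 'daemon-file', 'shutdown'.
 */
function chooseRunner($daemonRequested = null, $sysvAvailable = null)
{
    if ($daemonRequested === null) {
        // Hypothetical variable name; getenv() returns false when unset.
        $daemonRequested = getenv('ASYNC_BATCH_DAEMON_RUNNING') === 'true';
    }
    if ($sysvAvailable === null) {
        $sysvAvailable = extension_loaded('sysvmsg');
    }
    if (!$daemonRequested) {
        // Tier 3: nobody runs a daemon, so fall back to the shutdown hook.
        return 'shutdown';
    }
    // Tiers 1 and 2: a daemon is running; pick the transport.
    return $sysvAvailable ? 'daemon-sysv' : 'daemon-file';
}
```

Passing the two flags explicitly, as in `chooseRunner(true, false)`, makes the decision easy to exercise without touching the real environment.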

In every case, if the runner fails to run the jobs after some retries, it should try to save the job data somewhere (likely in the filesystem).
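
One way this retry-then-save behavior might look is sketched below. The helper name, the retry count semantics (one initial attempt plus `$maxRetries` retries), and the one-JSON-line-per-item file format are assumptions made for illustration. The sketch also assumes the items are JSON-encodable and that the job callable returns true on success.

```php
<?php
/**
 * Hypothetical helper: runs a job with retries and saves the items to a
 * file if every attempt fails.
 *
 * @return bool True if the job succeeded, false if the items were saved.
 */
function runWithRetries(callable $func, array $items, $maxRetries, $failureFile)
{
    for ($attempt = 0; $attempt <= $maxRetries; $attempt++) {
        try {
            if (call_user_func($func, $items) === true) {
                return true;
            }
        } catch (Exception $e) {
            // Treat an exception like a failed attempt and try again.
        }
    }
    // Every attempt failed: append one JSON-encoded line per item.
    $lines = '';
    foreach ($items as $item) {
        $lines .= json_encode($item) . PHP_EOL;
    }
    // LOCK_EX guards against concurrent writers appending at the same time.
    file_put_contents($failureFile, $lines, FILE_APPEND | LOCK_EX);
    return false;
}
```

A later process could read the file line by line and resubmit the items; that recovery step is outside this sketch.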

User facing interface

 

/**
 * @param string $identifier Job identifier.
 * @param callable $func A callable for the job execution. It must accept an
 *                       array of the target items.
 * @param array $config [optional] {
 *      Configuration options.
 *
 *      @type int $batchSize Once the runner has this number of items,
 *                           the runner will do the jobs in batch.
 *      @type int $callPeriod The runner must do the job regardless of the
 *                            current number of items once $callPeriod seconds
 *                            have passed since the last execution.
 * }
 */
function registerAsyncJob(
    $identifier,
    callable $func,
    array $config = []
)

/**
 * @param string $identifier Job identifier.
 * @param mixed $item A target item for the job.
 */
function submitItem(
    $identifier,
    $item
)
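
To make the `batchSize` and `callPeriod` semantics concrete, here is a minimal in-process sketch of the two methods. It is not the proposed DaemonRunner: the class name `InProcessBatchRunner` and the default values are assumptions. It runs the callable synchronously inside the request, and it only checks `callPeriod` when an item is submitted. Whatever is still buffered is flushed from a shutdown hook, in the spirit of the fallback runner.

```php
<?php
// Hypothetical stand-in for the fallback tier; names and defaults are assumed.
class InProcessBatchRunner
{
    private $jobs = [];

    public function __construct()
    {
        // Flush whatever is still buffered when the script ends.
        register_shutdown_function([$this, 'flushAll']);
    }

    public function registerAsyncJob($identifier, callable $func, array $config = [])
    {
        $this->jobs[$identifier] = [
            'func' => $func,
            'batchSize' => $config['batchSize'] ?? 100,  // assumed default
            'callPeriod' => $config['callPeriod'] ?? 2,  // assumed default
            'items' => [],
            'lastRun' => microtime(true),
        ];
    }

    public function submitItem($identifier, $item)
    {
        if (!isset($this->jobs[$identifier])) {
            throw new InvalidArgumentException("Unknown job: $identifier");
        }
        $job = &$this->jobs[$identifier];
        $job['items'][] = $item;
        $elapsed = microtime(true) - $job['lastRun'];
        // Run when the batch is full or the time threshold has passed.
        if (count($job['items']) >= $job['batchSize'] || $elapsed >= $job['callPeriod']) {
            $this->flush($identifier);
        }
    }

    public function flushAll()
    {
        foreach (array_keys($this->jobs) as $identifier) {
            $this->flush($identifier);
        }
    }

    private function flush($identifier)
    {
        $job = &$this->jobs[$identifier];
        $job['lastRun'] = microtime(true);
        if ($job['items'] === []) {
            return;
        }
        // Empty the buffer before the call so a failing job is not re-run here.
        $items = $job['items'];
        $job['items'] = [];
        call_user_func($job['func'], $items);
    }
}
```

Because the callable runs inline, this version can block the caller when a batch fills up, which is exactly what the daemon-based tiers are meant to avoid.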

User code example

 

function doJob(array $items)
{
    // Actually do the job in batch here; return true on success.
    return true;
}
$asyncBatchRunner = new AsyncBatchRunner();
$asyncBatchRunner->registerAsyncJob(
    'myjob',
    'doJob',
    ['batchSize' => 100, 'callPeriod' => 2]
);
$asyncBatchRunner->submitItem('myjob', 'item1'); // This returns immediately.

Note: In our Logging use case, the direct user of this system will be our Logging client library. The end users will just use our Logging client.
