CloudCron

A simple, distributed, cost-effective, backwards-compatible, cloud-friendly cron for the masses

Introduction

If you google around looking for a "cloud friendly cron" that runs on AWS, you're out of luck. You're probably just trying to solve one thing: I have a crontab that I want to run on AWS.

The solutions you find tend to fall into a few categories. Most of the time, users just put their whole crontab on a single "cron instance"

The CloudCron solution

Give it a cron file, and it will run the same commands you already have, on the same server if you want. No need to reprogram your jobs

With minimal infrastructure (just a worker node)

Have more crons than one machine can handle? You can scale your worker nodes as needed

You can deploy a new worker node with new code or patches without downtime

Unlike Lambda-based solutions, which have a maximum running time of 5 minutes, your jobs can run for as long as they need

You get the opportunity to version control your crontab, and deploy it as a part of your CD

How does it work?

CloudCron is basically split into two halves: a "cron compiler" that transforms a crontab file into CloudWatch Events, and a worker that polls for those events and executes your commands

Your crontab file is transformed into a set of CloudWatch Events rules that are pushed to CloudFormation, so they can be managed as a whole

These CloudWatch Events inject a message into an SQS queue for each occurrence of a cron event. The queue is polled by a worker process, which runs on the same machine (or machines) where your old cron executed its jobs.
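To illustrate what the "cron compiler" half has to do, here is a minimal, hypothetical sketch (not CloudCron's actual code) of translating a classic 5-field crontab schedule into a CloudWatch Events cron() expression. CloudWatch expressions have six fields (minute, hour, day-of-month, month, day-of-week, year) and require a '?' in either the day-of-month or day-of-week field:

```shell
# Hypothetical sketch of the schedule translation -- NOT CloudCron's
# actual implementation. It ignores the day-of-week numbering
# differences between classic cron and CloudWatch.
set -f  # disable globbing so '*' fields survive word splitting

to_cloudwatch() {
  set -- $1
  min=$1; hour=$2; dom=$3; mon=$4; dow=$5
  # CloudWatch forbids '*' in both dom and dow at once; one must be '?'
  if [ "$dow" = "*" ]; then
    dow='?'
  elif [ "$dom" = "*" ]; then
    dom='?'
  fi
  echo "cron($min $hour $dom $mon $dow *)"
}

to_cloudwatch "*/5 * * * *"   # -> cron(*/5 * * * ? *)
```

The real compiler also has to emit the CloudFormation resources around each expression; this only shows the schedule-field mapping.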

Installation

You can install CloudCron with any Perl package manager. We recommend using carton so you don't install dependencies into your system Perl (carton is usually installable via your OS's package manager):

mkdir mycron
cd mycron
echo 'requires "CloudCron";' >> cpanfile
echo 'requires "CloudCron::Worker";' >> cpanfile
carton install
carton exec $SHELL -l

CloudCron is split into two installable parts: CloudCron (the cron compiler and deploy tool) and CloudCron::Worker (the worker that executes your jobs).

Get me started

cloudcron init --name MyCronQueue --region eu-west-1

This command deploys the SQS queue and tells you how to start a worker. You can have as many queues as you want (for different groups of crons, for example)

You can start a worker with the command shown in the output of the previous step, but first create a log configuration file:

cat > log.conf <<'EOF'
log4perl.appender.Screen = Log::Log4perl::Appender::Screen
log4perl.appender.Screen.layout = Log::Log4perl::Layout::PatternLayout
log4perl.appender.Screen.layout.ConversionPattern = [%d][CloudCron] %p %m%n
# Catch all errors
log4perl.logger = INFO, Screen
EOF

And now launch the worker in the background, pointing it at the log config you just created:

cloudcron-worker --queue_url https://sqs.eu-west-1.amazonaws.com/012345678901/MyCronQueue-CloudCronQueue-LPNF3N07WF68 --region eu-west-1 --log_conf log.conf &

The worker is now idle, waiting for its first jobs, so we need to create them:

cloudcron deploy --name MyCronFile --destination_queue MyCronQueue --region eu-west-1 path_to_crontab_file

The name of the destination queue is the name we gave to cloudcron init.

Once the deploy command finishes, the worker will start receiving messages and executing your jobs!

If you modify your crontab file, just redeploy. cloudcron will detect that you already deployed this cron and will update the events:

cloudcron deploy --name MyCronFile --destination_queue MyCronQueue --region eu-west-1 path_to_crontab_file

When you're done, you can delete queues and crons with the cloudcron remove command

Nitty gritty details

Known Limitations

Topologies

With cloudcron you can build a number of different topologies to suit your needs:

Single crontab

crontab ---> Queue <--- Worker node

Lots of jobs (multinode)

If one machine is not enough to handle the load of all your crons, you can just add more cloudcron-workers polling the same queue

                                      |---- Worker node 1
crontab (lots of jobs)  ---> Queue <--|---- Worker node 2
                                      |---- Worker node 3

Lots of jobs (autoscaling)

If the load on your worker nodes varies enough, you may want to autoscale your worker fleet by applying scaling policies to the worker node pool's autoscaling group

                                     |-A--   
crontab (lots of jobs) ---> Queue <--|-S-- Worker node N
                      |              |-G--
                      |                |
                      +--- CloudWatch--+

Be careful: when autoscaling shuts down a worker instance, it will kill any jobs still executing. There is no facility in cloudcron to let your jobs finish first (contributions welcome :))

Manage lots of jobs

You can deploy independent crontab files to the same queue, as long as the worker polling the queue is able to execute the commands in your crontabs.

crontab for ETLs -----+
                      |
crontab for cleanup --+---> Queue <--- Worker node
                      |
crontab for X --------+
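Reusing the quick-start commands above, this topology amounts to independent deploy invocations against the same queue (stack names and file paths here are illustrative, not prescribed by CloudCron):

```shell
# Several independent crontab files, one shared queue.
cloudcron deploy --name ETLCrons     --destination_queue MyCronQueue --region eu-west-1 etl.crontab
cloudcron deploy --name CleanupCrons --destination_queue MyCronQueue --region eu-west-1 cleanup.crontab
```

Each file can then be redeployed on its own without touching the others.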

Crons running with different users

You can deploy an independent worker for each user. The workers can run on the same instance

crontab for user1 ----> Queue1 <---- Worker running with user1

crontab for user2 ----> Queue2 <---- Worker running with user2
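With the commands from the quick start, this per-user setup might look roughly like the following (queue names and user names are hypothetical, and the queue URLs come from the output of cloudcron init):

```shell
# One queue per user.
cloudcron init --name User1Queue --region eu-west-1
cloudcron init --name User2Queue --region eu-west-1

# Launch each worker under the appropriate user account.
sudo -u user1 cloudcron-worker --queue_url <user1-queue-url> --region eu-west-1 --log_conf log.conf &
sudo -u user2 cloudcron-worker --queue_url <user2-queue-url> --region eu-west-1 --log_conf log.conf &
```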

Flexibility (custom topologies)

You can combine these scenarios as you wish to adapt them to your needs. Mix, match, and report your topologies back so we can document them!

Deploying cloudcron-worker

The examples directory contains a sample showing how to run cloudcron-worker as an upstart job
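For orientation before looking at the examples directory, a minimal upstart job for the worker might look roughly like this (the user, paths, and queue URL are placeholders you must adapt; this sketch is not the shipped example):

```
# /etc/init/cloudcron-worker.conf -- hypothetical upstart job sketch
description "CloudCron worker"
start on runlevel [2345]
stop on runlevel [!2345]
respawn
setuid cronuser
chdir /opt/mycron
exec carton exec cloudcron-worker \
  --queue_url <your-queue-url> \
  --region eu-west-1 \
  --log_conf /opt/mycron/log.conf
```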

Bugs and Source

The source code is located here: https://github.com/capside/CloudCron

Bugs and Issues can be reported here: https://github.com/capside/CloudCron/issues

Authors

Pau Cervera, Eduard Badillo, Jose Luis Martinez

License and Copyright

Copyright (c) 2017 by CAPSiDE SL

This code is distributed under the Apache 2 License. The full text of the license can be found in the LICENSE file included with this module.