I have recently been creating some automated jobs for database replication, and I started reflecting on the differences between running them as cron jobs versus scheduled Jenkins jobs. I figured I might share my thoughts on the topic.

Both work great at executing a task at a given time. Indeed, Jenkins uses a cron-like syntax for scheduling periodic jobs. But here is where they start parting ways. Cron stores the command line in crontabs, a collection of system and user files, generally with a single line holding a schedule and the shell command to run. The locations you have to check to find a particular cron job include:

  • /etc/anacrontab - Points to the directories that are periodically processed for jobs on a daily, weekly, or monthly basis.
  • /etc/cron.d - A directory containing crontabs listing the schedule and the command to run, along with the user to run it as.
  • /var/spool/cron - The directory containing per-user crontabs installed by those users. Unlike the crontabs under /etc/cron.d, these do not list the user to run the command, as they are run as the individual users (see the sample entries after this list).
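
As a hypothetical illustration of the last two formats, here is the same nightly job as it might appear in a file under /etc/cron.d and in that user's own crontab; the dbadmin user and the 2:15 AM schedule are made up, and the script path matches the clone script used later in this post:

# /etc/cron.d/clone-database: schedule, then the user to run as, then the command
15 2 * * * dbadmin /usr/local/bin/clone-database

# The same job in dbadmin's per-user crontab (installed with crontab -e): no user field
15 2 * * * /usr/local/bin/clone-database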

A crontab entry is just a shell command, so it can carry a little complexity, but once you need much more than that, you find yourself writing a script file in bash, or perhaps even PHP, Python or some other language, or even creating a custom compiled executable. A common location for these scripts is the user's bin directory, or a system-wide directory such as /usr/local/bin. And one nice thing about these scripts is that you can get really complex, with file redirection, pipelines, supplied input and anything else allowed by the language chosen.
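
For instance, here is a minimal sketch of what the /usr/local/bin/clone-database script referenced later in this post might look like, assuming a PostgreSQL clone done with pg_dump and pg_restore, and assuming authentication to the destination server is already handled (more on that below); the destination host, user, database and log path are placeholders:

#!/bin/bash
# Hypothetical /usr/local/bin/clone-database: dump the local database and
# restore it onto the destination server, logging everything for later review.
set -euo pipefail

DEST=server2
DB=mydatabase
LOG=/var/log/clone-database.log

{
    echo "$(date -Is) starting clone of $DB to $DEST"
    pg_dump --format=custom "$DB" \
        | pg_restore --host="$DEST" --username=myuser --clean --dbname="$DB"
    echo "$(date -Is) finished clone of $DB"
} >>"$LOG" 2>&1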

But here is the drawback of using cron... where did you put the command, and what exactly was the command you ran to do the task? With one host this is bad enough, but when the command might run from any one of multiple hosts, such as when you are cloning a database and it might run on an admin server, or on either the source or destination database server, it gets a bit more involved.

And another drawback is securely authenticating any commands which might require things like SSH keys, passwords, etc. You generally end up creating a dedicated, unencrypted SSH key, or using a file such as .netrc to store the username and password in plaintext. And if you are lucky, you might find yourself able to use a file such as .mylogin.cnf, where a program such as mysql_config_editor stores the credentials in a relatively secure fashion.
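
As a rough example of that last option with MySQL, the credentials can be stored once under a named login path and then referenced from a script or cron job without putting the password on the command line; the login-path name, host and user below are placeholders:

# Store the credentials in the obfuscated ~/.mylogin.cnf (prompts for the password)
mysql_config_editor set --login-path=replication --host=mydbserver --user=myuser --password

# Later, in a script or cron job, connect using the stored credentials
mysql --login-path=replication -e 'SHOW SLAVE STATUS\G'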

Jenkins, on the other hand, may have one master server and several slave servers, but you can see all the jobs in one place. A pipeline does follow a special format, though, and is limited to a certain degree in what shell commands you can execute. For example, in a regular shell script, you might do the following:

psql --username=myuser --host=mydbserver mydatabase <<EOD
SELECT * FROM sometable;
DELETE FROM sometable WHERE "timestamp" < now() - INTERVAL '30 days';
VACUUM FULL sometable;
EOD

In a Jenkins pipeline, you have to use an echo command and pipe that into the psql command, which gets complicated when you want more than a single statement. Inside an sh step, a minimal sketch of that approach (reusing the placeholder connection details from above) might look like this:
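
sh '''
    echo 'VACUUM FULL sometable;' | psql --username=myuser --host=mydbserver mydatabase
'''

However, you can always work around this by writing a script just like you would for execution by cron, and have something like the following, which shows a full Jenkins pipeline, complete with using an SSH key to execute the script remotely: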

pipeline {
    agent any

    stages {
        stage('Clone database from server1 to server2') {
            steps {
                sshagent(credentials: ['jenkins-ssh']) {
                    sh '''
                        ssh root@server1 '/usr/local/bin/clone-database'
                    '''
                }
            }
        }
    }
}

In this simple example you are still limited to methods such as a .netrc or .pgpass file on the remote server, because the script is executed remotely, but a more complicated pipeline could temporarily provision that file, or a script file, to allow using secrets on that remote host. Or, a more complex solution would be to set up a Jenkins agent on that node and specify that agent in the pipeline.
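
As a rough sketch of the first idea, assuming the database password is stored in Jenkins as a "Secret text" credential with a made-up ID of db-password, and reusing the placeholder hosts and database from earlier, the sshagent block in the pipeline above could become something like:

sshagent(credentials: ['jenkins-ssh']) {
    withCredentials([string(credentialsId: 'db-password', variable: 'DB_PASSWORD')]) {
        sh '''
            # Push a temporary .pgpass to the remote host with tight permissions
            echo "mydbserver:5432:mydatabase:myuser:$DB_PASSWORD" | ssh root@server1 'umask 077; cat > ~/.pgpass'
            ssh root@server1 '/usr/local/bin/clone-database'
            # Remove the temporary credentials file once the clone is done
            ssh root@server1 'rm -f ~/.pgpass'
        '''
    }
}

Because the sh block uses single quotes, the secret stays an environment variable on the agent rather than being interpolated into the script text.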

One last advantage of Jenkins is that with a little bit of work, I can have authenticated user accounts on the Jenkins server, grant access to jobs based on roles or individual users, and restrict access to the secrets those jobs may use. That way, you need not give out general access to the machines or services involved.

Between these two solutions, if there is a chance I might want to run the task on demand in between scheduled runs, I often lean towards Jenkins. The exception is running Ansible playbooks, since the playbooks are located in a central location and I can quickly use shell history tricks like !! to repeat the run over and over.
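
For example, with a made-up playbook name, an ad hoc run and a quick repeat look like this:

ansible-playbook clone-database.yml
!!    # re-runs the previous ansible-playbook command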