OTA Updates
The SocketXP OTA update tool lets you deploy software updates to a fleet of Linux-based edge devices.
The tool can update software packages, application binaries, and config files, and run scripts on a group of remote edge devices.
You can use the OTA update tool to automate your DevOps workflow.
Creating and Deploying OTA Updates
Creating and deploying OTA updates using the tool is a two-step process:
- Create an OTA Update Workflow
- Create an OTA Update Deployment from the workflow
The idea behind this two-step approach is that one OTA update workflow can be reused across multiple deployments.
OTA Update Best Practices
For a successful OTA update deployment, keep a small fleet of devices in your development lab as a Test Group. These devices should be similar to the ones in the Production Group.
First, deploy your OTA updates to the Test Group. Verify that the update is successful and everything is working as expected. Look at the OTA update deployment logs for each of the devices in the Test Group.
Fine-tune your OTA update workflow or fix bugs in your app until the OTA update to the Test Group succeeds.
Only then deploy your OTA updates to devices in your Production Group.
Some sophisticated development teams have Development, Testing, Staging and Production Groups. Some also create a small subgroup under the Production Group, called a Canary Group, and roll out OTA updates to it first (following the Canary Deployment Model) before deploying to all other devices in the Production Group.
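The staged rollout described above can be sketched as a simple gate: deploy to each group in order and stop at the first group that reports a failure. This is only a minimal sketch. `deploy_to_group` is a hypothetical callback standing in for however you trigger and check a deployment (it is not a SocketXP API), and the group names are examples.

```python
from typing import Callable, Iterable, List


def staged_rollout(groups: Iterable[str],
                   deploy_to_group: Callable[[str], bool]) -> List[str]:
    """Deploy to each group in order and stop at the first failure.

    Returns the groups that were updated successfully.
    """
    updated: List[str] = []
    for group in groups:
        if not deploy_to_group(group):
            print(f"Deployment to '{group}' failed; halting rollout.")
            break
        updated.append(group)
    return updated


# Hypothetical callback for illustration: pretend the canary group fails.
def fake_deploy(group: str) -> bool:
    return group != "canary"


updated_groups = staged_rollout(
    ["testing", "staging", "canary", "production"], fake_deploy
)
print(updated_groups)
```

Because the rollout halts at the canary group, the remaining production devices are never touched by a bad update.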
Step #1: Create an OTA Update Workflow
An OTA Update Workflow is the logic behind the OTA update.
You first define the software update workflow logic in the form of a Workflow Definition Script. The script can be written as a Linux shell script, a Python or Perl script, or in any other scripting language.
The workflow script defines the steps in the software update process and the order in which they need to be executed. The workflow script should also include rollback-on-failure logic that returns the device software to its original state (the state before the OTA update).
Workflow Script Blueprint
An OTA update workflow script should have the following logic:
- Stop the app or service running on the device. (On failure: undo the current action.)
- Back up the existing working app version and working config files. (On failure: undo the previous actions.)
- Download and install the new app binary to the app directory. (On failure: restore from the backup.)
- Update the app configuration and settings, or download a new config file with the required settings. (On failure: restore from the backup.)
- Start the app or service. (On failure: restore from the backup.)
- Verify that the app is up and running. (On failure: restore from the backup.)
- Clean up the backup.
- Done. Exit.
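The blueprint above can be read as a list of steps, each paired with an undo action. The following is a minimal sketch of that idea, not the SocketXP implementation: `run_workflow`, `record`, and the step names are hypothetical, and the stand-in actions only record what happened instead of running real commands. It assumes each undo action is safe to call even when its step only partly ran.

```python
from typing import Callable, List, Tuple

# (name, action, undo): action returns True on success; undo reverts it.
Step = Tuple[str, Callable[[], bool], Callable[[], object]]


def run_workflow(steps: List[Step]) -> bool:
    """Run steps in order. On failure, undo the failed step and every
    completed step in reverse order, then report failure."""
    attempted: List[Step] = []
    for step in steps:
        name, action, _ = step
        attempted.append(step)
        print(f"Running: {name}")
        if not action():
            print(f"Failed: {name}. Rolling back.")
            for undo_name, _, undo in reversed(attempted):
                print(f"Undoing: {undo_name}")
                undo()
            return False
    return True


# Stand-in actions that only record what happened (no real commands are run).
events: List[str] = []


def record(event: str, result: bool = True) -> Callable[[], bool]:
    def action() -> bool:
        events.append(event)
        return result
    return action


succeeded = run_workflow([
    ("stop service", record("stop"), record("start")),
    ("back up", record("backup"), record("remove backup")),
    ("install", record("install", result=False), record("restore")),
])
print(succeeded, events)
```

Here the simulated install fails, so the runner restores from the backup, removes the backup, and starts the service again, in that order.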
Here are some sample workflow script files written in Python and shell script, respectively, using the above blueprint. You can adapt these scripts for your application's OTA update workflow definition.
import os
import subprocess
import sys


def run_command(command):
    """
    Run a shell command and return the output, error and return code.
    """
    # text=True decodes stdout/stderr to str so error messages print cleanly
    process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True, text=True)
    output, error = process.communicate()
    return output, error, process.returncode


def create_backup():
    """
    Create a backup of the myapp configuration and data directories.
    """
    print("Creating backup of myapp configuration files...")
    # Define the source directories for the backup
    src_dirs = ["/etc/myapp/", "/var/lib/myapp/"]
    dst_dir = "/tmp/myapp_backup/"
    # Create the destination directory if it doesn't exist
    os.makedirs(dst_dir, exist_ok=True)
    # Run the rsync command to create the backup
    for src_dir in src_dirs:
        dst_subdir = os.path.join(dst_dir, src_dir.lstrip('/'))
        os.makedirs(dst_subdir, exist_ok=True)
        output, error, return_code = run_command(f"rsync -a {src_dir} {dst_subdir}")
        if return_code != 0:
            print(f"Error creating backup: {error}")
            return False
    print("Backup created successfully.")
    return True


def restore_backup():
    """
    Restore the backup of the myapp configuration and data directories.
    """
    print("Restoring backup of myapp configuration files...")
    # Define the destination directories for the backup
    dst_dirs = ["/etc/myapp/", "/var/lib/myapp/"]
    src_dir = "/tmp/myapp_backup/"
    # Run the rsync command to restore the backup
    for dst_dir in dst_dirs:
        # src_subdir keeps its trailing slash, so rsync copies the directory contents
        src_subdir = os.path.join(src_dir, dst_dir.lstrip('/'))
        output, error, return_code = run_command(f"rsync -a --delete {src_subdir} {dst_dir}")
        if return_code != 0:
            print(f"Error restoring backup: {error}")
            return False
    # Clean up the backup directory
    output, error, return_code = run_command(f"rm -rf {src_dir}")
    if return_code != 0:
        print(f"Error cleaning up backup directory: {error}")
        return False
    print("Backup restored and cleaned up successfully.")
    return True


def handle_error(error_message, restore=True):
    """
    Handle an error by printing an error message, restoring from backup if necessary,
    and restarting the myapp systemd service.
    """
    print(error_message)
    if restore:
        # Attempt to restore the backup and revert the changes
        if not restore_backup():
            sys.exit("Error restoring backup.")
    # Restart the myapp systemd service
    output, error, return_code = run_command("systemctl start myapp")
    if return_code != 0:
        sys.exit(f"Error starting myapp service: {error}")
    return False


def install_myapp():
    """
    Install the myapp package.
    """
    print("Installing myapp...")
    # Stop the myapp systemd service if it is running
    output, error, return_code = run_command("systemctl stop myapp")
    if return_code != 0:
        return handle_error(f"Error stopping myapp service: {error}", restore=False)
    # Create a backup before performing any changes.
    # Nothing has been modified yet, so on failure just restart the service.
    if not create_backup():
        return handle_error("Error creating backup.", restore=False)
    # Download the package from AWS S3 bucket
    url = "https://abcdefghxyz.amazonaws.com/v41/bin/myapp.deb"
    output, error, return_code = run_command(f"wget -O /tmp/myapp.deb {url}")
    if return_code != 0:
        return handle_error(f"Error downloading package: {error}")
    # Install the package
    output, error, return_code = run_command("dpkg -i /tmp/myapp.deb")
    if return_code != 0:
        return handle_error(f"Error installing package: {error}")
    # Start and enable the myapp systemd service
    output, error, return_code = run_command("systemctl start myapp && systemctl enable myapp")
    if return_code != 0:
        return handle_error(f"Error starting or enabling myapp service: {error}")
    print("myapp installed successfully.")
    return True


if __name__ == "__main__":
    # Run the workflow and exit with a non-zero status if the update failed
    sys.exit(0 if install_myapp() else 1)
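The blueprint also calls for verifying that the app is up and running, a step the Python sample above does not show. A minimal sketch of such a check follows. It assumes the app runs as a systemd unit and relies on `systemctl is-active --quiet`, which exits with status 0 when the unit is active. The helper names, the retry count, and the delay are illustrative assumptions.

```python
import subprocess
import time


def is_service_active(service, runner=subprocess.run):
    """Return True if `systemctl is-active` reports the unit as active."""
    result = runner(["systemctl", "is-active", "--quiet", service])
    return result.returncode == 0


def wait_until_active(service, attempts=5, delay=2.0, check=is_service_active):
    """Poll the service a few times, since it may take a moment to start."""
    for attempt in range(1, attempts + 1):
        if check(service):
            return True
        if attempt < attempts:
            time.sleep(delay)
    return False
```

In the sample above, `install_myapp()` could call `wait_until_active("myapp")` after starting the service and pass a failure to `handle_error()` so that the backup is restored.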
Here is a shell script version of the workflow script for downloading and installing the myapp binary:
#!/bin/sh
#================================================
# Example Workflow Script:
# MyApp Update Workflow Script
# (assumes the script runs as root)
#================================================

# On-failure (before the backup) just restore the service
invoke_restore() {
    systemctl start myapp
    systemctl status myapp
    exit 1
}

# On-failure clean up and restore from backup.
invoke_restore_backup() {
    systemctl stop myapp
    cp /usr/local/bin/myapp.bkup /usr/local/bin/myapp
    # Copy the directory contents back ("/." avoids nesting myapp.bkup inside myapp)
    cp -R /var/lib/myapp.bkup/. /var/lib/myapp/
    rm -rf /var/lib/myapp.bkup
    cp /etc/myapp/config.json.bkup /etc/myapp/config.json
    systemctl start myapp
    systemctl status myapp
    exit 1
}

# Errors go to stderr so they are kept apart from the normal log output
debug_error() {
    echo "$(date +"%Y-%m-%d %H:%M:%S.%N"): Error: $1" >&2
}

debug_log() {
    echo "$(date +"%Y-%m-%d %H:%M:%S.%N"): $1"
}

# Workflow Begins.

# stop the service first
systemctl stop myapp
if [ $? -ne 0 ]; then
    debug_error "Failed to stop myapp service."
    invoke_restore
fi

# backup the existing working app and configuration.
cp /usr/local/bin/myapp /usr/local/bin/myapp.bkup
if [ $? -eq 0 ]; then
    debug_log "Binary copied."
else
    debug_error "Binary backup failed."
    invoke_restore
fi

# Copy the directory contents ("/." avoids nesting myapp inside myapp.bkup)
mkdir -p /var/lib/myapp.bkup
cp -R /var/lib/myapp/. /var/lib/myapp.bkup/
if [ $? -eq 0 ]; then
    debug_log "Working dir copied."
else
    debug_error "Working dir backup failed."
    invoke_restore
fi

cp /etc/myapp/config.json /etc/myapp/config.json.bkup
if [ $? -eq 0 ]; then
    debug_log "Config copied."
else
    debug_error "Config backup failed."
    invoke_restore
fi

# Download the new myapp file from your online repository such as AWS S3 Bucket
# (-f makes curl fail on HTTP errors instead of saving the error page)
curl -fsSL -o /tmp/myapp https://abcdefghxyz.amazonaws.com/v41/bin/myapp && chmod +x /tmp/myapp && mv /tmp/myapp /usr/local/bin/myapp
if [ $? -eq 0 ]; then
    debug_log "New binary downloaded."
else
    debug_error "New binary download failed."
    invoke_restore_backup
fi

# Download a new myapp config.json file from your online repository such as AWS S3 Bucket
curl -fsSL -o /tmp/config.json https://abcdefghxyz.amazonaws.com/v41/cfg/config.json && mv /tmp/config.json /etc/myapp/config.json
if [ $? -eq 0 ]; then
    debug_log "New config file downloaded."
else
    debug_error "New config file download failed."
    invoke_restore_backup
fi

# Start the service again
systemctl start myapp

# Check the status of the service (is-active exits with 0 only when the unit is active)
if systemctl is-active --quiet myapp; then
    debug_log "myapp service is running."
else
    debug_error "myapp service failed to run."
    invoke_restore_backup
fi

# Clean up the backup
rm -rf /usr/local/bin/myapp.bkup /var/lib/myapp.bkup /etc/myapp/config.json.bkup
debug_log "Update complete."
exit 0
How to create an OTA Update Workflow
To create an OTA Update Workflow, go to the DevOps Automation section in the SocketXP Portal.
Click the CREATE WORKFLOW tab and start creating an OTA Update Workflow as described below:
- Workflow Name - Provide a name for this workflow for your reference. Eg: MyIoTApp upgrade to v3.0 workflow, Fix antenna issue, security fixes
- Upload a Workflow Script File - Upload the workflow script file that you have written to update the software on your devices.
- Destination Filename - Specify the file path where the workflow script should be downloaded on your edge Linux devices. Eg: /usr/local/bin/workflow-script.sh, /var/lib/my-iot-app/my-workflow-script.py, /home/pi/wf-script.pl
- Script Execution Command - Specify a Linux command to run on your edge devices to execute the workflow script downloaded in the previous step. Eg: sh /usr/local/bin/workflow-script.sh, python /var/lib/my-iot-app/my-workflow-script.py, perl /home/pi/wf-script.pl
- Finally, click the CREATE WORKFLOW button to save and finish.
Go to the WORKFLOWS tab and hit the Refresh button there to view your newly created workflow.
Now that the OTA Update Workflow is created, it is ready to be deployed on a select group of devices.
Note
If you plan to use the Remote Jobs REST APIs to create a workflow and deploy a new job to your remote IoT devices, then the "Command" field in the JSON data in the REST API's HTTP request body should have the following command:
"echo `copy paste the contents of the script.sh file above` > /usr/local/bin/script.sh; sh /usr/local/bin/script.sh"
This command writes the contents of your local script.sh file to /usr/local/bin/script.sh on the remote IoT device and then executes that file as a shell script on the device.
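Pasting a script into a command string by hand is error-prone when the script contains quotes or newlines. The sketch below shows one way to build the "Command" value programmatically with shell quoting. It is a sketch under stated assumptions: that the command is run by a POSIX shell on the device, and that `printf` is an acceptable substitute for `echo`. `build_command` is a hypothetical helper, and only the "Command" field name comes from the note above. Any other fields the REST API requires are not shown here.

```python
import json
import shlex


def build_command(script_text: str, dest_path: str) -> str:
    """Build a shell command that writes script_text to dest_path and runs it."""
    quoted_script = shlex.quote(script_text)  # safe for quotes and newlines
    quoted_path = shlex.quote(dest_path)
    return f"printf '%s' {quoted_script} > {quoted_path}; sh {quoted_path}"


# Example script content; in practice, read it from your local script.sh file.
script = "#!/bin/sh\necho \"it's working\"\n"
payload = {"Command": build_command(script, "/usr/local/bin/script.sh")}
print(json.dumps(payload, indent=2))
```

`json.dumps` takes care of escaping the command for the HTTP request body, so the shell quoting and the JSON escaping stay separate concerns.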
Step #2: Create an OTA Update Deployment
An OTA Update Deployment is the actual deployment of the OTA Update Workflow defined in the previous step on a select group of devices. This step is very simple: you select a workflow and specify a device group or tag on which you want the workflow to be deployed.
How to create an OTA Update Deployment
Go to the WORKFLOWS tab and click the '+' icon next to any workflow listed there. This opens a Create a new deployment window.
Create a new deployment as described below:
- Workflow ID - This pre-filled field shows the ID of the workflow used for the deployment.
- Workflow Name - This pre-filled field shows the name of the workflow used for the deployment.
- Deployment Name - Specify a name for this deployment for your reference.
- Device Group or Tag - Specify a Device Group or a Device Tag (Eg: testing, production, customer-xyz, temp-sensor etc.) on which to deploy the OTA update.
- Finally, click the CREATE DEPLOYMENT button to save and finish.
Go to the DEPLOYMENTS tab and hit the Refresh button to view your newly created deployment.
Note
You can create any number of OTA Update Deployments from a single OTA Update Workflow. Each deployment can target a different device group or tag, such as development, testing, or production. This way the workflow and the workflow script can be reused across deployments.
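The one-to-many relationship between a workflow and its deployments can be pictured with a small data model. This is purely illustrative: the `Workflow` and `Deployment` classes and their fields are hypothetical and do not reflect SocketXP's internal data structures or API. The example values are taken from the workflow creation steps earlier in this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    name: str
    destination_filename: str
    execution_command: str


@dataclass(frozen=True)
class Deployment:
    name: str
    workflow: Workflow
    target: str  # a device group or tag


workflow = Workflow(
    name="MyIoTApp upgrade to v3.0 workflow",
    destination_filename="/usr/local/bin/workflow-script.sh",
    execution_command="sh /usr/local/bin/workflow-script.sh",
)

# One workflow, several deployments: only the target group or tag changes.
deployments = [
    Deployment(name=f"v3.0 rollout to {target}", workflow=workflow, target=target)
    for target in ("development", "testing", "production")
]

for deployment in deployments:
    print(f"{deployment.name}: runs '{deployment.workflow.execution_command}'")
```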
You can monitor the progress of the OTA Update Deployment by clicking the Refresh button.
You can monitor the progress of the deployment on individual devices in the Device Group or Device Tag by clicking the Deployment row. A Deployment Progress window will pop up. The window has a table containing the deployment progress for each of the devices in the group or tag on which the deployment was scheduled. Hit the Refresh button in the window to view the progress.
You can also view the logs (workflow script execution logs and error logs) for each device by clicking the VIEW LOG button next to the device. If your job shows a failed or error status, check the stderr logs.
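Since failures are diagnosed from the stderr logs, it helps if the workflow script writes errors to stderr and progress messages to stdout. The helpers below are a minimal Python counterpart to the `debug_log` and `debug_error` functions in the shell sample. The names `log` and `log_error` are illustrative, and the sketch assumes the device logs capture the script's standard output and standard error streams.

```python
import sys
from datetime import datetime


def log(message, stream=None):
    """Write a timestamped message (to stdout unless a stream is given)."""
    if stream is None:
        stream = sys.stdout
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f")
    print(f"{timestamp}: {message}", file=stream, flush=True)


def log_error(message, stream=None):
    """Write a timestamped error message (to stderr unless a stream is given)."""
    if stream is None:
        stream = sys.stderr
    log(f"Error: {message}", stream=stream)


log("Config copied.")
log_error("New binary download failed.")
```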