Auto Shutdown AWS EC2s with Lambda, CloudWatch (and Terraform)

Today we are looking at how to automatically shut down EC2 instances at the end of the day automatically using a combination of Lambda and CloudWatch.

Lambda Function

TL;DR I’ve written a ‘stop EC2’ Lambda function that is deployable with Terraform to get you started, I’ve not made a ‘start EC2’ Lambda function yet. Future improvements, I’d like to programmatically schedule instances using tags. Clone from: github.com/xanmanning/terraform-autostop

What we will end up with is a simple Lambda function that will search all of our regions and automatically send the stop-instances command to anything that is still running after 8:00pm.

To do this I will also be using Terraform as the process is then repeatable and can be versioned in git. I am fairly new to Terraform but I honestly cannot understand how I ever survived without it.

Do you even Terraform?

I’ve been asked on numerous occasions, at different employers “How do I stop (and start) Amazon EC2 instances at regular intervals?”, and I stumbled upon an article written by AWS that uses Lambda.

Now this knowledge base article is quite basic, you effectively write two Lambda functions:

  1. Shut down a specific instance
  2. Start a specific instance

You then use the time based event in CloudWatch to trigger this Lambda function. It’s kind of like writing a Python script on your VM and triggering it in cron - although you obviously cannot power on that machine with cron. I started looking at other solutions and came across this guide December 2016 when exploring turning development EC2 instances of an API platform off and on to a schedule:

EC2 Scheduler Implementation Guide

That’s great, but with a need to use a DynamoDB table it felt over engineered for what we needed at the time. What I wanted was something in the middle of these two solutions - scheduling tagged instances without a database.

Sadly, I was not able to build a solution due to a need to change employment.

Fast forward to November 2017 and I have been asked again for my thoughts on using Lambda functions for scheduling EC2s. For my own proof of concept I wanted to programmatically look for EC2 instances in any AWS region that is tagged for automatic shutdown. For an EC2 to be automatically shut down at the end of the day (8:00pm) the EC2 requires the tag AutoShutdown: True. Below is the Python 2.7 Lambda function:

#!/usr/bin/env python2

#
# Autostop EC2 Lambda POC script.
# xmanning - 2017
#

from __future__ import print_function
import boto3
from datetime import datetime

# Poweroff time (24h format)
poweroff_time = 2000

# Set to false if we aren't debugging on a local machine.
local_debug = False

# StopEC2 Object
class StopEC2:
    # initialize required EC2 client and configuration
    def __init__(self):
        self.init_client = boto3.client('ec2')
        self.ec2_client = {}
        self.instance_stop_list = {}
        self.now = datetime.now()
        self.time = int(self.now.strftime('%H%M'))

    # Return a list of regions
    def list_regions(self):
        regions = []
        for region_info in self.init_client.describe_regions()['Regions']:
            regions.append(region_info['RegionName'])
        return regions

    def create_regions_ec2_clients(self):
        for region in self.list_regions():
            self.instance_stop_list[region] = []
            self.ec2_client[region] = boto3.client('ec2', region_name=region)

    def destroy_regions_ec2_clients(self):
        for region in self.list_regions():
            self.ec2_client[region] = None

    def list_ec2s_per_region(self):
        print('================================================')
        print('  Scanning Regions for stoppable EC2 instances  ')
        print('================================================')
        print('')
        print('Key:')
        print('  +  EC2s with AutoStop Tag')
        print('  -  EC2s to be ignored')
        print('')
        self.create_regions_ec2_clients()
        for region in self.list_regions():
            print('------------------------------------------------')
            print('EC2s running in {}'.format(region))
            print('------------------------------------------------')
            for reservation in self.ec2_client[region].describe_instances()['Reservations']:
                for instance in reservation['Instances']:
                    if instance['State']['Name'] == 'running':
                        name = "No Name"
                        autostop = '-'
                        if 'Tags' in instance:
                            for tag in instance['Tags']:
                                if tag['Key'] == 'Name':
                                    name = tag['Value']
                                if tag['Key'] == 'AutoStop' and tag['Value'] == 'True':
                                    autostop = '+'
                                    self.instance_stop_list[region].append(instance['InstanceId'])
                        print('  {}  {} ({})'.format(autostop, instance['InstanceId'], name))
            print('')
        print('Done')
        print('')

    def stop_instances(self):
        print('')
        print('================================================')
        print('            Running Stop Proceedure!')
        print('================================================')
        print('')
        for region in self.list_regions():
            if len(self.instance_stop_list[region]) > 0:
                print('------------------------------------------------')
                print('Stopping: {}'.format(self.instance_stop_list[region]))
                print('------------------------------------------------')
                self.ec2_client[region].stop_instances(InstanceIds=self.instance_stop_list[region])
        print('Done')
        self.destroy_regions_ec2_clients()



def lambda_handler(event, context):
    client = StopEC2()
    if client.time >= poweroff_time:
        client.list_ec2s_per_region()
        client.stop_instances()
    else:
        print('It is not power off time! (>={})'.format(poweroff_time))

if __name__ == '__main__' and local_debug == True:
    lambda_handler([],[])

Now as mentioned, I like to use Terraform for managing my AWS infrastructure. It means I can spin up and tear down all my musings at a moments notice. There is one thing to note with AWS Lambda and Terraform, loading a function is not quite as straight forward as pointing your Terraform config to a .py file. You either have to upload the script to an S3 bucket or archive it into a .zip file to which you need to work out the sha256 hash of that file. Luckily Terraform will work out this hash but creating/updating the .zip file is a bit of a pain.

resource "aws_lambda_function" "stop_ec2_instances" {
  filename         = ".lambda_stop_ec2_payload.zip"
  function_name    = "stop_ec2_instances"
  description      = "Shuts down unused EC2 instances."
  role             = "${aws_iam_role.lambda_start_stop_ec2.arn}"
  handler          = "stop_ec2.lambda_handler"
  source_code_hash = "${base64sha256(file(".lambda_stop_ec2_payload.zip"))}"
  runtime          = "python2.7"
  timeout          = 180
}

For this I have created a simple bash script that wraps around the Terraform binary. I recommend this as it will speed up your workflow for deploying to Lambda. My bash script works by firstly compressing the Python scripts in the payloads/ directory and then runs the terraform plan/apply functions - on the fly generating the .zip files and the associated has for upload to AWS Lambda.

build_payloads() {
    info "Building payload files."
    local __PWD=$(pwd)
    cd payloads
    for file in * ; do
        local script="${file%.*}"
        info "Adding ${file} to .lambda_${script}_payload.zip"
        zip -r9 "../.lambda_${script}_payload.zip" "${file}" || \
            fatal "Could not build payload files"
    done
    cd "${__PWD}"
}

Feel free to clone my repository and play with the Lambda function.

https://github.com/xanmanning/terraform-autostop

Future improvements

  1. Change the tag AutoShutdown: True to be PowerSchedule: 0800;2000 (08:00am - 20:00pm)
  2. Create a Start function that also reads the PowerSchedule tag.
  3. Have the option for a ‘default’ schedule for untagged instances.
  4. Exempt EC2s for Stop/Start with a specific tags, eg. Production
  5. Add Paramiko support for running commands after EC2 start or before EC2 stop.

Further reading