How to Count Lambda costs through cloud watch

Introduction

How to calculate costs of running multiple Lambda functions by using cloud watch logs.

AWS Lambda functions record their logs to cloudwatch. And if you run thousands of lambda functions you might want to look through all the logs to find the cost for running all those.

This article describes a script that makes it easy to find those costs.

Background

When a Lambda function executes it writes all standard output to AWS Cloudwatch. Finally at the end of the execution, Lambda writes a summary to CloudWatch indicating the costs involved. Which looks like this:

REPORT RequestId: 42a2245b-4237-56ad-ab71-37f37a265b12 Duration: 21342.35 ms Billed Duration: 2300 ms Memory Size: 2048 MB Max Memory Used: 200 MB Init Duration: 740.19 ms

Now If you ran your lambda function thousands of times, you will thousands of these summaries littering your logs. The goal being to dive into those logs and somehow assemble all that text into something useful for estimating the costs of those Lambda executions.

For instance, the best approach would be to get all that nice cost information into an excel spread-sheet. Then let excel crunch some numbers to tally the costs.

The Script

Using python and AWS python api (called boto3), I wrote a script to do just that. This script crawls through some Cloudwatch logs and streams, parses the summary costs and output’s some results into a .csv file. Then that .csv file can be opened in Microsoft excel.

import boto3
import os
import json
import time
import re
import sys
import datetime
from timeit import timeit as timer

test = "REPORT RequestId: 41f11e5b-4237-56ad-ab71-48b46e176e08	Duration: 45777.35 ms	Billed Duration: 45800 ms	Memory Size: 3008 MB	Max Memory Used: 195 MB	Init Duration: 740.19 ms"
p = re.compile("REPORT RequestId:\s(.*)\sDuration: (\d+\.\d+).*Billed Duration:\s(\d+).*Memory Size:\s(\d+).*Memory Used: (\d+).*")

test_match = p.match(test)

# print(match.group())  # The entire string matched
# print(match.group(1)) # Request id
# print(match.group(2)) # Duration
# print(match.group(3)) # Bill Duration
# print(match.group(4)) # Memory Allocated
# print(match.group(5)) # Memory Used

def write_to(output_file, events):
    """
    Writes output events to a log file

    Only if it matches a given regular expression
    """
    count = 0
    for event in events:
        log_text = event['message'].rstrip()
        match = p.match(log_text)
        if match:
            output_file.write("%s," % match.group(1))
            output_file.write("%s," % match.group(2))
            output_file.write("%s," % match.group(3))
            output_file.write("%s," % match.group(4))
            output_file.write("%s," % match.group(5))
            output_file.write("\n")
            count += 1
    return count

def get_stream_events(log_client, group_name, stream_name, output_file):
    """
    Gets all the events in a given stream

    Sends all events to a write function for processing.
    """
    kwargs = {'logGroupName' : group_name, 'logStreamName' : stream_name}

    while True:
        response = log_client.get_log_events(**kwargs)

        events = response['events']
        write_to(output_file, events)

        if 'nextToken' in response:
            kwargs['nextToken'] = response['nextToken']
        else:
            break


def get_log_group(group_name, stream_prefix):
    """
    Iterates through all log groups looking for a stream

    Looks for all streams whose name starts with the stream_prefix parameter
    This is slow, so should not be used. 
    """
    print("getting group: %s" % group_name)
    logs = boto3.client('logs') # cloudwatch

    with open('output.csv', 'w') as output_file:
        output_file.write("RequestId, Duration, Billed Duration, Memory Size, Max Memory Used\n")
        kwargs = {'logGroupName' : group_name, 'limit': 50, }
        stream_count = 0
        while True:
            response = logs.describe_log_streams(**kwargs)
            streams = response['logStreams']
            for stream in streams:
                
                
                stream_name = stream['logStreamName']
                print("stream: %s" % stream_name)
                if (not stream_name.startswith(stream_prefix)):
                    continue

                stream_count += 1
                get_stream_events(logs, group_name, stream_name, output_file)

            time.sleep(0.21) # this method is throttled if called more than 5 times per second.
            if 'nextToken' in response:
                kwargs['nextToken'] = response['nextToken']
            else:
                break
        print("Stream Count: %s" % stream_count)
        
def get_filter_log_events(group_name, stream_prefix, start_time, end_time):
    """
    Gets Cloud watch logs filtered by a few parameters

    :param - group_name: The fully qualified name of the cloud watch group
    :param - stream_prefix: Any part of the beginning of the stream name to query in
    :param - start_time: The starting time to filter for events.
    :param - end_time: The ending time to filter for events.

    It is critical to get start and end times in the correct format that Amazon uses. Even on their own lambda
    log pages, times are listed in different formats, reporting times in different timezones.

    The time that must be used is the time reported under cloud watch, and is the time that the streams show
    for their start and last time stamps.
    
    The timezone must be listed in UTC time.
    The final format is like this:
    AMAZON_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ%z"
    where %z is of the form +HHMM or -HHMM

    For example:
    2020-03-23T10:15:00.00Z-0600

    year: 2020
    month: 03
    day: 23
    hour: 10 AM (24 hour format)
    minute: 15
    second: 00
    milliseconds: 00
    time zone: -6 hours after Greenwhich mean time
    """
    print("getting group: %s" % group_name)
    logs = boto3.client('logs') # cloudwatch

    with open('output.csv', 'w') as output_file:
        output_file.write("RequestId, Duration (ms), Billed Duration (ms), Memory Size (MB), Max Memory Used (MB)\n")
        kwargs = {'logGroupName' : group_name, 'logStreamNamePrefix' : stream_prefix, 'startTime': start_time, 'endTime': end_time}
        count = 0
        while True:
            response = logs.filter_log_events(**kwargs)
            if 'events' in response:
                events = response['events']
                count += write_to(output_file, events)

                if 'nextToken' in response:
                    kwargs['nextToken'] = response['nextToken']
                else:
                    break
        print("Number of matched events: %s" % count)


if __name__ == '__main__':
    if (test_match == None):
        raise RuntimeError("The regular expression did not work: %s" % p)

    
    if (len(sys.argv) == 3): # args includes the name of this script as the first argument
        log_group_name = sys.argv[1]
        stream_prefix = sys.argv[2]

        print("Looging in Log Group: %s" % log_group_name)
        print("Looking for streams starting with: %s" % stream_prefix)
        get_log_group(log_group_name, stream_prefix)

    elif (len(sys.argv) == 5):
        start = timer()
        log_group_name = sys.argv[1]
        stream_prefix = sys.argv[2]
        start_time = sys.argv[3]
        end_time = sys.argv[4]
        AMAZON_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ%z"
        # 2020-03-23T16:14:47.637
        # The date contains year, month, day, hour, minute, second, and microsecond.

        start_time = datetime.datetime.strptime(start_time, AMAZON_TIME_FORMAT)
        end_time   = datetime.datetime.strptime(end_time  , AMAZON_TIME_FORMAT)

        print("Start time : %s" % start_time)
        print("  End time : %s"   % end_time)

        # Convert to milliseconds since epoch
        start_time = int(start_time.timestamp() * 1000)
        end_time   = int(end_time.timestamp() * 1000)
        print("epoch start time : %s" % start_time)
        print("  epoch end time : %s"   % end_time)

        get_filter_log_events(log_group_name, stream_prefix, start_time, end_time)
        end = timer()
        print("Elapsed time: %s" % (end-start))
    else:
        print("wrong number of arguments")


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s