GNU Parallel and AWS CLI

We all know that the AWS Console is sadly slow and cumbersome to use. That’s why AWS provides the CLI and SDK tools, which let impatient people do more work in less time. However, the CLI and SDK can be slow too, because every API call takes time and the calls add up. So for the truly impatient, isn’t it nice that you can use GNU Parallel to complete more work in less time? Let me show you an example.

The Problem

I need a list of all instances, in four regions, in all my AWS accounts. That means looping through each of my AWS profiles and each region, and listing the instances for every pair. A single call looks something like:

aws --region us-east-1 --profile my-secret-account-role-devopsadmin ec2 describe-instances

which gives us something like:

{
    "Reservations": [
        {
            "Instances": [
                {
                    "InstanceId": "...",
                    ...
                }
            ]
        }
    ]
}

We’ll pipe that through jq to get just the InstanceId values:

jq -r ".Reservations[].Instances[].InstanceId"
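
To see what that filter does without calling AWS, here is a minimal sketch that runs it against a hand-written sample. The instance IDs and the extract_ids helper are made up for illustration, and it assumes jq is installed.

```shell
#!/usr/bin/env bash
# Hypothetical sample shaped like describe-instances output:
# two reservations, three instances in total. The IDs are made up.
sample_json='{
  "Reservations": [
    {"Instances": [{"InstanceId": "i-aaa111"}, {"InstanceId": "i-bbb222"}]},
    {"Instances": [{"InstanceId": "i-ccc333"}]}
  ]
}'

# Illustrative helper: read describe-instances JSON on stdin and
# print one InstanceId per line.
extract_ids() {
    jq -r '.Reservations[].Instances[].InstanceId'
}

printf '%s\n' "$sample_json" | extract_ids
```

The two [] steps walk every reservation and every instance inside it, so the three IDs come out as a flat list, one per line.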

That all works great, except I have 33 accounts using 4 regions each. That’s 132 runs, which is too much work to do by hand. Let’s make it a bit easier.

Solution One:

To accomplish this I can build a simple bash script as follows:

# Collect every profile name containing "devopsadmin" from the credentials file.
for role in $(grep devopsadmin ~/.aws/credentials | cut -d '[' -f2 | cut -d ']' -f1 | grep -v '^role_arn'); do
	for region in us-east-1 us-west-2 eu-west-1 eu-west-2; do
		# Print a header line, then the instance IDs for this profile and region.
		echo "$role $region"
		aws --profile "$role" --region "$region" ec2 describe-instances | jq -r '.Reservations[].Instances[].InstanceId'
	done
done

This loops through all 132 combinations and outputs something like:

my-secret-account-role-devopsadmin us-east-1
i-0f15bd143e9db408e
i-01697a6359a8062b8
i-056f9ab8d977a58a3
i-06f93ebd5cd372bfc
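
The profile names in the outer loop come from the grep and cut pipeline. Here is a minimal sketch of what that pipeline does, run against a throwaway file instead of the real ~/.aws/credentials. The profile names, account IDs, and the list_profiles helper are all made up, and the file layout is only an assumption about how such a credentials file might look.

```shell
#!/usr/bin/env bash
# Hypothetical credentials file; every name and ARN here is made up.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
[account-a-devopsadmin]
role_arn = arn:aws:iam::111111111111:role/devopsadmin
source_profile = default
[account-b-readonly]
role_arn = arn:aws:iam::222222222222:role/readonly
[account-c-devopsadmin]
role_arn = arn:aws:iam::333333333333:role/devopsadmin
EOF

# Illustrative helper wrapping the same pipeline as the loop above.
# grep keeps every line mentioning devopsadmin (section headers AND role_arn lines),
# the two cuts strip the [ and ] from the headers (lines without brackets pass through),
# and the final grep -v drops the role_arn lines that slipped through.
list_profiles() {
    grep devopsadmin "$1" | cut -d '[' -f2 | cut -d ']' -f1 | grep -v '^role_arn'
}

list_profiles "$sample"
```

With this sample, only account-a-devopsadmin and account-c-devopsadmin survive, which shows why the trailing grep -v is needed: the role_arn lines also contain the word devopsadmin.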

If we run time on the loop script, you can see it takes over 2 minutes to complete.

$ time ./simple_script > simple_results

real	2m17.105s
user	1m7.725s
sys	0m8.015s

This is much better than doing it all by hand. Hold on, we can do better!

Solution Two:

GNU Parallel is a command-line tool that can replace loops and xargs by running jobs in parallel. This is a great use case because each of these calls to AWS is independent of the others, so we can run them in parallel without changing the calls themselves. Let’s look at the script:

parallel "echo {2} {1}; aws --region {1} --profile {2} ec2 describe-instances | jq -r \".Reservations[].Instances[].InstanceId\" | xargs -0 "  ::: us-east-1 us-west-2 eu-west-2 eu-west-1 ::: $(cat ~/.aws/credentials | grep devopsadmin | cut -d '[' -f2 | cut -d ']' -f1 | grep -v ^role_arn)

I will be the first to admit the above string is daunting! Let’s break this down.

Parallel can be run in several ways; I am using this format:

parallel command ::: input1 ::: input2

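Parallel runs the command once for every combination of the two input lists. Inside the command, {1} refers to the value from the first list (here the region) and {2} to the value from the second list (here the profile).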
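
A minimal sketch of that format with echo standing in for the aws call, assuming GNU Parallel is installed. The two profile names are placeholders, and show_combinations is just an illustrative wrapper.

```shell
#!/usr/bin/env bash
# Placeholder profile names; requires GNU Parallel on the PATH.
# -k (--keep-order) prints results in input order instead of completion order.
show_combinations() {
    parallel -k "echo {2} {1}" ::: us-east-1 us-west-2 ::: account-a account-b
}

show_combinations
```

Two regions times two profiles gives four jobs, so this prints four lines, each with a profile followed by a region, just like the header lines in the real script.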

I get very similar output to the first script (just extra whitespace), and it all takes only 14 seconds!

$ time ./simple_parallel_script > parallel_results

real    0m14.716s
user    1m30.136s
sys     0m10.425s
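
Since the parallel jobs finish in whatever order they like, comparing simple_results and parallel_results line by line will not work. One rough way to check that both runs found the same things is sketched below, assuming the extra whitespace is just blank lines. The file contents here are made up stand-ins, and normalize is an illustrative helper.

```shell
#!/usr/bin/env bash
# Illustrative helper: drop whitespace-only lines and sort, so two result
# files compare equal if they contain the same lines in any order.
normalize() {
    grep -v '^[[:space:]]*$' "$1" | LC_ALL=C sort
}

# Made-up stand-ins for simple_results and parallel_results.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '%s\n' 'account-a us-east-1' 'i-aaa111' 'account-a us-west-2' 'i-bbb222' > "$workdir/simple_results"
printf '%s\n' 'account-a us-west-2' 'i-bbb222' '' 'account-a us-east-1' 'i-aaa111' '' > "$workdir/parallel_results"

if [ "$(normalize "$workdir/simple_results")" = "$(normalize "$workdir/parallel_results")" ]; then
    echo "same lines"
else
    echo "results differ"
fi
```

Sorting throws away which IDs belong under which header, so this only confirms that both files contain the same set of lines, not that they are grouped the same way.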

That is what I am talking about!