Envoylint.com

Validate your Envoy configs from your browser

envoylint.com

What is this?

This site takes an Envoy config and validates it for you. It runs against actual Envoy binaries, with multiple versions supported (1.16.2, 1.16.0, 1.14, 1.12).

How does it work?

It sends the config to a Lambda that runs Envoy in validate mode or against the config_load_check_tool and prints the results.
There is a 30-second timeout on the linter due to API Gateway limitations; extremely large configs may hit it.
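
For reference, you can run the same check locally if you have an Envoy binary installed (a minimal sketch; envoy.yaml stands in for your config file):

# Parse and validate the config without starting the proxy
envoy --mode validate -c envoy.yaml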

Do you save any data?

No, and all sessions run on ephemeral Lambda containers. Still, it’s best to never send any sensitive data.

Can I see the code?

https://github.com/ysawa0/envoylint

Sniff Kubernetes Pod requests and headers using tcpdump

When your fancy observability tools have failed you, there’s still trusty tcpdump.

This was done on Ubuntu. YMMV on other distros.

Exec into a pod

kubectl exec -it my-pod-name -- sh

Run

cat /sys/class/net/eth0/iflink
> 588 # container eth id

It should return a number: the container’s eth interface id.

Now run the below to find out which node the pod is running on:

kubectl describe po my-pod-name | grep Node

SSH into the node, then run the below to find the ENI id:

ip link | grep 588
> eni89aabc12345

Now use tcpdump to sniff the requests coming in

tcpdump -A -i eni89aabc12345

To capture a specific header

tcpdump -A -i eni89aabc12345 | grep -i X-Real-IP -C 5
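
If the interface is noisy, a BPF filter can narrow the capture first (a sketch, assuming the traffic is plain HTTP on port 80):

tcpdump -A -i eni89aabc12345 'tcp port 80' | grep -i X-Real-IP -C 5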

AWS Double Charges Cross AZ Traffic and Two Solutions

Recently I saw an intriguing article by Corey Quinn of lastweekinaws.com. His finding was that although AWS lists cross AZ traffic as costing $0.01 / GB in their documentation, cross AZ traffic actually costs $0.02 / GB; they double charge, billing you for data going out of one AZ and again for going into the other.

This is the same price as cross region traffic: $0.02 / GB!

When I read this, it was hard to believe! For the last couple of years, I had always thought that cross AZ traffic was cheaper than cross region. Many of my co-workers believed the same.

I just had to recreate the cross AZ tests in the post to convince myself.

The test is simple, as Corey was kind enough to detail the steps. I’ve added some additional details below to make it even easier to recreate.

I chose a region where I had absolutely nothing running, to rule out any external factors.

The Bill

Per the test, I sent 10GB of traffic from an EC2 instance in us-west-2a to another in us-west-2b. Once the bill came in, I was charged for 20GB of data transfer, so every 1GB transferred was billed as 2GB.
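
The transfer itself can be done with any bulk sender. A minimal sketch using netcat (the port and variable are illustrative; some netcat builds want -l -p instead of -l):

# On the receiving instance in us-west-2b
nc -l 9000 > /dev/null

# On the sending instance in us-west-2a, stream 10GB of zeros
dd if=/dev/zero bs=1M count=10240 | nc $RECEIVER_PRIVATE_IP 9000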

Cross AZ bills

Talking to AWS Support

After reaching out to AWS, they confirmed the results I saw. While the docs do not explicitly state that cross AZ costs are “double charged”, they do state that data “is charged at $0.01/GB each direction.”
Ingress egress price

Reducing cross AZ data costs

As most organizations today deploy their services across multiple AZs for high availability, it’s difficult to reduce cross AZ data transfer without the right architecture.

Here are two concepts being used today to combat rising cloud costs (note: implementing either is not easy!)

Service mesh (Envoy, Istio, etc.)

By adopting a service mesh architecture, it’s possible to force service-to-service communication to stay within the same AZ.

For example, if a service A container is running in us-east-1a, a service mesh sidecar running alongside it can ensure all of its requests go to services also running in 1a.

Implementing this with Envoy can be done through the zone aware routing feature, or by using the Endpoint Discovery Service (EDS) to dynamically send only the endpoints (IP addresses) of service instances in the same AZ.
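
A minimal sketch of what zone aware routing looks like in an Envoy bootstrap config (cluster names are illustrative; the endpoints also need locality info, typically supplied via EDS):

cluster_manager:
  local_cluster_name: service_a   # cluster describing this service's own instances
static_resources:
  clusters:
  - name: service_b
    common_lb_config:
      zone_aware_lb_config:
        routing_enabled:
          value: 100        # send 100% of eligible traffic zone-locally
        min_cluster_size: 3 # fall back to cross-zone below this many healthy hosts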

Cluster per AZ

This is a model being adopted by companies at scale like Tinder, Lyft and Reddit, and Kubernetes is what makes it practical. The idea here is to have a production cluster per AZ, where each cluster has a complete replica of all your services; each cluster is an independent copy.

Cluster / AZ. Image taken from Reddit k8s talk

All communication within each cluster happens in the same AZ, as the traffic is pinned inside the cluster.

Of course, any shared data stores must be located outside the clusters. Managed data stores such as S3, DynamoDB and ElastiCache make a good fit.

Hands Free Canary with ALB Advanced Routing Rules

Canary deployments may seem like an advanced technique that requires a team of engineers to implement.

But with the new Advanced Request Routing for ALBs (Application Load Balancer), safely releasing new versions of your application straight into production has never been easier.

First, either create a new ALB or use an existing ALB and copy its DNS name.

Then, clone this repo https://github.com/ysawa0/alb-canary

And copy the DNS name to this section of serverless.yml

environment:
  stage: ${self:custom.stage}
  region: ${self:custom.region}
  alb_dns_name: canary-alb-1183014609.us-east-1.elb.amazonaws.com

This repo will deploy a Lambda with an API Gateway endpoint that redirects users to the ALB with a twist: it adds a ?id=$val GET parameter, where $val is an integer from 1 to 6.

It uses the Serverless Framework.

# Install Serverless if you don't have it
npm install serverless -g

Then run the below to deploy the Lambda and API Gateway:

sls deploy

Save the endpoint of the deployed API Gateway for later.

Now, we will set up routing rules that will mimic our “application”.

Click View/edit rules on the ALB’s listener and add rules that match on the ?id query string: requests with id=6 get the canary response, while the default rule handles ids 1 through 5.
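
If you prefer the CLI to the console, here is a sketch of an equivalent canary rule (the listener ARN is a placeholder, and a fixed response stands in for the application):

aws elbv2 create-rule \
  --listener-arn $LISTENER_ARN \
  --priority 10 \
  --conditions '[{"Field":"query-string","QueryStringConfig":{"Values":[{"Key":"id","Value":"6"}]}}]' \
  --actions '[{"Type":"fixed-response","FixedResponseConfig":{"StatusCode":"200","ContentType":"text/plain","MessageBody":"Id is 6, you have been canaried!"}}]'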

Now, try querying the API Gateway endpoint we deployed earlier.

1 out of 6 times, it should be bucketed into the canary rule.

curl -L https://dmkgpj2yxh.execute-api.us-east-1.amazonaws.com/qa/canary
> Id was 1 through 5
curl -L https://dmkgpj2yxh.execute-api.us-east-1.amazonaws.com/qa/canary
> Id is 6, you've been canaried!

That’s it! You’ve set up a canary deployment where 1/6 users are canaried.

Using AWS ElastiCache Redis with Spinnaker

To productionize a Spinnaker installation for high availability, one of the recommendations is to use an external Redis store, such as AWS ElastiCache. This guide will go over how to migrate a Kubernetes installation of Spinnaker to an AWS ElastiCache Redis instance using Halyard.

All config files (with the proper directory structure) used in this guide can be found in this repo: ysawa0/spinnaker-elasticcache-redis


Create the ElastiCache instance

ElastiCache settings

Keep Cluster Mode unchecked.

Node type will depend on your needs and budget; here we chose an m5.large.

For Engine Version, choose 3.2.10.

Configure Halyard and update Spinnaker

ElastiCache settings

After the instance is created, copy the Primary Endpoint for the cluster.
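
Before updating Halyard, a quick connectivity check from inside the VPC can rule out security group issues (a sketch; requires redis-cli, and an instance with in-transit encryption needs a TLS-capable client):

redis-cli -h $REDIS_PRIMARY_ENDPOINT -p 6379 ping
> PONG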

If you want to update all Spinnaker services at once, place this snippet into ~/.hal/default/service-settings/redis.yml, and replace $REDIS_PRIMARY_ENDPOINT with your endpoint.

overrideBaseUrl: redis://$REDIS_PRIMARY_ENDPOINT
skipLifeCycleManagement: true

To update one Spinnaker service at a time, place the below into ~/.hal/default/profiles/$SERVICE-local.yml

Where $SERVICE would be orca, clouddriver, gate, etc.

services.redis.baseUrl: redis://$REDIS_PRIMARY_ENDPOINT

Lastly, after updating the base URLs, place this into ~/.hal/default/profiles/gate-local.yml.

redis:
  configuration:
    secure: true

Now update Spinnaker by running hal deploy apply.

After you confirm that everything is working as expected, it’s time to disable the spin-redis service.

Update ~/.hal/default/service-settings/redis.yml by inserting enabled: false

overrideBaseUrl: redis://$REDIS_PRIMARY_ENDPOINT
skipLifeCycleManagement: true
enabled: false

And scale down the Redis Deployment to 0 replicas in Kubernetes.

kubectl scale deploy spin-redis -n spinnaker --replicas=0

Now sit back, relax, and enjoy having one less data store to monitor.

How we fixed a Node.js memory leak by using ShadowReader to replay production traffic into QA

A problem Edmunds faced recently was a memory leak in our Node.js application. It confounded the engineering team, as it was only occurring in our production environment; we could not reproduce it in QA until we introduced ShadowReader, a new type of load testing tool developed here at Edmunds that replays production traffic.


Read about it on opensource.com!