In this blog post, we showcase how ALLEN implements network security on AWS by centralizing the ingress and egress traffic management. We'll explore the architecture ALLEN has adopted, highlighting the problems that led to the centralized architecture, the advantages of centralized ingress and egress, challenges and the road ahead.
Glossary
AWS VPC (Virtual Private Cloud) lets you provision a logically isolated section of the Amazon Web Services (AWS) cloud where you can launch AWS resources in a virtual network that you define. Each VPC works as an isolated network of its own; that is, it lets you create your own network space and control how your network and the resources in it are exposed to the internet.
NAT Gateways are used by Amazon VPC for Network Address Translation (NAT) so resources in private subnets can communicate with resources on the internet.
North-South Traffic refers to the communication between internal networks such as a VPC and external networks including the Internet; it comprises both incoming or ingress traffic and outgoing or egress traffic.
East-West Traffic refers to the communication within a network or VPC from one resource to another, usually internal communication between services and instances.
Problem Statement:
ALLEN has multiple public-facing web applications and APIs, which are deployed across multiple AWS VPCs for workload isolation and protection. There are also VPCs for data platform and data science workloads, which require outbound internet access for software upgrades, third-party packages, etc. Providing secure ingress and egress for all these VPCs posed several challenges:
In the distributed networking model, each VPC required its own internet gateway and inspection mechanism. This meant managing each account's VPC individually, which made it difficult to centrally administer, maintain configurations or enforce policies, and increased the complexity of managing multiple configurations across numerous VPCs.
Ensuring a single point of egress from ALLEN’s AWS network was essential to enforce company-wide egress policies such as allowed domains, container image registries, and allowed ports or protocols.
Each VPC required its own set of NAT Gateways deployed across 3 Availability Zones for high availability. Provisioning this many NAT Gateways would have been costly, since NAT Gateways are billed hourly ($0.056 per gateway per hour in Mumbai).
Deep packet inspection for any traffic entering or exiting a VPC would require setting up separate firewall appliances for each VPC, or creating AWS Gateway Load Balancer (GWLB) endpoints or AWS Network Firewall endpoints in each VPC. However, this is not the most cost-effective solution, as the cost of the GWLB or Network Firewall endpoints grows with the number of VPCs.
Hitting the limits of the virtual firewall appliances, or some appliances becoming unhealthy, could turn the firewall into a bottleneck for the network. We needed a scalable, highly available and fault-tolerant firewall solution.
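To put the NAT Gateway cost point above into numbers, here is a rough sketch of the hourly charges alone. The VPC count of 10 and the 730-hour month are assumptions made for illustration, not ALLEN's actual figures; only the $0.056 rate and the 3 Availability Zones come from the text above. Data processing charges are not included.

```python
HOURLY_RATE_USD = 0.056   # per NAT Gateway per hour in Mumbai, as quoted above
AZ_COUNT = 3              # one NAT Gateway per Availability Zone
HOURS_PER_MONTH = 730     # assumption: average number of hours in a month


def monthly_nat_hourly_cost(vpc_count: int) -> float:
    """Hourly NAT Gateway charges per month; per-GB data processing is excluded."""
    return vpc_count * AZ_COUNT * HOURLY_RATE_USD * HOURS_PER_MONTH


# Hypothetical comparison: 10 VPCs each with their own NAT Gateways
# versus a single shared egress VPC.
distributed = monthly_nat_hourly_cost(10)
centralized = monthly_nat_hourly_cost(1)

print(f"distributed: ${distributed:,.2f} per month")
print(f"centralized: ${centralized:,.2f} per month")
```

The hourly NAT Gateway cost grows linearly with the number of VPCs in the distributed model, while it stays flat when egress is centralized. A full comparison would also need to account for the charges of whatever connects the VPCs to the shared egress point.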
Centralized Inspection Model overview for north-south traffic:
The centralized ingress and egress inspection model uses dedicated VPCs for ingress traffic, egress traffic and network inspection such that:
All the inbound traffic to ALLEN’s public applications and APIs enters the network through a centralized ingress VPC.
All the outbound traffic i.e. the traffic destined to the internet from any VPC in ALLEN’s network is routed through a centralized egress VPC.
All the ingress traffic is routed to the centralized inspection VPC in the security account before reaching the application’s ELB.
All the egress traffic is routed to the centralized inspection VPC before reaching the NAT Gateway.
VPC to VPC connectivity considerations:
AWS provides two VPC-to-VPC connectivity models:
Many to Many: One way to achieve this is via VPC Peering, where traffic between each pair of VPCs is managed individually.
Pros:
Highest Performance: Offers the best performance for inter-VPC communication.
Cost-Effective: Generally cheaper compared to other connectivity options, especially with a limited number of VPCs (single digits).
Simple for Few VPCs: Ideal when dealing with a small number of VPCs.
Cons:
Lack of Transitive Routing: VPC Peering does not support transitive routing. This can lead to a complex network mesh if many VPCs need to be interconnected.
Complex On-Premises Integration: Connecting on-premises networks requires individual connections to each VPC, complicating the setup.
Increased Network Complexity: Each additional VPC increases overall network complexity.
Hub and Spoke: This involves routing all the VPC to VPC traffic through a central hub, which handles the traffic routing based on the defined rules. AWS Transit Gateway is an AWS service which helps in achieving this by acting as a hub to provide centralized routing. Transit Gateway route tables let us control how traffic is routed among all the connected spoke networks.
Pros:
Reduced Complexity: Simplifies network design and operations by centralizing routing.
Centralized Management: Provides a single point for managing traffic routing between VPCs.
Supports Transitive Routing: Facilitates communication between multiple VPCs through the central hub.
Enhanced Connectivity Options: Allows integration with on-premises networks, Direct Connect, and cross-cloud VPN connections at the Transit Gateway.
Cons:
Data Processing Costs: Charges apply for data processed through the Transit Gateway.
Hourly Charges: Costs for provisioning the Transit Gateway and its attachments to each VPC.
We chose the hub-and-spoke network architecture as it reduces the network complexity, provides ease of operations and enables us to centrally manage the connectivity between different networks.
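The difference in network complexity between the two models can be illustrated by counting connections. The following is a minimal sketch, assuming every VPC needs to reach every other VPC; the VPC counts are arbitrary examples.

```python
def full_mesh_peerings(vpc_count: int) -> int:
    """VPC peering connections needed so that every VPC reaches every other VPC.

    Peering is not transitive, so each pair of VPCs needs its own connection.
    """
    return vpc_count * (vpc_count - 1) // 2


def hub_and_spoke_attachments(vpc_count: int) -> int:
    """Transit Gateway attachments needed: one per VPC."""
    return vpc_count


for n in (3, 10, 25):
    print(f"{n} VPCs: {full_mesh_peerings(n)} peerings "
          f"vs {hub_and_spoke_attachments(n)} attachments")
```

With a handful of VPCs the two numbers are close, which is why peering remains attractive for small networks. The mesh grows quadratically, while the hub-and-spoke count grows linearly.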
Opting for Gateway Load Balancer with self-hosted firewall appliances instead of AWS Network Firewall:
Some of the core requirements for the firewall solution were intrusion detection and prevention (IDPS), deep packet inspection (DPI), URL filtering, outbound traffic filtering to prevent data loss, etc. AWS Network Firewall provides all of these capabilities and internally uses AWS Gateway Load Balancer to load balance the requests. However, we opted for Gateway Load Balancer with a fleet of virtual firewall appliances for the following reasons:
Using AWS Gateway Load Balancer gave us more control over the virtual firewall appliances, the flexibility to use firewalls from different vendors that provide more advanced features, and a lower overall price compared to AWS Network Firewall.
Using third-party firewall appliances behind Gateway Load Balancer allowed us to use DNS-based routing to the ELBs in the spoke VPC. Using AWS Network Firewall only allowed us to route to IP addresses. Since ALBs do not support static IP addresses, using an NLB as the target in the spoke VPC was the only option.
Our legacy IT team had been using firewall appliances for our on-premises network and our physical coaching centres across India. Hence, we decided to continue using similar firewall appliances for cloud network inspection.
Centralized Ingress Traffic Inspection:
At ALLEN, we have implemented a multi-account strategy using the AWS Control Tower landing zone, where accounts such as the Networking Account, Security Account and Audit Account have been created within the Core and Infrastructure OUs with strict guardrails. In this multi-account strategy, the Networking Account is created in the Infrastructure OU and is used to set up networking resources such as the Transit Gateway, NAT Gateways, Internet Gateways, VPN tunnels to on-premises networks, etc.
The centralized ingress and egress VPCs are also created in ALLEN’s Networking Account, along with other network resources such as Transit Gateway, CloudFront, AWS WAF, etc.
The Ingress VPC has 3 sets of subnets across 3 Availability Zones:
Firewall Subnets: These subnets are used to create the Gateway Load Balancer Endpoints which route the traffic to the Gateway Load Balancer using AWS PrivateLink.
Protected Subnets: These subnets are used to create public facing Elastic Load Balancers.
TGW Subnets: These subnets are used to create Transit Gateway attachments for the Ingress VPC to connect it to the Transit Gateway.
An Internet Gateway (IGW) edge association is created in the Ingress VPC with routes to the GWLB endpoints, i.e. any traffic coming from the IGW and destined for the ALB subnets (the Protected Subnets) is routed to the GWLB endpoints.
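The effect of the edge association can be sketched as a route lookup. VPC route tables pick the most specific (longest prefix) matching route, so routes for the protected subnets override the VPC's local route. All CIDRs and endpoint names below are hypothetical placeholders, not ALLEN's actual configuration.

```python
import ipaddress

# Hypothetical route table associated with the Internet Gateway (edge association).
IGW_EDGE_ROUTES = [
    ("10.0.0.0/16", "local"),        # Ingress VPC CIDR (assumed)
    ("10.0.1.0/24", "gwlbe-az-a"),   # protected subnet in AZ a -> GWLB endpoint in AZ a
    ("10.0.2.0/24", "gwlbe-az-b"),   # protected subnet in AZ b -> GWLB endpoint in AZ b
    ("10.0.3.0/24", "gwlbe-az-c"),   # protected subnet in AZ c -> GWLB endpoint in AZ c
]


def next_hop(routes, destination):
    """Return the target of the most specific route that matches the destination."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes
        if dest in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]


# Traffic for a load balancer node in a protected subnet is diverted to the GWLB endpoint.
print(next_hop(IGW_EDGE_ROUTES, "10.0.2.15"))
# Traffic for any other address in the VPC still follows the local route.
print(next_hop(IGW_EDGE_ROUTES, "10.0.9.9"))
```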
Ingress traffic flow:
A user makes a request to ALLEN’s Web Application or API.
The user request is routed to the nearest CloudFront edge location, where AWS WAF is enabled. AWS WAF blocks malicious Layer 7 traffic and protects against Layer 7 DDoS attacks at the edge location itself. Once the request is allowed by AWS WAF, it is sent to the AWS ELB, entering ALLEN’s Ingress VPC via the Internet Gateway.
The route table associated with the Internet Gateway has routes for the protected subnets that direct requests destined for the ELB to the Gateway Load Balancer endpoints.
The Gateway Load Balancer endpoints route traffic (using AWS PrivateLink) to the GWLB, which then encapsulates the original IP traffic with a GENEVE header. For TCP/UDP flows, GWLB hashes the 5-tuple (source IP, source port, destination IP, destination port and transport protocol) to make sure that both directions of a flow (i.e., source to destination, and destination to source) are consistently forwarded to the same target appliance.
The firewall appliances inspect the traffic. If allowed, the traffic is returned to the Gateway Load Balancer endpoints. The Firewall Subnet route table forwards the inspected and allowed requests to the Public ELB.
The public ELB has the static IPs of the internal NLB in its target group. The load balancer subnets' route table routes the traffic to the Transit Gateway attachment.
The request reaches the Transit Gateway, and the Transit Gateway route table routes the request to the spoke VPC attachment.
Finally, the request is routed to the spoke VPC NLB which load balances the requests to the backend applications.
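The flow stickiness described in the GWLB step above can be illustrated with a direction-independent hash over the 5-tuple. This is only a conceptual sketch: the hash function, the way the endpoints are ordered and the appliance names are assumptions made for illustration, not GWLB's internal algorithm.

```python
import hashlib

APPLIANCES = ["fw-1", "fw-2", "fw-3"]  # hypothetical firewall appliance names


def pick_appliance(src_ip, src_port, dst_ip, dst_port, protocol, appliances=APPLIANCES):
    """Map a flow to an appliance so that both directions choose the same one."""
    # Sorting the two endpoints makes the key identical for A->B and B->A.
    first, second = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{first[0]}:{first[1]}|{second[0]}:{second[1]}|{protocol}".encode()
    digest = hashlib.sha256(key).digest()
    return appliances[int.from_bytes(digest[:8], "big") % len(appliances)]


# Example addresses are placeholders from documentation and private ranges.
forward = pick_appliance("198.51.100.7", 51514, "10.0.2.15", 443, "tcp")
reverse = pick_appliance("10.0.2.15", 443, "198.51.100.7", 51514, "tcp")
print(forward == reverse)
```

Keeping both directions on one appliance matters because stateful firewalls need to see the whole conversation to track connection state.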
Centralized Egress Traffic Inspection:
It was crucial for ALLEN to maintain a common policy for any traffic originating from the ALLEN network and destined for the internet. This lets our Cloud security team decide who has access to internet egress, and which trusted third-party domains and Docker repositories the applications can access over the internet.
To achieve this, we created an Egress VPC in our Networking Account and routed all the egress traffic to this VPC using AWS Transit Gateway. The Egress VPC consists of 3 sets of subnets across 3 Availability Zones:
Public Subnets: These subnets have a default route to the Internet gateway and are used to create the NAT Gateways.
Firewall Subnets: These subnets are used to create the Gateway Load Balancer endpoints.
TGW Subnets: These subnets are used to create Transit Gateway attachments for the Egress VPC to connect it to the Transit Gateway.
Outbound traffic inspection flow:
Traffic originates from an application subnet in one of the spoke VPCs. The subnet route table has a default route pointing to the Transit Gateway attachment.
The Transit Gateway route table associated with the spoke VPC attachment has a default route to the Egress VPC.
Traffic arrives at the Egress VPC’s TGW subnet.
The TGW subnet’s route table forwards the internet traffic to the Gateway Load Balancer endpoint.
The Gateway Load Balancer endpoints route traffic (using AWS PrivateLink) to the GWLB as discussed in the ingress inspection flow. If the request is allowed, it returns to the Gateway Load Balancer endpoint.
The firewall subnet route table has a default route to the NAT Gateway within the same AZ.
NAT Gateway performs source IP address translation and routes traffic to Internet Gateway. From there traffic is sent out to the Internet.
When the return traffic comes back from the internet, it first hits the Internet Gateway. Internet Gateway routes this traffic back to the same NAT Gateway which did the IP translation.
The route table of the public subnet has a route for the spoke VPC IPs, which sends the traffic to the Gateway Load Balancer endpoint within the same Availability Zone.
GWLBE forwards the return traffic to the GWLB. GWLB encapsulates the traffic with a GENEVE header and forwards it to the same virtual appliance which was chosen during the outbound traffic inspection.
If the traffic is allowed, it returns to the GWLB endpoint. The Firewall subnet route table sends the traffic to the Transit Gateway from where it is routed back to the Spoke VPC.
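The outbound and return paths above can be traced with a small model of the route tables involved. This is a simplified sketch: the table names, the spoke CIDR and the example addresses are hypothetical, each route table is reduced to the routes mentioned in the flow, and the firewall inspection itself is not modeled.

```python
import ipaddress

SPOKE_CIDR = "10.1.0.0/16"  # hypothetical spoke VPC range

# Each route table is a list of (destination CIDR, next hop).
ROUTE_TABLES = {
    "spoke-app-subnet":       [("0.0.0.0/0", "transit-gateway")],
    "tgw-spoke-route-table":  [("0.0.0.0/0", "egress-vpc-attachment")],
    "egress-tgw-subnet":      [("0.0.0.0/0", "gwlb-endpoint")],
    "egress-firewall-subnet": [("0.0.0.0/0", "nat-gateway"),
                               (SPOKE_CIDR, "transit-gateway")],
    "egress-public-subnet":   [("0.0.0.0/0", "internet-gateway"),
                               (SPOKE_CIDR, "gwlb-endpoint")],
}


def next_hop(table_name, destination):
    """Longest-prefix match within one route table."""
    dest = ipaddress.ip_address(destination)
    candidates = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in ROUTE_TABLES[table_name]
        if dest in ipaddress.ip_network(cidr)
    ]
    return max(candidates, key=lambda c: c[0].prefixlen)[1]


def trace(table_names, destination):
    """Next hop chosen by each route table the packet consults, in order."""
    return [next_hop(name, destination) for name in table_names]


# Route tables consulted on the way out and on the way back.
OUTBOUND_TABLES = ["spoke-app-subnet", "tgw-spoke-route-table", "egress-tgw-subnet",
                   "egress-firewall-subnet", "egress-public-subnet"]
RETURN_TABLES = ["egress-public-subnet", "egress-firewall-subnet"]

outbound = trace(OUTBOUND_TABLES, "203.0.113.10")  # destination on the internet
inbound = trace(RETURN_TABLES, "10.1.4.20")        # destination back in the spoke VPC

print(" -> ".join(outbound))
print(" -> ".join(inbound))
```

The same firewall and public subnet route tables send a packet in different directions depending on its destination, which is what forces both the outbound request and the return traffic through the inspection layer.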
Enforcing the centralized network inspection flow:
To ensure that north-south traffic to or from any workload VPC follows the network flow described above, we have defined service control policies (SCPs) on the AWS OUs, which block users from creating or modifying network resources such as route tables, NAT gateways, transit gateway attachments, etc. Access for creating and modifying network resources is only provided to ALLEN’s Infrastructure team.
We have also enabled some AWS Control Tower detective and preventive guardrails, along with custom Config rules, which ensure that we always follow AWS best practices while configuring the resources involved in the above architecture.
Some of the critical configs have auto-remediation set up using SSM documents to fix any non-compliant resources in minimal time.
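As an illustration of the SCP approach described above, a policy of this kind could look roughly like the following sketch, built here as a Python dictionary and printed as JSON. The role name, the statement ID and the exact list of denied actions are assumptions for illustration, not ALLEN's actual policy.

```python
import json

# Hypothetical IAM role used by the infrastructure team in every account.
NETWORK_ADMIN_ROLE = "arn:aws:iam::*:role/InfraNetworkAdmin"

scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyNetworkChangesExceptInfraTeam",
            "Effect": "Deny",
            "Action": [
                "ec2:CreateRoute",
                "ec2:ReplaceRoute",
                "ec2:DeleteRoute",
                "ec2:CreateRouteTable",
                "ec2:DeleteRouteTable",
                "ec2:AssociateRouteTable",
                "ec2:DisassociateRouteTable",
                "ec2:CreateNatGateway",
                "ec2:DeleteNatGateway",
                "ec2:CreateInternetGateway",
                "ec2:AttachInternetGateway",
                "ec2:CreateTransitGatewayVpcAttachment",
                "ec2:ModifyTransitGatewayVpcAttachment",
                "ec2:DeleteTransitGatewayVpcAttachment",
            ],
            "Resource": "*",
            # The deny applies to every principal except the assumed admin role.
            "Condition": {
                "ArnNotLike": {"aws:PrincipalARN": NETWORK_ADMIN_ROLE}
            },
        }
    ],
}

print(json.dumps(scp, indent=2))
```

Because SCPs only set the maximum permissions available in an account, a deny like this blocks the listed actions even for principals whose IAM policies would otherwise allow them.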
The path forward:
ALLEN is continuously enhancing its security posture across its network on AWS Cloud, on-premises, and with other cloud providers. We plan to further enhance the east-west traffic inspection between Spoke VPCs and traffic between the AWS Network and the on-premises network. Additionally, ALLEN is exploring AI and ML-driven capabilities of various Layer 7 WAF and Next Generation Firewall solutions available in the market. These solutions will further enhance our security posture and expedite the remediation process. We are also implementing SIEM and SOAR solutions to achieve real-time infrastructure and user awareness, enabling accurate threat detection, analysis, and automated threat remediation.
All these efforts are aimed at providing secure applications to students using ALLEN’s platform for their NEET and JEE preparation journeys.