VPC Flow Logs Analysis in Athena: A Step-by-Step Guide
How to Analyze VPC Flow Logs for Security
This article covers the technical side of analyzing AWS VPC flow logs for security and intrusion detection. We define what VPC flow logs are and list some use cases for using them to secure your instances.
We cover the data ingestion and preparation process here: Analyzing Amazon VPC Flow Logs Using SQL
We’ll discuss ways to defend instances from malicious software and unauthorized traffic. Then we will explain how to enable VPC flow logs and how to access them through CloudWatch.
We'll also go through some use cases for analyzing VPC flow logs with Amazon Athena, an interactive query service well suited to analyzing flow logs with standard SQL.
What Are VPC Flow Logs?
Amazon VPC flow logs capture information about the IP traffic going to and from the network interfaces in your VPC, so they can serve as the main source of information about your VPC's network. Flow logs can capture traffic at the level of a VPC, a subnet, or an elastic network interface (ENI). Flow log data is collected outside the path of your network traffic, so collecting it doesn't affect your network performance.
Flow log data can be published to Amazon CloudWatch Logs or Amazon S3, where you can retrieve it to monitor the traffic reaching your instance, determine its direction, and decide how to act on it.
Using flow logs, you can see where most of your traffic comes from and detect when specific traffic fails to reach an instance.
Before the introduction of VPC flow logs, customers collected network flow logs by installing agents on their instances, limiting them in terms of how they could view the network flows.
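To make the record structure concrete, here is a minimal Python sketch that splits one space-delimited flow log line into named fields. The field names follow the columns of the Athena table defined later in this article, the helper name `parse_flow_record` is hypothetical, and the sample line is illustrative data in the default record format rather than a real log entry.

```python
# Column order of a default-format flow log record, using this article's column names.
FIELDS = (
    "version", "account", "interfaceid", "sourceaddress", "destinationaddress",
    "sourceport", "destinationport", "protocol", "numpackets", "numbytes",
    "starttime", "endtime", "action", "logstatus",
)
INT_FIELDS = {"version", "sourceport", "destinationport", "protocol",
              "numpackets", "numbytes", "starttime", "endtime"}


def parse_flow_record(line: str) -> dict:
    """Split one space-delimited flow log line into a dict of typed fields."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record = {}
    for name, value in zip(FIELDS, parts):
        # Records with no captured data carry "-" in place of numeric values.
        if name in INT_FIELDS and value != "-":
            record[name] = int(value)
        else:
            record[name] = value
    return record


# Illustrative record: accepted TCP (protocol 6) traffic to SSH port 22.
sample = ("2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
          "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")
record = parse_flow_record(sample)
print(record["sourceaddress"], "->", record["destinationaddress"],
      record["destinationport"], record["action"])
```

Keeping "-" as a string instead of failing on it is a design choice here, so that records without traffic data can still be counted or filtered out later.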
Benefits of VPC Flow Logs Analysis
Flow logs can help you analyze incoming network requests and decide whether to accept or reject them, which in turn helps you improve your Access Control List (ACL) rules. Using VPC flow logs, you can create alerts for unauthorized IPs and destination port redirects, which could indicate malicious software trying to access your network.
By analyzing VPC flow logs, you can detect threats such as port scanning. You can also monitor traffic flows to confirm that your ACL rules behave as intended, and use the flow logs to diagnose and troubleshoot connection issues.
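One simple way to look for port scanning is to count how many distinct destination ports each source address was rejected on. The sketch below assumes the records are already available as dictionaries keyed by the column names of this article's Athena table. The threshold value and the sample rows are made up for illustration, and a real detector would also need a time window.

```python
from collections import defaultdict

# Hypothetical threshold: flag a source rejected on at least this many distinct ports.
PORT_SCAN_THRESHOLD = 3


def find_port_scanners(records, threshold=PORT_SCAN_THRESHOLD):
    """Return {source address: sorted rejected ports} for suspected scanners."""
    rejected_ports = defaultdict(set)
    for rec in records:
        if rec["action"] == "REJECT":
            rejected_ports[rec["sourceaddress"]].add(rec["destinationport"])
    return {
        src: sorted(ports)
        for src, ports in rejected_ports.items()
        if len(ports) >= threshold
    }


# Made-up sample rows; addresses come from documentation-only IP ranges.
rows = [
    {"sourceaddress": "203.0.113.7", "destinationport": 21, "action": "REJECT"},
    {"sourceaddress": "203.0.113.7", "destinationport": 23, "action": "REJECT"},
    {"sourceaddress": "203.0.113.7", "destinationport": 3306, "action": "REJECT"},
    {"sourceaddress": "198.51.100.4", "destinationport": 443, "action": "ACCEPT"},
    {"sourceaddress": "198.51.100.4", "destinationport": 8080, "action": "REJECT"},
]
print(find_port_scanners(rows))  # {'203.0.113.7': [21, 23, 3306]}
```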
It is important to understand the difference between security groups and network ACLs. A security group acts as a firewall for an instance, controlling the traffic allowed in and out of it. A network ACL acts as a firewall for a subnet, controlling the traffic moving in and out of that subnet.
You can also monitor remote logins on ports such as SSH and RDP. These ports should only be reachable from trusted sources, so analyzing their traffic in the flow logs helps you tighten security and detect suspicious activity.
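As a rough illustration of the trusted-source check, the following Python sketch flags accepted SSH or RDP flows whose source address falls outside a list of trusted networks. The CIDR ranges, the function names, and the sample rows are all hypothetical. The record keys reuse the column names of this article's Athena table.

```python
import ipaddress

REMOTE_LOGIN_PORTS = {22, 3389}  # SSH and RDP

# Hypothetical trusted ranges; replace with your own office or VPN CIDRs.
TRUSTED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.0.2.0/24"),
]


def is_trusted(address: str) -> bool:
    """Return True if the address belongs to any trusted network."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in TRUSTED_NETWORKS)


def untrusted_remote_logins(records):
    """Return accepted SSH/RDP flows whose source is outside the trusted ranges."""
    return [
        rec for rec in records
        if rec["destinationport"] in REMOTE_LOGIN_PORTS
        and rec["action"] == "ACCEPT"
        and not is_trusted(rec["sourceaddress"])
    ]


# Made-up sample rows for illustration.
flows = [
    {"sourceaddress": "10.1.2.3", "destinationport": 22, "action": "ACCEPT"},
    {"sourceaddress": "203.0.113.7", "destinationport": 22, "action": "ACCEPT"},
    {"sourceaddress": "203.0.113.7", "destinationport": 3389, "action": "REJECT"},
    {"sourceaddress": "198.51.100.4", "destinationport": 443, "action": "ACCEPT"},
]
for flow in untrusted_remote_logins(flows):
    print("suspicious login flow from", flow["sourceaddress"])
```

Only accepted flows are reported here because a rejected attempt was already blocked. Counting rejected attempts separately may still be worthwhile for spotting brute-force activity.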
Analyzing VPC Flow Logs for Security in Athena – Step by Step
Before beginning, make sure you can access your VPC flow logs. In the CloudWatch console, select the log group and then the log stream to view the records.
Once the flow logs are created and being published, you can analyze them at scale with Athena, which queries log files stored in Amazon S3. Let's start by configuring Amazon Athena to query the data so we can try different security scenarios.
Step 1
Copy this DDL code into the Athena console:
CREATE EXTERNAL TABLE IF NOT EXISTS vpc_flow_logs (
  version int,
  account string,
  interfaceid string,
  sourceaddress string,
  destinationaddress string,
  sourceport int,
  destinationport int,
  protocol int,
  numpackets int,
  numbytes bigint,
  starttime int,
  endtime int,
  action string,
  logstatus string
)
PARTITIONED BY (dt string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LOCATION 's3://example_bucket/prefix/AWSLogs/{subscribe_account_id}/vpcflowlogs/{region_code}/'
TBLPROPERTIES ("skip.header.line.count"="1");
Step 2
Change the LOCATION value s3://example_bucket/prefix/AWSLogs/{subscribe_account_id}/vpcflowlogs/{region_code}/ to the S3 path of the logs you want to analyze.
Step 3
Now, you need to run the above query in the Athena console, which will register a table called vpc_flow_logs.
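Because the table is partitioned by dt, Athena typically returns no rows until the partitions have been registered. The Python sketch below builds an ALTER TABLE ... ADD PARTITION statement for one day of logs. It rests on two assumptions that are not stated in this article: that the logs sit under year/month/day subfolders of the table location, and that dt values are formatted as YYYY-MM-DD. The helper name is hypothetical, and the placeholders in the S3 path are left exactly as they appear in the DDL.

```python
from datetime import date


def add_partition_statement(table: str, base_location: str, day: date) -> str:
    """Build an ALTER TABLE statement registering one day of logs as partition dt."""
    base = base_location.rstrip("/")
    return (
        f"ALTER TABLE {table} "
        # Assumption: dt partition values use the YYYY-MM-DD format.
        f"ADD IF NOT EXISTS PARTITION (dt='{day:%Y-%m-%d}') "
        # Assumption: logs for each day live under a YYYY/MM/DD subfolder.
        f"LOCATION '{base}/{day:%Y/%m/%d}/'"
    )


# The braces are literal placeholders copied from the DDL, not Python format fields.
location = "s3://example_bucket/prefix/AWSLogs/{subscribe_account_id}/vpcflowlogs/{region_code}/"
statement = add_partition_statement("vpc_flow_logs", location, date(2024, 1, 15))
print(statement)
```

The generated statement can be pasted into the Athena console after replacing the placeholders with your own account ID and region code.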
Now, there are many security use cases for analyzing flow logs.
For example, you can monitor SSH and RDP traffic. RDP is typically used for Windows instances, while SSH is used for Linux instances. The query below returns the activity on the SSH and RDP ports, where 22 is the SSH port and 3389 is the RDP port.
SELECT *
FROM vpc_flow_logs
WHERE sourceport IN (22, 3389)
   OR destinationport IN (22, 3389)
ORDER BY starttime ASC
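If you want to check the filter logic on toy data before scanning real logs, the same WHERE clause can be rehearsed locally. The sketch below uses Python's built-in sqlite3 module with a cut-down table and made-up rows. SQLite is not Athena, so this only checks the query's logic, not Athena-specific behavior.

```python
import sqlite3

# Same filter and ordering as the Athena query, with an explicit column list.
QUERY = """
SELECT sourceaddress, destinationaddress, destinationport, starttime
FROM vpc_flow_logs
WHERE sourceport IN (22, 3389) OR destinationport IN (22, 3389)
ORDER BY starttime ASC
"""

conn = sqlite3.connect(":memory:")
# Cut-down version of the article's table: only the columns this check needs.
conn.execute("""
    CREATE TABLE vpc_flow_logs (
        sourceaddress TEXT, destinationaddress TEXT,
        sourceport INTEGER, destinationport INTEGER,
        starttime INTEGER, action TEXT
    )
""")
# Made-up rows: one RDP attempt, one HTTPS flow, one SSH login.
conn.executemany(
    "INSERT INTO vpc_flow_logs VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("203.0.113.7", "10.0.1.5", 51000, 3389, 1700000300, "REJECT"),
        ("198.51.100.4", "10.0.1.5", 44321, 443, 1700000100, "ACCEPT"),
        ("192.0.2.10", "10.0.1.5", 40022, 22, 1700000200, "ACCEPT"),
    ],
)
matches = conn.execute(QUERY).fetchall()
for row in matches:
    print(row)
```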
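Another scenario is to monitor traffic on web application ports. Let's assume that your application serves requests on a specific port. The query below returns the top 10 IP addresses by total bytes transferred, counting each address both as a source and as a destination. As written it covers all ports; to focus on your application, add a filter on sourceport or destinationport to each inner SELECT.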
SELECT ip, sum(bytes) AS total_bytes
FROM (
  SELECT destinationaddress AS ip, sum(numbytes) AS bytes
  FROM vpc_flow_logs
  GROUP BY 1
  UNION ALL
  SELECT sourceaddress AS ip, sum(numbytes) AS bytes
  FROM vpc_flow_logs
  GROUP BY 1
)
GROUP BY ip
ORDER BY total_bytes DESC
LIMIT 10
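The aggregation in this query credits each flow's bytes to both its source and its destination address, then ranks the addresses by total. A minimal Python sketch of the same idea, using a hypothetical helper and made-up rows keyed by the table's column names, may make that double counting easier to see.

```python
from collections import Counter


def top_talkers(records, limit=10):
    """Rank addresses by bytes transferred, as either source or destination."""
    totals = Counter()
    for rec in records:
        # Each flow counts toward both endpoints, like the UNION ALL in the query.
        totals[rec["sourceaddress"]] += rec["numbytes"]
        totals[rec["destinationaddress"]] += rec["numbytes"]
    return totals.most_common(limit)


# Made-up sample flows for illustration.
sample_flows = [
    {"sourceaddress": "10.0.1.5", "destinationaddress": "10.0.2.9", "numbytes": 100},
    {"sourceaddress": "10.0.1.5", "destinationaddress": "10.0.3.7", "numbytes": 50},
    {"sourceaddress": "10.0.3.7", "destinationaddress": "10.0.2.9", "numbytes": 10},
]
print(top_talkers(sample_flows, limit=2))  # [('10.0.1.5', 150), ('10.0.2.9', 110)]
```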
Now suppose you want to find the servers that receive the most HTTPS traffic. The query below sums the packets received on HTTPS port 443 over the last seven days, using packet count as a proxy for request volume, and returns the top 10 destination addresses.
SELECT SUM(numpackets) AS packetcount, destinationaddress
FROM vpc_flow_logs
WHERE destinationport = 443
  -- dt is the table's partition column; this assumes its values are formatted as YYYY-MM-DD
  AND date(dt) > current_date - interval '7' day
GROUP BY destinationaddress
ORDER BY packetcount DESC
LIMIT 10;
Conclusion
As AWS environments grow in popularity, they also grow more complex, which calls for better tools and techniques. In this tutorial, we explained what VPC flow logs are and how to analyze them to track traffic to your instances, improve security management, and detect malicious software and events so that teams can identify and fix problems.
For more information, visit Amazon's official documentation for VPC flow logs, which covers troubleshooting and how to publish flow logs to Amazon S3 and CloudWatch. It is a useful starting point for using flow logs in your own system.