Cowrie Honeypot Attack Data Analysis
SSH-only honeypot running on AWS EU (Stockholm) for a few days – analyzing real attack traffic
1. Intro: Why Even Bother With a Honeypot?
In cybersecurity, honeypots serve as digital decoys: intentionally vulnerable systems designed to attract and monitor malicious activity. It's like leaving a fake wallet on the street to see who tries to pick it up. By deploying a honeypot, we can observe real-world attacker behavior, including brute-force attempts, malware delivery patterns, and botnet activity.
For this project, we spun up a Cowrie SSH honeypot on an AWS Free Tier instance. Cowrie is a medium-interaction honeypot that emulates a real SSH server, allowing us to capture and analyze attack patterns while keeping our actual systems safe. The data collected gives a direct view of the constant barrage of automated attacks on the internet.
Spoiler: the noise on the internet never sleeps. Within minutes of deployment, the honeypot was receiving connection attempts from around the globe.
2. Server Setup: AWS Free Tier + Ubuntu = Good Enough
We started with an EC2 t2.micro instance (Free Tier eligible) running Ubuntu 22.04 LTS. This setup proved more than sufficient for our needs, demonstrating that you don't need expensive infrastructure to run an effective honeypot.
Security was a priority from the start. We created a dedicated PEM key for SSH access and implemented strict security group rules. The instance was placed in the eu-north-1 (Stockholm) region, which provided an interesting perspective on global attack patterns.
3. Honeypot Setup: Deploying Cowrie
Isolation is crucial when running a honeypot. We created a dedicated Linux user 'cowrie' to run the honeypot service, minimizing potential security risks to the host system.
The setup process involved:
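# Create cowrie user and set up environment
sudo useradd -m -d /home/cowrie -s /bin/bash cowrie
sudo su - cowrie
# Clone and set up Cowrie
git clone https://github.com/cowrie/cowrie.git
cd cowrie
python3 -m venv cowrie-env
source cowrie-env/bin/activate
pip install -r requirements.txt
# Configure Cowrie (the config template lives in the etc/ directory)
cp etc/cowrie.cfg.dist etc/cowrie.cfg
# Edit etc/cowrie.cfg to set:
# [ssh]
# listen_endpoints = tcp:22:interface=0.0.0.0
# [telnet]  (only relevant once Telnet is enabled; this deployment was SSH-only)
# listen_endpoints = tcp:23:interface=0.0.0.0
# Note: ports below 1024 need extra privileges when Cowrie runs as the
# unprivileged cowrie user (for example authbind or a port redirect), and
# the real sshd must be moved off port 22 first.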
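Before starting the service, it can help to read the configured endpoints back and confirm they match what the security group exposes. The following is a minimal sketch, assuming the INI-style layout and the positional `tcp:PORT:interface=...` endpoint form shown above. `listening_ports` is a hypothetical helper name and `SAMPLE_CFG` is illustrative text, not the actual deployed config.

```python
import configparser


def listening_ports(cfg_text):
    """Return {section: port} for every section that defines listen_endpoints."""
    # Interpolation is disabled so stray "%" characters in values are harmless.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    ports = {}
    for section in parser.sections():
        endpoints = parser.get(section, "listen_endpoints", fallback=None)
        if not endpoints:
            continue
        # Endpoint strings look like "tcp:22:interface=0.0.0.0";
        # the second colon-separated field is the port.
        first_endpoint = endpoints.split()[0]
        ports[section] = int(first_endpoint.split(":")[1])
    return ports


# Illustrative config text mirroring the settings above
SAMPLE_CFG = """
[ssh]
listen_endpoints = tcp:22:interface=0.0.0.0
[telnet]
listen_endpoints = tcp:23:interface=0.0.0.0
"""

print(listening_ports(SAMPLE_CFG))
```

Endpoint strings written in other forms (for example with a `port=` keyword) would need extra parsing, which this sketch does not attempt.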
4. Log Parsing: Structured Data Extraction
Cowrie generates detailed JSON logs containing rich information about each connection attempt. The Python parser extracted and enriched several key metrics from these logs:
- Session data including timestamps, source IPs, and attempted credentials
- ASN and organization information using ipwhois
- Geographic location data from ipinfo.io
- Open ports and services from Shodan API
Here's the log parser, which counts connections per source IP and enriches each address:
import glob
import ipaddress
import json
from collections import defaultdict

import requests
import shodan
from ipwhois import IPWhois

LOG_DIR_PATTERN = "/home/cowrie/cowrie/var/log/cowrie/cowrie.json*"
SHODAN_API_KEY = ""  # set your own key here; Shodan lookups need one
api = shodan.Shodan(SHODAN_API_KEY)


def is_valid_ip(ip):
    """Return True only for globally routable addresses."""
    try:
        return ipaddress.ip_address(ip).is_global
    except ValueError:
        return False


def get_asn_info(ip):
    """Look up ASN, organization and network name via RDAP."""
    try:
        res = IPWhois(ip).lookup_rdap()
        asn = res.get("asn", "N/A")
        # asn_description names the AS holder; network.name is the netblock name
        org = res.get("asn_description", "N/A")
        net_name = (res.get("network") or {}).get("name", "N/A")
        return asn, org, net_name
    except Exception:
        # Lookup failed (network error, rate limit, unallocated range, ...)
        return "N/A", "N/A", "N/A"


def get_geoip(ip):
    """Fetch country, region, city and coordinates from ipinfo.io."""
    try:
        res = requests.get(f"https://ipinfo.io/{ip}/json", timeout=5)
        if res.status_code == 200:
            data = res.json()
            # "loc" is a "latitude,longitude" string
            loc = data.get("loc", ",").split(',')
            return {
                "country": data.get("country", "N/A"),
                "region": data.get("region", "N/A"),
                "city": data.get("city", "N/A"),
                "latitude": loc[0],
                "longitude": loc[1]
            }
    except Exception:
        pass
    return {"country": "N/A", "region": "N/A", "city": "N/A", "latitude": "N/A", "longitude": "N/A"}


def get_shodan_info(ip):
    """Return the open ports and product names Shodan has recorded for the IP."""
    try:
        result = api.host(ip)
        open_ports = result.get("ports", [])
        services = [str(service.get("product", "unknown")) for service in result.get("data", [])]
        return open_ports, services
    except shodan.APIError as e:
        print(f" Shodan API error for IP {ip}: {e}")
        return "N/A", "N/A"
    except Exception:
        return "N/A", "N/A"


def main():
    # Count new sessions per source IP across all (rotated) log files
    ip_counts = defaultdict(int)
    log_files = sorted(glob.glob(LOG_DIR_PATTERN))
    for log_file in log_files:
        with open(log_file, 'r') as f:
            for line in f:
                try:
                    log = json.loads(line)
                    if log.get("eventid") == "cowrie.session.connect":
                        ip = log.get("src_ip")
                        if is_valid_ip(ip):
                            ip_counts[ip] += 1
                except json.JSONDecodeError:
                    continue
    print(f"\nTotal unique IPs: {len(ip_counts)}\n")
    # Enrich and report each IP, most active first
    for ip, count in sorted(ip_counts.items(), key=lambda x: x[1], reverse=True):
        asn, org, net_name = get_asn_info(ip)
        geo = get_geoip(ip)
        ports, services = get_shodan_info(ip)
        print(f"[+] IP: {ip} - {count} attempts")
        print(f" ASN: {asn}")
        print(f" Org: {org}")
        print(f" Net Name: {net_name}")
        print(f" Location: {geo['city']}, {geo['region']}, {geo['country']}")
        print(f" Lat/Lon: {geo['latitude']}, {geo['longitude']}")
        print(f" Open Ports: {ports}")
        print(f" Services: {services}\n")


if __name__ == "__main__":
    main()
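The parser above only counts connections, but the same logs also carry the attempted credentials mentioned earlier. As a rough sketch, assuming Cowrie's `cowrie.login.failed` and `cowrie.login.success` events with their `username` and `password` fields, a tally could look like this. `count_credentials` is a hypothetical helper, and the sample lines are made up for illustration (the IPs come from documentation ranges), not taken from the captured data.

```python
import json
from collections import Counter

LOGIN_EVENTS = {"cowrie.login.failed", "cowrie.login.success"}


def count_credentials(lines):
    """Tally (username, password) pairs from Cowrie JSON log lines."""
    pairs = Counter()
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or partially written lines
        if not isinstance(event, dict):
            continue
        if event.get("eventid") in LOGIN_EVENTS:
            pairs[(event.get("username"), event.get("password"))] += 1
    return pairs


# Illustrative log lines, not real captured traffic
SAMPLE_LINES = [
    '{"eventid": "cowrie.session.connect", "src_ip": "203.0.113.5"}',
    '{"eventid": "cowrie.login.failed", "src_ip": "203.0.113.5", "username": "root", "password": "123456"}',
    '{"eventid": "cowrie.login.failed", "src_ip": "198.51.100.7", "username": "root", "password": "123456"}',
    '{"eventid": "cowrie.login.success", "src_ip": "198.51.100.7", "username": "admin", "password": "admin"}',
    'this line is not JSON',
]

for (user, password), n in count_credentials(SAMPLE_LINES).most_common():
    print(f"{user}:{password} tried {n} time(s)")
```

Passing an open log file instead of `SAMPLE_LINES` works the same way, since the function only iterates over lines.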
5. Pew Pew Map: Attack Visualizer
To visualize the global nature of the attacks, we created an interactive map using Leaflet.js. The map shows attacks from 21 unique IPs, with the most active attacker making 6 attempts from Columbus, Ohio.
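Leaflet can render GeoJSON directly, so one way to feed such a map is to convert the enriched records into a GeoJSON FeatureCollection. The sketch below is an assumption about how that step could look, not the exact code behind our map. `to_geojson` and the record keys (`ip`, `attempts`, `latitude`, `longitude`) are hypothetical names, and the sample record uses a placeholder IP with approximate coordinates for Columbus, Ohio.

```python
import json


def to_geojson(records):
    """Convert enriched IP records into a GeoJSON FeatureCollection."""
    features = []
    for rec in records:
        try:
            lat = float(rec["latitude"])
            lon = float(rec["longitude"])
        except (KeyError, TypeError, ValueError):
            continue  # skip records without usable coordinates, e.g. "N/A"
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"ip": rec.get("ip"), "attempts": rec.get("attempts", 0)},
        })
    return {"type": "FeatureCollection", "features": features}


# Placeholder records for illustration only
SAMPLE_RECORDS = [
    {"ip": "198.51.100.7", "attempts": 6, "latitude": "39.9612", "longitude": "-82.9988"},
    {"ip": "203.0.113.5", "attempts": 1, "latitude": "N/A", "longitude": "N/A"},
]

geojson = to_geojson(SAMPLE_RECORDS)
print(json.dumps(geojson, indent=2))
```

The coordinate order is the detail most likely to go wrong: ipinfo.io returns "latitude,longitude", while GeoJSON expects longitude first.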
6. Country Distribution
Using the ipwhois Python library, we enriched our IP data with country and ASN information. The most common source countries were:
- United States (9 unique IPs)
- China (4 unique IPs)
- Hong Kong (2 unique IPs)
- Romania, Russia, Taiwan (1 IP each)
The countries listed account for 18 of the 21 unique IPs. The spread reflects the global nature of automated scanning, and many of these sources sit on cloud infrastructure hosted in those countries.
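A per-country breakdown like the one above takes only a few lines once each IP has a country code attached. This is a minimal sketch, assuming each enriched record is a dict with a `country` key. `tally_by_country` is a hypothetical helper, and the sample records are placeholders rather than our captured data.

```python
from collections import Counter


def tally_by_country(records):
    """Return (country, unique_ip_count) pairs, most common first."""
    counts = Counter(rec.get("country", "N/A") for rec in records)
    return counts.most_common()


# Placeholder records: one dict per unique IP
SAMPLE_RECORDS = [
    {"ip": "198.51.100.7", "country": "US"},
    {"ip": "198.51.100.8", "country": "US"},
    {"ip": "203.0.113.5", "country": "CN"},
    {"ip": "192.0.2.44", "country": "HK"},
]

for country, n in tally_by_country(SAMPLE_RECORDS):
    print(f"{country}: {n} unique IP(s)")
```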
7. ASN Distribution
Analysis of the ASN data revealed interesting patterns in network ownership:
- AWS (AS16509): 4 attacking IPs
- Microsoft (AS8075): 4 attacking IPs
- Alibaba (AS37963): 3 attacking IPs
- Google Cloud (AS396982): 2 attacking IPs
These four cloud providers account for 13 of the 21 unique IPs, which suggests that most attacks originated from cloud infrastructure rather than residential or business networks.
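The cloud share can be checked directly from the counts listed above. The sketch below rebuilds a per-IP ASN list from those counts and measures how many fall into a set of known cloud ASNs. `cloud_share` and `CLOUD_ASNS` are hypothetical names, and the remaining IPs are represented by a placeholder value because their ASNs are not listed here.

```python
# ASNs named in the list above; ipwhois reports the "asn" field as a
# string without the "AS" prefix.
CLOUD_ASNS = {"16509", "8075", "37963", "396982"}


def cloud_share(asns, cloud_asns=CLOUD_ASNS):
    """Return (cloud_count, total, fraction) for a list of per-IP ASN strings."""
    total = len(asns)
    cloud = sum(1 for asn in asns if asn in cloud_asns)
    fraction = cloud / total if total else 0.0
    return cloud, total, fraction


# Rebuilt from the reported counts; the other 8 of the 21 IPs are
# stood in for by the placeholder "other".
observed = ["16509"] * 4 + ["8075"] * 4 + ["37963"] * 3 + ["396982"] * 2 + ["other"] * 8

cloud, total, fraction = cloud_share(observed)
print(f"{cloud} of {total} IPs ({fraction:.0%}) map to the listed cloud ASNs")
# prints "13 of 21 IPs (62%) map to the listed cloud ASNs"
```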
8. Lessons Learned
- Cloud Infrastructure: Most attacks originated from cloud providers, suggesting automated scanning tools running on cloud infrastructure.
- Low Traffic Volume: Only 21 unique IPs connected during the few days of operation. One possible explanation is that an AWS instance in an EU region with only SSH exposed is less visible to automated scanners, though the short collection window limits what can be concluded.
- Data Quality: Cowrie's structured JSON logs made it easy to create visualizations and analyze patterns.
9. Future Enhancements
Based on the findings, we plan to:
- Enable Telnet to increase visibility and capture more attack data
- Fold the Shodan open-port and service data, which the parser already requests, into the analysis and visualizations
- Implement reverse DNS lookups and threat intelligence feeds for automated enrichment
- Deploy on a smaller VPS provider to compare attack patterns
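For the reverse DNS item, the standard library's `socket.gethostbyaddr` is probably enough to start with. The following is a small sketch under that assumption. `reverse_dns` is a hypothetical helper, and the `resolver` parameter exists only so the lookup can be swapped out in tests.

```python
import ipaddress
import socket


def reverse_dns(ip, resolver=socket.gethostbyaddr):
    """Return the PTR hostname for an IP address, or None if there is none."""
    try:
        ipaddress.ip_address(ip)  # reject anything that is not an IP literal
    except ValueError:
        return None
    try:
        hostname, _aliases, _addresses = resolver(ip)
        return hostname
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses
        return None
```

Calling `reverse_dns(ip)` inside the enrichment loop would add one more lookup per address, so many IPs without a PTR record will simply come back as `None`.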