CyHelp Operations · CTF

Blue Team Operations Guide


Monitoring Setup

Install Splunk Enterprise and forwarders on every Linux and Windows host before go-live.


Detection Queries

Ready-to-paste SPL alerts and dashboards for Linux, Windows, and server services.


Overview & Priorities

Your role is detection and response · not pentesting, not patching everything in sight. The red team will get in. Your job is to know when, where, and how, fast enough to contain it.

Priority order during prep

1. Asset inventory · you can't defend what you don't know exists.

2. Credential rotation · assume the red team has every default.

3. Baseline snapshots · you need a "known good" for diffing later.

4. Splunk ingestion · get logs flowing ASAP; you're blind without them.

5. Hardening · lock down services, disable junk, firewall rules.

6. Dashboards & alerts · build the cockpit you'll watch live.

7. Backups · snapshot everything before go-live.

Team roles (3-person)

Watcher

Eyes on Splunk dashboards, triages alerts, logs every event in the running journal.

Responder

Investigates the Watcher's leads · runs commands on hosts, pulls files, decides containment.

Reserve / Sleep

Off-shift, recovering. Rotates in for the next slot. Stay disciplined · don't burn out in hour 2.

Shift Schedule

Three people · 2.5 days · 8-hour shifts · always 2 active, 1 on reserve/sleep.

Slot  | 00–08    | 08–16   | 16–24
Day 1 | P3 sleep | P1 + P2 | P2 + P3
Day 2 | P3 + P1  | P1 + P2 | P2 + P3
Day 3 | P3 + P1  | P1 + P2 | ·

Tip: Shifts overlap by 30 min · the outgoing person briefs the incoming one using the handoff template. Never end a shift mid-incident without a verbal walkthrough.

Golden Rules

Asset Inventory

Build a master list of every host, IP, role, OS, and service before anything else. Without it you are blind.

Network discovery

bash · quick scan
# Discover live hosts on the local /24
sudo nmap -sn 192.168.1.0/24 -oA hosts_alive

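# Service/version scan on discovered hosts
# (-iL expects a plain host list, so pull the live IPs out of the grepable output first)
awk '/Status: Up/{print $2}' hosts_alive.gnmap > hosts_alive.txt
sudo nmap -sV -sC -O -iL hosts_alive.txt -oA services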

# Quick top-1000 sweep
sudo nmap -T4 --top-ports 1000 192.168.1.0/24 -oA top1000
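
The ping sweep output can seed the inventory table. A minimal sketch, assuming the usual "Host: <ip> (<name>)  Status: Up" line layout of nmap's .gnmap files; gnmap_to_csv is a made-up helper name, and OS, role, and owner still have to be filled in by hand.

```shell
# gnmap_to_csv: hypothetical helper. Reads nmap grepable output on stdin
# and prints "ip,hostname" for every host reported as up.
gnmap_to_csv() {
  awk '/Status: Up/ {
    name = $3
    gsub(/[()]/, "", name)   # "(dc01.lab)" -> "dc01.lab", "()" -> ""
    print $2 "," name
  }'
}
```

Typical use: gnmap_to_csv < hosts_alive.gnmap > inventory.csv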

Inventory template

IP        | Hostname | OS        | Role              | Owner
10.0.0.10 | dc01     | Win 2022  | Domain Controller | P1
10.0.0.20 | web01    | Ubuntu 22 | Apache + PHP      | P2
10.0.0.30 | db01     | Debian 12 | MySQL             | P2
10.0.0.40 | splunk   | Ubuntu 22 | SIEM              | P3

Baseline Snapshots

You need a "known good" of every host. Hash everything you can. When something feels off later, you diff against this.

bash · Linux baseline
# Hash all binaries in PATH
find /usr/bin /usr/sbin /bin /sbin -type f -exec sha256sum {} \; > /root/baseline-bins.txt

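# Capture state of services, listeners, users, cron
systemctl list-units --type=service --state=running > /root/baseline-services.txt
ss -tlnp > /root/baseline-listeners.txt
cat /etc/passwd /etc/shadow /etc/group > /root/baseline-accounts.txt
# Group both commands so the crontab output lands in the file too
{ crontab -l; ls -la /etc/cron*; } > /root/baseline-cron.txt 2>&1

# SUID/SGID files (the Linux IR checklist diffs against this file)
find / -perm /6000 -type f 2>/dev/null > /root/baseline-suid.txt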

# Pull the baseline off-host immediately
scp /root/baseline-*.txt blue@splunk:/var/baselines/$(hostname)/
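
Because baseline-bins.txt is in sha256sum's own output format, the later diff can be done by sha256sum itself. A minimal sketch, assuming GNU coreutils; changed_files is a made-up helper name.

```shell
# changed_files: hypothetical helper. Re-hashes every file listed in a
# sha256sum-format baseline and prints the paths that no longer match
# (modified, deleted, or unreadable). Files added since the baseline
# are NOT detected by this check.
changed_files() {
  # LC_ALL=C keeps the "FAILED" marker untranslated.
  # sha256sum exits non-zero on any mismatch; "|| true" keeps that from
  # aborting callers that run under "set -e".
  LC_ALL=C sha256sum --check --quiet "$1" 2>/dev/null | sed 's/: FAILED.*$//' || true
}
```

Typical use during IR: changed_files /root/baseline-bins.txt (ideally against the copy you pulled off-host, since a local baseline can be tampered with).
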
PowerShell · Windows baseline
Get-Service | Where Status -eq Running | Export-Csv baseline-services.csv
Get-Process | Export-Csv baseline-procs.csv
Get-NetTCPConnection -State Listen | Export-Csv baseline-listeners.csv
Get-LocalUser | Export-Csv baseline-users.csv
Get-LocalGroupMember Administrators | Export-Csv baseline-admins.csv
Get-ScheduledTask | Export-Csv baseline-tasks.csv

Harden Linux

SSH

/etc/ssh/sshd_config
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
PermitEmptyPasswords no
MaxAuthTries 3
ClientAliveInterval 300
ClientAliveCountMax 2
AllowUsers blue
LoginGraceTime 30
bash
sudo sshd -t  # validate config
sudo systemctl restart sshd

fail2ban (SSH protection)

bash
sudo apt install -y fail2ban
sudo cp /etc/fail2ban/jail.conf /etc/fail2ban/jail.local
# in jail.local: bantime = 1h, maxretry = 3, [sshd] enabled = true
sudo systemctl enable --now fail2ban

Quick wins

Harden Windows

Local policy quick-wins

Disable LLMNR (GPO)

GPO Path
Computer Configuration → Administrative Templates
  → Network → DNS Client
    → Turn Off Multicast Name Resolution = Enabled

PowerShell logging

PowerShell (Admin)
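$base = "HKLM:\Software\Policies\Microsoft\Windows\PowerShell"
# Create the policy keys if missing; Set-ItemProperty fails on a nonexistent key
"ScriptBlockLogging", "ModuleLogging" | ForEach-Object { if (-not (Test-Path "$base\$_")) { New-Item "$base\$_" -Force | Out-Null } }
Set-ItemProperty -Path "$base\ScriptBlockLogging" -Name EnableScriptBlockLogging -Value 1
Set-ItemProperty -Path "$base\ModuleLogging" -Name EnableModuleLogging -Value 1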
Note: For deeper AD hardening · Mimikatz, Pass-the-Hash, krbtgt, Kerberoasting · see the Targeted Hardening page.

Firewall / Network Device Hardening

UFW (Ubuntu) baseline

bash
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow from 10.0.0.0/24 to any port 22  # SSH from LAN only
sudo ufw allow 80,443/tcp                            # web
sudo ufw allow proto tcp from 10.0.0.0/24 to any port 9997  # Splunk receive: indexer only (forwarders connect outbound)
sudo ufw enable
sudo ufw status verbose

Network device principles

Credential Rotation

Assume every default password is already burned. Rotate everything in the first hour.

Linux · bulk password reset

bash (run as root)
# Generate strong random password
openssl rand -base64 24

# Force password change on next login
passwd <user>
chage -d 0 <user>

# Lock dormant accounts
usermod -L <user>
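
For an actual bulk reset, the same openssl call can feed chpasswd. A minimal sketch: both function names are made up, nothing is changed until the output is handed to chpasswd, and the /dev/urandom fallback is an addition for hosts without openssl.

```shell
# new_password: 24 random bytes, base64-encoded (32 characters).
# Falls back to /dev/urandom if openssl is not installed.
new_password() {
  openssl rand -base64 24 2>/dev/null || head -c 24 /dev/urandom | base64
}

# chpasswd_lines: hypothetical helper. Prints one "user:password" line
# per argument, the input format chpasswd expects. It changes nothing
# by itself.
chpasswd_lines() {
  local user
  for user in "$@"; do
    printf '%s:%s\n' "$user" "$(new_password)"
  done
}
```

Typical use as root: chpasswd_lines alice bob > /root/new-creds.txt, review the file, then chpasswd < /root/new-creds.txt. The user names and file path are placeholders; keep the file root-only and move it off-host once passwords are handed out.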

Windows / AD

PowerShell (Admin / DC)
# Force password reset for all enabled users
Get-ADUser -Filter {Enabled -eq $true} |
  Set-ADUser -ChangePasswordAtLogon $true

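# Reset specific account with random password
# (System.Web is not loaded by default and is only available in Windows PowerShell 5.1)
Add-Type -AssemblyName System.Web
$pw = ConvertTo-SecureString ([System.Web.Security.Membership]::GeneratePassword(20,5)) -AsPlainText -Force
Set-ADAccountPassword -Identity <user> -NewPassword $pw -Reset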
Critical: The krbtgt account must be reset twice, about 10 hours apart (the default maximum Kerberos ticket lifetime), to invalidate any Golden Tickets the red team may have crafted. See Golden Ticket hardening.

Splunk Setup

The full Splunk Enterprise install + Universal Forwarder setup for Linux and Windows clients is on the Monitoring Guide page.

Quick start

Server install (single .deb), forwarders on every host, port 9997 for receive, port 8000 for the web UI.


Forwarders

The UF on Linux installs to /opt/splunkforwarder; on Windows, run the MSI wizard and set the receiving indexer during install.


Splunk Dashboards

Pre-baked SPL queries for alerts and dashboards across Linux, Windows, and server services live on the Dashboards & Alerts page.

Backups

Snapshot every important host before go-live. If the red team trashes a box, you restore from the snapshot, not from a panicked Google search.

VM snapshots

bash · VirtualBox / VMware
# VirtualBox
VBoxManage snapshot <vmname> take "pre-ctf-baseline" --description "clean state"

# VMware (per-VM)
vmrun snapshot /path/to/vm.vmx "pre-ctf-baseline"

Application data

Monitoring Playbook

What you do every hour as the Watcher. Discipline beats genius · keep the rotation.

Hourly rotation (all panels)

  1. Failed logins by source IP · anything over 50 in 5 min = brute force.
  2. Successful logins of Domain Admins · every one of these is investigated.
  3. New processes per host · anything unusual (powershell.exe spawning cmd, wmic, vssadmin) gets a ticket.
  4. Outbound network from servers · DC should have zero egress traffic.
  5. Host heartbeats · a silent host means a dead UF or a wiped log channel.
  6. HTTP 5xx / web errors · sudden spike = exploitation attempt.
  7. Privileged Windows events · 4720, 4732, 4740, 4624 type 10 (RDP), 4769 RC4.
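
When Splunk itself is unavailable, item 1 can be checked straight from sshd logs on the host. A minimal sketch: failed_by_ip is a made-up helper name, it counts whatever lines it is given, so the 5-minute window has to be applied by the caller, and it assumes the standard OpenSSH "Failed password for ... from <ip> port ..." message.

```shell
# failed_by_ip: hypothetical helper. Reads sshd log lines on stdin and
# prints "<count> <ip>" for every source with more than THRESHOLD
# "Failed password" lines (default 50), highest count first.
failed_by_ip() {
  awk -v threshold="${1:-50}" '
    /Failed password/ {
      ip = ""
      # The source address is the token after the last "from".
      for (i = 1; i < NF; i++) if ($i == "from") ip = $(i + 1)
      if (ip != "") count[ip]++
    }
    END {
      for (ip in count) if (count[ip] > threshold) print count[ip], ip
    }
  ' | sort -rn
}
```

Typical use: journalctl -u ssh --since "5 min ago" | failed_by_ip 50 (the unit is named ssh on Debian/Ubuntu and sshd on most other distributions).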

Journal entry template

journal.md
[14:23] FAILED-LOGIN-SPIKE
  src_ip: 10.0.0.66 (workstation05)
  target: db01:22 · 47 fails in 4 min
  action: blocked src_ip on db01 ufw, opened ticket
  status: contained, watching for further attempts from same /24
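
Entries are easier to keep consistent when a helper writes them. A minimal sketch; journal_entry is a made-up helper name and the field order simply mirrors the template.

```shell
# journal_entry: hypothetical helper. Appends one entry in the layout of
# the journal template, stamped with the current HH:MM.
# Usage: journal_entry FILE TAG SRC TARGET ACTION STATUS
journal_entry() {
  {
    printf '[%s] %s\n' "$(date +%H:%M)" "$2"
    printf '  src_ip: %s\n' "$3"
    printf '  target: %s\n' "$4"
    printf '  action: %s\n' "$5"
    printf '  status: %s\n\n' "$6"
  } >> "$1"
}
```

Typical use: journal_entry journal.md FAILED-LOGIN-SPIKE "10.0.0.66 (workstation05)" "db01:22 · 47 fails in 4 min" "blocked src_ip on db01 ufw" "contained"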

Triage & Escalation

Sev | Examples                                                  | Action                                     | SLA
P1  | DA login from unknown IP, krbtgt activity, ntds.dit access | Wake reserve, full team active, isolate DC | 0 min
P2  | Web shell, new admin, lateral SMB                         | Responder takes lead, contain host         | 5 min
P3  | Brute force, recon, scan noise                            | Block source, log, monitor                 | 15 min
P4  | Single failed login, low-rate noise                       | Note in journal, ignore                    | ·

Note: Escalating early is free. Escalating late is a domain takeover. If unsure: assume P2 and verify down.

IR: Linux Compromise

You suspect a Linux host is owned. Run this checklist before rebooting.

bash · first-look IR
# Active connections + listening sockets
ss -tnp; ss -tlnp

# Suspicious processes
ps auxf
ps -ef --forest

# Recently modified files (last 24h)
find / -mtime -1 -type f -not -path "/proc/*" -not -path "/sys/*" 2>/dev/null

# Logged-in users + history
w; last -i | head -30
cat /home/*/.bash_history /root/.bash_history 2>/dev/null

# SUID/SGID newly added (compare against baseline)
find / -perm /6000 -type f 2>/dev/null > /tmp/suid-now.txt
diff /root/baseline-suid.txt /tmp/suid-now.txt

Containment options (least destructive first)

  1. Network isolation · drop firewall to deny all except Splunk + management host
  2. Kill the suspicious process (preserve memory dump first if possible)
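  3. Disable the compromised account: passwd -l <user> (this locks the password only; SSH key logins still work, so also expire the account with usermod --expiredate 1 <user>)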
  4. Snapshot the host (if VM) before any further changes
  5. Restore from baseline snapshot as last resort
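
Step 1 can be prepared ahead of time so isolation is one command under pressure. A minimal sketch for UFW hosts: isolation_rules is a made-up helper name, it only prints the commands so they can be reviewed first, and the management host address in the usage line is a placeholder.

```shell
# isolation_rules: hypothetical helper. Prints (does not run) ufw
# commands that cut a host off from everything except log shipping to
# the Splunk indexer and SSH from one management host.
# Usage: isolation_rules SPLUNK_IP MGMT_IP
isolation_rules() {
  cat <<EOF
ufw --force reset
ufw default deny incoming
ufw default deny outgoing
ufw allow out proto tcp to $1 port 9997
ufw allow in proto tcp from $2 to any port 22
ufw --force enable
EOF
}
```

Typical use: isolation_rules 10.0.0.40 10.0.0.5 to review, then pipe the same output into sudo sh to apply. Outbound DNS is blocked too, so the forwarder must reach the indexer by IP.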

IR: Windows Compromise

PowerShell (Admin)
# Active connections
Get-NetTCPConnection | Where State -eq Established | Sort RemoteAddress

# Recent processes (with command line)
Get-CimInstance Win32_Process | Select Name, ProcessId, ParentProcessId, CommandLine | ft -auto

# Newly created services
Get-WinEvent -FilterHashtable @{LogName='System'; Id=7045} -MaxEvents 50

# Recent logins (Event 4624)
Get-WinEvent -FilterHashtable @{LogName='Security'; Id=4624; StartTime=(Get-Date).AddHours(-2)}

# Scheduled tasks added recently
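# Date is a string and often empty, so cast it before comparing
Get-ScheduledTask | Where-Object { $_.Date -and ([datetime]$_.Date) -gt (Get-Date).AddDays(-1) }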

IR: Find Persistence Mechanisms

Common places attackers hide to survive reboots.

Linux

Windows

IR: Lateral Movement

Spotting the attacker hopping host-to-host.

Indicators

Containment

  1. Block source host at the firewall · kill its outbound to internal targets
  2. Kill active sessions: logoff <sessionid> on Windows, kill SSH PIDs on Linux
  3. Reset credentials of every account that touched the source host
  4. Audit the destination hosts for new accounts, services, scheduled tasks

Common Attack Patterns

SSH brute force

High failed-login count from one IP. Block IP, ensure key-only auth, fail2ban active.

Web shell upload

Look for new .php, .aspx, .jsp files in web roots; outbound from www-data; POST with body containing cmd=.

Credential dumping

Sysmon Event 10 ProcessAccess on lsass.exe. Mimikatz/Rubeus signatures in command line.

Kerberoasting

Single user requesting many service tickets (Event 4769, RC4 encryption flag).

Pass-the-Hash

Event 4624 with Logon_Type=3 and NTLM authentication from a non-DC source. Block the source host's lateral access.

Persistence via scheduled task

New schtask running unsigned binary or PowerShell -enc. Disable, investigate parent.
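
For the web shell pattern, the file-system half of the check is a single find. A minimal sketch, assuming GNU find; new_web_files is a made-up helper name, and the reference can be any file whose modification time marks the last known-good moment.

```shell
# new_web_files: hypothetical helper. Lists script files under WEBROOT
# modified more recently than REFERENCE (any file whose mtime marks the
# last known-good moment, e.g. a baseline file).
# Usage: new_web_files WEBROOT REFERENCE
new_web_files() {
  find "$1" -type f \( -name '*.php' -o -name '*.aspx' -o -name '*.jsp' \) \
    -newer "$2" 2>/dev/null
}
```

Typical use: new_web_files /var/www/html /root/baseline-bins.txt (the web root path is an assumption; adjust it per host).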

Splunk Query Cheatsheet

SPL · most common queries
# All events from one host in last hour
index=* host=db01 earliest=-1h

# Failed SSH logins by IP
index=main sourcetype=linux_secure "Failed password"
| stats count by src_ip, user
| sort -count

# Windows logon failures
index=wineventlog EventCode=4625
| stats count by Account_Name, src_ip

# Top sourcetypes / volume
index=* earliest=-1h
| stats count by host, sourcetype | sort -count

# Live process creation (requires Sysmon or 4688)
index=wineventlog EventCode=4688 New_Process_Name="*powershell*"
| table _time, ComputerName, Process_Command_Line

# Silent hosts (no logs in last 30 min)
| metadata type=hosts | eval mins_silent=round((now()-recentTime)/60,1)
| where mins_silent > 30
| sort -mins_silent

For more, see the dedicated Dashboards & Alerts page.

Critical Ports Reference

Port      | Proto   | Service             | Notes
22        | TCP     | SSH                 | Restrict to mgmt subnet
53        | UDP/TCP | DNS                 | Watch for tunneling
88        | TCP     | Kerberos            | DC only
135       | TCP     | RPC endpoint mapper | Windows only, internal
389/636   | TCP     | LDAP / LDAPS        | DC only
445       | TCP     | SMB                 | Block between workstations
3389      | TCP     | RDP                 | Bastion only, never internet-exposed
5985/5986 | TCP     | WinRM               | Internal mgmt only
8000      | TCP     | Splunk Web UI       | Internal access only
9997      | TCP     | Splunk receive      | From forwarders to indexer

Log File Locations

Linux

Path                        | Content
/var/log/auth.log           | SSH, sudo, login
/var/log/syslog             | General system events
/var/log/kern.log           | Kernel messages
/var/log/apache2/access.log | Apache requests
/var/log/nginx/access.log   | Nginx requests
/var/log/mysql/error.log    | MySQL errors
~/.bash_history             | Per-user shell history
journalctl -u <svc>         | systemd unit logs

Windows

Channel                                  | Content
Security                                 | Logons, account changes, audit (4624, 4625, 4720, 4732, 4740, 4769)
System                                   | Service events (7045 = new service)
Application                              | App-level errors
Microsoft-Windows-PowerShell/Operational | 4103, 4104 PowerShell logs
Microsoft-Windows-Sysmon/Operational     | 1/3/8/10/22 etc. (if Sysmon installed)

Useful Tools

Sysmon (Windows)

Endpoint visibility on steroids · process create, network connect, image load, remote thread, registry. Use SwiftOnSecurity config as the baseline.


auditd (Linux)

Kernel-level audit framework. Use auditctl to track file accesses, syscalls, and command execution.

chkrootkit / rkhunter

Rootkit detectors. Run during prep for a baseline; re-run during IR to spot kernel-level implants.

AIDE

File integrity checker. Compute baseline hashes, then aide --check later to find tampered files.

fail2ban

Watches log files and bans IPs that hit thresholds. Easy SSH/web brute force defense.

tcpdump / Wireshark

For packet capture during incidents. tcpdump -i any -w incident.pcap host <ip> for a quick capture.

Shift Handoff Template

Use this every shift change. Verbal walkthrough + journal entry.

handoff.md
SHIFT HANDOFF · <date> <time>
Outgoing: P1
Incoming: P2

ACTIVE INCIDENTS
  1. <short title> · sev P2 · host: db01 · owner: P1
     status: contained, monitoring egress
     next-action: re-image at 14:00 if no further activity

OPEN TICKETS
  - #007 SSH bruteforce 10.0.0.66 → blocked, watching
  - #008 unusual cron on web01 → escalate to P3 if reappears

ENVIRONMENT CHANGES THIS SHIFT
  - rotated credentials on dc01 (krbtgt 1st reset done, 2nd at 19:00)
  - added 2 new alerts in Splunk (failed PowerShell -enc, new service)

WATCHLIST FOR INCOMING SHIFT
  - krbtgt 2nd reset at 19:00 · must run before red team can reuse golden ticket
  - Splunk disk at 78% · rotate older indexes if needed

UNCERTAIN / TO-INVESTIGATE
  - one alert at 12:34 looked like recon from 10.0.0.99 · couldn't reach owner
  - need second pair of eyes on web01 access.log for SQLi patterns