This syllabus starts with a 2-hour core (four 30-minute modules) and then extends into a full-day 12-hour Linux admin track. It balances fundamental concepts, hands-on practice, and essential networking/security skills like SSH.
🐧 Linux Essentials: 2-Hour Crash Course
⏱️ Course Overview
- Module 1 (0:00 - 0:30): Shell Navigation & The Help System
- Module 2 (0:30 - 1:00): File Manipulations & Text Processing
- Module 3 (1:00 - 1:30): Remote Access (SSH), File Transfer & Networking
- Module 4 (1:30 - 2:00): Permissions, Process Management & Troubleshooting
🛠️ Module 1: Shell Navigation & The Help System (30 Mins)
Introduction to the CLI (5 mins)
- Linux relies on the Command Line Interface (CLI) for speed and server automation.
- The “Shell” (usually Bash or Zsh) translates your typed text into system actions.
Basic Navigation (15 mins)
- pwd: Print Working Directory. Shows your exact current location.
- ls: List directory contents.
- ls -l: Long format (shows sizes, owners, and permissions).
- ls -a: Shows hidden files (files starting with a dot, like .bashrc).
- cd: Change directory.
- cd /: Go to the root system directory.
- cd ~ or just cd: Go to your user’s home directory.
- cd ..: Move up one level.
- Pro-Tip: Use the Tab key for autocomplete to prevent typos.
Self-Help & Documentation (10 mins)
- man: Opens the system manual (e.g., man ls). Press q to exit.
- --help: Prints a quick-reference summary of a command's flags directly in the terminal (e.g., ls --help).
📂 Module 2: File Manipulations & Text Processing (30 Mins)
Creating and Moving Files (15 mins)
- mkdir: Create a new directory.
- touch: Create an empty file or update an existing file's timestamp.
- cp: Copy files.
- Use cp -r to copy folders recursively.
- mv: Move or rename files and folders.
- rm: Delete files.
- Use rm -rf to forcefully delete a directory and everything inside it (use with caution).
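The commands above can be tried safely in a throwaway directory. This is a minimal sketch, assuming a Linux shell with mktemp available; the names drafts, archive, notes.txt, and notes.old are hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work inside a throwaway directory so nothing important can be deleted.
scratch="$(mktemp -d)"
cd "$scratch"

mkdir drafts                      # create a directory
touch drafts/notes.txt            # create an empty file
cp drafts/notes.txt notes.bak     # copy a single file
cp -r drafts archive              # copy a directory recursively
mv notes.bak archive/notes.old    # move and rename in one step
rm drafts/notes.txt               # delete a single file
rm -rf drafts                     # delete a directory and everything in it

ls -R .                           # show what is left
```

Running rm -rf only inside a scratch directory created by mktemp is a deliberate habit: it keeps a typo from touching real data.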
Inspecting and Searching Text (15 mins)
- cat: Dumps the entire file contents onto your screen.
- less: Opens an interactive viewer for large files. Use arrow keys to scroll, q to quit.
- head -n 20: View the first 20 lines of a file.
- tail -n 20: View the last 20 lines of a file.
- tail -f: Follow mode. Streams new additions to a file (like active log files) in real-time.
- grep: Searches for matching text patterns inside files (e.g., grep "root" disk_space.txt).
🌐 Module 3: Remote Access (SSH), File Transfer & Networking (30 Mins)
Introduction to SSH (10 mins)
- Secure Shell (SSH) encrypts the connection between your machine and a remote Linux server.
- Basic connection: ssh username@remote_host_ip
- Using a specific port: ssh -p 2222 username@remote_host_ip
- Key-Based Authentication: Utilizing ~/.ssh/id_rsa keys instead of passwords for automated, highly secure access.
Copying Files Remotely (10 mins)
- scp: Secure Copy Protocol (best for simple, single file transfers over SSH).
- Local to Remote: scp localfile.txt username@remote_ip:/path/to/destination/
- Remote to Local: scp username@remote_ip:/path/to/remotefile.txt /local/destination/
- rsync: Remote Sync. Often faster than scp for repeated transfers because it copies only the differences between files, and it can resume interrupted transfers when run with --partial.
- Example: rsync -avz local_folder/ username@remote_ip:/remote_folder/
Basic Networking Utilities (10 mins)
- ping: Check if a remote server is reachable and active.
- curl or wget: Download files directly from the web via CLI.
- ip a: Display network interfaces and your current IP addresses.
🔒 Module 4: Permissions, Process Management & Troubleshooting (30 Mins)
Linux Permissions & Sudo (10 mins)
- Linux uses three user tiers: User (u), Group (g), and Others (o).
- Linux uses three access types: Read (r), Write (w), and Execute (x).
- chmod: Change file permissions.
- chmod +x script.sh: Makes a script executable.
- chown: Change file ownership (e.g., chown username:groupname file.txt).
- sudo: SuperUser Do. Runs a single command with root (administrator) privileges.
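To see how chmod changes the numeric mode, the steps above can be run on a scratch file. This is a minimal sketch, assuming GNU stat (the -c '%a' option) and a fixed umask of 022 so the results are predictable; hello.sh is a hypothetical file name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Fix the umask so "chmod +x" behaves the same on every machine.
umask 022

workdir="$(mktemp -d)"
script="$workdir/hello.sh"
printf '#!/usr/bin/env bash\necho "hello from script"\n' > "$script"

chmod 644 "$script"                       # rw-r--r--: owner writes, everyone reads
before="$(stat -c '%a' "$script")"

chmod +x "$script"                        # add execute for user, group, and others
after="$(stat -c '%a' "$script")"
echo "mode before: $before, after: $after"

if [[ -x "$script" ]]; then
    echo "script is now executable"
fi

chmod 444 "$script"                       # r--r--r--: read-only for everyone
readonly_mode="$(stat -c '%a' "$script")"
echo "read-only mode: $readonly_mode"
```

Each digit is the sum of read (4), write (2), and execute (1) for user, group, and others in that order.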
Process Management (10 mins)
- ps aux: Lists every single running process on the system.
- top or htop: Interactive task managers showing live CPU and memory usage.
- kill: Gracefully stops a process using its Process ID number.
- kill -9: Forcefully terminates a frozen process immediately.
System Diagnostics (10 mins)
- df -h: Displays remaining disk space in human-readable formats (GB/MB).
- free -h: Displays total, used, and available RAM memory.
- history: Shows a numbered list of previously executed commands, including earlier sessions saved in the shell's history file.
🏁 Hands-on Lab Challenge (To run during the final 15 mins)
Perform this exact sequence on your test environments to validate your understanding:
- Log into your remote training server using SSH.
- Create a folder named backup_test in your home directory.
- Generate a system status file: df -h > disk_space.txt.
- Use grep to find the word “root” inside disk_space.txt.
- Change the file permissions so it is read-only for everyone (chmod 444 disk_space.txt).
- Disconnect from SSH and try to use scp to pull that disk_space.txt file back to your local machine.
🚀 Hour 3: Users, Packages, Services, and Logs
Module 5 (2:00 - 2:30): User and Group Administration
- whoami, id: Confirm current user and group membership.
- useradd / adduser: Create local users.
- passwd: Set or reset user passwords.
- usermod -aG: Add a user to a group (example: sudo or wheel).
- groups: Verify effective group membership.
- Common admin check:
- Ubuntu: groups <username> should include sudo.
- AlmaLinux: groups <username> should include wheel.
Module 6 (2:30 - 3:00): Package and Service Management
- Package managers:
- Ubuntu: apt
- AlmaLinux: dnf
- Core package actions:
- Search: apt search <pkg> or dnf search <pkg>
- Install: sudo apt install -y <pkg> or sudo dnf install -y <pkg>
- Remove: sudo apt remove <pkg> or sudo dnf remove <pkg>
- Service control with systemd:
sudo systemctl status <service>
sudo systemctl start <service>
sudo systemctl stop <service>
sudo systemctl enable <service>
- Essential examples:
- Ubuntu SSH service: ssh
- AlmaLinux SSH service: sshd
Hour 3 Mini-Drill (5-10 mins)
- Install htop.
- Check SSH service status with systemctl.
- Enable SSH service to start on boot.
- Verify with systemctl is-enabled <service>.
🔧 Hour 4: Automation, Scheduling, and Recovery Basics
Module 7 (3:00 - 3:30): Shell Scripting Fundamentals
- Why scripts: repeatability, consistency, and faster ops.
- Create your first script:
nano health_check.sh (or your preferred editor)
- Add a shebang: #!/usr/bin/env bash
- Add command checks: date, uptime, df -h, free -h
- Make executable and run:
chmod +x health_check.sh
./health_check.sh
- Save output for audits:
./health_check.sh > health_report.txt
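One possible shape for health_check.sh is sketched below. It runs the four commands named above; the run_check and health_report helper names are assumptions made for this example, and each command is skipped rather than failing the script if it is not installed.

```shell
#!/usr/bin/env bash
# health_check.sh: print a small system health report.
set -euo pipefail

# Print a section header, then run the command if it is installed.
run_check() {
    local title="$1"
    shift
    echo "== $title =="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" || echo "(command failed: $*)"
    else
        echo "(skipped: $1 not found)"
    fi
    echo
}

health_report() {
    run_check "Date" date
    run_check "Uptime" uptime
    run_check "Disk usage" df -h
    run_check "Memory" free -h
}

health_report
```

Because the report goes to standard output, ./health_check.sh > health_report.txt captures all four sections in one file.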
Module 8 (3:30 - 4:00): Scheduling and Troubleshooting Workflow
- Scheduled tasks with cron:
- crontab -e to edit
- Example every day at 06:00:
0 6 * * * /home/<user>/health_check.sh >> /home/<user>/health.log 2>&1
- Log investigation:
- journalctl -xe for recent system issues.
- journalctl -u ssh --since "1 hour ago" (Ubuntu)
- journalctl -u sshd --since "1 hour ago" (AlmaLinux)
- Network/service triage checklist:
- Verify IP (ip a)
- Verify listener (ss -tulpen | grep 22)
- Verify firewall policy (examples):
- Ubuntu (UFW): sudo ufw status verbose
- AlmaLinux (firewalld): sudo firewall-cmd --list-all
- Verify service state in systemctl (examples):
- Full status: sudo systemctl status ssh (Ubuntu) or sudo systemctl status sshd (AlmaLinux)
- Quick state check: systemctl is-active ssh (Ubuntu) or systemctl is-active sshd (AlmaLinux)
🏁 Hands-on Lab Challenge 2 (End of Hour 4)
Complete this sequence to validate your Hour 3 and 4 skills:
- Create a user named opsuser and add it to sudo (Ubuntu) or wheel (AlmaLinux).
- Install htop and verify it launches.
- Write a script named health_check.sh that outputs date, uptime, disk usage, and memory usage.
- Make the script executable and run it, saving output to health_report.txt.
- Create a cron job that runs the script every day at 06:00 and appends to health.log.
- Confirm the cron entry exists with crontab -l.
- Check recent SSH service logs using journalctl for your distro.
- Document one troubleshooting finding from logs in a file named incident_notes.txt.
💿 Bonus: Installing Ubuntu Server 26.04 and AlmaLinux
Use this section if you want to build your own lab VMs (VirtualBox, VMware, Proxmox, Hyper-V, or cloud instances).
Before You Start
- Minimum recommended per VM: 2 vCPU, 2-4 GB RAM, 20+ GB disk.
- Download official ISO images:
- Ubuntu Server 26.04 LTS ISO from ubuntu.com.
- AlmaLinux ISO from almalinux.org.
- Create bootable media:
- On Linux/macOS: dd (advanced users only).
- On Windows/macOS/Linux: tools like Rufus, balenaEtcher, or Ventoy.
Install Ubuntu Server 26.04 LTS (Quick Path)
- Boot from the Ubuntu Server 26.04 ISO.
- Select language, keyboard layout, and network settings.
- Set hostname (for example: ubuntu-lab).
- Create your admin user and strong password.
- For storage, choose guided partitioning unless you need a custom layout.
- Enable OpenSSH Server during setup so remote access works immediately.
- Complete install, reboot, and remove ISO media.
- Verify after first login:
cat /etc/os-release
ip a
sudo systemctl status ssh
Install AlmaLinux (Quick Path)
- Boot from the AlmaLinux ISO.
- In the installer, configure:
- Keyboard and timezone.
- Installation destination (auto-partitioning is fine for labs).
- Network and hostname (for example: alma-lab).
- In software selection, choose a minimal/server profile.
- Set root password and create a regular admin user.
- Start installation, then reboot when finished.
- Verify after first login:
cat /etc/os-release
ip a
sudo systemctl status sshd
First Updates (Both Distros)
- Ubuntu:
sudo apt update && sudo apt upgrade -y
- AlmaLinux:
sudo dnf update -y
- Optional but useful for this course:
sudo apt install -y htop curl wget (Ubuntu)
sudo dnf install -y htop curl wget (AlmaLinux; htop is packaged in EPEL, so run sudo dnf install -y epel-release first if it is not found)
Bonus: SSH Keys on Windows (What They Are + How to Manage Them)
- SSH keys come in a pair:
- Public key: safe to share. You publish this to servers, Git hosting, or tools.
- Private key: secret. Never share this file, never email it, never paste it in chat.
- How auth works:
- A server stores your public key in ~/.ssh/authorized_keys.
- Your Windows machine proves identity using the matching private key.
sequenceDiagram
participant C as Windows client
participant K as Private key
participant S as SSH server
participant A as authorized_keys
C->>K: Sign challenge locally
C->>S: Send public-key auth request
S->>A: Compare presented key
A-->>S: Match found
S-->>C: Login allowed
Where Keys Live on Windows
- Default OpenSSH folder:
C:\Users\<your_user>\.ssh\
- Typical files:
- id_ed25519 (private key)
- id_ed25519.pub (public key)
- Good habit: keep one key per purpose (for example: one for admin servers, one for Git).
Generate a New Key Pair (PowerShell)
- Open PowerShell.
- Run:
ssh-keygen -t ed25519 -C "your_email@example.com"
- When prompted:
- Save path: press Enter for default, or set a custom filename.
- Passphrase: set one (recommended).
- Verify files:
Get-ChildItem $HOME\.ssh
Start ssh-agent and Load Your Private Key
- Ensure the agent service is running:
Get-Service ssh-agent | Set-Service -StartupType Automatic
Start-Service ssh-agent
- Add your key:
ssh-add $HOME\.ssh\id_ed25519
- Confirm key is loaded:
ssh-add -l
Publish Your Public Key Safely
- View/copy only the .pub file:
Get-Content $HOME\.ssh\id_ed25519.pub
- Publish to a Linux server (Option 1, easiest):
ssh-copy-id username@server_ip (if available in your shell)
- Publish to a Linux server (Option 2, manual):
- SSH into server and append key text into ~/.ssh/authorized_keys.
- Then fix permissions:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
- Publish to Git hosting (GitHub/GitLab/Azure DevOps):
- Paste only the public key (.pub) into SSH Keys settings.
Validate and Troubleshoot
- Test server login with key auth:
ssh -i $HOME\.ssh\id_ed25519 username@server_ip
- Debug connection issues:
ssh -v username@server_ip
- Common mistakes:
- Wrong file shared (private key instead of .pub).
- Bad file permissions on server ~/.ssh or authorized_keys.
- Using the wrong username/host or key filename.
Alternative SSH Tools on Windows
If you do not want to use the built-in OpenSSH client, these are common alternatives.
1. PuTTY + PuTTYgen + Pageant (Classic and Widely Used)
- What each tool does:
- PuTTY: SSH terminal client.
- PuTTYgen: key generator and key converter.
- Pageant: SSH key agent for caching unlocked private keys.
- Typical setup flow:
- Install PuTTY from the official site.
- Open PuTTYgen and click Generate (move mouse until complete).
- Save:
- Private key as .ppk (keep secret).
- Public key text (copy to authorized_keys on server).
- In PuTTY, configure:
- Session: hostname/IP and port 22.
- Connection > Data: auto-login username (optional).
- Connection > SSH > Auth > Credentials: select your .ppk file.
- Save the PuTTY session profile and connect.
- Optional (recommended):
- Start Pageant and load your .ppk once, so PuTTY sessions can reuse it without repeated prompts.
2. Convert Existing OpenSSH Keys for PuTTY
If you already created id_ed25519 with ssh-keygen, convert it for PuTTY:
- Open PuTTYgen.
- Click Load and select your OpenSSH private key (id_ed25519).
- Save private key as .ppk.
- Use this .ppk in PuTTY Auth settings.
Note: your public key stays the same conceptually; keep publishing only the public key content.
3. MobaXterm (All-in-One SSH + SFTP GUI)
- Why people use it:
- Built-in terminal, tabs, and graphical SFTP browser.
- Basic workflow:
- Create a new SSH session (host, username, port).
- Under advanced SSH settings, select your private key file.
- Connect and use the left SFTP pane to transfer files.
- Key safety:
- Use passphrase-protected keys and do not export private keys into shared folders.
4. Bitvise SSH Client (Good GUI Controls)
- Why people use it:
- Friendly GUI for terminal + SFTP + port forwarding.
- Basic workflow:
- Create a profile with host, port, and username.
- Import/select your private key in Client key manager.
- Connect and save the profile for repeat use.
- Best practice:
- Keep separate profiles/keys for production vs lab servers.
Tool Choice Quick Guide
- Built-in OpenSSH (PowerShell/Windows Terminal): best for scripting and automation.
- PuTTY suite: best for traditional Windows SSH workflows and key conversion needs.
- MobaXterm: best for users who want terminal + easy file transfer in one window.
- Bitvise: best for users who prefer a full-featured SSH GUI with clear profiles.
🧭 Extended Track: Hours 5-12 (Full-Day Linux Admin Foundations)
If you want to continue beyond the first 4 hours, use this expanded path to build practical, job-ready Linux administration skills.
Hour 5 (4:00 - 5:00): Storage and Filesystems
Module 9 (4:00 - 4:20): Disk Discovery and Partitioning
- Identify disks and partitions: lsblk, blkid, fdisk -l.
- Understand device naming (/dev/sda, /dev/nvme0n1p1).
- Confirm the correct target disk before changes:
lsblk -o NAME,SIZE,FSTYPE,MOUNTPOINT,MODEL
sudo fdisk -l
- Prefer parted for modern GPT-aware workflows, or fdisk for simple MBR/GPT tasks.
- Create a single LVM-type partition on new data disks when planning for growth.
Module 10 (4:20 - 4:40): Filesystems and Mounting
- Create filesystems: mkfs.ext4, mkfs.xfs.
- Mount and unmount devices: mount, umount.
- Make mounts persistent with /etc/fstab using UUID= entries (more stable than /dev/sdX).
- Validate fstab before reboot:
sudo mount -a
- Use a safe mount option baseline for data volumes:
defaults,nofail for non-root data disks in many lab/server cases.
Module 11 (4:40 - 5:00): LVM in Production (Why, Safe Usage, and Real Examples)
- Physical Volume (PV), Volume Group (VG), Logical Volume (LV) concepts.
- Why use LVM:
- Online growth without repartitioning most of the time.
- Cleaner capacity management across multiple disks.
- Easier migration and operational flexibility for app/data volumes.
- Core commands: pvcreate, vgcreate, lvcreate, lvextend, vgs, lvs, pvs.
- Filesystem growth after LV growth:
- ext4: resize2fs
- xfs: xfs_growfs (grow while mounted)
graph TD
D[Physical disk] --> P[Partition marked for LVM]
P --> PV[PV: pvcreate]
PV --> VG[VG: vgcreate or vgextend]
VG --> LV[LV: lvcreate or lvextend]
LV --> FS[Filesystem: ext4 or xfs]
FS --> M[Mount point: /data]
D2[Additional disk] --> PV
Safe LVM Setup Pattern (Recommended)
- Pre-change checks:
- Confirm backups/snapshots exist and are restorable.
- Record current state: lsblk, pvs, vgs, lvs, df -h.
- Confirm application I/O profile and low-traffic change window.
- Add new physical disk and verify detection:
lsblk
- Partition disk for LVM (example with /dev/sdb):
sudo parted -s /dev/sdb mklabel gpt mkpart primary 1MiB 100% set 1 lvm on
- Build/extend LVM stack:
sudo pvcreate /dev/sdb1
- New VG path: sudo vgcreate vg_data /dev/sdb1
- Existing VG path: sudo vgextend vg_data /dev/sdb1
- Create logical volume (example):
sudo lvcreate -n lv_app -L 100G vg_data
- Create filesystem and mount:
sudo mkfs.xfs /dev/vg_data/lv_app
sudo mkdir -p /data
sudo mount /dev/vg_data/lv_app /data
sudo blkid /dev/vg_data/lv_app
- Add UUID=<uuid> /data xfs defaults,nofail 0 2 to /etc/fstab
sudo mount -a
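The setup pattern above can be rehearsed as a dry run before touching a real disk. This is a sketch under stated assumptions: the device, VG, LV, and size values come from the example above, the run helper and the DRY_RUN switch are inventions of this example, and the new-VG path (vgcreate) is shown rather than vgextend. With the default DRY_RUN=1 it only prints the commands.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Values taken from the example above; override them for your lab.
DISK="${DISK:-/dev/sdb}"
PART="${PART:-${DISK}1}"          # NVMe disks use a "p1" suffix instead
VG="${VG:-vg_data}"
LV="${LV:-lv_app}"
SIZE="${SIZE:-100G}"
MOUNTPOINT="${MOUNTPOINT:-/data}"

# DRY_RUN=1 (default) only prints commands; DRY_RUN=0 executes them with sudo.
DRY_RUN="${DRY_RUN:-1}"

run() {
    echo "+ sudo $*"
    if [[ "$DRY_RUN" == "0" ]]; then
        sudo "$@"
    fi
}

provision_lvm() {
    run parted -s "$DISK" mklabel gpt mkpart primary 1MiB 100% set 1 lvm on
    run pvcreate "$PART"
    run vgcreate "$VG" "$PART"
    run lvcreate -n "$LV" -L "$SIZE" "$VG"
    run mkfs.xfs "/dev/$VG/$LV"
    run mkdir -p "$MOUNTPOINT"
    run mount "/dev/$VG/$LV" "$MOUNTPOINT"
    run blkid "/dev/$VG/$LV"
}

provision_lvm
```

Reviewing the printed command list against lsblk output first is a cheap way to catch a wrong target disk. The /etc/fstab entry is left as a manual step on purpose.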
Example: Grow an Existing LV with Minimal Downtime
- Scenario:
/data is on /dev/vg_data/lv_app and needs +50G.
- Ensure free space exists in VG:
vgs
- Extend LV:
sudo lvextend -L +50G /dev/vg_data/lv_app
- Grow filesystem:
- XFS mounted at /data: sudo xfs_growfs /data
- ext4: sudo resize2fs /dev/vg_data/lv_app
- Validate:
lvs
df -h /data
Downtime and Risk Reduction Checklist for Disk Work
- Never modify unknown disks; verify by size/model/serial first.
- Prefer adding capacity (new PV + VG/LV extension) over risky partition rewrites.
- Avoid shrinking filesystems/LVs unless absolutely required (higher risk).
- Keep root and critical app data on separate LVs where possible.
- Always test fstab with mount -a before reboot.
- Keep a rollback path: backup, snapshot, and command log.
- For critical systems, perform changes in a maintenance window and monitor logs/live metrics.
Hour 5 Lab Test
- Add a new virtual disk and confirm identification with lsblk and fdisk -l.
- Partition it as LVM and create a PV.
- Create or extend vg_data, then create lv_app.
- Format lv_app (xfs or ext4), mount at /data, and persist with UUID in /etc/fstab.
- Simulate growth by extending the LV and filesystem online.
- Validate with pvs, vgs, lvs, and df -h /data.
- Write a short rollback and safety checklist in storage_change_notes.txt.
Hour 6 (5:00 - 6:00): Networking and Firewall Operations
Module 12 (5:00 - 5:20): Interfaces and Routing
- Why this matters:
- Many outages trace back to basic network issues: bad IP, wrong gateway, or service not listening.
- Inspect interfaces and addresses:
ip a
- What to look for: interface state UP, correct subnet, expected primary NIC.
- Inspect routing table:
ip route
- What to look for: default route (default via <gateway>) and correct interface.
- Check listening sockets:
ss -tulpen
- Example filter for SSH/HTTP ports: ss -tulpen | grep -E ':22|:80|:443'
- Interface troubleshooting examples:
- Restart NetworkManager-managed interface (if used): sudo nmcli con up <connection_name>
- Bounce interface quickly (lab use): sudo ip link set dev <iface> down && sudo ip link set dev <iface> up
- Quick validation workflow:
ip a -> ip route -> ping <gateway_ip> -> ping 8.8.8.8
- If gateway ping fails, issue is usually local VLAN/NIC config.
- If gateway works but internet ping fails, issue is usually upstream routing/firewall.
Module 13 (5:20 - 5:40): DNS and Connectivity Troubleshooting
- Why this matters:
- Many “network” incidents are actually DNS failures, not transport failures.
- Name resolution checks:
- View resolver state: resolvectl status
- Query a record: dig example.com +short
- Compare with alternative tool: nslookup example.com
- Distinguish DNS failure vs network failure:
- Test IP reachability directly: ping 1.1.1.1
- Test name resolution path: ping example.com
- If IP works but hostname fails, focus on DNS config.
- Path checks:
- traceroute example.com to inspect hops.
- mtr -rw example.com for combined latency/loss view (if installed).
- HTTP/HTTPS endpoint checks:
- Headers/status only: curl -I https://example.com
- Verbose TLS/connection details: curl -v https://example.com
- Test from a specific interface/source IP: curl --interface <iface_or_ip> -I https://example.com
- Common issue patterns and fixes:
- Wrong DNS server in resolver config -> update NetworkManager/netplan config and re-test.
- Local firewall blocks egress DNS/HTTPS -> verify policy and retry.
- Proxy environment mismatch -> check http_proxy/https_proxy variables.
- Practical troubleshooting sequence:
ip a -> ip route -> ping gateway -> ping 1.1.1.1 -> dig example.com -> curl -I https://example.com
- Stop at first failure point and fix that layer before moving on.
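The stop-at-first-failure idea in the sequence above can be sketched as a small helper. The function name first_failure and the "label|command" entry format are assumptions made for this example, and the gateway address in the commented invocation is a documentation placeholder you would replace.

```shell
#!/usr/bin/env bash
set -uo pipefail

# Run "label|command" checks in order and stop at the first failing layer.
# Prints one line per check; returns 0 if all pass, 1 at the first failure.
first_failure() {
    local entry label cmd
    for entry in "$@"; do
        label="${entry%%|*}"      # text before the first "|"
        cmd="${entry#*|}"         # everything after the first "|"
        if bash -c "$cmd" >/dev/null 2>&1; then
            echo "PASS  $label"
        else
            echo "FAIL  $label  (fix this layer first: $cmd)"
            return 1
        fi
    done
    echo "All layers passed"
}

# Example invocation (set GATEWAY to your real default gateway first):
# GATEWAY=192.0.2.1
# first_failure \
#     "global address|ip -o -4 addr show scope global | grep -q ." \
#     "default route|ip route | grep -q '^default'" \
#     "gateway reachable|ping -c 2 -W 2 $GATEWAY" \
#     "internet by IP|ping -c 2 -W 2 1.1.1.1" \
#     "DNS resolution|dig +short example.com | grep -q ." \
#     "HTTPS endpoint|curl -fsSI --max-time 5 https://example.com"
```

Ordering the checks from the lowest layer upward means the first FAIL line points at the layer to fix, and later checks are not run against a path that is already known to be broken.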
Module 14 (5:40 - 6:00): Firewall Workflows
- Ubuntu UFW:
sudo ufw status verbose
sudo ufw allow 22/tcp
sudo ufw allow 80/tcp
- AlmaLinux firewalld:
sudo firewall-cmd --list-all
sudo firewall-cmd --add-service=ssh --permanent
sudo firewall-cmd --add-service=http --permanent
sudo firewall-cmd --reload
- Important Docker note (security-critical):
- Docker-published ports (-p) can bypass expected UFW/firewalld behavior because Docker manages iptables/nft chains directly.
- Result: a container port may become reachable even when host firewall policy appears restrictive.
- Safer ways to expose Docker ports:
- Bind container ports to loopback when external access is not needed:
docker run -d -p 127.0.0.1:8080:80 --name webapp nginx
- Reverse-proxy pattern:
- Keep app containers on internal Docker networks.
- Expose only one hardened ingress proxy (Nginx/Traefik/Caddy) to public ports.
- Restrict source IPs with firewall rules on the host where possible.
- Use explicit, minimal published ports instead of broad mappings.
- Verification steps after publishing ports:
- Check published ports: docker ps --format "table {{.Names}}\t{{.Ports}}"
- Check host listeners: ss -tulpen | grep -E ':80|:443|:8080'
- Validate firewall rules still match intended exposure.
- Operational best practices for Docker + firewall:
- Default-deny on host firewall and allow only required ports.
- Do not publish admin ports (databases, dashboards) directly to the internet.
- Document every published container port and business justification.
- Re-test exposure from an external host after each deployment.
graph TD
C[Container port] --> D[Docker publish -p]
D --> H[Host port]
H --> L[Bind to 127.0.0.1 for local only]
H --> W[Bind to 0.0.0.0 for public access]
H --> R[Reverse proxy entrypoint]
R --> F[UFW or firewalld]
F --> I[External client]
Hour 6 Lab Test
- Allow only SSH and HTTP through the firewall.
- Confirm listener and firewall rules.
- Validate remote access still works.
- Document one failed test and how you corrected it.
Hour 7 (6:00 - 7:00): Monitoring and Log Analysis
Module 15 (6:00 - 6:20): Logs and Journald
- Read recent logs: journalctl -xe.
- Filter by service and time:
journalctl -u ssh --since "30 min ago" (Ubuntu)
journalctl -u sshd --since "30 min ago" (AlmaLinux)
- Follow logs in real time: journalctl -f.
Module 16 (6:20 - 6:40): Performance Monitoring
- CPU and memory: top, htop, vmstat.
- Disk I/O: iostat.
- Load and uptime: uptime, w.
- Host metrics export (for centralized monitoring):
- Install and run Node Exporter on each Linux host.
- Validate exporter endpoint locally: curl http://127.0.0.1:9100/metrics | head.
- Monitoring data flow (common pattern):
- Node Exporter on host -> Prometheus scrape -> Grafana dashboards.
- Why this helps:
- Local commands are great for live troubleshooting.
- Centralized metrics are better for trend analysis and proactive alerting.
Module 17 (6:40 - 7:00): Alerting Mindset and Baselines
- Capture baseline metrics at normal load.
- Track trends, not just one-time spikes.
- Define simple thresholds for CPU, memory, disk, and service state.
- Send metrics to Grafana stack (practical baseline):
- Add host target to Prometheus scrape config (example target: server1:9100).
- Add Prometheus as a Grafana data source.
- Import/create dashboards for CPU, memory, disk, filesystem, and network.
- Alerting workflow:
- Prometheus evaluates alert rules.
- Alertmanager routes notifications (email/Slack/Teams/webhook).
- Grafana can also alert directly from panel queries if preferred.
- Starter alerts to implement first:
- CPU usage > 90% for 5 minutes.
- Filesystem usage > 85% for 10 minutes.
- Host unreachable (up == 0) for 2 minutes.
- SSH service down.
- Logs + metrics together:
- Optional logging path: Promtail/Fluent Bit -> Loki -> Grafana logs view.
- Correlate metric spikes with log events for faster root-cause analysis.
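Before any Prometheus rule exists, the "filesystem usage > 85%" threshold from the starter alerts can be spot-checked locally. This is only a sketch, not a replacement for real alerting: the over_threshold function name is hypothetical, and it assumes POSIX-format df -P output with no spaces in device names or mount points.

```shell
#!/usr/bin/env bash
set -uo pipefail

# Read "df -P" output on stdin and print mount points above the threshold.
# Returns 1 if at least one filesystem is over the limit, 0 otherwise.
over_threshold() {
    local limit="${1:-85}"
    awk -v limit="$limit" '
        NR > 1 {
            use = $5                  # Capacity column, e.g. "90%"
            sub(/%/, "", use)
            if (use + 0 > limit) {
                printf "%s is at %s%% (limit %s%%)\n", $6, use, limit
                found = 1
            }
        }
        END { exit found ? 1 : 0 }
    '
}

# Typical use on a host:
#   df -P | over_threshold 85
```

A one-off check like this has no "for 10 minutes" duration condition, which is one reason the centralized rule is still worth setting up.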
graph LR
H[Linux host] --> NE[Node Exporter]
NE --> P[Prometheus]
P --> G[Grafana dashboards]
P --> A[Alertmanager]
A --> N[Email / Slack / Teams]
H --> L[Promtail or Fluent Bit]
L --> LK[Loki]
LK --> G
Hour 7 Lab Test
- Simulate high CPU or memory load.
- Identify impact using monitoring commands.
- Confirm Node Exporter metrics are reachable on port 9100.
- Add the host to Prometheus and verify target health is up.
- Build/import a Grafana dashboard and verify live host metrics.
- Create one alert rule (for example: high CPU) and test notification routing.
- Correlate symptoms with logs.
- Produce a short incident summary with findings.
Hour 8 (7:00 - 8:00): Service Management and Boot Troubleshooting
Module 18 (7:00 - 7:20): systemd Deep Dive
- Unit types and dependencies.
- Common commands:
systemctl list-units --type=service
systemctl status <service>
systemctl restart <service>
Module 19 (7:20 - 7:40): Startup Control
- Enable/disable services:
systemctl enable <service>
systemctl disable <service>
- Mask/unmask for hard disable:
systemctl mask <service>
systemctl unmask <service>
Module 20 (7:40 - 8:00): Recovery Basics
- Troubleshoot failed services with journalctl -u <service>.
- Validate unit files with systemd-analyze verify.
- Rescue-mode concepts and safe rollback planning.
Hour 8 Lab Test
- Intentionally misconfigure a non-critical test service.
- Detect failure cause via systemctl and journalctl.
- Correct and restart service.
- Confirm successful boot persistence.
Hour 9 (8:00 - 9:00): Bash Scripting and Automation
Module 21 (8:00 - 8:20): Script Structure
- Shebang, variables, quoting, and exit codes.
- Input args and validation.
- Why script structure matters:
- Consistent structure reduces production mistakes and makes scripts easier to debug.
- Example skeleton:
#!/usr/bin/env bash
set -euo pipefail
usage() { echo "Usage: $0 <target_dir>"; }
if [[ $# -ne 1 ]]; then usage >&2; exit 1; fi
target_dir="$1"
- Quoting example (avoid word-splitting bugs):
- Good: cp "$src_file" "$target_dir/"
- Risky: cp $src_file $target_dir/
- Exit code pattern:
- 0 means success.
- Non-zero means failure and should be handled/logged.
Module 22 (8:20 - 8:40): Control Flow and Safety
- Conditionals (if) and loops (for, while).
- Safer scripts with set -euo pipefail.
- Logging and timestamped output.
- Example conditional check:
if systemctl is-active --quiet ssh; then echo "SSH OK"; else echo "SSH DOWN"; fi
- Example loop over checks:
for cmd in df free uptime; do echo "== $cmd =="; "$cmd"; done
- Safety explanation:
- -e: stop on command errors.
- -u: fail on undefined variables.
- -o pipefail: fail pipeline if any command fails.
- Timestamped logging example:
log_file="/var/log/health_check.log"
echo "$(date '+%F %T') INFO starting health check" | tee -a "$log_file"
- Basic failure handler pattern:
trap 'echo "$(date "+%F %T") ERROR line $LINENO" | tee -a "$log_file"' ERR
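The pieces above (strict mode, a conditional, a loop, timestamped logging, and an ERR trap) fit together roughly as follows. This is a sketch under assumptions: the log helper is a name invented here, and the log file defaults to a temporary file because writing to /var/log usually needs root.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: log to a user-writable file unless LOG_FILE is set.
log_file="${LOG_FILE:-$(mktemp)}"

# Write a timestamped line to the terminal and append it to the log file.
log() {
    local level="$1"
    shift
    echo "$(date '+%F %T') $level $*" | tee -a "$log_file"
}

# On any unhandled failing command, record the line number before exiting.
trap 'log ERROR "command failed on line $LINENO"' ERR

log INFO "starting health check"

if systemctl is-active --quiet ssh 2>/dev/null; then
    log INFO "SSH OK"
else
    log WARN "SSH DOWN (or the unit is named sshd on this distro)"
fi

for cmd in df free uptime; do
    if command -v "$cmd" >/dev/null 2>&1; then
        log INFO "running $cmd"
        "$cmd" >/dev/null || log WARN "$cmd exited with an error"
    else
        log WARN "$cmd not installed, skipping"
    fi
done

log INFO "health check finished"
```

Commands tested in an if condition or followed by || do not trigger -e or the ERR trap, which is why expected failures are handled that way here.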
Module 23 (8:40 - 9:00): Scheduling Automation
- Cron review and troubleshooting.
- Intro to systemd timers (optional advanced path).
- Cron example (daily at 06:00):
0 6 * * * /home/<user>/health_check.sh >> /home/<user>/health.log 2>&1
- Cron troubleshooting checklist:
- Use absolute paths in scripts (/usr/bin/df, /usr/bin/free) when needed.
- Ensure execute bit is set: chmod +x /home/<user>/health_check.sh.
- Confirm cron entry: crontab -l.
- Check logs:
- Ubuntu/Debian: grep CRON /var/log/syslog
- AlmaLinux/RHEL: sudo journalctl -u crond --since "1 hour ago"
- systemd timer mini-example (logs to the journal and can catch up on missed runs with Persistent=true):
- Service unit runs health_check.sh.
- Timer unit uses OnCalendar=*-*-* 06:00:00.
- Enable with sudo systemctl enable --now health_check.timer.
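The timer mini-example could look like the two unit files generated below. This is a sketch: the script location /usr/local/bin/health_check.sh and the unit descriptions are assumptions, and the files are written to a scratch directory by default so nothing is installed until you copy them yourself.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: the script path is a placeholder for wherever you installed it.
script_path="${SCRIPT_PATH:-/usr/local/bin/health_check.sh}"
# Write to a scratch directory by default; installing is a separate step.
unit_dir="${UNIT_DIR:-$(mktemp -d)}"

cat > "$unit_dir/health_check.service" <<EOF
[Unit]
Description=Daily system health check

[Service]
Type=oneshot
ExecStart=$script_path
EOF

cat > "$unit_dir/health_check.timer" <<EOF
[Unit]
Description=Run health_check.service every day at 06:00

[Timer]
OnCalendar=*-*-* 06:00:00
Persistent=true

[Install]
WantedBy=timers.target
EOF

echo "Units written to $unit_dir"
# To install (requires root):
#   sudo cp "$unit_dir"/health_check.{service,timer} /etc/systemd/system/
#   sudo systemctl daemon-reload
#   sudo systemctl enable --now health_check.timer
#   systemctl list-timers health_check.timer
```

A timer activates the service unit with the same base name by default, so no explicit Unit= line is needed in the timer file.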
Hour 9 Lab Test
- Build a script that checks disk, memory, and SSH service.
- Write output to dated files in a reports/ directory.
- Schedule it daily and verify execution.
Hour 10 (9:00 - 10:00): Security Hardening Essentials
Module 24 (9:00 - 9:20): SSH Hardening
- Disable direct root login.
- Prefer key-only auth where practical.
- Review sshd_config safely before restart.
- Why this matters:
- SSH is a primary attack path. Hardening it reduces brute-force and credential abuse risk.
- Recommended baseline settings in sshd_config:
PermitRootLogin no
PasswordAuthentication no (after key auth is confirmed)
PubkeyAuthentication yes
PermitEmptyPasswords no
- Optional: AllowUsers adminuser opsuser
- Safe change workflow (avoid lockout):
- Keep one active SSH session open.
- Edit config and validate syntax:
sudo sshd -t
- Reload service:
- Ubuntu: sudo systemctl reload ssh
- AlmaLinux: sudo systemctl reload sshd
- Test login in a second terminal before closing the first session.
- Verification examples:
- ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no user@host should fail when password auth is disabled.
- Check effective settings: sudo sshd -T | grep -E 'permitrootlogin|passwordauthentication|pubkeyauthentication'
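The effective-settings check above can be turned into a pass/fail audit against the recommended baseline. This is a sketch: audit_sshd is a hypothetical helper name, and it only compares the four baseline keys listed earlier, reading sshd -T style output from standard input.

```shell
#!/usr/bin/env bash
set -uo pipefail

# Read "sshd -T" style output on stdin and compare it with the baseline.
# Prints one line per setting and returns 1 if any value deviates.
audit_sshd() {
    local input status=0 key want got
    input="$(cat)"
    while read -r key want; do
        # sshd -T prints lowercase keys; take the first matching line.
        got="$(awk -v k="$key" 'tolower($1) == k { print tolower($2); exit }' <<<"$input")"
        if [[ "$got" == "$want" ]]; then
            echo "OK    $key $got"
        else
            echo "CHECK $key is '${got:-unset}', expected '$want'"
            status=1
        fi
    done <<'BASELINE'
permitrootlogin no
passwordauthentication no
pubkeyauthentication yes
permitemptypasswords no
BASELINE
    return "$status"
}

# Typical use on a server (root is needed to read the full config):
#   sudo sshd -T | audit_sshd
```

Running the audit after sudo sshd -t and before the reload gives a second confirmation that the edited file produces the settings you intended.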
- Add Fail2ban for brute-force protection:
- Why use it:
- Automatically bans abusive IPs after repeated failed logins.
- Adds a strong defensive layer for internet-exposed services.
- Install:
- Ubuntu: sudo apt install -y fail2ban
- AlmaLinux: sudo dnf install -y fail2ban (packaged in EPEL; install epel-release first if it is not found)
- Safe configuration model:
- Do not edit jail.conf directly.
- Create local overrides in jail.local or files in jail.d/.
- Enable and start:
sudo systemctl enable --now fail2ban
- Verify daemon health:
sudo fail2ban-client status
- Example jail.local baseline (service-specific):
[DEFAULT]
bantime = 1h
findtime = 10m
maxretry = 5
ignoreip = 127.0.0.1/8 ::1 <your_admin_ip_or_subnet>

[sshd]
enabled = true
port = ssh
logpath = %(sshd_log)s
backend = systemd

[nginx-http-auth]
enabled = true
port = http,https
logpath = /var/log/nginx/error.log

[apache-badbots]
enabled = true
port = http,https
logpath = /var/log/apache2/*error.log
# AlmaLinux: use /var/log/httpd/*error_log instead
- Apply and validate:
sudo systemctl restart fail2ban
sudo fail2ban-client status
sudo fail2ban-client status sshd
- Useful operations:
- List current bans: sudo fail2ban-client status <jail_name>
- Unban a host: sudo fail2ban-client set <jail_name> unbanip <ip>
- Safety notes:
- Set ignoreip before enabling aggressive jails to avoid self-lockout.
- Start with moderate maxretry and bantime, then tune from logs.
Module 25 (9:20 - 9:40): Least Privilege and sudo Hygiene
- Use dedicated admin users.
- Reduce broad sudo access.
- Audit privileged group membership regularly.
- Why this matters:
- Limiting privilege lowers blast radius if one account is compromised.
- Practical controls:
- Use personal named admin accounts, not shared admin logins.
- Put only required users in privileged groups:
- Ubuntu group: sudo
- AlmaLinux group: wheel
- Prefer command-specific sudo rules over full ALL access when possible.
- Audit examples:
- Check sudo/wheel membership:
getent group sudo
getent group wheel
- Review sudo permissions safely:
sudo visudo -c
sudo -l -U <username>
- Safer sudoers pattern example:
- Grant service restart only:
<username> ALL=(root) NOPASSWD: /bin/systemctl restart nginx
- Operational guidance:
- Remove unused accounts quickly.
- Enforce strong passwords and key passphrases.
- Review privileged access on a schedule (weekly/monthly).
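For the scheduled review, the getent output can be flattened into one line per privileged user so that it is easy to diff between audits. A minimal sketch follows; list_group_members is a hypothetical helper name, and it only sees supplementary members listed in the group entry, not users whose primary group is sudo or wheel.

```shell
# list_group_members: read `getent group` lines (name:passwd:gid:member,member)
# on stdin and print "group user" pairs, one per line.
list_group_members() {
    awk -F: '
        # Field 4 holds the comma-separated member list; skip empty groups.
        $4 != "" {
            n = split($4, members, ",")
            for (i = 1; i <= n; i++) print $1, members[i]
        }
    '
}
```

Example use: getent group sudo wheel | list_group_members > privileged_users.txt, then compare that file with the copy from the previous review.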
Module 26 (9:40 - 10:00): Patching and Vulnerability Hygiene
- Keep systems updated with apt/dnf.
- Remove unused packages/services.
- Document security changes and rollback plans.
- Why this matters:
- Many exploited vulnerabilities already have a vendor patch available, so unpatched hosts are the easy targets.
- Update workflow examples:
- Ubuntu:
sudo apt update
sudo apt list --upgradable
sudo apt upgrade -y
- AlmaLinux:
sudo dnf check-update
sudo dnf update -y
- Post-patch validation checklist:
- Verify critical services are running:
systemctl status ssh (Ubuntu)
systemctl status sshd (AlmaLinux)
- Check failed services after patch/reboot:
systemctl --failed
- Review recent errors:
journalctl -p err -b
- Minimize downtime and risk:
- Patch first in test/staging, then production.
- Use maintenance windows for kernel/glibc/major updates.
- Reboot only when required and validate app paths immediately.
- Optional automated patching tooling examples:
- Ubuntu:
sudo apt install -y unattended-upgrades
- AlmaLinux:
sudo dnf install -y dnf-automatic
- Schedule automated security updates after testing policy is defined.
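To document what a patch run actually changed, one approach is to snapshot the package list before and after and compare the two. The sketch below assumes dpkg-query (Ubuntu) or rpm (AlmaLinux) is available; the function names snapshot_packages and package_changes are hypothetical.

```shell
# snapshot_packages: write a sorted "name version" list for the local distro.
# Assumes dpkg-query (Ubuntu) or rpm (AlmaLinux) is installed.
snapshot_packages() {
    local out="$1"
    if command -v dpkg-query >/dev/null 2>&1; then
        dpkg-query -W -f='${Package} ${Version}\n' | sort > "$out"
    elif command -v rpm >/dev/null 2>&1; then
        rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE}\n' | sort > "$out"
    else
        echo "no supported package database found" >&2
        return 1
    fi
}

# package_changes: print entries that differ between two sorted snapshots.
# "-" marks the pre-patch entry, "+" marks the post-patch entry.
package_changes() {
    local before="$1" after="$2"
    comm -23 "$before" "$after" | sed 's/^/- /'
    comm -13 "$before" "$after" | sed 's/^/+ /'
}
```

A run might look like: snapshot_packages before.txt, apply updates, snapshot_packages after.txt, then package_changes before.txt after.txt >> security_change_notes.txt.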
Hour 10 Lab Test
- Enforce SSH key auth in a lab environment.
- Verify root SSH login is blocked.
- Confirm SSH hardening settings with sshd -T output checks.
- Audit users with sudo or wheel membership and remove one unnecessary privilege.
- Run patch workflow for your distro and capture package changes.
- Configure Fail2ban for SSH and one web service jail, then verify jail status.
- Validate services and document post-patch checks in security_change_notes.txt.
Hour 11 (10:00 - 11:00): Backup and Restore Operations
Module 27 (10:00 - 10:20): Backup Strategy Basics
- Define what to back up and recovery objectives.
- Differentiate config, data, and system-state backups.
- Why this matters:
- Backups are only valuable if recovery is fast and predictable.
- Core planning terms:
- RPO (Recovery Point Objective): how much data loss is acceptable.
- RTO (Recovery Time Objective): how quickly service must be restored.
- Practical backup scope guidance:
- Config: /etc, service unit overrides, application configs.
- Data: app data directories, databases, uploads.
- State/inventory: package lists, users/groups, cron/systemd timers.
- Backup policy example:
- Daily incremental backup, weekly full backup, 30-day retention.
- Keep at least one backup copy off-host.
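With a daily backup, the worst-case data loss is roughly 24 hours, so the policy above only satisfies an RPO of one day or longer. A quick way to check that the newest backup is still inside the RPO window is sketched below; backup_within_rpo is a hypothetical helper, and it relies on the -mmin test available in GNU and BSD find.

```shell
# backup_within_rpo: succeed if DIR holds at least one file modified
# within the last RPO_MINUTES minutes.
backup_within_rpo() {
    local dir="$1" rpo_minutes="$2"
    # -mmin -N matches files modified less than N minutes ago.
    [ -n "$(find "$dir" -type f -mmin "-$rpo_minutes" 2>/dev/null | head -n 1)" ]
}
```

For a 24-hour RPO, something like backup_within_rpo /backup/archives 1440 || echo "backup is older than RPO" could run from a scheduled job; the directory is the example path used later in this hour.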
Module 28 (10:20 - 10:40): Backup Tooling
- Sync workflows with rsync.
- Archive workflows with tar + compression.
- Exclusion lists and retention rotation concepts.
- rsync example (config + app data):
sudo rsync -aHAX --delete /etc /srv/app-data /backup/current/
- tar example (timestamped archive):
sudo tar -czf /backup/archives/etc-$(date +%F).tar.gz /etc
- Exclusion file example:
- /root/backup-excludes.txt with entries like *.tmp, cache/, node_modules/
- Use with rsync:
--exclude-from=/root/backup-excludes.txt
- Rotation example (keep last 7 daily archives; sudo is needed because the archives were created as root):
ls -1t /backup/archives/etc-*.tar.gz | tail -n +8 | xargs -r sudo rm -f
- Verification step:
tar -tzf /backup/archives/etc-$(date +%F).tar.gz | head
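The archive, verification, and rotation steps can be combined into two small functions for a scheduled job. This is a sketch rather than a finished backup tool: make_archive and rotate_archives are hypothetical names, and the rotation assumes archive file names contain no whitespace.

```shell
# make_archive: create DEST_DIR/PREFIX-YYYY-MM-DD.tar.gz from SRC and list-test it.
make_archive() {
    local src="$1" dest_dir="$2" prefix="$3" archive
    archive="$dest_dir/$prefix-$(date +%F).tar.gz"
    mkdir -p "$dest_dir" || return 1
    # -C keeps member paths relative (e.g. etc/hosts rather than /etc/hosts).
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # Reading the archive back catches truncated or corrupt output early.
    tar -tzf "$archive" >/dev/null || return 1
    echo "$archive"
}

# rotate_archives: keep the KEEP newest PREFIX-*.tar.gz files in DEST_DIR.
rotate_archives() {
    local dest_dir="$1" prefix="$2" keep="$3"
    # Newest first; everything after the first KEEP entries is deleted.
    ls -1t "$dest_dir/$prefix"-*.tar.gz 2>/dev/null |
        tail -n "+$((keep + 1))" |
        while IFS= read -r old; do
            rm -f -- "$old"
        done
}
```

Run as root (or via sudo) for /etc, for example make_archive /etc /backup/archives etc followed by rotate_archives /backup/archives etc 7.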
Module 29 (10:40 - 11:00): Restore Validation
- Practice restore in a clean location.
- Verify integrity and permissions after restore.
- Never trust a backup until restore is tested.
- Restore drill example:
- Create test restore path:
sudo mkdir -p /restore-test
- Extract backup:
sudo tar -xzf /backup/archives/etc-YYYY-MM-DD.tar.gz -C /restore-test
- Compare sample files:
diff -u /etc/hosts /restore-test/etc/hosts
- rsync restore example (this overwrites live files, so preview the changes first by adding --dry-run):
sudo rsync -aHAX /backup/current/etc/ /etc/
- Permission and ownership validation:
sudo find /restore-test -maxdepth 3 -printf '%M %u:%g %p\n' | head
- Service validation after restore:
systemctl --failed
- App-specific smoke test (login page/API health endpoint).
Backup and restore flow (Mermaid diagram source):
graph LR
  S[Live system] --> B[Backup job: rsync or tar]
  B --> R[Backup repository]
  R --> T[Restore test]
  T --> V[Validate files, permissions, and service]
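The restore drill can be automated so that every backup gets a full comparison instead of a single sample file. The following is a sketch under assumptions: verify_restore is a hypothetical name, the archive is expected to have been created from an absolute path (as in the tar example, which stores /etc as etc/), and a live directory may legitimately differ if files changed after the backup ran.

```shell
# verify_restore: extract ARCHIVE into RESTORE_DIR and compare it with SRC.
# SRC must be the absolute path that was archived, e.g. /etc.
verify_restore() {
    local archive="$1" src="$2" restore_dir="$3"
    mkdir -p "$restore_dir" || return 2
    # -p asks tar to preserve the permissions recorded in the archive.
    tar -xzpf "$archive" -C "$restore_dir" || return 2
    # tar strips the leading "/" when archiving, so /etc restores to RESTORE_DIR/etc.
    diff -r "$src" "$restore_dir$src"
}
```

Exit status 0 means the restored tree matches the source, 1 means diff found differences (listed on stdout), and 2 means extraction or comparison failed.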
Hour 11 Lab Test
- Define RPO/RTO targets for a sample service in 2-3 sentences.
- Back up /etc and one app data directory using both rsync and tar.
- Create an exclusion list and re-run backup.
- Simulate accidental deletion of test data.
- Restore from backup to a test path, then to the live path.
- Validate permissions, service health, and sample application data.
- Document backup frequency, retention, and restore results in backup_restore_notes.txt.
Hour 12 (11:00 - 12:00): Containers and DevOps Intro (Optional)
Module 30 (11:00 - 11:20): Container Fundamentals
- Images, containers, volumes, and networks.
- Docker vs Podman overview.
- Key concepts explained:
- Image: immutable template.
- Container: running instance of an image.
- Volume: persistent storage outside container lifecycle.
- Network: communication boundary between containers/services.
- Practical inspection commands:
- Docker: docker images, docker ps -a, docker volume ls, docker network ls
- Podman: podman images, podman ps -a, podman volume ls, podman network ls
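Because the inspection subcommands listed above are the same for both engines, a lab script can pick whichever one is installed. A minimal sketch, with pick_engine as a hypothetical helper name:

```shell
# pick_engine: print the first candidate command found on PATH.
# With no arguments it tries docker, then podman.
pick_engine() {
    [ "$#" -gt 0 ] || set -- docker podman
    local candidate
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    echo "no container engine found" >&2
    return 1
}
```

Example use: engine=$(pick_engine) && "$engine" ps -a. This only covers the subcommands shown in this module; other features can differ between the two engines.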
Module 31 (11:20 - 11:40): Running Containerized Services
- Pull, run, inspect, logs, stop/remove lifecycle.
- Persist data with volumes.
- Docker quick example (nginx):
- Pull image:
docker pull nginx:alpine
- Run container:
docker run -d --name web1 -p 8080:80 nginx:alpine
- Inspect:
docker inspect web1 | head
- Logs:
docker logs --tail 50 web1
- Stop/remove:
docker stop web1 && docker rm web1
- Volume persistence example:
docker volume create webdata
docker run -d --name web2 -p 8081:80 -v webdata:/usr/share/nginx/html nginx:alpine
- Safer publishing reminder:
- Bind locally if public exposure is not needed:
-p 127.0.0.1:8080:80
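The volume and local-bind examples can be wrapped in a small script with a dry-run switch, so the exact commands can be reviewed before anything is started. This is a sketch: run, start_web, stop_web, and the DRY_RUN and ENGINE variables are all hypothetical names introduced for illustration.

```shell
ENGINE="${ENGINE:-docker}"   # set ENGINE=podman to use Podman instead

# run: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# start_web: run nginx with a named volume, published on localhost only.
start_web() {
    local name="$1" host_port="$2" volume="$3"
    run "$ENGINE" volume create "$volume" &&
    run "$ENGINE" run -d --name "$name" \
        -p "127.0.0.1:${host_port}:80" \
        -v "${volume}:/usr/share/nginx/html" \
        nginx:alpine
}

# stop_web: stop and remove the container; the named volume is left in place.
stop_web() {
    local name="$1"
    run "$ENGINE" stop "$name" && run "$ENGINE" rm "$name"
}
```

Setting DRY_RUN=1 and calling start_web web2 8081 webdata prints the two commands instead of running them. Because stop_web leaves the volume in place, a later start_web with the same volume name serves the same content.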
Module 32 (11:40 - 12:00): Operational Patterns
- Environment variables and secrets basics.
- Intro to declarative infrastructure mindset.
- Environment variable example:
docker run -d --name api -e APP_ENV=prod -e LOG_LEVEL=info my-api:latest
- Secret handling guidance:
- Do not bake secrets into images.
- Prefer secret stores, runtime injection, or mounted secret files.
- Declarative mindset example:
- Define service config in compose/manifests.
- Version-control infra definitions and review changes before apply.
- Health and restart policy basics:
- Add health checks where possible.
- Use restart policies for resilience (--restart unless-stopped).
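The environment, restart-policy, and health-check points above can be captured declaratively in one file. The sketch below writes an illustrative compose.yaml; write_compose is a hypothetical helper, the APP_ENV and LOG_LEVEL values are placeholders (nginx does not read them), and the health check assumes the image ships a wget that supports --spider.

```shell
# write_compose: write a small compose.yaml into DIR (default: current directory).
write_compose() {
    local dir="${1:-.}"
    # The quoted 'EOF' keeps ${...} literal so Compose, not the shell, expands it.
    cat > "$dir/compose.yaml" <<'EOF'
services:
  web:
    image: nginx:alpine
    ports:
      - "127.0.0.1:8080:80"
    environment:
      APP_ENV: ${APP_ENV:-prod}
      LOG_LEVEL: info
    volumes:
      - webdata:/usr/share/nginx/html
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://127.0.0.1/"]
      interval: 30s
      timeout: 5s
      retries: 3

volumes:
  webdata:
EOF
}
```

Keeping the generated file in version control and reviewing changes before running docker compose up -d is one way to practice the declarative workflow. Secrets should still come from a secret store or mounted files, not from this file.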
Hour 12 Lab Test
- Pull and run a web container with a named volume.
- Map host port and verify with curl -I http://localhost:<port>.
- Restart container and verify persisted content still exists.
- Capture and interpret logs for one normal event and one error event.
- Run a second container with environment variables and inspect effective config.
- Write a short compose.yaml or equivalent manifest for repeatable deployment.
📘 Additional Standalone Sections
Common Failure Playbooks
- SSH lockout after config changes.
- Disk full conditions and emergency cleanup.
- DNS resolution failures.
- Service crash loops and restart storms.
Command Cheat Sheets by Theme
- Networking quick commands.
- Storage/filesystem quick commands.
- Security hardening quick commands.
- Logs and monitoring quick commands.
Ubuntu vs AlmaLinux Quick Reference
- Package manager: apt vs dnf.
- SSH service name: ssh vs sshd.
- Firewall tools: ufw vs firewalld.
- Admin group convention: sudo vs wheel.
Pre-Change and Post-Change Checklists
- Pre-change: capture baseline and rollback steps.
- During change: one variable at a time, log actions.
- Post-change: verify services, logs, connectivity, and monitoring.
🧪 Final Capstone (2-3 Hours)
Build and operate a complete Linux server workflow from scratch:
- Provision Ubuntu or AlmaLinux VM and harden SSH.
- Create admin and non-admin users with proper group membership.
- Configure firewall rules for SSH and a web service.
- Deploy and enable a simple service at boot.
- Add monitoring script + scheduled execution.
- Set up backup job for config and app data.
- Simulate one outage scenario and recover service.
- Produce a final operations report with commands used, findings, and lessons learned.