Deep Debugging Report - Port Conflict Resolution

Date: December 17, 2025
Issue: Backend service failing to start with "Address already in use" error
Status: RESOLVED with safeguards implemented


🔍 ROOT CAUSE ANALYSIS

The Problem

The backend systemd service (church-music-backend.service) was failing repeatedly with the following error:

[ERROR] Connection in use: ('127.0.0.1', 8080)
[ERROR] connection to ('127.0.0.1', 8080) failed: [Errno 98] Address already in use
[ERROR] Can't connect to ('127.0.0.1', 8080)

Investigation Process

  1. Service Status Check

    • Backend service in failed state after 5 restart attempts
    • Systemd restart limit reached (StartLimitBurst=5)
    • Exit code 1 (FAILURE)
  2. Log Analysis

    • Error logs showed consistent port 8080 binding failures
    • No application errors - purely infrastructure issue
    • Repeated retry attempts over ~90 seconds
  3. Port Analysis

    sudo lsof -i :8080
    # Found: python 17329 pts - python app.py
    
  4. Process Investigation

    ps aux | grep 17329
    # Result: python app.py running as development server
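
The port and process lookups above can be combined into one helper. This is a minimal sketch, assuming Linux with /proc mounted and lsof installed; the function names cmdline_of and who_listens_on are hypothetical and are not part of the project scripts.

```bash
# Sketch only: helper names are hypothetical, not part of the project scripts.

# Print the full command line of a PID by reading /proc (Linux only).
# Returns non-zero if the process does not exist.
cmdline_of() {
    local pid="$1"
    [ -r "/proc/$pid/cmdline" ] || return 1
    # Arguments in /proc/<pid>/cmdline are NUL-separated; make them readable.
    tr '\0' ' ' < "/proc/$pid/cmdline"
    echo
}

# List every process listening on a TCP port together with its command line.
# `lsof -t` prints bare PIDs; -sTCP:LISTEN skips established client connections.
who_listens_on() {
    local port="$1" pid
    for pid in $(lsof -t -iTCP:"$port" -sTCP:LISTEN 2>/dev/null); do
        printf '%s\t%s\n' "$pid" "$(cmdline_of "$pid")"
    done
}
```

Calling who_listens_on 8080 from a root shell would print one line per listener. Without root, lsof typically reports only the caller's own processes.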
    

Root Cause Identified

A Flask development server (python app.py) was running in the background, occupying port 8080 and preventing the production Gunicorn service from starting.

How it happened:

  • The start-dev-mode.sh script starts python app.py in background
  • No cleanup when switching to production mode
  • No collision detection between dev and production modes
  • Process persisted after the shell session that launched it had ended

🛠️ FIXES IMPLEMENTED

1. Immediate Fix: Kill Rogue Process

sudo kill 17329  # Freed port 8080
sudo systemctl reset-failed church-music-backend.service
sudo systemctl start church-music-backend.service

Result: Backend service started successfully

2. Systemd Service Enhancement

File: church-music-backend.service

Added pre-start check:

ExecStartPre=/media/pts/Website/Church_HOP_MusicData/backend/pre-start-check.sh

This script:

  • Checks if port 8080 is in use before starting
  • Kills any rogue processes (except systemd services)
  • Prevents startup if port can't be freed
  • Logs all actions for debugging

File: backend/pre-start-check.sh
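
The script itself is not reproduced in this report. The following is a minimal sketch of how such a check might be structured, not the actual pre-start-check.sh. It assumes lsof is installed and that /proc/<pid>/cgroup names the owning systemd unit, as it does on typical systemd hosts. All function names are hypothetical.

```bash
# Hypothetical sketch of a pre-start check; the real pre-start-check.sh is not
# shown in this report and may work differently.
PORT="${PORT:-8080}"
UNIT="${UNIT:-church-music-backend.service}"

log() { echo "[pre-start-check] $*" >&2; }

# PIDs listening on the port (empty output means the port is free).
listeners() { lsof -t -iTCP:"$1" -sTCP:LISTEN 2>/dev/null; }

# True if the cgroup text (contents of /proc/<pid>/cgroup) places the
# process inside the given systemd unit.
cgroup_has_unit() {
    printf '%s\n' "$1" | grep -qF -- "/$2"
}

# Free the port: leave the unit's own processes alone, terminate anything else.
free_port() {
    local pid
    for pid in $(listeners "$PORT"); do
        if cgroup_has_unit "$(cat "/proc/$pid/cgroup" 2>/dev/null)" "$UNIT"; then
            log "PID $pid belongs to $UNIT, leaving it alone"
        else
            log "terminating rogue PID $pid"
            kill "$pid" 2>/dev/null
        fi
    done
    sleep 1
    # Succeed only if nothing is listening any more.
    [ -z "$(listeners "$PORT")" ]
}
```

A script built from these helpers would end with `free_port || exit 1`. A non-zero exit from an ExecStartPre= command makes systemd abort the start, which matches the "prevents startup if port can't be freed" behaviour described above.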

3. Port Cleanup Utility

File: cleanup-ports.sh

Comprehensive port management script:

  • Checks ports 8080 (backend) and 5100 (frontend)
  • Identifies processes using each port
  • Distinguishes between systemd services and rogue processes
  • Safely kills only non-systemd processes
  • Cleans up stale PID files
  • Color-coded output for clarity

Usage:

./cleanup-ports.sh
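
The stale PID file step can be illustrated on its own. This is a sketch under assumptions: the report does not say where the PID files live, so the paths are passed as arguments, and the function names are hypothetical.

```bash
# Sketch of the stale-PID-file step only; file locations are assumptions.

# True if the PID recorded in the file refers to a process we can signal.
# Note: kill -0 also fails for live processes owned by another user,
# so run this as the owner of the dev processes or as root.
pidfile_is_live() {
    local pid
    pid="$(cat "$1" 2>/dev/null)" || return 1
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Remove each PID file whose process is gone; keep the ones still in use.
remove_stale_pidfiles() {
    local f
    for f in "$@"; do
        [ -e "$f" ] || continue
        if pidfile_is_live "$f"; then
            echo "in use: $f (PID $(cat "$f"))"
        else
            rm -f -- "$f"
            echo "removed stale: $f"
        fi
    done
}
```

The check uses signal 0, which tests for the existence of a process without delivering anything to it.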

4. Development Mode Safeguards

File: start-dev-mode.sh

Enhanced with:

  • Production service detection: Warns if systemd services are running
  • Interactive prompt: Asks permission to stop production services
  • Old process cleanup: Kills previous dev mode processes
  • PID file management: Removes stale PID files
  • Clear status display: Shows running services and how to stop them
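
The production check and prompt could look roughly like this. It is a sketch, not the contents of start-dev-mode.sh: the unit names come from this report, while the helper names and prompt wording are hypothetical.

```bash
# Sketch of the production check in a dev-mode launcher. Unit names come from
# this report; helper names and prompt wording are hypothetical.
PROD_UNITS="church-music-backend church-music-frontend"

# Print the production units that systemd reports as active.
active_prod_units() {
    local unit
    for unit in $PROD_UNITS; do
        systemctl is-active --quiet "$unit" && echo "$unit"
    done
    return 0
}

# Ask a yes/no question on stdin; only an explicit y/Y/yes counts as consent.
confirm() {
    local reply
    printf '%s [y/N] ' "$1" >&2
    read -r reply || return 1
    case "$reply" in
        y|Y|yes|YES) return 0 ;;
        *) return 1 ;;
    esac
}

# Stop production services after consent, or refuse to continue.
ensure_prod_stopped() {
    local active
    active="$(active_prod_units)"
    [ -z "$active" ] && return 0
    echo "Production services are running: $active" >&2
    confirm "Stop them and continue in development mode?" || return 1
    # Word splitting of $active is intentional: one unit name per word.
    sudo systemctl stop $active
}
```

Defaulting to "no" means an accidental Enter never stops production.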

File: stop-dev-mode.sh (NEW)

Properly stops development mode:

  • Kills backend and frontend dev processes
  • Removes PID files
  • Kills any stray processes
  • Prevents port conflicts
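
A graceful stop driven by a PID file might be sketched as follows. The function names and the timeout are illustrative and not taken from stop-dev-mode.sh.

```bash
# Sketch of a graceful stop driven by a PID file; names and timings are
# illustrative and not taken from stop-dev-mode.sh.

# Send SIGTERM, wait up to $2 seconds (default 5), then fall back to SIGKILL.
stop_pid() {
    local pid="$1" timeout="${2:-5}" waited=0
    # If the signal cannot be sent, the process is gone (or not ours to signal).
    kill "$pid" 2>/dev/null || return 0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            kill -9 "$pid" 2>/dev/null
            break
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Stop the process named in a PID file, then remove the file.
stop_from_pidfile() {
    local file="$1" pid
    [ -f "$file" ] || return 0
    pid="$(cat "$file")"
    [ -n "$pid" ] && stop_pid "$pid" "${2:-5}"
    rm -f -- "$file"
}
```

Trying SIGTERM first gives the dev server a chance to close its listening socket cleanly. SIGKILL is kept as a last resort.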

5. Documentation Updates


🔒 SAFEGUARDS ADDED

1. Pre-Start Port Validation

  • Automatic port conflict detection
  • Kills rogue processes before service start
  • Prevents "Address already in use" errors
  • Logged for audit trail

2. Dev/Production Separation

  • Development mode checks for production services
  • Interactive warning system
  • Cannot run both modes simultaneously
  • Clear error messages

3. Process Tracking

  • PID files for development mode
  • Automatic cleanup of stale PIDs
  • Process state validation

4. Monitoring & Diagnostics

  • Enhanced logging in service files
  • Dedicated cleanup script
  • Verification script for WebSocket fix
  • Clear error messages with solutions

🧪 VERIFICATION TESTS

Test 1: Service Startup

sudo systemctl status church-music-backend

Result: Active (running) with pre-start check successful

Test 2: API Endpoints

curl http://localhost:8080/api/health

Result: {"status":"ok","ts":"2025-12-17T07:24:06.301875"}

Test 3: HTTPS Access

curl -I https://houseofprayer.ddns.net/

Result: HTTP/2 200

Test 4: No Port Conflicts

sudo lsof -i :8080

Result: Only gunicorn workers (systemd service)

Test 5: Pre-Start Check

sudo systemctl restart church-music-backend
systemctl status church-music-backend | grep ExecStartPre

Result: ExecStartPre=/media/pts/Website/Church_HOP_MusicData/backend/pre-start-check.sh (code=exited, status=0/SUCCESS)


📊 FAILURE POINTS ANALYSIS

Identified Failure Points

  1. Port Binding

    • Risk: Multiple processes competing for same port
    • Mitigation: Pre-start port check, automatic cleanup
    • Detection: Service fails immediately with clear error
  2. Development vs Production Conflict

    • Risk: Running both modes simultaneously
    • Mitigation: Interactive warnings, automatic detection
    • Detection: start-dev-mode.sh checks systemd services
  3. Zombie Processes

    • Risk: Background processes persisting after crashes (strictly orphaned processes rather than zombies: they keep running and keep holding the port)
    • Mitigation: PID tracking, automatic cleanup
    • Detection: cleanup-ports.sh finds and kills
  4. Service Restart Limits

    • Risk: Hitting StartLimitBurst causing permanent failure
    • Mitigation: Pre-start checks prevent repeated failures
    • Recovery: Manual reset with systemctl reset-failed
  5. Missing Dependencies

    • Risk: Backend starts before database ready
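    • Mitigation: After=postgresql.service in service file (After= sets ordering only; it does not start PostgreSQL by itself, which takes Wants= or Requires=)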
    • Detection: Backend logs show connection errors

Monitoring Recommendations

  1. Port Monitoring

    # Add to cron for automated monitoring
    */5 * * * * /media/pts/Website/Church_HOP_MusicData/cleanup-ports.sh
    
  2. Service Health Checks

    curl http://localhost:8080/api/health
    
  3. Log Monitoring

    sudo journalctl -u church-music-backend -f
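
The health check in item 2 can be scripted so that it retries and returns an exit code. This is a sketch: the endpoint and the "status":"ok" field come from this report, while the helper names and retry settings are assumptions.

```bash
# Sketch of a scripted health check. The endpoint and the "status":"ok" field
# come from this report; helper names and retry settings are assumptions.
HEALTH_URL="${HEALTH_URL:-http://localhost:8080/api/health}"

# True if a health response body reports status "ok".
health_body_ok() {
    printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Poll the endpoint until it is healthy or the attempts run out.
wait_for_health() {
    local attempts="${1:-10}" delay="${2:-2}" i=1 body
    while [ "$i" -le "$attempts" ]; do
        # -f: fail on HTTP errors, -sS: quiet but show errors,
        # --max-time: cap each try at 5 seconds.
        if body="$(curl -fsS --max-time 5 "$HEALTH_URL" 2>/dev/null)" \
            && health_body_ok "$body"; then
            echo "healthy after $i attempt(s)"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "still unhealthy after $attempts attempts" >&2
    return 1
}
```

Calling wait_for_health 5 2 right after a restart would allow roughly ten seconds for Gunicorn to come up before reporting failure.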
    

📝 USAGE GUIDE

Production Mode

# Start services
sudo systemctl start church-music-backend
sudo systemctl start church-music-frontend

# Check status
sudo systemctl status church-music-backend
sudo systemctl status church-music-frontend

# View logs
sudo journalctl -u church-music-backend -f

Development Mode

# Start (will check for conflicts)
./start-dev-mode.sh

# Stop
./stop-dev-mode.sh

# View logs
tail -f /tmp/church-*.log

Troubleshooting

# Clean up port conflicts
./cleanup-ports.sh

# Reset failed services
sudo systemctl reset-failed church-music-backend

# Verify WebSocket fix (for frontend)
./verify-websocket-fix.sh

📈 IMPROVEMENTS SUMMARY

Before

  • Port conflicts caused service failures
  • No detection of dev/prod conflicts
  • Manual cleanup required
  • Difficult to diagnose issues
  • Zombie processes persisted

After

  • Automatic port conflict resolution
  • Dev/prod conflict detection and warnings
  • Automated cleanup scripts
  • Clear error messages and logs
  • Automatic zombie process cleanup
  • Pre-start validation
  • Comprehensive documentation

🎯 LESSONS LEARNED

  1. Always validate port availability before binding

    • Implement pre-start checks in systemd services
    • Log port conflicts with process details
  2. Separate development and production environments

    • Never mix dev and prod processes
    • Implement conflict detection
    • Clear documentation of each mode
  3. Track background processes properly

    • Use PID files for all background processes
    • Clean up PIDs on exit
    • Validate process state before operations
  4. Provide clear error messages

    • Log what's wrong and how to fix it
    • Include process details in errors
    • Offer automated solutions
  5. Document everything

    • Usage guides for operators
    • Troubleshooting steps
    • Architecture decisions
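
Lesson 3 can be illustrated with a small wrapper that writes a PID file and always removes it on exit. This is a sketch only: the wrapper name and any PID file path are hypothetical and do not describe how the project's scripts are written.

```bash
# Sketch of PID-file bookkeeping for a long-running command. The wrapper name
# and the PID-file path are hypothetical.

# Run a command, record its PID while it runs, and remove the PID file when
# the command ends or the wrapper is interrupted.
run_with_pidfile() {
    local pidfile="$1"
    shift
    (
        "$@" &
        child=$!
        echo "$child" > "$pidfile"
        # On INT/TERM, pass the signal on to the child.
        trap 'kill "$child" 2>/dev/null' INT TERM
        # On any exit of this subshell, drop the PID file.
        trap 'rm -f -- "$pidfile"' EXIT
        # The subshell's exit status is the command's exit status.
        wait "$child"
    )
}
```

The wrapper blocks until the command finishes, so a launcher would call it with a trailing & to keep the command in the background.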

Created/Updated

  1. cleanup-ports.sh - Port conflict resolution
  2. backend/pre-start-check.sh - Service pre-start validation
  3. start-dev-mode.sh - Enhanced with safeguards
  4. stop-dev-mode.sh - Proper cleanup
  5. church-music-backend.service - Added pre-start check
  6. WEBSOCKET_HTTPS_FIX.md - WebSocket security fix
  7. STATUS.md - Updated system status

Configuration Files


FINAL STATUS

Backend Service: Running (with pre-start protection)
Frontend Service: Running (production build)
WebSocket Error: Fixed (no dev server in production)
Port Conflicts: Prevented (automatic cleanup)
Documentation: Complete
Safeguards: Implemented

System Status: FULLY OPERATIONAL with enhanced reliability


🆘 EMERGENCY PROCEDURES

If services fail to start:

  1. Quick Fix

    ./cleanup-ports.sh
    sudo systemctl reset-failed church-music-backend
    sudo systemctl start church-music-backend
    
  2. Check Logs

    sudo journalctl -u church-music-backend --no-pager | tail -50
    
  3. Manual Port Check

    sudo lsof -i :8080
    sudo kill <PID>  # If rogue process found; add -9 only if it ignores SIGTERM
    
  4. Restart All

    ./stop-dev-mode.sh
    sudo systemctl restart church-music-backend
    sudo systemctl restart church-music-frontend
    

Author: GitHub Copilot (Claude Sonnet 4.5)
Date: December 17, 2025
Status: Production Ready