# Deep Debugging Report - Port Conflict Resolution
**Date:** December 17, 2025
**Issue:** Backend service failing to start with "Address already in use" error
**Status:** ✅ RESOLVED with safeguards implemented

---
## 🔍 ROOT CAUSE ANALYSIS
### The Problem
The backend systemd service (`church-music-backend.service`) was failing repeatedly with the error:
```
[ERROR] Connection in use: ('127.0.0.1', 8080)
[ERROR] connection to ('127.0.0.1', 8080) failed: [Errno 98] Address already in use
[ERROR] Can't connect to ('127.0.0.1', 8080)
```
### Investigation Process
1. **Service Status Check**
- Backend service in failed state after 5 restart attempts
- Systemd restart limit reached (StartLimitBurst=5)
- Exit code 1 (FAILURE)
2. **Log Analysis**
- Error logs showed consistent port 8080 binding failures
- No application errors - purely infrastructure issue
- Repeated retry attempts over ~90 seconds
3. **Port Analysis**
```bash
sudo lsof -i :8080
# Found: python 17329 pts - python app.py
```
4. **Process Investigation**
```bash
ps aux | grep 17329
# Result: python app.py running as development server
```
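The same lookup can be done with `ss` where `lsof` is not installed. This is a minimal sketch and was not part of the original investigation: `listener_pids` is a hypothetical helper that pulls listener PIDs for one port out of `ss -ltnp` output, assuming the usual iproute2 column layout.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the PIDs listening on a TCP port.
# Reads `ss -ltnp` style output on stdin.
# Usage: sudo ss -ltnp | listener_pids 8080
listener_pids() {
    local port="$1"
    awk -v port="$port" '
        # Field 4 is "Local Address:Port"; match only the exact port suffix.
        $1 == "LISTEN" && $4 ~ (":" port "$") {
            line = $0
            # A socket can be shared, so collect every pid=N on the line.
            while (match(line, /pid=[0-9]+/)) {
                print substr(line, RSTART + 4, RLENGTH - 4)
                line = substr(line, RSTART + RLENGTH)
            }
        }' | sort -un
}
```

An empty result means nothing is listening on that port, or that `ss` was run without the privileges needed to see the owning process.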
### Root Cause Identified
**A Flask development server (`python app.py`) was running in the background**, occupying port 8080 and preventing the production Gunicorn service from starting.
**How it happened:**
- The `start-dev-mode.sh` script starts `python app.py` in background
- No cleanup when switching to production mode
- No collision detection between dev and production modes
- Process kept running after the session that launched it ended
---
## 🛠️ FIXES IMPLEMENTED
### 1. Immediate Fix: Kill Rogue Process
```bash
sudo kill 17329 # Freed port 8080
sudo systemctl reset-failed church-music-backend.service
sudo systemctl start church-music-backend.service
```
**Result:** ✅ Backend service started successfully
### 2. Systemd Service Enhancement
**File:** [church-music-backend.service](church-music-backend.service)
Added pre-start check:
```ini
ExecStartPre=/media/pts/Website/Church_HOP_MusicData/backend/pre-start-check.sh
```
This script:
- Checks if port 8080 is in use before starting
- Kills any rogue processes (except systemd services)
- Prevents startup if port can't be freed
- Logs all actions for debugging
**File:** [backend/pre-start-check.sh](backend/pre-start-check.sh)
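
The contents of `pre-start-check.sh` are not reproduced in this report. The sketch below covers only the "refuse to start if the port is still busy" step, assuming bash and its `/dev/tcp` redirection. All function names are hypothetical, and the process-killing step is left out.

```bash
#!/usr/bin/env bash
# Sketch only: the real pre-start-check.sh is not shown in this report.

port_in_use() {
    # Succeeds if a TCP connect to host:port is accepted (bash /dev/tcp feature).
    local host="$1" port="$2"
    (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

wait_until_free() {
    # Poll once per second, up to $3 attempts, until nothing accepts connections.
    local host="$1" port="$2" tries="${3:-5}" i=0
    while [ "$i" -lt "$tries" ]; do
        port_in_use "$host" "$port" || return 0
        i=$((i + 1))
        sleep 1
    done
    return 1
}

pre_start_check() {
    local host="${1:-127.0.0.1}" port="${2:-8080}"
    if wait_until_free "$host" "$port" 5; then
        echo "pre-start: ${host}:${port} is free"
        return 0
    fi
    echo "pre-start: ${host}:${port} still in use, refusing to start" >&2
    return 1
}

# As an ExecStartPre script, the last line would be: pre_start_check || exit 1
```

systemd treats a non-zero exit from `ExecStartPre=` as a failed start, so returning 1 here is what keeps Gunicorn from attempting the bind.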
### 3. Port Cleanup Utility
**File:** [cleanup-ports.sh](cleanup-ports.sh)
Comprehensive port management script:
- Checks ports 8080 (backend) and 5100 (frontend)
- Identifies processes using each port
- Distinguishes between systemd services and rogue processes
- Safely kills only non-systemd processes
- Cleans up stale PID files
- Color-coded output for clarity
Usage:
```bash
./cleanup-ports.sh
```
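How `cleanup-ports.sh` tells a systemd service from a rogue process is not shown here. One possible approach, sketched below, is to look at the process's cgroup. It assumes that services started by systemd sit under `/system.slice/<name>.service`; the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: cleanup-ports.sh itself is not shown in this report.

classify_cgroup() {
    # Reads the contents of /proc/<pid>/cgroup on stdin.
    # Assumption: system services live under /system.slice/<name>.service.
    if grep -Eq '/system\.slice/[^/]+\.service'; then
        echo "systemd"
    else
        echo "rogue"
    fi
}

classify_pid() {
    # Classify a live process by PID; prints "gone" if it has already exited.
    local pid="$1"
    if [ -r "/proc/${pid}/cgroup" ]; then
        classify_cgroup < "/proc/${pid}/cgroup"
    else
        echo "gone"
    fi
}
```

Only PIDs classified as `rogue` would be candidates for termination. A dev server started from a terminal normally lands under `user.slice` and is therefore reported as `rogue`.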
### 4. Development Mode Safeguards
**File:** [start-dev-mode.sh](start-dev-mode.sh)
Enhanced with:
- **Production service detection**: Warns if systemd services are running
- **Interactive prompt**: Asks permission to stop production services
- **Old process cleanup**: Kills previous dev mode processes
- **PID file management**: Removes stale PID files
- **Clear status display**: Shows running services and how to stop them
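
The detection and prompt steps listed above could look roughly like the following. This is a sketch under assumptions, not the contents of `start-dev-mode.sh`: the unit names come from this report, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: start-dev-mode.sh itself is not shown in this report.
# Unit names taken from this report; adjust if yours differ.
PROD_UNITS="church-music-backend church-music-frontend"

active_units() {
    # Print each unit that systemd currently reports as active.
    local unit
    for unit in $PROD_UNITS; do
        if systemctl is-active --quiet "$unit" 2>/dev/null; then
            echo "$unit"
        fi
    done
}

confirm() {
    # Ask a yes/no question; anything other than y/yes counts as "no".
    local reply
    printf '%s [y/N] ' "$1" >&2
    read -r reply || return 1
    case "$reply" in
        y|Y|yes|YES) return 0 ;;
        *) return 1 ;;
    esac
}

guard_dev_start() {
    # Returns 0 when it is safe to start dev mode, non-zero otherwise.
    local running
    running="$(active_units)"
    [ -z "$running" ] && return 0
    echo "Production services are running: $running" >&2
    confirm "Stop them and continue in dev mode?" || return 1
    # Intentionally unquoted: one argument per unit name.
    sudo systemctl stop $running
}
```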
**File:** [stop-dev-mode.sh](stop-dev-mode.sh) (NEW)
Properly stops development mode:
- Kills backend and frontend dev processes
- Removes PID files
- Kills any stray processes
- Prevents port conflicts
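
Stale PID file handling, mentioned for both scripts, can be sketched as follows. The PID file location is not given in this report, so any path passed in (for example `/tmp/church-backend.pid`) is a hypothetical one, as are the function names.

```bash
#!/usr/bin/env bash
# Sketch only: the dev-mode scripts themselves are not shown in this report.

pid_alive() {
    # kill -0 sends no signal; it succeeds only if the PID exists
    # and the caller is allowed to signal it.
    [ -n "$1" ] && kill -0 "$1" 2>/dev/null
}

remove_if_stale() {
    # Returns 1 and leaves the file alone if the recorded process is running;
    # otherwise deletes the PID file.
    local pidfile="$1" pid
    [ -f "$pidfile" ] || return 0
    pid="$(cat "$pidfile")"
    if pid_alive "$pid"; then
        echo "running: $pid"
        return 1
    fi
    rm -f "$pidfile"
    echo "removed stale $pidfile"
}
```

Because `kill -0` also fails for processes owned by another user, this check is only reliable when the script runs as the same user that started dev mode, or as root.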
### 5. Documentation Updates
- [WEBSOCKET_HTTPS_FIX.md](WEBSOCKET_HTTPS_FIX.md) - WebSocket security fix
- [STATUS.md](STATUS.md) - Updated system status
- This file - Comprehensive debugging documentation
---
## 🔒 SAFEGUARDS ADDED
### 1. Pre-Start Port Validation
- Automatic port conflict detection
- Kills rogue processes before service start
- Prevents "Address already in use" errors
- Logged for audit trail
### 2. Dev/Production Separation
- Development mode checks for production services
- Interactive warning system
- Cannot run both modes simultaneously
- Clear error messages
### 3. Process Tracking
- PID files for development mode
- Automatic cleanup of stale PIDs
- Process state validation
### 4. Monitoring & Diagnostics
- Enhanced logging in service files
- Dedicated cleanup script
- Verification script for WebSocket fix
- Clear error messages with solutions
---
## 🧪 VERIFICATION TESTS
### Test 1: Service Startup
```bash
sudo systemctl status church-music-backend
```
**Result:** ✅ Active (running) with pre-start check successful
### Test 2: API Endpoints
```bash
curl http://localhost:8080/api/health
```
**Result:** ✅ `{"status":"ok","ts":"2025-12-17T07:24:06.301875"}`
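
For scripted checks, the same request can be wrapped in a small retry loop. This is a sketch, not part of the existing tooling: `health_ok` and `wait_healthy` are hypothetical names, and the only assumption about the response is the `"status":"ok"` field shown above.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers around the /api/health endpoint.

health_ok() {
    # Reads a response body on stdin; succeeds only if it reports status "ok".
    grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

wait_healthy() {
    # Retry once per second, up to $2 attempts, until the endpoint reports ok.
    local url="$1" tries="${2:-10}" i=0
    while [ "$i" -lt "$tries" ]; do
        if curl -fsS --max-time 2 "$url" 2>/dev/null | health_ok; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Usage: wait_healthy http://localhost:8080/api/health 10 || echo "backend unhealthy" >&2
```

Retrying matters right after a restart, when Gunicorn may need a moment before it accepts connections.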
### Test 3: HTTPS Access
```bash
curl -I https://houseofprayer.ddns.net/
```
**Result:** ✅ HTTP/2 200
### Test 4: No Port Conflicts
```bash
sudo lsof -i :8080
```
**Result:** ✅ Only gunicorn workers (systemd service)
### Test 5: Pre-Start Check
```bash
sudo systemctl restart church-music-backend
systemctl status church-music-backend --no-pager | grep ExecStartPre
```
**Result:** ✅ `ExecStartPre=/media/pts/Website/Church_HOP_MusicData/backend/pre-start-check.sh (code=exited, status=0/SUCCESS)`

---
## 📊 FAILURE POINTS ANALYSIS
### Identified Failure Points
1. **Port Binding**
- **Risk:** Multiple processes competing for same port
- **Mitigation:** Pre-start port check, automatic cleanup
- **Detection:** Service fails immediately with clear error
2. **Development vs Production Conflict**
- **Risk:** Running both modes simultaneously
- **Mitigation:** Interactive warnings, automatic detection
- **Detection:** start-dev-mode.sh checks systemd services
3. **Orphaned Processes**
- **Risk:** Background processes persisting after crashes
- **Mitigation:** PID tracking, automatic cleanup
- **Detection:** cleanup-ports.sh finds and kills them
4. **Service Restart Limits**
- **Risk:** Hitting StartLimitBurst causing permanent failure
- **Mitigation:** Pre-start checks prevent repeated failures
- **Recovery:** Manual reset with `systemctl reset-failed`
5. **Missing Dependencies**
- **Risk:** Backend starts before database ready
- **Mitigation:** `After=postgresql.service` in service file (this only orders startup; it does not wait for the database to accept connections)
- **Detection:** Backend logs show connection errors
### Monitoring Recommendations
1. **Port Monitoring**
```bash
# Add to cron to run the cleanup every 5 minutes (it kills non-systemd processes on ports 8080 and 5100, including dev servers)
*/5 * * * * /media/pts/Website/Church_HOP_MusicData/cleanup-ports.sh
```
2. **Service Health Checks**
```bash
curl http://localhost:8080/api/health
```
3. **Log Monitoring**
```bash
sudo journalctl -u church-music-backend -f
```
---
## 📝 USAGE GUIDE
### Production Mode (Recommended)
```bash
# Start services
sudo systemctl start church-music-backend
sudo systemctl start church-music-frontend
# Check status
sudo systemctl status church-music-backend
sudo systemctl status church-music-frontend
# View logs
sudo journalctl -u church-music-backend -f
```
### Development Mode
```bash
# Start (will check for conflicts)
./start-dev-mode.sh
# Stop
./stop-dev-mode.sh
# View logs
tail -f /tmp/church-*.log
```
### Troubleshooting
```bash
# Clean up port conflicts
./cleanup-ports.sh
# Reset failed services
sudo systemctl reset-failed church-music-backend
# Verify WebSocket fix (for frontend)
./verify-websocket-fix.sh
```
---
## 📈 IMPROVEMENTS SUMMARY
### Before
- ❌ Port conflicts caused service failures
- ❌ No detection of dev/prod conflicts
- ❌ Manual cleanup required
- ❌ Difficult to diagnose issues
- ❌ Orphaned background processes persisted
### After
- ✅ Automatic port conflict resolution
- ✅ Dev/prod conflict detection and warnings
- ✅ Automated cleanup scripts
- ✅ Clear error messages and logs
- ✅ Automatic cleanup of orphaned processes
- ✅ Pre-start validation
- ✅ Comprehensive documentation
---
## 🎯 LESSONS LEARNED
1. **Always validate port availability before binding**
- Implement pre-start checks in systemd services
- Log port conflicts with process details
2. **Separate development and production environments**
- Never mix dev and prod processes
- Implement conflict detection
- Clear documentation of each mode
3. **Track background processes properly**
- Use PID files for all background processes
- Clean up PIDs on exit
- Validate process state before operations
4. **Provide clear error messages**
- Log what's wrong and how to fix it
- Include process details in errors
- Offer automated solutions
5. **Document everything**
- Usage guides for operators
- Troubleshooting steps
- Architecture decisions
---
## 🔗 RELATED FILES
### Created/Updated
1. [cleanup-ports.sh](cleanup-ports.sh) - Port conflict resolution
2. [backend/pre-start-check.sh](backend/pre-start-check.sh) - Service pre-start validation
3. [start-dev-mode.sh](start-dev-mode.sh) - Enhanced with safeguards
4. [stop-dev-mode.sh](stop-dev-mode.sh) - Proper cleanup
5. [church-music-backend.service](church-music-backend.service) - Added pre-start check
6. [WEBSOCKET_HTTPS_FIX.md](WEBSOCKET_HTTPS_FIX.md) - WebSocket security fix
7. [STATUS.md](STATUS.md) - Updated system status
### Configuration Files
- [nginx-ssl.conf](nginx-ssl.conf) - HTTPS proxy configuration
- [frontend/.env](frontend/.env) - WebSocket security settings
- [frontend/.env.production](frontend/.env.production) - Production build settings
---
## ✅ FINAL STATUS
**Backend Service:** ✅ Running (with pre-start protection)
**Frontend Service:** ✅ Running (production build)
**WebSocket Error:** ✅ Fixed (no dev server in production)
**Port Conflicts:** ✅ Prevented (automatic cleanup)
**Documentation:** ✅ Complete
**Safeguards:** ✅ Implemented
**System Status:** FULLY OPERATIONAL with enhanced reliability

---
## 🆘 EMERGENCY PROCEDURES
If services fail to start:
1. **Quick Fix**
```bash
./cleanup-ports.sh
sudo systemctl reset-failed church-music-backend
sudo systemctl start church-music-backend
```
2. **Check Logs**
```bash
sudo journalctl -u church-music-backend --no-pager | tail -50
```
3. **Manual Port Check**
```bash
sudo lsof -i :8080
sudo kill <PID> # If rogue process found; use kill -9 only if it ignores SIGTERM
```
4. **Restart All**
```bash
./stop-dev-mode.sh
sudo systemctl restart church-music-backend
sudo systemctl restart church-music-frontend
```
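The manual kill in step 3 can be made gentler by sending SIGTERM first and escalating only if the process is still there after a grace period. The helper below is a hypothetical sketch and is not part of the scripts listed in this report.

```bash
#!/usr/bin/env bash
# Hypothetical helper: stop a process with SIGTERM, escalate to SIGKILL
# only if it is still present after the grace period (seconds).
# Usage: stop_gracefully <PID> 5
stop_gracefully() {
    local pid="$1" grace="${2:-5}" waited=0
    # If the signal cannot be delivered, the process is already gone
    # or is not ours to signal; nothing more to do here.
    kill -TERM "$pid" 2>/dev/null || return 0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$grace" ]; then
            kill -KILL "$pid" 2>/dev/null
            break
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```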
---
**Author:** GitHub Copilot (Claude Sonnet 4.5)
**Date:** December 17, 2025
**Status:** Production Ready ✅