# Deep Debugging Report - Port Conflict Resolution

**Date:** December 17, 2025
**Issue:** Backend service failing to start with "Address already in use" error
**Status:** ✅ RESOLVED with safeguards implemented

---

## 🔍 ROOT CAUSE ANALYSIS

### The Problem

The backend systemd service (`church-music-backend.service`) was failing repeatedly with the following error:

```
[ERROR] Connection in use: ('127.0.0.1', 8080)
[ERROR] connection to ('127.0.0.1', 8080) failed: [Errno 98] Address already in use
[ERROR] Can't connect to ('127.0.0.1', 8080)
```

### Investigation Process

1. **Service Status Check**
   - Backend service in failed state after 5 restart attempts
   - Systemd restart limit reached (StartLimitBurst=5)
   - Exit code 1 (FAILURE)

2. **Log Analysis**
   - Error logs showed consistent port 8080 binding failures
   - No application errors - purely an infrastructure issue
   - Repeated retry attempts over ~90 seconds

3. **Port Analysis**

   ```bash
   sudo lsof -i :8080
   # Found: python 17329 pts - python app.py
   ```

4. **Process Investigation**

   ```bash
   # -p selects the PID directly; `ps aux | grep 17329` would also match the grep itself
   ps -fp 17329
   # Result: python app.py running as development server
   ```

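The port and process checks in steps 3 and 4 can be folded into one helper. This is a minimal sketch rather than a command used during the investigation: the function name `describe_port_owner` is hypothetical, and it assumes `lsof` is installed and that the shell has enough privileges to see other users' processes.

```bash
# List every process listening on a TCP port, with owner and full command line.
describe_port_owner() {
    port="$1"
    # -t prints bare PIDs; -sTCP:LISTEN skips client connections to the port
    for pid in $(lsof -t -i "TCP:$port" -sTCP:LISTEN 2>/dev/null); do
        # -p selects by PID, so no grep (and no self-match) is involved
        ps -o pid=,user=,args= -p "$pid"
    done
}
```

Called as `describe_port_owner 8080` from a root shell, it prints one line per listener and nothing when the port is free.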
### Root Cause Identified

**A Flask development server (`python app.py`) was running in the background**, occupying port 8080 and preventing the production Gunicorn service from starting.

**How it happened:**

- The `start-dev-mode.sh` script starts `python app.py` in the background
- No cleanup when switching to production mode
- No collision detection between dev and production modes
- Process persisted across terminal sessions (it would return after a reboot only if something relaunched `start-dev-mode.sh`)

---

## 🛠️ FIXES IMPLEMENTED

### 1. Immediate Fix: Kill Rogue Process

```bash
sudo kill 17329  # Freed port 8080
sudo systemctl reset-failed church-music-backend.service
sudo systemctl start church-music-backend.service
```

**Result:** ✅ Backend service started successfully

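Plain `kill` sends SIGTERM, which lets the process exit cleanly. When a rogue process ignores it, a common pattern is to wait briefly and only then escalate to SIGKILL. The sketch below illustrates that pattern; `terminate_pid` and the 5-second default grace period are assumptions for illustration, not commands that were run during this incident.

```bash
# Ask a process to exit, wait, and force it only as a last resort.
# Returns 0 once the PID is gone, 1 if it is still present after SIGKILL.
terminate_pid() {
    pid="$1"
    grace="${2:-5}"                          # seconds to wait after SIGTERM (assumed default)
    kill -0 "$pid" 2>/dev/null || return 0   # nothing to do: already gone
    kill "$pid" 2>/dev/null                  # SIGTERM: the app can close its sockets
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        grace=$((grace - 1))
    done
    kill -9 "$pid" 2>/dev/null               # SIGKILL: cannot be caught or ignored
    sleep 1
    ! kill -0 "$pid" 2>/dev/null
}
```

Run from a root shell, `terminate_pid 17329` would have covered the manual `kill` above while also handling a process that refuses to stop.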
### 2. Systemd Service Enhancement

**File:** [church-music-backend.service](church-music-backend.service)

Added pre-start check:

```ini
ExecStartPre=/media/pts/Website/Church_HOP_MusicData/backend/pre-start-check.sh
```

This script:

- Checks if port 8080 is in use before starting
- Kills any rogue processes (except systemd services)
- Prevents startup if port can't be freed
- Logs all actions for debugging

**File:** [backend/pre-start-check.sh](backend/pre-start-check.sh)

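The contents of `pre-start-check.sh` are not reproduced in this report. The following sketch shows one way such a check could be structured, under stated assumptions: `listeners` and `ensure_port_free` are hypothetical names, `lsof` is assumed to be available, the retry count is arbitrary, and the exemption for systemd-managed processes is left out for brevity.

```bash
# Print the PIDs listening on a TCP port (empty output means the port is free)
listeners() {
    lsof -t -i "TCP:$1" -sTCP:LISTEN 2>/dev/null
}

# Try to free a port; return 0 if it ends up free, 1 otherwise.
ensure_port_free() {
    port="$1"
    tries="${2:-5}"                      # one-second rechecks after SIGTERM (assumed)
    pids="$(listeners "$port")"
    [ -z "$pids" ] && return 0
    echo "port $port held by PID(s): $pids; sending SIGTERM" >&2
    # $pids is intentionally unquoted so each PID becomes its own argument.
    # Best effort: a failed kill shows up in the recheck below.
    kill $pids 2>/dev/null || true
    while [ "$tries" -gt 0 ]; do
        [ -z "$(listeners "$port")" ] && return 0
        sleep 1
        tries=$((tries - 1))
    done
    echo "port $port still in use; refusing to start" >&2
    return 1
}
```

systemd aborts the start when an `ExecStartPre=` command exits non-zero, so a script built on this sketch would end with `ensure_port_free 8080 || exit 1`. Everything written to stderr lands in the unit's journal.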
### 3. Port Cleanup Utility

**File:** [cleanup-ports.sh](cleanup-ports.sh)

Comprehensive port management script:

- Checks ports 8080 (backend) and 5100 (frontend)
- Identifies processes using each port
- Distinguishes between systemd services and rogue processes
- Safely kills only non-systemd processes
- Cleans up stale PID files
- Color-coded output for clarity

Usage:

```bash
./cleanup-ports.sh
```

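How `cleanup-ports.sh` tells a systemd service from a rogue process is not shown here. One possible approach on Linux is to read the process's cgroup, because systemd places each service's processes in a cgroup named after the unit. The sketch below assumes that approach; `unit_from_cgroup` and `classify_pid` are hypothetical names, and the real script may work differently (for example by comparing against the unit's `MainPID`).

```bash
# Print the systemd service unit named in cgroup data read from stdin,
# or nothing when the last path component is not a *.service cgroup.
unit_from_cgroup() {
    sed -n 's|.*/\([^/]*\.service\)$|\1|p' | head -n 1
}

# Classify a PID as "service:<unit>" or "rogue" (Linux only: reads /proc)
classify_pid() {
    unit="$(cat "/proc/$1/cgroup" 2>/dev/null | unit_from_cgroup)"
    if [ -n "$unit" ]; then
        echo "service:$unit"
    else
        echo "rogue"
    fi
}
```

A process launched from a terminal normally sits in a `session-N.scope` cgroup, so it is reported as `rogue`, while Gunicorn workers started by the unit are reported under the service name.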
### 4. Development Mode Safeguards

**File:** [start-dev-mode.sh](start-dev-mode.sh)

Enhanced with:

- **Production service detection**: Warns if systemd services are running
- **Interactive prompt**: Asks permission to stop production services
- **Old process cleanup**: Kills previous dev mode processes
- **PID file management**: Removes stale PID files
- **Clear status display**: Shows running services and how to stop them

**File:** [stop-dev-mode.sh](stop-dev-mode.sh) (NEW)

Properly stops development mode:

- Kills backend and frontend dev processes
- Removes PID files
- Kills any stray processes
- Prevents port conflicts

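The PID file handling described above can be sketched as follows. This is an illustration, not the code in `start-dev-mode.sh` or `stop-dev-mode.sh`: the function names are hypothetical, and the PID file path used in the usage note is an assumption.

```bash
# True when the PID file exists and names a live process
pidfile_running() {
    file="$1"
    [ -f "$file" ] || return 1
    pid="$(cat "$file")"
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;        # empty or non-numeric: treat as stale
    esac
    kill -0 "$pid" 2>/dev/null          # signal 0 only tests for existence
}

# start_tracked <pidfile> <command...>: refuse to start a second copy
start_tracked() {
    file="$1"
    shift
    if pidfile_running "$file"; then
        echo "already running (PID $(cat "$file"))" >&2
        return 1
    fi
    rm -f "$file"                       # drop a stale file left by a crash
    "$@" &
    echo "$!" > "$file"
}

# stop_tracked <pidfile>: stop the recorded process and remove the file
stop_tracked() {
    file="$1"
    if pidfile_running "$file"; then
        kill "$(cat "$file")"
    fi
    rm -f "$file"
}
```

With these helpers, a start script could call `start_tracked /tmp/church-backend.pid python app.py` and the stop script `stop_tracked /tmp/church-backend.pid`, so a forgotten dev server is always traceable to a file.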
### 5. Documentation Updates

- [WEBSOCKET_HTTPS_FIX.md](WEBSOCKET_HTTPS_FIX.md) - WebSocket security fix
- [STATUS.md](STATUS.md) - Updated system status
- This file - Comprehensive debugging documentation

---

## 🔒 SAFEGUARDS ADDED

### 1. Pre-Start Port Validation

- Automatic port conflict detection
- Kills rogue processes before service start
- Prevents "Address already in use" errors
- Logged for audit trail

### 2. Dev/Production Separation

- Development mode checks for production services
- Interactive warning system
- Cannot run both modes simultaneously
- Clear error messages

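The dev/production check can be sketched with `systemctl is-active`, which exits 0 only when a unit is active. The code below is an assumption about how such a guard might look, not the contents of `start-dev-mode.sh`; `active_units`, `confirm_dev_mode`, and the `PROD_UNITS` variable are hypothetical names.

```bash
PROD_UNITS="church-music-backend.service church-music-frontend.service"

# Print each unit from the arguments that systemd reports as active
active_units() {
    for unit in "$@"; do
        if systemctl is-active --quiet "$unit" 2>/dev/null; then
            echo "$unit"
        fi
    done
}

# Return 0 when it is safe to start dev mode, 1 when the user declines
confirm_dev_mode() {
    running="$(active_units $PROD_UNITS)"
    [ -z "$running" ] && return 0
    echo "Production services are running:" $running >&2
    printf 'Stop them and continue in dev mode? [y/N] ' >&2
    read -r answer
    case "$answer" in
        y|Y) sudo systemctl stop $running ;;
        *)   echo "Aborting; production left untouched." >&2
             return 1 ;;
    esac
}
```

Calling `confirm_dev_mode || exit 1` at the top of the start script keeps the two modes from competing for port 8080.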
### 3. Process Tracking

- PID files for development mode
- Automatic cleanup of stale PIDs
- Process state validation

### 4. Monitoring & Diagnostics

- Enhanced logging in service files
- Dedicated cleanup script
- Verification script for WebSocket fix
- Clear error messages with solutions

---

## 🧪 VERIFICATION TESTS

### Test 1: Service Startup

```bash
sudo systemctl status church-music-backend
```

**Result:** ✅ Active (running) with pre-start check successful

### Test 2: API Endpoints

```bash
curl http://localhost:8080/api/health
```

**Result:** ✅ `{"status":"ok","ts":"2025-12-17T07:24:06.301875"}`

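For unattended use, the same probe can be turned into a pass/fail check. This is a minimal sketch: `body_is_healthy` and `check_backend` are hypothetical names, the 5-second timeout is an assumption, and the check relies only on the `"status":"ok"` field shown in the result above.

```bash
# Succeeds when the JSON on stdin reports "status":"ok" (whitespace-tolerant)
body_is_healthy() {
    grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Probe the endpoint. -f turns HTTP errors into a non-zero curl exit,
# -sS hides the progress bar but keeps error messages, and
# --max-time stops a hung backend from blocking the check.
check_backend() {
    url="${1:-http://localhost:8080/api/health}"
    curl -fsS --max-time 5 "$url" | body_is_healthy
}
```

`check_backend || echo "backend unhealthy"` returns non-zero both when the request fails (the body is then empty) and when the body reports a different status.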
### Test 3: HTTPS Access

```bash
curl -I https://houseofprayer.ddns.net/
```

**Result:** ✅ HTTP/2 200

### Test 4: No Port Conflicts

```bash
sudo lsof -i :8080
```

**Result:** ✅ Only gunicorn workers (systemd service)

### Test 5: Pre-Start Check

```bash
sudo systemctl restart church-music-backend
# The ExecStartPre result line is part of `systemctl status` output, not the journal
systemctl status church-music-backend | grep ExecStartPre
```

**Result:** ✅ `ExecStartPre=/media/pts/Website/Church_HOP_MusicData/backend/pre-start-check.sh (code=exited, status=0/SUCCESS)`

---

## 📊 FAILURE POINTS ANALYSIS

### Identified Failure Points

1. **Port Binding**
   - **Risk:** Multiple processes competing for the same port
   - **Mitigation:** Pre-start port check, automatic cleanup
   - **Detection:** Service fails immediately with clear error

2. **Development vs Production Conflict**
   - **Risk:** Running both modes simultaneously
   - **Mitigation:** Interactive warnings, automatic detection
   - **Detection:** `start-dev-mode.sh` checks systemd services

3. **Stray Background Processes**
   - **Risk:** Background processes persisting after crashes (these are live orphaned processes, not true zombies, which hold no ports)
   - **Mitigation:** PID tracking, automatic cleanup
   - **Detection:** `cleanup-ports.sh` finds and kills them

4. **Service Restart Limits**
   - **Risk:** Hitting StartLimitBurst leaves the unit in a failed state until it is reset
   - **Mitigation:** Pre-start checks prevent repeated failures
   - **Recovery:** Manual reset with `systemctl reset-failed`

5. **Missing Dependencies**
   - **Risk:** Backend starts before database ready
   - **Mitigation:** `After=postgresql.service` in service file
   - **Detection:** Backend logs show connection errors

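Failure points 4 and 5 both come down to unit configuration. The sketch below prints a systemd drop-in that makes the relevant settings explicit. Only `StartLimitBurst=5` and `After=postgresql.service` come from this report; the interval, `Wants=`, `Restart=`, and `RestartSec=` values are illustrative assumptions, and `print_backend_dropin` is a hypothetical helper name.

```bash
# Print a drop-in for church-music-backend.service on stdout.
print_backend_dropin() {
    cat <<'EOF'
[Unit]
# After= only orders startup; Wants= also pulls the database unit in
After=postgresql.service
Wants=postgresql.service
# The unit enters the failed state after more than
# StartLimitBurst starts within StartLimitIntervalSec
StartLimitIntervalSec=120
StartLimitBurst=5

[Service]
Restart=on-failure
RestartSec=10
EOF
}
```

The output can be reviewed and then pasted into the editor opened by `sudo systemctl edit church-music-backend.service`. Note that `After=` does not wait for PostgreSQL to accept connections, so the backend should still tolerate a briefly unavailable database.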
### Monitoring Recommendations

1. **Port Monitoring**

   ```bash
   # Add to cron for automated monitoring
   */5 * * * * /media/pts/Website/Church_HOP_MusicData/cleanup-ports.sh
   ```

2. **Service Health Checks**

   ```bash
   curl http://localhost:8080/api/health
   ```

3. **Log Monitoring**

   ```bash
   sudo journalctl -u church-music-backend -f
   ```

---

## 📝 USAGE GUIDE

### Production Mode (Recommended)

```bash
# Start services
sudo systemctl start church-music-backend
sudo systemctl start church-music-frontend

# Check status
sudo systemctl status church-music-backend
sudo systemctl status church-music-frontend

# View logs
sudo journalctl -u church-music-backend -f
```

### Development Mode

```bash
# Start (will check for conflicts)
./start-dev-mode.sh

# Stop
./stop-dev-mode.sh

# View logs
tail -f /tmp/church-*.log
```

### Troubleshooting

```bash
# Clean up port conflicts
./cleanup-ports.sh

# Reset failed services
sudo systemctl reset-failed church-music-backend

# Verify WebSocket fix (for frontend)
./verify-websocket-fix.sh
```

---

## 📈 IMPROVEMENTS SUMMARY

### Before

- ❌ Port conflicts caused service failures
- ❌ No detection of dev/prod conflicts
- ❌ Manual cleanup required
- ❌ Difficult to diagnose issues
- ❌ Stray background processes persisted

### After

- ✅ Automatic port conflict resolution
- ✅ Dev/prod conflict detection and warnings
- ✅ Automated cleanup scripts
- ✅ Clear error messages and logs
- ✅ Automatic cleanup of stray processes
- ✅ Pre-start validation
- ✅ Comprehensive documentation

---

## 🎯 LESSONS LEARNED

1. **Always validate port availability before binding**
   - Implement pre-start checks in systemd services
   - Log port conflicts with process details

2. **Separate development and production environments**
   - Never mix dev and prod processes
   - Implement conflict detection
   - Clear documentation of each mode

3. **Track background processes properly**
   - Use PID files for all background processes
   - Clean up PIDs on exit
   - Validate process state before operations

4. **Provide clear error messages**
   - Log what's wrong and how to fix it
   - Include process details in errors
   - Offer automated solutions

5. **Document everything**
   - Usage guides for operators
   - Troubleshooting steps
   - Architecture decisions

---

## 🔗 RELATED FILES

### Created/Updated

1. [cleanup-ports.sh](cleanup-ports.sh) - Port conflict resolution
2. [backend/pre-start-check.sh](backend/pre-start-check.sh) - Service pre-start validation
3. [start-dev-mode.sh](start-dev-mode.sh) - Enhanced with safeguards
4. [stop-dev-mode.sh](stop-dev-mode.sh) - Proper cleanup
5. [church-music-backend.service](church-music-backend.service) - Added pre-start check
6. [WEBSOCKET_HTTPS_FIX.md](WEBSOCKET_HTTPS_FIX.md) - WebSocket security fix
7. [STATUS.md](STATUS.md) - Updated system status

### Configuration Files

- [nginx-ssl.conf](nginx-ssl.conf) - HTTPS proxy configuration
- [frontend/.env](frontend/.env) - WebSocket security settings
- [frontend/.env.production](frontend/.env.production) - Production build settings

---

## ✅ FINAL STATUS

**Backend Service:** ✅ Running (with pre-start protection)
**Frontend Service:** ✅ Running (production build)
**WebSocket Error:** ✅ Fixed (no dev server in production)
**Port Conflicts:** ✅ Prevented (automatic cleanup)
**Documentation:** ✅ Complete
**Safeguards:** ✅ Implemented

**System Status:** FULLY OPERATIONAL with enhanced reliability

---

## 🆘 EMERGENCY PROCEDURES

If services fail to start:

1. **Quick Fix**

   ```bash
   ./cleanup-ports.sh
   sudo systemctl reset-failed church-music-backend
   sudo systemctl start church-music-backend
   ```

2. **Check Logs**

   ```bash
   sudo journalctl -u church-music-backend --no-pager | tail -50
   ```

3. **Manual Port Check**

   ```bash
   sudo lsof -i :8080
   sudo kill <PID>  # If rogue process found; use kill -9 only if it ignores SIGTERM
   ```

4. **Restart All**

   ```bash
   ./stop-dev-mode.sh
   sudo systemctl restart church-music-backend
   sudo systemctl restart church-music-frontend
   ```

---

**Author:** GitHub Copilot (Claude Sonnet 4.5)
**Date:** December 17, 2025
**Status:** Production Ready ✅