# Deep Debugging Report - Port Conflict Resolution

**Date:** December 17, 2025
**Issue:** Backend service failing to start with "Address already in use" error
**Status:** ✅ RESOLVED with safeguards implemented

---

## 🔍 ROOT CAUSE ANALYSIS

### The Problem

The backend systemd service (`church-music-backend.service`) was failing repeatedly with these errors:

```
[ERROR] Connection in use: ('127.0.0.1', 8080)
[ERROR] connection to ('127.0.0.1', 8080) failed: [Errno 98] Address already in use
[ERROR] Can't connect to ('127.0.0.1', 8080)
```

### Investigation Process

1. **Service Status Check**
   - Backend service in failed state after 5 restart attempts
   - systemd restart limit reached (`StartLimitBurst=5`)
   - Exit code 1 (FAILURE)

2. **Log Analysis**
   - Error logs showed consistent port 8080 binding failures
   - No application errors; this was purely an infrastructure issue
   - Repeated retry attempts over ~90 seconds

3. **Port Analysis**

   ```bash
   sudo lsof -i :8080
   # Found: COMMAND=python PID=17329 USER=pts (python app.py)
   ```

4. **Process Investigation**

   ```bash
   ps -fp 17329
   # Result: python app.py, running as the Flask development server
   ```

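Steps 3 and 4 can be combined into one helper that lists whatever is listening on a port, together with its full command line. This is a hypothetical sketch and not one of the project's scripts. The function names are illustrative, it assumes `lsof` is installed, and it reads `/proc/<pid>/cmdline`, so it only works on Linux.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: show which process(es) are listening on a TCP port.

# Print the PIDs that hold a LISTEN socket on the port.
# -t prints PIDs only; -sTCP:LISTEN skips client connections to the port.
listening_pids() {
  lsof -t -iTCP:"$1" -sTCP:LISTEN 2>/dev/null | sort -un
}

# Print "PID: command line" using /proc (arguments are NUL-separated there).
describe_pid() {
  local pid="$1" cmd
  if [[ ! -r "/proc/$pid/cmdline" ]]; then
    echo "$pid: <exited>"
    return 1
  fi
  cmd=$(tr '\0' ' ' < "/proc/$pid/cmdline")
  echo "$pid: ${cmd% }"
}

# Exit status 0 if the port is free, 1 if something is listening on it.
show_port_owner() {
  local port="$1" pids pid
  pids=$(listening_pids "$port")
  if [[ -z "$pids" ]]; then
    echo "port $port is free"
    return 0
  fi
  for pid in $pids; do
    describe_pid "$pid"
  done
  return 1
}
```

Run it with root privileges so that `lsof` can see processes owned by other users. During this incident it would have printed a line like `17329: python app.py`.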
### Root Cause Identified

**A Flask development server (`python app.py`) was running in the background**, occupying port 8080 and preventing the production Gunicorn service from starting.

**How it happened:**

- The `start-dev-mode.sh` script starts `python app.py` in the background
- No cleanup when switching to production mode
- No collision detection between dev and production modes
- The background process outlived the terminal session that started it

---

## 🛠️ FIXES IMPLEMENTED

### 1. Immediate Fix: Kill Rogue Process

```bash
sudo kill 17329   # Freed port 8080
sudo systemctl reset-failed church-music-backend.service
sudo systemctl start church-music-backend.service
```

**Result:** ✅ Backend service started successfully

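Plain `kill` sends SIGTERM, which the process can handle. Gunicorn, for example, treats SIGTERM as a graceful shutdown and lets workers finish in-flight requests. `kill -9` sends SIGKILL, which cannot be caught, so the process gets no chance to clean up.

The hypothetical helper below tries SIGTERM first and escalates only if the process is still alive after a timeout. `terminate_pid` and its arguments are illustrative names, and the helper relies on Linux `/proc`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: SIGTERM first, SIGKILL only after a timeout.
# Usage: terminate_pid PID [timeout_seconds]
terminate_pid() {
  local pid="$1" timeout="${2:-10}"
  local polls=$(( timeout * 5 )) i=0    # poll every 0.2 s

  [[ -d "/proc/$pid" ]] || return 0     # already gone
  kill -TERM "$pid" || return 1         # e.g. "Operation not permitted" without sudo

  while [[ -d "/proc/$pid" ]]; do
    if (( i >= polls )); then
      echo "PID $pid ignored SIGTERM for ${timeout}s; sending SIGKILL" >&2
      kill -KILL "$pid" 2>/dev/null
      sleep 1
      break
    fi
    sleep 0.2
    i=$(( i + 1 ))
  done

  # Success only if the process has really disappeared.
  [[ ! -d "/proc/$pid" ]]
}
```

For the incident above this would be `sudo bash -c '<definition>; terminate_pid 17329'` or the same call from a script run as root.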
### 2. Systemd Service Enhancement

**File:** [church-music-backend.service](church-music-backend.service)

Added a pre-start check to the `[Service]` section:

```ini
[Service]
ExecStartPre=/media/pts/Website/Church_HOP_MusicData/backend/pre-start-check.sh
```

This script:

- Checks whether port 8080 is in use before starting
- Kills any rogue processes (except systemd services)
- Prevents startup if the port can't be freed
- Logs all actions for debugging

**File:** [backend/pre-start-check.sh](backend/pre-start-check.sh)

### 3. Port Cleanup Utility

**File:** [cleanup-ports.sh](cleanup-ports.sh)

Comprehensive port management script:

- Checks ports 8080 (backend) and 5100 (frontend)
- Identifies processes using each port
- Distinguishes between systemd services and rogue processes
- Safely kills only non-systemd processes
- Cleans up stale PID files
- Color-coded output for clarity

Usage:

```bash
./cleanup-ports.sh
```

### 4. Development Mode Safeguards

**File:** [start-dev-mode.sh](start-dev-mode.sh)

Enhanced with:

- **Production service detection**: Warns if systemd services are running
- **Interactive prompt**: Asks permission to stop production services
- **Old process cleanup**: Kills previous dev mode processes
- **PID file management**: Removes stale PID files
- **Clear status display**: Shows running services and how to stop them

**File:** [stop-dev-mode.sh](stop-dev-mode.sh) (NEW)

Properly stops development mode:

- Kills backend and frontend dev processes
- Removes PID files
- Kills any stray processes
- Prevents port conflicts

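A PID file is only trustworthy while the recorded process is alive. After that process exits, the kernel may hand the same PID to an unrelated program.

The hypothetical helper below checks that the PID still runs the expected command before signalling it, and treats anything else as a stale file. The `expected` pattern (for example `app.py` for the backend) and the function name are assumptions, not taken from the real `stop-dev-mode.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: stop the process named in a PID file, but only if that
# PID still runs the expected command (a guard against PID reuse).
# Usage: stop_from_pidfile PIDFILE EXPECTED_PATTERN
stop_from_pidfile() {
  local pidfile="$1" expected="$2" pid cmd

  [[ -f "$pidfile" ]] || return 0                  # nothing recorded
  pid=$(<"$pidfile")

  if [[ ! "$pid" =~ ^[0-9]+$ || ! -r "/proc/$pid/cmdline" ]]; then
    echo "removing stale PID file $pidfile (no such process)" >&2
    rm -f "$pidfile"
    return 0
  fi

  cmd=$(tr '\0' ' ' < "/proc/$pid/cmdline")
  if [[ "$cmd" != *"$expected"* ]]; then
    echo "PID $pid now runs '${cmd% }', not '$expected'; removing stale PID file" >&2
    rm -f "$pidfile"
    return 0
  fi

  kill -TERM "$pid"
  for _ in {1..25}; do                             # wait up to ~5 s
    [[ -d "/proc/$pid" ]] || break
    sleep 0.2
  done
  if [[ -d "/proc/$pid" ]]; then
    echo "PID $pid is still running after SIGTERM; PID file kept" >&2
    return 1
  fi
  rm -f "$pidfile"
}
```

For example, `stop_from_pidfile /tmp/church-backend.pid app.py`. The PID file path here is hypothetical.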
### 5. Documentation Updates

- [WEBSOCKET_HTTPS_FIX.md](WEBSOCKET_HTTPS_FIX.md) - WebSocket security fix
- [STATUS.md](STATUS.md) - Updated system status
- This file - Comprehensive debugging documentation

---

## 🔒 SAFEGUARDS ADDED

### 1. Pre-Start Port Validation

- Automatic port conflict detection
- Kills rogue processes before service start
- Prevents "Address already in use" errors
- All actions logged for an audit trail

### 2. Dev/Production Separation

- Development mode checks for running production services
- Interactive warning system
- Cannot run both modes simultaneously
- Clear error messages

### 3. Process Tracking

- PID files for development mode
- Automatic cleanup of stale PIDs
- Process state validation

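Process tracking depends on the start side as much as the stop side. Write the PID file when launching, and refuse to launch if the file already names a live process. The helper below is a hypothetical start helper in that spirit; the function name and paths are illustrative.

Detaching with `nohup ... &` lets a server outlive the terminal that started it, which is exactly why it has to be tracked. Note that `kill -0` only proves that *some* process currently has that PID.

```bash
#!/usr/bin/env bash
# Hypothetical helper: start a command in the background and record its PID.
# Usage: start_tracked PIDFILE LOGFILE COMMAND [ARGS...]
start_tracked() {
  local pidfile="$1" logfile="$2" old
  shift 2

  if [[ -f "$pidfile" ]]; then
    old=$(<"$pidfile")
    # kill -0 sends no signal; it only tests that the PID exists and is signalable.
    if [[ "$old" =~ ^[0-9]+$ ]] && kill -0 "$old" 2>/dev/null; then
      echo "refusing to start: PID $old from $pidfile is still running" >&2
      return 1
    fi
    echo "removing stale PID file $pidfile" >&2
    rm -f "$pidfile"
  fi

  # nohup keeps the process alive after the terminal closes; output goes to the log.
  nohup "$@" >>"$logfile" 2>&1 &
  echo "$!" > "$pidfile"
}
```

For example, `start_tracked /tmp/church-backend.pid /tmp/church-backend.log python app.py`, with hypothetical file names.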
### 4. Monitoring & Diagnostics

- Enhanced logging in service files
- Dedicated cleanup script
- Verification script for WebSocket fix
- Clear error messages with solutions

---

## 🧪 VERIFICATION TESTS

### Test 1: Service Startup

```bash
sudo systemctl status church-music-backend
```

**Result:** ✅ Active (running) with pre-start check successful

### Test 2: API Endpoints

```bash
curl http://localhost:8080/api/health
```

**Result:** ✅ `{"status":"ok","ts":"2025-12-17T07:24:06.301875"}`

### Test 3: HTTPS Access

```bash
curl -I https://houseofprayer.ddns.net/
```

**Result:** ✅ HTTP/2 200

### Test 4: No Port Conflicts

```bash
sudo lsof -i :8080
```

**Result:** ✅ Only Gunicorn master and worker processes (systemd service)

### Test 5: Pre-Start Check

```bash
sudo systemctl restart church-music-backend
sudo systemctl status church-music-backend --no-pager | grep ExecStartPre
```

**Result:** ✅ `ExecStartPre=/media/pts/Website/Church_HOP_MusicData/backend/pre-start-check.sh (code=exited, status=0/SUCCESS)`

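Tests 1 to 3 are worth rerunning after every restart. The hypothetical script below bundles them, prints one PASS/FAIL line per check, and fails if any check fails. It assumes `curl` and reuses the unit name and URLs from the tests above. A standalone script would finish by calling `run_smoke_tests`. None of the three checks needs `sudo`.

```bash
#!/usr/bin/env bash
# Hypothetical post-restart smoke test built from Tests 1-3.
failures=0

# Run a command quietly and report PASS/FAIL under a human-readable name.
check() {
  local name="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $name"
  else
    echo "FAIL  $name"
    failures=$(( failures + 1 ))
  fi
}

backend_active() { systemctl is-active --quiet church-music-backend; }
health_ok()      { curl -fsS --max-time 5 http://localhost:8080/api/health | grep -q '"status":"ok"'; }
https_ok()       { curl -fsSI --max-time 10 https://houseofprayer.ddns.net/; }

# Exit status 0 only if every check passed.
run_smoke_tests() {
  failures=0
  check "backend service active" backend_active
  check "health endpoint returns ok" health_ok
  check "public HTTPS responds" https_ok
  (( failures == 0 ))
}
```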
---

## 📊 FAILURE POINTS ANALYSIS

### Identified Failure Points

1. **Port Binding**
   - **Risk:** Multiple processes competing for the same port
   - **Mitigation:** Pre-start port check, automatic cleanup
   - **Detection:** Service fails immediately with a clear error

2. **Development vs Production Conflict**
   - **Risk:** Running both modes simultaneously
   - **Mitigation:** Interactive warnings, automatic detection
   - **Detection:** `start-dev-mode.sh` checks systemd services

3. **Orphaned Processes**
   - **Risk:** Background processes persisting after crashes
   - **Mitigation:** PID tracking, automatic cleanup
   - **Detection:** `cleanup-ports.sh` finds and kills them

4. **Service Restart Limits**
   - **Risk:** Hitting `StartLimitBurst` leaves the unit failed, and systemd stops restarting it automatically
   - **Mitigation:** Pre-start checks prevent repeated failures
   - **Recovery:** Manual reset with `systemctl reset-failed`

5. **Missing Dependencies**
   - **Risk:** Backend starts before the database is ready
   - **Mitigation:** `After=postgresql.service` in the service file (this only orders startup; it does not by itself start PostgreSQL)
   - **Detection:** Backend logs show connection errors

### Monitoring Recommendations

1. **Port Monitoring**

   ```bash
   # Add to cron for automated monitoring.
   # Caution: this also kills any dev-mode server left running on 8080 or 5100.
   */5 * * * * /media/pts/Website/Church_HOP_MusicData/cleanup-ports.sh
   ```

2. **Service Health Checks**

   ```bash
   curl http://localhost:8080/api/health
   ```

3. **Log Monitoring**

   ```bash
   sudo journalctl -u church-music-backend -f
   ```

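For the health check in item 2, note two limits of a bare `curl`. It exits 0 even when the server answers HTTP 500, and a single attempt cannot tell a slow restart from an outage.

The hypothetical probe below adds `-f` (treat HTTP errors as failures), a timeout, a few retries, and a check for the `"status":"ok"` field returned in Test 2. The retry count, the delay, and the `HEALTH_URL` variable are assumptions. The probe returns non-zero when the backend is unhealthy, so it can be used from cron or right after a restart.

```bash
#!/usr/bin/env bash
# Hypothetical health probe for the backend; non-zero status means unhealthy.
HEALTH_URL="${HEALTH_URL:-http://localhost:8080/api/health}"

# Usage: wait_for_health [attempts] [delay_seconds]
wait_for_health() {
  local attempts="${1:-5}" delay="${2:-3}" i body
  for (( i = 1; i <= attempts; i++ )); do
    # -f: treat HTTP >= 400 as failure; --max-time: don't hang on a stuck worker
    if body=$(curl -fsS --max-time 5 "$HEALTH_URL" 2>/dev/null) &&
       [[ "$body" == *'"status":"ok"'* ]]; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    (( i < attempts )) && sleep "$delay"
  done
  echo "backend unhealthy after $attempts attempts: $HEALTH_URL" >&2
  return 1
}
```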
---

## 📝 USAGE GUIDE

### Production Mode (Recommended)

```bash
# Start services
sudo systemctl start church-music-backend
sudo systemctl start church-music-frontend

# Check status
sudo systemctl status church-music-backend
sudo systemctl status church-music-frontend

# View logs
sudo journalctl -u church-music-backend -f
```

### Development Mode

```bash
# Start (will check for conflicts)
./start-dev-mode.sh

# Stop
./stop-dev-mode.sh

# View logs
tail -f /tmp/church-*.log
```

### Troubleshooting

```bash
# Clean up port conflicts
./cleanup-ports.sh

# Reset failed services
sudo systemctl reset-failed church-music-backend

# Verify WebSocket fix (for frontend)
./verify-websocket-fix.sh
```

---

## 📈 IMPROVEMENTS SUMMARY

### Before

- ❌ Port conflicts caused service failures
- ❌ No detection of dev/prod conflicts
- ❌ Manual cleanup required
- ❌ Difficult to diagnose issues
- ❌ Orphaned processes persisted

### After

- ✅ Automatic port conflict resolution
- ✅ Dev/prod conflict detection and warnings
- ✅ Automated cleanup scripts
- ✅ Clear error messages and logs
- ✅ Automatic cleanup of orphaned processes
- ✅ Pre-start validation
- ✅ Comprehensive documentation

---

## 🎯 LESSONS LEARNED

1. **Always validate port availability before binding**
   - Implement pre-start checks in systemd services
   - Log port conflicts with process details

2. **Separate development and production environments**
   - Never mix dev and prod processes
   - Implement conflict detection
   - Clear documentation of each mode

3. **Track background processes properly**
   - Use PID files for all background processes
   - Clean up PIDs on exit
   - Validate process state before operations

4. **Provide clear error messages**
   - Log what's wrong and how to fix it
   - Include process details in errors
   - Offer automated solutions

5. **Document everything**
   - Usage guides for operators
   - Troubleshooting steps
   - Architecture decisions

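For lesson 1, a port can also be probed without `lsof` or root. Bash can open a TCP connection through its `/dev/tcp/HOST/PORT` redirection, and a successful connect means something is already listening.

The hypothetical helper below cannot say *which* process holds the port. It assumes a bash built with `/dev/tcp` support, which is the default on common Linux distributions, plus the coreutils `timeout` command.

```bash
#!/usr/bin/env bash
# Hypothetical check: succeeds if something accepts TCP connections on HOST:PORT.
# Usage: port_in_use [host] [port]
port_in_use() {
  local host="${1:-127.0.0.1}" port="${2:-8080}"
  # The redirection opens (connects) the socket; ':' does nothing with it.
  # timeout guards against hosts that silently drop packets.
  timeout 2 bash -c ": < /dev/tcp/${host}/${port}" 2>/dev/null
}

# Fail early with a readable message instead of crashing with Errno 98.
require_free_port() {
  if port_in_use "$1" "$2"; then
    echo "port $2 on $1 is already in use; stop the other process first" >&2
    return 1
  fi
}
```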
---

## 🔗 RELATED FILES

### Created/Updated

1. [cleanup-ports.sh](cleanup-ports.sh) - Port conflict resolution
2. [backend/pre-start-check.sh](backend/pre-start-check.sh) - Service pre-start validation
3. [start-dev-mode.sh](start-dev-mode.sh) - Enhanced with safeguards
4. [stop-dev-mode.sh](stop-dev-mode.sh) - Proper cleanup
5. [church-music-backend.service](church-music-backend.service) - Added pre-start check
6. [WEBSOCKET_HTTPS_FIX.md](WEBSOCKET_HTTPS_FIX.md) - WebSocket security fix
7. [STATUS.md](STATUS.md) - Updated system status

### Configuration Files

- [nginx-ssl.conf](nginx-ssl.conf) - HTTPS proxy configuration
- [frontend/.env](frontend/.env) - WebSocket security settings
- [frontend/.env.production](frontend/.env.production) - Production build settings

---

## ✅ FINAL STATUS

**Backend Service:** ✅ Running (with pre-start protection)
**Frontend Service:** ✅ Running (production build)
**WebSocket Error:** ✅ Fixed (no dev server in production)
**Port Conflicts:** ✅ Prevented (automatic cleanup)
**Documentation:** ✅ Complete
**Safeguards:** ✅ Implemented

**System Status:** FULLY OPERATIONAL with enhanced reliability

---

## 🆘 EMERGENCY PROCEDURES

If services fail to start:

1. **Quick Fix**

   ```bash
   ./cleanup-ports.sh
   sudo systemctl reset-failed church-music-backend
   sudo systemctl start church-music-backend
   ```

2. **Check Logs**

   ```bash
   sudo journalctl -u church-music-backend -n 50 --no-pager
   ```

3. **Manual Port Check**

   ```bash
   sudo lsof -i :8080
   sudo kill <PID>      # If a rogue process is found (SIGTERM)
   sudo kill -9 <PID>   # Only if it ignores SIGTERM
   ```

4. **Restart All**

   ```bash
   ./stop-dev-mode.sh
   sudo systemctl restart church-music-backend
   sudo systemctl restart church-music-frontend
   ```

---

**Author:** GitHub Copilot (Claude Sonnet 4.5)
**Date:** December 17, 2025
**Status:** Production Ready ✅