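feat(desktop): ✨ Implement several features
1. Implement task pausing 2. Implement page internationalization 3. Optimize the project structure and fix bugs 4. Optimize the system architecture 5. Implement a large batch of other features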
50
.trae/skills/debugging/README.md
Normal file
@@ -0,0 +1,50 @@
# debugging

Comprehensive debugging specialist for errors, test failures, log analysis, and system problems. Use when encountering issues, analyzing error logs, investigating system anomalies, debugging production issues, analyzing stack traces, or identifying root causes. Combines general debugging workflows with error pattern detection and log analysis.

---

## 📦 Downloaded from [Skillstore.io](https://skillstore.io)

This skill was downloaded from **AI Skillstore**, the official marketplace for Claude Code, Codex, and Claude skills.

🔗 **Skill Page**: [skillstore.io/skills/x-89jobrien-debugging](https://skillstore.io/skills/x-89jobrien-debugging)

## 🚀 Installation

### Via Claude Code Plugin System

```
/plugin marketplace add aiskillstore/marketplace
/plugin install x-89jobrien-debugging@aiskillstore
```

### Manual Installation

Copy the contents of this folder to your project's `.claude/skills/` directory.

## 📋 Skill Info

| Property | Value |
|----------|-------|
| **Name** | debugging |
| **Version** | 1.0.1 |
| **Author** | Joseph OBrien |

### Supported Tools

- claude
- codex
- claude-code

## 🌐 Discover More Skills

Browse thousands of AI skills at **[skillstore.io](https://skillstore.io)**:

- 🔍 Search by category, tool, or keyword
- ⭐ Find verified, security-audited skills
- 📤 Submit your own skills to share with the community

---

*From [skillstore.io](https://skillstore.io), AI Skills Marketplace*
553
.trae/skills/debugging/SKILL.md
Normal file
@@ -0,0 +1,553 @@
---
name: debugging
description: Comprehensive debugging specialist for errors, test failures, log analysis,
---

# Debugging

This skill provides comprehensive debugging capabilities for identifying and fixing errors, test failures, unexpected behavior, and production issues. It combines general debugging workflows with specialized error analysis, log parsing, and pattern recognition.

## When to Use This Skill

- When encountering errors or exceptions in code
- When tests are failing and you need to understand why
- When investigating unexpected behavior or bugs
- When analyzing stack traces and error messages
- When debugging production issues
- When fixing issues reported by users or QA
- When analyzing error logs and stack traces
- When investigating performance issues or anomalies
- When correlating errors across multiple services
- When identifying recurring error patterns
- When setting up error monitoring and alerting
- When conducting post-mortem analysis of incidents

## What This Skill Does

1. **Error Analysis**: Captures and analyzes error messages and stack traces
2. **Log Parsing**: Extracts errors from logs using regex patterns and structured parsing
3. **Stack Trace Analysis**: Analyzes stack traces across multiple programming languages
4. **Error Correlation**: Identifies relationships between errors across distributed systems
5. **Pattern Recognition**: Detects common error patterns and anti-patterns
6. **Reproduction**: Identifies steps to reproduce the issue
7. **Isolation**: Locates the exact failure point in code
8. **Root Cause Analysis**: Works backward from symptoms to identify underlying causes
9. **Minimal Fix**: Implements the smallest change that resolves the issue
10. **Verification**: Confirms the solution works and doesn't introduce new issues
11. **Monitoring Setup**: Creates queries and alerts for error detection

## Helper Scripts

This skill includes Python helper scripts in `scripts/`:

- **`parse_logs.py`**: Parses log files and extracts errors, exceptions, and stack traces. Outputs JSON with error analysis and pattern detection.

```bash
python scripts/parse_logs.py /var/log/app.log
```

## How to Use

### Debug an Error

```
Debug this error: TypeError: Cannot read property 'x' of undefined
```

```
Investigate why the test is failing in test_user_service.js
```

### Analyze Error Logs

```
Analyze the error logs in /var/log/app.log and identify the root cause
```

```
Investigate why the API is returning 500 errors
```

### Pattern Detection

```
Find patterns in these error logs from the past 24 hours
```

```
Correlate errors between the API service and database
```

## Debugging Process

### 1. Capture Error Information

**Error Message:**

- Read the full error message
- Note the error type (TypeError, ReferenceError, etc.)
- Identify the error location (file and line number)

**Stack Trace:**

- Analyze the call stack
- Identify the sequence of function calls
- Find where the error originated

**Context:**

- Check recent code changes
- Review related code files
- Understand the execution flow

### 2. Error Extraction (Log Analysis)

**Using Helper Script:**

The skill includes a Python helper script for parsing logs:

```bash
# Parse log file and extract errors
python scripts/parse_logs.py /var/log/app.log
```

**Manual Log Parsing Patterns:**

```bash
# Extract errors from logs
grep -i "error\|exception\|fatal\|critical" /var/log/app.log

# Extract stack traces
grep -A 20 "Exception\|Error\|Traceback" /var/log/app.log

# Extract specific error types
grep "TypeError\|ReferenceError\|SyntaxError" /var/log/app.log
```

**Structured Log Parsing:**

```javascript
// Parse JSON logs
const errors = logs
  .filter(log => log.level === 'error' || log.level === 'critical')
  .map(log => ({
    timestamp: log.timestamp,
    message: log.message,
    stack: log.stack,
    context: log.context
  }));
```
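
The filter above assumes `logs` is already an array of parsed objects. A minimal sketch of building that array from newline-delimited JSON follows, assuming one JSON object per line with `level`, `timestamp`, and `message` fields. The function name `parseJsonLogs` and the sample lines are hypothetical and are not part of the skill's helper scripts.

```javascript
// Hypothetical helper: turn newline-delimited JSON text into log objects.
// Lines that are not valid JSON are collected separately instead of throwing.
function parseJsonLogs(text) {
  const logs = [];
  const unparsed = [];
  for (const line of text.split('\n')) {
    const trimmed = line.trim();
    if (trimmed === '') continue; // skip blank lines
    try {
      logs.push(JSON.parse(trimmed));
    } catch {
      unparsed.push(trimmed); // keep raw text for manual inspection
    }
  }
  return { logs, unparsed };
}

// Made-up sample input for illustration only.
const sample = [
  '{"level":"info","timestamp":"2025-01-01T10:00:00Z","message":"started"}',
  '{"level":"error","timestamp":"2025-01-01T10:00:05Z","message":"db down"}',
  'plain text line',
].join('\n');

const { logs, unparsed } = parseJsonLogs(sample);
const errors = logs.filter(log => log.level === 'error' || log.level === 'critical');
console.log(errors.length, unparsed.length); // 1 1
```

Keeping the unparsed lines, rather than discarding them, matters in practice because stack traces are often written as plain text between JSON entries.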

### 3. Stack Trace Analysis

**Common Patterns:**

**JavaScript/Node.js:**

```
Error: Cannot read property 'x' of undefined
    at FunctionName (file.js:123:45)
    at AnotherFunction (file.js:456:78)
```

**Python:**

```
Traceback (most recent call last):
  File "app.py", line 123, in function_name
    result = process(data)
  File "utils.py", line 45, in process
    return data['key']
KeyError: 'key'
```

**Java:**

```
java.lang.NullPointerException
    at com.example.Class.method(Class.java:123)
    at com.example.AnotherClass.call(AnotherClass.java:456)
```

### 4. Error Correlation

**Timeline Analysis:**

- Group errors by timestamp
- Identify error spikes and patterns
- Correlate with deployments or changes
- Check for cascading failures

**Service Correlation:**

- Map errors across service boundaries
- Identify upstream/downstream relationships
- Track error propagation paths
- Find common failure points
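
The timeline steps (group by timestamp, then look for spikes) can be sketched as follows. This is an illustration under assumptions: entries carry an ISO 8601 `timestamp` field, buckets are one minute wide, and a "spike" is any bucket above three times the median count. The function names and the threshold are hypothetical choices, not part of the skill.

```javascript
// Count errors per one-minute bucket, keyed by the bucket's start time (ms).
function bucketByMinute(errors) {
  const counts = new Map();
  for (const { timestamp } of errors) {
    const ms = Date.parse(timestamp);
    if (Number.isNaN(ms)) continue; // ignore entries without a usable timestamp
    const bucket = Math.floor(ms / 60000) * 60000;
    counts.set(bucket, (counts.get(bucket) || 0) + 1);
  }
  return counts;
}

// Flag buckets whose count exceeds `factor` times the median bucket count.
// The factor of 3 is an arbitrary illustrative threshold.
function findSpikes(counts, factor = 3) {
  const sorted = [...counts.values()].sort((a, b) => a - b);
  if (sorted.length === 0) return [];
  const median = sorted[Math.floor(sorted.length / 2)];
  return [...counts]
    .filter(([, count]) => count > median * factor)
    .map(([bucket, count]) => ({ start: new Date(bucket).toISOString(), count }));
}

// Made-up sample: three quiet minutes, then eight errors within one minute.
const sampleErrors = [
  { timestamp: '2025-01-01T10:00:10Z' },
  { timestamp: '2025-01-01T10:01:20Z' },
  { timestamp: '2025-01-01T10:02:05Z' },
  ...Array.from({ length: 8 }, (_, i) => ({ timestamp: `2025-01-01T10:03:0${i}Z` })),
];

const spikes = findSpikes(bucketByMinute(sampleErrors));
console.log(spikes); // [ { start: '2025-01-01T10:03:00.000Z', count: 8 } ]
```

Comparing the `start` of each spike against deployment times is then a manual or scripted lookup, depending on where deployment history is recorded.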

### 5. Pattern Recognition

**Common Error Patterns:**

**N+1 Query Problem:**

```
Multiple database queries in loop
Pattern: SELECT * FROM users; SELECT * FROM posts WHERE user_id = ?
```

**Memory Leaks:**

```
Gradually increasing memory usage
Pattern: Memory growth over time without release
```

**Race Conditions:**

```
Intermittent failures under load
Pattern: Errors only occur with concurrent requests
```

**Timeout Issues:**

```
Requests timing out
Pattern: Errors after specific duration (e.g., 30s)
```

### 6. Reproduce the Issue

**Reproduction Steps:**

1. Identify the exact conditions that trigger the error
2. Create a minimal test case that reproduces the issue
3. Verify the issue is consistent and reproducible
4. Document the steps clearly

**Example:**

```markdown
## Reproduction Steps

1. Navigate to `/users/123`
2. Click "Edit Profile"
3. Submit form without filling required fields
4. Error occurs: "Cannot read property 'validate' of undefined"
```

### 7. Isolate the Failure Location

**Code Analysis:**

- Read the code around the error location
- Trace the execution path
- Identify where the assumption breaks
- Check variable states and values

**Debugging Techniques:**

- Add strategic logging to track execution
- Use debugger breakpoints
- Inspect variable states
- Check function return values
- Verify data structures

### 8. Form and Test Hypotheses

**Hypothesis Formation:**

- What could cause this error?
- What assumptions might be wrong?
- What edge cases weren't considered?
- What dependencies might be missing?

**Testing Hypotheses:**

- Add logging to verify assumptions
- Test edge cases
- Check input validation
- Verify dependencies are available
- Test with different data

### 9. Root Cause Analysis

**Investigation Steps:**

1. **Start with Symptoms**: What error is occurring?
2. **Work Backward**: What changed before the error?
3. **Check Patterns**: Is this recurring or isolated?
4. **Correlate Events**: What else happened at the same time?
5. **Identify Cause**: What is the underlying issue?

**Analysis Framework:**

```markdown
## Error Analysis

**Error**: [Description]
**Frequency**: [How often]
**Timeline**: [When it started]
**Affected Services**: [Which services]
**User Impact**: [How many users affected]

**Root Cause Hypothesis**:
- [Primary hypothesis with evidence]
- [Alternative hypotheses]

**Evidence**:
- [Log entries supporting hypothesis]
- [Error patterns observed]
- [Correlation with other events]

**Recommended Actions**:
- [Immediate fix]
- [Long-term prevention]
```

### 10. Implement Minimal Fix

**Fix Principles:**

- Fix the root cause, not just symptoms
- Make the smallest change possible
- Preserve existing functionality
- Don't introduce new complexity
- Add appropriate error handling

**Fix Verification:**

- Test the fix with the reproduction case
- Verify no regressions
- Check edge cases
- Ensure error handling is appropriate
- Confirm the fix is complete

### 11. Document the Solution

**Documentation Should Include:**

- Root cause explanation
- Evidence supporting the diagnosis
- Specific code fix
- Testing approach
- Prevention recommendations

## Examples

### Example 1: Null Reference Error

**Input**: Error: `TypeError: Cannot read property 'name' of undefined`

**Output**:

````markdown
## Debugging Analysis

### Error Information
- **Type**: TypeError
- **Location**: `src/services/user.js:45`
- **Message**: Cannot read property 'name' of undefined

### Root Cause
The `getUser()` function returns `undefined` when a user is not found, but the code assumes it always returns a user object.

### Code Location
```javascript
// src/services/user.js:45
const user = getUser(userId);
console.log(user.name); // ❌ Error: user is undefined
```

### Fix

```javascript
// Add null check
const user = getUser(userId);
if (!user) {
  throw new Error(`User ${userId} not found`);
}
console.log(user.name); // ✅ Safe
```

### Prevention

- Always validate return values from functions that might return null/undefined
- Use optional chaining: `user?.name`
- Add TypeScript types to catch these issues at compile time

````

### Example 2: Database Connection Errors (Log Analysis)

**Input**: Error logs showing database connection failures

**Output**:

````markdown
## Error Analysis: Database Connection Failures

### Pattern Detected

**Error**: `ECONNREFUSED` errors from PostgreSQL
**Frequency**: 15 errors/minute, started 2 hours ago
**Timeline**: Errors began after deployment at 14:30

### Root Cause Analysis

**Hypothesis**: Connection pool exhaustion

**Evidence**:
- Errors correlate with high traffic period (14:30-16:30)
- Connection pool size: 10, concurrent requests: 50+
- No connection cleanup in error handlers
- Errors spike during peak usage

**Code Location**: `src/db/connection.js:45`

**Fix**:
```javascript
// Add connection cleanup
try {
  const result = await query(sql);
  return result;
} catch (error) {
  // Ensure connection is released
  await releaseConnection();
  throw error;
}
```

**Monitoring Query**:

```sql
SELECT count(*) FROM pg_stat_activity WHERE state = 'active';
```

````

## Reference Files

For detailed debugging workflows, error patterns, and techniques, load reference files as needed:

- **`references/debugging-workflows.md`** - Common debugging workflows by issue type, language-specific debugging, debugging techniques, debugging checklists, and common error patterns (database errors, memory leaks, race conditions, timeouts, authentication errors, network errors, application errors, performance errors)
- **`references/INCIDENT_POSTMORTEM.template.md`** - Incident postmortem template with timeline, root cause analysis, and action items

When debugging specific types of issues or analyzing error patterns, load `references/debugging-workflows.md` and refer to the relevant section.

## Best Practices

### Debugging Approach

1. **Start with Symptoms**: Understand what's wrong before jumping to solutions
2. **Work Backward**: Trace from error to cause
3. **Test Hypotheses**: Don't assume, verify
4. **Minimal Changes**: Fix only what's necessary
5. **Verify Fixes**: Always test that the fix works

### Log Analysis Techniques

1. **Use Structured Logging**: JSON logs are easier to parse and analyze
2. **Include Context**: Add request IDs, user IDs, timestamps to all logs
3. **Log Levels**: Use appropriate levels (error, warn, info, debug)
4. **Correlation IDs**: Use request IDs to trace errors across services
5. **Error Grouping**: Group similar errors to identify patterns
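
Error grouping usually means reducing each message to a stable fingerprint so that occurrences differing only in IDs or numbers land in the same group. The sketch below is one way to do that, assuming each entry has `message` and `requestId` fields. The masking rules and function names are hypothetical and would need tuning for real logs.

```javascript
// Reduce a message to a fingerprint by masking the parts that vary per occurrence.
// The masking rules below are illustrative only.
function fingerprint(message) {
  return message
    .replace(/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi, '<uuid>')
    .replace(/\d+/g, '<n>')
    .replace(/'[^']*'/g, "'<str>'");
}

// Group entries by fingerprint, keeping a count and the request IDs involved
// so each group can be traced back across services.
function groupErrors(entries) {
  const groups = new Map();
  for (const entry of entries) {
    const key = fingerprint(entry.message);
    const group = groups.get(key) || { count: 0, requestIds: [] };
    group.count += 1;
    group.requestIds.push(entry.requestId);
    groups.set(key, group);
  }
  return groups;
}

// Made-up entries for illustration.
const groups = groupErrors([
  { requestId: 'r1', message: 'User 42 not found' },
  { requestId: 'r2', message: 'User 7 not found' },
  { requestId: 'r3', message: 'Timeout after 30000ms' },
]);

console.log(groups.get('User <n> not found')); // { count: 2, requestIds: [ 'r1', 'r2' ] }
```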

### Error Pattern Recognition

**Time-Based Patterns:**

- Errors at specific times (deployment windows, peak hours)
- Errors after specific duration (timeouts, memory leaks)
- Errors during specific events (database migrations, cache clears)

**Frequency Patterns:**

- Sudden spikes (deployment issues, traffic spikes)
- Gradual increases (memory leaks, resource exhaustion)
- Intermittent (race conditions, timing issues)

**Correlation Patterns:**

- Errors in multiple services simultaneously (infrastructure issues)
- Errors after specific user actions (application bugs)
- Errors correlated with external services (dependency issues)

### Common Debugging Patterns

**Null/Undefined Checks:**

```javascript
// Check for null/undefined explicitly (a bare `!value` also rejects 0, '' and false)
if (value == null) {
  // Handle missing value
}
```

**Error Handling:**

```javascript
try {
  // Risky operation
} catch (error) {
  // Log error with context
  console.error('Operation failed:', error);
  // Handle gracefully
}
```

**Logging:**

```javascript
// Strategic logging
console.log('Before operation:', { userId, data });
const result = await operation();
console.log('After operation:', { result });
```

**Type Checking:**

```javascript
// Verify types
if (typeof value !== 'string') {
  throw new TypeError('Expected string');
}
```

### Monitoring Setup

**Error Rate Monitoring:**

```javascript
// Track error rate over time
const errorRate = errors.length / totalRequests;
if (errorRate > 0.01) { // 1% error rate threshold
  alert('High error rate detected');
}
```

**Error Alerting:**

- Alert on error rate spikes (> 5% increase)
- Alert on new error types
- Alert on critical error patterns
- Alert on error correlation across services

### Prevention Strategies

1. **Input Validation**: Validate all inputs at boundaries
2. **Type Safety**: Use TypeScript or type checking
3. **Error Boundaries**: Catch errors at appropriate levels
4. **Testing**: Write tests for edge cases
5. **Code Review**: Review code for common pitfalls

## Related Use Cases

- Fixing production bugs
- Debugging test failures
- Investigating user-reported issues
- Analyzing error logs
- Root cause analysis
- Performance debugging
- Production incident investigation
- System reliability analysis
- Error monitoring setup
- Post-mortem analysis
- Debugging distributed systems
617
.trae/skills/debugging/references/debugging-workflows.md
Normal file
@@ -0,0 +1,617 @@
---
author: Joseph OBrien
status: unpublished
updated: '2025-12-23'
version: 1.0.1
tag: skill
type: reference
parent: debugging
---

# Debugging Workflows

Reference guide for common debugging workflows and techniques across different scenarios.

## Debugging Workflows by Issue Type

### Production Errors

**Workflow:**

1. Capture error message and stack trace
2. Check error logs for context
3. Identify when error started (deployment, traffic spike, etc.)
4. Reproduce in staging environment
5. Add logging to trace execution path
6. Identify root cause
7. Implement fix with tests
8. Deploy and monitor

**Tools:**

- Error tracking (Sentry, Rollbar)
- Log aggregation (ELK, Datadog)
- APM tools (New Relic, AppDynamics)

### Test Failures

**Workflow:**

1. Read test failure message
2. Understand what the test expects
3. Run test in isolation
4. Check test data and setup
5. Trace through code execution
6. Identify why test fails
7. Fix code or test as appropriate
8. Verify test passes

**Tools:**

- Test runner debug mode
- IDE debugger
- Test coverage tools

### Performance Issues

**Workflow:**

1. Measure current performance
2. Identify slow operations
3. Profile to find bottlenecks
4. Analyze profiling data
5. Optimize identified bottlenecks
6. Measure improvement
7. Verify no regressions

**Tools:**

- Profilers (Chrome DevTools, py-spy)
- APM tools
- Performance monitoring

## Language-Specific Debugging

### JavaScript/Node.js

**Debugging Tools:**

- Chrome DevTools
- Node.js debugger
- console.log (strategic logging)
- debugger statement

**Common Issues:**

- Undefined variables
- Async/await errors
- Promise rejections
- Scope issues
- Type coercion

**Techniques:**

- Use debugger breakpoints
- Log variable states
- Check call stack
- Inspect closures
- Monitor event loop

### Python

**Debugging Tools:**

- pdb (Python debugger)
- ipdb (enhanced debugger)
- print() statements
- logging module

**Common Issues:**

- AttributeError
- TypeError
- IndentationError
- Import errors
- NameError

**Techniques:**

- Use pdb.set_trace()
- Check variable types
- Verify imports
- Check indentation
- Use type hints

### Java

**Debugging Tools:**

- IntelliJ debugger
- Eclipse debugger
- jdb (command line)
- Logging frameworks

**Common Issues:**

- NullPointerException
- ClassCastException
- OutOfMemoryError
- StackOverflowError

**Techniques:**

- Set breakpoints
- Inspect variables
- Check exception stack traces
- Monitor memory usage
- Use profilers

## Debugging Techniques

### Binary Search

**When to Use:**

- Large codebase
- Unclear where issue is
- Many potential causes

**Process:**

1. Divide code in half
2. Test which half has issue
3. Repeat on problematic half
4. Narrow down to specific location
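
The halving process above can be sketched over an ordered list of changes, which is the same idea `git bisect` applies to commit history. The sketch assumes every item before the culprit is good and every item from the culprit onward is bad. The commit IDs and the `broken` set are made-up stand-ins for running a real test at each step.

```javascript
// Find the first item in an ordered list for which `isBad` returns true,
// assuming every item before it is good and every item after it is bad.
function findFirstBad(items, isBad) {
  let low = 0;
  let high = items.length - 1;
  let firstBad = -1;
  while (low <= high) {
    const mid = Math.floor((low + high) / 2);
    if (isBad(items[mid])) {
      firstBad = mid; // candidate; keep looking in the earlier half
      high = mid - 1;
    } else {
      low = mid + 1; // failure was introduced later
    }
  }
  return firstBad === -1 ? null : items[firstBad];
}

const commits = ['a1', 'b2', 'c3', 'd4', 'e5', 'f6', 'g7'];
const broken = new Set(['e5', 'f6', 'g7']); // stand-in for "run the test at this commit"
let checks = 0;
const culprit = findFirstBad(commits, commit => {
  checks += 1;
  return broken.has(commit);
});

console.log(culprit, checks); // e5 3
```

Seven candidates are resolved in three checks. The approach stops being reliable when the failure is intermittent, because a single false "good" sends the search into the wrong half.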

### Rubber Duck Debugging

**Process:**

1. Explain code to "rubber duck" (or yourself)
2. Walk through execution step by step
3. Identify assumptions
4. Find where logic breaks

### Logging Strategy

**What to Log:**

- Function entry/exit
- Variable values at key points
- Decision points (if/else branches)
- Error conditions
- Performance metrics

**Log Levels:**

- DEBUG: Detailed diagnostic info
- INFO: General informational messages
- WARN: Warning messages
- ERROR: Error conditions
- CRITICAL: Critical failures

### Reproducing Issues

**Steps:**

1. Identify exact conditions that trigger issue
2. Document steps to reproduce
3. Create minimal test case
4. Verify issue reproduces consistently
5. Isolate variables

**Common Challenges:**

- Intermittent issues
- Race conditions
- Environment-specific
- Data-dependent

## Debugging Checklist

### Before Starting

- [ ] Understand what should happen
- [ ] Understand what's actually happening
- [ ] Have reproduction steps
- [ ] Have access to logs/debugger

### During Debugging

- [ ] Form hypotheses
- [ ] Test hypotheses systematically
- [ ] Document findings
- [ ] Check assumptions
- [ ] Look for patterns

### After Finding Root Cause

- [ ] Verify root cause is correct
- [ ] Understand why it happened
- [ ] Implement fix
- [ ] Test fix thoroughly
- [ ] Check for similar issues
- [ ] Document solution

## Common Error Patterns

Reference guide for identifying and resolving common error patterns across different systems and languages.

### Database Errors

#### Connection Pool Exhaustion

**Symptoms:**

- `ECONNREFUSED` errors
- Errors spike during high traffic
- Connection pool size is smaller than concurrent requests

**Pattern:**

```
Error: ECONNREFUSED
Connection pool exhausted
Too many connections
```

**Root Causes:**

- Connection pool size too small
- Connections not being released
- Long-running transactions holding connections
- Missing connection cleanup in error handlers

**Solutions:**

- Increase connection pool size
- Ensure connections are released in finally blocks
- Add connection timeout
- Implement connection retry logic
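
Releasing in a `finally` block is easiest to enforce when acquisition and release live in one wrapper. The sketch below illustrates that shape with a deliberately tiny, hypothetical `TinyPool` class. It is not the API of any real database driver, and `withConnection` is an illustrative name.

```javascript
// Hypothetical minimal pool, only here to make the sketch runnable.
class TinyPool {
  constructor(size) {
    this.available = size;
  }
  async acquire() {
    if (this.available === 0) throw new Error('Connection pool exhausted');
    this.available -= 1;
    return {
      // Fake query: fails when the SQL text contains FAIL.
      query: async sql => {
        if (sql.includes('FAIL')) throw new Error('query failed');
        return [{ ok: true }];
      },
    };
  }
  release() {
    this.available += 1;
  }
}

// Acquire, run the callback, and always release, whether it resolves or throws.
async function withConnection(pool, work) {
  const connection = await pool.acquire();
  try {
    return await work(connection);
  } finally {
    pool.release(connection);
  }
}

const pool = new TinyPool(1);
const poolDemo = (async () => {
  // Failure path: the error is swallowed here only to keep the demo going.
  await withConnection(pool, c => c.query('FAIL')).catch(() => {});
  // With a pool of one, this call would fail if the first call had leaked.
  const rows = await withConnection(pool, c => c.query('SELECT 1'));
  return { rows, available: pool.available };
})();

poolDemo.then(result => console.log(result)); // { rows: [ { ok: true } ], available: 1 }
```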

#### N+1 Query Problem

**Symptoms:**

- Slow response times
- Many database queries for single operation
- Queries increase linearly with data size

**Pattern:**

```
SELECT * FROM users;
SELECT * FROM posts WHERE user_id = 1;
SELECT * FROM posts WHERE user_id = 2;
SELECT * FROM posts WHERE user_id = 3;
...
```

**Solutions:**

- Use eager loading (JOINs)
- Batch queries
- Use data loaders
- Implement query result caching
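
Batching replaces the per-user query with a single lookup for all user IDs, followed by grouping in memory. The sketch below uses an in-memory stand-in named `db` that counts calls. The object, its methods, and the data are hypothetical and only illustrate the shape of the change.

```javascript
// Stand-in data source that counts how many "queries" it receives.
const db = {
  queries: 0,
  posts: [
    { id: 1, userId: 1 },
    { id: 2, userId: 1 },
    { id: 3, userId: 2 },
  ],
  // One query per user: calling this in a loop is the N+1 shape.
  async postsByUser(userId) {
    this.queries += 1;
    return this.posts.filter(p => p.userId === userId);
  },
  // One query for all users, comparable to WHERE user_id IN (...).
  async postsByUsers(userIds) {
    this.queries += 1;
    const wanted = new Set(userIds);
    return this.posts.filter(p => wanted.has(p.userId));
  },
};

// Fetch all posts in one call, then group them by user in memory.
async function loadPostsBatched(users) {
  const posts = await db.postsByUsers(users.map(u => u.id));
  const byUser = new Map(users.map(u => [u.id, []]));
  for (const post of posts) byUser.get(post.userId).push(post);
  return byUser;
}

const batched = loadPostsBatched([{ id: 1 }, { id: 2 }, { id: 3 }]);
batched.then(byUser => {
  console.log(byUser.get(1).length, byUser.get(3).length, db.queries); // 2 0 1
});
```

The query count stays at one regardless of how many users are passed in, which is the property to check for when verifying an N+1 fix.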

### Memory Leaks

#### Event Listener Leaks

**Symptoms:**

- Memory usage grows over time
- No decrease after component/page unload
- Correlates with user interactions

**Pattern:**

```javascript
// Problem: Listeners registered but never removed
window.addEventListener('resize', handler);
// Missing: window.removeEventListener('resize', handler);
```

**Solutions:**

- Always remove event listeners
- Use cleanup functions in React useEffect
- Use WeakMap for automatic cleanup
- Monitor listener count
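
One way to make removal hard to forget is to have registration return its own cleanup function. In the sketch below a plain `EventTarget` stands in for `window` so the code runs outside a browser, and the helper name `listen` is hypothetical.

```javascript
// Register a listener and return a function that removes it again.
function listen(target, type, handler) {
  target.addEventListener(type, handler);
  return () => target.removeEventListener(type, handler);
}

const target = new EventTarget(); // stands in for `window`
let calls = 0;
const stopListening = listen(target, 'resize', () => {
  calls += 1;
});

target.dispatchEvent(new Event('resize')); // handler runs
stopListening();
target.dispatchEvent(new Event('resize')); // handler is no longer registered

console.log(calls); // 1
```

In a React component, returning `stopListening` from the `useEffect` callback gives the effect its cleanup function.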

#### Closure Leaks

**Symptoms:**

- Memory growth in long-running applications
- Large objects retained in closures
- Circular references

**Pattern:**

```javascript
// Problem: Large object retained in closure
function createHandler(largeData) {
  return function() {
    // largeData retained even if not used
  };
}
```

**Solutions:**

- Avoid retaining large objects in closures
- Use WeakMap/WeakSet when possible
- Clear references when done
- Use memory profilers to identify leaks

### Race Conditions

#### Concurrent Modification

**Symptoms:**

- Intermittent failures
- Data inconsistency
- Errors only under load

**Pattern:**

```
Thread 1: Read value (100)
Thread 2: Read value (100)
Thread 1: Write value (101)
Thread 2: Write value (101)  // Lost update
```

**Solutions:**

- Use locks/mutexes
- Implement optimistic locking
- Use atomic operations
- Add request queuing
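
The lost update above can be reproduced in JavaScript with two async tasks that each read, wait, and write. The sketch below shows the unsafe version and a version serialized through a minimal promise-chain mutex. The `Mutex` class is a hypothetical illustration rather than a library API, and the "threads" here are interleaved async tasks on a single thread.

```javascript
// Minimal async mutex: each caller waits for the previous holder to finish.
class Mutex {
  constructor() {
    this.tail = Promise.resolve();
  }
  runExclusive(task) {
    const result = this.tail.then(() => task());
    // Keep the chain usable even if a task rejects.
    this.tail = result.catch(() => {});
    return result;
  }
}

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));
let stored = 100;

// Read-modify-write with an await in the middle, which is where updates get lost.
async function increment() {
  const value = stored;
  await sleep(5);
  stored = value + 1;
}

async function demo() {
  stored = 100;
  await Promise.all([increment(), increment()]);
  const unsafe = stored; // both tasks read 100, so one update is lost

  stored = 100;
  const mutex = new Mutex();
  await Promise.all([
    mutex.runExclusive(increment),
    mutex.runExclusive(increment),
  ]);
  return { unsafe, safe: stored };
}

const demoResult = demo();
demoResult.then(result => console.log(result)); // { unsafe: 101, safe: 102 }
```

A mutex like this only protects state within one process. Updates shared across processes or services need database-level locking or optimistic version checks instead.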

#### Async Race Conditions

**Symptoms:**

- Results arrive out of order
- Stale data displayed
- Race between multiple async operations

**Pattern:**

```javascript
// Problem: Race between requests that update the same state
fetch('/api/users/1').then(setUser);
fetch('/api/users/2').then(setUser);
// If the first response arrives last, stale data overwrites the newer result
```

**Solutions:**

- Use Promise.all for parallel operations
- Cancel previous requests
- Use request IDs to match responses
- Implement request deduplication
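
Matching responses by request ID can be as small as a counter that remembers which call is the latest and drops anything older. The sketch below assumes that "latest call wins" is the desired policy. `latestOnly` and `fakeFetchUser` are hypothetical names, and the fake loader replaces a real network call so the example is self-contained.

```javascript
// Wrap an async loader so that only the most recent call's result is applied.
function latestOnly(load, apply) {
  let latestId = 0;
  return async function run(...args) {
    const id = ++latestId;
    const result = await load(...args);
    const isLatest = id === latestId;
    if (isLatest) apply(result); // drop responses that were superseded
    return isLatest;
  };
}

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));
// Stand-in for a network call: user 1 responds slowly, user 2 quickly.
const fakeFetchUser = async id => {
  await sleep(id === 1 ? 30 : 5);
  return { id };
};

let shown = null;
const showUser = latestOnly(fakeFetchUser, user => {
  shown = user;
});

// Request user 1, then immediately request user 2.
const done = Promise.all([showUser(1), showUser(2)]).then(applied => {
  console.log(applied, shown); // [ false, true ] { id: 2 }
  return applied;
});
```

The slow response for user 1 still arrives, but it is ignored. Cancelling the superseded request as well would additionally save bandwidth.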

### Timeout Issues

#### Request Timeouts

**Symptoms:**

- Requests fail after specific duration
- Timeout errors in logs
- Slow external dependencies

**Pattern:**

```
Error: Request timeout after 30000ms
ETIMEDOUT
```

**Solutions:**

- Increase timeout for slow operations
- Implement retry with exponential backoff
- Add timeout configuration
- Optimize slow operations
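
Retry with exponential backoff can be sketched as a loop that doubles the wait after each failure. The function name, option names, and defaults below are hypothetical, and the `flaky` operation is a stand-in that fails twice before succeeding.

```javascript
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Retry `operation` up to `retries` extra times, doubling the delay each time.
async function retryWithBackoff(operation, { retries = 3, baseDelayMs = 100 } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await operation(attempt);
    } catch (error) {
      if (attempt >= retries) throw error; // out of attempts: surface the last error
      const delay = baseDelayMs * 2 ** attempt; // 100, 200, 400, ... with the default base
      await sleep(delay);
    }
  }
}

// Stand-in operation that times out twice and then succeeds.
let attempts = 0;
const flaky = async () => {
  attempts += 1;
  if (attempts < 3) throw new Error('ETIMEDOUT');
  return 'ok';
};

const outcome = retryWithBackoff(flaky, { baseDelayMs: 1 });
outcome.then(value => console.log(value, attempts)); // ok 3
```

Retrying is only safe for operations that can be repeated without side effects, and adding random jitter to the delay helps avoid many clients retrying at the same moment.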

#### Database Query Timeouts

**Symptoms:**

- Queries fail after timeout period
- Slow query logs show long-running queries
- Timeouts during peak load

**Solutions:**

- Optimize slow queries
- Add appropriate indexes
- Increase query timeout
- Implement query cancellation

### Authentication Errors

#### Token Expiration

**Symptoms:**

- 401 Unauthorized errors
- Errors after specific time period
- Token refresh needed

**Pattern:**

```
401 Unauthorized
Token expired
Invalid token
```

**Solutions:**

- Implement token refresh logic
- Handle token expiration gracefully
- Add token expiration checks
- Use refresh tokens
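
Token refresh logic often takes the form "on a 401, refresh once and retry once". The sketch below shows that flow with caller-supplied callbacks. `createAuthClient`, `request`, and `refresh` are hypothetical names, and the fake implementations stand in for a real API and token endpoint.

```javascript
// Call `request(token)`; on a 401, refresh the token once and retry.
// `request` and `refresh` are hypothetical callbacks supplied by the caller.
function createAuthClient(request, refresh, initialToken) {
  let token = initialToken;
  return async function call(...args) {
    let response = await request(token, ...args);
    if (response.status === 401) {
      token = await refresh();
      response = await request(token, ...args); // single retry avoids refresh loops
    }
    return response;
  };
}

// Stand-ins for a real API and token endpoint.
let refreshCount = 0;
const fakeRequest = async token =>
  token === 'fresh' ? { status: 200, body: 'data' } : { status: 401 };
const fakeRefresh = async () => {
  refreshCount += 1;
  return 'fresh';
};

const call = createAuthClient(fakeRequest, fakeRefresh, 'expired');
const responses = (async () => [await call(), await call()])();

responses.then(list => {
  console.log(list.map(r => r.status), refreshCount); // [ 200, 200 ] 1
});
```

The second call reuses the refreshed token, so only one refresh happens. Concurrent calls that all hit a 401 at once would each trigger a refresh in this sketch, so sharing a single in-flight refresh promise is a common extension.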
|
||||
|
||||
#### Session Expiration
|
||||
|
||||
**Symptoms:**
|
||||
|
||||
- Users logged out unexpectedly
|
||||
- Session errors after inactivity
|
||||
- Cookie expiration issues
|
||||
|
||||
**Solutions:**
|
||||
|
||||
- Extend session on activity
|
||||
- Implement session refresh
|
||||
- Handle expiration gracefully
|
||||
- Clear expired sessions
|
||||
|
||||
### Network Errors
|
||||
|
||||
#### Connection Refused
|
||||
|
||||
**Symptoms:**
|
||||
|
||||
- Service unavailable errors
|
||||
- Connection refused errors
|
||||
- Service not running
|
||||
|
||||
**Pattern:**
|
||||
|
||||
```
|
||||
ECONNREFUSED
|
||||
Connection refused
|
||||
Service unavailable
|
||||
```
|
||||
|
||||
**Solutions:**
|
||||
|
||||
- Check if service is running
|
||||
- Verify port configuration
|
||||
- Check firewall rules
|
||||
- Implement health checks
|
||||
|
||||
#### DNS Resolution Failures

**Symptoms:**

- Cannot resolve hostname
- DNS lookup failures
- Network configuration issues

**Pattern:**

```
ENOTFOUND
DNS resolution failed
getaddrinfo failed
```

**Solutions:**

- Verify DNS configuration
- Check hostname spelling
- Use IP addresses as fallback
- Implement DNS caching

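To tell a DNS failure apart from other network problems, it can help to resolve the name explicitly before connecting. The sketch below wraps `socket.getaddrinfo`; the helper name `resolve_host` and its injectable `lookup` parameter are assumptions made so the failure path can be exercised without a network.

```python
import socket
from typing import Callable, List


def resolve_host(hostname: str, lookup: Callable = socket.getaddrinfo) -> List[str]:
    """Return the sorted IP addresses for hostname, or [] if resolution fails."""
    try:
        results = lookup(hostname, None)
    except socket.gaierror:
        # Raised when the name cannot be resolved.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the address string.
    return sorted({result[4][0] for result in results})
```

An empty list suggests checking the hostname spelling and resolver configuration first; a non-empty list means the problem lies further along the connection path.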
### Application Errors

#### Null Reference Errors

**Symptoms:**

- NullPointerException (Java)
- TypeError: Cannot read property (JavaScript)
- AttributeError (Python)

**Pattern:**

```
TypeError: Cannot read property 'x' of undefined
NullPointerException
AttributeError: 'NoneType' object has no attribute
```

**Solutions:**

- Add null checks
- Use optional chaining
- Provide default values
- Validate inputs

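Python has no optional-chaining operator, so one way to get a similar effect for nested dictionaries is a small lookup helper with a default. `get_nested` is a hypothetical name used for this sketch.

```python
from typing import Any, Iterable


def get_nested(data: Any, path: Iterable[str], default: Any = None) -> Any:
    """Follow path through nested dicts; return default if a step is missing or None."""
    current = data
    for key in path:
        if not isinstance(current, dict):
            return default
        current = current.get(key)
        if current is None:
            return default
    return current
```

For example, `get_nested(user, ["profile", "address", "city"], "unknown")` returns `"unknown"` instead of raising when `profile` or `address` is absent.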
#### Type Errors

**Symptoms:**

- Type mismatch errors
- Invalid type errors
- Casting failures

**Pattern:**

```
TypeError: expected string, got number
InvalidCastException
Type mismatch
```

**Solutions:**

- Add type validation
- Use type guards
- Implement proper type checking
- Handle type conversions

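Validation and conversion are often easiest to handle at the boundary where untrusted input enters the program. The sketch below shows one way to do that for integers; `to_int` and its error messages are illustrative assumptions, not an established API.

```python
def to_int(value: object, field: str) -> int:
    """Convert value to int, raising a descriptive error if that is not possible."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        raise TypeError(f"{field}: expected int, got bool")
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        try:
            return int(value.strip())
        except ValueError:
            raise ValueError(f"{field}: {value!r} is not a valid integer") from None
    raise TypeError(
        f"{field}: expected int or numeric string, got {type(value).__name__}"
    )
```

Including the field name and the offending type in the message makes the resulting log line far easier to trace back to the input that caused it.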
### Performance Errors

#### Out of Memory

**Symptoms:**

- Application crashes
- Memory limit exceeded
- Heap out of memory

**Pattern:**

```
OutOfMemoryError
Heap out of memory
Memory limit exceeded
```

**Solutions:**

- Increase memory limits
- Optimize memory usage
- Implement pagination
- Use streaming for large data

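Pagination and streaming share one idea: process a bounded number of items at a time instead of loading everything. A minimal sketch using a generator is shown below; `in_batches` is a hypothetical helper name.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def in_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of at most batch_size items without materializing the whole input."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        # islice pulls only the next batch_size items from the source.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Because the source is consumed lazily, peak memory is bounded by the batch size rather than by the total number of rows or lines.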
#### CPU Exhaustion

**Symptoms:**

- Slow response times
- High CPU usage
- Application freezing

**Pattern:**

- High CPU utilization (90%+)
- Slow processing
- Event loop blocking

**Solutions:**

- Optimize algorithms
- Use worker threads
- Implement caching
- Break up long-running tasks
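
Caching pays off when a pure function is called repeatedly with the same arguments. The sketch below uses `functools.lru_cache` from the standard library; `fib` is only a stand-in for a CPU-heavy computation.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n: int) -> int:
    """Naive recursive Fibonacci, made fast by memoizing earlier results."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)
```

With the cache, each distinct argument is computed once, which `fib.cache_info()` can confirm. This approach only suits functions whose result depends solely on their arguments.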
@@ -0,0 +1,206 @@
---
author: Joseph OBrien
status: unpublished
updated: '2025-12-23'
version: 1.0.1
tag: skill
type: reference
parent: debugging
---

# Incident Postmortem: {{INCIDENT_TITLE}}

**Incident ID:** {{INC-XXXX}}
**Date:** {{YYYY-MM-DD}}
**Duration:** {{START_TIME}} - {{END_TIME}} ({{DURATION}})
**Severity:** {{SEV1|SEV2|SEV3|SEV4}}
**Status:** {{RESOLVED|MONITORING}}

---

## Summary

{{ONE_PARAGRAPH_SUMMARY}}

### Impact

| Metric | Value |
|--------|-------|
| Users Affected | {{N}} |
| Revenue Impact | ${{N}} |
| Requests Failed | {{N}} |
| Error Rate | {{N}}% |
| Downtime | {{DURATION}} |

---

## Timeline

| Time (UTC) | Event |
|------------|-------|
| {{HH:MM}} | {{TRIGGER_EVENT}} |
| {{HH:MM}} | Alert fired: {{ALERT_NAME}} |
| {{HH:MM}} | On-call paged |
| {{HH:MM}} | Investigation started |
| {{HH:MM}} | Root cause identified |
| {{HH:MM}} | Mitigation applied |
| {{HH:MM}} | Service recovered |
| {{HH:MM}} | Incident closed |

---

## Root Cause

{{DETAILED_ROOT_CAUSE_ANALYSIS}}

### Contributing Factors

1. {{FACTOR_1}}
2. {{FACTOR_2}}
3. {{FACTOR_3}}

### What Failed

- **Detection:** {{HOW_WAS_IT_DETECTED}}
- **Prevention:** {{WHY_WASNT_IT_PREVENTED}}
- **Response:** {{RESPONSE_GAPS}}

---

## Resolution

### Immediate Actions

1. {{ACTION_1}}
2. {{ACTION_2}}

### Mitigation Steps

```bash
{{COMMANDS_OR_STEPS_TAKEN}}
```

### Verification

- [ ] Service health restored
- [ ] Error rates normalized
- [ ] No recurring alerts

---

## Lessons Learned

### What Went Well

- {{POSITIVE_1}}
- {{POSITIVE_2}}

### What Went Wrong

- {{NEGATIVE_1}}
- {{NEGATIVE_2}}

### Where We Got Lucky

- {{LUCKY_1}}

---

## Action Items

| ID | Action | Owner | Priority | Due Date | Status |
|----|--------|-------|----------|----------|--------|
| 1 | {{ACTION}} | {{OWNER}} | {{P1-4}} | {{DATE}} | {{STATUS}} |
| 2 | {{ACTION}} | {{OWNER}} | {{P1-4}} | {{DATE}} | {{STATUS}} |
| 3 | {{ACTION}} | {{OWNER}} | {{P1-4}} | {{DATE}} | {{STATUS}} |

### Prevention

- [ ] {{PREVENTIVE_MEASURE_1}}
- [ ] {{PREVENTIVE_MEASURE_2}}

### Detection

- [ ] {{DETECTION_IMPROVEMENT_1}}
- [ ] {{DETECTION_IMPROVEMENT_2}}

### Response

- [ ] {{RESPONSE_IMPROVEMENT_1}}
- [ ] {{RESPONSE_IMPROVEMENT_2}}

---

## Technical Details

### Affected Systems

| System | Impact | Recovery |
|--------|--------|----------|
| {{SYSTEM}} | {{DESCRIPTION}} | {{TIME}} |

### Metrics During Incident

| Metric | Normal | During Incident | Peak |
|--------|--------|-----------------|------|
| Latency (p99) | {{MS}} | {{MS}} | {{MS}} |
| Error Rate | {{N}}% | {{N}}% | {{N}}% |
| CPU Usage | {{N}}% | {{N}}% | {{N}}% |
| Memory | {{N}}GB | {{N}}GB | {{N}}GB |

### Logs

```
{{RELEVANT_LOG_SNIPPETS}}
```

---

## Communication

### Internal

| Time | Channel | Message |
|------|---------|---------|
| {{TIME}} | {{SLACK/EMAIL}} | {{SUMMARY}} |

### External

| Time | Channel | Audience | Message |
|------|---------|----------|---------|
| {{TIME}} | Status Page | Customers | {{MESSAGE}} |

---

## Related Incidents

| ID | Date | Similarity |
|----|------|------------|
| {{INC-XXXX}} | {{DATE}} | {{DESCRIPTION}} |

---

## Appendix

### A. Alert Configuration

```yaml
{{ALERT_CONFIG}}
```

### B. Runbook Updates Needed

- {{RUNBOOK_UPDATE_1}}
- {{RUNBOOK_UPDATE_2}}

---

## Quality Checklist

- [ ] Timeline is complete and accurate
- [ ] Root cause clearly identified
- [ ] Impact quantified
- [ ] Action items have owners and due dates
- [ ] Lessons learned documented
- [ ] Prevention measures identified
- [ ] Related incidents linked
124
.trae/skills/debugging/scripts/parse-logs.py
Normal file
@@ -0,0 +1,124 @@
#!/usr/bin/env python3
"""Error log parser for debugging skill.
Extracts errors, exceptions, and stack traces from log files.
"""

import json
import re
import sys
from pathlib import Path


def extract_errors(log_content: str) -> list[dict]:
    """Extract errors and exceptions from log content."""
    errors = []

    # Single-line patterns. The inline (?i) flag already makes each keyword
    # case-insensitive, so spelling out upper/lower-case variants is redundant.
    line_patterns = {
        "error": re.compile(r"(?i)(error):\s*(.+)"),
        "exception": re.compile(r"(?i)(exception):\s*(.+)"),
        "fatal": re.compile(r"(?i)(fatal):\s*(.+)"),
        "critical": re.compile(r"(?i)(critical):\s*(.+)"),
    }
    # Tracebacks span several lines, so this pattern is applied to the whole
    # log; searching one line at a time could never match it.
    traceback_pattern = re.compile(
        r"Traceback \(most recent call last\):(.+?)(?=\n\w|\Z)", re.DOTALL
    )

    lines = log_content.split("\n")
    for i, line in enumerate(lines):
        for error_type, pattern in line_patterns.items():
            match = pattern.search(line)
            if match:
                errors.append(
                    {
                        "type": error_type,
                        "message": match.group(2),
                        "line_number": i + 1,
                        "line_content": line,
                        "timestamp": extract_timestamp(line),
                    }
                )

    for match in traceback_pattern.finditer(log_content):
        # Index of the line on which the traceback header starts.
        line_index = log_content.count("\n", 0, match.start())
        errors.append(
            {
                "type": "traceback",
                "message": "Traceback (most recent call last):",
                "line_number": line_index + 1,
                "line_content": lines[line_index],
                "timestamp": extract_timestamp(lines[line_index]),
                "stack_trace": match.group(1).strip(),
            }
        )

    # Report entries in the order they appear in the log.
    errors.sort(key=lambda error: error["line_number"])
    return errors


def extract_timestamp(line: str) -> str | None:
    """Extract timestamp from log line if present."""
    timestamp_patterns = [
        r"(\d{4}-\d{2}-\d{2}[\sT]\d{2}:\d{2}:\d{2})",
        r"(\d{2}/\d{2}/\d{4}\s+\d{2}:\d{2}:\d{2})",
        r"\[(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})\]",
    ]

    for pattern in timestamp_patterns:
        match = re.search(pattern, line)
        if match:
            return match.group(1)

    return None


def group_errors_by_type(errors: list[dict]) -> dict[str, list[dict]]:
    """Group errors by type."""
    grouped: dict[str, list[dict]] = {}
    for error in errors:
        error_type = error["type"]
        if error_type not in grouped:
            grouped[error_type] = []
        grouped[error_type].append(error)
    return grouped


def analyze_error_patterns(errors: list[dict]) -> dict:
    """Analyze error patterns and provide insights."""
    if not errors:
        return {}

    type_counts: dict[str, int] = {}
    for error in errors:
        error_type = error["type"]
        type_counts[error_type] = type_counts.get(error_type, 0) + 1

    message_counts: dict[str, int] = {}
    for error in errors:
        message = error.get("message", "")[:100]
        message_counts[message] = message_counts.get(message, 0) + 1

    most_common = sorted(message_counts.items(), key=lambda x: x[1], reverse=True)[:5]

    return {
        "total_errors": len(errors),
        "by_type": type_counts,
        "most_common_errors": [{"message": msg, "count": count} for msg, count in most_common],
    }


def main():
    """Main entry point."""
    if len(sys.argv) < 2:
        # Usage and error messages go to stderr so stdout stays valid JSON.
        print("Usage: parse-logs.py <log_file>", file=sys.stderr)
        sys.exit(1)

    log_file = Path(sys.argv[1])
    if not log_file.exists():
        print(f"Error: File not found: {log_file}", file=sys.stderr)
        sys.exit(1)

    # Logs may contain stray bytes; replace them instead of failing to decode.
    log_content = log_file.read_text(encoding="utf-8", errors="replace")

    errors = extract_errors(log_content)

    analysis = analyze_error_patterns(errors)

    output = {
        "file": str(log_file),
        "errors": errors,
        "analysis": analysis,
        "grouped_by_type": group_errors_by_type(errors),
    }

    print(json.dumps(output, indent=2))


if __name__ == "__main__":
    main()