📖 GitHub Xray - Complete Owner's Manual
Table of Contents
Overview
Capabilities
Features
Access Methods
Constraints & Limits
Expected Behavior
Best Practices
Troubleshooting
Operational Recommendations
Advanced Usage
Support & Maintenance
Quick Reference
Version Information
Additional Resources
---
Overview
GitHub Xray is a comprehensive repository intelligence and production readiness analysis tool. It scans GitHub repositories (user or organization), analyzes their tech stack, structure, and completeness, then generates detailed reports with production-readiness scores.
What it does:
- Fetches all repositories for a GitHub user or organization
- Analyzes each repository's file structure and contents
- Detects tech stack (languages, frameworks, databases, infrastructure, CI/CD)
- Calculates production-readiness scores (0-100%)
- Generates comprehensive reports (JSON, Markdown, CSV)
- Provides interactive file tree viewers
- Tracks progress and allows resuming interrupted scans
What it doesn't do:
- Modify repositories (read-only)
- Commit changes
- Access private repos without proper token permissions
- Analyze code quality or performance (only structure and completeness)
---
Capabilities
1. Repository Scanning
What it scans:
- All repositories for a user or organization
- Public and private repositories (with appropriate token)
- Forked repositories (optional, can be filtered out)
Scan process:
- Fetches repository list from GitHub API (see the listing sketch after the time estimates below)
- For each repository:
  - Retrieves complete file tree
  - Analyzes file structure and contents
  - Detects tech stack components
  - Checks for best practices (README, tests, CI/CD, etc.)
  - Calculates readiness score
- Generates comprehensive reports
Time estimates:
- Small org (1-10 repos): 1-2 minutes
- Medium org (10-50 repos): 5-15 minutes
- Large org (50-200 repos): 15-60 minutes
- Very large org (200+ repos): 1-3 hours
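The repository listing step above corresponds to GitHub's paginated repository endpoint. The sketch below is a minimal TypeScript illustration using Node 18+'s built-in fetch and the public /users/{username}/repos endpoint (organizations use /orgs/{org}/repos); it is not the tool's internal code.
// Minimal sketch: paginated repository listing via the GitHub REST API.
// Illustrative only; not the tool's actual implementation.
async function listRepos(user: string, token: string): Promise<string[]> {
  const names: string[] = [];
  for (let page = 1; ; page++) {
    const res = await fetch(
      `https://api.github.com/users/${user}/repos?per_page=100&page=${page}`,
      { headers: { Authorization: `Bearer ${token}` } },
    );
    const repos = (await res.json()) as { name: string; fork: boolean }[];
    if (repos.length === 0) break; // last page reached
    names.push(...repos.map((r) => r.name));
  }
  return names;
}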
2. Tech Stack Detection
Languages detected:
- JavaScript, TypeScript
- Python, Go, Rust, Java
- PHP, Ruby, C#
- And more (via file patterns)
Frameworks detected:
- Frontend: React, Next.js, Vue, Angular, Svelte, Remix, Gatsby
- Backend: Express, Django, Flask, FastAPI, Rails, Spring, Laravel
- Full-stack: Next.js, Nuxt.js
Infrastructure detected:
- Docker, Kubernetes
- Terraform, Ansible, Pulumi
- Serverless, Vercel, Netlify
- CloudFormation
Databases detected:
- PostgreSQL, MySQL, MongoDB
- Redis, SQLite, DynamoDB
- Cassandra, Elasticsearch
CI/CD detected:
- GitHub Actions, GitLab CI
- CircleCI, Travis CI, Jenkins
- Azure Pipelines
3. Production Readiness Scoring
Each repository receives a score (0-100%) based on:
| Criteria | Weight | Description |
|---|---|---|
| Clear Purpose | 10% | Repository purpose is evident (description/docs) |
| README | 10% | Documentation exists |
| Lockfiles | 10% | Dependency versions locked |
| Env Templates | 10% | Environment configuration examples |
| Tests | 15% | Test files/frameworks present |
| CI/CD | 10% | Automated pipelines configured |
| Deployment Config | 15% | Docker/deploy configs present |
| No Secrets | 10% | No secrets committed to repo |
| Recent Activity | 10% | Commits within last 90 days |
Score interpretation:
- 80-100%: Production-ready, well-maintained
- 60-79%: Good, minor improvements needed
- 40-59%: Fair, several components missing
- 0-39%: Needs significant work
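As a rough illustration, the score is a weighted sum of boolean checks. The sketch below assumes the weights from the table above; the key names are illustrative and may not match the tool's internal field names (see DEFAULT_CRITERIA_WEIGHTS in src/types.ts for the real ones).
// Minimal scoring sketch using the weights from the table above.
// Key names are illustrative, not the tool's actual criteria fields.
const WEIGHTS: Record<string, number> = {
  clearPurpose: 10, readme: 10, lockfiles: 10, envTemplates: 10,
  tests: 15, cicd: 10, deploymentConfig: 15, noSecrets: 10, recentActivity: 10,
};

function readinessScore(criteria: Record<string, boolean>): number {
  const total = Object.values(WEIGHTS).reduce((a, b) => a + b, 0);
  const earned = Object.entries(WEIGHTS)
    .filter(([key]) => criteria[key])
    .reduce((sum, [, weight]) => sum + weight, 0);
  return Math.round((earned / total) * 100); // 0-100%
}

// Example: README, tests, CI/CD, and recent activity present → 45%
readinessScore({ readme: true, tests: true, cicd: true, recentActivity: true });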
4. Report Generation
Report types:
- JSON: Machine-readable, complete data
- Markdown: Human-readable, formatted report
- CSV: Spreadsheet-compatible (optional)
Report contents:
- Executive summary (average scores, distribution)
- Top 5 production-ready repos
- Bottom 5 repos needing attention
- Complete repository table
- Detailed analysis for each repository
- Tech stack breakdown
- Missing components list
- Risk assessments
5. File Tree Viewers
Interactive HTML viewers that allow:
- Browse all repositories in one interface
- Expand/collapse file trees
- Search repositories
- View repository metadata
- See file sizes and structure
---
Features
Dashboard (Web Interface)
Location: https://githubxray.pro (after deployment)
Features:
- ✅ Start new scans with parameter selection
- ✅ View active scans and progress
- ✅ Browse recent reports
- ✅ Generate file tree viewers
- ✅ Save/load GitHub token
- ✅ Filter repositories (language, stars, forks, etc.)
- ✅ Configure scan options (concurrency, CSV export, cache)
Best for:
- Interactive use
- Visual progress tracking
- Quick parameter adjustments
- Non-technical users
CLI (Command Line)
Command: npm run start
Available commands:
scan - Analyze repositories
crawl - Generate file tree viewer
list - List repositories and root files
check - Verify token and rate limits
dashboard - Start web dashboard
Best for:
- Automation/scripting
- CI/CD integration
- Server deployments
- Batch processing
Caching
What's cached:
- Repository lists (1 hour TTL)
- File trees per repository (1 hour TTL)
Benefits:
- Faster subsequent scans
- Reduced API rate limit usage
- Offline capability for cached data
Disable when:
- You want fresh data
- Repositories have changed significantly
- Debugging cache issues
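Conceptually, each cached entry carries a timestamp and is ignored once it is older than one hour. The sketch below assumes a simple JSON layout with a savedAt field; the tool's actual cache format in ./.cache/ may differ.
// Minimal TTL-check sketch; the on-disk format (savedAt + data) is an assumption.
import { existsSync, readFileSync } from 'node:fs';

const TTL_MS = 60 * 60 * 1000; // 1 hour, matching the TTL described above

function readCache<T>(path: string): T | null {
  if (!existsSync(path)) return null;
  const entry = JSON.parse(readFileSync(path, 'utf8')) as { savedAt: number; data: T };
  return Date.now() - entry.savedAt < TTL_MS ? entry.data : null; // expired → refetch
}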
Progress Persistence
What's saved:
- Scan progress (completed repos, pending repos)
- Analysis results (as they complete)
- Scan configuration
Benefits:
- Resume interrupted scans
- No data loss on crashes
- Track long-running scans
Resume a scan:
npm run start scan --user <username> --resume
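The resume flow can be pictured as a small progress file that records which repositories are done and which are still pending; completed repositories are skipped on restart. The file shape below is an assumption for illustration, not the tool's documented format.
// Illustrative progress-file handling; the shape and path are assumptions.
import { existsSync, readFileSync, writeFileSync } from 'node:fs';

interface Progress { completed: string[]; pending: string[]; }

function loadProgress(path: string): Progress {
  return existsSync(path)
    ? (JSON.parse(readFileSync(path, 'utf8')) as Progress)
    : { completed: [], pending: [] };
}

function saveProgress(path: string, progress: Progress): void {
  writeFileSync(path, JSON.stringify(progress, null, 2)); // written as repos complete
}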
Filtering Options
Available filters:
--skip-forks: Exclude forked repositories
--only-private: Only scan private repos
--only-public: Only scan public repos
--language <language>: Filter by primary language
--min-stars <count>: Minimum star count
Use cases:
Focus on original repos (skip forks)
Analyze only TypeScript projects
Find popular repos (min stars)
Separate private/public analysis
Concurrency Control
Default: 3 repositories processed in parallel
Options:
--concurrency 1: Sequential (safer for rate limits)
--concurrency 3: Default (balanced)
--concurrency 5: Fast (uses more API calls)
When to adjust:
Rate limit issues → use 1
Large orgs → use 5 (if rate limit allows)
Small orgs → default is fine
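Under the hood, concurrency-limited processing looks roughly like a fixed number of workers pulling repositories from a shared queue. The sketch below is a generic pattern, not the tool's code; the worker callback stands in for the per-repository analysis.
// Generic concurrency-limited pool; e.g. runPool(repos, 3, analyzeRepo)
// corresponds to the default --concurrency 3.
async function runPool<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  const lanes = Array.from({ length: limit }, async () => {
    while (next < items.length) {
      const index = next++; // claim the next item (safe: single-threaded event loop)
      results[index] = await worker(items[index]);
    }
  });
  await Promise.all(lanes);
  return results;
}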
---
Access Methods
1. Web Dashboard
After deployment:
URL: https://githubxray.pro
Full-featured web interface
All capabilities available
Token can be saved for convenience
Features accessible:
All scan options
Report viewing
File tree viewer generation
Progress tracking
2. CLI Commands
Available from terminal/SSH:
# Scan repositories
npm run start scan --user <username> [options]
# Generate file tree viewer
npm run start crawl --user <username>
# List repositories
npm run start list --user <username>
# Check token
npm run start check
# Start dashboard (if not deployed)
npm run start dashboard
Both methods are fully capable - choose based on your preference and use case.
---
Constraints & Limits
GitHub API Rate Limits
Authenticated requests:
Limit: 5,000 requests/hour
Reset: Every hour
Impact: Large scans may hit limits
Unauthenticated requests:
Limit: 60 requests/hour
Not recommended for scanning
Rate limit handling:
Tool checks limits before requests
Pauses if limit approached
Shows remaining requests in check command
Strategies:
Use authenticated token (required)
Reduce concurrency for large orgs
Scan during off-peak hours
Use caching to reduce API calls
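A pre-flight rate-limit check is a single call to GitHub's /rate_limit endpoint, which does not count against the limit. The sketch below shows one possible pause-until-reset strategy; the threshold and exact behavior are assumptions, not the tool's internal logic.
// Check remaining quota and wait for the reset if it is nearly exhausted.
async function waitForRateLimit(token: string, minRemaining = 50): Promise<void> {
  const res = await fetch('https://api.github.com/rate_limit', {
    headers: { Authorization: `Bearer ${token}` },
  });
  const { resources } = (await res.json()) as {
    resources: { core: { remaining: number; reset: number } };
  };
  const { remaining, reset } = resources.core;
  if (remaining < minRemaining) {
    const waitMs = reset * 1000 - Date.now(); // reset is a Unix timestamp in seconds
    await new Promise((resolve) => setTimeout(resolve, Math.max(waitMs, 0)));
  }
}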
File Size Limits
Secret detection:
Only checks files < 10KB
Samples first 50 lines
Larger files skipped
Reason: Performance and API efficiency
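In practice the size and line limits above keep secret scanning cheap: files of 10KB or more are skipped entirely, and only the first 50 lines of smaller files are sampled. The patterns in the sketch below are common examples, not the tool's actual rule set.
// Illustrative secret sampling; the regexes are examples only.
const SECRET_PATTERNS = [
  /AKIA[0-9A-Z]{16}/,                        // AWS access key ID
  /-----BEGIN (RSA |EC )?PRIVATE KEY-----/,  // private key material
  /ghp_[A-Za-z0-9]{36}/,                     // GitHub personal access token
];

function mayContainSecrets(content: string, sizeBytes: number): boolean {
  if (sizeBytes >= 10 * 1024) return false;        // files >= 10KB are skipped
  const sample = content.split('\n').slice(0, 50); // only the first 50 lines
  return sample.some((line) => SECRET_PATTERNS.some((p) => p.test(line)));
}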
Repository Size
Very large repositories:
May take longer to analyze
File tree retrieval can be slow
Consider filtering by size if needed
Empty repositories:
Marked as "unknown" type
Score: 0%
Minimal analysis performed
Network & Timeouts
Default timeouts:
API requests: 30 seconds
File downloads: 30 seconds
Long scans:
Progress is saved periodically
Can resume if interrupted
No data loss on network issues
Storage
Generated files:
Reports: ~1-5MB per scan (depends on repo count)
Cache: ~10-50MB (depends on repos analyzed)
Progress: <1MB per active scan
Cleanup:
Old reports can be deleted manually
Cache auto-expires after 1 hour
Progress files deleted after completion
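If reports accumulate, a small cleanup script can prune anything older than your retention window. The helper below is a hypothetical example; the ./reports path and 90-day cutoff are assumptions, so adjust (and back up) before running it.
// Hypothetical cleanup helper; the path and retention period are assumptions.
import { readdirSync, statSync, unlinkSync } from 'node:fs';
import { join } from 'node:path';

function pruneReports(dir = './reports', maxAgeDays = 90): void {
  const cutoff = Date.now() - maxAgeDays * 24 * 60 * 60 * 1000;
  for (const name of readdirSync(dir)) {
    const file = join(dir, name);
    if (statSync(file).mtimeMs < cutoff) unlinkSync(file); // delete stale reports
  }
}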
---
Expected Behavior
During a Scan
What you'll see:
Repository fetching (with pagination)
Filtering message (if filters applied)
Progress indicators: [1/92] Analyzing repo-name...
Results: ✓ Score: 85% | Type: backend | Stack: TypeScript
Report generation
Summary table
Normal behavior:
Some repos may show warnings (empty, branch issues)
Errors for inaccessible repos (continues with others)
Progress saved every few repos
Terminal output shows real-time progress
Dashboard behavior:
Scan runs in background
Terminal shows progress
Dashboard shows scan ID
Active scans section updates
After a Scan
Generated files:
report-<timestamp>.json - Complete data
report-<timestamp>.md - Human-readable
report-<timestamp>.csv - If CSV option enabled
Report locations:
Local: ./reports/ directory
Dashboard: Accessible via web interface
Docker: Persisted in volume
Error Handling
Graceful failures:
Individual repo errors don't stop scan
Error repos marked with 0% score
Error message in risks section
Scan continues with remaining repos
Common errors:
"Branch not found" → Empty or unusual branch structure
"Rate limit exceeded" → Wait or reduce concurrency
"Not found" → Check username/org name
"Permission denied" → Token lacks required scopes
---
Best Practices
1. Token Management
✅ DO:
Use environment variables or saved token file
Use token with repo scope for private repos
Rotate tokens periodically
Use separate token for production
❌ DON'T:
Commit tokens to git
Share tokens publicly
Use tokens with excessive permissions
2. Scanning Large Organizations
✅ DO:
Start with filters (language, stars) to test
Use caching for repeated scans
Run during off-peak hours
Monitor rate limits (npm run start check)
Use lower concurrency (1-2) for very large orgs
❌ DON'T:
Scan 1000+ repos without filters
Use max concurrency on first scan
Ignore rate limit warnings
3. Report Management
✅ DO:
Keep reports for comparison
Export to CSV for analysis
Archive old reports periodically
Use timestamps to track changes
❌ DON'T:
Delete reports immediately (keep for history)
Rely solely on cached data for critical decisions
4. Performance Optimization
✅ DO:
Enable caching for repeated scans
Use appropriate concurrency (3 is good default)
Filter repositories when possible
Clear cache if data seems stale
❌ DON'T:
Disable cache unnecessarily
Use concurrency > 5 (hits rate limits)
Scan everything when you only need a subset
5. Production Deployment
✅ DO:
Use Docker for easy deployment
Set up monitoring (PM2/systemd)
Configure SSL/HTTPS
Regular backups of reports directory
Monitor disk space
❌ DON'T:
Expose dashboard without authentication (if sensitive)
Run without process manager (PM2/systemd)
Ignore security headers
Store tokens in code
6. Troubleshooting Workflow
When scans fail:
Check token: npm run start check
Verify username/org name
Check rate limits
Review error messages in terminal
Try with --no-cache for fresh data
Reduce concurrency to 1
Check logs (PM2/systemd)
When dashboard doesn't work:
Check container is running: docker ps
View logs: docker-compose logs github-xray
Verify Traefik routing
Check DNS resolution
Test localhost: curl http://localhost:3000
---
Troubleshooting
"Rate limit exceeded"
Symptoms: Scan stops, error message
Solutions:
Wait for rate limit reset (check with npm run start check)
Reduce concurrency: --concurrency 1
Use caching to reduce API calls
Scan in smaller batches
"User or organization not found"
Symptoms: Error immediately after starting scan
Solutions:
Verify username/org name spelling
Check if account is private (token needs access)
Verify token has appropriate scopes
Try accessing the account via GitHub web UI
"Analysis failed for repository"
Symptoms: Individual repos show 0% score with error
Solutions:
Repository might be empty
Branch structure unusual
Access permissions issue
This is normal - scan continues with other repos
Dashboard not accessible
Symptoms: 502/503 errors, connection refused
Solutions:
Check container status: docker ps | grep github-xray
View logs: docker-compose logs github-xray
Verify Traefik labels in docker-compose.yml
Check network: docker network inspect kratombans-network
Test container directly: docker exec -it github-xray curl localhost:3000
Slow scans
Symptoms: Scan takes very long
Solutions:
Normal for large orgs (100+ repos)
Enable caching for faster repeats
Use filters to reduce repo count
Increase concurrency (if rate limit allows)
Check network connection
Duplicate entries in reports
Symptoms: Same repo appears multiple times
Solutions:
This was a bug (now fixed)
Rebuild/restart if you see this
Clear cache and rescan
Token not saving
Symptoms: Token prompt every time
Solutions:
Check file permissions: chmod 600 .github-xray-token
Verify file exists: ls -la .github-xray-token
Check disk space
Try saving via dashboard again
---
Operational Recommendations
Daily Operations
Monitor rate limits: Run check command regularly
Review new reports: Check for score changes
Archive old reports: Keep last 10-20 scans
Monitor disk space: Reports and cache can grow
Weekly Operations
Clear old cache: Delete .cache/ if stale
Review scan patterns: Optimize filters if needed
Check for updates: Update dependencies if available
Backup reports: Export important scans
Monthly Operations
Rotate tokens: Generate new GitHub token
Review configuration: Update defaults if needed
Clean up storage: Remove very old reports
Performance review: Analyze scan times
For Large Organizations (200+ repos)
Use filters: Start with language or star filters
Schedule scans: Run during off-peak hours
Incremental scans: Scan subsets, then combine reports
Monitor closely: Watch rate limits and progress
Consider batching: Split into multiple smaller scans
---
Advanced Usage
Automated Scans
Cron job example:
# Daily scan at 2 AM
0 2 * * * cd /path/to/github-xray && npm run start scan --user org-name --include-csv >> /var/log/github-xray.log 2>&1
CI/CD Integration
GitHub Actions example:
- name: Scan repositories
  run: |
    npm install
    npm run build
    npm run start scan --user ${{ github.repository_owner }} --include-csv
Custom Scoring
Edit src/types.ts to modify DEFAULT_CRITERIA_WEIGHTS:
export const DEFAULT_CRITERIA_WEIGHTS: ReadinessCriteria = {
clearPurpose: 15, // Increased from 10
hasReadme: 15, // Increased from 10
// ... adjust as needed
};
Extending Detection
Add new tech stack detection in src/analyzer.ts:
if (path.includes('your-pattern')) {
languages.push('Your Language');
frameworks.push('Your Framework');
}
---
Support & Maintenance
Logs Location
PM2: ./logs/pm2-*.log
Docker: docker-compose logs github-xray
Systemd: journalctl -u github-xray
Updating
Docker:
git pull
docker-compose up -d --build
Direct VM:
git pull
npm install --production
npm run build
pm2 restart github-xray
Backup Strategy
What to backup:
reports/ directory (all generated reports)
.github-xray-token (securely)
Configuration files (if customized)
Backup frequency:
Reports: Weekly (or after important scans)
Token: When rotated
Config: When changed
---
Quick Reference
Common Commands
# Scan all repos
npm run start scan --user <username>
# Scan with filters
npm run start scan --user <username> --language TypeScript --min-stars 10
# Generate file tree viewer
npm run start crawl --user <username>
# Check token
npm run start check
# Start dashboard
npm run start dashboard
# Docker management
docker-compose up -d
docker-compose logs -f github-xray
docker-compose restart github-xray
File Locations
Reports: ./reports/
Cache: ./.cache/
Progress: ./.progress/
Token: ./.github-xray-token
Logs: ./logs/ (PM2) or Docker logs
URLs
Dashboard: https://githubxray.pro
Reports: https://githubxray.pro/reports/report-*.json
Viewers: https://githubxray.pro/reports/repo-viewer-*.html
---
Version Information
Current Version: 1.0.0
Node.js Required: 18.0.0+
TypeScript: 5.3.3+
Last Updated: 2026-01-18
---
Additional Resources
GitHub REST API Docs
GitHub Personal Access Tokens
Traefik Documentation
Docker Documentation
---
For issues or questions, check the troubleshooting section or review the logs.