📖 GitHub Xray - Complete Owner's Manual
Table of Contents
Overview
Capabilities
Features
Access Methods
Constraints & Limits
Expected Behavior
Best Practices
Troubleshooting
Operational Recommendations
Advanced Usage
Support & Maintenance
Quick Reference
Version Information
Additional Resources
---
Overview
GitHub Xray is a comprehensive repository intelligence and production readiness analysis tool. It scans GitHub repositories (user or organization), analyzes their tech stack, structure, and completeness, then generates detailed reports with production-readiness scores.
What it does:
- Fetches all repositories for a GitHub user or organization
- Analyzes each repository's file structure and contents
- Detects tech stack (languages, frameworks, databases, infrastructure, CI/CD)
- Calculates production-readiness scores (0-100%)
- Generates comprehensive reports (JSON, Markdown, CSV)
- Provides interactive file tree viewers
- Tracks progress and allows resuming interrupted scans
What it doesn't do:
- Modify repositories (read-only)
- Commit changes
- Access private repos without proper token permissions
- Analyze code quality or performance (only structure and completeness)
---
Capabilities
1. Repository Scanning
What it scans:
- All repositories for a user or organization
- Public and private repositories (with appropriate token)
- Forked repositories (optional, can be filtered out)
Scan process:
- Fetches repository list from GitHub API (see the listing sketch after the time estimates below)
- For each repository:
  - Retrieves complete file tree
  - Analyzes file structure and contents
  - Detects tech stack components
  - Checks for best practices (README, tests, CI/CD, etc.)
  - Calculates readiness score
- Generates comprehensive reports
Time estimates:
- Small org (1-10 repos): 1-2 minutes
- Medium org (10-50 repos): 5-15 minutes
- Large org (50-200 repos): 15-60 minutes
- Very large org (200+ repos): 1-3 hours
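The repository listing step above corresponds to GitHub's paginated repository endpoint. The sketch below is a minimal TypeScript illustration using Node 18+'s built-in fetch and the public /users/{username}/repos endpoint (organizations use /orgs/{org}/repos); it is not the tool's internal code.
// Minimal sketch: paginated repository listing via the GitHub REST API.
// Illustrative only; not the tool's actual implementation.
async function listRepos(user: string, token: string): Promise<string[]> {
  const names: string[] = [];
  for (let page = 1; ; page++) {
    const res = await fetch(
      `https://api.github.com/users/${user}/repos?per_page=100&page=${page}`,
      { headers: { Authorization: `Bearer ${token}` } },
    );
    const repos = (await res.json()) as { name: string; fork: boolean }[];
    if (repos.length === 0) break; // last page reached
    names.push(...repos.map((r) => r.name));
  }
  return names;
}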
2. Tech Stack Detection
Languages detected:
- JavaScript, TypeScript
- Python, Go, Rust, Java
- PHP, Ruby, C#
- And more (via file patterns)
Frameworks detected:
- Frontend: React, Next.js, Vue, Angular, Svelte, Remix, Gatsby
- Backend: Express, Django, Flask, FastAPI, Rails, Spring, Laravel
- Full-stack: Next.js, Nuxt.js
Infrastructure detected:
- Docker, Kubernetes
- Terraform, Ansible, Pulumi
- Serverless, Vercel, Netlify
- CloudFormation
Databases detected:
- PostgreSQL, MySQL, MongoDB
- Redis, SQLite, DynamoDB
- Cassandra, Elasticsearch
CI/CD detected:
- GitHub Actions, GitLab CI
- CircleCI, Travis CI, Jenkins
- Azure Pipelines
3. Production Readiness Scoring
Each repository receives a score (0-100%) based on:
| Criteria | Weight | Description |
|---|---|---|
| Clear Purpose | 10% | Repository purpose is evident (description/docs) |
| README | 10% | Documentation exists |
| Lockfiles | 10% | Dependency versions locked |
| Env Templates | 10% | Environment configuration examples |
| Tests | 15% | Test files/frameworks present |
| CI/CD | 10% | Automated pipelines configured |
| Deployment Config | 15% | Docker/deploy configs present |
| No Secrets | 10% | No secrets committed to repo |
| Recent Activity | 10% | Commits within last 90 days |
Score interpretation:
- 80-100%: Production-ready, well-maintained
- 60-79%: Good, minor improvements needed
- 40-59%: Fair, several components missing
- 0-39%: Needs significant work
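As a rough illustration, the score is a weighted sum of boolean checks. The sketch below assumes the weights from the table above; the key names are illustrative and may not match the tool's internal field names (see DEFAULT_CRITERIA_WEIGHTS in src/types.ts for the real ones).
// Minimal scoring sketch using the weights from the table above.
// Key names are illustrative, not the tool's actual criteria fields.
const WEIGHTS: Record<string, number> = {
  clearPurpose: 10, readme: 10, lockfiles: 10, envTemplates: 10,
  tests: 15, cicd: 10, deploymentConfig: 15, noSecrets: 10, recentActivity: 10,
};

function readinessScore(criteria: Record<string, boolean>): number {
  const total = Object.values(WEIGHTS).reduce((a, b) => a + b, 0);
  const earned = Object.entries(WEIGHTS)
    .filter(([key]) => criteria[key])
    .reduce((sum, [, weight]) => sum + weight, 0);
  return Math.round((earned / total) * 100); // 0-100%
}

// Example: README, tests, CI/CD, and recent activity present → 45%
readinessScore({ readme: true, tests: true, cicd: true, recentActivity: true });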
4. Report Generation
Report types:
- JSON: Machine-readable, complete data
- Markdown: Human-readable, formatted report
- CSV: Spreadsheet-compatible (optional)
Report contents:
- Executive summary (average scores, distribution)
- Top 5 production-ready repos
- Bottom 5 repos needing attention
- Complete repository table
- Detailed analysis for each repository
- Tech stack breakdown
- Missing components list
- Risk assessments
5. File Tree Viewers
Interactive HTML viewers that allow:
- Browse all repositories in one interface
- Expand/collapse file trees
- Search repositories
- View repository metadata
- See file sizes and structure
---
Features
Dashboard (Web Interface)
Location: https://githubxray.pro (after deployment)
Features:
- ✅ Start new scans with parameter selection
- ✅ View active scans and progress
- ✅ Browse recent reports
- ✅ Generate file tree viewers
- ✅ Save/load GitHub token
- ✅ Filter repositories (language, stars, forks, etc.)
- ✅ Configure scan options (concurrency, CSV export, cache)
Best for:
- Interactive use
- Visual progress tracking
- Quick parameter adjustments
- Non-technical users
CLI (Command Line)
Command: npm run start
Available commands:
scan - Analyze repositories
crawl - Generate file tree viewer
list - List repositories and root files
check - Verify token and rate limits
dashboard - Start web dashboard
Best for:
- Automation/scripting
- CI/CD integration
- Server deployments
- Batch processing
Caching
What's cached:
- Repository lists (1 hour TTL)
- File trees per repository (1 hour TTL)
Benefits:
- Faster subsequent scans
- Reduced API rate limit usage
- Offline capability for cached data
Disable when:
- You want fresh data
- Repositories have changed significantly
- Debugging cache issues
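Conceptually, each cached entry carries a timestamp and is ignored once it is older than one hour. The sketch below assumes a simple JSON layout with a savedAt field; the tool's actual cache format in ./.cache/ may differ.
// Minimal TTL-check sketch; the on-disk format (savedAt + data) is an assumption.
import { existsSync, readFileSync } from 'node:fs';

const TTL_MS = 60 * 60 * 1000; // 1 hour, matching the TTL described above

function readCache<T>(path: string): T | null {
  if (!existsSync(path)) return null;
  const entry = JSON.parse(readFileSync(path, 'utf8')) as { savedAt: number; data: T };
  return Date.now() - entry.savedAt < TTL_MS ? entry.data : null; // expired → refetch
}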
Progress Persistence
What's saved:
- Scan progress (completed repos, pending repos)
- Analysis results (as they complete)
- Scan configuration
Benefits:
- Resume interrupted scans
- No data loss on crashes
- Track long-running scans
Resume a scan:
npm run start scan --user <username> --resume
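The resume flow can be pictured as a small progress file that records which repositories are done and which are still pending; completed repositories are skipped on restart. The file shape below is an assumption for illustration, not the tool's documented format.
// Illustrative progress-file handling; the shape and path are assumptions.
import { existsSync, readFileSync, writeFileSync } from 'node:fs';

interface Progress { completed: string[]; pending: string[]; }

function loadProgress(path: string): Progress {
  return existsSync(path)
    ? (JSON.parse(readFileSync(path, 'utf8')) as Progress)
    : { completed: [], pending: [] };
}

function saveProgress(path: string, progress: Progress): void {
  writeFileSync(path, JSON.stringify(progress, null, 2)); // written as repos complete
}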
Filtering Options
Available filters:
--skip-forks: Exclude forked repositories
--only-private: Only scan private repos
--only-public: Only scan public repos
--language <language>: Filter by primary language
--min-stars <count>: Minimum star count
Use cases:
Focus on original repos (skip forks)
Analyze only TypeScript projects
Find popular repos (min stars)
Separate private/public analysis
Concurrency Control
Default: 3 repositories processed in parallel
Options:
--concurrency 1: Sequential (safer for rate limits)
--concurrency 3: Default (balanced)
--concurrency 5: Fast (uses more API calls)
When to adjust:
Rate limit issues → use 1
Large orgs → use 5 (if rate limit allows)
Small orgs → default is fine
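Under the hood, concurrency-limited processing looks roughly like a fixed number of workers pulling repositories from a shared queue. The sketch below is a generic pattern, not the tool's code; the worker callback stands in for the per-repository analysis.
// Generic concurrency-limited pool; e.g. runPool(repos, 3, analyzeRepo)
// corresponds to the default --concurrency 3.
async function runPool<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  const lanes = Array.from({ length: limit }, async () => {
    while (next < items.length) {
      const index = next++; // claim the next item (safe: single-threaded event loop)
      results[index] = await worker(items[index]);
    }
  });
  await Promise.all(lanes);
  return results;
}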
---
Access Methods
1. Web Dashboard
After deployment:
URL: https://githubxray.pro
Full-featured web interface
All capabilities available
Token can be saved for convenience
Features accessible:
All scan options
Report viewing
File tree viewer generation
Progress tracking
2. CLI Commands
Available from terminal/SSH:
# Scan repositories
npm run start scan --user <username> [options]
# Generate file tree viewer
npm run start crawl --user <username>
# List repositories
npm run start list --user <username>
# Check token
npm run start check
# Start dashboard (if not deployed)
npm run start dashboard
Both methods are fully capable - choose based on your preference and use case.
---
Constraints & Limits
GitHub API Rate Limits
Authenticated requests:
Limit: 5,000 requests/hour
Reset: Every hour
Impact: Large scans may hit limits
Unauthenticated requests:
Limit: 60 requests/hour
Not recommended for scanning
Rate limit handling:
Tool checks limits before requests
Pauses if limit approached
Shows remaining requests in check command
Strategies:
Use authenticated token (required)
Reduce concurrency for large orgs
Scan during off-peak hours
Use caching to reduce API calls
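A pre-flight rate-limit check is a single call to GitHub's /rate_limit endpoint, which does not count against the limit. The sketch below shows one possible pause-until-reset strategy; the threshold and exact behavior are assumptions, not the tool's internal logic.
// Check remaining quota and wait for the reset if it is nearly exhausted.
async function waitForRateLimit(token: string, minRemaining = 50): Promise<void> {
  const res = await fetch('https://api.github.com/rate_limit', {
    headers: { Authorization: `Bearer ${token}` },
  });
  const { resources } = (await res.json()) as {
    resources: { core: { remaining: number; reset: number } };
  };
  const { remaining, reset } = resources.core;
  if (remaining < minRemaining) {
    const waitMs = reset * 1000 - Date.now(); // reset is a Unix timestamp in seconds
    await new Promise((resolve) => setTimeout(resolve, Math.max(waitMs, 0)));
  }
}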
File Size Limits
Secret detection:
Only checks files < 10KB
Samples first 50 lines
Larger files skipped
Reason: Performance and API efficiency
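In practice the size and line limits above keep secret scanning cheap: files of 10KB or more are skipped entirely, and only the first 50 lines of smaller files are sampled. The patterns in the sketch below are common examples, not the tool's actual rule set.
// Illustrative secret sampling; the regexes are examples only.
const SECRET_PATTERNS = [
  /AKIA[0-9A-Z]{16}/,                        // AWS access key ID
  /-----BEGIN (RSA |EC )?PRIVATE KEY-----/,  // private key material
  /ghp_[A-Za-z0-9]{36}/,                     // GitHub personal access token
];

function mayContainSecrets(content: string, sizeBytes: number): boolean {
  if (sizeBytes >= 10 * 1024) return false;        // files >= 10KB are skipped
  const sample = content.split('\n').slice(0, 50); // only the first 50 lines
  return sample.some((line) => SECRET_PATTERNS.some((p) => p.test(line)));
}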
Repository Size
Very large repositories:
May take longer to analyze
File tree retrieval can be slow
Consider filtering by size if needed
Empty repositories:
Marked as "unknown" type
Score: 0%
Minimal analysis performed
Network & Timeouts
Default timeouts:
API requests: 30 seconds
File downloads: 30 seconds
Long scans:
Progress is saved periodically
Can resume if interrupted
No data loss on network issues
Storage
Generated files:
Reports: ~1-5MB per scan (depends on repo count)
Cache: ~10-50MB (depends on repos analyzed)
Progress: <1MB per active scan
Cleanup:
Old reports can be deleted manually
Cache auto-expires after 1 hour
Progress files deleted after completion
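If reports accumulate, a small cleanup script can prune anything older than your retention window. The helper below is a hypothetical example; the ./reports path and 90-day cutoff are assumptions, so adjust (and back up) before running it.
// Hypothetical cleanup helper; the path and retention period are assumptions.
import { readdirSync, statSync, unlinkSync } from 'node:fs';
import { join } from 'node:path';

function pruneReports(dir = './reports', maxAgeDays = 90): void {
  const cutoff = Date.now() - maxAgeDays * 24 * 60 * 60 * 1000;
  for (const name of readdirSync(dir)) {
    const file = join(dir, name);
    if (statSync(file).mtimeMs < cutoff) unlinkSync(file); // delete stale reports
  }
}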
---
Expected Behavior
During a Scan
What you'll see:
Repository fetching (with pagination)
Filtering message (if filters applied)
Progress indicators: [1/92] Analyzing repo-name...
Results: ✓ Score: 85% | Type: backend | Stack: TypeScript
Report generation
Summary table
Normal behavior:
Some repos may show warnings (empty, branch issues)
Errors for inaccessible repos (continues with others)
Progress saved every few repos
Terminal output shows real-time progress
Dashboard behavior:
Scan runs in background
Terminal shows progress
Dashboard shows scan ID
Active scans section updates
After a Scan
Generated files:
report-<timestamp>.json - Complete data
report-<timestamp>.md - Human-readable
report-<timestamp>.csv - If CSV option enabled
Report locations:
Local: ./reports/ directory
Dashboard: Accessible via web interface
Docker: Persisted in volume
Error Handling
Graceful failures:
Individual repo errors don't stop scan
Error repos marked with 0% score
Error message in risks section
Scan continues with remaining repos
Common errors:
"Branch not found" → Empty or unusual branch structure
"Rate limit exceeded" → Wait or reduce concurrency
"Not found" → Check username/org name
"Permission denied" → Token lacks required scopes
---
Best Practices
1. Token Management
✅ DO:
Use environment variables or saved token file
Use token with repo scope for private repos
Rotate tokens periodically
Use separate token for production
❌ DON'T:
Commit tokens to git
Share tokens publicly
Use tokens with excessive permissions
2. Scanning Large Organizations
✅ DO:
Start with filters (language, stars) to test
Use caching for repeated scans
Run during off-peak hours
Monitor rate limits (npm run start check)
Use lower concurrency (1-2) for very large orgs
❌ DON'T:
Scan 1000+ repos without filters
Use max concurrency on first scan
Ignore rate limit warnings
3. Report Management
✅ DO:
Keep reports for comparison
Export to CSV for analysis
Archive old reports periodically
Use timestamps to track changes
❌ DON'T:
Delete reports immediately (keep for history)
Rely solely on cached data for critical decisions
4. Performance Optimization
✅ DO:
Enable caching for repeated scans
Use appropriate concurrency (3 is good default)
Filter repositories when possible
Clear cache if data seems stale
❌ DON'T:
Disable cache unnecessarily
Use concurrency > 5 (hits rate limits)
Scan everything when you only need a subset
5. Production Deployment
✅ DO:
Use Docker for easy deployment
Set up monitoring (PM2/systemd)
Configure SSL/HTTPS
Regular backups of reports directory
Monitor disk space
❌ DON'T:
Expose dashboard without authentication (if sensitive)
Run without process manager (PM2/systemd)
Ignore security headers
Store tokens in code
6. Troubleshooting Workflow
When scans fail:
Check token: npm run start check
Verify username/org name
Check rate limits
Review error messages in terminal
Try with --no-cache for fresh data
Reduce concurrency to 1
Check logs (PM2/systemd)
When dashboard doesn't work:
Check container is running: docker ps
View logs: docker-compose logs github-xray
Verify Traefik routing
Check DNS resolution
Test localhost: curl http://localhost:3000
---
Troubleshooting
"Rate limit exceeded"
Symptoms: Scan stops, error message
Solutions:
Wait for rate limit reset (check with npm run start check)
Reduce concurrency: --concurrency 1
Use caching to reduce API calls
Scan in smaller batches
"User or organization not found"
Symptoms: Error immediately after starting scan
Solutions:
Verify username/org name spelling
Check if account is private (token needs access)
Verify token has appropriate scopes
Try accessing the account via GitHub web UI
"Analysis failed for repository"
Symptoms: Individual repos show 0% score with error
Solutions:
Repository might be empty
Branch structure unusual
Access permissions issue
This is normal - scan continues with other repos
Dashboard not accessible
Symptoms: 502/503 errors, connection refused
Solutions:
Check container status: docker ps | grep github-xray
View logs: docker-compose logs github-xray
Verify Traefik labels in docker-compose.yml
Check network: docker network inspect kratombans-network
Test container directly: docker exec -it github-xray curl localhost:3000
Slow scans
Symptoms: Scan takes very long
Solutions:
Normal for large orgs (100+ repos)
Enable caching for faster repeats
Use filters to reduce repo count
Increase concurrency (if rate limit allows)
Check network connection
Duplicate entries in reports
Symptoms: Same repo appears multiple times
Solutions:
This was a bug (now fixed)
Rebuild/restart if you see this
Clear cache and rescan
Token not saving
Symptoms: Token prompt every time
Solutions:
Check file permissions: chmod 600 .github-xray-token
Verify file exists: ls -la .github-xray-token
Check disk space
Try saving via dashboard again
---
Operational Recommendations
Daily Operations
Monitor rate limits: Run check command regularly
Review new reports: Check for score changes
Archive old reports: Keep last 10-20 scans
Monitor disk space: Reports and cache can grow
Weekly Operations
Clear old cache: Delete .cache/ if stale
Review scan patterns: Optimize filters if needed
Check for updates: Update dependencies if available
Backup reports: Export important scans
Monthly Operations
Rotate tokens: Generate new GitHub token
Review configuration: Update defaults if needed
Clean up storage: Remove very old reports
Performance review: Analyze scan times
For Large Organizations (200+ repos)
Use filters: Start with language or star filters
Schedule scans: Run during off-peak hours
Incremental scans: Scan subsets, then combine reports
Monitor closely: Watch rate limits and progress
Consider batching: Split into multiple smaller scans
---
Advanced Usage
Automated Scans
Cron job example:
# Daily scan at 2 AM
0 2 * * * cd /path/to/github-xray && npm run start scan --user org-name --include-csv >> /var/log/github-xray.log 2>&1
CI/CD Integration
GitHub Actions example:
- name: Scan repositories
  run: |
    npm install
    npm run build
    npm run start scan --user ${{ github.repository_owner }} --include-csv
Custom Scoring
Edit src/types.ts to modify DEFAULT_CRITERIA_WEIGHTS:
export const DEFAULT_CRITERIA_WEIGHTS: ReadinessCriteria = {
clearPurpose: 15, // Increased from 10
hasReadme: 15, // Increased from 10
// ... adjust as needed
};
Extending Detection
Add new tech stack detection in src/analyzer.ts:
if (path.includes('your-pattern')) {
languages.push('Your Language');
frameworks.push('Your Framework');
}
---
Support & Maintenance
Logs Location
PM2: ./logs/pm2-*.log
Docker: docker-compose logs github-xray
Systemd: journalctl -u github-xray
Updating
Docker:
git pull
docker-compose up -d --build
Direct VM:
git pull
npm install --production
npm run build
pm2 restart github-xray
Backup Strategy
What to backup:
reports/ directory (all generated reports)
.github-xray-token (securely)
Configuration files (if customized)
Backup frequency:
Reports: Weekly (or after important scans)
Token: When rotated
Config: When changed
---
Quick Reference
Common Commands
# Scan all repos
npm run start scan --user <username>
# Scan with filters
npm run start scan --user <username> --language TypeScript --min-stars 10
# Generate file tree viewer
npm run start crawl --user <username>
# Check token
npm run start check
# Start dashboard
npm run start dashboard
# Docker management
docker-compose up -d
docker-compose logs -f github-xray
docker-compose restart github-xray
File Locations
Reports: ./reports/
Cache: ./.cache/
Progress: ./.progress/
Token: ./.github-xray-token
Logs: ./logs/ (PM2) or Docker logs
URLs
Dashboard: https://githubxray.pro
Reports: https://githubxray.pro/reports/report-*.json
Viewers: https://githubxray.pro/reports/repo-viewer-*.html
---
Version Information
Current Version: 1.0.0
Node.js Required: 18.0.0+
TypeScript: 5.3.3+
Last Updated: 2026-01-18
---
Additional Resources
GitHub REST API Docs
GitHub Personal Access Tokens
Traefik Documentation
Docker Documentation
---
For issues or questions, check the troubleshooting section or review the logs.