OrthoRoute/docs/tuning_guide.md
2025-11-04 10:43:36 -08:00

15 KiB
Raw Permalink Blame History

PathFinder Parameter Tuning Guide

This guide explains all the tunable parameters in OrthoRoute's PathFinder algorithm and how to adjust them for different board types and convergence issues.

Quick Reference Table

Parameter Default Range Impact
pres_fac_mult 1.10-1.35 1.05-2.0 How fast overused edges become expensive
pres_fac_max 6.0-16.0 4.0-64.0 Maximum pressure on overused edges
hist_gain 0.15-0.20 0.05-0.35 How fast history accumulates
hist_cost_weight 10.0-16.0 5.0-25.0 Relative importance of history vs present
via_cost 0.5-0.7 0.3-2.0 Penalty for using vias
max_iterations 40-100 20-200 When to give up
BARREL_PHASE_1_ITERS 10-50 0-100 When barrel penalties stop

Core PathFinder Parameters

1. Present Factor Multiplier (pres_fac_mult)

What it does: Controls how aggressively PathFinder escalates costs on overused edges each iteration.

Formula:

pres_fac(iter) = pres_fac_init × (pres_fac_mult)^(iter-1)

Example with mult=1.35:
Iter 1: pres_fac = 1.0
Iter 5: pres_fac = 3.3
Iter 10: pres_fac = 18.9
Iter 15: pres_fac = 107.9  (hits max)

How to tune:

Value Board Type Effect
1.05-1.10 Sparse (ρ < 0.5) Gentle, slow convergence
1.15-1.25 Normal (ρ 0.5-0.8) Balanced
1.30-1.40 Tight (ρ 0.8-1.0) Aggressive, forces rerouting
1.50-2.00 Over-congested (ρ > 1.0) Very aggressive (may oscillate)

When to adjust:

  • Diverging overuse? Increase (try 1.35-1.50)
  • Oscillating? Decrease (try 1.10-1.15)
  • Slow convergence? Increase slightly

2. Present Factor Maximum (pres_fac_max)

What it does: Caps how high pres_fac can grow, preventing infinite costs.

Default: 6.0-16.0 depending on board density

How to tune:

Value Use Case
4.0-6.0 Sparse boards - don't need extreme pressure
8.0-16.0 Tight boards - need strong rerouting signals
32.0-64.0 Desperate - board barely routable

Trade-off:

  • Too low: PathFinder gives up early, accepts overuse
  • Too high: Routes become extreme detours, slow convergence

Rule of thumb: Set to pres_fac_mult^N where N = expected iterations to convergence


3. History Gain (hist_gain)

What it does: Controls how fast PathFinder "remembers" overused edges across iterations.

Formula:

history[edge] += hist_gain × overuse[edge]  (each iteration)
total_cost[edge] = present[edge] + history[edge]

The balance problem:

  • Too HIGH (> 0.25): History dominates, fights present costs → oscillation
  • Too LOW (< 0.10): Present dominates, no learning → repeated mistakes
  • Just right (0.12-0.18): Balanced

How to tune:

Value Effect When to Use
0.05-0.10 Minimal memory Very responsive, but forgets quickly
0.12-0.18 Balanced Most boards
0.20-0.25 Strong memory Dense boards, but risks oscillation
0.30-0.35 Very strong Rarely needed, often causes problems

Diagnostic: Check hist/pres ratio in logs:

hist/pres ratio: 0.994  ← Good (history ~ present)
hist/pres ratio: 0.05   ← BAD: hist_gain too low!
hist/pres ratio: 10.0   ← BAD: hist_gain too high!

Target ratio: 0.5 - 2.0


4. History Cost Weight (hist_cost_weight)

What it does: Multiplies history before adding to total cost.

Formula:

total_cost = present + (hist_cost_weight × history)

Board-adaptive formula:

hist_cost_weight = 7.0                    # Base
                 + (1.0 per 4 layers)     # Layer bonus
                 + (6.0 × (ρ - 0.7))     # Congestion bonus

How to tune:

  • Higher weight (15-25): Emphasizes history, good for dense boards
  • Lower weight (5-10): Emphasizes present, good for sparse boards
  • Default (10-15): Usually fine, auto-tuned based on ρ

Rule: If hist/pres ratio is off, adjust hist_gain first, not weight.


Convergence Tuning

5. Maximum Iterations (max_iterations)

What it does: Stops routing after N iterations even if not converged.

Board-adaptive defaults:

ρ < 0.5:  40 iterations  (converges fast)
ρ 0.5-0.8: 60 iterations  (moderate)
ρ 0.8-1.0: 100 iterations (tight, needs time)
ρ > 1.0:  150+ iterations (may never converge)

How to tune:

  • Check when test board converges (e.g., 75 iterations)
  • Set max to 1.5× that value (safety margin)
  • For production: Lower to save time if partial routing acceptable

6. Stagnation Patience (stagnation_patience)

What it does: Stops early if overuse doesn't improve for N iterations.

Default: 5 iterations

Trade-off:

  • Too low (2-3): May quit before convergence
  • Too high (10+): Wastes time on hopeless boards

Example:

If overuse is: 1000 → 980 → 975 → 977 → 978 → 979
After 5 iterations without significant improvement, stop.

Via Parameters

7. Via Cost (via_cost)

What it does: Base penalty for using a via (layer change).

Default: 0.5-0.7 (half a grid step)

How to tune:

Value Effect
0.3-0.4 Encourage vias (more layer changes)
0.5-0.7 Balanced
0.8-1.5 Discourage vias (prefer planar routing)
2.0+ Strongly avoid vias (may cause congestion)

When to adjust:

  • Too many vias? Increase to 0.8-1.0
  • Routes not using layers? Decrease to 0.4-0.5
  • Via barrel conflicts high? Increase (makes vias less attractive)

Barrel Conflict Parameters

8. Barrel Phase 1 Duration (BARREL_PHASE_1_ITERS)

What it does: How many iterations to apply barrel conflict penalties before disabling them.

Location: unified_pathfinder.py line ~3757

How to tune:

Iterations Board Size Rationale
0 Any Disable barrel penalties entirely
5-10 Large (8k+ nets) Get to Phase 2 fast
20-30 Medium (2k-5k nets) Balanced approach
40-50 Small (< 1k nets) Can afford longer Phase 1

Phase 2 (penalties OFF) allows PathFinder to optimize without barrel constraint fighting convergence.

From today's experience:

  • Test board (512 nets): Phase 1 = 50 iterations worked perfectly
  • Large board (8,192 nets): Phase 1 = 10 iterations better (less divergence)

9. Barrel Conflict Penalty Strength

What it does: How much to penalize edges passing through other nets' via barrels.

Location: unified_pathfinder.py line ~3765

Formula:

conflict_penalty = min(multiplier × pres_fac, cap)

Current settings:

# Phase 1
multiplier = 5.0   # Scale with pres_fac
cap = 50.0         # Maximum penalty

How to tune:

Multiplier Cap Effect
2.0-5.0 30-50 Light (large boards)
10.0 100 Medium (default)
20.0 200+ Aggressive (small boards)

Trade-off:

  • Too strong: Forces divergence (routes avoid barrels at all costs)
  • Too weak: Doesn't prevent barrel conflicts

Diagnostic: If overuse grows in Phase 1, reduce multiplier/cap.


Advanced Tuning

10. Layer Bias Alpha (layer_bias alpha)

What it does: Adjusts costs to balance load across layers dynamically.

Location: _compute_layer_bias() - alpha parameter

Default: 0.88 (strong smoothing)

Range: 0.5-0.95

  • Lower (0.5-0.7): Quickly shifts routes to cooler layers
  • Higher (0.9-0.95): Gradual shifts, prevents whiplash

11. ROI Extraction Threshold

What it does: Distance threshold for using focused ROI vs full-graph routing.

Location: _route_all() line ~4470

Default: 125 steps (50mm @ 0.4mm grid)

ROI_THRESHOLD_STEPS = 125

How to tune:

  • Smaller (50-100): More nets use ROI = faster but may fail long routes
  • Larger (150-200): More nets use full graph = slower but more reliable

Diagnostic Tuning Workflow

Step 1: Check Congestion Ratio

ρ < 1.0 → Board is routable, tune for speed
ρ > 1.0 → Board is impossible, add layers FIRST

Step 2: Run 10-20 Iterations and Check Trend

If overuse is DECREASING: Parameters are good, let it run

If overuse is INCREASING: Too gentle - increase pres_fac_mult to 1.3-1.5

If overuse OSCILLATES (up/down/up/down): ⚠️ Balance issue:

  • Reduce hist_gain (try 0.12-0.15)
  • Reduce barrel penalty if in Phase 1

Step 3: Monitor Iteration Timing

With all GPU optimizations (from today):

Board Size Iter 1 Iter 2+ What's Slow?
512 nets 2 min 20 sec Normal
8,192 nets 100 min 2-3 min Iteration 1 is inherently slow

If iterations 2+ are slow (>10 min):

  • Check if GPU optimizations are active
  • Look for "BARREL-CONFLICT.*minutes" in logs
  • Ensure Python cache is cleared

Parameter Sets for Common Scenarios

Scenario 1: Sparse Board (ρ < 0.5)

Goal: Fast routing

pres_fac_mult = 1.15
pres_fac_max = 6.0
hist_gain = 0.12
max_iterations = 40
BARREL_PHASE_1_ITERS = 30

Scenario 2: Normal Board (ρ 0.5-0.8)

Goal: Reliable convergence

pres_fac_mult = 1.20
pres_fac_max = 10.0
hist_gain = 0.15
max_iterations = 60
BARREL_PHASE_1_ITERS = 30

Scenario 3: Tight Board (ρ 0.8-1.0) ← YOUR BOARD

Goal: Force convergence without divergence

pres_fac_mult = 1.35        # Aggressive!
pres_fac_max = 16.0         # High ceiling
hist_gain = 0.15            # Balanced
max_iterations = 100
BARREL_PHASE_1_ITERS = 10   # Short Phase 1
barrel_penalty_mult = 5.0   # Light penalties
barrel_penalty_cap = 50.0

Scenario 4: Over-Congested (ρ > 1.0)

Goal: Add layers first, then tune

⚠️ STOP! Add layers until ρ < 1.0
No amount of parameter tuning will make impossible boards routable.

Two-Phase Barrel Conflict Strategy

Phase 1: Apply barrel conflict penalties

  • Reduces barrel conflicts quickly
  • May increase overall congestion temporarily
  • Keep short (10-20 iterations for large boards)

Phase 2: Remove penalties, let PathFinder optimize

  • Converges to minimal overuse
  • Barrel conflicts may increase slightly but stay manageable
  • Where actual convergence happens

Example from test board (512 nets):

Phase 1 (iters 1-50):
  Barrel conflicts: 2,611 → 290 (89% reduction!)

Phase 2 (iters 51-75):
  Barrel conflicts: stable at ~300
  Overall overuse: 25,000 → 0 (CONVERGED!)

Example from large board (8,192 nets):

Phase 1 (iters 1-10): Short to avoid divergence
Phase 2 (iters 11+): Where convergence should happen

Advanced: History vs Present Balance

The hist/pres ratio in logs shows the balance:

[HISTORY-DEBUG] Iter 10:
  hist/pres ratio: 0.994  ← GOOD

Target: 0.5 - 2.0

If ratio < 0.1 (history too weak):

  • Increase hist_gain (try +0.05)
  • OR increase hist_cost_weight (try +3.0)

If ratio > 5.0 (history too strong):

  • Decrease hist_gain (try -0.05)
  • OR decrease hist_cost_weight (try -3.0)

If ratio oscillates wildly:

  • History and present are fighting
  • Reduce hist_gain to 0.10-0.12
  • May need to reduce pres_fac_mult slightly

GPU vs CPU Performance Tuning

Memory vs Speed Trade-offs

22 layers on 8,192-net board:

  • 181M edges
  • ~10 GB GPU memory needed
  • If GPU OOM: Reduce layers or use CPU fallback

CPU fallback implications:

  • Routing still works
  • ~2-3× slower
  • Via pooling penalties disabled (minor)

When GPU Memory is Tight

  1. Reduce layers (20 → 18 saves ~2 GB)
  2. Disable via pooling (already done for >100M edges)
  3. Use smaller grid pitch (0.4mm → 0.5mm = fewer nodes)

Common Issues and Fixes

Issue: Overuse Growing Each Iteration

Symptoms:

Iter 1: 2.4M overuse
Iter 5: 4.4M overuse (getting worse!)

Fixes:

  1. Increase pres_fac_mult to 1.35-1.50
  2. Reduce barrel penalty if in Phase 1
  3. Check if ρ > 1.0 (may be impossible to route)

Issue: Oscillation (Overuse Ping-Pongs)

Symptoms:

Iter 10: 5,000
Iter 11: 2,000
Iter 12: 6,000
Iter 13: 1,500
(never settles)

Fixes:

  1. Reduce hist_gain to 0.10-0.12
  2. Reduce pres_fac_mult to 1.15-1.20
  3. Check if in Phase 1 - may need to reach Phase 2

Issue: Slow Iterations (>10 min for iter 2+)

Symptoms:

  • Iteration 2+ taking 30-60 minutes
  • Should be 2-5 minutes with GPU

Fixes:

  1. Clear Python cache: find . -name "*.pyc" -delete
  2. Check GPU optimizations are active (look for "GPU" in logs)
  3. Restart Python (clears memory pools)
  4. Check barrel conflict timing - should be <1 second

Testing Your Changes

After tuning parameters:

  1. Test on small board first (512 nets, 18 layers)

    • Should converge in < 2 hours
    • Validates parameters work
  2. Then try large board (8,192 nets)

    • Expect ~4-8 hours
    • Monitor first 10 iterations
  3. Check convergence trend:

grep "\[ITER.*\] routed" logs/latest.log

Look for decreasing overuse:

Iter 1: 2,451,307
Iter 5: 1,800,000 ✓ (decreasing = good!)
Iter 10: 950,000 ✓

Emergency: Board Won't Converge

If you've tried everything and it still won't converge:

Option 1: Check Physical Constraints

grep "Congestion ratio" logs/latest.log

If ρ > 1.0: Add more layers (not a parameter problem!)

Option 2: Disable Barrel Penalties Entirely

BARREL_PHASE_1_ITERS = 0  # Skip Phase 1 completely

This may leave barrel conflicts but allows other routing to converge.

Option 3: Reduce Grid Pitch

Coarser grid = fewer nodes = easier routing:

grid_pitch: 0.4mm → 0.5mm or 0.6mm

Trade-off: Less routing precision.


File Locations for Parameter Changes

Automatic (board-adaptive):

  • orthoroute/algorithms/manhattan/parameter_derivation.py
    • Lines 70-95: pres_fac_mult by congestion
    • Lines 128-130: hist_gain
    • Lines 120-126: hist_cost_weight

Manual overrides:

  • orthoroute/algorithms/manhattan/unified_pathfinder.py
    • Line 3757: BARREL_PHASE_1_ITERS
    • Line 3765: Barrel penalty formula

Config file (future):

  • Could add to orthoroute.json for per-project tuning

References

  • McMurchie & Ebeling, "PathFinder: A Negotiation-Based Performance-Driven Router for FPGAs" (1995)
  • Betz & Rose, "VPR: A New Packing, Placement and Routing Tool for FPGA Research" (1997)
  • OrthoRoute extends these FPGA concepts to heterogeneous PCB routing

Pro Tip: Document your parameter changes in git commits. It's easy to forget what you tried!

Example commit message:

Tune for 8k-net board convergence

- pres_fac_mult: 1.10 → 1.35 (aggressive rerouting)
- hist_gain: 0.20 → 0.15 (reduce history dominance)
- BARREL_PHASE_1_ITERS: 50 → 10 (faster to Phase 2)

Board: ρ=0.915, 22 layers, 8,192 nets
Expected: Convergence in ~50-75 iterations