Every trail runner has seen this. You finish a hilly long run — 90 minutes, a good climb, a long descent, legs reasonably tired but not destroyed — and you upload the file. Strava tells you it was 280 TSS. TrainingPeaks says 310. Garmin Connect shows 295. You look at the number, knowing perfectly well that this was a normal long run and not a 300-TSS effort, and you wonder if something is broken. The short answer is yes, something is broken — not in your watch, but in the model that every major app uses to convert hilly pace into training load. The issue has been documented, complained about, and quietly ignored for more than a decade, and it continues to produce inflated numbers on every hilly run for every runner using any of the major platforms.
This guide is the practical, evidence-based version of what's wrong with running TSS on hilly terrain. It explains what Normalized Graded Pace actually does, why the grade-adjustment curves systematically over-count downhill running, why the issue doesn't appear on flat runs but dominates hilly ones, what alternative metrics (heart-rate TSS, power TSS from Stryd, subjective effort) do better, and how to read your training log honestly when your mountain long runs are producing TSS numbers 40 percent higher than they should be. The topic is technical but the implications are practical: athletes who trust rTSS on hilly runs consistently misjudge both their fitness and their recovery needs, because the tracking metric is telling them they did more work than they actually did.
How is running TSS actually calculated?
Running TSS (rTSS) was introduced by TrainingPeaks, with the method generally credited to Stephen McGregor, as an adaptation of cycling TSS (which Andrew Coggan had introduced a few years earlier). The formula is structurally the same: TSS = (duration × IF²) × 100, where duration is in hours and IF (Intensity Factor) is the ratio of Normalized Power (or its running equivalent) to threshold power (or its running equivalent). For cycling this works cleanly because power is a direct measurement: you put out X watts, and your power meter reports it accurately whether the road is going up or down.
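As a minimal sketch, the formula above can be written as a small Python function. The function name and the input validation are illustrative choices, not something taken from TrainingPeaks.

```python
def training_stress_score(duration_hours, intensity_factor):
    """TSS = duration (hours) x IF^2 x 100, as given in the text."""
    if duration_hours < 0 or intensity_factor < 0:
        raise ValueError("duration and intensity factor must be non-negative")
    return duration_hours * intensity_factor ** 2 * 100


# One hour exactly at threshold (IF = 1.0) scores 100 by construction.
print(round(training_stress_score(1.0, 1.0), 1))

# Ninety minutes at a comfortable IF of 0.8 scores a little under 100.
print(round(training_stress_score(1.5, 0.8), 1))
```

Because IF enters as a square, the score is far more sensitive to intensity than to duration.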
For running, there is no direct power measurement on most watches and most athletes. The workaround is to use pace as a proxy for power, which works fine on flat ground but breaks down on hills because the same pace on a climb is much harder than the same pace on a descent. The solution is Normalized Graded Pace (NGP), a model that translates uphill and downhill pace into an equivalent flat-ground pace, supposedly producing a single metric that reflects the cardiovascular cost of the effort regardless of terrain.
The math of NGP relies on a grade-adjustment curve that says 'running at pace X at grade Y is metabolically equivalent to running at pace Z on flat ground'. The curve is mostly based on Alberto Minetti's 2002 paper in the Journal of Applied Physiology, which measured the metabolic cost of running on treadmill gradients from -45 percent to +45 percent. Minetti's curve is actually quite good for uphill running — the cost rises steeply with positive grade, and the NGP model captures that reasonably well. The problem is the downhill half of the curve, where Minetti's numbers and the real-world cost of downhill running diverge in important ways.
What does the Minetti curve actually say about downhill running?
Minetti's 2002 study, which remains the most-cited source on the metabolic cost of graded running, found that the oxygen cost of running is lowest on a moderate downhill, at roughly -10 to -20 percent grade, and rises on either side of that minimum: as the slope flattens and turns uphill, and again as the descent gets steeper. That is, running moderately downhill is metabolically cheaper than running on flat ground, and running steeply downhill becomes more expensive again because the body has to brake against gravity with every stride.
This finding is correct in a narrow metabolic sense: oxygen consumption per kilometer of forward progress really does bottom out on a moderate downhill. But it's also misleading if you apply it directly to NGP calculation, for two reasons. First, the cost of downhill running is not just metabolic; it's also mechanical and neuromuscular. Braking forces are high, muscle damage is much greater than on flat ground, and the eccentric muscle loading has recovery consequences that oxygen cost doesn't capture. Second, the relationship between oxygen consumption and training load is not fixed: a 10 percent downhill at 4:00/km produces different cardiovascular demand than a 10 percent downhill at 5:30/km, because at the faster pace gravity is doing more of the propulsion work. Minetti's curve doesn't distinguish these two cases as cleanly as the NGP implementations assume.
The practical consequence is that when NGP calculations apply the Minetti curve to real-world downhill running, they often credit the athlete with a much faster equivalent pace than the actual metabolic cost justifies. A runner descending at 4:00/km on a 10 percent grade might be credited with an NGP of 3:30/km or even faster, which then gets fed into the rTSS formula and produces an inflated intensity factor. On a long descent this compounds: ten minutes of over-credited pace pulls the whole run's NGP faster and multiplies into a high rTSS that doesn't reflect actual training stress.
The underlying issue is that running downhill fast is not the same type of work as running uphill fast, and a single pace-equivalent model can't fully capture the difference. NGP approximates it well enough for moderate terrain and makes systematic errors on steep or sustained descents.
Why does this show up as inflated TSS?
Once NGP over-credits the downhill segments of a run, the inflation compounds through the rTSS formula in two ways. First, the Normalized Graded Pace for the whole run gets faster because the downhill segments are calculated as a very fast flat-ground equivalent. Second, the Intensity Factor (threshold pace divided by NGP when both are expressed in minutes per kilometer, which is the same as NGP speed divided by threshold speed) rises accordingly, and because IF is squared in the TSS formula, any error in IF is amplified in the final TSS number.
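The squaring effect is easy to quantify. The sketch below (the helper name is hypothetical) shows how a relative overstatement of IF turns into a larger relative overstatement of TSS. Duration cancels out of the ratio, so it does not appear.

```python
def tss_overstatement(if_overstatement):
    """Relative TSS error caused by a relative IF error.

    TSS is proportional to IF squared, so an IF that is too high by a
    fraction e makes TSS too high by (1 + e)^2 - 1.
    """
    return (1 + if_overstatement) ** 2 - 1


for error in (0.05, 0.10, 0.20):
    print(f"IF too high by {error:.0%} -> TSS too high by {tss_overstatement(error):.1%}")
```

A 10 percent overstatement of IF becomes roughly a 21 percent overstatement of TSS, and a 20 percent overstatement becomes roughly 44 percent.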
A concrete example illustrates the magnitude of the issue. A runner with a threshold pace of 4:30/km does a 90-minute hilly long run: 45 minutes up at 5:30/km average and 45 minutes down at 4:00/km average. The actual average pace for the whole run works out to about 4:38/km. But NGP applied to the downhill segments might credit the runner with an equivalent flat pace of around 3:50/km for the descents, and the 45-minute climb might be credited with an NGP of around 4:20/km. Blend the two and the overall NGP comes out around 4:05/km, giving an IF of 4:30/4:05 = 1.10. Plug that into the TSS formula: 1.5 hours × 1.10² × 100 = 181 rTSS.
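The arithmetic in this example can be reproduced in a few lines of Python. This is a simplified sketch: it blends the two segment NGP values by time and distance, whereas real NGP implementations normalize over the full pace stream. The helper names are made up for illustration.

```python
def pace_to_minutes(pace):
    """Convert an 'M:SS' per-km pace string into decimal minutes per km."""
    minutes, seconds = pace.split(":")
    return int(minutes) + int(seconds) / 60


def blended_pace(segments):
    """Total minutes divided by total equivalent km.

    segments is a list of (duration_minutes, pace_min_per_km) pairs.
    """
    total_minutes = sum(duration for duration, _ in segments)
    total_km = sum(duration / pace for duration, pace in segments)
    return total_minutes / total_km


threshold = pace_to_minutes("4:30")
ngp = blended_pace([
    (45, pace_to_minutes("4:20")),  # climb, credited NGP from the text
    (45, pace_to_minutes("3:50")),  # descent, credited NGP from the text
])
intensity_factor = threshold / ngp  # pace in min/km, so threshold goes on top
rtss = 1.5 * intensity_factor ** 2 * 100

print(f"NGP {ngp:.2f} min/km, IF {intensity_factor:.2f}, rTSS {rtss:.0f}")
```

Without intermediate rounding the result lands a few points above the 181 quoted in the text, which rounds NGP to 4:05/km and IF to 1.10 along the way.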
The physiological reality of that run is closer to 110 to 130 rTSS. The runner's heart rate probably averaged around threshold for the climb and dropped significantly on the descent (because downhill running is cardiovascularly easier even when it's mechanically harder). The cumulative cardiovascular load was moderate, not extreme. But the rTSS number inflated by 40 to 65 percent because of how the NGP model reads downhill segments.
For longer, steeper runs the inflation gets worse. A mountain half marathon with 800 meters of climbing and 800 meters of descent can easily produce rTSS numbers of 300 to 400 in the major apps, while the honest training stress — the number that would actually inform your recovery planning — is closer to 150 to 200.
Does every app have the same bug, or do some handle hills correctly?
Most major platforms inherit the same underlying issue because they all use some variant of the Minetti curve or a similar metabolic-cost model to calculate grade-adjusted pace. The specific implementations differ slightly but the systematic direction of the error is the same.
- Strava uses Grade-Adjusted Pace (GAP), which is similar to NGP but is used for pace display rather than load calculation. Strava's Fitness & Freshness feature uses a different load model (Relative Effort, formerly Suffer Score, derived from heart rate zones), which does not have the same downhill over-counting issue. Third-party integrations that compute rTSS-like metrics from Strava data do suffer from it.
- TrainingPeaks uses NGP directly for rTSS calculation and is the most-affected platform for serious athletes because TrainingPeaks' load tracking (CTL, ATL, TSB) is the reference framework most coaches use. Hilly runs produce inflated rTSS in TrainingPeaks, and the inflated load carries into the Performance Management Chart.
- Garmin Connect calculates a running Training Load that incorporates pace and grade, and it similarly over-counts hilly runs. Garmin's number is often lower than TrainingPeaks' for the same run, but it still systematically overstates hilly efforts.
- Intervals.icu, Final Surge, and most other analytics platforms use similar formulations and produce similar errors. The issue is not specific to any one vendor — it's structural to pace-based load calculation with grade adjustment.
- Stryd running power offers a different approach entirely. Stryd measures running power directly from a foot pod and calculates load from power, not pace, which sidesteps the NGP issue. Stryd's TSS-equivalent metric (Running Stress Score, or RSS) is more consistent across hilly and flat runs, but Stryd has its own quirks around downhill power measurement that mean it's not a perfect solution either.
The issue is not a bug in any one app's code — it's a limitation of the underlying model. Every platform that calculates training load from pace and grade adjustment will have some version of it. The question is how much the error matters for your training decisions, not which app has 'fixed' it.
What's the better metric for hilly runs?
The most honest training load metric for hilly running is heart-rate-based TSS (hrTSS), which calculates load from time-in-zone rather than from pace. Heart rate reflects the actual cardiovascular cost of the effort regardless of grade — if you're running easy, your heart rate is low whether you're going uphill or downhill, and if you're pushing hard, your heart rate is high regardless of direction. hrTSS is structurally immune to the NGP downhill over-counting issue because it doesn't depend on pace at all.
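A time-in-zone load calculation can be sketched as follows. The per-zone intensity values are illustrative assumptions chosen for this example, not the table used by TrainingPeaks or any other platform, and the function name is hypothetical.

```python
# Hypothetical intensity factor assigned to each heart-rate zone.
# Illustrative assumptions only, not any platform's actual table.
ZONE_INTENSITY = {1: 0.60, 2: 0.75, 3: 0.88, 4: 1.00, 5: 1.10}


def hr_tss(minutes_in_zone):
    """Sum per-zone load: hours in zone x (zone intensity)^2 x 100."""
    total = 0.0
    for zone, minutes in minutes_in_zone.items():
        total += (minutes / 60) * ZONE_INTENSITY[zone] ** 2 * 100
    return total


# 45 minutes climbing near threshold (zone 4), then 45 minutes
# descending with heart rate down in zone 2.
hilly_run = {4: 45, 2: 45}
print(round(hr_tss(hilly_run)))  # 117
```

Grade and pace never enter the calculation, which is why a fast descent with a low heart rate adds little load.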
The catch with hrTSS is that heart rate has its own issues as a load metric. It responds slowly (the heart rate doesn't immediately jump when you start a hard effort), it drifts upward with fatigue and heat even when effort is constant, and it can be distorted by caffeine, dehydration, sleep deprivation, and altitude. hrTSS produces a reasonable average picture of load but it's not perfect on any given run.
For athletes with Stryd power meters, power-based rTSS is probably the most accurate option for hilly training because power captures the actual work output (which is what load should be proportional to) and doesn't depend on the pace-to-effort assumptions that NGP uses. The downside is that Stryd requires additional equipment and adds setup complexity.
The most practical solution for most athletes is to recognize that rTSS is unreliable on hilly runs and to manually discount it when reading their training log. A 300 rTSS mountain run is not 300 rTSS of stress — it's probably 150 to 200, and the athlete should plan recovery based on the honest number rather than the inflated one. This is crude but it matches reality better than trusting the app output.
How much does this matter for training decisions?
For flat or rolling training runs, rTSS is fine and the issue can be ignored. The NGP model works well for modest grade changes and produces roughly correct load numbers. Runners whose weekly training is mostly flat (road runners, track runners, most urban-based athletes) will rarely encounter the problem and can treat rTSS as a reliable metric.
For trail runners, ultra runners, and anyone doing significant training on steep terrain, the issue is significant and cumulative. Over a month of hilly training, rTSS inflation can make a runner think they've accumulated 3,500 TSS when the honest number is closer to 2,500. That 40 percent inflation matters for two reasons: it overstates the athlete's fitness (TSS drives chronic training load in the Performance Management Chart), and it overstates the recovery required (athletes who see a 300 TSS run often plan additional rest that isn't physiologically necessary).
For coaches working with trail runners, the practical fix is usually to switch to hrTSS or Stryd for load tracking and to treat pace-based rTSS as an unreliable secondary signal. Several coaching groups specializing in mountain and ultra running have publicly discussed this issue and made the switch formally, but it hasn't propagated into the broader consumer-facing tools. Amateur trail runners using Strava and TrainingPeaks without knowing about the inflation are the athletes most affected by the issue and least likely to correct for it.
For ultra racing specifically, the issue can distort the athlete's entire training log. A week that includes a 5-hour mountain run might show up as 800+ TSS in TrainingPeaks, which is physiologically implausible and which inflates chronic training load in ways that make the plan look more ambitious than it actually is. Experienced ultra coaches tend to ignore pace-based TSS entirely for these runs and work from hrTSS, time, vertical gain, or subjective effort.
What should you do about it in your own training?
- If you're primarily a road or track runner, keep using rTSS: it's fine for your terrain and the issue rarely affects you.
- If you're a trail runner, ultra runner, or mountain runner, consider switching to hrTSS as your primary load metric. Most analytics platforms let you configure which TSS model is used, and hrTSS will give you a more honest picture of your training load over time.
- If you have a Stryd power meter, use power-based TSS for hilly runs. It's more accurate than pace-based rTSS and handles downhill segments more correctly.
- If you're stuck with rTSS (no heart rate data, no power meter, no option to change the metric), apply a mental discount to hilly runs. For a run with significant descent, assume the displayed rTSS is 30 to 50 percent higher than the real training stress and plan recovery accordingly.
- Do not use rTSS to compare hilly and flat runs directly. A 120 TSS flat run and a 280 TSS mountain run are not 'more than double the stress' — they are probably comparable in real physiological cost. The numbers are not on the same scale.
- When reviewing your training log after a block of mostly hilly running, do not conclude that your fitness has exploded because your chronic training load shot up. Some of that increase is real; much of it is rTSS inflation on hilly runs. Cross-check with race results, time trials, or heart rate data before drawing conclusions about fitness changes.
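The mental discount in the list above can be made explicit. The sketch below assumes the displayed rTSS is 30 to 50 percent higher than the real stress, as suggested for runs with significant descent. The function name and default values are illustrative.

```python
def discounted_rtss(displayed_rtss, inflation_low=0.30, inflation_high=0.50):
    """Return a (low, high) estimate of real training stress.

    If the displayed number is x percent HIGHER than the real one, the
    real value is displayed / (1 + x), not displayed * (1 - x).
    """
    low = displayed_rtss / (1 + inflation_high)
    high = displayed_rtss / (1 + inflation_low)
    return low, high


low, high = discounted_rtss(300)
print(f"Displayed 300 rTSS -> plan recovery for roughly {low:.0f} to {high:.0f}")
```

Dividing rather than subtracting a percentage matters: taking 50 percent off 300 would give 150, while a number that is 50 percent too high corresponds to a real value of 200.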
Is anyone working on a better model?
Yes, though the fixes are slow to reach consumer platforms. The most promising alternatives to NGP involve direct power measurement (Stryd), combined pace-HR models that weight heart rate more heavily on steep terrain, and simpler vertical-gain-based adjustments that add a fixed load per meter of climbing without applying the Minetti curve to descents. Some academic research has proposed updated grade-adjustment curves that better handle steep downhills, but these have not been integrated into Strava, TrainingPeaks, or Garmin Connect as of the current versions of those platforms.
Stryd has probably made the most practical progress — running power measurement works pretty well for hilly terrain because it captures the actual work rate directly, and the Stryd versions of TSS-equivalent metrics are more consistent across flat and hilly runs than pace-based versions. The downside is that Stryd requires buying additional hardware and integrating it into your training stack, which is a barrier for casual users.
Critical-power-based running load metrics (using concepts from the Monod-Scherrer and Morton models of critical power, adapted to running) are another direction. These metrics avoid pace entirely and model load from power output and critical power thresholds. They're mostly used in academic settings and by elite coaches, not in consumer-facing tools, but they represent a more physiologically defensible framework than NGP-based rTSS for hilly terrain.
For now, the practical reality is that rTSS remains the default load metric in every major running app, and athletes training in significant terrain need to understand its limitations and correct for them rather than wait for the platforms to fix the issue.
Key takeaways
- Running TSS (rTSS) is calculated from Normalized Graded Pace (NGP), which uses Alberto Minetti's 2002 grade-adjustment curves to translate hill pace into flat-ground equivalent.
- The Minetti curve handles uphill running correctly but systematically over-counts the cardiovascular load of downhill running because it doesn't properly account for gravitational assistance, and it cannot represent the mechanical cost of braking at all.
- Strava, TrainingPeaks, Garmin Connect, Intervals.icu, Final Surge, and other major platforms all inherit this issue because they use similar grade-adjustment models.
- On hilly runs with significant descent, rTSS can be inflated by 30 to 100 percent compared to the actual physiological training stress.
- Heart-rate-based TSS (hrTSS) is more honest for hilly terrain because it reflects actual cardiovascular cost regardless of grade — though heart rate has its own drift and latency issues.
- Stryd running power is probably the most accurate load metric for hilly runs because power captures actual work output directly, but it requires additional hardware.
- The practical rule for trail and ultra runners is to distrust rTSS on hilly runs, switch to hrTSS where possible, and mentally discount inflated TSS numbers when planning recovery.
- The issue is structural to pace-based load calculation with grade adjustment, not a bug in any specific app. Fixing it requires either different measurement (power) or different metrics (hrTSS, vertical-gain-based).
Frequently asked questions
Why does my hilly run show such a high TSS in Strava or TrainingPeaks?
Because Normalized Graded Pace over-counts downhill segments as much faster equivalent flat pace, which then inflates the intensity factor and the TSS calculation. The Minetti-curve-based grade adjustment used by most major apps captures uphill running accurately but systematically over-credits downhill running because it doesn't properly account for gravitational assistance. On a hilly run with significant descent, rTSS can be 30 to 100 percent higher than the actual physiological training stress. The number isn't lying to you exactly — it's a structural artifact of the model.
Should I use hrTSS instead of rTSS for my training log?
For trail, ultra, or mountain running, yes. Heart-rate-based TSS (hrTSS) reflects the actual cardiovascular cost of the effort regardless of terrain, which sidesteps the NGP over-counting issue. hrTSS isn't perfect — heart rate drifts with fatigue and heat, and it responds slowly to effort changes — but it's much more reliable than rTSS on hilly terrain. Most analytics platforms let you set which model is used as the default load metric. For flat road running, either metric works fine, and most runners can stick with rTSS.
Does this issue affect cycling TSS the same way?
No. Cycling TSS is calculated from actual power meter data, not from pace with grade adjustment. Your power meter reports 250 watts whether you're climbing or descending, so the Normalized Power and TSS calculations don't need a grade-adjustment curve. Cycling TSS is physiologically honest across all terrain. The hill issue is specific to running because running TSS has to infer effort from pace via a grade-adjustment model, and the model has systematic errors on descents.
Is Stryd's running power more accurate for hilly runs?
Yes, generally. Stryd measures running power directly from a foot pod and calculates load from power rather than from pace, which sidesteps the NGP downhill over-counting issue. Stryd's TSS-equivalent metric is more consistent across flat and hilly runs and produces numbers that better reflect the actual training stress. Stryd has its own quirks — the power measurement on very steep descents can still be noisy, and calibration matters — but for most hilly training Stryd is meaningfully more accurate than pace-based rTSS.
Can I manually correct the TSS on a hilly run?
Sort of. If you know your normal flat-terrain rTSS for a run of similar duration and perceived effort, you can mentally discount the hilly-run rTSS to match. A 300 TSS mountain run that should probably be 150 to 200 TSS can be manually logged as such in most platforms, or you can just treat the displayed number as unreliable and plan recovery based on your subjective sense and heart rate data. Some platforms allow manual TSS override on individual workouts, which is a reasonable workaround for athletes who want clean chronic training load numbers.
Does the issue affect my CTL and form chart in TrainingPeaks?
Yes, and significantly. Chronic Training Load (CTL) in the Performance Management Chart is calculated as an exponentially weighted average of daily TSS with a 42-day time constant. If hilly runs are inflating daily TSS by 30 to 100 percent, your CTL is inflated too, and Acute Training Load (ATL) and Training Stress Balance (TSB), which are built from the same daily TSS, inherit the error. Athletes training primarily on hilly terrain see CTL numbers that look high but don't correspond to their actual fitness, and race pacing decisions based on those numbers can be miscalibrated. The solution is to switch to hrTSS or Stryd for load tracking, which gives a more honest CTL for mountain training.
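To see how the error carries through, here is a minimal sketch of CTL. It assumes the commonly described form, an exponentially weighted average of daily TSS with a 42-day time constant; a given platform's exact implementation may differ. The daily TSS values are invented for illustration.

```python
def chronic_training_load(daily_tss, time_constant_days=42, start=0.0):
    """Exponentially weighted average of daily TSS (assumed CTL form)."""
    ctl = start
    history = []
    for tss in daily_tss:
        # Move a fraction 1/time_constant of the way toward today's TSS.
        ctl += (tss - ctl) / time_constant_days
        history.append(ctl)
    return history


days = 84
honest = chronic_training_load([60.0] * days)          # hypothetical honest load
inflated = chronic_training_load([60.0 * 1.4] * days)  # same days, 40% inflated

print(f"Honest CTL {honest[-1]:.1f}, inflated CTL {inflated[-1]:.1f}")
print(f"CTL inflation {inflated[-1] / honest[-1] - 1:.0%}")
```

The update is linear in daily TSS, so a uniform 40 percent inflation of the inputs shows up as a 40 percent inflation of CTL. Nothing in the averaging washes it out.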
How CoreRise tracks training load honestly
CoreRise is aware of the rTSS inflation issue on hilly runs and handles it in two ways. First, when heart rate data is available, the coach cross-references pace-based load against heart-rate-based load and flags cases where the two diverge significantly — which usually corresponds to hilly terrain and NGP over-counting. Second, for athletes who train primarily on trails or in mountains, the coach can be configured to use hrTSS as the primary load metric so that the Performance Management Chart reflects actual physiological training stress rather than inflated pace-based numbers.
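The general idea of cross-checking the two load numbers can be sketched in a few lines. This is a hypothetical illustration, not CoreRise's actual implementation: the function name and the 25 percent threshold are assumptions made for the example.

```python
def load_divergence(pace_based_load, hr_based_load, threshold=0.25):
    """Compare pace-based and HR-based load for one run.

    Returns (relative_excess, flagged). The 25% threshold is an
    arbitrary illustrative choice, not a documented product setting.
    """
    if hr_based_load <= 0:
        raise ValueError("heart-rate-based load must be positive")
    relative_excess = pace_based_load / hr_based_load - 1
    return relative_excess, relative_excess > threshold


# Hilly run: pace-based load far above the heart-rate-based figure.
excess, flagged = load_divergence(181, 117)
print(f"Pace-based load is {excess:.0%} above HR-based load; flagged: {flagged}")

# Flat run: the two numbers roughly agree, so nothing is flagged.
print(load_divergence(95, 90)[1])
```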
Cora can also help you interpret your training log when the numbers don't match your subjective feel. If you're seeing a 300 TSS mountain run in your log and it feels closer to 180 of training stress, the coach can explain why the number is inflated, what the honest number probably is, and how it should inform your recovery and next sessions. The goal is a training log that matches your physiological reality, not one that's distorted by a structural limitation in how running load is calculated.
- Running load is calculated with awareness of the rTSS inflation issue on hilly terrain.
- Heart-rate-based TSS is preferred as the primary load metric for athletes training on hilly or mountain terrain.
- The coach cross-checks pace-based and HR-based load and flags meaningful divergences.
- Chronic training load (CTL) reflects honest physiological stress rather than inflated pace-based numbers.
- The coach can explain discrepancies between app-reported TSS and your subjective training feel.