Evidence Grading Methodology: How We Rate Every Peptide Claim
The trust cornerstone of PeptideVox — how we assign an A-to-D evidence grade to every peptide claim, anchored to GRADE, the Oxford CEBM levels, USPSTF, and Cochrane RoB 2, and why human evidence is never blended with animal, in-vitro, or anecdotal data.
Every efficacy claim on PeptideVox carries a letter grade from A (proven in humans by randomized trials) to D (unproven: anecdote, mechanism, or marketing), anchored to internationally recognized appraisal systems and paired with an inline citation to a real primary source. The single most important discipline is that human evidence is never blended with animal, in-vitro, or anecdotal evidence, and a compound's legal and anti-doping status is graded on a separate axis from its efficacy [3, 5].
The peptide space has a structural honesty problem. The most heavily marketed compounds (BPC-157, TB-500, MOTS-c, epitalon, dihexa, and the bioregulator family) have their entire efficacy story built on animal studies, cell cultures, or single-lab literature, yet they are routinely discussed as though those findings were established human therapies. Meanwhile a smaller set of peptides (semaglutide, teriparatide, bremelanotide) genuinely do have rigorous human trial evidence for specific indications [17, 18]. A reader has no way to tell these apart unless someone does the appraisal work and shows it transparently. This page is how we do that work.
This article is informational and editorial content for research and educational purposes only. It is not medical advice, not a protocol, and not a sourcing guide. Most peptides discussed on this site are not FDA-approved; many are sold as "research chemicals not for human use" and several are prohibited in sport. Consult a licensed clinician before any health decision.
Why grade every claim at all?
The purpose of an evidence grade is to compress "how much should I trust this claim?" into a single, defensible letter, and then to show our work with an inline citation so the reader never has to take our word for it. This mirrors the discipline of the best evidence-appraisal bodies in medicine, which long ago abandoned ungraded expert pronouncements in favor of explicit, reproducible grading of the body of evidence [3, 5].
The peptide field is exactly the kind of subject matter (health, safety, and money) where standards must be highest. Google's Search Quality Rater framework classifies health content as "Your Money or Your Life" (YMYL), where misleading or low-quality content can cause real-world harm, and holds it to a higher bar for Experience, Expertise, Authoritativeness, and Trust (E-E-A-T) [1, 2]. This methodology is how we meet that bar. From a functional and integrative-medicine standpoint we are sympathetic to root-cause, regenerative thinking, and we cover compounds the pharma-default literature often ignores, but that lens governs what we investigate and how we frame it, never the evidentiary bar. A mechanistically elegant, root-cause-friendly hypothesis with only rat data is still a Grade C claim, and we label it as such.
What does each grade actually require?
Our four-tier scheme is deliberately simple for readers, but each tier is defined against established methodology so the grade is reproducible rather than arbitrary. A claim earns Grade A only when supported by human randomized controlled trials and/or meta-analyses or systematic reviews of RCTs for the specific indication and population. This is the top of every recognized hierarchy: in the Oxford CEBM 2011 levels, Level 1 is a systematic review of randomized trials [5], and in GRADE, randomized trials start as high-certainty evidence [4]. It corresponds to a USPSTF Grade A/B recommendation, where there is high certainty that the net benefit is substantial or moderate [6]. Canonical A-grade examples on this site are semaglutide for chronic weight management [17], teriparatide for osteoporotic fracture reduction [18], and bremelanotide for premenopausal hypoactive sexual desire disorder [19].
Grade B is for genuine human evidence below the RCT bar: prospective cohort and observational studies, and small, open-label, single-arm, or early-phase (Phase 1/2) human trials. In GRADE, observational studies start at low certainty and must earn their way up [4]. A Grade B claim says humans have been studied, a signal exists, but the evidence is preliminary and could change with a proper trial. Grade C is the most consequential grade in the peptide field, because it is where most of the popular "healing" and "longevity" compounds sit: the only supporting evidence is animal and/or in-vitro, with no qualifying human efficacy data. BPC-157 is the textbook case: a large, consistent preclinical literature but no completed human RCT, leaving its highest grade at C [21]. Grade D is reserved for claims resting on anecdote, expert opinion, mechanism-only reasoning, or marketing copy with no controlled evidence, the situation a USPSTF "I statement" describes, where evidence is lacking, of poor quality, or conflicting [6]. A D is not a statement that something is false; it is a statement that it is unproven.
| Our grade | GRADE certainty | Oxford CEBM 2011 | USPSTF analogue | Examine.com |
|---|---|---|---|---|
| A | High / Moderate | Level 1-2 (SR of RCTs / RCTs) | A / B (high certainty, substantial-moderate net benefit) | A (multiple consistent studies) |
| B | Low-Moderate | Level 3-4 (cohort, case-series) | C (moderate certainty, small net benefit) | B-C (fewer studies, possible/small effect) |
| C | Very Low (for human use) | Level 5 (mechanism-based) | I statement (insufficient human evidence) | D (very little / inconsistent research) |
| D | Below GRADE (no qualifying study) | Below Level 5 | I statement | D-F (no / contrary evidence) |
These mappings are approximate and directional. GRADE rates the certainty of a body of evidence as High, Moderate, Low, or Very Low [10]; Examine.com grades interventions A-F on a per-outcome basis [9]; and AHRQ similarly grades a body of evidence by study limitations, consistency, directness, and precision [8]. We borrow their logic, not their exact arithmetic.
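The tier definitions in this section can be read as a ceiling rule: the strongest study design available for a specific claim sets the best grade that claim could receive. The sketch below is illustrative only and is not PeptideVox's actual tooling; the names `Design`, `CEILING`, and `grade_ceiling` are hypothetical, and the sketch assumes every listed study already matches the exact indication and population of the claim.

```python
from enum import Enum


class Design(Enum):
    """Study-design categories taken from the four tier definitions."""
    RCT_OR_META_ANALYSIS = "human RCT, or meta-analysis / systematic review of RCTs"
    HUMAN_NON_RCT = "cohort, observational, open-label, single-arm, or Phase 1/2"
    PRECLINICAL = "animal or in-vitro"
    UNCONTROLLED = "anecdote, expert opinion, mechanism-only, or marketing"


# Strongest grade first, so a lower index means stronger evidence.
GRADE_ORDER = ["A", "B", "C", "D"]

# Best grade each design can support on its own (hypothetical lookup table).
CEILING = {
    Design.RCT_OR_META_ANALYSIS: "A",
    Design.HUMAN_NON_RCT: "B",
    Design.PRECLINICAL: "C",
    Design.UNCONTROLLED: "D",
}


def grade_ceiling(designs):
    """Return the best grade the listed evidence could support.

    An empty list means no qualifying evidence, which is Grade D.
    Quantity never lifts the ceiling: many preclinical studies still cap at C.
    """
    if not designs:
        return "D"
    return min((CEILING[d] for d in designs), key=GRADE_ORDER.index)
```

For example, `grade_ceiling([Design.PRECLINICAL] * 50)` still returns `"C"`, which is the point of refusing to blend categories. The result is a ceiling rather than a final grade, because the quality appraisal can hold a grade below it.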
Which sources count, and which do not?
Not all sources are equal, and we rank them explicitly before any claim is graded. We draw on primary literature first (PubMed/MEDLINE-indexed RCTs, Cochrane Library systematic reviews, and meta-analyses in peer-reviewed journals) and treat everything downstream of it as context, never as the basis for an efficacy grade. The tiers run from T1 (primary human RCT/meta-analytic evidence, which can support Grade A) down through cohort and early-phase human data (T2, Grade B), regulatory and official sources such as the FDA and the WADA Prohibited List (which establish legal status, graded separately), pharmacology and reference databases, trial registries such as ClinicalTrials.gov, specialty-society guidance, and finally preclinical, mechanistic, and single-lab literature (which can support Grade C at most and never establishes a human claim) [22, 23].
Three operating rules follow. The cross-check rule: every significant claim is verified against at least two independent sources, preferring primary over secondary. The registry-is-not-evidence rule: a trial appearing on ClinicalTrials.gov proves only that it was registered; until it reports results, it cannot raise a grade, so a registered-but-not-reporting Phase 2 trial leaves an existing Grade C verdict unchanged [23]. The date-check rule: regulatory and legal facts are time-stamped and re-verified against the current year, because they move fast. We do not rely on stale 2023-era status claims for a 2026 page.
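The three rules above are simple enough to express as checks. The following is a minimal sketch under stated assumptions, not a description of any real editorial system: the `Source` and `RegisteredTrial` records and all function names are hypothetical, and "independent" is crudely approximated by distinct publisher names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Source:
    publisher: str     # hypothetical field: who produced the source
    is_primary: bool   # True for primary literature or an official record


@dataclass(frozen=True)
class RegisteredTrial:
    label: str                  # free-text label, not a real registry ID
    has_reported_results: bool


def passes_cross_check(sources):
    """Cross-check rule: at least two independent sources back the claim."""
    return len({s.publisher for s in sources}) >= 2


def preferred_order(sources):
    """List primary sources ahead of secondary ones (stable sort)."""
    return sorted(sources, key=lambda s: not s.is_primary)


def can_trigger_regrade(trial):
    """Registry-is-not-evidence rule: registration alone changes nothing."""
    return trial.has_reported_results


def needs_reverification(last_verified, today):
    """Date-check rule: a status fact verified in an earlier year is stale."""
    return last_verified.year < today.year
```

Under these assumptions, a registered Phase 2 trial with `has_reported_results=False` never triggers a regrade, and a legal-status fact last verified in 2023 is flagged for re-verification on a 2026 page.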
How is quality judged within a grade?
A grade reflects more than study design; it reflects how well the studies were done. For human trials we weigh the same five bias domains the revised Cochrane Risk of Bias 2 tool uses (the randomization process, deviations from intended interventions, missing outcome data, measurement of the outcome, and selection of the reported result), each judged low risk, some concerns, or high risk [7]. A small, unblinded, industry-funded trial with selective outcome reporting does not carry the weight of a large, pre-registered, double-blind RCT, even though both are technically "RCTs."
Following the GRADE approach, we treat RCT evidence as starting high and observational evidence as starting low, then move it based on five factors that lower certainty (risk of bias, inconsistency, indirectness, imprecision, and publication bias) and factors that can raise it, chiefly a large effect size [3, 4]. Imprecision matters acutely here, where many "human" peptide studies enroll only a handful of subjects: an n=2 pilot is reported as a safety signal, not as efficacy. And like GRADE and AHRQ, we grade the body of evidence for a claim, not a single favorable study: one positive small trial against a backdrop of null or conflicting trials does not earn an A, and reliance on a single lab is a caution flag that holds a grade down [8].
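The start-high, start-low logic described above can be sketched as a small function. This is a deliberate simplification and an assumption on our part: it moves one level per flagged concern, whereas GRADE itself relies on judgment and can rate a single serious concern down by more than one level. The names `LEVELS`, `DOWNGRADE_FACTORS`, and `certainty` are hypothetical.

```python
# Certainty levels from weakest to strongest.
LEVELS = ["Very Low", "Low", "Moderate", "High"]

# The five factors that lower certainty, as listed in the text above.
DOWNGRADE_FACTORS = {
    "risk_of_bias",
    "inconsistency",
    "indirectness",
    "imprecision",
    "publication_bias",
}


def certainty(randomized, concerns, large_effect=False):
    """Rate a body of evidence, one level per concern (simplifying assumption).

    randomized   -- True if the evidence comes from randomized trials
    concerns     -- iterable of names drawn from DOWNGRADE_FACTORS
    large_effect -- True if a large effect size justifies rating up one level
    """
    flagged = set(concerns)
    unknown = flagged - DOWNGRADE_FACTORS
    if unknown:
        raise ValueError(f"unknown downgrade factors: {sorted(unknown)}")

    # RCT evidence starts High; observational evidence starts Low.
    level = LEVELS.index("High") if randomized else LEVELS.index("Low")
    level -= len(flagged)
    if large_effect:
        level += 1

    # Clamp to the valid range so the result is always a real level.
    level = max(0, min(level, len(LEVELS) - 1))
    return LEVELS[level]
```

Under this sketch, a randomized body of evidence flagged for both imprecision and risk of bias ends at Low, which is how a handful of tiny, unblinded "RCTs" can fail to support a Grade A claim.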
Where is the bright line between human, preclinical, and anecdotal?
The single most important discipline on this site is refusing to let evidence "level up" as it crosses categories. A result in rats, mice, or a cell line is reported in those exact terms and graded C: we never write "X heals tendons" on the strength of a rodent Achilles study; we write "in a rat Achilles model, X improved healing on functional, biomechanical, and histological measures (Grade C, preclinical)." The reason RCTs sit atop every hierarchy is precisely that animal and mechanistic data systematically over-predict human benefit [5].
Mechanism is a hypothesis, not an outcome: a plausible receptor interaction explains how a compound might work, not that it does work in people, and in the Oxford scheme mechanism-based reasoning is the lowest level of evidence [5]. We also apply an extrapolation ban: an open-label signal in older adults is not evidence for athletes, an intravenous safety pilot is not evidence for subcutaneous efficacy, and a microgram animal dose is not a human dosing recommendation. When the literature only supports a narrow, qualified statement, that is the only statement we make; and when a popular claim has no qualifying evidence, the page says so plainly, because naming the gap is itself part of the grade.
Finally, legal and anti-doping status is graded on a separate axis. We track each compound's federal status from primary FDA sources: the 503A bulk-substance categories, and the live 2026 timeline in which the FDA removed twelve peptides from Category 2 on April 15, 2026 (because nominations were withdrawn, not because the agency found them safe) ahead of a Pharmacy Compounding Advisory Committee review [11, 12, 13, 14]. For athletes, most research peptides fall under WADA category S0 (non-approved substances), prohibited at all times, with GLP-1 agonists moving to full prohibition in 2026 [15, 16].
Bottom line. This methodology exists so that a reader can trust a single letter. Grade A means human randomized-trial evidence stands behind the claim; B means real but preliminary human evidence; C means the science so far is animal- or cell-based only; and D means the claim rests on anecdote, mechanism, or marketing. Human evidence is never silently merged with preclinical or anecdotal evidence, every nontrivial claim carries a verifiable citation, and a compound's legal and anti-doping status is reported separately from its efficacy. We grade the evidence honestly, show our sources, and say so plainly when the evidence is weak or absent — because in a field this heavily marketed and this consequential to health, transparency is the product.
References
| # | Source | Type |
|---|---|---|
| 1 | Google Search Central. "E-A-T gets an extra E for Experience" (E-E-A-T; Trust the most important member). 2022. developers.google.com | Regulatory |
| 2 | Google. "Search Quality Rater Guidelines" (overview PDF; YMYL classification). services.google.com | Regulatory |
| 3 | GRADE Working Group — official site (certainty of evidence: High/Moderate/Low/Very Low). gradeworkinggroup.org | Review |
| 4 | Balshem H, et al. "GRADE guidelines: 3. Rating the quality of evidence." J Clin Epidemiol 2011. jclinepi.com | |
| 5 | OCEBM Levels of Evidence — Centre for Evidence-Based Medicine, University of Oxford (2011). cebm.ox.ac.uk | Review |
| 6 | USPSTF. "Grade Definitions" (A/B/C/D/I; certainty levels). uspreventiveservicestaskforce.org | Regulatory |
| 7 | Cochrane. "About Risk of Bias 2 (RoB 2)" — the five bias domains. cochrane.org | Review |
| 8 | AHRQ. "Grading the Strength of a Body of Evidence" (Methods Guide). effectivehealthcare.ahrq.gov | Review |
| 9 | Examine.com. "How evidence grades are calculated" (A-F, per-outcome). examine.com | Review |
| 10 | CDC ACIP GRADE Handbook, Ch. 7 — criteria determining certainty of evidence. cdc.gov | Regulatory |
| 11 | FDA. "Interim Policy on Compounding Using Bulk Drug Substances Under Section 503A." fda.gov | Regulatory |
| 12 | FDA. "Bulk Drug Substances Used in Compounding Under Section 503A" (categories 1/2/3). fda.gov | Regulatory |
| 13 | Orrick. "FDA Announces Removal of 12 Peptides from Category 2 and Schedules PCAC Meetings." 2026. orrick.com | Regulatory |
| 14 | FDA Law Blog. "FDA's Pep(tide) Rally! What Compounders and Industry Need to Know." 2026. thefdalawblog.com | Regulatory |
| 15 | WADA. "WADA's 2026 Prohibited List now in force" (S0 non-approved substances). wada-ama.org | Regulatory |
| 16 | USADA. "What's New on the 2026 WADA Prohibited List" (GLP-1 prohibition). usada.org | Regulatory |
| 17 | FDA. "FDA Approves First Treatment to Reduce Risk of Serious Heart Problems... Obesity/Overweight" (semaglutide CV outcomes). 2024. fda.gov | Regulatory |
| 18 | AAFP. "Teriparatide (Forteo) for Osteoporosis" (65% vertebral / 53% non-vertebral fracture reduction). 2004. aafp.org | Review |
| 19 | FDA. Vyleesi (bremelanotide) prescribing information, 2019. accessdata.fda.gov | Regulatory |
| 20 | Drugs.com. Vyleesi (bremelanotide) FDA approval history (2019, HSDD). drugs.com | Regulatory |
| 21 | Józwiak M, et al. "Multifunctionality and Possible Medical Application of the BPC 157 Peptide — Literature and Patent Review." Pharmaceuticals 2025 (no completed human RCT). pmc.ncbi.nlm.nih.gov | Review |
| 22 | Cochrane Library — systematic reviews. cochranelibrary.com | Review |
| 23 | ClinicalTrials.gov — trial registry. clinicaltrials.gov | Regulatory |
| 24 | PubMed (NIH/NCBI/MEDLINE) — primary literature index. pubmed.ncbi.nlm.nih.gov | Review |
Frequently Asked
Common questions · evidence-graded answers
What do the A, B, C, and D peptide evidence grades mean?
The grade compresses how strong the evidence is for a specific claim into a single letter. Grade A means the claim is proven in humans by randomized controlled trials and/or meta-analyses of RCTs. Grade B means there is a genuine human signal below the RCT level — cohort, observational, open-label, single-arm, or early-phase trials. Grade C means the only supporting evidence is preclinical (animal or in-vitro), with no qualifying human efficacy data. Grade D means the claim rests on anecdote, expert opinion, mechanism-only reasoning, or marketing, with no controlled evidence at all. The grade always attaches to a particular use, dose, and population — not to the molecule in the abstract.
Why does animal evidence never count as human evidence?
A result in rats, mice, or a cell line describes only what happened in rats, mice, or a cell line. The history of pharmacology is full of compounds that performed beautifully in animals and then failed or harmed in humans, which is precisely why randomized controlled trials sit atop every recognized evidence hierarchy. Animal and mechanistic data systematically over-predict human benefit. So even a large, internally consistent, mechanistically elegant rodent literature — BPC-157 is the textbook case — earns at most a Grade C. We report a rat finding in rat terms and never translate a preclinical result into a human benefit claim.
How does this A-to-D scheme map to GRADE, Oxford CEBM, and USPSTF?
The four tiers borrow the logic, not the exact arithmetic, of the major appraisal systems. Grade A corresponds to GRADE High/Moderate certainty, Oxford CEBM Level 1-2 (systematic reviews of RCTs or RCTs), and a USPSTF A/B recommendation. Grade B maps to lower GRADE certainty and Oxford Levels 3-4 (cohort and case-series). Grade C corresponds to Oxford Level 5 (mechanism-based reasoning) and a USPSTF I statement of insufficient evidence. Grade D falls below GRADE entirely, with no qualifying study. These systems each grade slightly different objects — a body of evidence, a study design, a recommendation — so the mappings are directional rather than a precise conversion.
Why is a peptide's legal status graded separately from its efficacy?
Efficacy (does it work?) and regulatory status (is it legal or approved?) are independent axes, and conflating them misleads readers. A compound can have promising preclinical data and still be illegal to compound and banned in sport; conversely, an approved drug carries an A-grade for its approved indication but may be a D for an off-label claim. We track each compound's actual FDA status from primary sources, time-stamped to the current year — the 2026 503A Category 2 removals and pending PCAC review are a live example — and report anti-doping status against the WADA Prohibited List separately. Because athletes are strictly liable for what is in their bodies, anti-doping status is reported prominently and never folded into an efficacy grade.
What is the inline-citation mandate?
Grading is only half of trust; verifiability is the other half. Every nontrivial claim — every statistic, dose, mechanism, indication, and safety or legal statement — carries an inline, full-URL citation to the primary source, and every document ends with a citations table. We cite only sources we have actually opened and read, never fabricating a study, author, journal, year, or URL. Primary sources (PubMed, ClinicalTrials.gov, FDA, WADA) are cited in preference to blogs or news write-ups, and the citation must support the exact claim it is attached to. If a real source cannot be found for a claim, the claim is omitted or the absence of evidence is stated directly.
Does a low grade mean a peptide has been debunked?
No. A Grade C or D means unproven in humans, not disproven. Many Grade C peptides have legitimate scientific interest and coherent mechanistic rationale; the grade simply marks the honest distance between the current evidence and a human therapeutic claim. Likewise, a grade is not a recommendation — even an A-grade approved drug has contraindications and populations for whom it is inappropriate. From a functional and integrative-medicine standpoint we are sympathetic to root-cause, regenerative thinking and cover compounds the pharma-default literature ignores, but that lens shapes what we investigate and how we frame it; it never lowers the evidentiary bar or manufactures sourced facts.
PeptideVox is an evidence reference, not medical advice. Nothing here authorizes you to acquire, possess, or self-administer any compound.
This content is for informational and educational purposes only · No physician–patient relationship is created · Evidence grades reflect published data as of the stated revision and may change.