The Unjournal

Making impactful research more rigorous — and rigorous research more impactful

David Reinstein · Founder & Co-Director, unjournal.org

A talk for university researchers

In one slide

The Unjournal (unjournal.org) — a grant-funded nonprofit that pays experts to publicly evaluate and rate research, and assesses Pivotal Questions for stakeholders.

We aim to make impactful research more rigorous, and academic work more useful.

We support open science, open access & transparency.

We work to improve peer-review — aligning research incentives with truth-seeking and social value.

Focus: economics, policy & quantitative social science with global-impact potential · unjournal.pubpub.org

A bit about me

A typical day in academia — before I left to chase impact.

Warm, peer-to-peer open: an image, not a wall of text. The story to tell here:
- I was a tenured academic economist for ~20 years (PhD economics) before I left to pursue impact.
- The Unjournal grew out of three things colliding: (1) my own (common) gripes about academic peer review; (2) wanting research to have real-world impact, the kind of thing national research-assessment impact cases (e.g. the UK’s REF) at least gesture at; and (3) once I left, exposure to impact-focused funders who genuinely want to use research.
- It partly started in conversations with academic colleagues, including one who said “this sounds like a lot of work.” They were right. So now I’m asking “has this work been useful enough for researchers to engage with and use?”
- Logistics aside: this deck is hosted and open to Hypothes.is comments, so annotate it later if you like. Interrupt me; there’s a menu of ways to engage at the end, and questions I’d love your thoughts on.

1 · The limitations of journal peer-review

An old system, still running the show

Journals do real things — curation, dissemination, community, a trusted signal.

But we already disseminate ourselves — working papers, arXiv, SSRN, dynamic docs — and a 17th-century filter still governs careers and credibility.

So the scarce, valuable part is increasingly the evaluation — the expert judgement — and we throw most of it away.

The biggest cost isn’t fees or paywalls: it’s the game

The average economics paper goes to 3–4 journals before it’s placed.

Reviewer time alone is a back-of-envelope ~$150M/year in economics.

But the biggest cost is authors’ time — reformatting, resubmitting, journal-shopping, strategising instead of just improving the work.

…e.g. “spin it as a hamburgers-economics paper for the American Hamburger Journal”

★ THE PUBLISH-OR-PERISH SLOT MACHINE ★

insert: 1 finished paper · ~6 months / spin

1 · pick an arm (which journal?)

AER · EJ · JDE

2 · pull the lever → wait ~6 months →

✗ reject · ↻ R&R · ✓ accept

PAYOUT: one line on your CV

Careers staked on a noisy, slow spin — each spin ~6 months; placement takes 2–6 years.

“Playing this game diverts us from producing the most credible, useful research.”

“Published — so stop bothering me about it”

Journals take one format: ~30 static pages.

Publication says “done” — so we slice off the next paper.

Little room to improve, correct, or build in place.

Research as a living document

Evaluation needn’t be chained to a 30-page PDF:

Any format — dynamic, interactive, replicable documents.

Improve in place → then ask for further evaluation.

Open evaluation feeds open science & replication.

An interactive specification-curve / “multiverse” document — easy to build now, far more useful than static pages. ↓ what separating evaluation unlocks

Separating evaluation from publishing → a world of benefits

Decouple the evaluation from the 30-page PDF and it becomes a citable, first-class object — DOI, metadata, discoverable — instead of a one-shot stamp in a “PDF prison”.

2 · What The Unjournal is

unjournal.org · info.unjournal.org · unjournal.pubpub.org

We are not a journal; we don’t “publish papers”

A non-profit commissioning open evaluation of publicly-hosted research with potential for global impact.

We commission and pay for expert evaluation — and authors can still publish in a journal too.

Multiple evaluations, structured ratings, and an author response — all public, with DOIs.

Credible, citable peer review — not tied to any journal’s accept/reject.

Funders: Survival & Flourishing Fund, Long-Term Future Fund, EA Infrastructure Fund

What an evaluation gives you

We don’t accept/reject or assign a tier, so we benchmark instead. Two halves, equally important:

The substance

Detailed referee-style reports
The authors’ response
Editorial summary

The structured ratings

Percentile ranking
Journal-tier equivalent (0–5)
Nine criteria, with quantified uncertainty
Claim identification and assessment

All public, citable, and comparable — see the evaluator interface → · ↓ the actual instrument

Moved up front (Exeter lesson: present “what an evaluation is and what the ratings are” early — questions about it kept arriving before the later slides). Lead with the rationale: because we don’t accept/reject or assign a tier, the benchmarked quantified judgement is what gives an evaluation career value and value to research users. The substantive reports are at least as important as the numbers — don’t present them as a “plus.” Caveats to say aloud: these are structured expert judgments, not oracle scores; reference class is serious work in the field over ~3 years; not every evaluator expresses uncertainty the same way — we give some flexibility; the published set is selected, so don’t read it as “all of economics.” Offer to show the live evaluation form. DOWN for a look at the actual instrument.

Inside the evaluation form

What an evaluator actually fills in — every rating elicited with a 90% credible interval, not a point score.

Percentile rating · Methods: justification, reasonableness, validity, robustness

Scale: 0 · 25 · 50 · 75 · 100

Midpoint 72 · 90% CI 66 – 78 (tight — a confident rating; one of nine criteria)

Journal-tier rating (0–5) — elicited as two separate ratings, each with its own 90% CI

“Should” — normative merit → Midpoint 3.8, 90% CI 2.7 – 4.6 (wide — more uncertain)

“Will” — predicted placement → Midpoint 3.2, 90% CI 2.5 – 3.9

Scale: 0 · 1 · 2 · 3 · 4 · 5

0 won’t publish · 1 OK · 2 marginal-B · 3 top-B · 4 marginal-A · 5 top-A (top-5: AER, QJE, Econometrica). Non-integers encouraged; gap between “should” and “will” = the placement lottery.
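The example ratings above can be sketched as a small data structure. This is only an illustration, not the form's actual schema: the `Rating` class, its field names, and the `relative_width` helper are hypothetical, and the numbers are the ones shown on this slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rating:
    """One elicited rating: a midpoint plus a 90% credible interval."""
    midpoint: float
    low: float        # lower bound of the 90% CI
    high: float       # upper bound of the 90% CI
    scale_max: float  # top of the scale (100 for percentiles, 5 for tiers)

    def __post_init__(self):
        # Validate only; a frozen dataclass cannot be modified here.
        if not (0 <= self.low <= self.midpoint <= self.high <= self.scale_max):
            raise ValueError("need 0 <= low <= midpoint <= high <= scale_max")

    @property
    def width(self) -> float:
        """Interval width in the rating's own units."""
        return self.high - self.low

    @property
    def relative_width(self) -> float:
        """Interval width as a share of the full scale."""
        return self.width / self.scale_max


# Values copied from the slide.
methods = Rating(72, 66, 78, scale_max=100)   # percentile rating
should = Rating(3.8, 2.7, 4.6, scale_max=5)   # normative merit
will = Rating(3.2, 2.5, 3.9, scale_max=5)     # predicted placement

gap = round(should.midpoint - will.midpoint, 2)

print(f"methods: width {methods.width} points, {methods.relative_width:.0%} of scale")
print(f"should:  width {should.width:.1f} tiers, {should.relative_width:.0%} of scale")
print(f"will:    width {will.width:.1f} tiers, {will.relative_width:.0%} of scale")
print(f"should minus will: {gap} tiers")
```

Comparing widths as a share of the scale is one way to see why the slide calls the methods interval tight and the "should" interval wide.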

Claim identification & assessment — evaluators pull out the paper’s key claims and rate, for each, the strength of evidence and its implications.

Rebuilt from the live instrument · open the evaluator form →

Why this can work now

Some research users want more than “which journal published it” — and they want it faster:

Funders & research users who need evidence to act — e.g. Coefficient Giving, Survival & Flourishing Fund.

They want credible expert judgment, transparent reasoning, quantified beliefs & uncertainty.

What they’re really after: decision-relevance and value of information.

Many don’t yet know which questions matter most — or what the evidence already says. Helping surface and answer those is part of the job (prioritisation; Pivotal Questions).

And researchers keep their independence: we evaluate and prioritise existing public work — we don’t commission, own, or steer the research itself.

A different demand signal from the journal system.

↓ “but doesn’t this need everyone to move at once?”

Our early funders and partners come from the global-priorities / EA-adjacent world. They’re not mainly asking “is this top-5 material?” — they’re asking “how should this change our beliefs, and what should we do differently?” They want quantified beliefs with uncertainty and explicit reasoning.

Some partners are skeptical and say “academics don’t want to change”; I disagree. Many academics would prefer a different system (if it were credible), but individual researchers and departments hesitate to move first. It’s a coordination failure mistaken for a preference. Outside demand matters because it can pay for a better signal while academia decides how much to trust it. And the demand may grow: the narrative I hear a lot is that AI wealth is likely to expand impact-focused philanthropy (Anthropic IPO, etc.).

Solving the coordination problem

Many academics agree open evaluation would be better, but few can afford to move first alone.

Funding & grantmaker incentives can tip the balance.

We work to be highly visible — so evaluations & ratings are seen before conventional reviewers weigh in.

Building a bridge, not asking you to jump off one: Fear of Standing Out → Fear of Missing Out.

Making it discoverable where it counts

Unjournal evaluations are indexed in Google Scholar — surfacing with the working paper, not years later. search “source:unjournal” →

Speed

One round of public evaluation → a credible output now.

A publicly citable signal after one round.

A traditional journal: 6+ months, R&R at best — then maybe accepted after substantial revisions.

For fast-moving topics, that lag means missing the decision window entirely.

AI capabilities · AI’s impact on labour markets · policy windows

How long does it take?

Target ~2–3 months · prioritisation → published package

Recruit ~2 evaluators — ~1–2 weeks
Evaluations (reports + ratings) — 5+ weeks (~3-week turnaround target each)
Author response — ~2 weeks (longer if revising)
Total target: ~7–10 weeks

Versus a traditional economics journal: ~1–3 years (often 24+ months to acceptance).

Self-reported evaluator effort ≈ 8–32 hours per evaluation. The target above is from our process docs; we track the dates but haven’t yet published a measured median end-to-end.

How it works

Find / receive the research
Prioritise for decision-relevance (as a team)
Recruit an evaluation manager → ~2 paid expert evaluators
Reports + ratings + author response (evaluators may adjust)
Manager synthesis → publish the package, with a DOI

Who evaluates? Domain experts from our 180+ pool (½+ hold doctorates, ~40 professors), matched to each paper; paid, named or anonymous. ▶ 2-min explainer · ↓ full workflow & video · who’s behind it (§3)

Our workflow

Watch the 2-minute explainer


A short narrated walk-through of the Unjournal evaluation process · youtu.be/ZCSeAmzMB50

We prioritise research for impact-potential

Prioritisation is triage, not evaluation

First question: will better evidence here change real decisions?

We do prioritise influential, widely-read work — but we don’t chase the merely clever.

↓ how the triage actually runs

How the triage runs

How the team votes

Every candidate paper gets a team vote on impact-potential — Strong Yes / Weak Yes / Unsure / Weak No / Strong No, with vote counts and an average. This is the actual voting board (Coda).
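A minimal sketch of how vote counts and an average could be computed from the five options above. The numeric scores are an assumption (the slide names the options but not how they are scored), the `tally` function is hypothetical, and the example votes are made up.

```python
from collections import Counter

# Assumed scoring: symmetric around "Unsure". Not taken from the voting board.
SCORES = {
    "Strong Yes": 2,
    "Weak Yes": 1,
    "Unsure": 0,
    "Weak No": -1,
    "Strong No": -2,
}


def tally(votes):
    """Return (count per option, mean score) for one candidate paper."""
    if not votes:
        raise ValueError("no votes cast")
    unknown = [v for v in votes if v not in SCORES]
    if unknown:
        raise ValueError(f"unrecognised votes: {unknown}")
    counts = Counter(votes)
    mean = sum(SCORES[v] for v in votes) / len(votes)
    return counts, mean


# Made-up votes, for illustration only.
example_votes = ["Strong Yes", "Weak Yes", "Weak Yes", "Unsure", "Weak No"]
counts, mean = tally(example_votes)
print(dict(counts))
print(f"average score: {mean:.2f}")
```

Keeping the counts alongside the average matters because the same average can come from broad mild support or from a split team.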

Some considerations

3 · What we’ve done

Where we are now

57 evaluation packages on PubPub

100+ expert evaluations

180+ evaluators (120+ PhDs, ~40 profs)

~$450 avg evaluator payment

1,000+ structured ratings recorded

40+ field specialists

ISSN 3071-2173 · 501(c)(3) · DOIs

Plus a live prioritized-research pipeline, and Pivotal-Questions workshops & belief elicitations underway. · Founded 2022, public since 2023.

Every rating comes with a credible interval

Dots = evaluator medians; bars = stated uncertainty. Published, decision-relevant evaluations, sorted by midpoint. ratings dashboard → · ↓ a bridge to journal tiers

One number hides too much

Each paper is rated on ~8 dimensions — Overall, Open Science, Advances Knowledge, Methods, Logic & Communication, Real-World relevance, Global Relevance — each with uncertainty. A spider plot shows strengths and weaknesses a single tier would flatten. explore the dashboard →

Benchmarking existing signals: a known currency

Predicted vs. merited journal tier (0–5). A translation layer — not an endorsement of placement as the right endpoint.

Overall ratings by research area

Every published evaluation’s overall percentile rating, grouped by research area (✗ marks the area median). dashboard →

What we’ve evaluated — 57 packages by area

Global health & wellbeing: 15
Development & governance: 10
Economics, welfare & policy: 7
Environment & climate: 6
Meta-science & methods: 6
Animal welfare & markets: 5
Catastrophic & long-term risk: 4
AI & emerging tech: 2
Behaviour & attitudes: 2

Published packages (n=57). Health, development, environment & applied micro — the wheelhouse of many economics departments.

A concrete example: an award-winning evaluation

2024–25 Evaluator Prize · 1st Evaluation of “Water Treatment & Child Mortality: a meta-analysis”

“Very influential.” (GiveWell water team, Teryn Mattox; they had been weighing commissioning their own replication.) The eval informs chlorination grantmaking.

“Thorough and thoughtful… extensive write-up and precise recommendations.” (The paper’s authors, who revised the framing in response.)

Read it →

Do authors find it useful?

Across tracked evaluations

19 of 57 tracked evaluations drew an author response (16 formal)
Of 22 closely assessed: 15 a positive signal; ~a third substantively revised
For 8 papers we compared drafts — a median ~22% of changes traced to our feedback (LLM-assisted)
Author survey (n≈8): quality ratings ranged from 30 to 90; one respondent called the evaluation “as good as a standard referee report, or better.”
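The counts above are easier to compare as shares. A minimal sketch using only the numbers on this slide; the `share` helper is hypothetical.

```python
def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    if whole <= 0 or not 0 <= part <= whole:
        raise ValueError("need whole > 0 and 0 <= part <= whole")
    return round(100 * part / whole, 1)


# Counts copied from the slide.
any_response = share(19, 57)     # tracked evaluations with an author response
formal_response = share(16, 57)  # of which formal responses
positive_signal = share(15, 22)  # positive signal among closely assessed papers

print(f"any author response:  {any_response}% of 57")
print(f"formal response:      {formal_response}% of 57")
print(f"positive signal:      {positive_signal}% of 22 closely assessed")
```

Note the different denominators: the response rates are out of all 57 tracked evaluations, while the positive-signal share is out of the 22 that were closely assessed.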

Author responses · author survey

Did authors adapt? All 57, tracked

Each square = 1 of 57 tracked papers, by combined evidence tier. Green = LLM-analysed, shaded by share of major post-evaluation changes attributed to our feedback · blue = manually-confirmed update · orange = mixed / weak signal · grey = not yet assessed. LLM attribution via Claude Opus 4.6 — indicative, human verification ongoing.

The people behind The Unjournal

A management team (7) and advisory board (16) govern the process and standards; field specialist team members (~60) source and prioritize research:

David Reinstein · Founding Director

Anirudh Tagat · Co-Director

Gavin Taylor · Management

Bob Kubinec · Management

Hansika Kapoor · Management

Ryan Briggs · Management

Alexander Herwix · Management

Each paper is evaluated by ~2 domain experts, often matched from our evaluator pool:

180+ evaluators in the pool

½+ are economists

½+ hold doctorates

40+ field specialists · 8 areas

Full team → · Explore the evaluator pool →

↓ the advisory board

Keep the distinction clear: the faces are the seven-person management team who run the process and set standards (David Reinstein, Anirudh Tagat, Gavin Taylor, Bob Kubinec, Hansika Kapoor, Ryan Briggs, Alexander Herwix). Behind them, a 16-person advisory board lends credibility/oversight (next detail slide). The evaluators of any given paper are 2–3 domain experts drawn from the 180+ pool — different people, matched to the paper. Field specialists help prioritise and recruit, and they’re spread across many universities. Full team at unjournal.org/team.

(Open question we’re weighing — raised at the Exeter talk by Ben Z: should some people specialise in the evaluator role itself, rather than purely ad-hoc per-paper recruitment? A standing, trained, partly-specialised evaluator corps could improve calibration, consistency and turnaround. Worth flagging as a direction, not a settled plan.)

The advisory board

Our advisory board — methodologists, forecasters & meta-science researchers across economics, statistics, and policy.

Field-specialist teams (8 areas)

Development economics Anirudh Tagat · Ryan Briggs · Michael Wiebe · Nathan Fiala · Emmanuel Orkoh · Robert Kubinec · Masyhur Hilmy · Wayne Sandholtz · Lee Crawfurd · Yannick Dupraz · Leena Bhattacharya · William Seitz

Global health & well-being Jake Eaton · Rosie Bettle · Charlotte Lane · Shobhit Kulshreshtha · Jonah Goldberg · Valentin Klotzbücher · Priya Lall · Francesco Ramponi · Sarah Reynolds

Economics, welfare & governance David Reinstein · Julian Jamison · Tabaré Capitán · Joel Christoph · Andrei Potlogea · Greg Sasso · Brian Weber · Daniel Horn · Moritz Hennecke · Seth Benzell

Psychology, behavioral science, attitudes Hansika Kapoor · Jonathan Berman · Mattie Toma · Carina Ines Hausladen · Hannah Metzler

Innovation, meta-science, social impact of technology Daniela Cialfi · Jordan Dworkin · Kris Gulati · Andrew Kao · Gavin Taylor · Gary McDowell

Environmental economics Tanya O’Garra · Ben Balmford

Animal welfare (markets, attitudes) Josh Tasoff · Kevin Kuruc · Florian Habermacher · Nicolas Treich · Ash Mader · Brinda Poojary

Catastrophic risks, AI governance & safety David Manheim · Anca Hanea · Alexander Herwix · Tristan Williams

These specialists span many universities and institutions worldwide — a good chance some are already in your department.

4 · Pivotal Questions

The Pivotal Questions project

From single papers → identifying stakeholders’ specific ‘operationalised’ questions that matter:

What would change key decisions — and what research evidence informs it?

What do experts believe now, and how uncertain are they?

Researchers + practitioners + stakeholders — incl. Founders Pledge, Animal Charity Evaluators; participants from Coefficient Giving & more.

Beliefs on our platforms · overview · workshops: cultured-meat · wellbeing

What 10 workshop forecasts looked like

n = 10 expert forecasts from our cultured-meat workshop — medians with 80% credible intervals.

Voices from the workshops

Cultured-meat workshop — Oana Kubinyecz on cell-line cost drivers (top). Wellbeing workshop — Matt Lerner’s DALY ↔︎ life-satisfaction comparison & Michael Plant (HLI) on imperfect-but-usable metrics (bottom).

5 · Would this be useful to you?

Maybe people here are already involved?

Our field specialists and evaluators are spread across many universities — there’s a good chance some are already in your department.

Some are field specialists who help us prioritise and recruit.

Others sit in our 180+ evaluator pool, matched to papers in their area.

And we’ve evaluated work co-authored by university economists like those in this room.

↓ where a department’s strengths might fit

Where a department’s strengths might fit

A department strength	…maps onto Unjournal work
Behavioural & experimental	belief elicitation; the Pivotal-Questions forecasts
Environmental & climate	natural-capital valuation; our climate & animal-welfare evaluations
Health, wellbeing & development	cost-effectiveness, WELLBYs, the RCTs we prioritise
Econometrics & methods	meta-science; calibrating our ratings
Open research & reproducibility	public disciplinary judgement alongside repositories & compliance
Development	the field RCTs we prioritise
AI, technological change & labour	AI’s societal & labour-market impact — a fast-growing Unjournal priority

Capability-based, not status claims: read across to find where this department’s groups overlap with what we do.
- Behavioural & experimental: belief elicitation and policy-relevant experiments map straight onto our Pivotal-Questions forecasts.
- Environmental & climate: evidence-based environmental decision-making is the same demand-side logic our evaluations and PQs serve.
- Open research is often the strongest, lowest-friction fit: the Unjournal isn’t another repository or compliance layer; it’s public disciplinary judgement that complements them. If there’s an open-research agenda, that’s the natural conversation.
- Whoever thinks about how research counts as impact / esteem evidence (e.g. for national research-assessment exercises like the UK’s REF) is the right person to weigh whether a public evaluation can serve as that evidence, but that’s their call, not an ask.
- AI & technological change is a fast-growing Unjournal priority: the speed argument, the flood of AI-generated papers, and AI’s labour-market effects.

Tailor live to whichever groups the department actually has. If this feels too generic on the day, it’s safe to skip.

Ways to engage and adopt this

Join our team or evaluator pool
Suggest research (and pivotal questions)
Use our outputs & data
Bring students in
Recognise better signals

Evaluate — paid

Staff · postdocs · advanced PhDs

Paid (~$450 avg) — for work you partly do already.

Faster and more visible than a report that vanishes into a journal.

Named or anonymous; counts as service; citable with a DOI.

A referee report you’re proud of becomes a public, citable output.

Submit or suggest research

Authors

Submit a working paper → credible public evaluations and ratings.

The journal path stays open — get feedback and a public signal before it resolves.

Or suggest others’ high-impact work — anonymously if you like.

Why request public evaluation?

A public commitment — and a signal.

“I’m willing to have this evaluated openly.”

Requesting open evaluation can itself carry information — strong-but-under-credited work has the most to gain.

Try the model live

Interactive: adjust the prior, the selection effect, and the evaluation’s informativeness. open in a new tab → (if the embed doesn’t load)
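For readers without the embed, here is a rough sketch of the kind of reasoning the three sliders suggest. It is not the interactive model itself: it assumes a plain two-step Bayesian update, and every probability below is a made-up illustration. The function name `bayes_update` and the variable names are hypothetical.

```python
def bayes_update(prior, p_signal_if_strong, p_signal_if_weak):
    """Posterior probability that the work is strong, given an observed signal."""
    numerator = prior * p_signal_if_strong
    denominator = numerator + (1 - prior) * p_signal_if_weak
    if denominator == 0:
        raise ValueError("signal has zero probability under both types")
    return numerator / denominator


# Illustrative inputs only.
prior = 0.30  # belief that the work is strong before seeing anything

# Selection effect: authors of strong work are assumed likelier to request
# open evaluation than authors of weak work.
after_request = bayes_update(prior, p_signal_if_strong=0.60, p_signal_if_weak=0.30)

# Informativeness: a positive evaluation is assumed likelier for strong work.
after_positive_eval = bayes_update(
    after_request, p_signal_if_strong=0.80, p_signal_if_weak=0.20
)

print(f"prior:                     {prior:.2f}")
print(f"after request:             {after_request:.2f}")
print(f"after positive evaluation: {after_positive_eval:.2f}")
```

Under these assumptions the request alone moves beliefs upward, and a positive evaluation moves them further. If the two request probabilities were equal, the request would carry no information.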

Evaluation unlocks credibility — wherever you’re from

A paper from a famous department is trusted on pedigree. Strong work from a less-prominent university can stay locked behind prestige filters — a credible public evaluation is the key that opens it: portable, structured evidence that travels independently of where the author sits.

When is requesting worth it?

Most valuable when your work is strong but under-credited — or sitting just below the bar.

If committing to open evaluation becomes a positive signal, you’ll want in early.

Less of a clear win when the work already clears the bar and it’s a sensitive career moment.

Timing concerns? Talk to us — we can embargo or schedule.

Full “model” (v. preliminary, ~Fable-generated with human feedback): unjournal-reluctance-note.netlify.app

Students & early-career researchers

See what economists, funders and practitioners actually care about — a methodological conversation that sharpens your own work.

Do real peer review, and get feedback on your evaluation from us, and often from the authors.

Gain visibility within a network of funders, grantmakers and impact-minded researchers.

And potential RA / fellowship roles: evaluation, meta-research, Pivotal-Questions support.

Use our outputs

Evaluation packages & prioritisation → a vetted evidence base to build on, teach, cite, and discuss.

Pivotal Questions & workshops → framing for agendas, grants, collaborations, research-impact cases.

Public evaluations → possible research-assessment, grant, or esteem evidence (e.g. the UK’s REF).

The ratings dataset → meta-analysis, and field-experiment collaborations on the evaluation process itself.

Visibility to research users

Funders and nonprofits read these evaluations.

Some use them in grantmaking and methodology.

A route to feedback, uptake, and sometimes collaboration.

A way to put careful work in front of people who actually use evidence.

Recognise better signals

A strong public evaluation speaks to quality and usefulness directly — alongside what the venue signals, not only where it landed.

Multidimensional ratings with uncertainty, expert reports and discussion, an author response, a citable DOI.

For research leaders & managers — encouraging engagement signals a commitment to rigour, transparency and innovation.

And it opens the research-impact channel: our funder and practitioner network, including Pivotal Questions.

6 · Looking ahead

AI makes evaluation more important

A flood of plausible AI-generated papers — some correct and useful, many not.

So more need for efficient, transparent evaluation — connected to real stakeholders and impact.

AI can help: scalable code and data checks.

But the current consensus: keep a human in the loop for the final calls.

Not “does it fit a top-5 template” — but “is it true, and does it matter?”

How does AI evaluation compare to humans?

One exploratory pilot · ~45 papers

A “frontier” (Jan. 2026) LLM vs. our human ratings: only modest rank agreement (r ≈ 0.3).

Human–human agreement still exceeds human–LLM.

On written critiques, LLMs catch ~¾ of human concerns — but ~half their flags aren’t substantive.
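The two critique figures above measure different things, which a small sketch makes explicit. The counts below are invented so that they reproduce the rough ¾ and ½ figures; the function `critique_overlap` and its parameter names are hypothetical, and the pilot's actual matching method is not shown here.

```python
def critique_overlap(human_concerns, caught_by_llm, llm_flags, substantive_flags):
    """Return (coverage, substantive_share) for one set of critiques.

    coverage:          share of human-raised concerns the LLM also raised
    substantive_share: share of all LLM flags judged substantive
    """
    if not 0 <= caught_by_llm <= human_concerns or human_concerns == 0:
        raise ValueError("need 0 <= caught_by_llm <= human_concerns, nonzero total")
    if not 0 <= substantive_flags <= llm_flags or llm_flags == 0:
        raise ValueError("need 0 <= substantive_flags <= llm_flags, nonzero total")
    coverage = caught_by_llm / human_concerns
    substantive_share = substantive_flags / llm_flags
    return coverage, substantive_share


# Invented counts chosen to match the approximate figures on the slide.
coverage, substantive_share = critique_overlap(
    human_concerns=20, caught_by_llm=15, llm_flags=32, substantive_flags=16
)
print(f"human concerns caught by the LLM: {coverage:.0%}")
print(f"LLM flags that are substantive:   {substantive_share:.0%}")
```

High coverage with a low substantive share means a reader still has to sift the LLM's output, which is one reason the slide stops short of calling it a substitute.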

Not yet a substitute — but an open question, and we’re exploring AI prioritisation, research reasoning, and alignment here.

Preliminary methods & results: llm-uj-research-eval.netlify.app/methods

Questions for you

What would make an evaluation count as evidence of quality — reliable, meaningful, valued?
Where would faster public evaluation be most useful?
What would make public evaluation feel safe and valuable for authors?
How could this invigorate teaching and research training?
How could it help build agendas, attract funding, and demonstrate value (e.g. research-assessment exercises like the UK’s REF)?
Which of your department’s strengths connect most naturally?

Thank you

What does open (Unjournal) evaluation provide?

Now: faster, useful feedback + a credible public signal, and useful inputs to practitioners and funders.

Soon: it starts to carry career value.

Eventually: it can replace much or all of what we ask the journal stamp to do.

Which of these would actually help your work?

David Reinstein · contact@unjournal.org · unjournal.org · unjournal.pubpub.org