research-router-skill · Data Science & AI

Research Router

Use this skill as the default orchestration layer for broad life-sciences research requests.

Do not use it for narrow single-source lookups when a more specific skill already matches the request cleanly.

Turn an open-ended research question into a small, defensible retrieval plan:

The router owns the framing and the final synthesis. It should not dump raw source payloads unless the user explicitly asks for them.

Use this skill when any of the following are true:

the user asks a broad question such as "what is known about ..."
the question could require more than one evidence type
the right source is unclear at the start
the request mixes entities, for example gene plus disease, variant plus phenotype, protein plus ligand, or pathway plus dataset
the user wants a synthesized answer rather than a single database lookup

Start by classifying the request into one or more lanes:

Prefer 1 to 3 lanes. Only expand further if the user explicitly asks for a broad landscape review.

Normalize the key entities before deep retrieval.

Common patterns:

gene or protein: ncbi-clinicaltables-skill, ensembl-skill, uniprot-skill
disease or phenotype: efo-ontology-skill, opentargets-skill
variant: clinvar-variation-skill, ensembl-skill, cohort-specific PheWAS skills
compound or metabolite: chembl-skill, pubchem-pug-skill, chebi-skill, hmdb-skill
pathway or function: reactome-skill, quickgo-skill, string-skill
accession or dataset identifier: ncbi-datasets-skill, biostudies-arrayexpress-skill, pride-skill, metabolights-skill

Do not start broad evidence collection until the important entities are stable enough to route correctly.
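The normalization patterns above can be sketched as a lookup table. This is a minimal illustration, not the router's actual implementation: the skill names are copied from the list, but the dictionary keys and the helpers `normalizers_for` and `entities_are_stable` are hypothetical names introduced here.

```python
# Hypothetical lookup table; only the skill names come from the patterns above.
NORMALIZERS = {
    "gene_or_protein": ["ncbi-clinicaltables-skill", "ensembl-skill", "uniprot-skill"],
    "disease_or_phenotype": ["efo-ontology-skill", "opentargets-skill"],
    # Cohort-specific PheWAS skills are not named in the text, so they are left out.
    "variant": ["clinvar-variation-skill", "ensembl-skill"],
    "compound_or_metabolite": [
        "chembl-skill", "pubchem-pug-skill", "chebi-skill", "hmdb-skill",
    ],
    "pathway_or_function": ["reactome-skill", "quickgo-skill", "string-skill"],
    "accession_or_dataset": [
        "ncbi-datasets-skill", "biostudies-arrayexpress-skill",
        "pride-skill", "metabolights-skill",
    ],
}


def normalizers_for(entity_types):
    """Return candidate normalization skills, deduplicated, in first-seen order."""
    skills = []
    for entity_type in entity_types:
        for skill in NORMALIZERS.get(entity_type, []):
            if skill not in skills:
                skills.append(skill)
    return skills


def entities_are_stable(resolved):
    """True only when at least one entity exists and each has a resolved identifier.

    `resolved` maps the user's wording to a normalized identifier, or None
    when normalization has not succeeded yet (an assumed convention).
    """
    return bool(resolved) and all(resolved.values())
```

A mixed gene-plus-variant request would call `normalizers_for(["gene_or_protein", "variant"])` and hold back broad evidence collection until `entities_are_stable(...)` returns `True`.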

Choose the smallest set of skills that can answer the question well.

Examples:

target or disease evidence review: opentargets-skill, gwas-catalog-skill, gtex-eqtl-skill, human-protein-atlas-skill
variant interpretation: clinvar-variation-skill, gnomad-graphql-skill, ensembl-skill, one or more cohort PheWAS skills
locus-to-gene mapping: locus-to-gene-mapper-skill, or its component genetics skills when the user wants a custom workflow
structure and mechanism: alphafold-skill, rcsb-pdb-skill, uniprot-skill, reactome-skill
chemistry and pharmacology: chembl-skill, bindingdb-skill, pubchem-pug-skill, pharmgkb-skill
clinical and translational: clinicaltrials-skill, cbioportal-skill, civic-skill
literature and dataset discovery: ncbi-entrez-skill, ncbi-pmc-skill, biorxiv-skill, biostudies-arrayexpress-skill, ncbi-datasets-skill

Prefer direct lookups before expensive multi-step chains.
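One way to picture the routing examples together with the 1-to-3-lane preference is a small plan builder. This is a sketch under stated assumptions: the lane keys, `MAX_LANES`, and `build_plan` are hypothetical names, and the function assumes the caller passes lanes in priority order so that truncation keeps the most relevant ones.

```python
# Hypothetical routing table; only the skill names come from the examples above.
# Open-ended entries ("one or more cohort PheWAS skills", component genetics
# skills) are omitted because the text does not name them.
ROUTES = {
    "target_or_disease_evidence": [
        "opentargets-skill", "gwas-catalog-skill",
        "gtex-eqtl-skill", "human-protein-atlas-skill",
    ],
    "variant_interpretation": [
        "clinvar-variation-skill", "gnomad-graphql-skill", "ensembl-skill",
    ],
    "locus_to_gene_mapping": ["locus-to-gene-mapper-skill"],
    "structure_and_mechanism": [
        "alphafold-skill", "rcsb-pdb-skill", "uniprot-skill", "reactome-skill",
    ],
    "chemistry_and_pharmacology": [
        "chembl-skill", "bindingdb-skill", "pubchem-pug-skill", "pharmgkb-skill",
    ],
    "clinical_and_translational": [
        "clinicaltrials-skill", "cbioportal-skill", "civic-skill",
    ],
    "literature_and_dataset_discovery": [
        "ncbi-entrez-skill", "ncbi-pmc-skill", "biorxiv-skill",
        "biostudies-arrayexpress-skill", "ncbi-datasets-skill",
    ],
}

MAX_LANES = 3  # "Prefer 1 to 3 lanes"


def build_plan(lanes, landscape_review=False):
    """Map prioritized lanes to candidate skills, capped unless a landscape review is requested."""
    unknown = [lane for lane in lanes if lane not in ROUTES]
    if unknown:
        raise ValueError(f"no route for lanes: {unknown}")
    if len(lanes) > MAX_LANES and not landscape_review:
        lanes = lanes[:MAX_LANES]  # assumes lanes arrive in priority order
    return {lane: ROUTES[lane] for lane in lanes}
```

The returned lists are candidates, not a mandate to call every skill: the router would still pick the smallest subset that answers the question and try direct lookups first.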

If Codex subagents are available, use them only when the work cleanly decomposes into independent lanes.

Good candidates for subagents:

genetics, expression, structure, chemistry, and clinical evidence can be gathered independently for the same question
multiple loci, variants, genes, compounds, or datasets need parallel comparison
a broad landscape review requires separate evidence summaries before synthesis

Keep these steps with the coordinating agent:

Avoid subagents when:

When delegating, give each subagent a bounded read-only objective such as one evidence family or one comparison unit. Each subagent should return:

The coordinating agent is responsible for reconciling overlaps, contradictions, and evidence gaps.
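The delegation pattern can be illustrated with a small fan-out helper. This is only a sketch: Python threads stand in for subagents, `gather_lanes` is a hypothetical name, and each worker is assumed to be a bounded, read-only callable that returns its own evidence summary in whatever shape the coordinating agent expects.

```python
from concurrent.futures import ThreadPoolExecutor


def gather_lanes(lane_workers, subagents_available):
    """Run one bounded worker per lane and return {lane: summary}.

    Workers run in parallel only when subagents are available and there is
    more than one independent lane; otherwise they run sequentially in the
    coordinating agent. Threads are a stand-in for real subagents.
    """
    if subagents_available and len(lane_workers) > 1:
        with ThreadPoolExecutor(max_workers=len(lane_workers)) as pool:
            futures = {lane: pool.submit(worker) for lane, worker in lane_workers.items()}
            return {lane: future.result() for lane, future in futures.items()}
    return {lane: worker() for lane, worker in lane_workers.items()}
```

Reconciling overlaps, contradictions, and gaps is deliberately left out of the helper, since that step stays with the coordinating agent.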

Return a concise answer structured around the user's question, not around the tools.

Unless the user asks for a different format, include:

If the task is exploratory, explicitly distinguish:

prefer concise source-backed synthesis over large raw dumps
escalate to multi-skill workflows only when the question requires synthesis
state important cohort, ancestry, assay, tissue, and study-design limitations
do not overstate causality from association-only evidence
if a downstream skill can answer the request directly, hand off to it instead of keeping the router in the foreground