Configuration Guide
Autonima uses a YAML config file for the pipeline. The canonical example lives at examples/sample_config.yml and is also emitted by autonima create-sample-config.
Minimal Working Config
This is the smallest practical config that passes current validation:
search:
  database: "pubmed"
  query: "schizophrenia AND working memory AND fMRI"
  max_results: 100
retrieval:
  sources:
    - pubget
  load_excluded: false
screening:
  abstract:
    model: "gpt-5-mini-2025-08-07"
    objective: "Identify fMRI studies of working memory in schizophrenia"
    inclusion_criteria:
      - Human participants
      - fMRI neuroimaging
  fulltext:
    model: "gpt-5-mini-2025-08-07"
    objective: "Identify fMRI studies of working memory in schizophrenia"
    inclusion_criteria:
      - Human participants
      - fMRI neuroimaging
parsing:
  parse_coordinates: false
  coordinate_model: "gpt-4o-mini"
output:
  directory: "results"
annotation:
  enabled: false
Validation Rules to Know
- search.query must be non-empty unless you provide pmids_file or pmids_list.
- screening.abstract must define an objective and inclusion_criteria, or set skip_stage: true.
- screening.fulltext must define an objective and inclusion_criteria, or set skip_stage: true.
- output.directory must be non-empty.
- In retrieval.full_text_sources, coordinates_path_templates and processed_data_path are mutually exclusive for a given source.
Top-Level Sections
search
Purpose: define the search source and the studies to start from.
Common fields:
- database
  Type: string
  Values: pubmed, pmc
  Default: pubmed
- query
  Type: string
  Required when not using PMIDs input
- max_results
  Type: integer
  Must be positive
- date_from, date_to
  Type: string, YYYY/MM/DD
- email
  Type: string
  Recommended for NCBI API usage
- pmids_file
  Type: path string
- pmids_list
  Type: list of PMID strings
Use pmids_file or pmids_list when you already know the study IDs and want to skip query-based discovery.
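As a rough illustration, a search section that starts from known PMIDs might look like the sketch below. The PMIDs and the email address are placeholders, not real studies or contacts. The query field is left out on the assumption that the validation rule listed earlier (a non-empty query is required only when no PMID input is given) applies here.

```yaml
search:
  database: "pubmed"
  # Placeholder IDs for illustration only; replace with your own PMIDs.
  pmids_list:
    - "00000001"
    - "00000002"
  # Placeholder address; an email is recommended for NCBI API usage.
  email: "you@example.org"
```

If the IDs live in a file instead, pmids_file takes a path string in place of pmids_list. The expected file layout is not described in this guide, so check examples/sample_config.yml before relying on it.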
retrieval
Purpose: control full-text fetching and external source mapping.
Common fields:
- sources
  Type: list of strings
  Example: ["pubget"]
- load_excluded
  Type: boolean
  If true, retrieval also includes studies excluded at abstract screening
- full_text_sources
  Type: list of source mappings
  Used when you already have a local full-text corpus or custom layout
retrieval.full_text_sources
Each source can map PMIDs to local text files using one of three modes:
- pmid_source: "file_name"
- pmid_source: "folder_name"
- pmid_source: "json"
Common fields:
- root_path: root directory for that source
- text_path_templates: preferred relative text-file paths for folder/json modes
- allowed_extensions: valid file extensions for file_name mode
- json_filename: metadata filename for json mode
- json_pmid_key: key holding the PMID in the JSON metadata
- processed_data_path: path to pubget-like processed coordinate/table CSVs
- coordinates_path_templates: direct coordinate-file templates when not using processed CSVs
Important interaction:
- Use either processed_data_path or coordinates_path_templates for a source, not both.
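A minimal sketch of one source mapping in file_name mode is shown here. It is a fragment, not a complete retrieval section. Every path is a placeholder, and the value formats (for example, whether extensions take a leading dot, or whether processed_data_path is resolved relative to root_path) are assumptions that should be checked against examples/sample_config.yml.

```yaml
retrieval:
  full_text_sources:
    # Hypothetical local corpus; every path below is a placeholder.
    - root_path: "/data/my_corpus"
      pmid_source: "file_name"
      # Assumed format; the leading dot is not confirmed by this guide.
      allowed_extensions:
        - ".txt"
      # Set processed_data_path OR coordinates_path_templates, never both.
      processed_data_path: "/data/my_corpus/processed"
```

A source that sets coordinates_path_templates would drop processed_data_path entirely, since validation treats the two as mutually exclusive for a given source.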
screening.abstract
Purpose: define the first screening pass over abstracts.
Common fields:
- model
  Type: string
- objective
  Type: string
  Required unless skip_stage: true
- inclusion_criteria
  Type: list of strings
  Required when the stage is enabled
- exclusion_criteria
  Type: list of strings
- confidence_reporting
  Type: boolean
  Default: false
- threshold
  Type: float between 0.0 and 1.0
  Only meaningful with confidence_reporting: true
- additional_instructions
  Type: string
- skip_stage
  Type: boolean
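To show how the optional fields combine, here is a sketch of an abstract stage with confidence reporting turned on. The model, objective, and inclusion criteria are copied from the minimal config; the exclusion criterion and the threshold value are illustrative assumptions, not recommended settings.

```yaml
screening:
  abstract:
    model: "gpt-5-mini-2025-08-07"
    objective: "Identify fMRI studies of working memory in schizophrenia"
    inclusion_criteria:
      - Human participants
      - fMRI neuroimaging
    # Illustrative exclusion criterion; not taken from the sample config.
    exclusion_criteria:
      - Review articles
    # threshold is only meaningful when confidence_reporting is true.
    confidence_reporting: true
    # Illustrative value; must lie between 0.0 and 1.0.
    threshold: 0.8
```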
screening.fulltext
Purpose: define the second screening pass over retrieved full text.
Current behavior:
- This stage does not currently inherit required fields from screening.abstract.
- If it is enabled, it must define its own objective and inclusion_criteria.
- If you do not want to run full-text screening, set skip_stage: true.
Its fields have the same shape as those of screening.abstract.
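For instance, a config that skips full-text screening could reduce the stage to the fragment below. This assumes, based on the validation rules, that objective and inclusion_criteria may be omitted once skip_stage is true; whether model can also be omitted is not stated in this guide.

```yaml
screening:
  fulltext:
    # Disables the stage, so objective and inclusion_criteria are not required.
    skip_stage: true
```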
parsing
Purpose: control coordinate parsing.
Fields:
- parse_coordinates
  Type: boolean
- coordinate_model
  Type: string
If parse_coordinates is false, later coordinate-dependent outputs will be limited.
output
Purpose: define output behavior.
Fields:
- directory
  Type: string
  Required by config validation
- formats
  Type: list of strings
- nimads
  Type: boolean
- export_excluded_studies
  Type: boolean
CLI note:
- The CLI may override output.directory at runtime.
- autonima run config.yaml defaults the runtime output path to a folder derived from the config filename.
- The YAML field should still be present because config validation requires it.
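A small sketch of an output section that satisfies validation follows. The boolean values are illustrative only, and formats is omitted because its accepted values are not listed in this guide.

```yaml
output:
  # Keep this non-empty even if the CLI overrides the path at runtime.
  directory: "results"
  # Optional booleans from the field list; the values here are illustrative.
  nimads: true
  export_excluded_studies: false
```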
annotation
Purpose: label parsed analyses after screening and parsing.
Common fields:
- enabled
  Type: boolean
- model
  Type: string
- prompt_type
  Type: string
  Values: single_analysis, multi_analysis
- create_all_included_annotation
  Type: boolean
- create_all_from_search_annotation
  Type: boolean
- metadata_fields
  Type: list of strings
- inclusion_criteria, exclusion_criteria
  Type: list of strings
- annotations
  Type: list of custom annotation definitions
Custom annotation item fields:
- name
- description
- inclusion_criteria
- exclusion_criteria
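Put together, an enabled annotation section with one custom annotation might be sketched as follows. The annotation name, description, and criteria are hypothetical, and the model string is simply reused from the minimal config rather than being a confirmed requirement for this stage.

```yaml
annotation:
  enabled: true
  # Reused from the screening stages; an assumption, not a requirement.
  model: "gpt-5-mini-2025-08-07"
  prompt_type: "single_analysis"
  annotations:
    # Hypothetical custom annotation; name and criteria are illustrative.
    - name: "working_memory"
      description: "Analyses that probe working memory"
      inclusion_criteria:
        - Contrast targets a working memory task
      exclusion_criteria:
        - Resting-state analyses
```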
Annotated Full Example
Start from:
examples/sample_config.yml
That file is kept aligned with:
autonima create-sample-config
Use it as the authoritative starting point rather than composing a config from scratch.