
4 posts tagged with "neurosynth"


· 9 min read
Alejandro de la Vega

We’re excited to share the release of a major new feature in Neurosynth Compose: AI-Assisted Curation. This feature aims to simplify and accelerate custom neuroimaging meta-analyses, making it easier than ever to curate a set of studies for inclusion in a quantitative meta-analysis.

As Neurosynth Compose users well know, even with the help of a streamlined user interface, manually reviewing studies for inclusion in a meta-analysis is very time-consuming. In fact, a single systematic review can take hundreds of hours, which severely limits the kinds of research questions we can explore. For this very reason, the original Neurosynth platform, which used text mining to group neuroimaging studies by keywords or topics, was a big step forward. However, the low-hanging fruit from these methods has largely been picked. Additionally, crucial details about how a study was done or who participated (like sample size, age, patient group, and experimental specifics) are hard to extract automatically using simple methods such as text frequency or heuristics, because they're often described inconsistently or mentioned only sporadically in the text.

Here, we aimed to leverage innovations in Zero-Shot Learning using Large Language Models (LLMs) and pair this with our platform for custom meta-analysis (Neurosynth Compose) to make precise, automated meta-analysis of neuroimaging literature a reality.

Large Language Models (LLMs) for Neuroscientific Information Extraction

At the heart of this effort is the recent ability for LLMs to understand language with little specialized training. Historically, developing AI models for scientific information extraction was difficult due to the large number of annotated examples required for training. For low-annotation fields like neuroimaging, that largely meant that state-of-the-art biomedical NLP models were out of reach.

However, recent advances in LLM transfer learning have made it possible to automatically extract information from articles, even in areas where there are very few existing human-labeled examples. Newer LLMs trained on vast amounts of general text can be prompted to perform new extraction tasks with no task-specific training data. This approach, called "zero-shot learning", means state-of-the-art LLMs can extract information even if they haven't seen that exact type of task before.

Here, we use these models to extract specific details directly from the full text of over 30,000 neuroimaging studies indexed in the NeuroStore database. By carefully guiding the AI to focus on verifiable facts within the paper, we reduce the chance of hallucinations and can verify how accurately key details are extracted. Using this information, we build a large, structured collection of neuroscientific facts, including participant demographics, study designs, task information, and more, which is then seamlessly presented to you during the curation stage.
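To make this concrete, here is a minimal sketch of what zero-shot extraction from an article's full text can look like. It assumes the OpenAI Python client; the prompt wording and field names are illustrative, not our exact production pipeline.

```python
# Minimal zero-shot extraction sketch (illustrative, not the production prompt).
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = """You extract facts from neuroimaging articles.
Report ONLY values stated explicitly in the text; use null when a value
is not reported. Return JSON with keys: participant_count, diagnosis,
task_name, modality."""

def extract_study_info(full_text: str) -> dict:
    """Zero-shot: the model gets instructions, not labeled training examples."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any JSON-mode-capable model; production used GPT-4
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": full_text},
        ],
        response_format={"type": "json_object"},
        temperature=0,  # favor deterministic, verifiable output
    )
    return json.loads(response.choices[0].message.content)
```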

Figure 1. High-level overview of zero-shot information extraction using LLM prompting.

A Revamped Curation Experience

This information is presented in two key places within the revamped curation user interface: a concise table view, allowing for easy comparison across papers, and more detailed individual study pages where you can delve deeper into the extracted specifics for a single paper. This dramatically facilitates systematic literature review and helps you efficiently screen studies for eligibility for your meta-analysis research question, all within a fully web-based, PRISMA-compliant workflow. (In the PRISMA workflow, AI features are available in the Screening and Eligibility phases.)

Figure 2. Table view showing AI-extracted information (Task Name, Group Names, Diagnosis) across three studies.

By clicking on a row in the table view, you can see study-level metadata and extracted features in more detail:

Figure 3. Detailed study-level AI-extracted information, showing Participant Demographics.

Iterative Approach to Validation and Development

Our approach to information extraction focuses specifically on study details relevant for neuroimaging meta-analysis. We have developed extraction schemas that capture the nuanced details crucial for meta-analysis in this field. For each set of guidelines, a sample of studies is manually reviewed and tagged, and the automated extractions are checked for accuracy against these manual tags, both quantitatively and by human review. This thorough process ensures that when new extraction features are introduced to the platform, a reasonable level of accuracy has been established. In contrast with domain-general automated literature review and deep research platforms (e.g., Elicit, Perplexity, Google NotebookLM), our extraction schemas have been validated and aligned with expert-guided knowledge representations.

Figure 4. Iterative annotation and prompt development workflow.

This effort is an ongoing process that is open to community feedback. The goal is to continuously refine and validate our extraction guidelines for neuroimaging-specific study information, helping researchers find and screen studies for inclusion in meta-analyses. The extracted data can then be used with the rest of our existing comprehensive meta-analysis ecosystem (i.e., Neurosynth Compose and NiMARE) to perform detailed meta-analyses.

All of the studies annotated for Neurosynth Compose are sourced from PubMed Central using pubget and annotated using labelbuddy, a pair of tools our group recently introduced for literature mining (Dockes et al., 2024). All of the annotations we have generated are openly accessible in the labelbuddy annotations GitHub repository.

The extraction pipelines that are validated and iteratively developed using these annotations (and put into production on Neurosynth Compose) are also openly available.

Initial Extraction Schemas

At launch, we have applied two extraction schemas across the articles indexed by NeuroStore: participant demographics and experimental details. These schemas were extracted from the full text of articles using GPT-4, a model whose performance had already been established in previous internal validations.

Participant Demographics

Participant demographics were extracted for each experimental group in the study. The LLMs were instructed to focus on values that were explicitly mentioned in the text.

| Field | Description |
| --- | --- |
| count | Total participants in the group (exclude dropouts). |
| diagnosis | Exact clinical/medical diagnosis, including subtypes and comorbidities. |
| group_name | Group type: "healthy" (controls) or "patients" (clinical). |
| subgroup_name | Verbatim group name, if provided. |
| male_count | Number of males, if explicitly reported. |
| female_count | Number of females, if explicitly reported. |
| age_mean | Mean age, if stated directly in the text. |
| age_range | Age range as stated (e.g., "18-65"); use dash format. |
| age_minimum | Lowest age reported or lower bound of range. |
| age_maximum | Highest age reported or upper bound of range. |
| age_median | Median age, only if explicitly provided. |
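For illustration, here is one way the schema above might be represented programmatically, so that the model's JSON output can be validated before it reaches the curation interface. This is a sketch using Pydantic; the field names mirror the table, but the code is not our exact production schema.

```python
# Illustrative Pydantic model mirroring the participant-demographics schema.
# One instance would be extracted per experimental group in a study.
from typing import Literal, Optional
from pydantic import BaseModel

class GroupDemographics(BaseModel):
    count: Optional[int] = None          # total participants, excluding dropouts
    diagnosis: Optional[str] = None      # exact clinical/medical diagnosis
    group_name: Optional[Literal["healthy", "patients"]] = None
    subgroup_name: Optional[str] = None  # verbatim group name, if provided
    male_count: Optional[int] = None     # only if explicitly reported
    female_count: Optional[int] = None   # only if explicitly reported
    age_mean: Optional[float] = None     # only if stated directly in the text
    age_range: Optional[str] = None      # dash format, e.g. "18-65"
    age_minimum: Optional[float] = None  # lowest age or lower bound of range
    age_maximum: Optional[float] = None  # highest age or upper bound of range
    age_median: Optional[float] = None   # only if explicitly provided
```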

Preliminary Validation

  • We annotated over 220 articles for participant demographics.
  • We observed a high level of accuracy for most fields, notably for participant count (<0.15 Mean Percentage Error).
  • In our annotated sample, we identified 100 individual participant groups with diagnosis labels (e.g., “schizophrenia”). Using BERTScore to quantitatively compare the extracted and annotated diagnoses, the best-performing models achieved an F1 score above 0.8 (higher is better), indicating moderate to high accuracy.
  • Qualitative analysis confirmed that LLMs are increasingly adept at capturing specific diagnostic information (e.g., "Autism Spectrum Disorder", "phobic prone", "eating disorders prone") and associating it correctly with relevant demographic data, even if the specific form differed from the manual annotation.
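For readers unfamiliar with BERTScore: it compares two strings via contextual embeddings rather than exact characters, so semantically equivalent phrasings score highly. A minimal sketch with the bert-score package, using made-up example strings:

```python
# Comparing extracted vs. manually annotated diagnoses with BERTScore.
# Requires the `bert-score` package; the example strings are made up.
from bert_score import score

extracted = ["autism spectrum disorder", "major depressive disorder"]
annotated = ["Autism Spectrum Disorder (ASD)", "depression"]

# P, R, F1 are tensors with one value per candidate/reference pair.
P, R, F1 = score(extracted, annotated, lang="en")
print(F1.mean().item())  # scores above 0.8 indicate close semantic matches
```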

Experimental Details

The goal of this schema was to extract key details of the overall study and of the individual fMRI tasks that were used.

For the overall study, the following was extracted:

| Field | Description |
| --- | --- |
| Modality | Imaging modalities used (e.g., "fMRI-BOLD", "MEG", "PET"). |
| StudyObjective | Brief summary of the study’s main research question or goal. |

For each fMRI task presented within the study, the following was extracted:

| Field | Description |
| --- | --- |
| TaskName | Exact task name as stated in the text; if not named, provide a brief description. |
| TaskDescription | 1–2 sentence summary of instructions, stimuli, measures, and objectives. |
| DesignDetails | Detailed design: type, timing, structure, presentation, response methods. |
| Conditions | All experimental and control conditions mentioned. |
| TaskMetrics | All measured outcomes: behavioral, neural, and subjective. |
| Concepts | Specific mental/cognitive concepts explicitly mentioned. |
| Domain | Primary cognitive domains engaged, if stated. |
| RestingState | True only if described explicitly as a resting-state scan. |
| RestingStateMetadata | Rest-specific details: duration, instructions, eyes open/closed, etc. |
| TaskDesign | Task design type(s): Blocked, EventRelated, Mixed, or Other. |
| TaskDuration | Total task duration (e.g., "10 minutes" or "600 seconds"). |
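As with the demographics schema, this can be sketched as a typed structure for validating model output; the field names follow the table above, but the code itself is illustrative.

```python
# Illustrative Pydantic model for the per-task experimental-details schema.
from typing import List, Optional
from pydantic import BaseModel

class TaskDetails(BaseModel):
    TaskName: str                        # exact name, or brief description
    TaskDescription: str                 # 1-2 sentence summary
    DesignDetails: Optional[str] = None  # type, timing, structure, responses
    Conditions: List[str] = []           # experimental and control conditions
    TaskMetrics: List[str] = []          # behavioral, neural, subjective outcomes
    Concepts: List[str] = []             # explicitly mentioned cognitive concepts
    Domain: List[str] = []               # primary cognitive domains, if stated
    RestingState: bool = False           # True only if explicitly resting-state
    RestingStateMetadata: Optional[str] = None  # duration, eyes open/closed, etc.
    TaskDesign: List[str] = []           # Blocked, EventRelated, Mixed, or Other
    TaskDuration: Optional[str] = None   # e.g. "10 minutes" or "600 seconds"
```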

Preliminary Validation

We annotated 104 papers to validate study/task information, with the majority of these papers sourced from the NeuroVault collection.

  • Modality & RestingState: These fields demonstrated very high accuracy; for instance, GPT-4 achieved 94% accuracy on both.
  • TaskName and TaskDescription Accuracy: TaskName is accurate for studies with a clearly defined task name (64/104 studies), with a score of 0.9 (1 - Levenshtein distance). For studies without a clearly defined task name, qualitative review showed that the models often provided a coherent and plausible description of the task based on the provided context, even if it wasn't a direct match to a predefined label.
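The TaskName score above is presumably one minus a normalized Levenshtein (edit) distance between the extracted and annotated names, so 1.0 is a perfect character-level match. A minimal sketch, assuming the rapidfuzz package:

```python
# TaskName similarity: 1 minus normalized Levenshtein distance (1.0 = exact match).
from rapidfuzz.distance import Levenshtein

def name_similarity(extracted: str, annotated: str) -> float:
    return 1 - Levenshtein.normalized_distance(extracted.lower(), annotated.lower())

print(name_similarity("N-back task", "n-back task"))          # 1.0
print(name_similarity("stop signal task", "stop-signal task"))  # close to 1.0
```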

This preliminary validation is just a first step. Stay tuned for a more comprehensive evaluation of AI-extracted neuroimaging features!

Try AI-Assisted Curation now!

These new features are available for you to try out now on compose.neurosynth.org.

Remember, this is an ongoing, iterative effort, and we have many more features on the horizon to increase the accuracy and transparency of these AI-extracted features. Feel free to suggest features that would be useful for us to extract.

We're always looking for ways to improve Neurosynth Compose, and your feedback is incredibly valuable! If you have thoughts, questions, or suggestions about AI-Assisted Curation or any other feature, please don't hesitate to reach out. You can also engage with us and the broader community on NeuroStars, our discussion forum.

· 2 min read
Alejandro de la Vega

Hello Neurosynth Users,

2023 was a very exciting year for Neurosynth: we launched our Compose platform to the public and announced it on social media. In December alone, we saw over 500 new user visits, with 200 users signing up for an account! 🚀

Help us keep this growth going by sharing our announcement with your colleagues. 🧑‍🔬

🌟 What’s New 🌟

We’ve also continued to introduce new features and improve the user experience. Here are some highlights:

Large-scale association tests

A key feature that set Neurosynth apart was large-scale association maps (previously known as “reverse inference”).

Whereas a typical meta-analysis tells you if activity is consistently reported in a target set of studies, an association test tells you if activation occurs more consistently in this set of studies versus a large and diverse reference sample.

That's important because it allows you to control for base-rate differences between regions. Certain regions, such as the insula or lateral PFC, play a very broad role in cognition and hence are consistently activated across many different tasks and cognitive states. Using MKDA Chi-Squared, you can test whether brain activity in a region (such as the insula) is specifically associated with the studies in your meta-analysis.

Previously, association tests were available only for the automatically generated maps on neurosynth.org. Now you can perform large-scale association tests for your custom meta-analyses in Neurosynth Compose.
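In NiMARE terms, this is the MKDAChi2 estimator, which takes your meta-analysis studyset and a large reference dataset and runs a chi-squared test at each voxel. A minimal sketch (the dataset filenames are hypothetical):

```python
# Sketch of a large-scale association test with NiMARE's MKDAChi2 estimator.
from nimare.dataset import Dataset
from nimare.meta.cbma.mkda import MKDAChi2

my_studies = Dataset("my_studyset.json")        # studies in your meta-analysis
reference = Dataset("reference_studyset.json")  # large, diverse reference sample

meta = MKDAChi2()
# Tests, per voxel, whether activation is reported more consistently in
# `my_studies` than expected from its base rate in the reference sample.
results = meta.fit(my_studies, reference)
results.save_maps(output_dir="mkda_chi2_results")
```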

We have created a full primer and tutorial on MKDA Chi-Squared, including an example from a recent meta-analysis on social processing. Check it out!

MKDA Chi-Squared Tutorial 🧑‍🎓

UX Enhancements ✨

Based on your valuable feedback, we've made numerous bug fixes and improvements:

  • Simplified Curation: The review import page has been removed, and summary information is now added directly to the tag step.

  • Searching UI: We've replaced the dropdown with a selection gallery, making it easier to choose your preferred search method, and we now auto-generate search import names. In addition, resolving duplicates is skipped if none are present.

  • Improved Editing Workflow: The editing interface has been improved, streamlining the extraction process.

  • Various UX Improvements and Fixes: We fixed many papercuts, especially in the Extraction phase.

We hope you enjoy these changes.

Email us any feedback, or ask a question on NeuroStars if you have issues.

Cheers,

The Neurosynth Team 🧠

· 2 min read
Alejandro de la Vega

Dear Neurosynth Community,

I'm excited to announce important updates to Neurosynth Compose, a free and open platform for neuroimaging meta-analysis.

First, we have added some easy-to-follow tutorials to our documentation, making it easy to become familiar with our platform.

The tutorials cover the two main use cases we support: manual and automated meta-analyses. Our platform makes gold-standard manual meta-analyses much easier by leveraging pre-extracted imaging data and streamlined user interfaces. Automated meta-analyses are ideal for rapidly generating exploratory results, enabling meta-analysis as part of routine scientific practice.

We've also made many small but important updates to our platform, including significant performance updates and improvements to the user interface. Neurosynth Compose is now more intuitive and easier to use. Give it a try by following our manual meta-analysis tutorial.

We also have some exciting new features in the pipeline that we'll release in early 2024, including:

  • Image-based Meta-Analysis (IBMA). Soon, you will be able to use NeuroVault data as inputs for IBMA, a more powerful and sensitive alternative to Coordinate-Based Meta-Analysis.
  • Advanced data extraction using Large Language Models (GPT). Early prototypes for extracting detailed information (such as participant demographics) from neuroimaging articles using LLMs have shown promise. We are working on incorporating these workflows into Neurosynth Compose, making it even easier to identify relevant studies for meta-analysis.

We look forward to your feedback!

-Alejandro

· 3 min read
Alejandro de la Vega

Dear Neurosynth Community,

My name is Alejandro, and I am the current project leader of the Neurosynth project.

I am very excited to announce to you that the Neurosynth project lives on, and we are officially announcing the (beta) release of the latest member of the ecosystem: Neurosynth Compose.

Neurosynth Compose enables users to easily perform custom neuroimaging meta-analyses using a web-based platform, with no programming experience required. This project addresses one of the most commonly requested features: the ability to customize large-scale meta-analyses using your own expert knowledge.

Neurosynth Compose is free to use and helps users:

  • 🔎 Search across over 20,000 studies in the Neurosynth database, or import from external databases such as PubMed.
  • 🗃️ Curate your StudySet using systematic review tools conforming to the PRISMA guidelines.
  • 📝 Extract coordinates and metadata for each study, leveraging thousands of pre-extracted studies to minimize effort.
  • 📊 Analyze by specifying a reproducible NiMARE workflow, and execute it locally or in the cloud (see the sketch after this list).
  • 🔗 Share with the community with complete provenance and reproducibility.
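To give a flavor of the Analyze step, here is a minimal sketch of the kind of NiMARE workflow it specifies, using the ALE estimator; the studyset filename and iteration count are placeholders.

```python
# Sketch of a reproducible NiMARE coordinate-based meta-analysis workflow.
from nimare.dataset import Dataset
from nimare.meta.cbma.ale import ALE
from nimare.correct import FWECorrector

dset = Dataset("my_studyset.json")  # coordinates curated in Neurosynth Compose
meta = ALE()                        # one of several CBMA estimators in NiMARE
results = meta.fit(dset)

# Family-wise error correction via Monte Carlo permutations.
corrector = FWECorrector(method="montecarlo", n_iters=100)  # use more in practice
corrected = corrector.transform(results)
corrected.save_maps(output_dir="ale_results")
```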

The goal of Neurosynth Compose is to enable researchers to go beyond the finite set of automatically generated meta-analyses from the original platform and overcome the limitations of automated coordinate and semantic extraction. The end result is a gold-standard meta-analysis, in much less time than a manual workflow, and with much greater reproducibility.

Currently, Neurosynth Compose is in beta, and under active development. We welcome feedback to ensure our platform meets the needs of the community. Please leave us feedback using the button on the bottom right corner of the screen!

We are working on several upcoming features that will make the platform even better. Many of these features are already available in our Python meta-analysis library, NiMARE, and we are actively working on the user-facing online interfaces.

  • Image-based Meta-Analysis (IBMA). We have developed algorithms in NiMARE for using whole-brain statistical maps as inputs to meta-analysis. This is a more powerful and sensitive technique than Coordinate-Based Meta-Analysis. Soon, you will be able to use NeuroVault data as inputs for your meta-analyses.
  • MKDA Chi-squared / Association test. A hallmark feature of Neurosynth is the ability to relate meta-analytic findings to the rest of the literature, to determine the strength and specificity of an association (this was previously called "reverse inference"). This will soon be possible on your custom meta-analyses.
  • A wide range of improvements to the user experience. We are in the process of re-working parts of the online interface to decrease friction when creating a StudySet and to make using and editing studies more intuitive.

I would like to thank everyone involved in this highly collaborative project, but especially James Kent, a postdoctoral fellow, and Nick Lee, a software engineer, who did the lion's share of the work.

We are excited for you to try it and let us know what you think.

-Alejandro