Meta’s OMol25 and UMA Models: Redefining the Future of Molecular Simulation

2025-05-24

Meta’s FAIR Chemistry group releases OMol25, a dataset of more than 100 million DFT calculations on molecular systems, and UMA, a universal neural network potential, transforming atomistic machine learning and simulation workflows.


Introduction

Atomistic machine learning is undergoing a major shift. In May 2025, Meta’s FAIR Chemistry team introduced the Open Molecules 2025 (OMol25) dataset, comprising over 100 million quantum chemical calculations, alongside a family of Universal Models for Atoms (UMA).

These releases bring near-DFT accuracy to molecular simulation at a far larger scale than DFT itself allows, opening a new era for drug discovery, catalysis, battery design, and materials modeling.

Background: From Data Scarcity to Molecular-Scale Intelligence

Traditional computational chemistry has long struggled with a trade-off between accuracy and scalability. DFT methods provide high accuracy, but their computational cost makes them impractical for large systems or high-throughput use. Classical force fields, by contrast, scale well but lack generality and fidelity.
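To make the trade-off concrete, here is a back-of-the-envelope sketch that assumes idealized scaling: conventional Kohn-Sham DFT cost growing roughly as N³ with the number of atoms, versus roughly linear cost for a potential with a fixed interaction cutoff (as in most classical force fields and NNPs). The exponents are textbook approximations, not measured timings for any particular code.

```python
def relative_cost(n_atoms: int, ref_atoms: int, exponent: float) -> float:
    """Cost of an n_atoms system relative to a ref_atoms system,
    assuming cost grows as N**exponent (an idealized model)."""
    return (n_atoms / ref_atoms) ** exponent

# Assumed exponents: ~3 for conventional Kohn-Sham DFT, ~1 for a
# cutoff-based potential (each atom only sees a bounded neighborhood).
DFT_EXPONENT = 3.0
LOCAL_POTENTIAL_EXPONENT = 1.0

for n in (50, 500, 5000):
    dft = relative_cost(n, 50, DFT_EXPONENT)
    local = relative_cost(n, 50, LOCAL_POTENTIAL_EXPONENT)
    print(f"{n:>5} atoms: DFT x{dft:,.0f}, cutoff potential x{local:,.0f}")
```

Growing a system tenfold multiplies idealized DFT cost by about a thousand but a local potential's cost by only ten, which is the gap that ML potentials trained on DFT data try to exploit.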

Datasets like ANI-1x and QM9 tried to bridge this gap, but limitations in element diversity, charged species, and theoretical consistency restricted their utility.

OMol25 aims to remove these barriers, pushing the field toward general-purpose atomistic models.

The OMol25 Dataset: Scale, Accuracy, and Diversity

OMol25 was computed with the ωB97M-V functional and def2-TZVPD basis set, offering consistency and precision across entries. The total compute budget exceeded 6 billion CPU hours.
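A quick sanity check on the scale: dividing the quoted compute budget by the roughly 100 million calculations gives an average cost per calculation. Both inputs are lower bounds taken from this article, so the result is only an order-of-magnitude average; real per-calculation cost varies widely with system size. The 10,000-core cluster is a hypothetical comparison point.

```python
# Figures quoted in this article; both are lower bounds ("over", "exceeded").
TOTAL_CPU_HOURS = 6e9
N_CALCULATIONS = 100e6

avg_cpu_hours = TOTAL_CPU_HOURS / N_CALCULATIONS
print(f"~{avg_cpu_hours:.0f} CPU hours per calculation on average")

# Hypothetical comparison: the same budget on a 10,000-core cluster
# running around the clock, every day of the year.
CORES = 10_000
HOURS_PER_YEAR = 24 * 365
years = TOTAL_CPU_HOURS / CORES / HOURS_PER_YEAR
print(f"~{years:.0f} years of wall time on {CORES:,} cores")
```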

Key Highlights:

  • 100M+ DFT calculations providing energies and forces for diverse molecular conformers and geometries
  • High-accuracy 99,590-point integration grid
  • Uniform theoretical level for clean, transferable model training

Coverage Includes:

  • Biomolecules: Protein–ligand/nucleic acid systems with MD- and docking-sampled states
  • Electrolytes: Ionic liquids, redox clusters, solvent–gas interfaces
  • Metal Complexes: Generated via GFN2-xTB + AFIR for reactive diversity
  • Recomputed Datasets: ANI-2x, SPICE, Transition-1x, and OrbNet structures re-run at the same consistent level of theory

OMol25 is 10–100× larger than prior datasets and radically more diverse.

Modeling Breakthroughs: eSEN and UMA Architectures

To demonstrate OMol25’s potential, Meta released models built on two neural network potential (NNP) architectures:


1. eSEN (equivariant Smooth Energy Network)

  • Message-passing network built on rotationally equivariant, spherical-harmonic atomic representations
  • Trained in two phases: direct-force pre-training, then conservative-force fine-tuning
  • Released in small (sm), medium (md), and large (lg) model variants
  • eSEN-sm-conserving is publicly available and stable for MD tasks
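The direct versus conservative distinction is what matters for MD stability. A conservative model predicts a scalar energy and obtains forces as its negative gradient, F = −∇E, so the forces are consistent with a single energy surface and a good integrator conserves total energy; a direct model predicts forces from a separate output, which is cheaper but need not be the gradient of anything. The sketch below uses a toy Lennard-Jones energy as a stand-in for a learned energy model (an illustrative assumption, not eSEN itself) and checks its analytic conservative forces against a finite-difference gradient.

```python
import math

def lj_energy(positions, epsilon=1.0, sigma=1.0):
    """Toy pairwise Lennard-Jones energy; stands in for a learned E(x)."""
    energy = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            r = math.dist(positions[i], positions[j])
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy

def lj_forces(positions, epsilon=1.0, sigma=1.0):
    """Conservative forces F_i = -dE/dx_i, derived analytically."""
    forces = [[0.0, 0.0, 0.0] for _ in positions]
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            sr6 = (sigma / r) ** 6
            # dE/dr for this pair, projected onto the pair direction d/r.
            dE_dr = 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r
            for k in range(3):
                f = -dE_dr * d[k] / r
                forces[i][k] += f  # equal and opposite: Newton's third law
                forces[j][k] -= f
    return forces

def numerical_forces(energy_fn, positions, h=1e-6):
    """Central-difference estimate of -dE/dx for every coordinate."""
    forces = []
    for i in range(len(positions)):
        row = []
        for k in range(3):
            plus = [list(p) for p in positions]
            minus = [list(p) for p in positions]
            plus[i][k] += h
            minus[i][k] -= h
            row.append(-(energy_fn(plus) - energy_fn(minus)) / (2 * h))
        forces.append(row)
    return forces

atoms = [[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [0.5, 1.0, 0.2]]
analytic = lj_forces(atoms)
numeric = numerical_forces(lj_energy, atoms)
max_err = max(abs(a - b) for ra, rn in zip(analytic, numeric)
              for a, b in zip(ra, rn))
print(f"max |analytic - numeric| = {max_err:.2e}")
```

In an NNP the gradient comes from automatic differentiation rather than a hand-derived formula, which is why conservative models cost more per step than direct ones.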

2. UMA (Universal Models for Atoms)

  • Expands on eSEN using Mixture of Linear Experts (MoLE)
  • Trained across OMol25, OC20, ODAC23, OMat24, and more
  • Offers multi-domain generalization with low inference cost
  • Intended as a general-purpose foundation model for atomistic modeling, analogous to GPT-style models in language
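The appeal of a Mixture of Linear Experts is that the mixing happens in weight space. Below is a minimal sketch, assuming K expert weight matrices and per-system mixing coefficients; in UMA these coefficients are reportedly derived from global system information such as task, charge, spin, and composition, while here a softmax over hypothetical router scores stands in. Because each expert is linear, blending the weights first and applying one matrix gives the same output as running every expert and blending their outputs.

```python
import math

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def merge_experts(experts, coeffs):
    """Blend K expert matrices into one: W = sum_k coeffs[k] * experts[k]."""
    n_rows, n_cols = len(experts[0]), len(experts[0][0])
    return [[sum(c * E[r][col] for c, E in zip(coeffs, experts))
             for col in range(n_cols)]
            for r in range(n_rows)]

def softmax(logits):
    """Turn router scores into mixing coefficients that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical: three 2x3 experts and router scores for one system.
experts = [
    [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0]],
    [[0.0, 1.0, 1.0], [2.0, 0.0, -0.5]],
    [[-1.0, 0.5, 0.0], [1.0, 1.0, 1.0]],
]
coeffs = softmax([0.2, 1.5, -0.3])
x = [0.3, -0.7, 1.2]

# Path A: merge once per system, then one matrix-vector product per input.
merged_out = matvec(merge_experts(experts, coeffs), x)

# Path B: run every expert and blend the outputs (K products per input).
expert_outs = [matvec(E, x) for E in experts]
blended_out = [sum(c * y[r] for c, y in zip(coeffs, expert_outs))
               for r in range(len(merged_out))]

print(merged_out)
print(blended_out)
```

If the coefficients depend only on system-level properties rather than on individual atoms, the merge can be done once before a simulation starts, leaving a model whose per-step cost is that of a single dense layer.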

Benchmarking Performance: Accuracy at DFT Level


The models were evaluated on a range of demanding benchmarks:

  • Wiggle150: Conformer energy ranking
  • GMTKN55 (filtered): Organic molecule stability/reactivity
  • Transition State Barriers for catalytic reactions
  • Spin-State Energetics in metal complexes


Results:

  • MAE < 1 kcal/mol for total and conformer energies
  • Accurately models Pd-mediated reactions and spin-state ordering
  • Performance aligns with r2SCAN-3c DFT benchmarks
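Headline accuracy figures like these are mean absolute errors on relative energies. The sketch below shows how such a metric is commonly computed for a conformer benchmark: energies are referenced to the lowest conformer, converted from hartree to kcal/mol (1 Eh ≈ 627.509 kcal/mol), and compared, with a Spearman rank correlation checking whether the energy ordering is preserved. The energies are made-up placeholders, not OMol25 or Wiggle150 data.

```python
HARTREE_TO_KCAL = 627.509

def relative_kcal(energies_hartree):
    """Energies relative to the lowest one, in kcal/mol."""
    e0 = min(energies_hartree)
    return [(e - e0) * HARTREE_TO_KCAL for e in energies_hartree]

def mae(pred, ref):
    """Mean absolute error between two equal-length lists."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)

def ranks(values):
    """Rank of each value (0 = lowest), assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result

def spearman(a, b):
    """Spearman rank correlation, assuming no tied values."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Placeholder total energies (hartree) for four conformers.
ref_hartree = [-500.000000, -499.995000, -499.990000, -499.998000]
pred_hartree = [-500.000500, -499.995800, -499.991000, -499.998200]

ref_rel = relative_kcal(ref_hartree)
pred_rel = relative_kcal(pred_hartree)
err = mae(pred_rel, ref_rel)
rho = spearman(pred_rel, ref_rel)
print(f"MAE = {err:.3f} kcal/mol, Spearman rho = {rho:.2f}")
```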

Applications in Scientific and Industrial Domains

Drug Discovery

  • Models ligand strain, tautomers, and protonation states
  • Enables rapid conformer screening and fragment design
  • Supports DFT-accuracy simulations for medicinal chemistry
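Ligand strain and conformer screening both reduce to comparing energies of different geometries of the same molecule. A minimal sketch, assuming the common definition of strain as the energy of the bound pose minus that of the global-minimum conformer, plus Boltzmann populations at room temperature; the energies are placeholders that an NNP conformer search would supply.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)

def strain_energy(bound_pose_kcal, conformer_energies_kcal):
    """Strain = E(bound pose) - E(global-minimum conformer)."""
    return bound_pose_kcal - min(conformer_energies_kcal)

def boltzmann_weights(energies_kcal, temperature=298.15):
    """Equilibrium conformer populations from relative energies."""
    e0 = min(energies_kcal)
    rt = R_KCAL * temperature
    factors = [math.exp(-(e - e0) / rt) for e in energies_kcal]
    z = sum(factors)
    return [f / z for f in factors]

# Placeholder relative energies (kcal/mol), e.g. from an NNP conformer search.
conformers = [0.0, 0.8, 1.9, 3.5]
bound_pose = 2.6

print(f"strain: {strain_energy(bound_pose, conformers):.1f} kcal/mol")
print("populations:", [round(w, 3) for w in boltzmann_weights(conformers)])
```

At room temperature RT is only about 0.6 kcal/mol, so errors around 1 kcal/mol can noticeably reshuffle predicted populations; this is why sub-kcal/mol accuracy matters for this use case.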

Catalysis

  • Models metal-centered reactivity, spin states, and redox mechanisms
  • Can shrink multi-day DFT workflows to minutes
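For metal complexes, spin-state energetics come down to computing the same complex at each spin multiplicity and ranking the results. A minimal sketch with placeholder energies for a hypothetical complex; the dictionary stands in for calls to an energy model that accepts spin multiplicity as an input, which is an assumption to check against whichever model is used.

```python
HARTREE_TO_KCAL = 627.509

def spin_state_ladder(energies_by_multiplicity):
    """Return (ground-state multiplicity, {multiplicity: gap in kcal/mol})."""
    ground = min(energies_by_multiplicity, key=energies_by_multiplicity.get)
    e0 = energies_by_multiplicity[ground]
    gaps = {m: (e - e0) * HARTREE_TO_KCAL
            for m, e in sorted(energies_by_multiplicity.items())}
    return ground, gaps

# Placeholder total energies (hartree) for a hypothetical d6 metal complex
# at singlet (1), triplet (3) and quintet (5) multiplicity.
energies = {1: -1500.0210, 3: -1500.0185, 5: -1500.0232}
ground, gaps = spin_state_ladder(energies)
print(f"ground state: multiplicity {ground}")
for m, gap in gaps.items():
    print(f"  2S+1 = {m}: {gap:+.2f} kcal/mol")
```

Gaps of a few kcal/mol decide the ground state here, so a model's error on each state has to be well below that spacing for the ordering to be trustworthy.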

Battery & Electrolyte Design

  • Captures solvation, decomposition, and ionic cluster behavior
  • Supports electrolyte design for energy storage applications

Molecular Dynamics

  • Serves as a surrogate force field for small to medium-sized systems
  • Allows energy landscape modeling at interactive time scales
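Using an NNP as a surrogate force field means plugging its energy and forces into a standard integrator. A minimal velocity Verlet sketch in reduced units, where `force_fn` and `energy_fn` are hypothetical stand-ins for a model (here a harmonic spring, so the result can be checked against the exact solution x(t) = cos t). With conservative forces, total energy should stay nearly constant over the trajectory, which is why conservative models such as eSEN-sm-conserving are preferred for MD.

```python
def velocity_verlet(x, v, force_fn, energy_fn, mass=1.0, dt=0.01, steps=1000):
    """Integrate Newton's equations; return final x, v and energy drift."""
    f = force_fn(x)
    e_start = energy_fn(x) + 0.5 * mass * v * v
    for _ in range(steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force_fn(x)                    # forces at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
    e_end = energy_fn(x) + 0.5 * mass * v * v
    return x, v, e_end - e_start

K = 1.0  # spring constant of the toy potential

def spring_energy(x):
    return 0.5 * K * x * x

def spring_force(x):
    return -K * x  # conservative: F = -dE/dx

x, v, drift = velocity_verlet(1.0, 0.0, spring_force, spring_energy)
print(f"x = {x:.4f}, v = {v:.4f}, energy drift = {drift:.2e}")
```

Swapping in a direct-force model, whose forces are not the gradient of its energy, typically shows up in exactly this diagnostic as a steadily growing energy drift.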

Known Challenges and Limitations

Despite their promise, OMol25 and the UMA models still have notable limitations:

  • Charge and spin enter only as global inputs, with no explicit modeling of charge distribution or spin physics (performance drops on open-shell systems)
  • No implicit solvent models included
  • Long-range interactions truncated (~6–12 Å cutoff)
  • No uncertainty quantification, limiting use in risk-sensitive domains
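The cutoff limitation follows from how these models build their inputs: each atom only sees neighbors within a fixed radius, usually weighted by a smooth envelope so energies stay differentiable. Stacking message-passing layers extends the reach somewhat, but it remains finite. The sketch uses a 6 Å radius and a cosine envelope, both illustrative choices rather than UMA's exact settings; any pair beyond the cutoff contributes exactly nothing.

```python
import math

CUTOFF = 6.0  # angstrom; illustrative value

def cosine_envelope(r, cutoff=CUTOFF):
    """Decays smoothly from 1 at r=0 to 0 at r=cutoff, and stays 0 beyond."""
    if r >= cutoff:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / cutoff) + 1.0)

def neighbor_list(positions, cutoff=CUTOFF):
    """All ordered pairs (i, j) closer than the cutoff, with their distance."""
    pairs = []
    for i, pi in enumerate(positions):
        for j, pj in enumerate(positions):
            if i != j:
                r = math.dist(pi, pj)
                if r < cutoff:
                    pairs.append((i, j, r))
    return pairs

# Two fragments about 10 angstrom apart: no pair crosses the gap,
# so their mutual interaction is invisible to a purely local model.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (11.0, 0.0, 0.0)]
pairs = neighbor_list(atoms)
print([(i, j) for i, j, _ in pairs])
print([round(cosine_envelope(r), 3) for r in (0.0, 3.0, 5.9, 6.0, 12.0)])
```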

These challenges create opportunities for hybrid modeling and physics-aware NNPs.

Future Outlook: Towards Universal Molecular Models

OMol25 is likely to shift the community’s focus toward:

  • Fine-tuning and distillation for downstream tasks
  • Hybrid physics–ML models with higher generalizability
  • Uncertainty-aware NNPs for predictive confidence
  • Solvent-inclusive models for real-world chemistry
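One widely used route to uncertainty-aware NNPs, named here as one option rather than a feature of OMol25 or UMA, is a deep ensemble: train several models with different random seeds and treat their disagreement as an uncertainty estimate. The sketch assumes member predictions are already in hand (the numbers are hypothetical) and flags structures whose spread exceeds a chosen threshold for DFT follow-up.

```python
import statistics

def ensemble_estimate(predictions):
    """Mean and sample standard deviation across ensemble members."""
    return statistics.fmean(predictions), statistics.stdev(predictions)

def flag_uncertain(batch, threshold):
    """Return ids of structures whose ensemble spread exceeds threshold."""
    flagged = []
    for structure_id, preds in batch.items():
        _, spread = ensemble_estimate(preds)
        if spread > threshold:
            flagged.append(structure_id)
    return flagged

# Hypothetical energies (kcal/mol) from a 4-member ensemble.
batch = {
    "mol_a": [-12.1, -12.3, -12.0, -12.2],
    "mol_b": [-8.4, -6.1, -9.9, -7.2],   # members disagree strongly
    "mol_c": [3.0, 3.1, 2.9, 3.0],
}
print(flag_uncertain(batch, threshold=1.0))
```

In practice the threshold would be calibrated against held-out DFT errors, since ensemble spread is a heuristic signal rather than a guaranteed error bound.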

Much as ImageNet did for computer vision, OMol25 is positioned to become a foundational resource for molecular ML tools across industries.

Conclusion

Meta’s release of OMol25 and UMA marks a leap forward in applying AI to molecular science. By making near-DFT-accuracy simulations far more routine, these tools shift the bottleneck from computation toward creativity.

Whether you work in drug discovery, catalysis, or materials R&D, OMol25 offers a powerful new foundation for faster, better-informed molecular design.

Explore what OMol25 can unlock in your simulation pipeline.
