Research Data Scientist

Fundamental

Data Science

Barcelona, Spain

Posted on Apr 13, 2026

About Fundamental

Fundamental is an AI company pioneering the future of enterprise decision-making. Founded by DeepMind alumni, Fundamental has developed NEXUS – the world's most powerful Large Tabular Model (LTM) – purpose-built for the structured records that actually drive enterprise decisions. Backed by world-class investors and trusted by Fortune 100 companies, Fundamental unlocks trillions of dollars of value by giving businesses the Power to Predict.

At Fundamental, you'll work on unprecedented technical challenges in foundation model development and build technology that transforms how the world's largest companies make decisions. This is your opportunity to be part of a category-defining company from the ground up. Join the team defining the future of enterprise AI.

Key responsibilities

As part of the Research team, you will contribute to the development of breakthrough machine learning models by working on one of the most important frontiers in model training and evaluation: high-quality real and synthetic data.

This role is especially focused on synthetic data generation, Structural Causal Models (SCMs), and realistic simulation-based data sources. You will help us design, evaluate, and scale datasets that capture the structure, dependencies, and edge cases needed to train foundation models for enterprise tabular data.

The main responsibilities of this role are:

  • Identifying, characterizing, and evaluating high-value data sources for training and evaluating ML models, including real-world data, synthetic data, SCM-generated data, and physical or systems-based simulator outputs

  • Designing and analysing synthetic data generation approaches based on Structural Causal Models, probabilistic models, simulators, and other mechanisms that capture realistic relationships between variables

  • Working with researchers to define what makes a synthetic dataset useful, realistic, diverse, causally meaningful, and appropriate for model training or evaluation

  • Building tools and workflows to generate, validate, benchmark, and iterate on synthetic datasets at scale

  • Developing metrics and evaluation procedures for synthetic data quality

  • Transforming structured, unstructured, simulated, and causally generated data into formats suitable for training and evaluating large-scale ML models

  • Collaborating with the research team to maintain a reliable, efficient training pipeline where data quality, data diversity, and synthetic data generation are critical components

  • Collaborating with the wider engineering and infrastructure team to ensure data generation and processing workflows are scalable, reproducible, and robust

Must have

Experience with:

  • Synthetic data generation for machine learning, especially for structured or tabular data

  • Structural Causal Models, causal graphs, causal inference, probabilistic modelling, or simulation-based data generation

  • Identifying and evaluating high-quality data sources to train and evaluate ML models, including both real-world and realistic synthetic data sources

  • Bringing data from structured and unstructured sources, simulators, causal models, or generative processes into formats accessible by ML models

  • Designing quantitative analyses to assess data quality, realism, diversity, bias, coverage, and downstream model performance

Strong fundamentals in:

  • Statistics, probability, and applied machine learning

  • Data science workflows, including exploratory analysis, feature understanding, validation, and experimental design

  • Software engineering for research-grade and production-grade data workflows

Strong knowledge of:

  • The Python data processing and scientific computing stack, including NumPy, pandas, SciPy, scikit-learn, or similar tools

Familiarity with:

  • Causal modelling, graphical models, probabilistic programming, agent-based simulation, discrete-event simulation, or physical / systems-based simulators

  • Data storage and data versioning solutions

  • Classical machine learning and deep learning methods, especially outside of purely LLM-based workflows

Nice to have

  • Contributions to open source ML, causal inference, synthetic data, simulation, or data science projects

  • BSc, MSc, or PhD in computer science, machine learning, statistics, mathematics, physics, engineering, economics, or another quantitative field

  • Experience working with tabular data, predictive analytics, or enterprise decision-making systems

  • Experience building or evaluating synthetic datasets for model training

  • Experience with SCM libraries, probabilistic programming frameworks, simulation environments, or custom data generation pipelines

Benefits

  • Competitive compensation with salary and equity

  • Comprehensive health coverage for you and your dependents

  • Paid parental leave for all new parents, inclusive of adoptive and surrogate journeys

  • Relocation support for employees moving to join the team in one of our office locations

  • A mission-driven, low-ego culture that values diversity of thought, ownership, and a bias toward action