PRATIK PRIYANSHU

ML Researcher

Rigorous Evaluation & Reproducible Frameworks for Quantum ML • Scientific ML • NLP & RAG • Multimodal Evaluation

Heidelberg, Germany

About Me

I research machine learning with a focus on building rigorous, reproducible evaluation frameworks for complex scientific and real-world problems.

My work spans hybrid quantum-classical evaluation, scientific machine learning (exoplanet detection, Earth observation), and multimodal coherence evaluation. The consistent thread is methodological depth: uncertainty quantification, conformal prediction, statistical rigor, and honest reporting of null results.

I'm driven by the question of how we know what we claim to know about ML systems. I design evaluation protocols that separate genuine capability from artifacts of experimental setup.

I am currently completing my M.Sc. in Applied Data Science at SRH Heidelberg (grade: 1.6) and am seeking PhD positions in machine learning where reproducibility, principled evaluation, and methodological rigor are central values.

Publications

Preprint • 2026

ExoVeil: Detecting Single-Transit Exoplanets Through Learned Stellar Behaviour

Pratik Priyanshu

arXiv preprint 2606.02778 — in preparation for A&A submission

A self-supervised Transformer world model that predicts stellar brightness from raw flux and treats transits as anomalies — enabling single-transit detection where phase-folding classifiers (ExoMiner, AstroNet, RAVEN) score 0%. Combines matched-filter detection, XGBoost classification, and split conformal prediction with aleatoric/epistemic uncertainty decomposition.

AUC 0.938 on Kepler DR25; 179 new transit-like signals in a blind search of 3,737 stars (46 vetted monotransit candidates)
Zero-shot cross-mission transfer: 47/47 confirmed TESS planets in the PLATO LOPS2 field recovered without retraining
First application of conformal prediction to transit detection (95.9% empirical coverage at 95% nominal)
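
For readers new to split conformal prediction, the core idea can be sketched in a few lines. This is a generic illustration and not ExoVeil's implementation: the function names, the absolute-residual score, and the toy calibration data are all assumptions made for the example.

```python
import math


def conformal_quantile(cal_scores, alpha=0.05):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(cal_scores)
    # Rank of the order statistic used by split conformal prediction.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(cal_scores)[k - 1]


def prediction_interval(y_pred, q):
    """Symmetric interval around a point prediction."""
    return (y_pred - q, y_pred + q)


# Hypothetical calibration set: absolute residuals |y - y_hat| on held-out data.
cal_residuals = [0.01 * i for i in range(1, 101)]  # 0.01 .. 1.00
q = conformal_quantile(cal_residuals, alpha=0.05)
lo, hi = prediction_interval(1.0, q)
```

Under exchangeability of calibration and test points, intervals built this way cover the true value with probability at least 1 - alpha, which is the guarantee an empirical coverage figure is checked against.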

Published • 2025

Detecting 2022 Russo-Ukrainian Conflict Misinformation Using a Hybrid Transformer Approach

Pratik Priyanshu

PROMID Shared Task, FIRE 2025 (Forum for Information Retrieval Evaluation), CEUR Workshop Proceedings, Vol. 4173

Hybrid XLM-RoBERTa system with engineered linguistic features for conflict misinformation detection under extreme class imbalance (94:1 ratio) on 34K+ tweets. Ranked 4th in the PROMID shared task.

0.918 weighted F1, recall 0.94, precision 0.87
Addressed 94:1 class imbalance via class-weighted loss and decision threshold tuning
Fused XLM-RoBERTa embeddings with 15 engineered linguistic features
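
The two imbalance remedies named above can be illustrated with a small sketch. It is not the submitted system: the inverse-frequency weighting formula, the positive-class F1 objective, the helper names, and the toy validation scores are assumptions chosen for the example.

```python
def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n = len(labels)
    classes = sorted(set(labels))
    return {c: n / (len(classes) * labels.count(c)) for c in classes}


def f1_at_threshold(probs, labels, threshold):
    """F1 of the positive class when predicting 1 for probs >= threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(probs, labels, candidates):
    """Pick the candidate threshold with the highest validation F1."""
    return max(candidates, key=lambda t: f1_at_threshold(probs, labels, t))


# A 94:1 label distribution gives the minority class a 94x larger loss weight.
weights = inverse_frequency_weights([0] * 94 + [1])

# Hypothetical validation scores; the threshold is tuned here, never on test data.
val_probs = [0.1, 0.2, 0.35, 0.4, 0.8]
val_labels = [0, 0, 1, 0, 1]
threshold = best_threshold(val_probs, val_labels, [0.3, 0.5, 0.7])
```

The weights would be passed to the training loss, and the tuned threshold replaces the default 0.5 cut-off at inference time.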

Under Review • 2026

ImageCLEF 2026 — Multimodal Reasoning: QLoRA Fine-Tuning of Qwen3-VL for Multilingual Exam Question Answering

Pratik Priyanshu

CLEF 2026 Working Notes (under review)

Fine-tuned Qwen3-VL-8B-Thinking with QLoRA on EXAMS-V for multilingual exam question answering across 6 languages and 32 subjects, targeting both textual multiple-choice and open-question answering tracks.

Ranked 1st in Textual MCQ (0.754 accuracy) on the official ImageCLEF 2026 leaderboard
Ranked 2nd in Textual OpenQA (COMET 0.529) among all participating systems
Evaluated across 6 languages and 32 subject areas from EXAMS-V

Under Review • 2026

ELOQUENT 2026 — CultuRAG: Explicit Cultural Grounding for Multilingual LLMs

Pratik Priyanshu

CLEF 2026 Working Notes (under review)

Proposed CultuRAG, an explicit cultural-grounding method for multilingual LLMs that retrieves culturally-anchored context in the target language to raise cultural specificity in generated responses.

+34% cultural-specificity score (CSP) on Qwen2.5-32B via explicit cultural grounding
Native-language retrieval outperformed English-language context by an additional 11%
Evaluated on the ELOQUENT 2026 shared task at CLEF

Under Review • 2026

Calibrated Multimodal Semantic Coherence Index (cMSCI)

Pratik Priyanshu

Manuscript under review

A novel geometric metric for tri-modal (text-image-audio) coherence that integrates Gramian volume geometry, contrastive calibration, and training-free Matryoshka scale-consistency estimation. Instantiated on both dual-space (CLIP+CLAP) and unified-space (Gemini Embedding 2) backbones; because the two backbones' errors are uncorrelated, their ensemble outperforms either backbone alone.

Spearman ρ = 0.785 (p < 10⁻⁶), ICC(3,k) = 0.872 on 100 human-annotated samples
Embedding-agnostic across dual-space (CLIP+CLAP, 512-d) and unified-space (Gemini Embedding 2, 3072-d)
630+ controlled experimental runs; outperformed cosine+z-norm, CCA, CLIPScore, and retrieval-rank baselines
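
As background on the Gramian volume idea, the sketch below computes the volume spanned by three unit-normalized embeddings. It is only a geometric illustration: it assumes the three vectors already live in a shared space, the function names are hypothetical, and it does not reproduce how cMSCI calibrates or weights this quantity.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def gram_volume(text, image, audio):
    """Volume of the parallelepiped spanned by three unit-normalized vectors."""
    vecs = [normalize(v) for v in (text, image, audio)]
    # Gram matrix of pairwise inner products (cosine similarities).
    g = [[dot(u, v) for v in vecs] for u in vecs]
    det = (g[0][0] * (g[1][1] * g[2][2] - g[1][2] * g[2][1])
           - g[0][1] * (g[1][0] * g[2][2] - g[1][2] * g[2][0])
           + g[0][2] * (g[1][0] * g[2][1] - g[1][1] * g[2][0]))
    # Clamp tiny negative values caused by floating-point error.
    return math.sqrt(max(det, 0.0))


# Mutually orthogonal embeddings span the maximum volume (no shared semantics).
orthogonal = gram_volume([1, 0, 0], [0, 1, 0], [0, 0, 1])
# Perfectly aligned embeddings collapse to zero volume.
aligned = gram_volume([1, 0, 0], [2, 0, 0], [3, 0, 0])
```

A smaller volume means the three modalities point in more similar directions, so a coherence score would decrease as the volume grows.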

Featured Projects

Deep dives into systems I've built, from research to production.

Research threads

HNEP | Reproducible Benchmarking Framework for Quantum-Classical Hybrid Learning

A multi-method evaluation protocol that reveals how quantum components contribute — not just whether they help — through the Quantum Contribution Taxonomy (GENUINE / REGULARIZER / IGNORED / DEAD WEIGHT)

M.Sc. thesis project. HNEP (Hybrid Network Evaluation Protocol, v3.0) is a reproducible benchmarking framework for quantum-classical hybrid learning that combines graded surrogation, structural interventions, and convergent validity analysis across 7 model families and 4 molecular datasets. It introduces the Quantum Contribution Taxonomy — the first two-dimensional classification of quantum roles in hybrid models — and shows that single-method QML evaluations can produce systematically incomplete or contradictory conclusions.

Model Families: 7
Datasets: 4

Python • JAX/Flax • PennyLane • RDKit • NumPy • PyTorch • FastAPI

ExoVeil | Detecting Single-Transit Exoplanets via Learned Stellar Behaviour

Transformer world model that predicts stellar brightness and flags transits as anomalies — detecting planets that classification-based systems structurally cannot see

ExoVeil is a self-supervised, prediction-based transit detection system for exoplanet science. It learns each star’s quiescent photometric behaviour and treats transits as departures from that baseline, enabling single-transit detection — a regime in which phase-folding classifiers (ExoMiner, AstroNet, RAVEN) score 0% by construction. Released as an open-source package (pip install exoveil) with pretrained weights and a candidate catalogue.

Kepler DR25 AUC: 0.938
New Candidates: 179
TESS Zero-Shot: 47 / 47
Conformal Coverage: 95.9%

Python • PyTorch • Transformer • XGBoost • Conformal Prediction • MC Dropout • Astropy • Lightkurve • MAST

cMSCI | Calibrated Multimodal Coherence Evaluation

Embedding-agnostic geometric metric for tri-modal (text-image-audio) coherence — validated across dual-space (CLIP+CLAP) and unified-space (Gemini Embedding 2) backbones; manuscript under review

Proposed cMSCI (calibrated Multimodal Semantic Coherence Index), a novel geometric metric for tri-modal semantic coherence evaluation that integrates Gramian volume geometry, contrastive calibration, and uncertainty-aware adaptive weighting (including training-free Matryoshka scale-consistency estimation). Instantiated on both dual-space (CLIP + CLAP, 512-d) and unified-space (Gemini Embedding 2, 3072-d) backbones; because the two backbones' errors are uncorrelated, their ensemble outperforms either backbone alone.

Human Correlation: Spearman ρ = 0.785
ICC(3,k): 0.872
Controlled Runs: 630+
Statistical Significance: p < 10⁻⁶

Python • CLIP • CLAP • Gemini Embedding 2 • Matryoshka Embeddings • Hugging Face • Streamlit • Plotly

SWIM | Multi-Agent AI for Environmental Monitoring

Surface Water Intelligence & Monitoring: a multi-agent system for predicting Harmful Algal Blooms across German lakes using satellite, in-situ, and visual data

A multi-agent environmental monitoring system that predicts Harmful Algal Blooms (HABs) across German lakes by fusing satellite imagery, water quality sensors, weather data, and visual analysis through autonomous AI agents communicating via Google's Agent-to-Agent (A2A) protocol.

Autonomous AI Agents: 5

Python • LangGraph • Google A2A • FastAPI • PyTorch • Sentinel-2 • Docker • RAG • Streamlit

JuRAG | Graph-Augmented Legal Retrieval & Responsible AI

Research framework evaluating how retrieval strategy affects faithfulness, fairness, and grounding in AI-assisted legal decision support across 251k German court decisions

A research framework for building trustworthy legal AI systems, evaluated on 251,038 real German court decisions. It investigates how retrieval strategy — from dense embedding search to citation graph-augmented hybrid retrieval — affects the quality, faithfulness, and fairness of AI-assisted legal decision support.
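
To make "hybrid retrieval" concrete, the sketch below fuses a lexical ranking and a dense ranking with reciprocal rank fusion (RRF). RRF is used here only as one common fusion rule; the text above does not state which fusion method JuRAG uses, and the document IDs and the constant k = 60 are illustrative assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical top-3 results from a BM25 index and a dense embedding index.
bm25_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
```

Documents ranked well by both retrievers rise to the top, while documents found by only one retriever are kept but ranked lower. A citation-graph signal could be added as a third ranked list in the same way.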

Court Decisions: 251,038

Python • LangGraph • Qdrant • BGE-M3 • BM25 • mDeBERTa (NLI) • NetworkX • Ollama / Groq • FastAPI • Streamlit • Docker

Haftung-AI | Multi-Agent Traffic Accident Liability Analysis

9-agent system for analyzing traffic accident liability under German law (StVO) with vision perception, telemetry parsing, and RAG-augmented legal reasoning

An LLM-powered multi-agent system for analyzing traffic accident liability under German traffic law (StVO). It orchestrates nine specialized agents through LangGraph — from YOLOv8 scene perception to CAN bus telemetry parsing to RAG-augmented legal reasoning — and compares three structurally distinct pipeline variants against 30 hand-authored ground-truth scenarios.

Specialized Agents: 9
Pipeline Variants: 3
Test Scenarios: 30

Python • LangGraph • Groq (LLaMA 3.3 70B) • Qdrant • BGE-large • BM25 • YOLOv8 • DeepSORT • FastAPI • Streamlit • Docker • WeasyPrint

ARKIS | Trust-Aware Agentic RAG System

Epistemically-grounded multi-agent retrieval system with contradiction detection and adaptive hybrid retrieval

A research-grade, trust-aware Retrieval-Augmented Generation (RAG) system that integrates domain gating, hybrid retrieval, evidence clustering, contradiction detection, and confidence calibration to minimize hallucinations in high-stakes environments.

Python • LangGraph • SentenceTransformers (BGE) • Qdrant • BM25 • Hybrid Retrieval • FastAPI • Redis • Docker • Ollama (LLaMA 3)

Autobahn | Autonomous Perception & ADAS Stack

Production-grade multi-sensor perception engine with ISO-26262 safety architecture and real-time latency guarantees

A modular ADAS perception and safety stack integrating camera, LiDAR, and radar fusion with interaction-aware prediction, explainable AI, safety diagnostics, and scenario validation, built to mirror German OEM architecture principles.

Sensor Fusion: 3-modal (camera, LiDAR, radar)

Python • PyTorch • ONNX • ONNX Runtime • OpenCV • NumPy • Scikit-learn • DeepSORT / ByteTrack • MsgPack + GZip • Streamlit • GitHub Actions CI • ISO 26262

Latest Writing

Thoughts on ML engineering, research, and building intelligent systems.

⚛️ Quantum

Quantum • ML • Research

What It Really Takes to Evaluate Quantum Machine Learning

Why most QML benchmarks are misleading, and what a rigorous evaluation framework actually looks like: scaffold splits, uncertainty estimation, and trainability diagnostics.

2025-01-10 • 12 min read

⚛️ Quantum

Quantum • ML • Architecture

Building a Reproducible Classical Quantum ML Platform for Molecular Prediction

End-to-end walkthrough of building a config-driven, reproducible framework for fair comparison of classical GNNs, variational quantum circuits, and hybrid architectures.

2025-01-05 • 10 min read

🤖 GenAI

Coming Soon

GenAI • RAG • Architecture

Production RAG Systems: A Deep Dive

From naive retrieval to trust-aware, multi-agent RAG: domain gating, contradiction detection, confidence calibration, and epistemic safety in production.

2025-02-01 • 15 min read

Skills & Technologies

Tools and technologies I work with across the ML stack.

💻 Languages

Python

C++

TypeScript

SQL

🧠 ML Frameworks

PyTorch

TensorFlow

JAX

Keras

scikit-learn

Hugging Face

🔬 Deep Learning

Transformers

CNNs

GANs

RNNs/LSTMs

Diffusion Models

Graph Neural Nets

🤖 LLM & Agents

LangChain

LangGraph

RAG

Fine-tuning

Prompt Engineering

Multi-Agent Systems

⚙️ MLOps & Infra

Docker

Kubernetes

MLflow

Weights & Biases

DVC

Airflow

📊 Data & Databases

PostgreSQL

MongoDB

Redis

Pinecone

ChromaDB

Pandas

☁️ Cloud & Compute

AWS

GCP

CUDA

TensorRT

NVIDIA Jetson

⚛️ Quantum Computing

Qiskit

PennyLane

Cirq

JAX

Flax

Quantum ML

🛠️ Tools & Practices

Git

Linux

CI/CD

FastAPI

Jupyter

VS Code

Trajectory

Each research milestone, plotted as a brightening event in the sky.

2023

B.Tech, Computer Science

Thesis: deep learning for melanoma detection from dermatoscopic images.

Oct 2024

M.Sc. begins — SRH Heidelberg

Applied Data Science & Analytics. The research arc starts here.

Winter 2024–25

ARKIS

Trust-aware agentic RAG with contradiction-penalized confidence.

Early 2025

Autobahn — ADAS stack

Camera/LiDAR/radar fusion with ISO 26262 safety architecture.

Mar–Sep 2025

SWIM

5-agent system forecasting harmful algal blooms across German lakes.

Oct–Dec 2025

JuRAG

Retrieval & faithfulness evaluation over 251k German court decisions.

Dec 2025

First publication — FIRE 2025

PROMID shared task, ranked 4th. CEUR Workshop Proceedings Vol. 4173.

Jan–Mar 2026

cMSCI

Tri-modal coherence metric, ρ = 0.785 vs human judgment. Under review.

Dec 2025 – May 2026

ExoVeil — arXiv + PyPI

Single-transit exoplanet detection. 179 new candidates, 47/47 zero-shot TESS.

2026

CLEF 2026 — two shared tasks

1st in ImageCLEF Textual MCQ; CultuRAG at ELOQUENT. Working notes under review.

Mar 2026 – present

HNEP — M.Sc. thesis

Quantum Contribution Taxonomy. pip install hnep. Defence: July 2026.

·····✦ PhD

Certifications

Professional credentials validating deep learning expertise.

NVIDIA Deep Learning Institute (DLI)

2024

Verified

Fundamentals of Deep Learning

Comprehensive certification covering neural network architectures, training techniques, and deployment strategies using NVIDIA tools.

NVIDIA Deep Learning Institute (DLI)

2024

Verified

Building Transformer-Based NLP Applications

Advanced certification on transformer architectures, attention mechanisms, and NLP application development with GPU-accelerated computing.

Get in Touch

Interested in research collaboration, PhD supervision, or discussing evaluation methodology in ML? I'd love to hear from you.

Open to PhD positions

pratikpriyanshu12345@gmail.com

GitHub: github.com/Pratik25priyanshu20

LinkedIn: linkedin.com/in/pratikpriyanshu