
PRATIK PRIYANSHU

ML Researcher

Rigorous Evaluation & Reproducible Frameworks for

Quantum ML, Trustworthy AI, NLP & RAG, Multimodal Evaluation

Heidelberg, Germany


About Me

I research machine learning with a focus on building rigorous, reproducible evaluation frameworks for complex scientific and real-world problems.

My work spans hybrid quantum-classical evaluation, trustworthy legal AI, and multimodal coherence evaluation — but the consistent thread is methodological depth: ablation studies, uncertainty quantification, statistical rigor, and honest reporting of null results.

I'm driven by the question of how we know what we claim to know about ML systems — designing evaluation protocols that separate genuine capability from artifacts of experimental setup.

I am currently completing my M.Sc. in Applied Data Science at SRH Heidelberg (Grade: 1.8) and am seeking PhD positions in machine learning where reproducibility, principled evaluation, and methodological rigor are central values.

Publications

Published (2025)

Detecting 2022 Russo-Ukrainian Conflict Misinformation Using a Hybrid Transformer Approach

Pratik Priyanshu

PROMID Shared Task, FIRE 2025 (Forum for Information Retrieval Evaluation), CEUR Workshop Proceedings, Vol. 4173

Hybrid XLM-RoBERTa system with engineered linguistic features for conflict misinformation detection under extreme class imbalance (94:1 ratio) on 34K+ tweets.

  • 0.918 weighted F1, recall 0.94, precision 0.87
  • Addressed 94:1 class imbalance via class-weighted loss and decision threshold tuning
  • Fused XLM-RoBERTa embeddings with 15 engineered linguistic features
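
The two imbalance-handling steps named above (class-weighted loss and decision threshold tuning) can be illustrated with a minimal sketch. This is not the implementation from the paper: the function names, the inverse-frequency weighting rule (the same heuristic scikit-learn calls "balanced"), the F1 objective for the threshold sweep, and the toy scores are all assumptions chosen for illustration.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def f1_at_threshold(labels, scores, threshold):
    """Positive-class F1 when predicting 1 for every score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_threshold(labels, scores, candidates=None):
    """Return the candidate threshold with the highest positive-class F1."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    return max(candidates, key=lambda t: f1_at_threshold(labels, scores, t))


# Hypothetical validation set with a 94:1 negative-to-positive ratio.
# The scores are made up and stand in for model output probabilities.
labels = [0] * 94 + [1]
scores = [0.05] * 90 + [0.35] * 4 + [0.40]

weights = inverse_frequency_weights(labels)  # minority class weighted 94x higher
best = tune_threshold(labels, scores)        # lands below the default 0.5 cutoff

print(weights)
print(best, f1_at_threshold(labels, scores, best))
```

In a training loop, weights like these would typically be passed to a weighted loss (for example the `weight` argument of PyTorch's `torch.nn.CrossEntropyLoss`), and the threshold would be tuned on held-out validation data rather than the test set.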

Under Review (2026)

Calibrated Multimodal Semantic Coherence Index (cMSCI)

Pratik Priyanshu

Manuscript under review

Calibrated uncertainty-aware evaluation metric for tri-modal (text-image-audio) coherence, extending GRAM (ICLR 2025) to heterogeneous embedding spaces with probabilistic scoring via ProbVLM and contrastive calibration.

  • Statistically significant human correlation (ρ = 0.379, p = 0.039, ICC = 0.70)
  • 270 controlled experimental runs with effect sizes d > 2.2
  • Outperformed CLIPScore, BLIPScore, and CCA baselines

Featured Projects

Deep dives into systems I've built, from research to production.


Quantum ML | Hybrid Classical-Quantum Architectures

Reproducible experimental framework for evaluating classical, quantum, and hybrid architectures on molecular property prediction

A research-grade framework for fair, controlled comparison of classical graph neural networks, variational quantum circuits, and hybrid classical–quantum architectures for molecular property prediction. Rather than assuming quantum advantage, the goal is to isolate architectural effects under consistent data preprocessing, batching, training, and evaluation protocols.

  • 8-qubit Variational Circuit
  • Scaffold Evaluation
  • Gated Fusion Architecture
  • Reproducible Framework

Python, JAX, PyTorch, PennyLane, Qiskit, RDKit, FastAPI, NumPy

JuRAG | Graph-Augmented Legal Retrieval & Responsible AI

Research framework evaluating how retrieval strategy affects faithfulness, fairness, and grounding in AI-assisted legal decision support across 251k German court decisions

A research framework for building trustworthy legal AI systems, evaluated on 251,038 real German court decisions. It investigates how retrieval strategy — from dense embedding search to citation graph-augmented hybrid retrieval — affects the quality, faithfulness, and fairness of AI-assisted legal decision support.

  • 251,038 Court Decisions
  • 3 Retrieval Variants
  • 6 RAI Dimensions
  • 14 Evaluation Metrics

Python, LangGraph, Qdrant, BGE-M3, BM25, mDeBERTa (NLI), NetworkX, Ollama / Groq, FastAPI, Streamlit, Docker

cMSCI | Calibrated Multimodal Coherence Evaluation

Novel uncertainty-aware evaluation metric for tri-modal (text-image-audio) coherence, extending GRAM to heterogeneous embedding spaces — manuscript under review

Proposed cMSCI (calibrated Multimodal Semantic Coherence Index), a geometrically grounded, uncertainty-aware evaluation metric for tri-modal coherence. Extends GRAM (ICLR 2025) to heterogeneous embedding spaces (CLIP + CLAP) with probabilistic scoring via ProbVLM and contrastive calibration against domain-specific negative banks.

  • Effect Size (RQ1): d > 2.2
  • Human Correlation: ρ = 0.379
  • 270 Controlled Runs
  • Statistical Significance: p < 10⁻¹³

Python, CLIP, CLAP, ProbVLM, Stable Diffusion, AudioLDM, Hugging Face, Streamlit, Plotly

SWIM | Multi-Agent AI for Environmental Monitoring

Surface Water Intelligence & Monitoring: a multi-agent system for predicting Harmful Algal Blooms across German lakes using satellite, in-situ, and visual data

A multi-agent environmental monitoring system that predicts Harmful Algal Blooms (HABs) across German lakes by fusing satellite imagery, water quality sensors, weather data, and visual analysis through autonomous AI agents communicating via Google's Agent-to-Agent (A2A) protocol.

  • AUROC (Bloom Prediction): 0.814
  • 5 Autonomous AI Agents
  • 14,000+ Lines of Python
  • 112+ Unit Tests

Python, LangGraph, Google A2A, FastAPI, PyTorch, Sentinel-2, Docker, RAG, Streamlit

Haftung-AI | Multi-Agent Traffic Accident Liability Analysis

9-agent system for analyzing traffic accident liability under German law (StVO) with vision perception, telemetry parsing, and RAG-augmented legal reasoning

An LLM-powered multi-agent system for analyzing traffic accident liability under German traffic law (StVO). It orchestrates nine specialized agents through LangGraph — from YOLOv8 scene perception to CAN bus telemetry parsing to RAG-augmented legal reasoning — and compares three structurally distinct pipeline variants against 30 hand-authored ground-truth scenarios.

  • 9 Specialized Agents
  • 3 Pipeline Variants
  • 30 Test Scenarios
  • 244 Passing Tests

Python, LangGraph, Groq (LLaMA 3.3 70B), Qdrant, BGE-large, BM25, YOLOv8, DeepSORT, FastAPI, Streamlit, Docker, WeasyPrint

ARKIS | Trust-Aware Agentic RAG System

Epistemically grounded multi-agent retrieval system with contradiction detection and adaptive hybrid retrieval

A research-grade, trust-aware Retrieval-Augmented Generation (RAG) system that integrates domain gating, hybrid retrieval, evidence clustering, contradiction detection, and confidence calibration to minimize hallucinations in high-stakes environments.

  • 29+ Unit Tests (Pillar 2)
  • 4-Layer Hallucination Mitigation
  • Avg Confidence: 0.78
  • Ungrounded Responses: 0%

Python, LangGraph, SentenceTransformers (BGE), Qdrant, BM25, Hybrid Retrieval, FastAPI, Redis, Docker, Ollama (LLaMA 3)

Autobahn | Autonomous Perception & ADAS Stack

Production-grade multi-sensor perception engine with ISO 26262 safety architecture and real-time latency guarantees

A modular ADAS perception and safety stack integrating camera, LiDAR, and radar fusion with interaction-aware prediction, explainable AI, safety diagnostics, and scenario validation, built to mirror German OEM architecture principles.

  • Mean Latency/Stage: <0.5 ms
  • 179 Passing Tests
  • 3-Modal Sensor Fusion
  • 20+ ADAS Scenarios

Python, PyTorch, ONNX, ONNX Runtime, OpenCV, NumPy, Scikit-learn, DeepSORT / ByteTrack, MsgPack + GZip, Streamlit, GitHub Actions CI, ISO 26262

Skills & Technologies

Tools and technologies I work with across the ML stack.

💻 Languages

Python, C++, TypeScript, SQL

🧠 ML Frameworks

PyTorch, TensorFlow, JAX, Keras, scikit-learn, Hugging Face

🔬 Deep Learning

Transformers, CNNs, GANs, RNNs/LSTMs, Diffusion Models, Graph Neural Nets

🤖 LLM & Agents

LangChain, LangGraph, RAG, Fine-tuning, Prompt Engineering, Multi-Agent Systems

⚙️ MLOps & Infra

Docker, Kubernetes, MLflow, Weights & Biases, DVC, Airflow

📊 Data & Databases

PostgreSQL, MongoDB, Redis, Pinecone, ChromaDB, Pandas

☁️ Cloud & Compute

AWS, GCP, CUDA, TensorRT, NVIDIA Jetson

⚛️ Quantum Computing

Qiskit, PennyLane, Cirq, JAX, Flax, Quantum ML

🛠️ Tools & Practices

Git, Linux, CI/CD, FastAPI, Jupyter, VS Code

Certifications

Professional credentials validating deep learning expertise.

NVIDIA Deep Learning Institute (DLI), 2024 (Verified)

Fundamentals of Deep Learning

Comprehensive certification covering neural network architectures, training techniques, and deployment strategies using NVIDIA tools.

NVIDIA Deep Learning Institute (DLI), 2024 (Verified)

Building Transformer-Based NLP Applications

Advanced certification on transformer architectures, attention mechanisms, and NLP application development with GPU-accelerated computing.


Get in Touch

Interested in research collaboration, PhD supervision, or discussing evaluation methodology in ML? I'd love to hear from you.

Open to PhD positions