Team

Who runs the Macro Tracker Lab macro tracking benchmark?

The Macro Tracker Lab benchmark is built and maintained by a small, named team of AI researchers and engineers spanning machine learning, computer vision, MLOps and AI evaluation, plus an AI product manager. We wrote the benchmark methodology, and every app review on this site names one of us as its reviewer.

Research leadership

Dr. Naomi Vargas

Director of AI Research & Lead Author · Brooklyn, NY

Naomi sets the research agenda for the Macro Tracker Lab benchmark and is the named lead author of the 2026 cycle. Her background is in evaluating vision-language models on real-world consumer tasks; before Macro Tracker Lab she led applied research at an industrial computer-vision lab.

She designed the gram-weighed reference-plate protocol that anchors the benchmark, including the move away from menu-photography training data, a shift the leading apps now follow. She also owns the cross-site inter-rater reconciliation with AI Calorie Tracker and Food-Trackers.com.

Areas of expertise

  • Vision-language model evaluation
  • Benchmark protocol design
  • Real-world AI accuracy

Credentials

  • PhD Machine Learning
  • Prior: Applied research lead, industrial vision lab

Selected work

  • Reference-plate methodology for AI food-tracking evaluation, Macro Tracker Lab methodology paper
  • Bias in menu-photography training data for portion estimation, Workshop on Vision-Language Models

Lead reviewer on: Welling, MyFitnessPal, Lose It!, Cronometer

AI & engineering

Marcus Holm

Senior Benchmark Engineer · Stockholm, SE

Marcus owns day-to-day execution of the benchmark, the evaluation harness, the 21-day real-world use studies, and the per-cuisine sub-tests. He instrumented the timing pipeline that produces the median capture-speed metric every app on this site is scored on.

Before joining Macro Tracker Lab he led data engineering at a consumer health-data platform, where he built the methodology for the company's first publicly reproducible accuracy report.

Areas of expertise

  • Evaluation harness design
  • Data engineering
  • Latency and throughput instrumentation

Credentials

  • MSc Computer Science
  • Prior: Data engineering lead, consumer health-data platform

Lead reviewer on: MacroFactor, Yazio, Lifesum

Priya Banerjee

Computer Vision Lead · Zürich, CH

Priya leads the computer-vision side of the benchmark: standing up the harness that runs the 22,400 reference meals through every app's capture flow, instrumenting per-frame timing, and analysing portion-error distributions cuisine by cuisine.

Her published work on multi-component plate segmentation underpins how the benchmark distinguishes a true composite-plate failure from a single-food misclassification.

Areas of expertise

  • Food image segmentation
  • Portion-grounding evaluation
  • On-device vision benchmarks

Credentials

  • MSc Computer Vision (ETH Zürich)
  • Prior: Senior CV engineer, mobile imaging team

Selected work

  • Per-component portion error in mixed-plate logging, Computer Vision and Pattern Recognition workshop

Lead reviewer on: Foodvisor, SnapCalorie

Dr. Jordan Oliver

Senior Machine Learning Engineer · Toronto, CA

Jordan owns the model-evaluation harness: how each app's predictions are scored against gram-weighed ground truth without unfairly penalising honest refusals or correct top-3 answers.

Their background is in foundation-model evaluation outside the food domain, which shows in the benchmark's separation of identification accuracy from portion grounding: two metrics most consumer comparisons still conflate.

Areas of expertise

  • Foundation model evaluation
  • Vision-language model probing
  • Benchmark fairness

Credentials

  • PhD Computer Science (Machine Learning)
  • Prior: Research scientist, applied ML lab

Dr. Rohit Kapoor

AI Evaluation Researcher · Bengaluru, IN

Rohit designs the scoring rubric that turns raw model outputs into a single composite number: work most users never see, but which decides who tops the leaderboard. He calibrates the weights given to identification, portion grounding, capture speed and coverage.

His prior research focused on robustness benchmarks for production AI systems, and he is responsible for the failure-mode analysis we run when an app pushes a model update mid-cycle.
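
This page does not publish the rubric's weights or its functional form. As an illustration only, here is a minimal Python sketch that assumes the composite is a weighted sum of the four components named above, each already normalised to a 0 to 100 scale. The weights, the `SubScores` type and the `composite_score` function are hypothetical placeholders, not the benchmark's calibrated values.

```python
from dataclasses import dataclass

# HYPOTHETICAL placeholder weights for illustration only; the benchmark's
# calibrated weights are not stated on this page. They must sum to 1.0.
HYPOTHETICAL_WEIGHTS = {
    "identification": 0.35,
    "portion_grounding": 0.35,
    "capture_speed": 0.15,
    "coverage": 0.15,
}


@dataclass
class SubScores:
    """Per-app component scores, each assumed already normalised to 0-100."""
    identification: float
    portion_grounding: float
    capture_speed: float
    coverage: float


def composite_score(s: SubScores, weights: dict = HYPOTHETICAL_WEIGHTS) -> float:
    """Weighted sum of the four components (a sketch, not the real rubric)."""
    # Allow for floating-point rounding when checking the weights sum to 1.
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    total = 0.0
    for name, w in weights.items():
        value = getattr(s, name)  # look up the matching component score
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} must be in [0, 100], got {value}")
        total += w * value
    return total
```

For example, sub-scores of 80, 60, 90 and 70 work out to 0.35·80 + 0.35·60 + 0.15·90 + 0.15·70 = 73 under these placeholder weights. A real rubric also has to decide how a latency such as capture speed, where lower is better, maps onto the 0 to 100 scale; the sketch leaves that mapping as an assumption.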

Areas of expertise

  • AI system evaluation
  • Robustness benchmarking
  • Composite scoring rubrics

Credentials

  • PhD Computer Science
  • Prior: AI evaluation researcher, large-model lab

Ellie Cho

MLOps & Data Engineer · Seoul, KR

Ellie keeps the benchmark reproducible: she runs the dataset versioning, the device farm (an iPhone 16 Pro, Pixel 9 Pro, Galaxy S25 Ultra, OnePlus 13 and iPhone 14 fleet), and the pipeline that ingests every app's API or capture flow into a shared evaluation store.

Her work is the reason a benchmark cycle can be re-run in days rather than weeks, and the reason an outside researcher can ask for the per-cuisine confusion matrix and actually receive it.

Areas of expertise

  • MLOps pipelines
  • Device farm operations
  • Dataset versioning

Credentials

  • MSc Software Engineering
  • Prior: MLOps lead, AI infrastructure startup

Product & analysis

Sofia Mendes

AI Product Manager · Lisbon, PT

Sofia translates the engineering output into the comparisons readers actually want, such as head-to-heads and best-for categories, and sets the editorial voice across the site. She writes the brief for every app review.

She previously shipped two consumer AI products to market and brings a working knowledge of how the apps in our benchmark are actually built, which sharpens the qualitative read on each of them.

Areas of expertise

  • Consumer AI product
  • Editorial strategy
  • Comparative AI reviews

Credentials

  • MBA, focus on AI / consumer tech
  • Prior: PM, two shipped consumer AI products

Lead reviewer on: Carbon

How we work

How does the Macro Tracker Lab team produce a benchmark cycle?

Every quarter the team runs the full benchmark protocol against every app on the site. Engineering (Priya, Jordan, Marcus) operates the evaluation harness and the model-scoring pipeline; Rohit owns the scoring rubric and the robustness analysis; Ellie keeps the MLOps stack and device farm reproducible from one cycle to the next; Sofia leads the comparative writing and editorial brief. Naomi sets the protocol, signs off on the final scoring, and is the named lead author of the 2026 cycle.

We publish what we test and what we exclude, and we update the methodology document whenever the protocol changes. Inter-rater agreement is cross-checked against AI Calorie Tracker and Food-Trackers.com each cycle. No one on the team accepts personal compensation from the apps we rank.

For methodology questions, academic data requests, or research collaborations: research@macro-trackers.com.