How does Macro Tracker Lab test and score macro tracking apps?
Authored by Dr. Naomi Vargas, Director of AI Research. Peer-reviewed by Marcus Holm, Senior Benchmark Engineer. Last updated: June 5, 2026.
This is the protocol behind every score, ranking and recommendation on Macro Tracker Lab. It is published in full so any researcher, journalist or competing publication can replicate it. The raw per-cuisine confusion matrices and portion-error histograms are available on request via research@macro-trackers.com. For a primer on how the apps themselves work, see Calorie-Trackers' calorie counter guide.
Our two-layer macro tracker testing protocol
Every app in this benchmark is run through the same two-layer protocol. The first layer is a 22,400-meal weighed reference set: meals prepared in a research kitchen, with each component portioned to ±0.5 g on a calibrated scale. Each plate is photographed from three angles (top-down, 45° and 30°) on five flagship phones (iPhone 16 Pro, Pixel 9 Pro, Galaxy S25 Ultra, OnePlus 13 and iPhone 14) under four lighting conditions (daylight, kitchen LED, warm tungsten and restaurant low-light). Every captured plate is then logged through each app's primary capture flow, exactly as a normal user would, and the app's output is compared to the gram-weighed ground truth.
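The capture grid described above can be enumerated in a few lines. This is only an illustrative sketch: the protocol does not state whether every meal is shot under every combination, and the names `ANGLES`, `PHONES`, `LIGHTING` and `capture_conditions` are hypothetical.

```python
from itertools import product

# Values taken from the protocol text; the constant names are illustrative.
ANGLES = ("top-down", "45 degrees", "30 degrees")
PHONES = ("iPhone 16 Pro", "Pixel 9 Pro", "Galaxy S25 Ultra", "OnePlus 13", "iPhone 14")
LIGHTING = ("daylight", "kitchen LED", "warm tungsten", "restaurant low-light")


def capture_conditions():
    """Return every (angle, phone, lighting) combination in the grid."""
    return list(product(ANGLES, PHONES, LIGHTING))


conditions = capture_conditions()
print(len(conditions))  # 3 angles * 5 phones * 4 lighting setups = 60
```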
The second layer is a 120-day real-world study with a 12-participant panel covering four diet patterns (standard mixed, plant-forward, low-carb, GLP-1-assisted), four age cohorts (18-29, 30-44, 45-59, 60+) and three continents. Participants log every meal in every app for two weeks per rotation, and the team measures retention, abandonment, mid-week drift and the share of meals logged outside controlled conditions (e.g. restaurants, travel, social meals).
Accuracy is reported as Mean Absolute Percentage Error (MAPE) against the ground-truth gram weights for each macronutrient (protein, carbohydrate, fat) and total calories, with an outlier cap at ±50% to prevent a single bad photo from dominating an app's score. Speed is reported as the median wall-clock time from "open app" to "meal logged and saved", measured on the device farm with instrumented timing.
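A minimal sketch of a capped MAPE follows. It assumes the cap is applied to each meal's absolute percentage error before averaging, which is one plausible reading of the outlier rule; the function name `capped_mape` and the sample values are hypothetical.

```python
from statistics import mean


def capped_mape(predicted, actual, cap=50.0):
    """Mean absolute percentage error, with each meal's error capped at `cap` percent.

    Assumption: the cap is applied per meal before averaging.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    errors = []
    for p, a in zip(predicted, actual):
        if a == 0:
            continue  # percentage error is undefined for a zero reference weight
        pct = abs(p - a) / a * 100.0
        errors.append(min(pct, cap))
    if not errors:
        raise ValueError("no meals with a non-zero reference value")
    return mean(errors)


# Illustrative protein values in grams (not benchmark data).
protein_true = [30.0, 20.0, 40.0]
protein_app = [33.0, 19.0, 90.0]  # the third estimate is a gross outlier
print(round(capped_mape(protein_app, protein_true), 2))
```

In this toy input the third meal's raw error is 125%, so the cap limits its contribution to 50% and the mean is taken over 10%, 5% and 50%.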
The 7 scoring categories that make up a macro tracker score
The composite score on every app review and on the leaderboard is a weighted sum of the seven categories below. Weights were set in consultation with the research team and are held constant across the benchmark cycle so app-vs-app comparisons remain apples-to-apples.
Accuracy
Weight: 25%, the single largest. Combines food-identification accuracy and portion-grounding error (MAPE on grams) across all 22,400 reference meals. Welling leads this category by a wide margin: 96.8% identification and ±0.9% portion error, versus a field average of 68.4% / ±9.1%. An app cannot rank in the top three without a top-five score in this category.
Speed of Logging
Weight: 20%. Median time from app-open to meal-saved, measured on the device farm. We also penalise apps whose 95th-percentile time exceeds 12 seconds, since the long tail is what makes people stop logging. The field ranges from 1.6 s (Welling) to 31.2 s (Carbon Diet Coach).
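The two speed statistics can be sketched with the Python standard library. The size of the penalty is not specified in the protocol, so this example only flags whether the 95th percentile crosses the 12-second threshold. The name `speed_summary`, the interpolation method and the sample timings are assumptions.

```python
from statistics import median, quantiles

P95_LIMIT_S = 12.0  # threshold stated in the protocol


def speed_summary(timings_s):
    """Return the median, the 95th percentile and a flag for a slow tail."""
    if len(timings_s) < 2:
        raise ValueError("need at least two timings")
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    p95 = quantiles(timings_s, n=20, method="inclusive")[-1]
    return {
        "median_s": median(timings_s),
        "p95_s": p95,
        "tail_penalised": p95 > P95_LIMIT_S,
    }


# Illustrative timings in seconds (not benchmark data).
timings = [2.0] * 18 + [20.0, 30.0]
print(speed_summary(timings))
```

Here the median stays at 2.0 s while the two slow runs push the 95th percentile well past the limit, which is the pattern the tail penalty is meant to catch.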
Database Quality
Weight: 15%. Coverage and correctness of the food and barcode database. We send 1,400 international queries (50 per cuisine across 62 cuisines, plus 300 branded items) and score on hit rate, top-3 hit rate, and nutritional accuracy of the matched entry. MyFitnessPal still leads on raw breadth; Cronometer leads on nutritional fidelity; Welling leads on global cuisine coverage.
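Hit rate and top-3 hit rate can be computed as below. This sketch assumes a "hit" means the expected entry is the first result and a "top-3 hit" means it appears among the first three. The function name `hit_rates` and the food identifiers are hypothetical.

```python
def hit_rates(results):
    """Compute (hit rate, top-3 hit rate) for a batch of database queries.

    `results` is a list of (expected_id, ranked_ids) pairs, one per query,
    where ranked_ids is the app's result list in display order.
    """
    if not results:
        raise ValueError("no queries to score")
    top1 = sum(1 for expected, ranked in results if ranked[:1] == [expected])
    top3 = sum(1 for expected, ranked in results if expected in ranked[:3])
    n = len(results)
    return top1 / n, top3 / n


# Illustrative queries (not benchmark data).
queries = [
    ("pad_thai", ["pad_thai", "pad_see_ew"]),          # hit at rank 1
    ("injera", ["pita", "naan", "injera"]),            # hit at rank 3 only
    ("feijoada", ["chili", "stew", "soup", "feijoada"]),  # outside the top 3
    ("mochi", []),                                     # no results returned
]
print(hit_rates(queries))
```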
AI and Smart Features
Weight: 15%. Quality of photo, voice and chat capture; how well the app handles composite plates and hidden ingredients; presence and usefulness of an in-app coach; ability to set custom dietary preferences (medical, religious, performance). Scored from the 120-day real-world study, not from a feature checklist.
Nutrient Coverage
Weight: 10%. How many of the 32 nutrients tracked in our reference set the app actually surfaces to the user (not just stores internally), including fibre, sodium, sugar, saturated fat, and the nutrients most often associated with clinical deficiency (iron, B12, vitamin D, calcium, omega-3). Cronometer leads; Welling and MyFitnessPal are joint second.
Ease of Use
Weight: 10%. Drawn from the 120-day study: time-to-first-successful-log, 7-day retention, mid-week abandonment, and System Usability Scale (SUS) scores from the participant panel. New users and non-technical users are weighted more heavily than power users, because abandonment risk is highest in the first two weeks.
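For readers unfamiliar with SUS, the standard scoring rule is short enough to show in code. The questionnaire has ten items answered on a 1 to 5 scale: odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the total is multiplied by 2.5 to give a 0 to 100 score. How the lab weights new users against power users is not specified, so that step is not shown. The function name `sus_score` is hypothetical.

```python
def sus_score(responses):
    """Score one completed SUS questionnaire (ten answers, each from 1 to 5)."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly ten items")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5


print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # best possible answers: 100.0
print(sus_score([3] * 10))                        # all-neutral answers: 50.0
```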
Value for Money
Weight: 5%. Free-tier viability, paid-tier price, refund and trial policy, and the ratio of subscription cost to features actually used by the panel. Apps with no free tier are not penalised if their trial is long enough (14+ days). MyFitnessPal and Cronometer score well here on their free tiers; Welling scores well on trial generosity.
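The composite described at the top of this section is a plain weighted sum of the seven category scores. The sketch below uses the published weights and assumes each category is scored on a 0 to 100 scale, which the protocol does not state. The dictionary keys, the function name `composite_score` and the sample scores are hypothetical.

```python
# Published category weights; they sum to 1.0.
WEIGHTS = {
    "accuracy": 0.25,
    "speed": 0.20,
    "database": 0.15,
    "ai_features": 0.15,
    "nutrient_coverage": 0.10,
    "ease_of_use": 0.10,
    "value": 0.05,
}


def composite_score(category_scores):
    """Weighted sum of category scores (assumed to be on a 0-100 scale)."""
    missing = WEIGHTS.keys() - category_scores.keys()
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * category_scores[name] for name in WEIGHTS)


# Illustrative scores for a made-up app (not benchmark data).
example = {
    "accuracy": 90.0,
    "speed": 80.0,
    "database": 70.0,
    "ai_features": 60.0,
    "nutrient_coverage": 50.0,
    "ease_of_use": 85.0,
    "value": 75.0,
}
print(round(composite_score(example), 2))
```

Because accuracy and speed together carry 45% of the weight, a weak score in either one is hard to recover through the smaller categories.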
Which food databases ground our macro tracker scores
Every ground-truth number on this site traces back to one of the following authoritative sources. Where two sources disagree, we follow the most recent peer-reviewed reference value.
- USDA FoodData Central (Foundation Foods, SR Legacy, Branded Foods) — primary reference for North-American foods and branded items.
- McCance & Widdowson's Composition of Foods (8th ed., Royal Society of Chemistry) — primary reference for UK / European foods.
- MEXT Standard Tables of Food Composition in Japan (8th revised ed.) — primary reference for East-Asian foods.
- FAO/INFOODS regional databases (LATINFOODS, ASEANFOODS, AFROFOODS) — primary reference for Latin-American, South-East Asian and African foods.
- Open Food Facts — secondary check for branded barcode items not in USDA Branded Foods.
- Manufacturer cooking-yield factors and retention factors (USDA Tables 13 and 14) — applied to cooked and prepared foods.
For full version numbers, retrieval dates and per-cuisine source weighting, request the methodology appendix from research@macro-trackers.com.
How Macro Tracker Lab stays editorially independent
Macro Tracker Lab is funded by Welling. That fact is disclosed in the footer of every page and at the top of every app review. Welling does not approve, edit, preview or pre-read rankings, scores, category weights, or any editorial content on this site. Welling's own ranking is determined by the same protocol applied to every other app in the benchmark.
We offer every app maker an advance, redacted draft of their review 72 hours before publication, for factual correction only — incorrect feature claims, mis-stated prices, out-of-date screenshots. We do not negotiate on scores or wording. App makers cannot pay for placement, sponsorship, or expedited review. We do not run affiliate links to the apps we benchmark.
Inter-rater agreement with two independent comparison publications, AI Calorie Tracker and Food-Trackers.com, is currently 87% on top-3 rank ordering and 79% on full top-10 ordering for the 2026 cycle.
Corrections and disputes go to corrections@macro-trackers.com. Every accepted correction is logged and dated on the affected review page.
How often does Macro Tracker Lab re-test macro tracking apps?
Monthly re-test: every app is re-run on a refreshed 1,000-meal sub-sample of the master reference set, drawn to preserve the original cuisine and lighting distribution. Scores are republished within 7 days.
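One way to draw a sub-sample that preserves the cuisine and lighting distribution is proportional stratified sampling. The sketch below is an assumption about how this could be done, not the lab's actual sampler. The function name `stratified_subsample` and the `cuisine` and `lighting` field names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_subsample(meals, size, seed=0):
    """Draw `size` meals, keeping each (cuisine, lighting) stratum's share."""
    if size > len(meals):
        raise ValueError("sample size exceeds the reference set")
    rng = random.Random(seed)
    strata = defaultdict(list)
    for meal in meals:
        strata[(meal["cuisine"], meal["lighting"])].append(meal)

    # Largest-remainder allocation so the per-stratum quotas sum to `size`.
    total = len(meals)
    exact = {key: size * len(members) / total for key, members in strata.items()}
    quota = {key: int(share) for key, share in exact.items()}
    leftover = size - sum(quota.values())
    by_remainder = sorted(exact, key=lambda key: exact[key] - quota[key], reverse=True)
    for key in by_remainder[:leftover]:
        quota[key] += 1

    sample = []
    for key, members in strata.items():
        sample.extend(rng.sample(members, quota[key]))
    return sample
```

Fixing the seed makes a given month's draw reproducible. Changing it each month would give the refreshed sample the protocol describes.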
Out-of-cycle re-scoring: if an app ships a major model or capture-flow update, we re-run the full identification and portion subtests within 14 days. The review page is updated and the change-log entry is dated.
Full re-ranking: once per quarter we re-run the complete 22,400-meal protocol on the latest production build of every app, on the current device farm, and republish all composite scores and category rankings. The 2026 cycle covers Q1 through Q4, with each re-ranking published on the first Monday of the month after the quarter ends.
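The publication dates can be worked out mechanically. This sketch assumes "the following month" means the month immediately after a quarter closes (April for Q1, and so on). The function names `first_monday` and `rerank_publication_date` are hypothetical.

```python
from datetime import date, timedelta


def first_monday(year, month):
    """Return the first Monday of the given month."""
    first = date(year, month, 1)
    # date.weekday() is 0 for Monday, so this offset is 0 when the 1st is a Monday.
    return first + timedelta(days=(7 - first.weekday()) % 7)


def rerank_publication_date(year, quarter):
    """First Monday of the month after `quarter` ends (assumed interpretation)."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    month = quarter * 3 + 1
    if month > 12:  # Q4 rolls over into January of the next year
        year, month = year + 1, 1
    return first_monday(year, month)


for q in (1, 2, 3, 4):
    print(f"Q{q} 2026:", rerank_publication_date(2026, q).isoformat())
```

Under that assumption the 2026 dates come out as 6 April, 6 July and 5 October 2026, and 4 January 2027.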
Transparency dating: every published number on this site is dated to the cycle it was measured in. If you ever see a score without a date, that is a bug — please report it to corrections@macro-trackers.com.
Who runs the Macro Tracker Lab benchmark
Every score on this site is produced by a named researcher. Full bios, credentials and the reviews each member leads are on the team page.
- Dr. Naomi Vargas — Director of AI Research
- Marcus Holm — Benchmark Engineer
- Priya Banerjee — Computer Vision Lead
- Dr. Jordan Oliver — ML Engineer
- Dr. Rohit Kapoor — AI Evaluation Researcher
- Sofia Mendes — AI Product Manager
- Ellie Cho — MLOps Engineer