marketbayesmeta is a focused Python/PyMC library for audit-oriented Bayesian
meta-analysis of marketing measurement evidence. It is designed for small-sample
workflows where evidence comes from geo tests, brand lift studies, MMM, and related
measurement sources.
The core stance is conservative: pool only comparable effects, keep units and uncertainty
assumptions explicit, and treat model output as review material rather than automatic
reporting.
If PyMC fails to import, recreate .venv before running model fits. Tracker and config
validation are still useful without sampling, but analysis runs require a working
PyMC/ArviZ stack.
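A quick pre-flight check can confirm that the sampling stack imports before any model fit is attempted. The sketch below is not part of marketbayesmeta: the helper name `import_failures` is hypothetical, and it assumes the stack is importable under the module names `pymc` and `arviz`.

```python
import importlib


def import_failures(names):
    """Try to import each module and return {name: error text} for those that fail."""
    failures = {}
    for name in names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # a broken install can raise more than ImportError
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


if __name__ == "__main__":
    failed = import_failures(["pymc", "arviz"])  # assumed module names
    if failed:
        for name, reason in failed.items():
            print(f"{name} failed to import ({reason}); recreate .venv before fitting.")
    else:
        print("PyMC/ArviZ stack imports cleanly.")
```

An actual import is attempted, rather than a lookup only, because a package can be installed and still fail at import time when its dependencies are inconsistent.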
Avoid mixed base conda/user-site environments for release validation. Either activate
.venv or prefix checks with:
PATH=.venv/bin:$PATH make check-statistical
Developer checks
make check
make check-statistical
make check is required before finalising code or documentation changes.
make check-statistical runs sampling-based contract tests and should pass in the
environment used for model work.
The example config points to examples/example_tracker.csv and writes outputs to:
output/example_geo_sales_uplift/
The example is intentionally small. It should complete and write the full artefact set,
but it is not expected to be reportable because readiness is directional.
Check the config
marketbayesmeta-check-config examples/config.yaml
Run the config
marketbayesmeta-run examples/config.yaml
Review the result
Start with:
analysis_report.md
run_status.json
readiness.csv
diagnostics.csv
prior_diagnostics.csv
effect_preparation.csv
ppc.csv
The relevant question is not just whether the run completed. It is whether
run_status.reportable is true after reviewing diagnostics, readiness, uncertainty
provenance, and sensitivity outputs.
How-To Guides
Task-oriented guides for preparing evidence, running analyses, and reviewing outputs.
rows marked analysis-ready but using a unit incompatible with the requested scale.
Estimand judgement
The tracker and config enforce metric names and units, but they cannot prove that studies
answer the same business question. Before pooling, confirm that population, measurement
window, study design, and estimand are comparable enough for a pooled effect to mean
something.
Run an Analysis
Use a YAML config when an analysis should be repeatable and reviewable.
Override these only for explicit exploratory work. Record the reason in project.notes
or the downstream analysis review.
Paths
data.tracker and outputs.directory are resolved relative to the config file when
they are not absolute paths.
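The resolution rule can be sketched with `pathlib`. This is a minimal illustration of the behaviour described above, not the package's own code; the function name `resolve_from_config` and the tracker file name are hypothetical, and whether the package also normalises the result is not assumed.

```python
from pathlib import Path


def resolve_from_config(config_path, value):
    """Absolute paths pass through; relative paths are anchored at the config file's directory."""
    candidate = Path(value)
    if candidate.is_absolute():
        return candidate
    return Path(config_path).parent / candidate


# Hypothetical config value "tracker.csv" inside examples/config.yaml
print(resolve_from_config("examples/config.yaml", "tracker.csv"))
```

A consequence of this rule is that a config can be run from any working directory and still find the same tracker and output directory.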
Reportability
Successful execution does not mean the result is reportable. Always review
run_status.reportable and the supporting artefacts before using pooled summaries.
Review Outputs
Config runs write a compact artefact set intended for analyst review.
First files to open
| File | Review question |
| --- | --- |
| analysis_report.md | What is the run status, readiness, and headline result? |
| run_status.json | Is reportable true? Why did the run complete, warn, block, or fail? |
| readiness.csv | How many comparable rows were eligible, and are they enough? |
| diagnostics.csv | Did sampler diagnostics pass configured thresholds? |
| prior_diagnostics.csv | Do priors imply warnings or implausible prior-predictive ranges? |
| effect_preparation.csv | Which rows were included, transformed, excluded, or scenario-based? |
| ppc.csv | Are individual studies out of line with in-sample posterior predictive checks? |
Status meanings
run_status.json may report:
completed
completed_with_warnings
blocked
failed
A completed run is not automatically reportable. Treat reportable: false as a hard
prompt for analyst review before any downstream presentation.
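A downstream script can make this gate explicit by reading run_status.json before touching pooled summaries. The sketch below assumes the file is a flat JSON object; the `reportable` field is named in this documentation, while the `status` key name and the helper `read_run_status` are assumptions for illustration.

```python
import json
from pathlib import Path


def read_run_status(run_status_path):
    """Return (status, reportable); anything other than a literal true counts as not reportable."""
    payload = json.loads(Path(run_status_path).read_text(encoding="utf-8"))
    status = payload.get("status")  # assumed key name
    reportable = payload.get("reportable") is True
    return status, reportable
```

Treating a missing or non-boolean flag as not reportable keeps the default on the cautious side, so a malformed status file cannot pass the gate by accident.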
Common reasons a run is not reportable
Readiness is directional or not_recommended.
Sampler diagnostics failed or were downgraded to warnings.
The tracker contains blocking duplicate rows.
Prior diagnostics or sensitivity indicate material prior dependence.
One or more rows rely on scenario uncertainty.
Run Sensitivity
Small-sample Bayesian meta-analysis can be prior-sensitive and uncertainty-sensitive.
For serious reporting, sensitivity is part of the result.
Enable prior sensitivity
sensitivity:
  prior: true
When prior_specs are omitted, the package uses a scale-aware grid:
log_relative: regularising, default, weak
percentage_point: regularising, default, weak
Review prior_sensitivity.csv for pooled mean, probability positive, interval width,
diagnostics, and deltas versus the default prior.
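To make "delta versus the default prior" concrete, the sketch below subtracts the default-prior value of a metric from the value under each prior. The column names `prior` and `pooled_mean`, the helper name, and the numbers are all hypothetical; the real columns in prior_sensitivity.csv may differ.

```python
def deltas_vs_default(rows, metric, default_label="default"):
    """Return {prior label: metric value minus the value under the default prior}."""
    baseline = next(row[metric] for row in rows if row["prior"] == default_label)
    return {row["prior"]: row[metric] - baseline for row in rows}


# Illustrative values only, not output from a real run.
rows = [
    {"prior": "regularising", "pooled_mean": 0.04},
    {"prior": "default", "pooled_mean": 0.05},
    {"prior": "weak", "pooled_mean": 0.07},
]
print(deltas_vs_default(rows, "pooled_mean"))
```

Deltas that are large relative to the posterior interval width suggest the headline result depends materially on the prior.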
Enable uncertainty sensitivity
sensitivity:
  uncertainty: true
Rows with source-derived uncertainty keep that uncertainty. Rows with
uncertainty_scenario use the configured low, medium, or high model-scale standard
error.
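The rule above can be sketched as a small lookup. It assumes each row carries either a source-derived standard error or an `uncertainty_scenario` label naming one of the configured levels; the field name `standard_error`, the helper name, and the scenario values are hypothetical.

```python
def model_scale_se(row, scenario_se):
    """Pick the standard error for a row: source-derived first, otherwise the scenario value."""
    if row.get("standard_error") is not None:
        return row["standard_error"]
    return scenario_se[row["uncertainty_scenario"]]


# Placeholder scenario values on the model scale, for illustration only.
scenario_se = {"low": 0.05, "medium": 0.10, "high": 0.20}

print(model_scale_se({"standard_error": 0.08}, scenario_se))
print(model_scale_se({"standard_error": None, "uncertainty_scenario": "high"}, scenario_se))
```

Because scenario rows do not carry measured uncertainty, results that change noticeably across low, medium, and high should be reported as scenario-dependent.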
The audit mode reports model-readiness and tracker-quality checks for supported
evidence/scale pairs present in the tracker.
Check a config
marketbayesmeta-check-config examples/config.yaml
This validates the YAML schema and confirms that the configured tracker path exists.
Run an analysis
marketbayesmeta-run examples/config.yaml
This runs the full config-driven workflow and writes configured outputs. It returns a
non-zero exit code for package-level analysis failures, including blocked readiness,
tracker validation errors, pooling errors, or blocking diagnostic failures.
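When the command is called from a wrapper script, the exit code can be checked without raising. The sketch below uses only `subprocess` from the standard library; the helper name `exit_code` is hypothetical, and the specific non-zero values the CLI returns are not documented here, so only zero versus non-zero is distinguished.

```python
import subprocess


def exit_code(command):
    """Run a command given as a list of arguments and return its exit code."""
    completed = subprocess.run(command, check=False)
    return completed.returncode
```

For example, `exit_code(["marketbayesmeta-run", "examples/config.yaml"])` returning anything other than 0 indicates a package-level failure. A zero exit code still does not imply a reportable result, so run_status.json should be reviewed either way.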
marketbayesmeta produces review artefacts, not automatic decisions.
Reportability
A completed run is not automatically reportable. Check:
run_status.reportable
readiness.csv
diagnostics.csv
prior_diagnostics.csv
sensitivity outputs
analysis_report.md
Future true effect
future_true_effect_summary.csv describes the latent true effect for a comparable future
study. It excludes measurement error and is not a prediction for the next observed study
estimate.
Posterior predictive checks
ppc.csv is an in-sample study-level posterior predictive check. It is useful for
spotting studies that look out of line with the fitted model, but it is not leave-one-out
validation and should not be interpreted as out-of-sample accuracy.
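One way to picture an in-sample study-level check is a posterior predictive tail probability: for each posterior draw, ask how likely a replicated estimate at or below the observed one would be, then average over draws. The sketch below is an assumption about the general technique, not a description of how ppc.csv is computed; the function name and inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean


def ppc_tail_probability(y, s, mu_draws, tau_draws):
    """Average P(y_rep <= y) over draws, with y_rep ~ Normal(mu, sd = sqrt(tau^2 + s^2))."""
    return fmean(
        NormalDist(mu, sqrt(tau**2 + s**2)).cdf(y)
        for mu, tau in zip(mu_draws, tau_draws)
    )
```

Values close to 0 or 1 flag a study that sits in the tails of what the fitted model expects. Because the same study also informed the fit, the check is optimistic compared with leave-one-out validation.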
Reporting language
Recommended reporting should distinguish:
the pooled posterior estimate for comparable evidence;
the latent true-effect distribution for a future comparable study;
uncertainty-scenario dependence;
prior dependence;
diagnostics and readiness caveats.
Release Status
The current internal release candidate is 0.3.0.
Suitable use
0.3.0 is suitable for supervised Data Science analyst workflows where outputs are
reviewed before reporting. It is not intended for unattended production reporting.
Validation snapshot
Validated locally on June 3, 2026 using the repository virtual environment:
PATH=.venv/bin:$PATH make check: passed
PATH=.venv/bin:$PATH make check-statistical: passed
Leave-one-out influence summaries are not part of the default workflow.
This private repository intentionally does not use CI/CD.
FAQ
Does the library fit Bayesian random-effects models with PyMC?
Yes. The default model is a Bayesian normal-normal random-effects meta-analysis in PyMC.
Random effects are a sensible default for comparable marketing measurement studies
because campaigns, markets, periods, execution, and measurement designs often differ.
When should studies not be pooled?
Do not pool studies just because they share a broad label such as “sales uplift” or
“awareness”. Pool only when the estimand, scale, population, measurement window, and
study design are comparable enough for a pooled effect to mean something.
MMM, PA2, ROI, CPA, CPO, and iROAS should usually be triangulated rather than pooled
unless harmonised explicitly upstream.
How many studies are enough?
There is no universal threshold. With two studies, random-effects meta-analysis is
usually not recommended. With three to five studies, the pooled estimate can be useful
but should be treated as directional and prior-sensitive.
By default, config runs require at least three comparable rows for an exploratory
directional run and at least four for ready.
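The default row-count thresholds can be written as a small classifier. This is a simplified sketch: it uses only the count of comparable rows, whereas the package's readiness assessment may weigh other factors. The function name is hypothetical, and labelling counts below three as `not_recommended` is an assumption based on the readiness values mentioned elsewhere in this documentation.

```python
def classify_readiness(n_comparable_rows, min_directional=3, min_ready=4):
    """Map a count of comparable rows to a readiness label using the default thresholds."""
    if n_comparable_rows >= min_ready:
        return "ready"
    if n_comparable_rows >= min_directional:
        return "directional"
    return "not_recommended"  # assumed label for counts below the directional threshold


for n in (2, 3, 4):
    print(n, classify_readiness(n))
```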
Where do priors come from?
Priors should come from defensible external information and explicit scale calibration,
not from tuning the model until the result looks convenient.
Defaults are scale-aware:
log_relative: mu_sd=1.0, tau_scale=0.5
percentage_point: mu_sd=10.0, tau_scale=5.0
For serious reporting, set and justify priors in YAML and run prior sensitivity.
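To judge whether a prior scale is defensible, it helps to translate `mu_sd` into the range of effects it treats as plausible. The sketch below computes a central interval for a Normal prior; it assumes the prior on the pooled mean is centred at zero and that `log_relative` means the natural log of a ratio, neither of which is confirmed here. The function name is hypothetical.

```python
from math import exp
from statistics import NormalDist


def central_prior_interval(mu_sd, mass=0.95, center=0.0):
    """Central interval of a Normal(center, mu_sd) prior covering the given probability mass."""
    z = NormalDist().inv_cdf(0.5 + mass / 2)
    return center - z * mu_sd, center + z * mu_sd


low, high = central_prior_interval(1.0)          # log_relative default mu_sd
print("ratio range:", exp(low), exp(high))       # back-transformed to a multiplicative effect

low_pp, high_pp = central_prior_interval(10.0)   # percentage_point default mu_sd
print("percentage-point range:", low_pp, high_pp)
```

Under these assumptions the `log_relative` default spans ratios from roughly 0.14 to 7.1, and the `percentage_point` default spans about plus or minus 19.6 points, which shows why the defaults are starting points to justify rather than settings to accept unexamined.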
Why not use LOO by default?
LOO can be useful with enough independent studies, but it is easy to over-read at very
small K. In the main use case, readiness checks, sampler diagnostics, prior sensitivity,
and posterior predictive checks are more useful first-line review tools.
Methodology
Statistical framing for the default Bayesian model.
For study \(i = 1,\ldots,k\), the default model is a normal-normal Bayesian
random-effects model. The likelihood is marginalised over the latent study effects for
more stable small-\(k\) sampling:
\[
y_i \mid \mu, \tau \sim \mathrm{Normal}\!\left(\mu,\; s_i^2 + \tau^2\right)
\]
Here \(y_i\) is the observed model-scale effect and \(s_i\) is its known standard error;
the second argument of the Normal is a variance.
The pooled mean is \(\mu\), and \(\tau\) is the between-study heterogeneity (a standard deviation).
Posterior study-level \(\theta_i\) draws are exposed as derived quantities using the
Normal conditional distribution implied by \(y_i\), \(\mu\), \(\tau\), and \(s_i\).
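For the normal-normal model, the conditional distribution of a study effect given the observed estimate, the pooled mean, and the heterogeneity has a closed form: the mean shrinks the estimate toward the pooled mean with weight tau^2 / (tau^2 + s^2), and the variance is tau^2 s^2 / (tau^2 + s^2). The sketch below applies that standard result draw by draw; the function names are hypothetical and this is not the package's implementation.

```python
import random
from math import sqrt


def theta_conditional(y, s, mu, tau):
    """Mean and sd of theta | y, mu, tau in the normal-normal model."""
    weight = tau**2 / (tau**2 + s**2)  # weight on the observed estimate
    mean = mu + weight * (y - mu)      # shrinks y toward the pooled mean
    sd = sqrt(weight) * s              # sqrt(tau^2 * s^2 / (tau^2 + s^2))
    return mean, sd


def draw_theta(y, s, mu_draws, tau_draws, seed=0):
    """One theta draw per posterior (mu, tau) draw for a single study."""
    rng = random.Random(seed)
    draws = []
    for mu, tau in zip(mu_draws, tau_draws):
        mean, sd = theta_conditional(y, s, mu, tau)
        draws.append(rng.gauss(mean, sd))
    return draws
```

When tau is small relative to s, the weight approaches zero and the study effect collapses onto the pooled mean; when tau is large, the study keeps most of its own estimate.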
The model also samples future_true_effect, representing the latent true effect for a
comparable future study. It excludes sampling error, so it is not a prediction for the
next observed study estimate.
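The difference between a future true effect and a future observed estimate can be seen in a small simulation. The sketch below holds the pooled mean and heterogeneity fixed at illustrative values for simplicity, whereas a real analysis would use posterior draws; the function name, the inputs, and the standard error of the hypothetical future study are all assumptions.

```python
import random
from statistics import pstdev


def simulate_future(mu, tau, s_new, n_draws=5000, seed=0):
    """Draw latent true effects, then add sampling error to get observed estimates."""
    rng = random.Random(seed)
    true_effects = [rng.gauss(mu, tau) for _ in range(n_draws)]
    observed = [rng.gauss(theta, s_new) for theta in true_effects]
    return true_effects, observed


# Illustrative values only.
true_effects, observed = simulate_future(mu=0.05, tau=0.2, s_new=0.3)
print("sd of future true effect:      ", pstdev(true_effects))
print("sd of future observed estimate:", pstdev(observed))
```

The observed estimates are more dispersed because their variance is tau^2 + s_new^2 rather than tau^2, so an interval for the future true effect understates the spread of the next study's reported estimate.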
Small-sample stance
When study counts are small, posterior intervals, prior sensitivity, and uncertainty
scenario sensitivity are more informative than a single headline pooled estimate.
prior_diagnostics.csv includes a fixed-effect precision approximation for the Normal
prior on mu and prior-predictive checks for the tau prior. Treat warnings as prompts
to inspect prior sensitivity, not as formal pass/fail tests.
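One reading of a fixed-effect precision approximation is to compare the precision of the Normal prior on mu with the total inverse-variance precision of the studies, ignoring heterogeneity. The sketch below follows that reading; it is an assumption about the general idea, not the package's calculation, and the function name is hypothetical.

```python
def prior_weight_fixed_effect(mu_sd, standard_errors):
    """Share of total precision contributed by the prior under a fixed-effect approximation."""
    prior_precision = 1.0 / mu_sd**2
    data_precision = sum(1.0 / s**2 for s in standard_errors)
    return prior_precision / (prior_precision + data_precision)


# A prior with mu_sd = 1.0 against three studies with standard error 1.0 each
print(prior_weight_fixed_effect(1.0, [1.0, 1.0, 1.0]))  # 0.25
```

A large share means the prior contributes a substantial part of the information about the pooled mean, which is a cue to inspect the prior sensitivity outputs.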