marketbayesmeta

marketbayesmeta is a focused Python/PyMC library for audit-oriented Bayesian meta-analysis of marketing measurement evidence. It is designed for small-sample workflows where evidence comes from geo tests, brand lift studies, MMM, and related measurement sources.

The core stance is conservative: pool only comparable effects, keep units and uncertainty assumptions explicit, and treat model output as review material rather than automatic reporting.

Current release: 0.3.0 internal release candidate

Audience: Data Science analysts reviewing marketing measurement evidence

Readiness: Supervised analyst use, not unattended production reporting

Where to start

You want to…	Start here
Install the package and run the first workflow	Quick Start
Prepare a tracker for modelling	Prepare a Tracker
Run a YAML-configured analysis	Run an Analysis
Review whether outputs are reportable	Review Outputs
Check exact YAML fields	Configuration Reference
Check command-line entry points	CLI Reference
Understand the Bayesian model	Statistical Model
Understand small-sample cautions	FAQ

Sections

Tutorials — learning-oriented install and first-run material.
How-To Guides — task-focused operational guides.
Reference — exact CLI, config, API, and output artefact reference.
Explanation — interpretation guidance, release status, and FAQ.
Methodology — statistical framing for the default model.

Tutorials

Learning-oriented guides for getting marketbayesmeta installed and running a first example workflow.

Installation

Use Python 3.11 or newer.

Install from a source checkout

git clone https://github.com/tandpds/marketbayesmeta.git
cd marketbayesmeta

python -m venv .venv
source .venv/bin/activate
python -m pip install -e ".[dev]"

Confirm that model dependencies import from the repository virtual environment:

which python
python - <<'PY'
import pymc
import numpy
print("pymc", pymc.__version__)
print("numpy", numpy.__version__)
PY

If PyMC fails to import, recreate .venv before running model fits. Tracker and config validation are still useful without sampling, but analysis runs require a working PyMC/ArviZ stack.

Avoid mixed base conda/user-site environments for release validation. Either activate .venv or prefix checks with:

PATH=.venv/bin:$PATH make check-statistical

Developer checks

make check
make check-statistical

make check is required before finalising code or documentation changes. make check-statistical runs sampling-based contract tests and should pass in the environment used for model work.

Quick Start

The quickest way to see the package workflow is to run the synthetic example config.

source .venv/bin/activate
python runme.py examples/config.yaml

The example config points to examples/example_tracker.csv and writes outputs to:

output/example_geo_sales_uplift/

The example is intentionally small. It should complete and write the full artefact set, but it is not expected to be reportable because readiness is directional.

Check the config

marketbayesmeta-check-config examples/config.yaml

Run the config

marketbayesmeta-run examples/config.yaml

Review the result

Start with:

analysis_report.md
run_status.json
readiness.csv
diagnostics.csv
prior_diagnostics.csv
effect_preparation.csv
ppc.csv

The relevant question is not whether the run completed. It is whether run_status.reportable is true and whether diagnostics, readiness, uncertainty provenance, and sensitivity outputs support that status.

How-To Guides

Task-oriented guides for preparing evidence, running analyses, and reviewing outputs.

Prepare a Tracker

The modelling workflow starts from a long-format tracker. Use one row per extracted effect estimate.

Required columns

client,evidence_type,metric,value,unit,period,source_type,source_status,source_file_or_link,owner_or_contact,analysis_ready,notes

Optional uncertainty columns

study_id,estimand_note,standard_error,ci_lower,ci_upper,ci_level,ci_type,p_value,p_value_sidedness,uncertainty_scenario

Supported pooled evidence

Evidence	Default treatment
`geo_test`	Pool comparable effects on `log_relative` scale.
`bls`	Pool comparable effects on `percentage_point` scale.
`mmm`, `pa2`, ROI, CPA, CPO, iROAS	Treat as triangulation unless harmonised upstream.
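As a rough illustration of what pooling geo uplift on the log_relative scale implies, the sketch below converts a percentage uplift to a log ratio using log(1 + uplift/100). The function name and the exact conversion are assumptions for illustration, not the package's implementation; the limit of 90 mirrors the documented default for analysis.max_abs_percent_uplift.

```python
import math


def percent_uplift_to_log_relative(percent_uplift: float, max_abs_percent: float = 90.0) -> float:
    """Convert a % uplift to a log ratio (hypothetical helper, not the package API)."""
    if abs(percent_uplift) > max_abs_percent:
        # Extreme uplifts are rejected rather than silently transformed.
        raise ValueError(
            f"uplift of {percent_uplift}% exceeds the +/-{max_abs_percent}% limit"
        )
    # log1p(x) computes log(1 + x) accurately for small x.
    return math.log1p(percent_uplift / 100.0)


if __name__ == "__main__":
    print(percent_uplift_to_log_relative(10.0))
```

For small uplifts the log ratio is close to the uplift expressed as a fraction, which is why the two scales are easy to confuse when units are not stated explicitly.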

Validate the tracker

marketbayesmeta-check-tracker path/to/tracker.csv
marketbayesmeta-check-tracker path/to/tracker.csv --audit --include-partial

Resolve or explicitly accept:

duplicate (evidence_type, client, metric, period) candidates;
missing source uncertainty;
interval rows missing ci_type or ci_level;
p-values without p_value_sidedness=two_sided;
rows marked analysis-ready but using a unit incompatible with the requested scale.
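The duplicate check in the list above can be approximated outside the package before running the validator. The sketch below uses only the standard library; the helper name and the sample rows are invented for illustration, and it is not a substitute for marketbayesmeta-check-tracker.

```python
import csv
import io
from collections import Counter

# Key documented above for duplicate candidates.
KEY_COLUMNS = ("evidence_type", "client", "metric", "period")


def duplicate_candidates(rows):
    """Return {key: count} for keys that appear on more than one tracker row."""
    counts = Counter(tuple(row[col] for col in KEY_COLUMNS) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Invented sample rows, reduced to the key columns only.
SAMPLE = """evidence_type,client,metric,period
geo_test,Client A,Sales uplift,2025-Q1
geo_test,Client A,Sales uplift,2025-Q1
bls,Client B,Awareness,2025-Q2
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(duplicate_candidates(rows))
```

A flagged key is only a candidate: two rows can share a key and still be distinct studies, which is why the guide asks for duplicates to be resolved or explicitly accepted.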

Estimand judgement

The tracker and config enforce metric names and units, but they cannot prove that studies answer the same business question. Before pooling, confirm that population, measurement window, study design, and estimand are comparable enough for a pooled effect to mean something.

Run an Analysis

Use a YAML config when an analysis should be repeatable and reviewable.

marketbayesmeta-check-config examples/config.yaml
marketbayesmeta-run examples/config.yaml

For local repository use:

python runme.py examples/config.yaml

Conservative default gates

Config runs stop on directional and not_recommended readiness by default, and failed sampler diagnostics are blocking by default.

diagnostics:
  allow_directional: false
  allow_not_recommended: false
  allow_duplicates: false
  fail_on_diagnostic_failure: true
  minimum_studies_ready: 4
  minimum_studies_directional: 3

Override these only for explicit exploratory work. Record the reason in project.notes or the downstream analysis review.

Paths

data.tracker and outputs.directory are resolved relative to the config file when they are not absolute paths.
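The resolution rule can be sketched with pathlib. The helper name is hypothetical and the package's own resolution code may differ in detail; the sketch only illustrates the documented behaviour.

```python
from pathlib import Path


def resolve_from_config(config_path: str, value: str) -> Path:
    """Resolve a config path value relative to the config file's folder."""
    candidate = Path(value)
    if candidate.is_absolute():
        # Absolute paths are used as given.
        return candidate
    # Relative paths are anchored at the directory containing the config file.
    return Path(config_path).parent / candidate


if __name__ == "__main__":
    print(resolve_from_config("examples/config.yaml", "example_tracker.csv"))
```

Under this rule, moving a config file together with its tracker keeps the run reproducible regardless of the working directory.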

Reportability

Successful execution does not mean the result is reportable. Always review run_status.reportable and the supporting artefacts before using pooled summaries.

Review Outputs

Config runs write a compact artefact set intended for analyst review.

First files to open

File	Review question
`analysis_report.md`	What is the run status, readiness, and headline result?
`run_status.json`	Is `reportable` true? Why did the run complete, warn, block, or fail?
`readiness.csv`	How many comparable rows were eligible, and are they enough?
`diagnostics.csv`	Did sampler diagnostics pass configured thresholds?
`prior_diagnostics.csv`	Do priors imply warnings or implausible prior-predictive ranges?
`effect_preparation.csv`	Which rows were included, transformed, excluded, or scenario-based?
`ppc.csv`	Are individual studies out of line with in-sample posterior predictive checks?

Status meanings

run_status.json may report:

completed
completed_with_warnings
blocked
failed

A completed run is not automatically reportable. Treat reportable: false as a hard prompt for analyst review before any downstream presentation.
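A downstream script can apply the same rule mechanically. The sketch below assumes run_status.json is a JSON object with a boolean reportable key, as referenced above; the status key name in the example payload is an assumption, and the helper is hypothetical.

```python
import json


def needs_analyst_review(run_status_text: str) -> bool:
    """Return True unless the run status explicitly says reportable is true."""
    status = json.loads(run_status_text)
    # A missing or non-boolean value is treated the same as false.
    return status.get("reportable") is not True


if __name__ == "__main__":
    example = '{"status": "completed", "reportable": false}'
    print(needs_analyst_review(example))
```

Treating a missing key as "needs review" keeps the check conservative if the artefact format changes.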

Common reasons a run is not reportable

Readiness is directional or not_recommended.
Sampler diagnostics failed or were downgraded to warnings.
The tracker contains blocking duplicate rows.
Prior diagnostics or sensitivity indicate material prior dependence.
One or more rows rely on scenario uncertainty.

Run Sensitivity

Small-sample Bayesian meta-analysis can be prior-sensitive and uncertainty-sensitive. For serious reporting, sensitivity is part of the result.

Enable prior sensitivity

sensitivity:
  prior: true

When sensitivity.prior_specs is omitted, the package uses a scale-aware grid:

log_relative: regularising, default, weak
percentage_point: regularising, default, weak

Review prior_sensitivity.csv for pooled mean, probability positive, interval width, diagnostics, and deltas versus the default prior.

Enable uncertainty sensitivity

sensitivity:
  uncertainty: true

Rows with source-derived uncertainty keep that uncertainty. Rows with uncertainty_scenario use the configured low, medium, or high model-scale standard error.

Scenario assumptions live in:

uncertainty:
  scenario_standard_errors:
    log_relative:
      low: 0.03
      medium: 0.08
      high: 0.15
    percentage_point:
      low: 1.0
      medium: 3.0
      high: 5.0

Report when reasonable sensitivity settings change the substantive conclusion.
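The precedence described above (source-derived uncertainty first, scenario value otherwise) can be sketched as a small lookup. The function name and signature are hypothetical; the numbers are the scenario values shown in the YAML above.

```python
# Scenario values copied from the YAML shown above.
SCENARIO_STANDARD_ERRORS = {
    "log_relative": {"low": 0.03, "medium": 0.08, "high": 0.15},
    "percentage_point": {"low": 1.0, "medium": 3.0, "high": 5.0},
}


def model_scale_standard_error(scale, source_standard_error=None, uncertainty_scenario=None):
    """Pick the standard error for one row (hypothetical helper, not the package API)."""
    if source_standard_error is not None:
        # Source-derived uncertainty always wins over a scenario assumption.
        return source_standard_error
    if uncertainty_scenario is None:
        raise ValueError("row has neither source uncertainty nor an uncertainty_scenario")
    return SCENARIO_STANDARD_ERRORS[scale][uncertainty_scenario]


if __name__ == "__main__":
    print(model_scale_standard_error("log_relative", uncertainty_scenario="medium"))
```

Rows that fall through to the scenario branch are the ones whose conclusions should be checked against the low and high settings.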

Reference

Exact operational and technical reference for marketbayesmeta.

CLI Reference

marketbayesmeta installs three console scripts.

Check a tracker

marketbayesmeta-check-tracker path/to/tracker.csv
marketbayesmeta-check-tracker path/to/tracker.csv --audit --include-partial

The audit mode reports model-readiness and tracker-quality checks for supported evidence/scale pairs present in the tracker.

Check a config

marketbayesmeta-check-config examples/config.yaml

This validates the YAML schema and confirms that the configured tracker path exists.

Run an analysis

marketbayesmeta-run examples/config.yaml

This runs the full config-driven workflow and writes configured outputs. It returns a non-zero exit code for package-level analysis failures, including blocked readiness, tracker validation errors, pooling errors, or blocking diagnostic failures.
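Because failures are signalled through the exit code, a wrapper script can gate later steps on it. The sketch below is a generic subprocess pattern, not part of the package; the helper name is hypothetical.

```python
import subprocess
import sys


def command_succeeded(command):
    """Run a command and report whether it exited with code 0."""
    # check=False so a non-zero exit code is returned instead of raising.
    completed = subprocess.run(command, check=False)
    return completed.returncode == 0


if __name__ == "__main__":
    # Stand-in command; in practice this would be
    # ["marketbayesmeta-run", "examples/config.yaml"].
    print(command_succeeded([sys.executable, "-c", "import sys; sys.exit(0)"]))
```

A zero exit code only shows that the run was not blocked; reportability still has to be read from run_status.json.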

Repository runner

python runme.py
python runme.py examples/config.yaml

Without an argument, runme.py looks for config.yaml in the repository root.

Configuration Reference

Use YAML config files for repeatable analysis runs.

Core fields

Field	Required	Default	Notes
`project.name`	No	`marketbayesmeta_analysis`	Human-readable project name.
`project.analyst`	No	`null`	Analyst or owner.
`project.notes`	No	`""`	Short project note.
`data.tracker`	Yes	None	CSV tracker path, resolved relative to the config file.
`data.include_partial`	No	`false`	Include `analysis_ready=partial` rows.
`analysis.evidence_type`	Yes	None	Currently pooled by default: `geo_test`, `bls`.
`analysis.metric`	Yes	None	Exact metric name to pool.
`analysis.scale`	Yes	None	`log_relative` or `percentage_point`.
`analysis.max_abs_percent_uplift`	No	`90`	Hard limit for `%` geo uplift to log-relative conversion.
`model.draws`	No	`2000`	Posterior draws per chain.
`model.tune`	No	`1000`	Tuning draws per chain.
`model.chains`	No	`4`	Number of chains.
`model.target_accept`	No	`0.9`	Passed to PyMC sampling.
`model.random_seed`	No	`null`	Sampling seed.
`model.progressbar`	No	`false`	Keep false for scripted runs.
`priors.mu_sd`	No	scale-aware	Prior SD for pooled mean `mu`.
`priors.tau_scale`	No	scale-aware	HalfNormal scale for heterogeneity `tau`.
`diagnostics.allow_directional`	No	`false`	Allow directional readiness runs only for exploratory work.
`diagnostics.allow_not_recommended`	No	`false`	Override only for explicit exploratory work.
`diagnostics.fail_on_diagnostic_failure`	No	`true`	Make failed sampler diagnostics blocking.
`outputs.directory`	No	`output/analysis`	Output folder, resolved relative to config file.

Uncertainty fields

Rows with intervals must state ci_type. They must also state ci_level unless uncertainty.default_ci_level is explicitly set in config.

uncertainty:
  default_ci_level: null
  scenario_standard_errors:
    log_relative:
      low: 0.03
      medium: 0.08
      high: 0.15
    percentage_point:
      low: 1.0
      medium: 3.0
      high: 5.0

P-values are converted only when p_value_sidedness=two_sided.
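For orientation, the textbook normal-approximation conversions look like the sketch below. It assumes a symmetric interval on the model scale and ci_level expressed as a fraction such as 0.95; both function names are hypothetical and the package's own conversion rules (including how ci_type is handled) may differ.

```python
from statistics import NormalDist


def se_from_interval(ci_lower, ci_upper, ci_level):
    """Standard error implied by a symmetric normal interval."""
    if not 0.0 < ci_level < 1.0:
        raise ValueError("ci_level must be a fraction between 0 and 1")
    # z is the two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + ci_level / 2.0)
    return (ci_upper - ci_lower) / (2.0 * z)


def se_from_two_sided_p(estimate, p_value):
    """Standard error implied by an estimate and its two-sided p-value."""
    if not 0.0 < p_value < 1.0:
        raise ValueError("p_value must be strictly between 0 and 1")
    z = NormalDist().inv_cdf(1.0 - p_value / 2.0)
    return abs(estimate) / z


if __name__ == "__main__":
    print(se_from_interval(-1.96, 1.96, 0.95))
    print(se_from_two_sided_p(1.96, 0.05))
```

A one-sided p-value maps to a different z value, which is one reason an unstated sidedness cannot be converted safely.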

Sensitivity fields

sensitivity:
  uncertainty: false
  prior: false

Custom prior sensitivity specs can be provided as sensitivity.prior_specs; when omitted the package uses scale-aware defaults.

Outputs Reference

Config runs write a small artefact set intended for review and downstream reporting.

File	Purpose
`config.resolved.yaml`	Config with defaults made explicit.
`model_input.csv`	Effects and standard errors used for modelling.
`effect_preparation.csv`	Row-level source values, uncertainty provenance, transformations, exclusions, and warnings.
`readiness.csv`	Model-readiness status and messages.
`tracker_issues.csv`	Tracker quality warnings/errors.
`effect_summary.csv`	Pooled mean posterior summary.
`future_true_effect_summary.csv`	Latent true effect for a comparable future study.
`analysis_report.md`	Analyst-facing summary of reportability, diagnostics, readiness, and headline effects.
`diagnostics.csv`	Sampler diagnostic checks.
`prior_diagnostics.csv`	Approximate prior-information warning for `mu` plus `tau` prior-predictive checks.
`ppc.csv`	In-sample study-level posterior predictive checks.
`run_status.json`	Machine-readable run outcome, including `reportable`.
`outputs_manifest.json`	Machine-readable artefact list.

Optional files:

File	Written when
`uncertainty_sensitivity.csv`	`sensitivity.uncertainty: true`
`prior_sensitivity.csv`	`sensitivity.prior: true`

Prefer adding columns over renaming or removing columns. If a public artefact changes, update fixture tests and docs in the same change.

API Reference

The package exports the main workflow helpers from marketbayesmeta.

Tracker and readiness

load_tracker_csv
assess_model_readiness
tracker_quality_issues
readiness_frame
tracker_issue_frame

Model input and fitting

make_meta_analysis_input
make_effect_preparation_rows
fit_random_effects

Reporting and diagnostics

summarise_effect
summarise_diagnostics
check_diagnostics
make_posterior_predictive_check_rows
posterior_predictive_check_frame

Priors and sensitivity

default_priors_for_scale
assess_prior_influence
fit_prior_sensitivity
make_prior_sensitivity_report_frame
make_sensitivity_inputs
make_sensitivity_report_frame

Config runner

load_config
run_analysis
run_config

Minimal Python example

from marketbayesmeta import EffectScale, EvidenceType
from marketbayesmeta import fit_random_effects, load_tracker_csv, make_meta_analysis_input
from marketbayesmeta.reporting import summarise_effect

dataset = load_tracker_csv("examples/example_tracker.csv")
model_input = make_meta_analysis_input(
    dataset,
    evidence_type=EvidenceType.GEO_TEST,
    scale=EffectScale.LOG_RELATIVE,
    metric="Sales uplift",
    include_partial=True,
)

result = fit_random_effects(model_input, random_seed=20260521)
print(summarise_effect(result.idata, scale=model_input.scale))

Explanation

Background material for interpreting results and understanding the package’s small-sample stance.

Interpretation

marketbayesmeta produces review artefacts, not automatic decisions.

Reportability

A completed run is not automatically reportable. Check:

run_status.reportable
readiness.csv
diagnostics.csv
prior_diagnostics.csv
sensitivity outputs
analysis_report.md

Future true effect

future_true_effect_summary.csv describes the latent true effect for a comparable future study. It excludes measurement error and is not a prediction for the next observed study estimate.

Posterior predictive checks

ppc.csv is an in-sample study-level posterior predictive check. It is useful for spotting studies that look out of line with the fitted model, but it is not leave-one-out validation and should not be interpreted as out-of-sample accuracy.

Reporting language

Recommended reporting should distinguish:

the pooled posterior estimate for comparable evidence;
the latent true-effect distribution for a future comparable study;
uncertainty-scenario dependence;
prior dependence;
diagnostics and readiness caveats.

Release Status

The current internal release candidate is 0.3.0.

Suitable use

0.3.0 is suitable for supervised Data Science analyst workflows where outputs are reviewed before reporting. It is not intended for unattended production reporting.

Validation snapshot

Validated locally on June 3, 2026 using the repository virtual environment:

PATH=.venv/bin:$PATH make check: passed
PATH=.venv/bin:$PATH make check-statistical: passed
PATH=.venv/bin:$PATH python runme.py examples/config.yaml: completed

The synthetic example completed with diagnostics passed, readiness directional, and run_status.reportable false, as expected for the small-K example.

Known limitations

Real project trackers still require analyst judgement about estimand comparability.
Small-K random-effects analyses remain prior-sensitive.
Leave-one-out influence summaries are not part of the default workflow.
This private repository intentionally does not use CI/CD.

FAQ

Does the library fit Bayesian random-effects models with PyMC?

Yes. The default model is a Bayesian normal-normal random-effects meta-analysis in PyMC. Random effects are a sensible default for comparable marketing measurement studies because campaigns, markets, periods, execution, and measurement designs often differ.

When should studies not be pooled?

Do not pool studies just because they share a broad label such as “sales uplift” or “awareness”. Pool only when the estimand, scale, population, measurement window, and study design are comparable enough for a pooled effect to mean something.

MMM, PA2, ROI, CPA, CPO, and iROAS should usually be triangulated rather than pooled unless harmonised explicitly upstream.

How many studies are enough?

There is no universal threshold. With two studies, random-effects meta-analysis is usually not recommended. With three to five studies, the pooled estimate can be useful but should be treated as directional and prior-sensitive.

By default, config runs require at least three comparable rows for an exploratory directional run and at least four for ready.
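The count-based part of that rule can be written down directly. This is a simplified sketch with a hypothetical function name: the package's readiness assessment is likely to consider more than the row count, and only the thresholds come from the documented defaults.

```python
def readiness_from_count(n_comparable, minimum_ready=4, minimum_directional=3):
    """Classify readiness from the number of comparable rows only."""
    if n_comparable >= minimum_ready:
        return "ready"
    if n_comparable >= minimum_directional:
        return "directional"
    return "not_recommended"


if __name__ == "__main__":
    for k in (2, 3, 4):
        print(k, readiness_from_count(k))
```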

Where do priors come from?

Priors should come from defensible external information and explicit scale calibration, not from tuning the model until the result looks convenient.

Defaults are scale-aware:

log_relative: mu_sd=1.0, tau_scale=0.5
percentage_point: mu_sd=10.0, tau_scale=5.0

For serious reporting, set and justify priors in YAML and run prior sensitivity.

Why not use LOO by default?

LOO can be useful with enough independent studies, but it is easy to over-read at very small K. In the main use case, readiness checks, sampler diagnostics, prior sensitivity, and posterior predictive checks are more useful first-line review tools.

Methodology

Statistical framing for the default Bayesian model.

Statistical Model

For study \(i = 1,\ldots,k\), the default model is a normal-normal Bayesian random-effects model. The likelihood is marginalised over the latent study effects for more stable small-\(k\) sampling:

\[ y_i \sim \mathrm{Normal}\left(\mu, \sqrt{s_i^2 + \tau^2}\right), \qquad \mu \sim \mathrm{Normal}(0, \sigma_\mu), \qquad \tau \sim \mathrm{HalfNormal}(\sigma_\tau). \]

Here \(y_i\) is the observed model-scale effect and \(s_i\) is its known standard error. The pooled mean is \(\mu\), and \(\tau\) is between-study heterogeneity.

Posterior study-level \(\theta_i\) draws are exposed as derived quantities using the Normal conditional distribution implied by \(y_i\), \(\mu\), \(\tau\), and \(s_i\).
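For a single posterior draw of the pooled mean and heterogeneity, that conditional distribution is the standard normal-normal shrinkage result. The sketch below computes its mean and standard deviation; the function name is hypothetical and the code is an illustration of the formula, not the package's implementation.

```python
import math


def study_effect_conditional(y_i, s_i, mu, tau):
    """Mean and SD of theta_i given y_i, s_i, mu and tau."""
    # Weight on the observed estimate; the remainder goes to the pooled mean.
    weight = tau**2 / (tau**2 + s_i**2)
    mean = weight * y_i + (1.0 - weight) * mu
    # Equivalent to sqrt(tau^2 * s_i^2 / (tau^2 + s_i^2)).
    sd = s_i * math.sqrt(weight)
    return mean, sd


if __name__ == "__main__":
    print(study_effect_conditional(y_i=0.2, s_i=0.1, mu=0.0, tau=0.1))
```

When heterogeneity is small relative to the study's standard error, the weight approaches zero and the study-level effect is pulled almost entirely towards the pooled mean.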

The model also samples future_true_effect, representing the latent true effect for a comparable future study. It excludes sampling error, so it is not a prediction for the next observed study estimate.
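The distinction can be made concrete with posterior draws of the pooled mean and heterogeneity. In the sketch below both helpers and their inputs are hypothetical: the first draws a latent true effect, the second adds sampling error for a future study with an assumed standard error s_new.

```python
import math
import random


def draw_future_true_effect(mu_draws, tau_draws, seed=0):
    """One latent true-effect draw per posterior draw of (mu, tau)."""
    rng = random.Random(seed)
    return [rng.gauss(mu, tau) for mu, tau in zip(mu_draws, tau_draws)]


def draw_next_observed_estimate(mu_draws, tau_draws, s_new, seed=0):
    """One observed-estimate draw per posterior draw, including sampling error."""
    rng = random.Random(seed)
    return [
        rng.gauss(mu, math.sqrt(tau**2 + s_new**2))
        for mu, tau in zip(mu_draws, tau_draws)
    ]


if __name__ == "__main__":
    mu_draws = [0.04, 0.05, 0.06]   # invented draws for illustration
    tau_draws = [0.02, 0.03, 0.02]
    print(draw_future_true_effect(mu_draws, tau_draws))
    print(draw_next_observed_estimate(mu_draws, tau_draws, s_new=0.05))
```

The second set of draws is always at least as dispersed as the first, so quoting the future true-effect interval as a forecast for the next study's estimate would understate uncertainty.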

Small-sample stance

When study counts are small, posterior intervals, prior sensitivity, and uncertainty scenario sensitivity are more informative than a single headline pooled estimate.

prior_diagnostics.csv includes a fixed-effect precision approximation for the Normal prior on mu and prior-predictive checks for the tau prior. Treat warnings as prompts to inspect prior sensitivity, not as formal pass/fail tests.
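One plausible reading of a fixed-effect precision approximation is to compare the precision of the Normal prior on mu with the total inverse-variance precision of the studies. The sketch below implements that reading; it is an assumed interpretation with a hypothetical function name, not the formula used to produce prior_diagnostics.csv.

```python
def prior_information_share(standard_errors, mu_sd):
    """Share of total precision contributed by the prior on mu (assumed reading)."""
    # Fixed-effect data precision: sum of inverse variances across studies.
    data_precision = sum(1.0 / se**2 for se in standard_errors)
    prior_precision = 1.0 / mu_sd**2
    return prior_precision / (prior_precision + data_precision)


if __name__ == "__main__":
    # Three invented log-relative standard errors against the default mu_sd of 1.0.
    print(prior_information_share([0.05, 0.08, 0.10], mu_sd=1.0))
```

A share close to zero suggests the prior on mu is weak relative to the data under a fixed-effect view; a large share is a prompt to inspect prior sensitivity rather than a verdict in itself.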
