marketbayesmeta

marketbayesmeta is a focused Python/PyMC library for audit-oriented Bayesian meta-analysis of marketing measurement evidence. It is designed for small-sample workflows where evidence comes from geo tests, brand lift studies, MMM, and related measurement sources.

The core stance is conservative: pool only comparable effects, keep units and uncertainty assumptions explicit, and treat model output as review material rather than automatic reporting.

Current release: 0.3.0 internal release candidate
Audience: Data Science analysts reviewing marketing measurement evidence
Readiness: Supervised analyst use, not unattended production reporting

Where to start

You want to…                                    Start here
Install the package and run the first workflow  Quick Start
Prepare a tracker for modelling                 Prepare a Tracker
Run a YAML-configured analysis                  Run an Analysis
Review whether outputs are reportable           Review Outputs
Check exact YAML fields                         Configuration Reference
Check command-line entry points                 CLI Reference
Understand the Bayesian model                   Statistical Model
Understand small-sample cautions                FAQ

Sections

  • Tutorials — learning-oriented install and first-run material.
  • How-To Guides — task-focused operational guides.
  • Reference — exact CLI, config, API, and output artefact reference.
  • Explanation — interpretation guidance, release status, and FAQ.
  • Methodology — statistical framing for the default model.

Tutorials

Learning-oriented guides for getting marketbayesmeta installed and running a first example workflow.

Installation

Use Python 3.11 or newer.

Install from a source checkout

git clone https://github.com/tandpds/marketbayesmeta.git
cd marketbayesmeta

python -m venv .venv
source .venv/bin/activate
python -m pip install -e ".[dev]"

Confirm that model dependencies import from the repository virtual environment:

which python
python - <<'PY'
import pymc
import numpy
print("pymc", pymc.__version__)
print("numpy", numpy.__version__)
PY

If PyMC fails to import, recreate .venv before running model fits. Tracker and config validation are still useful without sampling, but analysis runs require a working PyMC/ArviZ stack.

Avoid mixed base conda/user-site environments for release validation. Either activate .venv or prefix checks with:

PATH=.venv/bin:$PATH make check-statistical

Developer checks

make check
make check-statistical

make check is required before finalising code or documentation changes. make check-statistical runs sampling-based contract tests and should pass in the environment used for model work.

Quick Start

The quickest way to see the package workflow is to run the synthetic example config.

source .venv/bin/activate
python runme.py examples/config.yaml

The example config points to examples/example_tracker.csv and writes outputs to:

output/example_geo_sales_uplift/

The example is intentionally small. It should complete and write the full artefact set, but it is not expected to be reportable because its readiness status is directional rather than ready.

Check the config

marketbayesmeta-check-config examples/config.yaml

Run the config

marketbayesmeta-run examples/config.yaml

Review the result

Start with:

  • analysis_report.md
  • run_status.json
  • readiness.csv
  • diagnostics.csv
  • prior_diagnostics.csv
  • effect_preparation.csv
  • ppc.csv

The relevant question is not just whether the run completed. It is whether run_status.reportable is true after reviewing diagnostics, readiness, uncertainty provenance, and sensitivity outputs.

How-To Guides

Prepare a Tracker

The modelling workflow starts from a long-format tracker. Use one row per extracted effect estimate.

Required columns

client,evidence_type,metric,value,unit,period,source_type,source_status,source_file_or_link,owner_or_contact,analysis_ready,notes

Optional uncertainty columns

study_id,estimand_note,standard_error,ci_lower,ci_upper,ci_level,ci_type,p_value,p_value_sidedness,uncertainty_scenario
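
Before running the package validator, a quick header check can catch a tracker export that dropped a column. The sketch below uses only the standard library and the required column names listed above. The helper name `missing_required_columns` is hypothetical and is not part of marketbayesmeta.

```python
import csv
import io

# Column names copied from the required-columns list above.
REQUIRED_COLUMNS = (
    "client", "evidence_type", "metric", "value", "unit", "period",
    "source_type", "source_status", "source_file_or_link",
    "owner_or_contact", "analysis_ready", "notes",
)


def missing_required_columns(csv_text: str) -> list[str]:
    """Return required columns that are absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    present = {name.strip() for name in header}
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Hypothetical header that has lost its last two required columns.
example_header = (
    "client,evidence_type,metric,value,unit,period,source_type,"
    "source_status,source_file_or_link,owner_or_contact\n"
)
print(missing_required_columns(example_header))
```

This only checks column presence. Value-level checks remain the job of marketbayesmeta-check-tracker.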

Supported pooled evidence

Evidence                        Default treatment
geo_test                        Pool comparable effects on log_relative scale.
bls                             Pool comparable effects on percentage_point scale.
mmm, pa2, ROI, CPA, CPO, iROAS  Treat as triangulation unless harmonised upstream.

Validate the tracker

marketbayesmeta-check-tracker path/to/tracker.csv
marketbayesmeta-check-tracker path/to/tracker.csv --audit --include-partial

Resolve or explicitly accept:

  • duplicate (evidence_type, client, metric, period) candidates;
  • missing source uncertainty;
  • interval rows missing ci_type or ci_level;
  • p-values without p_value_sidedness=two_sided;
  • rows marked analysis-ready but using a unit incompatible with the requested scale.
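
The duplicate rule in the list above can be illustrated with a short standard-library sketch. It groups rows by the (evidence_type, client, metric, period) key and reports keys that occur more than once. The function name and the example rows are hypothetical; the package's own audit may apply further rules.

```python
from collections import Counter

# Key fields taken from the duplicate rule above.
DUPLICATE_KEY = ("evidence_type", "client", "metric", "period")


def duplicate_candidates(rows: list[dict[str, str]]) -> list[tuple[str, ...]]:
    """Return key tuples that occur on more than one tracker row."""
    counts = Counter(tuple(row[field] for field in DUPLICATE_KEY) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical rows: clients, metric, and periods are made up.
rows = [
    {"evidence_type": "geo_test", "client": "A", "metric": "Sales uplift", "period": "2025Q1"},
    {"evidence_type": "geo_test", "client": "A", "metric": "Sales uplift", "period": "2025Q1"},
    {"evidence_type": "geo_test", "client": "B", "metric": "Sales uplift", "period": "2025Q1"},
]
print(duplicate_candidates(rows))
```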

Estimand judgement

The tracker and config enforce metric names and units, but they cannot prove that studies answer the same business question. Before pooling, confirm that population, measurement window, study design, and estimand are comparable enough for a pooled effect to mean something.

Run an Analysis

Use a YAML config when an analysis should be repeatable and reviewable.

marketbayesmeta-check-config examples/config.yaml
marketbayesmeta-run examples/config.yaml

For local repository use:

python runme.py examples/config.yaml

Conservative default gates

Config runs stop on directional and not_recommended readiness by default, and failed sampler diagnostics are blocking by default.

diagnostics:
  allow_directional: false
  allow_not_recommended: false
  allow_duplicates: false
  fail_on_diagnostic_failure: true
  minimum_studies_ready: 4
  minimum_studies_directional: 3

Override these only for explicit exploratory work. Record the reason in project.notes or the downstream analysis review.
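
To make the gate logic concrete, here is a minimal sketch of how study counts could map to readiness labels and blocking decisions under the defaults shown above. It is an illustration, not the package implementation: the function names are hypothetical, and treating any count below minimum_studies_directional as not_recommended is an assumption.

```python
def classify_readiness(
    n_studies: int,
    minimum_studies_ready: int = 4,
    minimum_studies_directional: int = 3,
) -> str:
    """Map a count of comparable studies to a readiness label."""
    if n_studies >= minimum_studies_ready:
        return "ready"
    if n_studies >= minimum_studies_directional:
        return "directional"
    # Assumption: anything below the directional threshold is not recommended.
    return "not_recommended"


def run_is_blocked(
    readiness: str,
    allow_directional: bool = False,
    allow_not_recommended: bool = False,
) -> bool:
    """Apply the allow_* gates to a readiness label."""
    if readiness == "directional":
        return not allow_directional
    if readiness == "not_recommended":
        return not allow_not_recommended
    return False


for n in (2, 3, 4):
    label = classify_readiness(n)
    print(n, label, "blocked" if run_is_blocked(label) else "allowed")
```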

Paths

data.tracker and outputs.directory are resolved relative to the config file when they are not absolute paths.
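
The resolution rule can be sketched with pathlib. The helper name and the example paths are hypothetical; the sketch only shows the stated behaviour that relative values are anchored at the config file's directory and absolute values are left alone.

```python
from pathlib import Path


def resolve_from_config(config_path: str, value: str) -> Path:
    """Resolve a config path value against the config file's directory."""
    candidate = Path(value)
    if candidate.is_absolute():
        return candidate
    return Path(config_path).parent / candidate


# Hypothetical project layout.
print(resolve_from_config("projects/demo/config.yaml", "data/tracker.csv"))
```

One consequence is that a config file can be moved together with its tracker and output folder without editing paths, as long as their relative layout is unchanged.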

Reportability

Successful execution does not mean the result is reportable. Always review run_status.reportable and the supporting artefacts before using pooled summaries.

Review Outputs

Config runs write a compact artefact set intended for analyst review.

First files to open

File                    Review question
analysis_report.md      What is the run status, readiness, and headline result?
run_status.json         Is reportable true? Why did the run complete, warn, block, or fail?
readiness.csv           How many comparable rows were eligible, and are they enough?
diagnostics.csv         Did sampler diagnostics pass configured thresholds?
prior_diagnostics.csv   Do priors imply warnings or implausible prior-predictive ranges?
effect_preparation.csv  Which rows were included, transformed, excluded, or scenario-based?
ppc.csv                 Are individual studies out of line with in-sample posterior predictive checks?

Status meanings

run_status.json may report:

  • completed
  • completed_with_warnings
  • blocked
  • failed

A completed run is not automatically reportable. Treat reportable: false as a hard prompt for analyst review before any downstream presentation.
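
A scripted review step might read run_status.json before anything is passed downstream. The sketch below is a minimal illustration under stated assumptions: the reportable key is named in this documentation, but the key holding the run outcome is assumed here to be called status, and the function name is hypothetical.

```python
import json


def review_run_status(raw_json: str) -> tuple[bool, str]:
    """Return (reportable, outcome) from the text of run_status.json."""
    payload = json.loads(raw_json)
    # Only an explicit JSON true counts as reportable.
    reportable = payload.get("reportable") is True
    # Assumption: the outcome is stored under a "status" key.
    outcome = str(payload.get("status", "unknown"))
    return reportable, outcome


# Hypothetical file content for a run that completed but needs review.
example = '{"status": "completed", "reportable": false}'
reportable, outcome = review_run_status(example)
print(outcome, "reportable" if reportable else "needs analyst review")
```

Treating a missing reportable key as not reportable keeps the check conservative.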

Common reasons a run is not reportable

  • Readiness is directional or not_recommended.
  • Sampler diagnostics failed or were downgraded to warnings.
  • The tracker contains blocking duplicate rows.
  • Prior diagnostics or sensitivity indicate material prior dependence.
  • One or more rows rely on scenario uncertainty.

Run Sensitivity

Small-sample Bayesian meta-analysis can be prior-sensitive and uncertainty-sensitive. For serious reporting, sensitivity is part of the result.

Enable prior sensitivity

sensitivity:
  prior: true

When prior_specs are omitted, the package uses a scale-aware grid:

  • log_relative: regularising, default, weak
  • percentage_point: regularising, default, weak

Review prior_sensitivity.csv for pooled mean, probability positive, interval width, diagnostics, and deltas versus the default prior.

Enable uncertainty sensitivity

sensitivity:
  uncertainty: true

Rows with source-derived uncertainty keep that uncertainty. Rows with uncertainty_scenario use the configured low, medium, or high model-scale standard error.

Scenario assumptions live in:

uncertainty:
  scenario_standard_errors:
    log_relative:
      low: 0.03
      medium: 0.08
      high: 0.15
    percentage_point:
      low: 1.0
      medium: 3.0
      high: 5.0

Report when reasonable sensitivity settings change the substantive conclusion.

Reference

CLI Reference

marketbayesmeta installs three console scripts.

Check a tracker

marketbayesmeta-check-tracker path/to/tracker.csv
marketbayesmeta-check-tracker path/to/tracker.csv --audit --include-partial

Audit mode (--audit) reports model-readiness and tracker-quality checks for the supported evidence/scale pairs present in the tracker.

Check a config

marketbayesmeta-check-config examples/config.yaml

This validates the YAML schema and confirms that the configured tracker path exists.

Run an analysis

marketbayesmeta-run examples/config.yaml

This runs the full config-driven workflow and writes configured outputs. It returns a non-zero exit code for package-level analysis failures, including blocked readiness, tracker validation errors, pooling errors, or blocking diagnostic failures.
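
Because failures surface as a non-zero exit code, a wrapper script can gate later steps on it. The sketch below uses stand-in commands so that it runs without the package installed; in real use the command would be the marketbayesmeta-run invocation shown above. The helper name is hypothetical.

```python
import subprocess
import sys


def run_succeeded(command: list[str]) -> bool:
    """Run a command and report whether it exited with code 0."""
    completed = subprocess.run(command, check=False)
    return completed.returncode == 0


# Stand-in commands that exit with 0 and 1 respectively.
# In practice: ["marketbayesmeta-run", "examples/config.yaml"].
ok = run_succeeded([sys.executable, "-c", "raise SystemExit(0)"])
blocked = run_succeeded([sys.executable, "-c", "raise SystemExit(1)"])
print(ok, blocked)
```

A zero exit code only says the run was not blocked or failed. Reportability still has to be read from run_status.json.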

Repository runner

python runme.py
python runme.py examples/config.yaml

Without an argument, runme.py looks for config.yaml in the repository root.

Configuration Reference

Use YAML config files for repeatable analysis runs.

Core fields

Field                                   Required  Default                   Notes
project.name                            No        marketbayesmeta_analysis  Human-readable project name.
project.analyst                         No        null                      Analyst or owner.
project.notes                           No        ""                        Short project note.
data.tracker                            Yes       None                      CSV tracker path, resolved relative to the config file.
data.include_partial                    No        false                     Include analysis_ready=partial rows.
analysis.evidence_type                  Yes       None                      Currently pooled by default: geo_test, bls.
analysis.metric                         Yes       None                      Exact metric name to pool.
analysis.scale                          Yes       None                      log_relative or percentage_point.
analysis.max_abs_percent_uplift         No        90                        Hard limit for % geo uplift to log-relative conversion.
model.draws                             No        2000                      Posterior draws per chain.
model.tune                              No        1000                      Tuning draws per chain.
model.chains                            No        4                         Number of chains.
model.target_accept                     No        0.9                       Passed to PyMC sampling.
model.random_seed                       No        null                      Sampling seed.
model.progressbar                       No        false                     Keep false for scripted runs.
priors.mu_sd                            No        scale-aware               Prior SD for pooled mean mu.
priors.tau_scale                        No        scale-aware               HalfNormal scale for heterogeneity tau.
diagnostics.allow_directional           No        false                     Allow directional readiness runs only for exploratory work.
diagnostics.allow_not_recommended       No        false                     Override only for explicit exploratory work.
diagnostics.fail_on_diagnostic_failure  No        true                      Make failed sampler diagnostics blocking.
outputs.directory                       No        output/analysis           Output folder, resolved relative to config file.
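
The analysis.max_abs_percent_uplift field guards the conversion of percent geo uplift to the log-relative scale. The exact conversion used by the package is not documented here, so the sketch below assumes the usual definition, log(1 + uplift / 100), and assumes that values strictly above the limit are rejected. The function name is hypothetical.

```python
import math


def percent_uplift_to_log_relative(
    percent_uplift: float,
    max_abs_percent_uplift: float = 90.0,
) -> float:
    """Convert a percent uplift to log(1 + uplift / 100), enforcing the limit."""
    if abs(percent_uplift) > max_abs_percent_uplift:
        raise ValueError(
            f"|{percent_uplift}|% exceeds the limit of {max_abs_percent_uplift}%"
        )
    # log1p(x) computes log(1 + x) accurately for small x.
    return math.log1p(percent_uplift / 100.0)


print(round(percent_uplift_to_log_relative(10.0), 4))
```

Under this assumption a limit is needed on the negative side in any case, because the logarithm diverges as the uplift approaches -100%.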

Uncertainty fields

Rows with intervals must state ci_type. They must also state ci_level unless uncertainty.default_ci_level is explicitly set in config.

uncertainty:
  default_ci_level: null
  scenario_standard_errors:
    log_relative:
      low: 0.03
      medium: 0.08
      high: 0.15
    percentage_point:
      low: 1.0
      medium: 3.0
      high: 5.0

P-values are converted only when p_value_sidedness=two_sided.
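
For orientation, the textbook normal-approximation conversions from an interval or a two-sided p-value to a standard error look like the sketch below. This is an assumption about the general technique, not a statement of the package's exact rules: it assumes a symmetric Wald-type interval, ci_level given as a fraction, and inputs already on the model scale. The function names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def se_from_interval(ci_lower: float, ci_upper: float, ci_level: float) -> float:
    """Standard error implied by a symmetric normal-approximation interval."""
    if not 0.0 < ci_level < 1.0:
        raise ValueError("ci_level must be a fraction such as 0.95")
    z = STANDARD_NORMAL.inv_cdf(0.5 + ci_level / 2.0)
    return (ci_upper - ci_lower) / (2.0 * z)


def se_from_p_value(estimate: float, p_value: float, sidedness: str) -> float:
    """Standard error implied by a two-sided p-value for a Wald-type test."""
    if sidedness != "two_sided":
        raise ValueError("only two_sided p-values are converted")
    if not 0.0 < p_value < 1.0 or estimate == 0.0:
        raise ValueError("need 0 < p_value < 1 and a non-zero estimate")
    z = STANDARD_NORMAL.inv_cdf(1.0 - p_value / 2.0)
    return abs(estimate) / z


print(round(se_from_interval(-1.96, 1.96, 0.95), 3))
print(round(se_from_p_value(1.96, 0.05, "two_sided"), 3))
```

The sidedness check matters because a one-sided p-value of the same size implies a different z-score, and therefore a different standard error.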

Sensitivity fields

sensitivity:
  uncertainty: false
  prior: false

Custom prior sensitivity specs can be provided as sensitivity.prior_specs; when omitted the package uses scale-aware defaults.

Outputs Reference

Config runs write a small artefact set intended for review and downstream reporting.

File                            Purpose
config.resolved.yaml            Config with defaults made explicit.
model_input.csv                 Effects and standard errors used for modelling.
effect_preparation.csv          Row-level source values, uncertainty provenance, transformations, exclusions, and warnings.
readiness.csv                   Model-readiness status and messages.
tracker_issues.csv              Tracker quality warnings/errors.
effect_summary.csv              Pooled mean posterior summary.
future_true_effect_summary.csv  Latent true effect for a comparable future study.
analysis_report.md              Analyst-facing summary of reportability, diagnostics, readiness, and headline effects.
diagnostics.csv                 Sampler diagnostic checks.
prior_diagnostics.csv           Approximate prior-information warning for mu plus tau prior-predictive checks.
ppc.csv                         In-sample study-level posterior predictive checks.
run_status.json                 Machine-readable run outcome, including reportable.
outputs_manifest.json           Machine-readable artefact list.

Optional files:

File                         Written when
uncertainty_sensitivity.csv  sensitivity.uncertainty: true
prior_sensitivity.csv        sensitivity.prior: true
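
A review script can confirm that the core artefact set is present before anyone opens the report. The sketch below checks file names taken from the table above against an output folder. The helper name is hypothetical, and it deliberately does not parse outputs_manifest.json, whose schema is not described here.

```python
from pathlib import Path

# Core artefact names copied from the outputs table above.
CORE_ARTEFACTS = (
    "config.resolved.yaml",
    "model_input.csv",
    "effect_preparation.csv",
    "readiness.csv",
    "tracker_issues.csv",
    "effect_summary.csv",
    "future_true_effect_summary.csv",
    "analysis_report.md",
    "diagnostics.csv",
    "prior_diagnostics.csv",
    "ppc.csv",
    "run_status.json",
    "outputs_manifest.json",
)


def missing_artefacts(output_dir: Path) -> list[str]:
    """Return core artefact names that are not files in output_dir."""
    return [name for name in CORE_ARTEFACTS if not (output_dir / name).is_file()]


# The example output folder from the Quick Start; every name is listed as
# missing if the example has not been run yet.
print(missing_artefacts(Path("output/example_geo_sales_uplift")))
```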

Prefer adding columns over renaming or removing columns. If a public artefact changes, update fixture tests and docs in the same change.

API Reference

The package exports the main workflow helpers from marketbayesmeta.

Tracker and readiness

  • load_tracker_csv
  • assess_model_readiness
  • tracker_quality_issues
  • readiness_frame
  • tracker_issue_frame

Model input and fitting

  • make_meta_analysis_input
  • make_effect_preparation_rows
  • fit_random_effects

Reporting and diagnostics

  • summarise_effect
  • summarise_diagnostics
  • check_diagnostics
  • make_posterior_predictive_check_rows
  • posterior_predictive_check_frame

Priors and sensitivity

  • default_priors_for_scale
  • assess_prior_influence
  • fit_prior_sensitivity
  • make_prior_sensitivity_report_frame
  • make_sensitivity_inputs
  • make_sensitivity_report_frame

Config runner

  • load_config
  • run_analysis
  • run_config

Minimal Python example

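# Enums select the evidence type and effect scale to pool.
from marketbayesmeta import EffectScale, EvidenceType
from marketbayesmeta import fit_random_effects, load_tracker_csv, make_meta_analysis_input
from marketbayesmeta.reporting import summarise_effect

# Load the long-format tracker (one row per extracted effect estimate).
dataset = load_tracker_csv("examples/example_tracker.csv")

# Build the model input from geo-test rows for one metric on the log-relative scale.
model_input = make_meta_analysis_input(
    dataset,
    evidence_type=EvidenceType.GEO_TEST,
    scale=EffectScale.LOG_RELATIVE,
    metric="Sales uplift",
    include_partial=True,  # also use analysis_ready=partial rows
)

# Fit with a fixed seed so the run is repeatable, then summarise the pooled effect.
result = fit_random_effects(model_input, random_seed=20260521)
print(summarise_effect(result.idata, scale=model_input.scale))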

Explanation

Interpretation

marketbayesmeta produces review artefacts, not automatic decisions.

Reportability

A completed run is not automatically reportable. Check:

  • run_status.reportable
  • readiness.csv
  • diagnostics.csv
  • prior_diagnostics.csv
  • sensitivity outputs
  • analysis_report.md

Future true effect

future_true_effect_summary.csv describes the latent true effect for a comparable future study. It excludes measurement error and is not a prediction for the next observed study estimate.

Posterior predictive checks

ppc.csv is an in-sample study-level posterior predictive check. It is useful for spotting studies that look out of line with the fitted model, but it is not leave-one-out validation and should not be interpreted as out-of-sample accuracy.

Reporting language

Recommended reporting should distinguish:

  • the pooled posterior estimate for comparable evidence;
  • the latent true-effect distribution for a future comparable study;
  • uncertainty-scenario dependence;
  • prior dependence;
  • diagnostics and readiness caveats.

Release Status

The current internal release candidate is 0.3.0.

Suitable use

0.3.0 is suitable for supervised Data Science analyst workflows where outputs are reviewed before reporting. It is not intended for unattended production reporting.

Validation snapshot

Validated locally on June 3, 2026 using the repository virtual environment:

  • PATH=.venv/bin:$PATH make check: passed
  • PATH=.venv/bin:$PATH make check-statistical: passed
  • PATH=.venv/bin:$PATH python runme.py examples/config.yaml: completed

The synthetic example completed with diagnostics passed, readiness directional, and run_status.reportable false, as expected for the small-K example.

Known limitations

  • Real project trackers still require analyst judgement about estimand comparability.
  • Small-K random-effects analyses remain prior-sensitive.
  • Leave-one-out influence summaries are not part of the default workflow.
  • This private repository intentionally does not use CI/CD.

FAQ

Does the library fit Bayesian random-effects models with PyMC?

Yes. The default model is a Bayesian normal-normal random-effects meta-analysis in PyMC. Random effects are a sensible default because even comparable marketing measurement studies often differ in campaign, market, period, execution, and measurement design.

When should studies not be pooled?

Do not pool studies just because they share a broad label such as “sales uplift” or “awareness”. Pool only when the estimand, scale, population, measurement window, and study design are comparable enough for a pooled effect to mean something.

MMM, PA2, ROI, CPA, CPO, and iROAS should usually be triangulated rather than pooled unless harmonised explicitly upstream.

How many studies are enough?

There is no universal threshold. With two studies, random-effects meta-analysis is usually not recommended. With three to five studies, the pooled estimate can be useful but should be treated as directional and prior-sensitive.

By default, config runs require at least three comparable rows for an exploratory directional run and at least four for ready.

Where do priors come from?

Priors should come from defensible external information and explicit scale calibration, not from tuning the model until the result looks convenient.

Defaults are scale-aware:

  • log_relative: mu_sd=1.0, tau_scale=0.5
  • percentage_point: mu_sd=10.0, tau_scale=5.0

For serious reporting, set and justify priors in YAML and run prior sensitivity.
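
One way to judge whether a prior is defensible is to translate it back to business units. The sketch below converts the central 95% range of a zero-centred Normal prior on the log_relative scale into percent uplift, assuming that log_relative means log(1 + uplift / 100). The function name is hypothetical.

```python
import math
from statistics import NormalDist


def implied_percent_uplift_range(mu_sd: float, mass: float = 0.95) -> tuple[float, float]:
    """Central prior range for mu, converted from log-relative to percent uplift."""
    z = NormalDist().inv_cdf(0.5 + mass / 2.0)
    # expm1(x) computes exp(x) - 1, the relative uplift for a log-relative x.
    lower = math.expm1(-z * mu_sd) * 100.0
    upper = math.expm1(z * mu_sd) * 100.0
    return lower, upper


lower, upper = implied_percent_uplift_range(mu_sd=1.0)
print(f"{lower:.0f}% to {upper:.0f}%")
```

With the default mu_sd=1.0 this range spans roughly -86% to +610%, which is very wide for most marketing effects. That is one reason to set and justify narrower priors for serious reporting.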

Why not use LOO by default?

LOO can be useful with enough independent studies, but it is easy to over-read at very small K. In the main use case, readiness checks, sampler diagnostics, prior sensitivity, and posterior predictive checks are more useful first-line review tools.

Methodology

Statistical framing for the default Bayesian model.

Statistical Model

For study \(i = 1,\ldots,k\), the default model is a normal-normal Bayesian random-effects model. The likelihood is marginalised over the latent study effects for more stable small-\(k\) sampling:

\[ y_i \sim \mathrm{Normal}\left(\mu, \sqrt{s_i^2 + \tau^2}\right), \qquad \mu \sim \mathrm{Normal}(0, \sigma_\mu), \qquad \tau \sim \mathrm{HalfNormal}(\sigma_\tau). \]

Here \(y_i\) is the observed model-scale effect and \(s_i\) is its known standard error. The pooled mean is \(\mu\), and \(\tau\) is between-study heterogeneity.

Posterior study-level \(\theta_i\) draws are exposed as derived quantities using the Normal conditional distribution implied by \(y_i\), \(\mu\), \(\tau\), and \(s_i\).
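
For the normal-normal model, that conditional distribution has a closed form: the mean is a precision-weighted average of the observed effect and the pooled mean, and the standard deviation is smaller than the study's own standard error. The sketch below computes it for one set of parameter values. The function name is hypothetical, and the package applies the same idea per posterior draw rather than at fixed values.

```python
import math


def conditional_theta(y_i: float, s_i: float, mu: float, tau: float) -> tuple[float, float]:
    """Mean and SD of theta_i given y_i, mu, tau, and s_i."""
    # Weight on the observed effect: high when tau is large relative to s_i.
    weight = tau**2 / (tau**2 + s_i**2)
    mean = weight * y_i + (1.0 - weight) * mu
    # Equivalent to sqrt(tau^2 * s_i^2 / (tau^2 + s_i^2)).
    sd = math.sqrt(weight) * s_i
    return mean, sd


# Equal tau and s_i shrink the estimate halfway towards the pooled mean.
print(conditional_theta(y_i=1.0, s_i=1.0, mu=0.0, tau=1.0))
```

As tau approaches zero the study effect collapses onto the pooled mean, and as tau grows it approaches the study's own estimate.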

The model also samples future_true_effect, representing the latent true effect for a comparable future study. It excludes sampling error, so it is not a prediction for the next observed study estimate.
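
The difference between the two quantities can be seen in a small simulation. The sketch below holds mu and tau fixed for simplicity, whereas the model integrates over their posterior draws, and compares the spread of latent future effects with the spread of future observed estimates that carry a sampling error s_new. All names and values are hypothetical.

```python
import random
import statistics


def simulate_future(
    mu: float, tau: float, s_new: float, n_draws: int = 20000, seed: int = 1
) -> tuple[float, float]:
    """Return (SD of latent true effects, SD of observed estimates)."""
    rng = random.Random(seed)
    # Latent true effect of a comparable future study: Normal(mu, tau).
    true_effects = [rng.gauss(mu, tau) for _ in range(n_draws)]
    # An observed estimate adds sampling error with standard error s_new.
    observed = [theta + rng.gauss(0.0, s_new) for theta in true_effects]
    return statistics.stdev(true_effects), statistics.stdev(observed)


sd_true, sd_observed = simulate_future(mu=0.05, tau=0.10, s_new=0.10)
print(f"latent SD {sd_true:.3f}, observed SD {sd_observed:.3f}")
```

The observed estimates are more dispersed, with standard deviation close to the square root of tau^2 + s_new^2, which is why future_true_effect intervals should not be read as prediction intervals for the next study result.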

Small-sample stance

When study counts are small, posterior intervals, prior sensitivity, and uncertainty scenario sensitivity are more informative than a single headline pooled estimate.

prior_diagnostics.csv includes a fixed-effect precision approximation for the Normal prior on mu and prior-predictive checks for the tau prior. Treat warnings as prompts to inspect prior sensitivity, not as formal pass/fail tests.