aind_ephys_utils.metrics.connectivity module¶

CCG-based connectivity detection: pre-filtering and surrogate testing.

A higher-level workflow built on the cross-correlogram primitives in aind_ephys_utils.metrics.ccg:

build_shape_prefilter() — Bourgon+ 2010 independent (latency + FWHM) pre-filter that shrinks the multiple-testing burden before surrogates are run.
run_surrogates() — derangement surrogate driver returning the per-pair surrogate peak distribution.
run_two_stage_mc() — screen-then-refine Monte Carlo significance testing over the pre-filtered pair set.

The surrogate functions drive the numba CCG kernel via ccg_trial_surrogates, so they require the numba optional extra at run time (pip install aind-ephys-utils[numba]).

aind_ephys_utils.metrics.connectivity.build_shape_prefilter(*, n_units: int, cross_pairs: ndarray, peak_lag: ndarray, peak_fwhm: ndarray, fwhm_range_s: tuple[float, float] | None, latency_range_s: tuple[float, float] | None) → tuple[ndarray, ndarray, ndarray, ndarray, ndarray]¶

Build the latency + FWHM independent pre-filter for pairs.

Bourgon+ 2010: pre-filtering on covariates that are approximately null-independent of the test statistic (peak bin location for latency; noise-driven FWHM for FWHM) is statistically sound and shrinks the multiple-testing burden. Prominence is not part of this — it correlates with peak height under the null and would inflate FDR if used as a pre-filter; apply it post-hoc instead.

Parameters:

n_units – N — used to allocate the (N, N) mask arrays.
cross_pairs – Iterable of (i, j) pair tuples with i < j.
peak_lag – (N, N) peak-lag-in-seconds array.
peak_fwhm – (N, N) peak-FWHM-in-seconds array, inf for un-measured.
fwhm_range_s – (low, high) FWHM bounds in seconds, or None to disable.
latency_range_s – (low, high) |lag| bounds in seconds, or None to disable.

Returns:

upper_mask, lat_ok, fwhm_ok, prefilter_mask ((N, N) bool arrays) – upper_mask marks the upper triangle of supplied pairs; lat_ok / fwhm_ok are per-pair shape masks (all-True when the corresponding range is None); prefilter_mask is their conjunction.
prefilter_pairs ((K, 2) int64 array) – The subset of cross_pairs that survived the pre-filter.

aind_ephys_utils.metrics.connectivity.run_surrogates(trial_segs: TrialSegments, n_trials: int, n_units: int, n_surr: int, pairs: ndarray, rng: Generator, *, bin_size: float, max_lag: float, normalize: str, label: str = '') → ndarray¶

Run derangement surrogates, return a (n_surr, n_pairs) array.

Each surrogate re-pairs trials with a derangement (no fixed points), recomputes the CCG for every pair in pairs, and reduces it to the per-pair peak (max over lag bins). Output rows are in pairs order and are float32 (surrogate corrcoef peaks live in roughly [-1, 1], so float32 precision is ample and halves memory).

Parameters:

trial_segs – Trial-clipped spike segments (see clip_spikes_to_trials).
n_trials – Number of trials (the derangement length).
n_units – Number of units; kept for signature symmetry — the output shape is per-pair, so callers project to (N, N) themselves.
n_surr – Number of surrogate draws.
pairs – (n_pairs, 2) unit-index pairs to evaluate.
rng – NumPy random generator.
bin_size – CCG kernel parameters.
max_lag – CCG kernel parameters.
normalize – CCG kernel parameters.
label – Prefix for progress log messages.

Returns:

(n_surr, n_pairs) float32 surrogate peak values.

Return type:

np.ndarray

aind_ephys_utils.metrics.connectivity.run_two_stage_mc(*, trial_segs: TrialSegments, n_trials: int, n_units: int, raw_peak_val: ndarray, prefilter_pairs: ndarray, prefilter_mask: ndarray, n_screen: int, n_total: int, bin_size: float, max_lag: float, normalize: str, rng: Generator, fdr_alpha: float) → tuple[ndarray, ndarray, int, int, ndarray]¶

Two-stage Monte Carlo significance testing on pre-filtered pairs.

Stage 1 runs n_screen derangement surrogates over every pre-filtered pair, building coarse p-values. Pairs at the 1/(n_screen + 1) floor become candidates for stage 2, which runs max(n_total - n_screen, 0) additional surrogates so candidate p-values are computed against n_total exchangeable surrogates in total (stage-1 and stage-2 draws come from the same null and are concatenated; nothing is wasted). The achievable minimum p-value for a candidate is 1/(n_total + 1).

Pairs outside the pre-filter never have MC computed and get p = 1, so a Benjamini-Hochberg pass downstream cannot declare them significant.

Parameters:

trial_segs – Trial-clipped spike segments.
n_trials – Trial and unit counts.
n_units – Trial and unit counts.
raw_peak_val – (N, N) observed peak heights (the test statistic).
prefilter_pairs – Output of build_shape_prefilter() — the pair list to test.
prefilter_mask – (N, N) bool mask matching prefilter_pairs (accepted for API symmetry; the function operates on prefilter_pairs).
n_screen – Stage-1 surrogate count.
n_total – Target total surrogate count for refined candidates (stage 1 + stage 2). Stage 2 runs max(n_total - n_screen, 0) extra surrogates; if n_total <= n_screen the refinement is a no-op.
bin_size – CCG kernel parameters.
max_lag – CCG kernel parameters.
normalize – CCG kernel parameters.
rng – CCG kernel parameters.
fdr_alpha – Used for the surrogate threshold percentile returned for plotting.

Returns:

p_values ((N, N) array) – Final p-values; held at 1.0 outside the pre-filter.
surr_threshold ((N, N) array) – Per-pair (1 - fdr_alpha) percentile of surrogate peaks.
n_candidates (int) – Number of pairs that hit the stage-1 floor (= stage-2 size).
n_surr_total (int) – n_screen if there were no candidates / no extra surrogates, else n_total.
candidate_pairs ((n_candidates, 2) int64 array) – The pairs sent through stage 2 (empty when none).