aind_ephys_utils.metrics.connectivity module

CCG-based connectivity detection: pre-filtering and surrogate testing.

A higher-level workflow built on the cross-correlogram primitives in aind_ephys_utils.metrics.ccg:

  • build_shape_prefilter() — Bourgon+ 2010 independent (latency + FWHM) pre-filter that shrinks the multiple-testing burden before surrogates are run.

  • run_surrogates() — derangement surrogate driver returning the per-pair surrogate peak distribution.

  • run_two_stage_mc() — screen-then-refine Monte Carlo significance testing over the pre-filtered pair set.

The surrogate functions drive the numba CCG kernel via ccg_trial_surrogates, so they require the numba optional extra at run time (pip install aind-ephys-utils[numba]).

aind_ephys_utils.metrics.connectivity.build_shape_prefilter(*, n_units: int, cross_pairs: ndarray, peak_lag: ndarray, peak_fwhm: ndarray, fwhm_range_s: tuple[float, float] | None, latency_range_s: tuple[float, float] | None) tuple[ndarray, ndarray, ndarray, ndarray, ndarray]

Build the latency + FWHM independent pre-filter for pairs.

Bourgon+ 2010: pre-filtering on covariates that are approximately null-independent of the test statistic (peak bin location for latency; noise-driven FWHM for FWHM) is statistically sound and shrinks the multiple-testing burden. Prominence is not part of this — it correlates with peak height under the null and would inflate FDR if used as a pre-filter; apply it post-hoc instead.

Parameters:
  • n_unitsN — used to allocate the (N, N) mask arrays.

  • cross_pairs – Iterable of (i, j) pair tuples with i < j.

  • peak_lag(N, N) peak-lag-in-seconds array.

  • peak_fwhm(N, N) peak-FWHM-in-seconds array, inf for un-measured.

  • fwhm_range_s(low, high) FWHM bounds in seconds, or None to disable.

  • latency_range_s(low, high) |lag| bounds in seconds, or None to disable.

Returns:

  • upper_mask, lat_ok, fwhm_ok, prefilter_mask ((N, N) bool arrays) – upper_mask marks the upper triangle of supplied pairs; lat_ok / fwhm_ok are per-pair shape masks (all-True when the corresponding range is None); prefilter_mask is their conjunction.

  • prefilter_pairs ((K, 2) int64 array) – The subset of cross_pairs that survived the pre-filter.

aind_ephys_utils.metrics.connectivity.run_surrogates(trial_segs: TrialSegments, n_trials: int, n_units: int, n_surr: int, pairs: ndarray, rng: Generator, *, bin_size: float, max_lag: float, normalize: str, label: str = '') ndarray

Run derangement surrogates, return a (n_surr, n_pairs) array.

Each surrogate re-pairs trials with a derangement (no fixed points), recomputes the CCG for every pair in pairs, and reduces it to the per-pair peak (max over lag bins). Output rows are in pairs order and are float32 (surrogate corrcoef peaks live in roughly [-1, 1], so float32 precision is ample and halves memory).

Parameters:
  • trial_segs – Trial-clipped spike segments (see clip_spikes_to_trials).

  • n_trials – Number of trials (the derangement length).

  • n_units – Number of units; kept for signature symmetry — the output shape is per-pair, so callers project to (N, N) themselves.

  • n_surr – Number of surrogate draws.

  • pairs(n_pairs, 2) unit-index pairs to evaluate.

  • rng – NumPy random generator.

  • bin_size – CCG kernel parameters.

  • max_lag – CCG kernel parameters.

  • normalize – CCG kernel parameters.

  • label – Prefix for progress log messages.

Returns:

(n_surr, n_pairs) float32 surrogate peak values.

Return type:

np.ndarray

aind_ephys_utils.metrics.connectivity.run_two_stage_mc(*, trial_segs: TrialSegments, n_trials: int, n_units: int, raw_peak_val: ndarray, prefilter_pairs: ndarray, prefilter_mask: ndarray, n_screen: int, n_total: int, bin_size: float, max_lag: float, normalize: str, rng: Generator, fdr_alpha: float) tuple[ndarray, ndarray, int, int, ndarray]

Two-stage Monte Carlo significance testing on pre-filtered pairs.

Stage 1 runs n_screen derangement surrogates over every pre-filtered pair, building coarse p-values. Pairs at the 1/(n_screen + 1) floor become candidates for stage 2, which runs max(n_total - n_screen, 0) additional surrogates so candidate p-values are computed against n_total exchangeable surrogates in total (stage-1 and stage-2 draws come from the same null and are concatenated; nothing is wasted). The achievable minimum p-value for a candidate is 1/(n_total + 1).

Pairs outside the pre-filter never have MC computed and get p = 1, so a Benjamini-Hochberg pass downstream cannot declare them significant.

Parameters:
  • trial_segs – Trial-clipped spike segments.

  • n_trials – Trial and unit counts.

  • n_units – Trial and unit counts.

  • raw_peak_val(N, N) observed peak heights (the test statistic).

  • prefilter_pairs – Output of build_shape_prefilter() — the pair list to test.

  • prefilter_mask(N, N) bool mask matching prefilter_pairs (accepted for API symmetry; the function operates on prefilter_pairs).

  • n_screen – Stage-1 surrogate count.

  • n_total – Target total surrogate count for refined candidates (stage 1 + stage 2). Stage 2 runs max(n_total - n_screen, 0) extra surrogates; if n_total <= n_screen the refinement is a no-op.

  • bin_size – CCG kernel parameters.

  • max_lag – CCG kernel parameters.

  • normalize – CCG kernel parameters.

  • rng – CCG kernel parameters.

  • fdr_alpha – Used for the surrogate threshold percentile returned for plotting.

Returns:

  • p_values ((N, N) array) – Final p-values; held at 1.0 outside the pre-filter.

  • surr_threshold ((N, N) array) – Per-pair (1 - fdr_alpha) percentile of surrogate peaks.

  • n_candidates (int) – Number of pairs that hit the stage-1 floor (= stage-2 size).

  • n_surr_total (int) – n_screen if there were no candidates / no extra surrogates, else n_total.

  • candidate_pairs ((n_candidates, 2) int64 array) – The pairs sent through stage 2 (empty when none).