aind_ephys_utils.metrics package¶

Submodules¶

Module contents¶

Higher-level metrics for exploratory analysis.

The names re-exported here are the main entry points; the full surface (measurement helpers, buffers, etc.) is reachable via the submodules metrics.ccg and metrics.connectivity.

aind_ephys_utils.metrics.NN_to_pair_vec(arr_NN: ndarray, pairs: ndarray) → ndarray¶

Gather a per-pair vector from an (n_units, n_units) array.

Inverse of pair_vec_to_NN() (modulo fill cells outside the pair set), for projecting an existing (N, N) array onto the per-pair representation.

Parameters:

arr_NN – (n_units, n_units) array.
pairs – (n_pairs, 2) array of (i, j) unit-index pairs.

Returns:

Per-pair vector arr_NN[i, j] in pairs row order.

Return type:

np.ndarray

aind_ephys_utils.metrics.build_shape_prefilter(*, n_units: int, cross_pairs: ndarray, peak_lag: ndarray, peak_fwhm: ndarray, fwhm_range_s: tuple[float, float] | None, latency_range_s: tuple[float, float] | None) → tuple[ndarray, ndarray, ndarray, ndarray, ndarray]¶

Build the latency + FWHM independent pre-filter for pairs.

Bourgon+ 2010: pre-filtering on covariates that are approximately null-independent of the test statistic (peak bin location for latency; noise-driven FWHM for FWHM) is statistically sound and shrinks the multiple-testing burden. Prominence is not part of this — it correlates with peak height under the null and would inflate FDR if used as a pre-filter; apply it post-hoc instead.

Parameters:

n_units – N — used to allocate the (N, N) mask arrays.
cross_pairs – Iterable of (i, j) pair tuples with i < j.
peak_lag – (N, N) peak-lag-in-seconds array.
peak_fwhm – (N, N) peak-FWHM-in-seconds array, inf for un-measured.
fwhm_range_s – (low, high) FWHM bounds in seconds, or None to disable.
latency_range_s – (low, high) |lag| bounds in seconds, or None to disable.

Returns:

upper_mask, lat_ok, fwhm_ok, prefilter_mask ((N, N) bool arrays) – upper_mask marks the upper triangle of supplied pairs; lat_ok / fwhm_ok are per-pair shape masks (all-True when the corresponding range is None); prefilter_mask is their conjunction.
prefilter_pairs ((K, 2) int64 array) – The subset of cross_pairs that survived the pre-filter.

aind_ephys_utils.metrics.ccg(spikes: DataArray, *, source_units: Iterable[Any] | None = None, target_units: Iterable[Any] | None = None, pairs: Iterable[tuple[Any, Any]] | ndarray | None = None, pairing: ndarray | Iterable[int] | None = None, window: tuple[float, float] | None = None, bin_size: float = 0.001, max_lag: float = 0.1, normalize: str = 'none', exclude_zero_lag_autocorr: bool = True, include_autocorr: bool = True) → tuple[ndarray, DataArray]¶

Compute CCGs directly from a canonical ragged spikes DataArray.

Single-trial ragged spikes dispatch to the session-wide CCG helpers. Multi-trial ragged spikes are packed internally and dispatched to the trial-paired CCG helper. Unit selectors are resolved by coordinate label when possible and by integer position otherwise.

aind_ephys_utils.metrics.ccg_allpairs_sparse(spike_times_by_unit: list[ndarray], bin_size: float = 0.001, max_lag: float = 0.1, normalize: str = 'none', exclude_zero_lag_autocorr: bool = True, include_autocorr: bool = True, observation_window: float | tuple[float, float] | None = None) → tuple[ndarray, ndarray]¶

Memory-lean all-pairs CCGs using event-driven sweep.

Parameters:

spike_times_by_unit – List of sorted spike time arrays, one per unit.
bin_size – Histogram bin width in seconds.
max_lag – Maximum time lag (symmetric around 0).
normalize – "none", "counts", "rate", "conditional", "unbiased", or "corrcoef" (edge-corrected correlation coefficient).
exclude_zero_lag_autocorr – Zero out the central bin for autocorrelograms.
include_autocorr – Compute autocorrelograms on the diagonal.
observation_window – Total observation time.

Returns:

lags ((B,) float64)
C ((N, N, B) float32 (float64 when normalize="corrcoef")) – C[i,j](τ) = C[j,i](-τ)

aind_ephys_utils.metrics.ccg_between_sets_sparse(spike_times_set1: list[ndarray], spike_times_set2: list[ndarray], bin_size: float = 0.001, max_lag: float = 0.1, normalize: str = 'none', observation_window: float | tuple[float, float] | None = None) → tuple[ndarray, ndarray]¶

Memory-lean CCGs between two sets of spike trains.

Parameters:

spike_times_set1 – M spike trains (reference/source units).
spike_times_set2 – N spike trains (target units).
bin_size – Histogram bin width in seconds.
max_lag – Maximum time lag (symmetric around 0).
normalize – "none", "counts", "rate", "conditional", "unbiased", or "corrcoef" (edge-corrected correlation coefficient matching SpikeAnalysis.jl).
observation_window – Total observation time. Float for duration, tuple (start, end) for explicit bounds, or None to infer from data (not recommended).

Returns:

lags ((B,) float64)
C ((M, N, B) float32 (float64 when normalize="corrcoef"))

aind_ephys_utils.metrics.ccg_trial_paired(trial_segments: TrialSegments, pairing: ndarray, *, bin_size: float = 0.001, max_lag: float = 0.1, normalize: str = 'corrcoef', exclude_zero_lag_autocorr: bool = True, include_autocorr: bool = True, pairs: ndarray | None = None, buffers: _CCGBuffers | None = None) → tuple[ndarray, ndarray]¶

Compute CCGs with a given trial pairing.

The pairs x trials loop runs entirely inside numba with prange over pairs. Use clip_spikes_to_trials() once to build the TrialSegments, then call this function repeatedly with different pairing arrays.

Parameters:

trial_segments – Output of clip_spikes_to_trials().
pairing – (n_trials,) integer array. pairing[k] is the trial index for the target unit when the reference unit uses trial k. Identity (np.arange(n_trials)) for raw CCG.
bin_size – Histogram bin width (seconds).
max_lag – Maximum lag (seconds).
normalize – "none", "counts", "rate", "conditional", "unbiased", or "corrcoef".
exclude_zero_lag_autocorr – Zero the central bin for autocorrelograms.
include_autocorr – Compute autocorrelograms on the diagonal.
pairs – Optional (n_pairs, 2) int64 array of (i, j) unit index pairs to compute. If None, computes all upper-triangle pairs. Use this to restrict to e.g. cross-region pairs.
buffers – Pre-allocated buffers from _CCGBuffers.for_segments(). If None, buffers are allocated internally (slower for repeated calls). Must have been created with the same pairs.

Returns:

lags ((B,) float64)
C (float64) – Shape (N, N, B) when pairs is None (with auto/cross symmetric mirror). Shape (n_pairs, B) when pairs is provided, with row p corresponding to the input pair pairs[p]. In the per-pair case the lower-triangle mirror is not applied; reverse C[p, :] to obtain the (j, i) CCG. This avoids the (N, N, B) materialisation entirely when callers only need a subset of pairs (e.g. cross-region pairs for surrogate testing).

Notes

corrcoef normalization (normalize="corrcoef"):

Produces a unitless correlation coefficient that is zero when the two spike trains are independent and conditionally uniform within each trial. Derived from first principles (see SpikeAnalysis.jl xcorr_discrete_normed for the full derivation).

For each pair (i, j) the normalized value at lag bin b is:

C[b] = (h[b] - E[b]) / sqrt(A_i * A_j)

h[b] is the raw coincidence count accumulated across all trials by the numba kernel (_ccg_all_pairs_trials).

E[b] is the expected count under within-trial uniformity with edge correction:

E[b] = ec_shape[b] * sum_k( n_i(k) * n_j(pairing[k]) )

where ec_shape[b] = scale[b] * w[b] * (d - binstop[b] + w[b]/2) / d^2 and the sum is over trials k. The per-trial product n_i(k) * n_j(pairing[k]) respects trial-by-trial rate variation: trials where both neurons fire more contribute proportionally more expected coincidences. When trial durations are uniform (after clip_to_window), ec_shape factors out and the per-trial products are computed as a single matrix multiply (trial_counts @ paired_counts.T).

The centre bin receives coincidences from both positive and negative lags, so scale = 2 and w = bin_size / 2 for that bin; side bins have scale = 1 and w = bin_size.

A_i, A_j are the edge-corrected auto-correlation terms, each summed across trials:

A_i = sum_k [ count_auto_first(u_i_k, bin_size)
              - 2 * E_auto(n_i_k, d_k) ]

count_auto_first counts spike pairs within bin_size of each other (including self-pairs). The factor of 2 on the expected auto count (rather than 1) matches the cross-correlation context: the denominator must normalize the two-sided cross-histogram at lag 0.

aind_ephys_utils.metrics.ccg_trial_surrogates(trial_segments: TrialSegments, pairings: Iterable[ndarray], *, bin_size: float = 0.001, max_lag: float = 0.1, normalize: str = 'corrcoef', exclude_zero_lag_autocorr: bool = True, include_autocorr: bool = True, pairs: ndarray | None = None, reduce: Callable[[ndarray, ndarray], ndarray] | None = None) → Iterator[ndarray]¶

Compute trial-paired CCGs for many pairings with double-buffered pipeline.

Overlaps the numba kernel (worker thread) with Python scatter/normalization (main thread) for ~30% throughput improvement over sequential ccg_trial_paired() calls.

Parameters:

trial_segments – Output of clip_spikes_to_trials().
pairings – Iterable of (n_trials,) integer pairing arrays.
bin_size – Histogram bin width (seconds).
max_lag – Maximum lag (seconds).
normalize – Normalization mode.
exclude_zero_lag_autocorr – Zero the central bin for autocorrelograms.
include_autocorr – Compute autocorrelograms on the diagonal.
reduce – Optional reduction (lags, C) -> result. Applied during scatter so the full CCG is never returned.

Yields:

np.ndarray – The reduced result if reduce is provided, otherwise the per-pair CCG.

When pairs is None: the un-reduced CCG is shape (N, N, B) with the cross/auto symmetric structure, and reduce is called with that shape.

When pairs is provided: the un-reduced CCG is shape (n_pairs, B), with row p corresponding to the input pair pairs[p]. reduce is called with this per-pair layout — much cheaper for np.max(C, axis=-1)-style reductions when n_pairs ≪ N². The lower triangle is not mirrored; reverse C[p, :] to obtain the (j, i) CCG from a (i, j) row.

aind_ephys_utils.metrics.clip_spikes_to_trials(spike_times_by_unit: list[ndarray], trial_epochs: ndarray, align_times: ndarray | None = None) → TrialSegments¶

Clip each unit’s spikes into per-trial segments, relative to an alignment time.

Parameters:

spike_times_by_unit – Sorted spike time arrays, one per unit.
trial_epochs – (n_trials, 2) array of [start, stop] times.
align_times – (n_trials,) alignment times (t=0 in the output). Defaults to trial starts (left edge).

aind_ephys_utils.metrics.measure_fwhm(ccg_1d: ndarray, lags_1d: ndarray, *, search_half_width: float | None = None, half_height_mode: str = 'absolute') → float¶

Estimate FWHM of the tallest peak in a 1-D CCG slice.

By default uses absolute half-max (the textbook FWHM definition: width at 0.5 * peak_height), which is stable for broad peaks and appropriate for shift-corrected CCGs whose baseline is already zero-mean. Pass half_height_mode= "prominence" for the legacy scipy behaviour (width at peak_height − 0.5 * prominence).

No smoothing happens here — pass an already-smoothed slice if you want denoising. Pre-smoothing once at the array level (e.g. scipy.ndimage.gaussian_filter1d() along the lag axis of an (N, N, K) shift-corrected CCG) is cheaper than per-pair smoothing and avoids edge effects from smoothing-after-windowing.

Parameters:

ccg_1d – 1-D correlogram values (e.g. one row of the corrcoef matrix).
lags_1d – Corresponding lag values (seconds). Must be uniformly spaced.
search_half_width – If given, restrict measurement to lags within [-search_half_width, search_half_width] (same units as lags_1d). Prevents distant baseline wiggles from bounding peak_widths’ outward walk.
half_height_mode – "absolute" (default) or "prominence". See above.

Returns:

Full width at half maximum in the same units as lags_1d. Returns np.inf if no positive peak is found.

Return type:

float

aind_ephys_utils.metrics.measure_prominence(ccg_1d: ndarray, lags_1d: ndarray, *, search_half_width: float | None = None) → float¶

Measure the prominence of the tallest peak in a 1-D CCG slice.

Prominence quantifies how much a peak stands out from its surrounding baseline. Uses scipy.signal.peak_prominences().

No smoothing happens here — pass an already-smoothed slice if you want denoising. See measure_fwhm() for guidance.

Parameters:

ccg_1d – 1-D correlogram values (e.g. one row of the corrcoef matrix).
lags_1d – Corresponding lag values (seconds).
search_half_width – If given, restrict measurement to lags within [-search_half_width, search_half_width].

Returns:

Prominence of the tallest peak (same units as ccg_1d). Returns 0.0 if no positive peak is found.

Return type:

float

aind_ephys_utils.metrics.monte_carlo_pvalue(t_obs: ndarray, t_surr: ndarray, tail: str = 'upper') → ndarray¶

Monte Carlo p-value with correct finite-sample correction.

Implements p = (1 + sum(I(t_b >= t_obs))) / (B + 1) (Davison & Hinkley, 1997) which accounts for the observed statistic being one of the B + 1 values under the null.

Parameters:

t_obs – Observed test statistic, any shape.
t_surr – Surrogate statistics, shape (B, *t_obs.shape) where B is the number of surrogates.
tail – "upper" — p = P(T >= t_obs) (right tail, e.g., peak height). "lower" — p = P(T <= t_obs) (left tail). "both" — p = P(|T| >= |t_obs|) (two-tailed).

Returns:

p-values, same shape as t_obs.

Return type:

np.ndarray

aind_ephys_utils.metrics.pair_vec_to_NN(pair_values: ndarray, pairs: ndarray, n_units: int, *, fill: float = 0.0, mirror: bool = True, dtype: object = None) → ndarray¶

Scatter a per-pair vector into an (n_units, n_units) array.

The sparse per-pair representation used by the trial-paired / surrogate CCG path (and by ccg_between_sets_sparse) is projected back into the dense (N, N) matrix that arr[i, j] lookups expect. Cells outside pairs (and their mirror, when mirror=True) hold fill.

Parameters:

pair_values – Per-pair values in the same row order as pairs.
pairs – (n_pairs, 2) array of (i, j) unit-index pairs.
n_units – N — size of the square output.
fill – Value for cells not covered by pairs.
mirror – If True, also write each value at the transposed cell (j, i) (symmetric fill).
dtype – Output dtype; defaults to pair_values.dtype.

Returns:

(n_units, n_units) array.

Return type:

np.ndarray

aind_ephys_utils.metrics.rescale_ccgs(C: ndarray, axis: int = -1, eps: float = 1e-12) → ndarray¶: Rescale correlograms so each (i,j) slice has min=0 and max=1.

aind_ephys_utils.metrics.run_surrogates(trial_segs: TrialSegments, n_trials: int, n_units: int, n_surr: int, pairs: ndarray, rng: Generator, *, bin_size: float, max_lag: float, normalize: str, label: str = '') → ndarray¶

Run derangement surrogates, return a (n_surr, n_pairs) array.

Each surrogate re-pairs trials with a derangement (no fixed points), recomputes the CCG for every pair in pairs, and reduces it to the per-pair peak (max over lag bins). Output rows are in pairs order and are float32 (surrogate corrcoef peaks live in roughly [-1, 1], so float32 precision is ample and halves memory).

Parameters:

trial_segs – Trial-clipped spike segments (see clip_spikes_to_trials).
n_trials – Number of trials (the derangement length).
n_units – Number of units; kept for signature symmetry — the output shape is per-pair, so callers project to (N, N) themselves.
n_surr – Number of surrogate draws.
pairs – (n_pairs, 2) unit-index pairs to evaluate.
rng – NumPy random generator.
bin_size – CCG kernel parameters.
max_lag – CCG kernel parameters.
normalize – CCG kernel parameters.
label – Prefix for progress log messages.

Returns:

(n_surr, n_pairs) float32 surrogate peak values.

Return type:

np.ndarray

aind_ephys_utils.metrics.run_two_stage_mc(*, trial_segs: TrialSegments, n_trials: int, n_units: int, raw_peak_val: ndarray, prefilter_pairs: ndarray, prefilter_mask: ndarray, n_screen: int, n_total: int, bin_size: float, max_lag: float, normalize: str, rng: Generator, fdr_alpha: float) → tuple[ndarray, ndarray, int, int, ndarray]¶

Two-stage Monte Carlo significance testing on pre-filtered pairs.

Stage 1 runs n_screen derangement surrogates over every pre-filtered pair, building coarse p-values. Pairs at the 1/(n_screen + 1) floor become candidates for stage 2, which runs max(n_total - n_screen, 0) additional surrogates so candidate p-values are computed against n_total exchangeable surrogates in total (stage-1 and stage-2 draws come from the same null and are concatenated; nothing is wasted). The achievable minimum p-value for a candidate is 1/(n_total + 1).

Pairs outside the pre-filter never have MC computed and get p = 1, so a Benjamini-Hochberg pass downstream cannot declare them significant.

Parameters:

trial_segs – Trial-clipped spike segments.
n_trials – Trial and unit counts.
n_units – Trial and unit counts.
raw_peak_val – (N, N) observed peak heights (the test statistic).
prefilter_pairs – Output of build_shape_prefilter() — the pair list to test.
prefilter_mask – (N, N) bool mask matching prefilter_pairs (accepted for API symmetry; the function operates on prefilter_pairs).
n_screen – Stage-1 surrogate count.
n_total – Target total surrogate count for refined candidates (stage 1 + stage 2). Stage 2 runs max(n_total - n_screen, 0) extra surrogates; if n_total <= n_screen the refinement is a no-op.
bin_size – CCG kernel parameters.
max_lag – CCG kernel parameters.
normalize – CCG kernel parameters.
rng – CCG kernel parameters.
fdr_alpha – Used for the surrogate threshold percentile returned for plotting.

Returns:

p_values ((N, N) array) – Final p-values; held at 1.0 outside the pre-filter.
surr_threshold ((N, N) array) – Per-pair (1 - fdr_alpha) percentile of surrogate peaks.
n_candidates (int) – Number of pairs that hit the stage-1 floor (= stage-2 size).
n_surr_total (int) – n_screen if there were no candidates / no extra surrogates, else n_total.
candidate_pairs ((n_candidates, 2) int64 array) – The pairs sent through stage 2 (empty when none).

aind_ephys_utils.metrics.smooth_ccgs(C: ndarray, bin_size: float, kernel_width: float = 0.005) → ndarray¶

Smooth correlograms with a Gaussian kernel.

Parameters:

C ((N, N, B)) – Raw correlogram counts.
bin_size – Bin width in seconds.
kernel_width – Gaussian kernel std in seconds.

aind_ephys_utils.metrics.spike_latency(times, events, interval, use_psth=True, std_above_baseline=2, bin_size=0.001)¶

Computes latency of spikes to a set of events.

If use_psth is set to True, the latency will be computed using the average PSTH, rather than spikes on individual trials.

If use_psth is set to False, the latency will be computed as the median time to first spike on individual trials.

Parameters:

times (ndarray) – 1-D sequence of times to align (in seconds). Must be sorted in ascending order.
events (ndarray) – 1-D sequence of reference times (in seconds).
interval (tuple) – First value = baseline interval (ignored if use_psth = False) Second value = maximum latency
use_psth (bool, optional) – Set to True to use PSTH method, False otherwise
std_above_baseline (float, optional) – Determines threshold for response onset Latency = first value above Mean + Std * T
bin_size (float, optional) – Determines bin size (in seconds) for PSTH method

Returns:

if use_psth = True
latency (float) – First spike latency in s
psth (float) – The PSTH used to compute latency
if use_psth = False
first_spike_latency (float) – First spike latency in s
individual_latencies (ndarray) – Latency values for all trials with spikes