aind_ephys_utils.metrics package¶
Submodules¶
- aind_ephys_utils.metrics.ccg module
- Lag convention
NN_to_pair_vec()TrialSegmentsTrialSegments.segmentsTrialSegments.durationsTrialSegments.preTrialSegments.postTrialSegments.all_spikesTrialSegments.offsetsTrialSegments.countsTrialSegments.all_spikesTrialSegments.countsTrialSegments.durationsTrialSegments.offsetsTrialSegments.postTrialSegments.preTrialSegments.segments
cc_leads_over_std()ccg()ccg_allpairs_sparse()ccg_between_sets_sparse()ccg_trial_paired()ccg_trial_surrogates()clip_spikes_to_trials()clip_to_window()derangements()jitter_spikes_times()measure_fwhm()measure_per_pair()measure_prominence()monte_carlo_pvalue()pair_vec_to_NN()peak_contributing_spikes()rescale_ccgs()rescale_ccgs_zero_mean()smooth_ccgs()
- aind_ephys_utils.metrics.connectivity module
- aind_ephys_utils.metrics.latency module
Module contents¶
Higher-level metrics for exploratory analysis.
The names re-exported here are the main entry points; the full surface
(measurement helpers, buffers, etc.) is reachable via the submodules
metrics.ccg and metrics.connectivity.
- aind_ephys_utils.metrics.NN_to_pair_vec(arr_NN: ndarray, pairs: ndarray) ndarray¶
Gather a per-pair vector from an
(n_units, n_units)array.Inverse of
pair_vec_to_NN()(modulofillcells outside the pair set), for projecting an existing(N, N)array onto the per-pair representation.- Parameters:
arr_NN –
(n_units, n_units)array.pairs –
(n_pairs, 2)array of(i, j)unit-index pairs.
- Returns:
Per-pair vector
arr_NN[i, j]inpairsrow order.- Return type:
np.ndarray
- aind_ephys_utils.metrics.build_shape_prefilter(*, n_units: int, cross_pairs: ndarray, peak_lag: ndarray, peak_fwhm: ndarray, fwhm_range_s: tuple[float, float] | None, latency_range_s: tuple[float, float] | None) tuple[ndarray, ndarray, ndarray, ndarray, ndarray]¶
Build the latency + FWHM independent pre-filter for pairs.
Bourgon+ 2010: pre-filtering on covariates that are approximately null-independent of the test statistic (peak bin location for latency; noise-driven FWHM for FWHM) is statistically sound and shrinks the multiple-testing burden. Prominence is not part of this — it correlates with peak height under the null and would inflate FDR if used as a pre-filter; apply it post-hoc instead.
- Parameters:
n_units –
N— used to allocate the(N, N)mask arrays.cross_pairs – Iterable of
(i, j)pair tuples withi < j.peak_lag –
(N, N)peak-lag-in-seconds array.peak_fwhm –
(N, N)peak-FWHM-in-seconds array,inffor un-measured.fwhm_range_s –
(low, high)FWHM bounds in seconds, orNoneto disable.latency_range_s –
(low, high)|lag|bounds in seconds, orNoneto disable.
- Returns:
upper_mask, lat_ok, fwhm_ok, prefilter_mask ((N, N) bool arrays) –
upper_maskmarks the upper triangle of supplied pairs;lat_ok/fwhm_okare per-pair shape masks (all-Truewhen the corresponding range isNone);prefilter_maskis their conjunction.prefilter_pairs ((K, 2) int64 array) – The subset of
cross_pairsthat survived the pre-filter.
- aind_ephys_utils.metrics.ccg(spikes: DataArray, *, source_units: Iterable[Any] | None = None, target_units: Iterable[Any] | None = None, pairs: Iterable[tuple[Any, Any]] | ndarray | None = None, pairing: ndarray | Iterable[int] | None = None, window: tuple[float, float] | None = None, bin_size: float = 0.001, max_lag: float = 0.1, normalize: str = 'none', exclude_zero_lag_autocorr: bool = True, include_autocorr: bool = True) tuple[ndarray, DataArray]¶
Compute CCGs directly from a canonical ragged spikes DataArray.
Single-trial ragged spikes dispatch to the session-wide CCG helpers. Multi-trial ragged spikes are packed internally and dispatched to the trial-paired CCG helper. Unit selectors are resolved by coordinate label when possible and by integer position otherwise.
- aind_ephys_utils.metrics.ccg_allpairs_sparse(spike_times_by_unit: list[ndarray], bin_size: float = 0.001, max_lag: float = 0.1, normalize: str = 'none', exclude_zero_lag_autocorr: bool = True, include_autocorr: bool = True, observation_window: float | tuple[float, float] | None = None) tuple[ndarray, ndarray]¶
Memory-lean all-pairs CCGs using event-driven sweep.
- Parameters:
spike_times_by_unit – List of sorted spike time arrays, one per unit.
bin_size – Histogram bin width in seconds.
max_lag – Maximum time lag (symmetric around 0).
normalize –
"none","counts","rate","conditional","unbiased", or"corrcoef"(edge-corrected correlation coefficient).exclude_zero_lag_autocorr – Zero out the central bin for autocorrelograms.
include_autocorr – Compute autocorrelograms on the diagonal.
observation_window – Total observation time.
- Returns:
lags ((B,) float64)
C ((N, N, B) float32 (float64 when
normalize="corrcoef")) –C[i,j](τ) = C[j,i](-τ)
- aind_ephys_utils.metrics.ccg_between_sets_sparse(spike_times_set1: list[ndarray], spike_times_set2: list[ndarray], bin_size: float = 0.001, max_lag: float = 0.1, normalize: str = 'none', observation_window: float | tuple[float, float] | None = None) tuple[ndarray, ndarray]¶
Memory-lean CCGs between two sets of spike trains.
- Parameters:
spike_times_set1 – M spike trains (reference/source units).
spike_times_set2 – N spike trains (target units).
bin_size – Histogram bin width in seconds.
max_lag – Maximum time lag (symmetric around 0).
normalize –
"none","counts","rate","conditional","unbiased", or"corrcoef"(edge-corrected correlation coefficient matchingSpikeAnalysis.jl).observation_window – Total observation time. Float for duration, tuple
(start, end)for explicit bounds, orNoneto infer from data (not recommended).
- Returns:
lags ((B,) float64)
C ((M, N, B) float32 (float64 when
normalize="corrcoef"))
- aind_ephys_utils.metrics.ccg_trial_paired(trial_segments: TrialSegments, pairing: ndarray, *, bin_size: float = 0.001, max_lag: float = 0.1, normalize: str = 'corrcoef', exclude_zero_lag_autocorr: bool = True, include_autocorr: bool = True, pairs: ndarray | None = None, buffers: _CCGBuffers | None = None) tuple[ndarray, ndarray]¶
Compute CCGs with a given trial pairing.
The pairs x trials loop runs entirely inside numba with
prangeover pairs. Useclip_spikes_to_trials()once to build theTrialSegments, then call this function repeatedly with different pairing arrays.- Parameters:
trial_segments – Output of
clip_spikes_to_trials().pairing –
(n_trials,)integer array.pairing[k]is the trial index for the target unit when the reference unit uses trial k. Identity (np.arange(n_trials)) for raw CCG.bin_size – Histogram bin width (seconds).
max_lag – Maximum lag (seconds).
normalize –
"none","counts","rate","conditional","unbiased", or"corrcoef".exclude_zero_lag_autocorr – Zero the central bin for autocorrelograms.
include_autocorr – Compute autocorrelograms on the diagonal.
pairs – Optional
(n_pairs, 2)int64 array of(i, j)unit index pairs to compute. IfNone, computes all upper-triangle pairs. Use this to restrict to e.g. cross-region pairs.buffers – Pre-allocated buffers from
_CCGBuffers.for_segments(). IfNone, buffers are allocated internally (slower for repeated calls). Must have been created with the same pairs.
- Returns:
lags ((B,) float64)
C (float64) – Shape
(N, N, B)whenpairsisNone(with auto/cross symmetric mirror). Shape(n_pairs, B)whenpairsis provided, with rowpcorresponding to the input pairpairs[p]. In the per-pair case the lower-triangle mirror is not applied; reverseC[p, :]to obtain the(j, i)CCG. This avoids the (N, N, B) materialisation entirely when callers only need a subset of pairs (e.g. cross-region pairs for surrogate testing).
Notes
corrcoef normalization (
normalize="corrcoef"):Produces a unitless correlation coefficient that is zero when the two spike trains are independent and conditionally uniform within each trial. Derived from first principles (see
SpikeAnalysis.jlxcorr_discrete_normedfor the full derivation).For each pair
(i, j)the normalized value at lag binbis:C[b] = (h[b] - E[b]) / sqrt(A_i * A_j)
h[b] is the raw coincidence count accumulated across all trials by the numba kernel (
_ccg_all_pairs_trials).E[b] is the expected count under within-trial uniformity with edge correction:
E[b] = ec_shape[b] * sum_k( n_i(k) * n_j(pairing[k]) )
where
ec_shape[b] = scale[b] * w[b] * (d - binstop[b] + w[b]/2) / d^2and the sum is over trialsk. The per-trial productn_i(k) * n_j(pairing[k])respects trial-by-trial rate variation: trials where both neurons fire more contribute proportionally more expected coincidences. When trial durations are uniform (afterclip_to_window),ec_shapefactors out and the per-trial products are computed as a single matrix multiply (trial_counts @ paired_counts.T).The centre bin receives coincidences from both positive and negative lags, so
scale = 2andw = bin_size / 2for that bin; side bins havescale = 1andw = bin_size.A_i, A_j are the edge-corrected auto-correlation terms, each summed across trials:
A_i = sum_k [ count_auto_first(u_i_k, bin_size) - 2 * E_auto(n_i_k, d_k) ]
count_auto_firstcounts spike pairs withinbin_sizeof each other (including self-pairs). The factor of 2 on the expected auto count (rather than 1) matches the cross-correlation context: the denominator must normalize the two-sided cross-histogram at lag 0.
- aind_ephys_utils.metrics.ccg_trial_surrogates(trial_segments: TrialSegments, pairings: Iterable[ndarray], *, bin_size: float = 0.001, max_lag: float = 0.1, normalize: str = 'corrcoef', exclude_zero_lag_autocorr: bool = True, include_autocorr: bool = True, pairs: ndarray | None = None, reduce: Callable[[ndarray, ndarray], ndarray] | None = None) Iterator[ndarray]¶
Compute trial-paired CCGs for many pairings with double-buffered pipeline.
Overlaps the numba kernel (worker thread) with Python scatter/normalization (main thread) for ~30% throughput improvement over sequential
ccg_trial_paired()calls.- Parameters:
trial_segments – Output of
clip_spikes_to_trials().pairings – Iterable of
(n_trials,)integer pairing arrays.bin_size – Histogram bin width (seconds).
max_lag – Maximum lag (seconds).
normalize – Normalization mode.
exclude_zero_lag_autocorr – Zero the central bin for autocorrelograms.
include_autocorr – Compute autocorrelograms on the diagonal.
reduce – Optional reduction
(lags, C) -> result. Applied during scatter so the full CCG is never returned.
- Yields:
np.ndarray – The reduced result if reduce is provided, otherwise the per-pair CCG.
When
pairsisNone: the un-reduced CCG is shape(N, N, B)with the cross/auto symmetric structure, andreduceis called with that shape.When
pairsis provided: the un-reduced CCG is shape(n_pairs, B), with rowpcorresponding to the input pairpairs[p].reduceis called with this per-pair layout — much cheaper fornp.max(C, axis=-1)-style reductions whenn_pairs ≪ N². The lower triangle is not mirrored; reverseC[p, :]to obtain the(j, i)CCG from a(i, j)row.
- aind_ephys_utils.metrics.clip_spikes_to_trials(spike_times_by_unit: list[ndarray], trial_epochs: ndarray, align_times: ndarray | None = None) TrialSegments¶
Clip each unit’s spikes into per-trial segments, relative to an alignment time.
- Parameters:
spike_times_by_unit – Sorted spike time arrays, one per unit.
trial_epochs –
(n_trials, 2)array of[start, stop]times.align_times –
(n_trials,)alignment times (t=0 in the output). Defaults to trial starts (left edge).
- aind_ephys_utils.metrics.measure_fwhm(ccg_1d: ndarray, lags_1d: ndarray, *, search_half_width: float | None = None, half_height_mode: str = 'absolute') float¶
Estimate FWHM of the tallest peak in a 1-D CCG slice.
By default uses absolute half-max (the textbook FWHM definition: width at
0.5 * peak_height), which is stable for broad peaks and appropriate for shift-corrected CCGs whose baseline is already zero-mean. Passhalf_height_mode= "prominence"for the legacy scipy behaviour (width atpeak_height − 0.5 * prominence).No smoothing happens here — pass an already-smoothed slice if you want denoising. Pre-smoothing once at the array level (e.g.
scipy.ndimage.gaussian_filter1d()along the lag axis of an(N, N, K)shift-corrected CCG) is cheaper than per-pair smoothing and avoids edge effects from smoothing-after-windowing.- Parameters:
ccg_1d – 1-D correlogram values (e.g. one row of the corrcoef matrix).
lags_1d – Corresponding lag values (seconds). Must be uniformly spaced.
search_half_width – If given, restrict measurement to lags within
[-search_half_width, search_half_width](same units as lags_1d). Prevents distant baseline wiggles from boundingpeak_widths’ outward walk.half_height_mode –
"absolute"(default) or"prominence". See above.
- Returns:
Full width at half maximum in the same units as lags_1d. Returns
np.infif no positive peak is found.- Return type:
float
- aind_ephys_utils.metrics.measure_prominence(ccg_1d: ndarray, lags_1d: ndarray, *, search_half_width: float | None = None) float¶
Measure the prominence of the tallest peak in a 1-D CCG slice.
Prominence quantifies how much a peak stands out from its surrounding baseline. Uses
scipy.signal.peak_prominences().No smoothing happens here — pass an already-smoothed slice if you want denoising. See
measure_fwhm()for guidance.- Parameters:
ccg_1d – 1-D correlogram values (e.g. one row of the corrcoef matrix).
lags_1d – Corresponding lag values (seconds).
search_half_width – If given, restrict measurement to lags within
[-search_half_width, search_half_width].
- Returns:
Prominence of the tallest peak (same units as ccg_1d). Returns
0.0if no positive peak is found.- Return type:
float
- aind_ephys_utils.metrics.monte_carlo_pvalue(t_obs: ndarray, t_surr: ndarray, tail: str = 'upper') ndarray¶
Monte Carlo p-value with correct finite-sample correction.
Implements
p = (1 + sum(I(t_b >= t_obs))) / (B + 1)(Davison & Hinkley, 1997) which accounts for the observed statistic being one of theB + 1values under the null.- Parameters:
- Returns:
p-values, same shape as t_obs.
- Return type:
np.ndarray
- aind_ephys_utils.metrics.pair_vec_to_NN(pair_values: ndarray, pairs: ndarray, n_units: int, *, fill: float = 0.0, mirror: bool = True, dtype: object = None) ndarray¶
Scatter a per-pair vector into an
(n_units, n_units)array.The sparse per-pair representation used by the trial-paired / surrogate CCG path (and by
ccg_between_sets_sparse) is projected back into the dense(N, N)matrix thatarr[i, j]lookups expect. Cells outsidepairs(and their mirror, whenmirror=True) holdfill.- Parameters:
pair_values – Per-pair values in the same row order as
pairs.pairs –
(n_pairs, 2)array of(i, j)unit-index pairs.n_units –
N— size of the square output.fill – Value for cells not covered by
pairs.mirror – If
True, also write each value at the transposed cell(j, i)(symmetric fill).dtype – Output dtype; defaults to
pair_values.dtype.
- Returns:
(n_units, n_units)array.- Return type:
np.ndarray
- aind_ephys_utils.metrics.rescale_ccgs(C: ndarray, axis: int = -1, eps: float = 1e-12) ndarray¶
Rescale correlograms so each (i,j) slice has min=0 and max=1.
- aind_ephys_utils.metrics.run_surrogates(trial_segs: TrialSegments, n_trials: int, n_units: int, n_surr: int, pairs: ndarray, rng: Generator, *, bin_size: float, max_lag: float, normalize: str, label: str = '') ndarray¶
Run derangement surrogates, return a
(n_surr, n_pairs)array.Each surrogate re-pairs trials with a derangement (no fixed points), recomputes the CCG for every pair in
pairs, and reduces it to the per-pair peak (max over lag bins). Output rows are inpairsorder and arefloat32(surrogate corrcoef peaks live in roughly[-1, 1], so float32 precision is ample and halves memory).- Parameters:
trial_segs – Trial-clipped spike segments (see
clip_spikes_to_trials).n_trials – Number of trials (the derangement length).
n_units – Number of units; kept for signature symmetry — the output shape is per-pair, so callers project to
(N, N)themselves.n_surr – Number of surrogate draws.
pairs –
(n_pairs, 2)unit-index pairs to evaluate.rng – NumPy random generator.
bin_size – CCG kernel parameters.
max_lag – CCG kernel parameters.
normalize – CCG kernel parameters.
label – Prefix for progress log messages.
- Returns:
(n_surr, n_pairs)float32 surrogate peak values.- Return type:
np.ndarray
- aind_ephys_utils.metrics.run_two_stage_mc(*, trial_segs: TrialSegments, n_trials: int, n_units: int, raw_peak_val: ndarray, prefilter_pairs: ndarray, prefilter_mask: ndarray, n_screen: int, n_total: int, bin_size: float, max_lag: float, normalize: str, rng: Generator, fdr_alpha: float) tuple[ndarray, ndarray, int, int, ndarray]¶
Two-stage Monte Carlo significance testing on pre-filtered pairs.
Stage 1 runs
n_screenderangement surrogates over every pre-filtered pair, building coarse p-values. Pairs at the1/(n_screen + 1)floor become candidates for stage 2, which runsmax(n_total - n_screen, 0)additional surrogates so candidate p-values are computed againstn_totalexchangeable surrogates in total (stage-1 and stage-2 draws come from the same null and are concatenated; nothing is wasted). The achievable minimum p-value for a candidate is1/(n_total + 1).Pairs outside the pre-filter never have MC computed and get
p = 1, so a Benjamini-Hochberg pass downstream cannot declare them significant.- Parameters:
trial_segs – Trial-clipped spike segments.
n_trials – Trial and unit counts.
n_units – Trial and unit counts.
raw_peak_val –
(N, N)observed peak heights (the test statistic).prefilter_pairs – Output of
build_shape_prefilter()— the pair list to test.prefilter_mask –
(N, N)bool mask matchingprefilter_pairs(accepted for API symmetry; the function operates onprefilter_pairs).n_screen – Stage-1 surrogate count.
n_total – Target total surrogate count for refined candidates (stage 1 + stage 2). Stage 2 runs
max(n_total - n_screen, 0)extra surrogates; ifn_total <= n_screenthe refinement is a no-op.bin_size – CCG kernel parameters.
max_lag – CCG kernel parameters.
normalize – CCG kernel parameters.
rng – CCG kernel parameters.
fdr_alpha – Used for the surrogate threshold percentile returned for plotting.
- Returns:
p_values ((N, N) array) – Final p-values; held at 1.0 outside the pre-filter.
surr_threshold ((N, N) array) – Per-pair
(1 - fdr_alpha)percentile of surrogate peaks.n_candidates (int) – Number of pairs that hit the stage-1 floor (= stage-2 size).
n_surr_total (int) –
n_screenif there were no candidates / no extra surrogates, elsen_total.candidate_pairs ((n_candidates, 2) int64 array) – The pairs sent through stage 2 (empty when none).
- aind_ephys_utils.metrics.smooth_ccgs(C: ndarray, bin_size: float, kernel_width: float = 0.005) ndarray¶
Smooth correlograms with a Gaussian kernel.
- Parameters:
C ((N, N, B)) – Raw correlogram counts.
bin_size – Bin width in seconds.
kernel_width – Gaussian kernel std in seconds.
- aind_ephys_utils.metrics.spike_latency(times, events, interval, use_psth=True, std_above_baseline=2, bin_size=0.001)¶
Computes latency of spikes to a set of events.
If use_psth is set to True, the latency will be computed using the average PSTH, rather than spikes on individual trials.
If use_psth is set to False, the latency will be computed as the median time to first spike on individual trials.
- Parameters:
times (ndarray) – 1-D sequence of times to align (in seconds). Must be sorted in ascending order.
events (ndarray) – 1-D sequence of reference times (in seconds).
interval (tuple) – First value = baseline interval (ignored if use_psth = False) Second value = maximum latency
use_psth (bool, optional) – Set to True to use PSTH method, False otherwise
std_above_baseline (float, optional) – Determines threshold for response onset Latency = first value above Mean + Std * T
bin_size (float, optional) – Determines bin size (in seconds) for PSTH method
- Returns:
if use_psth = True
latency (float) – First spike latency in s
psth (float) – The PSTH used to compute latency
if use_psth = False
first_spike_latency (float) – First spike latency in s
individual_latencies (ndarray) – Latency values for all trials with spikes