aind_ephys_utils.metrics.ccg module

Cross-correlogram (CCG) computation for spike trains.

Requires the numba optional dependency for the core kernel (install with pip install aind-ephys-utils[numba]).

Lag convention

The CCG C[i, j] is computed as the histogram of t_j - t_i:

  • Positive lag → unit j fires after unit ii drives j.

  • Negative lag → unit j fires before unit ij drives i.

So a peak at positive lag in C[i, j] is evidence that ij (e.g., a monosynaptic excitatory connection from i to j). C[j, i] is the time-reverse: C[j, i, k] == C[i, j, -k].

The kernel uses left-closed bin assignment (floor) matching the convention np.histogram. The "corrcoef" normalization mode subtracts edge-corrected expected counts and divides by sqrt(auto_u * auto_v) to produce a proper correlation coefficient, matching the original Julia implementation of these functions.

aind_ephys_utils.metrics.ccg.NN_to_pair_vec(arr_NN: ndarray, pairs: ndarray) ndarray

Gather a per-pair vector from an (n_units, n_units) array.

Inverse of pair_vec_to_NN() (modulo fill cells outside the pair set), for projecting an existing (N, N) array onto the per-pair representation.

Parameters:
  • arr_NN(n_units, n_units) array.

  • pairs(n_pairs, 2) array of (i, j) unit-index pairs.

Returns:

Per-pair vector arr_NN[i, j] in pairs row order.

Return type:

np.ndarray

class aind_ephys_utils.metrics.ccg.TrialSegments(segments: list[list[ndarray]], durations: ndarray, pre: ndarray, post: ndarray, all_spikes: ndarray, offsets: ndarray, counts: ndarray)

Bases: object

CSR-like storage for spike segments clipped to trials.

Spike times are stored relative to each trial’s alignment time (t=0 is the alignment point). For a go-cue-aligned trial with window=(-1, 3), spikes range from -1 to 3.

segments

segments[unit][trial] — ragged list (for Python access).

durations

(n_trials,) trial durations (stop - start).

pre

(n_trials,) time before alignment point (align - start, >= 0).

post

(n_trials,) time after alignment point (stop - align, >= 0).

all_spikes

(total_spikes,) flat contiguous array (alignment-relative).

offsets

(n_units, n_trials) start index into all_spikes.

counts

(n_units, n_trials) spike count per segment.

all_spikes
counts
durations
offsets
post
pre
segments
aind_ephys_utils.metrics.ccg.cc_leads_over_std(lags: ndarray, C: ndarray, std_mult: float = 4, window_excl: float = 0.05, remove_auto: bool = False) ndarray

Detect unit pairs whose short-lag CCG peak exceeds a noise-floor threshold.

The baseline std is estimated from bins outside [-window_excl, window_excl], and then only the bins inside that window are tested against the threshold.

Parameters:
  • lags ((B,))

  • C ((N, N, B))

  • std_mult – Number of standard deviations above baseline.

  • window_excl – Half-width around zero lag (seconds). Bins outside this window are used as the baseline; bins inside are tested for significant peaks.

  • remove_auto – Zero out the diagonal (auto-correlations).

Returns:

XX[i, j] == 1 when a bin of C[i, j] inside the short-lag window exceeds the threshold and the peak lag is negative.

Return type:

(N, N) bool-like

aind_ephys_utils.metrics.ccg.ccg(spikes: DataArray, *, source_units: Iterable[Any] | None = None, target_units: Iterable[Any] | None = None, pairs: Iterable[tuple[Any, Any]] | ndarray | None = None, pairing: ndarray | Iterable[int] | None = None, window: tuple[float, float] | None = None, bin_size: float = 0.001, max_lag: float = 0.1, normalize: str = 'none', exclude_zero_lag_autocorr: bool = True, include_autocorr: bool = True) tuple[ndarray, DataArray]

Compute CCGs directly from a canonical ragged spikes DataArray.

Single-trial ragged spikes dispatch to the session-wide CCG helpers. Multi-trial ragged spikes are packed internally and dispatched to the trial-paired CCG helper. Unit selectors are resolved by coordinate label when possible and by integer position otherwise.

aind_ephys_utils.metrics.ccg.ccg_allpairs_sparse(spike_times_by_unit: list[ndarray], bin_size: float = 0.001, max_lag: float = 0.1, normalize: str = 'none', exclude_zero_lag_autocorr: bool = True, include_autocorr: bool = True, observation_window: float | tuple[float, float] | None = None) tuple[ndarray, ndarray]

Memory-lean all-pairs CCGs using event-driven sweep.

Parameters:
  • spike_times_by_unit – List of sorted spike time arrays, one per unit.

  • bin_size – Histogram bin width in seconds.

  • max_lag – Maximum time lag (symmetric around 0).

  • normalize"none", "counts", "rate", "conditional", "unbiased", or "corrcoef" (edge-corrected correlation coefficient).

  • exclude_zero_lag_autocorr – Zero out the central bin for autocorrelograms.

  • include_autocorr – Compute autocorrelograms on the diagonal.

  • observation_window – Total observation time.

Returns:

  • lags ((B,) float64)

  • C ((N, N, B) float32 (float64 when normalize="corrcoef")) – C[i,j](τ) = C[j,i](-τ)

aind_ephys_utils.metrics.ccg.ccg_between_sets_sparse(spike_times_set1: list[ndarray], spike_times_set2: list[ndarray], bin_size: float = 0.001, max_lag: float = 0.1, normalize: str = 'none', observation_window: float | tuple[float, float] | None = None) tuple[ndarray, ndarray]

Memory-lean CCGs between two sets of spike trains.

Parameters:
  • spike_times_set1 – M spike trains (reference/source units).

  • spike_times_set2 – N spike trains (target units).

  • bin_size – Histogram bin width in seconds.

  • max_lag – Maximum time lag (symmetric around 0).

  • normalize"none", "counts", "rate", "conditional", "unbiased", or "corrcoef" (edge-corrected correlation coefficient matching SpikeAnalysis.jl).

  • observation_window – Total observation time. Float for duration, tuple (start, end) for explicit bounds, or None to infer from data (not recommended).

Returns:

  • lags ((B,) float64)

  • C ((M, N, B) float32 (float64 when normalize="corrcoef"))

aind_ephys_utils.metrics.ccg.ccg_trial_paired(trial_segments: TrialSegments, pairing: ndarray, *, bin_size: float = 0.001, max_lag: float = 0.1, normalize: str = 'corrcoef', exclude_zero_lag_autocorr: bool = True, include_autocorr: bool = True, pairs: ndarray | None = None, buffers: _CCGBuffers | None = None) tuple[ndarray, ndarray]

Compute CCGs with a given trial pairing.

The pairs x trials loop runs entirely inside numba with prange over pairs. Use clip_spikes_to_trials() once to build the TrialSegments, then call this function repeatedly with different pairing arrays.

Parameters:
  • trial_segments – Output of clip_spikes_to_trials().

  • pairing(n_trials,) integer array. pairing[k] is the trial index for the target unit when the reference unit uses trial k. Identity (np.arange(n_trials)) for raw CCG.

  • bin_size – Histogram bin width (seconds).

  • max_lag – Maximum lag (seconds).

  • normalize"none", "counts", "rate", "conditional", "unbiased", or "corrcoef".

  • exclude_zero_lag_autocorr – Zero the central bin for autocorrelograms.

  • include_autocorr – Compute autocorrelograms on the diagonal.

  • pairs – Optional (n_pairs, 2) int64 array of (i, j) unit index pairs to compute. If None, computes all upper-triangle pairs. Use this to restrict to e.g. cross-region pairs.

  • buffers – Pre-allocated buffers from _CCGBuffers.for_segments(). If None, buffers are allocated internally (slower for repeated calls). Must have been created with the same pairs.

Returns:

  • lags ((B,) float64)

  • C (float64) – Shape (N, N, B) when pairs is None (with auto/cross symmetric mirror). Shape (n_pairs, B) when pairs is provided, with row p corresponding to the input pair pairs[p]. In the per-pair case the lower-triangle mirror is not applied; reverse C[p, :] to obtain the (j, i) CCG. This avoids the (N, N, B) materialisation entirely when callers only need a subset of pairs (e.g. cross-region pairs for surrogate testing).

Notes

corrcoef normalization (normalize="corrcoef"):

Produces a unitless correlation coefficient that is zero when the two spike trains are independent and conditionally uniform within each trial. Derived from first principles (see SpikeAnalysis.jl xcorr_discrete_normed for the full derivation).

For each pair (i, j) the normalized value at lag bin b is:

C[b] = (h[b] - E[b]) / sqrt(A_i * A_j)

h[b] is the raw coincidence count accumulated across all trials by the numba kernel (_ccg_all_pairs_trials).

E[b] is the expected count under within-trial uniformity with edge correction:

E[b] = ec_shape[b] * sum_k( n_i(k) * n_j(pairing[k]) )

where ec_shape[b] = scale[b] * w[b] * (d - binstop[b] + w[b]/2) / d^2 and the sum is over trials k. The per-trial product n_i(k) * n_j(pairing[k]) respects trial-by-trial rate variation: trials where both neurons fire more contribute proportionally more expected coincidences. When trial durations are uniform (after clip_to_window), ec_shape factors out and the per-trial products are computed as a single matrix multiply (trial_counts @ paired_counts.T).

The centre bin receives coincidences from both positive and negative lags, so scale = 2 and w = bin_size / 2 for that bin; side bins have scale = 1 and w = bin_size.

A_i, A_j are the edge-corrected auto-correlation terms, each summed across trials:

A_i = sum_k [ count_auto_first(u_i_k, bin_size)
              - 2 * E_auto(n_i_k, d_k) ]

count_auto_first counts spike pairs within bin_size of each other (including self-pairs). The factor of 2 on the expected auto count (rather than 1) matches the cross-correlation context: the denominator must normalize the two-sided cross-histogram at lag 0.

aind_ephys_utils.metrics.ccg.ccg_trial_surrogates(trial_segments: TrialSegments, pairings: Iterable[ndarray], *, bin_size: float = 0.001, max_lag: float = 0.1, normalize: str = 'corrcoef', exclude_zero_lag_autocorr: bool = True, include_autocorr: bool = True, pairs: ndarray | None = None, reduce: Callable[[ndarray, ndarray], ndarray] | None = None) Iterator[ndarray]

Compute trial-paired CCGs for many pairings with double-buffered pipeline.

Overlaps the numba kernel (worker thread) with Python scatter/normalization (main thread) for ~30% throughput improvement over sequential ccg_trial_paired() calls.

Parameters:
  • trial_segments – Output of clip_spikes_to_trials().

  • pairings – Iterable of (n_trials,) integer pairing arrays.

  • bin_size – Histogram bin width (seconds).

  • max_lag – Maximum lag (seconds).

  • normalize – Normalization mode.

  • exclude_zero_lag_autocorr – Zero the central bin for autocorrelograms.

  • include_autocorr – Compute autocorrelograms on the diagonal.

  • reduce – Optional reduction (lags, C) -> result. Applied during scatter so the full CCG is never returned.

Yields:

np.ndarray – The reduced result if reduce is provided, otherwise the per-pair CCG.

When pairs is None: the un-reduced CCG is shape (N, N, B) with the cross/auto symmetric structure, and reduce is called with that shape.

When pairs is provided: the un-reduced CCG is shape (n_pairs, B), with row p corresponding to the input pair pairs[p]. reduce is called with this per-pair layout — much cheaper for np.max(C, axis=-1)-style reductions when n_pairs . The lower triangle is not mirrored; reverse C[p, :] to obtain the (j, i) CCG from a (i, j) row.

aind_ephys_utils.metrics.ccg.clip_spikes_to_trials(spike_times_by_unit: list[ndarray], trial_epochs: ndarray, align_times: ndarray | None = None) TrialSegments

Clip each unit’s spikes into per-trial segments, relative to an alignment time.

Parameters:
  • spike_times_by_unit – Sorted spike time arrays, one per unit.

  • trial_epochs(n_trials, 2) array of [start, stop] times.

  • align_times(n_trials,) alignment times (t=0 in the output). Defaults to trial starts (left edge).

aind_ephys_utils.metrics.ccg.clip_to_window(ts: TrialSegments, window: tuple[float, float]) TrialSegments

Re-clip a TrialSegments to a symmetric window around alignment.

Parameters:
  • ts – Output of clip_spikes_to_trials().

  • window(left, right) in alignment-relative time, e.g. (-1.0, 2.0). Typically choose left = -min(ts.pre) and right = min(ts.post) for the maximal common window.

Returns:

New instance with segments and CSR clipped to window. All trials have the same effective duration right - left.

Return type:

TrialSegments

aind_ephys_utils.metrics.ccg.derangements(n: int, count: int, rng: Generator) Iterator[ndarray]

Yield count derangements (permutations with no fixed points).

A derangement of range(n) maps no index to itself, so using one to re-pair trials destroys cross-unit correlations while preserving each unit’s within-trial structure — the basis for trial-identity surrogate testing (feed the yielded permutations to ccg_trial_surrogates()).

Parameters:
  • n – Number of trials (must be >= 2; no derangement exists for n < 2).

  • count – Number of derangements to yield.

  • rng – NumPy random generator.

Yields:

np.ndarray – A length-n permutation with no fixed points.

aind_ephys_utils.metrics.ccg.jitter_spikes_times(spike_times_by_unit: list[ndarray], jitter_window: float = 0.05, rng: Generator | None = None) list[ndarray]

Add uniform jitter to each spike train and re-sort.

Parameters:
  • spike_times_by_unit – List of sorted spike time arrays.

  • jitter_window – Half-width of the jitter window in seconds.

  • rng – NumPy random generator. If None, a new default generator is used.

aind_ephys_utils.metrics.ccg.measure_fwhm(ccg_1d: ndarray, lags_1d: ndarray, *, search_half_width: float | None = None, half_height_mode: str = 'absolute') float

Estimate FWHM of the tallest peak in a 1-D CCG slice.

By default uses absolute half-max (the textbook FWHM definition: width at 0.5 * peak_height), which is stable for broad peaks and appropriate for shift-corrected CCGs whose baseline is already zero-mean. Pass half_height_mode= "prominence" for the legacy scipy behaviour (width at peak_height 0.5 * prominence).

No smoothing happens here — pass an already-smoothed slice if you want denoising. Pre-smoothing once at the array level (e.g. scipy.ndimage.gaussian_filter1d() along the lag axis of an (N, N, K) shift-corrected CCG) is cheaper than per-pair smoothing and avoids edge effects from smoothing-after-windowing.

Parameters:
  • ccg_1d – 1-D correlogram values (e.g. one row of the corrcoef matrix).

  • lags_1d – Corresponding lag values (seconds). Must be uniformly spaced.

  • search_half_width – If given, restrict measurement to lags within [-search_half_width, search_half_width] (same units as lags_1d). Prevents distant baseline wiggles from bounding peak_widths’ outward walk.

  • half_height_mode"absolute" (default) or "prominence". See above.

Returns:

Full width at half maximum in the same units as lags_1d. Returns np.inf if no positive peak is found.

Return type:

float

aind_ephys_utils.metrics.ccg.measure_per_pair(corr: ndarray, lags_1d: ndarray, pairs: Iterable[tuple[int, int]], measure_fn: Callable[[...], float], *, n_units: int, fill: float, **measure_kwargs: Any) ndarray

Apply a per-pair scalar shape measurement, mirrored across (i,j) and (j,i).

Helper for sweeping measure_fwhm() or measure_prominence() (or any equivalent scalar-returning function with the (ccg_1d, lags_1d, **kwargs) -> float signature) over a list of pairs from an (N, N, K) CCG slice. Both triangles get filled so downstream symmetric-mask code works in either direction.

Parameters:
  • corr(N, N, K) shift-corrected CCG slice (typically pre-smoothed — pass an already-smoothed array if you want denoising; this helper does no smoothing of its own).

  • lags_1d(K,) lag axis matching the last dimension of corr.

  • pairs – Iterable of (i, j) index tuples. Pairs may be supplied in either triangle; both out[i,j] and out[j,i] are filled regardless.

  • measure_fn – A function with signature (ccg_1d, lags_1d, **kwargs) -> float, e.g. measure_fwhm() or measure_prominence().

  • n_unitsN — used to allocate the output (N, N) array.

  • fill – Default value for un-measured entries (e.g. np.inf for FWHM, 0.0 for prominence so downstream threshold cuts behave correctly on missing data).

  • **measure_kwargs – Forwarded to measure_fn on every call.

Returns:

(N, N) of measurements; entries not in pairs (and not their mirror) hold fill.

Return type:

np.ndarray

aind_ephys_utils.metrics.ccg.measure_prominence(ccg_1d: ndarray, lags_1d: ndarray, *, search_half_width: float | None = None) float

Measure the prominence of the tallest peak in a 1-D CCG slice.

Prominence quantifies how much a peak stands out from its surrounding baseline. Uses scipy.signal.peak_prominences().

No smoothing happens here — pass an already-smoothed slice if you want denoising. See measure_fwhm() for guidance.

Parameters:
  • ccg_1d – 1-D correlogram values (e.g. one row of the corrcoef matrix).

  • lags_1d – Corresponding lag values (seconds).

  • search_half_width – If given, restrict measurement to lags within [-search_half_width, search_half_width].

Returns:

Prominence of the tallest peak (same units as ccg_1d). Returns 0.0 if no positive peak is found.

Return type:

float

aind_ephys_utils.metrics.ccg.monte_carlo_pvalue(t_obs: ndarray, t_surr: ndarray, tail: str = 'upper') ndarray

Monte Carlo p-value with correct finite-sample correction.

Implements p = (1 + sum(I(t_b >= t_obs))) / (B + 1) (Davison & Hinkley, 1997) which accounts for the observed statistic being one of the B + 1 values under the null.

Parameters:
  • t_obs – Observed test statistic, any shape.

  • t_surr – Surrogate statistics, shape (B, *t_obs.shape) where B is the number of surrogates.

  • tail"upper" — p = P(T >= t_obs) (right tail, e.g., peak height). "lower" — p = P(T <= t_obs) (left tail). "both" — p = P(|T| >= |t_obs|) (two-tailed).

Returns:

p-values, same shape as t_obs.

Return type:

np.ndarray

aind_ephys_utils.metrics.ccg.pair_vec_to_NN(pair_values: ndarray, pairs: ndarray, n_units: int, *, fill: float = 0.0, mirror: bool = True, dtype: object = None) ndarray

Scatter a per-pair vector into an (n_units, n_units) array.

The sparse per-pair representation used by the trial-paired / surrogate CCG path (and by ccg_between_sets_sparse) is projected back into the dense (N, N) matrix that arr[i, j] lookups expect. Cells outside pairs (and their mirror, when mirror=True) hold fill.

Parameters:
  • pair_values – Per-pair values in the same row order as pairs.

  • pairs(n_pairs, 2) array of (i, j) unit-index pairs.

  • n_unitsN — size of the square output.

  • fill – Value for cells not covered by pairs.

  • mirror – If True, also write each value at the transposed cell (j, i) (symmetric fill).

  • dtype – Output dtype; defaults to pair_values.dtype.

Returns:

(n_units, n_units) array.

Return type:

np.ndarray

aind_ephys_utils.metrics.ccg.peak_contributing_spikes(t_pre: ndarray, t_post: ndarray, peak_lag: float, half_width: float = 0.002) tuple[ndarray, ndarray]

Find pre-synaptic spikes that have a post-synaptic partner near peak_lag.

Uses a numba-accelerated two-pointer sweep for O(n) runtime.

Parameters:
  • t_pre – Sorted spike times of the pre-synaptic (driving) unit.

  • t_post – Sorted spike times of the post-synaptic (following) unit.

  • peak_lag – Expected lag in seconds (positive = post fires after pre).

  • half_width – Tolerance around peak_lag in seconds. A spike pair counts if |dt - peak_lag| <= half_width where dt = t_post - t_pre. Typically set to the CCG bin width or the FWHM of the peak.

Returns:

  • contributing (np.ndarray) – Pre-synaptic spike times that have a matching post-synaptic spike.

  • mask (np.ndarray) – Boolean mask into t_pre (contributing == t_pre[mask]).

aind_ephys_utils.metrics.ccg.rescale_ccgs(C: ndarray, axis: int = -1, eps: float = 1e-12) ndarray

Rescale correlograms so each (i,j) slice has min=0 and max=1.

aind_ephys_utils.metrics.ccg.rescale_ccgs_zero_mean(C: ndarray, axis: int = -1, eps: float = 1e-12) ndarray

Rescale correlograms: subtract mean, then min-max to [0, 1].

aind_ephys_utils.metrics.ccg.smooth_ccgs(C: ndarray, bin_size: float, kernel_width: float = 0.005) ndarray

Smooth correlograms with a Gaussian kernel.

Parameters:
  • C ((N, N, B)) – Raw correlogram counts.

  • bin_size – Bin width in seconds.

  • kernel_width – Gaussian kernel std in seconds.