aind_ephys_utils.metrics.ccg module¶
Cross-correlogram (CCG) computation for spike trains.
Requires the numba optional dependency for the core kernel
(install with pip install aind-ephys-utils[numba]).
Lag convention¶
The CCG C[i, j] is computed as the histogram of t_j - t_i:
Positive lag → unit j fires after unit i → i drives j.
Negative lag → unit j fires before unit i → j drives i.
So a peak at positive lag in C[i, j] is evidence that i → j
(e.g., a monosynaptic excitatory connection from i to j).
C[j, i] is the time-reverse: C[j, i, k] == C[i, j, -k].
The kernel uses left-closed bin assignment (floor) matching the convention
np.histogram. The "corrcoef" normalization mode subtracts
edge-corrected expected counts and divides by sqrt(auto_u * auto_v) to
produce a proper correlation coefficient, matching the original Julia
implementation of these functions.
- aind_ephys_utils.metrics.ccg.NN_to_pair_vec(arr_NN: ndarray, pairs: ndarray) ndarray¶
Gather a per-pair vector from an
(n_units, n_units)array.Inverse of
pair_vec_to_NN()(modulofillcells outside the pair set), for projecting an existing(N, N)array onto the per-pair representation.- Parameters:
arr_NN –
(n_units, n_units)array.pairs –
(n_pairs, 2)array of(i, j)unit-index pairs.
- Returns:
Per-pair vector
arr_NN[i, j]inpairsrow order.- Return type:
np.ndarray
- class aind_ephys_utils.metrics.ccg.TrialSegments(segments: list[list[ndarray]], durations: ndarray, pre: ndarray, post: ndarray, all_spikes: ndarray, offsets: ndarray, counts: ndarray)¶
Bases:
objectCSR-like storage for spike segments clipped to trials.
Spike times are stored relative to each trial’s alignment time (t=0 is the alignment point). For a go-cue-aligned trial with
window=(-1, 3), spikes range from-1to3.- segments¶
segments[unit][trial]— ragged list (for Python access).
- durations¶
(n_trials,)trial durations (stop - start).
- pre¶
(n_trials,)time before alignment point (align - start, >= 0).
- post¶
(n_trials,)time after alignment point (stop - align, >= 0).
- all_spikes¶
(total_spikes,)flat contiguous array (alignment-relative).
- offsets¶
(n_units, n_trials)start index into all_spikes.
- counts¶
(n_units, n_trials)spike count per segment.
- all_spikes¶
- counts¶
- durations¶
- offsets¶
- post¶
- pre¶
- segments¶
- aind_ephys_utils.metrics.ccg.cc_leads_over_std(lags: ndarray, C: ndarray, std_mult: float = 4, window_excl: float = 0.05, remove_auto: bool = False) ndarray¶
Detect unit pairs whose short-lag CCG peak exceeds a noise-floor threshold.
The baseline std is estimated from bins outside
[-window_excl, window_excl], and then only the bins inside that window are tested against the threshold.- Parameters:
lags ((B,))
C ((N, N, B))
std_mult – Number of standard deviations above baseline.
window_excl – Half-width around zero lag (seconds). Bins outside this window are used as the baseline; bins inside are tested for significant peaks.
remove_auto – Zero out the diagonal (auto-correlations).
- Returns:
X –
X[i, j] == 1when a bin ofC[i, j]inside the short-lag window exceeds the threshold and the peak lag is negative.- Return type:
(N, N) bool-like
- aind_ephys_utils.metrics.ccg.ccg(spikes: DataArray, *, source_units: Iterable[Any] | None = None, target_units: Iterable[Any] | None = None, pairs: Iterable[tuple[Any, Any]] | ndarray | None = None, pairing: ndarray | Iterable[int] | None = None, window: tuple[float, float] | None = None, bin_size: float = 0.001, max_lag: float = 0.1, normalize: str = 'none', exclude_zero_lag_autocorr: bool = True, include_autocorr: bool = True) tuple[ndarray, DataArray]¶
Compute CCGs directly from a canonical ragged spikes DataArray.
Single-trial ragged spikes dispatch to the session-wide CCG helpers. Multi-trial ragged spikes are packed internally and dispatched to the trial-paired CCG helper. Unit selectors are resolved by coordinate label when possible and by integer position otherwise.
- aind_ephys_utils.metrics.ccg.ccg_allpairs_sparse(spike_times_by_unit: list[ndarray], bin_size: float = 0.001, max_lag: float = 0.1, normalize: str = 'none', exclude_zero_lag_autocorr: bool = True, include_autocorr: bool = True, observation_window: float | tuple[float, float] | None = None) tuple[ndarray, ndarray]¶
Memory-lean all-pairs CCGs using event-driven sweep.
- Parameters:
spike_times_by_unit – List of sorted spike time arrays, one per unit.
bin_size – Histogram bin width in seconds.
max_lag – Maximum time lag (symmetric around 0).
normalize –
"none","counts","rate","conditional","unbiased", or"corrcoef"(edge-corrected correlation coefficient).exclude_zero_lag_autocorr – Zero out the central bin for autocorrelograms.
include_autocorr – Compute autocorrelograms on the diagonal.
observation_window – Total observation time.
- Returns:
lags ((B,) float64)
C ((N, N, B) float32 (float64 when
normalize="corrcoef")) –C[i,j](τ) = C[j,i](-τ)
- aind_ephys_utils.metrics.ccg.ccg_between_sets_sparse(spike_times_set1: list[ndarray], spike_times_set2: list[ndarray], bin_size: float = 0.001, max_lag: float = 0.1, normalize: str = 'none', observation_window: float | tuple[float, float] | None = None) tuple[ndarray, ndarray]¶
Memory-lean CCGs between two sets of spike trains.
- Parameters:
spike_times_set1 – M spike trains (reference/source units).
spike_times_set2 – N spike trains (target units).
bin_size – Histogram bin width in seconds.
max_lag – Maximum time lag (symmetric around 0).
normalize –
"none","counts","rate","conditional","unbiased", or"corrcoef"(edge-corrected correlation coefficient matchingSpikeAnalysis.jl).observation_window – Total observation time. Float for duration, tuple
(start, end)for explicit bounds, orNoneto infer from data (not recommended).
- Returns:
lags ((B,) float64)
C ((M, N, B) float32 (float64 when
normalize="corrcoef"))
- aind_ephys_utils.metrics.ccg.ccg_trial_paired(trial_segments: TrialSegments, pairing: ndarray, *, bin_size: float = 0.001, max_lag: float = 0.1, normalize: str = 'corrcoef', exclude_zero_lag_autocorr: bool = True, include_autocorr: bool = True, pairs: ndarray | None = None, buffers: _CCGBuffers | None = None) tuple[ndarray, ndarray]¶
Compute CCGs with a given trial pairing.
The pairs x trials loop runs entirely inside numba with
prangeover pairs. Useclip_spikes_to_trials()once to build theTrialSegments, then call this function repeatedly with different pairing arrays.- Parameters:
trial_segments – Output of
clip_spikes_to_trials().pairing –
(n_trials,)integer array.pairing[k]is the trial index for the target unit when the reference unit uses trial k. Identity (np.arange(n_trials)) for raw CCG.bin_size – Histogram bin width (seconds).
max_lag – Maximum lag (seconds).
normalize –
"none","counts","rate","conditional","unbiased", or"corrcoef".exclude_zero_lag_autocorr – Zero the central bin for autocorrelograms.
include_autocorr – Compute autocorrelograms on the diagonal.
pairs – Optional
(n_pairs, 2)int64 array of(i, j)unit index pairs to compute. IfNone, computes all upper-triangle pairs. Use this to restrict to e.g. cross-region pairs.buffers – Pre-allocated buffers from
_CCGBuffers.for_segments(). IfNone, buffers are allocated internally (slower for repeated calls). Must have been created with the same pairs.
- Returns:
lags ((B,) float64)
C (float64) – Shape
(N, N, B)whenpairsisNone(with auto/cross symmetric mirror). Shape(n_pairs, B)whenpairsis provided, with rowpcorresponding to the input pairpairs[p]. In the per-pair case the lower-triangle mirror is not applied; reverseC[p, :]to obtain the(j, i)CCG. This avoids the (N, N, B) materialisation entirely when callers only need a subset of pairs (e.g. cross-region pairs for surrogate testing).
Notes
corrcoef normalization (
normalize="corrcoef"):Produces a unitless correlation coefficient that is zero when the two spike trains are independent and conditionally uniform within each trial. Derived from first principles (see
SpikeAnalysis.jlxcorr_discrete_normedfor the full derivation).For each pair
(i, j)the normalized value at lag binbis:C[b] = (h[b] - E[b]) / sqrt(A_i * A_j)
h[b] is the raw coincidence count accumulated across all trials by the numba kernel (
_ccg_all_pairs_trials).E[b] is the expected count under within-trial uniformity with edge correction:
E[b] = ec_shape[b] * sum_k( n_i(k) * n_j(pairing[k]) )
where
ec_shape[b] = scale[b] * w[b] * (d - binstop[b] + w[b]/2) / d^2and the sum is over trialsk. The per-trial productn_i(k) * n_j(pairing[k])respects trial-by-trial rate variation: trials where both neurons fire more contribute proportionally more expected coincidences. When trial durations are uniform (afterclip_to_window),ec_shapefactors out and the per-trial products are computed as a single matrix multiply (trial_counts @ paired_counts.T).The centre bin receives coincidences from both positive and negative lags, so
scale = 2andw = bin_size / 2for that bin; side bins havescale = 1andw = bin_size.A_i, A_j are the edge-corrected auto-correlation terms, each summed across trials:
A_i = sum_k [ count_auto_first(u_i_k, bin_size) - 2 * E_auto(n_i_k, d_k) ]
count_auto_firstcounts spike pairs withinbin_sizeof each other (including self-pairs). The factor of 2 on the expected auto count (rather than 1) matches the cross-correlation context: the denominator must normalize the two-sided cross-histogram at lag 0.
- aind_ephys_utils.metrics.ccg.ccg_trial_surrogates(trial_segments: TrialSegments, pairings: Iterable[ndarray], *, bin_size: float = 0.001, max_lag: float = 0.1, normalize: str = 'corrcoef', exclude_zero_lag_autocorr: bool = True, include_autocorr: bool = True, pairs: ndarray | None = None, reduce: Callable[[ndarray, ndarray], ndarray] | None = None) Iterator[ndarray]¶
Compute trial-paired CCGs for many pairings with double-buffered pipeline.
Overlaps the numba kernel (worker thread) with Python scatter/normalization (main thread) for ~30% throughput improvement over sequential
ccg_trial_paired()calls.- Parameters:
trial_segments – Output of
clip_spikes_to_trials().pairings – Iterable of
(n_trials,)integer pairing arrays.bin_size – Histogram bin width (seconds).
max_lag – Maximum lag (seconds).
normalize – Normalization mode.
exclude_zero_lag_autocorr – Zero the central bin for autocorrelograms.
include_autocorr – Compute autocorrelograms on the diagonal.
reduce – Optional reduction
(lags, C) -> result. Applied during scatter so the full CCG is never returned.
- Yields:
np.ndarray – The reduced result if reduce is provided, otherwise the per-pair CCG.
When
pairsisNone: the un-reduced CCG is shape(N, N, B)with the cross/auto symmetric structure, andreduceis called with that shape.When
pairsis provided: the un-reduced CCG is shape(n_pairs, B), with rowpcorresponding to the input pairpairs[p].reduceis called with this per-pair layout — much cheaper fornp.max(C, axis=-1)-style reductions whenn_pairs ≪ N². The lower triangle is not mirrored; reverseC[p, :]to obtain the(j, i)CCG from a(i, j)row.
- aind_ephys_utils.metrics.ccg.clip_spikes_to_trials(spike_times_by_unit: list[ndarray], trial_epochs: ndarray, align_times: ndarray | None = None) TrialSegments¶
Clip each unit’s spikes into per-trial segments, relative to an alignment time.
- Parameters:
spike_times_by_unit – Sorted spike time arrays, one per unit.
trial_epochs –
(n_trials, 2)array of[start, stop]times.align_times –
(n_trials,)alignment times (t=0 in the output). Defaults to trial starts (left edge).
- aind_ephys_utils.metrics.ccg.clip_to_window(ts: TrialSegments, window: tuple[float, float]) TrialSegments¶
Re-clip a
TrialSegmentsto a symmetric window around alignment.- Parameters:
ts – Output of
clip_spikes_to_trials().window –
(left, right)in alignment-relative time, e.g.(-1.0, 2.0). Typically chooseleft = -min(ts.pre)andright = min(ts.post)for the maximal common window.
- Returns:
New instance with segments and CSR clipped to window. All trials have the same effective duration
right - left.- Return type:
- aind_ephys_utils.metrics.ccg.derangements(n: int, count: int, rng: Generator) Iterator[ndarray]¶
Yield count derangements (permutations with no fixed points).
A derangement of
range(n)maps no index to itself, so using one to re-pair trials destroys cross-unit correlations while preserving each unit’s within-trial structure — the basis for trial-identity surrogate testing (feed the yielded permutations toccg_trial_surrogates()).- Parameters:
n – Number of trials (must be >= 2; no derangement exists for n < 2).
count – Number of derangements to yield.
rng – NumPy random generator.
- Yields:
np.ndarray – A length-
npermutation with no fixed points.
- aind_ephys_utils.metrics.ccg.jitter_spikes_times(spike_times_by_unit: list[ndarray], jitter_window: float = 0.05, rng: Generator | None = None) list[ndarray]¶
Add uniform jitter to each spike train and re-sort.
- Parameters:
spike_times_by_unit – List of sorted spike time arrays.
jitter_window – Half-width of the jitter window in seconds.
rng – NumPy random generator. If
None, a new default generator is used.
- aind_ephys_utils.metrics.ccg.measure_fwhm(ccg_1d: ndarray, lags_1d: ndarray, *, search_half_width: float | None = None, half_height_mode: str = 'absolute') float¶
Estimate FWHM of the tallest peak in a 1-D CCG slice.
By default uses absolute half-max (the textbook FWHM definition: width at
0.5 * peak_height), which is stable for broad peaks and appropriate for shift-corrected CCGs whose baseline is already zero-mean. Passhalf_height_mode= "prominence"for the legacy scipy behaviour (width atpeak_height − 0.5 * prominence).No smoothing happens here — pass an already-smoothed slice if you want denoising. Pre-smoothing once at the array level (e.g.
scipy.ndimage.gaussian_filter1d()along the lag axis of an(N, N, K)shift-corrected CCG) is cheaper than per-pair smoothing and avoids edge effects from smoothing-after-windowing.- Parameters:
ccg_1d – 1-D correlogram values (e.g. one row of the corrcoef matrix).
lags_1d – Corresponding lag values (seconds). Must be uniformly spaced.
search_half_width – If given, restrict measurement to lags within
[-search_half_width, search_half_width](same units as lags_1d). Prevents distant baseline wiggles from boundingpeak_widths’ outward walk.half_height_mode –
"absolute"(default) or"prominence". See above.
- Returns:
Full width at half maximum in the same units as lags_1d. Returns
np.infif no positive peak is found.- Return type:
float
- aind_ephys_utils.metrics.ccg.measure_per_pair(corr: ndarray, lags_1d: ndarray, pairs: Iterable[tuple[int, int]], measure_fn: Callable[[...], float], *, n_units: int, fill: float, **measure_kwargs: Any) ndarray¶
Apply a per-pair scalar shape measurement, mirrored across (i,j) and (j,i).
Helper for sweeping
measure_fwhm()ormeasure_prominence()(or any equivalent scalar-returning function with the(ccg_1d, lags_1d, **kwargs) -> floatsignature) over a list of pairs from an(N, N, K)CCG slice. Both triangles get filled so downstream symmetric-mask code works in either direction.- Parameters:
corr –
(N, N, K)shift-corrected CCG slice (typically pre-smoothed — pass an already-smoothed array if you want denoising; this helper does no smoothing of its own).lags_1d –
(K,)lag axis matching the last dimension ofcorr.pairs – Iterable of
(i, j)index tuples. Pairs may be supplied in either triangle; bothout[i,j]andout[j,i]are filled regardless.measure_fn – A function with signature
(ccg_1d, lags_1d, **kwargs) -> float, e.g.measure_fwhm()ormeasure_prominence().n_units –
N— used to allocate the output(N, N)array.fill – Default value for un-measured entries (e.g.
np.inffor FWHM,0.0for prominence so downstream threshold cuts behave correctly on missing data).**measure_kwargs – Forwarded to
measure_fnon every call.
- Returns:
(N, N)of measurements; entries not inpairs(and not their mirror) holdfill.- Return type:
np.ndarray
- aind_ephys_utils.metrics.ccg.measure_prominence(ccg_1d: ndarray, lags_1d: ndarray, *, search_half_width: float | None = None) float¶
Measure the prominence of the tallest peak in a 1-D CCG slice.
Prominence quantifies how much a peak stands out from its surrounding baseline. Uses
scipy.signal.peak_prominences().No smoothing happens here — pass an already-smoothed slice if you want denoising. See
measure_fwhm()for guidance.- Parameters:
ccg_1d – 1-D correlogram values (e.g. one row of the corrcoef matrix).
lags_1d – Corresponding lag values (seconds).
search_half_width – If given, restrict measurement to lags within
[-search_half_width, search_half_width].
- Returns:
Prominence of the tallest peak (same units as ccg_1d). Returns
0.0if no positive peak is found.- Return type:
float
- aind_ephys_utils.metrics.ccg.monte_carlo_pvalue(t_obs: ndarray, t_surr: ndarray, tail: str = 'upper') ndarray¶
Monte Carlo p-value with correct finite-sample correction.
Implements
p = (1 + sum(I(t_b >= t_obs))) / (B + 1)(Davison & Hinkley, 1997) which accounts for the observed statistic being one of theB + 1values under the null.- Parameters:
- Returns:
p-values, same shape as t_obs.
- Return type:
np.ndarray
- aind_ephys_utils.metrics.ccg.pair_vec_to_NN(pair_values: ndarray, pairs: ndarray, n_units: int, *, fill: float = 0.0, mirror: bool = True, dtype: object = None) ndarray¶
Scatter a per-pair vector into an
(n_units, n_units)array.The sparse per-pair representation used by the trial-paired / surrogate CCG path (and by
ccg_between_sets_sparse) is projected back into the dense(N, N)matrix thatarr[i, j]lookups expect. Cells outsidepairs(and their mirror, whenmirror=True) holdfill.- Parameters:
pair_values – Per-pair values in the same row order as
pairs.pairs –
(n_pairs, 2)array of(i, j)unit-index pairs.n_units –
N— size of the square output.fill – Value for cells not covered by
pairs.mirror – If
True, also write each value at the transposed cell(j, i)(symmetric fill).dtype – Output dtype; defaults to
pair_values.dtype.
- Returns:
(n_units, n_units)array.- Return type:
np.ndarray
- aind_ephys_utils.metrics.ccg.peak_contributing_spikes(t_pre: ndarray, t_post: ndarray, peak_lag: float, half_width: float = 0.002) tuple[ndarray, ndarray]¶
Find pre-synaptic spikes that have a post-synaptic partner near peak_lag.
Uses a numba-accelerated two-pointer sweep for O(n) runtime.
- Parameters:
t_pre – Sorted spike times of the pre-synaptic (driving) unit.
t_post – Sorted spike times of the post-synaptic (following) unit.
peak_lag – Expected lag in seconds (positive = post fires after pre).
half_width – Tolerance around peak_lag in seconds. A spike pair counts if
|dt - peak_lag| <= half_widthwheredt = t_post - t_pre. Typically set to the CCG bin width or the FWHM of the peak.
- Returns:
contributing (np.ndarray) – Pre-synaptic spike times that have a matching post-synaptic spike.
mask (np.ndarray) – Boolean mask into t_pre (
contributing == t_pre[mask]).
- aind_ephys_utils.metrics.ccg.rescale_ccgs(C: ndarray, axis: int = -1, eps: float = 1e-12) ndarray¶
Rescale correlograms so each (i,j) slice has min=0 and max=1.
- aind_ephys_utils.metrics.ccg.rescale_ccgs_zero_mean(C: ndarray, axis: int = -1, eps: float = 1e-12) ndarray¶
Rescale correlograms: subtract mean, then min-max to [0, 1].
- aind_ephys_utils.metrics.ccg.smooth_ccgs(C: ndarray, bin_size: float, kernel_width: float = 0.005) ndarray¶
Smooth correlograms with a Gaussian kernel.
- Parameters:
C ((N, N, B)) – Raw correlogram counts.
bin_size – Bin width in seconds.
kernel_width – Gaussian kernel std in seconds.