aind_ephys_utils.ops.reduce module

Dimensionality reduction operations.

This module contains xarray-native reduction helpers (e.g., PCA) used by the .ephys.reduce accessor.

aind_ephys_utils.ops.reduce.reduce(da: DataArray, *, method: str, dim: str | None = 'unit', n_components: int | None = 5, stack: Tuple[str, ...] | None = ('trial', 'time'), unstack: bool = True, return_dataset: bool = True, window: Tuple[float, float] | Sequence[Tuple[float, float]] | None = None, window_apply: str = 'fit_only', orthogonalize: str = 'none', orthogonalize_across: str = 'none', trial_dim: str = 'trial', time_dim: str = 'time', trial_average: bool = True, labels: str | DataArray | Sequence[str | DataArray] | None = None, targets: DataArray | None = None, rank: int | None = None, regularization: float | None = None, cv: int | None = None, gpfa_options: Dict[str, object] | None = None) → DataArray | Dataset

Reduce data dimensionality in an xarray-friendly way.

Parameters:
  • da – Input DataArray.

  • method – Reduction method (e.g. “pca”, “gpfa”, “dpca”, “coding_direction”, “logistic”, “lda”, “rrr”).

  • dim – Dimension to reduce across for methods that operate on a single axis.

  • n_components – Number of components to keep (PCA, dPCA).

  • stack – Dims to stack before reduction (default: ('trial', 'time')).

  • unstack – If True, unstack stacked dims in the output.

  • return_dataset – If True, return a Dataset with projections and weights (and explained variance for PCA). For dPCA and supervised methods, a Dataset is always returned.

  • window – Optional (tmin, tmax) window, or a sequence of such windows, used for fitting supervised methods.

  • window_apply – “fit_only” (default) fits on the window but projects all samples; “fit_and_project” fits and projects only within the window.

  • orthogonalize – How to orthogonalize supervised components: “none”, “qr”, or “svd”.

  • orthogonalize_across – How to orthogonalize across multiple windows/labels: “none”, “windows”, “labels”, or “all”.

  • trial_dim – Trial dimension name.

  • time_dim – Time dimension name.

  • trial_average – If True (default), average across trials before dPCA marginalization. If trial_dim is absent, input is treated as already averaged and must include label dims directly (e.g., choice).

  • labels – Coordinate name(s) used for dPCA and supervised methods (coding direction, logistic, lda). Must exist in da.coords.

  • targets – Target matrix for reduced-rank regression.

  • rank – Rank for reduced-rank regression.

  • regularization – Regularization strength for supervised methods.

  • cv – Cross-validation folds for supervised methods.

  • gpfa_options – Optional dictionary of GPFA configuration overrides, e.g. {"max_iters": 200, "freq_ll": 5, "fast_mode": True, "gp_param_update_every": 5, "random_state": 0}.
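To illustrate how dim, stack, n_components, and unstack fit together for PCA, here is a minimal NumPy sketch. It is not this module's implementation. It assumes an array with axes (unit, trial, time), treats unit as the reduced dim, stacks (trial, time) into a single sample axis, and reshapes the projections back afterwards. The helper name pca_stacked is hypothetical, and centering each unit without scaling is an assumption; reduce() may preprocess differently.

```python
import numpy as np


def pca_stacked(data: np.ndarray, n_components: int = 5):
    """Hypothetical helper: PCA across units with (trial, time) stacked into samples.

    `data` is assumed to have axes (unit, trial, time). The unit axis plays the
    role of `dim`; the trial and time axes play the role of `stack`.
    """
    n_units, n_trials, n_times = data.shape
    # Stack (trial, time) into one sample axis: shape (samples, units).
    X = data.reshape(n_units, n_trials * n_times).T
    X = X - X.mean(axis=0, keepdims=True)  # center each unit
    # Thin SVD: rows of Vt are the principal axes in unit space.
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    weights = Vt[:n_components].T                # (units, components)
    projections = X @ weights                    # (samples, components)
    explained_variance = S[:n_components] ** 2 / (X.shape[0] - 1)
    # "Unstack": restore the (trial, time) structure of the projections.
    projections = projections.T.reshape(n_components, n_trials, n_times)
    return projections, weights, explained_variance
```

With an xarray input, the same stacking corresponds to da.stack(sample=("trial", "time")), and the unstacking step to .unstack("sample").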
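The window / window_apply distinction can be sketched for a two-class coding direction. This sketch assumes the coding direction is the unit-norm difference of class means, computed from activity averaged over the fit window. That is a common definition but not necessarily the one reduce() uses. The function name, the (unit, trial, time) layout, and the boolean labels array are all hypothetical.

```python
import numpy as np


def coding_direction(data, labels, times, window, window_apply="fit_only"):
    """Hypothetical sketch of a two-class coding direction with a fit window.

    data:   array with axes (unit, trial, time)
    labels: boolean array of length n_trials (class membership)
    times:  array of length n_time with sample times
    window: (tmin, tmax) used for fitting
    """
    tmin, tmax = window
    in_win = (times >= tmin) & (times <= tmax)
    # Fit: average each trial over the window, then take the class-mean difference.
    win_mean = data[:, :, in_win].mean(axis=2)  # (unit, trial)
    direction = win_mean[:, labels].mean(axis=1) - win_mean[:, ~labels].mean(axis=1)
    direction = direction / np.linalg.norm(direction)  # unit-norm weights
    if window_apply == "fit_only":
        projected = data  # project every time sample
    elif window_apply == "fit_and_project":
        projected = data[:, :, in_win]  # project only samples inside the window
    else:
        raise ValueError(f"unknown window_apply: {window_apply!r}")
    # Contract over the unit axis: the result has axes (trial, time).
    return np.einsum("u,utk->tk", direction, projected), direction
```

Under "fit_only" the output keeps the full time axis; under "fit_and_project" it keeps only the samples inside the window.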
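The "qr" and "svd" choices for orthogonalize return different orthonormal bases for the same subspace. In this hypothetical sketch, "qr" keeps the direction of the first component and orthogonalizes each later component against the earlier ones. "svd" returns the polar factor U @ Vt, which is the matrix with orthonormal columns closest to the original weights. Whether the library uses U @ Vt or only U for "svd" is an assumption.

```python
import numpy as np


def orthogonalize(weights: np.ndarray, how: str = "none") -> np.ndarray:
    """Hypothetical sketch: orthonormalize the columns of a (unit, component) matrix."""
    if how == "none":
        return weights
    if how == "qr":
        # Column k is made orthogonal to columns 0..k-1, so the first
        # component keeps its direction.
        q, r = np.linalg.qr(weights)
        # Flip signs so each column points the same way as the original.
        return q * np.sign(np.diag(r))
    if how == "svd":
        # Symmetric choice: treats all components equally instead of
        # privileging the first one.
        u, _, vt = np.linalg.svd(weights, full_matrices=False)
        return u @ vt
    raise ValueError(f"unknown orthogonalize option: {how!r}")
```

Both options span the same subspace. Use "qr" when the component order carries meaning; use "svd" when no component should be privileged.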
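For method "rrr", targets, rank, and regularization correspond to the ingredients of reduced-rank regression. The sketch below uses one common formulation. It first fits a ridge regression, then projects the coefficients onto the top rank right singular vectors of the fitted values. The function name is hypothetical, regularization=0.0 stands in for the documented default of None, and the library may center the data or pick the rank-constrained solution differently.

```python
import numpy as np


def reduced_rank_regression(X, Y, rank, regularization=0.0):
    """Hypothetical sketch of (ridge) reduced-rank regression.

    X: (samples, features) predictors, e.g. stacked (trial, time) x unit activity
    Y: (samples, targets) target matrix
    Returns B with shape (features, targets) and rank <= `rank`.
    """
    n_features = X.shape[1]
    # Full-rank ridge solution: B = (X^T X + lambda I)^-1 X^T Y
    gram = X.T @ X + regularization * np.eye(n_features)
    b_full = np.linalg.solve(gram, X.T @ Y)
    # Top `rank` right singular vectors of the fitted values X @ B_full
    _, _, vt = np.linalg.svd(X @ b_full, full_matrices=False)
    v_r = vt[:rank].T  # (targets, rank)
    # Project the coefficients onto that rank-r subspace of target space
    return b_full @ v_r @ v_r.T
```

With regularization=0.0 this reduces to classical reduced-rank regression on top of ordinary least squares.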