Xarray primer¶
This section introduces the minimal Xarray concepts used in this library. The goal is to make the docs usable even if you have never seen Xarray before.
Core ideas¶
dims: named axes (e.g. “trial”, “unit”, “time”)
coords: labeled values along dims (e.g. time in seconds, trial metadata)
attrs: metadata for the whole array (e.g. ephys.time_unit)
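As a minimal sketch of how these three pieces fit together, the following builds a small DataArray from scratch. The array shape, the coordinate values, and the attribute key time_unit are illustrative placeholders, not names defined by this library.

```python
import numpy as np
import xarray as xr

# Hypothetical toy data: 4 trials x 3 units x 5 time bins.
data = np.arange(4 * 3 * 5, dtype=float).reshape(4, 3, 5)

da = xr.DataArray(
    data,
    dims=("trial", "unit", "time"),         # dims: named axes
    coords={
        "trial": [0, 1, 2, 3],              # coords: labels along each dim
        "unit": ["u0", "u1", "u2"],
        "time": np.linspace(0.0, 0.4, 5),   # time in seconds
    },
    attrs={"time_unit": "s"},               # attrs: array-level metadata (placeholder key)
)

print(da.dims)
print(dict(da.sizes))
print(da.attrs)
```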
Why it matters¶
Xarray gives names to axes. That means you can select, group, and combine data without remembering axis order. The library uses these names to keep all operations composable.
Common patterns¶
Select by label¶
Use sel to access data via meaningful labels, rather than
array positions. This is the safest way to subset when trials/units are
reordered, dropped, or merged across preprocessing steps, because the
selection follows coordinate values instead of implicit integer offsets.
da.sel(trial=[0, 1])
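To illustrate why label-based selection survives reordering, here is a small sketch on made-up data. The array contents are assumptions chosen so that each trial is easy to recognize by its values.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.array([[10.0, 11.0], [20.0, 21.0], [30.0, 31.0]]),
    dims=("trial", "time"),
    coords={"trial": [0, 1, 2], "time": [0.0, 0.1]},
)

# Reverse the trial order, as a preprocessing step might do.
reordered = da.isel(trial=[2, 1, 0])

by_label = reordered.sel(trial=0)      # still the data recorded for trial 0
by_position = reordered.isel(trial=0)  # now the data recorded for trial 2

print(by_label.values)
print(by_position.values)
```

The label lookup returns the same trial before and after reordering, while the positional lookup returns whatever happens to sit first.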
Select by integer index¶
Use isel for fast positional slicing when labels are unavailable or when
you intentionally want “the first N items” regardless of their coordinate
values. This is especially useful for quick inspection, debugging, and
deterministic train/test splits based on order.
da.isel(trial=0)
da.isel(unit=slice(0, 10))
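One detail worth seeing in a sketch: a scalar index removes the dimension, whereas a slice keeps it. The array below is a placeholder with no coordinates at all, which is exactly the situation where isel is the only option.

```python
import numpy as np
import xarray as xr

# Hypothetical array without coordinate labels: 4 trials x 12 units x 5 time bins.
da = xr.DataArray(np.zeros((4, 12, 5)), dims=("trial", "unit", "time"))

first_trial = da.isel(trial=0)             # scalar index: "trial" is dropped
first_ten = da.isel(unit=slice(0, 10))     # slice: "unit" is kept, shortened to 10

print(first_trial.dims)
print(first_ten.shape)
```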
Filter with conditions¶
Use where(..., drop=True) to express quality-control or behavioral filters
as explicit boolean logic. This keeps filtering readable and reproducible while preserving aligned coordinates on all remaining dimensions.
da_correct = da.where(da["response"] == 1, drop=True)
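The sketch below contrasts where with and without drop=True. It assumes a response coordinate already exists along trial; the values are invented for illustration.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(8.0).reshape(4, 2),
    dims=("trial", "time"),
    coords={
        "trial": [0, 1, 2, 3],
        "time": [0.0, 0.1],
        "response": ("trial", [1, 0, 1, 1]),  # hypothetical per-trial metadata
    },
)

# Without drop: same shape, failing trials are replaced by NaN.
masked = da.where(da["response"] == 1)

# With drop=True: failing trials are removed from the "trial" dimension.
da_correct = da.where(da["response"] == 1, drop=True)

print(dict(masked.sizes))
print(dict(da_correct.sizes))
print(da_correct["trial"].values)
```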
Assign coordinates¶
Use assign_coords to attach semantic labels (condition IDs, brain region,
session metadata) so later operations can be written using understandable terms. In practice, many higher-level workflows
(groupby, condition averages, plotting labels) depend on these coordinates.
da = da.assign_coords(response=("trial", response_values))
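A self-contained version of the call above might look like this. Here response_values is a hypothetical NumPy array with one entry per trial; in practice it would come from your own behavioral table.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.zeros((3, 2)),
    dims=("trial", "time"),
    coords={"trial": [0, 1, 2]},
)

# Hypothetical per-trial metadata; its length must match the "trial" dimension.
response_values = np.array([1, 0, 1])

da = da.assign_coords(response=("trial", response_values))

# "response" is now a coordinate that lives along the "trial" dimension.
print(da["response"].dims)
print(da["response"].values)
```

assign_coords returns a new object rather than modifying the array in place, which is why the result is reassigned to da.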
Group by trial metadata¶
Use groupby when you need per-condition summaries while preserving the
same analysis pipeline across conditions. It replaces manual mask loops with a
single declarative step and guarantees each group stays aligned to its own
coordinate label.
for condition, sub in da.groupby("response"):
    ...
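As a sketch of what the loop body could do, the example below prints the number of trials per condition and then computes a per-condition trial average in one step. The data and the response coordinate are made up for illustration.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]),
    dims=("trial", "time"),
    coords={
        "trial": [0, 1, 2, 3],
        "time": [0.0, 0.1],
        "response": ("trial", [1, 0, 1, 0]),  # hypothetical condition labels
    },
)

# Explicit loop: each group is a DataArray holding only that condition's trials.
for condition, sub in da.groupby("response"):
    print(condition, sub.sizes["trial"])

# Declarative form: average over trials within each condition.
per_condition = da.groupby("response").mean(dim="trial")
print(per_condition.sel(response=1).values)
```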
Reduce across dimensions¶
Use named reductions (mean, median, etc.) to collapse data across one or
more named dimensions (for example, trial-averaged
time courses or unit-averaged population dynamics). Because reductions are
dimension-aware, they are less error-prone than raw numpy axis numbers.
mean_over_trials = da.mean(dim="trial")
mean_over_time = da.mean(dim="time")
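To make the comparison with NumPy axis numbers concrete, this sketch shows that a named reduction gives the same answer even after the axes are reordered. The array is a placeholder.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(24.0).reshape(2, 3, 4),
    dims=("trial", "unit", "time"),
)

mean_over_trials = da.mean(dim="trial")  # dims: ("unit", "time")

# Reorder the axes; the named reduction still targets "trial".
transposed = da.transpose("time", "unit", "trial")
same_mean = transposed.mean(dim="trial").transpose("unit", "time")

# The NumPy equivalent depends on knowing that "trial" is axis 0 (or axis 2
# after the transpose), which is easy to get wrong.
numpy_mean = da.values.mean(axis=0)

print(mean_over_trials.dims)
print(np.allclose(mean_over_trials.values, same_mean.values))
```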
Stack and unstack¶
Use stack to reshape multidimensional data into a 2D matrix for methods
that expect samples x features (PCA, regressions, decoders). Use unstack
to map model outputs back into interpretable axes so results can be plotted
and compared in the original trial/time/condition structure.
The reduce operator in this package performs this stacking automatically.
da_stack = da.stack(sample=("trial", "time"))
da_unstack = da_stack.unstack("sample")
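The round trip can be sketched as follows, using plain xarray calls rather than this package's reduce operator. The shapes and labels are illustrative; the explicit transpose is there so the matrix orientation does not depend on where xarray places the stacked dimension.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(24.0).reshape(2, 3, 4),
    dims=("trial", "unit", "time"),
    coords={
        "trial": [0, 1],
        "unit": ["u0", "u1", "u2"],
        "time": [0.0, 0.1, 0.2, 0.3],
    },
)

# Merge trial and time into one "sample" dimension.
da_stack = da.stack(sample=("trial", "time"))

# samples x features matrix for a model that expects 2D input.
X = da_stack.transpose("sample", "unit").values
print(X.shape)

# Map back to the original axes and restore the original dimension order.
da_unstack = da_stack.unstack("sample").transpose("trial", "unit", "time")
print(np.array_equal(da_unstack.values, da.values))
```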
Common gotchas¶
Always check da.dims and da.coords after transformations to understand the shape of your data.
Xarray aligns by coordinate labels, not just shape. Two arrays with the same length can still mismatch if coordinate values differ.
where keeps original shape unless you pass drop=True. If you expected fewer trials/time bins, confirm the dimension sizes after filtering.
groupby only works on existing coordinates. If your condition is in a separate table or NumPy array, attach it first with assign_coords.
stack creates a MultiIndex coordinate. Some downstream tools expect plain coordinates, so you may need reset_index or unstack before export.
Missing values (NaNs) propagate through many operations by default. Decide early whether to fill, mask, or drop NaNs to avoid silent changes in summary statistics.
Coordinate dtype matters for selection. For example, string labels "1" do not match integer labels 1 in sel/loc.
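Two of these gotchas are easy to reproduce in a few lines. The sketch below uses invented arrays: the first part shows label alignment silently shrinking a result, and the second shows a string/integer label mismatch together with one possible fix (casting the coordinate).

```python
import numpy as np
import xarray as xr

# Alignment: same length, different trial labels.
a = xr.DataArray([1.0, 2.0, 3.0], dims="trial", coords={"trial": [0, 1, 2]})
b = xr.DataArray([10.0, 20.0, 30.0], dims="trial", coords={"trial": [1, 2, 3]})

total = a + b  # only the shared labels (1 and 2) survive
print(total["trial"].values, total.values)

# Dtype: string labels do not match integer lookups.
labeled = xr.DataArray([1.0, 2.0], dims="trial", coords={"trial": ["1", "2"]})
index = labeled.indexes["trial"]
print(1 in index, "1" in index)

# One possible fix: cast the coordinate to integers before selecting.
fixed = labeled.assign_coords(trial=labeled["trial"].astype(int))
print(fixed.sel(trial=1).item())
```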