Xarray primer
=============

This section introduces the minimal Xarray concepts used in this library. The goal is to make the docs usable even if you have never seen Xarray before.

Core ideas
----------

- ``dims``: named axes (e.g. "trial", "unit", "time")
- ``coords``: labeled values along dims (e.g. time in seconds, trial metadata)
- ``attrs``: metadata for the whole array (e.g. ephys.time_unit)

Why it matters
--------------

Xarray gives names to axes. That means you can select, group, and combine data without remembering axis order. The library uses these names to keep all operations composable.

Common patterns
---------------

Select by label
^^^^^^^^^^^^^^^

Use ``sel`` to access data via *meaningful labels* rather than array positions. This is the safest way to subset when trials/units are reordered, dropped, or merged across preprocessing steps, because the selection follows coordinate values instead of implicit integer offsets.

.. code-block:: python

    da.sel(trial=[0, 1])

Select by integer index
^^^^^^^^^^^^^^^^^^^^^^^

Use ``isel`` for fast positional slicing when labels are unavailable or when you intentionally want "the first N items" regardless of their coordinate values. This is especially useful for quick inspection, debugging, and deterministic train/test splits based on order.

.. code-block:: python

    da.isel(trial=0)
    da.isel(unit=slice(0, 10))

Filter with conditions
^^^^^^^^^^^^^^^^^^^^^^

Use ``where(..., drop=True)`` to express quality-control or behavioral filters as explicit boolean logic. This keeps filtering readable and reproducible while preserving aligned coordinates on all remaining dimensions.

.. code-block:: python

    da_correct = da.where(da["response"] == 1, drop=True)

Assign coordinates
^^^^^^^^^^^^^^^^^^

Use ``assign_coords`` to attach semantic labels (condition IDs, brain region, session metadata) so later operations can be written using understandable terms.
In practice, many higher-level workflows (``groupby``, condition averages, plotting labels) depend on these coordinates.

.. code-block:: python

    da = da.assign_coords(response=("trial", response_values))

Group by trial metadata
^^^^^^^^^^^^^^^^^^^^^^^

Use ``groupby`` when you need per-condition summaries while preserving the same analysis pipeline across conditions. It replaces manual mask loops with a single declarative step and guarantees each group stays aligned to its own coordinate label.

.. code-block:: python

    for condition, sub in da.groupby("response"):
        ...

Reduce across dimensions
^^^^^^^^^^^^^^^^^^^^^^^^

Use named reductions (``mean``, ``median``, etc.) to collapse one or more named dimensions (for example, trial-averaged time courses or unit-averaged population dynamics). Because reductions are dimension-aware, they are less error-prone than raw ``numpy`` axis numbers.

.. code-block:: python

    mean_over_trials = da.mean(dim="trial")
    mean_over_time = da.mean(dim="time")

Stack and unstack
^^^^^^^^^^^^^^^^^

Use ``stack`` to reshape multidimensional data into a 2D matrix for methods that expect samples x features (PCA, regressions, decoders). Use ``unstack`` to map model outputs back into interpretable axes so results can be plotted and compared in the original trial/time/condition structure. The ``reduce`` operator in this package performs this stacking automatically.

.. code-block:: python

    da_stack = da.stack(sample=("trial", "time"))
    da_unstack = da_stack.unstack("sample")

Common gotchas
--------------

- Always check ``da.dims`` and ``da.coords`` after transformations to understand the shape of your data.
- Xarray aligns by coordinate labels, not just shape. Two arrays with the same length can still mismatch if coordinate values differ.
- ``where`` keeps the original shape and masks non-matching entries with NaN unless you pass ``drop=True``. If you expected fewer trials/time bins, confirm the dimension sizes after filtering.
- ``groupby`` by name only works on existing coordinates.
  If your condition is in a separate table or NumPy array, attach it first with ``assign_coords``.
- ``stack`` creates a MultiIndex coordinate. Some downstream tools expect plain coordinates, so you may need ``reset_index`` or ``unstack`` before export.
- Missing values (NaNs) propagate through many operations by default. Decide early whether to fill, mask, or drop NaNs to avoid silent changes in summary statistics.
- Coordinate dtype matters for selection. For example, string labels ``"1"`` do not match integer labels ``1`` in ``sel``/``loc``.
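
Several of the gotchas above can be reproduced on a small synthetic array. The sketch below is illustrative only: the array shape, the ``response_values`` labels, and all variable names are made up for the example and are not part of this library.

.. code-block:: python

    import numpy as np
    import xarray as xr

    # Hypothetical toy data: 4 trials x 3 time bins (values are arbitrary).
    da = xr.DataArray(
        np.arange(12, dtype=float).reshape(4, 3),
        dims=("trial", "time"),
        coords={"trial": [0, 1, 2, 3], "time": [0.0, 0.1, 0.2]},
    )

    # Arithmetic aligns on coordinate labels (inner join by default),
    # so only the trial labels present in both arrays survive.
    shifted = da.assign_coords(trial=[2, 3, 4, 5])
    overlap = da + shifted

    # groupby/where on "response" need a coordinate, so attach it first.
    response_values = np.array([1, 0, 1, 1])  # assumed example labels
    da = da.assign_coords(response=("trial", response_values))

    # Without drop=True the shape is kept and non-matching trials become NaN.
    masked = da.where(da["response"] == 1)
    # With drop=True the non-matching trial is removed.
    dropped = da.where(da["response"] == 1, drop=True)

    # stack creates a MultiIndex on "sample"; reset_index turns its
    # levels back into plain coordinates.
    stacked = da.stack(sample=("trial", "time"))
    flat = stacked.reset_index("sample")

    # Reductions on float data skip NaNs unless skipna=False is passed,
    # so the two results below differ.
    mean_skip = masked.mean(dim="trial")
    mean_nan = masked.mean(dim="trial", skipna=False)

    print(overlap.sizes, masked.sizes, dropped.sizes, stacked.dims)

Printing ``sizes`` and ``dims`` after each step, as in the last line, is a cheap way to confirm that a transformation did what you expected.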