aind_ephys_utils.adapters.dataframe module

pandas/polars -> xarray ingestion for ephys analysis.

This module implements from_dataframe, an adapter that converts one or two DataFrames into canonical xarray objects.

Supported patterns

  1. from_dataframe(units_df): Build a ragged spikes DataArray with dims (unit, trial) where trial=[0] and spike times are expressed in session time.

  2. from_dataframe(trials_df): If the DataFrame does not look like a units/spikes table, interpret it as a trials/events table and build an events DataArray with dims (trial, event, bound).

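  3. from_dataframe(units_df, trials_df): Segment per-unit session spike times by trial boundaries and express spikes in trial time relative to an anchor (by default, the values in the start_time column). When bin_size is also given, the segmented spikes are returned as dense binned rates with dims (unit, trial, time) instead of ragged spike times.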
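To make pattern 3 concrete, here is a minimal, library-free sketch of segmenting one unit's session spike times by trial bounds and re-expressing them relative to an anchor. It illustrates the idea only and is not the module's implementation: `segment_spikes` and its arguments are hypothetical names, and treating both trial bounds as inclusive is an assumption.

```python
from bisect import bisect_left, bisect_right


def segment_spikes(spike_times, starts, ends, anchors=None):
    """Split sorted session spike times into per-trial lists in trial time.

    Hypothetical helper: `spike_times` must be sorted, and both trial
    bounds are treated as inclusive (an assumption, not documented behavior).
    """
    if anchors is None:
        anchors = starts  # default anchor: the trial start time
    per_trial = []
    for start, end, anchor in zip(starts, ends, anchors):
        lo = bisect_left(spike_times, start)  # index of first spike >= start
        hi = bisect_right(spike_times, end)   # one past the last spike <= end
        per_trial.append([t - anchor for t in spike_times[lo:hi]])
    return per_trial


spikes = [0.25, 1.5, 2.5, 3.25, 4.75, 6.0]
trial_spikes = segment_spikes(spikes, starts=[0.0, 3.0], ends=[2.0, 5.0])
print(trial_spikes)  # [[0.25, 1.5], [0.25, 1.75]]
```

The result is ragged: each trial holds a different number of spikes, and spikes outside every trial (2.5 and 6.0 here) are dropped.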

The resulting DataArrays are validated by aind_ephys_utils.standards.validate when validate=True.

exception aind_ephys_utils.adapters.dataframe.FromDataFrameError

Bases: ValueError

Raised when from_dataframe cannot interpret or validate the inputs.

aind_ephys_utils.adapters.dataframe.from_dataframe(*dfs: DataFrame, unit_id_col: str | None = None, trial_id_col: str | None = None, spike_times_col: str = 'spike_times', trial_start_col: str = 'start_time', trial_end_col: str = 'end_time', align_to: str | None = None, event_cols: Dict[str, str] | None = None, epoch_cols: Dict[str, Tuple[str, str]] | None = None, long_event_col: str | None = None, long_time_col: str | None = None, long_end_time_col: str | None = None, unit_coords: List[str] | None = None, trial_coords: List[str] | None = None, bin_size: float | None = None, window: Tuple[float, float] | None = None, time: ndarray | None = None, time_unit: str = 's', validate: bool = True) DataArray

Convert pandas or polars DataFrames to xarray DataArrays for ephys analysis.

Four use cases (by number of DataFrames and bin_size):

  1. from_dataframe(units_df):

    -> ragged spikes DataArray (unit, trial) with trial=[0]

    Spike times are in session time (no segmentation).

  2. from_dataframe(trials_df), where trials_df does NOT look like a units table:

    -> events/epochs DataArray (trial, event, bound)

  3. from_dataframe(units_df, trials_df, bin_size=None):

    -> ragged spikes DataArray (unit, trial)

    Spikes are segmented by [trial_start, trial_end] and expressed relative to an anchor (the align_to column if provided, otherwise trial_start).

  4. from_dataframe(units_df, trials_df, bin_size=…):

    -> dense binned spikes DataArray (unit, trial, time), with values as rates in Hz

    Spikes are segmented by trial bounds and binned relative to the anchor within window/time.
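The binning step in case 4 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the adapter's actual code: `bin_spikes` is a hypothetical name, bins are taken to be half-open intervals [left, right), and times are assumed to be in seconds so that count / bin_size gives a rate in Hz.

```python
def bin_spikes(rel_times, window, bin_size):
    """Return (bin_centers, rates_hz) for anchor-relative spike times.

    Hypothetical helper; half-open bins [left, right) are an assumption.
    """
    start, end = window
    n_bins = round((end - start) / bin_size)
    counts = [0] * n_bins
    for t in rel_times:
        idx = int((t - start) // bin_size)  # bin index of this spike
        if 0 <= idx < n_bins:               # ignore spikes outside the window
            counts[idx] += 1
    centers = [start + (i + 0.5) * bin_size for i in range(n_bins)]
    rates = [c / bin_size for c in counts]  # spikes per second
    return centers, rates


centers, rates = bin_spikes([-0.25, 0.1, 0.3, 0.9], window=(-0.5, 1.0), bin_size=0.5)
print(centers)  # [-0.25, 0.25, 0.75]
print(rates)    # [2.0, 4.0, 2.0]
```

Repeating this for every (unit, trial) pair over a shared set of bin centers is what makes the output dense rather than ragged.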

The trials_df can be organized in “wide” or “long” format.

Example of “wide” format:

    1 | 0.0 | 2.0 | 0.5 | 0.7 | 1.5
    2 | 3.0 | 5.0 | 3.5 | 3.7 | 4.5

Example of “long” format:

    1 | go_cue | 0.5 | 0.5
    1 | delay | 0.7 | 1.5
    2 | go_cue | 3.5 | 3.5
    2 | delay | 3.7 | 4.5
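The relationship between the two layouts can be sketched without any DataFrame library. The column names below (`trial_id`, `go_cue_time`, `delay_start`, `delay_end`) are assumptions chosen to mirror the parameter examples, since the table headers are not shown here, and `wide_to_long` is a hypothetical helper rather than part of the module.

```python
# Wide layout: one row per trial, one column per event time or epoch bound.
wide_rows = [
    {"trial_id": 1, "start_time": 0.0, "end_time": 2.0,
     "go_cue_time": 0.5, "delay_start": 0.7, "delay_end": 1.5},
    {"trial_id": 2, "start_time": 3.0, "end_time": 5.0,
     "go_cue_time": 3.5, "delay_start": 3.7, "delay_end": 4.5},
]
event_cols = {"go_cue": "go_cue_time"}                 # instantaneous events
epoch_cols = {"delay": ("delay_start", "delay_end")}   # extended epochs


def wide_to_long(rows, event_cols, epoch_cols):
    """Return (trial, event, time, end_time) tuples, one per event or epoch."""
    long_rows = []
    for row in rows:
        for name, col in event_cols.items():
            # An instantaneous event starts and ends at the same time.
            long_rows.append((row["trial_id"], name, row[col], row[col]))
        for name, (start_col, end_col) in epoch_cols.items():
            long_rows.append((row["trial_id"], name, row[start_col], row[end_col]))
    return long_rows


long_rows = wide_to_long(wide_rows, event_cols, epoch_cols)
for r in long_rows:
    print(r)  # reproduces the four rows of the long-format example
```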

Parameters:
  • *dfs (pandas or polars DataFrame) – One or two DataFrames. If one DataFrame is provided, it is interpreted as either a units table (containing spike times) or a trials/events table. If two DataFrames are provided, one must be a units table and the other a trials table.

  • unit_id_col (str, optional) – Column name in units_df to use as unit identifiers. If None, uses the DataFrame index (for pandas DataFrames) or row index (for polars DataFrames).

  • trial_id_col (str, optional) – Column name in trials_df to use as trial identifiers. If None, uses the DataFrame index (for pandas DataFrames) or row index (for polars DataFrames).

  • spike_times_col (str, default="spike_times") – Column name in units_df containing spike times (in session time). Each entry should be a 1D array-like of spike times in seconds.

  • trial_start_col (str, default="start_time") – Column name in trials_df indicating trial start times (session time).

  • trial_end_col (str, default="end_time") – Column name in trials_df indicating trial end times (session time).

  • align_to (str, optional) – Column name in trials_df to use as the alignment anchor for each trial. If None, defaults to trial_start_col. Spike times will be expressed relative to this anchor.

  • event_cols (dict[str, str], optional) – Dictionary mapping event names to column names in trials_df for instantaneous events. Only used when a single trials_df is provided (case 2). E.g., {"go_cue": "go_cue_time"}.

  • epoch_cols (dict[str, tuple[str, str]], optional) – Dictionary mapping epoch names to (start_col, end_col) pairs in trials_df for extended epochs. Only used when a single trials_df is provided (case 2). E.g., {"delay": ("delay_start", "delay_end")}.

  • long_event_col (str, optional) – Column name in trials_df containing event names in long format. Only used when a single trials_df is provided (case 2).

  • long_time_col (str, optional) – Column name in trials_df containing event times corresponding to long_event_col. Only used when a single trials_df is provided (case 2).

  • long_end_time_col (str, optional) – Column name in trials_df containing epoch end times for long format epochs. Only used when a single trials_df is provided (case 2).

  • unit_coords (list[str], optional) – List of column names from units_df to attach as unit coordinates. If None, all columns except unit_id_col and spike_times_col are included.

  • trial_coords (list[str], optional) – List of column names from trials_df to attach as trial coordinates. If None, all columns except trial_id_col, trial_start_col, trial_end_col, and align_to are included.

  • bin_size (float, optional) – Time bin size (in time_unit) for binning spikes. Only relevant when both units_df and trials_df are provided. If None, returns ragged spikes (case 3). If specified, returns dense binned spikes (case 4).

  • window (tuple[float, float], optional) – Time window (start, end) relative to the anchor for extracting spikes, specified in time_unit. Required when bin_size is specified, unless time is provided. If both window and bin_size are None, the full trial duration is used.

  • time (np.ndarray, optional) – Custom 1D array of uniformly-spaced time bin centers for binned output. Must have spacing equal to bin_size. If provided, overrides automatic bin center calculation. Only used when bin_size is specified.

  • time_unit (str, default="s") – Unit of time for all time-related parameters and output ("s" for seconds, "ms" for milliseconds, etc.).

  • validate (bool, default=True) – If True, validates the output DataArray against ephys standards using aind_ephys_utils.standards.validate.
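The constraint on a custom time array (uniformly spaced bin centers whose spacing equals bin_size) can be checked with a few lines of standard-library Python. This is a sketch of the kind of check implied by the parameter description, not the module's own validation: `check_time_axis` and its tolerances are assumptions.

```python
import math


def check_time_axis(time, bin_size, rel_tol=1e-9):
    """Raise ValueError unless consecutive centers differ by bin_size.

    Hypothetical helper; the tolerances are illustrative choices.
    """
    for left, right in zip(time, time[1:]):
        step = right - left
        if not math.isclose(step, bin_size, rel_tol=rel_tol, abs_tol=1e-12):
            raise ValueError(f"spacing {step} does not match bin_size {bin_size}")


bin_size = 0.01
# Bin centers covering the window (-0.5, 1.0): 150 bins of width 0.01.
time = [-0.495 + i * bin_size for i in range(150)]
check_time_axis(time, bin_size)  # uniform spacing: no exception raised
```

Comparing with a tolerance rather than exact equality matters because centers built by repeated floating-point arithmetic rarely differ by exactly bin_size.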

Returns:

Ephys data as an xarray DataArray with appropriate dimensions, coordinates, and metadata attributes.

Return type:

xr.DataArray

Raises:

FromDataFrameError – If inputs cannot be interpreted or validated, including:

  - Wrong number of DataFrames (must be 1 or 2)
  - Missing required columns
  - Invalid spike times format
  - Trial boundaries contain NaNs or invalid values
  - bin_size specified without window (when time is not provided)
  - time array is not uniformly spaced or doesn’t match bin_size
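Because FromDataFrameError derives from ValueError, callers that already handle ValueError will also catch it. The sketch below demonstrates this with a local stand-in class so that it runs without the package installed; `require_one_or_two` is a hypothetical check modeled on the "must be 1 or 2" rule, not a function from the module.

```python
class FromDataFrameError(ValueError):
    """Local stand-in that mirrors the documented base class."""


def require_one_or_two(*dfs):
    # Hypothetical check modeled on the documented "must be 1 or 2" rule.
    if len(dfs) not in (1, 2):
        raise FromDataFrameError(f"expected 1 or 2 DataFrames, got {len(dfs)}")
    return len(dfs)


caught = None
try:
    require_one_or_two()  # zero inputs: invalid
except ValueError as err:  # the base-class handler catches the subclass too
    caught = err

print(type(caught).__name__)  # FromDataFrameError
```

Catching the specific class instead of ValueError keeps unrelated value errors from being swallowed by the same handler.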

Examples

Case 1: Units only (session time)

    >>> spikes = from_dataframe(units_df)

Case 2: Events/epochs from trials

    >>> events = from_dataframe(trials_df)

Case 3: Units + trials (ragged, trial time)

    >>> spikes = from_dataframe(units_df, trials_df, align_to="stim_onset")

Case 4: Units + trials (binned, trial time)

    >>> spikes = from_dataframe(
    ...     units_df, trials_df,
    ...     bin_size=0.01, window=(-0.5, 1.0), align_to="stim_onset"
    ... )