Architecture

Note

This page describes technical details about EXtra-data. You shouldn’t need this information to use it.

Objects

There are three classes making up the core API of EXtra-data:

  • DataCollection is what you get from opening a run or file: data for several sources over some range of pulse trains (i.e. time). It has methods to select a subset of that data.

  • SourceData comes from run[source], representing one source, such as a motor or a detector module. Each source has a set of keys.

  • KeyData comes from run[source, key], representing data for a single source & key. This has a dtype and a shape like a NumPy array, but the data is not in memory. It has methods to load the data as a NumPy array, an Xarray DataArray, or a Dask array.
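The relationship between these three objects can be illustrated with a toy model. This is only a sketch of the indexing pattern described above, not EXtra-data's implementation: the class bodies, the load() and select() methods as written here, and the MOTOR_A / MOTOR_B source names are all hypothetical, and plain lists stand in for arrays.

```python
# Toy model of the indexing pattern only; NOT EXtra-data's implementation.
# Class bodies, method names and source/key names are hypothetical.

class KeyData:
    """Data for one source & key; values are only read on request."""
    def __init__(self, source, key, loader, shape, dtype):
        self.source, self.key = source, key
        self.shape, self.dtype = shape, dtype
        self._loader = loader  # callable standing in for a read from file

    def load(self):
        # Stand-in for loading the data into memory.
        return self._loader()


class SourceData:
    """One source, holding a set of keys."""
    def __init__(self, source, keys):
        self.source = source
        self._keys = keys  # key -> (loader, shape, dtype)

    def keys(self):
        return set(self._keys)

    def __getitem__(self, key):
        loader, shape, dtype = self._keys[key]
        return KeyData(self.source, key, loader, shape, dtype)


class DataCollection:
    """Several sources; can hand out sub-selections of itself."""
    def __init__(self, sources):
        self._sources = sources  # source name -> {key: (loader, shape, dtype)}

    def __getitem__(self, item):
        if isinstance(item, tuple):      # run[source, key] -> KeyData
            source, key = item
            return self[source][key]
        return SourceData(item, self._sources[item])  # run[source]

    def select(self, wanted):
        # Return a new, smaller collection; the original is left untouched.
        return DataCollection({s: self._sources[s] for s in wanted})


reads = []  # records when data is actually read

def load_position():
    reads.append("position")
    return [1.0, 1.5, 2.0]

run = DataCollection({
    "MOTOR_A": {"position.value": (load_position, (3,), "float64")},
    "MOTOR_B": {"position.value": (lambda: [0.0, 0.0, 0.0], (3,), "float64")},
})

kd = run["MOTOR_A", "position.value"]
print(kd.shape, kd.dtype, reads)  # shape and dtype are known before any read
print(kd.load(), reads)           # the read happens only here
```

The point of the sketch is that shape and dtype are available without touching the data, which is what lets a KeyData object describe data that is not in memory.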

Component classes for multi-module detectors build on top of this core to work more conveniently with major data sources. There are more component classes in the EXtra package.

FileAccess is a lower-level class to manage access to a single EuXFEL format HDF5 file, including caching index information. There should only be one FileAccess object per file on disk, even if multiple DataCollection, SourceData and KeyData objects refer to it.
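One way to get the "one object per file" behaviour is a registry keyed by the file's absolute path. The sketch below assumes a weak-value dictionary for that registry; the FileRecord class and its index_cache attribute are hypothetical stand-ins, and this is not a description of how FileAccess itself is written.

```python
import os
import weakref

class FileRecord:
    """Hypothetical stand-in for a per-file access object."""
    # Absolute path -> live object. Entries vanish once nothing else
    # refers to the object, so the registry does not keep files alive.
    _instances = weakref.WeakValueDictionary()

    def __new__(cls, path):
        key = os.path.abspath(path)
        inst = cls._instances.get(key)
        if inst is None:
            inst = super().__new__(cls)
            inst.path = key
            inst.index_cache = {}  # e.g. index information, read once
            cls._instances[key] = inst
        return inst

a = FileRecord("data/run.h5")
b = FileRecord("./data/run.h5")  # same file, spelled differently
print(a is b)
```

Because both names resolve to the same object, anything cached on it (such as index information) is shared by every higher-level object that refers to that file.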

Modules

  • cli contains command-line interfaces.

  • components provides interfaces that bring together data from several similar sources, such as multi-module detectors where each module is a separate source.

  • exceptions defines some custom error classes.

  • export sends data from files over ZMQ in the Karabo Bridge format.

  • file_access contains FileAccess (described above), along with machinery to keep the number of open files under a limit.

  • keydata contains KeyData (described above).

  • locality can check whether files are available on disk or on tape in a dCache filesystem.

  • lsxfel implements the lsxfel command.

  • reader contains DataCollection (described above), and functions to open a run or a file.

  • read_machinery is a collection of pieces that support reader.

  • run_files_map manages caching metadata about the files of a run in a JSON file, to speed up opening the run.

  • sourcedata contains SourceData (described above).

  • stacking has functions for stacking multiple arrays into one, another option for working with multi-module detector data.

  • utils holds miscellaneous pieces that don’t fit anywhere else.

  • validation checks if files & runs have the expected format, for the extra-data-validate command.

  • writer writes data to EuXFEL format files, for write() and write_virtual().

  • write_cxi makes CXI format HDF5 files using virtual datasets to expose multi-module detector data. Used by write_virtual_cxi() and the extra-data-make-virtual-cxi command.
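The file_access entry above mentions machinery to keep the number of open files under a limit. A common way to do this is to close the least recently used file before opening a new one. The sketch below shows that idea with ordinary binary files; the OpenFileLimiter class, its maxfiles parameter and its methods are hypothetical, and it is not taken from EXtra-data's code.

```python
import os
import tempfile
from collections import OrderedDict

class OpenFileLimiter:
    """Hypothetical helper: keeps at most `maxfiles` file handles open."""
    def __init__(self, maxfiles):
        self.maxfiles = maxfiles
        self._open = OrderedDict()  # path -> handle, least recently used first

    def get(self, path):
        fh = self._open.pop(path, None)
        if fh is None:
            # Make room before opening another file.
            while self._open and len(self._open) >= self.maxfiles:
                _, oldest = self._open.popitem(last=False)
                oldest.close()
            fh = open(path, "rb")
        self._open[path] = fh  # (re)insert as most recently used
        return fh

    def close_all(self):
        while self._open:
            self._open.popitem()[1].close()

# Create three small files to exercise the limiter.
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(3):
    p = os.path.join(tmpdir, f"file{i}.bin")
    with open(p, "wb") as f:
        f.write(bytes([i]))
    paths.append(p)

limiter = OpenFileLimiter(maxfiles=2)
first = limiter.get(paths[0])
limiter.get(paths[1])
limiter.get(paths[2])  # over the limit, so the handle for paths[0] is closed
print(first.closed, len(limiter._open))
```

A closed file is simply reopened the next time it is requested, so callers should ask the limiter for a handle each time rather than holding on to one.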