EXtra-data

EXtra-data is a Python library for accessing saved data produced at European XFEL.

Installation

EXtra-data is available on our Anaconda installation on the Maxwell cluster:

module load exfel exfel_anaconda3

You can also install it from PyPI to use in other environments with Python 3.7 or later:

pip install extra_data

This will install the extra_data package and the most commonly useful dependencies. Some large dependencies or dependencies only required for specific functionalities are not installed by default. You can use pip to install everything required for extra or all uses of extra_data (e.g. Dask Array, karabo-bridge-serve-files). This installs both extra_data and dependencies that are necessary:

pip install "extra_data[bridge]"    # install dependencies for karabo-bridge-like data streaming
pip install "extra_data[complete]"  # install dependencies for all features

If you get a permissions error, add the --user flag to that command.

Quickstart

Open a run on the Maxwell cluster:

from extra_data import open_run

run = open_run(proposal=700000, run=1)

You can also specify a run directory, or open an individual file - see Opening files for details. The same methods to access data work with any of these options.

Load data as a NumPy array for a given source & key:

arr = run["SA3_XTD10_PES/ADC/1:network", "digitizers.channel_4_A.raw.samples"].ndarray()

You can load only a region of interest, get a labelled array with train IDs, or load 1D data as columns in a pandas dataframe. See Reading data to analyse in memory (example) and Getting data by source & key (reference) for more information.

For data that’s too big to fit in memory at once, you can read one pulse train at a time:

for train_id, data in run.select("*/DET/*", "image.data").trains():
    mod0 = data["FXE_DET_LPD1M-1/DET/0CH0:xtdf"]["image.data"]

Other options to work with large data volumes include breaking the run into smaller parts with split_trains() before loading data, and automatic chunking with the Dask framework and dask_array().

Documentation contents

Indices and tables