EXtra-data is a Python library for accessing saved data produced at European XFEL.
EXtra-data is available in our Anaconda installation on the Maxwell cluster:
module load exfel exfel_anaconda3
You can also install it from PyPI to use in other environments with Python 3.7 or later:
pip install extra_data
This installs the extra_data package along with its most commonly used dependencies. Large dependencies, and those needed only for specific features (e.g. Dask arrays, karabo-bridge-serve-files), are not installed by default. To include them, ask pip for an optional extra. This installs extra_data together with the dependencies for one feature, or for all of them:
pip install "extra_data[bridge]"    # install dependencies for karabo-bridge-like data streaming
pip install "extra_data[complete]"  # install dependencies for all features
If you get a permissions error, add the --user flag to that command.
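To see which optional dependencies are already present in an environment before installing an extra, a quick check with `importlib` from the standard library is enough. This is a minimal sketch, not part of extra_data. The module names in `OPTIONAL_MODULES` (`dask` for Dask arrays, `karabo_bridge` for streaming) are assumptions about which packages the extras provide, so check the package metadata for the authoritative list.

```python
import importlib.util

# Assumed import names for some optional features; not an official list.
OPTIONAL_MODULES = {
    "Dask arrays": "dask",
    "karabo-bridge streaming": "karabo_bridge",
}


def available_modules(modules):
    """Map each feature label to True if its top-level module can be found."""
    return {
        label: importlib.util.find_spec(name) is not None
        for label, name in modules.items()
    }


if __name__ == "__main__":
    for feature, ok in available_modules(OPTIONAL_MODULES).items():
        print(f"{feature}: {'installed' if ok else 'missing'}")
```

A missing entry suggests installing the matching extra with the pip commands above.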
Open a run on the Maxwell cluster:
from extra_data import open_run
run = open_run(proposal=700000, run=1)
You can also specify a run directory or open an individual file; see Opening files for details. The same data access methods work with any of these options.
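If you pass a run directory yourself (for example to `RunDirectory`), it helps to know how runs are usually laid out on Maxwell. The helper below is hypothetical and only builds a path string. The layout `/gpfs/exfel/exp/<instrument>/<cycle>/p<proposal>/<raw|proc>/r<run>`, with the proposal zero-padded to 6 digits and the run to 4, is an assumption based on current conventions, so verify it against the actual filesystem.

```python
import posixpath

EXP_ROOT = "/gpfs/exfel/exp"  # assumed root of experiment data on Maxwell


def run_dir_path(instrument, cycle, proposal, run, data="raw"):
    """Build the expected path of a run directory (hypothetical helper).

    instrument: e.g. "SA3"; cycle: e.g. "202201" (assumed format);
    data: "raw" or "proc".
    """
    if data not in ("raw", "proc"):
        raise ValueError(f"data must be 'raw' or 'proc', not {data!r}")
    return posixpath.join(
        EXP_ROOT, instrument, str(cycle), f"p{proposal:06d}", data, f"r{run:04d}"
    )
```

With EXtra-data installed, the result could be passed as `RunDirectory(run_dir_path("SA3", "202201", 700000, 1))` (the instrument and cycle here are example values), although `open_run` already locates the directory from the proposal and run numbers alone.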
Load data as a NumPy array for a given source & key:
arr = run["SA3_XTD10_PES/ADC/1:network", "digitizers.channel_4_A.raw.samples"].ndarray()
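Source names like `SA3_XTD10_PES/ADC/1:network` follow a naming convention. As far as we know, fast (instrument) data sources carry an output channel name after a colon, while slow (control) sources are plain device names. The illustrative helpers below split a name on that basis. They are not extra_data functions; the run's own `instrument_sources` and `control_sources` sets remain the authority.

```python
def split_source(source):
    """Split a source name into (device_id, output_channel).

    Assumed convention: instrument sources look like "DEVICE/ID:channel";
    control sources have no colon, so output_channel is None.
    """
    device_id, sep, channel = source.partition(":")
    return device_id, (channel if sep else None)


def is_instrument_source(source):
    """True if the name has an output channel (assumed instrument source)."""
    return split_source(source)[1] is not None
```

For example, `split_source("SA3_XTD10_PES/ADC/1:network")` gives the device `SA3_XTD10_PES/ADC/1` and the output channel `network`.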
You can load only a region of interest, get a labelled array with train IDs, or load 1D data as columns in a pandas dataframe. See Reading data to analyse in memory (example) and Getting data by source & key (reference) for more information.
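A labelled array keeps the train ID alongside each value, so you can select by train ID instead of by position. The pure-Python sketch below shows the idea with hypothetical helpers and made-up numbers. In practice, EXtra-data's `xarray()` method returns a labelled xarray array, and you would index its train ID coordinate instead.

```python
def label_by_train(train_ids, values):
    """Pair each value with its train ID (hypothetical helper)."""
    if len(train_ids) != len(values):
        raise ValueError("need exactly one value per train ID")
    return dict(zip(train_ids, values))


def select_trains(labelled, first, last):
    """Keep entries whose train ID is in [first, last], in train ID order."""
    return {tid: v for tid, v in sorted(labelled.items()) if first <= tid <= last}


labelled = label_by_train([100, 101, 103], [0.5, 0.7, 0.2])
selected = select_trains(labelled, 101, 103)  # {101: 0.7, 103: 0.2}
```

In this made-up example, train 102 has no data. Selecting trains 101 to 103 therefore returns two entries, which is why labels are safer than positions when some trains are missing.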
For data that’s too big to fit in memory at once, you can read one pulse train at a time:
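# Select detector sources matching */DET/* and their image.data key
for train_id, data in run.select("*/DET/*", "image.data").trains():
    # data is a nested dict: data[source][key] holds this train's values
    mod0 = data["FXE_DET_LPD1M-1/DET/0CH0:xtdf"]["image.data"]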
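Iterating train by train lets you reduce data as you go, so memory use stays bounded by roughly one train. The sketch below mimics that pattern without real files. `fake_trains` is a hypothetical stand-in for `run.trains()` that yields `(train_id, data)` pairs nested as `data[source][key]`, matching the loop above. `fnmatchcase` stands in for the glob-style source pattern given to `select()`; treating the two as equivalent is an assumption about the matching rules. Frames here are short flat lists of numbers, not NumPy arrays, and the values are made up.

```python
from fnmatch import fnmatchcase


def fake_trains():
    """Hypothetical stand-in for run.trains(): yields (train_id, data)."""
    for train_id in (1000, 1001, 1002):
        yield train_id, {
            "FXE_DET_LPD1M-1/DET/0CH0:xtdf": {"image.data": [train_id % 10, 1, 2]},
            "SA3_XTD10_PES/ADC/1:network": {
                "digitizers.channel_4_A.raw.samples": [9, 9, 9]
            },
        }


def mean_frame(trains, source_pattern, key):
    """Per-pixel mean of `key` over all trains, for sources matching the glob."""
    totals, count = None, 0
    for _train_id, data in trains:
        for source, keys in data.items():
            # Skip sources that don't match the pattern or lack the key
            if not fnmatchcase(source, source_pattern) or key not in keys:
                continue
            frame = keys[key]
            if totals is None:
                totals = [0.0] * len(frame)
            # Only the running sum is kept, never the full stack of frames
            totals = [t + x for t, x in zip(totals, frame)]
            count += 1
    if count == 0:
        raise ValueError("no matching data found")
    return [t / count for t in totals]


print(mean_frame(fake_trains(), "*/DET/*", "image.data"))  # [1.0, 1.0, 2.0]
```

With real data, `run.select("*/DET/*", "image.data").trains()` already restricts what each train contains, so the pattern check inside the loop would be redundant there.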
- Reading data to analyse in memory
- Inspecting available data
- Reading data train by train
- Aligning data from different sources
- Averaging detector data with Dask
- Parallel processing with a virtual HDF5 dataset
- Accessing LPD data
- Combining data from separate but concurrent runs
- Reading data files
- Multi-module detector data
- Streaming data over ZeroMQ
- Checking data files
- Command line tools
- Data files format
- Performance notes