EXtra-data is a Python library for accessing and working with data produced at European XFEL.
EXtra-data is the new name for karabo_data. The code to work with detector geometry has been separated as EXtra-geom.
EXtra-data is available on our Anaconda installation on the Maxwell cluster:
module load exfel exfel_anaconda3
You can also install it from PyPI to use in other environments with Python 3.5 or later:
pip install extra_data
If you get a permissions error, add the --user flag to that command.
Open a run or a file - see Opening files for more:
    from extra_data import open_run, RunDirectory, H5File

    # Find a run on the Maxwell cluster
    run = open_run(proposal=700000, run=1)

    # Open a run with a directory path
    run = RunDirectory("/gpfs/exfel/exp/XMPL/201750/p700000/raw/r0001")

    # Open an individual file
    file = H5File("RAW-R0017-DA01-S00000.h5")
After this step, you’ll use the same methods to get data whether you opened a run or a file.
Load data into memory - see Getting data by source & key for more:
    # Get a labelled array
    arr = run.get_array("SA3_XTD10_PES/ADC/1:network", "digitizers.channel_4_A.raw.samples")

    # Get a pandas dataframe of 1D fields
    df = run.get_dataframe(fields=[
        ("*_XGM/*", "*.i[xy]Pos"),
        ("*_XGM/*", "*.photonFlux")
    ])
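The fields passed to get_dataframe above are (source pattern, key pattern) pairs. As a rough illustration of how such patterns pick out fields, the sketch below applies them with Python's standard fnmatch module. It assumes the patterns follow Unix shell-style glob rules; the source and key names in the `available` dictionary are made up for the example, and `match_fields` is a hypothetical helper, not part of EXtra-data.

```python
from fnmatch import fnmatchcase

# Hypothetical source and key names, for illustration only.
available = {
    "SA1_XTD2_XGM/DOOCS/MAIN": [
        "beamPosition.ixPos",
        "beamPosition.iyPos",
        "pulseEnergy.photonFlux",
        "pulseEnergy.wavelengthUsed",
    ],
    "SA3_XTD10_PES/ADC/1:network": [
        "digitizers.channel_4_A.raw.samples",
    ],
}

# The same (source pattern, key pattern) pairs as in the get_dataframe call.
patterns = [("*_XGM/*", "*.i[xy]Pos"), ("*_XGM/*", "*.photonFlux")]


def match_fields(available, patterns):
    """Return sorted (source, key) pairs matching any pattern pair.

    Hypothetical helper: assumes shell-style globs, as in fnmatch.
    """
    matched = set()
    for source, keys in available.items():
        for key in keys:
            for source_glob, key_glob in patterns:
                if fnmatchcase(source, source_glob) and fnmatchcase(key, key_glob):
                    matched.add((source, key))
    return sorted(matched)


selected = match_fields(available, patterns)
for source, key in selected:
    print(source, key)
```

In this sketch, `[xy]` matches a single character, so both the ixPos and iyPos keys are selected, while the digitizer source is skipped because its name does not contain `_XGM/`.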
Iterate through data for each pulse train - see Getting data by train for more:
    for train_id, data in run.select("*/DET/*", "image.data").trains():
        mod0 = data["FXE_DET_LPD1M-1/DET/0CH0:xtdf"]["image.data"]
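The loop above yields a train ID together with a nested dictionary indexed first by source name and then by key. The sketch below shows one way to accumulate a per-train summary from that structure without needing any data files. `fake_trains` is a hypothetical stand-in for `run.select(...).trains()`, and it yields short Python lists where real detector data would be arrays.

```python
def fake_trains():
    """Hypothetical stand-in for run.select("*/DET/*", "image.data").trains().

    Yields (train_id, data), where data maps source name -> key -> values.
    """
    source = "FXE_DET_LPD1M-1/DET/0CH0:xtdf"
    for i in range(3):
        train_id = 10000 + i  # made-up train IDs
        pixels = [i, i + 1, i + 2, i + 3]  # made-up pixel values
        yield train_id, {source: {"image.data": pixels}}


# Collect the mean pixel value for each train.
means = {}
for train_id, data in fake_trains():
    mod0 = data["FXE_DET_LPD1M-1/DET/0CH0:xtdf"]["image.data"]
    means[train_id] = sum(mod0) / len(mod0)

for train_id, mean in means.items():
    print(train_id, mean)
```

Because only one train is held at a time, this pattern keeps memory use low for data that would be too large to load in one go.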
These are not the only ways to get data: Reading data files describes various other options.
- Reading data to analyse in memory
- Inspecting available data
- Reading data train by train
- Averaging detector data with Dask
- Parallel processing with a virtual HDF5 dataset
- Accessing LPD data
- Combining data from separate but concurrent runs
- Reading data files
- AGIPD, LPD & DSSC data
- Streaming data over ZeroMQ
- Checking data files
- Command line tools
- Data files format
- Performance notes