Multi-module detector data

Several X-ray pixel detectors are composed of multiple modules, which are stored as separate sources at EuXFEL. extra_data includes convenient interfaces to access data from AGIPD, LPD, DSSC and JUNGFRAU, pulling together the separate modules into a single array.

Note

These detectors can record a lot of data. The .get_array() method loads all of the selected data into memory, which may not be practical for entire runs. You might need to think about iterating over trains, selecting batches of trains from the run, or using Dask arrays.
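To get a feel for the sizes involved, the arithmetic below estimates the memory needed to hold data from a 1-megapixel detector in a single dense array. The number of trains, the number of frames per train and the 16-bit data type are illustrative assumptions, not values taken from any particular run, and estimate_nbytes is a hypothetical helper, not part of extra_data.

```python
import numpy as np

def estimate_nbytes(n_trains, frames_per_train, n_pixels=1024 * 1024,
                    dtype=np.uint16):
    """Bytes needed to hold every selected frame in one dense array."""
    return n_trains * frames_per_train * n_pixels * np.dtype(dtype).itemsize

# Assumed example: 100 trains of 352 frames each, stored as 16-bit integers
size = estimate_nbytes(n_trains=100, frames_per_train=352)
print(f"{size / 2**30:.2f} GiB")  # 68.75 GiB
```

Even a short selection like this can exceed the memory of a typical workstation, which is why working in batches of trains or with Dask arrays is often the more practical route.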

class extra_data.components.AGIPD1M(data: extra_data.reader.DataCollection, detector_name=None, modules=None, *, min_modules=1)

An interface to AGIPD-1M data.

Parameters
  • data (DataCollection) – A data collection, e.g. from RunDirectory().

  • modules (set of ints, optional) – Detector module numbers to use. By default, all available modules are used.

  • detector_name (str, optional) – Name of a detector, e.g. ‘SPB_DET_AGIPD1M-1’. This is only needed if the dataset includes more than one AGIPD detector.

  • min_modules (int) – Only include trains where at least this many modules have data. Default is 1.
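The effect of min_modules can be pictured with a small table recording which modules have data for which trains. The sketch below only illustrates the selection rule described above, using a made-up 4-module table. It is not the library's implementation, and trains_with_min_modules is a hypothetical helper.

```python
import numpy as np

# Hypothetical presence table: rows are trains, columns are modules.
# True means that module recorded data for that train.
has_data = np.array([
    [True,  True,  True,  True],   # train 0: all 4 modules
    [True,  False, True,  True],   # train 1: 3 modules
    [False, False, False, True],   # train 2: 1 module
    [False, False, False, False],  # train 3: no data at all
])

def trains_with_min_modules(has_data, min_modules=1):
    """Indices of trains where at least `min_modules` modules have data."""
    return np.flatnonzero(has_data.sum(axis=1) >= min_modules)

print(trains_with_min_modules(has_data))                 # [0 1 2]
print(trains_with_min_modules(has_data, min_modules=3))  # [0 1]
print(trains_with_min_modules(has_data, min_modules=4))  # [0]
```

With the default of 1, only trains with no detector data at all are dropped; setting it to the number of selected modules keeps only complete trains.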

The methods of this class are identical to those of LPD1M, below.

class extra_data.components.AGIPD500K(data: extra_data.reader.DataCollection, detector_name=None, modules=None, *, min_modules=1)

An interface to AGIPD-500K data.

Detector names look like ‘HED_DET_AGIPD500K2G’; otherwise this class is identical to AGIPD1M.

class extra_data.components.DSSC1M(data: extra_data.reader.DataCollection, detector_name=None, modules=None, *, min_modules=1)

An interface to DSSC-1M data.

Parameters
  • data (DataCollection) – A data collection, e.g. from RunDirectory().

  • modules (set of ints, optional) – Detector module numbers to use. By default, all available modules are used.

  • detector_name (str, optional) – Name of a detector, e.g. ‘SCS_DET_DSSC1M-1’. This is only needed if the dataset includes more than one DSSC detector.

  • min_modules (int) – Only include trains where at least this many modules have data. Default is 1.

The methods of this class are identical to those of LPD1M, below.

class extra_data.components.LPD1M(data: extra_data.reader.DataCollection, detector_name=None, modules=None, *, min_modules=1, parallel_gain=False)

An interface to LPD-1M data.

Parameters
  • data (DataCollection) – A data collection, e.g. from RunDirectory().

  • modules (set of ints, optional) – Detector module numbers to use. By default, all available modules are used.

  • detector_name (str, optional) – Name of a detector, e.g. ‘FXE_DET_LPD1M-1’. This is only needed if the dataset includes more than one LPD detector.

  • min_modules (int) – Only include trains where at least this many modules have data. Default is 1.

  • parallel_gain (bool) – Set to True to read this data as parallel gain data, where high, medium and low gain data are stored sequentially within each train. This will repeat the pulse & cell IDs from the first 1/3 of each train, and add gain stage labels from 0 (high-gain) to 2 (low-gain).
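One way to read the parallel_gain description is sketched below for a single train: the frames are split into three equal blocks, the IDs from the first block are reused for all three, and each block gets a gain stage label. This is an interpretation of the text above written with plain NumPy, not the library's code; parallel_gain_labels is a hypothetical helper.

```python
import numpy as np

def parallel_gain_labels(pulse_ids):
    """Build gain and pulse labels for one train of parallel gain data.

    `pulse_ids` holds the ID stored for every frame in the train. Its length
    must be a multiple of 3; only the first third is reused.
    """
    pulse_ids = np.asarray(pulse_ids)
    if len(pulse_ids) % 3:
        raise ValueError("frame count must be divisible by 3")
    n = len(pulse_ids) // 3
    gain = np.repeat(np.arange(3), n)   # 0 = high, 1 = medium, 2 = low gain
    pulse = np.tile(pulse_ids[:n], 3)   # repeat the IDs from the first third
    return gain, pulse

# Assumed example: 6 frames in the train, i.e. 2 pulses x 3 gain stages
gain, pulse = parallel_gain_labels([0, 1, 2, 3, 4, 5])
print(gain)   # [0 0 1 1 2 2]
print(pulse)  # [0 1 0 1 0 1]
```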

get_array(key, pulses=slice(None, None, None), unstack_pulses=True, *, fill_value=None, subtrain_index='pulseId', roi=(), astype=None)

Get a labelled array of detector data

Parameters
  • key (str) – The data to get, e.g. ‘image.data’ for pixel values.

  • pulses (slice, array, by_id or by_index) – Select the pulses to include from each train. by_id selects by pulse ID, by_index by index within the data being read. The default includes all pulses. Only used for per-pulse data.

  • unstack_pulses (bool) – Whether to separate train and pulse dimensions.

  • fill_value (int or float, optional) – Value to use for missing values. If None (default) the fill value is 0 for integers and np.nan for floats.

  • subtrain_index (str) – Specify ‘pulseId’ (default) or ‘cellId’ to label the frames recorded within each train. Pulse ID should allow this data to be matched with other devices, but depends on how the detector was manually configured when the data was taken. Cell ID refers to the memory cell used for that frame in the detector hardware.

  • roi (tuple) – Specify e.g. np.s_[10:60, 100:200] to select pixels within each module when reading data. The selection is applied to each individual module, so it may only be useful when working with a single module. For AGIPD raw data, each module records a frame as a 3D array with 2 entries on the first dimension, for data & gain information, so roi=np.s_[0] will select only the data part of each frame.

  • astype (Type) – Data type of the output array. If None (default), the dtype matches the input array dtype.
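The roi values mentioned above are ordinary NumPy index expressions. The sketch below applies them to stand-in arrays to show the resulting shapes, assuming the frame axis comes first and the selection is applied to the remaining axes. The array shapes are illustrative, and apply_roi is a hypothetical helper rather than anything extra_data provides.

```python
import numpy as np

def apply_roi(frames, roi=()):
    """Apply a per-frame selection to an array whose first axis is frames."""
    if not isinstance(roi, tuple):
        roi = (roi,)  # np.s_[0] is a plain int, not a tuple
    return frames[(slice(None),) + roi]

# Stand-in for one module: 4 frames of 256 x 256 pixels
module = np.zeros((4, 256, 256), dtype=np.float32)
print(apply_roi(module, np.s_[10:60, 100:200]).shape)  # (4, 50, 100)

# Stand-in for raw AGIPD data: each frame has 2 entries (data, gain)
raw = np.zeros((4, 2, 512, 128), dtype=np.uint16)
print(apply_roi(raw, np.s_[0]).shape)                  # (4, 512, 128)
```

Reading a region of interest this way reduces both the I/O and the size of the resulting array, which helps with the memory concerns noted at the top of this section.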

get_dask_array(key, subtrain_index='pulseId', fill_value=None, astype=None)

Get a labelled Dask array of detector data

Dask does lazy, parallelised computing, and can work with large data volumes. This method doesn’t immediately load the data: that only happens once you trigger a computation.

Parameters
  • key (str) – The data to get, e.g. ‘image.data’ for pixel values.

  • subtrain_index (str, optional) – Specify ‘pulseId’ (default) or ‘cellId’ to label the frames recorded within each train. Pulse ID should allow this data to be matched with other devices, but depends on how the detector was manually configured when the data was taken. Cell ID refers to the memory cell used for that frame in the detector hardware.

  • fill_value (int or float, optional) – Value to use for missing values. If None (default) the fill value is 0 for integers and np.nan for floats.

  • astype (Type, optional) – Data type of the output array. If None (default), the dtype matches the input array dtype.

trains(pulses=slice(None, None, None), require_all=True)

Iterate over trains for detector data.

Parameters
  • pulses (slice, array, by_index or by_id) – Select which pulses to include for each train. The default is to include all pulses.

  • require_all (bool) – If True (default), skip trains where any of the selected detector modules are missing data.

Yields

train_data (dict) – A dictionary mapping key names (e.g. ‘image.data’) to labelled arrays.

write_frames(filename, trains, pulses)

Write selected detector frames to a new EuXFEL HDF5 file

trains and pulses should be 1D arrays of the same length, containing train IDs and pulse IDs (corresponding to the pulse IDs recorded by the detector). That is, (trains[i], pulses[i]) identifies one frame.
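A common starting point is a 2D boolean mask of frames to keep, for example from a hit finder. The sketch below turns such a mask into the two parallel 1D arrays described above. The train IDs, pulse IDs and mask are made-up values for illustration.

```python
import numpy as np

# Hypothetical inputs: IDs for 3 trains x 4 pulses, and a mask of frames to keep
train_ids = np.array([10000, 10001, 10002])
pulse_ids = np.array([0, 4, 8, 12])
keep = np.array([
    [True,  False, False, True],
    [False, False, False, False],
    [False, True,  True,  False],
])

# np.nonzero gives one (row, column) index pair per selected frame
train_idx, pulse_idx = np.nonzero(keep)
trains = train_ids[train_idx]
pulses = pulse_ids[pulse_idx]

print(trains)  # [10000 10000 10002 10002]
print(pulses)  # [ 0 12  4  8]
```

The resulting arrays could then be passed on as write_frames(filename, trains, pulses) on a detector object created as described above.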

write_virtual_cxi(filename, fillvalues=None)

Write a virtual CXI file to access the detector data.

The virtual datasets in the file provide a view of the detector data as if it were a single huge array, but without copying the data. Creating and using virtual datasets requires HDF5 1.10 or later.

Parameters
  • filename (str) – The file to be written. Will be overwritten if it already exists.

  • fillvalues (dict, optional) – Maps dataset names (one of: data, gain, mask) to the fill value to use for missing data. The default is np.nan for float arrays and zero for integer arrays.

See also

Accessing LPD data: An example using the class above.

class extra_data.components.JUNGFRAU(data: extra_data.reader.DataCollection, detector_name=None, modules=None, *, min_modules=1)

An interface to JUNGFRAU data.

Parameters
  • data (DataCollection) – A data collection, e.g. from RunDirectory().

  • modules (set of ints, optional) – Detector module numbers to use. By default, all available modules are used.

  • detector_name (str, optional) – Name of a detector, e.g. ‘SPB_IRDA_JNGFR’. This is only needed if the dataset includes more than one JUNGFRAU detector.

  • min_modules (int) – Only include trains where at least this many modules have data. Default is 1.

get_array(key, *, fill_value=None, roi=(), astype=None)

Get a labelled array of detector data

Parameters
  • key (str) – The data to get, e.g. ‘data.adc’ for pixel values.

  • fill_value (int or float, optional) – Value to use for missing values. If None (default) the fill value is 0 for integers and np.nan for floats.

  • roi (tuple) – Specify e.g. np.s_[:, 10:60, 100:200] to select data within each module & each train when reading data. The first dimension is pulses, then there are two pixel dimensions. The same selection is applied to data from each module, so selecting pixels may only make sense if you’re using a single module.

  • astype (Type) – Data type of the output array. If None (default), the dtype matches the input array dtype.

get_dask_array(key, fill_value=None, astype=None)

Get a labelled Dask array of detector data

Dask does lazy, parallelised computing, and can work with large data volumes. This method doesn’t immediately load the data: that only happens once you trigger a computation.

Parameters
  • key (str) – The data to get, e.g. ‘data.adc’ for pixel values.

  • fill_value (int or float, optional) – Value to use for missing values. If None (default) the fill value is 0 for integers and np.nan for floats.

  • astype (Type) – Data type of the output array. If None (default), the dtype matches the input array dtype.

trains(require_all=True)

Iterate over trains for detector data.

Parameters

require_all (bool) – If True (default), skip trains where any of the selected detector modules are missing data.

Yields

train_data (dict) – A dictionary mapping key names (e.g. ‘data.adc’) to labelled arrays.

extra_data.components.identify_multimod_detectors(data, detector_name=None, *, single=False, clses=None)

Identify multi-module detectors in the data

Various detectors record data in a similar format, and we often want to process whichever detector was used in a run. This tries to identify the detector, so a user doesn’t have to specify it manually.

If single=True, this returns a tuple of (detector_name, access_class), raising ValueError unless exactly one detector is found. If single=False, it returns a set of these tuples.

clses may be a list of acceptable detector classes to check.

If you get data for a train from the main DataCollection interface, there is another way to combine detector modules from AGIPD, DSSC or LPD:

extra_data.stack_detector_data(train, data, axis=-3, modules=16, fillvalue=None, real_array=True, *, pattern='/DET/(\\d+)CH', starts_at=0)

Stack data from detector modules in a train.

Parameters
  • train (dict) – Train data.

  • data (str) – The path to the device parameter of the data you want to stack, e.g. ‘image.data’.

  • axis (int) – Array axis on which you wish to stack (default is -3).

  • modules (int) – Number of modules composing a detector (default is 16).

  • fillvalue (number) – Value to use in place of data for missing modules. The default is nan (not a number) for floating-point data, and 0 for integers.

  • real_array (bool) – If True (default), copy the data together into a real numpy array. If False, avoid copying the data and return a limited array-like wrapper around the existing arrays. This is sufficient for assembling images using detector geometry, and allows better performance.

  • pattern (str) – Regex to find the module number in source names. Should contain a group which can be converted to an integer. E.g. r'/DET/JNGFR(\d+)' for one JUNGFRAU naming convention.

  • starts_at (int) – By default, uses module numbers starting at 0 (e.g. 0-15 inclusive). If the numbering is e.g. 1-16 instead, pass starts_at=1. This is not automatic because the first or last module may be missing from the data.

Returns

combined – Stacked data for requested data path.

Return type

numpy.ndarray
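The parameters above can be illustrated with a simplified sketch of the stacking idea: find the module number in each source name with the regex, place each module's array in its slot, and fill the gaps for missing modules. This is not the library's implementation, it omits real_array, and stack_modules_sketch as well as the source names in the example are assumptions made for illustration.

```python
import re
import numpy as np

def stack_modules_sketch(train, key, axis=-3, modules=16, fillvalue=None,
                         pattern=r'/DET/(\d+)CH', starts_at=0):
    """Stack per-module arrays from one train's data along a new axis."""
    found = {}
    for source, values in train.items():
        match = re.search(pattern, source)
        if match and key in values:
            # Convert the module number in the source name to a 0-based slot
            found[int(match.group(1)) - starts_at] = np.asarray(values[key])
    if not found:
        raise ValueError("no detector modules found in this train")

    sample = next(iter(found.values()))
    if fillvalue is None:
        # NaN for floating-point data, 0 for integers
        fillvalue = np.nan if sample.dtype.kind == 'f' else 0
    filler = np.full(sample.shape, fillvalue, dtype=sample.dtype)
    return np.stack([found.get(i, filler) for i in range(modules)], axis=axis)

# Assumed train data: only modules 0 and 3 are present, 2 pulses of 4 x 3 pixels
train = {
    'SPB_DET_AGIPD1M-1/DET/0CH0:xtdf': {'image.data': np.ones((2, 4, 3), np.float32)},
    'SPB_DET_AGIPD1M-1/DET/3CH0:xtdf': {'image.data': np.ones((2, 4, 3), np.float32)},
}
combined = stack_modules_sketch(train, 'image.data')
print(combined.shape)                  # (2, 16, 4, 3)
print(np.isnan(combined[:, 1]).all())  # True
```

With the default axis=-3, the module axis is inserted just before the two pixel axes, so per-pulse data ends up with the shape (pulses, modules, slow scan, fast scan).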