AGIPD, LPD & DSSC data

Data from the AGIPD, LPD and DSSC detectors is spread across separate files, with each detector module's data stored separately. extra_data includes convenient interfaces to access this data, pulling the separate modules together into a single array.

class extra_data.components.AGIPD1M(data: extra_data.reader.DataCollection, detector_name=None, modules=None, *, min_modules=1)

An interface to AGIPD-1M data.

Parameters
  • data (DataCollection) – A data collection, e.g. from RunDirectory.

  • modules (set of ints, optional) – Detector module numbers to use. By default, all available modules are used.

  • detector_name (str, optional) – Name of a detector, e.g. ‘SPB_DET_AGIPD1M-1’. This is only needed if the dataset includes more than one AGIPD detector.

  • min_modules (int) – Include only trains where at least this many modules have data. Default is 1.
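To make the min_modules rule concrete, here is a minimal pure-Python sketch of the train-selection logic, assuming each module's data can be summarised as the set of train IDs it recorded. The function select_trains and its input layout are hypothetical illustrations, not part of extra_data.

```python
from collections import Counter


def select_trains(module_train_ids, min_modules=1):
    """Return sorted train IDs recorded by at least `min_modules` modules.

    module_train_ids: dict mapping module number -> iterable of train IDs
    (a hypothetical layout, used only for illustration).
    """
    counts = Counter()
    for train_ids in module_train_ids.values():
        # set() so each module counts at most once per train
        counts.update(set(train_ids))
    return sorted(tid for tid, n in counts.items() if n >= min_modules)


# Module 2 is missing train 1002; module 5 only recorded train 1001.
recorded = {
    0: [1000, 1001, 1002],
    2: [1000, 1001],
    5: [1001],
}
print(select_trains(recorded))                 # [1000, 1001, 1002]
print(select_trains(recorded, min_modules=2))  # [1000, 1001]
print(select_trains(recorded, min_modules=3))  # [1001]
```

Raising min_modules trades coverage for completeness: fewer trains are kept, but each kept train has data from more modules.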

The methods of this class are identical to those of LPD1M, below.

class extra_data.components.DSSC1M(data: extra_data.reader.DataCollection, detector_name=None, modules=None, *, min_modules=1)

An interface to DSSC-1M data.

Parameters
  • data (DataCollection) – A data collection, e.g. from RunDirectory.

  • modules (set of ints, optional) – Detector module numbers to use. By default, all available modules are used.

  • detector_name (str, optional) – Name of a detector, e.g. ‘SCS_DET_DSSC1M-1’. This is only needed if the dataset includes more than one DSSC detector.

  • min_modules (int) – Include only trains where at least this many modules have data. Default is 1.

The methods of this class are identical to those of LPD1M, below.

class extra_data.components.LPD1M(data: extra_data.reader.DataCollection, detector_name=None, modules=None, *, min_modules=1)

An interface to LPD-1M data.

Parameters
  • data (DataCollection) – A data collection, e.g. from RunDirectory.

  • modules (set of ints, optional) – Detector module numbers to use. By default, all available modules are used.

  • detector_name (str, optional) – Name of a detector, e.g. ‘FXE_DET_LPD1M-1’. This is only needed if the dataset includes more than one LPD detector.

  • min_modules (int) – Include only trains where at least this many modules have data. Default is 1.

get_dask_array(key, subtrain_index='pulseId')

Get a labelled Dask array of detector data

Dask does lazy, parallelised computing, and can work with large data volumes. This method doesn’t immediately load the data: that only happens once you trigger a computation.

Parameters
  • key (str) – The data to get, e.g. ‘image.data’ for pixel values.

  • subtrain_index (str) – Specify ‘pulseId’ (default) or ‘cellId’ to label the frames recorded within each train. Pulse ID should allow this data to be matched with other devices, but depends on how the detector was manually configured when the data was taken. Cell ID refers to the memory cell used for that frame in the detector hardware.
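The choice of subtrain_index decides which label each frame carries alongside its train ID. The sketch below is a hypothetical pure-Python model (frame_labels and the record layout are assumptions, not extra_data API) showing why pulse ID labels are the ones to use when matching frames against per-pulse data from another device.

```python
def frame_labels(frames, subtrain_index="pulseId"):
    """Label each frame with (trainId, <subtrain_index>).

    frames: list of dicts with 'trainId', 'pulseId' and 'cellId' keys
    (a hypothetical record layout, for illustration only).
    """
    if subtrain_index not in ("pulseId", "cellId"):
        raise ValueError("subtrain_index must be 'pulseId' or 'cellId'")
    return [(f["trainId"], f[subtrain_index]) for f in frames]


frames = [
    {"trainId": 1000, "pulseId": 0, "cellId": 1},
    {"trainId": 1000, "pulseId": 4, "cellId": 2},
    {"trainId": 1001, "pulseId": 0, "cellId": 1},
]

# Per-pulse values from some other device, keyed by (trainId, pulseId).
other_device = {(1000, 0): 0.5, (1000, 4): 0.7, (1001, 0): 0.6}

by_pulse = frame_labels(frames)
matched = [other_device.get(label) for label in by_pulse]
print(by_pulse)                        # [(1000, 0), (1000, 4), (1001, 0)]
print(matched)                         # [0.5, 0.7, 0.6]
print(frame_labels(frames, "cellId"))  # [(1000, 1), (1000, 2), (1001, 1)]
```

Cell ID labels describe detector hardware rather than the beam, so they are useful for per-cell calibration but not for joining with other devices.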

get_array(key, pulses=slice(None, None, None), unstack_pulses=True)

Get a labelled array of detector data

Parameters
  • key (str) – The data to get, e.g. ‘image.data’ for pixel values.

  • pulses (slice, array, by_id or by_index) – Select the pulses to include from each train. by_id selects by pulse ID, by_index by index within the data being read. The default includes all pulses. Only used for per-pulse data.

  • unstack_pulses (bool) – Whether to separate train and pulse dimensions.
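The difference between selecting by index and by pulse ID can be modelled within a single train. This is a pure-Python sketch of the selection rule, not the library's implementation; select_pulses and its mode argument are hypothetical stand-ins for by_index and by_id.

```python
def select_pulses(pulse_ids, selection, mode):
    """Return the positions of frames to keep within one train.

    pulse_ids: pulse IDs of the frames in this train, in recorded order.
    selection: a slice or a list of ints.
    mode: 'index' selects by position (like by_index);
          'id' selects by recorded pulse ID value (like by_id).
    """
    positions = range(len(pulse_ids))
    if mode == "index":
        # Positions within the frames actually recorded in this train.
        if isinstance(selection, slice):
            return list(positions[selection])
        return [positions[i] for i in selection]
    if mode == "id":
        # Keep frames whose recorded pulse ID is one of those requested.
        wanted = set(selection)
        return [i for i in positions if pulse_ids[i] in wanted]
    raise ValueError("mode must be 'index' or 'id'")


# Hypothetical train where every 4th pulse was recorded.
pulse_ids = [0, 4, 8, 12, 16]
by_pos = select_pulses(pulse_ids, slice(0, 2), "index")
by_pid = select_pulses(pulse_ids, [4, 5, 16], "id")
print(by_pos, [pulse_ids[i] for i in by_pos])  # [0, 1] [0, 4]
print(by_pid, [pulse_ids[i] for i in by_pid])  # [1, 4] [4, 16]
```

Note that pulse ID 5 was never recorded, so selecting it by ID simply matches nothing.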

trains(pulses=slice(None, None, None), require_all=True)

Iterate over trains for detector data.

Parameters
  • pulses (slice, array, by_index or by_id) – Select which pulses to include for each train. The default is to include all pulses.

  • require_all (bool) – If True (default), skip trains where any of the selected detector modules are missing data.

Yields

train_data (dict) – A dictionary mapping key names (e.g. image.data) to labelled arrays.
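The effect of require_all can be sketched as a generator that drops incomplete trains. This is a hypothetical model, assuming each module's data is available as a dict keyed by train ID; iter_trains is not an extra_data function.

```python
def iter_trains(modules, selected, require_all=True):
    """Yield (train_id, {module: data}) in train ID order.

    modules: dict mapping module number -> {train_id: data}
             (a hypothetical layout, for illustration only)
    selected: module numbers to include
    require_all: if True, skip trains missing data from any selected module
    """
    per_module = {m: modules.get(m, {}) for m in selected}
    all_ids = sorted(set().union(*per_module.values()))
    for tid in all_ids:
        present = {m: d[tid] for m, d in per_module.items() if tid in d}
        if require_all and len(present) < len(per_module):
            continue  # at least one selected module has no data here
        yield tid, present


modules = {0: {1000: "a0", 1001: "b0"}, 1: {1000: "a1"}}
print(list(iter_trains(modules, [0, 1])))
# [(1000, {0: 'a0', 1: 'a1'})]
print(list(iter_trains(modules, [0, 1], require_all=False)))
# [(1000, {0: 'a0', 1: 'a1'}), (1001, {0: 'b0'})]
```

With require_all=False, consumers must be prepared for trains where some modules are absent.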

write_frames(filename, trains, pulses)

Write selected detector frames to a new EuXFEL HDF5 file

trains and pulses should be 1D arrays of the same length, containing train IDs and pulse IDs (corresponding to the pulse IDs recorded by the detector), so that (trains[i], pulses[i]) identifies one frame.
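Because each frame is identified by the pair (trains[i], pulses[i]), selecting the same pulses from several trains means repeating each train ID once per pulse. A minimal pure-Python sketch of building such parallel arrays; frame_pairs and the example IDs are hypothetical.

```python
def frame_pairs(trains, pulses):
    """Check that train and pulse ID sequences line up, and pair them."""
    if len(trains) != len(pulses):
        raise ValueError("trains and pulses must have the same length")
    return list(zip(trains, pulses))


# Select pulse IDs 0 and 8 from each of three trains (example values).
wanted_trains = [1000, 1001, 1002]
wanted_pulses = [0, 8]
trains = [t for t in wanted_trains for _ in wanted_pulses]
pulses = [p for _ in wanted_trains for p in wanted_pulses]
print(trains)                           # [1000, 1000, 1001, 1001, 1002, 1002]
print(pulses)                           # [0, 8, 0, 8, 0, 8]
print(frame_pairs(trains, pulses)[:2])  # [(1000, 0), (1000, 8)]
```

With NumPy, the same arrays come from np.repeat(wanted_trains, len(wanted_pulses)) and np.tile(wanted_pulses, len(wanted_trains)).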

write_virtual_cxi(filename, fillvalues=None)

Write a virtual CXI file to access the detector data.

The virtual datasets in the file provide a view of the detector data as if it were a single large array, but without copying the data. Creating and using virtual datasets requires HDF5 1.10 or later.

Parameters
  • filename (str) – The file to be written. Will be overwritten if it already exists.

  • fillvalues (dict, optional) – Maps dataset names (data, gain or mask) to the value used to fill in missing data. The default is np.nan for floating-point arrays and zero for integer arrays.
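The documented defaults for fillvalues depend only on each dataset's dtype. The sketch below resolves a fill value per dataset under that rule, with user overrides taking priority; resolve_fill_values and the example dtypes are assumptions for illustration, not extra_data internals.

```python
import numpy as np


def resolve_fill_values(dtypes, fillvalues=None):
    """Pick a fill value per dataset, following the documented defaults.

    dtypes: dict mapping dataset name ('data', 'gain', 'mask') -> numpy dtype
    fillvalues: optional overrides, as would be passed to write_virtual_cxi
    """
    fillvalues = fillvalues or {}
    resolved = {}
    for name, dtype in dtypes.items():
        if name in fillvalues:
            resolved[name] = fillvalues[name]  # explicit override wins
        elif np.issubdtype(dtype, np.floating):
            resolved[name] = np.nan  # float default
        else:
            resolved[name] = 0  # integer default
    return resolved


# Example dtypes, chosen for illustration.
dtypes = {
    "data": np.dtype("float32"),
    "gain": np.dtype("uint8"),
    "mask": np.dtype("uint32"),
}
print(resolve_fill_values(dtypes))               # {'data': nan, 'gain': 0, 'mask': 0}
print(resolve_fill_values(dtypes, {"mask": 1}))  # {'data': nan, 'gain': 0, 'mask': 1}
```

Overriding the mask fill value is one way to make missing modules stand out as flagged pixels rather than as valid ones.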

See also

Accessing LPD data: An example using the class above.

extra_data.components.identify_multimod_detectors(data, detector_name=None, *, single=False, clses=None)

Identify multi-module detectors in the data

Various detectors record data in a similar format, and we often want to process whichever detector was used in a run. This tries to identify the detector, so a user doesn’t have to specify it manually.

If single=True, this returns a tuple of (detector_name, access_class), raising ValueError unless exactly one detector is found. If single=False, it returns a set of these tuples.

clses may be a list of acceptable detector classes to check.
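The single/set behaviour can be illustrated with a small sketch that matches source names against a pattern. The naming pattern (e.g. 'SPB_DET_AGIPD1M-1/DET/0CH0:xtdf') is an assumption for illustration, and identify returns a detector-type string where the real function returns an access class; none of this is extra_data's actual implementation.

```python
import re

# Assumed naming pattern for per-module sources, e.g.
# 'SPB_DET_AGIPD1M-1/DET/0CH0:xtdf'. This regex is an illustration only.
SOURCE_RE = re.compile(
    r"^(?P<det>\w+_DET_(?P<kind>AGIPD1M|LPD1M|DSSC1M)-\d+)/DET/\d+CH\d+:xtdf$"
)


def identify(sources, single=False, kinds=None):
    """Find multi-module detectors among source names (hypothetical sketch).

    Returns a set of (detector_name, kind) tuples, or a single tuple if
    single=True, raising ValueError unless exactly one detector matches.
    kinds optionally restricts which detector types are accepted.
    """
    found = set()
    for src in sources:
        m = SOURCE_RE.match(src)
        if m and (kinds is None or m["kind"] in kinds):
            found.add((m["det"], m["kind"]))
    if single:
        if len(found) != 1:
            raise ValueError(f"Expected 1 detector, found {len(found)}")
        return next(iter(found))
    return found


sources = [
    "SPB_DET_AGIPD1M-1/DET/0CH0:xtdf",
    "SPB_DET_AGIPD1M-1/DET/15CH0:xtdf",
    "SA1_XTD2_XGM/XGM/DOOCS",  # not a detector module; ignored
]
print(identify(sources, single=True))  # ('SPB_DET_AGIPD1M-1', 'AGIPD1M')
try:
    identify(sources, single=True, kinds={"LPD1M"})
except ValueError as err:
    print(err)  # Expected 1 detector, found 0
```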

If you get data for a train from the main DataCollection interface, there is also another way to combine detector modules from AGIPD or LPD:

extra_data.stack_detector_data(train, data, axis=-3, modules=16, fillvalue=nan, real_array=True)

Stack data from detector modules in a train.

Parameters
  • train (dict) – Train data.

  • data (str) – The path to the device parameter of the data you want to stack, e.g. ‘image.data’.

  • axis (int) – Array axis on which you wish to stack (default is -3).

  • modules (int) – Number of modules composing a detector (default is 16).

  • fillvalue (number) – Value to use in place of data for missing modules. The default is nan (not a number) for floating-point data, and 0 for integers.

  • real_array (bool) – If True (default), copy the data together into a real numpy array. If False, avoid copying the data and return a limited array-like wrapper around the existing arrays. This is sufficient for assembling images using detector geometry, and allows better performance.

Returns

combined – Stacked data for requested data path.

Return type

numpy.array
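The stacking and fill rules above can be sketched with NumPy. This simplified model takes a dict of per-module arrays rather than a train dictionary and data path, and always returns a real array; stack_modules is a hypothetical helper, not the library's implementation. With per-module data shaped (pulses, slow, fast), stacking on axis -3 gives (pulses, modules, slow, fast).

```python
import numpy as np


def stack_modules(module_arrays, n_modules=16, axis=-3, fillvalue=None):
    """Stack per-module arrays into one array with a new module axis.

    module_arrays: dict mapping module number -> array (all the same shape)
    Missing modules are filled with `fillvalue` (default: nan for float
    data, 0 for integer data), mirroring the documented defaults.
    """
    if not module_arrays:
        raise ValueError("No module data to stack")
    sample = next(iter(module_arrays.values()))
    if fillvalue is None:
        fillvalue = np.nan if np.issubdtype(sample.dtype, np.floating) else 0
    filler = np.full(sample.shape, fillvalue, dtype=sample.dtype)
    parts = [module_arrays.get(i, filler) for i in range(n_modules)]
    return np.stack(parts, axis=axis)


# Toy data: 2 pulses of 4x3 pixels; modules 1 and 3 are missing.
arrays = {
    0: np.ones((2, 4, 3), dtype=np.float32),
    2: np.full((2, 4, 3), 2, dtype=np.float32),
}
stacked = stack_modules(arrays, n_modules=4)
print(stacked.shape)                   # (2, 4, 4, 3)
print(np.isnan(stacked[:, 1]).all())  # True
```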