Release Notes


New features:

  • A new interface for data from a single source & key: use run[source, key] to get a KeyData object, which can inspect and load the data from several sequence files (PR #70).

  • Methods which took a by_index object now accept slices (e.g. numpy.s_[:10]) or indices directly (PR #68, PR #79). This includes select_trains(), get_array() and various methods for multi-module detectors (see AGIPD, LPD & DSSC data).

  • extra-data-make-virtual-cxi --fill-value now accepts numbers in hexadecimal, octal & binary formats, e.g. 0xfe (PR #73).

  • Added an unstack parameter to the get_array() method for multi-module detectors, making it possible to retrieve an array as the data is stored, without separating the train & pulse axes (PR #72).

  • Added a require_all parameter to the trains() method for multi-module detectors, to allow iterating with incomplete frames included (PR #77).

  • New identify_multimod_detectors() function to find multi-module detectors in the data (PR #61).
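
The hexadecimal, octal and binary forms accepted by --fill-value follow Python's integer literal prefixes. The snippet below is a minimal sketch of how such strings can be parsed. The helper name parse_fill_value and the fallback to float are assumptions made for illustration, not necessarily how extra-data-make-virtual-cxi does it.

```python
def parse_fill_value(text):
    """Parse a number written in decimal, hex (0x), octal (0o) or binary (0b).

    Hypothetical helper for illustration only.
    """
    try:
        # Base 0 makes int() infer the base from the literal's prefix.
        return int(text, 0)
    except ValueError:
        # Assumed fallback for non-integer values such as "nan" or "1.5".
        return float(text)


for arg in ["0xfe", "0o17", "0b101", "42", "1.5"]:
    print(arg, "->", parse_fill_value(arg))
```

With this approach, 0xfe is read as the integer 254, while a string with no recognised integer form falls through to float().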

Fixes and improvements:

  • Fix writing selected detector frames with write_frames() for corrected data (PR #82).

  • Fix compatibility with pandas 1.1 (PR #83).

  • The trains() iterator no longer includes zero-length arrays when a source has no data for that train (PR #75).

  • Fix a test which failed when run as root (PR #67).


Fixes and improvements:

  • EXtra-data now tries to manage how many HDF5 files it has open at one time, to avoid hitting a limit on the total number of open files in a process (PR #25 and PR #48). Importing EXtra-data will now raise this limit as far as it can (to 4096 on Maxwell), and try to keep the files it handles to no more than half of this. Files should be silently closed and reopened as needed, so this shouldn’t affect how you use it.

  • Dask arrays are now created in a way that avoids problems with Dask’s local schedulers, and with arrays comprising very large numbers of files (PR #63).

  • The classes for accessing multi-module detector data (see AGIPD, LPD & DSSC data) and writing virtual CXI files no longer assume that the same number of frames are recorded in every train (PR #44).

  • Fix validation where a file has no trains at all (PR #42).

  • More testing of EuXFEL file format version 1.0 (PR #56).

  • Fix test coverage measurement for code run with multiprocessing (PR #37).

  • Tests now use unittest.mock from the standard library instead of the separate mock module (PR #52).
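
The open-file management described in this list (closing files silently and reopening them on demand) can be pictured as a least-recently-used cache of file handles. The following is a rough sketch of that idea only. The FileCache class is hypothetical, and it uses plain binary files in place of HDF5 files so that it runs without extra dependencies. It is not EXtra-data's actual implementation.

```python
import os
import tempfile
from collections import OrderedDict


class FileCache:
    """Keep at most `maxfiles` files open, closing the least recently used.

    Hypothetical illustration; plain files stand in for HDF5 files.
    """

    def __init__(self, maxfiles):
        self.maxfiles = maxfiles
        self._open = OrderedDict()  # path -> open file object

    def get(self, path):
        if path in self._open:
            # Already open: mark it as most recently used.
            self._open.move_to_end(path)
            return self._open[path]
        # Make room by closing the least recently used files.
        while len(self._open) >= self.maxfiles:
            _, old = self._open.popitem(last=False)
            old.close()
        f = open(path, "rb")
        self._open[path] = f
        return f

    def close_all(self):
        while self._open:
            _, f = self._open.popitem()
            f.close()


with tempfile.TemporaryDirectory() as d:
    paths = []
    for i in range(5):
        p = os.path.join(d, f"file{i}.bin")
        with open(p, "wb") as f:
            f.write(bytes([i]))
        paths.append(p)

    cache = FileCache(maxfiles=2)
    first = cache.get(paths[0])
    for p in paths[1:]:
        cache.get(p)  # opening more files evicts the oldest handles

    # The first handle was closed on eviction; asking again reopens the file.
    reopened = cache.get(paths[0])
    data = reopened.read()
    n_open = len(cache._open)
    cache.close_all()
```

Callers always go through get() rather than holding on to handles, which is what lets the cache close and reopen files without the caller noticing.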


  • Opening and validating run directories now handles files in parallel, which should make it substantially faster (PR #30).

  • Various data access operations no longer require finding all the keys for a given data source, which saves time in certain situations (PR #24).

  • open_run() now accepts numpy integers for proposal and run numbers, as well as standard Python integers (PR #34).

  • Run map cache files can be saved on the EuXFEL online cluster, which speeds up reopening runs there (PR #36).

  • Added tests with simulated bad files for the validation code (PR #23).
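
As a generic illustration of handling files in parallel when opening a run, the sketch below reads a piece of per-file information concurrently. It assumes a thread pool from concurrent.futures, and it uses the file size as a stand-in for real file metadata. The function names read_metadata and open_all are hypothetical, and EXtra-data's own mechanism may differ.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_metadata(path):
    # Stand-in for opening a file and reading its metadata (assumption).
    return path, os.path.getsize(path)


def open_all(paths, max_workers=4):
    """Collect per-file information for all paths using a pool of workers."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the input paths.
        return dict(pool.map(read_metadata, paths))


with tempfile.TemporaryDirectory() as d:
    paths = []
    for i in range(8):
        p = os.path.join(d, f"file{i}.bin")
        with open(p, "wb") as f:
            f.write(b"x" * i)
        paths.append(p)

    sizes = open_all(paths)
    size_list = [sizes[p] for p in paths]
    print(size_list)
```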


  • New get_dask_array() method for accessing detector data with Dask (PR #18).

  • Fix extra-data-validate with a run directory without a cached data map (PR #12).

  • Add .squeeze() method for virtual stacks of detector data from stack_detector_data() (PR #16).

  • Close each file after reading its metadata, to avoid hitting the limit of open files when opening a large run (PR #8). This is a mitigation: you will still hit the limit if you access data from enough files. The default limit on Maxwell is 1024 files, but you can raise this to 4096 using the Python resource module.

  • Display progress information while validating a run directory (PR #19).

  • Display run duration to only one decimal place (PR #5).

  • Documentation reorganised to emphasise tutorials and examples (PR #10).

This version requires Python 3.6 or above.
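
The note above about raising the open file limit with the Python resource module can be sketched as follows. The helper name raise_open_file_limit is hypothetical, and 4096 is the figure quoted for Maxwell; the hard limit on other systems may be different. The resource module is only available on Unix-like systems.

```python
import resource


def raise_open_file_limit(target=4096):
    """Raise the soft limit on open files towards `target`; never lower it.

    Hypothetical helper. Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unprivileged process cannot go above the hard limit.
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft != resource.RLIM_INFINITY and soft < target:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            # Leave the limit unchanged if the system refuses the request.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


before = resource.getrlimit(resource.RLIMIT_NOFILE)[0]
after = raise_open_file_limit()
print("soft limit on open files:", before, "->", after)
```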


First separated version. No functional changes from karabo_data 0.7.

Earlier history

The code in EXtra-data was previously released as karabo_data, up to version 0.7. See the karabo_data release notes for changes before the renaming.