Fix a check which made it very slow to open runs with thousands of files (PR #183).
Several new methods for accessing different kinds of metadata:
Several fixes for handling ‘suspect’ train IDs (PR #172).
h5py >= 2.10 is now required (PR #177).
Avoid converting train IDs to floats when using run.select(..., require_all=True) (PR #159).
Checking whether a given source & key is present is now much faster in some cases (PR #170).
Deprecations & potentially breaking changes:
Earlier versions of EXtra-data unintentionally converted integer data from multi-module detectors to floats (in get_dask_array()), with the special value NaN for missing data. This version preserves the data type, but missing integer data will be filled with 0. If this is not suitable, you can use the min_modules parameter to get only trains where all modules have data, or pass astype=np.float64, fill_value=np.nan to convert data to floats and fill gaps with NaN as before.
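The two fill behaviours can be illustrated with plain NumPy, independently of the EXtra-data API (the arrays and the "missing module" mask here are made up for the sketch):

```python
import numpy as np

# Simulated detector data: 2 modules x 3 trains, with module 1 missing train 2.
data = np.arange(6, dtype=np.uint16).reshape(2, 3)
missing = np.zeros((2, 3), dtype=bool)
missing[1, 2] = True

# New default behaviour: keep the integer dtype, fill gaps with 0.
as_int = data.copy()
as_int[missing] = 0

# Old behaviour, now opt-in via astype/fill_value: float64 with NaN gaps.
as_float = data.astype(np.float64)
as_float[missing] = np.nan

print(as_int.dtype, as_float.dtype)  # uint16 float64
```

Note that NaN only exists for floating-point types, which is why preserving an integer dtype forces a different fill value.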
Special handling in get_series() to label some fast detector data with pulse IDs was deprecated (PR #131). We believe no-one is using this. If you are, please contact firstname.lastname@example.org to discuss alternatives.
Fixes and improvements
Fix default fill value for uint64 data in extra-data-validate when a file cannot be opened (PR #93).
Fix name of extra-data-validate in its own help info (PR #90).
A new interface for data from a single source & key: use run[source, key] to get a KeyData object, which can inspect and load the data from several sequence files (PR #70).
Methods which took a by_index object now accept slices (e.g. numpy.s_[:10]) or indices directly (PR #68, PR #79). This includes get_array() and various methods for multi-module detectors, described in Multi-module detector data.
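numpy.s_ is simply a convenience for building slice objects with familiar indexing syntax, so the slices you pass to these methods behave exactly like ordinary array slicing. A standalone NumPy sketch (no EXtra-data objects involved):

```python
import numpy as np

# np.s_ turns indexing syntax into a reusable slice object.
first_ten = np.s_[:10]
assert isinstance(first_ten, slice)        # slice(None, 10, None)

# It selects data just like direct slicing would.
data = np.arange(100)
print(data[first_ten])                     # the first ten elements
```

A plain integer selects a single item in the same way, which is why methods accepting these selections no longer need a dedicated by_index wrapper.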
Fixes and improvements:
New karabo-bridge-serve-files --append-detector-modules option to combine data from multiple detector modules. This makes streaming large detector data more similar to the live data streams (PR #40 and PR #51).
New options to filter files from dCache which are unavailable or need to be read from tape when opening a run (PR #35). This also comes with a new command extra-data-locality to inspect this information.
DataCollection.select() can take arbitrary iterables of patterns, rather than just lists (PR #43).
Fixes and improvements:
EXtra-data now tries to manage how many HDF5 files it has open at one time, to avoid hitting a limit on the total number of open files in a process (PR #25 and PR #48). Importing EXtra-data will now raise this limit as far as it can (to 4096 on Maxwell), and try to keep the files it handles to no more than half of this. Files should be silently closed and reopened as needed, so this shouldn’t affect how you use it.
A better way of creating Dask arrays to avoid problems with Dask’s local schedulers, and with arrays comprising very large numbers of files (PR #63).
The classes for accessing multi-module detector data (see Multi-module detector data) and writing virtual CXI files no longer assume that the same number of frames are recorded in every train (PR #44).
Fix validation where a file has no trains at all (PR #42).
More testing of EuXFEL file format version 1.0 (PR #56).
Test coverage measurement fixed with multiprocessing (PR #37).
Tests switched from
Opening and validating run directories now handles files in parallel, which should make it substantially faster (PR #30).
Various data access operations no longer require finding all the keys for a given data source, which saves time in certain situations (PR #24).
Added tests with simulated bad files for the validation code (PR #23).
Close each file after reading its metadata, to avoid hitting the limit of open files when opening a large run (PR #8). This is a mitigation: you will still hit the limit if you access data from enough files. The default limit on Maxwell is 1024 files, but you can raise this to 4096 using the Python resource module.
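Raising the open-files limit yourself can be done with Python's standard resource module (Unix only; this sketch is independent of EXtra-data, and the 4096 target is just the value mentioned above for Maxwell):

```python
import resource

# Query the current soft/hard limits on open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"current soft limit: {soft}, hard limit: {hard}")

# Raise the soft limit towards 4096, capped at the hard limit.
# An unprivileged process may raise its soft limit up to the hard limit.
target = 4096 if hard == resource.RLIM_INFINITY else min(4096, hard)
if soft < target:
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
```

The soft limit is what actually triggers "too many open files" errors; the hard limit is the ceiling an unprivileged process can raise it to.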
Display progress information while validating a run directory (PR #19).
Display run duration to only one decimal place (PR #5).
Documentation reorganised to emphasise tutorials and examples (PR #10).
This version requires Python 3.6 or above.
First separated version. No functional changes from karabo_data 0.7.
The code in EXtra-data was previously released as karabo_data, up to version 0.7. See the karabo_data release notes for changes before the renaming.