A new interface for data from a single source & key: use run[source, key] to get a KeyData object, which can inspect and load the data from several sequence files (PR #70).
Methods which took a by_index object now accept slices (e.g. numpy.s_[:10]) or indices directly (PR #68, PR #79). This includes get_array() and various methods for multi-module detectors, described in AGIPD, LPD & DSSC data.
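As a minimal illustration of the slice objects mentioned above (independent of EXtra-data itself), numpy.s_ is simply a convenience for constructing standard Python slice objects, which is what makes it interchangeable with plain indices here:

```python
import numpy as np

# numpy.s_ builds ordinary slice objects using familiar indexing syntax
sel = np.s_[:10]

# It is exactly equivalent to writing slice(None, 10) by hand
print(sel == slice(None, 10))
```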
Fixes and improvements:
New karabo-bridge-serve-files --append-detector-modules option to combine data from multiple detector modules. This makes streaming large detector data more similar to the live data streams (PR #40 and PR #51).
New options to filter files from dCache which are unavailable or need to be read from tape when opening a run (PR #35). This also comes with a new command extra-data-locality to inspect this information.
DataCollection.select() can take arbitrary iterables of patterns, rather than just lists (PR #43).
Fixes and improvements:
EXtra-data now tries to manage how many HDF5 files it has open at one time, to avoid hitting a limit on the total number of open files in a process (PR #25 and PR #48). Importing EXtra-data will now raise this limit as far as it can (to 4096 on Maxwell), and try to keep the files it handles to no more than half of this. Files should be silently closed and reopened as needed, so this shouldn’t affect how you use it.
A better way of creating Dask arrays to avoid problems with Dask’s local schedulers, and with arrays comprising very large numbers of files (PR #63).
Fix validation where a file has no trains at all (PR #42).
More testing of EuXFEL file format version 1.0 (PR #56).
Test coverage measurement fixed with multiprocessing (PR #37).
Tests switched from
Opening and validating run directories now handles files in parallel, which should make it substantially faster (PR #30).
Various data access operations no longer require finding all the keys for a given data source, which saves time in certain situations (PR #24).
Added tests with simulated bad files for the validation code (PR #23).
Close each file after reading its metadata, to avoid hitting the limit of open files when opening a large run (PR #8). This is a mitigation: you will still hit the limit if you access data from enough files. The default limit on Maxwell is 1024 files, but you can raise this to 4096 using the Python resource module.
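A minimal sketch of raising the open-files limit with the standard resource module, as the note above suggests (the 4096 value is the Maxwell hard limit mentioned there; on other systems the hard limit may differ, so this clamps to whatever the OS allows):

```python
import resource

# Query the current soft and hard limits on open file descriptors
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# Raise the soft limit as far as the hard limit permits, up to 4096
resource.setrlimit(resource.RLIMIT_NOFILE, (min(4096, hard), hard))
```

The soft limit can be raised by any process up to the hard limit; raising the hard limit itself requires elevated privileges, which is why the sketch only adjusts the soft value.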
Display progress information while validating a run directory (PR #19).
Display run duration to only one decimal place (PR #5).
Documentation reorganised to emphasise tutorials and examples (PR #10).
This version requires Python 3.6 or above.
First separated version. No functional changes from karabo_data 0.7.