EXtra-data
==========
**EXtra-data** is a Python library for accessing saved data
produced at European XFEL.

Installation
------------
EXtra-data is available in our Python environment on the Maxwell cluster::

    module load exfel exfel-python

You can also install it from PyPI
to use in other environments with Python 3::

    pip install extra_data

This installs the ``extra_data`` package and its most commonly needed dependencies.
Large dependencies, and dependencies required only for specific features
(e.g. Dask Array, :ref:`cmd-serve-files`), are not installed by default.
To get them, ask pip for the matching optional extras. Each command below installs
``extra_data`` together with the additional dependencies for that use::

    pip install "extra_data[bridge]"    # dependencies for karabo-bridge-like data streaming
    pip install "extra_data[complete]"  # dependencies for all features

If you get a permissions error, add the ``--user`` flag to that command.
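
To see whether the package and any optional dependencies ended up in the current environment, a small check like the one below can help. It is only a sketch: ``missing_modules`` is a hypothetical helper, and the module names in ``optional`` are assumptions chosen for illustration, not a statement of what each extra installs.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Assumption: these import names are illustrative only; consult the package
# metadata for the dependencies each extra really pulls in.
optional = ["dask", "zmq"]

print("extra_data installed:", not missing_modules(["extra_data"]))
print("optional modules not found:", missing_modules(optional))
```

``find_spec`` only looks the module up without importing it, so the check is fast and has no side effects.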
Quickstart
----------
Open a run on the Maxwell cluster::

    from extra_data import open_run

    run = open_run(proposal=700000, run=1)

You can also specify a run directory, or open an individual file - see
:ref:`opening-files` for details. The same methods to access data work with any
of these options.
Load data as a NumPy array for a given source and key::

    arr = run["SA3_XTD10_PES/ADC/1:network", "digitizers.channel_4_A.raw.samples"].ndarray()

You can load only a region of interest, get a labelled array with train IDs,
or load 1D data as columns in a pandas dataframe. See :doc:`xpd_examples`
(example) and :ref:`data-by-source-and-key` (reference) for more information.
For data that's too big to fit in memory at once, you can read one pulse train
at a time::

    for train_id, data in run.select("*/DET/*", "image.data").trains():
        mod0 = data["FXE_DET_LPD1M-1/DET/0CH0:xtdf"]["image.data"]

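
A loop like the one above is usually paired with an accumulator, so that only one train is held in memory at a time. The sketch below averages one key over all trains. ``mean_over_trains`` is a hypothetical helper, not part of EXtra-data; it accepts any iterable of ``(train_id, data)`` pairs, such as the result of ``.trains()``. It assumes that a source may be absent from some trains, and that values support ``+`` and ``/`` (integer detector arrays should be converted to floating point first to avoid overflow).

```python
def mean_over_trains(trains, source, key):
    """Average data[source][key] over an iterable of (train_id, data) pairs.

    Returns (mean, number_of_trains_used).
    """
    total = None
    count = 0
    for _train_id, data in trains:
        # Assumption: a source may be missing from a train's data; skip it.
        if source not in data or key not in data[source]:
            continue
        value = data[source][key]
        total = value if total is None else total + value
        count += 1
    if count == 0:
        raise ValueError("no train contained the requested source and key")
    return total / count, count


# Stand-in for run.select(...).trains(): plain floats instead of arrays.
fake_trains = [
    (10000, {"SRC": {"image.data": 1.0}}),
    (10001, {}),  # this train has no data for the source
    (10002, {"SRC": {"image.data": 3.0}}),
]
mean, used = mean_over_trains(fake_trains, "SRC", "image.data")
print(mean, used)
```

With real data, the same function could be called as ``mean_over_trains(run.select("*/DET/*", "image.data").trains(), "FXE_DET_LPD1M-1/DET/0CH0:xtdf", "image.data")``.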
Other options to work with large data volumes include breaking the run into
smaller parts with :meth:`~.DataCollection.split_trains` before loading data,
and automatic chunking with the Dask framework and
:meth:`~.dask_array`.

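
To illustrate the idea behind splitting a run into parts, the sketch below divides a list of train IDs into contiguous chunks of nearly equal size. It is a plain-Python stand-in, not the library's implementation: ``split_into_parts`` is a hypothetical function, and the real method works on the run itself rather than on a list, so its exact chunk boundaries may differ.

```python
def split_into_parts(train_ids, parts):
    """Split a sequence of train IDs into up to `parts` contiguous chunks.

    Chunk sizes differ by at most one; empty chunks are not returned.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    base, extra = divmod(len(train_ids), parts)
    chunks = []
    start = 0
    for i in range(parts):
        # The first `extra` chunks take one additional train each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break
        chunks.append(train_ids[start:start + size])
        start += size
    return chunks


chunks = split_into_parts(list(range(10000, 10010)), 3)
for chunk in chunks:
    print(chunk[0], "to", chunk[-1], "-", len(chunk), "trains")
```

Each chunk can then be loaded and processed on its own, with only the per-chunk results kept in memory.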
Documentation contents
----------------------
.. toctree::
   :caption: Tutorials and Examples
   :maxdepth: 2

   xpd_examples
   inspection
   iterate_trains
   aligning_trains
   dask_averaging
   parallel_example
   lpd_data
   xpd_examples2

.. toctree::
   :caption: Reference
   :maxdepth: 2

   reading_files
   agipd_lpd_data
   streaming
   validation
   cli
   data_format
   performance

.. toctree::
   :caption: Development
   :maxdepth: 1

   changelog
   architecture

.. seealso::

   Data Analysis at European XFEL

Indices and tables
==================
* :ref:`genindex`
* :ref:`search`