{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Averaging detector data with Dask\n", "\n", "We often want to average large detector data across trains, keeping the pulses within each train separate, so we have an average image for pulse 0, another for pulse 1, etc.\n", "\n", "This data may be too big to load into memory at once, but using [Dask](https://dask.org/) we can work with it like a NumPy array. Dask takes care of splitting the job up into smaller pieces and assembling the result." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from extra_data import open_run\n", "\n", "import dask.array as da\n", "from dask.distributed import Client, progress\n", "from dask_jobqueue import SLURMCluster\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we use [Dask-Jobqueue](https://jobqueue.dask.org/en/latest/) to talk to the Maxwell cluster." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "039219ffa3c54a9e8a8fb32e40f6c563", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(HTML(value='

SLURMCluster

'), HBox(children=(HTML(value='\\n
\\n