{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a517d16c",
   "metadata": {},
   "source": [
    "# Aligning data from different sources\n",
    "\n",
    "Sometimes, instruments recording data miss a train. In particular, different sources may start & finish recording at slightly different times. So two arrays loaded from the same run don't necessarily line up:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "d3757db4",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from extra_data import open_run"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "983da37d",
   "metadata": {},
   "outputs": [],
   "source": [
    "run = open_run(proposal=700000, run=26)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "01e2987e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "# trains measured: 7263, 7264\n"
     ]
    }
   ],
   "source": [
    "intensity_sase3 = run['SA3_XTD10_XGM/XGM/DOOCS:output', 'data.intensityTD']\n",
    "photflux_sase3 = run['SA3_XTD10_XGM/XGM/DOOCS', 'pulseEnergy.photonFlux']\n",
    "print(f\"# trains measured: {intensity_sase3.shape[0]}, {photflux_sase3.shape[0]}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1d15a1ab",
   "metadata": {},
   "source": [
    "Even if we get the same *number* of trains, they may not line up if different instruments miss different trains:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "489c9a14",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "# trains measured: 7263, 7263\n",
      "Train IDs all match: False\n",
      "Train IDs matching (every 100th train):\n",
      "[ True  True  True  True  True  True  True  True  True  True  True  True\n",
      "  True  True  True  True  True  True  True  True  True  True  True  True\n",
      "  True  True  True  True  True  True  True  True  True False False False\n",
      " False False False False False False False False False False False False\n",
      " False False False False False False False False False False False False\n",
      " False False False False  True  True  True  True  True  True  True  True\n",
      "  True]\n"
     ]
    }
   ],
   "source": [
    "intensity_scs = run['SCS_BLU_XGM/XGM/DOOCS:output', 'data.intensityTD']\n",
    "print(f\"# trains measured: {intensity_sase3.shape[0]}, {intensity_scs.shape[0]}\")\n",
    "\n",
    "train_ids_eq = intensity_sase3.train_id_coordinates() == intensity_scs.train_id_coordinates()\n",
    "print(\"Train IDs all match:\", train_ids_eq.all())\n",
    "print(\"Train IDs matching (every 100th train):\")\n",
    "print(train_ids_eq[::100])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e000813c",
   "metadata": {},
   "source": [
    "We typically want to look at only the trains with data for all the sources we're using.\n",
    "There are a few ways we can get these."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6792e2ba",
   "metadata": {},
   "source": [
    "## By selecting sources\n",
    "\n",
    "Use `.select()` to select specified sources & keys in the run. The `require_all=True` option discards trains where any of the selected data is missing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "1a336880",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "# of trains:    7262\n",
      "Duration:       0:12:06.4\n",
      "First train ID: 517755296\n",
      "Last train ID:  517762559\n",
      "\n",
      "0 detector modules ()\n",
      "\n",
      "2 instrument sources (excluding detectors):\n",
      "  - SA3_XTD10_XGM/XGM/DOOCS:output\n",
      "  - SCS_BLU_XGM/XGM/DOOCS:output\n",
      "\n",
      "2 control sources:\n",
      "  - SA3_XTD10_XGM/XGM/DOOCS\n",
      "  - SCS_BLU_XGM/XGM/DOOCS\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Select a list of sources & keys\n",
    "sel = run.select([\n",
    "    ('SA3_XTD10_XGM/XGM/DOOCS:output', '*'),\n",
    "    ('SCS_BLU_XGM/XGM/DOOCS:output', '*'),\n",
    "], require_all=True)\n",
    "\n",
    "# Or select sources by pattern - this gets any sources with /XGM/ in the name\n",
    "sel = run.select('*/XGM/*', require_all=True)\n",
    "sel.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "b80cdcda",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "# trains measured: 7262, 7262\n",
      "Train IDs all match: True\n"
     ]
    }
   ],
   "source": [
    "intensity_sase3 = sel['SA3_XTD10_XGM/XGM/DOOCS:output', 'data.intensityTD']\n",
    "intensity_scs = sel['SCS_BLU_XGM/XGM/DOOCS:output', 'data.intensityTD']\n",
    "print(f\"# trains measured: {intensity_sase3.shape[0]}, {intensity_scs.shape[0]}\")\n",
    "\n",
    "train_ids_eq = intensity_sase3.train_id_coordinates() == intensity_scs.train_id_coordinates()\n",
    "print(\"Train IDs all match:\", train_ids_eq.all())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b74273e1",
   "metadata": {},
   "source": [
    "If `.select(..., require_all=True)` gives you 0 trains, it probably means that one of the sources you have selected didn't record any data in that run."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f7c9a83d",
   "metadata": {},
   "source": [
    "## By selecting train IDs\n",
    "\n",
    "We can use all the data for one source, and cut out trains which that specific source missed, with code like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "1034e9f0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "# trains measured: 7263, 7262\n"
     ]
    }
   ],
   "source": [
    "from extra_data import by_id\n",
    "\n",
    "# Keep all data from this source:\n",
    "intensity_sase3 = run['SA3_XTD10_XGM/XGM/DOOCS:output', 'data.intensityTD']\n",
    "\n",
    "intensity_scs = run['SCS_BLU_XGM/XGM/DOOCS:output', 'data.intensityTD'].select_trains(\n",
    "    by_id[intensity_sase3.train_id_coordinates()]\n",
    ")\n",
    "print(f\"# trains measured: {intensity_sase3.shape[0]}, {intensity_scs.shape[0]}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aa99e276",
   "metadata": {},
   "source": [
    "This only excluded trains missing from the first source, so in this case, the first source still has one extra train which the second does not."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "43fb35ab",
   "metadata": {},
   "source": [
    "## Using xarray\n",
    "\n",
    "The options above exclude trains before loading the data. We can also align data after loading it as [xarray](https://xarray.pydata.org/en/stable/) labelled arrays:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "8db4346b",
   "metadata": {},
   "outputs": [],
   "source": [
    "intensity_sase3_arr = run['SA3_XTD10_XGM/XGM/DOOCS:output', 'data.intensityTD'].xarray()\n",
    "intensity_scs_arr = run['SCS_BLU_XGM/XGM/DOOCS:output', 'data.intensityTD'].xarray()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "82c0dd22",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><svg style=\"position: absolute; width: 0; height: 0; overflow: hidden\">\n",
       "<defs>\n",
       "<symbol id=\"icon-database\" viewBox=\"0 0 32 32\">\n",
       "<path d=\"M16 0c-8.837 0-16 2.239-16 5v4c0 2.761 7.163 5 16 5s16-2.239 16-5v-4c0-2.761-7.163-5-16-5z\"></path>\n",
       "<path d=\"M16 17c-8.837 0-16-2.239-16-5v6c0 2.761 7.163 5 16 5s16-2.239 16-5v-6c0 2.761-7.163 5-16 5z\"></path>\n",
       "<path d=\"M16 26c-8.837 0-16-2.239-16-5v6c0 2.761 7.163 5 16 5s16-2.239 16-5v-6c0 2.761-7.163 5-16 5z\"></path>\n",
       "</symbol>\n",
       "<symbol id=\"icon-file-text2\" viewBox=\"0 0 32 32\">\n",
       "<path d=\"M28.681 7.159c-0.694-0.947-1.662-2.053-2.724-3.116s-2.169-2.030-3.116-2.724c-1.612-1.182-2.393-1.319-2.841-1.319h-15.5c-1.378 0-2.5 1.121-2.5 2.5v27c0 1.378 1.122 2.5 2.5 2.5h23c1.378 0 2.5-1.122 2.5-2.5v-19.5c0-0.448-0.137-1.23-1.319-2.841zM24.543 5.457c0.959 0.959 1.712 1.825 2.268 2.543h-4.811v-4.811c0.718 0.556 1.584 1.309 2.543 2.268zM28 29.5c0 0.271-0.229 0.5-0.5 0.5h-23c-0.271 0-0.5-0.229-0.5-0.5v-27c0-0.271 0.229-0.5 0.5-0.5 0 0 15.499-0 15.5 0v7c0 0.552 0.448 1 1 1h7v19.5z\"></path>\n",
       "<path d=\"M23 26h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z\"></path>\n",
       "<path d=\"M23 22h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z\"></path>\n",
       "<path d=\"M23 18h-14c-0.552 0-1-0.448-1-1s0.448-1 1-1h14c0.552 0 1 0.448 1 1s-0.448 1-1 1z\"></path>\n",
       "</symbol>\n",
       "</defs>\n",
       "</svg>\n",
       "<style>/* CSS stylesheet for displaying xarray objects in jupyterlab.\n",
       " *\n",
       " */\n",
       "\n",
       ":root {\n",
       "  --xr-font-color0: var(--jp-content-font-color0, rgba(0, 0, 0, 1));\n",
       "  --xr-font-color2: var(--jp-content-font-color2, rgba(0, 0, 0, 0.54));\n",
       "  --xr-font-color3: var(--jp-content-font-color3, rgba(0, 0, 0, 0.38));\n",
       "  --xr-border-color: var(--jp-border-color2, #e0e0e0);\n",
       "  --xr-disabled-color: var(--jp-layout-color3, #bdbdbd);\n",
       "  --xr-background-color: var(--jp-layout-color0, white);\n",
       "  --xr-background-color-row-even: var(--jp-layout-color1, white);\n",
       "  --xr-background-color-row-odd: var(--jp-layout-color2, #eeeeee);\n",
       "}\n",
       "\n",
       "html[theme=dark],\n",
       "body.vscode-dark {\n",
       "  --xr-font-color0: rgba(255, 255, 255, 1);\n",
       "  --xr-font-color2: rgba(255, 255, 255, 0.54);\n",
       "  --xr-font-color3: rgba(255, 255, 255, 0.38);\n",
       "  --xr-border-color: #1F1F1F;\n",
       "  --xr-disabled-color: #515151;\n",
       "  --xr-background-color: #111111;\n",
       "  --xr-background-color-row-even: #111111;\n",
       "  --xr-background-color-row-odd: #313131;\n",
       "}\n",
       "\n",
       ".xr-wrap {\n",
       "  display: block;\n",
       "  min-width: 300px;\n",
       "  max-width: 700px;\n",
       "}\n",
       "\n",
       ".xr-text-repr-fallback {\n",
       "  /* fallback to plain text repr when CSS is not injected (untrusted notebook) */\n",
       "  display: none;\n",
       "}\n",
       "\n",
       ".xr-header {\n",
       "  padding-top: 6px;\n",
       "  padding-bottom: 6px;\n",
       "  margin-bottom: 4px;\n",
       "  border-bottom: solid 1px var(--xr-border-color);\n",
       "}\n",
       "\n",
       ".xr-header > div,\n",
       ".xr-header > ul {\n",
       "  display: inline;\n",
       "  margin-top: 0;\n",
       "  margin-bottom: 0;\n",
       "}\n",
       "\n",
       ".xr-obj-type,\n",
       ".xr-array-name {\n",
       "  margin-left: 2px;\n",
       "  margin-right: 10px;\n",
       "}\n",
       "\n",
       ".xr-obj-type {\n",
       "  color: var(--xr-font-color2);\n",
       "}\n",
       "\n",
       ".xr-sections {\n",
       "  padding-left: 0 !important;\n",
       "  display: grid;\n",
       "  grid-template-columns: 150px auto auto 1fr 20px 20px;\n",
       "}\n",
       "\n",
       ".xr-section-item {\n",
       "  display: contents;\n",
       "}\n",
       "\n",
       ".xr-section-item input {\n",
       "  display: none;\n",
       "}\n",
       "\n",
       ".xr-section-item input + label {\n",
       "  color: var(--xr-disabled-color);\n",
       "}\n",
       "\n",
       ".xr-section-item input:enabled + label {\n",
       "  cursor: pointer;\n",
       "  color: var(--xr-font-color2);\n",
       "}\n",
       "\n",
       ".xr-section-item input:enabled + label:hover {\n",
       "  color: var(--xr-font-color0);\n",
       "}\n",
       "\n",
       ".xr-section-summary {\n",
       "  grid-column: 1;\n",
       "  color: var(--xr-font-color2);\n",
       "  font-weight: 500;\n",
       "}\n",
       "\n",
       ".xr-section-summary > span {\n",
       "  display: inline-block;\n",
       "  padding-left: 0.5em;\n",
       "}\n",
       "\n",
       ".xr-section-summary-in:disabled + label {\n",
       "  color: var(--xr-font-color2);\n",
       "}\n",
       "\n",
       ".xr-section-summary-in + label:before {\n",
       "  display: inline-block;\n",
       "  content: '►';\n",
       "  font-size: 11px;\n",
       "  width: 15px;\n",
       "  text-align: center;\n",
       "}\n",
       "\n",
       ".xr-section-summary-in:disabled + label:before {\n",
       "  color: var(--xr-disabled-color);\n",
       "}\n",
       "\n",
       ".xr-section-summary-in:checked + label:before {\n",
       "  content: '▼';\n",
       "}\n",
       "\n",
       ".xr-section-summary-in:checked + label > span {\n",
       "  display: none;\n",
       "}\n",
       "\n",
       ".xr-section-summary,\n",
       ".xr-section-inline-details {\n",
       "  padding-top: 4px;\n",
       "  padding-bottom: 4px;\n",
       "}\n",
       "\n",
       ".xr-section-inline-details {\n",
       "  grid-column: 2 / -1;\n",
       "}\n",
       "\n",
       ".xr-section-details {\n",
       "  display: none;\n",
       "  grid-column: 1 / -1;\n",
       "  margin-bottom: 5px;\n",
       "}\n",
       "\n",
       ".xr-section-summary-in:checked ~ .xr-section-details {\n",
       "  display: contents;\n",
       "}\n",
       "\n",
       ".xr-array-wrap {\n",
       "  grid-column: 1 / -1;\n",
       "  display: grid;\n",
       "  grid-template-columns: 20px auto;\n",
       "}\n",
       "\n",
       ".xr-array-wrap > label {\n",
       "  grid-column: 1;\n",
       "  vertical-align: top;\n",
       "}\n",
       "\n",
       ".xr-preview {\n",
       "  color: var(--xr-font-color3);\n",
       "}\n",
       "\n",
       ".xr-array-preview,\n",
       ".xr-array-data {\n",
       "  padding: 0 5px !important;\n",
       "  grid-column: 2;\n",
       "}\n",
       "\n",
       ".xr-array-data,\n",
       ".xr-array-in:checked ~ .xr-array-preview {\n",
       "  display: none;\n",
       "}\n",
       "\n",
       ".xr-array-in:checked ~ .xr-array-data,\n",
       ".xr-array-preview {\n",
       "  display: inline-block;\n",
       "}\n",
       "\n",
       ".xr-dim-list {\n",
       "  display: inline-block !important;\n",
       "  list-style: none;\n",
       "  padding: 0 !important;\n",
       "  margin: 0;\n",
       "}\n",
       "\n",
       ".xr-dim-list li {\n",
       "  display: inline-block;\n",
       "  padding: 0;\n",
       "  margin: 0;\n",
       "}\n",
       "\n",
       ".xr-dim-list:before {\n",
       "  content: '(';\n",
       "}\n",
       "\n",
       ".xr-dim-list:after {\n",
       "  content: ')';\n",
       "}\n",
       "\n",
       ".xr-dim-list li:not(:last-child):after {\n",
       "  content: ',';\n",
       "  padding-right: 5px;\n",
       "}\n",
       "\n",
       ".xr-has-index {\n",
       "  font-weight: bold;\n",
       "}\n",
       "\n",
       ".xr-var-list,\n",
       ".xr-var-item {\n",
       "  display: contents;\n",
       "}\n",
       "\n",
       ".xr-var-item > div,\n",
       ".xr-var-item label,\n",
       ".xr-var-item > .xr-var-name span {\n",
       "  background-color: var(--xr-background-color-row-even);\n",
       "  margin-bottom: 0;\n",
       "}\n",
       "\n",
       ".xr-var-item > .xr-var-name:hover span {\n",
       "  padding-right: 5px;\n",
       "}\n",
       "\n",
       ".xr-var-list > li:nth-child(odd) > div,\n",
       ".xr-var-list > li:nth-child(odd) > label,\n",
       ".xr-var-list > li:nth-child(odd) > .xr-var-name span {\n",
       "  background-color: var(--xr-background-color-row-odd);\n",
       "}\n",
       "\n",
       ".xr-var-name {\n",
       "  grid-column: 1;\n",
       "}\n",
       "\n",
       ".xr-var-dims {\n",
       "  grid-column: 2;\n",
       "}\n",
       "\n",
       ".xr-var-dtype {\n",
       "  grid-column: 3;\n",
       "  text-align: right;\n",
       "  color: var(--xr-font-color2);\n",
       "}\n",
       "\n",
       ".xr-var-preview {\n",
       "  grid-column: 4;\n",
       "}\n",
       "\n",
       ".xr-var-name,\n",
       ".xr-var-dims,\n",
       ".xr-var-dtype,\n",
       ".xr-preview,\n",
       ".xr-attrs dt {\n",
       "  white-space: nowrap;\n",
       "  overflow: hidden;\n",
       "  text-overflow: ellipsis;\n",
       "  padding-right: 10px;\n",
       "}\n",
       "\n",
       ".xr-var-name:hover,\n",
       ".xr-var-dims:hover,\n",
       ".xr-var-dtype:hover,\n",
       ".xr-attrs dt:hover {\n",
       "  overflow: visible;\n",
       "  width: auto;\n",
       "  z-index: 1;\n",
       "}\n",
       "\n",
       ".xr-var-attrs,\n",
       ".xr-var-data {\n",
       "  display: none;\n",
       "  background-color: var(--xr-background-color) !important;\n",
       "  padding-bottom: 5px !important;\n",
       "}\n",
       "\n",
       ".xr-var-attrs-in:checked ~ .xr-var-attrs,\n",
       ".xr-var-data-in:checked ~ .xr-var-data {\n",
       "  display: block;\n",
       "}\n",
       "\n",
       ".xr-var-data > table {\n",
       "  float: right;\n",
       "}\n",
       "\n",
       ".xr-var-name span,\n",
       ".xr-var-data,\n",
       ".xr-attrs {\n",
       "  padding-left: 25px !important;\n",
       "}\n",
       "\n",
       ".xr-attrs,\n",
       ".xr-var-attrs,\n",
       ".xr-var-data {\n",
       "  grid-column: 1 / -1;\n",
       "}\n",
       "\n",
       "dl.xr-attrs {\n",
       "  padding: 0;\n",
       "  margin: 0;\n",
       "  display: grid;\n",
       "  grid-template-columns: 125px auto;\n",
       "}\n",
       "\n",
       ".xr-attrs dt, dd {\n",
       "  padding: 0;\n",
       "  margin: 0;\n",
       "  float: left;\n",
       "  padding-right: 10px;\n",
       "  width: auto;\n",
       "}\n",
       "\n",
       ".xr-attrs dt {\n",
       "  font-weight: normal;\n",
       "  grid-column: 1;\n",
       "}\n",
       "\n",
       ".xr-attrs dt:hover span {\n",
       "  display: inline-block;\n",
       "  background: var(--xr-background-color);\n",
       "  padding-right: 10px;\n",
       "}\n",
       "\n",
       ".xr-attrs dd {\n",
       "  grid-column: 2;\n",
       "  white-space: pre-wrap;\n",
       "  word-break: break-all;\n",
       "}\n",
       "\n",
       ".xr-icon-database,\n",
       ".xr-icon-file-text2 {\n",
       "  display: inline-block;\n",
       "  vertical-align: middle;\n",
       "  width: 1em;\n",
       "  height: 1.5em !important;\n",
       "  stroke-width: 0;\n",
       "  stroke: currentColor;\n",
       "  fill: currentColor;\n",
       "}\n",
       "</style><pre class='xr-text-repr-fallback'>&lt;xarray.DataArray &#x27;SCS_BLU_XGM/XGM/DOOCS:output.data.intensityTD&#x27; (trainId: 7263, dim_0: 1000)&gt;\n",
       "array([[ 4.4886490e+01,  4.2309365e+03, -4.5598242e+03, ...,\n",
       "         1.0000000e+00,  1.0000000e+00,  1.0000000e+00],\n",
       "       [ 1.0151898e+02,  2.2400598e+03, -2.7732441e+03, ...,\n",
       "         1.0000000e+00,  1.0000000e+00,  1.0000000e+00],\n",
       "       [-1.3794557e+02,  2.4830901e+03, -3.6583892e+03, ...,\n",
       "         1.0000000e+00,  1.0000000e+00,  1.0000000e+00],\n",
       "       ...,\n",
       "       [-4.2194626e+02,  5.4188824e+02, -8.9533582e+02, ...,\n",
       "         1.0000000e+00,  1.0000000e+00,  1.0000000e+00],\n",
       "       [-1.3200552e+02,  1.1471447e+03, -1.4556660e+03, ...,\n",
       "         1.0000000e+00,  1.0000000e+00,  1.0000000e+00],\n",
       "       [-2.3156431e+01,  2.2287026e+03, -3.3196895e+03, ...,\n",
       "         1.0000000e+00,  1.0000000e+00,  1.0000000e+00]], dtype=float32)\n",
       "Coordinates:\n",
       "  * trainId  (trainId) uint64 517755296 517755297 ... 517762558 517762559\n",
       "Dimensions without coordinates: dim_0</pre><div class='xr-wrap' hidden><div class='xr-header'><div class='xr-obj-type'>xarray.DataArray</div><div class='xr-array-name'>'SCS_BLU_XGM/XGM/DOOCS:output.data.intensityTD'</div><ul class='xr-dim-list'><li><span class='xr-has-index'>trainId</span>: 7263</li><li><span>dim_0</span>: 1000</li></ul></div><ul class='xr-sections'><li class='xr-section-item'><div class='xr-array-wrap'><input id='section-ecae156f-399b-4cf3-8798-bf80c001b539' class='xr-array-in' type='checkbox' checked><label for='section-ecae156f-399b-4cf3-8798-bf80c001b539' title='Show/hide data repr'><svg class='icon xr-icon-database'><use xlink:href='#icon-database'></use></svg></label><div class='xr-array-preview xr-preview'><span>44.88649 4230.9365 -4559.824 54.045578 1566.6924 ... 1.0 1.0 1.0 1.0</span></div><div class='xr-array-data'><pre>array([[ 4.4886490e+01,  4.2309365e+03, -4.5598242e+03, ...,\n",
       "         1.0000000e+00,  1.0000000e+00,  1.0000000e+00],\n",
       "       [ 1.0151898e+02,  2.2400598e+03, -2.7732441e+03, ...,\n",
       "         1.0000000e+00,  1.0000000e+00,  1.0000000e+00],\n",
       "       [-1.3794557e+02,  2.4830901e+03, -3.6583892e+03, ...,\n",
       "         1.0000000e+00,  1.0000000e+00,  1.0000000e+00],\n",
       "       ...,\n",
       "       [-4.2194626e+02,  5.4188824e+02, -8.9533582e+02, ...,\n",
       "         1.0000000e+00,  1.0000000e+00,  1.0000000e+00],\n",
       "       [-1.3200552e+02,  1.1471447e+03, -1.4556660e+03, ...,\n",
       "         1.0000000e+00,  1.0000000e+00,  1.0000000e+00],\n",
       "       [-2.3156431e+01,  2.2287026e+03, -3.3196895e+03, ...,\n",
       "         1.0000000e+00,  1.0000000e+00,  1.0000000e+00]], dtype=float32)</pre></div></div></li><li class='xr-section-item'><input id='section-01912a7b-f71a-42aa-84c5-349857f851e9' class='xr-section-summary-in' type='checkbox'  checked><label for='section-01912a7b-f71a-42aa-84c5-349857f851e9' class='xr-section-summary' >Coordinates: <span>(1)</span></label><div class='xr-section-inline-details'></div><div class='xr-section-details'><ul class='xr-var-list'><li class='xr-var-item'><div class='xr-var-name'><span class='xr-has-index'>trainId</span></div><div class='xr-var-dims'>(trainId)</div><div class='xr-var-dtype'>uint64</div><div class='xr-var-preview xr-preview'>517755296 517755297 ... 517762559</div><input id='attrs-6f0d5f63-8e95-42b4-809d-956367125b27' class='xr-var-attrs-in' type='checkbox' disabled><label for='attrs-6f0d5f63-8e95-42b4-809d-956367125b27' title='Show/Hide attributes'><svg class='icon xr-icon-file-text2'><use xlink:href='#icon-file-text2'></use></svg></label><input id='data-fe56a8ac-e709-443c-b46a-e624ec100b9e' class='xr-var-data-in' type='checkbox'><label for='data-fe56a8ac-e709-443c-b46a-e624ec100b9e' title='Show/Hide data repr'><svg class='icon xr-icon-database'><use xlink:href='#icon-database'></use></svg></label><div class='xr-var-attrs'><dl class='xr-attrs'></dl></div><div class='xr-var-data'><pre>array([517755296, 517755297, 517755298, ..., 517762557, 517762558, 517762559],\n",
       "      dtype=uint64)</pre></div></li></ul></div></li><li class='xr-section-item'><input id='section-73b359d3-df29-4ae4-8abf-80caec97552b' class='xr-section-summary-in' type='checkbox' disabled ><label for='section-73b359d3-df29-4ae4-8abf-80caec97552b' class='xr-section-summary'  title='Expand/collapse section'>Attributes: <span>(0)</span></label><div class='xr-section-inline-details'></div><div class='xr-section-details'><dl class='xr-attrs'></dl></div></li></ul></div></div>"
      ],
      "text/plain": [
       "<xarray.DataArray 'SCS_BLU_XGM/XGM/DOOCS:output.data.intensityTD' (trainId: 7263, dim_0: 1000)>\n",
       "array([[ 4.4886490e+01,  4.2309365e+03, -4.5598242e+03, ...,\n",
       "         1.0000000e+00,  1.0000000e+00,  1.0000000e+00],\n",
       "       [ 1.0151898e+02,  2.2400598e+03, -2.7732441e+03, ...,\n",
       "         1.0000000e+00,  1.0000000e+00,  1.0000000e+00],\n",
       "       [-1.3794557e+02,  2.4830901e+03, -3.6583892e+03, ...,\n",
       "         1.0000000e+00,  1.0000000e+00,  1.0000000e+00],\n",
       "       ...,\n",
       "       [-4.2194626e+02,  5.4188824e+02, -8.9533582e+02, ...,\n",
       "         1.0000000e+00,  1.0000000e+00,  1.0000000e+00],\n",
       "       [-1.3200552e+02,  1.1471447e+03, -1.4556660e+03, ...,\n",
       "         1.0000000e+00,  1.0000000e+00,  1.0000000e+00],\n",
       "       [-2.3156431e+01,  2.2287026e+03, -3.3196895e+03, ...,\n",
       "         1.0000000e+00,  1.0000000e+00,  1.0000000e+00]], dtype=float32)\n",
       "Coordinates:\n",
       "  * trainId  (trainId) uint64 517755296 517755297 ... 517762558 517762559\n",
       "Dimensions without coordinates: dim_0"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "intensity_scs_arr"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67d5d6b4",
   "metadata": {},
   "source": [
    "We'll use the [xarray.align()](https://xarray.pydata.org/en/stable/generated/xarray.align.html#xarray.align) function to line up the arrays by their train ID labels:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "eb212424",
   "metadata": {},
   "outputs": [],
   "source": [
    "import xarray as xr\n",
    "\n",
    "intensity_sase3_arr, intensity_scs_arr = xr.align(\n",
    "    intensity_sase3_arr, intensity_scs_arr, join='inner'\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "2b810651",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(intensity_scs_arr.coords['trainId']  == intensity_sase3_arr.coords['trainId']).all().item()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "550ba7cc",
   "metadata": {},
   "source": [
    "Using `join='inner'` (which is the default) discards data to align the arrays.\n",
    "If we specified `join='outer'` instead, it would insert gaps in the arrays where data is missing."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c8b99b6a",
   "metadata": {},
   "source": [
    "## Multi-module detectors\n",
    "\n",
    "Several detectors at European XFEL have modules recording data as separate sources.\n",
    "This run contains data from a DSSC detector:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "66ba9d4a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<DSSC1M: Data interface for detector 'SCS_DET_DSSC1M-1' with 16 modules>"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from extra_data.components import DSSC1M\n",
    "\n",
    "dssc = DSSC1M(run)\n",
    "dssc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "2c112e18",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5120"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(dssc.train_ids)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6a0dc7d9",
   "metadata": {},
   "source": [
    "By default, we get trains where any detector module recorded data.\n",
    "We can specify `min_modules` to get trains where *all* modules recorded data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "6918131e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5049"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dssc_allmod = DSSC1M(run, min_modules=16)\n",
    "\n",
    "len(dssc_allmod.train_ids)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "410ae8bc",
   "metadata": {},
   "source": [
    "Or we can allow a certain number of missing modules in each train, to keep more of the data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "e16e74f0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5118"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dssc_mostmod = DSSC1M(run, min_modules=15)\n",
    "\n",
    "len(dssc_mostmod.train_ids)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "47f71f21",
   "metadata": {},
   "source": [
    "In this case, missing data will be filled in as 0 (for integers) or NaN (for floating point data) when we read the data. You should check that the code you're using to process the data will behave correctly with the fill value."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "xfel (Python 3.7)",
   "language": "python",
   "name": "xfel"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}