{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Reading data to analyse in memory\n", "\n", "It's often quickest and easiest to load data into memory before analysing it.\n", "\n", "Some types of data, especially from large pixel detectors, may be bigger than the available memory.\n", "Other examples show how to work with very large amounts of data.\n", "But the [machines in the Maxwell cluster](https://confluence.desy.de/display/MXW/Hardware+in+exfel)\n", "have 250–750 GB of memory, so you can use the simple approach for many cases." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import re\n", "import xarray as xr\n", "\n", "from extra_data import open_run" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tabular data (with pandas)\n", "\n", "We can open a run with EXtra-data on the Maxwell cluster using the proposal & run number." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "# of trains: 3721\n", "Duration: 0:06:12.1\n", "First train ID: 142844490\n", "Last train ID: 142848210\n", "\n", "0 detector modules ()\n", "\n", "2 instrument sources (excluding detectors):\n", " - SA1_XTD2_XGM/XGM/DOOCS:output\n", " - SPB_XTD9_XGM/XGM/DOOCS:output\n", "\n", "2 control sources:\n", " - SA1_XTD2_XGM/XGM/DOOCS\n", " - SPB_XTD9_XGM/XGM/DOOCS\n", "\n" ] } ], "source": [ "run = open_run(proposal=900025, run=150)\n", "run.info() # Show overview info about this data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This example works with data from two X-Ray Gas Monitors (XGMs).\n", "These measure properties of the X-ray beam in different parts of the tunnel.\n", "This data refers to one XGM in XTD2 and one in XTD9.\n", "\n", "[pandas](https://pandas.pydata.org/pandas-docs/stable/) is a popular Python library for working with tabular data.\n", "We'll create a pandas dataframe containing the beam x and y position at each XGM, and the photon flux.\n", "We can select the columns using 'glob' patterns, like selecting files in the terminal.\n", "\n", "* `[abc]`: one character, a/b/c\n", "* `?`: any one character\n", "* `*`: any sequence of characters" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| trainId | SPB_XTD9_XGM/XGM/DOOCS/beamPosition.iyPos | SPB_XTD9_XGM/XGM/DOOCS/pulseEnergy.photonFlux | SPB_XTD9_XGM/XGM/DOOCS/beamPosition.ixPos | SA1_XTD2_XGM/XGM/DOOCS/beamPosition.iyPos | SA1_XTD2_XGM/XGM/DOOCS/pulseEnergy.photonFlux | SA1_XTD2_XGM/XGM/DOOCS/beamPosition.ixPos |
|---|---|---|---|---|---|---|
| 142844490 | 1.717195 | 1327.06958 | -2.277912 | 0.161399 | 1410.723755 | 2.035218 |
| 142844491 | 1.717195 | 1327.06958 | -2.277912 | 0.161399 | 1410.137451 | 2.035218 |
| 142844492 | 1.717195 | 1327.06958 | -2.277912 | 0.161399 | 1410.137451 | 2.035218 |
| 142844493 | 1.717195 | 1327.06958 | -2.277912 | 0.161399 | 1410.137451 | 2.035218 |
| 142844494 | 1.717195 | 1327.06958 | -2.277912 | 0.161399 | 1410.137451 | 2.035218 |
"<xarray.DataArray 'SA3_XTD10_PES/ADC/1:network.digitizers.channel_4_A.raw.samples' (trainId: 1475, dim_0: 40000)>\n",
"array([[ -6, -10, -7, ..., -10, -8, -9],\n",
" [ -8, -8, -7, ..., -9, -2, -11],\n",
" [ -8, -10, -7, ..., -6, -8, -11],\n",
" ...,\n",
" [ -7, -9, -8, ..., -9, -2, -5],\n",
" [ -5, -10, -8, ..., -5, -4, -10],\n",
" [ -7, -8, -7, ..., -6, -5, -8]], dtype=int16)\n",
"Coordinates:\n",
" * trainId (trainId) uint64 128146446 128146447 ... 128147919 128147920\n",
"Dimensions without coordinates: dim_0"