Inspecting available data

The .info() method provides an overview of the data in an opened run or file:

[1]:
from extra_data import RunDirectory

run = RunDirectory("/gpfs/exfel/exp/XMPL/201750/p700000/raw/r0010")
run.info()
# of trains:    579
Duration:       0:00:57.9
First train ID: 507096934
Last train ID:  507097512

16 detector modules (SPB_DET_AGIPD1M-1)
  e.g. module SPB_DET_AGIPD1M-1 0 : 512 x 128 pixels
  SPB_DET_AGIPD1M-1/DET/0CH0:xtdf
  250 frames per train, up to 144750 frames total

2 instrument sources (excluding detectors):
  - SA1_XTD2_XGM/XGM/DOOCS:output
  - SPB_XTD9_XGM/XGM/DOOCS:output

18 control sources:
  - ACC_SYS_DOOCS/CTRL/BEAMCONDITIONS
  - SA1_XTD2_ATT/MDL/MAIN
  - SA1_XTD2_MIRR-1/MOTOR/HMRY
  - SA1_XTD2_XGM/XGM/DOOCS
  - SPB_IRU_AGIPD1M/MOTOR/Z_STEPPER
  - SPB_IRU_AGIPD1M/PSC/HV
  - SPB_IRU_AGIPD1M/TSENS/H1_T_EXTHOUS
  - SPB_IRU_AGIPD1M/TSENS/H2_T_EXTHOUS
  - SPB_IRU_AGIPD1M/TSENS/Q1_T_BLOCK
  - SPB_IRU_AGIPD1M/TSENS/Q2_T_BLOCK
  - SPB_IRU_AGIPD1M/TSENS/Q3_T_BLOCK
  - SPB_IRU_AGIPD1M/TSENS/Q4_T_BLOCK
  - SPB_IRU_AGIPD1M1/CTRL/MC1
  - SPB_IRU_AGIPD1M1/CTRL/MC2
  - SPB_IRU_VAC/GAUGE/GAUGE_FR_6
  - SPB_RR_SYS/MDL/BUNCH_PATTERN
  - SPB_RR_SYS/TSYS/X2TIMER2
  - SPB_XTD9_XGM/XGM/DOOCS

The lsxfel command prints similar information at the command line.

The train IDs included in the run are available as a simple list:

[2]:
print(run.train_ids[:10])
[507096934, 507096935, 507096936, 507096937, 507096938, 507096939, 507096940, 507096941, 507096942, 507096943]
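
As a quick sanity check, the train IDs can be compared with the summary printed by .info(). The sketch below uses only plain Python and values copied from the outputs above. The 10 Hz train rate is an assumption (it is consistent with the 579 trains and 57.9 s duration shown here), and the variable names are illustrative.

```python
# First ten train IDs, copied from the output of run.train_ids[:10] above.
train_ids = [507096934, 507096935, 507096936, 507096937, 507096938,
             507096939, 507096940, 507096941, 507096942, 507096943]

# Consecutive trains normally differ by exactly 1; any other step is a gap.
gaps = [(a, b) for a, b in zip(train_ids, train_ids[1:]) if b - a != 1]
print("gaps in first ten trains:", gaps)

# First and last train IDs as reported by run.info().
first_id, last_id = 507096934, 507097512

# If no trains are missing, the ID range spans this many trains.
n_trains = last_id - first_id + 1

# Assumption: trains arrive at 10 Hz, i.e. one every 0.1 s.
TRAIN_RATE_HZ = 10
duration_s = n_trains / TRAIN_RATE_HZ
print(n_trains, "trains,", duration_s, "s")
```

If the number of trains reported by .info() is smaller than the span of the ID range, some trains are missing from the run.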

And the source names are available as a set:

[3]:
run.all_sources
[3]:
frozenset({'ACC_SYS_DOOCS/CTRL/BEAMCONDITIONS',
           'SA1_XTD2_ATT/MDL/MAIN',
           'SA1_XTD2_MIRR-1/MOTOR/HMRY',
           'SA1_XTD2_XGM/XGM/DOOCS',
           'SA1_XTD2_XGM/XGM/DOOCS:output',
           'SPB_DET_AGIPD1M-1/DET/0CH0:xtdf',
           'SPB_DET_AGIPD1M-1/DET/10CH0:xtdf',
           'SPB_DET_AGIPD1M-1/DET/11CH0:xtdf',
           'SPB_DET_AGIPD1M-1/DET/12CH0:xtdf',
           'SPB_DET_AGIPD1M-1/DET/13CH0:xtdf',
           'SPB_DET_AGIPD1M-1/DET/14CH0:xtdf',
           'SPB_DET_AGIPD1M-1/DET/15CH0:xtdf',
           'SPB_DET_AGIPD1M-1/DET/1CH0:xtdf',
           'SPB_DET_AGIPD1M-1/DET/2CH0:xtdf',
           'SPB_DET_AGIPD1M-1/DET/3CH0:xtdf',
           'SPB_DET_AGIPD1M-1/DET/4CH0:xtdf',
           'SPB_DET_AGIPD1M-1/DET/5CH0:xtdf',
           'SPB_DET_AGIPD1M-1/DET/6CH0:xtdf',
           'SPB_DET_AGIPD1M-1/DET/7CH0:xtdf',
           'SPB_DET_AGIPD1M-1/DET/8CH0:xtdf',
           'SPB_DET_AGIPD1M-1/DET/9CH0:xtdf',
           'SPB_IRU_AGIPD1M/MOTOR/Z_STEPPER',
           'SPB_IRU_AGIPD1M/PSC/HV',
           'SPB_IRU_AGIPD1M/TSENS/H1_T_EXTHOUS',
           'SPB_IRU_AGIPD1M/TSENS/H2_T_EXTHOUS',
           'SPB_IRU_AGIPD1M/TSENS/Q1_T_BLOCK',
           'SPB_IRU_AGIPD1M/TSENS/Q2_T_BLOCK',
           'SPB_IRU_AGIPD1M/TSENS/Q3_T_BLOCK',
           'SPB_IRU_AGIPD1M/TSENS/Q4_T_BLOCK',
           'SPB_IRU_AGIPD1M1/CTRL/MC1',
           'SPB_IRU_AGIPD1M1/CTRL/MC2',
           'SPB_IRU_VAC/GAUGE/GAUGE_FR_6',
           'SPB_RR_SYS/MDL/BUNCH_PATTERN',
           'SPB_RR_SYS/TSYS/X2TIMER2',
           'SPB_XTD9_XGM/XGM/DOOCS',
           'SPB_XTD9_XGM/XGM/DOOCS:output'})
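
Since the source names are plain strings in a set, they can be filtered with ordinary Python tools. The following is a minimal sketch using the standard library's fnmatch module on a hand-copied subset of the names above; the variable names and the glob patterns are illustrative choices, not part of the library.

```python
from fnmatch import fnmatchcase

# A subset of the source names listed above, copied by hand.
sources = frozenset({
    'SA1_XTD2_XGM/XGM/DOOCS',
    'SA1_XTD2_XGM/XGM/DOOCS:output',
    'SPB_DET_AGIPD1M-1/DET/0CH0:xtdf',
    'SPB_DET_AGIPD1M-1/DET/1CH0:xtdf',
    'SPB_IRU_AGIPD1M/PSC/HV',
    'SPB_XTD9_XGM/XGM/DOOCS',
    'SPB_XTD9_XGM/XGM/DOOCS:output',
})

# All sources belonging to an XGM device, matched with a glob pattern.
xgm_sources = sorted(s for s in sources if fnmatchcase(s, '*/XGM/*'))

# The AGIPD detector modules present in this subset.
agipd_modules = sorted(
    s for s in sources if fnmatchcase(s, 'SPB_DET_AGIPD1M-1/DET/*CH0:xtdf')
)

for name in xgm_sources + agipd_modules:
    print(name)
```

fnmatchcase() is used rather than fnmatch() so that matching is case-sensitive and treats "/" literally on every platform.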

Control and instrument sources are also available separately, as run.control_sources and run.instrument_sources, but for data analysis this distinction is often not important.

[4]:
assert run.all_sources == (run.control_sources | run.instrument_sources)

Within each source, the data is organised under keys. The .keys_for_source() method lists a source’s keys:

[5]:
run.keys_for_source('SA1_XTD2_XGM/XGM/DOOCS:output')
[5]:
{'data.intensityAUXSa1TD',
 'data.intensityAUXSa3TD',
 'data.intensityAUXTD',
 'data.intensitySa1SigmaTD',
 'data.intensitySa1TD',
 'data.intensitySa3SigmaTD',
 'data.intensitySa3TD',
 'data.intensitySigmaTD',
 'data.intensityTD',
 'data.trainId',
 'data.xSa1SigmaTD',
 'data.xSa1TD',
 'data.xSa3SigmaTD',
 'data.xSa3TD',
 'data.xSigmaTD',
 'data.xTD',
 'data.ySa1SigmaTD',
 'data.ySa1TD',
 'data.ySa3SigmaTD',
 'data.ySa3TD',
 'data.ySigmaTD',
 'data.yTD'}

Instrument sources may have multiple values recorded for each train, and may be missing data for some trains. The .get_data_counts() method shows how many data points there are for each train. For example, for this AGIPD detector module, the counts are the number of frames in each train:

[6]:
run.get_data_counts('SPB_DET_AGIPD1M-1/DET/11CH0:xtdf', 'image.data')
[6]:
507096934      0
507096935      0
507096936      0
507096937      0
507096938      0
            ...
507097185    250
507097186    250
507097187    250
507097188    250
507097189    250
Length: 256, dtype: uint64

This method returns a pandas Series. The index (the numbers shown on the left) contains the train IDs.
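
Because the result is an ordinary pandas Series, the usual pandas operations apply to it. The following is a minimal sketch that uses a small hand-made Series in place of the real output of .get_data_counts(); the five values are illustrative and are not the full result shown above.

```python
import pandas as pd

# Hypothetical stand-in for the Series returned by run.get_data_counts():
# index = train IDs, values = number of frames recorded in that train.
counts = pd.Series(
    [0, 0, 250, 250, 250],
    index=[507096934, 507096935, 507097187, 507097188, 507097189],
    dtype="uint64",
)

# Keep only the trains that actually contain data.
trains_with_data = counts[counts > 0]
print(list(trains_with_data.index))

# Total number of frames across all trains.
total_frames = int(counts.sum())
print(total_frames)
```

The same boolean-mask pattern can be used to find trains with fewer frames than expected, for example counts[counts < 250].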