Automatic data-discovery¶

The TimeSeries.fetch_open_data() method is only able to download GW strain data from those datasets exposed through the GWOSC API; this notably does not include the Auxiliary Channel Three Hour Release, or strain data for any events or observing runs not yet published.

In addition, the GW strain data make up only a tiny fraction of the ‘raw’ output of a gravitational-wave detector, which includes in excess of 100,000 different ‘channels’ (data streams) from sensors and digital control systems that are used to operate the interferometer and diagnose measure performance.

The full data set for each detector is archived at the relevant observatory and is made freely available to all registered collaboration members. A discovery service called gwdatafind is provided at each location to simplify discovering the file(s) that contain the data of interest for any research.

These data are also made available remotely using nds2, which enables sending data directly over a network to any location. This is used for both the full proprietary data (which requires an authorisation credential to access) and also the Auxiliary Channel Three Hour Release (which is freely available).

`TimeSeries.get()`¶

Additional dependencies: LDAStools.frameCPP or nds2

GWpy provides the TimeSeries.get() method as a one-stop interface to all automatically-discoverable data hosted locally at an IGWN computing centre, or available remotely.

How it works¶

Without any customisation, TimeSeries.get() will attempt to locate data ‘by any means necessary’; in practice that is

if the GWDATAFIND_SERVER environment variable (or legacy LIGO_DATAFIND_SERVER variable) points to a server URL, use gwdatafind to identify which data set includes the channel(s) requested, then locate the files for that data set, and then read them,
if that doesn’t work (for any reason), loop through the NDS2 servers identified in the NDSSERVER environment variable to see if they have the data – if the NDSSERVER variable isn’t set, guess which NDS2 server(s) to use based on the interferometer whose data was requested.

Regarding channel names

To use TimeSeries.get(), you need to know the full name of the data channel you want, which is often not obvious. The Auxiliary Channel Three Hour Release includes a link to a full listing of all included channels. For the full proprietary data set, the IGWN Detector Characterisation working group maintains a record of the most relevant channels for studying a given interferometer subsystem.

Example¶

For example, to channel that records the power incident on the input mode cleaner (IMC) at LIGO-Hanford, is called:

H1:IMC-PWR_IN_OUT_DQ

and we can use TimeSeries.get() to ‘get’ the data for that channel by specifying the special GWOSC NDS2 server url using the host keyword:

>>> from gwosc.datasets import event_gps
>>> from gwpy.timeseries import TimeSeries
>>> gps = event_gps("GW170814")
>>> start = int(gps) - 100
>>> end = int(gps) + 100
>>> data = TimeSeries.get("H1:IMC-PWR_IN_OUT_DQ", start, end, host="losc-nds.ligo.org")
>>> plot = data.plot(ylabel="Power [W]")
>>> plot.show()

(png)

Proprietary datasets¶

All data archived at an IGWN computing centre are identified by a data set ‘tag’, which identifies which data are contained in a given gwf file(s). By default, as described, TimeSeries.get() will search through all available data to find the correct files to read, so this may take a while if the server has knowledge of a large number of different datasets. If you know the dataset name – the tag associated with files containing your data – you can pass that via the frametype keyword argument to significantly speed up the search.

The following table is an incomplete, but probably OK, reference to which dataset (frametype) you want to use for file-based data access:

GEO-600

GEO-600 datasets available with `gwdatafind`¶
Dataset (frametype)	Description
`G1_RDS_C01_L3`	The GEO-600 data, including calibrated strain h(t)

LIGO-Hanford

LIGO-Hanford datasets available with `gwdatafind`¶
Dataset (frametype)	Description
`H1_R`	All auxiliary channels, stored at the native sampling rate
`H1_T`	Second trends of all channels, including `.mean`, `.min`, and `.max`
`H1_M`	Minute trends of all channels, including `.mean`, `.min`, and `.max`
`H1_HOFT_C00`	Strain h(t) and metadata generated using the real-time calibration pipeline
`H1_HOFT_CXY`	Strain h(t) and metadata generated using the off-line calibration pipeline at version `XY`
`H1_GWOSC_O2_4KHZ_R1`	4k Hz Strain h(t) and metadata as released by GWOSC for the O2 data release
`H1_GWOSC_O2_16KHZ_R1`	16k Hz Strain h(t) and metadata as released by GWOSC for the O2 data release

LIGO-Livingston

LIGO-Livingston datasets available with `gwdatafind`¶
Dataset (frametype)	Description
`L1_R`	All auxiliary channels, stored at the native sampling rate
`L1_T`	Second trends of all channels, including `.mean`, `.min`, and `.max`
`L1_M`	Minute trends of all channels, including `.mean`, `.min`, and `.max`
`L1_HOFT_C00`	Strain h(t) and metadata generated using the real-time calibration pipeline
`L1_HOFT_CXY`	Strain h(t) and metadata generated using the off-line calibration pipeline at version `XY`
`L1_GWOSC_O2_4KHZ_R1`	4k Hz Strain h(t) and metadata as released by GWOSC for the O2 data release
`L1_GWOSC_O2_16KHZ_R1`	16k Hz Strain h(t) and metadata as released by GWOSC for the O2 data release

Virgo

Virgo datasets available with `gwdatafind`¶
Dataset (frametype)	Description
`raw`	All auxiliary channels, stored at the native sampling rate
`V1O2Repro1A`	Strain h(t) and metadata for Observing run 3 (`O3`) generated off-line using version `1A` calibration; replace `O2` and `1A` as appropriate
`V1_GWOSC_O2_4KHZ_R1`	4k Hz Strain h(t) and metadata as released by GWOSC for the O2 data release
`V1_GWOSC_O2_16KHZ_R1`	16k Hz Strain h(t) and metadata as released by GWOSC for the O2 data release

Not all datasets are available everywhere

Not all datasets are available from all datafind servers. Each LIGO Lab-operated computing centre has its own datafind server with a subset of the available datasets.

LIGO trend data¶

The LIGO observatories produce second- and minute- trends of all channels automatically, and store them in the {H,L}1_T (second) and {H,L}1_M (minute) datasets. However, the channels in each trend type have the same names, so TimeSeries.get() doesn’t know how to distinguish between the two different trends when given only the channel name.

To get around this you can directly specify (e.g.) frametype="H1_T" (for the LIGO-Hanford second trends) in your TimeSeries.get() method call, or you can use a suffix in the channel name:

Channel name suffices for LIGO trends¶
Trend type	Dataset	Suffix
second	`{H,L}1_T`	`,s-trend`
minute	`{H,L}1_M`	`,m-trend`

e.g.

>>> TimeSeries.get("L1:IMC-PWR_IN_OUT_DQ.mean,s-trend", 1186741850, 1186741870)

will specifically access the second trends of power incident on the LIGO-Livingston IMC.

`TimeSeriesDict.get()`¶

TimeSeries.get() can only retrieve data for a single channel at a time. Looping over a list of names to get data for many channels can be very slow, as each individual call will have to discover and read/download the data for each channel individually. The TimeSeriesDict.get() method enables retrieval of multiple channels in a single call, for a single (start, stop) time interval, greatly reducing the I/O overhead.

To access data for multiple channels in this way, just pass a list of names rather than a single name. In this example we download the second trend (average) of ground motion in the 0.03Hz-0.1Hz range at two locations of the LIGO-Hanford observatory:

Warning

This example uses proprietary data that are only available to members of the LIGO Scientific Collaboration and its partners.

>>> from gwpy.timeseries import TimeSeriesDict
>>> data = TimeSeriesDict.get(
...     ["H1:ISI-GND_STS_ITMY_Z_BLRMS_30M_100M.rms,s-trend",
...      "H1:ISI-GND_STS_ETMY_Z_BLRMS_30M_100M.rms,s-trend"],
...     "July 22 2021 12:00",
...     "July 22 2021 14:00",
... )
>>> plot = data.plot(ylabel="Ground motion [nm/s]")
>>> plot.show()

(png)