The LIGO Scientific Collaboration uses a service colloquially referred to as ‘datafind’ to index and archive the locations of data files produced at the observatories, including those from other detectors (e.g. GEO600). The datafind service continually updates to provide information with a typical latency of 10 minutes, often much less.
The gwdatafind
package can be used to execute queries against the
datafind server to discover the URLs of data files (typically .gwf
format).
See the documentation for that package for full details.
Users who have access to the LIGO Data Grid – the shared computing infrastructure
that supports internal collaboration data analysis – can use the local datafind
server to query for local files that can be accessed directly.
Other users, including those on the Open Science Grid, can use the server located
at datafind.ligo.org:443
to query for files archived under
CVMFS.
TimeSeries.get()
¶Additional dependencies: LDAStools.frameCPP
To discover and read data automatically, use TimeSeries.get()
, and point
it at a datafind host
:
>>> from gwpy.timeseries import TimeSeries
>>> data = TimeSeries.get("L1:GWOSC-4KHZ_R1_STRAIN", 1187008880, 1187008884,
... host="datafind.ligo.org:443")
>>> print(data)
TimeSeries([-5.98844033e-20, -6.34482794e-20, -6.31740522e-20,
..., 6.23573197e-20, 5.54748519e-20,
5.91121781e-20]
unit: dimensionless,
t0: 1187008880.0 s,
dt: 0.000244140625 s,
name: L1:GWOSC-4KHZ_R1_STRAIN,
channel: L1:GWOSC-4KHZ_R1_STRAIN)
This will execute the following series of steps:
query datafind.ligo.org for the list of datasets it knows about
for each dataset, determine whether 'L1:GWOSC-4KHZ_R1_STRAIN'
is
contained within a representative file, and pick the most appropriate
dataset (if multiple)
query datafind.ligo.org again for the URLs of file paths for the matched dataset name
read each required file and return the data
Note
At the time of writing, all queries to datafind.ligo.org are restricted to persons with a valid LIGO.ORG RFC 3820 (X509) credential.
If any of those steps were to fail, TimeSeries.get()
will automatically
fall back to attempting to use nds2
to access the data.
By default, as described, this method will search through all available data to
find the correct files to read, so this may take a while if the server has
knowledge of a large number of different datasets.
If you know the dataset name – the tag associated with files containing your
data – you can pass that via the frametype
keyword argument to
significantly speed up the search:
>>> data = TimeSeries.get("L1:GWOSC-4KHZ_R1_STRAIN", "17 August 2017 12:42:02",
... "17 August 2017 12:42:06", frametype="L1_GWOSC_O2_4KHZ_R1")
All data recorded by the current generation of detectors are identified
by a dataset tag, which identifies which data are contained in a
given gwf
file.
The following table is an incomplete, but probably OK, reference to which
dataset (frametype
) you want to use for file-based data access:
Dataset (frametype) |
Description |
---|---|
|
All auxiliary channels, stored at the native sampling rate |
|
Second trends of all channels, including
|
|
Minute trends of all channels, including
|
|
Strain h(t) and metadata generated using the real-time calibration pipeline |
|
Strain h(t) and metadata generated using the
off-line calibration pipeline at version |
|
4k Hz Strain h(t) and metadata as released by The Gravitational-Wave Open Science Centre (GWOSC) for the O2 data release |
|
16k Hz Strain h(t) and metadata as released by The Gravitational-Wave Open Science Centre (GWOSC) for the O2 data release |
The above datasets refer to the H1
(LIGO-Hanford) instrument, the same are
available for LIGO-Livingston by substituting the L1
prefix.
Note
Not all datasets are available from all datafind servers. Each LIGO Lab-operated computing centre has its own datafind server with a subset of the available datasets.