Filtering tables

To analyse tabular data in detail, it is often useful to extract the rows of a table that meet some criteria; this is called filtering. The EventTable object comes with a filter() method that provides an intuitive interface for down-selecting rows in a table.

To demonstrate, first we create a catalogue of gravitational-wave detections using data available from LOSC:

>>> from gwpy.table import EventTable
>>> table = EventTable.read(
...     """name      gps           m1      m2         snr   distance network
...        GW150914  1126259462.00 36.2    23.7       23.7  420      HL
...        LVT151012 1128678900.44 23      13         9.7   440      HL
...        GW151226  1135136350.65 14.2    7.5        13    880      HL
...        GW170104  1167559936.60 31.2    19.4       13    1000     HL
...        GW170814  1186741861.53 30.5    25.3       18    540      HLV""",
...     format='ascii')

Simple filters

We can then filter the table on snr to select only the loudest events:

>>> print(table.filter('snr > 15'))
  name        gps       m1   m2  snr  distance network
-------- ------------- ---- ---- ---- -------- -------
GW150914  1126259462.0 36.2 23.7 23.7      420      HL
GW170814 1186741861.53 30.5 25.3 18.0      540     HLV
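A string such as 'snr > 15' carries the same three pieces of information as the functional form: a column name, a comparison, and an operand. The sketch below illustrates that correspondence with a small, hypothetical parser built on the standard library; the names parse_simple_filter, OPERATORS and PATTERN are invented for this example, and it is not the parser that GWpy uses.

```python
import operator
import re

# Hypothetical lookup from comparison symbol to function
OPERATORS = {
    ">=": operator.ge, "<=": operator.le,
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, "<": operator.lt,
}
# Two-character symbols are listed first so that '>=' is not read as '>'
PATTERN = re.compile(r"^\s*(\w+)\s*(>=|<=|==|!=|>|<)\s*(.+?)\s*$")


def parse_simple_filter(definition):
    """Split 'column <op> number' into (column, function, operand)."""
    match = PATTERN.match(definition)
    if match is None:
        raise ValueError(f"cannot parse filter {definition!r}")
    column, symbol, operand = match.groups()
    return column, OPERATORS[symbol], float(operand)


# Values copied from the catalogue above
names = ["GW150914", "LVT151012", "GW151226", "GW170104", "GW170814"]
snr = [23.7, 9.7, 13.0, 13.0, 18.0]

column, test, operand = parse_simple_filter("snr > 15")
loud = [n for n, value in zip(names, snr) if test(value, operand)]
print(column, loud)
```

This toy version handles numeric operands only; quoted string operands are left out to keep the sketch short.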

Filter functions

We can also filter the table to find those events from O1 by defining a custom filter function that compares to the start and end GPS times for O1 (taken from the LOSC Data Usage Notes):

>>> from gwpy.time import to_gps
>>> o1start = to_gps("Sep 2015")
>>> o1end = to_gps("Feb 2016")
>>> def in_o1(column, interval):
...     return (column >= interval[0]) & (column < interval[1])
>>> print(table.filter(('gps', in_o1, (o1start, o1end))))
   name        gps       m1   m2  snr  distance network
--------- ------------- ---- ---- ---- -------- -------
 GW150914  1126259462.0 36.2 23.7 23.7      420      HL
LVT151012 1128678900.44 23.0 13.0  9.7      440      HL
 GW151226 1135136350.65 14.2  7.5 13.0      880      HL

The custom filter function can be as complicated as we like, as long as it takes exactly two input arguments: the column array for the relevant column, and the collection of other arguments to work with. It should return a boolean array with one entry per row, as in_o1 does.
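The two-argument contract can be illustrated without GWpy. The following is a minimal sketch that uses a plain dict of numpy arrays as a stand-in for the table; apply_filter and in_range are hypothetical helpers written for this example, not part of the GWpy API.

```python
import numpy

# Stand-in for a table: column name -> array (values from the catalogue above)
catalogue = {
    "name": numpy.array(
        ["GW150914", "LVT151012", "GW151226", "GW170104", "GW170814"]),
    "m1": numpy.array([36.2, 23.0, 14.2, 31.2, 30.5]),
    "snr": numpy.array([23.7, 9.7, 13.0, 13.0, 18.0]),
}


def in_range(column, interval):
    """Filter function: takes the column array and one operand."""
    low, high = interval
    return (column >= low) & (column < high)


def apply_filter(table, column_name, func, operand):
    """Call func(column, operand) and keep the rows where it is True."""
    mask = func(table[column_name], operand)
    return {key: values[mask] for key, values in table.items()}


selected = apply_filter(catalogue, "m1", in_range, (30, 40))
print(selected["name"])
```

Anything the function needs beyond the column itself (here the lower and upper bounds) travels in the single operand, which is why the bounds are packed into a tuple.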

Similarly, we could filter the catalogue to find only those events that include data from the Virgo observatory:

>>> import numpy
>>> print(table.filter(('network', numpy.char.endswith, 'V')))
  name        gps       m1   m2  snr  distance network
-------- ------------- ---- ---- ---- -------- -------
GW170814 1186741861.53 30.5 25.3 18.0      540     HLV

Using multiple filters

Multiple filters can be chained, in either str form or functional form, by passing them as separate arguments; a row is kept only if it passes every filter:

>>> print(table.filter('snr > 15', 'distance > 500'))
  name        gps       m1   m2  snr  distance network
-------- ------------- ---- ---- ---- -------- -------
GW170814 1186741861.53 30.5 25.3 18.0      540     HLV
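Chaining can be pictured as combining one boolean mask per filter. The sketch below reproduces the selection 'snr > 15' together with 'distance > 500' using plain numpy arrays, on the assumption that chained filters are combined with a logical AND; the variable names are chosen for this example only.

```python
import numpy

# Three columns copied from the catalogue above
name = numpy.array(
    ["GW150914", "LVT151012", "GW151226", "GW170104", "GW170814"])
snr = numpy.array([23.7, 9.7, 13.0, 13.0, 18.0])
distance = numpy.array([420, 440, 880, 1000, 540])

# Each simple filter yields one boolean mask over the rows
loud = snr > 15
far = distance > 500

# Chaining keeps only the rows that pass every filter (logical AND)
keep = loud & far
print(name[keep])
```

GW150914 passes the snr cut but fails the distance cut, so only GW170814 survives both.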

Gotchas

The parser used to interpret simple filters does not recognise strings containing alpha-numeric characters as single words, meaning that things like LIGO data channel names will be parsed incorrectly if not quoted. If in doubt, always pass a string in quotes; the parser removes the quotes internally anyway. For example, use channel = "X1:TEST" and not channel = X1:TEST.
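One way to see why quoting matters is to split both forms with Python's standard tokenize module. This is only an illustration, under the assumption that the filter parser breaks a definition into words in a similar way; the helper words is hypothetical.

```python
import io
import tokenize

# Token types that carry content; NEWLINE and ENDMARKER are dropped
KEEP = {tokenize.NAME, tokenize.OP, tokenize.NUMBER, tokenize.STRING}


def words(definition):
    """Split a filter definition into tokens using Python's tokenizer."""
    readline = io.StringIO(definition).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type in KEEP]


unquoted = words('channel = X1:TEST')
quoted = words('channel = "X1:TEST"')
print(unquoted)  # the channel name is broken up at the colon
print(quoted)    # the quoted channel name stays in one piece
```

Without quotes the colon is treated as an operator and the channel name falls apart into three tokens; with quotes it arrives as a single string token.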

Built-in filters

The GWpy package defines a small number of filter functions that implement standard filtering operations used in gravitational-wave data analysis:

in_segmentlist(column, segmentlist)

Return the index of values lying inside the given segmentlist

not_in_segmentlist(column, segmentlist)

Return the index of values not lying inside the given segmentlist
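To make the idea of segment-based filtering concrete, here is a minimal numpy sketch of the kind of test these functions perform. It is not GWpy's implementation: in_segments is a hypothetical stand-in, the two segments are invented for the example, and segments are assumed to be semi-open intervals [start, end).

```python
import numpy

# GPS times copied from the catalogue above
gps = numpy.array([1126259462.00, 1128678900.44, 1135136350.65,
                   1167559936.60, 1186741861.53])

# Invented (start, end) pairs, treated as semi-open intervals [start, end)
segments = [(1126000000, 1129000000), (1186000000, 1187000000)]


def in_segments(column, segments):
    """Return a boolean mask, True where a value lies in any segment."""
    mask = numpy.zeros(len(column), dtype=bool)
    for start, end in segments:
        mask |= (column >= start) & (column < end)
    return mask


inside = in_segments(gps, segments)
outside = ~inside  # the complementary selection
print(inside)
```

Because the function follows the (column, operand) signature, a function of this shape could be passed in the functional filter form, for example as ('gps', in_segments, segments).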