Filtering tables¶
In order to perform detailed analysis of tabular data, it is useful to
extract portions of a table based on some criteria, this is called filtering.
The EventTable object comes with a filter() method
that provides an intuitive interface to down-selecting rows in a table.
To demonstrate, we can download GWTC-2 from GWOSC:
>>> from gwpy.table import EventTable
>>> events = EventTable.fetch_open_data("GWTC-2")
>>> print(events)
name chi_eff_upper ... GPS final_mass_source_upper
... solMass
------------------ ------------- ... ------------ -----------------------
GW190408_181802-v1 0.14 ... 1238782700.3 3.9
GW190412-v3 0.08 ... 1239082262.2 3.9
GW190413_052954-v1 0.29 ... 1239168612.5 12.5
... ... ... ... ...
GW190924_021846-v1 0.3 ... 1253326744.8 5.2
GW190929_012149-v1 0.34 ... 1253755327.5 33.6
GW190930_133541-v1 0.31 ... 1253885759.2 9.2
Length = 39 rows
Simple filters¶
The simplest EventTable filter is a str statement that provides
a mathematical operation for a column and a threshold.
With the above GWTC-2 events, we can use the filter
"network_matched_filter_snr > 15" to pick out those events with
high signal power:
EventTable using a str definition¶>>> print(events.filter("network_matched_filter_snr > 15"))
name chi_eff_upper ... GPS final_mass_source_upper
... solMass
------------------ ------------- ... ------------ -----------------------
GW190412-v3 0.08 ... 1239082262.2 3.9
GW190521_074359-v1 0.1 ... 1242459857.5 6.5
GW190630_185205-v1 0.12 ... 1245955943.2 4.4
GW190814-v2 0.06 ... 1249852257.0 1.1
GW190828_063405-v1 0.15 ... 1251009263.8 7.2
Filter functions¶
More complicated filtering can be achieved by defining a function that
takes in two arguments - the first being the column slice of the input table,
the second can be whatever you want - and returns a boolean array.
The EventTable.filter() method is then called passing in a filter
3-tuple with these elements
the column name (
str) or a tuple of namesthe function to call
the other argument(s) for the function (normally a single value, or a
tupleof arguments)
If a single column name is given as the first tuple element, the function will
receive a single Column as the input.
If a tuple of names is given, the input will be a slice of the original table
containing only the named columns.
Using the same events table we can define a function to include only
those events in the first six months of 2019:
EventTable using a filter function¶>>> from gwpy.time import to_gps
>>> start = to_gps("Jan 2019")
>>> end = to_gps("Jul 2019")
>>> def q12_2019(column, interval):
... """Returns `True` if ``interval[0] <= column < interval[1]``
... """
... return (column >= interval[0]) & (column < interval[1])
>>> print(events.filter(('GPS', q12_2019, (start, end))))
name chi_eff_upper ... GPS final_mass_source_upper
... solMass
------------------ ------------- ... ------------ -----------------------
GW190408_181802-v1 0.14 ... 1238782700.3 3.9
GW190412-v3 0.08 ... 1239082262.2 3.9
GW190413_052954-v1 0.29 ... 1239168612.5 12.5
... ... ... ... ...
GW190706_222641-v1 0.26 ... 1246487219.3 18.3
GW190707_093326-v1 0.1 ... 1246527224.2 1.9
GW190708_232457-v1 0.1 ... 1246663515.4 2.5
Length = 24 rows
The custom filter function could have been as complicated as we liked, as long as the two (and only two) input arguments were the column array for the relevant column, and the collection of other arguments to work with. For example could filter the table to return only those events with high mass ratio:
EventTable using multiple columns¶>>> def high_mass_ratio(table, threshold):
... """Returns `True` if ``mass_1_source / mass_2_source >= threshold``
... """
... return (table['mass_1_source'] / table['mass_2_source']) >= threshold
>>> print(events.filter((('mass_1_source', 'mass_2_source'), high_mass_ratio, 3.0)))
name chi_eff_upper ... GPS final_mass_source_upper
... solMass
------------------ ------------- ... ------------ -----------------------
GW190412-v3 0.08 ... 1239082262.2 3.9
GW190426_152155-v1 0.32 ... 1240327333.3 None
GW190814-v2 0.06 ... 1249852257.0 1.1
GW190929_012149-v1 0.34 ... 1253755327.5 33.6
Using multiple filters¶
Filters can be chained (either in str form, or functional form):
EventTable.filter()¶>>> print(events.filter("network_matched_filter_snr > 15", "luminosity_distance > 1000"))
name chi_eff_upper ... GPS final_mass_source_upper
... solMass
------------------ ------------- ... ------------ -----------------------
GW190521_074359-v1 0.1 ... 1242459857.5 6.5
GW190828_063405-v1 0.15 ... 1251009263.8 7.2
Gotchas¶
The parser used to interpret simple filters doesn’t recognise strings
containing alpha-numeric characters as single words, meaning things like
LIGO data channel names will get parsed incorrectly if not quoted.
So, if in doubt, always pass a string in quotes; the quotes will get removed
internally by the parser anyway. E.g., use channel = "X1:TEST" and not
channel = X1:TEST.
Built-in filters¶
The GWpy package defines a small number of filter functions that implement standard filtering operations used in gravitational-wave data analysis:
|
Return the index of values lying inside the given segmentlist. |
|
Return the index of values not lying inside the given segmentlist. |