Working with frames and series
In this section, you will work with frames and series in the sandbox.
Start the sandbox🔗
In the terminal, type
opensafely exec ehrql:v0 sandbox example-data
and press Enter.
Notice that the command prompt,
or the set of characters before the cursor,
has changed to >>>
.
For the remainder of this section,
when you see >>>
,
you should type the code that follows into the sandbox and press Enter.
Work with patient data🔗
Import the patients
table🔗
>>> from ehrql.tables.beta.core import patients
Work with a patient frame🔗
The patients
table is a patient frame; it has one row per patient.
Notice that all values in the patient_id
column are unique.
>>> patients
patient_id | date_of_birth | sex | date_of_death
------------------+-------------------+-------------------+------------------
0 | 1973-07-01 | female | 2015-09-14
1 | 1948-03-01 | male | None
2 | 2003-04-01 | male | None
3 | 2007-06-01 | female | None
4 | 1938-10-01 | male | 2018-05-23
5 | 1994-04-01 | female | None
6 | 1953-05-01 | male | None
7 | 1992-08-01 | female | None
8 | 1931-10-01 | female | 2017-11-10
9 | 1979-04-01 | male | None
Work with a patient series🔗
The patients.date_of_birth
column is a patient series; it has one row per patient.
>>> patients.date_of_birth
0 | 1973-07-01
1 | 1948-03-01
2 | 2003-04-01
3 | 2007-06-01
4 | 1938-10-01
5 | 1994-04-01
6 | 1953-05-01
7 | 1992-08-01
8 | 1931-10-01
9 | 1979-04-01
Work with event data🔗
Import the medications
table🔗
>>> from ehrql.tables.beta.core import medications
Work with an event frame🔗
The medications
table is an event frame; it has many rows per patient.
Notice that some values in the patient_id
column are not unique.
>>> medications
patient_id | row_id | date | dmd_code
------------------+-------------------+-------------------+------------------
0 | 0 | 2014-01-11 | 39113611000001102
1 | 1 | 2015-08-06 | 39113611000001102
1 | 2 | 2018-09-21 | 39113311000001107
1 | 3 | 2020-05-17 | 22777311000001105
3 | 4 | 2022-11-09 | 22777311000001105
4 | 5 | 2017-05-11 | 39113611000001102
5 | 6 | 2017-07-11 | 3484711000001105
5 | 7 | 2019-07-06 | 39113611000001102
7 | 8 | 2021-01-27 | 3484711000001105
9 | 9 | 2015-03-14 | 3484711000001105
Work with an event series🔗
The medications.date
column is an event series; it has many rows per patient.
>>> medications.date
0 | 0 | 2014-01-11
1 | 1 | 2015-08-06
1 | 2 | 2018-09-21
1 | 3 | 2020-05-17
3 | 4 | 2022-11-09
4 | 5 | 2017-05-11
5 | 6 | 2017-07-11
5 | 7 | 2019-07-06
7 | 8 | 2021-01-27
9 | 9 | 2015-03-14
Transform input data into output data🔗
ehrQL tables, such as patients
and medications
, contain input data.
As a researcher, your task is to transform them into a dataset.
A dataset has one row per patient and contains output data.
Let's look at how we might transform input data into output data.
Transform an event frame into a patient frame🔗
To transform an event frame into a patient frame:
- Sort the event frame
- Select either the first row or the last row in the event frame
>>> medications.sort_by(medications.date).first_for_patient()
patient_id | date | dmd_code
------------------+-------------------+------------------
0 | 2014-01-11 | 39113611000001102
1 | 2015-08-06 | 39113611000001102
3 | 2022-11-09 | 22777311000001105
4 | 2017-05-11 | 39113611000001102
5 | 2017-07-11 | 3484711000001105
7 | 2021-01-27 | 3484711000001105
9 | 2015-03-14 | 3484711000001105
Transform an event frame into another event frame🔗
To transform an event frame into another event frame, filter rows that match or do not match a condition.
Rows that match 100mcg/dose Salbutamol:
>>> medications.where(medications.dmd_code == "39113611000001102")
patient_id | row_id | date | dmd_code
------------------+-------------------+-------------------+------------------
0 | 0 | 2014-01-11 | 39113611000001102
1 | 1 | 2015-08-06 | 39113611000001102
4 | 5 | 2017-05-11 | 39113611000001102
5 | 7 | 2019-07-06 | 39113611000001102
Rows that do not match 100mcg/dose Salbutamol:
>>> medications.except_where(medications.dmd_code == "39113611000001102")
patient_id | row_id | date | dmd_code
------------------+-------------------+-------------------+------------------
1 | 2 | 2018-09-21 | 39113311000001107
1 | 3 | 2020-05-17 | 22777311000001105
3 | 4 | 2022-11-09 | 22777311000001105
5 | 6 | 2017-07-11 | 3484711000001105
7 | 8 | 2021-01-27 | 3484711000001105
9 | 9 | 2015-03-14 | 3484711000001105
Extract a series of years from a series of dates🔗
To extract a series of years from a series of dates,
append .year
to the series of dates.
>>> patients.date_of_birth.year
0 | 1973
1 | 1948
2 | 2003
3 | 2007
4 | 1938
5 | 1994
6 | 1953
7 | 1992
8 | 1931
9 | 1979
Add one or more years to a series of dates🔗
To add one or more years to a series of dates,
use the years
function.
>>> from ehrql import years
>>> patients.date_of_birth + years(1)
0 | 1974-07-01
1 | 1949-03-01
2 | 2004-04-01
3 | 2008-06-01
4 | 1939-10-01
5 | 1995-04-01
6 | 1954-05-01
7 | 1993-08-01
8 | 1932-10-01
9 | 1980-04-01