How does health data turn into a research dataset?
In a research dataset, an individual’s data becomes one small part of a much larger whole. Instead of focusing on a single person’s information, researchers are interested in results drawn from the entire dataset, such as averages and distributions.
However, before research can begin, safeguards must be in place to ensure that the data is handled safely and responsibly. The following sections explain how this is done in practice in Finland.
Applying for a data permit – defining the dataset and purpose of use
When social and health care registry data is needed for research, the researcher applies to the authority for a fixed-term permit to use the registry data. This is called a data permit.
The researcher must define exactly what data will be included in the dataset and for what purpose it will be used. In practice, this means the researcher describes in detail in their application what the research aims to find out and what data is needed to achieve this.
Rather than giving the researcher permission to process an individual’s entire medical history, the permit is granted for precisely defined variables, such as specific laboratory values, procedure codes, or medication data.
This ensures that the researcher receives exactly the information the study needs – no more, no less.
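As a rough illustration of limiting a dataset to the permitted variables, the sketch below keeps only the fields named in a permit. The variable names and values are invented for this example and do not come from any real permit or registry.

```python
# Hypothetical list of variables named in a data permit (assumption for illustration).
PERMITTED_VARIABLES = {"subject_code", "lab_value", "procedure_code", "medication_code"}


def restrict_to_permit(record: dict, permitted: set) -> dict:
    """Return a copy of the record that contains only the permitted variables."""
    return {name: value for name, value in record.items() if name in permitted}


# Artificial record: it holds more information than the study needs.
full_record = {
    "subject_code": "A17",
    "lab_value": 5.2,
    "procedure_code": "PROC-1",
    "medication_code": "MED-1",
    "unrelated_diagnosis": "DIAG-9",
    "home_address": "Example Street 1",
}

extract = restrict_to_permit(full_record, PERMITTED_VARIABLES)
print(extract)
```

Everything outside the permitted list, such as the address and the unrelated diagnosis, is simply left out of the extract.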
Pseudonymisation – making identification difficult
Before a dataset is released to a researcher, it is processed so that identifying an individual becomes significantly more difficult. Direct identifiers, such as names and personal identity codes, are removed and replaced with a code.
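One simple way to picture this step is a lookup table that assigns each person a random code and drops the direct identifiers. The sketch below is only an assumption about how such a step could look; the field names and the code format are invented, and the source does not describe the actual method used.

```python
import secrets

# Hypothetical names of the direct identifier fields (assumption for illustration).
DIRECT_IDENTIFIERS = ("name", "personal_identity_code")


def pseudonymise(records, id_field="personal_identity_code"):
    """Replace direct identifiers with a random code, one code per person."""
    code_by_id = {}  # the key table, which is kept away from the researcher
    output = []
    for record in records:
        person_id = record[id_field]
        if person_id not in code_by_id:
            code_by_id[person_id] = secrets.token_hex(8)
        cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        cleaned["subject_code"] = code_by_id[person_id]
        output.append(cleaned)
    return output, code_by_id


# Artificial records: two rows belong to the same invented person.
raw = [
    {"name": "Person One", "personal_identity_code": "FAKE-0001", "lab_value": 5.2},
    {"name": "Person One", "personal_identity_code": "FAKE-0001", "lab_value": 4.8},
    {"name": "Person Two", "personal_identity_code": "FAKE-0002", "lab_value": 2.1},
]
pseudonymised, key_table = pseudonymise(raw)
```

Rows that belong to the same person still share one code, so a person's results can be followed over time without revealing who they are.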
Furthermore, precise details can be generalised. For example, a region may be given instead of a postcode, or only the birth year instead of the full date of birth.
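Generalisation of this kind can be sketched in a few lines. The postcode lookup below is a small illustrative table made up for this example, not an official mapping, and the field names are assumptions.

```python
from datetime import date

# Illustrative lookup only; a real mapping would cover every postcode.
REGION_BY_POSTCODE = {"33100": "Pirkanmaa", "28100": "Satakunta"}


def generalise(record: dict) -> dict:
    """Swap the postcode for a region and the date of birth for a birth year."""
    generalised = dict(record)  # copy, so the input record is left unchanged
    postcode = generalised.pop("postcode")
    generalised["region"] = REGION_BY_POSTCODE.get(postcode, "Unknown")
    generalised["birth_year"] = generalised.pop("date_of_birth").year
    return generalised


# Artificial record.
detailed = {"subject_code": "A17", "postcode": "33100", "date_of_birth": date(1980, 5, 17)}
coarse = generalise(detailed)
print(coarse)
```

The coarser values are still useful for comparing regions and age groups, but they point to far fewer specific people.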
Can you recognise the health data in the image?
This is an example of what registry data used for research might look like. The data in the image is artificial and not based on real individuals.
Defining data processors – access only for named individuals
The data permit names the specific individuals who have the right to process the dataset. No one else can access the data. If there are changes to the research group, new researchers must separately apply for permission to process the data.
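The idea of a fixed-term permit with named processors can be modelled as a simple check. This is a minimal sketch under stated assumptions: the `DataPermit` class, its fields, and the names are all hypothetical and do not describe any real system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataPermit:
    """Hypothetical model of a permit: who may process the data, and until when."""
    named_processors: frozenset
    valid_until: date

    def allows(self, person: str, on_day: date) -> bool:
        # Access requires both a named processor and an unexpired permit.
        return person in self.named_processors and on_day <= self.valid_until


permit = DataPermit(
    named_processors=frozenset({"Researcher A", "Researcher B"}),
    valid_until=date(2026, 12, 31),
)

print(permit.allows("Researcher A", date(2025, 6, 1)))
print(permit.allows("Researcher C", date(2025, 6, 1)))
```

In this model a new group member such as "Researcher C" is refused until a permit that names them has been granted.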
In the next section, we will explain where and how researchers process the research data.
What happens to Emma’s data before research?
A research group of doctors is starting a registry study on the treatment of hypothyroidism in the Pirkanmaa and Satakunta regions in Finland. Their goal is to improve the care pathway for patients with hypothyroidism by comparing care practices in these regions over the past twenty years.
They apply for access to a dataset containing information on individuals who have been diagnosed with hypothyroidism in the Pirkanmaa region. This dataset also includes Emma’s medical records, laboratory results, and prescription data.
In the data permit application, the research group justifies what data they need and for what purpose. The research does not require the patients’ exact dates of birth or addresses, so the permit is granted only for the birth year and city of residence.
In the research dataset, Emma is no longer an identifiable person but part of a group:
“Women aged 40–49 living in Pirkanmaa with hypothyroidism.”
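A group description like this one could be derived from the generalised data roughly as follows. The records, field names, and reference year are invented for illustration. Age is estimated from the birth year alone, so it can be off by one year.

```python
from collections import Counter


def age_band(birth_year: int, reference_year: int) -> str:
    """Return a ten-year age band such as '40-49', estimated from the birth year."""
    age = reference_year - birth_year
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"


def group_label(record: dict, reference_year: int) -> tuple:
    """Describe a record only by sex, age band and region."""
    return (record["sex"], age_band(record["birth_year"], reference_year), record["region"])


# Artificial records, already pseudonymised and generalised.
records = [
    {"sex": "F", "birth_year": 1980, "region": "Pirkanmaa"},
    {"sex": "F", "birth_year": 1976, "region": "Pirkanmaa"},
    {"sex": "M", "birth_year": 1962, "region": "Pirkanmaa"},
]

group_sizes = Counter(group_label(r, reference_year=2024) for r in records)
print(group_sizes)
```

The researcher works with the size and characteristics of each group, not with the individuals in it.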
