Updated 27.06.2024

Ready-made datasets

Our goal is to offer customers ready-made datasets for specific thematic areas. Currently we offer a ready-made dataset collection consisting of the registry data collected in the FinRegistry research project, and a COVID-19-themed ready-made dataset.

Ready-made datasets are, as the name suggests, pre-compiled and pre-processed datasets that are available more quickly, without the need for cost estimates or extraction fees from controllers.

How to apply for a ready-made dataset?

A data permit for a ready-made dataset is applied for in the same way as for other data. Unlike other data, the COVID-19 dataset cannot be tailored or modified according to the customer’s needs.

How much does a ready-made dataset cost?

A data permit for ready-made datasets costs 300 euros. If other data is combined with the material, the normal data permit fee applies.

Available datasets

Take a look at the FinRegistry registry-based and COVID-19-themed ready-made datasets.

How to apply for a ready-made dataset?

Apply for the ready-made dataset by using the data permit application form in Findata’s e-service (asiointi.findata.fi).

Unlike other data, the COVID-19 dataset cannot be tailored or modified according to the customer’s needs. The FinRegistry dataset has no such predefined modules, and the extraction is tailored to the needs of the applicant.

The data to be handed over are pseudonymized separately for each customer and permit. Ready-made datasets are individual-level data that can only be analyzed in a secure processing environment that meets the requirements. The primary processing environment is Findata’s Kapseli.

How much does a ready-made dataset cost?

A data permit for one ready-made dataset or an extraction from it costs 300 euros.

For the FinRegistry data, Findata’s extraction costs are charged in addition to the data permit fee, based on the amount of work involved. Findata’s hourly rate is EUR 147.

The COVID-19 material consists of pre-assembled modules, so no extraction costs are charged. Delivery of the COVID-19 dataset to Kapseli is also free of charge. If you want to analyze the data in another environment, two working hours, i.e. EUR 294.00 (VAT 0%), will be charged for delivery. We will inform the client of the costs if the workload estimate exceeds two hours.

If you want to combine other data with the ready-made dataset, the price and processing time of the normal data permit will apply.
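The pricing rules above can be summarized in a short calculation. The following Python sketch is illustrative only: the function name, the dataset labels and the parameters are hypothetical and are not part of any Findata service. It covers only a permit for a single ready-made dataset without other combined data, and it does not model delivery work beyond the two hours mentioned above.

```python
PERMIT_FEE_EUR = 300          # data permit for one ready-made dataset
HOURLY_RATE_EUR = 147         # Findata's hourly rate
EXTERNAL_DELIVERY_HOURS = 2   # COVID-19 data delivered outside Kapseli


def estimate_cost(dataset, extraction_hours=0, delivered_to_kapseli=True):
    """Rough cost estimate in EUR (hypothetical helper, not an official tool)."""
    if dataset == "finregistry":
        # Permit fee plus extraction work billed by the hour.
        return PERMIT_FEE_EUR + extraction_hours * HOURLY_RATE_EUR
    if dataset == "covid19":
        # Pre-assembled modules: no extraction costs.
        # Delivery to Kapseli is free; otherwise two working hours are charged.
        delivery = 0 if delivered_to_kapseli else EXTERNAL_DELIVERY_HOURS * HOURLY_RATE_EUR
        return PERMIT_FEE_EUR + delivery
    raise ValueError(f"unknown dataset: {dataset!r}")


print(estimate_cost("covid19"))                              # 300
print(estimate_cost("covid19", delivered_to_kapseli=False))  # 594
print(estimate_cost("finregistry", extraction_hours=3))      # 741
```

The three-hour FinRegistry extraction in the last line is an invented example; the actual amount of work is determined by Findata case by case.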

The price is based on the regulation of the Ministry of Social Affairs and Health. The current prices are valid until 31 December 2024.

See the Pricing page for more information.

Available datasets

FinRegistry dataset

More detailed description: Aineistokatalogi.fi

The FinRegistry dataset consists of the registry data collected in the FinRegistry research project and the research data generated from them. The material includes data from the Digital and Population Data Services Agency (DVV), the Finnish Cancer Registry, the Finnish Centre for Pensions (ETK), Kanta Services, Kela, THL and Statistics Finland. It contains over 20 datasets and covers data from several decades.

There are three different types of datasets in Findata’s ready-made dataset collection:

  1. datasets created in the project, with a completely new file structure,
  2. datasets modified in the project, with file structures similar to the original datasets and
  3. datasets covering the original data collected for the project.

Type 3 datasets are included in Findata’s ready-made materials only when the corresponding type 2 datasets are not.

Findata’s FinRegistry ready-made material will be compiled gradually during spring 2024, starting with type 1 and type 2 datasets and progressing to type 3 datasets. The source data of type 3 datasets have already been described in the Data Resources Catalog by the original controller. However, the data collected for the FinRegistry project typically contain fewer variables than the original data.

Datasets per controller

Type 1:

  • Minimal phenotype, Detailed longitudinal

Type 2:

  • Digital and Population Data Services Agency: Pedigree, Relative pairs, Relatives, Marriages, Living history
  • Finnish Centre for Pensions: Unpaid periods and benefit periods under VEKL, Pension-insured earnings, Earnings-related pensions
  • Kanta Services: Patient Data Repository: Laboratory results
  • The Finnish Institute for Health and Welfare: Children born, Vaccinations, Infectious diseases, Malformations, Social assistance, Social welfare

Type 3:

  • Finnish Cancer Registry: Cancer
  • Statistics Finland: Causes of death
  • The Social Insurance Institution of Finland: Dispensed medicines reimbursable under the National Health Insurance scheme, Entitlements to reimbursement of pharmaceutical expenses
  • Kanta Services: Kanta Prescription Centre: Prescriptions, Dispensed medicines

Contrary to previously given information, the following source datasets collected by the FinRegistry research project have not been included in Findata’s FinRegistry ready-made dataset due to their size and structure: Primary health care visits, Health care, Intensive care.

Code lists in English can be found in the corresponding dataset descriptions in the National Data Catalogue (aineistokatalogi.fi) produced by the FinRegistry research project. Links to these are included in Findata’s dataset descriptions.

COVID-19 ready-made dataset

More detailed description: Aineistokatalogi.fi

The COVID-19 dataset contains data from four controllers: the Finnish Institute for Health and Welfare (THL), Kela/Kanta, Fimea and Statistics Finland. The target group is formed based on THL’s Infectious Disease Register. The data includes people who fell ill with COVID-19 in the HUS area in 2020–2021.

Data contents per controller

  • Fimea: information on side effects of COVID-19 vaccinations
  • THL:
    • primary healthcare and specialist healthcare information (Hilmo and Avohilmo registers) on COVID-19-related reception visits and ward treatment periods
    • various background information and more detailed information about COVID-19 from the Infectious Disease Register
  • Kela/Kanta: comprehensive COVID-19 vaccination information
  • Statistics Finland: cause of death data

Modules

You can apply for a permit either for the entire dataset or for a combination of the modules below.

  • Annual packages
    • 2020
    • 2021
  • Age limit
    • 0–15 year olds
    • 16 years and older

Basic information

Findata’s ready-made dataset: COVID-19                                N       %
Cohort size                                                     138 396
Male                                                             69 843   50.47
Female                                                           68 553   49.53
A diagnosis of COVID-19 in 2020 (a)                              20 755   15.00
A diagnosis of COVID-19 in 2021 (a)                             118 217   85.42
Those who received a positive diagnosis by age group in 2020
  0–15                                                            2 379   11.46
  16+                                                            18 376   88.54
Those who received a positive diagnosis by age group in 2021
  0–15                                                           27 040   22.87
  16+                                                            91 177   77.13
Those who died during the follow-up period (b)                    1 183    0.85

(a) Some of the persons included in the material were diagnosed with COVID-19 in both 2020 and 2021.
(b) All causes of death
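The figures in the table can be cross-checked with simple arithmetic. The Python sketch below is only an illustration: the variable and function names are invented, and it assumes that every person in the cohort was diagnosed in at least one of the two years. Under that assumption, the overlap between the two yearly counts gives the number of people diagnosed in both years.

```python
COHORT_SIZE = 138_396
DIAGNOSED = {2020: 20_755, 2021: 118_217}
BY_AGE = {
    (2020, "0-15"): 2_379,
    (2020, "16+"): 18_376,
    (2021, "0-15"): 27_040,
    (2021, "16+"): 91_177,
}


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage with two decimals."""
    return round(100 * part / whole, 2)


# The age groups of each year add up to that year's total.
for year, total in DIAGNOSED.items():
    assert sum(n for (y, _), n in BY_AGE.items() if y == year) == total

# The yearly totals exceed the cohort size because some people appear in both years.
both_years = sum(DIAGNOSED.values()) - COHORT_SIZE

print(percent(DIAGNOSED[2020], COHORT_SIZE))              # 15.0
print(percent(BY_AGE[(2021, "0-15")], DIAGNOSED[2021]))   # 22.87
print(both_years)                                         # 576
```

The yearly percentages are shares of the whole cohort, which is why they sum to more than 100, while the age-group percentages are shares of the diagnoses in the year in question.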

More information

Peija Haaramo

Chief Metadata Specialist