Producing anonymous results

On this page, you will find instructions for producing anonymous results and examples to support the application of the instructions. At the end of the page, you will find guidance on publishing results from Kapseli or another secure processing environment.

Instructions for producing anonymous results

Introduction

If you process personal data, you must produce the results in an anonymous format that does not reveal any information about individuals or their characteristics.

Findata ensures the anonymity of published results. This requirement applies to all results produced from materials that have been authorised under the Act on the Secondary Use of Health and Social Data and processed in a secure processing environment in accordance with said Act. We do not charge a fee for anonymity verification.

Results must be produced so that they comply with these instructions regarding different result types. The aim is to ensure the data protection of published results.

These instructions contain a list of the most common result types and their related disclosure risk. The list is not exhaustive, and the disclosure risk is also affected by the properties of the data and the content of its variables. The anonymity of some result types can be established fairly easily, while other types may require more detailed examination.

When producing results, you have to assess whether the definition of anonymous data is met. Submit your results for verification only once you are confident that they are anonymous.

Verifying the anonymity of results consists of:

  1. These instructions on producing anonymous results
  2. The summary form submitted to Findata alongside the results, and
  3. Checks carried out by Findata.

Fill in the summary form carefully. If you are publishing results from Kapseli, the form is available on the D drive. If you are using another secure processing environment, you can download the form from the link above. The verification of anonymity at Findata relies heavily on the information you provide in the summary form. If needed, we are available to advise you on producing anonymous results.

If you cannot produce results in accordance with these instructions but believe they are anonymous, you must justify their anonymity in the summary form. Note that results needing case-by-case assessment might take longer to verify. The justification you submit to Findata must concern anonymity only, not matters such as the importance of getting the research published or the significance of the research.

What does anonymisation mean?

Anonymisation is the process of ensuring that material cannot be used to

  • identify any individual directly or indirectly
  • draw conclusions about a specific individual
  • link data concerning an individual to other materials.

Anonymised material must be impossible or extremely difficult to revert to an identifiable format that could be used to identify a specific individual.

When materials are disclosed under the Secondary Use Act, any subsequent published results must be anonymous (Act on the Secondary Use of Health and Social Data, section 52). If your research requires publishing results that cannot be anonymised, you should take this into account already when planning the research and consider the option of conducting the research under some other legislation than the Secondary Use Act.

Results must be anonymised in a way appropriate to the result type. This can be done, for example, by:

  • using sufficiently coarse categories,
  • censoring, editing or deleting data involving a disclosure risk, or
  • adding uncertainty to the data.

Even if an individual result is anonymous, there is a risk of disclosing an individual person when multiple results are combined. A typical example is producing several frequency tables that present different data but use the same background variable classifications. These tables can be combined into a more detailed frequency table, which may reveal additional information about the observations in the tables.

When assessing disclosure risk, consider how your results might be combined with current, past, or future analyses. If you are aware of prior publications based on the same or nearly the same material or a subset of it, these must be taken into account.
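The risk of combining tables described above can be illustrated with a minimal sketch. It assumes two hypothetical frequency tables that share the same age classification; the counts, category labels and function names are invented for illustration and do not come from any real material.

```python
def difference_table(table_a, table_b):
    """Subtract table_b from table_a cell by cell (same categories assumed)."""
    return {key: table_a[key] - table_b[key] for key in table_a}


def small_cells(table, threshold=3):
    """Return the categories whose non-zero count is below the threshold."""
    return [key for key, n in table.items() if 0 < n < threshold]


# Hypothetical counts: both tables pass the threshold on their own.
all_patients = {"under 40": 14, "40-64": 25, "65 and over": 31}
treated = {"under 40": 12, "40-64": 20, "65 and over": 26}

# Anyone holding both tables can derive a third one by subtraction.
untreated = difference_table(all_patients, treated)
print(untreated)               # {'under 40': 2, '40-64': 5, '65 and over': 5}
print(small_cells(untreated))  # ['under 40']
```

Each published table is fine in isolation, yet the derived table contains a cell based on two persons. A check of this kind only covers the combinations you think to test, so it supports the assessment rather than replacing it.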

To ensure anonymity, the starting point is that results are based on data from several observations and that the minimum frequency in the results is three.

Results, such as table cell or group data, must be based on at least three observations.

The use of this threshold criterion is a clear and simple way to reduce the risk of disclosing an individual and ensure the data protection of results. However, you must assess whether the minimum frequency is sufficient for your particular results. Using the threshold value of three does not necessarily guarantee the anonymity of results. For example, if you publish cells showing 0% and 100% (all or none of the individuals have a specific feature), each case must be assessed separately. If you censor a cell with a low cell-specific frequency, make sure that the censored value cannot be calculated using the other values in the table.
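As a rough illustration of the threshold rule, the sketch below censors small counts and flags rows where one category contains every observation. The function names are hypothetical, and leaving zeros visible is an assumption that follows the general remark in Example 1 that a zero does not usually refer to any individual.

```python
THRESHOLD = 3  # minimum frequency named in these instructions


def censor_small_counts(counts, threshold=THRESHOLD):
    """Replace non-zero counts below the threshold with a '<threshold' label."""
    censored = []
    for n in counts:
        if 0 < n < threshold:
            censored.append(f"<{threshold}")
        else:
            censored.append(n)
    return censored


def has_all_or_none_cell(counts):
    """Flag rows where a single category holds every observation (0% / 100%)."""
    total = sum(counts)
    return total > 0 and any(n == total for n in counts)


row = [0, 12, 1]
print(censor_small_counts(row))         # [0, 12, '<3']
print(has_all_or_none_cell(row))        # False
print(has_all_or_none_cell([0, 9, 0]))  # True
```

Censoring single cells in this way is only the first step: whether the censored value can be recalculated from the other figures still has to be checked separately.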

In addition to cell-specific frequencies, you must also consider the size of the cell-specific population. This refers to the group of individuals who could fall into the cell based on its defining characteristics (e.g. persons of a certain age living in a particular municipality). A small cell-specific population increases the risk of identifying individuals.

Findata requires sufficient background information for each type of analysis to verify anonymity. This information, such as the number of observations that the results are based on, must be included in the summary form.

Descriptive indicators and analyses

The principles described in the table below form the basis of our anonymity verification process.

Result type | Disclosure risk | Notes
Descriptive analyses and indicators
Frequencies | Usually high | Can be published in principle when the minimum frequency in the results is at least three. See also Example 1.
Maximum, minimum, range | Usually high | Can be published in principle when the value of the indicator is based on several observations or if disclosing the individual person can be prevented in some other way. Usually, the minimum and maximum each refer to an individual observation, so these values may pose a risk of disclosure. You can improve the anonymity of your results by dividing your data into categories, as these categories will then include several individuals. Consider using suitable quantiles instead of minimum and maximum figures. See also Example 2.
Fractiles (quantiles, deciles, percentiles, median) | Usually high | Can be published in principle when the number of background observations is large enough.
Mode | Low | Can be published in principle, provided that the members of the group do not all have the same value.
Mean, standard deviation | Usually high | Can be published in principle, but check that the result represents a sufficiently large group and that the entire target group does not have the same value. In addition, make sure not to report indicators from several nearly identical groups or subgroups.
Indices, ratios, indicators | Low | Can be published in principle, but remember to take your calculation formula into account. Indices produced with more complex formulas (e.g. the Fisher price index) do not usually pose a disclosure risk, but very simple formulas are more prone to that risk, in which case your figures must be based on a sufficient number of observations.
Degrees of concentration | Low | Can be published in principle when the group under consideration contains a sufficient number of observations.
Higher moment indicators (variance, covariance, kurtosis, skewness) | Low | Can be published in principle because the indicator has been clearly converted from the original individual values. Make sure not to publish an excessive number of indicators from a small group, as they could serve to disclose the entire group.
Graphs: visual representations of the original material | Usually high | For more detailed instructions, see the Graphs section.
Correlations and regression-type analyses
Regression coefficients | Low | Can be published in principle.
Estimation residuals | Usually high | Residuals refer to a single observation, so there is a disclosure risk. For more detailed instructions, see the Graphs section.
Estimation summaries and test statistics (t, F, R², χ², etc.) | Low | Can be published in principle when the group under consideration contains a sufficient number of observations.
Correlation coefficients | Low | Can be published in principle when the group under consideration contains a sufficient number of observations.
Factor analysis | Low | Can be published in principle, but make sure that your factors are not based on a single variable.
Principal component analysis | Low | Principal component vectors and their corresponding values can be published in principle. Make sure to check the projections onto the principal components, because they correspond to a scatter plot. For more detailed instructions, see the Graphs section.
Correspondence analysis | Low | Can be published in principle.
Table 1. Classification of result types according to disclosure risk

The terms “group” and “target group” refer to the observations from which indicators are calculated.

Graphs

The data protection assessment process for graphs relies on aggregated tabular presentations, as they make it easier to perceive the frequency of the observations underlying the points or plots in a graph, which would usually be impossible to discern from the graph itself. If necessary, a table specifying the background of your result should be provided with the graph if said graph is used to depict individual observations or a small target group.

Distribution graphs, histogram

Distribution graphs require special attention to outliers, as they may pose a disclosure risk.

This is particularly challenging at the tails of a normal distribution, where it may not be possible to include the entire tail in the graph. For histograms, ensure the material is classified so that each class contains a sufficient number of observations. This instruction corresponds to the principles that apply to descriptive statistics.

Bar chart

Bar charts must be prepared so that each category includes a sufficient number of observations. This instruction corresponds to the principles that apply to descriptive statistics.

Scatter chart, scatter plot

As a rule, each individual point in a scatter chart is based on a single unit. As such, these charts cannot be published unless the data has been grouped so that each point represents multiple observations.

Scatter charts may only be published if the underlying data would meet anonymisation requirements in tabular form. The assessment must also consider whether the combination of variables used could enable individual identification.

You can improve the anonymity of scatter charts by replacing them with a graph depicting the frequency of observations in grid cells or by adding randomness to your points.
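One way to read the grid-cell suggestion above is sketched below: points are counted per square grid cell, and only cells with enough observations are kept. The variable pairs, the cell size of 10 and the function names are assumptions made for illustration.

```python
from collections import Counter


def grid_counts(points, cell_size):
    """Count observations per square grid cell of the given size."""
    return Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)


def publishable_cells(counts, threshold=3):
    """Keep only grid cells that hold at least `threshold` observations."""
    return {cell: n for cell, n in counts.items() if n >= threshold}


# Hypothetical (age, systolic blood pressure) pairs.
points = [(41, 118), (43, 112), (47, 115), (62, 141), (44, 119), (78, 165)]
counts = grid_counts(points, cell_size=10)
print(publishable_cells(counts))  # {(4, 11): 4}
```

Cells below the threshold are simply dropped here. In practice they might instead be merged with neighbouring cells or censored, and the choice of cell size would need its own justification.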

Box plots

Box plots pose a disclosure risk by default, as they contain points referring to individual observations. Outliers in particular pose an identification risk. The use of means may also increase the risk of identification. This instruction corresponds to the principles that apply to descriptive statistics.

Residuals

Residuals refer to individual observations. When visualising residuals, prefer a graph format that does not rely on individual points. If a graph based on individual points is used, avoid displaying the axis values.

Survival analysis, Kaplan-Meier curve

The disclosure risk in survival analysis depends on how the analysis is defined. In principle, the results can be published if each step in the curve corresponds to a sufficient number of observations.

Individual steps may also be allowed if it is clear that the data underlying the curve cannot be used to infer precise ages or calendar dates. In such cases, however, you must assess how well the background information identifies individuals.

See also Example 3.
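A simple way to check the number of observations behind each step is sketched below. It assumes that event times are whole months counted from 1, that "sufficient" means at least three events per step, and that censored follow-up times are ignored; all of these are simplifications, and the function names are hypothetical.

```python
from collections import Counter


def events_per_step(event_times):
    """Count how many events share each time point (one step per time point)."""
    return dict(sorted(Counter(event_times).items()))


def small_steps(event_times, threshold=3):
    """Return the time points whose step is based on fewer than `threshold` events."""
    return [t for t, n in events_per_step(event_times).items() if n < threshold]


def coarsen(event_times, width):
    """Round event times up to the end of fixed-width intervals (e.g. 6 months)."""
    return [((t - 1) // width + 1) * width for t in event_times]


# Hypothetical event times in months since the start of follow-up.
months = [1, 2, 2, 3, 5, 6, 6, 7, 9, 11, 12, 12]
print(small_steps(months))              # [1, 2, 3, 5, 6, 7, 9, 11, 12]
print(small_steps(coarsen(months, 6)))  # []
```

With monthly resolution every step rests on one or two events, whereas six-month intervals give steps based on seven and five events.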

Spatial analysis

Spatial analysis is particularly challenging in terms of data protection, as location information usually plays a key role in identifying individuals. Publishing spatial analysis results generally requires reclassifying location information and presenting the data as heat maps instead of individual observation points.

Sankey diagrams

The data protection implications of Sankey diagrams (flow diagrams) depend on the type of information presented. If the diagram depicts exact frequencies, the same principles apply as for descriptive statistics.

See also Example 4.

Other result types

Images and other imaging materials

The privacy risks associated with imaging materials must be assessed on a case-by-case basis. Such materials must not include any identifiers or metadata that could be linked to an individual. The risk of identification can be reduced by limiting the data and lowering the image resolution. If imaging materials can be combined with other data, the risk of identification increases.

For more information on the anonymisation and anonymity of imaging and signal data, see the policy outline prepared by the Expert Group on Data Protection appointed by the Ministry of Social Affairs and Health:

Results based on genome data

The anonymity of results based on genome data must always be assessed on a case-by-case basis.

1. Rare individual genetic variations

Publishing rare significant genetic findings is as a rule permitted if the variation is found in at least three individuals. If fewer than three individuals share the variation, the exact number should be concealed.

When publishing such findings, particular attention must be paid to the risk of identification. A minimum frequency threshold of n=3 may help reduce this risk, but its sufficiency should be evaluated in each case. The smaller the target group and the more background information is provided (such as phenotype or geographical origin) the higher the risk of identifying individuals.

When publishing results related to individual subjects, you must not include:

    • exact age,
    • a detailed description of the subject’s phenotype,
    • medical history,
    • the exact geographical area where the subjects were collected, or
    • any other information that could compromise the anonymity of the subjects.

    Details of the genetic variation may be published if they are already available in public variant databases, including:

    • allele frequency,
    • clinical significance of the variant,
    • rsID identifying the variant, or
    • the research method used to identify it.

    Table 2 provides an example of how to publish rare genetic findings.

    Causal gene | Disorder | Patient N | Zygosity of variant | Reference transcript | DNA variant | Protein variant | Molecular consequence | Pathogenicity classification | dbSNP ID | gnomAD MAF, European | gnomAD MAF, Finnish | SpliceAI | phyloP | CADD
    EPHA2 | Lung cancer | <3 | Heterozygous | NM_004431.5 | c.1171G>C | p.Gly291Arg | missense variant | Likely pathogenic | rs34192549 | 0.01679 | 0.00947 | 0.0100 | 0.135 | 6.37
    NRAS | Lung cancer | 5 | Heterozygous | NM_002524.5 | c.183A>C | p.Gln61His | missense variant | Pathogenic/Likely pathogenic | rs121913255 | NA | NA | NA | NA | NA
    DDR2 | Lung cancer | <3 | Heterozygous | NM_006182.4 | c.716T>G | p.Leu239Arg | missense variant | Likely pathogenic | rs578015216 | 0.000 | 0.000 | 0.000 | 6.32 | 24.5
    Table 2. Example of reporting rare genetic findings

    2. Genetic variations not present in the individual’s own genome

    Time-bound genetic mutations that do not occur in the subject’s genome, such as those found in cancer tumours, may be published at an individual level. However, extra caution must be taken to ensure individuals cannot be identified.

    When publishing results related to individual subjects, you must not include:

    • exact age,
    • a detailed description of the subject’s phenotype,
    • medical history,
    • the exact geographical area where the subjects were collected, or
    • any other information that could compromise the anonymity of the subjects.

    Where possible, results should be presented in grouped form based on mutation or person classifications.

    3. Genealogical research

    When publishing genealogical results, the anonymity of extended family members must be ensured. Hereditary genetic changes identified through genealogical studies may be presented as pedigrees, as illustrated in Figure 1.

    When presenting persons in a pedigree, you cannot include any other detailed background information about them, such as:

    • age,
    • time of discovery of the genetic changes,
    • geographical area from which the genealogical history has been collected, or
    • any other information that endangers the anonymity of the family.

    The more detailed the information, the greater the risk of identification. In cases of very rare diseases, it is possible that a pedigree cannot be published in anonymous format.

    Figure 1. Example of pedigree reporting

    Artificial intelligence (AI) and machine learning models

    If an AI or machine learning model/algorithm is the publishable result of your study, you must consider its anonymity already during the development. You must be able to prove the anonymity of the model in order to publish it. Anonymity is assessed on a case-by-case basis.

    When submitting a model for anonymity verification, you must also submit the Anonymity form for AI and machine learning models to Findata. Read the form carefully when you begin planning the model, so that you are familiar with the factors affecting anonymity throughout development.

    Download the form: Anonymity form for AI and machine learning models (Word, 44 kb)

    Anonymity is affected by:

    1. characteristics of the model itself (e.g. model type, number of parameters to be trained)
    2. features of the training data (e.g. quantity, content, format and possible processing, such as anonymisation)
    3. the model’s ability to generalise and how it performs the task for which it is trained.

    If a model is trained exclusively on anonymous training material, the model itself can also be considered to be anonymous.

      The anonymity of a model can be ensured by using differential privacy during training. There are no unequivocal guidelines for selecting the differential privacy parameters, as it depends on the situation and must be justified on a case-by-case basis.

      Small values, such as

      where N is the number of individuals in the material used, produce mathematically strong protection. However, in some cases, even higher values may suffice.

      As a rule, training data should not include material containing individual-level direct identifiers. A model with parameters that include information referring to an individual in an understandable format (e.g. text) does not meet anonymity criteria. The development process must also include sufficient measures for assessing the generalisation ability of the model.

      Individual-level result materials

      As a rule, published results must be based on several observations.

      If you need to publish result materials that are individual-level on the whole, the result materials must meet the definition of anonymous data (see What does anonymisation mean?).

      Producing individual-level anonymous data usually requires the use of advanced methods, such as differential privacy. The anonymity of individual-level result materials must always be ensured on a case-by-case basis.

      Synthetic result materials

      The anonymity of synthetic materials depends on:

      1. the method of generation, and
      2. the properties of the personal data used as potential background data.

      The fact that data is synthetic is not, on its own, sufficient to guarantee anonymity.

      Demonstrating anonymity depends largely on the method of generating the data and its complexity. If the aim is to generate anonymous synthetic data, you should consider already when developing the process how the anonymity of the data can be verified. The anonymity of synthetic result materials must always be ensured on a case-by-case basis.

      Results of qualitative research

      Results produced using qualitative research methods must also be anonymous and meet the definition of anonymous data.

      If possible, a minimum frequency of three should be used in published results to ensure data protection. If this is not possible, results must be processed so that individual persons cannot be identified directly or indirectly.

      The method of forming the target group affects the risk of identification. Identification risk is higher if the target group includes all the persons meeting certain inclusion criteria compared to a situation where the target group is selected randomly or by sampling.

      Especially when publishing quotations from text, you as the producer of the results have to ensure that the quotation does not enable the identification of an individual or event. The risk of identification increases if the quotation contains details of an individual or event or if it concerns a rare phenomenon. 

      Examples to support the application of the instructions

      Example 1: Frequency table and secondary disclosure

      In a frequency table, data in each cell must be based on at least three observations. If the number of observations is less than that, the exact numeric value must be censored. Data referring to fewer than three observations can for example be censored with “<3”.

      It should not be possible to calculate the censored value using the other numbers in the table. If the other numbers in the table can be used to determine the exact value of the censored frequency, the other numbers must also be censored.

      Example 1.1.

      In the table below, the Year 3 cell contains an observation based on fewer than three persons, so it has to be censored.

                    Year 1   Year 2   Year 3   Total
      Group a, n    0        12       1        13

      However, simply censoring the small observation value is not enough, since the other numbers can be used to calculate its exact value. You can prevent that calculation by coarsening the total number or the exact numerical value of another observation value as presented below. When doing so, Year 2 and Total can be used to conclude that the value of the Year 3 cell can be 1 or 2, but the exact value is not known.

                    Year 1   Year 2   Year 3   Total
      Group a, n    0        12       <3       13–14

                    Year 1   Year 2   Year 3   Total
      Group a, n    0        >10      <3       13

      When the numerical value is zero, you do not usually need to hide it because that data does not refer to any individual person.

      Example 1.2.

      Two cells of the table below contain an observation based on fewer than three persons (n=1), and both need to be censored.

                    Year 1   Year 2   Year 3   Total
      Group b, n    1        3        1        5

      Even if both of the observation values were censored with <3, the values in Year 2 and Total could be used to deduce that the value of both censored cells is 1. In this case, one way you can prevent the calculation of small observation values is by censoring all three observation values with ≤3.

                    Year 1   Year 2   Year 3   Total
      Group b, n    ≤3       ≤3       ≤3       5

      The situation would be different if, for example, the total was 6 (as in the table in Example 1.3 below). In that case, it would not be possible to determine which of the two cells censored with <3 had the value 1 and which had the value 2.
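The reasoning in Example 1.2 can be checked mechanically by listing every combination of censored values that is consistent with the published total. The sketch below is one hypothetical way to do that; it assumes that a cell censored with "<3" holds 1 or 2, and the function name is invented for illustration.

```python
from itertools import product


def possible_values(known_sum, total, censored_ranges):
    """List every combination of censored values consistent with the total.

    censored_ranges holds one (low, high) pair of inclusive bounds per censored cell.
    """
    remaining = total - known_sum
    candidates = product(*(range(low, high + 1) for low, high in censored_ranges))
    return [combo for combo in candidates if sum(combo) == remaining]


# Example 1.2: cells "<3", 3, "<3" with total 5.
print(possible_values(known_sum=3, total=5, censored_ranges=[(1, 2), (1, 2)]))
# [(1, 1)] -> a single solution, so both censored values are disclosed

# Same row with total 6: two solutions, so the exact values stay unknown.
print(possible_values(known_sum=3, total=6, censored_ranges=[(1, 2), (1, 2)]))
# [(1, 2), (2, 1)]
```

If the list contains exactly one combination, the censoring has not hidden anything and further values need to be censored or coarsened.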

      Example 1.3.

      Observations based on fewer than three persons can also be disclosed through proportional shares. In the example below, even if the small frequencies were censored with <3, it is possible to use the percentages and the total to calculate the exact values of cells Year 1 and Year 3.

                    Year 1   Year 2   Year 3   Total
      Group c, n    1        3        2        6
      Group c, %    16.7     50       33.3     100

      In this case, the percentages have to be censored as well.

                    Year 1   Year 2   Year 3   Total
      Group c, n    <3       3        <3       6
      Group c, %    ≤33      50.0     17–33    100
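The arithmetic behind Example 1.3 is short enough to show directly. In this sketch the percentages are the shares 1/6, 3/6 and 2/6 rounded to one decimal; the function name is hypothetical.

```python
def counts_from_percentages(percentages, total):
    """Recover cell counts from published percentages and the row total."""
    return [round(p * total / 100) for p in percentages]


# Counts censored as "<3", 3, "<3", but exact percentages left visible.
print(counts_from_percentages([16.7, 50.0, 33.3], total=6))  # [1, 3, 2]
```

Publishing a range such as 17–33 instead of an exact percentage leaves more than one count possible, which is why the percentages are coarsened together with the frequencies.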

      Example 2: Minimum and maximum

      Minimum and maximum values usually concern an individual observation, so they can involve a disclosure risk, especially in cases where the group size is small and/or individual observations are very abnormal. Minimum and maximum values can be published when the value of the indicator is based on several observations or if the individual’s disclosure risk can be prevented in some other way.

      For example, when presenting indicators on the height of subjects, abnormally short or tall persons may stand out from the group, which increases the risk of disclosing the individual. In the group presented in the table below, where the average height is 168 cm, the numerical value of 195 cm stands out as abnormal. People with abnormal characteristics are also more likely to stick in people’s minds, which means that the risk of disclosure is higher.

                  Height, cm
      Person 1    162
      Person 2    162
      Person 3    170
      Person 4    157
      Person 5    195
      Person 6    166
      Person 7    171
      Person 8    164

      You can improve the anonymity of your results by dividing your data into categories, as these categories will then include several individuals. Consider using suitable quantiles instead of minimum and maximum figures.
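The effect of categorising can be sketched with the heights from the table above. The cut point of 165 cm and the function name are assumptions chosen for illustration; a real classification would depend on the material.

```python
from statistics import mean


def class_counts(values, cut_points):
    """Count values in the classes defined by ascending cut points.

    With cut_points [165] the classes are "<165" and ">=165".
    """
    counts = [0] * (len(cut_points) + 1)
    for v in values:
        index = sum(v >= c for c in cut_points)
        counts[index] += 1
    return counts


heights_cm = [162, 162, 170, 157, 195, 166, 171, 164]

print(round(mean(heights_cm)))          # 168
print(max(heights_cm))                  # 195, refers to a single person
print(class_counts(heights_cm, [165]))  # [4, 4]
```

Reporting the two classes instead of the maximum means that the person who is 195 cm tall is only shown as one of four people who are at least 165 cm tall.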

      Example 3: Kaplan-Meier graph

      A Kaplan-Meier graph and its background data can be published if the data behind the graph cannot be used to determine exact calendar dates or individuals’ characteristics, such as age. In this case, even individual steps can be allowed in the graph. The graph in Figure 2 can be published because it cannot be used to identify individuals or to determine precise calendar dates.

      Figure 2. Example: Kaplan-Meier graph. Image source: Tutkijaportti (In Finnish, PDF-file, 1 335 kb)

      Example 4: Sankey diagram

      If a Sankey diagram depicts exact numbers, you have to take the minimum frequency into account in the results. The data in Figure 3 cannot be published without anonymisation, such as reclassification, because the figure reveals exact frequencies below three (n < 3).

      Figure 3. Example: Sankey diagram.

      Example 5: Individual-level material and qualitative research

      As a rule, published results must be based on several observations. This also applies to qualitative studies. However, if you need to publish individual-level result materials, you have to process the results so that no individual person can be identified from the result materials.

      The risk of identification is especially increased in text quotations if the quotation contains details of an individual or event or if it concerns a rare phenomenon. The fictitious quotation below contains the patient’s name and a fair amount of other identifying information.

      Eero Example, 13 years old, was brought to Turku University Hospital after he fell in the courtyard of the Turku Yhteiskoulu School. E.E. suffers from haemophilia A. The gastroenterologist who examined E.E. made a referral to…

      In text quotations, details can be censored by categorisation, which reduces the risk of identification.

      [A patient] [of primary school age] was brought to [a university hospital] for treatment after [they] fell in the courtyard [of their school]. [The patient] has [a severe chronic illness]. [The physician] who examined [the patient] made a referral to…
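Categorisation of this kind can be partly supported by a simple replacement table, as in the sketch below. The list of details and their categories has to be compiled by hand for each quotation, the function name is hypothetical, and the output still needs to be read by a person before publication.

```python
def categorise_details(text, replacements):
    """Replace each identifying detail with a bracketed, coarser category."""
    for detail, category in replacements.items():
        text = text.replace(detail, f"[{category}]")
    return text


quotation = "Eero Example, 13 years old, was brought to Turku University Hospital."
replacements = {
    "Eero Example": "A patient",
    "13 years old": "of primary school age",
    "Turku University Hospital": "a university hospital",
}
print(categorise_details(quotation, replacements))
# [A patient], [of primary school age], was brought to [a university hospital].
```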

      How to expedite the verification of result anonymity

      Producing anonymous results – checklist

        You can use this checklist before submitting published results for anonymity verification.    Done
      I have read Findata’s instructions on producing anonymous results. 
      I have filled in the summary form carefully. 
      The results meet the definition of anonymous data. 
      The results do not contain direct identifiers (e.g. name, personal identity code). 
      The results do not contain strong indirect identifiers (e.g. address, registration number). 
      The results do not contain pseudo-ID codes (e.g. Findata FID). 
      The results do not contain cell-specific frequencies n < 3. If the results contain cell-specific frequencies n < 3, I have justified the anonymity of the results. 
      If small frequencies have been censored, the censored values cannot be inferred from other results. 
      The results do not contain individual-level data or data that refers to a single person (e.g. minimum, maximum). If the results contain such data, I have justified the anonymity of the results. 
      I have assessed the anonymity of the results with particular care if the results include:
        • an artificial intelligence / machine learning model
        • images or other imaging materials
        • genetic or genomic data
        • synthetic result materials

      How to expedite the verification of result anonymity:

      1. Read the instructions on this page carefully. Make sure that the results you produce comply with the instructions.
        • If you need help applying the instructions, contact Findata’s help desk (info@findata.fi).
      2. Fill in the summary form carefully and take all sections of the form into account.
        • If the results you have produced do not comply with all the statements, justify why the results can still be considered anonymous. Please note that such results cannot be published without valid justifications.
      3. Produce results in a format that enables anonymity verification.
        • Make sure that all variables are described with names that can be understood by a person not involved in the study.
        • Make sure that the result type is clearly presented (e.g. frequency, regression coefficient or some other test statistic).
      4. Request result extracts from the secure processing environment as a reasonably-sized package.
        • We do not recommend submitting individual result packages frequently (e.g. every day). Processing result packages in several different submissions takes more time in terms of data transfer and communication.
        • A large number of results will increase the verification time. As a rule, we recommend delivering results in packages of up to 50 files. Processing a very extensive result package with hundreds of files/tabs is a lot of work, especially if any comments have to be made regarding the anonymity of the results.
      5. Submit the results in a commonly used file format such as Word, pdf, Excel, csv, JPEG, TIFF, or PNG.
        • If we cannot open a file you submit, we will have to ask you to deliver the materials again in another file format.

      Publishing the results

      Publication refers to making information publicly available, which includes presenting results outside your immediate working group. This can be in the form of a scientific journal, thesis, textbook, manual, conference presentation, abstract, report, survey, or internet publication.

      Publishing results from Kapseli

      1. Data processing: Data is processed in the Kapseli environment, and only the final analysis results are exported. Results must be in an anonymous format, with Findata ensuring anonymity as per the Secondary Use Act.
      2. Verify anonymity: Use the guidelines on the page Producing anonymous results to verify the anonymity of results intended for publication.
      3. Transfer results: Transfer the results and the summary form to Findata via the Output (O:) drive in Kapseli.
        • The summary form for verifying anonymity is located in the Kapseli D-folder under “Käyttöohjeet_User_guide_05062023.”
        • Compress the files and the summary form into a zip folder named as follows:
          “Results_[Record_number_of_permit_decision]_[Kapseli_ID]_[Delivery_date]” (e.g., “Results_THL_1234_14.02.00_2020_a01_15032021”).
          • Note: Date format should be ddmmyyyy.
        • Create an empty text file named “ZZZ_READY.txt” in the Output drive. This triggers the automatic transfer of the zip folder. Ensure the file name is correct.
          • Transfers run twice an hour, on the hour and at half past. Files are deleted from the Output drive after transfer.
      4. Notify Findata (optional): You can email Findata at data@findata.fi to tell us that you have made a transfer. If your submission does not arrive, we will contact you. You will not receive a separate confirmation that the transfer succeeded.
      5. Review and delivery of results: Findata will review the submission within 5 working days and provide the results via Nextcloud to the permit holder. If additional information is needed, we will contact you.
        • For large result packages, the review may take longer than the usual 5 working days. This time limit applies only to anonymity verification, not to other files exported from Kapseli (e.g., code files).
        • If you don’t have a Nextcloud account, request one via the “Order a new Nextcloud account” form in Findata’s e-service.
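
The packaging in step 3 can be sketched as a short script. This is an illustration under stated assumptions, not an official Findata utility: the helper names `package_name` and `prepare_transfer` are hypothetical, the output directory is passed in by the caller, and creating the trigger file last is a design choice made here so that the zip is complete before the transfer is triggered.

```python
import zipfile
from datetime import date
from pathlib import Path


def package_name(record_number, kapseli_id, delivery_date):
    """Build a name such as 'Results_THL_1234_14.02.00_2020_a01_15032021'."""
    # Slashes in the record number become underscores, as in the example name.
    record = record_number.replace("/", "_")
    # The delivery date is written as ddmmyyyy.
    return f"Results_{record}_{kapseli_id}_{delivery_date.strftime('%d%m%Y')}"


def prepare_transfer(files, output_dir, record_number, kapseli_id, delivery_date):
    """Zip the result files and summary form, then create ZZZ_READY.txt."""
    output_dir = Path(output_dir)
    name = package_name(record_number, kapseli_id, delivery_date)
    zip_path = output_dir / f"{name}.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file_path in files:
            # Store each file under its bare name, without directory parts.
            archive.write(file_path, arcname=Path(file_path).name)
    # Assumption: create the empty trigger file only after the zip is closed.
    (output_dir / "ZZZ_READY.txt").touch()
    return zip_path
```

For example, `package_name("THL/1234/14.02.00/2020", "a01", date(2021, 3, 15))` returns `Results_THL_1234_14.02.00_2020_a01_15032021`, which matches the example name in step 3. The list passed as `files` should include the completed summary form.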

      Publishing results from other secure processing environments

      1. Summary form: Download and complete the form for verifying the anonymity of the results.
      2. Compress files: Zip the files and the summary form, naming the folder as follows:
        • “Results_[Record_number_of_permit_decision]_[Kapseli_ID]_[Delivery_date]” (e.g., “Results_THL_1234_14.02.00_2020_a01_15032021”).
          • Note: Date format should be ddmmyyyy.
      3. Transfer results:
        • Nextcloud: If you have a Nextcloud account, transfer results via Nextcloud.
        • Secure Email: If you do not have a Nextcloud account, transfer results via secure email. Do not send results via regular, non-secure email.
      4. Contact Findata: Email Findata at data@findata.fi with the subject “Ensuring the anonymity of results.”
        • Indicate whether you are using Nextcloud or secure email for the transfer.
        • If using Nextcloud, include the diary number of the data permit and your Nextcloud ID. Findata will provide the folder name for your transfer and a zip folder with the summary form.
        • If using secure email, Findata will send a secure email for you to reply with your zip folder containing the results and the summary form.
      5. Follow-up: If there are concerns about the anonymity of the results, Findata will contact you within seven working days.
        • If you do not hear from us within this timeframe, you may proceed with publishing your results.

      Note: Results exported from Statistics Finland’s Fiona remote processing environment after 18 March 2025 do not need to be submitted separately to Findata for anonymity verification.

      Reference guide

      If Findata has granted a data permit or made a data request decision for your project, cite Findata in publications as follows: “Sosiaali- ja terveysalan tietolupaviranomainen Findata” or “Finnish Social and Health Data Permit Authority Findata.”

      • Follow the writing guidelines of the scientific publication series.
      • We recommend that references to Findata be made in accordance with its statutory duties. In data permits, these duties include, for example, pseudonymization and ensuring the anonymity of results, and in data requests, the duties include data integration, aggregation, and anonymization.
      • Findata can be referenced in the text, tables, figures, permit lists, acknowledgments, and reference lists.
      • Whenever possible, include the diary number(s) of the data permit or data request in the references.

      Examples of in-text citations

      “Research data was obtained from the Finnish Social and Health Data Permit Authority Findata with data permit THL/XXXX/14.XX.00/20XX. Findata was responsible for the pseudonymization of the data and ensuring the anonymity of the final results.”

      “The statistics were produced by Findata, the Finnish Social and Health Data Permit Authority, with data request THL/XXXX/14.XX.00/20XX. Findata was responsible for data integration and producing the anonymized statistics.”

      Example of table citation

      Data | Source
      Research data | Finnish Social and Health Data Permit Authority Findata, data permit THL/XXXX/14.XX.00/20XX

      Example of citation in a reference list

      Findata. (Year). Data permit THL/XXXX/14.XX.00/20XX. Finnish Social and Health Data Permit Authority Findata.
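
If you generate reference lists programmatically, the entry above can be produced from the publication year and the record number. This is a minimal sketch that reproduces only the pattern shown in the example. The function name `reference_entry` is hypothetical, and the publication series you are writing for may require a different style.

```python
def reference_entry(year, record_number):
    """Format a reference-list entry following the example pattern above."""
    return (
        f"Findata. ({year}). Data permit {record_number}. "
        "Finnish Social and Health Data Permit Authority Findata."
    )
```

The record number is passed through unchanged, so a placeholder such as THL/XXXX/14.XX.00/20XX must be replaced with the actual number of your data permit before use.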

      Report published research results to Findata

      Use the form below to report articles and publications that have made use of data authorised by Findata. One of the criteria for the issuing of a data permit for the purpose of scientific research is that the results are published as scientific publications. The form can also be used to report publications of data authorised for other uses.