Producing anonymous results

On this page, you’ll find instructions for producing anonymous results. If you process personal data, you must provide the results in an anonymous form that cannot reveal any information about individuals. Findata ensures this anonymity according to the Act on the Secondary Use of Health and Social Data. This applies to all materials that have been authorised under the Act.

This page includes common result types and considerations for ensuring anonymity. Note that the list is not comprehensive, and the content of variables can affect the risk of disclosure. Even if an exact value is present, it may not expose an individual.

You must check your results and verify that they follow these instructions regarding different result types. While some result types can be easily verified for anonymity, others may require more specific examination. Adhering to these instructions doesn’t guarantee anonymity, but it gets you as close as possible to the goal.

In addition to following these instructions, you must submit a summary form along with the results, and Findata will perform result checks. Do not send the results for anonymity verification until you are certain they have been generated in an anonymous format.

Fill out the summary form carefully, as the verification of anonymity relies heavily on the information provided. If you need guidance on producing anonymous results, we can assist you. Generate the results in a format that allows for anonymity verification. Below, you’ll find tips to expedite the verification process.

How to expedite the verification of result anonymity:

  1. Carefully read the instructions on this page. Ensure that the results you produce adhere to the guidelines.
  2. Fill out the summary form diligently. Complete all sections of the form and tick all necessary boxes on the Summary page.
    • If your results do not align with all statements, provide justification for why the results can still be considered anonymous.
    • If the results are submitted through the Nextcloud transfer service, make sure the form states your Nextcloud ID. Additional instructions on encryption and data submission through Nextcloud are on the Data Transfers to Findata page.
  3. Generate the results in a format that allows for anonymity verification. Ensure that all variables are labeled with names understandable to individuals outside the research.
    • Clearly indicate the type of results (e.g., frequency, regression coefficient, or another test statistic).
  4. Request results in reasonable-sized packages for export.
    • Avoid sending individual result packages frequently (e.g., every day). Handling result packages through multiple separate submissions consumes more time for data transfer and communication.
    • We recommend submitting results in packages of no more than 50 files. Handling an extremely large result package containing hundreds of files can be labor-intensive, especially if there are uncertainties or comments regarding result anonymity.
  5. If you are requesting other data besides results from the processing environment, ensure that these files do not contain results. Clearly describe the data being transferred in the summary form.
    • Ensure that code files do not contain results or data (e.g., pseudo-ID).

What does anonymisation mean?

Anonymisation is the process of ensuring that material:

  • Cannot be used to identify any individual directly or indirectly.
  • Cannot be used to draw conclusions about a specific individual.
  • Cannot be linked to other material concerning a specific person.

The anonymised material must be impossible or extremely difficult to revert to an identifiable form. According to the Secondary Use Act, results must be anonymous. If there is a need to publish results that cannot be anonymised, this should be considered during the study’s planning process by using other criteria, such as obtaining consent from research subjects.

Even if an individual result is anonymous, there is a risk of disclosure when multiple results are combined. For example, several frequency tables using the same variable classifications can often be combined to create more detailed tables. Consider how your results could be combined with both current and previous analyses. If prior publications have used similar material, provide links to these publications.
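
The combination risk can be illustrated with a minimal sketch. It assumes two hypothetical frequency tables that share the same categories, where one table covers a subpopulation of the other. The function name `differencing_risk`, the example counts and the default threshold of five are illustrative assumptions, not part of any Findata tool.

```python
def differencing_risk(table_all, table_subset, threshold=5):
    """Return categories whose implied remainder is between 1 and threshold - 1.

    Both arguments map a category label to a frequency. table_subset is assumed
    to describe a subpopulation of table_all, so subtracting the two tables
    gives the number of remaining people in each category.
    """
    risky = {}
    for category, total in table_all.items():
        remainder = total - table_subset.get(category, 0)
        if 0 < remainder < threshold:
            risky[category] = remainder
    return risky


# Hypothetical counts: each table passes the minimum-frequency rule on its own.
all_patients = {"18-39": 40, "40-64": 55, "65+": 23}
excluding_one_clinic = {"18-39": 31, "40-64": 53, "65+": 17}

print(differencing_risk(all_patients, excluding_one_clinic))
# {'40-64': 2}
```

Neither table contains a frequency below five, yet subtracting them exposes a group of two people. This is why earlier analyses and publications should be considered together with the current results.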

To protect data subjects’ anonymity, the minimum frequency in results is five. For justifiable reasons and on a case-by-case basis, a minimum frequency of three may be used, for example in studies involving very small target groups or rare diseases. This is acceptable only if the result is significant, must be reported at this level of accuracy, and still meets the anonymity criteria.
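
As a rough self-check before submission, the minimum-frequency rule can be sketched as follows. The nested-dictionary table layout and the function name `find_small_cells` are assumptions made for illustration. The sketch leaves empty cells (frequency zero) unflagged, which is also an assumption; confirm with Findata how empty cells should be treated in your results.

```python
def find_small_cells(table, threshold=5):
    """List (row, column, count) for every non-empty cell below the threshold."""
    small = []
    for row_label, row in table.items():
        for col_label, count in row.items():
            # Assumption: zero cells are not flagged, only counts 1..threshold-1.
            if 0 < count < threshold:
                small.append((row_label, col_label, count))
    return small


# Hypothetical frequency table: rows are regions, columns are diagnoses.
table = {
    "Region A": {"Diagnosis X": 12, "Diagnosis Y": 3},
    "Region B": {"Diagnosis X": 7, "Diagnosis Y": 0},
}

print(find_small_cells(table))               # [('Region A', 'Diagnosis Y', 3)]
print(find_small_cells(table, threshold=3))  # []
```

The second call uses the case-by-case threshold of three, under which the same table contains no flagged cells.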

Provide Findata with sufficient background information for each analysis type to ensure anonymity. This information must be displayed with the result or as a separate document so that the result and background information can be easily understood.

Findata uses the principles described in the table below as the basis for verifying anonymity.

Classification of disclosure risk by result type

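Data type | Result type | General classification
Descriptive statistics | Frequency table | To be verified
Descriptive statistics | Quantity table | To be verified
Descriptive statistics | Maximum, minimum, percentile, median | To be verified
Descriptive statistics | Mode | Generally safe
Descriptive statistics | Mean, indices, ratios, indicators | To be verified
Descriptive statistics | Degree of concentration | Generally safe
Descriptive statistics | Higher moment indicators (such as variance, covariance, kurtosis, skewness) | Generally safe
Descriptive statistics | Graphs: visual representations of the original material | To be verified
Correlations and regression-type analyses | Linear regression coefficients | Generally safe
Correlations and regression-type analyses | Non-linear regression coefficients | Generally safe
Correlations and regression-type analyses | Estimation residuals | To be verified
Correlations and regression-type analyses | Estimate summary and test variables (R², χ² etc.) | Generally safe
Correlations and regression-type analyses | Correlation coefficients | Generally safe
Correlations and regression-type analyses | Factor analysis | Generally safe
Correlations and regression-type analyses | Correspondence analysis | Generally safe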


Before delivering your results for anonymity verification, make sure that:

  1. your results include no frequencies below five
  2. your results cannot be used to identify any individuals either directly or indirectly and that the data cannot be combined with other data concerning the same person

Descriptive indicators and analyses

In the text below, the terms “group” and “target group” refer to the observations from which statistics are calculated.

Minimum, maximum and range

Minimum and maximum values often refer to individual units, which are the easiest to identify, so they carry a clear disclosure risk. They can be published if the value of the statistic is based on more than one unit. Anonymity can be improved by categorising the data, as each category will then include several individuals. Consider using suitable quantiles alongside any minimum and maximum figures.
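
One way to check whether an extreme value is based on more than one unit is to count how many observations share it. The sketch below is illustrative only; the function name `extreme_value_support` and the example ages are hypothetical.

```python
from collections import Counter


def extreme_value_support(values):
    """Return the minimum and maximum together with how many units share each."""
    counts = Counter(values)
    lowest, highest = min(counts), max(counts)
    return {"min": (lowest, counts[lowest]), "max": (highest, counts[highest])}


# Hypothetical ages of the units in one group.
ages = [18, 18, 18, 25, 31, 47, 62, 94]
support = extreme_value_support(ages)

print(support)  # {'min': (18, 3), 'max': (94, 1)}
```

Here the minimum is shared by three units, while the maximum belongs to a single unit and therefore calls for categorisation or a quantile.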

Fractiles – quantiles, deciles, percentiles, median

These can be published if the underlying frequency is large enough.

Mean, standard deviation

These may pose a disclosure risk in rare cases. Check that the result represents a sufficiently large group and that the entire target group does not share the same value. Check that these statistics are not reported from several nearly identical groups or subgroups.

Mode

This can be published in principle, but check that it will not disclose the entire group, i.e. that it does not describe the value of the entire target group.

Higher moment indicators, such as variance

These can be published in principle, as the indicator has been clearly converted from the original individual values. Make sure not to publish an excessive number of indicators from a small group, as they could serve to disclose the entire group.

Correlation coefficients

These can be published in principle when the group under consideration contains a sufficient number of observations.

Degrees of concentration

These can be published in principle when the group under consideration contains a sufficient number of observations.

Linear regression, non-linear regression

Coefficients can be published in principle.

Test variables

These can be published in principle.

Factor analysis

These can be published in principle, but make sure that your factors are not based on a single variable.

Principal component analysis

Principal component vectors and their corresponding values can be published in principle. Check any projections onto the principal components, as they correspond to scatter plots (see below).

Indices, ratios, indicators

Indices can be published in principle, but the calculation formula must be taken into account. Indices based on more complex formulas (e.g. the Fisher price index) do not usually pose a disclosure risk, while very simple formulas are more prone to it and must therefore be based on a sufficient number of observations.

Gini coefficients

Gini coefficients must be calculated for a sufficiently large number of observations. Data needed for the verification process: calculation formula and possibly frequencies underlying the figures.

Graphs

The data protection evaluation process for graphs relies on aggregated tabular presentations, as they make it easier to perceive the frequency of the observations underlying the points or plots in a graph, which would usually be impossible to discern from the graph itself. Therefore, a table specifying the underlying frequencies should be provided with the graph if it is used to depict individual observations or a small target group.

Histogram

When using histograms, the data should be classified so that each individual class contains a sufficient number of observations. This can be particularly challenging for the tails of a distribution, such as a normal distribution. The same principles apply as for descriptive statistics, so it may not be possible to depict the entire tail.
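
A minimal sketch of one possible reclassification is shown below: adjacent classes are merged from the lower end upwards until each class holds at least a chosen number of observations, and any sparse remainder at the upper tail is folded into the last class. The function name `merge_sparse_bins`, the threshold of five and the example counts are assumptions for illustration.

```python
def merge_sparse_bins(edges, counts, threshold=5):
    """Merge adjacent histogram bins so that every bin reaches the threshold.

    edges holds n + 1 ascending bin edges, counts holds the n bin frequencies.
    Returns (new_edges, new_counts).
    """
    new_edges = [edges[0]]
    new_counts = []
    running = 0
    for right_edge, count in zip(edges[1:], counts):
        running += count
        if running >= threshold:
            new_edges.append(right_edge)
            new_counts.append(running)
            running = 0
    if new_edges[-1] != edges[-1]:
        if new_counts:
            # Fold the sparse upper tail into the last bin that was kept.
            new_counts[-1] += running
            new_edges[-1] = edges[-1]
        else:
            # The whole data set is below the threshold; one bin remains
            # and it still does not satisfy the rule.
            new_counts.append(running)
            new_edges.append(edges[-1])
    return new_edges, new_counts


# Hypothetical histogram with sparse classes at both tails.
edges = [0, 10, 20, 30, 40, 50, 60]
counts = [2, 4, 30, 41, 6, 1]

print(merge_sparse_bins(edges, counts))
# ([0, 20, 30, 40, 60], [6, 30, 41, 7])
```

The merged classes have unequal widths, so a plotted histogram should make the class boundaries clear.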

Scatter chart or scatter plot

As a rule, each individual point in a scatter chart describes a single unit. Therefore, these types of charts and plots are not publishable without first grouping the data so that each point contains several observations. These can be published only if the data which the chart or plot is based on could be published as a table. However, the assessment process should also take into account whether a combination of the variables you have used would allow for the identification of any individuals. You can improve the anonymity of scatter charts by replacing them with a graph depicting the frequency of observations in grid cells or by adding randomness to your points.
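
The grid-cell alternative mentioned above can be sketched as follows. The function name `grid_counts`, the cell size and the threshold of five are illustrative assumptions; the sketch reports only cells that reach the threshold and returns the number of observations in the remaining cells as a single total.

```python
import math
from collections import Counter


def grid_counts(points, cell_size, threshold=5):
    """Count points per square grid cell and keep only sufficiently full cells.

    Returns (publishable, suppressed_total), where publishable maps a
    (column, row) cell index to its frequency.
    """
    cells = Counter(
        (math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points
    )
    publishable = {cell: n for cell, n in cells.items() if n >= threshold}
    suppressed_total = sum(n for n in cells.values() if n < threshold)
    return publishable, suppressed_total


# Hypothetical observations: five close together and two isolated points.
points = [(1, 1), (2, 3), (3, 2), (4, 4), (1.5, 2.5), (12, 13), (25, 7)]

print(grid_counts(points, cell_size=10))
# ({(0, 0): 5}, 2)
```

The resulting cell frequencies can be drawn as a shaded grid, and the same table can be supplied as the background information for the graph. The choice of variables still needs to be assessed separately, as noted above.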

Box plots

Box plots pose a disclosure risk in principle, as they contain dots that represent individual observations, and outliers in particular could lead to the disclosure of someone’s identity. Means may also pose a disclosure risk. The same principles apply as for descriptive statistics.

Residuals

Each residual refers to a single observation. When depicting residuals, an aggregated graph format is recommended over a graph based on individual points. If a graph based on individual points is used, avoid disclosing the values of the axes.

Survival analysis, Kaplan-Meier curve

These may carry a disclosure risk, depending on how the analysis is defined. They may be published if each step of the curve corresponds to a sufficient number of observations. Steps with single observations can also be allowed if it is clear that the data behind the curve cannot be used to determine any exact ages or dates. Curves that include such steps need closer consideration, because detailed background information may make it possible to recognise individual persons.
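
To see which steps of a curve rest on few observations, the events can be counted per event time, since the curve steps down only at times where an event occurs. The sketch below is a hypothetical helper, not a survival analysis implementation; the name `sparse_km_steps`, the threshold of five and the example data are assumptions.

```python
from collections import Counter


def sparse_km_steps(durations, event_observed, threshold=5):
    """Return {event time: number of events} for steps below the threshold.

    durations are follow-up times, event_observed marks whether the event
    occurred (1) or the observation was censored (0).
    """
    events_per_time = Counter(
        time for time, observed in zip(durations, event_observed) if observed
    )
    return {
        time: n for time, n in sorted(events_per_time.items()) if n < threshold
    }


# Hypothetical follow-up times in months and event indicators.
durations = [3, 3, 3, 3, 3, 7, 7, 12, 12, 12, 12, 12, 12, 20]
events = [1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0]

print(sparse_km_steps(durations, events))
# {7: 1}
```

The step at month 7 is based on a single event and would need the closer consideration described above.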

Spatial analysis

Spatial analysis is particularly challenging in terms of data protection, as location information usually plays a key role in the disclosure of an individual. It usually requires a great deal of reclassification and, preferably, presenting the data as heat maps instead of observation points.

Other result types

Photographs and other imaging materials

Imaging materials are verified on a case-by-case basis. It is very difficult to define general guidelines for imaging materials, as they represent such a diverse group of materials. These materials must never contain any direct text-based identifiers, and the coarser an image is, the more difficult it is to identify anyone from it. The persons who handle imaging materials are usually best placed to determine the disclosure risk presented by each piece of material. For example, an image of a single tooth will rarely reveal the identity of its owner, but an entire tooth chart could be used for that purpose.

Also, see the principle guidelines prepared by the high-level expert group appointed by the Ministry of Social Affairs and Health on the anonymisation and anonymity of image and signal data:

Anonymization and Anonymity of Image and Signal Data in Processing Under the Act on Secondary Use of Social and Health Data (522/2019) (PDF 252 kB).

Hereditary genetic data

In terms of hereditary genetic data, indicators concerning a sufficiently small number of variants calculated from a sufficiently large group can be considered anonymous. However, these must be verified on a case-by-case basis.

Machine learning

In terms of neural networks and other machine learning models (decision trees, etc.), the actual publication of these materials is rarely needed. However, the need may arise for disclosing such results outside Kapseli, and the verification of these results will be carried out on a case-by-case basis. General instructions will be provided at a later date.

Individual-level data

The anonymity of individual-level data must always be ensured on a case-by-case basis. Contact Findata for more detailed instructions.

Publishing the results

Publication refers to making information publicly available, which includes presenting results outside your immediate working group. This can be in the form of a scientific journal, thesis, textbook, manual, conference presentation, abstract, report, survey, or internet publication.

Publishing results from Kapseli

  1. Data processing: Data is processed in the Kapseli environment, and only the final analysis results are exported. Results must be in an anonymous format, with Findata ensuring anonymity as per the Secondary Use Act.
  2. Verify anonymity: Use the guidelines on the page Producing anonymous results to verify the anonymity of results intended for publication.
  3. Transfer results: Transfer the results and the summary form to Findata via the Output (O:) drive in Kapseli.
    • The summary form for verifying anonymity is located in the Kapseli D-folder under “Käyttöohjeet_User_guide_05062023.”
    • Compress the files and the summary form into a zip folder named as follows:
      “Results_[Record_number_of_permit_decision]_[Kapseli_ID]_[Delivery_date]” (e.g., “Results_THL_1234_14.02.00_2020_a01_15032021”).
      • Note: Date format should be ddmmyyyy.
    • Create an empty text file named “ZZZ_READY.txt” in the Output drive. This triggers the automatic transfer of the zip folder. Ensure the file name is correct.
      • Transfers run on the hour and at half past the hour, i.e. every 30 minutes. Files are deleted from the Output drive after transfer.
  4. Notify Findata (optional): Email Findata at data@findata.fi to confirm your transfer. We will follow up if we do not receive your submission. There will be no confirmation of transfer success.
  5. Review and delivery of results: Findata will review the submission within 5 working days and provide the results via Nextcloud to the permit holder. If additional information is needed, we will contact you.
    • For large result files, the review process might exceed the usual 5-day limit. This time limit pertains only to verifying anonymity, not to other file imports from Kapseli (e.g., code files).
    • If you don’t have a Nextcloud account, request one via the “Order a new Nextcloud account” form in Findata’s e-service.
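
The packaging in step 3 can be sketched as below. This is an illustrative helper under stated assumptions, not a Findata tool: the function names are hypothetical, the replacement of slashes in the record number with underscores is inferred from the example name above, and the output directory is passed in rather than hard-coded.

```python
import zipfile
from datetime import date
from pathlib import Path


def package_name(record_number, kapseli_id, delivery_date):
    """Build the package name; the date is formatted as ddmmyyyy."""
    # Assumption: slashes in the record number become underscores.
    safe_record = record_number.replace("/", "_")
    return f"Results_{safe_record}_{kapseli_id}_{delivery_date.strftime('%d%m%Y')}"


def build_package(files, output_dir, record_number, kapseli_id, delivery_date):
    """Zip the result files and summary form, then create the ready marker."""
    output_dir = Path(output_dir)
    name = package_name(record_number, kapseli_id, delivery_date)
    zip_path = output_dir / (name + ".zip")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file in files:
            archive.write(file, arcname=Path(file).name)
    # Create the empty marker file only after the zip is complete.
    (output_dir / "ZZZ_READY.txt").touch()
    return zip_path


print(package_name("THL/1234/14.02.00/2020", "a01", date(2021, 3, 15)))
# Results_THL_1234_14.02.00_2020_a01_15032021
```

Creating the marker file last means the automatic transfer is not triggered while the zip folder is still being written.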

Publishing results from other secure operating environments

  1. Summary form: Download and complete the form for verifying the anonymity of the results.
  2. Compress files: Zip the files and the summary form, naming the folder as follows:
    • “Results_[Record_number_of_permit_decision]_[Kapseli_ID]_[Delivery_date]” (e.g., “Results_THL_1234_14.02.00_2020_a01_15032021”).
      • Note: Date format should be ddmmyyyy.
  3. Transfer results:
    • Nextcloud: If you have a Nextcloud account, transfer results via Nextcloud.
    • Secure Email: If you do not have a Nextcloud account, transfer results via secure email. Do not send results via regular, non-secure email.
  4. Contact Findata: Email Findata at data@findata.fi with the subject “Ensuring the anonymity of results.”
    • Indicate whether you are using Nextcloud or secure email for the transfer.
    • If using Nextcloud, include the diary number of the data permit and your Nextcloud ID. Findata will provide the folder name for your transfer and a zip folder with the summary form.
    • If using secure email, Findata will send a secure email for you to reply with your zip folder containing the results and the summary form.
  5. Follow-up: If there are concerns about the anonymity of the results, Findata will contact you within seven working days.
    • If you do not hear from us within this timeframe, you may proceed with publishing your results.

Report published research results to Findata

Use the form below to report articles and publications that have made use of data authorised by Findata. One of the criteria for the issuing of a data permit for the purpose of scientific research is that the results are published as scientific publications. The form can also be used to report publications of data authorised for other uses.

Reference Guide

If Findata has granted a data permit or made a data request decision for your project, cite Findata in publications as follows: “Sosiaali- ja terveysalan tietolupaviranomainen Findata” or “Finnish Social and Health Data Permit Authority Findata.”

  • Follow the writing guidelines of the scientific publication series.
  • We recommend that references to Findata be made in accordance with its statutory duties. In data permits, these duties include, for example, pseudonymization and ensuring the anonymity of results, and in data requests, the duties include data integration, aggregation, and anonymization.
  • Findata can be referenced in the text, tables, figures, permit lists, acknowledgments, and reference lists.
  • Whenever possible, include the diary number(s) of the data permit or data request in the references.

Examples of in-text citations

“Research data was obtained from the Finnish Social and Health Data Permit Authority Findata with data permit THL/XXXX/14.XX.00/20XX. Findata was responsible for the pseudonymization of the data and ensuring the anonymity of the final results.”

“The statistics were produced by Findata, the Finnish Social and Health Data Permit Authority, with data request THL/XXXX/14.XX.00/20XX. Findata was responsible for data integration and producing the anonymized statistics.”

Example of table citation

Data | Source
Research data | Finnish Social and Health Data Permit Authority Findata, data permit THL/XXXX/14.XX.00/20XX

Example of citation in a reference list

Findata. (Year). Data permit THL/XXXX/14.XX.00/20XX. Finnish Social and Health Data Permit Authority Findata.