Systematic Review of Using Genomics Data for Reidentifying Patients
Context and Introduction
Accessible contextual data accompanying genomic sequence data are necessary for informed public health responses to emergencies such as COVID-19. As of May 2021, the COVID-19 pandemic has claimed the lives of over 22 thousand individuals in Canada alone (Public Health Agency of Canada, 2020). With global cases exceeding 140 million and an international death toll of over 3 million individuals, COVID-19 continues to be a public health emergency devastating the populations and economies of countries around the globe (John Hopkins Coronavirus Resource Center, 2020). While accelerated efforts in vaccine development and production hold significant promise (BBC News, 2020; CBC, 2020), it is evident that continued public health interventions will be needed to bring an "end" to the COVID-19 pandemic (Levin et al., 2020). Specifically, viral genomic data sharing by researchers and public health authorities will be crucial to informing ongoing local, provincial, national, and international public health responses (Walport and Brest, 2011; van Panhuis et al., 2014; Dye et al., 2016; Edelstein et al., 2018). For example, analyzing SARS-CoV-2 viral genomic sequences has been essential in elucidating transmission patterns, identifying variants with enhanced transmissibility or clinical severity, and analyzing outbreaks in real time (Fang and Meng, 2020).
Beyond informing public health policy, rapidly depositing SARS-CoV-2 genomic sequences in open databases has been of key importance for rapidly developing COVID-19 vaccines, testing kits, and other research efforts. For instance, the first SARS-CoV-2 genomic sequences deposited in the Global Initiative on Sharing All Influenza Data (GISAID) database allowed for the rapid development of the Pfizer-BioNTech BNT162b2 vaccine candidate (Polack et al., 2020). Similarly, the SARS-CoV-2 sequences deposited in GISAID have also provided the basis for the accelerated development and deployment of numerous diagnostic testing kits (Bohn et al., 2020). Recently, the importance of COVID-based genomic data sharing has been increasingly underscored by the emergence of novel SARS-CoV-2 Variants of Concern (VOCs) (To et al., 2020; Mahase, 2021). The Canadian and international response to VOCs relies centrally on viral genomic sequencing to detect and track VOC transmission and to investigate key mutations that affect disease severity and the virus's ability to escape natural and post-vaccination immunity (Volz et al., 2020). For instance, the B.1.1.7 (Alpha), B.1.351 (Beta), and P.1 (Gamma) VOCs were all detected largely through a combination of epidemiological, contextual, and genomic data sharing (Volz et al., 2020; Mahase, 2021). This detection is hugely significant. Although it is impossible to fully quantify, failing to detect more virulent and/or deadly VOCs in a timely manner would likely cause substantial delays in enacting the appropriate response measures (Davies et al., 2021).
Recognizing the promise of genomic data sharing, the Canadian COVID-19 Genomics Network (CanCOGeN) was launched to coordinate and upscale existing genomics-based research and surveillance efforts, with the goals of tracking viral introductions, informing the public health response, and exploring the relationship of viral and human genomes in individual outcomes (Genome Canada, 2021a). CanCOGeN is mandated to sequence up to 10,000 individual (host) genomes and up to 150,000 viral sample genomes (Genome Canada, 2021b). The sometimes innately differing nature of data sharing in human genomics versus pathogen genomics elicits varying legal, ethical, governance, technological, and other practical concerns. Accordingly, the CanCOGeN project comprises two main subgroups, CanCOGeN-HostSeq and CanCOGeN-VirusSeq, which address topics specific to individual and viral data sharing respectively, while overarching committees, such as the CanCOGeN Ethics and Governance, Implementation, and Coordination Committees, also exist to synchronize the efforts of these two groups. As a part of its mandate, the Ethics and Governance Committee has been tasked with exploring the privacy and ethical concerns of sharing SARS-CoV-2 genomic sequences along with the relevant associated contextual data. Sequencing data alone provides little to no utility (Schriml et al., 2020). Interpreting sequence data alongside high-quality contextual data provides far more meaningful findings. Descriptive data fields such as the date of sample collection, geographic region of origin, and the age of the individual are critical for the proper contextual interpretation of the sequencing data and analytical results when conducting genomic surveillance and investigating a broad range of research questions.
In an effort to increase the utility of archived pathogen genomic data, using existing pathogen contextual data standards (MIxS and MIGS) and considering Canadian legislation, VirusSeq developed a concise list of 16 minimal contextual data fields (see Table 1) to be associated with deposited SARS-CoV-2 sequences.
TABLE 1. MIxS Compliance and Implementation Metadata Standards (Genomics Standards Consortium, 2021).
Despite the broadly accepted benefits of such data sharing for both health policy and research, the CanCOGeN Ethics and Governance Committee has found that privacy and the protection of personal information are frequently stated as justifications to resist sharing minimal contextual data in direct association with the viral sequences they describe (Joly, 2020). Privacy as a challenge to data sharing is not exclusive to COVID-19 and has been well-documented (Butler, 2007; van Panhuis et al., 2014; Sorani et al., 2015; Bernier and Knoppers, 2020; Bonomi et al., 2020). In the current context, there are concerns that publicly archiving SARS-CoV-2 viral sequencing data in combination with the minimal set of contextual data will allow for the reidentification of individuals (Shean and Greninger, 2018; Joly, 2020). This paper reviews and addresses potential privacy risks of sharing pathogen sequencing data along with its accompanying minimum contextual data, mainly under the Canadian legal context. Nevertheless, many of the principles and reasoning used here can be similarly applied in an international context. The first section introduces the key concepts of identifiability and personal information. The second section discusses whether publicly sharing SARS-CoV-2 genomic sequences inherently threatens the privacy of individuals. The third section focuses on the privacy considerations of publicly archiving four minimal contextual data fields (age, gender, province/territory of collection, and sample collection date) associated with the viral sequences. The fourth section then discusses situations where the privacy risks of sharing specific contextual data fields are elevated in certain contexts and outlines precautions that can be used to mitigate such risks.
Finally, as a part of the deliberations of the VirusSeq Ethics and Governance Working Group, some concerns were raised regarding the risk of individual self-identification in publicly available formats. The last section addresses this point specifically and focuses on the question of whether the definition of "identifiability" includes self-identification.
A Brief Review on the Definition of Personal Information and Its Relationship to Privacy
To assess the privacy risks of sharing viral sequencing data and its associated minimum contextual data, it is important to first address concerns as to whether such data constitutes "personal information," which, in general, requires the individual's consent or other justified reasons to share in the context of research (Office of the Privacy Commissioner of Canada, 2013). In Canada, with a federal-provincial division of powers, personal information is protected under numerous forms of federal and provincial privacy legislation (Bernier and Knoppers, 2020). At the national level, personal information collected by federal entities is subject to the Privacy Act (Privacy Act, 1985; Office of the Privacy Commissioner of Canada, 2019), while the Personal Information Protection and Electronic Documents Act (PIPEDA) applies to personal information collected throughout the commercial sector (Office of the Privacy Commissioner of Canada, 2020; PIPEDA, 2000). Additionally, each province is entitled to enact its own privacy legislation, if such provincial legislation is considered "substantially similar" to PIPEDA (Office of the Privacy Commissioner of Canada, 2017). Indeed, there are numerous applicable laws in Canada. Despite this broad variety of laws governing the collection and disclosure of personal information in Canada, the definition of what constitutes "personal information" is relatively uniform, focusing on the feature of "identifiability." For instance, PIPEDA defines personal information as "information about an identifiable individual" (that is recorded in any form … ) (Office of the Privacy Commissioner of Canada, 2019; PIPEDA, 2000). Similarly, at the provincial level in Quebec, personal information is "information concerning a natural person that allows the person to be identified" (Act respecting Access to documents held by public bodies and the Protection of personal information, Québec, 1982).
In British Columbia (BC), the BC E-Health (Personal Health Information Access and Protection of Privacy) Act, the BC Personal Information Protection Act, and the BC Freedom of Information and Protection of Privacy Act all hold definitions similar to those provided by the above laws (Freedom of Information and Protection of Privacy Act, British Columbia, 1996; Personal Information Protection Act, British Columbia, 2003; E-health (Personal Health Information Access And Protection of Privacy) Act British Columbia, 2008). Lastly, the Information and Privacy Commissioner of Ontario summarises that information is "personal" if "it is reasonable to identify an individual from the information (either alone or by combining it with other information)" (Information and Privacy Commissioner of Ontario, 2016b). Other countries around the globe have similarly emphasized the concept of "identifiability" in their privacy legislation. For example, the European Union's General Data Protection Regulation (GDPR) states that personal data is information "relating to an identified or identifiable natural person" (General Data Protection Regulation, 2016). In the United States, "personal health information" is designated as individually identifiable information relating to the "(...) health status of an individual (...)" by the Health Insurance Portability and Accountability Act (HIPAA) (HIPPA, 1996). Similarly, in China, personal information is defined as "information that can identify specific natural persons either by itself or when combined with other information," and in Australia, the Australian Privacy Act also focuses on identifiability as a component of personal information (The Privacy Act, 1988; Civil Code of the People's Republic of China, 2020). These numerous legal definitions across a wide variety of jurisdictions emphasize that identifiability is a necessary and ubiquitous requirement in the definition of personal information.
As such, in evaluating the privacy risks of publicly archiving viral genomic data and its associated contextual data, it will be key to assess whether such data can be considered personal information. Here, we will focus on this question by discussing the potential identification risks of sharing SARS-CoV-2 viral genomic sequences and their associated contextual data.
Does Publicly Archiving SARS-CoV-2 Viral Sequences Inherently Create Privacy Risks?
While concerns regarding the privacy risks of certain contextual data fields have been raised, it seems intuitive to first consider whether SARS-CoV-2 viral genomic sequences alone generate any privacy risks. Is it possible for an individual to be identified through publicly archived pathogen sequences alone? To consider this question, it is important to assess whether the SARS-CoV-2 viral genome can be used as an identifier. Viruses are frequently characterized by their "serial interval" and "mutation rate." The serial interval describes the time between the onset of symptoms in an infector (the individual who transmits the virus) and in the infectee (the individual infected by the infector); for the SARS-CoV-2 virus, the serial interval is estimated to be close to 4 days (Du et al., 2020). The virus, meanwhile, has been predicted to mutate about once every 10–15 days (Duchene et al., 2020). Since the serial interval is shorter than the average time between mutations, multiple infector-infectee pairs will likely share the same viral sequence. If different individuals are likely to share the same pathogen sequence, the pathogen sequence alone cannot be used to effectively distinguish between various sequenced individuals. It is also extremely unlikely that each tested individual would have a unique viral sequence; it is therefore equally improbable for SARS-CoV-2 sequences to pose a significant reidentification risk to the host. Moreover, if at the time of sequencing an individual is found to be infected with a unique form of the virus, the mutation rate of the SARS-CoV-2 virus is such that if the individual were to be tested again in the future, they would be unlikely to possess the same viral sequence (Du et al., 2020; Duchene et al., 2020). Overall, it is extremely unlikely for SARS-CoV-2 sequences derived from an individual to be used as an effective identifier.
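The timing argument above can be made concrete with a little arithmetic. The sketch below is illustrative only: it uses the serial interval (about 4 days) and the mutation interval (once every 10–15 days) quoted in the text, and it additionally assumes, purely for illustration, that mutations arrive as a Poisson process at a constant average pace. The function names are hypothetical and are not taken from any cited work.

```python
import math

# Figures quoted in the text (Du et al., 2020; Duchene et al., 2020).
SERIAL_INTERVAL_DAYS = 4.0
MUTATION_INTERVAL_DAYS = (10.0, 15.0)  # one mutation every 10 to 15 days


def transmissions_per_mutation(mutation_interval_days,
                               serial_interval_days=SERIAL_INTERVAL_DAYS):
    """Average number of infector-to-infectee steps that fit between two mutations."""
    return mutation_interval_days / serial_interval_days


def prob_unchanged_sequence(mutation_interval_days,
                            serial_interval_days=SERIAL_INTERVAL_DAYS):
    """Probability of zero mutations within one serial interval.

    ASSUMPTION (not from the source): mutations follow a Poisson process
    with a mean rate of 1 / mutation_interval_days per day.
    """
    return math.exp(-serial_interval_days / mutation_interval_days)


if __name__ == "__main__":
    for interval in MUTATION_INTERVAL_DAYS:
        steps = transmissions_per_mutation(interval)
        p_same = prob_unchanged_sequence(interval)
        print(f"mutation every {interval:.0f} d: "
              f"{steps:.2f} transmissions per mutation, "
              f"P(no mutation in one serial interval) = {p_same:.2f}")
```

With the quoted figures, between 2.5 and 3.75 transmission steps fit between successive mutations on average, which is the quantitative core of the claim that several hosts will often carry the same sequence.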
Some have noted that it is possible for pathogen samples to be "contaminated" with human DNA. In this scenario, sharing viral sequencing data can be argued as possibly also sharing human genomic information (Population Health and Genomics Foundation, 2020). While possible, such risks are also very unlikely given that technical safeguards are routinely implemented to systematically and robustly remove any human-like or non-viral sequences from all public-level viral sequence datasets (this task is often termed "de-hosting") (Population Health and Genomics Foundation, 2020; Public Health Agency of Canada - National Microbiology Laboratory, 2021). De-hosting is a very common technique used to remove human reads from pathogen sequence datasets. Tools used for de-hosting remove genomic reads that map onto the human reference genome and are well-validated. Applying such quality control and safety techniques ensures that the risk of reidentification from public-level viral sequencing data is extremely low. In summary, the innate characteristics of the SARS-CoV-2 virus are such that it is statistically unlikely for one-to-one unique host-to-pathogen matches to occur. Additionally, various computer-based techniques are employed to remove human-like sequences from the viral sequences and further minimize reidentification risks before the data are archived in any public database.
Does the Minimum Contextual Data (Table 1) CanCOGeN Intends to Publicly Deposit Constitute "Personal Information" According to Canadian Privacy Legislation?
As previously mentioned, the utility of sequencing data from a public health or research perspective is frequently highly dependent on the thoroughness and quality of its accompanying contextual data (Schriml et al., 2020). Some typical examples of contextual data include "laboratory of origin, date of collection, individual age and gender, method of sampling, etc." (Griffiths et al., 2020). Concerns have been raised that publicly releasing these data fields in association with the samples they describe could violate the privacy of individuals (Shean and Greninger, 2018; Joly, 2020). Here, the core question to assess is whether the minimal contextual data makes the associated pathogen data "identifiable" and is thus considered "personal information." While the law often writes of identifiability in binary terms (i.e., an individual is either identifiable or non-identifiable), statistically speaking, identifiability is better conceived as a spectrum of probabilities. These probabilities range from 0 to 100%, where the percentage describes the certainty with which information can be attributed to a person (Rocher et al., 2019). The term "identifier" is often used in this context to describe information that contributes to the reidentification or identification of an individual (Sweeney, 2000; Golle, 2006; Rocher et al., 2019). Many specific denominations of the term exist, such as "unique" identifier, "quasi-identifier," or "direct" identifier, all emphasizing their potential to increase the probability of personal identification. For example, a quasi-identifier refers to a combination of traits or attributes in a dataset that is not independently capable of identification but, when combined with other accessible data, becomes highly identifying (Sweeney, 2000).
Typical examples of quasi-identifiers include characteristics such as date of birth, gender, visible minority status, and profession (Sweeney, 2000).
While identifiability is not a simple binary nor a "yes" or "no" concept, few resources specifically address the question of when an individual statistically and quantitatively passes from the qualitative terms of "non-identified/non-identifiable" to "identified/identifiable." Despite this, resources do exist. Echoing the stances of privacy researchers and data-release precedent, the Information and Privacy Commissioner of Ontario has published the De-identification Guidelines for Structured Data, a guide on identifiability, privacy, and the release of information (Information and Privacy Commissioner of Ontario, 2016a). What is considered "identifiable" does not merely depend on the statistical probability of attribution; it is also affected by the sensitivity of the information (sometimes referred to as the degree of the potential "invasion of privacy") (Dyke et al., 2015; Information and Privacy Commissioner of Ontario, 2016a). The sensitivity of data considers the consequences to an individual if the privacy of such data were to be invaded. Some data is more sensitive because the contents it reveals are ordinarily of greater consequence. For instance, the repercussions of revealing an individual's psychiatric history are typically greater than those of revealing the same individual's rhesus blood type (Dyke et al., 2015). For more sensitive data deemed to present a higher invasion of privacy, the criteria for what is considered identifiable are stricter. What is considered non-identifiable for information with low sensitivity can conversely be considered identifiable if such data were to be considered highly sensitive (Information and Privacy Commissioner of Ontario, 2016a).
Ontario's De-identification Guidelines for Structured Data defines a reidentification risk of below 5% as acceptable for information with the potential for high sensitivity (a high invasion of privacy) (Information and Privacy Commissioner of Ontario, 2016a). In other words, if the combination of reasonably available information can "single out" 20 or fewer individuals from a pool of potential candidates, the individual whom the information is about should be considered "identifiable" if the information is considered sensitive (Information and Privacy Commissioner of Ontario, 2016a). The smaller the pool of potential candidates, the more identifiable an individual is. Here, COVID-19 related testing data are considered more sensitive due to their revealing implications for an individual's past or present health condition/status and the past medical testing that they have undergone. In Canada, such health-based information is generally considered sensitive if identifiable (Townsend v. Sun Life Financial, 2012). The de-identification guide thus recommends a threshold of 5% for high sensitivity information, 7.5% for medium, and 10% for low sensitivity data (Information and Privacy Commissioner of Ontario, 2016a).
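The relationship between these percentage thresholds and group sizes can be sketched in a few lines. This is a minimal illustration, assuming (as the "20 or fewer individuals" wording implies) that reidentification risk is modelled as one divided by the number of equally likely candidates; the function names are hypothetical, and exact fractions are used so that the boundary cases are not affected by floating-point rounding.

```python
from fractions import Fraction

# Thresholds quoted in the text from the Ontario guidelines.
RISK_THRESHOLDS = {
    "high": Fraction(5, 100),     # 5%
    "medium": Fraction(75, 1000), # 7.5%
    "low": Fraction(10, 100),     # 10%
}


def reidentification_risk(group_size):
    """Risk of picking the right person among `group_size` candidates.

    ASSUMPTION: risk is modelled as 1 / group size, i.e. every candidate
    in the pool is equally likely to be the person in the record.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return Fraction(1, group_size)


def is_identifiable(group_size, sensitivity="high"):
    """True when the risk is NOT below the threshold for this sensitivity."""
    return reidentification_risk(group_size) >= RISK_THRESHOLDS[sensitivity]


def smallest_acceptable_group(sensitivity="high"):
    """Smallest group size whose risk falls strictly below the threshold."""
    return int(1 / RISK_THRESHOLDS[sensitivity]) + 1


if __name__ == "__main__":
    for level in RISK_THRESHOLDS:
        print(level, smallest_acceptable_group(level))
```

Under this model, a pool of 20 candidates sits exactly at the 5% threshold and is therefore still treated as identifiable for highly sensitive data, while a pool of 21 falls below it.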
In evaluating the potential privacy risks of openly depositing SARS-CoV-2 genomic sequences and their minimum contextual data, we are aware that the four data fields of 1) age (displayed in intervals of 10 years), 2) gender, 3) province/territory of collection, and 4) date of collection are considered more problematic from a privacy and reidentification standpoint by various stakeholders (Sweeney, 2000; Golle, 2006; Rocher et al., 2019). The other 12 fields, while useful for statistical analyses, do not appreciably affect the risk of reidentification (except in situations where these other fields act as an indirect proxy for one of these four fields, which will also be discussed). Therefore, we will primarily explore the privacy and reidentification risks of those four fields. As a reminder, the main consideration is whether these four data fields, in combination with other "reasonably available" information, can allow for the identification of an individual, and accordingly, whether the various privacy laws of Canada and other jurisdictions come into play. Based on the most recent census data available from each province and territory, if the population is stratified by the three fields of age, gender, and province/territory, the number of individuals in the majority of categories greatly exceeds 20 (Statistics Canada, 2020a). Note that the data released by Statistics Canada use age intervals of 5 years instead of CanCOGeN VirusSeq's proposed 10-year intervals; the 5-year interval is more identifying, since a more specific age range is inherently more identifying.
This is true for even the most sparsely populated provinces/territories such as Prince Edward Island or Nunavut (Statistics Canada, 2020b; Statistics Canada, 2020c). This means that by using the contextual data identifiers of age category, province/territory, and gender, the vast majority of individuals are not considered identified to the threshold of 5%. In short, for most individuals in Canada, the three traits of province/territory, gender, and age do not constitute personal information, as they cannot be used to sufficiently identify an individual. Potential exceptions to this will be discussed in the next section. Lastly, the data field "collection date" may appear to be a strong quasi-identifier for stratifying the population. However, this is not an accurate conceptualization of reidentification, as a reasonably competent third party will not be able to link such information to the other contextual data fields. This is because the date that an individual is tested for COVID-19 cannot be considered "reasonably available" information (Townsend v. Sun Life Financial, 2012). A third-party individual cannot be expected to have access to an individual's COVID-19 testing history (including the date that the test was performed) and to use this information in conjunction with the contextual fields released in public databases to reidentify that individual. In other words, the field of collection date cannot be used as an identifier (Sweeney, 2000; Golle, 2006; Rocher et al., 2019). Taken together, the four proposed contextual data fields should not be considered "personal information" and can be shared publicly. It is, however, important to note that identifiability is a contextual matter that sometimes extends beyond factors such as the probability of attribution and data sensitivity.
There is a plethora of other factors, such as the costs of identification, the time available, the technology available, the population pool, etc., that must also be considered (Beauvais, 2020). In some circumstances, certain data fields may disproportionately raise the risk of reidentification, for example, the field of "province" in low-population provinces such as Prince Edward Island (estimated pop. of 159,713 in 2020), and these cases will be discussed in the following section (Statistics Canada, 2020b).
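The stratification check described in this section (group the population by the released fields, then compare each group's size with 20) can be sketched as follows. This is an illustrative sketch only: the field names, the helper functions, and the toy records are hypothetical and do not come from census data or from the VirusSeq pipeline.

```python
from collections import Counter

# More than 20 people per cell keeps a 1-in-n risk below the 5% threshold.
MIN_GROUP_SIZE = 21


def cell_counts(records, fields):
    """Count how many records share each combination of the given fields."""
    return Counter(tuple(record[f] for f in fields) for record in records)


def risky_cells(records, fields, min_group_size=MIN_GROUP_SIZE):
    """Return the cells (and their sizes) that are smaller than the minimum."""
    counts = cell_counts(records, fields)
    return {cell: n for cell, n in counts.items() if n < min_group_size}


if __name__ == "__main__":
    # Toy, made-up records for illustration; NOT real population data.
    population = (
        [{"region": "Region A", "gender": "F", "age_band": "30-39"}] * 25
        + [{"region": "Region A", "gender": "M", "age_band": "90-99"}] * 3
    )
    print(risky_cells(population, ("region", "gender", "age_band")))
```

In practice the counts would come from population tables rather than individual records, but the comparison against the minimum cell size is the same. Cells flagged by such a check correspond to the exceptions taken up in the following section.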
Situations Where Sharing the Sample's Province of Origin, Gender, and Date of Collection May Disproportionately Increase the Risk of Identification
Identifiability is contextual and contingent on factors such as the population pool and the number of confirmed cases in a specific province, among others (Information and Privacy Commissioner of Ontario, 2016a). This section discusses the reidentification risk in these scenarios. For provinces with a larger population, the risk of reidentification is inherently lower. The 2008 Gordon v. Canada (Health) federal court case established that the data field of "province" or "territory" can create a disproportionate risk of reidentification in provinces and territories with a smaller population (such as Prince Edward Island) (Gordon v. Canada (Health), 2008). Recognizing this, the CanCOGeN project has proposed to begin the data sharing process by replacing the value of the "province"/"territory" field with "other" in all provinces/territories outside of British Columbia, Alberta, Ontario, and Quebec. The population, among other factors, in these four latter provinces allows for the safe inclusion of this data field without appreciably raising the possibility of reidentification of such individuals. As a final note, data providers should be cautious about the level of geographic specificity they reveal when providing methodologically relevant fields such as "collection agency." For example, it is not uncommon for the collection agency to be the name of a local hospital, which can then reveal a more detailed geographical location and increase the risk of reidentification. In short, measures should be taken so that information indicating an inappropriate level of geographic specificity is not provided.
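The proposed handling of the province/territory field amounts to a simple generalization rule, sketched below. This is a minimal illustration, assuming the field holds full province names spelled as shown; the constant and function names are hypothetical and do not describe the actual VirusSeq implementation.

```python
# Provinces whose names are retained under the proposal described above.
# ASSUMPTION: values are stored as full English province names.
RETAINED_PROVINCES = frozenset(
    {"British Columbia", "Alberta", "Ontario", "Quebec"}
)


def public_province(province):
    """Return the value to publish for the province/territory field."""
    return province if province in RETAINED_PROVINCES else "other"


if __name__ == "__main__":
    for name in ("Ontario", "Prince Edward Island", "Nunavut"):
        print(name, "->", public_province(name))
```

A similar allow-list approach could be applied to free-text fields such as "collection agency," where a hospital name might otherwise reveal a finer geographic location than intended.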
Disclosing age and gender in conjunction with other fields can increase the risk of reidentification (Sweeney, 2000; Golle, 2006; Rocher et al., 2019). Despite this increase, the ability to identify such individuals still falls below the previously mentioned threshold of 5%, as already explained. It is nonetheless important to note that the privacy risks of disclosing age are not uniform, as very elderly or very young individuals make up a significantly smaller fraction of the population, and this should be considered (Statistics Canada, 2020a).
In some cases, provincial data report forms include non-traditional options for gender (e.g., non-binary and transgender) (CanCOGeN, 2021). Because individuals who do not conform to traditional binary terms make up a very small percentage of the population, there is an increased risk of reidentification (Waite and Denier, 2019). Accordingly, VirusSeq has proposed to group all non-traditional gender options under "not-disclosed" when publicly archived, consistent with what is done in other initiatives (Statistics Canada, 2021). At the same time, such demographic information on non-binary individuals should still be collected, as it contributes to equity, diversity, and inclusion, and improves the scientific representation of individuals and groups traditionally excluded from research (Bentley et al., 2017). These efforts will better ensure that the research conducted and its accompanying medical and technical advances will represent marginalized individuals and groups as well as those who are traditionally well-represented. To reduce the potential privacy risks of this inclusion, this demographic data could be made available through controlled-access procedures.
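The two-tier handling of the gender field described above (a generalized value in the public archive, the collected value under controlled access) can be sketched as follows. This is an illustrative sketch under stated assumptions: the record layout, the binary labels "male" and "female", and the function name are hypothetical and are not taken from the VirusSeq specification.

```python
# ASSUMPTION: these are the labels used for the two binary options.
BINARY_LABELS = {"male", "female"}


def split_views(record):
    """Return (public_view, controlled_view) copies of a contextual record.

    The public view replaces any gender value outside the binary labels
    (including a missing value) with "not-disclosed"; the controlled-access
    view keeps the value exactly as collected.
    """
    gender = record.get("gender")
    normalized = gender.strip().lower() if isinstance(gender, str) else None

    public_view = dict(record)
    if normalized not in BINARY_LABELS:
        public_view["gender"] = "not-disclosed"

    controlled_view = dict(record)
    return public_view, controlled_view


if __name__ == "__main__":
    public, controlled = split_views({"sample_id": "S1", "gender": "non-binary"})
    print(public["gender"], "|", controlled["gender"])
```

Keeping the two views as separate copies means the collected value is never overwritten, which matches the text's point that the information should still be gathered even though it is withheld from the public archive.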
The date of collection is another data field that had originally been thought to unacceptably increase privacy risks. Most of the current Health Canada diagnostic tests used for SARS-CoV-2 are based on Reverse Transcription Polymerase Chain Reaction (RT-PCR), with results typically obtained 24–48 h after the date of sample collection (Health Canada, 2020). These delays considerably reduce the chances of associating the reported daily cases with the specific collection date. Furthermore, the typical range is not absolute, making it extremely unlikely that the testing date can be associated with the data release; as such, it is equally unlikely for the collection date to be used as an identifier even if such information were to become public. In conjunction with what has already been written about the "reasonably available" standard, the date of collection does not appreciably increase the risk of reidentification. Notably, the introduction and mass dissemination of rapid COVID-19 testing kits, and potentially other future advancements, may lead to the collection date and testing date being the same (Aguiar et al., 2020; Albert et al., 2021). If this were to unfold, and this date was disclosed with other identifying fields (e.g., province, when the province in question is "small", gender, age, and the number of daily cases by province/neighbourhood), the risk of reidentification may increase. Whether any such increase makes a meaningful difference in terms of privacy is, however, questionable and would be case-dependent and contingent on multiple factors. Therefore, we recommend periodically monitoring reidentification risk to account for the increased efficiency of diagnostic methods and other relevant developments that could potentially increase privacy risks.
Does the Definition of "Identifiable" Include Self-Identification?
In the previous sections, we have emphasized that the concept of identifiability is an important component in the definition of "personal information." Concerns regarding the risk of individual self-identification in publicly available formats have been raised. To be specific, if an individual is capable of identifying themself based on a list of contextual data and their viral genomic sequence in a public data repository or reported data, would that then mean that their information should be considered "identifiable" and cannot be shared publicly? The right to privacy is historically defined as being able to protect one's personal life from intrusion by third parties (Warren and Louis, 1890). Similarly, in contemporary legislation, the concept of identifiability relates to identifiability from the perspective of an unauthorized third party and not that of an individual with access to high-level privileged information. The emphasis on third parties is particularly important. The central notion proposed is that identifiability should be evaluated from the perspective of a third party, and not the individual themselves. This is confirmed by various precedents set by Canadian and European case law, best-practice documents, and peer-reviewed literature, which assess identifiability from a third person's perspective. In the Canadian context, in the 2008 Gordon v. Canada (Health) case, the federal courts considered the likelihood of individual reidentifiability specifically through the perspective of a third party attempting to reidentify an individual with access to information that is reasonably available (Gordon v. Canada (Health), 2008). More recently, in the 2019 case Canada (Information Commissioner) v. Canada (Public Safety and Emergency Preparedness), the federal courts once more assessed what constituted "identifiable" and, accordingly, the definition of "personal information" (Canada (Information Commissioner) v. Canada (Public Safety and Emergency Preparedness), 2019). Recall that the Canadian Privacy Act states that information is personal "if there is a serious possibility that the information could be used to identify an individual either on its own or when combined with other available information." In this case, the meaning of "other available information" was explored. The court reasoned, "the goal of the Privacy Act (…) is to prevent the undue disclosure of one's personal information to others, not to oneself (…). That an individual might know that it is their name that is redacted from a document, for instance, does not make the rest of the document personal information." (Canada (Information Commissioner) v. Canada (Public Safety and Emergency Preparedness), 2019). Similarly, in the EU Court of Justice, the issue of what constituted personal information was once more considered through the perspective of a third party attempting to reidentify an individual (Patrick Bryer v. Bundesrepublik Germany, 2016). Likewise, the De-identification Guidelines for Structured Data released by the Information and Privacy Commissioner of Ontario evaluate and discuss the risks of reidentification from the perspective of either a "prosecutor" or "journalistic" third party (Information and Privacy Commissioner of Ontario, 2016a). Finally, in all scientific publications reviewed, identifiability is also always written about in terms of an unauthorized third party (Sweeney, 2000; Golle, 2006; Rocher et al., 2019; Beauvais, 2020).
The legal and logical basis of identifiability is always referred to from the perspective of an unauthorized third party with access to reasonably available information. The focus on third parties with respect to identifiability is justified given that an individual's knowledge of themselves and their personal information typically greatly exceeds that of any third party. A self-identification criterion would create a subjective, individually variable, and arbitrary standard to determine the exact definition and scope of personal information. In this sense, using a self-identification criterion would create an unnecessary, illogical, and inconsistent barrier to the free flow of information and ideas.
Conclusion
Our paper presents the first attempt to analyze the privacy risks of sharing viral genomic sequences and their accompanying contextual data in the public domain, and this is likely relevant for many countries. The open disclosure of a minimal set of contextual data fields associated with the viral samples is crucial to the timely promotion of research, collaboration, and scientific advancement at a time when it is desperately needed. We demonstrated, using the Canadian privacy and public health framework, that it is not contrary to privacy laws to share a small amount of such data in association with genomic viral sequences. However, in certain scenarios when privacy risks may be disproportionately elevated, we also recommend considering special mitigating measures to significantly reduce risks. Measures such as disclosing age in intervals rather than the exact age, and revealing the province/territory of origin only for Canadian provinces and territories with sufficiently large populations, can be essential in ensuring the privacy of individuals. Despite our findings that legal privacy barriers are surmountable, concerns outside privacy are also appreciable. For example, despite an inability to sufficiently single out an individual, broad contextual information can still negatively implicate and stigmatize certain social groups or communities (Quigley, 2012). Although beyond the scope of this paper, issues beyond privacy must also be considered.
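The two mitigating measures named above (age intervals and population-dependent disclosure of province/territory) can be illustrated with a short sketch. Everything in it is an assumption made for illustration: the function names, the ten-year interval width, the region labels, the population figures, and the threshold are hypothetical placeholders, not values recommended by this paper or by any authority.

```python
def age_interval(age, width=10):
    """Map an exact age to an interval label, e.g. 37 -> '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"

def generalize(record, populations, min_population):
    """Return a copy of the record with age reported as an interval and
    the region withheld when its population is below the threshold."""
    region = record["region"]
    if populations.get(region, 0) < min_population:
        region = "not disclosed"
    return {"age": age_interval(record["age"]), "region": region}

# Placeholder figures, invented for illustration only.
populations = {"Region A": 5_000_000, "Region B": 40_000}
threshold = 100_000

large = generalize({"age": 37, "region": "Region A"}, populations, threshold)
small = generalize({"age": 37, "region": "Region B"}, populations, threshold)

print(large)
print(small)
```

Here the record from the larger region keeps its region label, while the record from the smaller region has it withheld; in both cases only the age interval is released.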
The COVID-19 pandemic has rapidly evolved into a devastating global public health and economic crisis. In these circumstances, the free flow of low-privacy-risk viral sequences and their associated contextual data is key to better understanding key factors surrounding COVID-19, from patient variability and transmission to the creation of better testing, effective treatments, reliable vaccines, and beyond. Global public health emergencies should be understood by policymakers and privacy bodies as creating an imperative to review whether existing privacy laws offer sufficient flexibility to permit public health authorities and the research community to carry out their work for the public good. The Office of the Privacy Commissioner of Canada declared that "during a public health crisis, privacy laws still apply, but they are not a barrier to appropriate information sharing." Similar statements have also been made by other provincial privacy commissioners, including those of Alberta, Saskatchewan, and Ontario (Office of the Privacy Commissioner of Canada, 2020). Sharing SARS-CoV-2 genomic sequences alongside a minimal set of contextual data in the public domain with appropriate mitigating measures is, according to our findings, not contrary to the protection of personal data and privacy and is necessary for providing governments and researchers with the best available evidence to inform intervention. Our work generally addresses concerns surrounding personal information and privacy. It does not explore the validity of arguments based on laws providing additional emergency powers to public health authorities in times of pandemics. It is our view that robust pathogen genomic surveillance should be facilitated in this day and age given the well-documented benefits in disease prevention and intervention responses (Grubaugh et al., 2019; Naveca et al., 2020). Indeed, while such data sharing is perhaps "beneficial" in regular times, in a global pandemic, data sharing ought to be characterized as both urgent and "necessary."
Author Contributions
LS: Performed research, authored sections of the manuscript, and coordinated between different experts. Authored the introduction, the abstract, and the conclusion; provided input on other sections. HL: Performed research and analysis, authored sections of the manuscript, and coordinated between different experts. Authored the sections on personal information and privacy, identifiability of viral sequences, contextual data and personal data/identifiability, data fields with disproportionate risks, and self-identification and personal information; provided input on other sections. FB, ErG, EmG, WH, SS-K, SM, GD, MZ and YJ (Chair of Committee): provided input and suggestions on paper direction and content.
Funding
This research is funded by Genome Canada, Genome Quebec, the Government of Canada and the Ministère de l'Économie et de l'Innovation du Québec. The award number for Genome Quebec is PT 89229.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The reviewer CHH declared a past co-authorship with the authors YJ, MHZ and LS to the handling editor.
Publisher's Note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Acknowledgments
We would like to thank and acknowledge the following: Fonds de recherche du Québec – Santé, Junior 1 Research Scholar program, and Canadian COVID Genomics Network (CanCOGeN) Virus Genome Sequencing Project (VirusSeq) Genome Canada/Genome Quebec.
References
Act respecting Access to documents held by public bodies and the Protection of personal information, Québec (1982). CQLR c A-2.1, art. 57. Available at: http://legisquebec.gouv.qc.ca/en/ShowDoc/cs/A-2.1 (Accessed January 16, 2020).
Aguiar, E. R. G. R., Navas, J., and Pacheco, L. G. C. (2020). The COVID-19 Diagnostic Technology Landscape: Efficient Data Sharing Drives Diagnostic Development. Front. Public Health 8, 309. doi:10.3389/fpubh.2020.00309
Albert, E., Torres, I., Bueno, F., Huntley, D., Molla, E., Fernández-Fuentes, M. Á., et al. (2021). Field Evaluation of a Rapid Antigen Test (Panbio™ COVID-19 Ag Rapid Test Device) for COVID-19 Diagnosis in Primary Healthcare Centres. Clin. Microbiol. Infect. 27, e7–472. doi:10.1016/j.cmi.2020.11.004
Bentley, A. R., Callier, S., and Rotimi, C. N. (2017). Diversity and Inclusion in Genomic Research: Why the Uneven Progress? J. Community Genet. 8, 255–266. doi:10.1007/s12687-017-0316-6
Bohn, M. K., Lippi, G., Horvath, A., Sethi, S., Koch, D., Ferrari, M., et al. (2020). Molecular, Serological, and Biochemical Diagnosis and Monitoring of COVID-19: IFCC Taskforce Evaluation of the Latest Evidence. Clin. Chem. Lab. Med. 58 (7), 1037–1052. doi:10.1515/cclm-2020-0722
Bonomi, L., Huang, Y., and Ohno-Machado, L. (2020). Privacy Challenges and Research Opportunities for Genomic Data Sharing. Nat. Genet. 52 (7), 646–654. doi:10.1038/s41588-020-0651-0
Canada (Information Commissioner) v. Canada (Public Safety and Emergency Preparedness) (2019). FC 1279. Available at: https://canlii.ca/t/j35r2 (Accessed October 25, 2021).
Civil Code of the People's Republic of China (2020). The Thirteenth National People's Congress on May 28th, 2020. (Accessed January 21, 2021).
Davies, N. G., Abbott, S., Barnard, R. C., Jarvis, C. I., Kucharski, A. J., Munday, J. D., et al. (2021). Estimated Transmissibility and Impact of SARS-CoV-2 Lineage B.1.1.7 in England. Science 372. doi:10.1126/science.abg3055
Du, Z., Xu, X., Wu, Y., Wang, L., Cowling, B. J., and Meyers, L. A. (2020). Serial Interval of COVID-19 among Publicly Reported Confirmed Cases. Emerg. Infect. Dis. 26, 1341–1343. doi:10.3201/eid2606.200357
Dye, C., Bartolomeos, K., Moorthy, V., and Kieny, M. P. (2016). Data Sharing in Public Health Emergencies: a Call to Researchers. Bull. World Health Organ. 94 (3), 158. doi:10.2471/blt.16.170860
Dyke, S. O. M., Cheung, W. A., Joly, Y., Ammerpohl, O., Lutsik, P., Rothstein, M. A., et al. (2015). Epigenome Data Release: a Participant-Centered Approach to Privacy Protection. Genome Biol. 16 (1), 142. doi:10.1186/s13059-015-0723-0
Edelstein, M., Lee, L. M., Herten-Crabb, A., Heymann, D. L., and Harper, D. R. (2018). Strengthening Global Public Health Surveillance through Data and Benefit Sharing. Emerg. Infect. Dis. 24 (7), 1324–1330. doi:10.3201/eid2407.151830
General Data Protection Regulation (2016). Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the Protection of Natural Persons with Regard to the Processing of Personal Data and on the Free Movement of Such Data and Repealing Directive 95/46/EC. Available at: https://gdpr-info.eu/art-4-gdpr/ (Accessed February 12, 2020).
Golle, P. (2006). "Revisiting the Uniqueness of Simple Demographics in the US Population," in Proceedings of the 5th ACM Workshop on Privacy in Electronic Society, 77–80. doi:10.1145/1179601.1179615
Griffiths, E. J., Timme, R. E., Page, A. J., Alikhan, N.-F., Fornika, D., Maguire, F., et al. (2020). The PHA4GE SARS-CoV-2 Contextual Data Specification for Open Genomic Epidemiology. doi:10.20944/preprints202008.0220.v1
Grubaugh, N. D., Ladner, J. T., Lemey, P., Pybus, O. G., Rambaut, A., Holmes, E. C., et al. (2019). Tracking Virus Outbreaks in the Twenty-First Century. Nat. Microbiol. 4 (1), 10–19. doi:10.1038/s41564-018-0296-2
Levin, A. T., Hanage, W. P., Owusu-Boaitey, N., Cochran, K. B., Walsh, S. P., and Meyerowitz-Katz, G. (2020). Assessing the Age Specificity of Infection Fatality Rates for COVID-19: Systematic Review, Meta-Analysis, and Public Policy Implications. Eur. J. Epidemiol. 35, 1123–1138. doi:10.1007/s10654-020-00698-1
Polack, F. P., Thomas, S. J., Kitchin, N., Absalon, J., Gurtman, A., Lockhart, S., et al. (2020). Safety and Efficacy of the BNT162b2 mRNA Covid-19 Vaccine. N. Engl. J. Med. 383 (27), 2603–2615. doi:10.1056/nejmoa2034577
Quigley, D. (2012). Applying Bioethical Principles to Place-Based Communities and Cultural Group Protections: The Case of Biomonitoring Results Communication. J. Law Med. Ethic. 40 (2), 348–358. doi:10.1111/j.1748-720X.2012.00668.x
Rocher, L., Hendrickx, J. M., and de Montjoye, Y. A. (2019). Estimating the Success of Re-identifications in Incomplete Datasets Using Generative Models. Nat. Commun. 10, 3069–9. doi:10.1038/s41467-019-10933-3
Schriml, L. M., Chuvochina, M., Davies, N., Eloe-Fadrosh, E. A., Finn, R. D., Hugenholtz, P., et al. (2020). COVID-19 Pandemic Reveals the Peril of Ignoring Metadata Standards. Sci. Data 7 (1), 188. doi:10.1038/s41597-020-0524-5
Shean, R. C., and Greninger, A. L. (2018). Private Collection: High Correlation of Sample Collection and Patient Admission Date in Clinical Microbiological Testing Complicates Sharing of Phylodynamic Metadata. Virus. Evol. 4, vey005. doi:10.1093/ve/vey005
Sorani, M. D., Yue, J. K., Sharma, S., Manley, G. T., Ferguson, A. R., et al. (2015). Genetic Data Sharing and Privacy. Neuroinform 13 (1), 1–6. doi:10.1007/s12021-014-9248-z
To, K. K.-W., Hung, I. F.-N., Ip, J. D., Chu, A. W.-H., Chan, W.-M., Tam, A. R., et al. (2020). Coronavirus Disease 2019 (COVID-19) Re-infection by a Phylogenetically Distinct Severe Acute Respiratory Syndrome Coronavirus 2 Strain Confirmed by Whole Genome Sequencing. Clin. Infect. Dis. 73, e2946–e2951. doi:10.1093/cid/ciaa1275
van Panhuis, W. G., Paul, P., Emerson, C., Grefenstette, J., Wilder, R., Herbst, A. J., et al. (2014). A Systematic Review of Barriers to Data Sharing in Public Health. BMC Public Health 14, 1144. doi:10.1186/1471-2458-14-1144
Volz, E., Mishra, S., Chand, M., Barrett, J. C., Johnson, R., Geidelberg, L., et al. (2020). Transmission of SARS-CoV-2 Lineage B.1.1.7 in England: Insights from Linking Epidemiological and Genetic Data. medRxiv 30, 20249034. doi:10.1101/2020.12.30.20249034
Waite, S., and Denier, N. (2019). A Research Note on Canada's LGBT Data Landscape: Where We Are and what the Future Holds. Can. Rev. Sociology/Revue canadienne de sociologie 56 (1), 93–117. doi:10.1111/cars.12232
Walport, M., and Brest, P. (2011). Sharing Research Data to Improve Public Health. The Lancet 377, 537–539. doi:10.1016/s0140-6736(10)62234-9
Source: https://www.frontiersin.org/articles/10.3389/fgene.2021.716541/full