The Use Of Health Data For Research Purposes Nursing Essay

College of Human and Health Sciences

Swansea University

Assignment for module 3 (SHIM13)

Module number:

SHIM13

Module name:

Communication systems

Title of assignment:

Written essay: Critically analyse the issues that arise from using secondary health data in a specific application of your choice. (4000 words; 100% of marks)

Name of student:

Tim Knifton

Student ID number:

717819

Word count:

Declaration:

I confirm that:

this is my own independent work, except where clearly stated otherwise;

any work by another author has been clearly distinguished from my own work;

this work has not been accepted in substance for any other course;

I agree that this work may be analysed electronically for evidence of plagiarism.

Abstract

Clinicians' day-to-day interaction with electronic health records (EHRs) generates rich, patient-level data that is gathered primarily for healthcare treatment. This data has the potential to be used for secondary purposes including research, clinical trials, public health, population health management and quality assurance measurement. This paper critically explores the use of primary data for research as a secondary use and the practical and ethical issues that need to be considered when using such highly sensitive data. It is shown that secondary use of health data raises many difficulties. However, with forethought, planning and understanding, electronic primary health data has the potential, through secondary use, to improve individual lives and the health of the population in general, and to benefit organizational functioning.

Contents page

Introduction

Definition of secondary data

Ethical questions for using health data from an Electronic Health Record system for a secondary purpose

Deontology versus self utilitarianism

The use of Health Data for research purposes

Issues associated with secondary use of health data

Data stored as images

Unstructured data/free text

Missing data/inaccuracies

Variable quality across institutions

Lack of basic security requirements

Data protection and common law duty of confidentiality

Patient consent

Patient safety

Interoperability

Data linkage

Data security when using secondary data

Pseudonymisation

De-personalisation/De-identification

Anonymisation

Encryption

Conclusion

References

Introduction

There is acknowledgement that the face of health care is changing as new technologies are being incorporated into the existing infrastructure (Meingast, Roosta & Sastry, 2006). In line with confidentiality standards and appropriate records management, it is essential for modern health information systems to strike the right balance between safeguards that protect the privacy of patients’ data and safe access to and retrieval of information for primary and secondary uses (Flores Zuniga, Win & Susilo, 2010).

Secondary use of health data applies health information (or primary data) to uses outside of direct health care delivery (Safran et al., 2007). Such rich, patient-level data can be utilized for secondary purposes such as population health management (Zheng, Mei & Hanauer, 2011), epidemiological research (Stakic & Tasic, 2010), enhancing quality assurance (McQuaid, Zheng, Melville & Green, 2009) and health economy (Pommerening & Reng, 2004).

The possibility of reusing this information for secondary purposes could serve as a major impetus for medical research, decision making in health policies, and quality assurance in health care (Holzer & Gall, 2011), giving primary health data the power to extend its usefulness and portability and to support efficient planning.

This paper critically analyses the secondary use of health data in the application of research and the general pitfalls in using such information for secondary purposes.

Definition of secondary data

Secondary data is defined as the data that researchers do not create themselves but use in their research (Martin, Jones III & Gomez, 2010). Electronic systems could be treasure troves for secondary data use, a term referring to the aggregation of EHRs for research and other purposes (Torres, 2010). With the escalating costs of health care, health care managers and policy makers are increasingly interested in preventive approaches to disease management (Ye, Isaman & Barhak, 2012). Secondary use would help primary healthcare institutions to better understand certain types of disease and how to minimise both the impact on the patient and the financial cost to the organisation.

Although there is great potential for secondary use, health data contains so much information relating to patient care and treatment that access to and secondary use of the data pose complex ethical, political, technical, and social challenges (Safran et al., 2006).

Ethical questions for using health data from an Electronic Health Record system for a secondary purpose

Reuse of EHR data has been limited by a number of factors, including concerns about the quality of the data and their suitability for research (Weiskopf & Weng, 2012). In defence of EHRs, the ethical and moral justification for their creation, storage and processing derives from the fact that they are instrumental for the protection of health (AKA, 2007); reuse of the data may not have been an initial consideration when the primary purpose was to treat patients.

Perhaps the central ethical question is whether for-profit secondary uses of data are appropriate and justifiable, and if so, what privacy safeguards should be employed (Sittig & Singh, 2011). In support of secondary use, it is worth considering that the proposed intervention is not carried out exclusively for the benefit of the individual patient, but for the potential benefit of society as a whole (Kosseim & Brady, 2008), which refers back to the benefits that were identified in the definition of secondary data such as disease management.

To reinforce ethical behavior, institutions usually provide employee education emphasizing the importance of protecting patient privacy and the confidentiality of patient information (LaRue, Newbold, Saadawi & Courtney, 2009). Privacy, quality and availability thus become primary ethical concerns (Kluge, 2007).

Deontology versus self utilitarianism

Ethics is an understanding of the nature of conflicts arising from moral imperatives and how best we may deal with them (Avasthi, Ghosh, Sarkar & Grover, 2013). At its core, deontology claims that the right is independent from the good (Gray & Schein, 2012). Deontological theories provide strong support for protecting research participants and whole communities of people, even if protections for human subjects slow research or the acquisition of knowledge (Coughlin, 2006).

On the other hand, utilitarianism evaluates actions according to their consequences (Fortes & Zoboli, 2002), whereas deontology emphasizes the primary importance of respecting the basic rights of human beings as autonomous beings, i.e. free and thereby capable of establishing their own moral norms and rules (Ess, 2007).

In contrast to utilitarianism, from a deontological point of view the end does not justify the means; rather, the means themselves are subject to the need for justification (Witte & Heitkamp, 2009). Both perspectives colour societal perceptions of a researcher’s conduct and, in particular, the research participant’s autonomy in relation to consent (Armitage, 2005).

The use of Health Data for research purposes

Research is described as the systematic and rigorous process of enquiry that aims to describe processes and develop explanatory concepts and theories in order to contribute to a scientific body of knowledge (Bowling, 2009).

If such data were readily available, their use would be expected to benefit planning and disease management. Biomedical, health services, and policy research, particularly in the areas of population studies and public health, depend heavily on the ready availability of data about patients and populations (Bloomrosen & Detmer, 2008), and such use of data has the power to transform how health care is delivered in the future (Garrett, 2010).

Secondary uses of data could include quality measurement, public health surveillance, and patient access to data about their illness (Hersh, 2007). Electronic records could facilitate new interfaces between care and research environments, leading to great improvements in the scope and efficiency of research (Powell & Buchan, 2005). Retrospective analysis of health data holds promise to expedite scientific discovery in medicine and constitutes a significant part of clinical research (Botsis, Hartvigsen, Chen & Weng, 2010).

Large clinical databases are becoming increasingly available to researchers as more hospitals and practices adopt EHR systems (Abhyankar, Demner-Fushman & McDonald, 2012). The essential feature of an EHR seems to be a longitudinal collection of personal health information of a single individual, entered or accepted by health care providers and stored electronically (Noseworthy, 2004).

Elkin et al. (2010) are of the opinion that research stands to gain substantially by employing EHR health data for secondary uses. EHR-supported research can have direct and primary practice rewards related to education and policy development, practice refinement, and clinical discovery (Goldwein, 2007). In support of this, health research based on the secondary use of health data contributes to our present level of knowledge of the causes, trends and natural history of diseases and symptoms (Bonney, 2009).

However, the disadvantages of secondary data are related to the fact that their selection and quality, and the methods of their collection, are not under the control of the researcher, and that they are sometimes impossible to validate (Sørensen, Sabroe & Olsen, 1996). In addition, the data collected might be so extensive that the individual whose job it is to interpret the findings can potentially arrive at many different, even conflicting, conclusions (Stewart & Kamins, 1992).

To summarise, using health data for research will not replace randomized clinical trials as a definitive research method for specific issues, but it will offer the capacity for real-time learning from the experience of tens of millions of people and will greatly increase the ability to generate and test hypotheses (Etheredge, 2007).

Issues associated with secondary use of health data

Data stored as images

Images, diagnostic reports, and evidence documents derived from the processing of images represent important components of a patient’s medical record (Noumeir & Pambrun, 2009). To the extent that imaging data are used in planning and clinical decision making, they should be as accessible as data that are stored in alphanumeric form (Seto & Friedman, 2012).

Unstructured data/free text

Two types of data entry exist nowadays: free text based and structured (Hanzlicek, Spidlen, Heroutova & Nagy, 2005). At times, free text may be necessary to communicate precisely what is happening, such as a complex symptom history (Bagley & Waldren, 2011). While codified data entered through structured templates are generally more desirable, a significant amount of clinical documentation continues to exist in an unstructured, narrative format (Zheng, Mei & Hanauer, 2011).

In general, high quality coded data are much easier to use than free text data, although not all data are equally valuable or reliable (Manion et al., 2012). For example, codes might be derived from text which describes a suspected or possible rather than a certain diagnosis (Nicholson, Tate, Koeling & Cassell, 2011). Furthermore, accessing free-text information in the EHR can be difficult and time consuming (Natarajan, Stein, Jain & Elhadad, 2010).

This would suggest that free text data would not be considered to be of as high quality as structured data, since doctors tend to record consultations in a narrative format using a field that allows free and unrestricted entry.

Missing data/inaccuracies

The more accurate and complete the medical record, the more useful it becomes for meeting the needs of both patients and providers (Hassol et al., 2004). In other words, the integrity of the data is a precondition for any further use.

To compound the problem, language differences directly reflect IT system differences, not only in display and command language but also in the coding schema for medical terms and concepts (Kimura et al., 2008). One of the major challenges for the re-use of EHR data is that data from clinician documentation are not always of optimal quality for such re-use (Hersh, 2011).

Variable quality across institutions

Geissbuhler et al. (2012) state that healthcare is a data intensive enterprise.

Because patients visit many hospitals and care environments, paper records contribute to fragmentation of care: encounters of an individual patient are often provided at multiple clinical locations and documented in site-specific paper charts that are only locally accessible (Lobach & Detmer, 2007). A patient’s health records can be dispersed over multiple sites without the healthcare professional having access to (or even knowledge of) this data (Demuynck & De Decker, 2005).

There is also the issue of coding quality and of changes to coding. For example, the rate at which a given ICD-9 code is used may vary across institutions, even though the rate at which the condition exists may not vary (Chute et al., 2011).

Lack of basic security requirements

Whenever information is important, information security is essential (Posthumus, 2004). Secondary data exploitation can mean data being accessed by those who are not directly involved in patient care, for example research assistants and students, and appropriate safeguards are necessary (Grant et al., 2006).

In addition, electronic health information is portable and mobile; the ease with which information can be disseminated through EHRs raises concern about the potential for unauthorized access to and use of this information (McGuire et al., 2008). The deliberate or unintentional release of individually identifiable health information can have devastating effects on an individual (Lumpkin, 2007). For instance, a history of substance abuse or HIV infection could result in discrimination or harassment (Riedl, Grascher & Neubauer, 2007), and its release would breach confidentiality and human rights laws.

Data protection and common law duty of confidentiality

Data protection laws seek to protect user rights and rely in part on a certain view of data location and related security practices to ensure those rights are maintained (Desai, 2013).

Doctors' longstanding common law duty of confidentiality to their patients has been supplemented by restrictions on processing electronic and paper based records in the Data Protection Act 1998 (Al-Shahi & Warlow, 2000).

However, there are several exemptions, which apply to the processing of 'personal data' for 'research purposes under relevant conditions' (Redsell & Cheater, 2001). To comply with the Data Protection Act, research using patient data requires either the patient’s consent or that the data should be anonymised, so that patients cannot be identified (Marsh & Reynard, 2009). Identifiable data are data that contain detail sufficient to identify a single individual, exclusive from any other individual, with a high degree of certainty (Yiannakoulias, 2011).

Patient Consent

The expectation of consent to any secondary data use is further strengthened by the patients’ belief that information about them that is developed in the health care encounter will be used solely for therapeutic purposes (Kluge, 2004). However, some researchers are concerned that requirements for the patient’s consent and anonymity will undermine their research (Evans & Ramay, 2001; Roberts & Wilson, 2001; Cox, 2001, cited in Win, 2005). Nevertheless, consent can be withdrawn at any time, and it is a fundamental right of every individual to demand privacy because the disclosure of sensitive data may cause serious problems (Neubauer & Kolb, 2009).

Furthermore, Laurie (2008) notes that consent is not always practicable or possible to obtain. Even so, patients should exercise as much choice over the content and movement of their health records as is consistent with good clinical care and the lack of serious harm to others (Kalra & Ingram, 2006).

Patient Safety

The integrity of data in multi-user environments, as well as ensuring accountability of actions, is essential for patient safety (Nykänen, Ruotsalainen, Blobel & Seppälä, 2009). It is worth noting that not all adverse outcomes (including death) are due to problems with either the process or structure of care (Battles & Lilford, 2003); medical errors are sometimes caused by patients not disclosing medical facts (Binns, 2004).

Interoperability

In a general sense, "interoperability" simply means to be able to work together (Scott, 2009). The interoperability of an EHR system is defined as the ability of two or more applications to communicate in an effective manner without compromising the content of the transmitted data (Moghaddasi, Hosseini, Asadi & Ganjali, 2011). In support of this, Shea & Hripcsak (2010) state that EHRs that are interoperable can connect not only to each other but also to common services, allowing a sleeker and more seamless approach.

The development and promotion of standards is essential to the functionality of electronic health record systems, especially in terms of interoperability (Fidahussein, Friedlin & Grannis, 2011), as the lack of interoperability impedes access to data (Kuperman, Blair, Franck, Devaraj & Low, 2010).

Tao et al. (2012) also support this opinion, stating that interoperability is one of the most important goals for meaningful use of Electronic Health Records. Interoperability for health information systems therefore requires communication and accurate and consistent data exchange (Gliklich & Dreyer, 2010), as without interoperability even electronic medical records will remain fragmented (Mandl, Szolovits, Kohane, Markwell & MacDonald, 2001).

In any case, whether in paper or electronic format, the data derived from various sources are complex, and validating the data is crucial to the accuracy of the measurement process (Feliciano, MSHI, Allison & Houston, 2008).

Data linkage

Holman et al. (2008) define data linkage as "the bringing together from two or more different sources, data that relate to the same individual, family, place or event". It is the task of identifying, matching and merging records that correspond to the same entities from several databases or even within one database (Christen, 2012).

Lowrance (2003) adds that matching and combining data from multiple databases, especially at the individual level, is a powerful tool. Unfortunately, it is not a feature of all computer systems (De Lusignan & Van Weel, 2006).

Data linkage requires expertise in several areas, including knowledge of the datasets to be linked (Bradley, Penberthy, Devers & Holden, 2010), bearing in mind that when linked health information is used for clinical care, reliability and accuracy of both identifiers and data are essential (Kelman, Bass & Holman, 2007). Linked data may create additional concerns about error if cases are not linked accurately (Bohensky et al., 2011). Incorrect data linkage can lead to incomplete and inaccurate data, and in the case of clinical data this could result in a significant and immediate impact on the patient (Win et al., 2004).
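The matching task described above can be illustrated with a minimal sketch of deterministic (exact-match) linkage in Python. The field names (surname, dob, postcode), the two example sources and the normalisation rule are all assumptions made for illustration; they are not drawn from any of the cited systems, and real linkage services typically use agreed identifiers and probabilistic methods.

```python
from collections import defaultdict

def normalise(value: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not block a match."""
    return " ".join(value.lower().split())

def link_key(record: dict) -> tuple:
    # Hypothetical matching fields chosen for illustration only.
    return (normalise(record["surname"]), record["dob"], normalise(record["postcode"]))

def link_records(source_a: list, source_b: list) -> list:
    """Deterministic linkage: pair records from the two sources whose keys agree exactly."""
    index = defaultdict(list)
    for rec in source_b:
        index[link_key(rec)].append(rec)
    pairs = []
    for rec in source_a:
        for match in index.get(link_key(rec), []):
            pairs.append((rec, match))
    return pairs

# Invented example records from two hypothetical sources.
hospital = [
    {"id": "H1", "surname": "Jones ", "dob": "1950-02-01", "postcode": "SA2 8PP"},
    {"id": "H2", "surname": "Evans", "dob": "1971-11-30", "postcode": "SA1 1AA"},
]
gp = [
    {"id": "G7", "surname": "jones", "dob": "1950-02-01", "postcode": "sa2 8pp"},
    # Same person as H2, but the day of birth was mistyped at entry.
    {"id": "G9", "surname": "Evans", "dob": "1971-11-03", "postcode": "SA1 1AA"},
]

linked = link_records(hospital, gp)
linked_ids = [(a["id"], b["id"]) for a, b in linked]
```

In this sketch the first pair is linked despite differences in case and spacing, while the second pair is missed because of a single mistyped date. This is one way in which inaccurate identifiers can produce the incomplete linked data that the authors above warn about.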

Data security when using secondary data

Information is recorded on every activity that takes place and, almost always, the identity of the person being investigated, treated or cared for is included (Black, 2003). This is particularly true when the information is in a form that allows the identification of patients and/or is shared with people from outside the immediate team providing direct patient care (Lelliott, 2003). Device failures and both natural and man-made disasters are inevitable (Sittig & Singh, 2012).

Several techniques have been introduced that attempt to protect the privacy of an individual (Tinabo, Mtenzi & O'Shea, 2009), but even with these restrictions in place, the risk of identification may still be appreciable because of the richness of the data or the rarity of certain data values (Kalra, Gertz, Singleton & Inskip, 2006). Cryptography is the practice of transforming data to make it indecipherable by a third party, unless a particular piece of secret information is made available to them (Harrower, 2009).

Pseudonymisation

The word pseudonym comes from Greek and means a false name, used for example by authors who do not want to share their identity (Riedl et al., 2007).

However, there are disadvantages to pseudonymisation. In spite of pseudonymisation, it is possible to reveal the identity of a patient by performing data mining on the records, in particular when stigmatising medical information is available (Ding & Klein, 2010).
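As an illustration only, one common way of generating pseudonyms is a keyed hash of the real identifier. The Python sketch below uses HMAC-SHA256 from the standard library; this is a technique chosen for the example, not the method described by the authors cited above, and the key, the field names and the 16-character truncation are all hypothetical.

```python
import hashlib
import hmac

def pseudonymise(patient_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from an identifier using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability; an illustrative choice

# Hypothetical key, assumed to be held only by the data custodian.
key = b"example-key-held-by-data-custodian"

# Invented records for illustration.
records = [
    {"patient_id": "P001", "diagnosis": "asthma"},
    {"patient_id": "P002", "diagnosis": "diabetes"},
    {"patient_id": "P001", "diagnosis": "eczema"},
]

# The research view keeps the clinical content but replaces the identifier.
research_view = [
    {"pseudonym": pseudonymise(r["patient_id"], key), "diagnosis": r["diagnosis"]}
    for r in records
]
```

The same patient always receives the same pseudonym, so records can still be grouped per person, but the clinical content is left untouched. That is why the re-identification risk described above remains: a rare or stigmatising diagnosis can still point to an individual.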

De-personalisation/De-identification

De-identification consists of stripping patient identifiers such as name, address and hospital identification number from the image headers, or substituting a false value for the real identifier (Keyhani et al., 2008).

De-identification is essential to meet privacy and security concerns (Teasdale, Bates, Kmetik, Suzewits & Bainbridge, 2007) but the study populations may comprise many thousands of people; some patients will have moved or died; the information may need to be linked to two or more databases; and individual identification may be necessary to prevent double counting (Souhami, 2006). Hence, in some research and ethical frameworks it may be necessary to maintain the possibility to re-contact patients in the event of results relevant to their health being obtained (Iacono & Rajasekaran, 2008).
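The two options mentioned in this section, stripping identifiers or substituting a false value, can be sketched in Python as follows. The header is modelled as a plain dictionary and the key names are hypothetical; a real image header format would need its own parser and a much fuller list of identifying fields.

```python
from typing import Optional

# Identifier types named in the text; the dictionary keys themselves are assumptions.
IDENTIFIER_FIELDS = {"name", "address", "hospital_id"}

def de_identify(header: dict, placeholder: Optional[str] = None) -> dict:
    """Return a copy of the header with identifiers stripped,
    or replaced by the placeholder when one is supplied."""
    cleaned = {}
    for field, value in header.items():
        if field in IDENTIFIER_FIELDS:
            if placeholder is not None:
                cleaned[field] = placeholder
            continue  # never copy the real identifier
        cleaned[field] = value
    return cleaned

# Invented header for illustration.
header = {
    "name": "Jane Doe",
    "address": "1 Example Street",
    "hospital_id": "H12345",
    "modality": "MR",
    "study_date": "2012-05-01",
}

stripped = de_identify(header)             # identifiers removed
masked = de_identify(header, "REMOVED")    # identifiers replaced by a false value
```

Because the function returns a copy, the original header is unchanged. If re-contacting patients must remain possible, the original would have to be retained securely by the data custodian rather than discarded.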

Anonymisation

Anonymization can be achieved by depersonalization, the removal of any patient-identifying information from the health records (Heurix & Neubauer, 2011).

Effective data ambiguation requires prior knowledge of the data distribution (Noumeir, Lemay & Lina, 2007).
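The point about knowing the data distribution can be illustrated by counting how many records share each combination of indirectly identifying values. The Python sketch below is in the spirit of a k-anonymity check, a concept introduced here for illustration rather than taken from the sources cited; the field names, the records and the threshold k are all assumptions.

```python
from collections import Counter

def combination_counts(records: list, quasi_identifiers: list) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def rare_combinations(records: list, quasi_identifiers: list, k: int) -> list:
    """Return the combinations that occur fewer than k times."""
    counts = combination_counts(records, quasi_identifiers)
    return sorted(combo for combo, n in counts.items() if n < k)

# Invented records with direct identifiers already removed.
records = [
    {"age_band": "40-49", "postcode_area": "SA1", "diagnosis": "asthma"},
    {"age_band": "40-49", "postcode_area": "SA1", "diagnosis": "diabetes"},
    {"age_band": "40-49", "postcode_area": "SA1", "diagnosis": "asthma"},
    {"age_band": "90-99", "postcode_area": "SA2", "diagnosis": "HIV"},
]

fields = ["age_band", "postcode_area"]
at_risk = rare_combinations(records, fields, k=2)
smallest_group = min(combination_counts(records, fields).values())
```

Here one record is the only member of its age band and postcode area, so it could single out an individual even though no name or number is present. Such records would need further generalisation or suppression before release.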

Encryption

Encryption is another option and is more precise in restricting access (Sun, Fang & Zhu, 2010). Stine & Dang (2011) define encryption as a mathematical transformation to scramble data requiring protection (plaintext) into a form not easily understood by unauthorized people or machines (ciphertext), thus enabling secure storage, transmission, and access (Zhang & Liu, 2010).

There are disadvantages to the use of encryption techniques for particular secondary uses. Encrypted data cannot be used for research projects without explicit permission from the patient, who has to decrypt the data and thus reveal his or her identity (Neubauer & Kolb, 2009). Therefore, data confidentiality requires fine-grained control over who may access a given piece of encrypted data. This can only be achieved by regulating access to the decryption keys, which typically need to be widely available across the enterprise (Moulds, 2007).

Conclusion

When considering secondary use of health data, many aspects need to be taken into account to ensure an ethical and practical transition from the initial primary use. The issues covered above form part of the sharing agenda when considering what constitutes fair and lawful secondary use of health data. This suggests that secondary use is not as straightforward as many people would think, especially in recent times, when data sharing and inadvertent data loss are reported in the media on a regular basis, leaving an organisation open to criticism, adverse publicity, reputational damage and financial penalties.