Ethical Resolve Blog

New data analytics ethics column in the Communications of the ACM

The Communications of the ACM (a leading journal in computer science) recently published my perspective on the proposed Common Rule revisions in the quarterly Computing Ethics column. You can also find the full text here as a pdf file.


Big Data Analytics and Revision of the Common Rule
By Jacob Metcalf
Communications of the ACM, Vol. 59 No. 7, Pages 31-33

“Big data” is a major technical advance in terms of computing expense, speed, and capacity. But it is also an epistemic shift wherein data is seen as infinitely networkable, indefinitely reusable, and significantly divorced from the context of collection.1,7 The statutory definitions of “human subjects” and “research” are not easily applicable to big data research involving sensitive human data. Many of the familiar norms and regulations of research ethics formulated to prior paradigms of research risks and harms, and thus the formal triggers for ethics review are miscalibrated. We need to reevaluate long-standing assumptions of research ethics in light of the emergence of “big data” analytics.6,10,13

The U.S. Department of Health and Human Services (HHS) released a Notice of Proposed Rule-Making (NPRM) in September 2015 regarding proposed major revisions (the first in three decades) to the research ethics regulations known as the Common Rule.a The proposed changes grapple with the consequences of big data, such as informed consent for bio-banking and universal standards for privacy protection. The Common Rule does not apply to industry research, and some big data science in universities might not fall under its purview, but the Common Rule addresses the burgeoning uses of big data by setting the tone and agenda for research ethics in many spheres.

Twelve Principles of Data Ethics

Ethical Resolve has helped author Accenture’s newly released Data Ethics report, and in particular took the lead role in writing the section Developing a Code of Data Ethics. Steven Tiell and I hashed these out with the assistance of multiple contributors. The full report is available here. These 12 universal principles of data ethics are intended to help enterprises and professional communities develop tailored codes of ethics to guide responsible data use. Let us know if your organization needs assistance instantiating these principles.

A set of universal principles of data ethics can help guide data science professionals and practitioners in creating a code of data ethics that is specific and contextual for their organization or community of stakeholders:

 1. The highest priority is to respect the persons behind the data.

Where insights derived from data could impact the human condition, the potential harm to individuals and communities should be the paramount consideration. Big data can produce compelling insights into populations, but those same insights can be used to unfairly limit an individual’s possibilities.

2. Account for the downstream uses of datasets.

Data professionals should strive to use data in ways that are consistent with the intentions and understanding of the disclosing party. Many regulations govern datasets on the basis of the status of the data: “public,” “private” or “proprietary”, for example. But what is done with datasets is ultimately more consequential to subjects/users than the type of data or the context in which it is collected. Correlative use of repurposed data in research and industry represents the greatest promise and the greatest risk of data analytics.

3. The consequences of utilizing data and analytical tools today are shaped by how they’ve been used in the past.

There’s no such thing as raw data. All datasets and accompanying analytic tools carry a history of human decision-making. As far as possible, that history should be auditable. This should include mechanisms for tracking the context of collection, methods of consent, chains of responsibility, and assessments of data quality and accuracy.

4. Seek to match privacy and security safeguards with privacy and security expectations.

Data subjects hold a range of expectations about the privacy and security of their data. These expectations are often context-dependent. Designers and data professionals should give due consideration to those expectations and align safeguards and expectations with them, as much as possible.

5. Always follow the law, but understand that the law is often a minimum bar.

Digital transformations have become a standard evolutionary path for businesses and governments. However, because laws have largely failed to keep up with the pace of digital innovation and change, existing regulations are often miscalibrated to current risks. In this context, compliance means complacency. To excel in data ethics, leaders must define their own compliance frameworks to outperform legislated requirements.

6. Be wary of collecting data just for the sake of having more data.

The power and peril of data analytics is that data collected today will be useful for unpredictable purposes in the future. Give due consideration to the possibility that less data may result in both better analysis and less risk.

7. Data can be a tool of both inclusion and exclusion.

While everyone should have access to the social and economic benefits of data, not everyone is equally impacted by the processes of data collection, correlation, and prediction. Data professionals should strive to mitigate the disparate impacts of their products and listen to the concerns of affected communities.

8. As far as possible, explain methods for analysis and marketing to data disclosers.

Maximizing transparency at the point of data collection can minimize the more significant risks that arise as data travels through the data supply chain.

9. Data scientists and practitioners should accurately represent their qualifications (and limits to their expertise), adhere to professional standards, and strive for peer accountability.

The long-term success of this discipline depends on public and client trust. Data professionals should develop practices for holding themselves and their peers accountable to shared standards.

10. Design practices that incorporate transparency, configurability, accountability and auditability.

Not all ethical dilemmas have design solutions, but paying close attention to design practices can break down many of the practical barriers that stand in the way of shared, robust ethical standards. Data ethics is an engineering challenge worthy of the best minds in the field.

11. Products and research practices should be subject to internal (and potentially external) ethical review.

Organizations should prioritize establishing consistent, efficient and actionable ethics review practices for new products, services and research programs. Internal peer-review practices help to mitigate risk, and an external review board can contribute significantly to public trust.

12. Governance practices should be robust, known to all team members and regularly reviewed.

Data ethics poses organizational challenges that can’t be resolved by compliance regimes alone. Because the regulatory, social and engineering terrains are in flux, organizations engaged in data analytics need collaborative, routine and transparent practices for ethical governance.

Avanade’s TechSummit 2016 panel on digital ethics

I recently had the honor of participating on a panel at Avanade’s annual TechSummit conference. Organized by Steven Tiell of Accenture’s TechVision team, we were tasked with discussing the role of digital ethics and digital trust in enterprise. I joined Steven on stage with Bill Hoffman, Associate Director of the World Economic Forum and Scott David, Director of Policy at the University of Washington Center for Information Assurance and Cybersecurity. Below are my prepared remarks, which of course differ extensively from what I actually got around to saying on stage.

1. We’ve seen ethics requirements for medical and academic research, particularly when federal dollars are at play. Why should businesses care about ethics in their research?

Businesses should care about ethics most of all because it is, by definition, the right thing to do. But to go beyond a pat answer, I think it is useful to define the domain of “ethics.” I think of ethics as the methods and tools you use to make a consequential decision when there is relatively little settled guidance about the right thing to do. If you knew the right thing to do, then it would probably be a matter for compliance or legal departments. I like how digital sociologist Annette Markham recently put it when discussing a major data research scandal: “ethics is about making choices at critical juncture,” particularly when those choices affect other people. What I would add to Annette’s definition is that ethics is not just the decisions, but also all the work you have to do in advance to enable those critical decisions. You need the capacity to identify and evaluate those critical junctures, and to then make efficient, consistent and actionable decisions. Done well, ethics is a future-oriented stance. In my opinion, building the habits and infrastructures that make it possible for business to make good choices at critical junctions is simply something that will be good for the bottom line in the long run. It will certainly enable businesses to identify and mitigate risks more effectively.

When it comes to the matter of research ethics in particular, there are three aspects that bear more scrutiny when considering how and why enterprises should engage in ethics review practices.

First, because businesses now hold more data about human behavior than any other entity in human history, the value of those businesses is increasingly indexed to what they can do with that data now and in the future. Thus, the types of research being done looks like the types of research that have traditionally been located in university settings. It should indicate something important to us that academic researchers and institutions have invested so much in handling research ethics: research practices carry significant risk and require sustained attention.

Second, anyone can now be a researcher and everyone is a research subject. Yet all of our familiar ethics norms and infrastructures make certain outdated assumptions about institutional boundaries that create formal and informal professional limits on who can do consequential research. But those assumptions do not hold when human data research happens everywhere. Without the familiar institutional boundaries, businesses will need to make up the slack somehow.

Third, big data research methods simply do pose new kinds of risks for enterprise. Holding so much private data and using that data to intervene in people’s’ lives in a tailored, personalized fashion, poses risks beyond simply privacy. Research is often perceived as creepy or controlling, where even products that do the same thing might not. Thus it is important to align design practices, product development and ethics review in a manner that users of your services or providers of your data can be comfortable with.

Read More

Keynote presentation at Cambridge data ethics workshop

On 10 June, 2016, I will be giving a keynote talk at the Data Ethics Workshop, hosted by the Center for Research in the Arts, Social Sciences and Humanities at Cambridge University in the UK. I look forward to meeting some of the great thinkers in this field from the other side of the pond, and learning more about the different data ethics landscape in the EU.

Speaker: Jake Metcalf
Institution: Data and Society Institute and Founding Partner, Ethical Resolve
Title: Data subjectivity: responding to emerging forms of research and research subjects

Abstract: There are significant disjunctions between the established norms and practices of human- subjects research protections and the emerging research methods and infrastructures at the heart of data science and the internet economy. For example, long-standing research ethics regulations typically exempt from further review research projects that utilize pre-existing and/or public datasets, such as most data science research. This was once a sound assumption because such research does not require additional intervention into a person’s life or body, and the ‘publicness’ of the data meant all informational or privacy harms had already occurred. However, because big data enables datasets to be (at least in theory) widely networked, continually updated, infinitely repurposable and indefinitely stored, this assumption is no longer sound—big data allows potential harms to become networked, distributed and temporally stretched such that potential harms can take place far outside of the parameters of the research. Familiar protections for research subjects need rethinking in light of these changes to scientific practices. In this talk I will discuss how a historicization of ‘human subjects’ in research enables us to critically interrogate an emerging form of research subjectivity in response to the changing conditions of data-driven research. I will ask how data scientists, practitioners, policy-makers and ethicists might account for the emerging interests and concerns of ‘data subjects,’ particularly in light of proposed changes to research ethics regulations in the U.S.

New academic paper on human-subjects research and data ethics

Ethical Resolve’s Jake Metcalf has a new article co-authored with Kate Crawford about the strained relationship between familiar norms and policies of research ethics and the research methods of data analytics. It is available in pre-publication form on the open access Social Science Research Network, and will soon be available at the peer-reviewed journal Big Data & Society.

Where are Human Subjects in Big Data Research? The Emerging Ethics Divide

Jacob Metcalf
Data & Society Research Institute

Kate Crawford
Microsoft Research; MIT Center for Civic Media; NYU Information Law Institute

May 14, 2016

Big Data and Society, Spring 2016

There are growing discontinuities between the research practices of data science and established tools of research ethics regulation. Some of the core commitments of existing research ethics regulations, such as the distinction between research and practice, cannot be cleanly exported from biomedical research to data science research. These discontinuities have led some data science practitioners and researchers to move toward rejecting ethics regulations outright. These shifts occur at the same time as a proposal for major revisions to the Common Rule — the primary regulation governing human-subjects research in the U.S. — is under consideration for the first time in decades. We contextualize these revisions in long-running complaints about regulation of social science research, and argue data science should be understood as continuous with social sciences in this regard. The proposed regulations are more flexible and scalable to the methods of non-biomedical research, but they problematically exclude many data science methods from human-subjects regulation, particularly uses of public datasets. The ethical frameworks for big data research are highly contested and in flux, and the potential harms of data science research are unpredictable. We examine several contentious cases of research harms in data science, including the 2014 Facebook emotional contagion study and the 2016 use of geographical data techniques to identify the pseudonymous artist Banksy. To address disputes about human-subjects research ethics in data science, critical data studies should offer a historically nuanced theory of “data subjectivity” responsive to the epistemic methods, harms and benefits of data science and commerce.

Keywords: data ethics, human subjects, common rule, critical data studies, data subjects, big data

Digital Trust at the Core of Accenture’s 2016 Vision

The partners of Ethical Resolve recently joined Accenture in their San Jose office to learn about Accenture’s 2016 Tech Vision. We have been collaborating with their staff on a report on data ethics to be released later in 2016.

We were pleased to hear about Accenture’s commitment to focusing on ethical issues in order to help their clients build digital trust with customers.

In particular, we agree that it is vital for companies  to focus on their stewardship of user data to ensure that this information is used responsibly and with the interests and rights of customers in mind. As we move further into 2016, it has become clear that one of the simplest approaches to data ethics is to implement effective processes for ethical decision making. What this means for companies is that any employee who makes decisions with ethical ramifications needs to have a clear and effective process for determining what is right thing to do.

Practices as simple as the use of checklists and templates for ethical decision making can greatly improve a company’s ability to properly manage ethical risks and build trust with their customers. With the proper implementation of customized processes for ethical decision making, companies can greatly improve their relationships with customers without undue difficulty.

We look forward to working more with Accenture to help offer processes that are easily adopted by clients to achieve the aim of greater digital trust between tech companies and their customers.

Getting rigorously naive, or why tech needs philosophy

A liberal arts degree has been a hot ticket in tech lately, according to a recent article in Forbes. Immediately foregrounding bias, this post is written by two philosophers who couldn’t agree more with the views expressed in the article.

Despite countless jokes about from our families and peers about starting a “philosophy store,” it turns out that the ability to rigorously pursue abstract inquiry is actually quite helpful in today’s tech sector. Stewart Butterfield, the CEO and founder of Slack (Ethical Resolve’s favorite Internet service du jour) and holder of a philosophy degree, recently discussed why. He told reporter George Anders that training in philosophy was critical to building the first user-friendly knowledge management tool on the Internet. “I learned how to write really clearly. I learned how to follow an argument all the way down, which is invaluable in running meetings. And when I studied the history of science, I learned about the ways that everyone believes something is true–like the old notion of some kind of ether in the air propagating gravitational forces–until they realized that it wasn’t true.”

There are other philosophers scattered around the tech sector in prominent positions. Damon Horowitz has the title “In-House Philosopher/Director of Engineering” at Google, which he earned after Google acquired his startup. In this TEDx presentation, Horowitz argues that tech requires a “moral operating system” if we are to build data analytics systems that peer deeply into our lives. His view is that tech companies need to make space for careful thinking about ancient questions of morality.

Read More

Implications of the Common Rule revisions for private enterprise

Photo courtesy Flickr user Yi Chen.

Photo courtesy Flickr user Yi Chen.


Through my position with the Council for Big Data, Ethics and Society, I recently lead the drafting of a collective public comment on the proposed revisions to the Common Rule, the federal regulation that requires federally funded research projects to receive independent, prior ethics review. The proposed revisions—the first in three decades—are largely a response to the rise of big data analytics in scientific research. Although the changes to biomedical research have received the most public attention, there are some important lessons to take home for any company utilizing data analytics.

Academic research on human subjects is governed by a set of ethical guidelines referred to as the “Common Rule.” These guidelines apply to all human-subjects research that receives government funding, and most universities and research-granting foundations require them of all research. The best known stipulation of the Common Rule is the requirement that research projects receive independent prior review to mitigate harms to research subjects. Private companies are not bound by the Common Rule insofar as they do not receive government funding, but the Common Rule sets the tone and agenda of research ethics in general—it has an outsized footprint well beyond its formal purview. Thus even private industry has good reason to pay attention to the norms animating the Common Rule, even if they are not obligated to follow these regulations.

Indeed, many of the datasets most interesting to researchers and dangerous to subjects are publicly available datasets containing private data.

The most notable problem posed by the revisions in the NPRM is the move to exclude from oversight all research that utilizes public datasets. Research ethics norms and regulations have long assumed that public datasets cannot pose additional informational harms—by definition the harm is already caused by the data contained therein becoming public. However, big data analytics render that assumption anachronistic. We used to be able to assume that data would stay put within its original context of collection. However, the power and peril of big data is that datasets are now architected to be (at least in theory) infinitely repurposable, perpetually updated, and indefinitely available. A public, open dataset that appears entirely innocuous in one context can be munged with another public dataset and pose genuine harms to the subjects of that research. See, for example, the case of NYC taxi database, and the many, many private details there were revealed about drivers and riders from a public dataset.

Read More

Skynet starts with data mining: thinking through the ethics of AI

I was recently interviewed by John C. Havens at Mashable about the creation of data ethics and AI ethics committees.

The ethics of artificial intelligence is becoming a much more concrete public discussion, particularly with the recent open letter advocating for a ban on autonomous weapons systems. The letter, organized by the Future of Life Institute and signed by 10,000 plus people, including many AI researchers and prominent tech leaders, advocates for an international ban on autonomous weapons systems that can operate without meaningful human input.

This follows on the heels of some major media attention earlier in the year about Bill Gates, Elon Musk and Steven Hawking arguing that artificial super-intelligence poses a future existential threat to humanity (including all signing another FLI open letter). Hawking told the BBC that, “The development of full artificial intelligence could spell the end of the human race.” There are reasons to be skeptical of some of this fear, not least of which is the definitional problem of actually getting a handle on what counts as AI and whether it would ever have generalized, incredibly plastic intelligence like human bio-brains do or the ability to maintain machine bodies without humans. (My favorite semi-serious reason for doubting is Baratunde Thurston’s point that if AI looked like human intelligence in aggregate it would spend all day taking cat pictures and trying to sell the rest of us stuff.)

Read More