Academic Research

Viewing posts from the Academic Research category

Keynote presentation at Cambridge data ethics workshop

On 10 June, 2016, I will be giving a keynote talk at the Data Ethics Workshop, hosted by the Center for Research in the Arts, Social Sciences and Humanities at Cambridge University in the UK. I look forward to meeting some of the great thinkers in this field from the other side of the pond, and learning more about the different data ethics landscape in the EU.

Speaker: Jake Metcalf
Institution: Data and Society Institute and Founding Partner, Ethical Resolve
Title: Data subjectivity: responding to emerging forms of research and research subjects

Abstract: There are significant disjunctions between the established norms and practices of human- subjects research protections and the emerging research methods and infrastructures at the heart of data science and the internet economy. For example, long-standing research ethics regulations typically exempt from further review research projects that utilize pre-existing and/or public datasets, such as most data science research. This was once a sound assumption because such research does not require additional intervention into a person’s life or body, and the ‘publicness’ of the data meant all informational or privacy harms had already occurred. However, because big data enables datasets to be (at least in theory) widely networked, continually updated, infinitely repurposable and indefinitely stored, this assumption is no longer sound—big data allows potential harms to become networked, distributed and temporally stretched such that potential harms can take place far outside of the parameters of the research. Familiar protections for research subjects need rethinking in light of these changes to scientific practices. In this talk I will discuss how a historicization of ‘human subjects’ in research enables us to critically interrogate an emerging form of research subjectivity in response to the changing conditions of data-driven research. I will ask how data scientists, practitioners, policy-makers and ethicists might account for the emerging interests and concerns of ‘data subjects,’ particularly in light of proposed changes to research ethics regulations in the U.S.

New academic paper on human-subjects research and data ethics

Ethical Resolve’s Jake Metcalf has a new article co-authored with Kate Crawford about the strained relationship between familiar norms and policies of research ethics and the research methods of data analytics. It is available in pre-publication form on the open access Social Science Research Network, and will soon be available at the peer-reviewed journal Big Data & Society.

Where are Human Subjects in Big Data Research? The Emerging Ethics Divide

Jacob Metcalf
Data & Society Research Institute

Kate Crawford
Microsoft Research; MIT Center for Civic Media; NYU Information Law Institute

May 14, 2016

Big Data and Society, Spring 2016

There are growing discontinuities between the research practices of data science and established tools of research ethics regulation. Some of the core commitments of existing research ethics regulations, such as the distinction between research and practice, cannot be cleanly exported from biomedical research to data science research. These discontinuities have led some data science practitioners and researchers to move toward rejecting ethics regulations outright. These shifts occur at the same time as a proposal for major revisions to the Common Rule — the primary regulation governing human-subjects research in the U.S. — is under consideration for the first time in decades. We contextualize these revisions in long-running complaints about regulation of social science research, and argue data science should be understood as continuous with social sciences in this regard. The proposed regulations are more flexible and scalable to the methods of non-biomedical research, but they problematically exclude many data science methods from human-subjects regulation, particularly uses of public datasets. The ethical frameworks for big data research are highly contested and in flux, and the potential harms of data science research are unpredictable. We examine several contentious cases of research harms in data science, including the 2014 Facebook emotional contagion study and the 2016 use of geographical data techniques to identify the pseudonymous artist Banksy. To address disputes about human-subjects research ethics in data science, critical data studies should offer a historically nuanced theory of “data subjectivity” responsive to the epistemic methods, harms and benefits of data science and commerce.

Keywords: data ethics, human subjects, common rule, critical data studies, data subjects, big data

Getting rigorously naive, or why tech needs philosophy

A liberal arts degree has been a hot ticket in tech lately, according to a recent article in Forbes. Immediately foregrounding bias, this post is written by two philosophers who couldn’t agree more with the views expressed in the article.

Despite countless jokes about from our families and peers about starting a “philosophy store,” it turns out that the ability to rigorously pursue abstract inquiry is actually quite helpful in today’s tech sector. Stewart Butterfield, the CEO and founder of Slack (Ethical Resolve’s favorite Internet service du jour) and holder of a philosophy degree, recently discussed why. He told reporter George Anders that training in philosophy was critical to building the first user-friendly knowledge management tool on the Internet. “I learned how to write really clearly. I learned how to follow an argument all the way down, which is invaluable in running meetings. And when I studied the history of science, I learned about the ways that everyone believes something is true–like the old notion of some kind of ether in the air propagating gravitational forces–until they realized that it wasn’t true.”

There are other philosophers scattered around the tech sector in prominent positions. Damon Horowitz has the title “In-House Philosopher/Director of Engineering” at Google, which he earned after Google acquired his startup. In this TEDx presentation, Horowitz argues that tech requires a “moral operating system” if we are to build data analytics systems that peer deeply into our lives. His view is that tech companies need to make space for careful thinking about ancient questions of morality.

Read More

Implications of the Common Rule revisions for private enterprise

Photo courtesy Flickr user Yi Chen.

Photo courtesy Flickr user Yi Chen.


Through my position with the Council for Big Data, Ethics and Society, I recently lead the drafting of a collective public comment on the proposed revisions to the Common Rule, the federal regulation that requires federally funded research projects to receive independent, prior ethics review. The proposed revisions—the first in three decades—are largely a response to the rise of big data analytics in scientific research. Although the changes to biomedical research have received the most public attention, there are some important lessons to take home for any company utilizing data analytics.

Academic research on human subjects is governed by a set of ethical guidelines referred to as the “Common Rule.” These guidelines apply to all human-subjects research that receives government funding, and most universities and research-granting foundations require them of all research. The best known stipulation of the Common Rule is the requirement that research projects receive independent prior review to mitigate harms to research subjects. Private companies are not bound by the Common Rule insofar as they do not receive government funding, but the Common Rule sets the tone and agenda of research ethics in general—it has an outsized footprint well beyond its formal purview. Thus even private industry has good reason to pay attention to the norms animating the Common Rule, even if they are not obligated to follow these regulations.

Indeed, many of the datasets most interesting to researchers and dangerous to subjects are publicly available datasets containing private data.

The most notable problem posed by the revisions in the NPRM is the move to exclude from oversight all research that utilizes public datasets. Research ethics norms and regulations have long assumed that public datasets cannot pose additional informational harms—by definition the harm is already caused by the data contained therein becoming public. However, big data analytics render that assumption anachronistic. We used to be able to assume that data would stay put within its original context of collection. However, the power and peril of big data is that datasets are now architected to be (at least in theory) infinitely repurposable, perpetually updated, and indefinitely available. A public, open dataset that appears entirely innocuous in one context can be munged with another public dataset and pose genuine harms to the subjects of that research. See, for example, the case of NYC taxi database, and the many, many private details there were revealed about drivers and riders from a public dataset.

Read More

Getting the formula right: Social trust, A/B testing and research ethics

Image courtesy of Flickr user x6e38 under CC license

Image courtesy of Flickr user x6e38 under CC license

Most Internet services, and especially social media services, routinely conduct experiments on users’ experiences even though few of us are aware of it and consent procedures are murky. In a recent New York Times op-ed Michelle Meyer and Christopher Chabris argue that we should enthusiastically embrace the model of experimentation on users called “A/B testing.” This type of data-intensive experimentation is the bread and butter of the Internet economy and now is at the heart of sprawling ethical dispute over whether experimenting on Internet users’ data is equivalent to human experimentation on legal, ethical or regulatory grounds. In their Op-Ed, Myer and Chabris argue that A/B testing is on the whole ethical because without it Internet services would have no idea about what works, let alone what works best. They suggest that whatever outrage users might feel about such experiments are due to a “moral illusion” wherein we are prone to assuming that the status quo is natural and any experimental changes need to be justified and regulated, but the reality of Internet services is that there is no non-experimental state.


While they’re right that this type of experimentation is a poor fit for the ways we currently regulate research ethics, they fall short of explaining that data scientists need to earn the social trust that is the foundation of ethical research in any field. Ultimately, the foundations of ethical research are about trusting social relationships, not our assumptions about how experiments are constituted. This is a critical moment for data-driven enterprises to get creative and thoughtful about building such trust.

Even if those specific regulations do not work for A/B testing, it does not follow that fostering and maintaining such trust is not an essential component of knowledge production in the era of big data.


A/B testing is the process of dividing users randomly into two groups and comparing their response to different user experiences in order to determine which experience is “better.” Whichever website design or feed algorithm best achieves the prefered outcome—such as increased sales, regular feed refreshes, more accurate media recommendations, etc.—will become the default user experience. A/B testing appears innocuous enough when a company is looking for hard data about which tweaks to a website design drives sales, such as the color of the “buy” button. Few would argue that testing the color of a buy button or placement of an ad requires the informed consent of every visitor to a website. However, when a company possesses (or is accessing via data mining) a vast store of data on your life, your political preferences, your daily activities, your calendar, your personal networks, and your location, A/B testing takes on a different flavor.

Read More