Ethics Review

Viewing posts from the Ethics Review category

Twelve Principles of Data Ethics

Ethical Resolve has helped author Accenture’s newly released Data Ethics report, and in particular took the lead role in writing the section Developing a Code of Data Ethics. Steven Tiell and I hashed these out with the assistance of multiple contributors. The full report is available here. These 12 universal principles of data ethics are intended to help enterprises and professional communities develop tailored codes of ethics to guide responsible data use. Let us know if your organization needs assistance instantiating these principles.

A set of universal principles of data ethics can help guide data science professionals and practitioners in creating a code of data ethics that is specific and contextual for their organization or community of stakeholders:

1. The highest priority is to respect the persons behind the data.

Where insights derived from data could impact the human condition, the potential harm to individuals and communities should be the paramount consideration. Big data can produce compelling insights into populations, but those same insights can be used to unfairly limit an individual’s possibilities.

2. Account for the downstream uses of datasets.

Data professionals should strive to use data in ways that are consistent with the intentions and understanding of the disclosing party. Many regulations govern datasets on the basis of the status of the data: “public,” “private” or “proprietary,” for example. But what is done with datasets is ultimately more consequential to subjects/users than the type of data or the context in which it is collected. Correlative use of repurposed data in research and industry represents the greatest promise and the greatest risk of data analytics.

3. The consequences of utilizing data and analytical tools today are shaped by how they’ve been used in the past.

There’s no such thing as raw data. All datasets and accompanying analytic tools carry a history of human decision-making. As far as possible, that history should be auditable. This should include mechanisms for tracking the context of collection, methods of consent, chains of responsibility, and assessments of data quality and accuracy.

4. Seek to match privacy and security safeguards with privacy and security expectations.

Data subjects hold a range of expectations about the privacy and security of their data. These expectations are often context-dependent. Designers and data professionals should give due consideration to those expectations and align safeguards with them as far as possible.

5. Always follow the law, but understand that the law is often a minimum bar.

Digital transformations have become a standard evolutionary path for businesses and governments. However, because laws have largely failed to keep up with the pace of digital innovation and change, existing regulations are often miscalibrated to current risks. In this context, compliance means complacency. To excel in data ethics, leaders must define their own compliance frameworks to outperform legislated requirements.

6. Be wary of collecting data just for the sake of having more data.

The power and peril of data analytics is that data collected today will be useful for unpredictable purposes in the future. Give due consideration to the possibility that less data may result in both better analysis and less risk.

7. Data can be a tool of both inclusion and exclusion.

While everyone should have access to the social and economic benefits of data, not everyone is equally impacted by the processes of data collection, correlation, and prediction. Data professionals should strive to mitigate the disparate impacts of their products and listen to the concerns of affected communities.

8. As far as possible, explain methods for analysis and marketing to data disclosers.

Maximizing transparency at the point of data collection can minimize the more significant risks that arise as data travels through the data supply chain.

9. Data scientists and practitioners should accurately represent their qualifications (and limits to their expertise), adhere to professional standards, and strive for peer accountability.

The long-term success of this discipline depends on public and client trust. Data professionals should develop practices for holding themselves and their peers accountable to shared standards.

10. Design practices that incorporate transparency, configurability, accountability and auditability.

Not all ethical dilemmas have design solutions, but paying close attention to design practices can break down many of the practical barriers that stand in the way of shared, robust ethical standards. Data ethics is an engineering challenge worthy of the best minds in the field.

11. Products and research practices should be subject to internal (and potentially external) ethical review.

Organizations should prioritize establishing consistent, efficient and actionable ethics review practices for new products, services and research programs. Internal peer-review practices help to mitigate risk, and an external review board can contribute significantly to public trust.

12. Governance practices should be robust, known to all team members and regularly reviewed.

Data ethics poses organizational challenges that can’t be resolved by compliance regimes alone. Because the regulatory, social and engineering terrains are in flux, organizations engaged in data analytics need collaborative, routine and transparent practices for ethical governance.
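Principle 3 above asks that a dataset's history be auditable: context of collection, method of consent, chain of responsibility, and assessments of quality. As a minimal sketch of what such a record could look like, the Python below keeps those four things alongside a dataset. The class names, field names and sample values are hypothetical, chosen only for illustration; they do not come from the report.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class ProvenanceEntry:
    """One step in a dataset's history (hypothetical structure)."""
    recorded_on: date
    responsible_party: str
    action: str


@dataclass
class DatasetProvenance:
    """Audit record kept with a dataset (illustrative fields only)."""
    dataset_name: str
    collection_context: str
    consent_method: str
    quality_notes: str = ""
    history: list = field(default_factory=list)

    def record(self, recorded_on, responsible_party, action):
        """Append a new entry; earlier entries are never edited."""
        entry = ProvenanceEntry(recorded_on, responsible_party, action)
        self.history.append(entry)
        return entry

    def chain_of_responsibility(self):
        """Parties who have handled the dataset, in order, without repeats."""
        seen = []
        for entry in self.history:
            if entry.responsible_party not in seen:
                seen.append(entry.responsible_party)
        return seen


# Made-up example values.
survey = DatasetProvenance(
    dataset_name="customer_survey",
    collection_context="opt-in web survey",
    consent_method="checkbox at point of collection",
)
survey.record(date(2016, 1, 5), "research team", "collected responses")
survey.record(date(2016, 2, 1), "analytics team", "joined with sales data")
survey.record(date(2016, 3, 1), "research team", "removed free-text fields")
```

Here `survey.chain_of_responsibility()` lists the research team and then the analytics team. Even a record this small makes the question "who decided what, and when?" answerable after the fact.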

Avanade’s TechSummit 2016 panel on digital ethics

I recently had the honor of participating in a panel at Avanade’s annual TechSummit conference. The panel was organized by Steven Tiell of Accenture’s TechVision team, and we were tasked with discussing the role of digital ethics and digital trust in enterprise. I joined Steven on stage with Bill Hoffman, Associate Director of the World Economic Forum, and Scott David, Director of Policy at the University of Washington Center for Information Assurance and Cybersecurity. Below are my prepared remarks, which of course differ extensively from what I actually got around to saying on stage.

1. We’ve seen ethics requirements for medical and academic research, particularly when federal dollars are at play. Why should businesses care about ethics in their research?

Businesses should care about ethics most of all because it is, by definition, the right thing to do. But to go beyond a pat answer, I think it is useful to define the domain of “ethics.” I think of ethics as the methods and tools you use to make a consequential decision when there is relatively little settled guidance about the right thing to do. If you knew the right thing to do, then it would probably be a matter for compliance or legal departments. I like how digital sociologist Annette Markham recently put it when discussing a major data research scandal: “ethics is about making choices at critical junctures,” particularly when those choices affect other people. What I would add to Annette’s definition is that ethics is not just the decisions, but also all the work you have to do in advance to enable those critical decisions. You need the capacity to identify and evaluate those critical junctures, and to then make efficient, consistent and actionable decisions. Done well, ethics is a future-oriented stance. In my opinion, building the habits and infrastructures that make it possible for businesses to make good choices at critical junctures is simply something that will be good for the bottom line in the long run. It will certainly enable businesses to identify and mitigate risks more effectively.

When it comes to the matter of research ethics in particular, there are three aspects that bear more scrutiny when considering how and why enterprises should engage in ethics review practices.

First, because businesses now hold more data about human behavior than any other entity in human history, the value of those businesses is increasingly indexed to what they can do with that data now and in the future. Thus, the types of research being done look like the types of research that have traditionally been located in university settings. It should indicate something important to us that academic researchers and institutions have invested so much in handling research ethics: research practices carry significant risk and require sustained attention.

Second, anyone can now be a researcher and everyone is a research subject. Yet all of our familiar ethics norms and infrastructures make certain outdated assumptions about institutional boundaries that create formal and informal professional limits on who can do consequential research. But those assumptions do not hold when human data research happens everywhere. Without the familiar institutional boundaries, businesses will need to take up the slack somehow.

Third, big data research methods simply do pose new kinds of risks for enterprise. Holding so much private data, and using that data to intervene in people’s lives in a tailored, personalized fashion, poses risks beyond simply privacy. Research is often perceived as creepy or controlling, where even products that do the same thing might not be. Thus it is important to align design practices, product development and ethics review in a manner that users of your services or providers of your data can be comfortable with.


Keynote presentation at Cambridge data ethics workshop

On 10 June 2016, I will be giving a keynote talk at the Data Ethics Workshop, hosted by the Centre for Research in the Arts, Social Sciences and Humanities at Cambridge University in the UK. I look forward to meeting some of the great thinkers in this field from the other side of the pond, and learning more about the different data ethics landscape in the EU.

Speaker: Jake Metcalf
Institution: Data and Society Institute and Founding Partner, Ethical Resolve
Title: Data subjectivity: responding to emerging forms of research and research subjects

Abstract: There are significant disjunctions between the established norms and practices of human-subjects research protections and the emerging research methods and infrastructures at the heart of data science and the internet economy. For example, long-standing research ethics regulations typically exempt from further review research projects that utilize pre-existing and/or public datasets, such as most data science research. This was once a sound assumption because such research does not require additional intervention into a person’s life or body, and the ‘publicness’ of the data meant all informational or privacy harms had already occurred. However, because big data enables datasets to be (at least in theory) widely networked, continually updated, infinitely repurposable and indefinitely stored, this assumption is no longer sound: big data allows potential harms to become networked, distributed and temporally stretched such that potential harms can take place far outside of the parameters of the research. Familiar protections for research subjects need rethinking in light of these changes to scientific practices. In this talk I will discuss how a historicization of ‘human subjects’ in research enables us to critically interrogate an emerging form of research subjectivity in response to the changing conditions of data-driven research. I will ask how data scientists, practitioners, policy-makers and ethicists might account for the emerging interests and concerns of ‘data subjects,’ particularly in light of proposed changes to research ethics regulations in the U.S.

Digital Trust at the Core of Accenture’s 2016 Vision

The partners of Ethical Resolve recently joined Accenture in their San Jose office to learn about Accenture’s 2016 Tech Vision. We have been collaborating with their staff on a report on data ethics to be released later in 2016.

We were pleased to hear about Accenture’s commitment to focusing on ethical issues in order to help their clients build digital trust with customers.

In particular, we agree that it is vital for companies to focus on their stewardship of user data to ensure that this information is used responsibly and with the interests and rights of customers in mind. As we move further into 2016, it has become clear that one of the simplest approaches to data ethics is to implement effective processes for ethical decision making. What this means for companies is that any employee who makes decisions with ethical ramifications needs to have a clear and effective process for determining what is the right thing to do.

Practices as simple as the use of checklists and templates for ethical decision making can greatly improve a company’s ability to properly manage ethical risks and build trust with their customers. With the proper implementation of customized processes for ethical decision making, companies can greatly improve their relationships with customers without undue difficulty.

We look forward to working more with Accenture to help offer processes that are easily adopted by clients to achieve the aim of greater digital trust between tech companies and their customers.

Getting rigorously naive, or why tech needs philosophy

A liberal arts degree has been a hot ticket in tech lately, according to a recent article in Forbes. Immediately foregrounding bias, this post is written by two philosophers who couldn’t agree more with the views expressed in the article.

Despite countless jokes from our families and peers about starting a “philosophy store,” it turns out that the ability to rigorously pursue abstract inquiry is actually quite helpful in today’s tech sector. Stewart Butterfield, the CEO and founder of Slack (Ethical Resolve’s favorite Internet service du jour) and holder of a philosophy degree, recently discussed why. He told reporter George Anders that training in philosophy was critical to building the first user-friendly knowledge management tool on the Internet. “I learned how to write really clearly. I learned how to follow an argument all the way down, which is invaluable in running meetings. And when I studied the history of science, I learned about the ways that everyone believes something is true–like the old notion of some kind of ether in the air propagating gravitational forces–until they realized that it wasn’t true.”

There are other philosophers scattered around the tech sector in prominent positions. Damon Horowitz has the title “In-House Philosopher/Director of Engineering” at Google, which he earned after Google acquired his startup. In this TEDx presentation, Horowitz argues that tech requires a “moral operating system” if we are to build data analytics systems that peer deeply into our lives. His view is that tech companies need to make space for careful thinking about ancient questions of morality.


Implications of the Common Rule revisions for private enterprise

Photo courtesy Flickr user Yi Chen.

Through my position with the Council for Big Data, Ethics and Society, I recently led the drafting of a collective public comment on the proposed revisions to the Common Rule, the federal regulation that requires federally funded research projects to receive independent, prior ethics review. The proposed revisions, the first in three decades, are largely a response to the rise of big data analytics in scientific research. Although the changes to biomedical research have received the most public attention, there are some important lessons to take home for any company utilizing data analytics.

Academic research on human subjects is governed by a set of ethical guidelines referred to as the “Common Rule.” These guidelines apply to all human-subjects research that receives government funding, and most universities and research-granting foundations require them of all research. The best known stipulation of the Common Rule is the requirement that research projects receive independent prior review to mitigate harms to research subjects. Private companies are not bound by the Common Rule insofar as they do not receive government funding, but the Common Rule sets the tone and agenda of research ethics in general—it has an outsized footprint well beyond its formal purview. Thus even private industry has good reason to pay attention to the norms animating the Common Rule, even if they are not obligated to follow these regulations.

Indeed, many of the datasets most interesting to researchers and dangerous to subjects are publicly available datasets containing private data.

The most notable problem posed by the revisions in the Notice of Proposed Rulemaking (NPRM) is the move to exclude from oversight all research that utilizes public datasets. Research ethics norms and regulations have long assumed that public datasets cannot pose additional informational harms: by definition, the harm is already caused by the data contained therein becoming public. However, big data analytics render that assumption anachronistic. We used to be able to assume that data would stay put within its original context of collection. However, the power and peril of big data is that datasets are now architected to be (at least in theory) infinitely repurposable, perpetually updated, and indefinitely available. A public, open dataset that appears entirely innocuous in one context can be munged with another public dataset and pose genuine harms to the subjects of that research. See, for example, the case of the NYC taxi database, and the many, many private details that were revealed about drivers and riders from a public dataset.
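To make the linkage risk concrete, here is a toy sketch in Python of how two datasets that each look harmless can be joined on fields they happen to share. It is not a reconstruction of the taxi case: the records, field names and the `link_records` helper are all made up for illustration.

```python
def link_records(trips, sightings):
    """Join two datasets on the (time, place) fields they have in common."""
    # Index the second dataset by the shared fields.
    index = {}
    for sighting in sightings:
        key = (sighting["time"], sighting["place"])
        index.setdefault(key, []).append(sighting["name"])

    # Attach a name to every trip whose pickup matches a sighting.
    linked = []
    for trip in trips:
        key = (trip["pickup_time"], trip["pickup_place"])
        for name in index.get(key, []):
            linked.append({
                "name": name,
                "dropoff_place": trip["dropoff_place"],
                "fare": trip["fare"],
            })
    return linked


# Hypothetical "anonymous" trip log: no names anywhere.
trips = [
    {"pickup_time": "23:10", "pickup_place": "5th & Main",
     "dropoff_place": "12 Elm St", "fare": 14.5},
    {"pickup_time": "08:00", "pickup_place": "Union Sq",
     "dropoff_place": "40 Oak Ave", "fare": 9.0},
]

# Hypothetical public photo captions: names, but no destinations.
sightings = [
    {"time": "23:10", "place": "5th & Main", "name": "Person A"},
]

linked = link_records(trips, sightings)
```

Neither input ties a named person to a destination, yet `linked` does: it pairs "Person A" with the drop-off at "12 Elm St". The harm arises from the combination, which is exactly what an exemption based on the "publicness" of each dataset overlooks.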


Getting the formula right: Social trust, A/B testing and research ethics

Image courtesy of Flickr user x6e38 under CC license

Most Internet services, and especially social media services, routinely conduct experiments on users’ experiences even though few of us are aware of it and consent procedures are murky. In a recent New York Times op-ed, Michelle Meyer and Christopher Chabris argue that we should enthusiastically embrace the model of experimentation on users called “A/B testing.” This type of data-intensive experimentation is the bread and butter of the Internet economy and is now at the heart of a sprawling ethical dispute over whether experimenting on Internet users’ data is equivalent to human experimentation on legal, ethical or regulatory grounds. In their op-ed, Meyer and Chabris argue that A/B testing is on the whole ethical because without it Internet services would have no idea about what works, let alone what works best. They suggest that whatever outrage users might feel about such experiments is due to a “moral illusion” wherein we are prone to assuming that the status quo is natural and any experimental changes need to be justified and regulated, but the reality of Internet services is that there is no non-experimental state.


While they’re right that this type of experimentation is a poor fit for the ways we currently regulate research ethics, they fall short of explaining that data scientists need to earn the social trust that is the foundation of ethical research in any field. Ultimately, the foundations of ethical research are about trusting social relationships, not our assumptions about how experiments are constituted. This is a critical moment for data-driven enterprises to get creative and thoughtful about building such trust.

Even if those specific regulations do not work for A/B testing, it does not follow that fostering and maintaining such trust is not an essential component of knowledge production in the era of big data.


A/B testing is the process of dividing users randomly into two groups and comparing their responses to different user experiences in order to determine which experience is “better.” Whichever website design or feed algorithm best achieves the preferred outcome (such as increased sales, regular feed refreshes or more accurate media recommendations) will become the default user experience. A/B testing appears innocuous enough when a company is looking for hard data about which tweaks to a website design drive sales, such as the color of the “buy” button. Few would argue that testing the color of a buy button or placement of an ad requires the informed consent of every visitor to a website. However, when a company possesses (or is accessing via data mining) a vast store of data on your life, your political preferences, your daily activities, your calendar, your personal networks, and your location, A/B testing takes on a different flavor.
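For readers who have not seen one, the mechanics described above can be sketched in a few lines of Python. This is a bare-bones illustration under simplifying assumptions: the function names are hypothetical, "conversion" stands in for whatever the preferred outcome is, and a real experiment would also test whether the difference between groups is statistically meaningful.

```python
import random


def assign_groups(user_ids, seed=0):
    """Randomly split users into two roughly equal groups, A and B."""
    rng = random.Random(seed)  # seeded so the split is reproducible
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return {"A": shuffled[:midpoint], "B": shuffled[midpoint:]}


def conversion_rate(group, converted):
    """Share of users in the group who produced the preferred outcome."""
    if not group:
        return 0.0
    return sum(1 for user in group if user in converted) / len(group)


def pick_default(groups, converted):
    """Return the variant with the higher conversion rate (ties go to A)."""
    rates = {name: conversion_rate(members, converted)
             for name, members in groups.items()}
    winner = max(sorted(rates), key=lambda name: rates[name])
    return winner, rates


# Hypothetical run: ten users, each shown either experience A or B.
groups = assign_groups(range(10), seed=1)
```

Nothing in this sketch asks what is being varied. The same few lines serve equally well for a button color and for an intervention built on intimate personal data, which is why the ethical weight sits outside the code.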
