Data Ethics

Viewing posts from the Data Ethics category

Twelve Principles of Data Ethics

Ethical Resolve has helped author Accenture’s newly released Data Ethics report, and in particular took the lead role in writing the section Developing a Code of Data Ethics. Steven Tiell and I hashed these out with the assistance of multiple contributors. The full report is available here. These 12 universal principles of data ethics are intended to help enterprises and professional communities develop tailored codes of ethics to guide responsible data use. Let us know if your organization needs assistance instantiating these principles.

A set of universal principles of data ethics can help guide data science professionals and practitioners in creating a code of data ethics that is specific and contextual for their organization or community of stakeholders:

 1. The highest priority is to respect the persons behind the data.

Where insights derived from data could impact the human condition, the potential harm to individuals and communities should be the paramount consideration. Big data can produce compelling insights into populations, but those same insights can be used to unfairly limit an individual’s possibilities.

2. Account for the downstream uses of datasets.

Data professionals should strive to use data in ways that are consistent with the intentions and understanding of the disclosing party. Many regulations govern datasets on the basis of the status of the data: “public,” “private” or “proprietary”, for example. But what is done with datasets is ultimately more consequential to subjects/users than the type of data or the context in which it is collected. Correlative use of repurposed data in research and industry represents the greatest promise and the greatest risk of data analytics.

3. The consequences of utilizing data and analytical tools today are shaped by how they’ve been used in the past.

There’s no such thing as raw data. All datasets and accompanying analytic tools carry a history of human decision-making. As far as possible, that history should be auditable. This should include mechanisms for tracking the context of collection, methods of consent, chains of responsibility, and assessments of data quality and accuracy.

4. Seek to match privacy and security safeguards with privacy and security expectations.

Data subjects hold a range of expectations about the privacy and security of their data. These expectations are often context-dependent. Designers and data professionals should give due consideration to those expectations and align safeguards and expectations with them, as much as possible.

5. Always follow the law, but understand that the law is often a minimum bar.

Digital transformations have become a standard evolutionary path for businesses and governments. However, because laws have largely failed to keep up with the pace of digital innovation and change, existing regulations are often miscalibrated to current risks. In this context, compliance means complacency. To excel in data ethics, leaders must define their own compliance frameworks to outperform legislated requirements.

6. Be wary of collecting data just for the sake of having more data.

The power and peril of data analytics is that data collected today will be useful for unpredictable purposes in the future. Give due consideration to the possibility that less data may result in both better analysis and less risk.

7. Data can be a tool of both inclusion and exclusion.

While everyone should have access to the social and economic benefits of data, not everyone is equally impacted by the processes of data collection, correlation, and prediction. Data professionals should strive to mitigate the disparate impacts of their products and listen to the concerns of affected communities.

8. As far as possible, explain methods for analysis and marketing to data disclosers.

Maximizing transparency at the point of data collection can minimize the more significant risks that arise as data travels through the data supply chain.

9. Data scientists and practitioners should accurately represent their qualifications (and limits to their expertise), adhere to professional standards, and strive for peer accountability.

The long-term success of this discipline depends on public and client trust. Data professionals should develop practices for holding themselves and their peers accountable to shared standards.

10. Design practices that incorporate transparency, configurability, accountability and auditability.

Not all ethical dilemmas have design solutions, but paying close attention to design practices can break down many of the practical barriers that stand in the way of shared, robust ethical standards. Data ethics is an engineering challenge worthy of the best minds in the field.

11. Products and research practices should be subject to internal (and potentially external) ethical review.

Organizations should prioritize establishing consistent, efficient and actionable ethics review practices for new products, services and research programs. Internal peer-review practices help to mitigate risk, and an external review board can contribute significantly to public trust.

12. Governance practices should be robust, known to all team members and regularly reviewed.

Data ethics poses organizational challenges that can’t be resolved by compliance regimes alone. Because the regulatory, social and engineering terrains are in flux, organizations engaged in data analytics need collaborative, routine and transparent practices for ethical governance.

Avanade’s TechSummit 2016 panel on digital ethics

I recently had the honor of participating on a panel at Avanade’s annual TechSummit conference. Organized by Steven Tiell of Accenture’s TechVision team, we were tasked with discussing the role of digital ethics and digital trust in enterprise. I joined Steven on stage with Bill Hoffman, Associate Director of the World Economic Forum and Scott David, Director of Policy at the University of Washington Center for Information Assurance and Cybersecurity. Below are my prepared remarks, which of course differ extensively from what I actually got around to saying on stage.

1. We’ve seen ethics requirements for medical and academic research, particularly when federal dollars are at play. Why should businesses care about ethics in their research?

Businesses should care about ethics most of all because it is, by definition, the right thing to do. But to go beyond a pat answer, I think it is useful to define the domain of “ethics.” I think of ethics as the methods and tools you use to make a consequential decision when there is relatively little settled guidance about the right thing to do. If you knew the right thing to do, then it would probably be a matter for compliance or legal departments. I like how digital sociologist Annette Markham recently put it when discussing a major data research scandal: “ethics is about making choices at critical juncture,” particularly when those choices affect other people. What I would add to Annette’s definition is that ethics is not just the decisions, but also all the work you have to do in advance to enable those critical decisions. You need the capacity to identify and evaluate those critical junctures, and to then make efficient, consistent and actionable decisions. Done well, ethics is a future-oriented stance. In my opinion, building the habits and infrastructures that make it possible for business to make good choices at critical junctions is simply something that will be good for the bottom line in the long run. It will certainly enable businesses to identify and mitigate risks more effectively.

When it comes to the matter of research ethics in particular, there are three aspects that bear more scrutiny when considering how and why enterprises should engage in ethics review practices.

First, because businesses now hold more data about human behavior than any other entity in human history, the value of those businesses is increasingly indexed to what they can do with that data now and in the future. Thus, the types of research being done looks like the types of research that have traditionally been located in university settings. It should indicate something important to us that academic researchers and institutions have invested so much in handling research ethics: research practices carry significant risk and require sustained attention.

Second, anyone can now be a researcher and everyone is a research subject. Yet all of our familiar ethics norms and infrastructures make certain outdated assumptions about institutional boundaries that create formal and informal professional limits on who can do consequential research. But those assumptions do not hold when human data research happens everywhere. Without the familiar institutional boundaries, businesses will need to make up the slack somehow.

Third, big data research methods simply do pose new kinds of risks for enterprise. Holding so much private data and using that data to intervene in people’s’ lives in a tailored, personalized fashion, poses risks beyond simply privacy. Research is often perceived as creepy or controlling, where even products that do the same thing might not. Thus it is important to align design practices, product development and ethics review in a manner that users of your services or providers of your data can be comfortable with.

Read More

Keynote presentation at Cambridge data ethics workshop

On 10 June, 2016, I will be giving a keynote talk at the Data Ethics Workshop, hosted by the Center for Research in the Arts, Social Sciences and Humanities at Cambridge University in the UK. I look forward to meeting some of the great thinkers in this field from the other side of the pond, and learning more about the different data ethics landscape in the EU.

Speaker: Jake Metcalf
Institution: Data and Society Institute and Founding Partner, Ethical Resolve
Title: Data subjectivity: responding to emerging forms of research and research subjects

Abstract: There are significant disjunctions between the established norms and practices of human- subjects research protections and the emerging research methods and infrastructures at the heart of data science and the internet economy. For example, long-standing research ethics regulations typically exempt from further review research projects that utilize pre-existing and/or public datasets, such as most data science research. This was once a sound assumption because such research does not require additional intervention into a person’s life or body, and the ‘publicness’ of the data meant all informational or privacy harms had already occurred. However, because big data enables datasets to be (at least in theory) widely networked, continually updated, infinitely repurposable and indefinitely stored, this assumption is no longer sound—big data allows potential harms to become networked, distributed and temporally stretched such that potential harms can take place far outside of the parameters of the research. Familiar protections for research subjects need rethinking in light of these changes to scientific practices. In this talk I will discuss how a historicization of ‘human subjects’ in research enables us to critically interrogate an emerging form of research subjectivity in response to the changing conditions of data-driven research. I will ask how data scientists, practitioners, policy-makers and ethicists might account for the emerging interests and concerns of ‘data subjects,’ particularly in light of proposed changes to research ethics regulations in the U.S.

Digital Trust at the Core of Accenture’s 2016 Vision

The partners of Ethical Resolve recently joined Accenture in their San Jose office to learn about Accenture’s 2016 Tech Vision. We have been collaborating with their staff on a report on data ethics to be released later in 2016.

We were pleased to hear about Accenture’s commitment to focusing on ethical issues in order to help their clients build digital trust with customers.

In particular, we agree that it is vital for companies  to focus on their stewardship of user data to ensure that this information is used responsibly and with the interests and rights of customers in mind. As we move further into 2016, it has become clear that one of the simplest approaches to data ethics is to implement effective processes for ethical decision making. What this means for companies is that any employee who makes decisions with ethical ramifications needs to have a clear and effective process for determining what is right thing to do.

Practices as simple as the use of checklists and templates for ethical decision making can greatly improve a company’s ability to properly manage ethical risks and build trust with their customers. With the proper implementation of customized processes for ethical decision making, companies can greatly improve their relationships with customers without undue difficulty.

We look forward to working more with Accenture to help offer processes that are easily adopted by clients to achieve the aim of greater digital trust between tech companies and their customers.

Implications of the Common Rule revisions for private enterprise

Photo courtesy Flickr user Yi Chen.

Photo courtesy Flickr user Yi Chen.


Through my position with the Council for Big Data, Ethics and Society, I recently lead the drafting of a collective public comment on the proposed revisions to the Common Rule, the federal regulation that requires federally funded research projects to receive independent, prior ethics review. The proposed revisions—the first in three decades—are largely a response to the rise of big data analytics in scientific research. Although the changes to biomedical research have received the most public attention, there are some important lessons to take home for any company utilizing data analytics.

Academic research on human subjects is governed by a set of ethical guidelines referred to as the “Common Rule.” These guidelines apply to all human-subjects research that receives government funding, and most universities and research-granting foundations require them of all research. The best known stipulation of the Common Rule is the requirement that research projects receive independent prior review to mitigate harms to research subjects. Private companies are not bound by the Common Rule insofar as they do not receive government funding, but the Common Rule sets the tone and agenda of research ethics in general—it has an outsized footprint well beyond its formal purview. Thus even private industry has good reason to pay attention to the norms animating the Common Rule, even if they are not obligated to follow these regulations.

Indeed, many of the datasets most interesting to researchers and dangerous to subjects are publicly available datasets containing private data.

The most notable problem posed by the revisions in the NPRM is the move to exclude from oversight all research that utilizes public datasets. Research ethics norms and regulations have long assumed that public datasets cannot pose additional informational harms—by definition the harm is already caused by the data contained therein becoming public. However, big data analytics render that assumption anachronistic. We used to be able to assume that data would stay put within its original context of collection. However, the power and peril of big data is that datasets are now architected to be (at least in theory) infinitely repurposable, perpetually updated, and indefinitely available. A public, open dataset that appears entirely innocuous in one context can be munged with another public dataset and pose genuine harms to the subjects of that research. See, for example, the case of NYC taxi database, and the many, many private details there were revealed about drivers and riders from a public dataset.

Read More

Skynet starts with data mining: thinking through the ethics of AI

I was recently interviewed by John C. Havens at Mashable about the creation of data ethics and AI ethics committees.

The ethics of artificial intelligence is becoming a much more concrete public discussion, particularly with the recent open letter advocating for a ban on autonomous weapons systems. The letter, organized by the Future of Life Institute and signed by 10,000 plus people, including many AI researchers and prominent tech leaders, advocates for an international ban on autonomous weapons systems that can operate without meaningful human input.

This follows on the heels of some major media attention earlier in the year about Bill Gates, Elon Musk and Steven Hawking arguing that artificial super-intelligence poses a future existential threat to humanity (including all signing another FLI open letter). Hawking told the BBC that, “The development of full artificial intelligence could spell the end of the human race.” There are reasons to be skeptical of some of this fear, not least of which is the definitional problem of actually getting a handle on what counts as AI and whether it would ever have generalized, incredibly plastic intelligence like human bio-brains do or the ability to maintain machine bodies without humans. (My favorite semi-serious reason for doubting is Baratunde Thurston’s point that if AI looked like human intelligence in aggregate it would spend all day taking cat pictures and trying to sell the rest of us stuff.)

Read More

Getting the formula right: Social trust, A/B testing and research ethics

Image courtesy of Flickr user x6e38 under CC license

Image courtesy of Flickr user x6e38 under CC license

Most Internet services, and especially social media services, routinely conduct experiments on users’ experiences even though few of us are aware of it and consent procedures are murky. In a recent New York Times op-ed Michelle Meyer and Christopher Chabris argue that we should enthusiastically embrace the model of experimentation on users called “A/B testing.” This type of data-intensive experimentation is the bread and butter of the Internet economy and now is at the heart of sprawling ethical dispute over whether experimenting on Internet users’ data is equivalent to human experimentation on legal, ethical or regulatory grounds. In their Op-Ed, Myer and Chabris argue that A/B testing is on the whole ethical because without it Internet services would have no idea about what works, let alone what works best. They suggest that whatever outrage users might feel about such experiments are due to a “moral illusion” wherein we are prone to assuming that the status quo is natural and any experimental changes need to be justified and regulated, but the reality of Internet services is that there is no non-experimental state.


While they’re right that this type of experimentation is a poor fit for the ways we currently regulate research ethics, they fall short of explaining that data scientists need to earn the social trust that is the foundation of ethical research in any field. Ultimately, the foundations of ethical research are about trusting social relationships, not our assumptions about how experiments are constituted. This is a critical moment for data-driven enterprises to get creative and thoughtful about building such trust.

Even if those specific regulations do not work for A/B testing, it does not follow that fostering and maintaining such trust is not an essential component of knowledge production in the era of big data.


A/B testing is the process of dividing users randomly into two groups and comparing their response to different user experiences in order to determine which experience is “better.” Whichever website design or feed algorithm best achieves the prefered outcome—such as increased sales, regular feed refreshes, more accurate media recommendations, etc.—will become the default user experience. A/B testing appears innocuous enough when a company is looking for hard data about which tweaks to a website design drives sales, such as the color of the “buy” button. Few would argue that testing the color of a buy button or placement of an ad requires the informed consent of every visitor to a website. However, when a company possesses (or is accessing via data mining) a vast store of data on your life, your political preferences, your daily activities, your calendar, your personal networks, and your location, A/B testing takes on a different flavor.

Read More

How is the Drone Industry Handling Privacy?

When it comes to privacy, the drone industry is not clear for takeoff.

Despite being a relatively small event, the Drones, Data X Conference in Santa Cruz last month painted a large canvas of the current state of affairs in the world of UAV technology. The audience heard from industry leaders in business and government, as well as being able to interact with a large number of drone enthusiasts and hobbyists. These two groups have completely different views of the world of drone flight.

From the perspective of hobbyists like Ryan Jay, the world of the drone hobbyist is a bit like the wild west: there are FAA regulations “that have no teeth” to control his activities in drone flight. He is a one man flight team, able to build, launch and pilot his own UAVs using First Person View (FPV) by means of a camera mounted on his vehicle. Ryan has built (and lost) several vehicles over the past few years as a hobbyist, and he doesn’t see his ability to pursue this hobby being meaningfully limited by the long conversations going on the in the drone industry between NASA, the FAA and drone hardware and software makers.

The view from above 1,200 feet is different. In an industry with investment already in the billions there is no shortage of careful thinking going into answering questions about how drones should be regulated to protect privacy and security. Whereas the private drone hobbyist can do everything herself, using drones for commercial purposes is highly visible and highly regulated. The billions invested in the drone industry have not been spent to fulfill the desires of hobbyists: there is massive ROI projected for companies who are able to leverage drones for purposes that are currently done inefficiently by other means.

Whether we are talking about aerial inspection of powerlines, flare stacks, wind farms, oil pipelines, and solar arrays, surveying of forests, mines, quarries and agricultural resources, use of drones for disasters relief, search and rescue, and delivery of medical supplies to remote areas, drones are implicated as the tool that makes the impossible possible. There is a recurring narrative in this drone community about the power of aerial vision: the phrase “god’s eye view” of the world was repeated so many times over the course of the day it began to make me uncomfortable. Romeo Durscher, Director of Education for DJI (the largest drone maker in the world) used the phrase more than once in his talk saying at one point “The God’s eye view is really my favorite view.”

Read More

Beyond Privacy

This series of blog posts will address existing and emergent concerns about ethics in the education technology market. The goal of this series is to help technology entrepreneurs begin to think through and respond to the risks their companies face with regard to ethical challenges. Failure to develop an effective and clear ethics policy has destroyed multi-million dollar companies, and will do so again in the future. Too often, entrepreneurs think that they can wait to respond to ethical concerns until after their product has become successful. We have observed over and over again that the most successful products are designed with the ethical use of that product in mind. These posts will encourage readers to consider how successes and failures in the technology market can be understood in terms of ethical approach.


Data Ethics in Education Technology

How data is used in education technology can determine the success or failure of a company. By looking at the history of the ed tech market we can learn how good ethics forms the foundation of successful ed tech companies.


The debate about the role of big data in education has sometimes been cast as a dispute between reformers and traditionalists. Reformers are described as those who support technology as the solution to the problems of the education system, whereas traditionalists tend to support teacher pay increases, reduced class sizes, etc. as the better approach. This debate has often conflated multiple ethical questions within education technology under the same heading: concerns about big data. Disentangling the concerns surrounding the processes of education data is mandatory if companies are to effectively formulate and communicate sound ethical data use policies to their current and potential users. When it comes to ed tech, good intentions and high hopes for digital tools are not enough—parents, teachers and students need concrete evidence that the providers are not merely interested in the bottom line.


Fully understanding what’s at stake requires that we distinguish between issues of data privacy, data commercialization and predictive data modeling. By separating and clarifying threads of the discussion we can better understand the ethical questions in the discussion, avoiding the politicized binary in the debate. I will address each of these issues in separate posts, beginning here with a discussion of some notable successes and failures around data privacy. This will allow us to respond to the technological needs of stakeholders with greater care and efficiency.


The ethical concerns around big data privacy became increasingly pronounced in 2014, and this discussion has become very visible in education technology. The very public collapse of InBloom in failing to properly communicate their policies on data handling helped to bring these questions to the forefront. As researchers Jules Polonetsky and Omer Tene have described it, InBloom’s rapid expansion “brought to the fore weighty policy choices, which required sophisticated technology leadership and policy articulation.” The necessary leadership and articulation was never achieved, and ultimately the public outcry against InBloom forced many districts to end their relationship with company. The lesson to be learned here is not that big data analytics are a failed solution to the problems of our education system, but that big data practices need to be more carefully articulated. Education data needs to be focused on generating demonstrable learning outcomes, rather than the mere collection of data itself. One notable feature of InBloom’s failed policy was the move to collect as many data points about students as possible, rather than targeting data collection practices around specific hypotheses. The “collect and then measure” approach to education data is a mistake. Ed tech companies need to be able to say why they are collecting each data set and exactly what that data is going to achieve. Data practices must also include robust and simple privacy policies and be explained in ways that are easily understandable to the public if they are to succeed.


Read More