Devising our Data Destiny

The Hadoop ecosystem is becoming the dominant data platform.  It brings powerful new capabilities for collecting and analyzing data.  This technology can be used for both harm and benefit.  As a society, we should address this potential deliberately and develop pragmatic approaches to it.


We’ve addressed similar situations in the past.  For example, when George Eastman invented the Kodak camera, he suddenly made photography affordable.  Large investments in complex cameras and laboratories were no longer required.  But once everyone had a camera, society needed to determine rules for what could be photographed.  After a series of legal cases, photographers were permitted to take pictures of most things without first obtaining permission, with some exceptions for things not on public display.  With this legal fabric in place, amateur photography flourished, and we now enjoy snapshots of our friends and families.  Society responded to Eastman’s technology with pragmatic regulation, permitting us to enjoy photography’s creations while limiting its harms.

Today we are in a similar situation.  We must once again develop policies that preserve our privacy while permitting progress.  First, let’s consider both the dangers and the benefits of data.

Descriptions of data dystopias abound.  We revile the surveillance society, in which governments and corporations record nearly every aspect of our lives and use the data to control us in their own interests.  We believe that many acts must remain private, not subject to public scrutiny.  We worry that the present may no longer fade away but instead remain permanently available for examination.

On the other hand, data collection also has clear advantages.  For example, as the world modernizes, better education is critical to our economic and political survival.  Online courses can be more easily customized to each student, yielding improved learning.  To achieve this, we must gather data about students and analyze it, and anonymization or sampling is often impossible without sacrificing the data’s usefulness.  We face arguably greater crises in healthcare and energy use, and in these areas too, data analysis can do significant good.

We cannot simply ban the collection of personal data without suffering for it.  Our world is becoming more crowded, with more people to feed, house and employ.  Populations are aging.  The planet is warming.  Data alone will not provide all the answers, but it can be a substantial component of any solution.  So we would be wise to permit data collection when we can trust the collectors and analyzers not to abuse it.

Transparency is a great tool for building trust.  We must know who is collecting what data, how they are using it, and how long it is being held.  We should permit new uses of collected data that were not imagined when it was collected, provided those uses are consistent with our values.  Making these determinations requires placing the facts of data collection and use on public display.

We must also decide which uses of collected data are appropriate.  People need to be able to use and publish the results of analyses that aggregate data from many individuals.  However, we do not want data collectors to expose private details or use them improperly.  Such determinations can be subtle.  For example, we want to be able to call our bank and discuss the details of payments we’ve made, but we don’t want this information otherwise shared.

To ensure transparency and appropriate use, we need oversight.  We need independent organizations that monitor and certify whether data is collected and used as described and permitted.  In some cases this might be the government; in others, it might be industry associations.  As in banking, finance, agriculture, labor and other industries, we require regulators who audit, certify and license the activities of data collectors.

To enforce these regulations, we need significant penalties for those who violate standards.  Best practices can be established that limit penalties when violations are unavoidable (e.g., the result of a sophisticated break-in).  But those who violate policies, whether intentionally or negligently, should be punished financially, and perhaps even criminally, for regulation to be effective.

Governments around the world are beginning to make progress on these issues, albeit using different tactics.  In the US, the Federal Trade Commission (FTC) leads.  It publishes guidelines for industries, such as Do-Not-Track and Privacy by Design.  It encourages industries to self-regulate to avoid disciplinary actions, which has led to the formation of organizations such as the Digital Advertising Alliance and influenced World Wide Web Consortium (W3C) browser standards.  The FTC frequently brings lawsuits to enforce its policies.  It is currently working on tighter regulation of data brokers, who buy and sell consumer data, to increase transparency and better limit abuse.

In the European Union, regulators are attempting to define basic human rights concerning data collection, then use these to guide policy implementation.  Under the proposed General Data Protection Regulation (GDPR), personal data collection would require consent, and users could demand that any personal data collected about them be subsequently purged.  Fines for violations would be significant.

Current regulatory efforts are works in progress, so it is not yet clear how each will function, but we can begin to consider probable impacts.  We must also work to better coordinate these efforts, since many technologies operate across borders.

Had photographers been required to obtain explicit permission from each person they photographed, we would likely not have most of the photos we enjoy today.  Similarly, legislation that overly restricts data collection may significantly limit our progress.  Certain general principles, like transparency and privacy, are clear, but restricting collection itself may be short-sighted.  Rather, it is more important that we define abuse of collected data, ensure that we can detect it when it occurs, and punish it effectively.

These are tractable problems.  If we choose to address them productively, we can both avoid the hazards and achieve the rewards of data technology.

This is the first article in a series on issues around data privacy and regulation.  Future articles will discuss data privacy challenges in particular industries, technologies that might assist, and regulatory efforts worldwide.
