Sunday, June 1, 2014

Geospatial Data Licensing for Non-lawyers (Webinar)

Last week I presented a webinar hosted by LocationTech on Geospatial Data Licensing. The presentation was geared for non-lawyers and can be found here.  It is also available on YouTube here.  Thanks to Andrew Ross and the Eclipse Foundation for putting this together. 

Friday, May 9, 2014

White House Big Data and Privacy Report: Wake Up Call for Geospatial Community?

On May 1, the White House released a report, "Big Data and Privacy: A Technological Perspective". The report was prepared by the President's Council of Advisors on Science and Technology (PCAST), a group of leading scientists and engineers who make policy recommendations to the President on important issues. The President had asked PCAST to prepare a report on the privacy implications of Big Data.

The Report

The report provides a detailed analysis of the privacy risks associated with Big Data. Many of these risks have been well documented. However, this report also goes into detail on the risks associated with what it refers to as "born analog" data. Born-analog data is defined in the report as arising "from the characteristics of the physical world." The data becomes accessible electronically when it "impinges upon a 'sensor', an engineered device that observes physical effects and converts them to digital form". As this sensor is frequently associated with a physical location, much of this data is geospatial and has been collected, processed, visualized, analyzed, stored and distributed for years by the geospatial community without any privacy considerations.

However, that is beginning to change. According to the report, the privacy concern associated with born-analog datasets is that they "likely contain more information than the minimum necessary for their immediate purpose." Data minimization – collecting the minimum required to perform the task at hand – is one of the tenets of privacy protection around the world. While the report acknowledges that there are a number of technological and business reasons for collecting more than the minimum, the authors suggest that there are inherent privacy risks in such an approach. For example, "[a] consequence is that born-analog data will often contain information that was not originally expected. Unexpected information could in many cases lead to unanticipated beneficial products and services, but it could also give opportunities for unanticipated misuse." (p. 23) Whether a use constitutes an unanticipated benefit or an unanticipated misuse often depends upon your point of view.

Undoubtedly, some will question the need for privacy safeguards, as they represent a change in the way the geospatial community has operated in the past with respect to geospatial datasets. However, it is important for the community to recognize that privacy concerns have evolved, due in part to the rapid technological advancements it helped create. Other sectors that collect and use data – finance, medical, education – are already required to take such precautions. As geospatial technology moves into the mainstream, and the number and variety of commercial uses grow, geospatial companies can expect to become subject to similar requirements. The alternative could be much worse.

Many in the geospatial community have believed they are immune from the privacy discussions because the technology they use is not capable of “identifying” a specific individual. For example, satellite and most aerial images are not of sufficient quality to identify an individual’s face or read a license plate. However, privacy risks have evolved. For example, the report cites the increased power of data fusion in connection with born-analog data and states that the risks are not simply in “identifying” an individual but also in developing correlations and creating profiles.

“Data fusion occurs when data from different sources are brought into contact and new facts emerge (See section 3.2.2). Individually, each data source may have a specific limited purpose. Their combination, however, may uncover new meanings. In particular, data fusion can result in the identification of individual people, the creation of profiles of an individual and the tracking of an individual’s activities. More broadly, data analytics discovers patterns and correlations in large corpuses of data, using increasingly powerful statistical algorithms. If those data include personal data, the inferences flowing from data analytics may then be mapped back to inferences, both certain and uncertain, about individuals” (p.x)
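The mechanism the report describes can be sketched with a toy example. Everything below – the datasets, field names and "activity" values – is invented for illustration; the point is only that joining two individually innocuous datasets on shared quasi-identifiers (here, a location cell and an hour) yields a new fact that neither dataset held alone.

```python
# Hypothetical illustration of data fusion: an "anonymous" sensor feed
# joined with named, public check-ins re-identifies an individual.
sightings = [  # sensor data with no names attached
    {"cell": "A3", "hour": 8, "activity": "clinic visit"},
    {"cell": "B7", "hour": 12, "activity": "clinic visit"},
]
checkins = [  # public social-media check-ins that do carry names
    {"name": "Alice", "cell": "A3", "hour": 8},
    {"name": "Bob", "cell": "C1", "hour": 9},
]

def fuse(sightings, checkins):
    """Join the two sources on (cell, hour); the emergent fact is
    who was doing what -- information neither source held alone."""
    out = []
    for s in sightings:
        for c in checkins:
            if (s["cell"], s["hour"]) == (c["cell"], c["hour"]):
                out.append((c["name"], s["activity"]))
    return out

print(fuse(sightings, checkins))  # → [('Alice', 'clinic visit')]
```

Neither list identifies a person's activity on its own; the correlation does, which is precisely the risk the report flags.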

The report then goes on to describe various types of technologies that create born-analog data containing “personal information”. The geospatial community relies on many of these for its products and services, including (i) video from . . . overhead drones; (ii) imaging infrared video; and (iii) synthetic aperture radar (SAR). (p. 22) The report also identifies privacy risks associated with LiDAR, acknowledging that while LiDAR is important to governments, industry and a broad range of academic disciplines, “[s]cene extraction is an example of inadvertent capture of personal information and can be used for data fusion to reveal personal information.” (p. 27) In addition, the report cites the privacy risks associated with “precise geolocation in imagery from satellites and drones”. (p. 28)
The report makes several recommendations to the President. The most relevant to the geospatial community are:
·         Policy attention should focus more on the actual uses of big data and less on its collection and analysis;
·         Policies and regulation, at all levels of government, should not embed particular technological solutions, but rather should be stated in terms of intended outcomes; and,
·         The United States should take the lead both in the international arena and at home by adopting policies that stimulate the use of practical privacy-protecting technologies that exist today.  It can exhibit leadership both by its convening power (for instance, by promoting the creation and adoption of standards) and also by its own procurement practices (such as its own use of privacy-preserving cloud services). 

What Does the Report Mean for the Geospatial Community?

It is unlikely that the White House report will result in any laws being passed in this session of Congress that specifically address privacy risks associated with born-analog data. However, the report has reframed the discussion on privacy in a way that will have a direct impact on the geospatial community. For example, suppliers of geospatial data products and services to the federal government soon may be required to certify that they are taking proper steps to protect any personal information acquired from born-analog data. The geospatial community also should expect that regulators, such as the Federal Trade Commission – and the Federal Aviation Administration with respect to UAVs – will begin citing the findings of this report in future discussions of policies and regulations. Lawyers will also likely cite the report to influence court decisions on matters regarding privacy concerns associated with geospatial data.

As a result, organizations that collect, use, store and/or distribute geospatial data should consider taking a number of steps. These include:
-          Conducting an inventory of their born-analog data to identify potential privacy risks;
-          Developing privacy policies (external) and privacy statements (internal) with respect to born-analog datasets that do (or could) contain personal information;
-          Incorporating explicit language requiring compliance with privacy laws and regulations in their vendor and customer agreements; and
-          Training employees who work with born-analog data on privacy and internal procedures.

In addition, if geospatial datasets are deemed by law to contain “personal information”, there may be additional obligations imposed upon geospatial organizations. For example, they may be required to implement specific information security measures, such as encryption, when the data is transferred or stored. Geospatial organizations may also become subject to state data breach laws, which detail specific steps to be taken if networks are hacked or certain data is lost or stolen.

Thursday, May 1, 2014

Legal Impact of Anonymisation Techniques and Geospatial Data

The Article 29 Data Protection Working Party recently published Opinion 05/2014 on Anonymisation Techniques. The purpose of the opinion was to "analyze the effectiveness and limits of existing anonymisation techniques against the EU legal background of data protection and provide recommendations . . . "

The report discusses a number of anonymisation techniques, including randomization - through noise addition, permutation and differential privacy - and generalization - through aggregation, k-anonymity, l-diversity and t-closeness. The opinion examines the "robustness" of each technique against three criteria: (i) whether it is still possible to single out an individual; (ii) whether it is possible to link records relating to an individual; and (iii) whether information concerning an individual can be inferred. The group also explored pseudonymisation, primarily to "clarify some pitfalls and misconceptions: pseudonymisation is not a method of anonymisation." The report concludes that "anonymisation techniques can provide privacy guarantees and may be used to generate efficient anonymisation processes, but only if their application is engineered appropriately." This requires the data controller to clearly identify the context and objectives of the process, which should be determined on a "case-by-case basis". Moreover, anonymisation should not be a one-off exercise; privacy risks should be regularly reassessed.
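As a rough illustration of the generalization family of techniques the opinion discusses, here is a minimal sketch of checking k-anonymity. The toy records and the bucketing rules (age by decade, postcode truncated to two digits) are my own invention, not drawn from the opinion; real anonymisation requires the case-by-case engineering the Working Party describes.

```python
# Toy k-anonymity check: generalize quasi-identifiers until every record
# shares its generalized values with at least k-1 other records.
from collections import Counter

def generalize(record):
    """Coarsen quasi-identifiers: age bucketed by decade, postcode truncated."""
    return (record["age"] // 10 * 10, record["postcode"][:2])

def is_k_anonymous(records, k):
    """True if every generalized quasi-identifier combination
    appears at least k times in the dataset."""
    counts = Counter(generalize(r) for r in records)
    return all(c >= k for c in counts.values())

records = [
    {"age": 34, "postcode": "02138"},
    {"age": 37, "postcode": "02139"},
    {"age": 36, "postcode": "02139"},
    {"age": 52, "postcode": "90210"},
    {"age": 55, "postcode": "90211"},
]
print(is_k_anonymous(records, 2))  # → True
```

The same dataset fails for k=3: the two records in the 50s/"90" bucket would each be distinguishable within a group of only two, which is the kind of residual "singling out" risk the opinion's robustness criteria probe.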

The report included a number of references to geospatial information. For example, it states that "if an organisation collects data on individual travel movements, the individual travel patterns at event level would still qualify as personal data for any party, as long as the data controller (or any other party) still has access to the original raw data, even if direct identifiers have been removed from the set provided to third parties. But if the data controller would delete the raw data, and only provide aggregate statistics to third parties on a high level, such as 'on Mondays on trajectory X there are 160% more passengers than on Tuesdays', that would qualify as anonymous data." (p. 9). The report also refers to a 1997 study in which an academic researcher could link the identity of specific data subjects to the attributes in an anonymised dataset using only a zip code and two other attributes. (pp. 33-34)

With respect to pseudonymised datasets, the report cites a study published in 2013 by MIT researchers, which found that, using 15 months of spatial-temporal mobility coordinates of 1.5 million people on a territory within a radius of 100 km, "95% of the population could be singled-out with four location points, and that just two points were enough to single-out more than 50% of the data subjects (if one of the points is known)" - even if the individuals' identities were pseudonymised. (p. 23)
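The idea behind that finding can be made concrete with a toy computation. The three-user dataset and the sampling scheme below are invented for illustration (they are not the study's data or its exact methodology); the sketch only shows how one measures how often a few points drawn from a pseudonymised trace match that trace and no other.

```python
# Hypothetical sketch of a "unicity" measure: sample p spatio-temporal
# points from a randomly chosen user's trace and ask whether exactly
# one trace in the dataset contains them all.
import random

# Toy traces: user -> set of (location_cell, hour) points (made-up data).
traces = {
    "u1": {(1, 8), (2, 9), (3, 18)},
    "u2": {(1, 8), (4, 9), (3, 18)},
    "u3": {(5, 8), (2, 9), (6, 18)},
}

def unicity(traces, p, trials=200, seed=0):
    """Fraction of random draws in which p points sampled from one
    user's trace are contained in exactly one trace."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        user = rng.choice(sorted(traces))
        pts = sorted(traces[user])
        sample = set(rng.sample(pts, min(p, len(pts))))
        matches = sum(1 for t in traces.values() if sample <= t)
        if matches == 1:
            hits += 1
    return hits / trials

print(unicity(traces, 2))  # more points -> easier to single a user out
```

Even in this tiny invented dataset, two points often suffice to isolate one trace, which is the intuition behind the study's result that pseudonymisation alone did little to protect mobility data.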

The Article 29 Data Protection Working Party consists of representatives from Data Protection authorities (and a few others) across Europe. As such, although the opinion does not constitute the law in Europe, it provides useful guidance for those collecting, processing, using, storing or distributing data in the region. Since Europe is considered one of the leaders in data protection and privacy, many other nations will consider the European position when drafting their own laws and policies. In addition, I point out that the author of the 1997 study cited above is currently the Chief Technology Officer for the U.S. Federal Trade Commission (FTC). The FTC is becoming the de facto federal authority for privacy in the U.S. As a result, the opinion is a useful marker for organizations that are attempting to anonymise datasets that contain geospatial attributes.

Monday, April 28, 2014

Spatial Law and Policy Update (April 28, 2014)

"Where Geospatial Technology Is Taking the Law"


The Privacy Paradox and the Boston Marathon  (Spatial Law and Policy Blog)

InBloom Wilts Amid Privacy Backlash  (IAPP) Well-funded education tech company goes out of business due to privacy concerns over its business model. "Regardless of the merits of either of these arguments, the lesson for companies in ed tech and beyond should be loud and clear: Privacy is a core business concern. And as should be evident from the inBloom case, getting it right means a lot more than just complying with applicable laws and regulations. Privacy isn’t just about regulatory compliance. It’s about setting the tone of your message, managing consumer expectations, bringing the public along with you for the ride, avoiding privacy lurches and not being creepy. It’s more an art than a science."

Case Law: Weller v Associated Newspapers Limited, Paparazzi, beware – Alexia Bedat  (Inforrm's Blog)  Detailed discussion of how UK courts examine privacy tort claims.

If You Get A Misdelivered Package, UPS Will Give A Stranger Your Home Address  (Forbes) I wonder how big of a concern this would have been 20 years ago?


The Supreme Court's struggle to grasp Aereo's tiny TV antennas  (LA Times)  The Supreme Court's decision in this case could impact the geospatial community.

Licensing and the public domain  (Spatial Reserves)  Blog post discusses various open licensing types, including the ODbL. "For some organisations integrating OSM data with their own private data, or organisations who are mandated to make their data available in the public domain (for example the US Geological Survey), wider use of this data resource is not an option and the benefits of crowd-sourced, free and open datasets like OSM will never be fully realised."

Data Quality

Data Scientists Not Required: New Alteryx Release Puts Predictive and Customer Analytics in the Hands of Every Analyst  (Directions) Critical, real-time decision making being pushed further down into organizations may result in data being used and analyzed in ways for which it is not suitable.

CreepShield Claims to Out the Creeps in Online Dating  (NYT) The site returns results showing the photos and names of offenders in its database — even when they are obviously far from a match.


Spatial Data Infrastructure/Open Data

Public Safety/Law Enforcement/National Security

Sheriff ran air surveillance over Compton without telling residents (LA Times) Sheriff did not feel it was important to notify residents because there were already a number of CCTV cameras on the ground.

Prosecutors: GPS device tracked murder victim  (wdnt) Troubling if true.

Technology Platforms


Indoor Location


Internet of Things/Smart Grid/Intelligent Transportation Systems/Autonomous Vehicles

Why Data from Automated Vehicles Needs Serious Protection  (GPS World) "The data generated is both of a critical and personal nature. And data that is moving in and out of the vehicle to be processed elsewhere or to communicate with other vehicles is particularly vulnerable. The consequences are far greater than a violation of privacy or a stolen identity."

Remote Sensing


AMATEUR FOOTAGE: A GLOBAL STUDY OF USER-GENERATED CONTENT IN TV AND ONLINE-NEWS OUTPUT  Study suggests that crowd sourced content does not receive proper attribution.


The problem is not Uber — the problem is missing regulations  (Sensors & Systems)  This is not just an Uber problem. It’s a problem we face with every “innovation”.

Wednesday, April 23, 2014

The Privacy Paradox and the Boston Marathon

Having grown up in the Boston area, I have been following with great interest the story about the runner at the Boston Marathon who collapsed and was picked up and carried by other runners towards the finish line. It was a heartwarming story, particularly given the tragic events at last year's marathon. However, I should not have been surprised, given the state of today's media, that there has been a backlash over the past 24 hours, with at least one website disputing how far the runner was carried and whether he was carried across the finish line or finished himself.

As a geoprivacy geek, what caught my attention with the most recent reporting was that the initial reporter specifically stated that he would not disclose the fallen runner's name because the runner had asked to have his privacy protected. However, the website disputing the initial accounts went on to identify the runner, based in part, I assume, on the number on his running bib that was captured on video and cameras at the finish line.  (There is a website that allows you to search runners by bib number.)

This incident highlights to me the challenges that we will face in the not too distant future as lawmakers, regulators and judges try to protect privacy in public places. I often refer to this as the "Privacy Paradox". Our location in public spaces, and accompanying information about what we are doing, who we are with, what we are looking at, etc., are being collected in more ways and by more people than ever before. Yet we increasingly expect greater privacy. (Twenty years ago I don't think many runners - or spectators, for that matter - worried about their privacy at the Boston Marathon; I know my family didn't when we used to go watch.)

Presumably, the runner signed an acknowledgement (or waiver) that his bib number would be public and tied to his name. However, I wonder if this document addressed the release of his name in a highly public (and perhaps embarrassing) dispute such as the one now taking place. Similarly, it is unclear why Deadspin felt the need to disclose the runner's identity; but should such a disclosure be considered a violation of the runner's privacy, even if he had specifically asked not to have his name identified?

Privacy in public places. It may not prove to be a paradox, but it is going to be a long and difficult journey.

Saturday, March 29, 2014

The Role of Trust and the Future of Augmented Reality

On March 26, I spoke to the Augmented Reality Community meeting held in conjunction with the Open Geospatial Consortium (OGC) Technical and Planning Committee meeting held in Arlington, Virginia.  Quite understandably, the subject of trust and Augmented Reality (A/R) quickly turned to privacy. 

I had a hard time coming up with what to say on this subject. First, because Augmented Reality can (and will) involve many diverse technologies and applications. Second, because concerns over A/R technologies often drown out discussions of the value of many A/R applications.

So I began my talk at the beginning - by breaking down the elements of A/R. Augmented Reality is defined on Wikipedia as "a live direct or indirect view of a physical, real-world environment whose elements are augmented (or supplemented) by computer-generated sensory input such as sound, video, graphics or GPS data."

Breaking it down, what are the privacy concerns in "reality"? I would suggest there are two primary issues for the A/R community to consider in this area:

1.    Historically, there have been different expectations of privacy when a person is in public than when they are in a private place. However, those expectations are beginning to change; more and more courts, regulators, policymakers, academics, privacy advocates and even technologists are redefining what expectations of privacy in public should be considered reasonable given new technological capabilities.

2.    What are the privacy expectations with respect to an object? There have been growing expectations of privacy in objects such as mobile phones. How will this translate to other objects, such as automobiles or the outside of homes?

Next, what are the privacy concerns associated with augmentation - "the elements are augmented (or supplemented) by computer-generated sensory input such as sound, video, graphics or GPS data"? I would suggest there are two primary issues for the A/R community to consider with respect to augmentation.

1.    Are you augmenting with public input (data) or private input (data)? Obviously, there is a greater privacy concern associated with private data. However, there are increasingly concerns with public data as well. Consider, for example, the New York newspaper that posted a controversial interactive map of the publicly available names and addresses of registered gun owners.

2.    What is the definition of “public” data? Social media is pushing the limits of what is considered public and what is private. But do people appreciate how widely available the information they post will become and how it might be used? Are some types of social media more “public” than others?

Based upon this analysis, I came up with three questions that I believe the A/R community should consider when building applications and use cases. These questions can help define a framework for determining the potential impact of A/R in a given market or jurisdiction. The community also should expect that the answers will change over time, and in some cases will be “individual”-specific, such as when a minor is involved.

1.    If/when does the display of augmented public data of someone who is in public violate that individual’s privacy?

2.    If/when does the use or sharing of augmented public data of someone who is in public violate that individual’s privacy?

3.    Is there ever a time when the display and/or use of augmented private data of someone who is in private is worth the potential (or perceived) violation of that individual’s privacy? If the answer is yes, when is it appropriate, and by whom?