Community Leadership Summit Wiki

host: Michael Burnstein

notes: Andrew Davis

Similarities and differences between open data communities and open source communities

  • open data communities have more internal debate over licensing
  • the field is new and there has not been enough litigation for businesses to gauge risk
  • copyright restrictions are weightier for data than for software
    • in the US a database of facts cannot itself be copyrighted -- facts are not copyrightable and aggregation isn't "creative"
    • in Europe aggregation is considered to be a creative task (this has hindered open data startups from starting in Europe)
    • in the US you can extract the facts from a dataset and not be liable for copyright infringement -- the copyright situation in Europe is less clear
    • open data communities are difficult to start because the rules are not the same all over the world
    • these restrictions mean that people joining or using data startups must be known personally, to ensure that they are not litigious
    • OpenStreetMap had to abandon its CC viral license model


How do we incentivize contribution to Open Data?

  • people will put their data into the public domain to contribute to a specific cause
  • You don't have to appeal to "what's right" or "social good"
    • bringing business to open data requires showing business benefits
    • Opening research data
      • peer review in the public space is one way of convincing academia
      • getting rid of database access fees is something academics are highly interested in
      • scandals over falsified data have raised public awareness of open data in the Netherlands (Stapel (sp?))

"don't try to evangelize non-geeks about the benefit of open data because they don't care"

http://xkcd.com/743/

Read vs. Write Access in Open Data (access to data vs. contributing to data)

  • How do you verify public contributions?
    • community verification (10 people agree, so it's probably right)
    • trusted users
    • community users can crowdsource coverage of data verification
    • don't allow public access
    • multiple repositories can be used to verify each other
    • closed-source code can restrict duplication and ensure quality demands are met
    • humans must be involved; you can't automate all verification
  • What happens when bots target a dataset for corruption?
    • time thresholds are often used to prevent bot corruption
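
A minimal Python sketch of how a few of the ideas above could fit together: community verification by quorum, trusted users, and a time threshold. Everything named here is an assumption made for illustration, not something discussed in the session: the `Contributor` and `PendingField` types, the quorum of 10 (taken from the "10 people agree" remark), and the reading of "time threshold" as a minimum account age of three days.

```python
from collections import Counter
from dataclasses import dataclass, field

QUORUM = 10                         # "10 people agree, so it's probably right"
MIN_ACCOUNT_AGE_S = 3 * 24 * 3600   # hypothetical time threshold: 3 days


@dataclass
class Contributor:
    name: str
    account_age_s: float   # seconds since the account was created
    trusted: bool = False


@dataclass
class PendingField:
    """Collects proposed values for one field of one record."""
    votes: Counter = field(default_factory=Counter)
    voters: set = field(default_factory=set)
    accepted: object = None

    def propose(self, who: Contributor, value) -> str:
        if self.accepted is not None:
            return "already accepted"
        # Time threshold: very new accounts cannot write yet.
        if who.account_age_s < MIN_ACCOUNT_AGE_S:
            return "rejected: account too new"
        # Each contributor gets one vote per field.
        if who.name in self.voters:
            return "rejected: already voted"
        self.voters.add(who.name)
        # A trusted user's value is accepted immediately.
        if who.trusted:
            self.accepted = value
            return "accepted: trusted user"
        # Otherwise wait until enough independent contributors agree.
        self.votes[value] += 1
        if self.votes[value] >= QUORUM:
            self.accepted = value
            return "accepted: quorum reached"
        return "pending"
```

Values that stay "pending" are the ones a human reviewer would still need to look at, in line with the point that not all verification can be automated.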


Public vs. Private Data

  • you must track the source of all data
  • medical data has certain fields that can never be shared
  • lawyer wiki is completely closed to allow for open discussion
  • is it enough to close or denormalize some data to the public to maintain privacy?
  • allow users to express their level of consent with clear wording
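
The points above (track the source, never share certain fields, record the contributor's level of consent) can be sketched roughly as follows. The consent levels, the `NEVER_SHARED` field names, and the `Record` and `view_for` names are all hypothetical and chosen only for illustration. Removing fields this way does not by itself guarantee privacy, as the AOL example below shows.

```python
from dataclasses import dataclass
from enum import IntEnum


class Consent(IntEnum):
    PRIVATE = 0    # stays with the original holder
    RESEARCH = 1   # may be shared with vetted researchers
    PUBLIC = 2     # may be shared with anyone


# Hypothetical fields that are never shared, whatever the consent level.
NEVER_SHARED = {"name", "ssn"}


@dataclass
class Record:
    source: str        # where the data came from
    consent: Consent   # level the contributor agreed to
    fields: dict


def view_for(records, audience: Consent) -> list:
    """Return redacted copies of the records this audience may see."""
    visible = []
    for r in records:
        # A record is visible only if its consent covers this audience.
        if r.consent < audience:
            continue
        redacted = {k: v for k, v in r.fields.items() if k not in NEVER_SHARED}
        # The source travels with the data so it can always be traced.
        visible.append({"source": r.source, "fields": redacted})
    return visible
```

Calling `view_for(records, Consent.PUBLIC)` would return only records whose contributors agreed to public sharing, with the never-shared fields stripped.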


Examples

  • TriMet routing is built upon OpenStreetMap
    • TriMet is responsible for route and timetable accuracy
    • TriMet is not responsible for map accuracy
    • TriMet verifies the OpenStreetMap data every night and submits corrections back to the community
  • First Monday
  • AOL anonymized search query data was quickly de-anonymized
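
The nightly verification in the transit example, and the earlier idea of using multiple repositories to verify each other, both come down to comparing a community dataset against a reference and listing the differences. The sketch below is an assumption about what such a comparison could look like, not a description of any agency's actual process; the `find_corrections` name and the dictionary layout (feature id mapped to attributes) are hypothetical.

```python
def find_corrections(reference: dict, community: dict) -> list:
    """Compare a community dataset against a locally verified reference.

    Both arguments map a feature id to a dict of attributes. Returns
    tuples of (feature_id, attribute, community_value, reference_value).
    A feature missing from the community data is reported with
    attribute None and the full reference attributes as the last item.
    """
    corrections = []
    for fid, ref_attrs in sorted(reference.items()):
        com_attrs = community.get(fid)
        if com_attrs is None:
            corrections.append((fid, None, None, ref_attrs))
            continue
        for key, ref_val in sorted(ref_attrs.items()):
            if com_attrs.get(key) != ref_val:
                corrections.append((fid, key, com_attrs.get(key), ref_val))
    return corrections
```

The resulting list is only a set of candidate corrections; a person would still review it before anything is submitted back to the community.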
