host: Michael Burnstein
notes: Andrew Davis
similarities and differences between open data communities and open source communities
- open data communities have more internal debate over licensing
- the field is new and there has not been enough litigation for businesses to gauge risk
- Copyright restrictions are weightier for data vs software
- in the US a database (of uncopyrightable facts) cannot itself be copyrighted -- aggregation isn't "creative"
- in Europe aggregation is considered to be a creative task (this has hindered open data startups from starting in Europe)
- in the US you can extract the facts from a dataset and not be liable for copyright infringement -- the copyright situation in Europe is more tenuous
- open data communities are difficult to start because rules are not the same all over the world
- restrictions require that those joining/using data startups be known personally, to ensure that they are not litigious
- OpenStreetMap had to abandon a CC viral license model
How do we incentivize contribution to Open Data?
- people will put their data into the public domain to contribute to a specific cause
- You don't have to appeal to "what's right" or "social good"
- bringing business to open data requires showing business benefits
- Opening research data
- peer review in the public space is one way of convincing academia
- getting rid of database access fees is something academics are highly interested in
- scandals over falsified data have brought public awareness of open data to the Netherlands (stoppel, stannel (sp?))
"don't try to evangelize non-geeks about the benefit of open data because they don't care"
Read vs. Write Access in Open Data (access to data vs. contributing to data)
- how do you verify public contribution?
- community verification (10 people agree, so it's probably right)
- trusted users
- community users can crowdsource coverage of data verification
- don't allow public access
- multiple repositories can be used to verify each other
- closed-source code can restrict duplication and ensure quality demands are met
- humans must be involved, you can't automate all verification
- What happens when bots target a dataset for corruption?
- time thresholds are often used to prevent bot corruption
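
The verification ideas above (agreement among several people, trusted users, time thresholds against bots) can be sketched in a few lines. This is only an illustration, not something presented in the session: the names `Contributor`, `PendingEdit` and `confirm`, the quorum of 10, and the reading of "time threshold" as a minimum account age are all assumptions.

```python
from dataclasses import dataclass, field

QUORUM = 10                        # assumed: "10 people agree, so it's probably right"
MIN_ACCOUNT_AGE_S = 7 * 24 * 3600  # assumed time threshold: accounts younger than a week don't count


@dataclass
class Contributor:
    name: str
    account_age_s: int   # seconds since the account was created
    trusted: bool = False


@dataclass
class PendingEdit:
    value: str
    confirmations: set = field(default_factory=set)
    accepted: bool = False


def confirm(edit: PendingEdit, user: Contributor) -> bool:
    """Record one confirmation; return True once the edit is accepted."""
    if user.trusted:
        # A trusted user can accept an edit single-handedly.
        edit.accepted = True
    elif user.account_age_s >= MIN_ACCOUNT_AGE_S:
        # Ordinary users count once each (a set ignores repeat votes).
        edit.confirmations.add(user.name)
        if len(edit.confirmations) >= QUORUM:
            edit.accepted = True
    # Accounts below the age threshold are silently ignored.
    return edit.accepted
```

A scheme like this only filters the easy cases; accepted edits would presumably still need a path to human review, in line with the point that not all verification can be automated.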
Public vs. Private Data
- you must track the source of all data
- medical data has certain fields that can never be shared
- lawyer wiki is completely closed to allow for open discussion
- is it enough to close or denormalize some data to the public to maintain privacy?
- allow users to express their level of consent with clear wording
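
One way to picture the combination of never-shared fields and user-chosen consent levels is a filter applied before any record is released. The sketch below is hypothetical throughout: the `Consent` levels, the field names, and the `releasable` helper are invented for illustration and do not describe any system mentioned in the session.

```python
from enum import IntEnum


class Consent(IntEnum):
    NONE = 0    # user agreed to share nothing
    BASIC = 1   # user agreed to share coarse, low-risk fields
    FULL = 2    # user agreed to share everything that may legally be shared


# Hypothetical fields that are withheld regardless of consent.
NEVER_SHARED = frozenset({"patient_name", "national_id"})

# Hypothetical minimum consent needed before each field is released.
# Fields not listed here are withheld by default.
REQUIRED_CONSENT = {
    "source": Consent.BASIC,      # provenance travels with the data
    "age_band": Consent.BASIC,
    "diagnosis": Consent.FULL,
}


def releasable(record: dict, consent: Consent) -> dict:
    """Return only the fields of `record` that `consent` allows to be released."""
    released = {}
    for key, value in record.items():
        if key in NEVER_SHARED:
            continue
        required = REQUIRED_CONSENT.get(key)
        if required is not None and consent >= required:
            released[key] = value
    return released
```

Withholding fields is probably not sufficient on its own, since the remaining fields may still identify a person when combined.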
Examples
- TriMet routing built upon OpenStreetMap
- TriMet is responsible for route and timetable accuracy
- TriMet is not responsible for map accuracy
- TriMet verifies the OpenStreetMap data every night and submits corrections back to the community
- First Monday
- AOL anonymized search query data was quickly de-anonymized
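
The nightly check in the transit example above, and the earlier idea of repositories verifying each other, both come down to comparing two copies of the same data and reporting the differences. The sketch below assumes the verifier holds its own trusted reference copy keyed by some segment identifier; how the actual nightly verification works was not described, and `find_corrections` and the sample keys are made up.

```python
def find_corrections(reference: dict, community: dict) -> list:
    """Compare a trusted reference copy against the community copy.

    Returns a list of (key, community_value, reference_value) tuples,
    one per entry where the community copy is missing or different.
    Entries that exist only in the community copy are left alone.
    """
    corrections = []
    for key, expected in sorted(reference.items()):
        actual = community.get(key)  # None if the community copy lacks the key
        if actual != expected:
            corrections.append((key, actual, expected))
    return corrections


# Hypothetical data: street names keyed by a made-up segment id.
reference = {"seg-1": "SW 5th Ave", "seg-2": "NE Broadway"}
community = {"seg-1": "SW 5th Ave", "seg-2": "NE Brodway", "seg-3": "Unnamed Rd"}

for key, found, expected in find_corrections(reference, community):
    print(f"{key}: community has {found!r}, reference says {expected!r}")
```

The resulting list is what would be submitted back to the community for review rather than applied blindly.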