louie is currently certified at Master level.

Name: Luis Villa
Member since: 1999-11-09
Last Login: 2008-07-15 03:49:38


Homepage: http://tieguy.org/


A former maintainer of legOS, I'm now actively involved in GNOME as bugmaster and release team member. I haven't updated my advo page since advo was in beta; please don't expect that to change drastically. :)

Recent blog entries by louie


Complying with Creative Commons license attribution requirements in slides and powerpoint

When I was at Mozilla and WMF, I frequently got asked how to give proper credit when using Creative Commons-licensed images in slideshows. I got the question again last week, and am working on slides right now, so here’s a quick guide.

The basics

First, a quick refresher. To comply with Creative Commons (CC) attribution requirements, you need to provide four things in a “reasonable” manner:

  1. the title of the work (if there is one);
  2. the author (might be an internet username);
  3. the source (where you got it); and
  4. the license (including version).

CC helpfully condenses those to “TASL”. An example:

“Larry Lessig giving #ccsummit2011 keynote” by David Kindler is licensed under CC BY 2.0

Creating this information has traditionally been a pain, but this one was generated with one click by the great new “copy credit as text” button in the CC search beta!
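If you'd rather script your credit lines than click a button, the four TASL fields map naturally onto a small record. The following is only a sketch: the `Credit` class, the `credit_line` function, the exact wording of the output, and the placeholder URL are all assumptions for illustration, not anything CC publishes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Credit:
    """The four TASL fields: title, author, source, license."""
    title: Optional[str]  # not every work has a title
    author: str           # might be an internet username
    source: str           # where you got the work, usually a URL
    license: str          # license name including version, e.g. "CC BY 2.0"


def credit_line(c: Credit) -> str:
    """Format a one-line credit (wording is an assumption, modeled on the example above)."""
    title = f"“{c.title}”" if c.title else "Untitled work"
    return f"{title} by {c.author} is licensed under {c.license} ({c.source})"


lessig = Credit(
    title="Larry Lessig giving #ccsummit2011 keynote",
    author="David Kindler",
    source="https://example.org/photo",  # placeholder, not the real source URL
    license="CC BY 2.0",
)
print(credit_line(lessig))
```

Keeping the fields separate until the last moment means the same records can feed a per-slide credit, a single attribution slide, or a thank-you slide with just the names.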

Once you’ve created an appropriate credit line, the question, then, is what is a “reasonable” way to put it into a slide deck? There are a few options.

The maximalist option

An obvious option is to put the credit information on every slide, like the lower right hand corner here:

From “‘Program and Engagement Coordination’ – A reflective process management to take movement conferences to the next level”, by Cornelius Kibelka, under CC BY 4.0.

This has some benefits:

  • Clearly complies with the license.
  • Regularly reminds the audience that the images are available and reusable.
  • If you reorganize the slides, the credit stays with the image.

Things that aren’t so great:

  • Distracts from your message.
  • Very difficult to read, so not very useful to the audience, or motivating for the author.

What Lessig does

To keep the focus on his content, Creative Commons founder Lessig puts all his attributions on a single slide at the end of each talk. (This is consistent with his famous “Lessig method”: large, bold images and very few words.) You can see an example just before the end of a talk he gave in 2013. Note that Lessig does not explain out loud what is on the slide, or mention the license, since the slide is shown during applause.

My own slides do something similar:

I give more detail by providing links, and note that all images are specifically CC BY-SA 3.0 unless otherwise noted.

So what’s good/bad about this approach? Good:

  • Doesn’t distract from your message as a speaker (which is the reason you’re speaking, after all!)
  • Complies with the license, since it is “reasonable” for the slide medium.

Bad:

  • Doesn’t give the authors much recognition.
  • Only weakly informs the audience that the images are available and reusable (since it is at the end and nearly unreadable).
  • If you reorder your slides, or copy and paste into a different deck, you also have to remember to reorder/reuse your attribution slide.

Improving recognition and utility

Given those drawbacks, here are two things you can consider doing to improve on Lessig’s approach.

Fix utility with a clear link to downloadable information

Consider adding a slide at the end, before the full attribution slide, that provides a download link and mentions the license, something like “download slides, and get links and licenses for images, at lu.is/talks”. If you leave that slide up during Q&A, and the URL is short and memorable, the audience can easily find the licensing information later when it is useful to them.

Recognize authors with a thank-you slide

The small type and quick flash of a long attribution slide may be legally compliant, but it does not help give authors the recognition they often want. So consider adding a “thank you” slide with just the names of authors, and a prominent CC logo, without any titles and licensing information. It will make the authors happy, especially if any of them are in the audience!

Syndicated 2017-02-27 19:02:38 from Blog – Luis Villa: Open Law and Strategy

React’s license: necessary and open?

I got multiple emails last week about React’s patent license, and this analysis made the rounds. So a few quick thoughts.

tl;dr: React’s patent license (1) isn’t a bad idea, because the BSD license is not explicit about granting patent rights; and (2) probably meets the requirements of the Open Source Definition.

React, by erokism, used under CC BY 2.0

Disclaimer: I have in the past counseled Facebook, but I do not currently represent them, and have never advised them on React.

Why are we here?

Big software companies who genuinely want to give away infrastructure code like React generally have three slightly conflicting goals:

  1. be super-permissive (because you want maximum use)
    1. (a) including GPL-compatibility!  (if you take maximum use seriously)
  2. give users confidence that you won’t sue them over patents
  3. (optional) have defensive patent clauses (if you want to discourage your users from suing you over patents)

Here’s the problem: historically, there hasn’t been a license that meets all of those needs. No license gives both #1(a) and #2, because FSF has historically considered patent termination an incompatibility with GPL v2. BSD/MIT does #1, but doesn’t do #3 – and may not give you confidence about patents (#2).

BSD doesn’t give patent confidence?!?

You might be surprised when I say the BSD license may not give users confidence around patents. You’re not alone! El Camino Legal writes:

I’ve never heard any lawyer postulate that [the BSD license] does not grant a license to fully exploit the licensed software under all of the licensor’s intellectual property. … Developers-licensees (or, more to the point, their lawyers) have traditionally been very confident that the BSD License does not leave room for a licensor to successfully sue under patents.

I personally think a court should and probably would read the BSD license in this way. But I, like many other FOSS experts, am not “very” confident about this, especially for clients at high patent risk for some reason.

Why not? In short, the BSD license does not actually say “you have a license to use our patents” — it just says “you can use our software”. Courts should in this case say “of course allowing you to use their software also allows you to use their patents”. (In US patent law, this is called an implied license.) But whether a court will do this varies from country to country, and even court to court. And in an era (hopefully ending soon!) of mass litigation over software patents, some large companies — and individuals — reasonably want more confidence than that.

You don’t have to take my word for it: law firms, scholars, and FSF have written about concerns with implied licenses. Google took the issue seriously enough to write a React-like additional permission for WebM; and Oracle explicitly cited the problem as motivation for writing, and getting OSI approval for, a BSD-style license with explicit patent grant. (HP doesn’t like Oracle’s license, but still agrees that “there may be a need” to address the problem.)

Don’t over-react by deleting all your BSD-license code! BSD’s implied patent license is probably fine, the vast majority of the time. But if you use BSD-licensed code and face increased patent risk (say, you compete with the author, and they have a lot of patents) then it is reasonable to investigate more. And if you publish code under BSD there is no harm, and some potential benefit, in resolving the uncertainty up front with explicit patent language. This is exactly what Facebook seems to have tried here.

Is it well-written?

Since (until recently) there were no standard permissive licenses with an explicit patent license, concerned companies have used custom-drafted licenses. Unfortunately, virtually no one gets new open licenses right on the first try. For example, Google revised their WebM patent language after early feedback from the open license community. And even the most careful open license drafters have a clause they regret. (Ask me over a beer sometime.)

Given that history, it isn’t surprising that this new license is somewhat inelegant. For example, El Camino is correct that the “Necessary Claim” language comes from standards rather than software. (I suspect Facebook got it from either the Apache license or the WebM patent grant.) I’d personally add that “for the avoidance of doubt” is usually not good practice. And I’m curious why they called this an “additional” grant in the title of the document: on the one hand, that could be read to acknowledge the implicit grant in the BSD license (great!), but on the other hand it could be read to weaken the value of the termination clause (not so hot). (And of course, Facebook also had some second thoughts, updating the license to allow countersuits – against themselves!)

Is it open?

El Camino’s blog post has gotten attention in large part for claiming that the React license is not open source. Respectfully, I think they’ve gotten this wrong, and I want to correct the record.

Their claim that React is not open source hinges on the definition of a “fee” in section 1 of the Open Source Definition. The Definition says:

The license shall not require a royalty or other fee for such sale.

El Camino argues that the React license clause that requires you not to sue Facebook over patents is a “fee”, since the licensee “pays a price… not… paid with money” to use the software. This interpretation is not unreasonable! Giving up your options is, indeed, a “price” in some sense.

However, the OSI and the broader open source community have always interpreted “fee” to mean monetary payment. This is reflected in the annotated Open Source Definition, which states that this clause “require[s] free redistribution” (emphasis mine).

More conclusively, the GPL (indeed, every copyleft license) also requires you to give up some options: the option to make proprietary derivatives! If “fee” were defined as “giving up options”, then the GPL would never have been treated as an open license. Instead, GPL has always been considered open by the Open Source Initiative, which is pretty conclusive evidence that “fee” means monetary payment.

And of course, as El Camino noted in an update to their original post, OSI approved similar patent language when they approved MPL 1.1.

I’m not going to firmly claim that the React license is compliant with the Open Source Definition, since it hasn’t gone through a full OSI review. But I think the concern raised by El Camino is based on a (well-intentioned) misunderstanding of the Open Source Definition, and the language would likely pass an OSI review for OSD compliance.

Is it a good idea?

Of course, a license can meet the requirements of the Open Source Definition and still not be a great idea. For example, when drafting MPL 2.0 we realized that narrowing MPL 1.1’s patent termination clause would encourage use in some cases while not hurting Mozilla’s contributors. I suspect that, overall, React’s license would be better if it made the same change. But, again, “you might not want to use it if your company is a frequent patent litigator and/or huge Facebook competitor” is not the same as “not open”.

License protects users, not just Facebook

It is important to note that there are two key ways that this clause protects React’s users, not just Facebook.

First, there is the obvious one: this gives users a very explicit patent license. If Zuckerberg retires tomorrow (or, um, sells their open source components to Oracle) React’s users will still have a very clear license to those patents.

Second, this clause gives Facebook the ability to protect React users who are sued over React-related patents, not just Facebook. Would Facebook actually protect React users that way? No idea! But if I’m a troll and considering suing React users en masse, this language at least gives a reason to pause and think twice. (MPL 2.0’s patent retaliation clause, canceling not just the patent license but also the copyright license, would have even more teeth – something for Facebook to consider if they revise this again :)

Bottom line

Is the React license elegant? No. Should you be worried about using it? Probably not. If anything, Facebook’s attempt to give users an explicit patent license should probably be seen as a good faith gesture that builds some confidence in their ecosystem.

But yeah, don’t use it if your company intends to invest heavily in React and also sue Facebook over unrelated patents. That… would be dumb. :)

Syndicated 2016-10-31 21:51:13 from Blog – Luis Villa: Open Law and Strategy

Public licenses and data: So what to do instead?

I just explained why open and copyleft licensing, which work fairly well in the software context, might not be legally workable, or practically a good idea, around data. So what to do instead? tl;dr: say no to licenses, say yes to norms.

"Day 43-Sharing" by A. David Holloway, under CC BY 2.0.

Partial solutions

In this complex landscape, it should be no surprise that there are no perfect solutions. I’ll start with two behaviors that can help.

Education and lawyering: just say no

If you’re reading this post, odds are that, within your organization or community, you’re known as a data geek and might get pulled in when someone asks for a new data (or hardware, or culture) license. The best thing you can do is help explain why restrictive “public” licensing for data is a bad idea. To the extent there is a community of lawyers around open licensing, we also need to be comfortable saying “this is a bad idea”.

These blog posts, to some extent, are my mea culpa for not saying “no” during the drafting of ODbL. At that time, I thought that if only we worked hard enough, and were creative enough, we could make a data license that avoided the pitfalls others had identified. It was only years later that I finally realized there were systemic reasons why we were doomed, despite lots of hard work and thoughtful lawyering. These posts lay out why, so that in the future I can say no more efficiently. Feel free to borrow them when you also need to say no :)

Project structure: collaboration builds on itself

When thinking about what people actually want from open licenses, it is important to remember that how people collaborate is deeply affected by how your project is structured. (To put it another way, architecture is also law.) For example, many kernel contributors feel that the best reason to contribute your code to the Linux kernel is not the license, but the high velocity of development, which means that your costs are much lower if you get your features upstream quickly. Similarly, if you can build a big community like Wikimedia’s around your data, the velocity of improvements is likely to reduce the desire to fork. Where possible, consider also offering services and collaboration spaces that encourage people to work in public, rather than providing the bare minimum necessary for your own use. Or more simply, spend money on community people, rather than lawyers! These kinds of tweaks can often have much more of an impact on free-riding and contribution than any license choice. Unfortunately, the details are often project specific – which makes it hard to talk about in a blog post! Especially one that is already too long.

Solving with norms

So if lawyers should advise against the use of data law, and structuring your project for collaboration might not apply to you, what then? Following Peter Desmet, Science Commons, and others, I think the right tool for building resilient, global communities of sharing (in data and elsewhere) is written norms, combined with a formal release of rights.

Norms are essentially optimistic statements of what should be done, rather than formal requirements of what must be done (with the enforcement power of the state behind them). There is an extensive literature, pioneered by Nobel laureate Elinor Ostrom, showing that norms are how a huge amount of humankind’s work actually gets done – despite the skepticism of economists and lawyers. Critically, they often work even without the enforcement power of the legal system. For example, academia’s anti-plagiarism norms (when buttressed by appropriate non-legal institutional supports) are fairly successful. While there are still plagiarism problems, they’re fairly comparable to the Linux kernel’s GPL-violation problems – even though, unlike GPL, there are no legal enforcement mechanisms!

Norms and licenses have similar benefits

In many key ways, norms are not actually significantly different than licenses. Norms and licenses both can help (or hurt) a community reach their goals by:

  • Educating newcomers about community expectations: Collaboration requires shared understanding of the behavior that will guide that collaboration. Written norms can create that shared expectation just as well as licenses, and often better, since they can be flexible and human-readable in ways legally-binding international documents can’t.
  • Serving as the basis for social pressure: For the vast majority of collaborative projects, praise, shame, and other social nudges, not legal threats, are the actual basis for collaboration. (If you need proof of this, consider the decades-long success of open source before any legal enforcement was attempted.) Again, norms can serve this role just as well, if not better, since it is often a desire to cooperate and a fear of shaming that actually drive collaboration.
  • Similar levels of enforcement: While you can’t use the legal system to enforce a norm, most people and organizations also don’t have the option to use the legal system to enforce licenses – it is too expensive, or too time consuming, or the violator is in another country, or one of many other reasons why the legal system might not be an option (especially in data!) So instead most projects resort to tools like personal appeals or threats of publicity – tools that are still available with norms.
  • Working in practice (usually): As I mentioned above, basing collaboration on social norms, rather than legal tools, works all the time in real life. The idea that collaboration can’t occur without the threat of legal sanction is really a somewhat recent invention. (I could actually have listed this under differences – since, as Ostrom teaches us, legal mechanisms often fail where norms succeed, and I think that is the case in data too.)

Why are norms better?

Of course, if norms were merely “as good as” licenses in the ways I just listed, I probably wouldn’t recommend them. Here are some ways that they can be better, in ways that address some of the concerns I raised in my earlier posts in this series:

  • Global: While [building global norms is not easy](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3038591/), social norms based on appeals to the very human desires for collaboration and partnership can be a lot more global than the current schemes for protecting database or hardware rights, which aren’t international. (You can try to fake internationalization through a license, but as I pointed out in earlier posts, that is likely to fail legally, and be ignored by exactly the largest partners who you most want to get on board.)
  • Flexible: Many of the practical problems with licenses in data space boil down to their inflexibility: if a license presumes something to be true, and it isn’t, you might not be able to do anything about it. Norms can be much more generous – well-intentioned re-users can creatively reinterpret the rules as necessary to get to a good outcome, without having to ask every contributor to change the license. (Copyright law in the US provides some flexibility through fair use, which has been critical in the development of the internet. The EU does not extend such flexibility to data, though member states can add some fair dealing provisions if they choose. In neither case are those exceptions global, so they can’t be relied on by collaborative projects that aim to be global in scope.)
  • Work against, not with, the permission culture: Lessig warned us early on about “permission culture” – the notion that we would always need to ask permission to do anything. Creative Commons was an attempt to fight it, but by being a legal obligation, rather than a normative statement, it made a key concession to the permission culture – that the legal system was the right terrain to have discussions about sharing. The digital world has pretty whole-heartedly rejected this conclusion, sharing freely and constantly. As a result, I suspect a system that appeals to ethical systems has a better chance of long-term sustainability, because it works with the “new” default behavior online rather than bringing in the heavy, and inflexible, hand of the law.

Why you still need a (permissive) license

Norms aren’t enough if the underlying legal system might allow an early contributor to later wield the law as a threat. That’s why the best practice in the data space is to use something like the Creative Commons public domain grant (CC-Zero) to set a clear, reliable, permissive baseline, and then use norms to add flexible requirements on top of that. This uses law to provide reliability and predictability, and then uses norms to address concerns about fairness, free-riding, and effectiveness. CC-Zero still isn’t perfect; most notably it has to try to be both a grant and a license to deal with different international rules around grants.

What next?

In this context, when I say “norms”, I mean not just the general term, but specifically written norms that can act as a reference point for community members. In the data space, some good examples are DPLA’s “CCO-BY” and the Canadensys biodiversity initiative. A more subtle form can be found buried in the terms for NIH’s Clinical Trials database. So, some potential next steps, depending on where your collaborative project is:

  • If your community has informal norms (“attribution good! sharing good!”) consider writing them down like the examples above. If you’re being pressed to adopt a license (hi, Wikidata!), consider writing down norms instead, and thinking creatively about how to name and shame those who violate those norms.
  • If you’re an organization that publishes licenses, consider using your drafting prowess to write some standard norms that encapsulate the same behaviors without the clunkiness of database (or hardware) law. (Open Data Commons made some moves in this direction circa 2010, and other groups could consider doing the same.)
  • If you’re an organization that keeps getting told that people won’t participate in your project because of your license, consider moving towards a more permissive license + a norm, or interpreting your license permissively and reinforcing it with norms.

Good luck! May your data be widely re-used and contributors be excited to join your project.

Syndicated 2016-09-26 15:00:12 from Blog – Luis Villa: Open Law and Strategy

Copyleft, attribution, and data: other considerations

Public licenses for databases don’t work well. Before going into solutions to that problem, though, I wanted to talk briefly about some things that are important to consider when thinking about solutions: real-world examples of the problems; a common, but bad, solution; and a discussion of the motivations behind public licenses.

“Bullfrog map unavailable”, by Peter Desmet, under CC BY 3.0 unported

Real-world concerns, not just theoretical

When looking at solutions, it is important to understand that the practical concerns I blogged about aren’t just theoretical — they matter in practice too. For example, Peter Desmet has done a great job showing how overreaching licenses make bullfrog maps (and other data combinations) illegal. Alex Barth of OpenStreetMap has also discussed how ODbL creates problems for OSM users (though he got some Wikipedia-related facts wrong). And I’ve spoken to very well-intentioned organizations (including thoughtful, impactful non-profits) scared off from OSM for similar reasons.

On the flip side, because these rules are based on such flimsy legal grounds, sophisticated corporate legal departments often feel comfortable circumventing the requirements by exploiting loopholes. (Needless to say, they don’t blog about the problems with the licenses – they just go ahead and use the loopholes.) So overreaching attempts to create new rights are, in many ways, the worst of both worlds: they hurt well-intentioned cooperation, and don’t dissuade parties with a significant interest in exploiting the commons.

What not to do: create new “rights”

When thinking about solutions, it is unfortunately also important to say what isn’t a good idea: create new rights, or override limitations on old ones. The Free Software Foundation, to their great credit, has always consistently said that if weakening copyright also weakens the GPL, they’ll take that tradeoff; and that vice-versa, the GPL should not ask for rights that go beyond copyright law. The most recent copyleft licenses from Creative Commons, Mozilla, and the FSF all make this explicit: limitations on copyright, like fair use, are not trumped by our licenses.

Unfortunately, many people have a good-faith desire to see copyleft-like results in other domains. As a result, they’ve gone the wrong way on this point. ODbL is probably the most blatant example of this: even at the time, Science Commons correctly pointed out that ODbL’s attempt to create database rights by contract outside of the EU was a bad idea. Unfortunately, well-intentioned people (including me!) pushed it through anyway. Similarly, open hardware proponents have tried to stretch copyright to cover functional works, with predictably messy results.

This is not just practically wrong, for the reasons I’ve explained in earlier posts. It is also ethically wrong for those of us who want to see more data sharing, because any “rights” we create by fiat are going to end up being used primarily to stop sharing, not encourage it.

Remembering why we do share-alike and attribution

Consider this section a brief sketch for a future post – if I forgot something big, please let me know, but please don’t roast me in comments for being brief or reductive about your favorite motivation.

It is important when writing about public licenses to remember why the idea of placing restrictions on re-use is so intuitively appealing outside of software. If we don’t understand why people want to do less-than-public domain, it’s hard to come up with solutions that actually work. Motivations tend to be some combination (varying from person to person and community to community) of:

  • Recognition: Many people want to at least be recognized for their work, even when they ask for nothing else. (When Creative Commons assessed usage after their 1.0 licenses, [97-98% of people chose attribution](https://creativecommons.org/2004/05/25/announcingandexplainingournew20licenses/).) This sentiment underlies many otherwise “permissive” licenses, as well as academic norms around plagiarism and attribution.
  • Reducing free riding: Lots of people are afraid that commons can be destroyed by people who use the resource without giving back. Historically, this “tragedy of the commons” was about [rivalrous](https://en.wikipedia.org/wiki/Rivalry_(economics)) goods (like fisheries), but the same concern is often raised in the context of collaborative communities, whose labor can be rivalrous even when their goods are non-rivalrous. Some people like share-alike requirements because, pragmatically, they feel such requirements are one way to prevent (or at least reduce) this risk by encouraging people to either participate fully or not participate at all. (If you’re interested in this point, I’ve [written about it before](http://lu.is/blog/2014/12/02/free-riding-and-copyleft-in-cultural-commons-like-flickr/).)
  • “Fairness”: Many people like share-alike out of a deep moral sense that if you take, you should also give back. This often looks the same as the previous point, but with the key distinction that at least some people focused on fairness care more about process and less about outcomes: a smaller, less productive community with more sharing may, for them, be better than a larger, more productive community where not everyone shares perfectly.
  • Access to allow self-help: Another variation on the previous two points is a use of copyleft that focuses less on “is the author helping me by cooperating” and more on “did the author give me materials I can then use to help myself”. In this view, increased access to raw material (like source code, or data) can be good even if the authors are non-cooperative. (To those familiar with the Linux kernel discussions, this is essentially “I got a lousy driver, and the authors hate me, but at least I got *a* driver”.)
  • Ethical: Many people simply think data/source should never be proprietary, and so will use any means possible, like copyleft, to increase the amount of non-proprietary code in the world.

All of these motivations can be more or less valid at different points in time, in ways that (again) deserve a different post. (For example, automatic attribution may not have the same impact as “human” attribution, which may not be a surprise given the evidence on crowding out of intrinsic motivations.)

Finally, next (and final?) post: what solutions we’ve got.

Syndicated 2016-09-21 18:22:38 from Blog – Luis Villa: Open Law and Strategy

Copyleft and data: databases as poor subject

In my last post, I wrote about how database law is a poor platform to build a global public copyleft license on top of.  Of course, whether you can have copyleft in data only matters if copyleft in data is a good idea. I no longer think that is the case, because the way databases are used in the wild makes copyleft impractical even for good-faith users who want to share back. As with the last post, when we compare software (where copyleft has worked reasonably well) to databases, we’ll see that databases are different in very significant ways that impact whether or not we can expect copyleft (and to some extent other standardized public licenses) to work.

Card Puncher from the 1920 US Census.

How works are combined

In software copyleft, the most common scenarios to evaluate are merging two large programs, or copying one small file into a much larger program. In this scenario, understanding how licenses work together is fairly straightforward: you have two licenses. If they can work together, great; if they can’t, then you don’t go forward, or, if it matters enough, you change the license on your own work to make it work.

In contrast, data is often combined in three ways that are significantly different than software:

  • Scale: Instead of a handful of projects, data is often combined from hundreds of sources, so doing a license conflicts analysis if any of those sources have conflicting obligations (like copyleft) is impractical. Peter Desmet did a great job of analyzing this in the context of an international bio-science dataset, which has 11,000+ data sources.
  • Boundaries: There are some cases where hundreds of pieces of software are combined (like operating systems and modern web services) but they have “natural” places to draw a boundary around the scope of the copyleft. Examples of this include the kernel-userspace boundary (useful when dealing with the GPL and Linux kernel), APIs (useful when dealing with the LGPL), or software-as-a-service (where no software is “distributed” in the classic sense at all). As a result, no one has to do much analysis of how those pieces fit together. In contrast, no natural “lines” have emerged around databases, so either you have copyleft that eats the entire combined dataset, or you have no copyleft. ODbL attempts to manage this with the concept of “independent” databases and produced works, but after this recent case I’m not sure even those tenuous attempts hold as a legal matter anymore.

  • Authorship: When you combine a handful of pieces of software, most of the time you also control the licensing of at least one of those pieces of software, and you can adjust the licensing of that piece as needed. (Exceptions to this rule, like the widely-used OpenSSL, tend to be rare.) In other words, if you’re writing a Linux kernel driver, or a WordPress theme, you can choose the license to make sure it complies. That is not necessarily the case in data combinations: if you’re making use of large public data sets, you’re often combining many other data sources where you aren’t the author. So if some of them have conflicting license obligations, you’re stuck.
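To make the "stuck" case concrete, here is a minimal sketch of a conflict check over a combined dataset. The license labels, the source names, and the `find_conflicts` helper are all hypothetical, and the rule it encodes (each copyleft license demands that the whole combination be released under that same license) is a deliberate simplification, not legal analysis of any real license.

```python
from collections import defaultdict

# Hypothetical license labels; real licenses need real legal analysis.
COPYLEFT = {"COPYLEFT-A", "COPYLEFT-B"}


def find_conflicts(sources):
    """Return copyleft sources grouped by license if the licenses conflict.

    Simplifying assumption: every copyleft license requires the whole
    combined dataset to be released under that same license, so two
    different copyleft licenses in one combination cannot both be satisfied.
    An empty dict means no conflict was found under this assumption.
    """
    by_license = defaultdict(list)
    for name, license_id in sources.items():
        if license_id in COPYLEFT:
            by_license[license_id].append(name)
    return dict(by_license) if len(by_license) > 1 else {}


# Three made-up sources; a real aggregate may have thousands.
sources = {
    "museum-a": "PERMISSIVE",
    "survey-b": "COPYLEFT-A",
    "herbarium-c": "COPYLEFT-B",
}
conflicts = find_conflicts(sources)
```

In software you could often clear a result like this by relicensing your own piece. Here none of the three sources is yours, so a non-empty result means dropping data.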

How attribution is managed

Attribution in large software projects is painful enough that lawyers have written a lot on it, and open-source operating system vendors have built somewhat elaborate systems to manage it. This isn’t just a problem for copyleft: it is also a problem for the supposedly easy case of attribution-only licenses.

Now, again, instead of dozens of authors, often employed by the same copyright-owner, imagine hundreds or thousands. And instead of combining these pieces in basically the same way each time you build the software, imagine that every different query requires different attribution data (because the relevant slices of data may have different sources or authors). That’s data!

The least-bad “solution” here is to (1) tag every field (not just data source) with licensing information, and (2) have data-reading software create new, accurate attribution information every time a new view into the data is created. (I actually know of at least one company that does this internally!) This is not impossible, but it is a big burden on data software developers, who must now include a lawyer in their product design team. Most of them will just go ahead and violate the licenses instead, pass the burden on to their users to figure out what the heck is going on, or both.
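A minimal sketch of what those two steps could look like, assuming an in-memory table. The `Tagged` wrapper, the `query` function, and the source and license names are all hypothetical illustrations, not a description of how any particular company does this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """One field value carrying its own provenance (step 1)."""
    value: object
    source: str
    license: str


# Made-up rows: fields within the same row can come from different sources.
rows = [
    {"species": Tagged("Quercus robur", "Museum A", "LICENSE-X"),
     "location": Tagged("52.1N 4.3E", "Survey B", "LICENSE-Y")},
    {"species": Tagged("Fagus sylvatica", "Herbarium C", "LICENSE-X"),
     "location": Tagged("48.9N 2.4E", "Survey B", "LICENSE-Y")},
]


def query(rows, fields):
    """Return plain values for the requested fields, plus the credits
    owed for exactly those fields (step 2)."""
    values = [{f: row[f].value for f in fields} for row in rows]
    credits = sorted({(row[f].source, row[f].license)
                      for row in rows for f in fields})
    return values, credits


# Two views of the same data owe credit to different parties.
_, species_credits = query(rows, ["species"])
_, location_credits = query(rows, ["location"])
```

Even in this toy version, the attribution list is a function of the query rather than of the dataset, which is why it cannot be written once and shipped in a NOTICE file.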

Who creates data

Most software is either under a very standard and well-understood open source license, or is produced by a single entity (or often even a single person!) that retains copyright and can adjust that license based on their needs. So if you find a piece of software that you’d like to use, you can either (1) just read their standard FOSS license, or (2) call them up and ask them to change it. (They might not change it, but at least they can if they want to.) This helps make copyleft problems manageable: if you find a true incompatibility, you can often ask the source of the problem to fix it, or fix it yourself (by changing the license on your software).

Data sources typically can’t solve problems by relicensing, because many of the most important data sources are structured very differently from a typical software project. In particular:

  • Governments: Lots of data is produced by governments, where licensing changes can literally require an act of the legislature. So if you do anything that goes against their license, or two different governments release data under conflicting licenses, you can’t just call up their lawyers and ask for a change.
  • Community collaborations: The biggest open software relicensing that’s ever been done (Mozilla) required getting permission from a few thousand people. Successful online collaboration projects can have 1-2 orders of magnitude more contributors than that, making relicensing hard. Wikidata solved this the right way: by going with CC0.

So what to do?

So if data is legally hard to build a license for, and the nature of data makes copyleft (or even attribution!) hard, what to do? I’ll go into that in my next post.

Syndicated 2016-09-14 13:00:42 from Blog – Luis Villa: Open Law and Strategy



louie certified others as follows:

  • louie certified dsandras as Master
  • louie certified verbal as Journeyer
  • louie certified sisob as Master
  • louie certified malcolm as Master
  • louie certified benadida as Master

Others have certified louie as follows:

  • slothrop certified louie as Apprentice
  • pat certified louie as Apprentice
  • gman certified louie as Master
  • arvind certified louie as Master
  • aldug certified louie as Master
  • jamesh certified louie as Master
  • BenFrantzDale certified louie as Master
  • campd certified louie as Master
  • kristian certified louie as Master
  • jfleck certified louie as Master
  • ricardo certified louie as Master
  • snorp certified louie as Master
  • dsandras certified louie as Master
  • fcrozat certified louie as Master
  • Hallski certified louie as Master
  • menthos certified louie as Master
  • Jody certified louie as Master
  • fxn certified louie as Master
  • Jordi certified louie as Master
  • cactus certified louie as Master
  • Uraeus certified louie as Master
  • nixnut certified louie as Master
  • garnacho certified louie as Master
  • elanthis certified louie as Master
  • riggwelter certified louie as Master
  • sdodji certified louie as Master
  • strider certified louie as Master
  • jarashi certified louie as Master
  • wardv certified louie as Master
  • nconway certified louie as Master
  • lerdsuwa certified louie as Master
  • dcoombs certified louie as Master
  • AlanHorkan certified louie as Master
  • araumntl06 certified louie as Master
  • sqlguru certified louie as Master
  • jnewbigin certified louie as Master
  • gpoo certified louie as Master
  • wlach certified louie as Master
  • cinamod certified louie as Master


