Free and Open-Source Software Licenses#
There are a variety of free and open-source software licenses, which differ in the restrictions they place on use of the software. While this may seem a bit technical, as a newbie entering the world of data science it is important to have at least a basic understanding of these. For one thing, it could protect you from a legal violation in the future. For another, you will be using free software, and writing software, so it’s worth thinking about why and how the tools you use exist, and how you might wish to license your own code.
Some of the most commonly-used free software licenses are the GNU General Public License (GPL), the MIT License, the BSD License, and Creative Commons licenses. Wikipedia has an extensive list of free and open-source software licenses which compares them on their critical dimensions. Some of the most important dimensions to consider are the following. Linking refers to whether the code can be linked with code that carries a different license. Distribution refers to whether recipients of the software can redistribute it. Modification concerns whether recipients can modify the code. Patent grant concerns whether licensees are protected from patent claims regarding the software. Private use refers to whether licensees can modify the code for private use or, conversely, if any modifications must be made public. Sublicensing refers to whether modified code can be licensed under a different license (e.g., if code can be slightly modified, and then copyrighted for commercial use).
Open source and free software licenses differ with respect to how restrictive they are on re-use. Licenses on free software are copyleft, meaning that every person who receives a copy of the software is permitted to reproduce, adapt, or distribute it, but they can only do so under the same licensing terms as the original. Open source software, on the other hand, can carry more or less permissive licenses than free software. On the more permissive end, open source software may allow re-use under different licensing than the original, including commercial (paid) licensing. As well, open software can be tivoized, meaning that although the software is free, it will only run on proprietary hardware (so-named for the Tivo product that first used this scheme).
Ultimately, the differences between these licenses largely reflect a tradeoff between concepts of freedom and permissiveness. A more permissive license places fewer restrictions on how the software can be used and re-used. The term “copyleft” is a somewhat tongue-in-cheek play on “copyright”; whereas copyright is typically used by authors to restrict others’ ability to reproduce or modify a work, copyleft is intended to explicitly allow others to do these things, and prevent others from placing more restrictive licenses on derivations of the copylefted work. Placing a copyleft or other “share-alike” restriction on software in a sense reflects a more anti-corporate ideology, whereby the author limits some uses of their work, in return for the protection that it always be free. On the other hand, this can limit the utility of the software. By allowing derivatives of one’s software to be used commercially, a creator can potentially empower others to create new and valuable products — but the cost of developing those products might make paying the original author to use a piece of software infeasible. A more permissive license can thus actually offer more opportunities for one’s work to benefit society, by placing less restrictions on how it’s used.