Using SPDX-License-Identifier
Background
SPDX is an open standard for communicating software bill of materials and license information (and some other information).
Work on it started in February of 2010, and the SPDX 1.0 specification was published in August of 2011. Any open source project started before then adopted its own method of providing license information in its source code.
If a project hasn’t moved over to using the SPDX-License-Identifier yet, it might be a good time to switch and to look at how other major open source projects have utilized it.
Legacy examples
Looking at old source code that I wrote for the SBC library, you find comment blocks like this at the top of each file.
/*
*
* Bluetooth low-complexity, subband codec (SBC) library
*
* Copyright (C) 2004-2008 Marcel Holtmann <marcel@holtmann.org>
*
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*
*/
#ifdef HAVE_CONFIG_H
#include <config.h>
#endif
#include <stdio.h>
#include <errno.h>
They start with the title of the project, the copyright information and then the license text, followed by the initial standard C source code includes. At the time, it seemed the sensible thing to do.
That the actual physical postal address was included seemed weird, but at the time that was the best practice. It seemed like boilerplate that is there and has always been there, and I was going to let someone else deal with it if the Free Software Foundation ever moved their offices.
I never thought of writing them a letter and asking for a copy of that license, but it seems someone actually did.
These days the recommendation is to actually change that paragraph to point to their website instead of their office address.
/*
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, see <https://www.gnu.org/licenses/>.
*
*/
Needless to say, few projects bothered to go through their source code base and change it. I, for example, couldn’t be bothered, and if you read the GPL version 2 license text, it still recommends using the postal address and including all the rest of the boilerplate.
The GPL version 3 license text switched to using the website as reference, but still recommends the same boilerplate as before. When it comes to the FSF, they have no concept of SPDX license identifiers or any recommendation on how to use them. It looks like that work doesn’t exist for them.
What is the SPDX way
With that in mind, it is boilerplate and we keep repeating it over and over again. So I think it is a good idea to get rid of it. Especially in the context of source code scanning and license compliance, a standard format helps to identify issues with licenses.
Looking at SPDX IDs or the SPDX Tutorial for help tells you how to use the license identifier, but not really how to lay out your files with the program name and copyright so that they look pleasing to the eye. Which is no surprise, since SPDX is mostly about making things machine readable.
// SPDX-License-Identifier: GPL-2.0-or-later
/* SPDX-License-Identifier: GPL-2.0-or-later */
SPDX is clear that the license identifier has nothing to do with the copyright or the description and that they should not be intermixed, but the reality is that most projects put all this information in a semi-formatted way into the initial comment block of their source files. Now the question is how to integrate this nicely.
The Linux kernel seems to have come to an agreement on how to mark up source files.
For kernel source files, the decision was made that the SPDX tag should appear as the first line in the file (or the second line for scripts where the first line must be the #! string). For normal C source files, the string will be a comment using the “//” syntax; header files, instead, use traditional (/* */) comments for reasons related to tooling.
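To make the placement rule from that quote concrete, here is a minimal C sketch of a check for it. The function name and its parameters are my own invention for illustration and not part of any kernel tooling; it assumes the whole file content is already in memory and that the caller passes the comment opener expected for the file type.

```c
#include <stdbool.h>
#include <string.h>

#define SPDX_TAG "SPDX-License-Identifier: "

/*
 * Hypothetical helper: return true if the SPDX tag sits on the line
 * where the kernel convention expects it. text is the file content,
 * prefix is the comment opener expected for this kind of file.
 */
bool spdx_tag_on_expected_line(const char *text, const char *prefix)
{
	const char *line = text;
	size_t plen = strlen(prefix);

	/* Scripts keep "#!" on line one, so the tag moves to line two. */
	if (strncmp(line, "#!", 2) == 0) {
		line = strchr(line, '\n');
		if (line == NULL)
			return false;
		line++;
	}

	/* The line has to start with the comment opener ... */
	if (strncmp(line, prefix, plen) != 0)
		return false;

	/* ... immediately followed by the tag itself. */
	return strncmp(line + plen, SPDX_TAG, strlen(SPDX_TAG)) == 0;
}
```

A C source file would be checked with the prefix "// ", a header with "/* " and a shell script with "# ". Any file that starts with a regular comment block and only mentions the tag further down fails this check, which is exactly the restriction the kernel style imposes.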
Following the Linux kernel style it would start looking like this now:
// SPDX-License-Identifier: GPL-2.0-or-later
/*
*
* Bluetooth low-complexity, subband codec (SBC) library
*
* Copyright (C) 2004-2008 Marcel Holtmann <marcel@holtmann.org>
*
*/
#ifdef HAVE_CONFIG_H
#include <config.h>
#endif
#include <stdio.h>
#include <errno.h>
I don’t find it visually pleasing. First it intermixes different comment styles and second the name and copyright comment block looks kinda lost in this example.
And utilizing the same comment style makes it actually worse.
/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
*
* Bluetooth low-complexity, subband codec (SBC) library
*
* Copyright (C) 2004-2008 Marcel Holtmann <marcel@holtmann.org>
*
*/
#ifdef HAVE_CONFIG_H
#include <config.h>
#endif
#include <stdio.h>
#include <errno.h>
Neither of these two options is visually pleasing when you open a source code file for the first time and have to make sense of it. My personal opinion is that this style caters too much to people using shell hacks to parse files instead of using appropriate tools to find the license identifier. I also don’t like the fact that source files and header files have to use different comment styles.
The only time I would opt for this way of using the license identifier is with simple code (like examples) where the copyright and program information are just plain overhead or provided elsewhere. Having a license statement in each file is always a good idea, since it avoids guessing what the author’s original intent was.
/* SPDX-License-Identifier: GPL-2.0-or-later */
#ifdef HAVE_CONFIG_H
#include <config.h>
#endif
#include <stdio.h>
#include <errno.h>
Since I don’t feel like using either of these approaches, it is time to look at other prominent open source projects.
The Zephyr RTOS follows a different approach and has the license identifier embedded in the original comment block.
/*
* Copyright (c) 2015 Intel Corporation
*
* SPDX-License-Identifier: Apache-2.0
*/
#ifndef ZEPHYR_INCLUDE_ZEPHYR_H_
#define ZEPHYR_INCLUDE_ZEPHYR_H_
And so does the ARM Mbed OS while also adding the license boilerplate.
/* mbed Microcontroller Library
* Copyright (c) 2006-2013 ARM Limited
* SPDX-License-Identifier: Apache-2.0
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef MBED_H
#define MBED_H
These are just a few examples of how the SPDX license identifier is used, and I am sure many permutations of this are used all over open source projects around the world.
It means there is no right or wrong way to include your license identifier in the comment block on top of your source files.
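Since the tag shows up in so many positions and comment styles, a tool that looks for it should not depend on any of them. The following is a minimal C sketch of that idea, not a replacement for a real license scanner. The function names are hypothetical, and it assumes the file is processed line by line and that the license expression is the only thing following the tag on its line.

```c
#include <stddef.h>
#include <string.h>

/* Drop trailing spaces and tabs from the first len bytes of s. */
static size_t trim_blanks(const char *s, size_t len)
{
	while (len > 0 && (s[len - 1] == ' ' || s[len - 1] == '\t'))
		len--;
	return len;
}

/*
 * Hypothetical helper: copy the license expression found on line into
 * out and return 1, or return 0 if the line carries no usable tag.
 */
int spdx_extract_license(const char *line, char *out, size_t outlen)
{
	static const char tag[] = "SPDX-License-Identifier:";
	const char *p = strstr(line, tag);
	size_t len;

	if (p == NULL)
		return 0;

	/* Skip the tag itself and the blanks that follow it. */
	p += sizeof(tag) - 1;
	p += strspn(p, " \t");

	/* The expression ends at the line break ... */
	len = trim_blanks(p, strcspn(p, "\r\n"));

	/* ... or before a closing block-comment marker, if present. */
	if (len >= 2 && p[len - 2] == '*' && p[len - 1] == '/')
		len = trim_blanks(p, len - 2);

	if (len == 0 || len >= outlen)
		return 0;

	memcpy(out, p, len);
	out[len] = '\0';
	return 1;
}
```

Feeding it the tag line from the Linux kernel, Zephyr or Mbed examples above yields the bare license expression in each case, because the search does not care whether the line starts with "//", "/*" or " *".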
Figuring out your own way
Adding the actual license boilerplate like ARM Mbed does is not an option. The whole point of changing existing code is to get rid of boilerplate, not to add more. You also have to assume that people releasing source code under an open source license have a good idea of what they are doing and, more importantly, that people utilizing open source from others understand what an open source license means and implies.
Let’s go back to what the FSF put in their licenses on how to attribute information about the program and the copyright.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This program is free software: you can redistribute it ...
Maybe the original comment block was too verbose anyway and used too many line breaks and empty lines to be visually pleasing.
I think it is time to go back to basics: use what the FSF proposes, replace the license boilerplate with the SPDX license identifier and then just close the comment block.
/*
* Bluetooth low-complexity, subband codec (SBC) library
* Copyright (C) 2004-2008 Marcel Holtmann <marcel@holtmann.org>
*
* SPDX-License-Identifier: GPL-2.0-or-later
*/
#ifdef HAVE_CONFIG_H
#include <config.h>
#endif
#include <stdio.h>
#include <errno.h>
Having contemplated so many different versions, this seems the most pleasing one. In addition, it would allow for easy addition of future tags if the need arises.