Page MenuHomePhabricator

Update copyright_header.py to not duplicate parts of the header
ClosedPublic

Authored by nakihito on Feb 4 2020, 22:54.

Details

Summary

Currently, using the insert command of this script on a file like
wallet_zapwallettxes.py will cause part of the copyright header to be
duplicated.

current behavior (without this patch) example:

# Copyright (c) 2014-2016 The Bitcoin Core developers
# Distributed under the MIT software license, see the accompanying
# file COPYING or http://www.opensource.org/licenses/mit-license.php.

to:

# Copyright (c) 2015-2019 The Bitcoin developers
# Distributed under the MIT software license, see the accompanying
# file COPYING or http://www.opensource.org/licenses/mit-license.php.
# Copyright (c) 2014-2016 The Bitcoin Core developers
# Distributed under the MIT software license, see the accompanying
# file COPYING or http://www.opensource.org/licenses/mit-license.php.

This change fixes the behavior so that it does not duplicate lines.
example:

# Copyright (c) 2014-2016 The Bitcoin Core developers
# Distributed under the MIT software license, see the accompanying
# file COPYING or http://www.opensource.org/licenses/mit-license.php.

to:

# Copyright (c) 2014-2016 The Bitcoin Core developers
# Copyright (c) 2014-2019 The Bitcoin developers
# Distributed under the MIT software license, see the accompanying
# file COPYING or http://www.opensource.org/licenses/mit-license.php.
Test Plan

On a file with the copyright header:

contrib/devtools/copyright_headers.py insert <file>

This should outout the following error and make no change:
*** src/seeder/main.cpp already has a copyright by The Bitcoin developers

Remove the // Copyright (c) <oldyear>-<newyear> The Bitcoin developers line from a .cpp or .h file.

contrib/devtools/copyright_headers.py insert <file>

The copyright header should be the same as before (maybe newyear is updated to something more recent).

Edit the // Copyright (c) <oldyear>-<newyear> The Bitcoin developers line in a .cpp or .h file to // Copyright (c) <oldyear>-<newyear> The Bitcoin Core developers.

contrib/devtools/copyright_headers.py insert <file>

The copyright header should now include the The Bitcoin developers line (which could have its newyear updated as before) underneath the The Bitcoin Core developers line.

Completely remove the entire copyright header from the file.

contrib/devtools/copyright_headers.py insert <file>

This should reproduce the results of the second case.

Repeat using a python file.

Diff Detail

Repository
rABC Bitcoin ABC
Branch
FixCopyrightScript
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 9272
Build 16486: Default Diff Build & Tests
Build 16485: arc lint + arc unit

Event Timeline

Owners added a reviewer: Restricted Owners Package.Feb 4 2020, 22:54
nakihito edited the test plan for this revision. (Show Details)
nakihito retitled this revision from Update copyright_heder.py to not duplicate parts of the header to Update copyright_header.py to not duplicate parts of the header.Feb 4 2020, 23:53
nakihito edited the summary of this revision. (Show Details)
jasonbcox requested changes to this revision.Feb 5 2020, 00:13
jasonbcox added a subscriber: jasonbcox.

I like the spirit of the change, as it moves towards making it easier to append ourselves as authors of files with existing copyrights rather than appending manually.

However, the implementation has some flaws (see comments).

For this input:

# Copyright (c) 2014-2016 The Bitcoin Core developers
# Distributed under the MIT software license, see the accompanying
# file COPYING or http://www.opensource.org/licenses/mit-license.php.

I would expect this output with insert:

# Copyright (c) 2014-2016 The Bitcoin Core developers
# Copyright (c) 2020 The Bitcoin developers
# Distributed under the MIT software license, see the accompanying
# file COPYING or http://www.opensource.org/licenses/mit-license.php.

And with this new output as input, I would expect insert to make no changes.

What is curious to me is that insert and update are separate commands. On one hand, this makes sense: if you author a new file, you insert the entire copyright. If you modify a file that you did not author, you update the (hopefully) existing copyright.
However, with the substantial linting, modification, and automation we've been adding to the codebase, I see no issue in always adding ourselves as authors of every file automatically. I think there's an argument to merge the behavior of the two so that
as much of the copyright that is missing gets added as needed, or only an update to the existing line is made.

There's a strong hint here that we should add this as a linter so that new files get copyright added automatically and newly backported files get a new line appended to the copyright automatically, and every file touched at the beginning of the new year will get
bumped automatically. In essence, we won't have to waste brain cycles on this problem ever again if implemented with care.

contrib/devtools/copyright_header.py
597 ↗(On Diff #16015)

This doesn't appear to work when a copyright doesn't exist at all because it assumes there is one, breaking the primary use case for insert

This revision now requires changes to proceed.Feb 5 2020, 00:13
contrib/devtools/copyright_header.py
597 ↗(On Diff #16015)

I realize my comment is actually wrong. Regardless, there's a code smell here due to find_distribution_line_index not being used where it is called.

Moved find_distribution_line_index() call sites to be closer to where it is actually used.

This revision is now accepted and ready to land.Feb 7 2020, 18:30