Response to the Request for Information on Open Source Software Security: Areas of Long-Term
Focus and Prioritization
Docket ID: ONCD–2023–0002
The Python Software Foundation is the organization behind the Python programming language. We are
grateful to the requesting agencies for their attention to the critical issue of security in open-source software.
We believe that proactive investment in increasing security in this field will support a healthy future for the
open-source software ecosystem, ultimately delivering widespread benefits to the public.
In our response, we have drawn on our expertise and depth of connection to Python maintainers and users,
one of the largest and most robust open-source software communities. In particular, two recently launched
initiatives have built up our relevant organizational knowledge and technical expertise: The Security
Developer-in-Residence and the PyPI Safety & Security Engineer initiatives are grant-funded programs that
each create a full-time position at the Python Software Foundation allowing a developer to devote themselves
to the highest priority security needs in the ecosystem. These roles have created immediate impact with high
community support, and provide a potentially successful model for how the government can engage with
ecosystems like Python.
We recommend the Federal Government prioritize: supporting open-source project maintainers on evolving
security practices; supporting end-to-end verifiable sources and builds of open-source Python projects;
promoting adoption of memory safe languages; and, finally, supporting non-technical solutions to open-source
software supply chain challenges. Interventions targeting these areas are high-leverage avenues to drive down
systemic risks and increase the sustainability of the Python community and other open source software
communities in the long term. Our submission details our reasoning and suggested interventions.
Contributors: Deb Nicholson, Executive Director; Ee Durbin, Director of Infrastructure; Mike Fiedler, PyPI
Safety and Security Engineer; Seth Larson, Security Developer-in-Residence; Loren Crary, Director of
Resource Development
Contact: To discuss any of this RFI response in further detail, please contact RFI-2023@python.org.
Table of Contents
I. About the Python Programming Language and the Python Software Foundation
II. About the Python Package Index
III. Support Project Maintainers on Evolving Packaging Security Practices
IV. End-to-end verifiable sources and builds of open-source Python packages
V. Python, Packages, and Memory Safety
VI. Non-Technical Solutions to Supply Chain Challenges
Appendix
I. About the Python Programming Language and the Python Software Foundation
Python is an open-source, memory safe programming language. It was originally developed by Guido van
Rossum and first released in 1991. By several measures, it is the most widely used programming language in
the world.1 At our last measurement, python.org served over 110 billion downloads for Python releases
annually.2 It is used in a vast range of applications.3 It is among the most common programming languages
learned in schools and universities; it is one of the most commonly used languages in basic and applied
sciences; companies from startups to giant global enterprises rely on Python to build ubiquitous applications
like Instagram (Meta), Dropbox, and Spotify; and it is a critical language for the U.S. Government as a
component in projects for agencies across the federal government. Python is also one of the most important
languages in the burgeoning fields of data science and artificial intelligence.4
The Python Software Foundation (the “PSF”) is the organization behind Python, as well as the Python Package
Index (“PyPI”), providing critical digital infrastructure relied on by an uncountable number of individuals,
projects, companies, organizations, and governments globally. It is a U.S. 501(c)(3) nonprofit that was
originally formed in 2001 to hold the copyrights, licenses, and trademarks for the Python language and to fulfill
its mission to “promote, protect, and advance the Python programming language, and to support and facilitate
the growth of a diverse and international community of Python programmers.” In addition to providing the basic
infrastructure for Python as well as PyPI, the PSF supports the Python community through several initiatives,
such as a grants program (in 2022 the PSF distributed $215,000 in grant funds to 138 groups in 42 countries)
and PyCon US, the largest annual gathering of Python users, companies, and projects, which attracts
attendees from around the world. The PSF also runs strategic initiatives to improve the Python ecosystem as a
whole, such as the Developer-in-Residence program, which provides for a paid developer dedicated full-time to
improving the core Python language; the PyPI Safety & Security Engineer, who is the first paid full-time
software engineer dedicated to the security of PyPI; and the Security Developer-in-Residence, who is
responsible for identifying and prioritizing security risks to the Python ecosystem and taking action to mitigate
those risks.
II. About the Python Package Index
One reason Python is so commonly used across such a wide range of applications is its robust packaging
ecosystem, which is centered on the Python Package Index (“PyPI”).5 PyPI is a public repository of software
1
See Appendix section 1.
2
2021 Python Software Foundation Annual Impact Report p8, https://www.python.org/psf/annual-report/2021/
3
See Appendix section 2.
4
For example, OpenAI’s flagship product ChatGPT is built on Python.
5
https://pypi.org/
that is free to use for distributing and downloading bundles of Python software, commonly known as
“packages.” PyPI is a free service the Python Software Foundation provides to the public to make distributing
and downloading Python packages simple, such that packages and their dependencies can be installed
typically with only a single command. The large array of packages available makes it easy for developers to
find and integrate preexisting code that has functionality they need in their own projects.
The first version of PyPI was opened for public use over twenty years ago in 2002,6 and it continues to be the
main repository for open-source Python packages. PyPI has grown to host over 490,000 distinct projects, with
over 5 million releases, which are downloaded collectively over a billion times per day. Maintainers upload
between roughly 12,000 and 15,000 new releases per day to PyPI.
The PyPI website typically receives between 3.7 and 4 million unique visitors (humans) per month. PyPI receives
approximately 3 billion web requests per day for projects hosted on its infrastructure (approx. 90 billion per
month).
PyPI differs from other software platforms like GitHub or GitLab in that it provides a place for maintainers to
publish a Python-compatible artifact (i.e., a package) for end-users to download and install. Thus end-users do
not necessarily have to compile the source code for their specific computer architectures, which saves them
time and avoids errors. Some notable projects available on PyPI include Google’s
open source projects for TensorFlow,7 which provide many of the foundational components for today’s Artificial
Intelligence and Machine Learning applications; and the open source projects sponsored by NumFOCUS,8 including
NumPy, Matplotlib, and pandas,9 which are used widely across industries. Government agencies such as the
U.S. General Services Administration, 18F, Centers for Disease Control and Prevention, Department of Veterans
Affairs, U.S. Naval Research Laboratory, National Security Agency, and the National Aeronautics and Space
Administration, among others, leverage many of these packages.10
Secondly, PyPI is publicly available for any user to register and upload an open-source Python project to be
shared and used by others, as long as they follow the PyPI Terms of Use and Acceptable Use Policy.11 The
Python community also produces tutorials guiding new project maintainers on how to package their project and
distribute via PyPI.12 By making the service freely available and accessible, PyPI encourages users to share
their projects with others.
Finally, as an independent, non-corporate project backed by the Python Software Foundation, PyPI works for
the common good of all projects, maintainers, and users, without prioritizing the concerns of one particular
corporation, class of users, or use case over another.
6
PEP 301 – Package Index and Metadata for Distutils, https://peps.python.org/pep-0301/
7
https://www.tensorflow.org/
8
https://numfocus.org/
9
https://pypi.org/project/numpy/ ; https://pypi.org/project/matplotlib/ ; https://pypi.org/project/pandas/
10
See Appendix section 3.
11
https://pypi.org/policy/terms-of-use/ and https://pypi.org/policy/acceptable-use-policy/
12
https://packaging.python.org/en/latest/tutorials/packaging-projects/
III. Support Project Maintainers on Evolving Packaging Security Practices
RFI Focus Area: Secure Open-Source Software Foundations, Sub-Area: Reducing entire classes of
vulnerabilities at scale
Summary
There is constant evolution and innovation in the security best practices for open-source project maintainers.
When new technologies are developed or new threats are identified, often the software solutions are quickly
available, and large corporate actors can easily adopt them, but an enormous number of critical projects don’t
have the resources to do the same. When projects without such corporate backing use out-of-date security
practices, they open vulnerabilities that they pass on to every project downstream. Funding nonprofit
organizations to take security best practices the “last mile” to reach critical projects will increase security for the
whole ecosystem.
When the resources package maintainers rely on to determine security best practices are updated with new
practices, there aren’t many feedback mechanisms today informing existing project maintainers on whether
they are following the latest practices, and what a path to integrating those new practices might look like.
Existing projects often need to apply changes to adopt newer practices, but may not be aware that these
opportunities exist. At the same time, publicizing these opportunities poses a risk for the users of said projects
until a maintainer adopts them, effectively putting a bull’s-eye on a known vulnerability.
One example of such an evolving best practice: when PyPI was originally built, it allowed maintainers to
publish packages by authenticating with a valid username/password pair. Since then, PyPI has implemented
API Tokens13, a more secure authentication mechanism, so maintainers can opt into publishing packages with
these tokens rather than with a username/password pair. A token can be scoped to a specific project, instead of
granting full access to every project tied to the maintainer’s account, which narrows the potential fallout if
that token is compromised. If a maintainer does not move to the token-based approach, any exposure of their
username/password would allow an attacker to control every package in the maintainer’s account,
instead of only the package allowed by the token. Token-based authentication is the better authentication
mechanism, but each maintainer must first become aware of the new standard, then understand how to
implement it, and finally roll out the update to their individual projects.
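The difference in exposure between an account-wide credential and a project-scoped token can be illustrated with a short sketch. This is a simplified model for explanation only: the Credential class, the projects_at_risk function, and the project names are hypothetical and do not reflect PyPI's actual data model or API.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Credential:
    """Hypothetical model of a publishing credential (not PyPI's data model)."""

    kind: str  # "password" or "api_token"
    # None means the credential is valid for every project on the account.
    project_scope: Optional[FrozenSet[str]] = None


def projects_at_risk(cred: Credential, account_projects: Iterable[str]) -> FrozenSet[str]:
    """Return the projects an attacker could publish to if `cred` leaked."""
    owned = frozenset(account_projects)
    if cred.project_scope is None:
        # Account-wide credential: every project on the account is exposed.
        return owned
    # Scoped credential: only projects named in the scope are exposed.
    return owned & cred.project_scope


# Illustrative account with three made-up project names.
ACCOUNT_PROJECTS = ["example-http", "example-cli", "example-utils"]
password = Credential(kind="password")
scoped_token = Credential(kind="api_token", project_scope=frozenset({"example-cli"}))
```

In this model, a leaked password exposes all three projects, while the leaked scoped token exposes only example-cli.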
13
In the name of backwards compatibility, the username/password option still exists, but is actively discouraged. This
option will be removed once 2FA is enforced at the end of 2023.
14
https://pypi.org/project/urllib3/
15
https://pypi.org/project/cryptography/
without the resources or even the knowledge to update their package publishing processes to more secure
methods. Out of 490,000 projects on PyPI, 91% of projects have a single account with the maintainer role.16
For example, maintainers using a username/password for publishing packages may receive an email directing
them to adopt API Tokens, but this still places the burden of adoption on the maintainer. Providing maintainers with
tools to easily adopt the newer practice could accelerate better security practices, reducing risk exposure
across the ecosystem.
A targeted investment in security practice resources for collective use can substantially reduce risk across the
sector. If a trusted actor with appropriate capacity and expertise builds (and, crucially, continuously updates)
these resources and shares them across the ecosystem, it lifts an enormous burden from individual
maintainers, who would otherwise duplicate effort by separately tracking evolving security threats and standards
and teaching themselves to adapt their projects. An organization like the PSF that is neutral and already directly connected
with thousands of projects through the critical digital infrastructure it provides via PyPI is uniquely positioned to
take on this work.
We also propose making the information readily available to maintainers, but not to the general public, since
publishing it would expose potential weaknesses in the maintainer’s processes and could become a signal for
attackers to pursue.
These tools will benefit individual maintainers, community projects, and corporations alike, as they may all find
opportunities of which they were previously unaware, as well as the resources allowing them to take action.
16
Data accessed from PyPI.org database on 2023-11-01.
The federal government should also provide funding to open-source software foundations for security-focused
developer-in-residence programs and other security initiatives. Existing initiatives of this kind have had a
meaningful impact on the security of our ecosystem,17 and there is a great opportunity for the government to
scale up its role in driving this impact.
IV. End-to-end verifiable sources and builds of open-source Python packages
Summary
We’d like to share more detailed context and recommendations with respect to one specific area under the
broader category of packaging security practices that we discussed in the previous section. We believe it
is critical to prioritize strengthening the software supply chain. An enormous and diverse set of users—from
hobbyists to scientists to multinational enterprises to U.S. government agencies—rely on PyPI and the
packages it hosts. Currently, there is no standard way for those users to verify the source and provenance of
these packages. This means PyPI users don’t have a guarantee that the software they download is what they
are expecting, creating a vulnerability that can be exploited by malicious actors. While tools exist that address
different aspects of this risk, there isn’t currently a cohesive and comprehensive toolset. We propose the
creation of a centralized, standardized toolset that allows for end-to-end verifiability and that is enabled by
default for all PyPI packages, effectively eliminating an entire class of risks for a large class of software
consumers.
Supply chain attacks targeting open source projects remain a constant and costly threat. Being able to
verifiably link a package back to its source code and its build process protects against some of the most potent
of these attacks. Examples include malware being silently injected into the build process, as happened with
“Webmin”,18 and the compromise of a PyPI account with access to the Python package “ctx”,19 which
suddenly began publishing malware directly to PyPI.
17
See for example the results from just the first three months of the PSF Security Developer-in-Residence program:
https://pyfound.blogspot.com/2023/10/security-developer-in-residence-2023-q3-report.html ; and also these three reports
of PyPI security improvements from the first two months of the PyPI Safety & Security Engineer program:
https://blog.pypi.org/posts/2023-08-08-2fa-enforcement-for-new-users/,
https://blog.pypi.org/posts/2023-08-17-github-token-scanning-for-public-repos/, and
https://blog.pypi.org/posts/2023-09-18-inbound-malware-reporting/ .
18
See https://www.webmin.com/exploit.html
19
See https://python-security.readthedocs.io/pypi-vuln/index-2022-05-24-ctx-domain-takeover.html
20
SLSA is an OpenSSF project with support from many organizations. See: https://slsa.dev/ and
https://openssf.org/press-release/2023/04/19/openssf-announces-slsa-version-1-0-release/ SLSA v1.0 focuses on
critical issue is adoption. Today, over half of open source project maintainers are not aware of security
standards like NIST Secure Software Development Framework (“SSDF”), SLSA, or OpenSSF Scorecards.21
Right now, projects need to adopt each standard, or each tool implementing a standard, individually and must use
customized workflows or configurations, which dissuades many projects from adopting them. To help critical
open-source Python projects adopt these standards, we propose creating a centrally curated
toolset for building Python packages with optimal security practices enabled by default. A central
toolset would require a relatively small investment to leverage existing security technology and drive
widespread adoption in one of the largest open source ecosystems in the world.
This approach builds on how the PSF has optimized security for uploading Python packages to PyPI using
“Trusted Publishers,”22 which is a secure publishing method that Python projects can implement using an
automated platform that provides an OIDC identity, like GitHub Actions. This Trusted Publishers initiative
combines beneficial security practices like short-lived scoped credentials, automated and hosted builds, and
workflow-based authentication alongside an improved user experience in order to drive adoption of the security
technology, improving security for all PyPI users.
While individual package maintainers can implement one or more of these practices “out-of-band”, there are
four core reasons why a centralized set of verifiability tooling is necessary for widespread adoption:
● A trusted builder: SLSA’s underlying technology requires using one or a few central “builder”
workflows which are trusted by both software producers and consumers. These builders should be
owned and maintained by a trusted organization in the ecosystem in order to drive adoption and to
minimize the number of variants.
● A controlled build environment: Reproducible builds and capturing Software Bill-of-Materials require
control over the build process in order to ensure all inputs and dependencies are recorded and come
from known sources. The controlled build environment provided by the trusted builder allows package
publishers to implement these controls and for the data to be recorded and published automatically
alongside the built artifacts.
● Future security advances: A centralized toolset adopted by the majority of critical PyPI projects
means that future security improvements can be incorporated into the toolset and widely rolled out,
rather than piecemeal adoption by individual open source projects.
● Integration into package repositories: If individual projects must choose their own approach
piecemeal, the result would be a patchwork of different supply chain security improvements that can’t
be harmonized and which would be impossible to integrate directly into the package repository PyPI or
downstream installers. Establishing a standardized method and format for build provenance and
tracking dependencies allows for a coherent, integrated approach that reduces the burden on both
package publishers and users. We believe that a deep integration into PyPI itself will help with adoption
both by consumers and producers as well as make future improvements possible, such as building
tooling on top of the information exposed by PyPI.
Many Python packages are written using only Python or some combination of Python and other languages like
C, C++, or Rust. Code written in compiled languages is typically distributed as pre-compiled binaries in Python packages,
securing the building of software. Quoting the press release: “SLSA provides a framework to prevent source code and
build system tampering.” Reproducible builds are currently being considered as a part of SLSA standards. See:
https://docs.google.com/document/d/1Jk0yZnkTC3dfp8G5dmO8K9r1Kc7TRX2QVOwcFSKw1OQ
21
52% of Open Source maintainers answered “None” when asked which security standards they are familiar with from a
list of NIST SSDF, SLSA, and OpenSSF Scorecards. Tidelift 2023 State of the Open Source Maintainer Report,
https://tidelift.com/open-source-maintainer-survey-2023
22
D. Ingram, “Introducing Trusted Publishers,” The Python Package Index Blog, April 20th, 2023.
https://blog.pypi.org/posts/2023-04-20-introducing-trusted-publishers/
meaning that it’s difficult to determine the initial source code from the binary that’s being distributed. Without
knowledge of exactly which steps, tooling versions, and dependency sources and versions
were used during the build process, it can be difficult or impossible to recreate a software artifact byte-for-byte,
even from its initial source code, because those steps and dependencies are neither recorded nor uniform across
different projects.
Artifacts being reproducible from their original inputs gives consumers a guarantee that nothing malicious or
unexpected was injected into the final artifact during the build. It also confirms that the source code is a
faithful basis for reviewing whether to include the package as a dependency or upgrade to a new release.
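A minimal sketch of the byte-for-byte comparison behind reproducible builds follows. It assumes a consumer has already rebuilt the artifact independently from the published source and recorded build inputs; the function names and file paths are illustrative and not part of any existing tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproduced(published: Path, rebuilt: Path) -> bool:
    """True when the independent rebuild matches the published artifact byte-for-byte."""
    return sha256_of(published) == sha256_of(rebuilt)
```

A mismatch does not by itself prove tampering, since unrecorded build inputs such as timestamps can also change the output, which is why the build environment has to be controlled and recorded.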
With verifiable links to source code, build processes, and reproducibility of builds there would be multiple layers
of defense against supply chain attacks for every adopting Python package:
● For attacks compromising a PyPI account, build provenance allows consumers to detect attackers’ attempts to
work around their lack of access to the upstream source repository, which could manifest as
removal of build provenance as a requirement for publishing or a change of the expected source
repository to one that is attacker-controlled. These sorts of attacks are considered the most potent, as
they are able to completely impersonate a highly privileged user; for this reason PyPI and other
platforms have invested in mandatory 2FA for all user accounts.
● For attacks compromising publish API tokens, the attacker doesn’t have the privileges to modify
PyPI’s expectations of build provenance for a project, so attempts to publish malicious code using these
credentials could be rejected for lacking a verifiable build provenance attestation.
● For attacks that compromise the source repository itself, the malicious source code would be
immutably linked to the release and would be publicly auditable before being adopted by consumers.
Attempts to conceal the source code after publishing the artifact by hiding or deleting it would break the
verifiability of build provenance as verified by PyPI.
● For attacks that inject code directly into the build process, build reproducibility will allow
downstream consumers and redistributors to detect when software has been injected due to differences
in the resulting artifacts.
These properties provide defense-in-depth against supply chain attacks, vastly raising the bar for attackers
attempting to use existing trusted projects as an avenue to compromise many consumers at once.
To make the benefits of this centralized tooling accessible to users, the information generated by the tooling,
such as SLSA build provenance attestations23 or Software Bill-of-Materials documents24 describing the build
environment, dependencies, and tooling used to build the package, must be made available alongside the
Python packages on PyPI. These additional pieces of information can be verified by PyPI and subsequently
downloaded and verified by package consumers and third parties.
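As a rough illustration of what a consumer-side check could look like, the sketch below compares a downloaded artifact against a provenance statement. It assumes the statement follows the in-toto Statement layout used by SLSA v1.0 provenance (a subject list with sha256 digests and a predicate containing runDetails.builder.id) and that its signature has already been verified by other means. The function name and the builder identifier are hypothetical.

```python
import hashlib
from typing import Any, Dict


def artifact_matches_provenance(
    artifact_bytes: bytes,
    statement: Dict[str, Any],
    trusted_builder_id: str,
) -> bool:
    """Check an artifact against a provenance statement.

    Assumption: the statement's signature was verified beforehand;
    signature verification is deliberately out of scope for this sketch.
    """
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    # Collect the sha256 digests of every subject the statement covers.
    subject_digests = {
        subject.get("digest", {}).get("sha256")
        for subject in statement.get("subject", [])
    }
    # Read the builder identity recorded in the provenance predicate.
    builder_id = (
        statement.get("predicate", {})
        .get("runDetails", {})
        .get("builder", {})
        .get("id")
    )
    return digest in subject_digests and builder_id == trusted_builder_id
```

The check fails if the artifact bytes differ from what the builder attested to, or if the statement names a builder other than the one the consumer trusts.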
23
See: https://slsa.dev/spec/v1.0/distributing-provenance#where-attestations-are-published
24
See: https://www.cisa.gov/sbom
V. Python, Packages, and Memory Safety
RFI Focus Area: Secure Open-Source Software Foundations, Sub-area: Fostering the adoption of memory
safe programming languages
A popular pattern for Python projects is to use Python for the majority of a program's logic, providing a usable
and testable interface, and to call libraries written in a systems language (such as Rust, C, or C++) only
for the most performance-critical and CPU-intensive code paths. This minimizes the amount of systems
language and memory unsafe code that needs to be maintained without sacrificing overall program
performance, and it provides access to the suite of tools for testing and running static and dynamic
analysis of Python code.
Python’s suitability, on the basis of performance, for replacing or wrapping programs written purely in memory
unsafe languages has improved in recent years thanks to investments to fund the “Faster CPython Project” by
Microsoft25 as well as work to make the Global Interpreter Lock optional in PEP 703,26 enabling Python to take
full advantage of modern multi-core CPUs without sacrificing memory safety. The Faster CPython project
is seeing large improvements, such as Python 3.11 being 10-60% faster than 3.10, depending on workload.27
A growing number of Python packages are instead written using Rust as a memory safe alternative systems
language. Widely-used examples include cryptography, pydantic, and orjson, which collectively are installed
270 million times per month.29 In particular, the package cryptography underwent the challenging process of
migrating from C to Rust to take advantage of Rust’s memory safety, because it is a security-critical
package. Pydantic and orjson both process user inputs and have demanding performance
requirements, highlighting their need for a memory safe systems language.
As can be seen in cryptography’s difficult experience rewriting the package using Rust30, as well as the high
share of critical packages using memory unsafe code, there is still work to be done to enable more Python
packages to migrate from using memory unsafe languages to memory safe languages like Rust. There is
progress in this area already; for example, the PyO3 project31 aims to provide a toolset for Python
developers to write and distribute Rust code in Python packages. More resources are needed to build on and
scale this progress. The key actions needed are to first identify critical projects that are currently using memory
unsafe code and then create a prioritized plan of action for each project.
25
See: https://devblogs.microsoft.com/python/python-311-faster-cpython-team/
26
See: https://peps.python.org/pep-0703/
27
See: https://docs.python.org/3/whatsnew/3.11.html#whatsnew311-faster-cpython
28
See: https://py-code.org querying for common C and C++ file extensions.
29
See: https://www.pepy.tech/projects/cryptography, https://www.pepy.tech/projects/pydantic, and
https://www.pepy.tech/projects/orjson
30
A. Gaynor, P. Kehrer: “Shipping Python Extensions in Rust Two Million Times a Day,” recording of PyCon US talk, April
2022. https://www.youtube.com/watch?v=z_Eiy2W0APU
31
See: https://github.com/PyO3
Because every migration of a project to a memory safe language requires time and resources, it is critical to
prioritize the projects that are most important to migrate and to avoid placing undue burden on projects that are
less safety-critical or not safety-critical at all. The nature of open source consumption doesn’t lend itself to
knowing how and where a project is being used, so many project maintainers don’t know whether security must
be prioritized for their own projects, increasing the complexity of this task. Effectively prioritizing candidate
projects for migration would require surveying the ecosystem for usage information (number of downloads,
dependency graph information, and input from consumers like the federal government) and assessing whether
each project’s primary function would benefit from using a memory safe language (such as packages
implementing cryptography or processing uncontrolled inputs).
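One way to picture such a prioritization is a simple scoring pass over survey data. The sketch below is purely illustrative: the Candidate fields, the weighting in priority_score, and any project names used with it are assumptions made for the example, not a proposed methodology or real measurements.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical survey record for one project."""

    name: str
    monthly_downloads: int
    dependent_projects: int
    handles_untrusted_input: bool  # e.g. cryptography or parsing of uncontrolled inputs
    uses_memory_unsafe_code: bool


def priority_score(c: Candidate) -> float:
    """Assumed weighting: usage on a log scale, doubled for safety-critical functions."""
    if not c.uses_memory_unsafe_code:
        return 0.0  # nothing to migrate
    usage = math.log10(1 + c.monthly_downloads) + math.log10(1 + c.dependent_projects)
    return usage * (2.0 if c.handles_untrusted_input else 1.0)


def rank(candidates: Iterable[Candidate]) -> List[Candidate]:
    """Order candidates from highest to lowest migration priority."""
    return sorted(candidates, key=priority_score, reverse=True)
```

Input from consumers such as the federal government could be folded in as an additional factor; the log scale is one possible choice to keep a few extremely popular packages from drowning out every other signal.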
After target packages are identified and prioritized, a plan would need to be created for each project depending
on its unique situation and the resources that are available. Example plans could include migrating some or all
of a project’s memory unsafe code to a memory safe language, be it Python or Rust, or providing consumers with
alternative packages that utilize memory safe languages.
VI. Non-Technical Solutions to Supply Chain Challenges
Supply chain issues facing open source ecosystems can’t be solved with technical solutions alone. For many
security vulnerabilities, there are existing, effective solutions, but each project’s maintainers must implement
them, and there is an enormous gap in the capacity for maintainers to do this work. Behind many widely used
open source projects, the majority of maintainers are unpaid individuals33 who are volunteering their time and
are already spread thin by existing responsibilities. Even more worryingly, projects frequently become
unmaintained (according to Sonatype's "State of Software Supply Chain" report, 82% of projects that were
maintained in 2022 are now unmaintained).34 For these unmaintained projects, no one is available, or even has
access, to fix vulnerabilities or improve the existing security posture of the project,
which in turn creates security risks for every project that is downstream.
In order to address the challenges of scarce maintainer capacity, the PSF and other organizations have been
offering both financial and social support to critical projects, especially those with a single maintainer or a small
group of volunteer maintainers. Crucially, this approach marries organizational support with the adoption
32
There do exist projects for which it wouldn’t be practical or desirable to migrate to a memory safe language; see
Appendix section 4 for more information.
33
60% of respondents weren’t paid for their Open Source work. Tidelift, “ 2023 State of the Open Source Maintainer
Report,” https://tidelift.com/open-source-maintainer-survey-2023
34
Sonatype, "State of Software Supply Chain" report 2023, https://www.sonatype.com/state-of-the-software-supply-chain
of new best practices and maintenance, instead of introducing new policy or requirements for open source
projects to satisfy without support to do so.
The PSF itself has taken on the maintenance of multiple open source projects that are critical in the Python
ecosystem and were at risk of becoming unmaintained, like Requests35 and CacheControl,36 providing a space
for these projects to continue receiving maintenance from the community and releasing new versions without the
fear of becoming unmaintained. The PSF also acts as a fiscal sponsor for projects, e.g. Pallets37, taking on
administrative work such as accounting and bookkeeping support so that the maintainers
of those projects are freed up to focus on aspects like vulnerability management.
Other notable examples of organizations successfully providing valuable non-technical support to open source
maintainers are Tidelift38 and Jazzband39. Tidelift is a company that helps enterprises manage their
open-source software and that supports maintainers with direct financial payment as well as by providing a
roadmap of security best practices. Tidelift also helps projects create a security policy and manage vulnerability
reports and disclosures. Jazzband is a volunteer organization that accepts projects at risk of becoming
unmaintained and then provides common social and security infrastructure such as a security response team
and policy, a code of conduct, shared testing infrastructure, and more, through a collective of volunteers. This
approach ensures these projects have a home, and connects them with a team of on-call experts rather than
requiring each project to have its own team of experts always available.
35
See: https://github.com/psf/requests/
36
See: https://github.com/psf/cachecontrol/
37
See: https://github.com/pallets/
38
See: https://tidelift.com/
39
See: https://jazzband.co/ ; Jazzband is itself a fiscal sponsoree of the PSF.
Appendix