
November 8, 2023

Python Software Foundation


9450 SW Gemini Dr. ECM# 90772
Beaverton, OR 97008
U.S. nonprofit public charity under section
501(c)(3) of the Internal Revenue Code

Response to the Request for Information on Open Source Software Security: Areas of Long-Term
Focus and Prioritization
Docket ID: ONCD–2023–0002

The Python Software Foundation is the organization behind the Python programming language. We are
grateful to the requesting agencies for their attention to the critical issue of security in open-source software.
We believe that proactive investment in increasing security in this field will support a healthy future for the
open-source software ecosystem, ultimately delivering widespread benefits to the public.

In our response, we have drawn on our expertise and depth of connection to Python maintainers and users,
one of the largest and most robust open-source software communities. In particular, two recently launched
initiatives have built up our relevant organizational knowledge and technical expertise: The Security
Developer-in-Residence and the PyPI Safety & Security Engineer initiatives are grant-funded programs that
each create a full-time position at the Python Software Foundation allowing a developer to devote themselves
to the highest priority security needs in the ecosystem. These roles have had immediate impact with strong
community support, and they provide a potentially successful model for how the government can engage with
ecosystems like Python.

We recommend the Federal Government prioritize: supporting open-source project maintainers on evolving
security practices; supporting end-to-end verifiable sources and builds of open-source Python projects;
promoting adoption of memory safe languages; and finally supporting non-technical solutions to open-source
software supply chain challenges. Interventions targeting these areas are high-leverage avenues to drive down
systemic risks and increase the sustainability of the Python community and other open source software
communities in the long term. Our submission details our reasoning and suggested interventions.

Contributors: Deb Nicholson, Executive Director; Ee Durbin, Director of Infrastructure; Mike Fiedler, PyPI
Safety and Security Engineer; Seth Larson, Security Developer-in-Residence; Loren Crary, Director of
Resource Development

Contact: To discuss any of this RFI response in further detail, please contact RFI-2023@python.org.

Table of Contents

I. About the Python Programming Language and the Python Software Foundation
II. About the Python Package Index
III. Support Project Maintainers on Evolving Packaging Security Practices
IV. End-to-end verifiable sources and builds of open-source Python packages
V. Python, Packages, and Memory Safety
VI. Non-Technical Solutions to Supply Chain Challenges
Appendix

I. About the Python Programming Language and the Python Software Foundation

Python is an open-source, memory safe programming language. It was originally developed by Guido van
Rossum and first released in 1991. By several measures, it is the most widely used programming language in
the world.1 At our last measurement, python.org served over 110 billion downloads for Python releases
annually.2 It is used in a vast range of applications.3 It is among the most common programming languages
learned in schools and universities; it is one of the most commonly used languages in basic and applied
sciences; companies from startups to giant global enterprises rely on Python to build ubiquitous applications
like Instagram (Meta), Dropbox, and Spotify; and it is a critical component of projects at agencies across the
U.S. federal government. Python is also one of the most important
languages in the burgeoning fields of data science and artificial intelligence.4

The Python Software Foundation (the “PSF”) is the organization behind Python, as well as the Python Package
Index (“PyPI”), providing critical digital infrastructure relied on by an uncountable number of individuals,
projects, companies, organizations, and governments globally. It is a U.S. 501(c)(3) nonprofit that was
originally formed in 2001 to hold the copyrights, licenses, and trademarks for the Python language and to fulfill
its mission to “promote, protect, and advance the Python programming language, and to support and facilitate
the growth of a diverse and international community of Python programmers.” In addition to providing the basic
infrastructure for Python as well as PyPI, the PSF supports the Python community through initiatives such as a
grants program (in 2022 the PSF distributed $215,000 in grant funds to 138 groups in 42 countries), and it
hosts PyCon US, the largest annual gathering of Python users, companies, and projects, which attracts
attendees from around the world. The PSF also runs strategic initiatives to improve the Python ecosystem as a
whole, such as the Developer-in-Residence program, which funds a paid developer dedicated full-time to
improving the core Python language; the PyPI Safety & Security Engineer, the first paid full-time
software engineer dedicated to the security of PyPI; and the Security Developer-in-Residence, who is
responsible for identifying and prioritizing security risks to the Python ecosystem and taking action to mitigate
those risks.

II. About the Python Package Index

One reason Python is so commonly used across such a wide range of applications is its robust packaging
ecosystem, which is centered on the Python Package Index (“PyPI”).5 PyPI is a public repository of software
that is free to use for distributing and downloading bundles of Python software, commonly known as
“packages.” The Python Software Foundation provides PyPI to the public to make distributing and downloading
Python packages simple: packages and their dependencies can typically be installed with a single command.
The large array of packages available makes it easy for developers to find and integrate preexisting code that
has the functionality they need in their own projects.

1. See Appendix section 1.
2. 2021 Python Software Foundation Annual Impact Report, p. 8, https://www.python.org/psf/annual-report/2021/
3. See Appendix section 2.
4. For example, OpenAI’s flagship product ChatGPT is built on Python.
5. https://pypi.org/

The first version of PyPI opened for public use over twenty years ago, in 2002,6 and it continues to be the
main repository for open-source Python packages. PyPI has grown to host over 490,000 distinct projects with
over 5 million releases, which are collectively downloaded over a billion times per day. Maintainers upload
roughly 12,000 to 15,000 new releases to PyPI per day.
The PyPI website typically receives between 3.7 and 4 million unique (human) visitors per month. PyPI receives
approximately 3 billion web requests per day for projects hosted on its infrastructure (approximately 90 billion
per month).

PyPI differs from source code hosting platforms like GitHub or GitLab in that it provides a place for maintainers
to publish a Python-compatible artifact (i.e., a package) for end users to download and install. Thus end users
do not necessarily have to compile the source code for their specific computer architecture themselves, saving
them time and avoiding build errors. Some notable projects available on PyPI include Google’s open source
TensorFlow project,7 which provides many of the foundational components for today’s artificial intelligence and
machine learning applications, and open source projects sponsored by NumFOCUS,8 including NumPy,
Matplotlib, and pandas,9 which are used widely across industries. Government agencies such as the U.S.
General Services Administration, 18F, the Centers for Disease Control and Prevention, the Department of
Veterans Affairs, the U.S. Naval Research Laboratory, the National Security Agency, and the National
Aeronautics and Space Administration, among others, leverage many of these packages.10
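The prebuilt artifacts described above are usually wheels, whose filenames declare which Python versions, ABIs, and platforms they support (layout per PEP 427, tags per PEP 425). As a minimal sketch, the code below parses such a filename and checks it against a set of tags for a target environment. The example filenames and the hard-coded tag set are illustrative assumptions; real installers such as pip compute the supported tags from the running interpreter (for example with `packaging.tags.sys_tags()`).

```python
from itertools import product
from typing import FrozenSet, NamedTuple, Optional, Set, Tuple

Tag = Tuple[str, str, str]  # (python tag, ABI tag, platform tag)


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    tags: FrozenSet[Tag]


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into its fields (layout per PEP 427)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        name, version, build, py, abi, plat = parts
    elif len(parts) == 5:
        name, version, py, abi, plat = parts
        build = None
    else:
        raise ValueError(f"unexpected wheel filename: {filename}")
    # Each tag field may be a "."-separated set such as "py2.py3" (PEP 425),
    # so the wheel supports every combination of the listed values.
    tags = frozenset(product(py.split("."), abi.split("."), plat.split(".")))
    return WheelName(name, version, build, tags)


def is_compatible(wheel: WheelName, supported: Set[Tag]) -> bool:
    """True if the wheel declares at least one tag the target environment supports."""
    return not wheel.tags.isdisjoint(supported)


# Hypothetical tag set for CPython 3.11 on 64-bit Linux; real installers
# derive this from the running interpreter instead of hard-coding it.
SUPPORTED: Set[Tag] = {
    ("cp311", "cp311", "manylinux_2_17_x86_64"),
    ("py3", "none", "any"),
}
```

With this tag set, a `py2.py3-none-any` wheel is accepted because it matches `("py3", "none", "any")`, while a `win_amd64` wheel is skipped, which is how an installer avoids asking the user to compile anything.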

PyPI’s unique position


PyPI holds a unique and powerful position in the Python ecosystem. First, it is the default repository that most
client-side installation tools connect to when looking for software. Many Python-based applications and
systems are built with a number of open-source Python packages, and these packages live on PyPI, regardless
of what kind of source code repository they originated from. An end user does not need to concern themselves
with whether the code was sourced from GitHub, GitLab, or closed-source systems, nor learn their respective
connection semantics, complexities, and quirks. Client installation tools only need to use the well-established
PyPI service to obtain the package they are looking for. (At roughly 3 billion requests per day, this occurs
approximately 35,000 times per second.)
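The well-established service that installers talk to is PyPI's Simple Repository API, defined in PEP 503. As a minimal sketch of the client side of that protocol, the code below normalizes a project name into its index URL and extracts filenames, download URLs, and SHA-256 digests from a project page. It makes no network requests; the sample page, the `files.example` host, and the all-zero digest in the usage are placeholders, and real installers handle many more details (for example PEP 691 JSON responses and `data-requires-python` attributes).

```python
from __future__ import annotations

import re
from html.parser import HTMLParser
from urllib.parse import urldefrag

SIMPLE_INDEX = "https://pypi.org/simple/"


def normalize(name: str) -> str:
    """PEP 503 name normalization: runs of '-', '_' and '.' become one '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def project_page_url(name: str) -> str:
    return f"{SIMPLE_INDEX}{normalize(name)}/"


class _LinkCollector(HTMLParser):
    """Collects one (filename, url, sha256) entry per <a> element."""

    def __init__(self) -> None:
        super().__init__()
        self.files: list[tuple[str, str, str | None]] = []
        self._href: str | None = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None

    def handle_data(self, data):
        if self._href is None:
            return
        # The URL fragment may carry a hash of the file, e.g. "#sha256=<hex>".
        url, fragment = urldefrag(self._href)
        algorithm, _, digest = fragment.partition("=")
        self.files.append((data.strip(), url, digest if algorithm == "sha256" else None))
        self._href = None


def parse_project_page(html: str) -> list[tuple[str, str, str | None]]:
    """Return (filename, download URL, sha256 or None) for each file link."""
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.files
```

Because the digest travels with the link, an installer can check every downloaded file against it before installing, without needing to know where the project's source code is hosted.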

Secondly, PyPI is publicly available for any user to register and upload an open-source Python project to be
shared and used by others, as long as they follow the PyPI Terms of Use and Acceptable Use Policy.11 The
Python community also produces tutorials guiding new project maintainers on how to package their project and
distribute it via PyPI.12 By making the service freely available and accessible, PyPI encourages users to share
their projects with others.

Finally, as an independent, non-corporate project backed by the Python Software Foundation, PyPI works for
the common good of all projects, maintainers, and users, without prioritizing the concerns of one particular
corporation, class of users, or use case over another.

6. PEP 301 – Package Index and Metadata for Distutils, https://peps.python.org/pep-0301/
7. https://www.tensorflow.org/
8. https://numfocus.org/
9. https://pypi.org/project/numpy/ ; https://pypi.org/project/matplotlib/ ; https://pypi.org/project/pandas/
10. See Appendix section 3.
11. https://pypi.org/policy/terms-of-use/ and https://pypi.org/policy/acceptable-use-policy/
12. https://packaging.python.org/en/latest/tutorials/packaging-projects/

III. Support Project Maintainers on Evolving Packaging Security Practices
RFI Focus Area: Secure Open-Source Software Foundations, Sub-Area: Reducing entire classes of
vulnerabilities at scale

Summary
There is constant evolution and innovation in the security best practices for open-source project maintainers.
When new technologies are developed or new threats are identified, often the software solutions are quickly
available, and large corporate actors can easily adopt them, but an enormous number of critical projects don’t
have the resources to do the same. When projects without such corporate backing use out-of-date security
practices, they open vulnerabilities that they pass on to every project downstream. Funding nonprofit
organizations to take security best practices the “last mile” to reach critical projects will increase security for the
whole ecosystem.

Packaging and publishing security best practices: Moving targets


When a developer starts building a new project, they may use a blueprint or template, copy practices from
another project, or look to the ecosystem for the current guides and tutorials to determine what security
practices to adopt. Even if they successfully adopt the best security practices at the time, those practices are
constantly moving forward in response to new technological developments, new information, and new threats.
Project packaging and publishing practices also evolve over time, so the procedures considered correct and
secure at the time of a project’s inception shift with new technologies and security paradigms, which can
quickly send a project’s security practices from cutting-edge to out-of-date.

When the resources package maintainers rely on for security best practices are updated, few feedback
mechanisms exist today to tell existing project maintainers whether they are following the latest practices or
what a path to integrating the new practices might look like. Existing projects often need to make changes to
adopt newer practices but may not be aware that these opportunities exist. At the same time, publicizing these
opportunities poses a risk to the users of those projects until a maintainer adopts them, effectively putting a
bullseye on a known vulnerability.

One example of such an evolving best practice is publishing authentication. When PyPI was originally built, it
allowed maintainers to publish packages by authenticating with a valid username/password pair. Since then,
PyPI has implemented API Tokens,13 a more secure authentication mechanism, so maintainers can opt into
publishing packages with these tokens rather than with a username/password pair. Tokens can be scoped to a
specific project instead of granting access to every project tied to the maintainer’s account, narrowing the
potential fallout if a given token is compromised. If a maintainer does not switch to the token-based approach,
any exposure of their username/password would allow an attacker to control every package in the maintainer’s
account, instead of only the package the token is scoped to. Token-based authentication is the better
authentication mechanism, but each maintainer must first become aware of the new standard, then understand
how to implement it, and finally roll out the update to their individual projects.
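The difference in potential fallout can be made concrete with a small model. The classes below are hypothetical and do not reflect how PyPI actually stores or checks credentials; they only show why a leaked project-scoped token exposes one project while a leaked password exposes the whole account.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Account:
    username: str
    projects: FrozenSet[str]


@dataclass(frozen=True)
class Credential:
    account: Account
    # None models an account-wide credential: a password, or a token whose
    # scope is the entire account. A string limits it to one project.
    project_scope: Optional[str] = None

    def can_upload(self, project: str) -> bool:
        if project not in self.account.projects:
            return False
        return self.project_scope is None or self.project_scope == project


def blast_radius(credential: Credential) -> FrozenSet[str]:
    """Projects an attacker could publish to if this credential leaked."""
    return frozenset(p for p in credential.account.projects if credential.can_upload(p))
```

For an account with three projects, `blast_radius(Credential(account))` contains all three, while `blast_radius(Credential(account, "pkg-a"))` contains only `"pkg-a"`.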

Maintainers lack capacity to track and adopt evolving security methods


Some PyPI packages are maintained by corporations, which may have an Open Source Program Office (OSPO) or
another dedicated department that can identify and manage risks in their processes for producing open source
software securely, protecting both themselves and external users of their software. A large number of projects
on PyPI, however, including widely used projects such as the urllib3 library14 or cryptography,15 are maintained
by volunteers rather than by corporations. Many projects are maintained by a single person, often without the
resources or even the knowledge to update their package publishing processes to more secure methods. Of the
more than 490,000 projects on PyPI, 91% have only a single account with the maintainer role.16

13. In the name of backwards compatibility, the username/password option still exists but is actively discouraged.
This option will be removed once 2FA is enforced at the end of 2023.
14. https://pypi.org/project/urllib3/
15. https://pypi.org/project/cryptography/

For example, maintainers using a username/password for publishing packages may receive an email directing
them to adopt API Tokens, but this still places the burden of adoption on the maintainer. Providing maintainers
with tools to easily adopt the newer practice could accelerate adoption of better security practices, reducing
risk exposure across the ecosystem.

Building security standards resources as digital public goods


We propose that investing in shared tools, processes, and documentation that make it simple and low-effort for
project maintainers to become aware of and adopt newer, more secure practices leads to better outcomes.
Providing project maintainers with feedback on which practices they currently follow, and on the opportunities
they have to strengthen their security posture, increases security for everyone using their projects. These
resources could take the form of project-tailored guidance, automated security assessments, or risk monitoring.

A targeted investment in security practice resources for collective use can exponentially reduce risk across the
sector. If a trusted actor with appropriate capacity and expertise builds (and, crucially, continuously updates)
these resources and shares them across the ecosystem, it relieves individual maintainers of the duplicative
effort of separately tracking evolving security threats and standards and teaching themselves to adapt their
projects. An organization like the PSF, which is neutral and already directly connected with thousands of
projects through the critical digital infrastructure it provides via PyPI, is uniquely positioned to take on this
work.

We also propose making this information readily available to maintainers but not to the general public, because
it could expose potential weaknesses in a maintainer’s processes and become a signal for attackers to
pursue.
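As one illustration of an automated security assessment delivered only to maintainers, the sketch below takes a few facts about how a project publishes and returns tailored recommendations, refusing to show them to anyone else. The input fields, rules, ordering, and messages are all hypothetical; a real service would draw on the index's own records and a continuously updated catalog of practices.

```python
from dataclasses import dataclass
from typing import Collection, List


@dataclass(frozen=True)
class PublishingProfile:
    """Hypothetical facts a package index might know about how a project publishes."""
    uses_password_upload: bool
    uses_account_wide_token: bool
    uses_trusted_publishing: bool
    maintainers_without_2fa: int


def assess(profile: PublishingProfile) -> List[str]:
    """Return recommended next steps; the rules and their order are illustrative."""
    findings = []
    if profile.maintainers_without_2fa:
        findings.append(
            f"{profile.maintainers_without_2fa} maintainer account(s) lack 2FA: enable it."
        )
    if profile.uses_password_upload:
        findings.append("Uploads use a username/password: switch to a project-scoped API token.")
    elif profile.uses_account_wide_token:
        findings.append("Uploads use an account-wide token: replace it with a project-scoped one.")
    if not profile.uses_trusted_publishing:
        findings.append("Consider Trusted Publishing to avoid long-lived upload credentials.")
    return findings


def report_for(user: str, maintainers: Collection[str], profile: PublishingProfile) -> List[str]:
    """Show findings only to the project's own maintainers, never publicly."""
    if user not in maintainers:
        raise PermissionError("assessment results are visible to project maintainers only")
    return assess(profile)
```

A project already using Trusted Publishing with 2FA on every maintainer account gets an empty list, so the same check can run across every project and only surface work where there is some.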

The risks of the status quo


In the absence of such tooling, there is a risk that maintainers never learn of a new practice or process, or
only learn of it when PyPI disables the prior approach, which could prevent the maintainer from producing a
timely update to their project. We strive to avoid this with lengthy notice periods about behavioral changes, but
inevitably someone misses the notice and a project is thrown into crisis. By providing all PyPI maintainers with
accessible guidance and tools, the barriers to making security improvements are lowered and risks are
reduced for all package consumers.

These tools will benefit individual maintainers, community projects, and corporations alike, as they may all find
opportunities of which they were previously unaware, as well as the resources allowing them to take action.

How the federal government can take action


The federal government should direct funding to non-profit software foundations like the PSF to create this
tooling and roll it out to package maintainers. Public funding for open source organizations to build and
disseminate this tooling is a direct and cost-efficient way to increase adoption of security best practices and
mitigate vulnerabilities across the entire ecosystem. Organizations like the PSF have direct connections to the
target developers and projects, long-standing credibility in the community, and the requisite technical expertise.
As a result, these organizations are well-positioned to take security best practices the “last mile,” bridging the
gap between cutting-edge security and the thousands of independent open-source software projects that users
interact with every day.

16. Data accessed from the PyPI.org database on 2023-11-01.

The federal government should also provide funding to open-source software foundations for security-focused
developer-in-residence programs and other security initiatives. Existing initiatives of this kind have had a
meaningful impact on the security of our ecosystem,17 and there is a great opportunity for the government to
scale up its role in driving this impact.

IV. End-to-end verifiable sources and builds of open-source Python packages


RFI Focus Area: Secure Open-Source Software Foundations, Sub-Area: Strengthening the software supply
chain

Summary
We’d like to share more detailed context and recommendations on one specific area within the broader
category of packaging security practices discussed in the previous section. We believe it is critical to prioritize
strengthening the software supply chain. An enormous and diverse set of users (from hobbyists to scientists to
multinational enterprises to U.S. government agencies) relies on PyPI and the packages it hosts. Currently,
there is no standard way for those users to verify the source and provenance of these packages. This means
PyPI users have no guarantee that the software they download is what they are expecting, creating a
vulnerability that can be exploited by malicious actors. While tools exist that address different aspects of this
risk, there isn’t currently a cohesive and comprehensive toolset. We propose the creation of a centralized,
standardized toolset that allows for end-to-end verifiability and that is enabled by default for all PyPI packages,
effectively eliminating an entire class of risks for a large share of software consumers.

The need for verifiable source and build attestations


Currently, when a package is published to PyPI, the package itself doesn’t contain any information verifiably
linking it to where it originated or how it was built. This is similar to finding an item in the grocery store without
a label or list of ingredients: there’s a chance that it’s safe to consume, but it would be more trustworthy with
verifiable labeling. For consumers who need to know the contents and origin of the software they use, the
current situation doesn’t meet their needs.

Supply chain attacks targeting open source projects remain a constant and costly threat. Being able to
verifiably link a package back to its source code and its build process protects against some of the most potent
of these attacks. Examples include malware being silently injected into the build process, as happened with
“Webmin,”18 and the compromise of a PyPI account with access to the Python package “ctx,”19 which then
began publishing malware directly to PyPI.

Components of an effective end-to-end verification toolset


Fortunately, there have been advances in standards and technology for making verifiable attestations about
software’s origin, build, and dependencies that can thwart future supply chain attacks like the ones above.
Examples of such standards are Supply-chain Levels for Software Artifacts (“SLSA”) and Reproducible Builds.
While there is broad consensus20 that these two standards are crucial for preventing far-reaching attacks, the
critical issue is adoption. Today, over half of open source project maintainers are not aware of security
standards like the NIST Secure Software Development Framework (“SSDF”), SLSA, or OpenSSF Scorecards.21

17. See for example the results from just the first three months of the PSF Security Developer-in-Residence
program: https://pyfound.blogspot.com/2023/10/security-developer-in-residence-2023-q3-report.html ; and also
these three reports of PyPI security improvements from the first two months of the PyPI Safety & Security
Engineer program: https://blog.pypi.org/posts/2023-08-08-2fa-enforcement-for-new-users/,
https://blog.pypi.org/posts/2023-08-17-github-token-scanning-for-public-repos/, and
https://blog.pypi.org/posts/2023-09-18-inbound-malware-reporting/ .
18. See https://www.webmin.com/exploit.html
19. See https://python-security.readthedocs.io/pypi-vuln/index-2022-05-24-ctx-domain-takeover.html
20. SLSA is an OpenSSF project with support from many organizations. See: https://slsa.dev/ and
https://openssf.org/press-release/2023/04/19/openssf-announces-slsa-version-1-0-release/ SLSA v1.0 focuses on

Right now, projects need to adopt each standard, or each tool implementing a standard, individually and must
use customized workflows or configurations, which dissuades many projects from adopting them. To drive
adoption of these standards by critical open-source Python projects, we propose creating a centrally curated
toolset for building Python packages with optimal security practices enabled by default. A central toolset would
require a relatively small investment to leverage existing security technology to create widespread adoption in
one of the largest open source ecosystems in the world.

This approach builds on how the PSF has improved security for uploading Python packages to PyPI using
“Trusted Publishers,”22 a secure publishing method that Python projects can implement using an automated
platform that provides an OpenID Connect (OIDC) identity, such as GitHub Actions. The Trusted Publishers
initiative combines beneficial security practices like short-lived scoped credentials, automated and hosted
builds, and workflow-based authentication with an improved user experience in order to drive adoption of the
security technology, improving security for all PyPI users.
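The core check behind Trusted Publishing can be sketched as follows: the index compares claims in the OIDC token issued by the CI platform against the publisher a maintainer registered for the project, and only then issues a short-lived, project-scoped upload token. This sketch assumes the OIDC token's signature has already been verified against the issuer's published keys (a step it omits). The claim names `iss`, `aud`, `exp`, `repository`, and `job_workflow_ref` are GitHub Actions OIDC claims; the `"pypi"` audience value, the 15-minute lifetime, and all class and function names are illustrative assumptions rather than PyPI's actual code.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Any, Mapping, Optional

GITHUB_ISSUER = "https://token.actions.githubusercontent.com"
EXPECTED_AUDIENCE = "pypi"       # assumption: the audience the index requests
UPLOAD_TOKEN_LIFETIME = 15 * 60  # seconds; assumed lifetime of minted tokens


@dataclass(frozen=True)
class TrustedPublisher:
    """What a maintainer registers on the index: which CI workflow may publish."""
    project: str
    repository: str  # "owner/name" on GitHub
    workflow: str    # workflow file name, e.g. "release.yml"


@dataclass(frozen=True)
class UploadToken:
    project: str
    value: str
    expires_at: float


def exchange(claims: Mapping[str, Any], publisher: TrustedPublisher,
             now: Optional[float] = None) -> UploadToken:
    """Trade OIDC claims (signature ALREADY verified) for a short-lived upload token."""
    now = time.time() if now is None else now
    if claims.get("iss") != GITHUB_ISSUER or claims.get("aud") != EXPECTED_AUDIENCE:
        raise PermissionError("identity token was not issued for this index")
    if claims.get("exp", 0) <= now:
        raise PermissionError("identity token has expired")
    if claims.get("repository") != publisher.repository:
        raise PermissionError("repository does not match the trusted publisher")
    # job_workflow_ref looks like "owner/name/.github/workflows/release.yml@refs/tags/v1.0".
    workflow_path = str(claims.get("job_workflow_ref", "")).split("@", 1)[0]
    if workflow_path != f"{publisher.repository}/.github/workflows/{publisher.workflow}":
        raise PermissionError("workflow does not match the trusted publisher")
    # The minted credential is scoped to a single project and expires quickly.
    return UploadToken(publisher.project, secrets.token_urlsafe(32), now + UPLOAD_TOKEN_LIFETIME)
```

No long-lived secret is ever stored in the project's CI configuration: each release run obtains a fresh identity token, and the upload token it is exchanged for expires shortly afterwards.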

While individual package maintainers can implement one or more of these practices “out-of-band,” there are
four core reasons why a centralized set of verifiability tooling is necessary for widespread adoption:

● A trusted builder: SLSA’s underlying technology requires using one or a few central “builder”
workflows which are trusted by both software producers and consumers. These builders should be
owned and maintained by a trusted organization in the ecosystem in order to drive adoption and to
minimize the number of variants.
● A controlled build environment: Reproducible builds and capturing a Software Bill of Materials (SBOM) require
control over the build process in order to ensure all inputs and dependencies are recorded and come
from known sources. The controlled build environment provided by the trusted builder allows package
publishers to implement these controls and for the data to be recorded and published automatically
alongside the built artifacts.
● Future security advances: A centralized toolset adopted by the majority of critical PyPI projects
means that future security improvements can be incorporated into the toolset and widely rolled out,
rather than piecemeal adoption by individual open source projects.
● Integration into package repositories: If individual projects must choose their own approach
piecemeal, the result would be a patchwork of different supply chain security improvements that can’t
be harmonized and that would be impossible to integrate directly into the PyPI package repository or
downstream installers. Establishing a standardized method and format for build provenance and
dependency tracking allows for a coherent, integrated approach that reduces the burden on both
package publishers and users. We believe that deep integration into PyPI itself will help drive adoption
by both consumers and producers and will make future improvements possible, such as building
tooling on top of the information exposed by PyPI.
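To make the controlled build environment bullet concrete, here is a minimal sketch of one thing such an environment does: refuse any input that is not pinned to a known digest, and record exactly what was used. The function name and the JSON manifest layout are hypothetical and far simpler than real SBOM formats such as SPDX or CycloneDX.

```python
import hashlib
import json
from typing import Dict, Mapping


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def record_build_inputs(inputs: Mapping[str, bytes], pinned: Mapping[str, str]) -> str:
    """Check every build input against a pinned digest and return a JSON manifest.

    `inputs` maps an input's name (a source file or a dependency archive) to its
    bytes; `pinned` maps names to the SHA-256 digests the project expects.
    """
    manifest: Dict[str, str] = {}
    for name in sorted(inputs):
        digest = sha256_hex(inputs[name])
        if pinned.get(name) != digest:
            # Unknown or altered inputs stop the build instead of slipping in silently.
            raise ValueError(f"build input {name!r} is not pinned or has changed")
        manifest[name] = digest
    return json.dumps({"inputs": manifest}, sort_keys=True, indent=2)
```

Because the manifest is produced by the builder rather than written by hand, it can be published alongside the built artifact and checked by anyone who later wants to know what went into it.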

Many Python packages are written using only Python or some combination of Python and other languages like
C, C++, or Rust. Code in compiled languages is typically distributed as pre-compiled binaries in Python
packages, meaning that it’s difficult to determine the initial source code from the binary that’s being distributed.
Without knowing exactly which steps, tooling versions, and dependency sources and versions were used during
the build process, it can be difficult or impossible to recreate a software artifact byte-for-byte, even with its initial
source code, because the steps and dependencies are neither recorded nor uniform across different projects.

20. (continued) securing the building of software. Quoting the press release: “SLSA provides a framework to
prevent source code and build system tampering.” Reproducible builds are currently being considered as a part
of SLSA standards. See: https://docs.google.com/document/d/1Jk0yZnkTC3dfp8G5dmO8K9r1Kc7TRX2QVOwcFSKw1OQ
21. 52% of Open Source maintainers answered “None” when asked which security standards they are familiar
with from a list of NIST SSDF, SLSA, and OpenSSF Scorecards. Tidelift 2023 State of the Open Source
Maintainer Report, https://tidelift.com/open-source-maintainer-survey-2023
22. D. Ingram, “Introducing Trusted Publishers,” The Python Package Index Blog, April 20, 2023.
https://blog.pypi.org/posts/2023-04-20-introducing-trusted-publishers/

When an artifact can be reproduced from its original inputs, consumers have a guarantee that no malicious or
unexpected code was injected into the final artifact. It also means the source code can be safely used, in place
of the binary, to review whether to include the package as a dependency or upgrade to a new release.
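Byte-for-byte reproducibility usually comes down to removing incidental variation such as timestamps, file ordering, and permissions. As a minimal sketch, the function below writes a ZIP archive (the container format used by wheels) with sorted members, fixed permissions, and a fixed timestamp taken from the `SOURCE_DATE_EPOCH` convention defined by the Reproducible Builds project. It is not a wheel builder; real build backends must also control compiler versions, flags, and dependencies, and identical output here assumes the same zlib version on both sides.

```python
import hashlib
import io
import os
import time
import zipfile
from typing import Mapping

# 1980-01-01T00:00:00Z, the earliest timestamp the ZIP format can store.
ZIP_EPOCH = 315532800


def build_archive(files: Mapping[str, bytes]) -> bytes:
    """Build a ZIP archive whose bytes depend only on the file names and contents."""
    epoch = max(int(os.environ.get("SOURCE_DATE_EPOCH", ZIP_EPOCH)), ZIP_EPOCH)
    date_time = time.gmtime(epoch)[:6]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in sorted(files):  # fixed member order
            info = zipfile.ZipInfo(name, date_time=date_time)  # fixed timestamp
            info.create_system = 3  # record "Unix" regardless of the build host
            info.external_attr = 0o644 << 16  # fixed permissions
            info.compress_type = zipfile.ZIP_DEFLATED
            archive.writestr(info, files[name])
    return buffer.getvalue()


def digest(artifact: bytes) -> str:
    return hashlib.sha256(artifact).hexdigest()
```

Two independent builds from the same inputs produce the same digest, so a redistributor who rebuilds a package and gets a different digest has evidence that something in the original build differed.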

With verifiable links to source code, build processes, and reproducibility of builds there would be multiple layers
of defense against supply chain attacks for every adopting Python package:

● For attacks compromising a PyPI account, build provenance allows consumers to detect attackers’ attempts to
work around their lack of access to the upstream source repository, which could manifest as the removal of
build provenance as a requirement for publishing or a change of the expected source repository to one that
is attacker-controlled. These sorts of attacks are considered the most potent, as they completely impersonate
a highly privileged user; for this reason, PyPI and other platforms have invested in mandatory 2FA for all user
accounts.
● For attacks compromising publish API tokens, the attacker doesn’t have the privileges to modify PyPI’s
expectations of build provenance for a project, so attempts to publish malicious code using these credentials
could be rejected for lacking a verifiable build provenance attestation.
● For attacks that compromise the source repository itself, the malicious source code would be
immutably linked to the release and would be publicly auditable before being adopted by consumers.
Attempts to conceal the source code after publishing the artifact, by hiding or deleting it, would break the
verifiability of build provenance as verified by PyPI.
● For attacks that inject code directly into the build process, build reproducibility allows downstream
consumers and redistributors to detect the injected code through differences in the resulting artifacts.
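The second layer above can be sketched as an upload-time policy check. The policy fields and function are hypothetical, and verifying the attestation's signature is assumed to have happened beforehand; the point is that a stolen upload token can submit files but cannot change what the project requires, so uploads without a provenance attestation pointing at the expected repository are refused.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProjectPolicy:
    # Hypothetical per-project settings that only an account owner could change,
    # never a holder of an upload token.
    require_provenance: bool
    expected_repository: str


@dataclass(frozen=True)
class Upload:
    # Source repository named in the (already verified) attestation, or None if absent.
    provenance_repository: Optional[str]


def accept_upload(upload: Upload, policy: ProjectPolicy) -> bool:
    """Decide whether an upload made with an API token should be accepted."""
    if upload.provenance_repository is None:
        return not policy.require_provenance
    # Provenance is present: it must point at the repository the project expects.
    return upload.provenance_repository == policy.expected_repository
```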

These properties provide defense-in-depth against supply chain attacks, vastly raising the bar for attackers
attempting to use existing trusted projects as an avenue to compromise many consumers at once.

To make the benefits of this centralized tooling accessible to users, the information it generates, such as SLSA
build provenance attestations23 or Software Bill of Materials (SBOM) documents24 describing the build
environment, dependencies, and tooling used to build the package, must be made available alongside the
Python packages on PyPI. These additional pieces of information can be verified by PyPI and subsequently
downloaded and verified by package consumers and third parties.
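On the consumer side, checking a SLSA provenance attestation involves confirming that it names the exact file being installed and was produced by a trusted builder. The minimal sketch below reads an in-toto Statement (v1) carrying SLSA v1.0 provenance. It deliberately omits the step a real verifier must perform first, checking the attestation's signature (for example with Sigstore tooling), and the builder ID used in the usage is a hypothetical placeholder.

```python
import hashlib
import json
from typing import Any, Dict

STATEMENT_TYPE = "https://in-toto.io/Statement/v1"
SLSA_PROVENANCE_V1 = "https://slsa.dev/provenance/v1"


def check_provenance(artifact: bytes, filename: str, statement_json: str,
                     trusted_builder_id: str) -> bool:
    """Return True if the (already signature-verified) statement describes this artifact."""
    statement: Dict[str, Any] = json.loads(statement_json)
    if statement.get("_type") != STATEMENT_TYPE:
        return False
    if statement.get("predicateType") != SLSA_PROVENANCE_V1:
        return False
    digest = hashlib.sha256(artifact).hexdigest()
    # The attestation must name this exact file and its exact SHA-256 digest...
    subject_matches = any(
        subject.get("name") == filename
        and subject.get("digest", {}).get("sha256") == digest
        for subject in statement.get("subject", [])
    )
    # ...and must have been produced by the builder the consumer trusts.
    builder_id = (
        statement.get("predicate", {}).get("runDetails", {}).get("builder", {}).get("id")
    )
    return subject_matches and builder_id == trusted_builder_id
```

A single changed byte in the downloaded file changes its digest, so the check fails even if the attacker copies a genuine attestation alongside a tampered artifact.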

How the federal government can take action


The federal government should provide funding to non-profit software foundations like the PSF to create this
centralized toolset for end-to-end verifiability and to act as steward of a trusted builder workflow. Targeted funding
to a well-positioned, credible actor with the technical capacity to build, roll out, and maintain this toolset will
increase security across an entire ecosystem.

23. See: https://slsa.dev/spec/v1.0/distributing-provenance#where-attestations-are-published
24. See: https://www.cisa.gov/sbom

V. Python, Packages, and Memory Safety
RFI Focus Area: Secure Open-Source Software Foundations, Sub-area: Fostering the adoption of memory
safe programming languages

Adoption of Python to wrap or replace memory unsafe languages


Python is a memory safe programming language that allows calling code written in systems languages like
Rust, C, C++, and Fortran from a Python program. This capability is widely used in Python’s packaging
ecosystem, where maintainers distribute compiled libraries with Python wrappers for ease of use and distribution.

A popular pattern for Python projects is to write the majority of a program's logic in Python, providing a usable
and testable interface, and to call libraries written in a systems language (such as Rust, C, or C++) only for the
most performance-critical and CPU-intensive code paths. This minimizes the amount of systems language and
memory unsafe code that needs to be maintained without sacrificing overall program performance, while also
giving access to Python’s suite of tools for testing and for static and dynamic
analysis of Python code.
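A minimal sketch of this pattern uses the standard library's `ctypes` module: type checks and input validation stay in Python, and only a plain `double` crosses into C. The C math library's `sqrt` stands in for a performance-critical native routine, and loading it this way assumes a POSIX system such as Linux or macOS. Production packages more commonly ship compiled extension modules, or use tools such as PyO3, rather than `ctypes`.

```python
import ctypes
import ctypes.util
import math

# Load the C math library; fall back to symbols already loaded in the process (POSIX only).
_libm_path = ctypes.util.find_library("m")
_libm = ctypes.CDLL(_libm_path) if _libm_path else ctypes.CDLL(None)

# Declaring the C signature lets ctypes convert arguments and results safely.
_libm.sqrt.argtypes = [ctypes.c_double]
_libm.sqrt.restype = ctypes.c_double


def native_sqrt(x: float) -> float:
    """Validate in Python, then hand a plain double to the C routine."""
    if not isinstance(x, (int, float)) or math.isnan(x) or x < 0:
        raise ValueError(f"expected a non-negative number, got {x!r}")
    return _libm.sqrt(float(x))
```

The native boundary is one small, typed call, so the memory unsafe surface stays narrow while everything around it can be tested and analyzed with ordinary Python tools.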

Python’s performance has become more competitive for replacing or wrapping programs written purely in memory
unsafe languages in recent years, thanks to Microsoft’s investment in funding the “Faster CPython” project25 as
well as work to make the Global Interpreter Lock optional (PEP 703),26 enabling Python to take full advantage of
modern multi-core CPUs without sacrificing memory safety. The Faster CPython project is already delivering
large improvements: Python 3.11 is 10-60% faster than 3.10, depending on workload.27

Promoting adoption of memory safe systems languages in Python packages


Because developers can easily wrap and distribute C and C++ code in Python packages, many popular Python
packages are implemented at least partly in memory unsafe languages. Of the 1,000 most downloaded packages
on PyPI, over 15% contain some amount of C or C++ code.[28]
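Footnote 28 describes estimating this share by querying for common C and C++ file extensions. The sketch below shows the general idea on an in-memory mapping of package names to file lists. The extension set, the `uses_c_or_cpp` and `share_with_c_or_cpp` helpers, and the sample data are all hypothetical; this is not the exact query behind the figure above.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Assumed set of common C and C++ source and header extensions; the query behind
# the figure above may have used a different set.
C_CPP_EXTENSIONS = {".c", ".h", ".cc", ".cpp", ".cxx", ".hh", ".hpp", ".hxx"}


def uses_c_or_cpp(filenames: list[str]) -> bool:
    """Return True if any file in a package has a C or C++ extension."""
    return any(PurePosixPath(name).suffix.lower() in C_CPP_EXTENSIONS for name in filenames)


def share_with_c_or_cpp(packages: dict[str, list[str]]) -> float:
    """Return the percentage of packages that contain at least one C or C++ file."""
    if not packages:
        return 0.0
    hits = sum(1 for files in packages.values() if uses_c_or_cpp(files))
    return 100.0 * hits / len(packages)


# Hypothetical sample data, not real PyPI contents.
sample = {
    "pure-python-pkg": ["pkg/__init__.py", "pkg/core.py"],
    "c-accelerated-pkg": ["pkg/__init__.py", "src/_speedups.c"],
    "cpp-bindings-pkg": ["pkg/__init__.py", "src/bindings.cpp", "include/api.hpp"],
    "rust-pkg": ["pkg/__init__.py", "src/lib.rs"],
}
print(f"{share_with_c_or_cpp(sample):.1f}%")  # prints 50.0%
```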

A growing number of Python packages instead use Rust as a memory safe systems language. Widely used examples
include cryptography, pydantic, and orjson, which are collectively installed 270 million times per month.[29]
Because cryptography is a security-critical package, its maintainers undertook the challenging process of
rewriting its C code in Rust to take advantage of Rust’s memory safety. Pydantic and orjson both process user
input and must also meet demanding performance requirements, a combination that calls for a memory safe
systems language.

Cryptography’s difficult experience rewriting the package in Rust,[30] together with the high share of critical
packages that use memory unsafe code, shows that work remains to enable more Python packages to migrate from
memory unsafe languages to memory safe languages like Rust. Progress is already being made, for example by the
PyO3 project,[31] which provides tooling for Python developers to write Rust code and distribute it in Python
packages. More resources are needed to build on and scale this progress. The key actions needed are first to
identify critical projects that currently use memory unsafe code, and then to create a prioritized plan of action
for each project.

[25] See: https://devblogs.microsoft.com/python/python-311-faster-cpython-team/
[26] See: https://peps.python.org/pep-0703/
[27] See: https://docs.python.org/3/whatsnew/3.11.html#whatsnew311-faster-cpython
[28] See: https://py-code.org, querying for common C and C++ file extensions.
[29] See: https://www.pepy.tech/projects/cryptography, https://www.pepy.tech/projects/pydantic, and
https://www.pepy.tech/projects/orjson
[30] A. Gaynor, P. Kehrer: “Shipping Python Extensions in Rust Two Million Times a Day,” recording of PyCon US
talk, April 2022. https://www.youtube.com/watch?v=z_Eiy2W0APU
[31] See: https://github.com/PyO3
Because migrating a project to memory safety requires time and resources, it is critical to prioritize the
projects where migration matters most and to avoid placing undue burden on projects that are less
safety-critical or not safety-critical at all. Open source software is typically consumed without its maintainers
learning how and where it is used, so many maintainers don’t know whether security should be a priority for
their own projects, which makes this task more complex. Effectively prioritizing candidate projects for migration
would require surveying the ecosystem for usage information (number of downloads, dependency graph
information, and input from consumers like the federal government) and assessing whether each project’s
primary function would benefit from a memory safe language (such as packages that implement cryptography or
process untrusted inputs).
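One way to make such a survey concrete is to combine dependency graph information with per-package attributes. The sketch below counts each package's transitive dependents with a breadth-first search over reverse dependencies, then ranks only the packages that contain memory unsafe code, weighting those that handle untrusted input more heavily. The `Package` fields, the weight of 3, the scoring formula, and the package names are hypothetical assumptions for illustration; a real prioritization would also draw on download counts and input from consumers.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Package:
    name: str
    depends_on: list[str]
    has_unsafe_code: bool          # e.g. ships C or C++ code
    handles_untrusted_input: bool  # e.g. parsers, cryptography


def transitive_dependents(packages: dict[str, Package]) -> dict[str, int]:
    """Count how many packages depend on each package, directly or indirectly."""
    # Invert the graph: reverse[dep] is the set of packages that list dep as a dependency.
    reverse: dict[str, set[str]] = {name: set() for name in packages}
    for pkg in packages.values():
        for dep in pkg.depends_on:
            reverse.setdefault(dep, set()).add(pkg.name)

    counts: dict[str, int] = {}
    for name in packages:
        # Breadth-first search over reverse edges collects every downstream package.
        seen: set[str] = set()
        queue = deque(reverse[name])
        while queue:
            current = queue.popleft()
            if current in seen:
                continue
            seen.add(current)
            queue.extend(reverse.get(current, ()))
        seen.discard(name)  # a dependency cycle should not count a package as its own dependent
        counts[name] = len(seen)
    return counts


def migration_priorities(packages: dict[str, Package]) -> list[tuple[str, int]]:
    """Rank packages that contain memory unsafe code, highest priority first."""
    dependents = transitive_dependents(packages)
    ranked = []
    for pkg in packages.values():
        if not pkg.has_unsafe_code:
            continue  # nothing to migrate
        weight = 3 if pkg.handles_untrusted_input else 1  # hypothetical weighting
        ranked.append((pkg.name, weight * (1 + dependents[pkg.name])))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical dependency graph, not real PyPI data.
graph = {
    p.name: p
    for p in [
        Package("fastparse", [], True, True),
        Package("imagetools", [], True, False),
        Package("webframework", ["fastparse"], False, True),
        Package("app-a", ["webframework", "imagetools"], False, False),
        Package("app-b", ["webframework"], False, False),
    ]
}
print(migration_priorities(graph))  # prints [('fastparse', 12), ('imagetools', 2)]
```

Counting transitive rather than direct dependents reflects that a memory safety flaw in a low-level package reaches everything built on top of it.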

After target packages are identified and prioritized, a plan would need to be created for each project based on
its unique situation and the resources available. Example plans could include migrating some or all of the
project’s memory unsafe code to a memory safe language, such as Python or Rust, or providing consumers with
alternative packages written in memory safe languages.

How the federal government can take action


The federal government can support the adoption of memory safe languages by recommending Python and
similar programming languages as complete or partial replacements for memory unsafe codebases used by the
government or government contractors. Government grants to project maintainers for replacing memory unsafe
code with memory safe code would encourage adoption at the individual project level. Government funding for
roles like a security developer-in-residence at the PSF and other open-source software foundations would enable
those developers to take the most needed steps to increase migration to memory safe languages across the
ecosystem. They would be positioned to do the work described above: identifying critical projects, developing
migration plans for them, and creating tooling that facilitates such migrations across the ecosystem. There is
also a role for funding safety tooling that mitigates risks for projects that won’t migrate away from memory
unsafe languages in the foreseeable future.[32]

VI. Non-Technical Solutions to Supply Chain Challenges


RFI Focus Area: Behavioral and Economic Incentives to Secure the Open-Source Software ecosystem,
Sub-Area: Frameworks and models for software developer compensation that incentivize secure software
development practices

Supply chain issues facing open source ecosystems can’t be solved with technical solutions alone. Effective
solutions already exist for many security vulnerabilities, but each project’s maintainers must implement them,
and there is an enormous gap between the work required and maintainers’ capacity to do it. Many widely used
open source projects are maintained mostly by unpaid individuals[33] who volunteer their time and are already
spread thin by existing responsibilities. More worryingly, projects frequently become unmaintained (according to
Sonatype’s “State of the Software Supply Chain” report, 82% of projects that were maintained in 2022 are now
unmaintained).[34] For these unmaintained projects, no one is available to fix vulnerabilities or improve the
project’s security posture, or even has the access to do so, which in turn creates security risks for every
downstream project.

In order to address the challenge of scarce maintainer capacity, the PSF and other organizations have been
offering both financial and social support to critical projects, especially those with a single maintainer or a small
group of volunteer maintainers. Crucially, this approach pairs organizational support with the adoption of new
best practices and maintenance, instead of introducing new policies or requirements that open source projects
must satisfy without support.

[32] There are projects for which migrating to a memory safe language wouldn’t be practical or desirable; see
Appendix section 4 for more information.
[33] 60% of respondents weren’t paid for their Open Source work. Tidelift, “2023 State of the Open Source
Maintainer Report,” https://tidelift.com/open-source-maintainer-survey-2023
[34] Sonatype, “State of the Software Supply Chain” report, 2023,
https://www.sonatype.com/state-of-the-software-supply-chain

The PSF itself has taken on the maintenance of multiple open source projects that are critical to the Python
ecosystem and were at risk of becoming unmaintained, such as Requests[35] and CacheControl,[36] giving these
projects a space to continue receiving maintenance from the community and releasing new versions without the
risk of being abandoned. The PSF also acts as a fiscal sponsor for projects such as Pallets,[37] taking on
administrative work such as accounting and bookkeeping so that the maintainers of those projects are freed up
to focus on areas like vulnerability management.

Other notable examples of organizations successfully providing valuable non-technical support to open source
maintainers are Tidelift[38] and Jazzband.[39] Tidelift is a company that helps enterprises manage their
open-source software and supports maintainers both with direct financial payments and with a roadmap of
security best practices. Tidelift also helps projects create a security policy and manage vulnerability reports
and disclosures. Jazzband is a volunteer collective that accepts projects at risk of becoming unmaintained and
provides them with shared social and security infrastructure, such as a security response team and policy, a
code of conduct, shared testing infrastructure, and more. This approach ensures these projects have a home
and connects them with a team of on-call experts, rather than requiring each project to keep its own team of
experts available at all times.

How the federal government can take action


The federal government can most effectively support these efforts by funding these and other organizations that
do the challenging and under-resourced work of supporting maintainers. This work is nuanced and often
overlooked, but it offers a high return on investment for both short- and long-term ecosystem security and
overall health.

[35] See: https://github.com/psf/requests/
[36] See: https://github.com/psf/cachecontrol/
[37] See: https://github.com/pallets/
[38] See: https://tidelift.com/
[39] See: https://jazzband.co/. Jazzband is itself fiscally sponsored by the PSF.
Appendix

1. References regarding Python’s popularity:


a. TIOBE Index, accessed November 6, 2023. https://www.tiobe.com/tiobe-index/
b. PYPL PopularitY of Programming Language, accessed November 6, 2023.
https://pypl.github.io/PYPL.html
c. S. Cass, “The Top Programming Languages 2023: Python and SQL are on top, but old
languages shouldn’t be forgotten,” IEEE Spectrum, August 29th, 2023.
https://spectrum.ieee.org/the-top-programming-languages-2023
2. References regarding Python’s use across industries:
a. R. M. Siegfried, K. G. Herbert-Berger, K. Leune and J. P. Siegfried, "Trends Of Commonly Used
Programming Languages in CS1 And CS2 Learning," 2021 16th International Conference on
Computer Science & Education (ICCSE), Lancaster, United Kingdom, 2021, pp. 407-412, doi:
10.1109/ICCSE51940.2021.9569444. https://ieeexplore.ieee.org/document/9569444
b. D. Cassel, “What should be a student’s first programming language?”, The New Stack, July
4th, 2021. https://thenewstack.io/what-should-be-a-students-first-programming-language/
c. E. Elizondo, “Introducing Immortal Objects for Python,” August 15th, 2023.
https://engineering.fb.com/2023/10/05/developer-tools/python-312-meta-new-features/
d. C. Yang, “Incrementally migrating over one million lines of code from Python 2 to Python 3,”
February 6, 2019.
https://dropbox.tech/application/incrementally-migrating-over-one-million-lines-of-code-from-python-2-to-python-3
e. L. Root and D. Riordan, “It’s All Just Wiggly Air: Building Infrastructure to Support Audio
Research,” November 4, 2020.
https://engineering.atspotify.com/2020/11/its-all-just-wiggly-air-building-infrastructure-to-support-audio-research/
f. Statista, “LinkedIn's most wanted data science skills in United States as of April 2019”
https://www.statista.com/statistics/1016247/united-states-wanted-data-science-skills/
g. B. Hayes, “Programming Languages Most Used and Recommended by Data Scientists,”
January 13, 2019.
https://businessoverbroadway.com/2019/01/13/programming-languages-most-used-and-recommended-by-data-scientists/
3. Open-source Python repositories for referenced federal government agencies on GitHub:
a. https://github.com/18F?language=python
b. https://github.com/CDCgov?language=python
c. https://github.com/department-of-veterans-affairs?language=python
d. https://github.com/GSA?language=python
e. https://github.com/NASA?language=python
f. https://github.com/NationalSecurityAgency?language=python
4. Support and tooling for projects that cannot migrate to memory safe languages:
For a variety of reasons, it is not feasible or desirable to migrate all code written in memory
unsafe languages to a memory-safe language. Reasons include:
■ Lack of maintainer time or expertise in memory safe languages. Moving wholesale
to a new programming language can alienate long-time maintainers who would no longer be
able to contribute meaningfully to the project. The project’s long-term productivity would
suffer if the entire maintenance team had to learn a new programming language to be
effective.
■ Unavailability of libraries and platforms for memory safe languages. Some
functionality used by Python projects is unavailable or only partially available in memory
safe languages. Before migrating to memory safe code, these projects would need
complete feature coverage to remain backwards compatible with existing usage. This
also extends to platform and architecture support; for example, cryptography had to
drop support for the s390 and m68k CPU architectures, among others, during its migration
to Rust because the Rust compiler did not support those architectures as targets. (See
https://github.com/pyca/cryptography/issues/5771.)
■ Some code isn’t security critical. Many Python projects use memory
unsafe code but aren’t used in security-critical applications. These projects would
not benefit significantly from being rewritten in memory-safe languages and would likely
only suffer downsides from such a migration. This further highlights the need to identify
which projects are the best candidates for migration to memory-safe languages.
Even with a strong signal to migrate to a memory safe language, some projects may not be able
to migrate in the near future for the reasons specified above. In these cases, projects can adopt
mitigations that reduce security risk, such as fuzzing and static and dynamic analysis of memory
unsafe code. Developing and disseminating these mitigations would benefit from government
funding and support.
5. Python Software Foundation Articles of Incorporation: https://www.python.org/psf/records/incorporation/
6. 2022 Python Software Foundation annual report: https://www.python.org/psf/annual-report/2022/
7. 2021 Python Software Foundation IRS Form 990:
https://s3.dualstack.us-east-2.amazonaws.com/pythondotorg-assets/media/files/PythonSoftwareFoundation_Form990-2021_20221115.pdf
8. Final determination letter from the IRS regarding tax exempt status, May 5, 2006:
https://s3.dualstack.us-east-2.amazonaws.com/pythondotorg-assets/media/files/determination_letter_05May06.pdf
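As a minimal illustration of the fuzzing mitigation mentioned in Appendix item 4, the sketch below feeds random byte strings to `json.loads`, whose parsing in CPython is accelerated by the `_json` C extension module, and records any exception other than the expected `ValueError` family. It is a naive, non-coverage-guided loop written only with the standard library. In practice, projects would typically use a coverage-guided fuzzer such as Atheris, often run continuously through OSS-Fuzz, and memory errors in native code generally need sanitizer-instrumented builds to be detected reliably. The `fuzz_json_loads` helper, its alphabet, and its iteration count are assumptions for illustration.

```python
import json
import random


def fuzz_json_loads(iterations: int = 10_000, max_len: int = 64, seed: int = 0) -> list:
    """Feed random byte strings to json.loads and collect inputs that fail unexpectedly."""
    rng = random.Random(seed)  # fixed seed so that any finding can be reproduced
    # Bias the alphabet toward JSON syntax so that inputs get past the first few bytes.
    alphabet = b'{}[]",:0123456789.eE+-truefalsn \\u\x00\xff'
    findings = []
    for _ in range(iterations):
        length = rng.randint(0, max_len)
        data = bytes(rng.choice(alphabet) for _ in range(length))
        try:
            json.loads(data)
        except (ValueError, RecursionError):
            # Expected rejections: json.JSONDecodeError and UnicodeDecodeError
            # are both subclasses of ValueError.
            pass
        except Exception:
            # Any other exception type is unexpected and worth investigating.
            findings.append(data)
    return findings


findings = fuzz_json_loads()
print(f"{len(findings)} unexpected failures")
```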
