title: Two decades of Open Source
author: Julian Ospald
date: Sep 20, 2024
output: slidy_presentation (duration: 45)

Follow the presentation

Structure of this talk

  1. Introduction (about me and my career)
  2. Open Source (what it is and its value)
  3. First chapter: Gentoo and package management
  4. Second chapter: GHCup
  5. Third chapter: Haskell Core Libraries
  6. Lessons learned

Introduction

About me

  • From Germany
  • Studied CS
  • Haskell developer
  • I love open source

Professional career

  • Software Engineer in R&D (automotive industry)
  • Go Backend Developer (online advertisement platform)
  • Haskell Developer at Capital Match (invoice financing platform in Singapore)
  • Haskell Developer at IOHK (Cardano Blockchain)
  • Haskell Freelancer (blockchain and others)
  • Haskell Developer at Standard Chartered Bank
  • Haskell Freelancer (chimney sweeper app for German businesses)

Open Source career

  • Gentoo Linux developer (core team), 2012-2016
    • Ebuild development (packaging)
    • Code review
    • Development of a git workflow
  • Author of GHCup (the Haskell installer), ca. 2019
  • Maintainer of Haskell core libraries: filepath, unix, os-string, file-io
  • Implementation of the Abstract FilePath Proposal
  • Member of the Haskell Core Libraries Committee 2023-2026
  • Haskell Influencer (Haskell Foundation, ...)

Open Source

What is Open Source

  • A group of licenses (see OSI)
    • Not free software
    • Not copyleft
  • 🧑‍🤝‍🧑 A community
    • volunteers
    • companies
  • 🔮 A philosophy
    • sharing
    • collaboration
    • transparency
  • Linux kernel
    • 1500 developers from 200-250 companies
  • Firefox
  • VSCode
  • Blender
  • GHC (The Haskell compiler)

Value proposition of Open Source

  • ⚗️ the scientific method
    • share your results
    • allow people to replicate it
  • 🔓 access to a community
    • users
    • collaborators
  • 🕸️ network effects

Reality of Open Source

  • most projects...
    • are one-man shows
    • have no users
    • are underdocumented
    • have horrible code
  • writing new code is easy, maintenance is hard
  • most maintainers
    • don't get paid
    • will stop maintenance at some point
    • don't care much about their users

First chapter: Gentoo and package management

What is Gentoo

  • a Linux distribution
    • rolling release
    • source based
  • 19,000 packages (programs, libraries, ...)
  • 200 core developers (at its peak)
  • over 1000 contributors

How does a Linux distro work (relationships)

How does a Linux distro work (activities)

A typical ebuild

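EAPI=8

DESCRIPTION="A dummy package"
HOMEPAGE="https://dummy.org"
# ${PV} is the package version, ${P} is name-version
SRC_URI="https://github.com/dummy/dummy/archive/refs/tags/${PV}.tar.gz -> ${P}.tar.gz"

LICENSE="BSD-3"
SLOT="0"
# "~" marks the package as testing (not yet stable) on that architecture
KEYWORDS="~amd64 ~x86"
# USE flags the user can toggle
IUSE="debug"

# runtime dependencies
RDEPEND="dev-libs/boost"

# patches applied to the unpacked sources before configuring
PATCHES=( "${FILESDIR}"/${PN}-4.9.2-disable_python_rpath.patch )

src_configure() {
	# expands to --enable-debug or --disable-debug depending on the USE flag
	econf $(use_enable debug)
}

src_compile() {
	emake
}

src_test() {
	emake test
}

src_install() {
	# install into the staging directory ${D}, not the live system
	emake DESTDIR="${D}" install
}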

Packaging challenges

  • no standard on build systems (make, autotools, meson, cmake, ...)
    • => an abstraction over build systems
  • thousands of different execution environments (fragility)
    • system configuration
    • package configuration
    • platform, architecture
  • reverse dependencies
    • shipping a "chain" instead of a single artifact
  • high impact of small mistakes (e.g. assuming a specific shell)
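
As a small illustration of the "assuming a specific shell" point, here is a minimal sketch. The helper name `has_prefix` is hypothetical and not taken from any real ebuild or eclass. The commented-out line only works in bash, while the `case` form is specified by POSIX and should behave the same under dash, busybox sh, and others.

```shell
#!/bin/sh
# Hypothetical helper for illustration only.
# Non-portable variant (bash only):   [[ "$1" == "$2"* ]]
# Portable variant: `case` pattern matching is part of POSIX sh.
has_prefix() {
    case "$1" in
        "$2"*) return 0 ;;   # quoted "$2" is matched literally, * matches the rest
        *)     return 1 ;;
    esac
}

if has_prefix "x86_64-pc-linux-gnu" "x86_64"; then
    echo "64-bit x86 target"
fi
```

A script using the bash-only form would break on any system where `/bin/sh` is not bash, which is exactly the kind of small mistake that only shows up in someone else's environment.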

Packaging challenges (pt. 2)

  • communication between teams/maintainers
  • execution of large changes
    • e.g. introduction of LibreSSL
    • e.g. changing of fundamental workflows (from CVS to git)
  • monitoring upstream changes and making decisions about compatibility/stability
    • when to update

What is a Distro really?

  • a user experience
    • LTS distros vs rolling release
    • binary vs source based
    • choice of init system
  • plug and play (everything works)
  • deviating from the happy path (fixing issues)
  • combining components into a coherent system (init system, coreutils, kernel, ...)
  • a choice of defaults

Programming lessons

  • primary packaging skill: being meticulous
    • small mistakes -> big impact
    • being as precise as possible about what you want to achieve
  • long term maintenance of small code pieces
  • intense review culture
  • strict policies and workflow guidelines
  • how to learn a complex system

Second chapter: GHCup

Demo

State of 2019 (Haskell Installers)

  • stack is the only "Haskell Installer"
  • no unified alternative for cabal users
  • distro packages, nix, manual installs, ...
  • 😭

How it started

  • 🤹 small team at work (Capital Match), using different platforms
    • originally used stack
    • distro packages constantly out of date
  • 🦾 first version was 165 LOC
    • POSIX shell
  • only supported Linux and macOS
  • inspired by rustup
  • support from haskell.org
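
To give a feel for what a small POSIX shell installer has to do first, here is a minimal sketch of platform detection. It is not the original GHCup script. The function name `map_kernel` and the output strings are assumptions made for this example.

```shell
#!/bin/sh
# Illustrative only, not taken from GHCup.
# map_kernel is a hypothetical helper that maps the output of `uname -s`
# to the two platforms the early version supported.
map_kernel() {
    case "$1" in
        Linux)  echo "linux" ;;
        Darwin) echo "darwin" ;;
        *)      echo "unsupported platform: $1" >&2
                return 1 ;;
    esac
}

if platform=$(map_kernel "$(uname -s)"); then
    echo "detected platform: ${platform}"
fi
```

Keeping the mapping in a function that takes the kernel name as an argument makes it testable without running on each platform.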

GHCup today

Haskell Survey 2022:

  • over 17k LOC Haskell
  • supports all platforms: Linux, Windows, macOS, FreeBSD
  • first thing new Haskell users get exposed to

What is GHCup (simplified)?

curl -s -L \
  'https://downloads.haskell.org/~ghc/9.6.5/ghc-9.6.5-x86_64-fedora33-linux.tar.xz' |
  tar -xJ -C /tmp                              &&
  cd /tmp/ghc-9.6.5-x86_64-unknown-linux/      &&
  ./configure --prefix="$HOME/.local"          &&
  make install                                 &&
  rm -rf /tmp/ghc-9.6.5-x86_64-unknown-linux/

What is GHCup really?

  • installer (portable)
  • distribution channel
  • feedback channel
  • testing/QA gateway
  • provider of sane defaults (e.g. "recommended" GHC version)
  • glue for holistic toolchain experience
    • VSCode, stack, cabal-install integration
  • CI provisioning (e.g. GitHub Actions)

Relationships in detail

Dependencies:

  • supported tools
    • GHC
    • Cabal
    • HLS
    • Stack
  • decisions that affect us
    • release frequency
    • upstream CI
    • platform support
    • binary distributions (the .tar.gz/.zip)

Relationships in detail (pt. 2)

Dependents:

  • Haskell developers
    • beginners, advanced, students, companies
  • end users (e.g. compiling pandoc from source)
  • GitHub CI
    • GitHub images, Haskell repos
  • 🪞 mirrors
  • 🧰 tools

Programming lessons

  • writing a small single-purpose program from scratch
  • how to design command line interfaces
  • high impact of decisions (not just mistakes)
    • bugs now affect GitHub CI and companies
    • can make "Haskell" look bad
  • no one to review
    • => review your own code
  • constantly thinking about ways to improve reliability
    • can't rely on anyone else to catch bugs

The difference from Gentoo

Both deal with installation, but...

  • more code for me to maintain (not just packaging)
  • one-man project (mostly)
  • much tighter coupling between upstream (e.g. GHC developers) and downstream (GHCup developers)
    • heavier on relationship issues
  • fewer dependencies, but much more responsibility
  • position of authority
    • what to consider?
  • most of my work today is support

Third chapter: Haskell Core Libraries

What are Haskell Core Libraries?

  • bundled with the compiler
  • fundamental building blocks (primitives)
  • base library
    • available to all programs by default
    • contains the "Prelude" (standard library)

Core libraries I maintain

  • filepath
  • unix
  • os-string
  • file-io

Challenges

  • changes are extremely expensive
  • writing good primitives is hard (non-specific APIs)
  • lots of odd knowledge
    • e.g. Windows filepaths
      • C:foo
      • /bar
      • \\?\GLOBALROOT\Device\Harddisk0\Partition2\foo\bar
    • POSIX standard
  • portability
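
The three Windows paths above each start a different kind of path. The sketch below is a deliberately simplified classifier. The function name and category labels are hypothetical, and this is not how the filepath library is implemented. `C:foo` is relative to the current directory of drive C, `/bar` is rooted on whatever the current drive is, and the `\\?\` prefix selects a namespace in which the usual path normalisation is skipped.

```shell
#!/bin/sh
# Simplified and hypothetical: real Windows path parsing has more cases.
classify_win_path() {
    case "$1" in
        '\\?\'*)                       echo "verbatim" ;;        # \\?\ prefix, no normalisation
        '\\'*)                         echo "unc" ;;             # \\server\share\...
        [A-Za-z]:/* | [A-Za-z]:'\'*)   echo "drive-absolute" ;;  # C:\foo
        [A-Za-z]:*)                    echo "drive-relative" ;;  # C:foo
        /* | '\'*)                     echo "rooted" ;;          # \bar on the current drive
        *)                             echo "relative" ;;
    esac
}

classify_win_path 'C:foo'
classify_win_path '/bar'
classify_win_path '\\?\GLOBALROOT\Device\Harddisk0\Partition2\foo\bar'
```

The order of the patterns matters: the more specific prefixes have to be tested before the more general ones.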

Core Libraries Committee

  • 7 members
  • manages API changes of base only
    • requires a proposal
    • requires impact assessment for breaking changes
    • requires an up-front implementation of the change
  • ensures other core libraries have active maintainers
    • does not interfere with maintenance

Driving changes across core libraries (case study)

Abstract FilePath Proposal:

  • Haskell String type: type String = [Char]
    • Char is a Unicode code point
    • not bytes
    • is interpreted (decoded)
    • depends on locale
  • affects most core libraries
  • implement as a breaking change (base), or...
    • in "user-space"
  • lack of higher authority
    • building consensus
    • convincing multiple maintainers
    • patching many libraries
  • open source politics
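
The gap between bytes and decoded characters can be seen without any Haskell. The sketch below builds the name "café" from its UTF-8 bytes and counts it twice. The helper names are made up for this example. It assumes a `wc` that supports `-m`.

```shell
#!/bin/sh
# "café" encoded as UTF-8: the last character takes two bytes (0xC3 0xA9).
name=$(printf 'caf\303\251')

# Number of bytes: the same in every locale.
byte_len() { printf '%s' "$1" | wc -c | tr -d ' '; }

# Number of characters as decoded under the C locale, where every byte
# counts as one character.
char_len_c() { printf '%s' "$1" | LC_ALL=C wc -m | tr -d ' '; }

echo "bytes:              $(byte_len "$name")"
echo "chars (C locale):   $(char_len_c "$name")"
```

Under a UTF-8 locale `wc -m` would typically count four characters for the same five bytes. A file name stored as a decoded string has this ambiguity built in, while a byte-based representation does not.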

Programming lessons

  • how to design good primitives
    • as opposed to abstractions
  • considering every impact of API changes
  • doing history research on past design choices
    • important design decisions may not be documented
    • may look innocent
    • changing them might be devastating

Lessons Learned

Collaboration

::: incremental

  • main currency in Open Source is energy
  • treat contributors like kings
  • be mindful about boundaries (tricky balance)
  • respect other projects' workflows
  • driving large changes requires
    • consensus
    • support
    • a good value proposition
    • a good execution plan (risk, breakage, ...)
  • Haskell Foundation

:::

Project maintenance

::: incremental

  • dictatorships work, but are not sustainable
  • plan for your departure
  • bus factor is your constant enemy
  • good decision making processes
    • lightweight when risk is low
    • elaborate when risk is high
  • actively think about the contribution experience
    • comment early
  • how to maintain the project vision?

:::

User Experience

::: incremental

  • UX is harder than CS
    • yet often an afterthought
  • toolchains often lack a holistic UX vision
  • UX vision gets easily lost in "maintenance mode"
    • feature creep
    • maintainer turnover
    • collective decisions
  • UX is a fascinating problem (e.g. OS)
    • plug & play (intuition... about interfaces)
    • happy path (control)
    • defaults (expectations... about behavior)

:::

Stability vs Progress

  • 🤼 conflicting goals
  • ⚔️ breaking an API can have large ripple effects
    • experience report of a Facebook engineer on GHC upgrades
  • 🗼 small breakages add up
    • large projects have hundreds of dependencies
  • 🗣️ many discussions in the Haskell community
    • upgrade cost
    • language changes (Haskell report)
    • academic background of Haskell (academia vs industry?)
    • the role of committees
  • ⚖️ how to strike a balance?
    • SemVer does not solve it (why?)

Composition

  • I love small programs
  • categories of composition
    • functions
    • libraries
    • programs
  • Unix philosophy
    • 🛠️ do one thing and do it well
    • ⚗️ pipes, compose stdout and stdin (re-usable)
  • how to make your project composable?
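
A minimal sketch of composition at the program level: each stage below does one job and only talks to its neighbours through stdin and stdout. The function name `top_words` is made up for this example.

```shell
#!/bin/sh
# Print the N most frequent words read from stdin (default: 3).
top_words() {
    tr -cs '[:alpha:]' '\n' |         # split input into one word per line
        tr '[:upper:]' '[:lower:]' |  # normalise case
        sort |                        # bring identical words together
        uniq -c |                     # count each group
        sort -rn |                    # most frequent first
        head -n "${1:-3}"             # keep the top N
}

printf 'do one thing and do it well\n' | top_words 1
```

None of the six programs knows about word counting, yet together they solve it. Any stage can be swapped out or reused elsewhere, because the interface between them is plain text.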

Questions/Arguments?
