From d6674c5ee1310541ee36755ba1cb40a8fe2f6529 Mon Sep 17 00:00:00 2001
From: mrkkrp
Date: Thu, 28 Jan 2016 17:47:46 +0600
Subject: [PATCH] =?UTF-8?q?Improve=20the=20=E2=80=98README.md=E2=80=99=20f?=
 =?UTF-8?q?ile?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Here I copied the blog post announcing the library, because it describes it
very well, and not everyone who discovers the library will know where to
look for such a comprehensive description.

http://chrisdone.com/posts/path-package

I've made two edits to that post to reflect new things:

1. On line 123 there is a mention of ‘fromAbsDir’ and other similar
   functions.

2. On line 363 I've put a link to my ‘path-io’ package that provides a
   well-typed interface to ‘directory’ and ‘temporary’. I've written the
   package for my personal needs, because I was tired of the endless
   conversion and I wanted things like recursive copying of directories.
   When I published it, someone opened an issue asking to add some
   functions from Stack's ‘Path.IO’, and that's what I'm going to do. I
   expect it will be able to replace ‘Path.IO’ in Stack soon. I've talked
   to Stack maintainers and they like the package and have nothing against
   the switch.
---
 README.md | 518 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 516 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index f4a0392..19775fa 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,518 @@
-path
-=====
+# Path
 
 Support for well-typed paths in Haskell.
+
+* [Motivation](#motivation)
+* [Approach](#approach)
+* [Solution](#solution)
+* [Implementation](#implementation)
+    * [The data types](#the-data-types)
+    * [Parsers](#parsers)
+    * [Smart constructors](#smart-constructors)
+    * [Overloaded strings](#overloaded-strings)
+    * [Operations](#operations)
+* [Review](#review)
+    * [Relative vs absolute confusion](#relative-vs-absolute-confusion)
+    * [The equality problem](#the-equality-problem)
+    * [Unpredictable concatenation issues](#unpredictable-concatenation-issues)
+    * [Confusing files and directories](#confusing-files-and-directories)
+    * [Self-documentation](#self-documentation)
+* [In practice](#in-practice)
+* [Doing I/O](#doing-io)
+* [Doing textual manipulations](#doing-textual-manipulations)
+* [Accepting user input](#accepting-user-input)
+* [Comparing with existing path libraries](#comparing-with-existing-path-libraries)
+    * [filepath and system-filepath](#filepath-and-system-filepath)
+    * [system-canonicalpath, canonical-filepath, directory-tree](#system-canonicalpath-canonical-filepath-directory-tree)
+    * [pathtype](#pathtype)
+    * [data-filepath](#data-filepath)
+* [Summary](#summary)
+
+## Motivation
+
+This library came about after working on a number of projects at FP
+Complete that use file paths in various ways. We used the system-filepath
+package, which was supposed to solve many path problems by being an opaque
+path type. It occurred to me that the same kinds of bugs kept cropping up:
+
+* Expected a path to be absolute but it was relative, or vice-versa.
+
+* Expected two equivalent paths to be equal or order the same, but they did
+  not (`/home//foo` vs `/home/foo/` vs `/home/bar/../foo`, etc.).
+
+* Unpredictable behaviour with regard to concatenating paths.
+
+* Confusing files and directories.
+
+* Not knowing whether a path was a file or directory or relative or absolute
+  based on the type alone was a drag.
+
+All of these bugs are preventable.
+
+## Approach
+
+My approach to problems like this is to make a type that encodes the
+properties I want and then make it impossible to let those invariants be
+broken, without compromise or backdoors to let the wrong value “slip
+in”. Once I have a path, I want to be able to trust it fully. This theme
+will be seen throughout the things I lay out below.
+
+## Solution
+
+After having to fix bugs due to these in our software, I put my foot down
+and made:
+
+* An opaque `Path` type (a newtype wrapper around `String`).
+
+* Smart constructors which are very stringent in the parsing.
+
+* Make the parsers highly normalizing.
+
+* Leave equality and concatenation to basic string equality and
+  concatenation.
+
+* Include relativity (absolute/relative) and type (directory/file) in the
+  type itself.
+
+* Use the already cross-platform
+  [filepath](http://hackage.haskell.org/package/filepath) package for
+  implementation details.
+
+## Implementation
+
+### The data types
+
+Here is the type:
+
+```haskell
+newtype Path b t = Path FilePath
+  deriving (Typeable)
+```
+
+The type variables are:
+
+* `b`: base, the base location of the path; absolute or relative.
+* `t`: type, whether file or directory.
+
+The base types can be filled with these:
+
+```haskell
+data Abs deriving (Typeable)
+data Rel deriving (Typeable)
+```
+
+And the type can be filled with these:
+
+```haskell
+data File deriving (Typeable)
+data Dir deriving (Typeable)
+```
+
+(Why not use data kinds like `data Type = File | Dir`? Because that imposes
+an extension overhead of adding `{-# LANGUAGE DataKinds #-}` to every module
+you might want to write out a path type in. Given that one cannot construct
+paths of types other than these, via the operations in the module, it’s not
+a concern for me.)
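+
+To make the role of the two type variables concrete, here is a minimal,
+self-contained sketch. It is a stand-in for the library's types rather than
+its actual source: the module name, `describeRoot`, `homeDir` and
+`notesFile` are hypothetical, and the `Path` constructor is used directly
+only for the sake of the demo.
+
+```haskell
+-- Sketch only: mimics the shape of the library's types, not its source.
+module PhantomPathSketch where
+
+-- Tags with no constructors; they exist only at the type level.
+data Abs
+data Rel
+data File
+data Dir
+
+-- 'b' and 't' are phantom: they never appear on the right-hand side.
+newtype Path b t = Path FilePath
+  deriving (Eq, Show)
+
+-- Hypothetical consumer that insists on an absolute directory.
+describeRoot :: Path Abs Dir -> String
+describeRoot (Path p) = "root at " ++ p
+
+-- Demo values built with the raw constructor (an assumption of this
+-- sketch; the library only hands out paths through its parsers).
+homeDir :: Path Abs Dir
+homeDir = Path "/home/chris/"
+
+notesFile :: Path Rel File
+notesFile = Path "notes.txt"
+
+ok :: String
+ok = describeRoot homeDir
+
+-- Uncommenting the next line makes the module fail to type check,
+-- because 'notesFile' is neither absolute nor a directory:
+-- bad = describeRoot notesFile
+```
+
+The tags cost nothing at runtime, since the value is still just a
+`FilePath`; all of the checking happens in the type checker.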
+
+There is a conversion function to give you back the filepath:
+
+```haskell
+toFilePath :: Path b t -> FilePath
+toFilePath (Path l) = l
+```
+
+Beginning from version 0.5.3, there are type-constrained versions of
+`toFilePath` with the following signatures:
+
+```haskell
+fromAbsDir :: Path Abs Dir -> FilePath
+fromRelDir :: Path Rel Dir -> FilePath
+fromAbsFile :: Path Abs File -> FilePath
+fromRelFile :: Path Rel File -> FilePath
+```
+
+### Parsers
+
+To get a `Path` value, you need to use one of the four parsers:
+
+```haskell
+parseAbsDir :: MonadThrow m => FilePath -> m (Path Abs Dir)
+parseRelDir :: MonadThrow m => FilePath -> m (Path Rel Dir)
+parseAbsFile :: MonadThrow m => FilePath -> m (Path Abs File)
+parseRelFile :: MonadThrow m => FilePath -> m (Path Rel File)
+```
+
+The following properties apply:
+
+* Absolute parsers will reject non-absolute paths.
+
+* The only delimiter syntax accepted is the path separator; `/` on POSIX and
+  `\` on Windows.
+
+* Any other delimiter is rejected; `..`, `~/`, `/./`, etc.
+
+* All parsers normalize into single separators: `/home//foo` → `/home/foo`.
+
+* Directory parsers always normalize with a final trailing `/`. So `/home/foo`
+  parses into the string `/home/foo/`.
+
+It was discussed briefly whether we should just have a class for parsing
+rather than four separate parsing functions. In my experience so far, I have
+had type errors where I wrote something like `x <- parseAbsDir
+someAbsDirString` because `x` was then passed to a place that expected a
+relative directory. In this way, overloading the return value would’ve just
+been accepted. So I don’t think having a class is a good idea. Being
+explicit here doesn’t exactly waste our time, either.
+
+Why are these functions in `MonadThrow`?
+Because it means I can have it
+return an `Either`, or a `Maybe`, if I’m in pure code, and if I’m in `IO`,
+and I don’t expect parsing to ever fail, I can use it in IO like this:
+
+```haskell
+do x <- parseRelFile (fromCabalFileName x)
+   foo x
+   …
+```
+
+That’s really convenient and we take advantage of this at FP Complete a lot.
+
+### The instances
+
+Equality, ordering and printing are simply re-using the `String` instances:
+
+```haskell
+instance Eq (Path b t) where
+  (==) (Path x) (Path y) = x == y
+
+instance Ord (Path b t) where
+  compare (Path x) (Path y) = compare x y
+
+instance Show (Path b t) where
+  show (Path x) = show x
+```
+
+Which gives us for free the following equational properties:
+
+```haskell
+toFilePath x == toFilePath y        ≡ x == y           -- Eq instance
+toFilePath x `compare` toFilePath y ≡ x `compare` y    -- Ord instance
+toFilePath x == toFilePath y        ≡ show x == show y -- Show instance
+```
+
+In other words, the representation and the path you get out at the end are
+the same. Two paths that are equal will always give you back the same thing.
+
+### Smart constructors
+
+For when you know what a path will be at compile-time, there are
+constructors for that:
+
+```haskell
+$(mkAbsDir "/home/chris")
+$(mkRelDir "chris")
+$(mkAbsFile "/home/chris/x.txt")
+$(mkRelFile "chris/x.txt")
+```
+
+These will run at compile-time and underneath use the appropriate parser.
+
+### Overloaded strings
+
+No `IsString` instance is provided, because that has no way to statically
+determine whether the path is correct, and would otherwise have to be a
+partial function.
+
+In practice I have written the wrong path format in a `$(mk… "")` and been
+thankful it was caught early.
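+
+To illustrate the partiality concern, the sketch below shows roughly what
+an `IsString` instance would be forced to look like. Everything in it is an
+assumption made for illustration: `parseAbsDirToy` is a toy POSIX-only
+stand-in that is far less thorough than the library's `parseAbsDir` (it
+does not collapse repeated separators, for instance), and the instance
+itself is exactly what the library chooses not to provide.
+
+```haskell
+{-# LANGUAGE FlexibleInstances #-}
+{-# LANGUAGE OverloadedStrings #-}
+-- Sketch only: shows why an IsString instance has to be partial.
+module IsStringSketch where
+
+import Data.List (isInfixOf, isPrefixOf)
+import Data.String (IsString (..))
+
+data Abs
+data Dir
+
+newtype Path b t = Path FilePath
+  deriving (Eq, Show)
+
+-- Toy parser (hypothetical): requires a leading '/', rejects "..",
+-- and appends a trailing '/' when it is missing.
+parseAbsDirToy :: FilePath -> Maybe (Path Abs Dir)
+parseAbsDirToy s
+  | "/" `isPrefixOf` s && not (".." `isInfixOf` s) = Just (Path (addSlash s))
+  | otherwise = Nothing
+  where
+    addSlash p = if last p == '/' then p else p ++ "/"
+
+-- 'fromString' must return a Path, so a failed parse can only be
+-- reported by crashing at runtime.
+instance IsString (Path Abs Dir) where
+  fromString s = case parseAbsDirToy s of
+    Just p -> p
+    Nothing -> error ("invalid absolute directory: " ++ show s)
+
+-- Accepted by the compiler and fine at runtime.
+good :: Path Abs Dir
+good = "/home/chris"
+
+-- Also accepted by the compiler, but calls 'error' when evaluated.
+bad :: Path Abs Dir
+bad = "chris/../x"
+```
+
+Both literals compile, so the mistake in `bad` only surfaces when the value
+is forced. A Template Haskell splice such as `$(mkAbsDir "…")` runs the
+parser during compilation instead, which is why the wrong format gets
+caught early.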
+
+### Operations
+
+There is path concatenation:
+
+```haskell
+(</>) :: Path b Dir -> Path Rel t -> Path b t
+```
+
+Get the parent directory of a path:
+
+```haskell
+parent :: Path Abs t -> Path Abs Dir
+```
+
+Get the filename of a file path:
+
+```haskell
+filename :: Path b File -> Path Rel File
+```
+
+Get the directory name of a directory path:
+
+```haskell
+dirname :: Path b Dir -> Path Rel Dir
+```
+
+Stripping the parent directory from a path:
+
+```haskell
+stripDir :: MonadThrow m => Path b Dir -> Path b t -> m (Path Rel t)
+```
+
+## Review
+
+Let’s review my initial list of complaints and see if they’ve been
+addressed.
+
+### Relative vs absolute confusion
+
+Paths now distinguish in the type system whether they are relative or
+absolute. You can’t append two absolute paths, for example:
+
+```haskell
+λ> $(mkAbsDir "/home/chris") </> $(mkAbsDir "/home/chris")
+<interactive>:23:31-55:
+    Couldn't match type ‘Abs’ with ‘Rel’
+```
+
+### The equality problem
+
+Paths are now stringently normalized. They have to be a valid path, and they
+only support single path separators, and all directories are suffixed with a
+trailing path separator:
+
+```haskell
+λ> $(mkAbsDir "/home/chris//") == $(mkAbsDir "/./home//chris")
+True
+λ> toFilePath $(mkAbsDir "/home/chris//") ==
+   toFilePath $(mkAbsDir "/./home//chris")
+True
+λ> ($(mkAbsDir "/home/chris//"),toFilePath $(mkAbsDir "/./home//chris"))
+("/home/chris/","/home/chris/")
+```
+
+### Unpredictable concatenation issues
+
+Because of the stringent normalization, path concatenation, as seen above,
+is simply string concatenation.
+This is about as predictable as it can get:
+
+```haskell
+λ> toFilePath $(mkAbsDir "/home/chris//")
+"/home/chris/"
+λ> toFilePath $(mkRelDir "foo//bar")
+"foo/bar/"
+λ> $(mkAbsDir "/home/chris//") </> $(mkRelDir "foo//bar")
+"/home/chris/foo/bar/"
+```
+
+### Confusing files and directories
+
+Now that the path type is encoded in the type system, our `</>` operator
+prevents improper appending:
+
+```haskell
+λ> $(mkAbsDir "/home/chris/") </> $(mkRelFile "foo//bar")
+"/home/chris/foo/bar"
+λ> $(mkAbsFile "/home/chris") </> $(mkRelFile "foo//bar")
+<interactive>:35:1-26:
+    Couldn't match type ‘File’ with ‘Dir’
+```
+
+### Self-documentation
+
+Now I can read the path like:
+
+```haskell
+{ fooPath :: Path Rel Dir, ... }
+```
+
+And know that this refers to the directory relative to some other path,
+meaning I should be careful to consider the current directory when using
+this in IO, or that I’ll probably need a parent to append to it at some
+point.
+
+## In practice
+
+We’ve been using this at FP Complete in a number of packages for some months
+now; it’s turned out surprisingly sufficient for most of our path work with
+only one bug found. We weren’t sure initially whether it would just be too
+much of a pain to use, but really it’s quite acceptable given the
+advantages. You can see its use all over the
+[`stack`](https://github.com/commercialhaskell/stack) codebase.
+
+## Doing I/O
+
+Currently any operations involving I/O can be done by using the existing I/O
+library:
+
+```haskell
+doesFileExist (toFilePath fp)
+readFile (toFilePath fp)
+```
+
+etc. This has problems with respect to accidentally running something like:
+
+```haskell
+doesFileExist $(mkRelDir "foo")
+```
+
+But I/O is currently outside the scope of what this package solves. Once you
+leave the realm of the `Path` type, invariants are back to being your
+responsibility.
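+
+One way to keep the invariants across that boundary is to wrap each I/O
+action in a function whose argument type says what kind of path it expects.
+The following is a minimal sketch under stated assumptions: it redefines a
+stand-in `Path` type so that it is self-contained, it relies on
+`System.Directory` from the `directory` package, and the wrapper names are
+chosen for illustration rather than taken from any existing module.
+
+```haskell
+-- Sketch only: typed wrappers around two 'directory' functions.
+module PathIOSketch where
+
+import qualified System.Directory as D
+
+data Abs
+data Rel
+data File
+data Dir
+
+-- Stand-in for the library's opaque type.
+newtype Path b t = Path FilePath
+
+toFilePath :: Path b t -> FilePath
+toFilePath (Path p) = p
+
+-- Only file paths are accepted, whether absolute or relative.
+doesFileExist :: Path b File -> IO Bool
+doesFileExist = D.doesFileExist . toFilePath
+
+-- Only directory paths are accepted.
+doesDirExist :: Path b Dir -> IO Bool
+doesDirExist = D.doesDirectoryExist . toFilePath
+
+-- Demo value built with the raw constructor (an assumption of this
+-- sketch; real code would obtain it from a parser).
+currentDir :: Path Rel Dir
+currentDir = Path "./"
+
+-- 'doesDirExist currentDir' type checks, whereas
+-- 'doesFileExist currentDir' is rejected because Dir is not File.
+```
+
+With wrappers of this shape, asking whether a directory path "exists as a
+file" becomes a compile-time error instead of a silently wrong answer.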
+
+As with the original version of this library, we’re currently building up a
+set of functions in a `Path.IO` module over time that fits our real-world
+use-cases. It may or may not appear in the path package eventually. It’ll
+need cleaning up and considering what should really be included.
+
+**Edit:** There is now a
+[`path-io`](https://hackage.haskell.org/package/path-io) package that
+complements the `path` library and includes a complete well-typed interface
+to [`directory`](https://hackage.haskell.org/package/directory) and
+[`temporary`](https://hackage.haskell.org/package/temporary). There is work
+to add more generally useful functions from Stack's `Path.IO` to it and make
+Stack depend on the `path-io` package.
+
+## Doing textual manipulations
+
+One problem that crops up sometimes is wanting to manipulate
+paths. Currently the way we do it is via the filepath library and re-parsing
+the path:
+
+```haskell
+parseAbsFile . (`addExtension` "ext") . toFilePath
+```
+
+It doesn’t happen too often, in our experience, to the extent this needs to
+be more convenient.
+
+## Accepting user input
+
+Sometimes you have user input that contains `../`. The solution we went with
+is to have a function like `resolveDir`:
+
+```haskell
+resolveDir :: (MonadIO m, MonadThrow m)
+           => Path Abs Dir -> FilePath -> m (Path Abs Dir)
+```
+
+Which will call `canonicalizePath`, which collapses and normalizes a path,
+and then we parse with regular old `parseAbsDir` and we’re cooking with
+gas. This and others like it might get added to the `path` package.
+
+## Comparing with existing path libraries
+
+### filepath and system-filepath
+
+The [filepath](http://hackage.haskell.org/package/filepath) package is
+intended as the complementary package to be used before parsing into a Path
+value, and/or after printing from a Path value. The package itself contains
+no type-safety; instead it contains a range of cross-platform textual
+operations.
+Definitely reach for this library when you want to do more
+involved manipulations.
+
+The `system-filepath` package is deprecated in favour of `filepath`.
+
+### system-canonicalpath, canonical-filepath, directory-tree
+
+The
+[`system-canonicalpath`](http://hackage.haskell.org/package/system-canonicalpath)
+and the
+[`canonical-filepath`](http://hackage.haskell.org/package/canonical-filepath)
+packages both are a kind of subset of `path`. They canonicalize a string
+into an opaque path, but neither distinguishes directories from files or
+absolute from relative. They are useful if you just want a canonical path,
+but they don’t do anything else.
+
+The [`directory-tree`](http://hackage.haskell.org/package/directory-tree)
+package contains a sum type of dir/file/etc but doesn’t distinguish in its
+operations relativity or path type.
+
+### pathtype
+
+Finally, we come to a path library that `path` is similar to: the
+[`pathtype`](http://hackage.haskell.org/package/pathtype) library. There are
+the same types of `Path Abs File` / `Path Rel Dir`, etc.
+
+The points where this library isn’t enough for me are:
+
+* There is an `IsString` instance, which means people will use it, and will
+  make mistakes.
+
+* Paths are not normalized into a predictable format, leading to me being
+  unsure when equality will succeed. This is the same problem I encountered
+  in `system-filepath`. The equality function normalizes, but according to
+  what properties I can reason about? I don’t know.
+
+```haskell
+System.Path.Posix> ("/tmp//" :: Path a Dir) == ("/tmp" :: Path a Dir)
+True
+System.Path.Posix> ("tmp" :: Path a Dir) == ("/tmp" :: Path a Dir)
+True
+System.Path.Posix> ("/etc/passwd/" :: Path a b) == ("/etc/passwd" :: Path a b)
+True
+System.Path.Posix> ("/tmp//" :: Path Abs Dir) == ("/tmp/./" :: Path Abs Dir)
+False
+System.Path.Posix> ("/tmp/../" :: Path Abs Dir) == ("/" :: Path Abs Dir)
+False
+```
+
+* Empty string should not be allowed, and introduction of `.` due to that
+  gets weird:
+
+```haskell
+System.Path.Posix> fmap getPathString (Right ("." :: Path Rel File))
+Right "."
+System.Path.Posix> fmap getPathString (mkPathAbsOrRel "")
+Right "."
+System.Path.Posix> (Right ("." :: Path Rel File)) == (mkPathAbsOrRel "")
+False
+System.Path.Posix> takeDirectory ("tmp" :: Path Rel Dir)
+.
+System.Path.Posix> (getPathString ("." :: Path Rel File) ==
+                    getPathString ("" :: Path Rel File))
+True
+System.Path.Posix> (("." :: Path Rel File) == ("" :: Path Rel File))
+False
+```
+
+* It has functions like `<.>/addExtension` which let you insert an
+  arbitrary string into a path.
+
+* Some functions let you produce nonsense (could be prevented by a stricter
+  type), for example:
+
+```haskell
+System.Path.Posix> takeFileName ("/tmp/" :: Path Abs Dir)
+tmp
+```
+
+I’m being a bit picky here, a bit unfair. But the point is really to show
+the kind of things I tried to avoid in `path`. In summary, it’s just hard to
+know where things can go wrong, similar to what was going on in
+`system-filepath`.
+
+### data-filepath
+
+The [`data-filepath`](https://hackage.haskell.org/package/data-filepath)
+package is also very similar; I discovered it after writing my own at work
+and was pleased to see it’s mostly the same. The main differences are:
+
+* Uses `DataKinds` for the relative/absolute and file/dir distinction which
+  as I said above is an overhead.
+
+* Uses a GADT for the path type, which is fine.
+  In my case I wanted to
+  retain the original string which functions that work on the `FilePath`
+  (`String`) type already deal with well. It does change the parsing step
+  somewhat, because it parses into segments.
+
+* It’s more lenient at parsing (allowing `..` and trailing `.`).
+
+The API is a bit awkward for just parsing a directory: it requires a couple
+of functions to get it (going via `WeakFilePath`), returns only an `Either`,
+and has no functions like `parent`. But there’s not much to complain
+about. It’s a fine library, but I didn’t feel the need to drop my own in
+favor of it. Check it out and decide for yourself.
+
+## Summary
+
+There’s a growing interest in making practical use of well-typed file path
+handling. I think everyone’s wanted it for a while, but few people have
+really committed to it in practice. Now that I’ve been using `path` for a
+while, I can’t really go back. It’ll be interesting to see what new packages
+crop up in the coming year; I expect there’ll be more.