title: GHC API tutorial
date: 2014-06-18 00:00
tags: haskell
---
# Intro
*Disclaimer: these notes were written a couple of years ago. While
some of the basic facts still hold, some details may have changed.
Trust, but verify!*

It's hard to get into writing code that uses the GHC API. The API itself is large, and the number of functions and options significantly outnumbers the tutorials available.

In this note I will try to elaborate on some of the peculiar, interesting problems I've encountered while writing code that uses the GHC API, and also provide various tips I find useful.

Many of the points I make in this note are actually trivial, but nevertheless I made all of the mistakes mentioned here myself, perhaps due to my naive approach of quickly diving in and experimenting instead of reading the documentation and source code.

This note is an adaptation of a [series of my blog posts](http://parenz.wordpress.com/tag/ghc/) on the subject.

In order to run the examples presented in this note you'll have to

1. Install the `ghc-paths` library
2. Make the `ghc` package visible, either by compiling the code with `ghc -package ghc file.hs` or by exposing the `ghc-7.8.x` library: `ghc-pkg expose ghc-7.8.2`

The up-to-date Haddocks for GHC are located [here](http://www.haskell.org/ghc/docs/latest/html/libraries/ghc/index.html).

# Interpreted, compiled, and package modules
There are different ways of bringing contents of Haskell modules into scope, a process that is necessary for evaluating/interpreting bits of code on-the-fly. We will walk through some of the caveats of this process.
## Interpreted modules
Imagine the following situation: we have a Haskell source file with code we want to load dynamically and evaluate. That is a basic task in the GHC API terms, but there are actually different ways of doing that.
Let us have a file 'test.hs' containing the code we want to access:
```haskell
module Test (test) where
test :: Int
test = 123
```
The basic way to get the 'test' data would be to load `Test` as an interpreted module:
```haskell
import Control.Applicative
import DynFlags
import GHC
import GHC.Paths
import MonadUtils (liftIO)
import Unsafe.Coerce

main = defaultErrorHandler defaultFatalMessager defaultFlushOut $
  runGhc (Just libdir) $ do
    -- we have to call 'setSessionDynFlags' before doing
    -- everything else
    dflags <- getSessionDynFlags
    -- If we want to make GHC interpret our code on the fly, we
    -- ought to set those two flags, otherwise we
    -- wouldn't be able to use 'setContext' below
    setSessionDynFlags $ dflags { hscTarget = HscInterpreted
                                , ghcLink   = LinkInMemory
                                }
    setTargets =<< sequence [guessTarget "test.hs" Nothing]
    load LoadAllTargets
    -- Bringing the module into the context
    setContext [IIModule $ mkModuleName "Test"]

    -- evaluating and running an action
    act <- unsafeCoerce <$> compileExpr "print test"
    liftIO act
```
We have to use `HscInterpreted` and `LinkInMemory` because otherwise GHC would compile `test.hs` in the current directory and leave behind `test.hi` and `test.o` files, which we would not be able to load in interpreted mode. `setContext`, however, will try to load the code from those files first when looking for the module `Test`.

    $ ghc -package ghc --make Interp.hs
    [1 of 1] Compiling Main ( Interp.hs, Interp.o )
    Linking Interp ...
    $ ./Interp
    123

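
The `unsafeCoerce` above trusts us to get the type of the expression right. As a possible alternative, the `GHC` module also exports `dynCompileExpr`, which returns a `Dynamic` that can be checked at run time. The sketch below assumes the same `test.hs` and the GHC 7.8 API used in the rest of this note; treat it as an illustration under those assumptions rather than a definitive recipe.

```haskell
import Data.Dynamic (fromDynamic)
import DynFlags
import GHC
import GHC.Paths (libdir)
import MonadUtils (liftIO)

main :: IO ()
main = defaultErrorHandler defaultFatalMessager defaultFlushOut $
  runGhc (Just libdir) $ do
    dflags <- getSessionDynFlags
    _ <- setSessionDynFlags $ dflags { hscTarget = HscInterpreted
                                     , ghcLink   = LinkInMemory
                                     }
    setTargets =<< sequence [guessTarget "test.hs" Nothing]
    _ <- load LoadAllTargets
    setContext [IIModule $ mkModuleName "Test"]

    -- 'dynCompileExpr' wraps the value in a 'Dynamic', so the
    -- expression must have a 'Typeable' type
    dyn <- dynCompileExpr "test"
    liftIO $ case fromDynamic dyn :: Maybe Int of
      Just n  -> print n
      Nothing -> putStrLn "the expression does not have type Int"
```

The trade-off is that the evaluated expression has to have a monomorphic, `Typeable` type, whereas `unsafeCoerce` accepts anything and fails unpredictably when the types do not match.
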
Yay, it works! But let's try something fancier, like printing a list of integers one by one. For that, we will use the awesome `forM_` function.

    --- the rest of the file is the same
    --- ...
    -- evaluating and running an action
    act <- unsafeCoerce <$> compileExpr "forM_ [1,2,test] print"
    liftIO act

When we try to run it:

    $ ./Interp
    Interp: panic! (the 'impossible' happened)
    (GHC version 7.8.2 for x86_64-apple-darwin):
    Not in scope: forM_

Well, fair enough: where can we expect GHC to get `forM_` from? We have to bring `Control.Monad` into scope in order to do that.

This brings us to the next section.

## Package modules
Naively, we might want to load `Control.Monad` in a similar fashion as we did with loading `test.hs`.
```haskell
main = defaultErrorHandler defaultFatalMessager defaultFlushOut $
  runGhc (Just libdir) $ do
    dflags <- getSessionDynFlags
    setSessionDynFlags $ dflags { hscTarget = HscInterpreted
                                , ghcLink   = LinkInMemory
                                }
    setTargets =<< sequence [ guessTarget "test.hs" Nothing
                            , guessTarget "Control.Monad" Nothing]
    load LoadAllTargets
    -- Bringing the module into the context
    setContext [IIModule $ mkModuleName "Test"]

    -- evaluating and running an action
    act <- unsafeCoerce <$> compileExpr "forM_ [1,2,test] print"
    liftIO act
```
Unfortunately, this attempt fails:

    Interp: panic! (the 'impossible' happened)
    (GHC version 7.8.2 for x86_64-apple-darwin):
    module Control.Monad is a package module

Huh, what? I thought `guessTarget` works on all kinds of modules.
Well, it does. But it doesn't "load the module", it merely sets it as the *target for compilation*. Basically, it (together with `load LoadAllTargets`) is equivalent to calling `ghc --make`. And surely it doesn't make much sense trying to `ghc --make Control.Monad` when `Control.Monad` is a module from the base package. What we need to do instead is to bring the compiled `Control.Monad` module into scope. Luckily it's not very hard to do with the help of the `simpleImportDecl :: ModuleName -> ImportDecl name` function:
```haskell
main = defaultErrorHandler defaultFatalMessager defaultFlushOut $
  runGhc (Just libdir) $ do
    -- we have to call 'setSessionDynFlags' before doing
    -- everything else
    dflags <- getSessionDynFlags
    -- If we want to make GHC interpret our code on the fly, we
    -- ought to set those two flags, otherwise we
    -- wouldn't be able to use 'setContext' below
    setSessionDynFlags $ dflags { hscTarget = HscInterpreted
                                , ghcLink   = LinkInMemory
                                }
    setTargets =<< sequence [ guessTarget "test.hs" Nothing ]
    load LoadAllTargets
    -- Bringing the module into the context
    setContext [ IIModule $ mkModuleName "Test"
               , IIDecl . simpleImportDecl
                        . mkModuleName $ "Control.Monad" ]

    -- evaluating and running an action
    act <- unsafeCoerce <$> compileExpr "forM_ [1,2,test] print"
    liftIO act
```
And this works like a charm:

    $ ./Interp
    1
    2
    123

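
Note that `setContext` replaces the whole interactive context, which is why `Test` has to be listed again next to `Control.Monad`. If you want to add imports incrementally, a small helper can be built from the same functions plus `getContext`, which is also exported from `GHC`. The name `addImports` is hypothetical, and the sketch assumes the GHC 7.8 API used throughout this note.

```haskell
import GHC

-- | Add plain (unqualified) imports of the given modules to the
-- interactive context, keeping whatever is already in scope.
-- ('addImports' is a made-up name, not part of the GHC API.)
addImports :: GhcMonad m => [String] -> m ()
addImports mods = do
  ctx <- getContext
  setContext (ctx ++ map (IIDecl . simpleImportDecl . mkModuleName) mods)
```

With this in place, the call above could be written as `setContext [IIModule $ mkModuleName "Test"]` followed by `addImports ["Control.Monad"]`.
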
## Compiled modules
What we have implemented so far corresponds to the `:load *test.hs`
command in GHCi, which gives us full access to the source code of
the program. To illustrate this let's modify our test file:
```haskell
module Test (test) where
test :: Int
test = 123
test2 :: String
test2 = "Hi"
```
Now, if we want to load that file as an interpreted module and evaluate
`test2` nothing will stop us from doing so.

    $ ./Interp2
    (123,"Hi")

If we want to use the compiled module (like `:load test.hs` in GHCi),
we have to bring `Test` into the context the same way we dealt with
`Control.Monad`:
```haskell
main = defaultErrorHandler defaultFatalMessager defaultFlushOut $
  runGhc (Just libdir) $ do
    dflags <- getSessionDynFlags
    setSessionDynFlags $ dflags { hscTarget = HscInterpreted
                                , ghcLink   = LinkInMemory
                                }
    setTargets =<< sequence [ guessTarget "Test" Nothing ]
    load LoadAllTargets
    -- Bringing the module into the context
    setContext [ IIDecl $ simpleImportDecl (mkModuleName "Test")
               , IIDecl $ simpleImportDecl (mkModuleName "Prelude")
               ]
    printExpr "test"
    printExpr "test2"

printExpr :: String -> Ghc ()
printExpr expr = do
  liftIO $ putStrLn ("-- Going to print " ++ expr)
  act <- unsafeCoerce <$> compileExpr ("print (" ++ expr ++ ")")
  liftIO act
```
The output:
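
    $ ./Interp2
    -- Going to print test
    123
    -- Going to print test2
    target: panic! (the 'impossible' happened)
    (GHC version 7.6.3 for x86_64-apple-darwin):
    Not in scope: `test2'
    Perhaps you meant `test' (imported from Test)
    Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug

The failure is expected: `test2` is not in the export list of `Test`, and an `IIDecl` import only brings exported names into scope, just as an ordinary `import Test` would.
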
# Error handling
Our exercises above produced a number of GHC panics/errors. By default
you can expect GHC to spew all the errors onto your screen, but you
might want to do something else with them, e.g. log them.

Naturally, at first I tried the exception handling mechanism:
```haskell
-- Main.hs:
{-# LANGUAGE ScopedTypeVariables #-}  -- for the pattern signature on 'ex'
import Control.Applicative ((<$>))
import GHC
import GHC.Paths
import MonadUtils
import Exception
import Panic
import Unsafe.Coerce
import System.IO.Unsafe

-- I thought this code would handle the exception
handleException :: (ExceptionMonad m, MonadIO m)
                => m a -> m (Either String a)
handleException m =
  ghandle (\(ex :: SomeException) -> return (Left (show ex))) $
  handleGhcException (\ge -> return (Left (showGhcException ge ""))) $
  flip gfinally (liftIO restoreHandlers) $
  m >>= return . Right

-- Initializations, needed if you want to compile code on the fly
initGhc :: Ghc ()
initGhc = do
  dfs <- getSessionDynFlags
  setSessionDynFlags $ dfs { hscTarget = HscInterpreted
                           , ghcLink   = LinkInMemory }
  return ()

-- main entry point
main = test >>= print

test :: IO (Either String Int)
test = handleException $ runGhc (Just libdir) $ do
  initGhc
  setTargets =<< sequence [ guessTarget "file1.hs" Nothing ]
  graph <- depanal [] False
  loaded <- load LoadAllTargets
  -- when (failed loaded) $ throw LoadingException
  setContext (map (IIModule . moduleName . ms_mod) graph)
  let expr = "run"
  res <- unsafePerformIO . unsafeCoerce <$> compileExpr expr
  return res

-- file1.hs:
module Main where

main = return ()

run :: IO Int
run = do
  n <- x
  return (n+1)
```
The problem is that when I run the `test` function above I receive the
following output:

    h> test
    test/file1.hs:4:10: Not in scope: `x'
    Left "Cannot add module Main to context: not a home module"
    it :: Either String Int

It appears that the exception handler did catch *an* error, but a
peculiar one, and not the one I wanted to catch.
## Solution
We've studied the GHC API together with Luite Stegeman and I think
we've found a more or less satisfactory solution.
Errors are handled using the
[LogAction](https://downloads.haskell.org/~ghc/latest/docs/html/libraries/ghc-7.10.2/DynFlags.html#t:LogAction)
specified in the
[DynFlags](https://downloads.haskell.org/~ghc/latest/docs/html/libraries/ghc-7.10.2/DynFlags.html#t:DynFlags)
for your GHC session. So to fix this you need to change the `log_action`
field of the `DynFlags`. For example, you can do this:
```haskell
import Data.IORef
import ErrUtils (mkLocMessage)
import Outputable (initSDocContext, runSDoc)

initGhc = do
  ..
  ref <- liftIO $ newIORef ""
  dfs <- getSessionDynFlags
  setSessionDynFlags $ dfs { hscTarget  = HscInterpreted
                           , ghcLink    = LinkInMemory
                           , log_action = logHandler ref -- ^ this
                           }

-- LogAction == DynFlags -> Severity -> SrcSpan -> PprStyle -> MsgDoc -> IO ()
logHandler :: IORef String -> LogAction
logHandler ref dflags severity srcSpan style msg =
  case severity of
    SevError -> modifyIORef' ref (++ printDoc)
    SevFatal -> modifyIORef' ref (++ printDoc)
    _        -> return () -- ignore the rest
  where cntx     = initSDocContext dflags style
        locMsg   = mkLocMessage severity srcSpan msg
        printDoc = show (runSDoc locMsg cntx)
```
# Package databases
A *package database* is a directory where the information about your
installed packages is stored. For each package registered in the
database there is a .conf file with the package details. The .conf
file contains the package description (just like in the .cabal file)
as well as path to binaries and a list of resolved dependencies:

    $ cat aeson-0.6.1.0.1-5a107a6c6642055d7d5f98c65284796a.conf
    name: aeson
    version: 0.6.1.0.1
    id: aeson-0.6.1.0.1-5a107a6c6642055d7d5f98c65284796a
    <..snip..>
    import-dirs: /home/dan/.cabal/lib/aeson-0.6.1.0.1/ghc-7.7.20130722
    library-dirs: /home/dan/.cabal/lib/aeson-0.6.1.0.1/ghc-7.7.20130722
    <..snip..>
    depends: attoparsec-0.10.4.0-acffb7126aca47a107cf7722d75f1f5e
             base-4.7.0.0-b67b4d8660168c197a2f385a9347434d
             blaze-builder-0.3.1.1-9fd49ac1608ca25e284a8ac6908d5148
             bytestring-0.10.3.0-66e3f5813c3dc8ef9647156d1743f0ef
    <..snip..>

You can use `ghc-pkg` to manage installed packages on your system. For
example, to list all the packages you've installed run `ghc-pkg list`.
To list all the package databases that are automatically picked up by
`ghc-pkg`, ask it to list a package that does not exist:

    $ ghc-pkg list nonexistentpkg
    /home/dan/ghc/lib/ghc-7.7.20130722/package.conf.d
    /home/dan/.ghc/i386-linux-7.7.20130722/package.conf.d

See `ghc-pkg --help` or the [online documentation](http://www.haskell.org/ghc/docs/latest/html/users_guide/packages.html#package-management) for more details.
## Adding a package db
By default GHC knows only about two package databases: the global
package database (usually `/usr/lib/ghc-something/` on Linux) and the
user-specific database (usually somewhere under `~/.ghc/`). In order to
pick up a package that resides in a different package database you have
to employ some tricks.

For some reason the GHC API does not export a clear and easy-to-use
function that would allow you to do that, although the code we need is
present in the GHC sources.

The way this whole thing works is the following:

1. GHC calls [initPackages](https://downloads.haskell.org/~ghc/latest/docs/html/libraries/ghc-7.10.2/Packages.html#v:initPackages),
   which reads the database files and sets up the [internal
   table](https://downloads.haskell.org/~ghc/latest/docs/html/libraries/ghc-7.10.2/Packages.html#t:PackageState)
   of package information.
2. The reading of package databases is performed via the
   [readPackageConfigs](https://downloads.haskell.org/~ghc/latest/docs/html/libraries/ghc-7.10.2/Packages.html#v:readPackageConfigs)
   function. It reads the user package database, the global package
   database, the `GHC_PACKAGE_PATH` environment variable, and *applies
   the extraPkgConfs function*, which is a dynflag and has the following
   type: `extraPkgConfs :: [PkgConfRef] -> [PkgConfRef]`
   ([PkgConfRef](https://downloads.haskell.org/~ghc/latest/docs/html/libraries/ghc-7.10.2/DynFlags.html#t:PkgConfRef)
   is a type representing the package database). The `extraPkgConfs` flag
   is supposed to represent the `-package-db` command line option.
3. Once the database is parsed, the loaded packages are stored in the
   `pkgDatabase` dynflag, which is a list of [PackageConfig](https://downloads.haskell.org/~ghc/latest/docs/html/libraries/ghc-7.10.2/PackageConfig.html#t:PackageConfig)s.

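
Because `extraPkgConfs` is a function rather than a list, adding a database means composing a new function on top of the old one. The following GHC-independent sketch models step 2 with stand-in types: `Flags`, `defaultFlags`, and `resolve` are made up for this illustration, and the default list `[UserPkgConf, GlobalPkgConf]` is an assumption about what GHC feeds into the function, not something taken from its sources.

```haskell
-- Stand-in for GHC's 'PkgConfRef'.
data PkgConfRef = GlobalPkgConf | UserPkgConf | PkgConfFile FilePath
  deriving (Eq, Show)

-- Stand-in for the one field of 'DynFlags' we care about here.
data Flags = Flags { extraPkgConfs :: [PkgConfRef] -> [PkgConfRef] }

-- With no extra databases the function leaves the defaults untouched.
defaultFlags :: Flags
defaultFlags = Flags { extraPkgConfs = id }

-- Prepend one more database, on top of whatever was added before.
addPkgDb :: FilePath -> Flags -> Flags
addPkgDb fp fl = fl { extraPkgConfs = (PkgConfFile fp :) . extraPkgConfs fl }

-- Model of what happens when the databases are read: the function is
-- applied to the default databases (the order here is an assumption).
resolve :: Flags -> [PkgConfRef]
resolve fl = extraPkgConfs fl [UserPkgConf, GlobalPkgConf]

main :: IO ()
main = print (resolve (addPkgDb "b.conf.d" (addPkgDb "a.conf.d" defaultFlags)))
-- prints [PkgConfFile "b.conf.d",PkgConfFile "a.conf.d",UserPkgConf,GlobalPkgConf]
```

The most recently added database ends up at the front of the list, and the default databases are kept unless a later function in the chain drops them.
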
So, in order to add a package database to the current session we have
to simply modify the `extraPkgConfs` dynflag. Actually, there is
already a function [present](https://downloads.haskell.org/~ghc/latest/docs/html/libraries/ghc-7.10.2/src/DynFlags.html#line-3656) in the GHC source that does exactly what we
need: `addPkgConfRef :: PkgConfRef -> DynP ()`. Unfortunately it's not
exported so we can't use it in our own code. I rolled my own functions
that I am using in the interactive-diagrams project, feel free to copy
them:
```haskell
{-# LANGUAGE CPP #-}
-- | Add a package database to the Ghc monad
#if __GLASGOW_HASKELL__ >= 707
addPkgDb :: GhcMonad m => FilePath -> m ()
#else
addPkgDb :: (MonadIO m, GhcMonad m) => FilePath -> m ()
#endif
addPkgDb fp = do
  dfs <- getSessionDynFlags
  let pkg  = PkgConfFile fp
  let dfs' = dfs { extraPkgConfs = (pkg:) . extraPkgConfs dfs }
  setSessionDynFlags dfs'
  -- 'initPackages' runs in IO, so it has to be lifted
  _ <- liftIO $ initPackages dfs'
  return ()

-- | Add a list of package databases to the Ghc monad
-- This should be equivalent to
-- > addPkgDbs ls = mapM_ addPkgDb ls
-- but it is actually faster, because it does the package
-- reinitialization after adding all the databases
#if __GLASGOW_HASKELL__ >= 707
addPkgDbs :: GhcMonad m => [FilePath] -> m ()
#else
addPkgDbs :: (MonadIO m, GhcMonad m) => [FilePath] -> m ()
#endif
addPkgDbs fps = do
  dfs <- getSessionDynFlags
  let pkgs = map PkgConfFile fps
  let dfs' = dfs { extraPkgConfs = (pkgs ++) . extraPkgConfs dfs }
  setSessionDynFlags dfs'
  -- 'initPackages' runs in IO, so it has to be lifted
  _ <- liftIO $ initPackages dfs'
  return ()
```
## Links
- [Packages](https://downloads.haskell.org/~ghc/latest/docs/html/libraries/ghc-7.10.2/Packages.html) module,
contains other functions that modify/make use of `extraPkgConfs`