Parsing
Parser combinators are a technique for parsing.
megaparsec
(and a more restrictive but faster library, attoparsec
) are industrial strength parser combinator libraries in Haskell.
External resources¶
This tutorial explains how to use megaparsec
in detail.
This tutorial explains the design and implementation of parser combinators.
What are parsers?¶
A very simple version of a parser
import Data.Char (digitToInt, isDigit)
type Parser a = [Char] -> Maybe (a, [Char])
digit :: Parser Int
digit (x:xs)
| isDigit x = Just (digitToInt x, xs)
| otherwise = Nothing
digit [] = Nothing
Libraries like parsec
, megaparsec
and attoparsec
provide more sophisticated versions of this idea:
repl example
> import Text.Megaparsec
> import Text.Megaparsec.Char
> :set -XOverloadedStrings
> type Parser = Parsec Void T.Text
> parseWith parser = putStrLn . (either errorBundlePretty T.unpack ) . parse parser ""
> aParser = "a" :: Parser T.Text
> parseWith aParser "ab"
a
> parseWith aParser "ba"
1:1:
|
1 | ba
| ^
unexpected 'b'
expecting 'a'
abParser = "ab" :: Parser T.Text
> parseWith abParser "ab"
ab
> parseWith abParser "ba"
1:1:
|
1 | ba
| ^^
unexpected "ba"
expecting "ab"
What are combinators¶
Parser combinator libraries provide not just simple parsers like the above, but the ability to combine parsers to build more complex ones.
Here are examples from megaparsec
:
{-# LANGUAGE OverloadedStrings #-}
import Text.Megaparsec
import Data.Text ( Text, unpack )
import Data.Void (Void)
import Text.Megaparsec.Char (space)
type Parser = Parsec Void Text
-- helper function to run parser
parseAndPrint :: Show b => Parsec Void Text b -> Text -> IO ()
parseAndPrint parser line =
either
(putStrLn . errorBundlePretty)
print
(runParser parser "" line)
-- parse 'a' then 'b'
abParser :: Parser Text
abParser = "ab" -- (2)!
-- parse 'b' then 'a'
baParser :: Parser Text
baParser = "ba"
-- parse 'ab' then 'ba'
abbaParser :: Parser Text -- (1)!
abbaParser = do
ab <- abParser -- (3)!
ba <- baParser -- (4)!
return (ab <> ba) -- (5)!
-- parse either 'ba' or 'ab'
baOrabParser :: Parser Text
baOrabParser = baParser <|> abParser -- (6)!
main :: IO ()
main = parseAndPrint baOrabParser "ab"
This runs as follows:
repl example
> parseAndPrint baParser "ba"
"ba"
> parseAndPrint baParser "ab"
1:1:
|
1 | ab
| ^^
unexpected "ab"
expecting "ba"
> parseAndPrint baOrabParser "ab"
"ab"
> parseAndPrint abbaParser "abba"
"abba"
> parseAndPrint abbaParser "baab"
1:1:
|
1 | baab
| ^^
unexpected "ba"
expecting "ab"
Parser
is a monad, so we can use do-notation, which is very convenient for building complex parsers out of simpler ones.megaparsec
takes advantage of OverloadedStrings, so that"ab"
is actually a parser.- This means: run
abParser
and name the resultab
. - This means: run
baParser
(on what remains after runningabParser
previously) and name the resultba
. - This means: the result of the parser is the value
ab <> ba
. - This means: try first
baParser
and if it fails, tryabParser
.
With custom types¶
One of the main appeals of Haskell's parser combinators is that the output of your parser can be a custom type:
data PieceType = Bishop | Rook deriving Show
data Color = White | Black deriving Show
data Piece = Piece PieceType Color deriving Show
pieceTypeParser :: Parser PieceType
pieceTypeParser = do
str <- "bishop" <|> "rook"
return $ case str of
"bishop" -> Bishop
_ -> Rook
colorParser :: Parser Color
colorParser = do
str <- "white" <|> "black"
return $ case str of
"white" -> White
_ -> Black
pieceParser :: Parser Piece
pieceParser = do
color <- colorParser
space
pieceType <- pieceTypeParser
return (Piece pieceType color)
optionalColorParser :: Parser Piece
optionalColorParser = do
maybeColor <- optional colorParser
pieceType <- pieceTypeParser
return $ case maybeColor of
Just color -> Piece pieceType color
Nothing -> Piece pieceType White
Examples:
repl example
> parseAndPrint optionalColorParser "white rook"
Piece Rook White
> parseAndPrint optionalColorParser "rook"
Piece Rook White
> parseAndPrint pieceParser "white rook"
Piece Rook White'
> parseAndPrint pieceParser "rook"
1:1:
|
1 | rook
| ^^^^
unexpected "rook"
expecting "black" or "white"