Skip to content

Parsing

Parser combinators are a technique for parsing.

megaparsec (and a more restrictive but faster library, attoparsec) are industrial strength parser combinator libraries in Haskell.

External resources

This tutorial explains how to use megaparsec in detail.

This tutorial explains the design and implementation of parser combinators.

What are parsers?

A very simple version of a parser
import Data.Char (digitToInt, isDigit)

type Parser a = [Char] -> Maybe (a, [Char]) -- (1)!

digit :: Parser Int -- (2)!
digit (x:xs) 
  | isDigit x = Just (digitToInt x, xs) 
  | otherwise = Nothing
digit [] = Nothing
  1. A Parser (parameterized by some type a), takes a sequence of characters to be parsed, and either fails or returns a parse result from some initial segment of the sequence and the remaining sequence with that segment removed.

  2. digit is a Parser parameterized by Int (which is the type of its result). It will try to strip a numeral (1, 2, etc..) from the input sequence. If the sequence does not begin with a digit, it will fail (i.e., return Nothing). If it succeeds, it will return an Int, not a Char.

Libraries like parsec, megaparsec and attoparsec provide more sophisticated versions of this idea:

repl example
> import Text.Megaparsec
> import Text.Megaparsec.Char
> :set -XOverloadedStrings

> type Parser = Parsec Void T.Text

-- (1)!
> parseWith parser = putStrLn . (either errorBundlePretty T.unpack ) . parse parser ""

-- (3)!
> aParser = "a" :: Parser T.Text

> parseWith aParser "ab"
a
> parseWith aParser "ba" -- (2)!
1:1:
  |
1 | ba
  | ^
unexpected 'b'
expecting 'a'


abParser = "ab" :: Parser T.Text
> parseWith abParser "ab" 
ab

> parseWith abParser "ba"
1:1:
  |
1 | ba
  | ^^
unexpected "ba"
expecting "ab"
  1. A helper function for parsing.
  2. megaparsec generates pretty error messages when it fails.
  3. Parser T.Text is the type of a parser that returns a result of type T.Text

What are combinators

Parser combinator libraries provide not just simple parsers like the above, but the ability to combine parsers to build more complex ones.

Here are examples from megaparsec:

{-# LANGUAGE OverloadedStrings #-}

import Text.Megaparsec
import Data.Text ( Text, unpack ) 
import Data.Void (Void)
import Text.Megaparsec.Char (space)


type Parser = Parsec Void Text

-- helper function to run parser
parseAndPrint :: Show b => Parsec Void Text b -> Text -> IO ()
parseAndPrint parser line =
  either
    (putStrLn . errorBundlePretty)
    print
    (runParser parser "" line)

-- parse 'a' then 'b'
abParser :: Parser Text
abParser = "ab" -- (2)!

-- parse 'b' then 'a'
baParser :: Parser Text
baParser = "ba"

-- parse 'ab' then 'ba'
abbaParser :: Parser Text -- (1)!
abbaParser = do 
  ab <- abParser -- (3)!
  ba <- baParser -- (4)!
  return (ab <> ba) -- (5)!

-- parse either 'ba' or 'ab'
baOrabParser :: Parser Text
baOrabParser = baParser <|> abParser -- (6)!

main :: IO ()
main = parseAndPrint baOrabParser "ab"

This runs as follows:

repl example
> parseAndPrint baParser "ba"
"ba"
> parseAndPrint baParser "ab"
1:1:
  |
1 | ab
  | ^^
unexpected "ab"
expecting "ba"

> parseAndPrint baOrabParser "ab"
"ab"
> parseAndPrint abbaParser "abba"
"abba"
> parseAndPrint abbaParser "baab"
1:1:
  |
1 | baab
  | ^^
unexpected "ba"
expecting "ab"
  1. Parser is a monad, so we can use do-notation, which is very convenient for building complex parsers out of simpler ones.
  2. megaparsec takes advantage of OverloadedStrings, so that "ab" is actually a parser.
  3. This means: run abParser and name the result ab.
  4. This means: run baParser (on what remains after running abParser previously) and name the result ba.
  5. This means: the result of the parser is the value ab <> ba.
  6. This means: try first baParser and if it fails, try abParser.

With custom types

One of the main appeals of Haskell's parser combinators is that the output of your parser can be a custom type:

data PieceType = Bishop | Rook deriving Show
data Color = White | Black deriving Show
data Piece = Piece PieceType Color deriving Show

pieceTypeParser :: Parser PieceType -- (3)!
pieceTypeParser = do
  str <- "bishop" <|> "rook"
  return $ case str of
    "bishop" -> Bishop
    _ -> Rook

colorParser :: Parser Color
colorParser = do
  str <- "white" <|> "black"
  return $ case str of
    "white" -> White
    _ -> Black

pieceParser :: Parser Piece
pieceParser = do
  color <- colorParser
  space -- (1)!
  pieceType <- pieceTypeParser
  return (Piece pieceType color)

optionalColorParser :: Parser Piece
optionalColorParser = do
  maybeColor <- optional colorParser -- (2)!
  pieceType <- pieceTypeParser
  return $ case maybeColor of
    Just color -> Piece pieceType color
    Nothing -> Piece pieceType White
  1. 0 or more whitespace.
  2. optional takes colorParser and returns a new parser that either parses nothing or colorParser. The result, appropriately, is a Maybe Color.
  3. The result type is the custom type just defined above.

Examples:

repl example
> parseAndPrint optionalColorParser "white rook"
Piece Rook White
> parseAndPrint optionalColorParser "rook"
Piece Rook White
> parseAndPrint pieceParser "white rook"
Piece Rook White'
> parseAndPrint pieceParser "rook"
1:1:
  |
1 | rook
  | ^^^^
unexpected "rook"
expecting "black" or "white"

Comments