Skip to content

Parsing

Parser combinators are a technique for parsing.

megaparsec (and a more restrictive but faster library, attoparsec) are industrial strength parser combinator libraries in Haskell.

External resources

This tutorial explains how to use megaparsec in detail.

This tutorial explains the design and implementation of parser combinators.

What are parsers?

A very simple version of a parser
import Data.Char (digitToInt, isDigit)

type Parser a = [Char] -> Maybe (a, [Char]) 

digit :: Parser Int 
digit (x:xs) 
  | isDigit x = Just (digitToInt x, xs) 
  | otherwise = Nothing
digit [] = Nothing

Libraries like parsec, megaparsec and attoparsec provide more sophisticated versions of this idea:

repl example
> import Text.Megaparsec
> import Text.Megaparsec.Char
> :set -XOverloadedStrings

> type Parser = Parsec Void T.Text


> parseWith parser = putStrLn . (either errorBundlePretty T.unpack ) . parse parser ""


> aParser = "a" :: Parser T.Text

> parseWith aParser "ab"
a
> parseWith aParser "ba" 
1:1:
  |
1 | ba
  | ^
unexpected 'b'
expecting 'a'


abParser = "ab" :: Parser T.Text
> parseWith abParser "ab" 
ab

> parseWith abParser "ba"
1:1:
  |
1 | ba
  | ^^
unexpected "ba"
expecting "ab"

What are combinators

Parser combinator libraries provide not just simple parsers like the above, but the ability to combine parsers to build more complex ones.

Here are examples from megaparsec:

{-# LANGUAGE OverloadedStrings #-}

import Text.Megaparsec
import Data.Text ( Text, unpack ) 
import Data.Void (Void)
import Text.Megaparsec.Char (space)


type Parser = Parsec Void Text

-- helper function to run parser
parseAndPrint :: Show b => Parsec Void Text b -> Text -> IO ()
parseAndPrint parser line =
  either
    (putStrLn . errorBundlePretty)
    print
    (runParser parser "" line)

-- parse 'a' then 'b'
abParser :: Parser Text
abParser = "ab" -- (2)!

-- parse 'b' then 'a'
baParser :: Parser Text
baParser = "ba"

-- parse 'ab' then 'ba'
abbaParser :: Parser Text -- (1)!
abbaParser = do 
  ab <- abParser -- (3)!
  ba <- baParser -- (4)!
  return (ab <> ba) -- (5)!

-- parse either 'ba' or 'ab'
baOrabParser :: Parser Text
baOrabParser = baParser <|> abParser -- (6)!

main :: IO ()
main = parseAndPrint baOrabParser "ab"

This runs as follows:

repl example
> parseAndPrint baParser "ba"
"ba"
> parseAndPrint baParser "ab"
1:1:
  |
1 | ab
  | ^^
unexpected "ab"
expecting "ba"

> parseAndPrint baOrabParser "ab"
"ab"
> parseAndPrint abbaParser "abba"
"abba"
> parseAndPrint abbaParser "baab"
1:1:
  |
1 | baab
  | ^^
unexpected "ba"
expecting "ab"
  1. Parser is a monad, so we can use do-notation, which is very convenient for building complex parsers out of simpler ones.
  2. megaparsec takes advantage of OverloadedStrings, so that "ab" is actually a parser.
  3. This means: run abParser and name the result ab.
  4. This means: run baParser (on what remains after running abParser previously) and name the result ba.
  5. This means: the result of the parser is the value ab <> ba.
  6. This means: try first baParser and if it fails, try abParser.

With custom types

One of the main appeals of Haskell's parser combinators is that the output of your parser can be a custom type:

data PieceType = Bishop | Rook deriving Show
data Color = White | Black deriving Show
data Piece = Piece PieceType Color deriving Show

pieceTypeParser :: Parser PieceType 
pieceTypeParser = do
  str <- "bishop" <|> "rook"
  return $ case str of
    "bishop" -> Bishop
    _ -> Rook

colorParser :: Parser Color
colorParser = do
  str <- "white" <|> "black"
  return $ case str of
    "white" -> White
    _ -> Black

pieceParser :: Parser Piece
pieceParser = do
  color <- colorParser
  space 
  pieceType <- pieceTypeParser
  return (Piece pieceType color)

optionalColorParser :: Parser Piece
optionalColorParser = do
  maybeColor <- optional colorParser 
  pieceType <- pieceTypeParser
  return $ case maybeColor of
    Just color -> Piece pieceType color
    Nothing -> Piece pieceType White

Examples:

repl example
> parseAndPrint optionalColorParser "white rook"
Piece Rook White
> parseAndPrint optionalColorParser "rook"
Piece Rook White
> parseAndPrint pieceParser "white rook"
Piece Rook White'
> parseAndPrint pieceParser "rook"
1:1:
  |
1 | rook
  | ^^^^
unexpected "rook"
expecting "black" or "white"

Comments