CSCI400 – 11/30 Handout
Recap
In the
lecture,
we constructed our own parsers and parser combinators, where ‘parser
combinators’ basically means ‘functions that can be used to combine parsers’. In
the lecture, our combinators were andThen
and stuff
, which we then related
to the functions in the Monad
typeclass: (>>=)
(‘bind’) and return
.
-- same purpose as >>=
andThen :: Parser s a -> (a -> Parser s b) -> Parser s b
andThen parse next = \input ->
case parse input of
Nothing -> Nothing
Just (a, input') -> next a input'
-- same purpose as return
stuff :: a -> Parser s a
stuff a = \input -> Just (a, input)
These functions, andThen
and stuff
, allowed us to take a parser that looked
like this:
versionDumb :: Parser String (Int, Int)
versionDumb = \input ->
case string "HTTP/" input of
Nothing -> Nothing
Just (_,rest1) ->
case number rest1 of
Nothing -> Nothing
Just (maj,rest2) ->
case string "." rest2 of
Nothing -> Nothing
Just (n,rest3) ->
case number rest3 of
Nothing -> Nothing
Just (min,rest4) -> Just ((maj,min),rest4)
and instead write it like this:
version2 :: Parser String (Int, Int)
version2 =
string "HTTP/" `andThen` \_ ->
number `andThen` \maj ->
string "." `andThen` \_ ->
number `andThen` \min ->
stuff (maj,min)
While this is certainly better than versionDumb
, it’s still not that pretty to look at.
do
Notation
Let’s assume that we’d implemented the Monad
typeclass for our Parser
type.
As a reminder, the Monad
typeclass looks like this:
class Monad m where
(>>=) :: m a -> (a -> m b) -> m b
return :: a -> m a
This means that we’d have to implement the (>>=)
and return
functions for
our Parser
type. We can leverage the functions we’ve already written here:
instance Monad (Parser s) where
(>>=) = andThen
return = stuff
The andThen
and stuff
functions behaved exactly as we’d want (>>=)
and
return
to, so we just use those functions in defining the Monad
typeclass
instance for our parser type.
In doing this, we’d be able to rewrite our version2
parser to use these
functions by replacing andThen
with >>=
and stuff
with return
:
version3 :: Parser String (Int, Int)
version3 =
string "HTTP/" >>= \_ ->
number >>= \maj ->
string "." >>= \_ ->
number >>= \min ->
return (maj,min)
This is just as ugly though – all we’ve done so far is rename a couple of functions.
Now we come to something that you may have seen around, but we haven’t talked
about yet: do
notation. do
notation is simply syntactic sugar for
expressions that are built using the Monad
functions (>>=)
and
return
. We can use do
notation to rewrite version3
as:
version4 :: Parser String (Int, Int)
version4 = do
string "HTTP/"
maj <- number
string "."
min <- number
return (maj, min)
Note that, as far as the compiler is concerned, version4
is exactly the same
as version3
. The compiler would simply de-sugar version4
to version3
,
because do
notation is just syntactic sugar.
You can use do
notation for any function that returns a type that is a part of
the Monad
typeclass, which our parser type is now. This provides a means of
using the Monad
design pattern without the ugly syntax.
Applicative Parsing
do
notation gives us one way to clean up our parser definitions. The thing
about do
, though, is that 1. it’s fairly straightforward to use, but 2. it’s
difficult to really understand what it’s doing.
So let’s approach this from a different angle. Let’s write our parsers using a
few other typeclasses, namely Functor
, which gives us the (<$>)
(‘fmap’)
operator, and Applicative
, which gives us the (<*>)
(‘apply’) operator and a
few others (namely, (*>)
and (<*)
). We’ll also see the Alternative
typeclass, which gives us (<|>)
– you’ll get an idea of how this operator
works from the chapter reading. We’ll call this style ‘applicative Parsec’.
Applicative Parsec will allow us to take parsers that look like this:
majorVersion :: Parser String Int
majorVersion = do
string "HTTP/"
maj <- number
return maj
And rewrite them like this:
majorVersion :: Parser String Int
majorVersion = string "HTTP/" *> number
As another example, consider a Person
data type:
data Person = Person Name Age
type Name = String
type Age = Int
Given parsers for a Name
and Age
, called p_name
and p_age
, resp.:
-- definitions not given, just type signatures
p_name :: Parser Name
p_age :: Parser Age
Note: The type we’ve been using for parsing has been Parser s a
, where s
is
the type of the input to be parsed (String
for us) and a
is the result of
the parsing, AKA the type of the value that the parser holds for us. When we
write parsers for the assignment, the s
will be removed and we’ll just have
Parser a
as the generic parser type – this just means that the fact that
we’re always parsing a String
is implicit in the Parser
type itself, so we
don’t include the additional s
type variable. I’m doing the same thing in
this example for parsing a Person
.
We can write a parser for a Person
by combining these two parsers. Using do
notation, this would look like:
p_person :: Parser Person
p_person = do
name <- p_name
age <- p_age
return $ Person name age
Translating this to use the applicative Parsec style would give us:
p_person :: Parser Person
p_person = Person <$> p_name <*> p_age
You might think one of these is cleaner than the other, but the latter form is the one we’ll be using on the second half of the project.
From here, you should read the Parsec
chapter of Real World
Haskell. Parsec
is a
parsing library in Haskell. The one we’ll actually be using is
megaparsec
, but
they’re similar enough that the book chapter should give you a solid start.
Note that the chapter starts off by using do
notation to define parsers,
then moves to the notation that we want to use, which is ‘applicative Parsec’.
However, it explains applicative Parsec using do
notation, so it’s helpful to
understand both.