Discussion:
Line by line parsing
Nikolay Shaplov
2011-11-26 10:52:29 UTC
Hi! I have a newbie question:

I'd like to parse a file line by line. A line can be terminated by \n or by
EOF.

I do something like this:

===========================
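use strict;
use warnings;
use Parse::RecDescent;

# Set the global skip pattern to empty so that whitespace
# (including "\n") is not skipped before each terminal
$Parse::RecDescent::skip = '';

my $grammar = q{
    Page: Line(s)           {print "Finished! \n"}
    Line: /^.*/ /\n|\Z/     {print "Line: '$item[1]'\n"}
};

my $parser = Parse::RecDescent->new($grammar);

my $text = "aaaaa\nbbbbbb\nccccc";

if ($parser->Page($text)) {
    print "Happy!\n";
}
else {
    print "Unhappy :-(\n";
}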
===========================

but I get an empty line at the end of the output:

==================
Line: 'aaaaa'
Line: 'bbbbbb'
Line: 'ccccc'
Line: ''
Finished!
Happy!
===================

I cannot understand why it is there. Setting $::RD_TRACE = 1 does not give
me any useful information.

Can you help me by

1. explaining why this happens, and

2. telling me the right way to do this kind of line-by-line parsing? (In
the future it should parse something similar to MediaWiki markup.)

Damian Conway
2011-11-28 11:11:12 UTC
Hi Nikolay,

Your grammar allows a line to be completely empty
at the end of the input, so the end sequence is:

    Try to match Line against 'ccccc'
        Match
    Input is now ''
    Try to match Line against ''
        Match
    Parser detects no change in the input and terminates.
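
To see the same sequence without the parser, here is a minimal sketch in plain Perl that imitates the original Line rule with two substitutions. The helpers match_line and match_lines are hypothetical names for this illustration, not part of Parse::RecDescent, and the "stop when nothing was consumed" check only approximates the behaviour described in the trace above.

```perl
use strict;
use warnings;

# Hypothetical helper imitating  Line: /^.*/ /\n|\Z/
# Consumes one line from the front of $$text_ref and returns it,
# or returns undef if either pattern fails.
sub match_line {
    my ($text_ref) = @_;
    my $copy = $$text_ref;

    # /.*/ is happy to match zero characters, even on empty input
    return unless $copy =~ s/\A(.*)//;
    my $line = $1;

    # \Z succeeds at end of input without consuming anything
    return unless $copy =~ s/\A(?:\n|\Z)//;

    $$text_ref = $copy;
    return $line;
}

# Hypothetical helper imitating  Line(s): repeat until the rule fails
# or a successful match consumes no input.
sub match_lines {
    my ($text) = @_;
    my @lines;
    while (1) {
        my $before = length $text;
        my $line   = match_line(\$text);
        last unless defined $line;
        push @lines, $line;
        last if length($text) == $before;    # no progress, so stop
    }
    return @lines;
}

print "Line: '$_'\n" for match_lines("aaaaa\nbbbbbb\nccccc");
```

The fourth, empty element comes from the final pass: both patterns succeed on '' because neither is required to consume a character.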

What you actually want is to parse lines that are *separated*
by newlines. RecDescent has a convenient way of doing that:

    Rulename(s /separator/)

So you could rewrite your grammar like so:

    my $grammar = q{
        Page:
            <skip:''>
            Line(s /\n/)    {print "Finished! \n"}

        Line:
            /[^\n]*/        {print "Line: '$item[1]'\n"}
    };

Try this version and you'll get what you expected.
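
For intuition, a separated list is "one item, then zero or more (separator, item) pairs". The sketch below imitates that shape in plain Perl. The helper name match_separated_lines is hypothetical, and this is only an approximation of what Line(s /\n/) does, not the module's actual implementation.

```perl
use strict;
use warnings;

# Hypothetical helper imitating  Line(s /\n/)  with  Line: /[^\n]*/
sub match_separated_lines {
    my ($text) = @_;
    my @lines;

    # First item: no separator in front of it
    $text =~ s/\A([^\n]*)// or return;
    push @lines, $1;

    # Every further item must be preceded by exactly one separator,
    # so an item can never be matched "for free" at end of input
    while ($text =~ s/\A\n([^\n]*)//) {
        push @lines, $1;
    }
    return @lines;
}

print "Line: '$_'\n" for match_separated_lines("aaaaa\nbbbbbb\nccccc");
```

Because each additional item costs a separator, the loop stops as soon as the input runs out. In this sketch, input that ends with a newline still yields one empty final item, since the separator is followed by a (zero-length) line.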

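Note also that I used the <skip:...> directive instead of setting
the global variable. The directive version is much safer: it is scoped
to the production it appears in, whereas $Parse::RecDescent::skip changes
the default for every parser in the program.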

Damian