Pyparsing: A Powerful Library for Parsing and Text Processing in Python
Pyparsing is a Python library that simplifies the creation of parsers for structured or semi-structured text data. Grammars are written directly in Python by combining parser element objects with operators such as + and |, so defining tokens and parsing rules needs no separate grammar file or code-generation step. It is well suited to creating custom DSLs (Domain Specific Languages), parsing configuration files, or pre-processing data before analysis.
Why Choose Pyparsing?
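Unlike traditional lexer and parser generator tools, which expect a separate tokenizer and grammar specification, Pyparsing provides a high-level API for defining and working with grammars in ordinary Python code. Let’s explore some of its useful classes and features with examples. The examples use the snake_case method names (parse_string, set_parse_action) introduced in Pyparsing 3.0.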
Getting Started
    from pyparsing import Word, alphas

    # Define a simple grammar
    greeting = Word(alphas) + "," + Word(alphas) + "!"

    # Parse a string
    result = greeting.parse_string("Hello, World!")
    print(result)  # Output: ['Hello', ',', 'World', '!']
Commonly Used Pyparsing APIs
1. Literal and CaselessLiteral
Match specific strings in a case-sensitive or case-insensitive manner.
    from pyparsing import Literal, CaselessLiteral

    lit_example = Literal("Python")
    print(lit_example.parse_string("Python"))  # Output: ['Python']

    # CaselessLiteral returns the string it was defined with,
    # not the casing found in the input
    caseless_lit_example = CaselessLiteral("python")
    print(caseless_lit_example.parse_string("Python"))  # Output: ['python']
2. Combine
Concatenate matched tokens into a single string.
    from pyparsing import Word, Combine, nums

    combined = Combine(Word(nums) + "." + Word(nums))
    print(combined.parse_string("3.14"))  # Output: ['3.14']
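To make the effect of Combine concrete, the following sketch parses the same input with and without it. The variable names (parts, separate, joined, spaced_ok) are illustrative only. It also shows that Combine, by default, requires the matched pieces to be adjacent in the input, so whitespace between them causes a parse failure.

```python
from pyparsing import Combine, ParseException, Word, nums

parts = Word(nums) + "." + Word(nums)

# Without Combine, each matched piece is a separate token.
separate = parts.parse_string("3.14").as_list()
# With Combine, the pieces are joined into one string.
joined = Combine(parts).parse_string("3.14").as_list()

print(separate)  # ['3', '.', '14']
print(joined)    # ['3.14']

# Combine expects the pieces to touch; embedded whitespace is rejected.
try:
    Combine(parts).parse_string("3 . 14")
    spaced_ok = True
except ParseException:
    spaced_ok = False

print(spaced_ok)  # False
```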
3. Optional
Define an optional portion of the grammar.
    from pyparsing import Word, alphas, Optional

    optional_grammar = Word(alphas) + Optional("," + Word(alphas))
    print(optional_grammar.parse_string("Hello"))  # Output: ['Hello']
    print(optional_grammar.parse_string("Hello,World"))  # Output: ['Hello', ',', 'World']
4. OneOrMore and ZeroOrMore
Specify repeating patterns in the grammar.
    from pyparsing import OneOrMore, Word, alphanums

    multiple_words = OneOrMore(Word(alphanums))
    print(multiple_words.parse_string("123 abc DEF"))  # Output: ['123', 'abc', 'DEF']
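The example above covers OneOrMore only. As a minimal sketch of its counterpart, ZeroOrMore accepts the repeated element any number of times, including not at all. The grammar and the names record, with_numbers, and without_numbers are made up for illustration.

```python
from pyparsing import Word, ZeroOrMore, alphas, nums

# A required name followed by any number of integers, possibly none.
record = Word(alphas) + ZeroOrMore(Word(nums))

with_numbers = record.parse_string("scores 10 20 30").as_list()
without_numbers = record.parse_string("scores").as_list()

print(with_numbers)     # ['scores', '10', '20', '30']
print(without_numbers)  # ['scores']
```

Replacing ZeroOrMore with OneOrMore here would make the second input fail, since at least one integer would then be required.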
5. Group
Group elements into nested lists for more structured output.
    from pyparsing import Word, alphas, Group

    grouped = Group(Word(alphas) + Word(alphas))
    print(grouped.parse_string("Hello World"))  # Output: [['Hello', 'World']]
6. Suppress
Suppress unwanted tokens from the output results.
    from pyparsing import Word, alphas, Suppress

    grammar = Word(alphas) + Suppress(",") + Word(alphas)
    print(grammar.parse_string("Hello,World"))  # Output: ['Hello', 'World']
7. Forward
Create recursive grammars.
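    from pyparsing import Forward, Word, alphas

    # Declare a placeholder first so the grammar can refer to itself
    recursive = Forward()
    # "<<=" assigns the definition to the placeholder
    recursive <<= Word(alphas) + "|" + recursive | Word(alphas)
    print(recursive.parse_string("a|b|c"))  # Output: ['a', '|', 'b', '|', 'c']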
8. Parse Actions
Attach custom logic to process parsing results.
    from pyparsing import Word, alphas

    def to_upper(tokens):
        # Upper-case every matched token
        return [t.upper() for t in tokens]

    grammar = Word(alphas).set_parse_action(to_upper)
    print(grammar.parse_string("hello"))  # Output: ['HELLO']
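Parse actions can also change the type of a token, not just its text. The sketch below, with the hypothetical names to_int, integer, values, and total, converts matched digit strings to Python integers at parse time so that later code can do arithmetic on the results directly.

```python
from pyparsing import Word, nums

def to_int(tokens):
    # tokens[0] is the matched digit string; the returned int replaces it.
    return int(tokens[0])

integer = Word(nums).set_parse_action(to_int)

values = (integer + integer).parse_string("12 30").as_list()
total = sum(values)

print(values)  # [12, 30]
print(total)   # 42
```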
9. Regex
Use regular expressions for advanced matching.
    from pyparsing import Regex

    regex_grammar = Regex(r"\d{4}-\d{2}-\d{2}")
    print(regex_grammar.parse_string("2023-10-05"))  # Output: ['2023-10-05']
10. Application Example: Date Parser
Here’s a practical example of building a date parser using Pyparsing:
    from pyparsing import Word, nums, Combine

    # Define grammar for dates (YYYY-MM-DD)
    year = Word(nums, exact=4)
    month = Word(nums, exact=2)
    day = Word(nums, exact=2)
    date = Combine(year + "-" + month + "-" + day)

    # Parse a date; this checks the digit layout only, not whether
    # the month and day form a real calendar date
    result = date.parse_string("2023-05-15")
    print(result)  # Output: ['2023-05-15']
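The grammar above accepts any string with the right digit layout, including impossible dates such as 2023-02-30. One possible way to add calendar validation, sketched here under the assumption that the standard library's datetime.date.fromisoformat is an acceptable check, is a parse action that converts the match and reports bad dates as an ordinary parse failure. The names to_date, iso_date, and is_valid_date are hypothetical helpers, not part of Pyparsing.

```python
import datetime

from pyparsing import Combine, ParseException, Word, nums

def to_date(s, loc, tokens):
    """Parse action: convert the matched text to a datetime.date."""
    try:
        return datetime.date.fromisoformat(tokens[0])
    except ValueError:
        # Report impossible dates as an ordinary parse failure.
        raise ParseException(s, loc, f"invalid calendar date: {tokens[0]}")

year = Word(nums, exact=4)
month = Word(nums, exact=2)
day = Word(nums, exact=2)
iso_date = Combine(year + "-" + month + "-" + day).set_parse_action(to_date)

def is_valid_date(text):
    """Hypothetical helper: True if text is a real YYYY-MM-DD date."""
    try:
        iso_date.parse_string(text, parse_all=True)
        return True
    except ParseException:
        return False

parsed = iso_date.parse_string("2023-05-15")[0]
print(parsed)                       # 2023-05-15
print(is_valid_date("2023-02-30"))  # False
```

Raising ParseException from the parse action, rather than letting ValueError escape, keeps error handling uniform: callers only need to catch one exception type.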
11. Application Example: Simple Math Expression Parser
    from pyparsing import Word, nums, Literal, Forward, Group, ZeroOrMore

    # Grammar for a simple math expression
    integer = Word(nums)
    plus = Literal("+")
    minus = Literal("-")
    mult = Literal("*")
    div = Literal("/")
    lpar = Literal("(")
    rpar = Literal(")")

    # expr is referenced inside term before it is defined, so declare it first
    expr = Forward()
    term = Group(
        (integer | Group(lpar + expr + rpar))
        + ZeroOrMore((mult | div) + (integer | Group(lpar + expr + rpar)))
    )
    # "<<=" assigns the definition to the Forward placeholder
    expr <<= term + ZeroOrMore((plus | minus) + term)

    # Example parsing
    parsed_expr = expr.parse_string("3 + 5 * ( 2 - 8 )")
    print(parsed_expr)  # Output: [['3'], '+', ['5', '*', ['(', ['2'], '-', ['8'], ')']]]
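The parser above only produces a nested token structure. As a rough sketch of how that structure could be used, the code below repeats the same grammar and walks the nested lists to compute a value. The functions eval_factor, eval_term, eval_expr, and evaluate are hypothetical helpers written for this illustration, and the sketch assumes "/" means true division.

```python
import operator

from pyparsing import Forward, Group, Literal, Word, ZeroOrMore, nums

# Same grammar as in the article's example.
integer = Word(nums)
plus, minus = Literal("+"), Literal("-")
mult, div = Literal("*"), Literal("/")
lpar, rpar = Literal("("), Literal(")")

expr = Forward()
factor = integer | Group(lpar + expr + rpar)
term = Group(factor + ZeroOrMore((mult | div) + factor))
expr <<= term + ZeroOrMore((plus | minus) + term)

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,  # assumption: "/" is true division
}

def eval_factor(item):
    # A factor is either a digit string or ['(', ...expression items..., ')'].
    if isinstance(item, str):
        return int(item)
    return eval_expr(item[1:-1])

def eval_term(items):
    # A term alternates factors and '*' or '/' operators.
    value = eval_factor(items[0])
    for op, item in zip(items[1::2], items[2::2]):
        value = OPS[op](value, eval_factor(item))
    return value

def eval_expr(items):
    # An expression alternates terms and '+' or '-' operators.
    value = eval_term(items[0])
    for op, item in zip(items[1::2], items[2::2]):
        value = OPS[op](value, eval_term(item))
    return value

def evaluate(text):
    return eval_expr(expr.parse_string(text, parse_all=True).as_list())

print(evaluate("3 + 5 * ( 2 - 8 )"))  # -27
```

Because each term is wrapped in Group, multiplication and division are resolved inside a term before the surrounding additions and subtractions, which gives the usual operator precedence without extra bookkeeping.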
Conclusion
Pyparsing provides an extensive set of tools to simplify the development of text parsers. With its clear API and practical use cases, you can tackle a wide variety of parsing tasks. Whether you’re interpreting complex data formats or constructing a DSL, Pyparsing has you covered.