April 8, 2020

REGULAR EXPRESSIONS

  1. Regular Expressions are special sequences of characters that describe a pattern of text that is to be matched.

  2. Alternation uses the pipe symbol, |, and allows us to match the text preceding or following the |.

  3. Character Sets are denoted by a pair of brackets [ ], and let us match one character from a series of characters.

  4. Wildcards are represented by the period or dot. They can match any single character (letter, number, symbol, or whitespace), although in many regex engines the dot does not match a newline by default.

  5. Ranges, such as [a-z] or [0-9], allow us to specify a span of consecutive characters for a match inside a character set.

  6. Shorthand Character Classes, like \w, \d, and \s, stand for the ranges of word characters, digit characters, and whitespace characters, respectively.

  7. Groupings are denoted with parentheses, ( ), and group parts of a regular expression together. They allow us to limit alternation to part of a regex.

  8. Fixed Quantifiers are represented by curly braces, { }, and let us indicate an exact quantity, or a range of quantities, of the preceding character we wish to match.

  9. Optional Quantifiers are indicated by the question mark, ?, and allow us to mark a character in a regex as optional, meaning it can appear either 0 or 1 times.

  10. Kleene Star is denoted by an asterisk, *, and is a quantifier that matches the preceding character 0 or more times.

  11. Kleene Plus is denoted by the plus sign, +, and matches the preceding character 1 or more times.

  12. The Anchor symbols, hat (^) and dollar sign ($), are used to match text at the start and end of a string, respectively.

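  13. The glob module can find multiple files by matching file names against shell-style wildcard patterns (such as * and ?). These patterns resemble regex but are not regular expressions. The matching files can then be read and combined with pandas: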

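    import glob
    import pandas as pd

    # Collect every file whose name matches the wildcard pattern.
    files = glob.glob("file*.csv")

    # Read each CSV file into its own DataFrame.
    df_list = []
    for filename in files:
        data = pd.read_csv(filename)
        df_list.append(data)

    # Stack the individual DataFrames into a single one.
    df = pd.concat(df_list)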
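
The regex concepts in items 2 through 12 can be checked with a short sketch using Python's built-in re module. The patterns and sample strings below are made up for illustration and are not from the notes. The helper name matches is also an assumption, a thin wrapper around re.search.

```python
import re

# (concept, pattern, text, expected result) -- all sample data is illustrative.
EXAMPLES = [
    ("alternation",       r"cat|dog",            "dog",         True),
    ("character set",     r"con[sc]en[sc]us",    "concensus",   True),
    ("wildcard",          r"c.t",                "c@t",         True),
    ("range",             r"[a-c]at",            "dat",         False),
    ("shorthand classes", r"\d\d\s\w\w\w",       "42 abc",      True),
    ("grouping",          r"I love (cats|dogs)", "I love dogs", True),
    ("fixed quantifier",  r"ro{2,4}ar",          "roar",        False),
    ("optional",          r"humou?r",            "humor",       True),
    ("Kleene star",       r"meo*w",              "mew",         True),
    ("Kleene plus",       r"meo+w",              "mew",         False),
    ("start anchor",      r"^world",             "hello world", False),
    ("end anchor",        r"world$",             "hello world", True),
]


def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


# Map each concept name to whether its pattern matched its sample text.
results = {name: matches(pattern, text) for name, pattern, text, _ in EXAMPLES}

for name, pattern, text, expected in EXAMPLES:
    print(f"{name:18} {pattern!r:24} on {text!r:15} -> {results[name]}")
```

The False rows are deliberate: "roar" has only one "o" so {2,4} fails, "mew" has no "o" so + fails while * succeeds, and ^world fails because "world" is not at the start of the string.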
