April 8, 2020


Written By Case Muller

REGULAR EXPRESSIONS

Regular Expressions are special sequences of characters that describe a pattern of text that is to be matched.
Alternation uses the pipe symbol, |, and allows us to match either the text preceding or the text following the |.
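Here is a minimal sketch of alternation using Python's built-in re module. The sample sentence is made up for illustration.

```python
import re

text = "I have a cat, a dog, and a fish."

# cat|dog matches either the text before the | or the text after it
pets = re.findall(r"cat|dog", text)
print(pets)  # ['cat', 'dog']
```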
Character Sets are denoted by a pair of square brackets, [ ], and let us match exactly one character from the characters listed inside them.
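A short illustration with Python's re module, using an invented sample string. The set [ae] stands in for a single character, so it accepts both common spellings of "gray" but nothing else.

```python
import re

text = "grey or gray, but not groy"

# [ae] matches exactly one character: either "a" or "e"
spellings = re.findall(r"gr[ae]y", text)
print(spellings)  # ['grey', 'gray']
```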
Wildcards are represented by the period, or dot (.). A wildcard matches any single character (letter, number, symbol, or whitespace), except that by default it does not match a newline.
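A minimal sketch of the wildcard in Python's re module, with made-up sample strings. It also shows two details worth knowing in Python: the newline exception (lifted by the re.DOTALL flag) and escaping the dot with a backslash when you want a literal period.

```python
import re

text = "cat cot c9t c t ct"

# . stands for exactly one character of any kind between "c" and "t"
matches = re.findall(r"c.t", text)
print(matches)  # ['cat', 'cot', 'c9t', 'c t']

# By default . does not match a newline; re.DOTALL changes that
print(re.findall(r"a.b", "a\nb"))             # []
print(re.findall(r"a.b", "a\nb", re.DOTALL))  # ['a\nb']

# An unescaped dot matches anything; \. matches only a literal period
print(re.findall(r"1.2", "1.2 and 1x2"))   # ['1.2', '1x2']
print(re.findall(r"1\.2", "1.2 and 1x2"))  # ['1.2']
```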
Ranges are written inside a character set with a hyphen, as in [a-z] or [0-9], and match any one character within that span.
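A small example with Python's re module, using an invented list of codes. Note that ranges are case-sensitive, so the uppercase "B2" is skipped.

```python
import re

text = "a1 b7 d4 c0 B2"

# [a-c] is any one lowercase letter from a to c; [0-9] is any one digit
codes = re.findall(r"[a-c][0-9]", text)
print(codes)  # ['a1', 'b7', 'c0']
```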
Shorthand Character Classes, like \w, \d, and \s, represent word characters, digit characters, and whitespace characters, respectively. For ASCII text, \w is equivalent to [A-Za-z0-9_], \d to [0-9], and \s to whitespace such as spaces, tabs, and newlines.
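The three shorthand classes side by side, as a sketch in Python's re module with a made-up sample string. The + after \d and \w (a quantifier, meaning one or more) groups consecutive characters into runs.

```python
import re

text = "Order 66 shipped\tat 9am"

digits = re.findall(r"\d+", text)  # runs of digit characters
words = re.findall(r"\w+", text)   # runs of word characters (letters, digits, underscore)
spaces = re.findall(r"\s", text)   # single whitespace characters, including the tab

print(digits)  # ['66', '9']
print(words)   # ['Order', '66', 'shipped', 'at', '9am']
print(spaces)  # [' ', ' ', '\t', ' ']
```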
Groupings are denoted with parentheses, ( ), and group parts of a regular expression together. This lets us limit alternation to part of a regex.
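To see why grouping matters for alternation, here is a sketch in Python's re module comparing the same pattern with and without parentheses. re.fullmatch requires the pattern to cover the entire string; the sample phrases are invented.

```python
import re

# Without a group, | splits the whole pattern: "I love cats" OR "dogs"
ungrouped = re.compile(r"I love cats|dogs")
# With a group, the alternation applies only to the last word
grouped = re.compile(r"I love (cats|dogs)")

print(bool(ungrouped.fullmatch("I love dogs")))  # False
print(bool(ungrouped.fullmatch("dogs")))         # True
print(bool(grouped.fullmatch("I love dogs")))    # True
print(bool(grouped.fullmatch("dogs")))           # False

# Parentheses also capture the text they matched
print(grouped.fullmatch("I love dogs").group(1))  # dogs
```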
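Fixed Quantifiers are represented by curly braces, { }, and let us indicate the exact number of times, or a range for the number of times, that a character should appear. For example, a{3} matches exactly three a's, and a{2,4} matches between two and four.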
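A minimal sketch of both forms using Python's re module and made-up strings. Because quantifiers are greedy, a{2,4} takes as many a's as it can, up to four, which is why the last word yields "haaaa" rather than all five a's.

```python
import re

# Exactly three digits
print(bool(re.fullmatch(r"\d{3}", "123")))   # True
print(bool(re.fullmatch(r"\d{3}", "1234")))  # False

# An "h" followed by two to four "a"s
laughs = re.findall(r"ha{2,4}", "ha haa haaa haaaaa")
print(laughs)  # ['haa', 'haaa', 'haaaa']
```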
Optional Quantifiers are indicated by the question mark, ?, and mark the preceding character as optional: it can appear either 0 or 1 time.
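A quick example with Python's re module, using an invented sentence, where a single optional character covers both American and British spellings.

```python
import re

text = "color and colour"

# u? means the "u" may appear once or not at all
spellings = re.findall(r"colou?r", text)
print(spellings)  # ['color', 'colour']
```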
Kleene Star is denoted by an asterisk, *, and is a quantifier that matches the preceding character 0 or more times.
Kleene Plus is denoted by the plus sign, +, and matches the preceding character 1 or more times.
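The difference between the two Kleene operators is easiest to see on the same input. A minimal sketch in Python's re module, with a made-up sample string:

```python
import re

text = "a ab abbb"

# b* allows zero b's, so a lone "a" also matches
star = re.findall(r"ab*", text)
# b+ requires at least one b
plus = re.findall(r"ab+", text)

print(star)  # ['a', 'ab', 'abbb']
print(plus)  # ['ab', 'abbb']
```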
The Anchor symbols, hat (^) and dollar sign ($), are used to match text at the start and end of a string, respectively.
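A sketch of the anchors with Python's re.search, which on its own would find a match anywhere in the string. The sample strings are invented. One Python-specific detail: $ also matches just before a single trailing newline, so "90210\n" would pass the last check too.

```python
import re

# ^ ties the match to the start of the string, $ to the end
print(bool(re.search(r"^cat", "cat nap")))  # True
print(bool(re.search(r"^cat", "a cat")))    # False
print(bool(re.search(r"nap$", "cat nap")))  # True
print(bool(re.search(r"nap$", "naptime")))  # False

# Used together, they rule out extra characters before or after the pattern
print(bool(re.search(r"^\d{5}$", "90210")))   # True
print(bool(re.search(r"^\d{5}$", "902101")))  # False
```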
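Python's glob module can collect the names of many files at once, which makes it easy to open them all in a loop. Note that glob matches file names with Unix shell-style wildcards (* matches any run of characters, ? matches a single character), not with regular expressions.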
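    import glob
    import pandas as pd

    # sorted() gives a predictable order; glob.glob() returns matches in arbitrary order
    files = sorted(glob.glob("file*.csv"))

    df_list = []
    for filename in files:
        data = pd.read_csv(filename)  # read one CSV into a DataFrame
        df_list.append(data)

    # Stack the per-file DataFrames into a single DataFrame
    df = pd.concat(df_list)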
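Because glob uses shell-style wildcards rather than regex, a pattern like file*.csv can pick up more files than intended. A minimal sketch of one alternative, assuming a hypothetical list of file names standing in for a real directory listing (in practice it might come from os.listdir()): fnmatch applies the same wildcard syntax glob uses, while re lets you filter with a stricter regular expression.

```python
import fnmatch
import re

# Hypothetical file names, made up for illustration
names = ["file1.csv", "file22.csv", "file_old.csv", "file1.txt", "notes.csv"]

# Shell-style wildcard, the same syntax glob.glob() accepts: * matches any run of characters
wildcard_hits = [n for n in names if fnmatch.fnmatch(n, "file*.csv")]

# A true regex is stricter: "file", one or more digits, then a literal ".csv"
pattern = re.compile(r"file\d+\.csv")
regex_hits = [n for n in names if pattern.fullmatch(n)]

print(wildcard_hits)  # ['file1.csv', 'file22.csv', 'file_old.csv']
print(regex_hits)     # ['file1.csv', 'file22.csv']
```

The filtered list could then replace the output of glob.glob() in the loop that reads each CSV.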

Case Muller https://muller-industries.com/

