Regex (Regular Expression) is a pattern language that lets us find a phrase in text easily. With the patterns we write, we can locate the phrase we want even in long sentences. In programming, regular expressions let us do this quickly and without a lot of code. People who work with logs and DLP (data loss prevention) in particular rely on these expressions to reach the values they need to parse. Regex is available in almost all modern programming languages. The examples here should help you picture the patterns you can use in your next projects. If you want to see a pattern broken down visually, like a map, you can type it into https://regex101.com.
HOW DOES REGEX WORK IN PYTHON?
In Python, regex functions live in the built-in “re” module, which you load with “import re”. The re module offers several functions whose purpose is to search a string for the text or characters we want and give us access to the result. The main functions of the re module are listed below.
- findall()
- search()
- split()
- sub()
findall() Function
The findall() function returns a list of all matches, in the order in which they are found. If no match is found, it returns an empty list.
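A minimal sketch of findall(), assuming a made-up sample sentence (the text and the searched patterns are illustrative only):

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "The rain in Spain stays mainly in the plain"

# Every occurrence of "ai", in the order in which it is found.
matches = re.findall(r"ai", text)
print(matches)

# A pattern that never occurs gives an empty list.
no_matches = re.findall(r"xyz", text)
print(no_matches)
```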
search() Function
The search() function scans the string for a match and, if it finds one, returns a match object. If there is more than one match, only the first is returned. If no match is found, it returns None.
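As a rough illustration, the sketch below looks for the first whitespace character in an assumed sample string and then searches for a word that is not there:

```python
import re

# Hypothetical sample text.
text = "The rain in Spain"

# search() stops at the first match and returns a match object.
first_space = re.search(r"\s", text)
position = first_space.start()
print("First whitespace is at index", position)

# When nothing matches, the result is None.
missing = re.search(r"Portugal", text)
print(missing)
```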
split() Function
The split() function returns a list in which the string has been split at every match. Working with this list can make later writing or printing operations easier. For example, splitting on the whitespace character breaks a sentence into its words.
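A short sketch of splitting at each whitespace character, using an assumed sample sentence:

```python
import re

# Hypothetical sample text.
text = "The rain in Spain"

# Split the string wherever a whitespace character matches.
parts = re.split(r"\s", text)
print(parts)
```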
maxsplit Parameter
You can limit the number of splits with the maxsplit parameter of split(). With maxsplit=1, for example, the string is split only at the first match.
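A sketch of limiting the number of splits, assuming the same kind of sample sentence and a limit of one:

```python
import re

# Hypothetical sample text.
text = "The rain in Spain"

# Only the first whitespace match is used as a split point.
parts = re.split(r"\s", text, maxsplit=1)
print(parts)
```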
sub() Function
The sub() function replaces each match with the character or text you choose. For example, it can replace every space character with 9.
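The replacement of every whitespace character with 9 might look like this, with an assumed sample sentence:

```python
import re

# Hypothetical sample text.
text = "The rain in Spain"

# Replace each whitespace character with the digit 9.
result = re.sub(r"\s", "9", text)
print(result)
```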
Match Object
A match object carries information about the search and its result. If there is no match, search() returns None instead of a match object.
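A small sketch of what a match object exposes, assuming a sample sentence and a pattern that looks for a word starting with an uppercase S:

```python
import re

# Hypothetical sample text.
text = "The rain in Spain"

match = re.search(r"\bS\w+", text)

span = match.span()     # (start, end) indexes of the match
word = match.group()    # the text that matched
source = match.string   # the string that was searched
print(span, word, source)

# No match: search() gives None, so check before using the result.
no_match = re.search(r"\bX\w+", text)
print(no_match is None)
```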
Flags of the Regex Library
Flags, also called options, are settings that change how our regex patterns behave. Most flags are similar across programming languages, but there are also real differences, so check the options of the regex library on the platform you are using.
“Global” Flag
When this flag is not used, only the first match is returned and no others. Without it there is no need for an array of results, because only index zero would hold a value. Python's re module has no global flag; search() returns the first match and findall() returns all of them.
“Unicode” Flag
It solves the problem of Turkish characters not being matched. With it, shorthand patterns recognize characters such as Ç, Ş, Ğ, Ö even though we do not write them in the pattern. It is an important flag. It affects shorthand classes such as “\w”, not explicit ranges such as [a-z]. In Python 3, string patterns are Unicode-aware by default.
Regex Meta Characters
You can try out the following characters on https://regex101.com. You can combine several of them to shape a pattern to your needs.
[abc] Meta Character
Matches any one of the letters a, b, or c enclosed in the square brackets. You can put any letters or digits you want here. The pattern finds every such character throughout the text. Assuming we keep the matches in an array, for example:
array[0] = b
array[1] = c
and so on.
[^abc] Meta Character
Matches any character other than the ones written in the square brackets. This pattern takes every character except a, b, or c. If we store the matches in an array as in the first example, index 0 gives B and index 1 gives u.
[a-z] Meta Character
Matches every character in the range between the two characters in the square brackets, inclusive. Here it takes the lowercase letters from a to z. It does not take uppercase letters, Turkish characters, or digits.
[a-zA-Z] Meta Character
Matches all lowercase letters from a to z and all uppercase letters from A to Z, but not Turkish characters.
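The four character classes above can be compared side by side. The sample text below is an assumption made for this sketch; it was picked so that the first results line up with the b, c and B, u values mentioned earlier:

```python
import re

# Hypothetical sample text containing a Turkish character and digits.
text = "Bu bir cümle 123"

only_abc = re.findall(r"[abc]", text)       # just the letters a, b, c
not_abc = re.findall(r"[^abc]", text)       # everything except a, b, c
lowercase = re.findall(r"[a-z]", text)      # ASCII lowercase letters only
letters = re.findall(r"[a-zA-Z]", text)     # ASCII letters of either case

print(only_abc)
print(not_abc[:2])
print("".join(lowercase))
print("".join(letters))
```

Note that ü is skipped by both ranges, because it lies outside a-z and A-Z.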
“.” Meta Character
Matches any single character (including spaces) except a newline. Each character in the text is matched separately.
“\s” Meta Character
Matches a whitespace character such as a space, tab, or newline. The initial of the word “space” is an easy way to remember it. Each whitespace character is matched separately.
“\S” Meta Character
Matches any character that is not whitespace, the opposite of \s. We can use it, for example, when we want to pick out a sentence word by word.
“+” Meta Character
Indicates that the expression to its left occurs one or more times. Take \S+ as an example: \S matches every character except whitespace, and with the + (plus) operator the match keeps going until it reaches a whitespace character, so each word is selected as a whole.
“\d” Meta Character
As you might guess, d stands for digit. This pattern matches a single digit character, so each digit in a text is selected separately.
Now we will use the \d pattern, which matches a digit character, together with the plus (+) operator. This selects each run of consecutive digits as a single match.
\d+
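To see how the shorthand classes behave with and without the plus operator, here is a small sketch on an assumed sample sentence:

```python
import re

# Hypothetical sample text.
text = "Order 66 shipped in 2024"

single_digits = re.findall(r"\d", text)    # one match per digit
digit_runs = re.findall(r"\d+", text)      # one match per run of digits
words = re.findall(r"\S+", text)           # runs of non-whitespace characters
non_digit_runs = re.findall(r"\D+", text)  # runs of anything but digits

print(single_digits)
print(digit_runs)
print(words)
print(non_digit_runs)
```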
“\D” Meta Character
To do the opposite, we use \D, which matches any character that is not a digit. I will not give a separate example of this.
\D
\D+
“\w” Meta Character
Matches letters, digits, and the underscore, similar to [a-zA-Z0-9_]. The difference is that, with Unicode matching, \w also takes Turkish characters, while [a-zA-Z0-9_] does not. It does not match other characters such as spaces or punctuation. The Unicode flag affects this shorthand, not the explicit range.
\w
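The effect of Unicode matching on \w can be sketched as follows. The sample text is an assumption; re.ASCII is the standard Python flag that restricts \w to [a-zA-Z0-9_], and it is used here only to show the contrast:

```python
import re

# Hypothetical sample text with Turkish characters.
text = "Çağrı_42 geldi!"

# Python 3 string patterns are Unicode-aware by default.
unicode_words = re.findall(r"\w+", text)

# With re.ASCII, \w behaves like [a-zA-Z0-9_] and Turkish letters break the runs.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

print(unicode_words)
print(ascii_words)
```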
“\W” Meta Character
Matches every character that is not a letter, digit, or underscore. It is the opposite of “\w”.
\W
“\v” Meta Character
In PCRE, it matches vertical whitespace such as newlines and vertical tabs; in Python's re module, it matches only the vertical tab character. It works with Unicode. You can add vertical tabs in some word processors using CMD/CTRL+ENTER. It is not commonly used.
\v
“\ddd” Meta Character
Matches the eight-bit character whose octal code is ddd. Look up the code of the character you want in a table and write it in place of ddd. For example, \056 is the dot character (.). An octal character table is available at the link below. https://www.utf8-chartable.de/unicode-utf8-table.pl?utf8=oct
\056
“\b” Meta Character
Matches a word boundary, so a pattern ending in \b only matches where the text to its left ends a word. The example below matches words that end in the letter r.
[a-z]+r\b
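A quick sketch of this pattern in Python, with an assumed list of words. The raw string (r"...") matters here, because in a normal Python string "\b" would be read as a backspace character:

```python
import re

# Hypothetical sample text.
text = "water later terms error"

# Lowercase words whose last letter is r.
ends_with_r = re.findall(r"[a-z]+r\b", text)
print(ends_with_r)
```

The word "terms" is skipped because its r is followed by more letters rather than a word boundary.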
“\” Meta Character
Placed before a metacharacter or delimiter, the backslash makes it match literally. For example, the pattern below matches an actual dot.
\.
“(a|b)” Meta Character
Matches either subexpression a or subexpression b and captures whichever one matched.
(a|b)
“?” Meta Character
Makes the character before it optional: it may appear once or not at all. In the example below, the s is required and the a is optional.
sa?
“(?#…)” Meta Character
Any text inside this group is a comment and is ignored by the regex engine. Another option is the x flag, which allows comments introduced by #. That flag also makes the regex ignore whitespace in the pattern.
(?#...)
“(?:…)” Meta Character
This non-capturing group is very similar to the (…) group, but it does not capture what it matches. Its contents therefore cannot be referenced later as a numbered group.
(?:al)
“(?P<name>…)” Meta Character
Captures a group that can be referenced by the given name instead of a number. The alternative forms (?<name>…) and (?'name'…) are available in PCRE.
(?P<name>Ömer)
“(?imsxXU)” Meta Character
Sets regex flags from inside the expression. A minus sign turns a flag off, as in (?-i). Which flag letters are accepted depends on the regex engine.
“(?(1)yes|no)” Meta Character
A conditional: if capture group 1 has matched, the engine tries the pattern to the left of the | (yes); otherwise it tries the pattern to the right (no).
(a short)?(?(1) a crowd|of code)
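The conditional can be checked in Python with re.fullmatch(). The three test strings are assumptions chosen to exercise the "yes" branch, the "no" branch, and a mix that should fail:

```python
import re

pattern = r"(a short)?(?(1) a crowd|of code)"

# Group 1 matches, so the "yes" branch (" a crowd") is required.
yes_branch = re.fullmatch(pattern, "a short a crowd")

# Group 1 is absent, so the "no" branch ("of code") is required.
no_branch = re.fullmatch(pattern, "of code")

# Group 1 matches but is followed by the "no" text, so there is no full match.
mixed = re.fullmatch(pattern, "a short of code")

print(yes_branch is not None, no_branch is not None, mixed is None)
```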
“(?P=name)” Meta Character
This syntax comes from Python. It matches the same text that was previously captured by the named group. It can be very useful when combined with other constructs, as in the pattern below.
(?P<named_group>systemconf.com)[a-z ]+(?P=named_group)
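A sketch of a named group together with its backreference. The sample sentence is an assumption, and the dot is escaped here so that it matches only a literal dot:

```python
import re

# Named group, some lowercase text and spaces, then the same captured text again.
pattern = r"(?P<named_group>systemconf\.com)[a-z ]+(?P=named_group)"

# Hypothetical sample text.
text = "systemconf.com is the same as systemconf.com"

match = re.search(pattern, text)
captured = match.group("named_group")  # read the group by name
whole = match.group()                  # the full matched text
print(captured)
print(whole)
```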
“(?=…)” Meta Character
A positive lookahead, usually written in the form …(?=…). It asserts that the given subpattern matches at the current position without consuming any characters. For example, (system(?=conf)) matches “system” only when it is followed by “conf”.
(system(?=conf))
“(?!…)” Meta Character
A negative lookahead: it asserts that the given pattern does not match starting from the current position. It does not consume any characters. It is the opposite of (?=…).
“(?<=…)” Meta Character
A positive lookbehind, usually used in a form such as (?<=system)conf. It asserts that the specified pattern matches text ending at the current position. The pattern should have a fixed width. It does not consume any characters.
(?<=system)conf
“(?<!…)” Meta Character
A negative lookbehind: it asserts that the specified pattern does not match text ending at the current position. The pattern should have a fixed width. A typical usage is “(?<!not)conf”.
(?<!not)conf
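The four lookaround forms can be compared on one assumed sample string. The start positions show which occurrence each pattern picks:

```python
import re

# Hypothetical sample text: "systemconf" at 0, "notconf" at 11, "conf" at 19, "system" at 24.
text = "systemconf notconf conf system"

# Lookahead: "system" only when "conf" follows.
followed = [m.start() for m in re.finditer(r"system(?=conf)", text)]

# Negative lookahead: "system" only when "conf" does not follow.
not_followed = [m.start() for m in re.finditer(r"system(?!conf)", text)]

# Lookbehind: "conf" only when "system" comes right before it.
preceded = [m.start() for m in re.finditer(r"(?<=system)conf", text)]

# Negative lookbehind: "conf" only when "not" does not come right before it.
not_preceded = [m.start() for m in re.finditer(r"(?<!not)conf", text)]

print(followed, not_followed, preceded, not_preceded)
```

In every case the matched text is just "system" or "conf"; the lookaround part is checked but never included in the match.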
“a?” Meta Character
Matches zero or one occurrence of whatever character you write in place of a.
“a+” Meta Character
Matches one or more consecutive occurrences of whatever character you write in place of a.
“a{3}” Meta Character
Matches the character to the left of the braces repeated consecutively exactly the number of times given inside the braces, here exactly 3.
“a{3,}” Meta Character
This pattern is similar to a{3}, but here the number in the braces is a minimum. In this example it matches 3 or more consecutive a's.
If you also fill in a number after the comma, as in a{3,5}, it matches any number of repetitions between the two numbers, inclusive. The characters shown at the beginning and the end of the pattern in a regex tool are delimiters and are not part of the pattern.
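The brace quantifiers can be compared on one assumed sample string made of runs of the letter a:

```python
import re

# Hypothetical sample text: runs of 1 to 5 consecutive a's.
text = "a aa aaa aaaa aaaaa"

exactly_three = re.findall(r"a{3}", text)    # exactly three in a row
three_or_more = re.findall(r"a{3,}", text)   # at least three in a row
three_to_four = re.findall(r"a{3,4}", text)  # between three and four in a row

print(exactly_three)
print(three_or_more)
print(three_to_four)
```

Since a{3} and a{3,4} are not anchored, they also match the first part of longer runs.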
“^” Meta Character
Matches the beginning of the string without consuming any characters. In multi-line mode it also matches right after each newline character. Outside multi-line mode it does the same thing as \A.
“$” Meta Character
Matches the end of the string without consuming any characters. In multi-line mode it also matches right before each newline character, so in short it gives the end of the line. Outside multi-line mode it behaves much like \Z.
“g” Flag
The global flag tells the engine not to stop after the first match, but to continue until it finds no more matches.
“m” Flag
The multi-line flag changes how the line start (^) and line end ($) anchors behave: they match at the beginning and end of every line instead of only the whole string. This lets a pattern match each line in turn from beginning to end and gives us the whole line.
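In Python the multi-line flag is re.MULTILINE (or re.M). A sketch with an assumed two-line sample string:

```python
import re

# Hypothetical two-line sample text.
text = "first line\nsecond line"

# Without the flag, ^ matches only at the very start of the string.
first_default = re.findall(r"^\w+", text)

# With the flag, ^ also matches right after each newline.
first_multiline = re.findall(r"^\w+", text, flags=re.MULTILINE)

# ^.*$ with the flag returns each line as a whole.
whole_lines = re.findall(r"^.*$", text, flags=re.MULTILINE)

print(first_default)
print(first_multiline)
print(whole_lines)
```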
“\0” Meta Character
In a replacement string, it stands for the entire text matched by the regex. In Python's re.sub() the equivalent is \g<0>.
“$1” Meta Character
In a replacement string, it stands for the content of the first capture group. The number 1 can be any number that corresponds to a valid capture group. Python's re.sub() writes this as \1 or \g<1>.
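Replacement syntax differs between engines. In Python's re.sub(), a capture group is written as \1 (or \g<1>) and the whole match as \g<0>, rather than $1 and \0. A minimal sketch with assumed inputs:

```python
import re

# Swap two numbers by referring to capture groups 1 and 2.
swapped = re.sub(r"(\d+)-(\d+)", r"\2-\1", "10-20")

# Wrap every run of digits in brackets using the whole match.
wrapped = re.sub(r"\d+", r"[\g<0>]", "a1b22")

print(swapped)
print(wrapped)
```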
“\t” Meta Character
Inserts one tab character into the replacement string.
“\x20” Meta Character
Hexadecimal escapes let you add any character to the replacement string using the standard syntax; \x20, for example, is a space.
“*” Meta Character
Indicates that the expression to its left may be absent, and if it is present, possibly several times in a row, all of those occurrences are matched. In other words, it matches zero or more occurrences. In a pattern such as sa*, the character s is required, while the character a may or may not be there.
Note: The difference between * and + is that + requires the character to its left to appear at least once.
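The difference between the two operators shows up clearly on a small assumed sample, using the illustrative patterns sa* and sa+:

```python
import re

# Hypothetical sample text.
text = "s sa saa"

# * allows zero a's, so a lone "s" also matches.
zero_or_more = re.findall(r"sa*", text)

# + needs at least one a, so the lone "s" is skipped.
one_or_more = re.findall(r"sa+", text)

print(zero_or_more)
print(one_or_more)
```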
Regex Examples
Phone Number Format Example
You can match a phone number in the form “5xx-xxx-xx-xx” with the following pattern.
5[0-9]{2}-[0-9]{3}-[0-9]{2}-[0-9]{2}
Serial Number Format Example
You can match a serial number in the form “b34f12345” with the pattern below.
(?i)[a-z]{1}[0-9]{2}[a-z]{1}[0-9]{5}
Blood Type Format Example
You can match a blood type entry in the form “Blood type: a rh +” with the pattern below. It covers all blood groups.
(?i)\b(blood)(\s)(type){1}(\s|\:\s|\-|\-\s)(A|B|AB|0)(| )(R|r)(H|h)(| )(\+|-)
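The three example patterns can be tried in Python roughly as follows. The sample inputs are assumptions. In the blood type pattern the (?i) flag is written at the very start, since recent Python versions reject a global inline flag that appears later in the pattern:

```python
import re

phone_pattern = r"5[0-9]{2}-[0-9]{3}-[0-9]{2}-[0-9]{2}"
serial_pattern = r"(?i)[a-z]{1}[0-9]{2}[a-z]{1}[0-9]{5}"
blood_pattern = (
    r"(?i)\b(blood)(\s)(type){1}(\s|\:\s|\-|\-\s)"
    r"(A|B|AB|0)(| )(R|r)(H|h)(| )(\+|-)"
)

# Hypothetical inputs for each pattern.
phone = re.search(phone_pattern, "Call 532-123-45-67 today").group()
serial = re.fullmatch(serial_pattern, "b34f12345")
blood = re.search(blood_pattern, "Blood type: a rh +").group()

print(phone)
print(serial is not None)
print(blood)
```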