Newsletter Subject

12 DAYS OF LINUX TECH TIPS #6

From

linuxjournal.com

Email Address

subs@linuxjournal.com

Sent On

Wed, Dec 12, 2018 12:33 PM

Email Preheader Text

12 Days of Tech Tips #6 12 DAYS OF LINUX TECH TIPS FOR YOU THIS HOLIDAY SEASON! JOIN US TODAY FOR TE


12 Days of Tech Tips #6

12 DAYS OF LINUX TECH TIPS FOR YOU THIS HOLIDAY SEASON! JOIN US TODAY FOR TECH TIP #6

Globbing and Regex: So Similar, So Different

Shawn Powers

Grepping is awesome, as long as you don't glob it up! This article covers some grep and regex basics.

There are generally two types of coffee drinkers. The first type buys a can of pre-ground beans and uses the included scoop to make their automatic drip coffee in the morning. The second type picks single-origin beans from various parts of the world, accepts only beans that have been roasted within the past week and grinds those beans with a conical burr grinder moments before brewing in any number of complicated methods. Text searching is a bit like that.

For most things on the command line, people think of *.* or *.txt and are happy to use file globbing to select the files they want. When it comes to grepping a log file, however, you need to get a little fancier. The confusing part is when the syntax of globbing and regex overlap. Thankfully, it's not hard to figure out when to use which construct.

Globbing

The command shell uses globbing for filename expansion. If you type something like ls *.txt, you'll get a list of all the files that end in .txt in the current directory. If you do ls R*.txt, you'll get all the files that start with capital R and have the .txt extension. The asterisk is a wildcard that lets you quickly filter which files you mean.

You also can use a question mark in globbing if you want to specify a single character. So, typing ls read??.txt will list readme.txt, but not read.txt. That's different from ls read*.txt, which will match both readme.txt and read.txt, because the asterisk means "zero or more characters" in the file glob.

Here's the easy way to remember if you're using globbing (which is very simple) vs. regular expressions: globbing is done to filenames by the shell, and regex is used for searching text.
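The wildcard behavior described above can be checked in a scratch directory. This is only a sketch: the file names (read.txt, readme.txt, Report.txt, notes.md) and the variable names are made up for the demo, and it assumes a POSIX-style shell with mktemp available. It uses echo instead of ls so that what gets printed is exactly the list the shell produced.

```shell
# Use the C locale so expanded names are sorted in plain byte order.
LC_ALL=C

# Work in a throwaway directory so the demo cannot touch real files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch read.txt readme.txt Report.txt notes.md

# "*" matches zero or more characters, so read.txt and readme.txt both qualify.
star_matches=$(echo read*.txt)

# Each "?" matches exactly one character, so read.txt is left out.
question_matches=$(echo read??.txt)

# A leading capital R keeps only names that start with "R" and end in .txt.
capital_matches=$(echo R*.txt)

echo "read*.txt  -> $star_matches"      # -> read.txt readme.txt
echo "read??.txt -> $question_matches"  # -> readme.txt
echo "R*.txt     -> $capital_matches"   # -> Report.txt
```

The shell expands each pattern before the command runs, so echo, ls or any other program would receive the same list of names.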
The only frustrating exception to this is that sometimes the shell is too smart and conveniently does globbing when you don't want it to. For example:

    grep file* README.TXT

In most cases, this will search the file README.TXT looking for the regular expression file*, which is what you normally want. But if there happens to be a file in the current folder that matches the file* glob (let's say filename.txt), the shell will assume you meant to pass that to grep, and so grep actually will see:

    grep filename.txt README.TXT

Gee, thank you so much Mr. Shell, but that's not what I wanted to do. For that reason, I recommend always using quotation marks when using grep. 99% of the time you won't get an accidental glob match, but that 1% can be infuriating. So when using grep, this is much safer:

    grep "file*" README.TXT

Because even if there is a filename.txt, the shell won't substitute it automatically.

So, globs are for filenames, and regex is for searching text. That's the first thing to understand. The next thing is to realize that similar syntax means different things.

Glob and Regex Conflicts

I don't want this article to become a super in-depth piece on regex; rather, I want you to understand simple regex, especially as it conflicts with globbing. Table 1 shows a few of the most commonly confused symbols and what they mean in each case.

Table 1. Commonly Used Symbols

Special Character | Meaning in Globs                   | Meaning in Regex
*                 | zero or more characters            | zero or more of the character it follows
?                 | single occurrence of any character | zero or one of the character it follows, but not more than one
.                 | literal "." character              | any single character

To add insult to injury, you might be thinking about globs when you use grep, but just because you get the expected results doesn't mean you got the results for the correct reason. Let me try to explain. Here is a text file called filename.doc:

    The fast dog is fast.
    The faster dogs are faster.
    A sick dog should see a dogdoc.
    This file is filename.doc

If you type:

    grep "fast*" filename.doc

The first two lines will match. Whether you're thinking globs or regex, that makes sense. But if you type:

    grep "dogs*" filename.doc

The first three lines will match, but if you're thinking in globs, that doesn't make sense. Since grep uses regular expressions (regex) when searching files, the asterisk means "zero or more occurrences of the previous character", so in the second example, it matches dog and dogs, because having zero "s" characters matches the regex.

And let's say you typed this:

    grep "*.doc" filename.doc

If you're thinking in globs, you'd expect this to find anything ending in ".doc". But the asterisk doesn't act as a wildcard here, because it's not following any character. In a basic regex, POSIX treats a leading asterisk as a literal "*", so plain grep looks for an actual asterisk and finds nothing in this file. With GNU grep -E, the leading asterisk is simply ignored, and the command matches the last two lines. The dot in regex means "any character", so the remaining ".doc" matches ".doc" in the last line, but it also matches "gdoc" in "dogdoc", so both lines match.

The moral of the story is that grep never uses globbing. The only exception is when the shell does globbing before passing the command on to grep, which is why it's always a good idea to use quotation marks around the regular expression you are trying to grep for.

Use fgrep to Avoid Regex

If you don't want the power of regex, it can be very frustrating. This is especially true if you're actually looking for some of the special characters in a bunch of text. You can use the fgrep command (or grep -F, which is the same thing) in order to skip any regex substitutions. Using fgrep, you'll search for exactly what you type, even if it includes special characters. Here is a text file called file.txt:

    I really hate regex.
    All those stupid $, {}, and \ stuff ticks me off.
    Why can't text be text?

If you try to use regular grep like this:

    grep "$," file.txt

you may not get the results you expect. That's because the "$" is a special character (more on that in a bit). With grep -E, for example, "$" always means "end of the line", so the pattern can never match and you'll get no results.
If you'd like to grep for special characters without escaping them, or knowing the regex code to get what you want, this will work fine:

    grep -F "$," file.txt

And grep will return the second line of the text file, because it matches the literal characters. It's possible to build a regex query to search for special characters, but it can become complicated quickly. Plus, fgrep can be much faster on a large text file.

Some Simple, Useful Regex

Okay, now that you know when to use globbing and when to use regular expressions, let's look at a bit of regex that can make grepping much more useful. I find myself using the caret and dollar sign symbols in regex fairly often. Caret means "at the beginning of the line", and dollar sign means "at the end of the line". I used to mix them up, so my silly method to remember is that a farmer has to plant carrots at the beginning of the season in order to sell them for dollars at the end of the season. It's silly, but it works for me! Here's a sample text file named file.txt:

    chickens eat corn
    corn rarely eats chickens
    people eat chickens and corn
    chickens rarely eat people

If you were to type:

    grep "chickens" file.txt

you will get all four lines returned, because "chickens" is in each line. But if you add some regex to the mix:

    grep "^chickens" file.txt

you'll get both the first and fourth lines returned, because the word "chickens" is at the beginning of those lines. If you type:

    grep "corn$" file.txt

you will see the first and third lines, because they both end with "corn". However, if you type:

    grep "^chickens.*corn$" file.txt

you will get only the first line, because it is the only one that begins with chickens and ends with corn.

This example might look confusing, but there are three regular expressions that build the search. Let's look at each of them. First, ^chickens means the line must start with chickens.
Second, .* means zero or more of any character, because remember, the dot means any character, and the asterisk means zero or more of the previous character. Third, corn$ means the line must end with corn.

When you're building regular expressions, you just mush them all together like that in a long string. It can become confusing, but if you break down each piece, it makes sense. In order for the entire regular expression to match, all of the pieces must match. That's why only the first line matches the example regex statement.

A handful of other common regex characters are useful when grepping text files. Remember just to mush them together to form the entire regular expression:

- \ (the backslash) negates the "special-ness" of special characters, which means you actually can search for them with regex. For example, \$ will search for the $ character, instead of looking for the end of a line.
- \s means a whitespace character, such as a space or a tab (GNU grep supports this shorthand). To find the word pickle surrounded by whitespace, you could search for \spickle\s, and that will find "pickle" but not "pickles".
- .* is really just a specific use of the asterisk, but it's a very common combination, so I mention it here. It basically means "zero or more of any characters", which is what was used in the corn/chicken example above.
- | means "or" in regex. So hi|hello will match either "hi" or "hello". It's often used in parentheses to separate it from other parts of the regular expression. For example, (F|f)rankfurter will search for the word frankfurter, whether or not it's capitalized. (With grep, use the -E option so that | and parentheses act as operators rather than literal characters.)
- [] (brackets) are another way to specify "or" options for a single character. So the regex [Ff]rankfurter is the same as the above example. Brackets also support ranges, so ^[A-Z] will match any line that starts with a capital letter. Ranges work for numbers too, so [0-9]$ will match any line that ends in a digit.
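The building blocks listed above can be tried out in one short script. This is a sketch rather than a reference: the second sample file and all variable names are invented for the demo, and it assumes a POSIX-style shell with mktemp and a standard grep. Single quotes keep the shell from touching the patterns, grep -E is used wherever | and parentheses need to act as operators, and LC_ALL=C keeps [A-Z] limited to uppercase ASCII letters.

```shell
LC_ALL=C

# Recreate the chickens-and-corn sample in a throwaway file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
chickens eat corn
corn rarely eats chickens
people eat chickens and corn
chickens rarely eat people
EOF

# A second, made-up file for the remaining constructs.
extras=$(mktemp)
cat > "$extras" <<'EOF'
Frankfurter with mustard
one frankfurter costs $3
say hi or hello
item 42
EOF

# -c prints the number of matching lines instead of the lines themselves.
starts=$(grep -c '^chickens' "$sample")             # lines 1 and 4
ends=$(grep -c 'corn$' "$sample")                   # lines 1 and 3
both=$(grep '^chickens.*corn$' "$sample")           # only line 1

dollar=$(grep -c '\$' "$extras")                    # backslash makes "$" literal
either=$(grep -E -c 'hi|hello' "$extras")           # alternation needs -E
grouped=$(grep -E -c '(F|f)rankfurter' "$extras")   # grouping needs -E too
bracket=$(grep -c '[Ff]rankfurter' "$extras")       # same result with brackets
capital=$(grep -c '^[A-Z]' "$extras")               # lines starting with a capital
digit=$(grep -c '[0-9]$' "$extras")                 # lines ending in a digit

echo "starts=$starts ends=$ends both='$both'"
echo "dollar=$dollar either=$either grouped=$grouped bracket=$bracket"
echo "capital=$capital digit=$digit"

rm -f "$sample" "$extras"
```

Counting matches with -c makes it easy to compare what you expected with what grep actually found. Drop the -c to see the matching lines themselves.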
Your Mission

You can do far more complicated things with regular expressions. These basic building blocks are usually enough to get the sort of text you need out of a log file. If you want to learn more, by all means, either do some googling on regex, or get a book explaining all the nitty-gritty goodness. If you want me to write more about it, send a note to ljeditor@linuxjournal.com and let me know.

I really, really encourage you to practice using regex. The best way to learn is to do, so make a few text files and see if the regex statements you create give you the results you expect. Thankfully, grep can highlight the "match" it finds in the line it returns when color output is enabled (grep --color=auto, which many distributions set up by default). That means if you're getting more results than you expect, you'll see why the regex matched more than you expected, because grep will show you.

The most important thing to remember is that grep doesn't do globbing. That wildcard stuff is for filenames on the shell only. Even if globbing with grep seems to work, it's probably just coincidence (look back at the dog/dogs example here if you don't know what I'm talking about). Have fun grepping!

Copyright © 2018 Linux Journal LLC. All rights reserved. Mailing address: 9597 Jones Rd, #331 Houston, TX 77065

