
Tuesday, August 25, 2015

xargs: Praise the Lord and Pass the Argument!

One interesting command in UNIX is the little xargs command. It takes something from standard input and passes it as an argument to another command. For example, if a command expects a user name, you can get that name from somewhere and feed it to the command through xargs:
whoami | xargs passwd
The silly example above is also superfluous: passwd with no arguments changes the current user’s password anyway. But it is a simple way to illustrate what xargs does. Many commands take their input only as arguments and never look for it in the standard input stream, so they won’t read directly from the pipe. xargs takes whatever is in the pipe and appends it as arguments to the command that follows; in this case, passwd.
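A harmless way to watch xargs at work is to point it at echo, which simply prints the arguments it receives. This is a minimal sketch; the user names alice and bob are made up for illustration.

```shell
# Two lines go into the pipe; xargs appends them to echo as arguments.
all_at_once=$(printf 'alice\nbob\n' | xargs echo "users:")
echo "$all_at_once"     # users: alice bob

# -n 1 runs the command once per input item instead of once for all of them.
one_by_one=$(printf 'alice\nbob\n' | xargs -n 1 echo "user:")
echo "$one_by_one"
```

The second form is the one to reach for when the target command accepts only a single argument at a time.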
I personally use xargs in my .xinitrc file to set a random background wallpaper on my desktop when I start X:
find ~/pictures/wallpapers -type f | sort -R | tail -1 | xargs feh --bg-fill
The find command locates all the files in my wallpapers directory. The sort command puts the file names in random order and the tail command grabs only the last one in the list. xargs then passes that to feh, an image display utility, which sets the randomly selected file as my wallpaper. And so X greets me with a surprise every time I go GUI.
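One caveat with the pipeline above: by default xargs splits its input on white space, so a wallpaper named “blue sky.jpg” would reach feh as two separate arguments. The sketch below is one possible variant, assuming GNU find, shuf, and xargs are installed. It uses a throwaway directory with invented file names, and echo stands in for feh so that it can run anywhere.

```shell
# Hypothetical demo directory containing one awkward file name.
dir=$(mktemp -d)
touch "$dir/blue sky.jpg" "$dir/forest.png"

# -print0, -z, and -0 delimit paths with NUL bytes instead of white space,
# so each path stays in one piece. shuf -n 1 picks one entry at random.
picked=$(find "$dir" -type f -print0 | shuf -z -n 1 | xargs -0 echo)
echo "$picked"

rm -r "$dir"
```

In a real .xinitrc the last stage would be xargs -0 feh --bg-fill rather than echo.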
Be seeing you.

Monday, August 11, 2014

Express Yourself, Regularly

The Basics of Regular Expressions

Introduction

Unix administrators have long used regular expressions to help them locate files, modify data, and manage system configurations. Tools like grep and sed are designed to process regular expressions to provide the administrator with exactly the information he wants. While versions of these tools have been ported to Windows, most Windows administrators are unaware that they exist. Because of the limitations of the Windows command shell, Windows administrators typically stick with slower, more complicated graphical tools to manage the operating system.
Enter PowerShell. PowerShell is designed to be a replacement for the standard Windows shell, and it is far more powerful and flexible than its predecessor. Among the many command enhancements PowerShell offers is built-in support for regular expressions. It borrows this capability heavily from Perl, a scripting language that was developed specifically for processing text.
Regular expressions are used to search for character sequences inside text strings or files. Programs that process regular expressions look for text that matches a given pattern. The components of a regular expression are not complicated, but the available combinations are many and varied, making it possible to perform some very sophisticated matches. Whether you’re administering Windows, Linux, or the Unix-based Mac OS X, you should invest some time learning the cryptic syntax of regular expressions so that you can manage systems and automate common tasks.
This tutorial will introduce regular expressions. It is not aimed at a particular operating system. Students of both Linux and PowerShell will come away with a basic knowledge of how regular expressions work and how to craft their own. Specific tools such as Linux’s grep command and PowerShell’s -match operator are covered in those respective classes at Centriq Training.

Overview

Whenever we begin to learn a new technology we get excited about the possibilities. We catch a glimpse of all kinds of nifty things that we can do with this knowledge. We often forget, though, that each technology has its limitations. Regular expressions are cool and powerful and flexible and a lot of other things, but there are some things that they’re not—some things that they cannot do. Like all things new, regular expressions come with a learning curve that is best overcome with practice. To avoid getting frustrated as you begin to learn regular expressions you must always keep three rules in mind.

Rule 1. Regular expressions match text, not numbers.

Regular expressions can represent any sequence of characters that you can find on a typical keyboard, and even some that you can’t, but they can’t express any other kind of data. They don’t understand numbers.
This confuses people at first, because regular expressions are frequently employed to determine things like whether a user’s input is a numeric age or zip code or year. But in these cases the regular expression is used to test the characters and ensure that they are numerals. The quantities that these numerals stand for are completely lost on the regular expression: it can’t identify a numeral’s value.
Remember, even numbers are written with characters. Regular expressions can be used to recognize those characters, but they can’t be used to determine their values, so they don’t work with actual numbers.
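Rule 1 can be sketched in a few lines of shell. The helper name is_valid_age and the upper limit of 120 are assumptions made for this illustration; the point is only that the pattern checks the characters while something else has to check the value.

```shell
# Hypothetical helper: succeed only if the input is a plausible age.
is_valid_age() {
    # Step 1: the pattern confirms that every character is a numeral.
    printf '%s\n' "$1" | grep -Eq '^[0-9]+$' || return 1
    # Step 2: the shell, not the pattern, compares the numeric value.
    [ "$1" -le 120 ]
}

is_valid_age 42  && echo "42 accepted"
is_valid_age 999 || echo "999 rejected: all numerals, but the value is too large"
is_valid_age 4x2 || echo "4x2 rejected: not all numerals"
```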

Rule 2. Regular expressions are made from three components: characters, anchors, and modifiers.

Every regular expression must have at least one character. This provides the basis for any match that will be performed. Anchors may be used to establish that the characters belong in certain positions in the text. Modifiers may be used to match repeated instances of a character or change a character’s meaning.
Characters are pretty simple, but the anchors and modifiers are what make regular expressions so powerful—and difficult to read. No matter how complex the expression, though, it always begins with at least one character.

Rule 3. Each program is a little different.

There are many programs and programming languages that can process regular expressions. While there is a standard definition of characters, anchors, and modifiers, individual programs have sometimes extended and customized their definition of a regular expression. Generally, this is done to make expressions easier for us humans to use, but it can lead to confusion for the student who is just learning the syntax. In this tutorial I’m going to stick mostly with standard syntax, but at the end I’ll provide specific examples with grep and PowerShell.
So if you run across a regular expression that looks unusual or that doesn’t work in your specific tool, that doesn’t necessarily mean that it’s wrong. Each program is a little different.

A First Look at Characters

The group of characters that make up a regular expression’s search string is called a pattern. Patterns can be very simple. For example, d is a valid pattern. It matches any string of text that contains a “d” character, like “Dave”, “dude”, or “all work and no play makes Jack a dull boy”. “Dave” matches only if the match ignores case, and whether it does depends on the processor that’s performing the match. Remember Rule 3: each program is a little different.
When it comes to matching characters, you should know that most regular expression processors match only the first instance that they find unless you ask them for every match. In that last string, the expression matches the “d” in “and”, but not the one in “dull”. Once a program finds the first match, it usually stops processing the string altogether.
Characters can be combined to make whole words or phrases. The pattern error will match any string that contains the word “error” in any form. This includes “errors”, as well as other words like “terror”. Remember that the pattern you’re searching for is just a string of characters. If your pattern includes spaces or other special characters, you usually have to enclose it in quotation marks.
Sometimes we don’t want to match a particular character, just any character. For this we use a wildcard, which in regular expression syntax is a period. The pattern d.d would match “dad”, “dude”, and “katydid”, because there must be exactly one character between the two d’s. The pattern d...d will match “David”, “domed”, “android”, and even “my good buddy Steve” because in each case there are d’s separated by three characters. Note that in the last example those three characters include a space. That’s okay: a wildcard matches any character—even numerals, spaces, and punctuation marks.
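The wildcard examples above can be tried with grep. This sketch assumes a grep that accepts -E (extended syntax) and -i (ignore case, needed for the capital D in “David”); the sample list simply reuses the words from the text plus “odd” as a non-match.

```shell
samples='dad
dude
katydid
odd
David
android
my good buddy Steve'

# Exactly one character, of any kind, between two d's.
one_between=$(printf '%s\n' "$samples" | grep -Ei 'd.d')
# Exactly three characters between two d's; a space counts as a character.
three_between=$(printf '%s\n' "$samples" | grep -Ei 'd...d')

echo "$one_between"      # dad, dude, katydid
echo "$three_between"    # David, android, my good buddy Steve
```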

Anchors Put It Where You Want It

An anchor is a special character that ties a part of your pattern to the beginning or end of a text string. The caret symbol, ^, can be created by pressing Shift-6 on most US keyboards. It anchors a pattern to the beginning of a text string. The pattern ^car will match “caret”, but not “lascar”. The caret symbol at the beginning of the pattern tells a regular expression processor that no character may precede the pattern.
The dollar symbol, $, can be created by pressing Shift-4 on most US keyboards. It anchors the pattern to the end of a text string. The pattern ave$ will match “Dave” but not “avenue”. The dollar symbol at the end of the pattern tells the regular expression processor that no character may follow the pattern.
A pattern that includes both anchors can be used to search for an exact match of the pattern to the text string. The pattern ^error$ will exactly match the string “error”, but not “terror” or “errors”. It will not match a string that contains the word “error” among other things, like “error in module fuse.ko”—it is always an exact match.
Of course, you can combine wildcards with anchors. The pattern ^d...d$ will match “David” and “dared”, but not “android” or “dreaded”. And if you want to match on a blank line, you can use the pattern ^$.
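Here is a short sketch of the anchor examples using grep -E, which treats each input line as a separate text string. The variable names are arbitrary.

```shell
# ^ pins the pattern to the start of the line, $ to the end.
starts=$(printf '%s\n' caret lascar | grep -E '^car')
ends=$(printf '%s\n' Dave avenue | grep -E 'ave$')
exact=$(printf '%s\n' error terror errors 'error in module fuse.ko' | grep -E '^error$')

# ^$ matches only empty lines; -c counts the matching lines.
blank=$(printf 'first\n\nthird\n' | grep -c '^$')

echo "$starts / $ends / $exact / $blank blank line"
```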

Expanding a Pattern With Modifiers

The Great Escape

You’ve learned that, in regular expression syntax, a period is a wildcard. But sometimes we want to search for a literal period. Because the period stands for any character, the pattern ^169.254. would match the IP address “169.254.14.2”, but it would also match the string “1698254a”, which is not what we’re looking for. What we want to do is modify the period and change its meaning.
The backslash character, \, is a special modifier called the “escape character”. It changes the meaning of the character that immediately follows it, “escaping” from the normal interpretation of the pattern. When it precedes a period, the backslash takes away the period’s meaning as a wildcard so that it becomes a normal period. So the pattern ^169\.254\. will match “169.254.14.2” but not “1698254a”. Since the periods have been “escaped”, there are no wildcards in this pattern. Likewise, the pattern ^\$ will look for a string that begins with a dollar sign, and ^4\^2$ will match a string that contains exactly “4^2”.
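The difference the backslash makes is easy to see side by side. In this sketch the patterns are wrapped in single quotes so that the shell passes the backslashes and dollar signs to grep untouched; the sample strings beyond those in the text are invented.

```shell
addresses='169.254.14.2
1698254a
10.0.0.1'

# Unescaped periods are wildcards, so the second string slips through.
unescaped=$(printf '%s\n' "$addresses" | grep -E '^169.254.')
# Escaped periods match only literal periods.
escaped=$(printf '%s\n' "$addresses" | grep -E '^169\.254\.')
# An escaped dollar sign is a literal character, not an anchor.
dollar=$(printf '%s\n' '$5.00' '5.00$' | grep -E '^\$')

echo "$unescaped"
echo "$escaped"
echo "$dollar"
```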

Time and Time Again

It is often necessary to look for repeated characters in a text string. Multipliers allow you to extend your expression to include repetition in your searches. They are special characters that follow some other character in a pattern. They multiply the character that appears immediately before them by some value. There are several multipliers, so you’ll need to commit them to memory.
The question mark, ?, can be produced on most US keyboards by pressing Shift-/. It multiplies the preceding character by zero or one. The pattern ^d.?d$ will match both “dd” and “did”, requiring zero or one characters between the d’s.
The asterisk, *, can be produced on most US keyboards by pressing Shift-8. The asterisk multiplies the preceding character by zero or more. The pattern ^d.*d$ will match “dd”, “did”, “dreamed”, and “drumming in your head”, because it allows any number of characters to exist between the d’s.
The plus symbol, +, can be produced on most US keyboards by pressing Shift-=. It multiplies the preceding character by one or more. The pattern ^d.+d$ matches “did”, “dreamed”, and “drumming in your head”, but it does not match “dd”. The plus symbol requires at least one character to appear between the d’s.
Advanced multiplication is not supported by all regular expression processors, nor is it consistent among those that do support it. Remember, each program is a little different. However, because many programs and programming languages support it to some extent you should get to know it and its variations.
Advanced multipliers contain values inside curly braces, the { and } characters. These can be produced on most US keyboards by pressing Shift-[ and Shift-] respectively.
Placing a single value in the braces multiplies the preceding character by exactly that number. The pattern ^d.{3}d$ matches “David” and “druid”, requiring exactly three characters between the d’s. Note that this pattern could have been written ^d...d$, but as we learn about more characters we’ll see that the advanced multiplier can be much easier to read.
Enclosing two values separated by a comma within the braces, we get a specific range of multipliers. The pattern ^d.{3,5}d$ multiplies the wildcard by three, four, or five. It matches “druid”, “darned”, and “dreamed”, requiring three to five characters between the d’s.
If the braces contain a single number followed by a comma, the range of multipliers has no upper limit. The pattern ^d.{3,}d$ matches any string with three or more characters between the d’s.
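The multipliers can be compared against one list of samples. This sketch assumes grep -E, where curly braces are written without backslashes; the matches helper is a hypothetical convenience that joins the matching samples with commas.

```shell
samples='dd
did
druid
darned
dreamed
drumming in your head'

# Hypothetical helper: list the samples that match a pattern, comma-separated.
matches() {
    printf '%s\n' "$samples" | grep -E "$1" | paste -sd, -
}

matches '^d.?d$'       # dd,did
matches '^d.+d$'       # every sample except dd
matches '^d.{3}d$'     # druid
matches '^d.{3,5}d$'   # druid,darned,dreamed
matches '^d.{3,}d$'    # druid,darned,dreamed,drumming in your head
```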

Advanced Characters

There’s more to characters than literal symbols and wildcards. While the advanced characters may not look like single characters to you, to the regular expression processor they are indeed just a character. Sometimes these are called “meta-characters”, because they are a group of symbols that stand for a single character. Combined with multipliers, these meta-characters make it possible to create sophisticated searches.
Square brackets allow you to specify one from a group of characters that you want to match. The pattern ^d[aiu]d$ represents three characters. The symbols between the brackets are applied in turn to the search, so that this pattern matches “dad”, “did”, and “dud”. The pattern requires that there must be nothing but an a, i, or u between the d’s.
If you want to search for a string that begins with a vowel you can use the pattern ^[aeiou]. You can negate the pattern with a caret symbol inside the brackets. The pattern ^[^aeiou] matches all strings that begin with any character that is not a vowel. Don’t let the use of the caret confuse you. At the beginning of a pattern the caret is an anchor. Inside square brackets it reverses the meaning of the group. This could be read as “not a, e, i, o, or u”, so inside brackets the caret symbol means “not”.
The brackets can also contain a range, two values separated by a hyphen. The pattern [0-9] represents all numerals. The pattern ^169\.254\.[0-9]{1,3}\.[0-9]{1,3}$ uses escape sequences, ranges, and multipliers to match IP addresses that begin with “169.254.”.
Since a range of characters is just a character in regular expression syntax, ranges can be grouped. The pattern ^[0-9a-z] will match all strings that begin with a numeral or a lowercase letter. Many shell scripts use the pound sign at the beginning of a line to identify comments. The pattern ^[#a-zA-Z] matches all the lines in a script that begin with a pound sign or a letter.
Many common character groups have special classes defined for them. The range 0-9 can also be written using the class [:digit:]. A class name goes inside the square brackets just as a range does, so the complete pattern is [[:digit:]], the counterpart of [0-9]. Other classes include [:alpha:] for all letters, [:lower:] for lowercase letters, [:upper:] for uppercase letters, [:alnum:] for letters and numbers, [:space:] for white space characters like space and tab, [:cntrl:] for non-printable control characters, and [:xdigit:] for characters used to represent hexadecimal numbers, equivalent to [0-9a-fA-F].
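The bracket forms described above can be sketched with grep -E. The sample strings are invented, and the address 169.254.999.999 is included on purpose: it passes, because the pattern checks characters and not values (Rule 1).

```shell
# A bracket expression matches one character from the listed set.
set_match=$(printf '%s\n' dad did dud ded | grep -E '^d[aiu]d$' | paste -sd, -)

# A caret just inside the brackets negates the set.
not_vowel=$(printf '%s\n' apple banana '#comment' | grep -E '^[^aeiou]' | paste -sd, -)

# Ranges test characters, so 999 is accepted as an "octet".
ip_re='^169\.254\.[0-9]{1,3}\.[0-9]{1,3}$'
link_local=$(printf '%s\n' 169.254.14.2 169.254.999.999 10.0.0.1 | grep -E "$ip_re" | paste -sd, -)

# A class name sits inside a bracket expression, hence the doubled brackets.
hex=$(printf '%s\n' cafe BEEF 42 xyz | grep -E '^[[:xdigit:]]+$' | paste -sd, -)

echo "$set_match | $not_vowel | $link_local | $hex"
```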
Some advanced regular expression processors such as those found in Perl and PowerShell can also use escape sequences to represent character classes. Some common ones are: \d to represent any digit; \w to represent any word character, meaning a letter, a numeral, or the underscore; and \s to represent white space characters. A capital letter in the escape sequence negates it, so \D represents any character that is not a digit, and \S represents any character that is not white space.
Parentheses, ( and ), can be produced on most US keyboards by pressing Shift-9 and Shift-0 respectively. They can be used to combine multiple patterns together. The pipe symbol, |, is produced by pressing Shift-\ on most US keyboards. It ties together multiple patterns within the parentheses using “or” logic. The expression (^[[:digit:]]{5}$|^[[:digit:]]{5}-[[:digit:]]{4}$) contains two patterns, and matches a string if one pattern or the other matches it. Note that there are no spaces around the pipe: a space inside a pattern is a literal character that must appear in the text. This regular expression matches a US zip code written in either the five-digit or five-plus-four-digit format, such as “02134” and “64119-4105”.
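The zip code match can be tried with grep -E. This sketch uses an equivalent variant that moves the anchors outside the parentheses so they are written only once; the rejected samples are invented.

```shell
# Five digits, or five digits, a hyphen, and four more digits.
zip_re='^([[:digit:]]{5}|[[:digit:]]{5}-[[:digit:]]{4})$'

valid=$(printf '%s\n' 02134 64119-4105 6411 64119-41 'zip 02134' | grep -E "$zip_re")
echo "$valid"    # 02134 and 64119-4105, one per line
```

Keeping the pattern in a variable makes it easy to reuse in several tests within the same script.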
Finally come the angle brackets, < and >, which can be produced on most US keyboards by pressing Shift-, and Shift-. respectively. Preceded by backslashes, these identify word boundaries, so whatever is enclosed within is considered to be a whole word. The pattern \<error\> matches “error”, but not “errors” or “terror”. It requires that no letter, numeral, or underscore appear on either side of the enclosed pattern, so only white space, punctuation, or the beginning or end of the string may sit next to it.
Please note that support for curly braces and angle brackets varies. In grep’s basic syntax, curly braces must be preceded by a backslash or they are treated as literal braces, while the extended syntax used by egrep takes them without backslashes. Angle brackets without backslashes are literal characters in grep, and Perl and PowerShell mark word boundaries with \b instead. You may have to experiment or read your program’s documentation to determine what it will support.
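Word boundaries can be sketched with GNU grep, which is an assumption here because other implementations may spell them differently. The same grep also offers a -w switch that requests whole-word matching without any brackets in the pattern.

```shell
lines='error
errors
terror
fatal error: disk full'

# \< and \> tie the pattern to the start and end of a word.
bounded=$(printf '%s\n' "$lines" | grep -E '\<error\>')
# -w asks for the same whole-word behavior.
whole_word=$(printf '%s\n' "$lines" | grep -w 'error')

echo "$bounded"
```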

Summary

Regular expressions consist of character patterns that are matched against text strings. Each pattern must contain at least one character, but its matching capabilities can be enhanced with anchors, modifiers, and advanced meta-characters.
Regular expression patterns can be written to match almost any kind of text, but they don’t assign any meaning to that text. A regular expression recognizes no numeric values, it doesn’t understand what to do with punctuation marks, and it’s limited to matching on one line of a file at a time. All that the pattern can represent is text characters.
Regular expression processors are programs and programming language constructs that use patterns to find and work with text. Some of these have very advanced capabilities, such as extending a search to include multiple lines of a file or performing pattern matches backward as well as forward on the text. The beginner will need to practice with each of the regular expression tools that he intends to use to gain an understanding of its features, but the fundamental concepts covered in this tutorial will be applicable.
Regular expressions provide the administrator with tools to search for any kinds of text within files. They can be added to scripts to check for patterns within user input. They are often used to identify important information from log files, email servers, and web sites. Any process that works with text can be improved by the judicious use of regular expressions.

PowerShell Examples

When the PowerShell -match operator finds the pattern in a string it returns “True”. It returns “False” if the pattern is not found.
PS C:\> $value = "This is a test. 1234"
PS C:\> $value -match "a.t"
True
PS C:\> $value -match "^t"
True
Note that the -match operator is not case-sensitive by default. Use -cmatch when you need a case-sensitive match.
PS C:\> $value -match "\st\w{3}"
True
PS C:\> $value -match "\d$"
True
PS C:\> $value -match "t[aeiou]s"
True
PS C:\> $value -match "t[^aeiou]s"
False
PS C:\> $value -match "\w+\. \d+$"
True
PS C:\> $value -match "\w+\. \d?$"
False
PS C:\> $value -match "(tisk|test)"
True

Grep Examples

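The grep command is intended to work with files, so these examples pass a test string to the command through standard input. They use egrep, the traditional name for grep -E, which enables extended regular expression syntax. The command prints each line that contains a match for the pattern, and prints nothing if there is no match.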
$ value="This is a test. 1234"
$ echo $value | egrep a.t
This is a test. 1234
$ echo $value | egrep ^t
$ echo $value | egrep -i ^t
This is a test. 1234
Note that grep is case-sensitive. Use the -i parameter switch to enable case-insensitive matches.
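$ echo $value | egrep '[[:space:]]t[[:alpha:]]{3}'
This is a test. 1234
With grep, character classes must be contained within square brackets, which produces the double-bracket form shown here. Quote the pattern so that the shell doesn’t treat the brackets as a file name wildcard. Note that curly braces need no backslash with egrep; plain grep, which uses the basic syntax, requires \{3\}.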
$ echo $value | egrep '[[:digit:]]$'
This is a test. 1234
$ echo $value | egrep 't[aeiou]s'
This is a test. 1234
$ echo $value | egrep 't[^aeiou]s'
$ echo $value | egrep '[[:alpha:]]+\. [[:digit:]]+$'
This is a test. 1234
$ echo $value | egrep '[[:alpha:]]+\. [[:digit:]]?$'
$ echo $value | egrep '(tisk|test)'
This is a test. 1234
$ echo $value | egrep '\<test\>'
This is a test. 1234
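In a script it is often the exit status, not the printed line, that matters. The sketch below assumes a grep that supports -q (quiet) and -o (print only the matched text); -o is common but is not part of every grep, so treat it as an assumption.

```shell
value="This is a test. 1234"

# -q prints nothing; the exit status alone says whether the pattern matched.
if printf '%s\n' "$value" | grep -Eq '^t'; then
    case_sensitive=match
else
    case_sensitive=no-match
fi

# -o prints only the text that matched instead of the whole line.
digits=$(printf '%s\n' "$value" | grep -Eo '[[:digit:]]+$')

echo "$case_sensitive, $digits"    # no-match, 1234
```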

Wednesday, October 7, 2009

it's the law (1 of 2)

ritter's first law of network administration: an administrator at rest tends to stay at rest.

an administrator's day could easily be consumed with all the little, mundane tasks that are necessary to keep things running smoothly. backing up servers, reading log files, preparing reports on resource utilization, playing world of warcraft—it all really eats into one's time. that's why i formulated my first law of network administration. i noted that, as a network admin, when things could pretty much take care of themselves, i could relax and better savor the more fulfilling moments of my job, like reducing a user's disk quota or reading a user's more provocative email messages. here is a short alliterative list of tips to help you achieve network nirvana:
  • aggregate: duplicating work increases the likelihood that you'll introduce errors and inconsistencies into your network's security, which is a bad idea no matter how you slice it. instead...

    1. locate shared resources that have common security requirements in the same directory structure on your file server. set access permissions only once on the highest-level directory that these files have in common. use permission inheritance to ensure consistent security on all the files in the hierarchy.

    2. don't assign permissions directly to users. add users to appropriate groups and assign permissions to the groups. that way you need only add a user to a group to ensure that all the access they require is properly configured.

  • automate: do nothing by hand if possible, because hands can be so mistake-prone sometimes. learn a scripting language and write (or download and customize) scripts to perform common, repetitive tasks like reading log files and collecting report data. if you administer a windows network, you must learn powershell. it's available for windows versions from xp onward, and is the "wave of the future." if you administer a linux network, you must learn bash. if you manage a mixed environment, i strongly recommend that you learn python—it's sufficiently platform-independent and very mature, with a smörgåsbord of cool features built in.

  • alert: let your network tell you when there are problems. install a network monitor system that's capable of notifying you when your file and email servers run low on disk space, or when your web server stops responding. when you can address a problem before your users even know it's there, they'll come to respect your precognitive powers and revere you for the system superhero you really are.

well, that last one, not really, because they won't know there was a problem in the first place, right? but hey, we're geeks: we're good at fantasy. now roll a d20 to see whether your invisibility-from-lumbergh spell worked before he asks for those tps reports. again.