• Aucun résultat trouvé

What is

N/A
N/A
Protected

Academic year: 2022

Partager "What is"

Copied!
144
0
0

Texte intégral

(1)
(2)

What is Awk

Awk is a programming language used for manipulating data and generating reports

The data may come from standard input, one or more files, or as output from a process

Awk scans a file (or input) line by line, from the first to the last line, searching for lines that match a specified pattern and performing selected actions (enclosed in curly braces) on those lines.

If there is a pattern with no specific action, all lines that match the pattern are displayed;

If there is an action with no pattern, all input lines specified by the action are executed upon.

(3)

Which Awk

The command is awk if using the old version,

nawk if using the new version, and gawk is using the gnu version

(4)

Awk ’s format

An awk program consists of:

the awk command

the program instructions enclosed in quotes (or a file) , and

the name of the input file

If an input file is not specified, input comes from standard input (stdin), the keyboard

Awk instructions consists of

patterns,

actions, or

a combination of patterns and actions

A pattern is a statement consisting of an expression of

(5)

Awk ’s format (continue.)

Actions consist of one or more statements separated by semicolons or new lines and enclosed in curly braces

Patterns cannot be enclosed in curly braces, and consist of regular expressions enclosed in forward slashes or expressions consisting of one or more of the many operators provided by awk

awk commands can be typed at the command line or in awk script files

The input lines can come from files, pipes, or standard input

(6)

Awk ’s format (continue.)

Format:

nawk 'pattern' filename nawk '{action}' filename

nawk 'pattern {action}' filename

(7)

Input from Files

Example 1:

$ cat employees

Chen Cho 5/19/63 203-344-1234 $76 Tom Billy 4/12/45 913-972-4536 $102 Larry White 11/2/54 908-657-2389 $54 Bill Clinton 1/14/60 654-576-4114 $201 Steve Ann 9/15/71 202-545-8899 $58

$ nawk '/Tom/' employees

Tom Billy 4/12/45 913-972-4536 $102

$

(8)

Input from Files (continue.)

Example 2:

$ cat employees

Chen Cho 5/19/63 203-344-1234 $76 Tom Billy 4/12/45 913-972-4536 $102 Larry White 11/2/54 908-657-2389 $54 Bill Clinton 1/14/60 654-576-4114 $201 Steve Ann 9/15/71 202-545-8899 $58

$ nawk '{print $1}' employees

ChenTom Larry BillSteve

(9)

Awk ’s format (continue.)

Example 3:

$ cat employees

Chen Cho 5/19/63 203-344-1234 $76 Tom Billy 4/12/45 913-972-4536 $102 Larry White 11/2/54 908-657-2389 $54 Bill Clinton 1/14/60 654-576-4114 $201 Steve Ann 9/15/71 202-545-8899 $58

$ nawk '/Steve/{print $1, $2}' employees

Steve Ann

(10)

The print function

The default action is to print the lines that are matched to the screen

The print function can also be explicitly used in the action part of awk as {print}

The print function accepts arguments as

variables,

computed values, or

string constants

String must be enclosed in double quotes

Commas are used to separate the arguments: if commas are not provided, the arguments are concatenated

(11)

The print function (continue.)

The comma evaluates to the value of the output field separator (OFS), which is by default a space

The output of the print function can be redirected or piped to another program, and another

program can be piped to awk for printing

(12)

The print function (continue.)

Example:

$ date

Fri Feb 9 07:49:28 EST 2001

$ date | nawk '{ print "Month: " $2

"\nYear: ", $6}'

Month: Feb Year: 2001

(13)

Escape sequences

Escape sequences are represented by a backslash and a letter or number

Escape sequence Meaning

\b Backspace

\f Form feed

\n Newline

\r Carriage return

\t Tab

\047 Octal value 47, a single quote

\c c represents any other character, e.g., \"

(14)

Escape sequences (continue.)

Example:

$ cat employees

Chen Cho 5/19/63 203-344-1234 $76 Tom Billy 4/12/45 913-972-4536 $102 Larry White 11/2/54 908-657-2389 $54 Bill Clinton 1/14/60 654-576-4114 $201 Steve Ann 9/15/71 202-545-8899 $58

$ nawk '/Ann/{print "\t\tHave a nice day, " $1,

$2 "\!"}' employees Have a nice day, Steve Ann!

(15)

The printf Function

The printf function can be used for formatting fancy output

The printf function returns a formatted string to standard output, like the printf statement in C.

Unlike the print function, printf does not provide a newline.

The escape, \n, must be provided if a newline is desired

When an argument is printed, the place where the output is printed is called the field, and when the width of the field is the number of characters contained in that field

(16)

The printf Function (continue.)

Example 1:

$ echo "UNIX" | nawk ' {printf "|%-15s|\n",

$1}'

|UNIX |

$ echo "UNIX" | nawk '{printf "|%15s|\n",

$1}'

| UNIX|

$

(17)

The printf Function (continue.)

Example 2:

$ cat employees

Chen Cho 5/19/63 203-344-1234 $76 Tom Billy 4/12/45 913-972-4536 $102 Larry White 11/2/54 908-657-2389 $54 Bill Clinton 1/14/60 654-576-4114 $201 Steve Ann 9/15/71 202-545-8899 $58

$ nawk '{printf "The name is: %-15s ID is

%8d\n", $1, $3}' employees

The name is: Chen ID is 5 The name is: Tom ID is 4 The name is: Larry ID is 11 The name is: Bill ID is 1 The name is: Steve ID is 9

$

(18)

The printf Function (continue.)

Conversion Character Definition

c Character

s String

d Decimal number

ld Long decimal number

u Unsigned decimal number

lu Long unsigned decimal

(19)

The printf Function (continue.)

Conversion Character Definition

x Hexadecimal number

lx Long hexadecimal number

o Octal number

lo Long octal number

e Floating point number in

scientific notation (e-notation)

f Floating point number

g Floating point number using

either e or f conversion,

whichever takes the least space

(20)

The printf Function (continue.)

Modifier Character Definition

- Left-justification modifier

# Integers in octal form are

displayed with a leading 0;

integers in hexadecimal form are displayed with a leading 0x.

+ For conversions using d,e,f, and

g, integers are displayed with a numeric sign + or -

0 The displayed value is padded

with zeros instead of white space

(21)

The printf Function (continue.)

Printf Format Specifier What it Does Given x=‘A’,y=15,z=2.3, and $1 = Bob Smith:

%c Prints a single ASCII character.

printf("The character is %c\n",x) prints: The character is A

%d Prints a decimal number

printf("The boy is %d years old\n", y) prints:

The boy is 15 years old

%e Prints the e notation of a number

printf("z is %f\n", z) prints: z is 2.3e+01

%f Prints a floating point number

printf("z is %e\n", z) prints: z is 4.600000

(22)

The printf Function (continue.)

Printf Format Specifier What it Does

Given x=‘A’,y=15,z=2.3, and $1 = Bob Smith:

%o Prints the octal value of a number

printf("y is %o\n",y) prints: y is 16

%s Prints a string of characters

printf("The name of the culprit is %s\n",

$1) prints: The name of the culprit is Bob Smith

%x Prints the hex value of a number

(23)

awk commands from within a file (continue.)

If awk commands are placed in a file, the -f option is used with the name of the awk file, followed by the name of the input file to be processed

A record is read into awk's buffer and each of the commands in the awk file are tested and executed for that record

If an action is not controlled by a pattern, the default behavior is to print the entire record

(24)

If a pattern does not have an action associated with it, the default is to print the record where the pattern matches an input line

awk commands from within a file

(continue.)

(25)

awk commands from within a file (continue.)

Example:

Chen Cho 5/19/63 203-344-1234 $76 Tom Billy 4/12/45 913-972-4536 $102 Larry White 11/2/54 908-657-2389 $54 Bill Clinton 1/14/60 654-576-4114 $201 Steve Ann 9/15/71 202-545-8899 $58

$ cat awkfile

/^Steve/{print "Hello Steve!"}

{print $1, $2, $3}

$ nawk -f awkfile employees

Chen Cho 5/19/63 Tom Billy 4/12/45 Larry White 11/2/54 Bill Clinton 1/14/60 Hello Steve!

Steve Ann 9/15/71

(26)

Records

By default, each line is called a record and is terminated with a newline

(27)

The Record Separator

By default, the output and input record separator (line separator) is a carriage return, stored in the built-in awk variables ORS and RS, respectively

The ORS and RS values can be changed, but only in a limited fashion

(28)

The $0 Variable

An entire record is referenced as $0 by awk

When $0 is changed by substitution or

assignment, the value of NF, the number of fields, may be changed

The newline value is stored in awk's built-in variable RS, a carriage return by default

(29)

The $0 Variable (continue.)

Example:

$ cat employees

Chen Cho 5/19/63 203-344-1234 $76 Tom Billy 4/12/45 913-972-4536 $102 Larry White 11/2/54 908-657-2389 $54 Bill Clinton 1/14/60 654-576-4114 $201 Steve Ann 9/15/71 202-545-8899 $58

$ nawk '{print $0}' employees

Chen Cho 5/19/63 203-344-1234 $76 Tom Billy 4/12/45 913-972-4536 $102 Larry White 11/2/54 908-657-2389 $54 Bill Clinton 1/14/60 654-576-4114 $201 Steve Ann 9/15/71 202-545-8899 $58

(30)

The NR Variable

The number of each record is stored in awk's built-in variable, NR

After a record has been processed, the value of NR is incremented by one

(31)

The NR Variable (continue.)

Example:

$ cat employees

Chen Cho 5/19/63 203-344-1234 $76 Tom Billy 4/12/45 913-972-4536 $102 Larry White 11/2/54 908-657-2389 $54 Bill Clinton 1/14/60 654-576-4114 $201 Steve Ann 9/15/71 202-545-8899 $58

$ nawk '{print NR, $0}' employees

1 Chen Cho 5/19/63 203-344-1234 $76 2 Tom Billy 4/12/45 913-972-4536 $102 3 Larry White 11/2/54 908-657-2389 $54 4 Bill Clinton 1/14/60 654-576-4114 $201 5 Steve Ann 9/15/71 202-545-8899 $58

(32)

Fields

Each record consists of words called fields

which, by default, are separated by white space, that is, blank spaces or tabs. Each of these

words is called a field, an awk keeps track of the number of fields in its built-in variable, NF

The value of NF can vary from line to line, and the limit is implementation-dependent, typically 100 fields per line

(33)

Fields (continue.)

Example 1:

$1 $2 $3 $4 $5

Chen Cho 5/19/63 203-344-1234 $76 Tom Billy 4/12/45 913-972-4536 $102 Larry White 11/2/54 908-657-2389 $54 Bill Clinton 1/14/60 654-576-4114 $201 Steve Ann 9/15/71 202-545-8899 $58

$ nawk '{print NR, $1, $2, $5}' employees

1 Chen Cho $76 2 Tom Billy $102 3 Larry White $54 4 Bill Clinton $201 5 Steve Ann $58

(34)

Fields (continue.)

Example 2:

nawk '{print $0, NF}' employees

Chen Cho 5/19/63 203-344-1234 $76 5 Tom Billy 4/12/45 913-972-4536 $102 5 Larry White 11/2/54 908-657-2389 $54 5 Bill Clinton 1/14/60 654-576-4114 $201 5 Steve Ann 9/15/71 202-545-8899 $58 5

(35)

The Input Field Separator

awk's built-in variable, FS, holds the value of the input field separator.

• When the default value of FS is used, awk separates fields by spaces and/or tabs, stripping leading blanks and tabs

• The FS can be changed by assigning new value to it, either:

– in a BEGIN statement, or – at the command line

(36)

The Input Field Separator (continue.)

• To change the value of FS at the command line, the –F option is used, followed by the character representing the new separator

(37)

The Input Field Separator (continue.)

• Example:

$ cat employees

Chen Cho:5/19/63:203-344-1234:$76 Tom Billy:4/12/45:913-972-4536:$102 Larry White:11/2/54:908-657-2389:$54 Bill Clinton:1/14/60:654-576-4114:$201 Steve Ann:9/15/71:202-545-8899:$58

$ nawk -F: '/Tom Billy/{print $1, $2}' employees

Tom Billy 4/12/45

(38)

Using More than One Field Separator

• You may specify more than one input separator

• If more than one character is used for the field

separator, FS, then the string is a regular expression and is enclosed in square brackets

• Example

$ nawk -F'[ :\t]' '{print $1, $2, $3}‘ employees

Chen Cho 5/19/63 Tom Billy 4/12/45 Larry White 11/2/54 Bill Clinton 1/14/60 Steve Ann 9/15/71

(39)

The Output Field Separator

• The default output field separator is a single space and is stored in awk's internal variable, OFS

• The OFS will not be evaluated unless the comma separates the fields

• Example:

$ cat employees

Chen Cho:5/19/63:203-344-1234:$76 Tom Billy:4/12/45:913-972-4536:$102 Larry White:11/2/54:908-657-2389:$54 Bill Clinton:1/14/60:654-576-4114:$201 Steve Ann:9/15/71:202-545-8899:$58

$ nawk -F: '/Tom Billy/{print $1 $2 $3 $4}' employees

Tom Billy4/12/45913-972-4536$102

(40)

Patterns

• A pattern consists of – a regular expression,

– an expression resulting in a true or false condition, or – a combination of these

• When reading a pattern expression, there is an implied if statement

(41)

Patterns (continue.)

• Example:

$ cat employees

Chen Cho 5/19/63 203-344-1234 $76 Tom Billy 4/12/45 913-972-4536 $102 Larry White 11/2/54 908-657-2389 $54 Bill Clinton 1/14/60 654-576-4114 $201 Steve Ann 9/15/71 202-545-8899 $58

$ nawk '/Tom/' employees

Tom Billy 4/12/45 913-972-4536 $102

$ nawk '$4 < 40' employees

Chen Cho 5/19/63 203-344-1234 $76 Steve Ann 9/15/71 202-545-8899 $58

(42)

Actions

• Actions are statements enclosed within curly braces and separated by semicolons

• Actions can be simple statements or complex groups of statements

• Statements are separated – by semicolons, or

– by a newline if placed on their own line

(43)

Regular Expressions

• A regular expression to awk is a pattern that consists of characters enclosed in forward slashes

• Example 1:

• Example 2:

$ nawk '/Steve/' employees

Steve Ann 9/15/71 202-545-8899 $58

$ nawk '/Steve/{print $1, $2}' employees

Steve Ann

(44)

Regular Expression Meta characters

^ Matches at the beginning of string

$ Matches at the end of string . Matches for a single character

* Matches zero or more of preceding character + Matches for one or more of preceding character

? Matches for zero or one of preceding character [ABC] Matches for any one character in the set of

(45)

Regular Expression Meta characters (continue.)

[^ABC] Matches characters not in the set of characters, i.e.,A,B or C

[A-Z] Matches for any character in the range from A to Z

A|B Matches either A or B

(AB)+ Matches one or more sets of AB

\* Matches for a literal asterisk

& Used in the replacement to represent what was found in the search string

(46)

Regular Expressions (continue.)

• Example 3:

• Example 4:

$ nawk '/^Steve/' employees

Steve Ann 9/15/71 202-545-8899 $58

$ nawk '/^[A-Z][a-z]+ /' employees

Chen Cho 5/19/63 203-344-1234 $76 Tom Billy 4/12/45 913-972-4536 $102 Larry White 11/2/54 908-657-2389 $54

Bill Clinton 1/14/60 654-576-4114 $201 Steve Ann 9/15/71 202-545-8899 $58

(47)

The Match Operator

• The match operator, the tilde (~), is used to match an expression within a record or a field

• Example 1:

$ cat employees

Chen Cho 5/19/63 203-344-1234 $76 Tom Billy 4/12/45 913-972-4536 $102 Larry White 11/2/54 908-657-2389 $54 Bill Clinton 1/14/60 654-576-4114 $201 Steve Ann 9/15/71 202-545-8899 $58

$ nawk '$1 ~ /[Bb]ill/' employees

Bill Clinton 1/14/60 654-576-4114 $201

(48)

The Match Operator (continue.)

• Example 2:

$ nawk '$1 !~ /lee$/' employees

Chen Cho 5/19/63 203-344-1234 $76 Tom Billy 4/12/45 913-972-4536 $102 Larry White 11/2/54 908-657-2389 $54 Bill Clinton 1/14/60 654-576-4114 $201 Steve Ann 9/15/71 202-545-8899 $58

(49)

awk Commands in a Script File

• When you have multiple awk pattern/action statements, it is often easier to put the statements in a script

• The script file is a file containing awk comments and statements

• If statements and actions are on the same line, they are separated by semicolons

• Comments are preceded by a pound (#) sign

(50)

awk Commands in a Script File (continue.)

• Example:

$ cat employees

Chen Cho:5/19/63:203-344-1234:$76 Tom Billy:4/12/45:913-972-4536:$102 Larry White:11/2/54:908-657-2389:$54 Bill Clinton:1/14/60:654-576-4114:$201 Steve Ann:9/15/71:202-545-8899:$58

$ cat info

# My first awk script by Abdelshakour Abuzneid

# Script name: info; Date: February 09, 2001 /Tom/{print "Tom's birthday is "$3}

/Bill/{print NR, $0}

/^Steve/{print "Hi Steve. " $1 " has a salary of " $4

"."}

(51)

awk Commands in a Script File (continue.)

• Example (continue.):

• To view info script, click here

$ nawk -F: -f info employees

Tom's birthday is 913-972-4536

2 Tom Billy:4/12/45:913-972-4536:$102

4 Bill Clinton:1/14/60:654-576-4114:$201 Hi Steve. Steve Ann has a salary of $58.

(52)
(53)

Comparison Expressions

• Comparison expressions match lines where if the condition is true, the action is performed

• The value of the expression evaluates true, and 0 if false

• Comparison expressions match lines where if the condition is true, the action is performed

• The value of the expression evaluates true, and 0 if false

(54)

Relational Operators

Operator Meaning Example

< Less than x < y

<= Less than or equal to x <= y

== Equal to x == y

!= Not equal to x != y

>= Greater than or equal to x >= y

> Greater than x > y

~ Matched by regular expression x ~ / y/

!~ Not matched by regular expression x !~ /y/

(55)

Relational Operators (continue.)

• Example:

$ cat employees

Chen Cho 5/19/63 203-344-1234 76 Tom Billy 4/12/45 913-972-4536 102 Larry White 11/2/54 908-657-2389 54 Bill Clinton 1/14/60 654-576-4114 201 Steve Ann 9/15/71 202-545-8899 58

$ nawk '$5 == 201' employees

Bill Clinton 1/14/60 654-576-4114 201

$ nawk '$5 > 100' employees

Tom Billy 4/12/45 913-972-4536 102

$ nawk '$2 ~ /Ann/ ' employees

Steve Ann 9/15/71 202-545-8899 58

(56)

Relational Operators (continue.)

• Example (continue):

$ nawk '$2 !~ /Ann/ ' employees

Chen Cho 5/19/63 203-344-1234 76 Tom Billy 4/12/45 913-972-4536 102 Larry White 11/2/54 908-657-2389 54 Bill Clinton 1/14/60 654-576-4114 201

(57)

Conditional Expressions

• A conditional expression uses two symbols, the

question mark and the colon, to evaluate expression

• Format:

conditional expression1 ? expression2 : expression3

(58)

Conditional Expressions (continue.)

• Example:

$ nawk '{max=($1 > $2) ? $1 : $2; print max}' employees Cho

Tom White Clinton Steve

(59)

Computation

awk performs all arithmetic in floating point Operator Meaning Example

+ Add x + y

- Substract x – y

* Multiply x * y

/ Divide x / y

% Modulus x % y

^ Exponentiation x ^ y

(60)

Computation (continue.)

• Example:

$ nawk '$3 * $4 > 500' filename

(61)

Compound Patterns

• Compounds patterns are expressions that combine patterns with logical operators

Operator Meaning Example

&& Logical AND a && b

|| Logical OR a || b

! NOT !a

(62)

Compound Patterns (continue.)

• Example :

$ nawk '$2 > 5 && $2 <= 15' employees

$

$ nawk '$5 == 1000 || $3 > 50' employees

Steve Ann 9/15/71 202-545-8899 58

$

(63)

Range Patterns

• Range patterns match from the first occurrence of one pattern to the first occurrence of the second pattern, then match for the next occurrence of the second pattern, etc

• If the first pattern is matched and the second pattern is not found, awk will display all lines to the end of the file

• Example :

$ nawk '/Tom/,/Steve/' employees

Tom Billy 4/12/45 913-972-4536 102 Larry White 11/2/54 908-657-2389 54 Bill Clinton 1/14/60 654-576-4114 201 Steve Ann 9/15/71 202-545-8899 58

$

(64)
(65)

Numeric and String Constants

• Numeric constants can be represented as – Integer like 243

– Floating point numbers like 3.14, or

– Numbers using scientific notation like .723E-1 or 3.4

• Strings, such as Hello are enclosed in double quotes

(66)

Initialization and Type Coercion

• A variable can be – a string

– a number, or – both

• When it is set, it becomes the type of the expression on the right-hand side of the equal sign

• Initialized variables have the value zero or the value

" ", depending on the context in which they are used

(67)

User-Defined Variables

• User-defined variables consist of letters, digits, and underscores, and cannot begin with a digit

• Variables in awk are not declared

• If the variable is not initialized, awk initializes string variables to null and numeric variables to zero

• Variables are assigned values with awk's assignment operators

• Example :

$ nawk '$1 ~ /Tom/ {wage = $5 * 40; print wage}' employees

4080

(68)

Increment and Decrement Operators

•The expression x++ is equivalent to x=x+1

•The expression x– is equivalent to x=x-1

•You can use the increment and decrement operators either preceding operator, as in ++x, or after the operator, as x++

{x = 1;y = x++; print x, y}

name=“Nancy” name is string x++ x is a number; x is

initialized to zero and incremented by 1

number=35 number is a number

(69)

User-Defined Variables at the Command line

•A variable can be assigned a value at the command line and passed into awk script

•Example:

$ nawk –F: -f awkscript month=4 year=1999 filename

(70)

The –v Option ( nawk )

•The –v option provided by nawk allows command line arguments to be processed within a BEGIN statement

•For each argument passed at the command line, there must be a –v option preceding it

(71)

Built-in Variables

•Built-in variables have uppercase names. They can be used in expressions and can be reset

Variable Name Variable Contents

ARGC Number of command line argument ARGV Array of command line arguments FILENAME Name of current input file

FNR Record number in current file

FS The input field separator, by default a space

(72)

Built-in Variables (continue.)

Variable Name Variable Contents

NF Number of fields in current record

NR Number of record so far

OFMT Output format for numbers

OFS Output field separator

ORS Output record separator

RLENGTH Length of string matched by match function

RS Input record separator

RSTART Offset of string matched by match function SUBSEP Subscript separator

(73)

Built-in Variables (continue.)

• Example:

$ nawk -F: '$1 == "Steve Ann"{print NR, $1, $2,

$NF}' employees2

5 Steve Ann 9/15/71 $58

(74)

BEGIN Patterns

•The BEGIN pattern is followed by an action block

that is executed before awk processes any lines from the input file

•The BEGIN action is often used to change the value of the built-in variables, OFS, RS, FS, and so forth, to assign initial values to user-defined variables, and to print headers or titles as part of the output

(75)

BEGIN Patterns (continue.)

•Example 1 :

$ nawk 'BEGIN{FS=":"; OFS="\t"; ORS="\n\n"}

{print $1,$2,$3}' employees2

Chen Cho 5/19/63 203-344-1234 Tom Billy 4/12/45 913-972-4536 Larry White 11/2/54 908-657-2389 Bill Clinton 1/14/60 654-576-4114 Steve Ann 9/15/71 202-545-8899

$

(76)

BEGIN Patterns (continue.)

•Example 2 :

$ nawk 'BEGIN{print "Make Year"}'

Make Year

(77)

END Patterns

END patterns do not match any input lines, but executes any actions that are associated with the END pattern. END patterns are handled after all lines of input have been processed

•Examples:

$ nawk 'END{print "The number of records is "

NR }' employees

The number of records is 5

$ nawk '/Steve/{count++}END{print "Steve was found

" count " times."}' employees

Steve was found 1 times.

$

(78)

Output Redirection

•When redirecting output from within awk to a UNIX file, the shell redirection operators are used

•The filename must be enclosed in double quotes

•Once the file is opened, it remains opened until explicitly closed or the awk program terminates

•Example:

$ nawk '$5 >= 70 {print $1, $2 > "passing_file" }' employees

$ cat passing_file

Chen Cho Tom Billy Bill Clinton

(79)

The getline Function

•Reads input from

– The standard input, – a pipe, or

– a file other than from the current file being processed

•It gets the next line of input and sets the NF, NR and the FNR built-in variables

•The getline function returns 1 if a record is found 0 if EOF (end of file) -1 if there is an error

(80)

The getline Function (continue.)

•Examples :

$ nawk 'BEGIN{ "date" | getline d; print d}' employees2

Fri Feb 9 09:39:53 EST 2001

$ nawk 'BEGIN{ "date" | getline d; split( d, mon);

print mon[2]}' employees

Feb $ nawk 'BEGIN{while("ls" | getline) print}

UNIXvarfile varfile2 varfile3 varfile4 varfile5 varfile6

(81)

The getline Function (continue.)

•Examples :

$ nawk 'BEGIN{ print "What is your name?" ;\

> getline name < "/dev/tty"}\

> $1 ~ name {print "Found " name " on line ", NR "."}\

> END{print "See ya, " name "."}' employees

What is your name?

abdul

See ya, abdul.

$

(82)

Pipes

•If you open a pipe in an awk program, you must close it before opening another one

•The command on the right-hand side of the pipe symbol is enclosed in double quotes

(83)

Pipes (continue.)

•Example:

$ cat names

jhon smith alice cheba tony tram dan savage

eliza goldborg

$ nawk '{print $1, $2 | "sort -r +1 -2 +0 -1

"}' names

tony tram jhon smith dan savage

eliza goldborg alice cheba

$

(84)

Closing Files and Pipes

•The pipe remains opened until awk exits

•Statements in the END block will also be affected by the pipe. The first line in the END block closes the pipe

•Example:

( In script)

{ print $1, $2, $3 | “ sort –r +1 –2 +0 –1”}

END{

Close(“sort –r +1 –2 +0 –1”)

<rest of statement> }

(85)

The System Function

•The built-in system function takes a UNIX (operating system command) command as its argument, executes the command, and returns the exit status to the awk program

•The UNIX command must be enclosed in double quotes

•Example:

( In script) {

System ( “cat “ $1 ) System ( “clear” ) }

(86)

If Statement

•Format:

If (expression) {

statement; statement; … }

(87)

If/else Statement

•Format:

{If (expression) {

statement; statement; … }

else {

statement; statement; … }

}

(88)

If/else Statement

•Example:

$ nawk '{if($6 > 50) print $1 "Too high"; \

> else print "Range is OK"}' names

Range is OK Range is OK Range is OK Range is OK Range is OK

$

(89)

If/else else if Statement

•Format:

{ If (expression) {

statement; statement; … else if (expression) {}

statement; statement; … else if (expression) {}

statement; statement; … }

else {

statement; statement; … } }

(90)

Loops

• Loops are used to iterate through the field within a record and to loop through the elements of an array in the END block

(91)

While Loop

•The first step in using a while loop is to set a variable to an initial value

•The do/while loop is similar to the while loop, except that the expression is not tested until the body of the

loop is executed at least once

(92)

While Loop (continue.)

•Example:

$ nawk '{ i = 1; while (i <= NF ) { print NF,

$i; i++}}' names

2 jhon 2 smith 2 alice 2 cheba 2 tony 2 tram 2 dan

2 savage 2 eliza

2 goldborg

(93)

for Loop

for loop requires three expressions within the parentheses: the initialization expression, the test

expression and the expression to update the variables within the test expression

•The first statement within the parentheses of the for loop can perform only one initialization

(94)

for Loop (continue.)

•Example:

$ nawk '{ i = 1; while (i <= NF ) { print NF,

$i; i++}}' names

2 jhon 2 smith 2 alice 2 cheba 2 tony 2 tram 2 dan

2 savage 2 eliza

2 goldborg

(95)

break and continue Statement

•The break statement lets you break out of a loop if a certain condition is true

•The continue statement causes the loop to skip any

statement that follow if a certain condition is true, and returns control to the top of the loop, starting at the next iteration

•Example:

(In Script)

{ if ($1 Peter} {next}}

else {print}

}

(96)

next Statement

•The next statement gets the next line of input from the input file, restarting execution at the top of the awk script

•Example:

(In Script)

1 {for ( x = 3; x <= NF; x++)

if ( $x < 0) {print “Bottomed out!”; break}

# breaks out of the loop }

2 {for ( x = 3; x <= NF; x++ )

if ( $x == 0 ) { print “Get next item”; continue}

# starts next iteration of the for loop

(97)

exit Statement

•The exit statement is used to terminate the awk

program. It stops processing records, but does not skip over an END statement

•If the exit statement is given a value between 0 and

255 as an argument (exit 1), this value can be printed at the command line to indicate success or failure by typing:

(98)

exit Statement (continue.)

•Example:

(In Script) {exit (1)}

(The Command Line)

% echo $status (csh)

$ echo $? (sh/ksh)1 1

(99)

Arrays

•Arrays in awk are called associative arrays because

the subscripts can be either –number, or

–string

•The keys and values are stored internally is a table where a hashing algorithm is applied to the value of the key in question

•An array is created by using it, and awk can infer whether or not is used to store numbers or strings

(100)

Arrays (continue.)

•Array elements are initialized with –numeric value, and

–string value null

•You do not have to declare the size of an aw array

awk arrays are used to collect information from records and may be used for accumulating totals, counting words, tracking the number of times a pattern occurred

(101)

Arrays (continue.)

•Example:

$ cat employees

Chen Cho 5/19/63 203-344-1234 76 Tom Billy 4/12/45 913-972-4536 102 Larry White 11/2/54 908-657-2389 54 Bill Clinton 1/14/60 654-576-4114 201 Steve Ann 9/15/71 202-545-8899 58

$ nawk '{name[x++]=$2};END{for(i=0; i<NR;i++)\

> print i, name[i]}' employees

0 Cho 1 Billy 2 White 3 Clinton 4 Ann

(102)

Arrays (continue.)

•Example:

$ nawk '{id[NR]=$3};END{for(x = 1; x<= NR;

x++)\

> print id[x]}' employees

5/19/63 4/12/45 11/2/54 1/14/60 9/15/71

$

(103)

The special for Loop

•The special for loop is used to read through an

associative array when strings are used as subscripts or the subscripts are not consecutive numbers

•When strings are used as subscripts or the subscripts are not consecutive numbers

(104)

The special for Loop (continue.)

•Example:

$ cat db

Tom Jones Mary Adams Sally Chang Billy Black Tom Savage

$ nawk '/^Tom/{name[NR]=$1};\

> END{for( i=1; i <= NR; i++ )print name[i]}' db

Tom Tom

$

(105)

The special for Loop (continue.)

•Example:

$ nawk '/^Tom/{name[NR]=$1};\

> END{for(i in name){print name[i]}}' db

Tom Tom

$

(106)

Using Strings as Array Subscripts

•A subscript may consist of a variable containing a string or literal string

•If the string is a literal, it must be enclosed in double quotes

•Example

$ cat db

Tom Jones Mary Adams Sally Chang Billy Black Tom Savage

$ nawk -f awkscript db

(107)

Using Field Values as Array Subscripts (continue.)

•Any expression can be used as a subscript in an array.

Therefore, fields can be used

•Example 1:

$ cat db

Tom Jones Mary Adams Sally Chang Billy Black Tom Savage

$ nawk '{count[$2]++}END{for(name in count)print name, count[name] }' db

Chang 1 Black 1 Jones 1 Savage 1 Adams 1

$

(108)

Using Field Values as Array Subscripts (continue.)

•Example 2:

$ cat db

Tom Jones Mary Adams Sally Chang Billy Black Tom Savage

$ nawk '{count[$2]++}END{for(name in count)print name, count[name] }' db

Chang 1 Black 1 Jones 1 Savage 1 Adams 1

$

(109)

Arrays and the split Function

awk’s built-in split function allows you to split string into words and store them in an array

•You can define the field separator or use the value currently stored in FS

•Format:

split(string, array, field separator) split(string, array)

(110)

The delete Function

•The delete function removes an array elements

(111)

Multidimensional Arrays: Nawk

•Multidimensional array is done by concatenating the indices into a string separated by the value of a special built-in variable, SUBSEP

•The SUBSEP variable contains the value “\034”, an unprintable character

•The expression matrix[2,8] is really the array

matrix[2 SUBSEP 8] which evaluates to matrix[“2\0348”]

(112)

ARGV

•Command line arguments are available to nawk with the built-in array called ARGV

•These arguments include the command nawk, but not any of the options passed to nawk

•The index of the ARGV array starts at zero

(113)

ARGC

ARGC is a built-in variable that contains the number of command line arguments

•Example:

$ cat myscript

#This script is called myscript BEGIN{

for ( i = 0; i< ARGC; i++){

printf("argv[%d] is %s\n", i, ARGV[i])

}

printf("The number of arguments, ARGC=%d\n", ARGC)

}

(114)

ARGC (continue.)

•Example :

$ cat ARGVS

#This script is called argvs BEGIN{

for ( i=0; i < ARGC; i++ ) {

printf("argv[%d] is %s\n", i, ARGV[i]) }

printf("The number of arguments, ARGC=%d\n", ARGC) }$

$ nawk -f ARGVS datafile

argv[0] is nawk

argv[1] is datafile

The number of arguments, ARGC=2

$

(115)

ARGC (continue.)

•Example :

$ nawk -f ARGVS datafile "Peter Pan" 12

argv[0] is nawk

argv[1] is datafile argv[2] is Peter Pan argv[3] is 12

The number of arguments, ARGC=4

$

(116)

ARGC (continue.)

•Example :

$ cat arging

#This script is called argin BEGIN{FS=":"; name=ARGV[2]

print "ARGV[2] is "ARGV[2]

}

$1 ~ name { print $0 }

$ nawk -f arging employees2 "Chen Cho"

ARGV[2] is Chen Cho

Chen Cho:5/19/63:203-344-1234:$76 nawk: can't open file Chen Cho

input record number 5, file Chen Cho source line number 1

(117)

ARGC (continue.)

•Example :

$ cat arging1

#This script is called argin BEGIN{FS=":"; name=ARGV[2]

print "ARGV[2] is "ARGV[2]

delete ARGV[2]

}

$1 ~ name { print $0 }

$ nawk -f arging1 employees2 "Chen Cho"

ARGV[2] is Chen Cho

Chen Cho:5/19/63:203-344-1234:$76

(118)

The sub and gsub Functions

•The sub function matches the regular expression for the largest and leftmost substring in the record, and

replaces that substring with the substitution string

•If a target string is specified, the regular expression is matched for the largest and leftmost substring in the target string, and the substring is replaced with the substitution string

•If a target string is not specified, the entire record is used

(119)

The sub and gsub Functions (continue.)

•Format:

sub (regular expression, substitution string);

sub (regular string, substitution string, target string)

•Format:

gsub (regular expression, substitution string) sub (regular string, substitution string,

target string)

(120)

The index Function

•The index function returns the first position where a substring is found in a string

•Offset starts at position 1

(121)

The length Function

•The length function returns the number of characters in a string

•Without an argument, the length function returns the number of characters in a record

•Format:

length (string) length

(122)

The length Function

•Example

$ nawk '{ print length("hello") }' employees

5 5 5 5 5

$

(123)

The substr Function

•The substr function returns the substring of a string starting at a position where the first position is one

•If the length of the substring is given, that part of the string is returned

•If the specified length exceeds the actual string, the string is returned

•Format:

substr (string, starting position) substr (string, starting position, length of string

(124)

The match Function

•The match function returns the index where the regular expression is found in the string, or zero if not found

•The match function sets the built-in variable RSTART to the starting position of the substring within the string, and RLENGTH to the number of characters to the end of the substring

•Format:

match (string, regular expression)

(125)

The match Function (continue.)

•Example :

$ nawk 'END{start=match("Good ole USA", /[A-Z]

+$/);\

> print RSTART, RLENGTH}' employees

10 3

(126)

The split Function

•The split function splits a string into an array using whatever field separator is designated as the third

parameter

•If the third parameter is not provided, awk will use the current value of FS

•Format:

split (string, array, field separator) split (string, array)

(127)

The sprintf Function

•The sprintf function returns an expression in a specified format. It allows you to apply the format specifications of the printf function

•Format:

Variable=sprintf(“string with format specifiers” expr1, expr2, …, exprn)

(128)

The sprintf Function (continue.)

•Example :

$ awk '{line = sprintf ( "%-15s %6.2f ", $1 ,

$3 );\

> print line}' employees

Chen 0.00 Tom 0.00 Larry 0.00 Bill 0.00 Steve 0.00

(129)

Built-in Arithmetic Functions

Name Value returned

atan2(x,y) Arctangent of y/x in the range

cos(x) Cosine of x, with x in radians

exp(x) Exponential function of x, e

int(x) Integer part of x; truncated

toward 0 when x>0

log(x) Natural (base e) logarithm of x

rand() Random number r, where

0<r<1

(130)

Built-in Arithmetic Functions

(continue.)

Name Value returned

sin() Sine of x, with x radians

sqrt(x) Square root of x

srand(x) X is a new seed for rand()

(131)

Integer Function

•The int function truncates any digits to the right of the decimal point to create a whole number. There is no rounding off

(132)

The rand Function

•The rand function generates a pseudorandom floating point number greater than or equal to zero and less than one

•Example

$ nawk '{print rand()}' employees

0.513871 0.175726 0.308634 0.534532 0.94763

$ nawk '{print rand()}' employees

0.513871 0.175726 0.308634 0.534532

(133)

The srand Function

•The srand function without an argument uses the time of day to generate the seed for the rand function

srand(x) uses x as the seed. Normally, x should vary during the run of the program

•Example

$ nawk 'BEGIN{srand()};{print rand()}' employees

0.548753 0.392254 0.972472 0.821497 0.153722

$

(134)

The srand Function (continue.)

•Example

$ nawk 'BEGIN{srand()};{print rand()}' employees

0.548753 0.392254 0.972472 0.821497 0.153722

$ nawk 'BEGIN{srand()};{print rand()}' employees

0.00537126 0.312784 0.23722 0.132023 0.17304

(135)

User-Defined Functions ( nawk )

•A user-defined function can be placed anywhere in the script that a pattern action rule can

•Format

function name (parameter, parameter, parameter, …){

statements

return expression

(the return statement and expression are optional)

}

Références

Documents relatifs

No calculators are to be used on this exam. Note: This examination consists of ten questions on one page. Give the first three terms only. a) What is an orthogonal matrix?

I am thankful to the CNRS–IRD program IRN GandA for its support, to Fabien Pazuki and Yuri Bilu who run it, to Cathy M´ etivier for the administrative help, to Florian Luca for my

For an explanation of this notation see our paper 'Some Problems of Diophantine Ap- proximation (II)', Acta Mathematica, vol.. BOHR and I,ANDAU, GOttinger

For the impact of long-wave radiative exchanges, three modalities: the building exchanges with the sky without taking into account the masks of the environment and

It is very hard to calculate the exact reliability of software, but to come to those results we have included a bunch of factors involved with this process, such as availability

“[m]ore so than any other American author of the twentieth century, Pynchon documents in fiction the tectonic movements of American history, to the extent that

First, in stable coronary artery disease patients the rate and clinical impact of HTPR (High on-Treatment Platelet Reactivity) are reduced (2).. Second, the

To see that a qualitative argument can be used: we know that the completely ordered case minimize the energy and we are interested in the states that have the closer possible