
Mastering Control Characters With CTRL-V


There is another way to represent a control character in most shells: the direct way. The CTRL-V key enters a sort of escape mode, which waits for you to type a control character and then prints it to the screen as a single character. For example, pressing Control-V followed immediately by Control-M in bash will print ^M to the Terminal. However, this is not simply the caret (^) followed by a capital M; this is a control character in printed form. You can see proof of this by pressing the Backspace key immediately after pressing Control-M: one backspace deletes both the caret and the capital M at once. This can come in handy when you know what a control character looks like, but cannot locate an escape sequence for it. Be warned, however, that not all programs handle these control character representations in exactly the same way. Some will be happy to accept these representations, while others will balk. Vi, tr, awk, and grep are all happy to accept them as control characters.
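For instance, the carriage-return conversion we are about to perform with an escape sequence could instead be typed with a literal control character. In this sketch (assuming the osxhah.txt playlist file used throughout this section), the ^M was entered by pressing Control-V and then Control-M, not by typing a caret and a capital M:

j0pb12:~ $ tr '^M' '\n' < osxhah.txt | wc -l

10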

So, the “shorthand” for the nasty ^M carriage return is simply \r. We can easily turn this “Mac” file into a “UNIX” file with this quick tr command:

j0pb12:~ $ cat osxhah.txt | tr '\r' '\n' > osxhah-unix.txt

j0pb12:~ $ wc -l osxhah-unix.txt

10 osxhah-unix.txt

This replaced every occurrence of a ^M with a UNIX newline. Notice that the wc -l command now recognizes ten distinct lines in the file, one for each of the nine songs in the play list and one header line. Likewise, we might try to turn this “Mac” file into a “Windows” file with a similar tr command:

j0pb12:~ $ cat osxhah.txt | tr '\r' '\r\n' > osxhah-windows.txt

j0pb12:~ $ wc -l osxhah-windows.txt

0 osxhah-windows.txt

However, this doesn't actually work. Tr performs strictly character-for-character translation; it cannot replace one character with two, and extra characters in the second set are simply ignored. The command above mapped \r back onto \r and left the file unchanged, which is why wc -l still finds no UNIX-recognized line breaks. The file remains incompatible with our UNIX commands. So, we can first convert our song list file to a UNIX-friendly file, or we can simply inform awk about the goofy line delimiters (or, as awk calls them, record separators) in the original file. While we have already seen that field separators are defined via awk's -F option, record separators are defined in a slightly different way, via the -v option and the RS variable name. For example, to properly process our carriage-return delimited file, we could use this command:

j0pb12:~ $ awk -v RS='\r' '{print $0}' osxhah.txt | wc -l

10

By setting the proper record separator, printing each line, and counting the number of results, we can see that awk is now recognizing distinct lines.
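As an aside, pairing RS with awk's output record separator (ORS) accomplishes the Windows conversion that tr could not. A minimal sketch, reusing the osxhah-windows.txt name from earlier; each \r-delimited record is rewritten with a trailing CRLF pair:

j0pb12:~ $ awk -v RS='\r' -v ORS='\r\n' '{print}' osxhah.txt > osxhah-windows.txt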

We can verify awk's new handling of the lines by printing the first field ($1) of each record:

j0pb12:~ $ awk -v RS='\r' '{print $1}' osxhah.txt

Name
Knocked_Down
Fireproof
Echelon
Wildfire
Me
A
Chapter
A
Little

Obviously, awk now properly recognizes line breaks, but judging from the names of the songs listed, there may be a problem with the field delimiters. Awk uses whitespace by default, but viewing the file through cat -t reveals that tabs (not spaces) are being used to delimit fields:

j0pb12:~/Desktop johnnylong$ cat -t osxhah.txt

When executed with this option, cat displays control characters as printed equivalents, and tabs as the ^I character. Setting the record separator as \r and the field separator as \t (tab), we begin to see quite a bit of progress in wrangling the song list file:

j0pb12:~ $ awk -F'\t' -v RS='\r' '{print $1}' osxhah.txt

Name
Knocked_Down
Fireproof
Echelon
Wildfire
Me Against Me
A toast to my former Self
Chapter 2
A Shadow On Me
Little Green Men

Now, it’s taken quite a bit of page space to describe this process, but once you get the hang of thinking in terms of record and field separators, awk can be wielded for insanely powerful results. The nasty song list file can now be tossed around with very little effort. For example, we can poke through the headers to find the ones we’re interested in:

j0pb12:~/ $ awk -F'\t' -v RS='\r' '{print $1}' osxhah.txt | head -1

Name

The head -1 command is used to show only the first line of the output, the header record. To see only the artist and the name of the song, in that order, separated by a colon, we can use a quick awk command:

j0pb12:~/ $ awk -F'\t' -v RS='\r' '{print $2": "$1}' osxhah.txt

Artist: Name
Disciple: Knocked_Down
Pillar: Fireproof
Pillar: Echelon
P.O.D.: Wildfire
Project 86: Me Against Me
Project 86: A toast to my former Self
Project 86: Chapter 2
Project 86: A Shadow On Me
Project 86: Little Green Men

Notice that we’ve re-ordered the fields (artist printed first, followed by song title) and that a colon and a space were inserted after the artist name simply by including them in the print statement, surrounded by quotes. Re-ordering fields is a no-brainer with awk, and inserting text is simple as well, thanks to the print statement. Changing awk’s print statement to '{print $1": "$2}' would print the song title first, followed by the artist name.
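Here is that variation in action, a quick sketch trimmed to the first two records with head (the full output mirrors the listing above with the columns swapped):

j0pb12:~/ $ awk -F'\t' -v RS='\r' '{print $1": "$2}' osxhah.txt | head -2

Name: Artist
Knocked_Down: Disciple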

Simple text insertion is nice, but we can do so much more than simple text-based output. We could, for example, print each line as basic HTML, displaying the artist name in bold text and inserting an HTML break (<BR>) after each line (see Figure 2.47). The backslash character has been used to break a single line of bash into two more visually appealing parts.

Figure 2.47 iTunes Song List To Ugly HTML With awk
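The command behind the figure would look something like the following sketch; this is a reconstruction, and the exact spacing and quoting in the screenshot may differ:

j0pb12:~/Desktop johnnylong$ awk -F'\t' -v RS='\r' \
'{print "<b>"$2"</b> "$1"<BR>"}' osxhah.txt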

This awk command used the print statement to print both static text and awk variables. Static text is enclosed in double-quotes while awk variables are not. It did a great job printing the <b> tag, the artist name, the song title, and the trailing <BR> tag, but the output is ugly. A table would look nicer, but the format of a table is a bit more complex, as shown in this skeleton of an HTML table:

<HTML>

<TABLE BORDER=1>

<TR><TD>Artist </TD><TD>Name </TD></TR>

</TABLE>

</HTML>

To create a table, we need to insert the <TR>, </TR>, <TD>, and </TD> tags on each line of output, and we need to print the <HTML> and <TABLE> tags at the beginning and the </TABLE> and </HTML> tags at the end of the HTML. We could build an HTML file by echoing the opening tags into a file, appending the awk script's output, and then appending the closing tags, using commands like echo alongside awk, but this is tedious. Awk can handle this behavior with the BEGIN and END blocks. The BEGIN block executes once, before awk starts processing the input file, and the END block executes once after the input file has been processed. This allows us to print the “head” and “tail” of our HTML file with ease. The script itself would look like this:

'BEGIN {print"<HTML><TABLE border=1>"}

{print "<tr><td>" $2"</td><td>"$1"</td></tr>"}

END {print "</table></HTML>"}'

This script (like all awk scripts) can be run on one line, and the output written to a file, as shown in Figure 2.48:

Figure 2.48 iTunes Song List To HTML Table With awk
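Assembled into a single line and redirected to a file, the figure's command would look something like this sketch (out.html is an assumed output name):

j0pb12:~/Desktop johnnylong$ awk -F'\t' -v RS='\r' 'BEGIN {print "<HTML><TABLE border=1>"} {print "<tr><td>" $2"</td><td>"$1"</td></tr>"} END {print "</table></HTML>"}' osxhah.txt > out.html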

Having performed this task from the command line, we can write (or cut and paste) a single shell script that accepts the name of an output file and automatically opens the result in Safari:

j0pb12:~/Desktop johnnylong$ cat convert_songlist.sh

#!/bin/sh

# Build a table-based HTML file based on iTunes song list

# If no output file is supplied, call it "out.html"

test $# -eq "0" && OUTPUT_FILE=out.html || OUTPUT_FILE=$1

awk -F'\t' -v RS='\r' '\

BEGIN {print"<HTML><TABLE border=1>"} \

{print "<tr><td>" $2"</td><td>"$1"</td></tr>"} \

END {print "</table></HTML>"}' ~/Desktop/osxhah.txt > "$OUTPUT_FILE"

echo "Writing to $OUTPUT_FILE..."

open "$OUTPUT_FILE"
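Running the script is then a one-liner. For example (songs.html is an assumed output name, and the script is assumed to sit in the current directory):

j0pb12:~/Desktop johnnylong$ sh convert_songlist.sh songs.html

Writing to songs.html...

The final open command hands the generated table to Safari (or whatever application is registered for HTML files).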

Of course, there’s no end to the amount of automation you can employ, including AppleScripting the iTunes file export, but at some point, it’s best to leave well enough alone and move on to more pressing (or interesting) automation tasks. And while there are certainly other ways to produce this song list output, the skills presented apply well to many different tasks, and show how UNIX tools can be used together to obtain great results.

Curl

Curl is a tool designed to transfer data from a server using various protocols including HTTP, HTTPS, FTP, and others. It can run without user interaction as long as the parameters required to complete the transaction are supplied at the command line. The basic use of curl requires the name of a remote file, along with the -o option to save the file under a supplied name, or the -O option to download the file using the name used on the remote server. This example downloads an HTML file from a remote Web server; curl's progress statistics have been trimmed, and the first few lines of the saved file follow:

j0pb12:~ $ curl -O http://gimbo.org.uk/archives/2004/01/mac_os_x_for_ha.html

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"

"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">

<head>

<title>Gimboland: Mac OS X for hackers</title>

Curl has many cool features, including the ability to download groups of files with similar names. This example downloads two JPEG images (01.jpg and 02.jpg) from the Syngress.com Web site:

j0pb12:~ $ curl -O http://www.syngress.com/vegas/[01-02].jpg

[1/2]: http://www.syngress.com/vegas/01.jpg -| 01.jpg

--_curl_--http://www.syngress.com/vegas/01.jpg

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  308k  100  308k    0     0  40983      0  0:00:07  0:00:07 --:--:-- 43476

[2/2]: http://www.syngress.com/vegas/02.jpg -| 02.jpg

--_curl_--http://www.syngress.com/vegas/02.jpg

100  468k  100  468k    0     0  40107      0  0:00:11  0:00:11 --:--:-- 34639

Curl interpreted a portion of the URL ([01-02].jpg) as referring to two specific files, downloading and saving them accordingly. This type of expansion can be performed on any part of a URL, and can use alphabetic characters to walk through a series of letters, allowing you to download a large number of related files with a single command.
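Alphabetic ranges use the same bracket syntax. A hypothetical example (these particular files are assumed for illustration and may not exist on the server):

j0pb12:~ $ curl -O "http://www.syngress.com/vegas/photo[a-c].jpg"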

Lynx

Lynx is a command-line Web browser that can be installed via various package management utilities; see Chapter 5, Penetration Testing, for more details. Like curl, lynx can be used to access Web-based content with relative ease. Lynx has several unique and interesting features, including the ability to display a Web page as it might appear in a browser. This can make it easy to extract, or scrape, data from the Web page. For example, a typical Google results page is fairly uncluttered (see Figure 2.49).

Figure 2.49 Google Results Pages are Prime For Scraping

However, if you are interested in getting to the results line of this page to determine the number of hits for this search term, a curl-saved HTML file can be very difficult to search:

j0pb12:~/ $ curl "http://www.google.com/search?&safe=active&q=%22OS+X+For+Hackers%22" \
-A lynx -s -o output.html

j0pb12:~/Desktop johnnylong$ grep Results output.html

<table width=100% border=0 cellpadding=0 cellspacing=0><tr><td

bgcolor=#3366cc><img width=1 height=1 alt=""></td></tr></table><table width=100% border=0 cellpadding=0 cellspacing=0 bgcolor=#e5ecf9><tr><td bgcolor=#e5ecf9 nowrap><font size=+1>&nbsp;<b>Web</b></font>&nbsp;</td><td bgcolor=#e5ecf9 align=right nowrap ><font size=1>Results <b>1</b>

-<b>10</b> of about <b>151</b> for <b><b>&quot;

The results of this grep statement (cut for brevity) are difficult to parse through, but the displayed version of the page (accessible with lynx’s -dump option) makes the job a snap:

j0pb12:~/ $ lynx -useragent lynx -dump \
"http://www.google.com/search?q=%22OS+X+For+Hackers%22" | grep Results

Web Results 1 - 10 of about 145 for "[10]OS [11]X For [12]Hackers".

By pasting the Google query from the browser into a lynx command, we can get to the Results line very quickly. Feeding this into a quick awk command, we can pull out the total results string:

j0pb12:~/Desktop johnnylong$ lynx -useragent lynx -dump \
"http://www.google.com/search?q=%22OS+X+For+Hackers%22" | \
grep Results | awk '{print $2": "$8}'

Results: 145

The lynx dump output is also handy in that it ends with a numbered list of referenced links, which is easily accessed with a grep statement that displays everything (up to 10,000 lines) after the word References in the output:

j0pb12:~/Desktop johnnylong$ lynx -useragent lynx -dump \
"http://www.google.com/search?q=%22OS+X+For+Hackers%22" | \
grep -A 10000 References

A quick awk statement, a grep to chop out the Google links, and a sort (with -u for unique lines) will show only the unique non-Google links. This type of script can be used to siphon off the outbound links from pretty much any Web page:

j0pb12:~/ $ lynx -useragent lynx -dump \
"http://www.google.com/search?q=%22OS+X+For+Hackers%22" | \
grep -A 10000 References | awk '{print $2}' | \

grep -v "google.com\|cache" | sort -u

http://gimbo.org.uk/archives/2004/01/mac_os_x_for_ha.html

If you see something interesting in the browser view of a Web page and
