Using awk grep sed
Latest revision as of 15:08, 23 February 2022
grep does not alter a file; it only finds lines that match a pattern. awk and sed, by contrast, are text processors.
awk is mostly used for data extraction and reporting, while sed is a stream editor. Each has its own strengths.
sed
Stream EDitor (sed). Things that you can accomplish using RegEx within the Vi editor on text files can also be accomplished at the command line with sed.
sed -i 's/old-text/new-text/g' input.txt
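- s is the substitute command
- the -i flag tells sed to edit the file in place instead of printing the result
- the trailing g means global: replace every match on a line, not just the first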
The most basic form is to use sed as a simple search and replace.
sed 's/windows/linux/'
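To see what the g flag changes, it can help to feed sed a single line on standard input. The following is a minimal sketch; the sample sentence and the variable names are made up for illustration.

```shell
# Hypothetical sample line; sed reads standard input when no file is given.
line='windows here, windows there'

# Without g, only the first match on each line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/windows/linux/')

# With g, every match on the line is replaced.
all_matches=$(printf '%s\n' "$line" | sed 's/windows/linux/g')

echo "$first_only"    # linux here, windows there
echo "$all_matches"   # linux here, linux there
```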
example: process text file by removing blanks, unwanted lines, and duplicates
Get rid of all lines of text containing numerical stats
sed -i '/[0-9]/d' Razor-Fen.txt
Get rid of all empty lines, including lines that contain only whitespace
sed -i '/^\s*$/d' Razor-Fen.txt
Get rid of duplicate lines that are adjacent to each other (a repeat separated from the original by other lines is kept)
sed -i '$!N; /^\(.*\)\n\1$/!P; D' Razor-Fen.txt
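The duplicate-removal command compares each line only with the line that follows it, so it behaves like uniq rather than a full de-duplication. A small sketch on made-up input (the letters are placeholders, and -i is dropped so nothing on disk is modified):

```shell
# Hypothetical input: "a" repeats right away and then again later on.
deduped=$(printf 'a\na\nb\na\n' | sed '$!N; /^\(.*\)\n\1$/!P; D')

# The adjacent pair collapses to one "a"; the later "a" survives.
printf '%s\n' "$deduped"
```

If repeats that are far apart must go as well, sorting the file first or using the awk approach described further down is the usual alternative.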
example: edit each line of text with character removal or substitution
Just like we would edit lines of text using vim, we can use sed to make modifications on a line by line basis. In this example we are going to process a text file and remove part of each line containing information that we do not need. If we were in the vim editor it would look like this:
:%s/\[202.*PlayerID:\ //
To mimic this using sed it will look like this:
sed -E 's/\[202.*PlayerID:\ //' playerids.txt > playerids2.txt
The output file contains the same number of lines, with each line having had a portion removed.
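The actual layout of playerids.txt is not shown here, so the sketch below assumes a hypothetical log format with a bracketed timestamp followed by "PlayerID: ". It illustrates that the substitution strips everything from the opening bracket through the label and leaves the rest of each line intact.

```shell
# Hypothetical log lines; the timestamps, IDs, and names are invented.
cleaned=$(printf '%s\n' \
  '[2022-02-23 15:08:01] Join PlayerID: 1001 alice' \
  '[2022-02-23 15:09:44] Join PlayerID: 1002 bob' |
  sed -E 's/\[202.*PlayerID: //')

# Same number of lines, each with the leading portion removed.
printf '%s\n' "$cleaned"
```

A plain space is used in the pattern here; it needs no backslash.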
example: cat, head, and tail displaying a portion of a text file
If you have a large text file and want to display only a portion from the middle of it, sed can print a range of lines:
sed -n '5001,6000p' largetextfile.txt
This prints lines 5001 through 6000.
The same range can be selected with head and tail:
head -n 6000 largetextfile.txt | tail -n 1000
tail -n +5001 largetextfile.txt | head -n 1000
With sed you can grab text from the middle without having to pipe.
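One way to check that these selections agree is to run them against a generated file in which line N contains the number N. This is a sketch only; the temporary file stands in for largetextfile.txt. Note that tail needs the +5001 form to start at line 5001, since a plain count selects the last lines of the file instead.

```shell
# seq stands in for a large file: line N simply contains the number N.
tmpfile=$(mktemp)
seq 1 10000 > "$tmpfile"

with_sed=$(sed -n '5001,6000p' "$tmpfile")
with_head_tail=$(head -n 6000 "$tmpfile" | tail -n 1000)
with_tail_head=$(tail -n +5001 "$tmpfile" | head -n 1000)

# All three selections should be identical.
[ "$with_sed" = "$with_head_tail" ] && [ "$with_sed" = "$with_tail_head" ] && echo "same 1000 lines"

rm -f "$tmpfile"
```

On very large files, sed -n '5001,6000p;6000q' stops reading after line 6000 rather than scanning to the end.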
grep
example: rgrep
rgrep is grep -r or recursive grep
If you want to search all text files in all subfolders for a matching string, the syntax might not be what you would expect.
For example, rgrep string *.txt will not search through all text files under the current directory, because the shell expands *.txt to the files in the current directory before rgrep runs. The correct syntax is:
rgrep -s string --include \*.txt
Here is an example that searches several specific file types:
rgrep -i --include \*.h --include \*.cpp CP_Image ~/path[12345]
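The effect of --include can be tried safely in a throwaway directory. The sketch below uses grep -r directly (rgrep may not be installed everywhere) and assumes a grep that supports --include, such as GNU grep; the directory layout and file names are invented for the demonstration.

```shell
# Hypothetical directory tree, built in a temp dir purely for illustration.
demo=$(mktemp -d)
mkdir -p "$demo/sub/deeper"
echo 'needle in top file'    > "$demo/top.txt"
echo 'needle in nested file' > "$demo/sub/deeper/nested.txt"
echo 'needle in a log'       > "$demo/sub/ignored.log"

# -r recurses, -l lists matching file names only, --include filters by file name.
matches=$(cd "$demo" && grep -rl --include='*.txt' needle . | sort)
printf '%s\n' "$matches"

rm -rf "$demo"
```

Quoting the pattern as '*.txt' has the same purpose as the backslash in \*.txt: it keeps the shell from expanding the wildcard before grep sees it.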
awk
The awk utility processes a file line by line, from beginning to end, which makes it useful for transforming data files and generating reports.
It is also well suited to formatting text for the screen or for printing. You can process huge log files and output a readable report.
Format:
awk '/pattern/ { action_to_take; another_action; }' file_to_parse
With a bare print action and no pattern, awk behaves like 'cat': it prints every line of the specified file.
awk '{print}' inventory.txt
GNU Awk has supported "inplace" file editing since version 4.1.0:
awk -i inplace 'BEGIN { FS="/"; } {print $2}' /tmp/biomes.txt
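The contents of /tmp/biomes.txt are not shown, so the following sketch assumes hypothetical slash-separated lines. It runs the same program without -i inplace, which makes it work in any awk and leaves no file modified.

```shell
# Hypothetical slash-separated lines; the names are invented.
second_fields=$(printf '%s\n' 'overworld/forest/oak' 'overworld/desert/cactus' |
  awk 'BEGIN { FS = "/" } { print $2 }')

# FS="/" splits each line on slashes, so $2 is the middle component.
printf '%s\n' "$second_fields"
```

With -i inplace, the printed output replaces the file's contents, so it is worth keeping a backup copy before trying it on real data.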
picking out specific lines in the data or text file
We might have a space-separated data file or a text file with lines of prose. Either way, we can pick out and print entire lines of text if a single word in the line is matched.
awk '/plumbing/ {print}' inventory.txt 
In the inventory file, the second column specifies the department of each supply item, so this prints the lines for items in the plumbing department. Note that the pattern is tested against the whole line, so a line with 'plumbing' in any other column is printed as well. If the file is ordinary prose instead, only lines containing the word 'plumbing' are printed.
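When the match should apply to the department column only, comparing the second field is more precise than a pattern on the whole line. The sketch below assumes a hypothetical three-column inventory (item, department, quantity); the rows are invented.

```shell
# Hypothetical inventory: item, department, quantity (space separated).
inventory=$(mktemp)
cat > "$inventory" <<'EOF'
wrench tools 12
elbow-joint plumbing 40
plumbing-tape hardware 7
EOF

# Regex on the whole line: matches "plumbing" in any column.
anywhere=$(awk '/plumbing/ { print }' "$inventory")

# Comparison on field 2: matches only items in the plumbing department.
by_department=$(awk '$2 == "plumbing" { print }' "$inventory")

printf '%s\n' "$by_department"
rm -f "$inventory"
```

Here the whole-line pattern also picks up plumbing-tape, which belongs to the hardware department, while the field comparison does not.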
unique lines only without changing sort order
Keep only the first occurrence of each line in the text file, preserving the original order.
awk '!seen[$0]++' playeridsduplicates.txt > playerids.txt
If you want them sorted, or are interested in an alternative way of producing only unique lines of text, see: uniq
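The awk one-liner works by counting how often each line has been seen: seen[$0]++ evaluates to 0 the first time a line appears, so !seen[$0]++ is true only then, and awk's default action prints the line. A short sketch on made-up names:

```shell
# Hypothetical input with repeats that are not adjacent.
unique=$(printf 'bob\nalice\nbob\ncarol\nalice\n' | awk '!seen[$0]++')

# First occurrences are kept in their original order.
printf '%s\n' "$unique"
```

Because every distinct line is stored in the seen array, memory use grows with the number of unique lines in the file.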