Often, files will not come pre-formatted in the exact way you need them. Instead, you will need to use command-line tools in bash
to format things in the necessary manner. You are already familiar with extracting columns with cut
, extracting rows with grep
and organizing rows with sort
. Recall that you can direct output to a file using >
and append to an existing file using >>
. Here, we will introduce three methods of restructuring and adding content to files.
Note: These examples make use of the text file fruits.txt
included in the formatting_in_bash
subdirectory. Check the contents of fruits.txt
using less
or cat
, so you know what the file’s structure is. As you read through the examples, make sure to actually execute the commands in your own command-line so you can see the output and practice using the commands.
If you know how many columns are in a file, you can assign each column a variable name, and then refer to the variable while reading line-by-line. Similar to how you can write to files using > outfile.name
, you can read files using < infile.name
Now, say you wanted to loop through the file and perform some operation on each line. You could do the following:
while read fruit_name fruit_color fruit_size; do
echo $fruit_name $fruit_color $fruit_size
done < fruits.txt
Or, you could print a subset of columns, in any order you want:
while read fruit_name fruit_color fruit_size; do
echo $fruit_color $fruit_name
done < fruits.txt
Or adding extra text as well:
while read fruit_name fruit_color fruit_size; do
echo some text here $fruit_color $fruit_name more text here
done < fruits.txt
The output can be directed to a file with >
at the end, like you might expect:
while read fruit_name fruit_color fruit_size; do
echo some text here $fruit_color $fruit_name more text here
done < fruits.txt > output.txt
sed
sed
is a useful substitution command that lets you change one regular expression pattern into another, and has the following general format:
sed 's/<pattern_to_remove>/<pattern_to_add>/g' fruits.txt
sed
searches through the file, and replaces
sed 's/^/prepended_values\t/g' fruits.txt
sed 's/$/\tappended_values/g' fruits.txt
Of course, sed
commands can be chained together with pipes:
sed 's/^/prepended_values\t/g' fruits.txt | sed 's/$/\tappended_values/g'
awk
awk
is a scripting language that allows you to print contents of a file in various ways.
Say you have a tab-delimited file that contains 3 columns You can print only column 1:
awk '{FS="\t"} {print $1}' fruits.txt
Here, FS="\t"
is setting the field separator (delimiter) to tab. This tells awk
to identify columns as being separated by a tab. If the file had columns separated by commas, you would instead use FS=","
. By default, awk
treats whitespace as a delimiter, but specifying it explicitly is a good habit, if you know what it should be.
$1
identifies ‘column one’ or any other column number if you change the digit.
You can also use $0
to refer to all columns at once (i.e., an entire row of a file):
awk '{FS="\t"} {print $0}' fruits.txt
If you wanted to rearrange the columns, you can change the order they are printed:
awk '{FS="\t"} {print $3,$2,$1}' fruits.txt
Note that awk
separates columns by a single space by default. You can change this behavior by setting the Output Field Separator value:
awk '{FS="\t"} {OFS="\t"} {print $3,$2,$1}' fruits.txt
You can add text with awk
by enclosing it within double-quote marks:
awk '{FS="\t"} {OFS="\t"} {print $3,$2,$1,"NewData"}' fruits.txt