removing first line with AWK

koobi · 08-02-2006, 12:10 PM

I want to remove the first line of a CSV using AWK.

I could do this in PHP but then what i'd have to do is read in the CSV, unset the first element of the array and fwrite the data back. but the CSV is HUGE and i don't want to take up all that memory.

so, how would i use awk to remove the first line of a CSV? i suppose we would remove characters till we come across a CR or LF but how would i do that?
also, i'll have to run this as a cron. any guidance?

thanks for your time

acid_kewpie · 08-02-2006, 12:55 PM

Code:

awk '{if (NR!=1) {print}}' filename.csv

koobi · 08-02-2006, 04:01 PM

great, thanks

would you also mind telling me what exactly happens please? or at least refer me to a good site where i can look this up?

does this open filename.csv, remove the first line and leave the file (minus the first line, of course?)

i don't know AWK but it seems like it prints everything but the first line to stdout?

what is NR?
does {print} output to stdout?

thanks for your time

acid_kewpie · 08-02-2006, 04:14 PM

NR = Number of Records, i.e. the count of records awk has read so far. records are separated by a newline by default, so each line is one record. if NR is 1 then awk is on the first line, so skip it. otherwise print the whole line.
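To see NR in action, here is a minimal sketch. The file name sample.csv and its contents are made up for illustration, and the demo runs in a scratch directory created with mktemp -d (assumed to be available, as it is on GNU and BSD systems).

```shell
# Work in a scratch directory so nothing real gets touched.
cd "$(mktemp -d)"

# Hypothetical sample data: one header line and two rows.
printf 'id,name\n1,alice\n2,bob\n' > sample.csv

# Prefix every record with its record number.
awk '{ print NR ": " $0 }' sample.csv
# 1: id,name
# 2: 1,alice
# 3: 2,bob

# The command from the thread: print every record except the first.
awk '{if (NR!=1) {print}}' sample.csv
# 1,alice
# 2,bob
```

Both commands write to stdout only; sample.csv itself is left unchanged.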

koobi · 08-02-2006, 04:26 PM

great, thanks

so then awk accesses filename.csv, reads all its contents to memory, ignores the first line and prints everything else to stdout?

how would i write it back to the same file?

would this work?

Code:

awk '{if (NR!=1) {print}}' filename.csv > filename.csv

ideally, it would read only the first line and delete it along with the CR/LF character at the end so that the rest of the records move up by a row.
can i do that in awk?

acid_kewpie · 08-02-2006, 04:40 PM

you should never write directly back to the same file. if you want to automate this, write to, say, filename.tmp and then rename that file over the original once the command has completed.
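A minimal sketch of that temp-file approach, assuming a POSIX sh. The scratch directory and the sample data are only there so the example can be run safely; in a real job, csv would point at the actual file.

```shell
#!/bin/sh
# Demo setup only: a scratch directory with a small sample file.
workdir=$(mktemp -d)
csv="$workdir/filename.csv"
printf 'header\nrow1\nrow2\n' > "$csv"

# Keep the temp file next to the original so mv is a rename
# on the same filesystem.
tmp="$csv.tmp"

# Write everything but the first record to the temp file, and
# replace the original only if awk exited successfully.
awk '{if (NR!=1) {print}}' "$csv" > "$tmp" && mv "$tmp" "$csv"

cat "$csv"
# row1
# row2
```

For cron, the usual approach is to save this as a script and use absolute paths throughout, since cron jobs do not start in the directory you tested in. Note that every run removes one more line, so the job should only fire when a fresh file with a header has arrived.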

sirclif · 08-02-2006, 08:03 PM

Quote:

so then awk accesses filename.csv, reads all its contents to memory, ignores the first line and prints everything else to stdout?

i don't think this is what happens. the file is not read into memory all at once; awk reads and processes it one record at a time, so you can work on files that are larger than your available ram.

also, being able to replace the file by redirecting the stdout stream may depend on the shell, so i can't say this for all command lines. but if you try this in bash, you will get an empty file. when you redirect the stdout to a file, like '$ command > file.txt', the first thing the shell does, before the command even starts, is create 'file.txt', or empty it if it already exists. so by the time awk tries to read it, it is an empty file.

if you type the command '$ ls > newfile.txt', you will see the file 'newfile.txt' listed in itself.
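The truncation is easy to reproduce on a throwaway file. This is a sketch assuming bash or another POSIX shell with default settings (noclobber off); demo.csv is a made-up name.

```shell
# Work in a scratch directory so nothing real gets truncated.
cd "$(mktemp -d)"
printf 'header\nrow1\nrow2\n' > demo.csv

wc -c < demo.csv     # reports 17 bytes

# The shell empties demo.csv for the redirection before awk starts,
# so awk reads an empty file and prints nothing.
awk '{if (NR!=1) {print}}' demo.csv > demo.csv

wc -c < demo.csv     # reports 0 bytes
```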

ckin2001 · 08-03-2006, 01:48 AM

Awk reads input line by line rather than the whole file at once. The line

awk 'NR>1' filename.csv

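will do the same thing, printing the file contents after record one. A pattern with no action defaults to printing the record, so the explicit if and print are not needed. The NR>1 test is still evaluated for every record, so this is shorter to type but not a timesaver.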
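A quick way to check that the two forms agree, sketched on made-up sample data (demo.csv, long.out and short.out are hypothetical names):

```shell
# Work in a scratch directory with a small sample file.
cd "$(mktemp -d)"
printf 'header\nrow1\nrow2\n' > demo.csv

# Long form from earlier in the thread.
awk '{if (NR!=1) {print}}' demo.csv > long.out

# Short form: a bare pattern, with the default action of printing the record.
awk 'NR>1' demo.csv > short.out

# cmp exits 0 only when the two files are byte-for-byte identical.
cmp long.out short.out && echo "identical output"
```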