Replace line with line from other file, if...

mosthated · 05-30-2012, 09:01 AM

Hi all,

I have two files, that look like this:

file 1)
a data
b data
c location://wrong-path-to-file/filename.mp3
d data
e
f data
g data
h location://wrong-path-to-file/filename.mp3
i data

etc.

file 2)
a no data
b no data
c location://right-path-to-file/filename.mp3
d no data
e
f no data
g no data
h location://right-path-to-file/filename.mp3
i no data

etc.

I want to merge the two files, so that:

new file)
a data
b data
c location://right-path-to-file/filename.mp3
d data

etc.

The files are not the same in layout, so I cannot just select the right lines.

What I'm thinking about is something like this:

if line contains "location" in file 1
then select filename.mp3 using sed/cut /...
search for filename.mp3 in file 2 (assuming the filename is unique)
replace the line in file 1 with the line containing filename.mp3 from file 2

I am wondering if this is possible with bash?

Thanks! M0s..

grail · 05-30-2012, 01:03 PM

Are the paths in file 2 always correct?

As you have pointed out, what if the filename is not unique?

Are the a, b, c, etc actually in the file or just there to denote each line?

What if the filename you find is not on the same line in the second file as the first file?

As you can see, there seem to be more questions about your question than answers so far.

Nominal Animal · 05-31-2012, 12:32 AM

I think I'm on the same lines as Grail. If we make some assumptions, and use the first field and the filename as a unique key, in awk this would be:

Code:

awk '#
    BEGIN {
        # Accept any newline convention, removing leading and trailing whitespace.
        RS = "[\t\v\f ]*(\r\n|\n\r|\r|\n)[\t\v\f ]*"

        # Fields are separated by one or more consecutive whitespace characters.
        FS = "[\t\v\f ]+"

        # For output, use Unix newline convention.
        ORS = "\n"

        # For output, use space as a field separator.
        # (This only takes effect when a field of the input record is modified.)
        OFS = " "

        # Ordinal for file being accessed (1, 2, ...)
        FILE = 0
    }

    # Increment FILE for each new file.
    (FNR == 1) { FILE++ }

    # For the first file, only location:// lines are remembered.
    (FILE == 1 && $2 ~ /^location:/) {
        # As the key, take the filename part of the location,
        key = $2
        sub(/^.*\//, "", key)
        # but prepend the first field and a space to it.
        key = $1 " " key

        # Do we already have this key used?
        if (key in location)
            if (location[key] != $2)
                printf("Warning: %s redefined from %s to %s.\n", key, location[key], $2) > "/dev/stderr"

        # Save the entire location under the key.
        location[key] = $2
    }

    # A location field in the second file?
    (FILE == 2 && $2 ~ /^location:/) {
        # Construct the key the same way as above.
        key = $2
        sub(/^.*\//, "", key)
        key = $1 " " key

        # If there is a stored location corresponding to key, replace the second field.
        if (key in location)
            $2 = location[key]

        # Note: To retain the original field separator, you can use
        #   sub(/location:[^\t\v\f ]*/, location[key])
    }

    # Output all records for the second file.
    # The "default rule" is { print $0 }; no need to write it.
    (FILE == 2)
    ' locations-but-no-data data-but-wrong-locations > new-file

The awk command is much, much longer than really needed, but I wanted to make it as readable and robust as possible. The first line begins with a comment because some awk variants do not like an empty first line, and I wanted to have it nicely indented.
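
To illustrate how much shorter it can get, here is a minimal sketch of the same logic with the defaults left alone. It assumes both files use Unix newlines and ordinary whitespace between fields, and it drops the redefinition warning. The function name merge_locations is made up for this example; the first argument is the file with the correct locations, the second is the file with the data.

```shell
# Hypothetical helper: merge_locations GOOD_LOCATIONS_FILE DATA_FILE
merge_locations() {
    awk 'FNR == 1 { file++ }          # count the input files: 1, then 2
        $2 ~ /^location:/ {
            # Key: first field, a space, and the filename part of the location.
            key = $2
            sub(/^.*\//, "", key)
            key = $1 " " key
            if (file == 1)
                loc[key] = $2         # first file: remember the location
            else if (key in loc)
                $2 = loc[key]         # second file: swap in the remembered one
        }
        file == 2                     # print every record of the second file
    ' "$1" "$2"
}

# Usage:
#   merge_locations locations-but-no-data data-but-wrong-locations > new-file
```

The long version above is still the safer choice if the files may contain DOS or old Mac line endings.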

When run with your example files, new-file will contain:

Code:

a data
b data
c location://right-path-to-file/filename.mp3
d data
e
f data
g data
h location://right-path-to-file/filename.mp3
i data

Questions?

mosthated · 05-31-2012, 02:49 AM

Thanks for the answers!

Here are the answers to your questions:

1) Are the paths in file 2 always correct? YES

2) As you have pointed out, what if the filename is not unique? I think I can check that quickly by finding all MP3 files recursively in the parent directory and then using sort with unique. For now, I'm pretty sure they are all unique.

3) Are the a, b, c, etc actually in the file or just there to denote each line? The a, b, c, etc only refer to the lines. They are NOT in the actual file.

4) What if the filename you find is not on the same line in the second file as the first file? As I said before, the files are not the same in layout, nor in order. Therefore I cannot simply swap line X in file 1 with line X in file 2.
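
The uniqueness check from answer 2 could be sketched roughly like this. Both function names are made up for the example. The sketch assumes no file name contains a newline and that find supports -iname (GNU and BSD find do, but it is not in POSIX). Empty output means every name is unique.

```shell
# Hypothetical helper: print each MP3 base name that occurs more than once
# below the given directory (default: the current directory).
duplicate_mp3_names() {
    find "${1:-.}" -type f -iname '*.mp3' |
        sed 's|.*/||' |    # strip the directory part, keep the filename
        sort |
        uniq -d            # print one copy of each repeated name
}

# Hypothetical helper: the same check on the location:// lines of a list file.
duplicate_names_in_list() {
    grep 'location://' "$1" | sed 's|.*/||' | sort | uniq -d
}

# Usage:
#   duplicate_mp3_names /path/to/music
#   duplicate_names_in_list file2
```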

Would this change anything in the script?

Thanks!

M0s..

grail · 05-31-2012, 09:15 AM

So if I am on the right track, something like:

Code:

awk -F/ -v OFS=/ 'FNR==NR{if(/location/)file[$NF]=$(NF-1);next}$NF in file{$(NF-1)=file[$NF]}1' file2 file1 > new_file
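
The one-liner above replaces only the last directory component of the path. As a sketch, a more readable variant of the same idea can key on the filename alone and swap in the whole matching line from file2, so it also copes with paths that differ in more than one directory level. It assumes the filename is the last thing on each location line and is unique within file2. The function name replace_locations is made up for this example.

```shell
# Hypothetical helper: replace_locations FILE_WITH_RIGHT_PATHS FILE_WITH_DATA
replace_locations() {
    awk -F/ '# With "/" as separator, $NF is the text after the last slash.
        # First file: remember each location line under its filename.
        FNR == NR {
            if ($0 ~ /location:\/\//)
                line[$NF] = $0
            next
        }
        # Second file: swap in the remembered line when the filename matches.
        /location:\/\// && ($NF in line) { $0 = line[$NF] }
        # Print every line of the second file, changed or not.
        { print }
    ' "$1" "$2"
}

# Usage:
#   replace_locations file2 file1 > new_file
```

Location lines in file1 whose filename does not appear in file2 are passed through unchanged, so they are easy to find afterwards with grep.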