LinuxQuestions.org


atjurhs 02-15-2017 03:36 PM

sed for relative path to a literal path starting at the top of the directory structure in a text file
 
hi guys,

i have a text file that contains several lines of paths to several files located on my system, with each path on a newline. the paths are relative to my current directory, and i need to make them literal so each of the paths begin at the top level in the directory structure, like this:

Code:

./ABC/more_text.txt
and i want to come out as:
Code:

/topdir/some_text/ABC/more_text.txt
i've been trying to do this with sed commands, and have had some success with the string replacement when the paths do not contain a ./, like this:

Code:

ABC/more_text.txt
and it comes out like this
Code:

topdir/some_text/ABC/more_text.txt
but again the relative strings of paths on each new line really look like this:

Code:

./ABC/more_text.txt
and i want to come out as:
Code:

/topdir/some_text/ABC/more_text.txt
so i've been trying without success:

Code:

sed -i 's/\.\/ABC/\/topdir\/some_text/g' file.txt
and i get from standard i/o
Code:

unknown option to 's
what am i missing?

i've also tried this with awk using the sub command without success.

surely this is a relatively common occurrence with file string manipulation

(unfortunately, the file is on a different machine and i'm not able to transfer it to this machine, but i think i've copied over the exact text)

anyways, thanks for whatever help!

Todd

astrogeek 02-15-2017 04:05 PM

If the file paths.txt contains this...
Code:

./ABC/more_text.txt
./DEF/more_text.txt
GHI/more_text.txt
/JKL/more_text.txt

And you want to change any path that does not already begin at the filesystem root, then...

Code:

$ sed '/^[^\/]/s/^\(\.\/\)\?/\/topdir\/some_text\//' paths.txt
/topdir/some_text/ABC/more_text.txt
/topdir/some_text/DEF/more_text.txt
/topdir/some_text/GHI/more_text.txt
/JKL/more_text.txt

I added the address to skip lines already beginning at root and change anything beginning with ./ or a directory name.

I would also suggest never changing files in-place, at least until you know your sed expression works!
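
A minimal sketch of that caution, assuming GNU sed and a throwaway working directory (the `workdir` variable and the `paths.new` name are only illustrative): write the result to a second file, compare, and replace the original only once the output looks right.

```shell
# Sketch: try the expression on a copy before touching the original.
# workdir and paths.new are illustrative names; the sample lines come from the post above.
workdir=$(mktemp -d)
printf '%s\n' './ABC/more_text.txt' 'GHI/more_text.txt' '/JKL/more_text.txt' > "$workdir/paths.txt"

# 1. Write the result to a new file instead of editing in place.
sed '/^[^\/]/s/^\(\.\/\)\?/\/topdir\/some_text\//' "$workdir/paths.txt" > "$workdir/paths.new"

# 2. Inspect what changed (diff exits non-zero when the files differ).
diff "$workdir/paths.txt" "$workdir/paths.new" || true

# 3. Only when the output looks right, replace the original:
# mv "$workdir/paths.new" "$workdir/paths.txt"
```

If you do want in-place editing later, GNU sed accepts a suffix such as `-i.bak`, which keeps a backup copy of the original file.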

syg00 02-15-2017 04:53 PM

Quote:

Originally Posted by atjurhs (Post 5671116)
and i get from standard i/o
Code:

unknown option to 's
what am i missing?

Try using a different separator character - it may be more obvious what you have omitted.

As for the broader question, I prefer a KISS approach - have separate stanzas (-e or separate sed commands) for each element of the problem. Makes it easier to nut out, and usually easier to spot what is happening.
What happens if a name such as "../../some/file" appears in the list?
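
As a rough illustration of both points (a sketch, not the only way to do it), the substitution can be written with `|` as the separator so that no slash needs escaping, and split into two small `-e` steps. The function name `to_absolute` is made up for this example; `/topdir/some_text` is the prefix from the original question.

```shell
# Sketch: "|" as the s separator, and one -e per step.
# to_absolute is a hypothetical helper name.
to_absolute() {
    # step 1: drop a leading "./"
    # step 2: prefix every line that does not already start with "/"
    sed -e 's|^\./||' \
        -e '/^\//!s|^|/topdir/some_text/|'
}

printf '%s\n' './ABC/more_text.txt' 'GHI/more_text.txt' '/JKL/more_text.txt' | to_absolute
```

Note that this sketch does not normalize `..` components: a line such as `../../some/file` would simply come out as `/topdir/some_text/../../some/file`.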

Turbocapitalist 02-16-2017 02:47 AM

Also, there is an existing utility, realpath, which can resolve the absolute file name when given a relative name. It takes several options so as to make different priorities when working out the absolute path.

See also the utilities basename and dirname, if you need to strip off either just the path or just the filename.
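
A minimal sketch of the realpath approach, assuming the GNU coreutils version (its `-m` option accepts paths that do not exist on the machine running the command). The function name `resolve_paths` is hypothetical; it reads one path per line and resolves each against the base directory given as its argument.

```shell
# Sketch: resolve each listed path against a base directory.
# Assumes GNU realpath; resolve_paths is a made-up helper name.
resolve_paths() {
    base=$1
    while IFS= read -r path; do
        # The subshell keeps the cd from affecting the caller.
        # Quoting "$path" keeps names containing spaces in one piece.
        ( cd "$base" && realpath -m -- "$path" )
    done
}

# Example: resolve two relative names against the current directory.
printf '%s\n' './ABC/more_text.txt' '../some/file' | resolve_paths "$PWD"
```

Unlike a plain text substitution, this also collapses `.` and `..` components, and paths that are already absolute pass through unchanged.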

atjurhs 02-16-2017 09:57 AM

hi astrogeek,

your sed command worked perfectly for the text example that i provided, namely a file with paths that look like this:
Code:

./ABC/more_text.txt
./DEF/more_text.txt
GHI/more_text.txt
/JKL/more_text.txt

however the file has several hundred lines of paths, and it turns out that some of the paths have a whitespace in them (no doubt caused by a dirty rotten PC user). i didn't find this out until attempting to use a path halfway down the output file, and of course things went sideways. so some of the paths will look more like:

Code:

./ABC/more_text (more_texts)-even_more_text.txt
and using your sed command i do get

Code:

/topdir/some_text/ABC/more_text.txt
i know that to get a linux box to read a path with a whitespace i need to precede the whitespace with a \ so that it is taken literally, so i wrote a "tag on" command that pipes the output of your command into a second sed, namely:

Code:

sed '/^[^\/]/s/^\(\.\/\)\?/\/topdir\/some_text\//' pathfile.txt | sed 's/ /\\ /g' > new_pathfile.txt
but the new_pathfile did not contain a \ before the whitespace; the results were the same as if i had just run your command, namely my output was:

Code:

/topdir/some_text/PDQ/more_text(more_texts)-even_more_text.txt
when i needed to have
Code:

/topdir/some_text/PDQ/more_text\ (more_texts)-even_more_text.txt

so the paths could be read and used by a linux box down the road.

i'm also thinking that maybe i should be using some sort of literal string pattern matching, for example, if i read anywhere in my path string:

./ABC then replace it with
Code:

/topdir/some_text/ABC
this should also work for a string that contains a whitespace, for example in the string

./ABC/more text.txt

pattern matching would find the whitespace between more and text and replace it with

./ABC/more\ text.txt

thank you so very much for your help! note, i save off info in threads like this to use under similar circumstances down the road.

Todd
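
For reference, a small sketch of two ways to deal with the spaces, using a sample path from the post above. When typed exactly as shown, the piped `sed 's/ /\\ /g'` does insert a backslash before each space, so a missing backslash in the result suggests the command that actually ran differed from the one posted. The helper name `show_paths` is made up for illustration.

```shell
# Sketch: two ways to cope with spaces in the listed paths.
sample='/topdir/some_text/ABC/more_text (more_texts)-even_more_text.txt'

# 1. Escape each space with a backslash, as in the piped sed above.
escaped=$(printf '%s\n' "$sample" | sed 's/ /\\ /g')
printf '%s\n' "$escaped"

# 2. Or leave the list alone: read it line by line and quote the variable.
# show_paths is a hypothetical helper; swap printf for the real command,
# for example: ls -l -- "$path"
show_paths() {
    while IFS= read -r path; do
        printf 'would open: [%s]\n' "$path"
    done
}
printf '%s\n' "$sample" | show_paths
```

The second form is usually the safer one, because parentheses and other characters are also special to the shell, and a quoted variable covers all of them at once.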

masinick 02-16-2017 10:19 AM

I think that the suggestions made by Turbocapitalist are likely to help you in the long run.

In the scripts that I've written, I break down various sections of a pathname and when I do so, I'm able to make effective use of the basename and dirname utilities, though I don't recall taking advantage of the realpath - that would also be a useful tool to utilize. The suggestions from syg00 are also good - keeping variables and code paths simple and easy to debug makes the code easier to maintain years down the road, whether it's your change or someone who maintains or copies your work.

When I use one of my old scripts or grab some public code, when simple methods are used, it is so much easier to integrate the work of others into what I'm doing, so keeping your code simple and well documented benefits both you and others.

So some easy things to get your file and path parsing algorithms rock solid:

1. Use common utilities - basename, dirname, realpath to extract specific file elements
2. Use variables to store subsets of the full path
3. Keep the overall code simple and well-documented. If it's understandable, documenting it is easier, so that's yet another benefit of simplicity.
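
The three points above can be sketched roughly as follows. The variable names and the sample path are illustrative only, with the prefix taken from the original question.

```shell
# Sketch: split a path with dirname/basename, keep the pieces in
# named variables, and comment each step.
prefix=/topdir/some_text
relative='./ABC/more text.txt'

dir=$(dirname -- "$relative")      # ./ABC
name=$(basename -- "$relative")    # more text.txt
dir=${dir#./}                      # ABC  (drop a leading "./")

absolute=$prefix/$dir/$name
printf '%s\n' "$absolute"          # /topdir/some_text/ABC/more text.txt
```

One case this sketch does not handle: for a bare file name with no directory part, dirname returns `.`, so that case would need its own branch.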

atjurhs 02-16-2017 11:54 AM

thanks masinick, but i'm not at all that good of a coder/script writer.

i do agree with you on making it well documented and keeping code public so that others can make use of it and as a backup copy for yourself, and i do reuse my previous codes, embed or hack on them as necessary.

Todd

Luridis 02-16-2017 01:36 PM

You could go for getting the real path based on the relative path you're already generating. Assuming you have an array of relative paths, including the file name, you can regenerate the text file for each target in whole. With $files as your array...

Code:

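# Assumes files is a bash array of relative paths; quoting keeps names with spaces intact.
for file in "${files[@]}"
do
    # dirname strips the file name; realpath makes the remaining directory absolute
    targetpath=$( realpath "$( dirname "$file" )" )
    # call a function here to write "$targetpath" out to files, or save it to an array until all are ready
done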

dirname gets the path off the file, realpath gets its absolute position in the filesystem. Note that if you're using links you should read the man pages for those; I've not had them follow links before.

If you're willing to put something unique in the file names, you can use find. Let's say all of these files will have "-MARKER-" in the name. You could use find to generate an array.

Code:

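# -print0 and read -d '' keep file names with spaces intact (bash)
find . -name '*-MARKER-*' -print0 | while IFS= read -r -d '' file
do
    call_update_function "$file"   # placeholder for your own function
done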



All times are GMT -5.