LinuxQuestions.org

ernieball 11-26-2010 06:16 AM

advanced text-sorting
 
Hi.

I have a text file with HTML code, which goes like this:

<!***"Item A" >
<Item A html code ...
.../>

<More Item A html code ...
.../>

<!***"Item G" >
<Item G html code ...
.../>

<More Item G html code ...
.../>

..and so on, and I wish to sort alphabetically by the commented "items". The comments are 7 lines apart (the three asterisks are actually part of the comment). How can I do this and also keep the code lines that follow each comment where they belong?

Thanks for any help :)

GrapefruiTgirl 11-26-2010 07:40 AM

bash & grep to alphabetically rearrange HTML file
 
You *really* should be using a proper HTML parser for this, for example one written in Perl. I don't know Perl myself, so here's some bash shell code that does what you're asking. As implied, this is not the right tool for the job, and I would not expect high performance, especially if the input file is very large, because the file is re-read once for every item. But anyway, it is something to toy with until someone suggests a better method:
Code:

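#!/bin/bash

# For each comment line (in sorted order), rescan the file and print its block.
grep '^<!' htmlfile | sort | while IFS= read -r ITEM; do
    printing="off"
    # IFS= and -r keep indentation and backslashes in each line intact.
    while IFS= read -r LINE; do
        if [ "$printing" = "off" ]; then
            # Start printing at the comment line that matches this item.
            if [ "${LINE}" = "${ITEM}" ]; then
                printing="on"; printf '%s\n' "${LINE}"
            fi
        else
            # Stop at the next comment line; otherwise keep printing the block.
            case "${LINE}" in
                '<!'*) break ;;
                *) printf '%s\n' "${LINE}" ;;
            esac
        fi
    done < htmlfile
done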

This does not depend on the 7-line spacing, so the blocks can be of any length. It does depend on the formatting of the input: every comment line must start with '<!' in the first column, no other line may start that way, and no two comment lines may be identical.
Note that the filename htmlfile appears in two places in the script; replace both with the name of your actual input file.
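
Since the script can silently drop or repeat lines when the input does not match the expected format, it may be worth checking that the sorted output still holds exactly the same lines as the input. A minimal sketch, assuming the output was redirected to a second file; same_lines, sortitems.sh and sorted.html are made-up names for illustration, not part of the script above:

```shell
#!/bin/bash

# same_lines A B  (hypothetical helper, not from the thread)
# Succeeds when files A and B contain the same lines, in any order.
# Both files are held in memory, so this suits modest file sizes only.
same_lines() {
    local a b
    a=$(LC_ALL=C sort "$1")
    b=$(LC_ALL=C sort "$2")
    [ "$a" = "$b" ]
}

# Example (file names are assumptions):
#   ./sortitems.sh > sorted.html
#   same_lines htmlfile sorted.html && echo "no lines lost or duplicated"
```

If the check fails, the usual suspects are text before the first comment line, an indented comment line, or two identical comment lines.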

Good luck! I'll look forward to hopefully seeing some better solutions than this.

theNbomr 11-26-2010 08:15 AM

You haven't said what programming language you want to use to implement this, so I can only offer a general strategy. You have a file containing blocks of data, and each block is delimited by a regular pattern. You want to sort the blocks, so you need a way to compare them; most popular programming languages provide a sort() function that accepts such a comparison as an argument. Many languages that are well suited to this kind of task (Perl, AWK, Bash) also let you define the delimiter used to read data one record at a time. So, define the delimiter (probably the string '<!***"Item '), read the file as an array of records, and write a record comparison function that returns -1, 0, or 1 for a given pair of records. Pass the array and that function to sort(). Finally, print the sorted array back to a file or to standard output.
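
For illustration, the strategy above could be sketched in shell with awk roughly as follows. Instead of a hand-written comparison function, it leans on sort(1), which compares the comment lines directly. The function name sort_blocks and the separator byte \001 are assumptions made for this sketch: the byte is assumed never to occur in the file, and the -s (stable) option is a GNU/BSD extension rather than POSIX.

```shell
#!/bin/bash

# sort_blocks FILE  (hypothetical helper name, not from the thread)
# Prints the blocks of FILE ordered by their '<!' comment line.
sort_blocks() {
    local sep
    sep=$(printf '\001')    # record-internal separator; assumed absent from FILE

    # 1. Fold each block into one line, joining its lines with $sep.
    #    Lines before the first comment line are dropped.
    awk -v sep="$sep" '
        /^<!/     { if (rec != "") print rec; rec = $0; next }
        rec != "" { rec = rec sep $0 }
        END       { if (rec != "") print rec }
    ' "$1" |
    # 2. Sort on the first field only (the comment line); -s keeps ties in file order.
    sort -s -t "$sep" -k1,1 |
    # 3. Unfold each record back into its original lines.
    tr "$sep" '\n'
}

# Run only when a filename is given, e.g.: ./sortblocks.sh htmlfile > sorted.html
[ $# -eq 0 ] || sort_blocks "$1"
```

This reads the file once rather than once per item, so it should cope better with large inputs. Each block carries along any blank lines that follow it.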

When you have some code to test, come back here for help with the details.

--- rod.

ernieball 11-26-2010 10:53 AM

Hello, guys! :D

@theNbomr
It makes sense, I guess, but I'm still far too inexperienced with scripting to come up with something like this myself. Anyway, thanks for your attention.

@GrapefruiTgirl
Luckily, my file is pretty simple. It just follows the pattern I described. I copied and pasted your script, and it did exactly what I needed :D

Thank you very much to both of you! This was great help! :D

