LinuxQuestions.org
Old 03-07-2012, 11:32 AM   #1
ni_hao
LQ Newbie
 
Registered: Mar 2012
Posts: 7

Rep: Reputation: Disabled
sort file data


I have always found answers to my questions, but I haven't found one for this yet.
I can only use /bin/sh, so no bash.

I have a text file which consists of

[subject1]
row 1
row 2
row 3

[subject2]
row 4
row 5

[subject3]
row 6
row 7
row 8

The rows belong to the subject above them. Sometimes a subject has 3 rows, sometimes 4 rows (independently).

I want to sort the file based on the alphabetical order of the subjects and of course the original rows should stay together with their subject.

So as an example the output could be (depending on how the subjects sort):
[subject3]
row 6
row 7
row 8

[subject1]
row 1
row 2
row 3

[subject2]
row 4
row 5

Thanks in advance for any solution
 
Old 03-07-2012, 12:32 PM   #2
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Maybe there are more concise and elegant solutions, but... here we go...
Code:
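#!/bin/sh
#
# Split infile into one file per subject, then join the files in sorted order.
mkdir /tmp/my_dir

awk '
# A line that is neither blank nor a "row" line is taken as a subject.
!/row|^$/ {
  
  file = $0
  # Parentheses keep the file name concatenation unambiguous in every awk.
  print > ("/tmp/my_dir/" file)
  # Copy the "row" lines that follow into the same file.
  while ( getline line > 0 ) {
   
    if ( line ~ /row/ )
      print line > ("/tmp/my_dir/" file)
    else
      break
     
  }
}' infile

for file in `echo /tmp/my_dir/*`
do
  cat "$file" >> ./sorted_file
  echo        >> ./sorted_file
done

rm -r /tmp/my_dir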
The first part uses awk to split the input file into multiple files, named after each subject. Since you didn't provide a real example, I had to make some strong assumptions:
  • each section (the subject and its rows) is always separated by a single blank line,
  • the rows contain a common pattern, that never appears in the subject,
  • the subject doesn't contain blank spaces, punctuation or any other special character.

I used a test file like this:
Code:
banana
row1
row2
row3

strawberry
row4
row5

apple
row6
row7
row8
where the common pattern is "row".

The second part of the script just lets the shell sort the file names through the filename expansion in the echo statement. The sorted file is built accordingly.
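
To see that ordering on its own, here is a small sketch. POSIX requires the results of pathname expansion to be sorted according to the collating sequence of the current locale, so the shell hands the names to echo already in order. The directory and file names are made up for the demonstration, and mktemp is assumed to be available (it is not part of POSIX).

```shell
#!/bin/sh
# Demonstration only: the directory and file names are made up.
# Assumption: mktemp is available (it is not part of POSIX).
demo_dir=$(mktemp -d) || exit 1
touch "$demo_dir/banana" "$demo_dir/strawberry" "$demo_dir/apple"

# The shell, not echo, puts the expanded names in order.
sorted_names=$(cd "$demo_dir" && echo *)
echo "$sorted_names"    # prints: apple banana strawberry

rm -r "$demo_dir"
```

Names that mix upper and lower case can sort differently from one locale to another; setting LC_ALL=C gives plain byte order.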

If you are on a Solaris machine, using getline into a variable in awk doesn't work. In this case, try nawk instead. Or gawk if available. As I said, not a great solution, but just an idea!
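
For comparison, a more compact variant might avoid the temporary files altogether. This is only a sketch under extra assumptions: blocks are separated by blank lines, and the data never contains a tab character, because a tab is used to join each block into a single line before sorting. The function name sort_blocks is made up.

```shell
#!/bin/sh
# Sketch: sort blank-line-separated blocks by their first line, no temporary files.
# Assumption: the data contains no tab characters (tabs are used as join markers).
# Step 1: awk in paragraph mode (RS = "") joins each block into one line.
# Step 2: sort orders those lines, which start with the subject.
# Step 3: awk splits every line back into rows, with a blank line between blocks.
sort_blocks() {
  awk 'BEGIN { RS = ""; FS = "\n" }
       { joined = $1
         for (i = 2; i <= NF; i++) joined = joined "\t" $i
         print joined }' |
  LC_ALL=C sort |
  awk '{ if (NR > 1) print ""
         n = split($0, row, "\t")
         for (i = 1; i <= n; i++) print row[i] }'
}

# Usage with the test data from this post, fed through standard input.
sorted_output=$(printf '%s\n' banana row1 row2 row3 '' \
                              strawberry row4 row5 '' \
                              apple row6 row7 row8 | sort_blocks)
echo "$sorted_output"
```

Unlike the temporary-file approach, this places no restriction on the characters in the subject, since the subject is never used as a file name.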
 
Old 03-07-2012, 01:30 PM   #3
ni_hao
LQ Newbie
 
Registered: Mar 2012
Posts: 7

Original Poster
Rep: Reputation: Disabled
Wow, thank you so much! Because I was not clear enough I will have to change it a bit, but, as you wrote, as an idea it is very good! Thanks again.
 
Old 03-07-2012, 03:20 PM   #4
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
You're welcome! When you're done, please post your solution. I'm curious, and it may help others who have a similar problem.
 
Old 03-07-2012, 10:05 PM   #5
makyo
Member
 
Registered: Aug 2006
Location: Saint Paul, MN, USA
Distribution: {Free,Open}BSD, CentOS, Debian, Fedora, Solaris, SuSE
Posts: 735

Rep: Reputation: 76
Hi.

The msort utility can, among other features, sort blocks of text:
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate sort of blocks of text lines, msort
# http://billposer.org/Software/msort.html

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
edges() { local _f _n _l;: ${1?"edges: need file"}; _f=$1;_l=$(wc -l $_f);
  head -${_n:=3} $_f ; pe "--- ( $_l: lines total )" ; tail -$_n $_f ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C msort

FILE=${1-data1}

pl " Input data file $FILE:"
cat $FILE

pl " Results:"
msort --block --quiet --suppress-log --position 1,1 $FILE

exit 0
producing:
Code:
% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
msort 8.44

-----
 Input data file data1:
banana
row1
row2
row3

strawberry
row4
row5

apple
row6
row7
row8

-----
 Results:
apple
row6
row7
row8

banana
row1
row2
row3

strawberry
row4
row5
See link in script. It was available in the Debian repository.

Best wishes ... cheers, makyo

Last edited by makyo; 04-14-2012 at 12:50 PM.
 
Old 03-09-2012, 03:22 PM   #6
ni_hao
LQ Newbie
 
Registered: Mar 2012
Posts: 7

Original Poster
Rep: Reputation: Disabled
Thanks all for responding; it is very much appreciated. I tried to use msort-gui on my Ubuntu (natty/11.04) computer but could not run it, so, as I wrote before, I used colucix's input to create my own script.

The script is as follows:
Code:
#!/bin/sh
#
# script to sort a file based on a certain structure
#
# ----------------------------------[ VARIABLES ]----------------------------------
FILE_IN="inputfile.txt"    # input file name
CONFIG_DIR="/conf"         # directory where the config files are
TEMP_DIR="/tmp"            # temporary directory
#
# just to make sure ;)
# Remove one trailing slash, if present. ${VAR%/} is POSIX sh; ${VAR:0:-1} is a bashism.
# A CONFIG_DIR of "/" becomes "", so "${CONFIG_DIR}/file" still gives a valid path.
TEMP_DIR=${TEMP_DIR%/}
CONFIG_DIR=${CONFIG_DIR%/}

TEMP_DIR=${TEMP_DIR}/my_dir
[ -d "${TEMP_DIR}" ] && rm -r "${TEMP_DIR}"    # if exists: delete temporary directory
mkdir "${TEMP_DIR}"                            # create temporary directory
[ ! -d "${TEMP_DIR}" ] && echo "Could not create ${TEMP_DIR}, script aborted." && exit 1
clear

# abort script if the input file does not exist
[ ! -f "${CONFIG_DIR}/${FILE_IN}" ] && echo "${CONFIG_DIR}/${FILE_IN} not found, nothing to do." && exit 1
FILE_OUT="_${FILE_IN}"                         # temporary output filename
# screen output (printf instead of echo -n, which is not portable in sh)
printf '\033[1;33;40mReading, sorting and writing %s...\033[0m' "${CONFIG_DIR}/${FILE_IN}"
# keep a backup as $FILE_IN_OLD; cp rather than mv, because awk still has to read $FILE_IN
[ ! -f "${CONFIG_DIR}/${FILE_IN}_OLD" ] && cp "${CONFIG_DIR}/${FILE_IN}" "${CONFIG_DIR}/${FILE_IN}_OLD"

awk '                                          # read $FILE_IN and create temporary files
/\[.+\]/ {
  
  gsub(/\[|\]/,"")                             # strip the brackets from the subject
  file = $0
  print "[" $0 "]" > ("/tmp/my_dir/" file)
  while ( getline line > 0 ) {                 # copy lines up to the next empty line
   
    if ( line !~ /^$/ )
      print line > ("/tmp/my_dir/" file)
    else
      break
  }
}' "${CONFIG_DIR}/${FILE_IN}"

rm -f "${CONFIG_DIR}/${FILE_OUT}"              # if $FILE_OUT exists: delete it
touch "${CONFIG_DIR}/${FILE_OUT}"              # create $FILE_OUT

# read and combine files in temporary directory to $FILE_OUT
for file in `echo ${TEMP_DIR}/*`
do
  cat "$file" >> "${CONFIG_DIR}/${FILE_OUT}"   # read file and write to $FILE_OUT
  echo        >> "${CONFIG_DIR}/${FILE_OUT}"   # write empty line to $FILE_OUT
done

rm -r /tmp/my_dir                              # remove temporary directory
echo; printf '\033[1;36;40mSorted file created: %s\033[0m\n' "${CONFIG_DIR}/${FILE_OUT}"
echo
I am an AWK rookie, so unfortunately I do not understand everything that is written in the AWK part of the script. It is easy to copy and paste someone else's source code, but I think it is much better to understand it.

The AWK part contains "/tmp/my_dir/" twice. I tried to change it to the variable ${TEMP_DIR}, but that resulted in an error. My question is whether, and how, I can replace "/tmp/my_dir/" with the variable ${TEMP_DIR}.
 
Old 03-10-2012, 01:09 AM   #7
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Notice that the awk code is written inside single quotes: this prevents the shell from performing parameter expansion on it. The most straightforward method to pass an external variable to awk (provided your awk version supports it) is the -v option. Example:
Code:
awk -v dir="${TEMP_DIR}" '
/\[.+\]/ {
  
  gsub(/\[|\]/,"")
  file = $0
  # Parentheses keep the file name concatenation unambiguous in every awk.
  print "[" $0 "]" > (dir "/" file)
  while ( getline line > 0 ) {
   
    if ( line !~ /^$/ )
      print line > (dir "/" file)
    else
      break
  }
}' "${CONFIG_DIR}/${FILE_IN}"
Regarding other methods to pass shell variables to the awk code, please see http://www.gnu.org/software/gawk/man...hell-Variables. Hope this helps.
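
As a rough illustration of those other methods, the sketch below puts -v next to two alternatives that POSIX awk also provides: the ENVIRON array and a var=value operand on the command line. The TEMP_DIR value and the "apple" file name are just examples.

```shell
#!/bin/sh
# Sketch: three ways to hand a shell value to awk. The values are examples only.
TEMP_DIR="/tmp/my_dir"

# 1. -v assigns before the program starts, so the value is visible in BEGIN.
via_v=$(awk -v dir="$TEMP_DIR" 'BEGIN { print dir "/apple" }')

# 2. A variable in the environment of awk can be read through the ENVIRON array.
via_environ=$(TEMP_DIR="$TEMP_DIR" awk 'BEGIN { print ENVIRON["TEMP_DIR"] "/apple" }')

# 3. A var=value operand is assigned when awk reaches it in the argument list,
#    so it is set for the input that follows ("-" is standard input), not in BEGIN.
via_operand=$(echo apple | awk '{ print dir "/" $0 }' dir="$TEMP_DIR" -)

echo "$via_v"; echo "$via_environ"; echo "$via_operand"
```

All three print /tmp/my_dir/apple. One difference to keep in mind: with -v and with operands, awk interprets backslash escapes in the value, whereas ENVIRON passes it through unchanged.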
 
Old 03-10-2012, 04:07 AM   #8
ni_hao
LQ Newbie
 
Registered: Mar 2012
Posts: 7

Original Poster
Rep: Reputation: Disabled
Awesome, thanks. The awk I use does recognize the -v option, so it worked.
I will read the link you posted to learn more about awk. As far as I can tell, a lot is possible with awk, but I have a long way to go to understand it.

Now I have to find out how to sort the data in another file:
[banana]
color=
length=
price=

[strawberry]
length=
price=

The blocks in this file should be sorted by the value of "length=".
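
One possible approach, sketched here under several assumptions: blocks are separated by blank lines, the length values are plain numbers, and the data contains no tab characters. The sample values and the function name sort_by_length are invented for the illustration, since the file above shows no values.

```shell
#!/bin/sh
# Sketch: sort "[name]" blocks by the numeric value of their "length=" line.
# Assumptions: blank lines separate blocks, length values are numbers, and the
# data contains no tab characters (tabs are used as join markers).
# Step 1: awk in paragraph mode writes "key<TAB>line1<TAB>line2..." per block.
# Step 2: sort orders the blocks numerically on the key.
# Step 3: awk drops the key and restores one row per line.
sort_by_length() {
  awk 'BEGIN { RS = ""; FS = "\n" }
       { key = ""; joined = $1
         for (i = 1; i <= NF; i++) if ($i ~ /^length=/) key = substr($i, 8)
         for (i = 2; i <= NF; i++) joined = joined "\t" $i
         print key "\t" joined }' |
  LC_ALL=C sort -t "$(printf '\t')" -k 1,1n |
  awk '{ if (NR > 1) print ""
         n = split($0, row, "\t")
         for (i = 2; i <= n; i++) print row[i] }'
}

# Usage with invented sample values.
sorted_output=$(printf '%s\n' '[banana]' 'color=yellow' 'length=20' 'price=3' '' \
                              '[strawberry]' 'length=3' 'price=5' | sort_by_length)
echo "$sorted_output"
```

With these sample values the strawberry block (length=3) comes out before the banana block (length=20). Blocks with a missing or empty length are treated as 0 by the numeric sort and end up first.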
 
  

