find max length of characters in a particular field

suneelbabu.etl · 03-08-2014, 06:35 AM

can u tell me how to find max length of characters in a particular field.
my input file is
s.no,sname
1,sd
35,jtud
here max length is 2 right ..
i want output is
s.no,sname
01,sd
35,jtud
like this ..
here we just add the zeros before number if <2..
plz help me/...

Ser Olmy · 03-08-2014, 07:03 AM

The title of your post seems to indicate that you want to find the largest number of characters in a column/field, while in the post itself you seem to want to pad a numeric field with leading zeroes. Which is it?

TB0ne · 03-08-2014, 07:52 AM

Quote:

Originally Posted by suneelbabu.etl

can u tell me how to find max length of characters in a particular field.
my input file is
s.no,sname
1,sd
35,jtud
here max length is 2 right ..
i want output is
s.no,sname
01,sd
35,jtud
like this ..
here we just add the zeros before number if <2..
plz help me/...

We will be happy to help...but you need to spell out your words, and quit using text-speak, and you need to show us what you've written/tried so far, along with answering Ser Olmy's question. Not exactly sure what you're looking for/needing here.

We will NOT write your scripts for you, but will be happy to help.

danielbmartin · 03-08-2014, 11:40 AM

This snippet doesn't solve the problem but provides an idea to guide you.

With this comma-delimited InFile ...

Code:

audi,bentley,bmw
chevrolet,dodge,ford,honda
mazda,nissan,subaru,toyota

... this awk ...

Code:

awk -F, '{for (j=1;j<=NF;j++) {print "In record",NR," field",j,
" name="$j," length of name=",length($j)}}' $InFile >$OutFile

... produced this OutFile ...

Code:

In record 1  field 1  name=audi  length of name= 4
In record 1  field 2  name=bentley  length of name= 7
In record 1  field 3  name=bmw  length of name= 3
In record 2  field 1  name=chevrolet  length of name= 9
In record 2  field 2  name=dodge  length of name= 5
In record 2  field 3  name=ford  length of name= 4
In record 2  field 4  name=honda  length of name= 5
In record 3  field 1  name=mazda  length of name= 5
In record 3  field 2  name=nissan  length of name= 6
In record 3  field 3  name=subaru  length of name= 6
In record 3  field 4  name=toyota  length of name= 6

Daniel B. Martin
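Building on the same idea, and closer to what the thread title asks, the lengths can be compared against a running maximum per column and reported once in an END block. A minimal sketch, assuming comma-delimited input on standard input; max_field_lengths is a hypothetical name, not something from the posts:

```shell
#!/bin/sh
# Hypothetical helper: print the longest value length seen in each
# comma-separated column of standard input.
max_field_lengths() {
  awk -F, '
  {
    for (j = 1; j <= NF; j++)
      if (length($j) > max[j]) max[j] = length($j)   # running maximum per column
    if (NF > maxnf) maxnf = NF                        # widest record seen so far
  }
  END {
    for (j = 1; j <= maxnf; j++)
      print "field", j, "max length", max[j]
  }'
}

printf '%s\n' audi,bentley,bmw chevrolet,dodge,ford,honda mazda,nissan,subaru,toyota | max_field_lengths
# field 1 max length 9
# field 2 max length 7
# field 3 max length 6
# field 4 max length 6
```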

schneidz · 03-08-2014, 06:17 PM

Quote:

Originally Posted by danielbmartin

Code:

audi,bentley,bmw
chevrolet,dodge,ford,honda
mazda,nissan,subaru,toyota

doesn't honda belong in the third line?

suneelbabu.etl · 03-09-2014, 04:54 AM

Hi Daniel B. Martin,

I don't want to display length of the every field,

Quote:

audi,bentley,bmw
chevrolet,dodge,ford
mazda,nissan,subaru

in this example take first field, max length is 9 right.
it check the every line and if it is <9 then add 0(zero) before the first field.
the Out-file is:

Quote:

00000audi,bentley,bmw
chevrolet,dodge,ford
0000mazda,nissan,subaru

like this..
u got my point right,..

grail · 03-09-2014, 10:09 AM

So we get your point, where is your attempt to either alter this script to do as you require or your own script that attempts to do the same?

As said earlier, we are not here to write the scripts for you.

Ser Olmy · 03-09-2014, 10:34 AM

Do you have any previous experience with creating scripts? As grail said, users in this forum will be happy to help you debug and develop a script, but we'd rather not just write one for you.

This is a general "Programming" forum, and your problem could be solved with many different programming/scripting languages. It would be helpful to know which one you'd prefer.

(People just looking for a working solution, rather than help with solving the problem, should preferably pay someone to create and implement that solution.)

TB0ne · 03-09-2014, 10:39 AM

Quote:

Originally Posted by suneelbabu.etl

I don't want to display length of the every field,

in this example take first field, max length is 9 right. it check the every line and if it is <9 then add 0(zero) before the first field. the Out-file is:

like this..u got my point right,..

Again, you need to SPELL OUT YOUR WORDS, and stop using text-speak, and you need to show us what you have done/tried on your own. We will be happy to HELP you, but we ARE NOT going to write your scripts for you.

You've been offered a great hint/solution, but have shown zero effort of your own to implement it, or think about how to modify it to do what you want. Show some effort, and we can help. Show no effort, and there's no point in posting.

suneelbabu.etl · 03-09-2014, 10:59 AM

I don't have experience writing scripts; I am a beginner.
The code I tried is:

Quote:

#!/bin/sh
cd /TESTING/DATA
#cat`ls -lrt *.csv | perl -lane 'print $F[-1]' | tail -1` #recently modified file name
awk -F, '
NR == 1
NR > 1 {
data[NR] = $0
w1[NR] = length($1)
if (length($1) > max) max = length($1)
}
END {
for (i = 2; i <= NR; ++i) {
w = max - w1[i]
if (w > 0) printf "%0" w "d", 0
print data[i]
}
}' test.csv #this code appends only one zero
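For what it is worth, the posted awk program appears to add as many zeros as each line needs: with the two-row sample from the first post the longest s.no is two characters, so a single zero is the expected result. A minimal sketch to check this, wrapping the same program in a hypothetical shell function (pad_buffered) that reads standard input instead of test.csv:

```shell
# Hypothetical wrapper around the awk program above, reading standard input
# instead of test.csv so it can be tried on the sample from the first post.
pad_buffered() {
  awk -F, '
  NR == 1                                # header: pattern with no action, printed unchanged
  NR > 1 {
      data[NR] = $0                      # buffer the whole line
      w1[NR] = length($1)                # width of field 1 on this line
      if (length($1) > max) max = length($1)
  }
  END {
      for (i = 2; i <= NR; ++i) {
          w = max - w1[i]                # zeros this line still needs
          if (w > 0) printf "%0" w "d", 0   # a format of %03d prints 000
          print data[i]
      }
  }'
}

printf 's.no,sname\n1,sd\n35,jtud\n' | pad_buffered
# s.no,sname
# 01,sd
# 35,jtud
```

With a wider value such as 12345 in the data, a line containing 7 would come out as 00007.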

Ser Olmy · 03-09-2014, 11:08 AM

OK, from the snippet you posted it seems that you want to do this:

Parse the file to find the currently largest number of characters in field/column 1
Pad column 1 in each row/line with zeroes to make them all of equal length

Is that correct?

Also, it seems you want the script to process only the most recent .csv file in a given directory, is that so?

suneelbabu.etl · 03-09-2014, 11:24 AM

exactly .. that is only..

danielbmartin · 03-09-2014, 12:19 PM

Quote:

Originally Posted by schneidz

doesn't honda belong in the third line ?

You are right. That embarrassing oversight has been corrected.

A previous post showed OP how to use length(string). This post goes one step further, showing how to "left pad" a field with length(string) and substr(string)... yet refrains from writing the script for him. Let him read, digest, and adapt this code to his application.

With this InFile ...

Code:

audi,bentley,bmw
chevrolet,dodge,ford
honda,mazda,nissan,subaru,toyota

... this awk ...

Code:

awk -F, '{for (j=1;j<=NF;j++) { 
 pnr="00000000"NR;
 pnr=substr(pnr,length(pnr)-4);
print "In record",pnr", field number",j,"contains",$j}}'  \
 $InFile >$OutFile

... produced this OutFile ...

Code:

In record 00001, field number 1 contains audi
In record 00001, field number 2 contains bentley
In record 00001, field number 3 contains bmw
In record 00002, field number 1 contains chevrolet
In record 00002, field number 2 contains dodge
In record 00002, field number 3 contains ford
In record 00003, field number 1 contains honda
In record 00003, field number 2 contains mazda
In record 00003, field number 3 contains nissan
In record 00003, field number 4 contains subaru
In record 00003, field number 5 contains toyota

Daniel B. Martin
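The substr() trick above pads to a fixed width of five. A variant that pads field 1 to any width, here by prepending zeros in a loop instead of slicing a string of zeros with substr(), could look like this. lpad and lpad_first_field are hypothetical names, and the width is assumed to be known in advance (for example 9 for the audi/chevrolet/mazda example):

```shell
# Hypothetical helper: left-pad field 1 of every line to WIDTH characters with zeros.
# Usage: lpad_first_field WIDTH < file
lpad_first_field() {
  awk -F, -v OFS=, -v width="$1" '
  function lpad(s, w,    z) {            # extra parameter z acts as a local variable
      z = s
      while (length(z) < w) z = "0" z    # prepend one zero at a time
      return z
  }
  { $1 = lpad($1, width + 0); print }'   # OFS=, keeps the commas when $0 is rebuilt
}

printf '%s\n' audi,bentley,bmw chevrolet,dodge,ford mazda,nissan,subaru | lpad_first_field 9
# 00000audi,bentley,bmw
# chevrolet,dodge,ford
# 0000mazda,nissan,subaru
```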

Ser Olmy · 03-09-2014, 12:23 PM

Right, then. First, you can indeed get the most recently modified file with ls -lrt *.csv | tail -n 1 (or ls -lt *.csv | head -n 1 for that matter; keep the *.csv, because without file arguments ls -l starts its listing with a "total" line), but there's really no need to invoke perl when cut can do the job just as well:

Code:

#!/bin/sh
input_dir=/TESTING/DATA
input_file=`ls -lt $input_dir/*.csv | head -n 1 | tr -s " " | cut -d " " -f 9`

tr is used to "squeeze" the spaces to make sure the file name is really in column 9. Note that neither the cut version nor the perl version will handle file names containing spaces.
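If only the name is needed, ls can be asked for names alone, which avoids counting columns of the long listing. A sketch, assuming the same directory layout; latest_csv is a hypothetical name. Unlike the cut version, this keeps names containing spaces intact, though the result must then be quoted ("$input_file") when used, and names containing newlines are still not handled:

```shell
# Hypothetical helper: print the most recently modified .csv file in directory $1.
# -t sorts newest first; without -l, ls prints only the names (as the glob
# expanded them, i.e. with the directory prefix), so no tr/cut is needed.
latest_csv() {
  ls -t "$1"/*.csv | head -n 1
}

# Usage: input_file=`latest_csv /TESTING/DATA`
```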

The remainder of your script is written in awk, a language I'm sadly unfamiliar with, but I'll do my best. The first part of the program attempts to check if the length of field 1 is greater than a variable called "max" and updates the variable if it is, but the second part just prints the input line verbatim.

The problem with the current program is twofold:

this operation requires two passes; one to determine the field length, and one to modify the field; while an awk program only processes the input once
the code as it stands doesn't do what it's supposed to
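One caveat on the first point: a single awk run can make two passes if the same file is named twice on the command line. NR counts lines across all input while FNR restarts for each file, so NR == FNR holds only while the first copy is being read. A minimal sketch of the two steps described earlier, using that idiom; pad_to_max is a hypothetical name and, like the original script, it treats line 1 as a header:

```shell
# Hypothetical helper: two passes over the same file in one awk run.
# Pass 1 (NR == FNR) finds the widest field 1; pass 2 pads field 1 with zeros.
pad_to_max() {
  awk -F, -v OFS=, '
  NR == FNR {                             # still reading the first copy of the file
      if (FNR > 1 && length($1) > max) max = length($1)
      next
  }
  FNR == 1 { print; next }                # second copy: print the header unchanged
  {
      while (length($1) < max) $1 = "0" $1   # prepend zeros up to the widest value
      print
  }' "$1" "$1"
}

# Usage: pad_to_max test.csv > padded.csv
```

Because the file is read twice, it has to be a real file rather than a pipe. Buffering the lines in an array and printing them in END, as the original script does, is the other common way around the one-pass limitation.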

As far as I've been able to determine, awk program blocks are executed against each line of the input file, unless you specify a "pattern" (such as BEGIN, which gets executed before any data is processed; and END, which is run when the program runs out of input data) before the code block. The first part can thus be replaced by this:

Code:

max=`awk -F, '
BEGIN \
{
  max = 0
}
  {
    if (max < length($1)) max = length($1)
  }
END \
{
  print max
}' $input_file`

The script simply returns the value of "max" after having scanned through the entire file, line by line. I've chosen to initialize the "max" variable in a BEGIN block, but I'm not sure if that's necessary or not in awk.
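On the BEGIN question: in awk an uninitialized variable behaves as 0 in a numeric context and as the empty string in a string context, so the BEGIN block is optional here. A sketch showing the same scan without it; max_len is a hypothetical name:

```shell
# Hypothetical helper: length of the longest field 1 on standard input.
# There is no BEGIN block: max starts out uninitialized, which compares as 0.
max_len() {
  awk -F, '{ if (max < length($1)) max = length($1) } END { print max + 0 }'
}
# "+ 0" makes empty input print 0 instead of an empty line.

printf '%s\n' audi,bentley chevrolet,dodge mazda,nissan | max_len
# 9
```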

Since the entire thing is in backticks (`), the output is captured by the shell and stored in a shell variable also (confusingly) called "max". You can use this inside a second awk program as $max, as long as you use double quotes instead of single quotes around the awk program, AND escape all other dollar signs (like this: \$).

Edit: A much better way would be to simply end the quoting right before the shell variable and restart it afterwards, like this: awk '{print "'$variable'" }'. *goes back to reading awk articles and howtos*
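Another option is the -v flag, which assigns a shell value to an awk variable before the program starts, so the program text can stay in single quotes and the value needs no special quoting. A sketch, assuming max holds the value captured by the first program; shortfall is a hypothetical name:

```shell
# Hypothetical helper: report how many zeros field 1 of each line would need
# to reach WIDTH. The shell value arrives through -v, so no quote juggling.
shortfall() {
  awk -F, -v width="$1" '{ print $1, "needs", width - length($1), "zeros" }'
}

max=9    # for example, the value captured by the first awk program
printf '%s\n' audi,bentley,bmw chevrolet,dodge,ford | shortfall "$max"
# audi needs 5 zeros
# chevrolet needs 0 zeros
```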

Now try and create the second part of the program.