LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
02-25-2022, 02:28 PM   #1
babag
Member
Registered: Aug 2003
Posts: 419
bash - remove spaces up to tab in each line


I have text files that I need to prep before bringing them into LibreOffice Calc. The lines I want to edit look similar to this:
Code:
       4096 2012-07-28 19:32:20     /media/babag/Projects_01_A/Audio_Group-00/e135208cdf2d721346eb/
     204096 2010-09-23 14:53:47     /media/babag/Projects_01_A/Audio_Group-00/e135208cdf2d721346eb/update/
The space after the time is a tab that I've inserted as a delimiter. I need to remove all of the spaces up to, but not past, the tab in each line. (I want it to stop at the tab because the directory entries that follow it might have spaces in their names, and those need to be preserved.)

I've looked at sed and awk but haven't been able to figure out how to do this.

This is what I'd like to end up with:
Code:
40962012-07-2819:32:20     /media/babag/Projects_01_A/Audio_Group-00/e135208cdf2d721346eb/
2040962010-09-2314:53:47     /media/babag/Projects_01_A/Audio_Group-00/e135208cdf2d721346eb/update/
edit:
This removes all of the spaces but goes beyond the tab, which would affect file/directory names:
Code:
tr -d " " < infile.txt > outfile.txt
thanks for any help,
babag

Last edited by babag; 02-25-2022 at 03:51 PM.
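A minimal pure-bash sketch of the requirement above, assuming each line has a single tab delimiter: split the line at the first tab with parameter expansion, delete every space from the part before it, and reassemble. The function name strip_before_tab is hypothetical.

```shell
#!/usr/bin/env bash
# strip_before_tab: hypothetical helper. Reads lines on stdin, deletes
# every space before the first tab, and keeps the tab and everything after
# it unchanged. Lines without a tab are passed through untouched.
strip_before_tab() {
  local line head rest
  # IFS= and -r keep the line exactly as read; the || test handles a
  # last line that has no trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line == *$'\t'* ]]; then
      head=${line%%$'\t'*}   # everything before the first tab
      rest=${line#*$'\t'}    # everything after the first tab
      printf '%s\t%s\n' "${head// /}" "$rest"
    else
      printf '%s\n' "$line"
    fi
  done
}

# Usage (file names as in the tr example above):
# strip_before_tab < infile.txt > outfile.txt
```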
 
02-25-2022, 04:22 PM   #2
boughtonp
Senior Member
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,616

There don't appear to be any tabs in your post at all.

Assuming the following is what your input actually looks like...
Code:
       4096	2012-07-28	19:32:20	/media/babag/Projects_01_A/Audio_Group-00/e135208cdf2d721346eb/
     204096	2010-09-23	14:53:47	/media/babag/Projects_01_A/Audio_Group-00/e135208cdf2d721346eb/update/
(In the following examples, "cat -A" is used to visualize tabs as ^I and end of lines as $)
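One way to reproduce the tab-delimited input assumed above, so the commands in this post can be tried: printf expands \t into a real tab, and cat -A (GNU coreutils) then shows each tab as ^I and each line end as $. The file name input.txt matches the examples; the content is the sample data from post #1.

```shell
# Write the two sample lines with genuine tabs between the four fields.
# printf reuses its format string until all arguments are consumed.
printf '%s\t%s\t%s\t%s\n' \
  '       4096' 2012-07-28 19:32:20 '/media/babag/Projects_01_A/Audio_Group-00/e135208cdf2d721346eb/' \
  '     204096' 2010-09-23 14:53:47 '/media/babag/Projects_01_A/Audio_Group-00/e135208cdf2d721346eb/update/' \
  > input.txt

# Visualize the tabs (^I) and line ends ($).
cat -A input.txt
```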

Based on what you posted, it might be enough to simply remove spaces at the start of a string (which can be indicated with "^"), so:
Code:
$ cat -A input.txt
       4096^I2012-07-28^I19:32:20^I/media/babag/Projects_01_A/Audio_Group-00/e135208cdf2d721346eb/$
     204096^I2010-09-23^I14:53:47^I/media/babag/Projects_01_A/Audio_Group-00/e135208cdf2d721346eb/update/

$ sed 's/^  *//' input.txt | cat -A
4096^I2012-07-28^I19:32:20^I/media/babag/Projects_01_A/Audio_Group-00/e135208cdf2d721346eb/$
204096^I2010-09-23^I14:53:47^I/media/babag/Projects_01_A/Audio_Group-00/e135208cdf2d721346eb/update/

But potentially more useful is to remove all spaces that are adjacent to tabs (or to the start/end of the string), e.g.:
Code:
$ cat -A input2.txt
       4096^I2012-07-28^I19:32:20^I   /media/babag/Projects_01_A/Audio_Group-00/e135208cdf2d721346eb/   $
     204096^I2010-09-23^I14:53:47  ^I  /media/babag/Projects_01_A/Audio_Group-00/e135208cdf2d721346eb/update/

$ sed -r 's/ *(^|$|\t) */\1/g' input2.txt | cat -A
4096^I2012-07-28^I19:32:20^I/media/babag/Projects_01_A/Audio_Group-00/e135208cdf2d721346eb/$
204096^I2010-09-23^I14:53:47^I/media/babag/Projects_01_A/Audio_Group-00/e135208cdf2d721346eb/update/
 
02-25-2022, 05:42 PM   #3
babag
Member
Registered: Aug 2003
Posts: 419
Original Poster
Thanks for the response, boughtonp.

Actually, my input looks exactly like what I posted. There are leading spaces, then there is a space between each of the size/date/time listings, followed by a tab, then the directories.

I want to delete the leading spaces and the spaces between size/date/time, then stop there, keeping the tab after the time and any spaces in directory/file names.

I'll see what the things you posted do to a test file.

edit:
Just ran both commands on a test file and each:
Code:
sed 's/^  *//' input.txt | cat -A
sed -r 's/ *(^|$|\t) */\1/g' input2.txt | cat -A
seems to do the opposite of what I was looking for. They both retain the spaces preceding the tab and delete the tab.

thanks again,
babag

Last edited by babag; 02-25-2022 at 05:58 PM.
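A small sketch for checking what the two sed commands from post #2 do on the layout described here (spaces between size, date and time, then a single tab), assuming GNU sed, which accepts -r and \t. On such a line both commands only remove the leading spaces: the spaces between the size, date and time remain, and the tab is kept. The helper names are hypothetical.

```shell
# Hypothetical wrappers around the two commands from post #2 (GNU sed).
strip_leading()   { sed 's/^  *//'; }
strip_near_tabs() { sed -r 's/ *(^|$|\t) */\1/g'; }

# One line in the layout from post #3: spaces inside the first part,
# one tab before the directory.
line=$(printf '       4096 2012-07-28 19:32:20\t/media/babag/Projects_01_A/')

printf '%s\n' "$line" | strip_leading   | cat -A
printf '%s\n' "$line" | strip_near_tabs | cat -A
# Both print: 4096 2012-07-28 19:32:20^I/media/babag/Projects_01_A/$
```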
 
02-25-2022, 05:55 PM   #4
astrogeek
Moderator
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,269
Blog Entries: 24
Maybe something like this...

Code:
cat infile
       4096 2012-07-28 19:32:20 /media/babag/Projects_01_A/Audio_Group-00/e135208cdf2d721346eb/
     204096 2010-09-23 14:53:47 /media/babag/Projects_01_A/Audio_Group-00/e135208cdf2d721346eb/update/


awk 'BEGIN{FS="[\t]";}{gsub(" ","",$1); print $1"\t"$2}' infile
40962012-07-2819:32:20  /media/babag/Projects_01_A/Audio_Group-00/e135208cdf2d721346eb/
2040962010-09-2314:53:47        /media/babag/Projects_01_A/Audio_Group-00/e135208cdf2d721346eb/update/
Of course, this presumes that "your" <tab> is the one and only <tab> in each line.

Last edited by astrogeek; 02-25-2022 at 06:08 PM. Reason: one and only <tab>
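A quick way to see the "one and only <tab>" caveat in action: the one-liner below is the command from this post, wrapped in a function with a hypothetical name, and the directory "my dir/" is a made-up example. Because it prints only $1 and $2, anything after a second tab is dropped.

```shell
# The command from post #4: FS is a tab, spaces are deleted from field 1
# only, and the line is printed as field 1, a tab, field 2.
join_first_field() {
  awk 'BEGIN{FS="[\t]";}{gsub(" ","",$1); print $1"\t"$2}' "$@"
}

# Single tab, space inside the directory name: the space survives.
printf '       4096 2012-07-28 19:32:20\t/media/babag/my dir/\n' | join_first_field | cat -A
# 40962012-07-2819:32:20^I/media/babag/my dir/$

# A second tab: the third field is silently lost.
printf 'a b\tc\td\n' | join_first_field | cat -A
# ab^Ic$
```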
 
02-25-2022, 06:09 PM   #5
babag
Member
Registered: Aug 2003
Posts: 419
Original Poster
Thanks astrogeek! That did it. I like that it's awk too. Also seems to preserve spaces in directories/filenames. And, yes, there's only a single tab per line.

thanks again,
babag

Last edited by babag; 02-25-2022 at 06:10 PM.
 
02-25-2022, 06:11 PM   #6
astrogeek
Moderator
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,269
Blog Entries: 24
You are welcome, glad it helped!
 
02-26-2022, 07:36 AM   #7
boughtonp
Senior Member
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,616

So you do actually want to merge the id, date and time into a single field?!?

That seems like an odd thing to do - makes me think there's perhaps a different underlying issue - but anyway I would do a slight variation of Astrogeek's solution:
Code:
awk 'BEGIN{FS="\t";OFS="\t"}{gsub(" ","",$1); print}' infile
The main benefit is that if there is a third/fourth/etc. field, it still works without the need to add them explicitly.
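To see that benefit, here is the variant above wrapped in a function (the name is hypothetical) and run on a made-up line with three tab-separated fields; all of them come through, and only field 1 loses its spaces.

```shell
# The command from post #7: FS and OFS are both a tab, so when gsub
# changes $1, awk rebuilds the line with tabs between all fields.
strip_first_field() {
  awk 'BEGIN{FS="\t";OFS="\t"}{gsub(" ","",$1); print}' "$@"
}

printf '  a b\tc d\te f\n' | strip_first_field | cat -A
# ab^Ic d^Ie f$
```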

 
02-27-2022, 06:28 AM   #8
MadeInGermany
Senior Member
Registered: Dec 2011
Location: Simplicity
Posts: 2,817
(Only) the last awk solution (with FS=OFS="\t") is okay, because changing a field causes a rebuild of the input line using OFS as a field separator.
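The rebuild described above can be seen by running the same gsub with and without OFS set; the function names and sample line are hypothetical. With the default OFS (a single space), the tab is replaced by a space as soon as $1 changes.

```shell
# Same edit to field 1; only OFS differs.
default_ofs() { awk 'BEGIN{FS="\t"}{gsub(" ","",$1); print}'; }
tab_ofs()     { awk 'BEGIN{FS=OFS="\t"}{gsub(" ","",$1); print}'; }

printf 'a b\tc d\n' | default_ofs | cat -A   # ab c d$
printf 'a b\tc d\n' | tab_ofs     | cat -A   # ab^Ic d$
```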

sed solutions:
If you know there are two fields:
Code:
sed 's/^ *\([^ ]*\) \([^ ]*\)/\1\2/'
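A sketch of how the two-field substitution behaves, assuming "two fields" means two space-separated values before the tab (the function name and sample lines are made up). With a third value, as in the size/date/time lines from post #1, one space is left behind, because the substitution runs only once per line.

```shell
# The two-field substitution from post #8 (hypothetical function name).
# It drops the leading spaces and the first space between the values.
join_two() { sed 's/^ *\([^ ]*\) \([^ ]*\)/\1\2/'; }

printf '  4096 2012-07-28\t/media/babag/my dir/\n' | join_two | cat -A
# 40962012-07-28^I/media/babag/my dir/$

printf '       4096 2012-07-28 19:32:20\t/media/babag/\n' | join_two | cat -A
# 40962012-07-28 19:32:20^I/media/babag/$
```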
If you want to remove spaces before a tab without knowing the format, then you must use a loop:
Code:
sed -e ':L' -e 's/^\([^\t]*\) /\1/; tL'

Last edited by MadeInGermany; 02-27-2022 at 06:30 AM.
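The loop version can be tried the same way. This sketch wraps the command in a function (hypothetical name) and assumes GNU sed, which recognizes \t inside a bracket expression. Note that a line containing no tab at all loses every space, since the whole line then counts as "before the tab".

```shell
# Each pass deletes the last space that comes before the first tab
# ([^\t]* is greedy); "tL" jumps back to :L while a substitution succeeded.
strip_spaces_before_tab() {
  sed -e ':L' -e 's/^\([^\t]*\) /\1/; tL' "$@"
}

printf '       4096 2012-07-28 19:32:20\t/media/babag/my dir/\n' | strip_spaces_before_tab | cat -A
# 40962012-07-2819:32:20^I/media/babag/my dir/$
```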
 
02-27-2022, 06:37 AM   #9
syg00
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,146
Rubbish - read post #5. Why do you always have to pontificate?
 
02-27-2022, 08:07 AM   #10
GazL
LQ Veteran
Registered: May 2008
Posts: 6,915
Here's a solution just using bash commands:
Code:
while read -r num date time dir
do
  printf "%d%s%s\t%s\n" "$num" "$date" "$time" "$dir"
done < /tmp/input.txt
It will cope with spaces in the dirname so long as they're not leading or trailing spaces (which read will strip).

Bash is not the fastest, however, so if you have millions of rows you might want to use one of the other solutions; but you did ask how to do it in bash.
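The loop above can be checked the same way. This sketch wraps it in a function (hypothetical name) and uses read -r so that read leaves any backslashes in file names alone; the default IFS splits on spaces and tabs, and the last variable, dir, receives the rest of the line.

```shell
# read splits each line on the default IFS (space, tab, newline). The last
# variable, dir, gets the remainder of the line, so inner spaces survive,
# but leading and trailing whitespace around it is stripped.
join_fields() {
  local num date time dir
  while read -r num date time dir; do
    printf '%d%s%s\t%s\n' "$num" "$date" "$time" "$dir"
  done
}

printf '       4096 2012-07-28 19:32:20\t/media/babag/my dir/\n' | join_fields
```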
 
02-27-2022, 08:22 AM   #11
pan64
LQ Addict
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,987
Quote:
Originally Posted by GazL
Here's a solution just using bash commands: [...]
Yes, that's what I wanted to say too: split the line on whitespace and reconstruct it, either in bash or in awk/perl/python/sed/whatever, like this:
Code:
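# split only the part before the tab on whitespace, so spaces in directory names are kept
awk -F'\t' '{ split($1, f, " "); printf "%d%s%s\t%s\n", f[1], f[2], f[3], $2 }'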
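One detail worth checking with the awk form: awk's default field splitting treats every run of spaces or tabs as a separator, so $4 holds only the first word of a directory name that contains a space. A sketch comparing that with splitting on the tab first (function names and the sample path are hypothetical):

```shell
# Default FS: the path "/media/babag/my dir/" is split into $4 and $5.
whitespace_split() {
  awk '{ printf "%d%s%s\t%s\n", $1, $2, $3, $4 }' "$@"
}

# Tab first: $2 is the whole path; split() with " " applies the default
# whitespace splitting to the size/date/time part only.
tab_then_whitespace() {
  awk -F'\t' '{ split($1, f, " "); printf "%d%s%s\t%s\n", f[1], f[2], f[3], $2 }' "$@"
}

line=$(printf '       4096 2012-07-28 19:32:20\t/media/babag/my dir/')
printf '%s\n' "$line" | whitespace_split    | cat -A   # 40962012-07-2819:32:20^I/media/babag/my$
printf '%s\n' "$line" | tab_then_whitespace | cat -A   # 40962012-07-2819:32:20^I/media/babag/my dir/$
```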
 
All times are GMT -5.