LinuxQuestions.org
Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

04-12-2023, 06:30 PM   #1
sharky
Member
 
Registered: Oct 2002
Posts: 569

Rep: Reputation: 84
text munging problem


The input data is a text file of about 300 KB, almost 7000 lines.

I'm using Python 3.

I want to append _33 to every term in the text file that contains MAX (just an example).

For example:

"left -21 exMAX_14" would become "left -21 exMAX_14_33"

There are several search terms besides MAX, but the appended suffix will always be _33.
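A minimal Python 3 sketch of the requirement as stated so far. It assumes a "term" is any run of non-whitespace characters and that the search terms are plain strings; `SEARCH_TERMS`, `append_suffix` and the extra term MIN are hypothetical, only MAX and _33 come from the post.

```python
import re

# Hypothetical list of search terms; only "MAX" comes from the post.
SEARCH_TERMS = ["MAX", "MIN"]
SUFFIX = "_33"

# A "term" is taken to be a run of non-whitespace characters (\S).
# The pattern matches a whole term containing any one of the search terms.
TERM_RE = re.compile(
    r"\S*(?:" + "|".join(re.escape(t) for t in SEARCH_TERMS) + r")\S*"
)


def append_suffix(line: str) -> str:
    """Append SUFFIX to every term in line that contains a search term."""
    return TERM_RE.sub(lambda m: m.group(0) + SUFFIX, line)


print(append_suffix("left -21 exMAX_14"))  # left -21 exMAX_14_33
```

Building the alternation with `re.escape` keeps the search terms literal, so adding more of them later only means extending the list.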
 
04-12-2023, 06:48 PM   #2
syg00
LQ Veteran

Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,151

Rep: Reputation: 4125
I'm sure Python can handle it, but this is what sed was built for.
Code:
sed 's/MAX/MAX_33/' some.file
This presumes your description of the requirements is accurate and complete. You can do several substitutions in one invocation, use regexes ...
It looks simple, but it can be sophisticated.

It can be done in place, but I always redirect to a new file for sanity.
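To see what that one-liner does in Python terms: `s/MAX/MAX_33/` without the `g` flag replaces only the first MAX on each line, and the suffix goes where MAX was found rather than at the end of the term. A rough equivalent (the name `sed_like` is just for illustration):

```python
def sed_like(line: str) -> str:
    """Mimic sed 's/MAX/MAX_33/' on one line.

    Without the g flag, sed replaces only the first match on each line,
    and the replacement goes where MAX was found, not at the end of the term.
    """
    return line.replace("MAX", "MAX_33", 1)


print(sed_like("left -21 exMAX_14"))  # left -21 exMAX_33_14
print(sed_like("MAX_A MAX_B"))        # MAX_33_A MAX_B
```

So it handles a bare `MAX` but turns `exMAX_14` into `exMAX_33_14`, which is the distinction the next post raises.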

Last edited by syg00; 04-12-2023 at 06:49 PM. Reason: last sentence
 
1 member found this post helpful.
04-12-2023, 07:14 PM   #3
sharky
Member
 
Registered: Oct 2002
Posts: 569

Original Poster
Rep: Reputation: 84
I was thinking Python would be the better solution because I want to append _33 to every term that contains MAX, not directly after MAX as your sed example does. MAX could be at the beginning, middle, or end of the term, but I always want to append at the end of the term.

Examples:
MAX -> MAX_33
MAX_XXX -> MAX_XXX_33
XXX_MAX23 -> XXX_MAX23_33
N0MIX -> N0MIX

For that, sed might still be the better solution, but I am more familiar with Python.
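The examples above amount to a per-term rule: if MAX occurs anywhere in a term, append _33 to the end of that term. A minimal Python sketch, assuming terms are separated by whitespace (`tag_term` and `tag_line` are illustrative names):

```python
def tag_term(term: str, key: str = "MAX", suffix: str = "_33") -> str:
    """Append suffix to term if key occurs anywhere in it."""
    return term + suffix if key in term else term


def tag_line(line: str) -> str:
    """Apply tag_term to every whitespace-separated term on a line."""
    return " ".join(tag_term(t) for t in line.split())


for term in ["MAX", "MAX_XXX", "XXX_MAX23", "N0MIX"]:
    print(term, "->", tag_term(term))
```

One design caveat: `str.split()` with no argument collapses runs of spaces and tabs, so `tag_line` re-joins every line with single spaces. If the original spacing matters, a regex substitution over `\S+` avoids that.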
 
04-12-2023, 07:52 PM   #4
syg00
LQ Veteran

Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,151

Rep: Reputation: 4125
Certainly simple enough to accommodate. Good luck with your Python adventures. Gotta get round to getting into it one day ...
 
1 member found this post helpful.
04-12-2023, 09:50 PM   #5
NevemTeve
Senior Member

Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,880
Blog Entries: 1

Rep: Reputation: 1871
Code:
sed 's/^.*MAX.*$/&_33/' inputfile >outputfile
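This command works on whole lines: `^.*MAX.*$` matches an entire line that contains MAX, and `&` (the whole match) is written back followed by _33, so the suffix is added once, at the end of the line. A rough Python equivalent (function name hypothetical), which reproduces the output the OP shows in post #8:

```python
import re

# ^.*MAX.*$ matches a whole line that contains MAX; \g<0> is the whole match,
# the Python spelling of sed's "&".
WHOLE_LINE = re.compile(r"^.*MAX.*$")


def sed_whole_line(line: str) -> str:
    """Mimic sed 's/^.*MAX.*$/&_33/' on one line.

    Works with or without a trailing newline, because $ also matches
    just before a final "\n".
    """
    return WHOLE_LINE.sub(r"\g<0>_33", line)


print(sed_whole_line("not MAX_PIN1 MAX_drawing bad_MAX_PIN"))
# not MAX_PIN1 MAX_drawing bad_MAX_PIN_33
```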
 
1 member found this post helpful.
04-12-2023, 10:17 PM   #6
michaelk
Moderator

Registered: Aug 2002
Posts: 25,781

Rep: Reputation: 5936
There may be a better way using awk, but:

Code:
awk ' { if ( $0 ~ /MAX/ )  printf "%s_33\n",$0; else printf "%s\n",$0 }' /path/to/file

Last edited by michaelk; 04-12-2023 at 10:19 PM.
 
04-12-2023, 11:14 PM   #7
syg00
LQ Veteran

Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,151

Rep: Reputation: 4125
I'm guessing that isn't what the OP is looking for. It seems the OP wants to learn Python; after they furnish a working Python solution, I'll provide a trivial sed one.
 
04-12-2023, 11:57 PM   #8
sharky
Member
 
Registered: Oct 2002
Posts: 569

Original Poster
Rep: Reputation: 84
Quote:
Originally Posted by NevemTeve
Code:
sed 's/^.*MAX.*$/&_33/' inputfile >outputfile
This is close.

Input:
Quote:
not MAX_PIN1 MAX_drawing bad_MAX_PIN
select -interact -not MAX_PIN1 bad_MAX_PIN -outputlayer MAX_PIN
select -interact METAL2_slot M2_PIN1 M2_PIN1_slot
select -interact M2_PIN1 METAL2_slot M2_slot_PIN
Output:
Quote:
not MAX_PIN1 MAX_drawing bad_MAX_PIN_33
select -interact -not MAX_PIN1 bad_MAX_PIN -outputlayer MAX_PIN_33
select -interact METAL2_slot M2_PIN1 M2_PIN1_slot
select -interact M2_PIN1 METAL2_slot M2_slot_PIN
Needed:
Quote:
not MAX_PIN1_33 MAX_drawing_33 bad_MAX_PIN_33
select -interact -not MAX_PIN1_33 bad_MAX_PIN_33 -outputlayer MAX_PIN_33
select -interact METAL2_slot M2_PIN1 M2_PIN1_slot
select -interact M2_PIN1 METAL2_slot M2_slot_PIN
The _33 should be appended to every term that contains MAX, not just once at the end of the line.
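Comparing the Output and Needed blocks above, the MAX test has to be applied to each term rather than to the whole line. One way to express that in Python is `re.sub` with a function as the replacement: match every run of non-whitespace and decide per match whether to add the suffix. A sketch, assuming whitespace-separated terms; `fix_line` and `fix_file` are hypothetical helpers, and `fix_file` writes to a new file rather than editing in place:

```python
import re

TERM = re.compile(r"\S+")  # a "term" is assumed to be a run of non-whitespace


def fix_line(line, key="MAX", suffix="_33"):
    """Append suffix to each term containing key; keep the spacing untouched."""
    def repl(m):
        term = m.group(0)
        return term + suffix if key in term else term
    return TERM.sub(repl, line)


def fix_file(src, dst):
    """Write a fixed copy of src to dst (a new file, not in place)."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(fix_line(line))


sample = """\
not MAX_PIN1 MAX_drawing bad_MAX_PIN
select -interact -not MAX_PIN1 bad_MAX_PIN -outputlayer MAX_PIN
select -interact METAL2_slot M2_PIN1 M2_PIN1_slot
select -interact M2_PIN1 METAL2_slot M2_slot_PIN
"""
for line in sample.splitlines():
    print(fix_line(line))
```

Because only the matched terms are rewritten, the spacing between them (and the newline at the end of each line read from a file) is preserved.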
 
04-13-2023, 02:29 AM   #9
NevemTeve
Senior Member

Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,880
Blog Entries: 1

Rep: Reputation: 1871
With GNU sed:
Code:
sed -E '/MAX/{ s/\b(\w*)(_33)*\b/\1_33/g }'
But most certainly, you can do it with Python. Some help can be found here:
https://docs.python.org/3/library/re.html
https://www.hackerrank.com/domains/p...5B%5D=py-regex

Last edited by NevemTeve; 04-13-2023 at 02:31 AM.
 
04-13-2023, 09:50 AM   #10
sharky
Member
 
Registered: Oct 2002
Posts: 569

Original Poster
Rep: Reputation: 84
Quote:
Originally Posted by NevemTeve
With GNU sed:
Code:
sed -E '/MAX/{ s/\b(\w*)(_33)*\b/\1_33/g }'
But most certainly, you can do it with Python. Some help can be found here:
https://docs.python.org/3/library/re.html
https://www.hackerrank.com/domains/p...5B%5D=py-regex
Appreciate the helpful links.

The sed command you suggest appends _33 to every term on any line that contains MAX. Only the terms containing 'MAX' should be appended.

Quote:
not_33 MAX_PIN1_33 MAX_drawing_33 bad_MAX_PIN_33
select_33 -interact_33 -not_33 MAX_PIN1_33 bad_MAX_PIN_33 -outputlayer_33 MAX_PIN_33
select -interact METAL2_slot M2_PIN1 M2_PIN1_slot
select -interact M2_PIN1 METAL2_slot M2_slot_PIN
 
04-13-2023, 10:03 AM   #11
michaelk
Moderator

Registered: Aug 2002
Posts: 25,781

Rep: Reputation: 5936
Code:
txt = "select -interact -not MAX_PIN1 bad_MAX_PIN -outputlayer MAX_PIN"
terms = txt.split(" ")
# Append "_33" to each term that contains "MAX"; leave the other terms unchanged
out = [t + "_33" if "MAX" in t else t for t in terms]
print(" ".join(out))
Just throwing something out...
 
04-13-2023, 10:35 AM   #12
NevemTeve
Senior Member

Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,880
Blog Entries: 1

Rep: Reputation: 1871
@OP Would you mind showing what you have done so far and where you're stuck?

Last edited by NevemTeve; 04-13-2023 at 11:34 AM.
 
04-13-2023, 12:05 PM   #13
teckk
LQ Guru

Registered: Oct 2004
Distribution: Arch
Posts: 5,152
Blog Entries: 6

Rep: Reputation: 1835
cat 1test.txt
Code:
left -21 exMAX_14
left -22 exMAX_15
left -23 exMAX_16
left -24 exMAX_17

Code:
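def green(text):
    # Wrap text in ANSI escape codes so it prints in green on a terminal
    return '\033[0;32m' + text + '\033[0m'

with open('1test.txt', 'r') as f:
    for i in f:
        # Drop the trailing newline, then print the line with a green "_33" appended
        # (to every line, whether or not it contains MAX)
        print(i.rstrip()+f'{green("_33")}')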
 
04-13-2023, 12:10 PM   #14
teckk
LQ Guru

Registered: Oct 2004
Distribution: Arch
Posts: 5,152
Blog Entries: 6

Rep: Reputation: 1835
Actually, that is a little confusing with all the f's. How about:
Code:
with open('1test.txt', 'r') as x:
    for i in x:
        print(i.rstrip()+f'{green("_33")}')
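The loop above appends _33 to every line, whether or not it contains MAX. A variation on it (a sketch; `preview` is a hypothetical helper, and the per-term MAX test follows the OP's requirements from post #8) colours only the suffixes that are actually added, which can make a quick visual check of the output easier:

```python
def green(text):
    # ANSI escape codes: switch the terminal colour to green, then reset it
    return '\033[0;32m' + text + '\033[0m'


def preview(line, key="MAX", suffix="_33"):
    """Append suffix, coloured green, to each whitespace-separated term containing key."""
    return " ".join(t + green(suffix) if key in t else t for t in line.split())


# Stand-in for the lines of 1test.txt; with the real file, loop over
# open('1test.txt') instead.
lines = [
    "left -21 exMAX_14",
    "select -interact M2_PIN1 METAL2_slot",
]
for i in lines:
    print(preview(i))
```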
 
04-14-2023, 11:08 AM   #15
danielbmartin
Senior Member

Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
With this InFile ...
Code:
not MAX_PIN1 MAX_drawing bad_MAX_PIN
select -interact -not MAX_PIN1 bad_MAX_PIN -outputlayer MAX_PIN
select -interact METAL2_slot M2_PIN1 M2_PIN1_slot
select -interact M2_PIN1 METAL2_slot M2_slot_PIN
... this sed ...
Code:
sed -r 's/(MAX[_[:alnum:]]*)/\1_33/g' "$InFile" > "$OutFile"
... produced this OutFile ...
Code:
not MAX_PIN1_33 MAX_drawing_33 bad_MAX_PIN_33
select -interact -not MAX_PIN1_33 bad_MAX_PIN_33 -outputlayer MAX_PIN_33
select -interact METAL2_slot M2_PIN1 M2_PIN1_slot
select -interact M2_PIN1 METAL2_slot M2_slot_PIN
I cannot take credit for this solution. A Google search led me to a similar solution written by mashuptwice at https://stackoverflow.com/questions/71731677/
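The sed expression starts each match at MAX and extends it over the letters, digits and underscores that follow (`[_[:alnum:]]*`), so `\1_33` lands at the end of the term while a prefix such as `bad_` is left alone. A Python translation, assuming ASCII input so that `[:alnum:]` can be spelled `[0-9A-Za-z]` (`add_suffix` is an illustrative name):

```python
import re

# sed's [_[:alnum:]] written as an explicit class; assumes ASCII input,
# since [:alnum:] in sed is locale-dependent.
MAX_TERM = re.compile(r"(MAX[_0-9A-Za-z]*)")


def add_suffix(line: str) -> str:
    # Same idea as: sed -r 's/(MAX[_[:alnum:]]*)/\1_33/g'
    return MAX_TERM.sub(r"\1_33", line)


in_file = [
    "not MAX_PIN1 MAX_drawing bad_MAX_PIN",
    "select -interact -not MAX_PIN1 bad_MAX_PIN -outputlayer MAX_PIN",
    "select -interact METAL2_slot M2_PIN1 M2_PIN1_slot",
    "select -interact M2_PIN1 METAL2_slot M2_slot_PIN",
]
for line in in_file:
    print(add_suffix(line))
```

One caveat shared with the sed version: if MAX is followed by a character outside that class, for example `MAX-A`, the suffix is inserted before the hyphen (`MAX_33-A`) rather than at the end of the term. The sample data has no such terms.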

Daniel B. Martin


Last edited by danielbmartin; 04-14-2023 at 11:17 AM. Reason: Cosmetic improvement
 
1 member found this post helpful.
  


All times are GMT -5.
