LinuxQuestions.org
Share your knowledge at the LQ Wiki.
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie
User Name
Password
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Notices


Reply
  Search this Thread
Old 03-12-2019, 05:29 PM   #1
soph75
LQ Newbie
 
Registered: Jun 2011
Posts: 9

Rep: Reputation: Disabled
Gap spaces with URL's in incoming batch files


Hello, I'm running into an issue with incoming data using oracle SQL Loader utility in Linux server where some URL's are breaking or putting space then putting in the next line(See samples below). I need some input with a script or a command to identify all the gaps within the offended URL's that was tied to in the previous line then reattach them. I appreciate all the help.



Data Samples
==========
2019-03-08 10:05:51~,~104799~,~104799_650_27893416~,~3937565~,~MB~,~1225801.4393d~,~2019-03-05 18:42:36~,~{AC1903BE-01694EBA53C4-1C1E-0E4BB4C2}~,~51.148.67.167~,~Chrome 72.0~,~https://www.intervalworld.com/web/cs...~Cancún,
Mexico*~,~~,~R~,~Windows 7~,~Personal computer


2019-03-08 14:03:56~,~104799~,~104799_672_27893416~,~8192687~,~MB~,~1225801.a5290~,~2019-03-05 17:15:19~,~{AC1903BE-01694EBA53C4-792D-670F55C8}~,~178.197.233.123~,~Mobile Safari 12.0~,~https://www.intervalworld.com/web/cs...~,~Countryside and Safari, South
Africa~,~~,~R~,~iOS~,~Smartphon



Thanks,
Soph
 
Old 03-12-2019, 07:12 PM   #2
berndbausch
Senior Member
 
Registered: Nov 2013
Location: Tokyo
Distribution: A few
Posts: 3,602

Rep: Reputation: 972Reputation: 972Reputation: 972Reputation: 972Reputation: 972Reputation: 972Reputation: 972Reputation: 972
From this output, it is impossible to know where a URL ends. For example, it looks like “and Safari” is part of the URL, but what about the comma and South Africa? Or does the tilde after “South Africa” belong to the URL?

To approach this question, one would need to know how is this output generated. Do the URLs come from a database field? If so, access the database and convert the string in that field to an HTML-safe string.
 
Old 03-13-2019, 01:53 PM   #3
soph75
LQ Newbie
 
Registered: Jun 2011
Posts: 9

Original Poster
Rep: Reputation: Disabled
This is not an issue with URL structure... the tilde comma are field separators in control file to load an oracle table. The data must be fixed in the source prior appending the load but it's not coming in one line format.

URL 1 ==> https://www.intervalworld.com/web/cs...cute;n,Mexico*
URL 2 ==> ~https://www.intervalworld.com/web/cs...~,~Countryside and Safari, South Africa


<code>
2019-03-11 23:15:09~,~104799~,~104799_706_27893416~,~2575383~,~MB~,~1225801.319dd~,~2019-03-05 20:33:33~,~~,~174.214.33.47~,~Mobile Safari~,~~,~~,~~,~V~,~iOS~,~Smartphone
2019-03-12 06:13:33~,~104799~,~104799_276_27893416~,~3551959~,~MB~,~1225801.31a5b~,~2019-03-05 20:07:12~,~~,~97.84.173.49~,~unknown~,~~,~~,~~,~O~,~unknown~,~unknown
2019-03-12 13:02:53~,~104799~,~104799_619_27893416~,~2511880~,~MB~,~1225801.31c68~,~2019-03-05 18:23:09~,~~,~45.20.108.102~,~Apple Mail~,~~,~~,~~,~V~,~OS X~,~Personal computer
2019-03-12 20:10:53~,~104799~,~104799_654_27893416~,~8724550~,~MB~,~1225801.31d49~,~2019-03-05 18:48:51~,~~,~58.95.164.188~,~IE 10.0~,~~,~~,~~,~V~,~Windows~,~Personal computer
2019-03-12 10:22:26~,~104799~,~104799_706_27893416~,~2248496~,~MB~,~1225801.31dd5~,~2019-03-05 18:55:19~,~~,~174.226.15.147~,~unknown~,~~,~~,~~,~V~,~unknown~,~unknown
2019-03-12 18:41:47~,~104799~,~104799_617_27893416~,~5982090~,~MB~,~1225801.32060~,~2019-03-05 18:05:59~,~~,~66.249.88.167~,~Gmail image proxy~,~~,~~,~~,~O~,~Linux~,~Other
2019-03-12 18:41:57~,~104799~,~104799_617_27893416~,~5982090~,~MB~,~1225801.32060~,~2019-03-05 18:05:59~,~~,~66.249.88.161~,~Gmail image proxy~,~~,~~,~~,~V~,~Linux~,~Other
2019-03-12 11:33:14~,~104799~,~104799_705_27893416~,~5782600~,~MB~,~1225801.32074~,~2019-03-05 18:06:02~,~~,~73.27.90.108~,~Chrome 58.0~,~~,~~,~~,~V~,~Windows~,~Personal computer
2019-03-12 10:37:26~,~104799~,~104799_706_27893416~,~8908667~,~MB~,~1225801.328d1~,~2019-03-05 19:52:34~,~~,~4.79.163.189~,~Chrome Mobile 72.0~,~~,~~,~~,~V~,~Android~,~Smartphone
2019-03-12 10:37:58~,~104799~,~104799_706_27893416~,~8908667~,~MB~,~1225801.328d1~,~2019-03-05 19:52:34~,~~,~4.79.163.189~,~Chrome Mobile 72.0~,~~,~~,~~,~V~,~Android~,~Smartphone
2019-03-12 15:00:35~,~104799~,~104799_706_27893416~,~8908667~,~MB~,~1225801.328d1~,~2019-03-05 19:52:34~,~{AC1903BE-01694EBA5390-5F2B-0C91962F}~,~96.2.247.112~,~Chrome Mobile 63.0~,~https://www.intervalworld.com/web/cs...ntent~,~Spring to the Bahamas With Savings and Free Airport
Transfers!~,~~,~R~,~Android~,~Smartphone
</code>

Last edited by soph75; 03-13-2019 at 03:51 PM.
 
Old 03-13-2019, 03:20 PM   #4
scasey
Senior Member
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 7.6
Posts: 3,975

Rep: Reputation: 1352Reputation: 1352Reputation: 1352Reputation: 1352Reputation: 1352Reputation: 1352Reputation: 1352Reputation: 1352Reputation: 1352Reputation: 1352
OP If you would put your output in [code] tags, it would preserve the formatting so we could see exactly what the data looks like...please. You can edit your previous posts to do that.

How are you looking at these data?
What is the difference between the two files? ...that is what happened to make them different?
How are they generated?
Do you have control of their generation?

You're asking for help in fixing the problem, but you're not sharing anything about what you think is causing the problem.
 
Old 03-13-2019, 04:12 PM   #5
soph75
LQ Newbie
 
Registered: Jun 2011
Posts: 9

Original Poster
Rep: Reputation: Disabled
The data is coming from a feedback loop mechanism where a campaign was originally deployed from "E-Mail Marketing system" - Then an ETL process is kicked off to capture the feedback via a flat file using Oracle SQL Loader utility to load a target table. Not sure if the sample was properly attached in the previous but that's a good sample.

How are you looking at these data?
vi command to open the flat file.

What is the difference between the two files? ...that is what happened to make them different?
There's only one file.

How are they generated?
SHLIB_PATH=;sqlldr userid=$$USER[Output]/$$PASSWORD[Output]@$$SERVER[Output] control=$$INSTALLROOTDIR/EM/metadata/trans.ldr log=trans.log bad=trans.bad data=$$INSTALLROOTDIR/EM/logs/trans.log

Do you have control of their generation?
No control of the file generation.
 
Old 03-13-2019, 04:28 PM   #6
soph75
LQ Newbie
 
Registered: Jun 2011
Posts: 9

Original Poster
Rep: Reputation: Disabled
I need a script or command to fix the trans.log file - In sample bad line as you can see it's inputting space before "Spain~,~~,~R~,~iOS~,~Smartphone".



Sample Bad line
============
https://www.intervalworld.com/web/cs...te=~,~Balearic Islands,
Spain~,~~,~R~,~iOS~,~Smartphone


Desired output
===========
https://www.intervalworld.com/web/cs...te=~,~Balearic Islands, Spain~,~~,~R~,~iOS~,~Smartphone
 
Old 03-13-2019, 04:39 PM   #7
scasey
Senior Member
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 7.6
Posts: 3,975

Rep: Reputation: 1352Reputation: 1352Reputation: 1352Reputation: 1352Reputation: 1352Reputation: 1352Reputation: 1352Reputation: 1352Reputation: 1352Reputation: 1352
Quote:
Originally Posted by soph75 View Post
The data is coming from a feedback loop mechanism where a campaign was originally deployed from "E-Mail Marketing system" - Then an ETL process is kicked off to capture the feedback via a flat file using Oracle SQL Loader utility to load a target table. Not sure if the sample was properly attached in the previous but that's a good sample.

How are you looking at these data?
vi command to open the flat file.

What is the difference between the two files? ...that is what happened to make them different?
There's only one file.

How are they generated?
SHLIB_PATH=;sqlldr userid=$$USER[Output]/$$PASSWORD[Output]@$$SERVER[Output] control=$$INSTALLROOTDIR/EM/metadata/trans.ldr log=trans.log bad=trans.bad data=$$INSTALLROOTDIR/EM/logs/trans.log

Do you have control of their generation?
No control of the file generation.
Thanks for trying with the code tags, but they are square brackets, not < >, or use the # button after highlighting what you want to include.

I don't use vi much, but I'm pretty sure that long lines get wrapped. That also happens with more and less, so it's possible that what you're seeing (and pasting) is not what's actually in the file.

try opening with nano, which doesn't wrap lines by default.
Or use cat -E which will put a $ at the end of each line, so wrapped lines look like
Code:
The directory needs to be writable by the web server user.  For example, if the web server 
user is <TT>httpd</TT> and your document root is <tt>/home/httpd/htdocs</tt>, you could create the
directory like this:$
(emphasis added)
Although, when I pasted that from my terminal screen, it ended up in the code tags as all one line...I had to fudge it to make it look like it did in the terminal.

My point is that I'm not sure we know that the lines are broken up as you've posted them.
Do the loads into the database fail?
 
Old 03-14-2019, 12:10 PM   #8
soph75
LQ Newbie
 
Registered: Jun 2011
Posts: 9

Original Poster
Rep: Reputation: Disabled
Thanks for cat -E command - The snippet below shows that the bad file error is breaking at ( Greece*~,~~,~R~,~Windows~,~Personal computer$) which belongs to previous line block (2019-03-12 02:12:46~,~104799~,~104799_652_27893416~,~6508427~,~MB~,~1225801.289fa~,~2019-03-05 18:19:20~,~{AC1903BE-01694EBA53C4-66CE-58468261}~,~121.220.144.115~,~Chrome 72.0~,~https://www.intervalworld.com/web/cs...oDate=~,~Crete and Corfu,$)

Output:
2019-03-12 00:09:14~,~104799~,~104799_652_27893416~,~6784503~,~MB~,~1225801.289b6~,~2019-03-05 18:19:08~,~~,~138.217.176.176~,~Thunderbird 60.5~,~~,~~,~~,~V~,~Windows~,~Personal computer$
2019-03-12 02:12:46~,~104799~,~104799_652_27893416~,~6508427~,~MB~,~1225801.289fa~,~2019-03-05 18:19:20~,~{AC1903BE-01694EBA53C4-66CE-58468261}~,~121.220.144.115~,~Chrome 72.0~,~https://www.intervalworld.com/web/cs...oDate=~,~Crete and Corfu,$
Greece*~,~~,~R~,~Windows~,~Personal computer$
2019-03-12 02:10:33~,~104799~,~104799_652_27893416~,~6508427~,~MB~,~1225801.289fa~,~2019-03-05 18:19:20~,~~,~121.220.144.115~,~Outlook 2010~,~~,~~,~~,~V~,~Windows 8~,~Personal computer$
2019-03-12 02:10:56~,~104799~,~104799_652_27893416~,~6508427~,~MB~,~1225801.289fa~,~2019-03-05 18:19:20~,~{AC1903BE-01694EBA5390-7819-3BD1C3A6}~,~121.220.144.115~,~Chrome 72.0~,~https://www.intervalworld.com/web/cs...6toDate=~,~Las Vegas, Nevada~,~~,~R~,~Windows~,~Personal computer$


The ETL load is failing every time it encounters records ( Greece*~,~~,~R~,~Windows~,~Personal computer$) since the control file doesn't align with number of fields in the output.
 
Old 03-14-2019, 02:37 PM   #9
scasey
Senior Member
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 7.6
Posts: 3,975

Rep: Reputation: 1352Reputation: 1352Reputation: 1352Reputation: 1352Reputation: 1352Reputation: 1352Reputation: 1352Reputation: 1352Reputation: 1352Reputation: 1352
Quote:
Originally Posted by soph75 View Post
Thanks for cat -E command - The snippet below shows that the bad file error is breaking at ( Greece*~,~~,~R~,~Windows~,~Personal computer$) which belongs to previous line block (2019-03-12 02:12:46~,~104799~,~104799_652_27893416~,~6508427~,~MB~,~1225801.289fa~,~2019-03-05 18:19:20~,~{AC1903BE-01694EBA53C4-66CE-58468261}~,~121.220.144.115~,~Chrome 72.0~,~https://www.intervalworld.com/web/cs...oDate=~,~Crete and Corfu,$)

Output:
2019-03-12 00:09:14~,~104799~,~104799_652_27893416~,~6784503~,~MB~,~1225801.289b6~,~2019-03-05 18:19:08~,~~,~138.217.176.176~,~Thunderbird 60.5~,~~,~~,~~,~V~,~Windows~,~Personal computer$
2019-03-12 02:12:46~,~104799~,~104799_652_27893416~,~6508427~,~MB~,~1225801.289fa~,~2019-03-05 18:19:20~,~{AC1903BE-01694EBA53C4-66CE-58468261}~,~121.220.144.115~,~Chrome 72.0~,~https://www.intervalworld.com/web/cs...oDate=~,~Crete and Corfu,$
Greece*~,~~,~R~,~Windows~,~Personal computer$
2019-03-12 02:10:33~,~104799~,~104799_652_27893416~,~6508427~,~MB~,~1225801.289fa~,~2019-03-05 18:19:20~,~~,~121.220.144.115~,~Outlook 2010~,~~,~~,~~,~V~,~Windows 8~,~Personal computer$
2019-03-12 02:10:56~,~104799~,~104799_652_27893416~,~6508427~,~MB~,~1225801.289fa~,~2019-03-05 18:19:20~,~{AC1903BE-01694EBA5390-7819-3BD1C3A6}~,~121.220.144.115~,~Chrome 72.0~,~https://www.intervalworld.com/web/cs...6toDate=~,~Las Vegas, Nevada~,~~,~R~,~Windows~,~Personal computer$


The ETL load is failing every time it encounters records ( Greece*~,~~,~R~,~Windows~,~Personal computer$) since the control file doesn't align with number of fields in the output.
It would be helpful to both us and you if you'd use [code] tags when posting output or file contents, please. Besides showing exactly what's in the file, the URLs wouldn't be changed into links, which confuses things even more.

I'd go back to the ETL process. I'm not sure what they're trying to accomplish using ~,~ as delimiters, since the delimiter still contains a comma, so it becomes a challenge when the data also contains a comma, unless each field is quoted...which might also be an issue.

Although it is interesting the problem doesn't appear to always occur when there's a comma in the field.

In any event, if the ETL process is breaking the record across lines, then that's where the problem needs to be fixed. Something is broken in the Transform.

The last ETL shop I worked in used the pipe character (|) as a delimiter, which mostly worked as long as someone didn't enter a pipe in the data...that didn't happen often, but when it did, we'd have to open the file in a text editor and remove the bogus pipe...manually...and we'd have to get the source data changed. Also manually.

I never understood why they just didn't use tab for a delimiter. Our data was all coming from forms, and it's nearly impossible to key a tab in a form, since pressing tab just move the cursor to the next field. But that's an off-topic rant, for which I apologize.

Please. Post your data in code tags so we can see what's actually there. Thank you.
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
LXer: Tabs or spaces? Spaces, obviously, but how many? LXer Syndicated Linux News 0 09-13-2018 10:50 AM
LXer: The hacker 'skills gap' may be more of a strategy gap LXer Syndicated Linux News 0 09-04-2014 07:41 PM
[SOLVED] how to write a batch file in linux?and what are batch(.bat) file benefits? karthilin Linux - Newbie 8 10-15-2012 03:10 PM
Spaces and escaped spaces pslacerda Linux - Newbie 13 12-20-2008 10:03 AM
Windows XP batch script - mkdir %date% creates spaces! Micro420 Programming 1 10-19-2006 03:55 AM

LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie

All times are GMT -5. The time now is 01:54 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration