LinuxQuestions.org
LinuxQuestions.org > Forums > Linux Forums > Linux - Software
Linux - Software This forum is for Software issues.
Having a problem installing a new program? Want to know which application is best for the job? Post your question in this forum.

09-22-2017, 05:54 PM #1
grumpyskeptic
Member
 
Registered: Apr 2016
Posts: 162

Rep: Reputation: Disabled
Converting new line symbol to a tab symbol when pasting into LibreOffice Calc


When I copy and paste a table from a webpage into LibreOffice Calc, it is shown as a single column rather than as a table.

Investigation with a hex editor, after pasting the table into a text file instead of into Calc, shows that where I would expect a tab there is instead (in hex) 0A 20 20 20 20.

Ends of lines are similar: 0A 0A 20 20 20 20.

As far as I recall the 0A is a new line symbol, and 20 is a space symbol. In other words, what should be a tab is one new line, but ends of lines have two new lines.
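The hex reading can be double-checked without a hex editor. A minimal Python sketch (Python 3.8 or later, for the separator argument of `bytes.hex`); the two strings simply reproduce the patterns described above, and the file name in the comment is hypothetical:

```python
# The two byte patterns reported above, written as Python strings:
# a line feed plus four spaces, and two line feeds plus four spaces.
cell_break = "\n    "
row_break = "\n\n    "

# bytes.hex() with a separator argument needs Python 3.8 or later.
print(cell_break.encode("ascii").hex(" "))  # 0a 20 20 20 20
print(row_break.encode("ascii").hex(" "))   # 0a 0a 20 20 20 20

# 0x0A is the line feed (newline) character and 0x20 is a space.
print(ord("\n") == 0x0A, ord(" ") == 0x20)  # True True

# For a saved paste (hypothetical file name), dump its first 64 bytes:
# print(open("pasted.txt", "rb").read(64).hex(" "))
```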

What's the easiest way to get this into Calc as a table please, rather than a column?

If it cannot be done directly, what's the easiest way with a GUI to convert 0A 20 20 20 20 into a tab while keeping 0A 0A 20 20 20 20 as a new line? For example converting it to CSV format.
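A minimal Python sketch of a single-pass conversion, assuming the text really contains exactly the two byte patterns above (four spaces after each line feed). `paste_to_tsv` and the sample are hypothetical; tabs are used instead of commas so that values which themselves contain commas survive:

```python
import re

# Exactly the reported patterns: "\n\n" + 4 spaces between rows,
# "\n" + 4 spaces between cells. Other layouts need other patterns.
BREAK = re.compile(r"(\n\n?) {4}")


def paste_to_tsv(text):
    """Turn row breaks into newlines and cell breaks into tabs."""
    def repl(match):
        # "\n?" is greedy, so a double newline is captured whole here.
        return "\n" if len(match.group(1)) == 2 else "\t"

    # strip() removes the spaces before the first cell and the final newline.
    return BREAK.sub(repl, text.strip()) + "\n"


# Hypothetical sample shaped like the paste described above.
sample = "    h1\n    h2\n    h3\n\n    a1\n    a2\n    a3\n"
print(repr(paste_to_tsv(sample)))  # 'h1\th2\th3\na1\ta2\ta3\n'
```

The result can be saved as a text file and opened in Calc with Tab chosen as the separator in the import dialog, along the lines of post #3.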

Is there any other Linux spreadsheet that would format this more correctly than Calc?

Thanks.
 
09-23-2017, 08:10 AM #2
RockDoctor
Senior Member
 
Registered: Nov 2003
Location: Minnesota, US
Distribution: Fedora, Linux Mint, Ubuntu
Posts: 1,599

Rep: Reputation: 352
What's the web page? No problem here copying this table into Gnumeric or LibreOffice Calc.
 
09-23-2017, 12:53 PM #3
DavidMcCann
Senior Member
 
Registered: Jul 2006
Location: London
Distribution: CentOS, Salix
Posts: 4,676

Rep: Reputation: 1485
Don't cut and paste from the site to Calc. Instead, save as a .txt file via an editor. Then, in Calc, use File > Open and choose the file type "Text CSV". Then you can specify the separator character in the import dialogue.
 
09-23-2017, 07:52 PM #4
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 15,950

Rep: Reputation: 2210
As above - do it the easy way.
Quote:
Originally Posted by grumpyskeptic
If it cannot be done directly, what's the easiest way with a GUI to convert 0A 20 20 20 20 into a tab while keeping 0A 0A 20 20 20 20 as a new line? For example converting it to CSV format.
Dunno about a GUI, but if your data really are like that, and you really have to munge it manually, sed would be my go-to. But it seems you are just making things harder than they need to be - you would have to be careful that the data are well-structured and your interpretation is correct. For all the data, not just a convenient subset.
Quote:
Is there any other Linux spreadsheet that would format this more correctly than Calc?
Calc is obeying the data - as will everything else.
 
09-26-2017, 09:36 AM #5
grumpyskeptic
Member
 
Registered: Apr 2016
Posts: 162

Original Poster
Rep: Reputation: Disabled
This is an example of a table: https://www.msn.com/en-gb/money/stoc...126.1.MSFT.NAS

Thanks
 
09-26-2017, 07:16 PM #6
RockDoctor
Senior Member
 
Registered: Nov 2003
Location: Minnesota, US
Distribution: Fedora, Linux Mint, Ubuntu
Posts: 1,599

Rep: Reputation: 352
Quote:
Originally Posted by grumpyskeptic
I tried it and got the same results you did with both Gnumeric and Calc.

Awkward workaround #1 (tested, but still awkward): print the table to a PDF (ensure all columns appear on the page), then copy and paste the table from the PDF into a spreadsheet.

Awkward workaround #2: dump results to a text file, then write a script to read the text file and spit out the results in a nice table.

Neither of my proposed workarounds is aesthetically pleasing, but there are times when brute force is justified.
 
09-26-2017, 07:27 PM #7
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 15,950

Rep: Reputation: 2210
I also dumped it to text, and the data are indeed line separated - by a single "\n" in the text file I have, not as described above.
Looks like the page was constructed to be obtuse - how appropriate.
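If every cell ends up on its own line with no blank line between rows, as in syg00's copy, the row structure can only come from knowing how many columns the table has. A minimal Python sketch; `lines_to_rows` is a hypothetical helper and `n_cols` is an assumption you must supply by counting the columns on the page. In keeping with the warning in post #4, the divisibility check catches only some misalignments, not all:

```python
def lines_to_rows(text, n_cols):
    """Regroup text holding one cell per line into rows of n_cols cells.

    Assumes no cell is empty: blank lines are skipped, so an empty cell
    would shift every following cell into the wrong column.
    """
    cells = [line.strip() for line in text.splitlines() if line.strip()]
    # A leftover partial row is a sign that n_cols or the data is off.
    if len(cells) % n_cols:
        raise ValueError(f"{len(cells)} cells do not split into rows of {n_cols}")
    return [cells[i:i + n_cols] for i in range(0, len(cells), n_cols)]


# Hypothetical sample: a three-column table dumped one cell per line.
sample = "h1\nh2\nh3\na1\na2\na3\n"
print(lines_to_rows(sample, 3))  # [['h1', 'h2', 'h3'], ['a1', 'a2', 'a3']]
```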
 
09-27-2017, 08:26 AM #8
grumpyskeptic
Member
 
Registered: Apr 2016
Posts: 162

Original Poster
Rep: Reputation: Disabled
Thanks for the replies.

I wonder if there is anything that can do three passes of "search and replace" in hex without a lot of hassle?

First pass: replace "0A 0A 20 20 20 20" by a temporary symbol, eg "&".

Second pass: replace "0A 20 20 20 20" by a comma (to make a CSV file).

Third pass: replace "&" by "0A" or other end of line symbol appropriate to a CSV file.

This also gets rid of the surplus spaces.
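A minimal Python sketch of exactly those three passes, working on raw bytes so the hex values can be written as they appear in the hex editor. `three_pass` and its `sentinel`/`sep` parameters are hypothetical names for illustration:

```python
ROW_BREAK = b"\x0a\x0a\x20\x20\x20\x20"   # 0A 0A 20 20 20 20
CELL_BREAK = b"\x0a\x20\x20\x20\x20"       # 0A 20 20 20 20


def three_pass(data, sentinel=b"&", sep=b","):
    """Apply the three search-and-replace passes to raw bytes."""
    # The temporary symbol is only safe if it never occurs in the real data.
    if sentinel in data:
        raise ValueError("sentinel already occurs in the data; choose another")
    # Pass 1 must run first: CELL_BREAK is contained inside ROW_BREAK.
    data = data.replace(ROW_BREAK, sentinel)
    data = data.replace(CELL_BREAK, sep)     # pass 2: cells -> separator
    return data.replace(sentinel, b"\x0a")   # pass 3: rows -> newline


print(three_pass(b"h1\n    h2\n\n    a1\n    a2\n"))  # b'h1,h2\na1,a2\n'

# Usage on a saved paste (hypothetical file names):
# with open("pasted.txt", "rb") as src, open("table.csv", "wb") as dst:
#     dst.write(three_pass(src.read()))
```

Any spaces before the very first cell are left alone, since only spaces following a 0A are replaced. If a value contains a comma (a thousands separator, for example), a plain comma separator would split it across two columns in Calc; passing `sep=b"\t"` and importing the result as tab-separated avoids that.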

There was or is an obscure text editor in Windows that could search for and replace line-endings. Is there anything that can do the same in Linux?

Edit: The text editor that can find and replace new lines and tabs is Metapad. Its help file says "find & replace now support newlines (\n) and tab characters (\t)", and I remember using that feature. Metapad is now open source, so it could presumably be compiled under Linux. Unfortunately that is way beyond my abilities. It has its own article on Wikipedia: https://en.wikipedia.org/wiki/Metapad

Last edited by grumpyskeptic; 09-28-2017 at 04:03 AM.
 
09-28-2017, 06:41 PM #9
RandomTroll
Member
 
Registered: Mar 2010
Distribution: Slackware
Posts: 601

Rep: Reputation: 72
You can do this in emacs. ESC-% is 'query replace'; ctl-j is a newline (at the prompt, type ctl-q ctl-j to insert it literally); you can put in as many as you want.
 
09-30-2017, 08:15 AM #10
grumpyskeptic
Member
 
Registered: Apr 2016
Posts: 162

Original Poster
Rep: Reputation: Disabled
Thanks. I've now discovered that the Linux gedit, called Text Editor on my computer, can also search for and replace newlines and tabs. From its manual:

"You can include the following escape sequences in the text to find or replace to represent special characters:
\n
Specifies a new line.
\t
Specifies a tab character.
\r
Specifies a carriage return."

Although it would be nice not to have to do it all manually.

Last edited by grumpyskeptic; 09-30-2017 at 08:17 AM.
 
09-30-2017, 10:52 AM #11
RockDoctor
Senior Member
 
Registered: Nov 2003
Location: Minnesota, US
Distribution: Fedora, Linux Mint, Ubuntu
Posts: 1,599

Rep: Reputation: 352
I'm pretty sure it can be scripted, using sed and regexes, but regular expressions, like Chinese and Thai, are languages with which I am not familiar.
 
09-30-2017, 02:20 PM #12
RandomTroll
Member
 
Registered: Mar 2010
Distribution: Slackware
Posts: 601

Rep: Reputation: 72
Quote:
Originally Posted by RockDoctor
I'm pretty sure it can be scripted, using sed and regexes
I've never found a way to use sed to remove newlines; I can insert them. tr can replace newlines, but only with a single character. If there's an otherwise-unused character in the target text I convert from newline to that character then use sed to convert that character to a string.
 
09-30-2017, 09:16 PM #13
RockDoctor
Senior Member
 
Registered: Nov 2003
Location: Minnesota, US
Distribution: Fedora, Linux Mint, Ubuntu
Posts: 1,599

Rep: Reputation: 352
I didn't say I was 100% positive it could be done with sed.

Faced with the task, and given my unsuccessful attempts to find an elegant solution via Google, I'd write myself a little Python script to read the file as saved and output the information in a properly formatted csv file, which I would then read using Calc.
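A minimal sketch of the kind of script described here, assuming the saved file has one cell per line and a blank line between rows (the layout reported in post #1; syg00's copy in post #7 differed, so `parse_records` may need adjusting). All function and file names are hypothetical. Writing through the `csv` module, rather than substituting commas by hand, means values that contain commas get quoted:

```python
import csv
import io
import re
import sys


def parse_records(text):
    """Split text into rows at blank lines and into cells at line breaks.

    Assumes the layout from post #1: one cell per line, a blank line
    between rows. Surrounding spaces are stripped from every cell.
    """
    rows = []
    # A line holding only spaces or tabs also counts as blank.
    for block in re.split(r"\n[ \t]*\n", text.strip()):
        cells = [line.strip() for line in block.splitlines() if line.strip()]
        if cells:
            rows.append(cells)
    return rows


def to_csv(rows):
    """Format rows as CSV; cells containing commas are quoted automatically."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def convert(src_path, dst_path):
    with open(src_path, encoding="utf-8") as src:
        rows = parse_records(src.read())
    # newline="" keeps the csv module's own line endings intact.
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        dst.write(to_csv(rows))


# Usage (hypothetical file names): python3 paste2csv.py pasted.txt table.csv
if __name__ == "__main__" and len(sys.argv) == 3:
    convert(sys.argv[1], sys.argv[2])
```

For example, `parse_records("    h1\n    h2\n\n    a1\n    1,234\n")` gives `[['h1', 'h2'], ['a1', '1,234']]`, and `to_csv` turns that into the two lines `h1,h2` and `a1,"1,234"`, which Calc can open directly.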
 
09-30-2017, 10:27 PM #14
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 15,950

Rep: Reputation: 2210
The issue is that *nix is stream oriented - and each line in a stream is terminated by a "\n". Users don't care about newlines or carriage-return or line-feed or anything else that is metadata; they want their data.
So sed reads one line at a time and shows you everything up to (but excluding) the newline. It can be coerced to present them, but it isn't trivial. Better to properly define the data (blank line followed by ...) and mangle that rather than concentrate on the raw data.
Before anyone asks, the first "\n" is the end of the previous data, so two give you one blank line ...
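A quick Python illustration of that last point, using a made-up two-line text: two consecutive "\n" characters produce exactly one empty line, and splitting on "\n\n" gives the blank-line-separated records described above:

```python
# "row 1\n" is one complete line; the second "\n" ends an empty line.
text = "row 1\n\nrow 2\n"

print(text.splitlines())                 # ['row 1', '', 'row 2']
# Treating a blank line as the record separator instead of raw bytes:
print(text.rstrip("\n").split("\n\n"))   # ['row 1', 'row 2']
```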
 
  



All times are GMT -5.
