LinuxQuestions.org
01-03-2017, 10:20 AM #1
slimcharles
LQ Newbie
 
Registered: Dec 2016
Distribution: Ubuntu 20.04 & Debian 11
Posts: 13

Rep: Reputation: Disabled
Python .encode("utf-8") is causing loss of line endings when writing to a file


While scraping data with BeautifulSoup, I get this output:
Code:
письмо 1
Test 2
Note 3
Über 4
But when I write the output to a file with
Code:
with open("/home/user/Desktop/file", "a") as myfile:
    myfile.write(i['title'].encode("utf-8"))
the file's content is:

Code:
Yazı 1Test 2Note 3Über 4
As you can see, the line endings are missing.
What am I doing wrong, or what am I missing?
 
01-03-2017, 10:27 AM #2
hydrurga
LQ Guru
 
Registered: Nov 2008
Location: Pictland
Distribution: Linux Mint 21 MATE
Posts: 8,048
Blog Entries: 5

Reputation: 2925
What code did you use to produce the output in box 1?
 
01-03-2017, 10:39 AM #3
slimcharles
LQ Newbie
 
Registered: Dec 2016
Distribution: Ubuntu 20.04 & Debian 11
Posts: 13

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by hydrurga
What code did you use to produce the output in box 1?

This is the code:
Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Python 2 script: scrape link titles and append them to a file.
from bs4 import BeautifulSoup
import urllib2
import re
import sys

var = raw_input("Please enter something: ")
html_page = urllib2.urlopen("http://www.url.com/" + var + "files/")

soup = BeautifulSoup(html_page, "lxml")
li = soup.select("ul > li > a")
for i in soup.find('div', {'class': 'boxList'}).findAll('a'):
    # The print statement ends each title with a newline on screen.
    print i['title']
    with open("/home/user/Desktop/" + var, "a") as myfile:
        # Append the UTF-8 encoded title (no newline is written here).
        myfile.write(i['title'].encode("utf-8"))
 
01-03-2017, 10:46 AM #4
hydrurga
LQ Guru
 
Registered: Nov 2008
Location: Pictland
Distribution: Linux Mint 21 MATE
Posts: 8,048
Blog Entries: 5

Reputation: 2925
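The print() function automatically adds a newline to the end of its output, unless you use the following format (for example):

Code:
print("Hello ", end='')
print("World")

(those are two single quotes that I've added after end=)

Note that your script is Python 2, where print is a statement rather than a function. It also ends each line with a newline, but the end='' form only works in Python 3, or in Python 2 after from __future__ import print_function.

So the newlines in your output are added by print; they are not part of the strings that BeautifulSoup returns. The write() method writes exactly the string you give it and adds nothing, so you have to append a newline to each string you pass to write() to get the same result.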
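The explanation above can be checked with a short Python 3 sketch. It uses io.StringIO in place of a real file, and the titles list is hypothetical sample data standing in for the scraped i['title'] values. It also shows that .encode("utf-8") keeps newline characters intact, so the missing line breaks come from write() adding none, not from the encoding step.

```python
import io

# Hypothetical sample titles standing in for the scraped i['title'] values.
titles = ["Test 2", "Note 3", "Über 4"]

# print() appends end="\n" after each title by default.
printed = io.StringIO()
for title in titles:
    print(title, file=printed)

# write() emits exactly the string it is given and appends nothing.
written = io.StringIO()
for title in titles:
    written.write(title)

# With end="", print() no longer adds the newline, just like write().
no_newline = io.StringIO()
for title in titles:
    print(title, end="", file=no_newline)

# UTF-8 encoding keeps every character, including a trailing "\n".
encoded = "Über 4\n".encode("utf-8")

# printed.getvalue()    == "Test 2\nNote 3\nÜber 4\n"
# written.getvalue()    == "Test 2Note 3Über 4"
# no_newline.getvalue() == "Test 2Note 3Über 4"
# encoded               == b"\xc3\x9cber 4\n"
```

The written buffer reproduces the run-together file content from the question, while the printed buffer matches what appears on screen.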

Last edited by hydrurga; 01-03-2017 at 10:48 AM. Reason: 2 single quotes looks like a double quote when post published
 
1 member found this post helpful.
01-03-2017, 11:01 AM #5
slimcharles
LQ Newbie
 
Registered: Dec 2016
Distribution: Ubuntu 20.04 & Debian 11
Posts: 13

Original Poster
Rep: Reputation: Disabled
I solved my problem by appending a newline ("\n") in the write() call:
Code:
myfile.write(i['title'].encode("utf-8") + "\n")
Thank you for your time and help.
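The fix above relies on Python 2, where .encode("utf-8") returns a byte string that can be concatenated with "\n". In Python 3 the same line raises a TypeError, because encode() returns bytes, bytes cannot be concatenated with str, and a file opened in text mode rejects bytes anyway. A minimal Python 3 sketch, assuming a hypothetical append_titles helper, sample titles, and a temporary file instead of the Desktop path from the thread: let open() do the UTF-8 encoding and add the newline to the text itself.

```python
import os
import tempfile


def append_titles(path, titles):
    """Append each title to path as one UTF-8 encoded line (Python 3)."""
    # open() encodes the text, so no .encode() call is needed.
    # The file is opened once, outside the loop, instead of once per title.
    with open(path, "a", encoding="utf-8") as myfile:
        for title in titles:
            myfile.write(title + "\n")  # write() never adds the newline itself


# Hypothetical sample data and a temporary path for illustration.
titles = ["Test 2", "Note 3", "Über 4"]
path = os.path.join(tempfile.mkdtemp(), "titles.txt")
append_titles(path, titles)

# Read the file back: each title is on its own line.
with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
```

Inside the loop, print(title, file=myfile) would work just as well, since print() supplies the trailing newline on its own.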
 
01-03-2017, 11:26 AM #6
hydrurga
LQ Guru
 
Registered: Nov 2008
Location: Pictland
Distribution: Linux Mint 21 MATE
Posts: 8,048
Blog Entries: 5

Reputation: 2925
No problem, slimcharles. I enjoy working with BeautifulSoup (and requests too). Have fun!
 
  


All times are GMT -5.
