LinuxQuestions.org
09-27-2011, 06:42 PM   #1
tuxlux
LQ Newbie
 
Registered: Aug 2011
Posts: 19

Rep: Reputation: Disabled
PHP Help


I am working on a research project using article and edit data from Wikipedia. Someone put together a script for me to combine the data in a particular way, but it doesn't do what I need (I think they had another idea in mind). I know nothing about PHP and only the very basics of Bash. I was wondering if someone could help fix this script.

The script and sample files are attached as "Sample.txt", which is actually a zip file, so you will need to change the extension to .zip first. It wouldn't let me upload a zip file directly. If this doesn't work, I'll try another approach.

The files to process are in two directories: /article_info and /user_info.

Basically, the script should open a text file in article_info and read the names of the users. It should then read all of those users' text files from user_info into one combined file, excluding every line in each user file with fewer than 6 edits (i.e. 5 or fewer) in column 3.

Next, it should find all items in column 2 that are the same (i.e. duplicates) and preserve those or output them to a new file.

The final product would be a file with all duplicates (or triplicates, etc.) preserved and nothing else. The Hydrology-output.txt file attached is a sample of the output.

It should be able to do this for all files in article_info (about 150). There will be times when the same user_info file is needed by more than one article_info file, but since the script should handle each article_info file one at a time, this should be okay.

Thanks much.
Attached Files
File Type: txt Sample.txt (143.0 KB, 69 views)
 
09-27-2011, 10:17 PM   #2
sag47
Senior Member
 
Registered: Sep 2009
Location: Raleigh, NC
Distribution: Ubuntu, PopOS, Raspbian
Posts: 1,899
Blog Entries: 36

Rep: Reputation: 477
Not sure if it has to be in PHP, but I did what you want in Python, based on your description.

Code:
#!/usr/bin/env python
from os import listdir
from sys import exit
from os.path import isfile, splitext

articles = listdir("./article_info")
if len(articles) == 0:
  print("empty directory...")
  exit(1)

for article in articles:
  users = []

  # grab all the users out of the file in article_info
  with open("./article_info/" + article, 'r') as f:
    for line in f:
      # skip blank lines and the "rev_user_text" header row
      if line.strip() and line.split()[0] != "rev_user_text":
        # everything before the last column is the user name
        users.append(line.rsplit(None, 1)[0])

  # open ./<article>-output.<ext> and write to it
  base, ext = splitext(article)
  with open("./" + base + "-output" + ext, 'w') as out:
    # now work on each user in ./user_info
    for user in users:
      if not isfile("./user_info/" + user + ".txt"):
        continue
      with open("./user_info/" + user + ".txt", 'r') as fuser:
        for line in fuser:
          if line.strip() and line.split()[0] != "rev_user_text":
            # keep only lines with more than 5 edits (last column)
            if int(line.rsplit(None, 1)[1]) > 5:
              out.write(line)
Thanks for an interesting text manipulation problem. With my script you could have multiple articles in the article_info folder. I win the race with your friend because my script is shorter (including comments, not empty lines).

SAM

Last edited by sag47; 09-27-2011 at 10:27 PM.
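
A note on the rsplit(None, 1) calls in the script above: splitting once from the right peels off only the last whitespace-separated column, so a user name or article title that contains spaces stays in one piece. A small sketch of that behaviour follows. The sample line and the helper name edit_count are made up for illustration, and the tab-separated layout is an assumption about the data files.

```python
def edit_count(line):
    """Return the last whitespace-separated column of a line as an int."""
    return int(line.rsplit(None, 1)[1])

# Hypothetical sample line: user name, article title, edit count.
sample = "Some User\tWater cycle\t12\n"

# One split from the right: everything before the last column, then the last column.
head, last = sample.rsplit(None, 1)
print(repr(head))
print(repr(last))

# Same test the script applies when deciding whether to keep a line.
print(edit_count(sample) > 5)
```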
 
09-27-2011, 11:31 PM   #3
tuxlux
LQ Newbie
 
Registered: Aug 2011
Posts: 19

Original Poster
Rep: Reputation: Disabled
Thanks for the help. Python is fine. Doesn't really matter to me.

I ran this and it seems to have run, but the output isn't quite what I need. It looks like it properly combined all the user files into one new file, but the next step is to find all the duplicate article names (column 2) and keep those lines (that is, lines whose column 2 matches another line's), either by deleting everything else or by writing them to a new file. Either way, the final product would be a file with just the duplicates (or triplicates, etc.) of the article titles, along with the user names and edit counts for each line. Let me know if that doesn't make sense.

EDIT: you also win the race for a script that ran first time.

Last edited by tuxlux; 09-27-2011 at 11:39 PM.
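
The remaining duplicate step described in this post could be sketched roughly as below. This is only a sketch under assumptions: it takes the columns to be tab-separated with the article title in column 2, and the names article_of and keep_duplicates, along with the sample rows, are made up for illustration. It counts how often each title occurs, then keeps only the lines whose title occurs more than once, in their original order.

```python
import sys
from collections import Counter

def article_of(line, sep="\t"):
    """Return column 2 (the article title) of a line, or None if it is missing."""
    fields = line.rstrip("\r\n").split(sep)
    return fields[1] if len(fields) > 1 else None

def keep_duplicates(lines, sep="\t"):
    """Keep only lines whose article title appears on at least two lines."""
    lines = list(lines)
    counts = Counter(article_of(line, sep) for line in lines)
    return [line for line in lines
            if article_of(line, sep) is not None
            and counts[article_of(line, sep)] > 1]

# Made-up rows in the assumed layout: user, article, edit count.
combined = [
    "UserA\tHydrology\t12\n",
    "UserA\tWater cycle\t7\n",
    "UserB\tHydrology\t9\n",
    "UserC\tAquifer\t6\n",
]

for line in keep_duplicates(combined):
    sys.stdout.write(line)
```

To apply this to a real file, the lines of a combined file such as Hydrology-output.txt could be read into a list, passed through keep_duplicates, and written back out. If the real files use a different separator, the sep argument would need to change.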
 
  


All times are GMT -5.
