Text file data into Postgres database
Hello Group,
I've wanted to start this project for a long time but never knew where to start. I figure it starts with finding out what you don't know. Any guidance and suggestions are greatly appreciated. Background: I have thousands of folders and files (approx. 650,000 files) filled with client data on my workplace linux server. I cannot encourage, guide, cajole, or threaten anyone into following any set procedures for naming schemes or business data procedures. The bosses don't see it as a priority and everyone has the attitude "as long as I know where my stuff is, I don't care." But what happens is, when people want to check out others' research or other written work from the past, it's difficult to find and identify the work product. And then of course I receive complaints about how crappy "MY" server and network is. I see folks running "Search" from the Win XP Start menu and scanning multiple hard drives. It's a waste of time and server resources. Objective: Make a searchable database that will produce results in 1 or 2 seconds. Use php, postgresql, and apache to host this LAN-only accessible app. Procedural Posture: 1. Obtain file and directory names and locations. [Completed] 2. Use data mining techniques to pull the necessary data out of "1." and place it in a postgresql database. 3. Build the php/apache interface to perform searches by SQL queries. - - - - - - - - - - I believe No. 1 is completed. I used the following command to gather all file and directory names. "/abc" is the main directory on the linux samba server where all company data is located: root@acme:/# ls -Ralh /abc > /home/mike/file_output.txt file_output.txt is 33 megs. and lists approximately 650,000 file names and their directory paths. I want to pull all the file name info. and directory info. from "file_output.txt". I want to place each file name into the database in one row and have the related file location info. stated in the next database row. Ultimately I want it to show up in the web. app. as a clickable link that will open the document. How do I pull the data from "file_output.txt" and place it into the postgres database? Due to years of ever-changing file-naming schemes and changing directory paths, I simply want to provide a search field where users can type in any alpha-numeric sequence and pull matching results. |
Moin,
I think, you could get better loadable results using find instead of ls. Try find in combination with the -printf option to get a list of the files you want to store in the database. To load the file, look for the COPY PostgreSQL command. Jan EDIT: In my opinion it's not a good idea to store the file's name and it's location in different rows. It would be a better idea to store the locations (i. e. directories) in one table with a unique ID and to store the filename and a reference to it's location in another table each in one row. |
Quote:
Silly question - why don't you use locate? Cheers, Tink |
I had the hope the intention of the OP is to add tags in order to improve the search capabilities. Searchin file names is fine if you know what to look for and then locate is fine as you say.
jlinkels |
Moin,
Quote:
Jan |
but you can execute the locate from the PHP script and show the results in a table anyway...
jlinkels |
Moin,
Quote:
Jan |
I only showed that locate could be used from XP. But you are right that it would be better to include more search keys.
Having said that, it seems that the OP did not plan to store that information in his database. Looking at his first post he intended to store only file names and paths. Unfortunately the OP doesn't seem to be interested in the discussion anymore. jlinkels |
No!
I am very interested in this thread and appreciate your time and assistance. My sincere apologies for not answering your responses more quickly. I posted and then the next day was called away on a family responsibility. I was checking email but for some reason didn't receive response notifications --- (need to check my forum settings). I am at work and will try to sit down with your input this evening. Thank you for your help. |
Quote:
Excellent - this is the format I was looking for: directory path and file name, line by line. I did ~$find /main_directory > /home/mike/test_find_output.txt So now it's time to learn something about Postgresql's COPY command and make some trouble with databases. :-) Thanks, Jan. Sometimes a little help with a small step can help someone make greater/larger steps in the future. - - - - - Thanks to jlinkels too -- I believe I understand your approach and how it could be used. I may see if I can learn to use this too. |
Moin,
Quote:
For example: Code:
find /main -printf "%h|%f|%y|%s|%T+|%u|%g\n" - the directory name - the file name - the file type - the size in bytes - the last modification time - the owner - the group in a "|" delimited format, which is easy to load into a database. To search for regular files only, add the "-type f" option. Much more is possible, look at the manual page. Jan |
Quote:
Yes, all linux on the server and Win XP Pro clients connecting to it. Slackware ver. 12.2. Thank you very much for the guidance. I am looking at man find and google foo examples. Regards, Mike |
All times are GMT -5. The time now is 12:21 AM. |