LinuxQuestions.org
LinuxQuestions.org > Forums > Linux Forums > Linux - Software
01-30-2012, 05:58 PM   #1
dombrowsky
Member
 
Registered: Dec 2005
Location: New York
Distribution: Debian/GNU
Posts: 235

Rep: Reputation: 31
Need a file-based tagging file organizer


I've been looking for a while, and I cannot find an app, system plugin, or desktop setting that will let me take a set of files and tag them as I wish. Let me add a set of filenames to this application, tag each as "bill," "receipt," "medical," "auto," etc., and I will be happy.

So far, my solution is Referencer, but my use case is very much not the goal of the project. Referencer is a way to organize bibliographical documents in a research project, not a general-purpose organizer for personal scanned documents.

How do other people do this? How do you organize the scanned bills and other stuff you get in the mail? How do you take a stack of files and categorize them with useful tags?
 
01-30-2012, 06:13 PM   #2
pljvaldez
Guru
 
Registered: Dec 2005
Location: Somewhere on the String
Distribution: Debian Squeeze (x86)
Posts: 6,092

Rep: Reputation: 269
In KDE4, I think nepomuk (the general desktop search program) does this. http://nepomuk.kde.org/discover/user

What desktop (name and version) are you using and what distro (name and version)?
 
01-30-2012, 06:22 PM   #3
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 368
I don't do much with Document Management, but I probably should. Anyway, I did some searching and found this page:

Linux User & Developer: OpenOffice.org Base--No Frills Document Management

It's a bit of a do-it-yourself system, which could be good or bad depending on your persuasion. I could swear I've seen a pre-packaged, tag-based document management system somewhere though. I'll probably do a little more searching just in case.

EDIT:
--post edit: I looked a little more closely at the description, and I can't guarantee that DocMGR allows for tag-based organization. There is reference to "keywords" in some of the documentation, but no clear indication that "keywords" and "tags" are equivalent.

I also came across this blog post: Cool Web-based Software-DocMGR. And just a tad more digging turned up the DocMGR homepage.

Again, I don't do much document management. So I don't know how good/bad either of these are or if they fit your needs. Just pointing them out in case your searching has not turned them up.

Last edited by Dark_Helmet; 01-30-2012 at 06:34 PM.
 
01-30-2012, 06:30 PM   #4
dombrowsky
Member
 
Registered: Dec 2005
Location: New York
Distribution: Debian/GNU
Posts: 235

Original Poster
Rep: Reputation: 31
Quote:
Originally Posted by pljvaldez
In KDE4, I think nepomuk (the general desktop search program) does this. http://nepomuk.kde.org/discover/user

What desktop (name and version) are you using and what distro (name and version)?
Ubuntu at work, Mint at home, and Debian on the server. But it shouldn't matter. If the solution is that tightly coupled to the desktop, it isn't a solution to this problem. Referencer does fit the puzzle; it just isn't designed for my use case. I'm trying to make sure there isn't an obvious solution before I start dedicating development time to molding Referencer to my needs.
 
01-30-2012, 06:35 PM   #5
dombrowsky
Member
 
Registered: Dec 2005
Location: New York
Distribution: Debian/GNU
Posts: 235

Original Poster
Rep: Reputation: 31
Quote:
Originally Posted by Dark_Helmet
I don't do much with Document Management, but I probably should. Anyway, I did some searching and found this page:

Linux User & Developer: OpenOffice.org Base--No Frills Document Management

It's a bit of a do-it-yourself system, which could be good or bad depending on your persuasion.
It is often good, but like most of us, I value my time. I stopped reading the link after it mentioned "open office base." OOBase is interesting, but I'm not going to develop a personal Microsoft Access-style solution around OpenOffice just because I can. It seems that this problem should be more common. I get paper in the mail, I scan it into a computer, and I want to organize it. How do I do this?
 
01-30-2012, 06:45 PM   #6
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 368
Re: time being valuable. Sure, I understand.

The last two options I'll throw at you are from this Open Source Document Management blog post. The link to KnowledgeTree in the article is dead. But the link to jLibrary is still good. Scanning over the description, it looks like jLibrary supports customizable meta information for files.

I did find (what I assume is) the open source Knowledge Tree document management system on Sourceforge.
 
01-30-2012, 07:22 PM   #7
pljvaldez
Guru
 
Registered: Dec 2005
Location: Somewhere on the String
Distribution: Debian Squeeze (x86)
Posts: 6,092

Rep: Reputation: 269
I did find this novel approach of a "filesystem" for tags. http://www.tagsistant.net/
 
06-07-2012, 05:07 AM   #8
oniony
LQ Newbie
 
Registered: Jun 2012
Posts: 2

Rep: Reputation: Disabled
I have been working on a general file tagging program for a little while now. It's called TMSU and works by providing a tool with which you can tag your files. It also then lets you mount a tag based view of your files so that you can use tags to access your files from any other program.

(It's GPL3 and tested only on Linux at present, though it should theoretically work on BSD too. I've had a report it's not yet working on OSX. Windows port planned but not started yet.)

---------- Post added 06-07-12 at 11:07 AM ----------

http://www.tmsu.org/

Edit: just noticed the link to 'tagsistant' above. I have completely independently taken an almost identical path to that tool with my own. Shame I didn't find that before I started work on tmsu!

Last edited by oniony; 06-07-2012 at 05:16 AM.
 
01-11-2013, 05:26 AM   #9
darkfeline
LQ Newbie
 
Registered: Jan 2013
Posts: 2

Rep: Reputation: Disabled
Add one more to the list =). Sorry for the necro, but I'm wondering if this is still an issue and whether the following seems like a good solution for it: (I can't post links yet, but search for 'arch linux hitagiFS' for the thread) (disclaimer: this is my project). In short, it provides a tag-based general file organization system based on hard links.

The reason for this shameless plug is that I'm looking to see if there is a need for the project I'm working on. If no one needs it, I won't be as motivated to get it into shape. If there IS a need for it, I'll be glad to work on it, and hopefully others can get some use out of it.

I've looked at tagsistant, but it wasn't for me. Plus, it looks abandoned now. I haven't seen TMSU before, but it looks similar. The difference is that mine (hitagiFS) is much more transparent, relying solely on file system hard links and symlinks. There are no database or portability issues. If you use it for a while and decide that you don't like it, you can ditch the program, and the directory structure is left as it is.
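For readers wondering how tagging with nothing but hard links can work, here is a minimal Python sketch of the idea. It is not hitagiFS/dantalian code: the one-directory-per-tag layout and the helper names (`tag`, `untag`, `tags_of`, `files_with_all`, `demo`) are hypothetical. Each tag is a plain directory, and tagging a file means adding another hard link to the same inode, so no separate database is needed.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def tag(root: Path, file: Path, tag_name: str) -> Path:
    """Tag `file` by hard-linking it into root/<tag_name>/ (hypothetical layout)."""
    tag_dir = root / tag_name
    tag_dir.mkdir(parents=True, exist_ok=True)
    link = tag_dir / file.name
    if not link.exists():
        os.link(file, link)  # a new name for the same inode; no data is copied
    return link


def untag(root: Path, file_name: str, tag_name: str) -> None:
    """Drop a tag by unlinking that one name; the data survives via other links."""
    (root / tag_name / file_name).unlink()


def tags_of(root: Path, file: Path) -> set[str]:
    """Return every tag directory holding a link to the same inode as `file`."""
    st = file.stat()
    key = (st.st_dev, st.st_ino)
    found = set()
    for tag_dir in root.iterdir():
        if not tag_dir.is_dir():
            continue
        for entry in tag_dir.iterdir():
            est = entry.stat()
            if (est.st_dev, est.st_ino) == key:
                found.add(tag_dir.name)
                break
    return found


def files_with_all(root: Path, *tag_names: str) -> set[str]:
    """AND query: names that appear in every listed tag directory."""
    sets = [{p.name for p in (root / t).iterdir()} for t in tag_names]
    return set.intersection(*sets) if sets else set()


def demo() -> dict:
    """Tag two scanned documents in a throwaway directory and query them."""
    with tempfile.TemporaryDirectory() as tmp:
        scans, root = Path(tmp, "scans"), Path(tmp, "tags")
        scans.mkdir()
        bill, receipt = scans / "bill.pdf", scans / "receipt.pdf"
        bill.write_bytes(b"doctor bill")
        receipt.write_bytes(b"pharmacy receipt")
        tag(root, bill, "bill")
        tag(root, bill, "medical")
        tag(root, receipt, "receipt")
        tag(root, receipt, "medical")
        result = {
            "medical": files_with_all(root, "medical"),
            "bill+medical": files_with_all(root, "bill", "medical"),
            "bill_tags": tags_of(root, bill),
            "bill_links": bill.stat().st_nlink,  # original name + 2 tag links
        }
        untag(root, "bill.pdf", "medical")
        result["after_untag"] = tags_of(root, bill)
        return result
```

Since every tag is an ordinary directory, deleting the script leaves the tag directories browsable in any file manager. One limitation of this approach: hard links cannot cross filesystem boundaries, so the tag tree has to live on the same filesystem as the files.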

The reason I'm putting this here is (again) I would like to see if there's still interest in yet another (although quite different) tag-based file organization system. If not, I won't work as hard on it. As much as I like contributing, I'm not going to put in hours if no one'll appreciate it =).
 
01-11-2013, 05:40 AM   #10
unSpawn
Moderator
 
Registered: May 2001
Posts: 26,944
Blog Entries: 54

Rep: Reputation: 2731
Quote:
Originally Posted by darkfeline
I can't post links yet, but search for 'arch linux hitagiFS' for the thread) (disclaimer: this is my project).
Here, let me add that link for you: https://github.com/darkfeline/hitagiFS
 
04-13-2013, 10:43 AM   #11
Tx0
LQ Newbie
 
Registered: Apr 2013
Posts: 2

Rep: Reputation: Disabled
@darkfeline: I'm currently working on release 0.6 of Tagsistant. Why do you think it's abandoned?
 
04-13-2013, 03:04 PM   #12
darkfeline
LQ Newbie
 
Registered: Jan 2013
Posts: 2

Rep: Reputation: Disabled
Hi Tx0

Sorry if I was wrong. I think the last time I checked, the last news item was from a while back, and the documentation and everything seemed incomplete/old. But I seem to be wrong.

I think your tagsistant is cool, and it was one of the options I considered for my needs, but the documentation is a little messy. Could you explain the internals of tagsistant a little more? As I'm working on my program (now called "dantalian" because the first name was ill-chosen), I think there may be some duplication of effort here.

Are you storing the files in MySQL, or just the paths to the files? How do you handle filenames (I heard something about unique names for the entire database, but that may be wrong/outdated)?

P.S. I just read the 0.6 howto, and I must say it looks a lot better than when I looked before. I am considering dropping my own project, with mixed feelings, but I still think there's a fundamental difference in our approaches to the problem. Take deduplication, for example: what if I want two separate identical files (because I want to edit one of them in the future)? It seems like you can't do that with tagsistant.

I'm also interested in your backend, and in performance with huge numbers of tags/files (100,000 tags with as many files each, for example), as one programmer asking another for advice.
 
04-14-2013, 05:08 AM   #13
Tx0
LQ Newbie
 
Registered: Apr 2013
Posts: 2

Rep: Reputation: Disabled
@darkfeline: I just noticed that your post dates to January 2013, and I'm not sure I published news about 0.6 before February, so it's perfectly possible that you perceived Tagsistant as an abandoned project. The good news is: it isn't.

There are still minor issues with Tagsistant 0.6 that I'm fixing in the SVN repository, publishing a new release candidate every week or two. Yesterday I changed a callback used to retrieve an integer from an SQL query: it used arbitrarily chosen libDBI functions to get the number, while now it checks the return type and uses the proper libDBI function to fetch it.

The documentation on the site is largely outdated because it targets the 0.2 release which I don't support any more. As you noticed, the 0.6 release has a long howto here: http://www.tagsistant.net/documents-...tant/0-6-howto.

I'm not storing the file content in MySQL, just the name of the file. The file is actually stored in the archive/ directory inside the repository. I plan to organize archive/ in subfolders based on the inode of the objects stored to avoid hogging the archive/ directory and slowing down its browsing.
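The planned archive/ layout can be illustrated with a small bucketing function. This is an assumption-laden sketch, not Tagsistant's code: the fan-out of 256, the two-hex-digit folder names and the `<inode>_<name>` file naming are all hypothetical choices. The point is that deriving the subfolder from the object's number caps how many entries sit directly in any one directory.

```python
from __future__ import annotations

from collections import Counter
from pathlib import Path

FANOUT = 256  # hypothetical number of subfolders under archive/


def archive_path(archive: Path, inode: int, name: str) -> Path:
    """Map an object number to archive/<bucket>/<inode>_<name>.

    The bucket comes from the object number, so archive/ itself never holds
    more than FANOUT entries. Prefixing the number keeps two stored objects
    that share a name (e.g. two A.jpg) from colliding inside one bucket.
    """
    bucket = f"{inode % FANOUT:02x}"
    return archive / bucket / f"{inode}_{name}"


def bucket_sizes(n_objects: int) -> Counter:
    """Count how many of objects 1..n_objects land in each subfolder."""
    return Counter(
        archive_path(Path("archive"), i, "x").parent.name
        for i in range(1, n_objects + 1)
    )
```

With 100,000 objects numbered consecutively, each of the 256 subfolders ends up holding 390 or 391 files instead of one directory holding all 100,000.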

The issue about unique names in the filesystem is related to Tagsistant 0.2 (another reason for dropping it in favour of Tagsistant 0.6).

You are right about deduplication: if two files with the same content (same MD5 hash) are created, the second gets deleted and its tag-set gets transferred to the first copy. So it's not possible to edit just one copy: both are altered because they're one.

This is also a bit rough because it can confuse the user: if two identical files named A.jpg and B.jpg are copied into the tag1/ and tag2/ directories respectively, after deduplication the file in tag2/ is called A.jpg too! But I'll address this in a future release.
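The deduplication rule described here (same MD5, so the second copy is dropped and its tags move to the first) can be modelled in memory as below. The `Obj`/`Repo` classes and their methods are hypothetical; Tagsistant does this in its own backend. The sketch reproduces both effects discussed in the thread: the two files become one object, and the file listed under tag2/ carries the first file's name.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class Obj:
    name: str
    data: bytes
    tags: set[str] = field(default_factory=set)


class Repo:
    """Toy repository that deduplicates stored objects by MD5 of their content."""

    def __init__(self) -> None:
        self.by_hash: dict[str, Obj] = {}

    def add(self, name: str, data: bytes, tag: str) -> Obj:
        digest = hashlib.md5(data).hexdigest()
        existing = self.by_hash.get(digest)
        if existing is not None:
            # Duplicate content: discard the new copy and give its tag
            # to the object that was stored first (which keeps its name).
            existing.tags.add(tag)
            return existing
        obj = Obj(name, data, {tag})
        self.by_hash[digest] = obj
        return obj

    def listdir(self, tag: str) -> list[str]:
        """Names visible under a tag directory."""
        return sorted(o.name for o in self.by_hash.values() if tag in o.tags)
```

Adding `A.jpg` under tag1 and an identical `B.jpg` under tag2 leaves a single stored object, and `listdir("tag2")` shows `A.jpg`. It also makes darkfeline's point visible: because both names resolve to one object, there is no way to edit one copy independently.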

About performance: this is hitting a nerve. The biggest load in every query comes from:
  1. parsing the query into tokens
  2. building the tag tree (a tree representing the tags involved and the unions made with +/)
  3. reasoning on the tag tree (that is, finding related tags)
  4. building the corresponding SQL query (just for readdir)
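The four steps listed above could be mocked up as plain functions, as below. Everything concrete here is an assumption for illustration: the query syntax (tags separated by `/` are ANDed, a `+` token starts an OR branch, `@` ends the query), the `related` map used for the reasoning step, and the `objects`/`tagging` table names in the generated SQL are not taken from Tagsistant's source.

```python
from __future__ import annotations


def tokenize(path: str) -> list[str]:
    """Step 1: split the query path into tokens."""
    return [t for t in path.strip("/").split("/") if t]


def build_tree(tokens: list[str]) -> list[list[str]]:
    """Step 2: an OR-list of AND-groups, e.g. [["t1", "t2"], ["t3"]]."""
    tree: list[list[str]] = [[]]
    for tok in tokens:
        if tok == "@":        # assumed end-of-query marker
            break
        if tok == "+":        # assumed OR separator: start a new AND-group
            tree.append([])
        else:
            tree[-1].append(tok)
    return [group for group in tree if group]


def reason(tree, related: dict[str, set[str]]) -> list[list[set[str]]]:
    """Step 3: replace each tag with itself plus its related tags."""
    return [[{tag} | related.get(tag, set()) for tag in group] for group in tree]


def to_sql(tree) -> tuple[str, list[str]]:
    """Step 4: one SELECT per AND-group, joined with UNION (hypothetical schema)."""
    selects, params = [], []
    for group in tree:
        clauses = []
        for alternatives in group:
            ph = ", ".join("?" for _ in alternatives)
            clauses.append(
                f"inode IN (SELECT inode FROM tagging WHERE tag IN ({ph}))"
            )
            params.extend(sorted(alternatives))
        selects.append(
            "SELECT inode, name FROM objects WHERE " + " AND ".join(clauses)
        )
    return " UNION ".join(selects), params
```

Steps 2 and 3 are where a real implementation has to look up tags and relations in the database, which is why they dominate the cost.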

The second and third steps are the most SQL-intensive. So I decided to create a query cache that goes through the steps above only the first time a query is seen, reuses the cached result from the second time on, and deletes cache entries as soon as a new relation involves at least one of the tags in the entry.
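A cache with the invalidation rule described here might look like the following sketch. The `QueryCache` class, the `build` callback (standing in for the expensive steps, returning the SQL plus the set of tags it touched) and the `on_new_relation` hook are hypothetical names. Entries are keyed by the query path; adding a relation between two tags drops only the entries that mention either tag, so unrelated queries stay cached.

```python
from __future__ import annotations

from typing import Callable


class QueryCache:
    """Query cache keyed by path, invalidated per tag."""

    def __init__(self, build: Callable[[str], tuple[str, frozenset[str]]]) -> None:
        # `build` runs the expensive steps and returns (sql, tags_involved).
        self._build = build
        self._entries: dict[str, tuple[str, frozenset[str]]] = {}
        self.misses = 0

    def get(self, path: str) -> str:
        entry = self._entries.get(path)
        if entry is None:
            # First time this query is seen: do the full work once.
            self.misses += 1
            entry = self._build(path)
            self._entries[path] = entry
        return entry[0]

    def on_new_relation(self, tag_a: str, tag_b: str) -> None:
        """Drop every entry whose tag set mentions either tag of the new relation."""
        stale = [
            p for p, (_, tags) in self._entries.items()
            if tag_a in tags or tag_b in tags
        ]
        for p in stale:
            del self._entries[p]
```

Invalidating by tag rather than flushing the whole cache keeps the hit rate high when relations are added to a small part of a large tag set.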

This is giving a 5-10x improvement! But I still don't have metrics for 100K tags, so I can't really answer your question.

Last edited by Tx0; 04-14-2013 at 05:15 AM. Reason: improve the language
 
11-15-2013, 05:04 AM   #14
oniony
LQ Newbie
 
Registered: Jun 2012
Posts: 2

Rep: Reputation: Disabled
Quote:
Originally Posted by darkfeline
I haven't seen TMSU before, but it looks similar. The difference is that mine (hitagiFS) is much more transparent, relying solely on file system hard links and symlinks. There's no database or portability issues. If you use it for a while and decide that you don't like it, you can ditch the program, but the directory structure is left the same.
TMSU does not work this way. It does not store any files (only metadata) in its database and does not alter the original filesystem in any way, so you can likewise ditch TMSU and end up exactly where you started (only with a marginally shorter life). I think I'll put up a comparison table on the TMSU wiki so that prospective users can pick the tool most appropriate for them.
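To make the "only metadata" point concrete, here is a minimal sketch of a metadata-only tag store using Python's built-in sqlite3 module. The three-table schema (`file`, `tag`, `file_tag`) and the helper functions are hypothetical and are not TMSU's actual schema or code. The database records paths and tag names; the tagged files are never opened, moved or linked, so deleting the database leaves the filesystem exactly as it was.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS file (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL);
CREATE TABLE IF NOT EXISTS tag  (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE IF NOT EXISTS file_tag (
    file_id INTEGER REFERENCES file(id),
    tag_id  INTEGER REFERENCES tag(id),
    PRIMARY KEY (file_id, tag_id)
);
"""


def open_db(db_path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def tag_file(conn: sqlite3.Connection, path: str, *tags: str) -> None:
    """Record tags for `path`; the file itself is never touched."""
    conn.execute("INSERT OR IGNORE INTO file (path) VALUES (?)", (path,))
    (file_id,) = conn.execute("SELECT id FROM file WHERE path = ?", (path,)).fetchone()
    for name in tags:
        conn.execute("INSERT OR IGNORE INTO tag (name) VALUES (?)", (name,))
        (tag_id,) = conn.execute("SELECT id FROM tag WHERE name = ?", (name,)).fetchone()
        conn.execute("INSERT OR IGNORE INTO file_tag VALUES (?, ?)", (file_id, tag_id))
    conn.commit()


def files_tagged(conn: sqlite3.Connection, *tags: str) -> list[str]:
    """Paths that carry every one of `tags` (an AND query)."""
    ph = ", ".join("?" for _ in tags)
    rows = conn.execute(
        f"""SELECT f.path FROM file f
            JOIN file_tag ft ON ft.file_id = f.id
            JOIN tag t ON t.id = ft.tag_id
            WHERE t.name IN ({ph})
            GROUP BY f.id HAVING COUNT(DISTINCT t.name) = ?
            ORDER BY f.path""",
        (*tags, len(tags)),
    ).fetchall()
    return [r[0] for r in rows]
```

Calling `files_tagged(conn, "bill", "medical")` returns only the paths carrying both tags, which is the kind of query the original poster's bill/receipt/medical use case needs.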
 
12-09-2013, 12:52 AM   #15
BryanFRitt
LQ Newbie
 
Registered: May 2004
Location: Charlotte, N.C.
Posts: 5

Rep: Reputation: 0
Found these links to TagFileSystem and related projects...
http://code.google.com/p/tagfilesystem/
http://code.google.com/p/tagfilesyst...SummaryOfTagFS
(note: I haven't tried anything out as of this posting)
 
  



Tags
hitagifs, tags filesystem organize




All times are GMT -5.
