LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   General (https://www.linuxquestions.org/questions/general-10/)
-   -   MYSQL Database learning project. I could use some help. (https://www.linuxquestions.org/questions/general-10/mysql-database-learning-project-i-could-use-some-help-4175621517/)

mwx 01-12-2018 06:19 PM

MYSQL Database learning project. I could use some help.
 
Ok. So let's forget for a moment that this concerns Wordpress - I don't think it has anything to do with what I have going on.

I built my dad a blog several years ago and after a brief amount of instruction, set him free to upload pictures and write to his hearts content. Several times he sort of crashed the site by trying to upload files with 140+ character file names - multiple dots (.) - slashes - extensions - you name it...

He also uploaded about 2000 pictures and WP being WP - it made 3 additional copies of each photo - so now we have 8000 photos on the server and no real way of telling what's what. What has been used and what hasn't. What is actively embedded in an article and what's not, etc.

There are a few utilities out there which were supposed to help with this exact problem - most of which have been abandoned by their developers. The only one that remotely works - only does about 1/4/ to 1/3 the job you wished it would... So I'm going to try to learn how to come at it from a different angle. Hopefully someone will be able to give me a few ideas and make this a "teach a man to fish" moment.

What I want to do - or at least think I want to do - is be able to query the DB tables of all active pages (roughly 240) and get a return of all the photos listed in the published articles. If I'm really lucky - I'll also get a list of what articles each one is inserted into.

Then I'd like to query the DB and find out which photos it has a record of - that are not in use - as I don't think WP scrubs those records. I think it keeps a record of every photo ever uploaded which still resides on the server and hasn't been deleted through its media manager.

Then - with those two data sets culled - I can begin to formulate a plan of attack as to how I'm going to go about cleaning up the DB and removing the old records - as well as removing the older, no longer in use photos. The site is bulky - and lags more than it should - and I'd like to shrink the size of the DB and make it a bit more portable and streamline it for speed and efficiency...

Here's the thing... I don't know the first thing about how to get started. How to formulate the search strings. How to isolate the data I'm looking for. How to manipulate it once I find it...

It's a bit of a project - so I thought I'd ask for a hand in getting started... Sound too crazy? I just need some logical breadcrumbs that will lead me in the right direction. I've never tried to tackle anything like this. I'm in my first year of Linux Sys Admin classes and thought that this might be a neat project to work through so that I can hopefully learn something and help someone else in the future...

Sefyir 01-12-2018 11:01 PM

I don't want to discourage learning sql.. but
I think with removing duplicate files there's a easier solution (which I just tested)
As always, backup before doing this and check afterwards.

From here

Quote:

Deleting Unused Media Files in WordPress

Removing unused media files in WordPress is easy. All you have to do is to select the “Unattached” category from the drop-down in the menu bar found in the media library:
unattached media files

It will now display all “unattached” media files; that is, it will display all media files that aren’t being used anywhere on your site. You can delete them all, but it’s recommended that you create a backup of your wp-content/uploads directory first so you can restore them in case something goes wrong.
Edit: If the intention is to learn sql, you can use the above method to perhaps compare the two.
To start with basics though, you can try Sololearns course on SQL
https://www.sololearn.com/Course/SQL/

mwx 01-13-2018 07:55 AM

If only it were that easy. Selecting that option returns zero results.

Now I'm starting to think that I'll have to poll the DB and get listings of active or attached photos - then find or write a script which will traverse the directory structure and list the contents of all of the media folders - compare the two results and then manually delete the additional unused photos.

Habitual 01-13-2018 10:02 AM

Quote:

Originally Posted by mwx (Post 5805903)
If only it were that easy. Selecting that option returns zero results.

they may be buried in the wp_posts table.
I don't always use a CMS,
but when I do, it's Wordpress.

And I never leave home w\out https://codex.wordpress.org

mwx 01-14-2018 12:59 PM

So - using (find) I was able to pipe a list of all the files in the media directories out to a text file... Turns out my problem is considerably worse than I had first suspected. I thought there were 8000 photos? Nope... More like 24000 individual images scattered across multiple directories...

At least I have them all in a file which can be sorted. Next step is to get into the DB and see if I can get a listing of what's in use and what's not..

We shall see. I'll report back what I find... One lesson of the day is that I've been able to use basic Linux thus far to answer most of the questions. Have yet to learn any SQL... That'll be next I guess.

Habitual 01-14-2018 05:42 PM

Section 9.3.1
I'd scan all such unaccounted for content with clamscan because "as it turns out" media attachments are notoriously abused.
Upload one file in an incorrectly perm'd owner:group directory, and you could be in a world of hurt. I hope not ;)

Hardening Wordpress properly adhered to will be a good start on Securing your Wordpress.

Use Images says that some posts have attachments and those may account for the discrepancy in file count, however...
Use the codex. - It **is** the Official Source


Good Luck on your learning objectives. :)

Peace.

Sefyir 01-14-2018 10:16 PM

Quote:

I thought there were 8000 photos? Nope... More like 24000 individual images scattered across multiple directories...
Wordpress will by default create multiple duplicate images of varying sizes (for convenience?)
It is likely many of those images are duplicates, but with different resolution.


All times are GMT -5. The time now is 11:17 AM.