OJS OCS OMP OHS

You are viewing the PKP Support Forum | PKP Home Wiki



Database issue

Are you responsible for making OJS work -- installing, upgrading, migrating or troubleshooting? Do you think you've found a bug? Post in this forum.

Moderators: jmacgreg, btbell, michael, bdgregg, barbarah, asmecher

Forum rules
What to do if you have a technical problem with OJS:

1. Search the forum. You can do this from the Advanced Search Page or from our Google Custom Search, which will search the entire PKP site. If you are encountering an error, we especially recommend searching the forum for said error.

2. Check the FAQ to see if your question or error has already been resolved.

3. Post a question, but please, only after trying the above two solutions. If it's a workflow or usability question you should probably post to the OJS Editorial Support and Discussion subforum; if you have a development question, try the OJS Development subforum.

Database issue

Postby khufu » Wed May 23, 2012 2:53 am

I have a LAMP server where the latest ojs version (2.3.7) is installed and running fine.
Thanks to the ImportExport plugin the system receives articles automatically (this is what I need).
The problem is that I have not the control of the database because it grows in an uncontrolled manner.
I would like to delete automatically articles older then a week.
Does anybody know how to do it?

Many thanks in advance

regards
khufu
 
Posts: 23
Joined: Tue Mar 20, 2012 8:19 am

Re: Database issue

Postby JasonNugent » Wed May 23, 2012 12:16 pm

Hi khufu,

Deleting an article is a pretty complex undertaking from a code perspective. There are a number of entities that need to be removed, and I'd be hesitant about recommending a solution that did this automatically. Having said that, there is a script that comes with OJS in the tools/ directory called deleteSubmissions.php that can accept a number of submission_ids on the command line as arguments and remove them. You'd just have to figure out how to get a list of submissions older than a week, and perhaps write a scheduled task or a regular cron entry to handle it.

Regards,
Jason
JasonNugent
Site Admin
 
Posts: 891
Joined: Tue Jan 10, 2006 6:20 am

Re: Database issue

Postby khufu » Thu May 24, 2012 1:42 am

Jason,
Many thanks for your answer.
I cant find any documentation about this script, can you help me?
Browsing my database with phpmyadmin I see that the value subbmission_file_id is always NULL for all articles.
Instead the article_id field is filled with a number that represents the order of arrival of the item.
I tried, for example, the following :
Code: Select all
php deleteSubmissions.php 1

I get the following message :
Code: Select all
Warning: no files found for submission 1

But actually the article with the article_id=1 has been deleted.
I am not sure if this is fine, what do you think ?
Have you any idea how to get the list of articles_ids stored in the database?
I written that I want to keep only one week of articles, but I can change mind and keep a certain number of articles.
The most important thing is that the Database must not grow indefinitely

regards
khufu
 
Posts: 23
Joined: Tue Mar 20, 2012 8:19 am

Re: Database issue

Postby JasonNugent » Thu May 24, 2012 6:04 am

Hi khufu,

What you're really deleting is the article, so you'd use the article_id on the command line. You don't need to worry about submission files -- those get removed when the article is deleted, and that warning is just letting you know that it deleted an article that had no files. You would need to write a script of some sort to get the list of article ids that you would want to delete, based on submission date. A few lines of PHP with an SQL query would do that just fine.

Something like this would work:
Code: Select all
SELECT article_id FROM articles WHERE unix_timestamp() - unix_timestamp(date_submitted) >= 604800;


with 604800 being the number of seconds in a week.

Regards,
Jason
JasonNugent
Site Admin
 
Posts: 891
Joined: Tue Jan 10, 2006 6:20 am

Re: Database issue

Postby khufu » Sun Jun 03, 2012 12:47 pm

Jason,
Thanks again for your answer.
Because I am not a PHP programmer, I resolved in another way :

Thanks to this bash script command :
Code: Select all
MYSQL ojs -u root -ppassword -e 'select article_id,date_submitted from articles' > articles_list

The file articles_list contains a list of articles_id and the submitted date like :
....
1104 2012-05-29 21:14:42
1105 2012-05-29 21:14:57
1106 2012-05-29 21:15:02
1107 2012-05-29 21:15:08
1108 2012-05-29 21:15:12
1109 2012-05-29 21:15:16
1110 2012-05-29 21:15:39
1111 2012-05-29 21:16:54
1112 2012-05-29 21:16:57
1113 2012-05-29 21:17:06
....
With a bash script I parse this file and delete the items older then X hours launching the php script you suggested.
Code: Select all
php /var/www/ojs/tools/deleteSubmissions.php $YYY

Where YYY is the article ID of the article I want to delete.

If you are interested to the entire script, let me know, I can post it.

Now, I have another problem.
After several days of automatic importing and automatic deleting of articles, I found another problem.
The automatic importing procedure requires more and more time.
At the beginning to import an article it took 3 or 4 seconds.
The day after 5 or 6 seconds and so on until now that it takes 25 seconds.
I found this very strange also because the dimension of the articles are always the same.
Do you have any idea ?
khufu
 
Posts: 23
Joined: Tue Mar 20, 2012 8:19 am

Re: Database issue

Postby asmecher » Sun Jun 03, 2012 1:13 pm

Hi khufu,

There may be some related tables that aren't getting purged at the same time (and if that's the case and we're able to track down which ones, we can fix it). However, the most likely cause is that your search index is getting big. Try re-running "php tools/rebuildSearchIndex.php" to see if that speeds up your process.

Regards,
Alec Smecher
Public Knowledge Project Team
asmecher
 
Posts: 8869
Joined: Wed Aug 10, 2005 12:56 pm

Re: Database issue

Postby khufu » Mon Jun 04, 2012 3:07 am

Alec,
Many thanks for your answer.
No luck! I have just tried to lunch the rebuildSearchIndex.php script but the import speed is the same.
Any other idea ?
regards
khufu
 
Posts: 23
Joined: Tue Mar 20, 2012 8:19 am

Re: Database issue

Postby JasonNugent » Mon Jun 04, 2012 3:46 am

Hi khufu,

Out of curiosity, do you perform any nightly or weekly maintenance on your MySQL database? MySQL tables and their related indexes can become fragmented over time, especially if you are deleting and adding records to them a lot. It may be worth looking into running myisamchk or perhaps 'optimize table xxxxx' on some of the tables that your data has been going into. The 'articles' and 'article_settings' tables would be good candidates as a start.

http://dev.mysql.com/doc/refman/5.1/en/ ... table.html

(Make backups of your database before running maintenance commands, please)

Regards,
Jason
JasonNugent
Site Admin
 
Posts: 891
Joined: Tue Jan 10, 2006 6:20 am

Re: Database issue

Postby khufu » Mon Jun 04, 2012 8:54 am

Jason,
I perform a hourly cleanup (deleting old articles) and the flow of new articles is about 2000 per day.
So what happens when I delete old articles is that article_ids from 1 to X are not used anymore.
To better explain the situation let me write a practical example :
Let start with 100 articles in the database (id from 1 to 100).
Then i start the cleanup of first (older) 10 articles:
In the DB now I has articles from id=11 to id=100
Then I insert new articles for example 5.
The ImportExport plugin automatically import the articles assigning id from 101 to 105 (and not from 1 to 5 that are free)
So the first ids already used and now empty are not used anymore.

Following your suggestion I did this :
mysql -u root -p
mysql> use ojs;
mysql> OPTIMIZE TABLE articles;

mysql answers immediately in this way :
+--------------+----------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+--------------+----------+----------+----------+
| ojs.articles | optimize | status | OK |
+--------------+----------+----------+----------+
1 row in set (0.00 sec)

But nothing apparently changed.

What I did wrong ?
Any suggestions ?
khufu
 
Posts: 23
Joined: Tue Mar 20, 2012 8:19 am

Re: Database issue

Postby JasonNugent » Mon Jun 04, 2012 9:43 am

Hi khufu,

You didn't do anything wrong -- that's all the command returns. On really large tables (in the Gb or Tb size), that command can take an hour or more to run. OJS stores data related to articles in many tables though, very little of which is actually in the articles table. You would also want to run that on the other tables within OJS, especially those related to article storage (like the article_settings table, and the published_articles table) Simply deleting articles (your hourly cleanup) is not going to optimize the database structure. It is completely fine that the article_id values are no longer in sequence. That column is a uniquely generated auto-increment field, and gaps are allowed.

Alec's point regarding a large table that may not be getting cleaned up is a good one. What you might want to do is take a look inside of the directory containing your OJS database files, and see which of those files are the largest. It might provide some insight into why new imports take so long.

Jason
JasonNugent
Site Admin
 
Posts: 891
Joined: Tue Jan 10, 2006 6:20 am

Re: Database issue

Postby khufu » Mon Jun 04, 2012 11:41 pm

Gents,
I probably found what the problem is;
I did this test :
1) With a batch script and the deleteSubmissions.php script I deleted all the articles entries.
--> The importExport.php plugin takes 25 sec to import an article
2) I lunched 'optimize table' for all the tables in ojs. This reduced the size of the database but
--> The importExport.php plugin takes 25 sec again to import an article
3) I deleted manually all the entries into the following tables :
issue_settings
issues
custom_section_orders
custom_issue_orders
This third step changed all --> The importExport.php plugin takes less then 1 second to import an article !!!!!

So, it seems that the deleteSubmissions.php script does not do all the job !
The tables issue_settings, issues, custom_section_orders and custom_issue_orders keep track also of all the deleted articles so they grow up indefinitely.

I think that, when I delete an article, I should delete the correspondent item into the 4th tables (issue_settings, issues, custom_section_orders, custom_issue_orders keep)

Now the problem is how to identify the correspondent items into the 4th tables.
The 4th tables entries are identified by the issue_id while the articles are identified by the article_id

Apparently issue_id =1 correspond to article_id =1 , 2 to 2, etc...
but I am not really sure ...
khufu
 
Posts: 23
Joined: Tue Mar 20, 2012 8:19 am

Re: Database issue

Postby JasonNugent » Tue Jun 05, 2012 3:28 am

Hi khufu,

There'd be no reason for the deleteSubmissions script to delete anything in the issues tables -- that's not what that script is for.

If you have a new issue for every article that you import (are you sure that you've got unique issue_id values for each article?) then I would probably figure out a way to remove issues that contained no articles, once you removed the submissions themselves. A better approach might be to look at why each imported article is getting a new issue in the first place. I'm not sure how you present this data to your end users, but I'd generally recommend re-using an existing issue instead of creating a new one.

Jason
JasonNugent
Site Admin
 
Posts: 891
Joined: Tue Jan 10, 2006 6:20 am

Re: Database issue

Postby khufu » Tue Jun 05, 2012 11:54 pm

Jason,
The number of the articles and of the issues is exactly the same.
What I am doing now is that when an article is deleted also the issue with the same ID is deleted.
This has been working for more then 24 hours with about 400 articles in the database (now). In this way importing takes 2 seconds (no bad!).
You put in my mind that probably there is a way to import all the articles using the same issue!
I have to figure out how. Maybe is the use of the php import script but much more probably is the structure of the xml file that I import that need to be modified.

Your opinion is welcome.

meanwhile I will study on it ....
khufu
 
Posts: 23
Joined: Tue Mar 20, 2012 8:19 am

Re: Database issue

Postby khufu » Wed Jun 06, 2012 12:38 am

Jason,
As usual you hit the target.
The xml file that I import is badly structured. Here is the top :
Code: Select all
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE issue PUBLIC "-//PKP//OJS Articles and Issues XML//EN" "http://pkp.sfu.ca/ojs/dtds/native.dtd">
<issue published="true" current="true">
   <title>
06-GIU-12 09:24
   </title>
...........

As you can see the issue title is the date of the article (I do not remember why I put the date there !!!!)
This is why for each article a new issue is created.
Now I want to try using a fixed 'string'

I will let you know.
khufu
 
Posts: 23
Joined: Tue Mar 20, 2012 8:19 am

Re: Database issue

Postby khufu » Wed Jun 06, 2012 1:34 am

Jason,
even if I put a fixed string in the issue title the import creates a new issue for each new article.

any idea ?
khufu
 
Posts: 23
Joined: Tue Mar 20, 2012 8:19 am

Next

Return to OJS Technical Support

Who is online

Users browsing this forum: Bing [Bot] and 4 guests