[Question title]: PHP script runs multiple times when run as a cron job but works fine when run manually from the browser
[Posted]: 2022-12-19 10:04:08
[Question description]:

I have a scenario that has got me very confused and I need some other brains on this to help point me in the right direction.

I have a PHP script that ran for about 3 years with no issues and has now started doing something weird. The script's job is to pull records from a MySQL DB that contains rows of emails to send out. By rows of emails, I mean a record with a subject, body, To and From names and emails, and so on. I also have a column labeled [sent] with a default value of 0, meaning that the email has not been sent. After a successful send, the script changes the value to 1, so the main SQL call only looks for records where sent = 0. And of course, I have an ID column.

To send the email out I am using the AWS SES (Simple Email Service) SDK. I wrap the send in a try/catch to catch any errors if they happen, but for the most part this script runs great, or used to at least lol.

PHP Script

// the emails are one-to-one, meaning that for every record only one email is sent out.
// There is never a reason to send out any duplicates
$sql = "SELECT * FROM table_with_emails WHERE sent = 0";
// Kept in its own variable so the $result assigned inside the loop cannot overwrite the result set.
$rows = $conn->query($sql);
while ($row = $rows->fetch_array()) {

    // This is the record ID of each row so I can log it later in the [emails_with_or_without_errors] table.
    $email_id = $row['ID'];

    try {
        $result = $SesClient->sendEmail([
            'Destination' => [
                'ToAddresses' => $to_address_recipient_emails,
                'BccAddresses' => $bcc_address_recipient_emails,
                'CcAddresses' => $cc_address_recipient_emails
            ],
            'ReplyToAddresses' => ["$from_name <$reply_to_email>"],
            'Source' => $sender_email,
            'Message' => [
              'Body' => [
                  'Html' => [
                      'Charset' => $char_set,
                      'Data' => $html_body,
                  ],
                  'Text' => [
                      'Charset' => $char_set,
                      'Data' => $plaintext_body,
                  ],
              ],
              'Subject' => [
                  'Charset' => $char_set,
                  'Data' => $subject,
              ],
            ],
        ]);

        $messageId = $result['MessageId'];

        $timestamp = time();
        $ok = "Email sent successfully";
        
        // Log the email as successful along with the row's ID, this column should never have any duplicate entries.
        $sql_error = "INSERT INTO emails_with_or_without_errors (status,ok,timestamp,email_id) VALUES ('ID: $messageId','$ok','$timestamp','$email_id')";
        $result_error = $conn->query($sql_error);

        // After we log the transaction I then mark the row in table_with_emails `sent = 1` so that it will not choose that record again.
        $sql_update = "UPDATE table_with_emails SET sent = 1 WHERE ID = '$email_id'";
        $result_update = $conn->query($sql_update);
        
    } catch (AwsException $e) {
        // I catch the error and log it, but this almost never happens
    }

}

What's going on?

This is where the confusion starts. This script has always run as a cronjob every minute. For some reason, about 1.5 weeks ago duplicate emails started being sent out. I know this because A) customers called support to tell us they are getting duplicates, and B) the email_id column of the emails_with_or_without_errors table also contains duplicate IDs. This should never happen since that row should immediately be updated to sent = 1.

Also, the number of duplicates sent out is random. Sometimes 2, 3, 4, or 5, but usually no more than 5. What's making my head hurt is that, if you look at the code in the try/catch, you can see that after a successful send it immediately logs the email and, most importantly, marks that record as sent = 1. This should prevent duplicate emails from going out, but for some reason, after the email is sent successfully it is still able to send it out again regardless of sent = 1.

Here is where it gets worse. If I instead stop the cronjob from running on the server and go to the script's URL directly and run it manually from my browser every minute it runs absolutely fine. No duplicates ever!

This only happens when I run it as a cronjob

So the first thing I did was

  • Checked to see if there is more than one instance of cron running. Nope, just one.
  • I restarted the server to see if that fixes it. Nope, not that.
  • I thought to myself, "Maybe there is a delay in writing sent = 1 to the table_with_emails table." That would explain the random number of duplicates going out: if the loop tries to send the next email while the write of sent = 1 is delayed, it would keep sending out the same email until the row is updated. But this does not make sense, because if that were the case it would happen whether I run it manually or as a cronjob, so that can't be it.
  • I also confirmed that AWS SES is not sending out the same email several times, because when I log the response ID from AWS they are all unique. That tells me it is sending out separate emails and not duplicates.

Final Thoughts

  • Why does the script run fine when it is run manually from a browser, but not as a cronjob?
  • How in the world can that record be sent out with duplicates when directly after the email is sent out successfully it should be updating the record as sent = 1 preventing the main SQL statement from retrieving it again?

That's what I've got. I really don't think my code is the issue; I think there is something else outside the box that I am not seeing. I haven't touched that script in a few years, so something else must have changed somewhere.

Can anyone give me ideas on where to look? Thanks in advance.

[Comments]:

  • It sounds like the script is taking longer than 1 minute to complete. Say you have 100 rows. The script runs at 8:00, takes 5 minutes, and updates the rows one by one. The script runs again at 8:01 and grabs the 80 rows that the first pass hasn't finished. At 8:02, it grabs the 50 that runs 1 and 2 haven't finished. You'll want to either spread out the cron a bit more, figure out how to speed it up, or check for instances currently running.
  • OK... that is something different. But what about being able to refresh my browser every minute pointing to the script's URL without it producing any duplicates? Could there be a delay in cron running the script as opposed to the browser?
  • The browser may stop running the script after X seconds, or it's not exactly 60 seconds between requests.
  • Yeah, I am not totally convinced (but still something different) that would be it, because it will do it even with one record. I am going to log the run times on the script and see if I can see a difference in cron/manually. Thanks, man, that is something different for me to check.
  • Another thing you can double-check: in your database, try SELECT email_id, min(timestamp), max(timestamp), count(*) as totals from emails_with_or_without_errors group by email_id having totals > 1 to see the rows being sent more than once and the times on them.
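
The overlap described in the first comment can be illustrated with a toy simulation. This is only a sketch, not the asker's code: the rows are a plain PHP array instead of a MySQL table, and `select_unsent()` and `process_batch()` are hypothetical helper names. It assumes the simplest case, where the second run takes its `sent = 0` snapshot before the first run has marked any row as sent, so both runs send every row.

```php
<?php
// Toy model of table_with_emails: ID => sent flag (0 = not sent yet).
$table = [1 => 0, 2 => 0, 3 => 0];
$sendLog = []; // stands in for emails_with_or_without_errors

// Hypothetical helper: snapshot of IDs with sent = 0, like the SELECT.
function select_unsent(array $table): array
{
    return array_keys(array_filter($table, fn ($sent) => $sent === 0));
}

// Hypothetical helper: "send" each ID in the snapshot, then mark it sent.
function process_batch(array $ids, array &$table, array &$sendLog): void
{
    foreach ($ids as $id) {
        $sendLog[] = $id; // the email goes out
        $table[$id] = 1;  // UPDATE ... SET sent = 1
    }
}

// Run A starts at 8:00 and takes its snapshot.
$snapshotA = select_unsent($table);
// Run B starts at 8:01, before run A has updated any row.
$snapshotB = select_unsent($table);

// Both runs work through their own snapshot, unaware of each other.
process_batch($snapshotA, $table, $sendLog);
process_batch($snapshotB, $table, $sendLog);

// Count how many times each ID was sent.
$counts = array_count_values($sendLog);
print_r($counts);
```

Each run only checks `sent = 0` once, when it selects its rows, so updating the flag afterwards cannot stop a run that has already taken its snapshot.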

Tags: php mysql apache ubuntu-20.04


[Solution 1]:

I have had this issue. It happens when the cron job triggers again while the previous run is still in progress. I pulled my hair out figuring it out, because my situation was the same: running the cron job's script from the browser worked just fine.

Solution:

I created another script with the following code. It triggers the cron job code only after making sure that no instance of php-cli is running.

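// List the PIDs of any php-cli processes that are currently running.
$pid = shell_exec("ps -A | grep php-cli | awk '{print $1}'");
if(empty($pid)) {
    // Nothing is running, so start the cron script in the background and append its output to the log.
    shell_exec("php-cli ~path-to-my-script/cron.php >> path-to-cronfolder/cron/err.txt 2>&1 & echo $!");
}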

Save it to a file and add that file to the cron job.
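
An alternative to scanning the process list, sketched here under assumptions rather than taken from the answer, is to have the cron script take an exclusive, non-blocking lock on a file with PHP's `flock()`. If a previous run still holds the lock, the new run exits without sending anything. The lock file path and the helper names `acquire_run_lock()` and `release_run_lock()` are hypothetical.

```php
<?php
// Hypothetical helper: returns an open handle holding the lock, or null if
// the lock file cannot be opened or another run already holds the lock.
function acquire_run_lock(string $lockFile)
{
    $handle = fopen($lockFile, 'c'); // create if missing, never truncate
    if ($handle === false) {
        return null;
    }
    // LOCK_NB makes flock() return false immediately instead of waiting.
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return null;
    }
    return $handle;
}

// Hypothetical helper: release the lock and close the handle.
function release_run_lock($handle): void
{
    flock($handle, LOCK_UN);
    fclose($handle);
}

// Assumed location; any path writable by the cron user would do.
$lockFile = sys_get_temp_dir() . '/email_cron.lock';

$lock = acquire_run_lock($lockFile);
if ($lock === null) {
    echo "Previous run still in progress, skipping.\n";
} else {
    // ... the email-sending loop would go here ...
    release_run_lock($lock);
    echo "Run finished, lock released.\n";
}
```

Because the operating system drops the lock when the file handle is closed, including when the process exits, a crashed run should not leave a stale lock behind. Unlike the process-list check, this only blocks a second copy of the same script rather than every php-cli process on the machine.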

Additionally, I have since changed the design of my packages so that I don't need to run the cron job every 5 minutes or so: I include the above code wherever the email cron needs to run. That way emails never pile up in the DB; they are sent in response to user actions, as follows:

  1. The user completes an action in the UI and posts data to the server.
  2. The server processes the data, does the needful, and creates an entry in the alert table, ready to send.
  3. Finally, it includes the file with the above code, which triggers the cron script immediately and sends out the email instantly.

As a backup I still run the cron job every 30 minutes, so if an email was missed for any reason it is sent out (this has not happened yet).
