PHP: A Faster in_array
Recently I was porting some legacy PHP code into a Laravel framework. I came upon an interesting function in the legacy code with a name that intrigued me and got me wondering how useful this could be.
In the place it was being used, the array only had one element in it, so not very useful. But I noted it for possible later use.
One of the requirements for the project was to send out blast emails. Blasts could be sent out to a few users or up to 50,000 users. They contained optional sections, substitution variables, custom signatures, tracking codes, etc.
The emails were written by professional writers and customized for each sender so that they read as a personal email from sender to recipient. For this reason, it was very important that the same email never go out to the same person twice.
Upon sending, we would look through sent logs (in a data store) and remove any users that already received that specific email.
Originally I looped through all the “sent” objects, compared their emails to the recipient objects, and removed any matches: two nested foreach loops. That is almost an N-squared, O(N²), solution; in other words, an algorithm whose running time is directly proportional to the square of the size of the input data set. Not quite in this case, since it was not iterating over the same data set twice, but close enough.
This is fine when working with small sets of data. However, as that data grows, run times start increasing dramatically.
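The original code isn't shown in the post, so here is a minimal sketch of the nested-loop approach, assuming `$users` and `$sents` are arrays of records that each carry an `email` field (the data shapes are assumptions for illustration):

```php
<?php
// Hypothetical data shapes: each user/sent record has an 'email' key.
$users = [
    ['email' => 'alice@example.com'],
    ['email' => 'bob@example.com'],
    ['email' => 'carol@example.com'],
];
$sents = [
    ['email' => 'bob@example.com'],
];

// Nested foreach: for every user, scan the entire sent log.
// Roughly O(U * S) comparisons, which is what blows up as the data grows.
$recipients = [];
foreach ($users as $user) {
    $alreadySent = false;
    foreach ($sents as $sent) {
        if ($sent['email'] === $user['email']) {
            $alreadySent = true;
            break;
        }
    }
    if (!$alreadySent) {
        $recipients[] = $user;
    }
}
// $recipients now holds only the users with no matching sent record.
```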
Here are some time results with different data sets using nested iterations:
Number of users: 300 Number of sents : 100 Elapsed time: 0.0031440258026123
Very decent for a small data set. But what happens when the data set grows to 30,000 users and 10,000 “sents”?
Number of users: 30000 Number of sents : 10000 Elapsed time: 104.06943202019
We can see that it is now taking a significant amount of time to process this.
And what happens if we increase it to 60,000 users but keep the “sents” at 10,000?
Number of users: 60000 Number of sents : 10000 Elapsed time: 226.92668390274
Doubling the original users array more than doubles the processing time. So let's try implementing the fast_in_array function.
The first thing we need to do is create a new array which has the element we are searching for as the key.
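The legacy code itself isn't reproduced here, but one way to build such a lookup, assuming each “sent” record carries an `email` field, is to re-index the array so the email becomes the key:

```php
<?php
$sents = [
    ['email' => 'bob@example.com'],
    ['email' => 'dave@example.com'],
];

// Re-index the sent log so the value we will search for becomes the key.
$sentLookup = [];
foreach ($sents as $sent) {
    $sentLookup[$sent['email']] = true;
}

// An equivalent one-liner using built-ins:
// $sentLookup = array_flip(array_column($sents, 'email'));
```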
Next, we still need to loop through our users once, but instead of the nested loop we use our fast_in_array function.
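The fast_in_array helper from the legacy code isn't shown in the post; a minimal sketch matching its described behavior is simply an isset() check against the keyed array (the data below is illustrative):

```php
<?php
// Sketch of the helper: a hash lookup instead of a linear scan.
function fast_in_array($needle, array $haystack): bool
{
    return isset($haystack[$needle]);
}

$users = [
    ['email' => 'alice@example.com'],
    ['email' => 'bob@example.com'],
];
$sentLookup = ['bob@example.com' => true]; // emails as keys

// A single pass over users; each membership test is O(1) on average.
$recipients = [];
foreach ($users as $user) {
    if (!fast_in_array($user['email'], $sentLookup)) {
        $recipients[] = $user;
    }
}
// Users already in the sent log are filtered out.
```

Note that isset() treats a key with a null value as absent, so the lookup array should store a non-null value such as `true`.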
At 300 users and 100 "sents" we get:
Number of users: 300 Number of sents : 100 Elapsed time: 0.00028800964355469
A significant relative improvement over the nested loops, though the absolute difference is negligible at this size.
So let's increase it to 30,000 users and 10,000 “sents”.
Number of users: 30000 Number of sents : 10000 Elapsed time: 0.040152072906494
We go from 104.069 secs to 0.040 secs. Now that is a very significant improvement.
And when we try 60,000 users and 10,000?
Number of users: 60000 Number of sents : 10000 Elapsed time: 0.078256845474243
We now go from 226.927 secs to 0.078 secs. An unbelievable performance gain.
So why is this?
Surely a few reasons, but we know PHP implements arrays in C as hash tables. The keys are hashed, so isset() can jump straight to the right bucket and does not need to iterate over the complete set of elements to know whether a key is there or not: an O(1) average-case lookup, versus the O(N) linear scan that in_array() performs over the values.
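As a quick sanity check (not the original benchmark), you can compare a single in_array() scan against a single isset() lookup on the same data; exact numbers will vary by machine, but the shape of the gap is what matters:

```php
<?php
// in_array() scans values linearly; isset() hashes straight to the key.
$n = 50000;
$haystack = range(0, $n - 1);       // plain values, scanned by in_array()
$lookup   = array_flip($haystack);  // same values as keys, probed by isset()

$start = microtime(true);
$foundLinear = in_array($n - 1, $haystack, true); // worst case: needle is last
$linear = microtime(true) - $start;

$start = microtime(true);
$foundHash = isset($lookup[$n - 1]);
$hash = microtime(true) - $start;

printf("in_array: %.6fs  isset: %.6fs\n", $linear, $hash);
```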
You can view the complete PHP code used to perform these tests at: https://github.com/mikepmtl/faster-in-array