Introduction
Problem statement
A core function of any inventory management system is to always have up to date information about every customer - their purchase history, deliver status, account details, etc. In inventory management, being real-time or near real-time for a customer makes a big impact on the customer experience. Inventory management systems must therefore query each and every customer account to ensure these details are up to date. But - how can we code an efficient scheduling system - for a database consisting of 1000’s, tens of thousands, even millions of customers?
Suppose an e-commerce business uses a API/database system where each customer’s account is queried at regular intervals to make sure we don’t miss any new information. For example, imagine each account in the system needs to run once every 15 minutes to capture any changes that may have taken place over the past 15 minutes (purchases, cancellations, account information, etc.).
In this blog post we’ll write an algorithm that can complete this scheduling task for us.
Assumptions
- The function will be called EVERY minute at the top of the minute.
- We will have access to a single piece of information - the list of account id’s to be queried. This will be in the form of a Python
List
. For example:account_ids = [1000, 1001, 1002, 1003, ...]
. - Assume we have the following function signature and framework:
def get_account_ids_to_run(account_ids_list):
account_ids_to_return = []
# Our logic will go here.
# account_ids_list is defined above.
# Expect it to be an array of integers
# (representing the account ids).
return account_ids_to_return
Requirements
Create a function that will schedule the accounts to be queried every 15 minutes by returning ONLY the account ids that need to run each minute. We can be guaranteed that this function will be called every minute, on the minute, and will pass you only the account IDs on the system.
For example, account 1000
may run at minute 0, minute 15, minute 30, minute 45 and needs to be in the list of accounts returned each time the function is called during those minutes.
First thoughts
First we need to get a representation of time - and since this isn’t provided to us and we aren’t given any indication of time constraints other than the function being run every minute we might as well fetch the current system time. For this we’ll use Python’s built in datetime
library (see Python documentation).
from datetime import datetime
Here the datetime
class provides us many methods with which to access and analyse the current machine time. We’ll use the class method datetime.now()
to return the current local date and time. This will be in the form of a datetime
object, which the Python documentation tells us is a single object containing all the information from a date
object and a time
object. The datetime
object will assume the current Gregorian calendar and that there are exactly 3600*24 seconds in every day. Consequently it should look something like the following 2019-09-03 10:30:05.017128
.
Having accessed the appropriate datetime
class method and returned a datetime
object we can now make use of the class attributes this makes available to us. First, use datetime.time()
to return a time
object with the hour, minute, second, and microsecond associated with the datetime
object. Then, access the specific minute (of the current hour) associated with datetime
by using the datetime.minute
attribute. These two steps can be chained together, along with the datetime.now()
class method into a single expression that returns the current system clock minute, which for the sake of argument I’ve named specific_minute
.
specific_minute = datetime.now().time().minute
In short: we now have an integer variable containing the specific minute that our function is being run.
Thinking through a solution
We need someway of linking each account id to the specific minute that our function is running. The variable account_ids_list
will contain a range of ids for us to consider - so we’ll need to loop through each in turn - but as we do so lets use the remainder operator %
to give us a value that uniquely links the current minute to the account id under consideration.
for item in account_ids_item:
uniq = item % 60
For example, id item = 1000
will return uniq = item % 60 = 40
Likewise id item = 1001
will return uniq = item % 60 = 41
. And so on until we get to id item = 1019
and then id item = 1020
at which point the variable uniq
will loop from uniq = 59
to uniq = 0
. In this way we see how regardless of the numerical value of the account id provided to us, there is a way to link that id to a particular minute in the hour. And so now all we need to do is return those account id’s whose value of uniq
matches the specific minute of the hour the function is being run (specific_minute
).
if (specific_minute == uniq):
account_ids_to_return.append(item)
Accounts will be scheduled every 15 minutes
Lets not forget that our function will be run every 15 minutes - thus if we know each account id will be queried four times every hour, that is every 15 minutes, then it makes sense that if we add 15 onto the uniq
variable we can find the other three occasions when this particular account will be returned.
i = 0
while i < 3:
uniq = (uniq + 15) % 60
if (specific_minute == uniq):
account_ids_to_return.append(item)
i += 1
First solution
Proposed Solution
def get_account_ids_to_run(account_ids_list):
account_ids_to_return = []
from datetime import datetime
specific_minute = datetime.now().time().minute
for item in account_ids_list:
uniq = item % 60
i = 0
if (specific_minute == uniq):
account_ids_to_return.append(item)
while i < 3:
uniq = (uniq + 15) % 60
if (specific_minute == uniq):
account_ids_to_return.append(item)
i += 1
return account_ids_to_return
Testing
def get_account_ids_to_run(account_ids_list, set_time=False, set_minute=0, verbose=False):
account_ids_to_return = []
if (not set_time):
from datetime import datetime
specific_minute = datetime.now().time().minute
else:
specific_minute = set_minute % 60
for item in account_ids_list:
uniq = item % 60
if (verbose): print("Account id: ", item, " will be run on minute: ", uniq)
i = 0
if (specific_minute == uniq):
account_ids_to_return.append(item)
while i < 3:
uniq = (uniq + 15) % 60
if (verbose): print("Account id: ", item, " will be run on minute: ", uniq)
if (specific_minute == uniq):
account_ids_to_return.append(item)
i += 1
return account_ids_to_return, specific_minute
account_ids_list = list(range(1000))
for looper in range(100):
run_these, specific_minute = get_account_ids_to_run(account_ids_list, set_time=True, set_minute=looper, verbose=False)
print("\nThe minute is: ", specific_minute)
print("This minute there are ", len(run_these), " accounts to run.")
print("The account numbers to run are:")
print(run_these)
We use the for
loop to mimic the passage of time - and test for the passage of 100 minutes. The current minute is passed into our function as parameter set_minute
. Within the function, just to be safe, I use the remainder operator on the set_minute
parameter, so I can be absolutely sure it’s within the 0
to 59
range needed. Now that we are simulating a particular minute of the hour, we can watch the output and check that it behaves how we might expect.
Example Output
The minute is: 0
This minute there are 67 accounts to run.
The account numbers to run are:
[0, 15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195, 210, 225, 240, 255, 270, 285, 300, 315, 330, 345, 360, 375, 390, 405, 420, 435, 450, 465, 480, 495, 510, 525, 540, 555, 570, 585, 600, 615, 630, 645, 660, 675, 690, 705, 720, 735, 750, 765, 780, 795, 810, 825, 840, 855, 870, 885, 900, 915, 930, 945, 960, 975, 990]
The minute is: 1
This minute there are 67 accounts to run.
The account numbers to run are:
[1, 16, 31, 46, 61, 76, 91, 106, 121, 136, 151, 166, 181, 196, 211, 226, 241, 256, 271, 286, 301, 316, 331, 346, 361, 376, 391, 406, 421, 436, 451, 466, 481, 496, 511, 526, 541, 556, 571, 586, 601, 616, 631, 646, 661, 676, 691, 706, 721, 736, 751, 766, 781, 796, 811, 826, 841, 856, 871, 886, 901, 916, 931, 946, 961, 976, 991]
...
...
The minute is: 15
This minute there are 67 accounts to run.
The account numbers to run are:
[0, 15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195, 210, 225, 240, 255, 270, 285, 300, 315, 330, 345, 360, 375, 390, 405, 420, 435, 450, 465, 480, 495, 510, 525, 540, 555, 570, 585, 600, 615, 630, 645, 660, 675, 690, 705, 720, 735, 750, 765, 780, 795, 810, 825, 840, 855, 870, 885, 900, 915, 930, 945, 960, 975, 990]
The minute is: 16
This minute there are 67 accounts to run.
The account numbers to run are:
[1, 16, 31, 46, 61, 76, 91, 106, 121, 136, 151, 166, 181, 196, 211, 226, 241, 256, 271, 286, 301, 316, 331, 346, 361, 376, 391, 406, 421, 436, 451, 466, 481, 496, 511, 526, 541, 556, 571, 586, 601, 616, 631, 646, 661, 676, 691, 706, 721, 736, 751, 766, 781, 796, 811, 826, 841, 856, 871, 886, 901, 916, 931, 946, 961, 976, 991]
Notice how there are the same number of accounts to be returned (and thus queried) for each minute that our function is run. This is especially important as it means the load we are placing upon the API/server is evenly distributed. We also clearly see the progresson of account numbers to be run, for each minute, and that after 15 minutes have passed the cycle repeats - with those accounts that were returned at minute 0 being run again at minute 15.
Improvements
One immediate area to improve is to allow the user to specify the time periods over which our function will be run. That is, rather than having it wired into the code that the function is run every minute, and that each account must be returned again after every 15 minutes, can we instead allow the user to specify that our function should run every x
minutes … or indeed every x
hours … or every x
days?
Review
What do you think? Do you agree with this solution? And what about the improvements - can you think of how to re-write the above solution in a more abstract fashion to then allow the user to choose the time period over which the accounts are queried?