I've been tasked with a somewhat interesting project.
A customer (online retailer) of ours has hundreds of thousands of users. Each user can search for products by specifying a search term that may include wildcards. So far, trivially easy.
Now each user can save their search terms so they can run any particular search they want again in the future. Again, trivially easy.
Now-- the interesting part: the customer wants users to be notified instantaneously whenever a new product arrives that matches their saved search parameters.
Thousands of new products arrive each day... is there some better approach than a simple brute force attack of iterating through and executing each and every one of the hundreds of thousands of saved search terms against each and every product as it arrives? Brute force would be murder on the databases and murder on the processors.
Perhaps break each user's saved search criteria into individual words that can be indexed, then break each product's description/name into words as well, and only execute saved searches where at least one of the individual words is found in both the product description and the saved search criteria? This should work as long as the user isn't expecting the wildcard to complete an actual word... for instance, if the saved search is "pizza*", the above approach would match a product name of "pizza oven" but not "pizzas", which it probably should.
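To make the idea concrete, here is a rough sketch in Python of what I have in mind, extended so that "pizza*" also finds "pizzas". Everything in it is my own assumption rather than how our system works today: the class name `SavedSearchIndex`, the rule that every word of a saved search must match some word of the product, and `*` being the only wildcard. Each saved search is indexed under its whole words and under the literal text before the first `*`; a new product then looks up each of its words plus every prefix of each word, and only the resulting candidates are checked in full.

```python
import re


def _token_regex(token):
    # "*" matches any run of characters; everything else is literal.
    return re.compile(".*".join(re.escape(part) for part in token.split("*")))


class SavedSearchIndex:
    """Hypothetical reverse index: product words -> candidate saved searches."""

    def __init__(self):
        self.exact = {}        # whole word -> ids of searches containing it
        self.prefix = {}       # literal prefix before first "*" -> search ids
        self.fallback = set()  # searches with nothing indexable (e.g. "*zzas")
        self.patterns = {}     # search id -> compiled regex per search word

    def add(self, search_id, term):
        tokens = term.lower().split()
        self.patterns[search_id] = [_token_regex(t) for t in tokens]
        indexed = False
        for t in tokens:
            if "*" not in t:
                self.exact.setdefault(t, set()).add(search_id)
                indexed = True
            else:
                literal = t.split("*", 1)[0]
                if literal:
                    self.prefix.setdefault(literal, set()).add(search_id)
                    indexed = True
        if not indexed:
            # Leading-wildcard searches must be checked against every product.
            self.fallback.add(search_id)

    def match(self, product_text):
        words = set(re.findall(r"[a-z0-9]+", product_text.lower()))
        candidates = set(self.fallback)
        for w in words:
            candidates |= self.exact.get(w, set())
            # Every prefix of the word may be the literal part of a "foo*" search.
            for i in range(1, len(w) + 1):
                candidates |= self.prefix.get(w[:i], set())
        # Full check, run only on the (hopefully small) candidate set.
        return {
            sid
            for sid in candidates
            if all(any(rx.fullmatch(w) for w in words) for rx in self.patterns[sid])
        }


idx = SavedSearchIndex()
idx.add(1, "pizza*")
idx.add(2, "pizza oven")
idx.add(3, "garden hose")
idx.add(4, "*zzas")
print(idx.match("Frozen pizzas, 4 pack"))
print(idx.match("Pizza oven"))
```

The work per new product then depends on the number of words in the product and their lengths, not on the total number of saved searches, except for searches that start with a wildcard, which still fall back to a full scan.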
Any ideas for improving on the brute force approach, which requires every saved search to be evaluated against every new product?