Exploiting the Amazon.com "People Who Bought Also Bought" Algorithm in Reagent Selection. Christian Tyrchan, Niklas Falk and Jonas Boström

Setting the Scene

The current trend is that drug discovery projects are treated as processes. Creativity might be hampered, leaving little room for serendipity. We need new ways of working: we want creative users who do not feel stuck in processes. Making novel compounds is at the heart of drug design. Thus, the aim of the current work is to enhance discovery, surfacing reagents from deep in the catalog that our chemists wouldn't find on their own, using a novel approach where similarity is based on users, not structures.

Internet Success Stories

New technologies, new sciences: finite state machines and item-to-item collaborative filtering (new approaches to improve searches).

Recommendation systems are best known for their use on e-commerce web sites. They attempt to present items that are likely to be of interest to the user.

The Harry Potter Shopping Cart

The idea of recommending items at checkout is nothing new. Amazon.com saw the opportunity to personalize impulse buys.

Recommendation Systems

Typically, a recommender system compares the user's profile to some reference characteristics, and seeks to predict the "rating" that a user would give to an item they had not yet considered. It should help a customer find and discover new, relevant, and interesting items.

Two main categories (based on how the recommendations are made):

Content-based recommendations (the information item): the user will be recommended items similar to the ones the user preferred in the past.

Collaborative recommendations (the social environment): the user will be recommended items that people with similar taste liked in the past.

Content-based and Collaborative Systems

Content-based recommendations: only the movies that have a high degree of similarity to the user's preferences would be recommended.
Collaborative recommendations start by finding a set of customers whose purchased items overlap the user's purchased items. The algorithm aggregates items from these similar customers, eliminates items the user has already purchased, and recommends the remaining items to the user. The focus is on finding similar users; each user is represented as an N-dimensional vector of items.

Recommendations needed to work from sparse data (often just a few purchases). The system needed to be fast, delivering high-quality recommendations in real time. It needed to scale to massive numbers (huge amounts of data). The algorithm must respond immediately to new information, since customer data is volatile.

None of the existing methods were good enough. Traditional collaborative filtering does little or no offline computation, and its online computation scales with the number of customers and catalog items, so the algorithm is impractical on large data sets. Content-based recommendations bring no news (unless randomization is added).

Item-to-Item Collaborative Filtering

Item-to-item collaborative filtering matches each of the user's purchased items to similar items, then combines those similar items into a recommendation list. To determine the most-similar match for a given item, the algorithm builds a similar-items table by finding items that customers tend to purchase together. Amazon.com's item-to-item approach computes the cosine between binary vectors representing the purchases in a user-item matrix. Given two vectors of attributes, A and B, the cosine similarity, cos(θ), is represented using a dot product and magnitude as:

$$\cos(\theta) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$

Recommendations are based on the items that are most similar to the query item.

Greg Linden et al., "Amazon.com Recommendations: Item-to-Item Collaborative Filtering", IEEE Internet Computing, 2003, 7.

Since it works for Amazon.com, why not try it to help medicinal chemists select reagents from chemical databases? The aim is to enhance discovery, surfacing reagents from deep in the catalog that our chemists wouldn't find on their own.
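The cosine measure between binary purchase vectors, described in the item-to-item slides above, can be illustrated with a small sketch. It is only an illustration under stated assumptions, not Amazon.com's or the authors' implementation: the function name `similar_items_table`, the toy users and the toy items are all hypothetical, and the all-pairs loop is chosen for readability rather than speed. For binary vectors the dot product is simply the number of shared buyers, and each magnitude is the square root of an item's buyer count.

```python
from collections import defaultdict
from math import sqrt


def similar_items_table(purchases):
    """Build {item: [(other_item, cosine), ...]} from {user: set_of_items}.

    Hypothetical helper for illustration; not taken from the original slides.
    """
    # Invert user -> items into item -> set of users (one binary vector per item).
    buyers = defaultdict(set)
    for user, items in purchases.items():
        for item in items:
            buyers[item].add(user)

    table = {}
    for a in buyers:
        scores = []
        for b in buyers:
            if a == b:
                continue
            shared = len(buyers[a] & buyers[b])  # dot product of binary vectors
            if shared:
                cosine = shared / sqrt(len(buyers[a]) * len(buyers[b]))
                scores.append((b, cosine))
        # Most similar items first; ties broken by item id for determinism.
        table[a] = sorted(scores, key=lambda s: (-s[1], s[0]))
    return table


# Toy data (made up): three users and three items.
purchases = {
    "user_1": {"book_a", "book_b"},
    "user_2": {"book_a", "book_b", "book_c"},
    "user_3": {"book_c"},
}
table = similar_items_table(purchases)
```

In this toy data, `book_a` and `book_b` share both of their buyers, so `book_b` ranks first in `table["book_a"]`, ahead of `book_c`, which shares only one buyer with `book_a`.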
Exploiting the Amazon.com "People Who Bought Also Bought" Algorithm in Reagent Selection

Not only suggesting new reagents, but also solving problems? For example, suggesting possible bioisosteres.

[Reaction scheme: reductive amination. The final product may be genotoxic.]

Design idea to avoid AMES positives. The AMES test is one measure of genetic toxicity. Aromatic amines are often unwanted fragments in drug design (genotoxic). Regulatory view: if carcinogenic in animals, it will be a carcinogen in man.

Strategy

Collect a data set of chemical reagents.
Get check-out information.
Generate a similarity matrix using cosine similarities.
Import the matrix into an Oracle database.
Display recommendations (ISIS/db): query for the items (reagents) that are most similar to the query item (reagent).

Check-out Information and Reagent Data Set

Extract the reagents in the stockroom (CIMS) that were checked out in the last 5 years.
Filter: amount != 0.
Tweak 1: canonical SMILES were generated, counter-ions (salts) were removed and the reagents merged, and unique compound IDs were assigned.
Grouping (tweak 2): assign reagents to 10 functional classes by SMARTS mapping.
[Chart: times checked out, highlighting reagents checked out only once; the reagent counts are not preserved in this transcript.]
Reagents that could be mapped onto the 10 functional classes were kept. 194 unique chemists.

Reagents, Tweak 1: Counter-ions

Ca. 5000 entries include a counter-ion. Different salts should give the same results. For example, the same reagent exists with and without the hydrochloride salt: 3,3,3-trifluoropropylamine and 3,3,3-trifluoropropylamine hydrochloride. The salts are removed, and the data are merged for the vectors.
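Tweak 1 above (remove the salts, then merge the check-out data) can be sketched in a few lines. This is a minimal sketch under loud assumptions rather than the authors' pipeline. It assumes the input SMILES are already canonical, approximates salt stripping by keeping the longest dot-separated SMILES fragment (a cheminformatics toolkit would do this properly), and reads the "amount != 0" filter as "keep rows with a non-zero amount". The function names and the check-out records are hypothetical.

```python
def strip_counter_ion(smiles):
    """Return the longest dot-separated fragment of a SMILES string.

    Naive stand-in for real salt stripping: string length is only a rough
    proxy for which fragment is the parent structure.
    """
    return max(smiles.split("."), key=len)


def merge_checkouts(records):
    """Map parent SMILES -> set of chemists, skipping zero-amount rows.

    `records` is an iterable of (chemist, smiles, amount) tuples; this
    layout is an assumption made for illustration.
    """
    merged = {}
    for chemist, smiles, amount in records:
        if amount == 0:  # assumed reading of the "amount != 0" filter
            continue
        parent = strip_counter_ion(smiles)
        merged.setdefault(parent, set()).add(chemist)
    return merged


# Made-up check-out records for the example reagent from the slide.
records = [
    ("chemist_1", "NCCC(F)(F)F", 5.0),     # 3,3,3-trifluoropropylamine
    ("chemist_2", "NCCC(F)(F)F.Cl", 1.0),  # its hydrochloride salt
    ("chemist_3", "NCCC(F)(F)F.Cl", 0.0),  # zero amount: filtered out
]
merged = merge_checkouts(records)
```

After merging, the free base and the hydrochloride share a single entry, so both chemists contribute to the same item vector.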
Tweak 2: Functional Classes

A search for amines should only recommend other amines.

[Reaction scheme: reductive amination.]

The 10 functional classes (table columns: Class, Reagents, Freq, Functional Groups; the numeric columns are not preserved in this transcript; dual functionalities are counted twice):

primary and secondary amines
acids, acid halides, anhydrides, carbamates, carbonates, esters
aromatic halides
alkyl halides
sulphonyl chlorides
alcohols
aldehydes, ketones
boronic acids, trifluoroborates
isocyanates, isothiocyanates
alpha halide ketones

Similarities

The data are binary: a user either checked out a reagent (1) or did not (0).

[Example user-by-item table: users Anthony Nicholls, Andrew Grant and Morten Langgard as rows; items C001, C002 and C003 as columns; 1 = checked out, 0 = not checked out. The slide works out the cosine between C001 and C003; the cell values are not preserved in this transcript.]

[Histogram: frequency of binned Amazon.com similarities, almost all-against-all.] Roughly 85% of the reagents belong in the zero bin.

Architecture

Oracle and MDL ISIS/Base; not a web-based system. The data form a user-by-item matrix (user rows, item columns). Overnight updates are possible.

Results

What does the frontend look like? Yet another similarity measure? A dream come true? Possible ways forward. Other info revealed.

Frontend, and That Little Bit Extra

[Screenshots: original CIMS and CIMS-Recommend; labels: available amount, location.]

Amazon.com vs Other Similarities

Lingos and 3 fingerprints are calculated (ECFP6, FPFP6, MDL Public Keys). The top-X hits are compared to the top-X Amazon hits.

[Table: Overlap (%) by MaxHits for ECFP6, FPFP6, Lingo and MDL Public Keys; values not preserved. Example hit lists (HitNo, MolName) for the Amazon and FP/Lingos methods; surviving compound IDs: C0001, C0134, C0955, C0251.]

Results show that Amazon recommendations are, more or less, orthogonal to other searching techniques.
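The top-X comparison behind that orthogonality claim amounts to a set overlap between two ranked hit lists. The sketch below shows one plausible way to compute such an overlap percentage. The exact definition used on the slide is not preserved, so the formula (shared hits divided by X), the function name `top_x_overlap` and the reagent IDs are all assumptions made for illustration.

```python
def top_x_overlap(hits_a, hits_b, x):
    """Percentage of the top-x hits shared by two ranked hit lists.

    Assumed definition: |top_x(a) & top_x(b)| / x * 100. Lists shorter
    than x are used as-is, which can only lower the reported overlap.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    top_a, top_b = set(hits_a[:x]), set(hits_b[:x])
    return 100.0 * len(top_a & top_b) / x


# Hypothetical ranked hit lists for one query reagent (IDs are made up).
amazon_hits = ["R01", "R02", "R03", "R04"]
fingerprint_hits = ["R03", "R10", "R11", "R12"]

overlap = top_x_overlap(amazon_hits, fingerprint_hits, x=4)
```

A low percentage across many queries would support the statement that the recommendation list and the structure-based list rarely suggest the same reagents.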
Amazon.com vs Other Similarities

Top 10 structures selected from the Amazon-like selection and the ECFP4 fingerprint method for two queries.

[Structure panels: Amazon Top 10; ECFP4 Top 10.]

Exploiting Recommendation Systems in Reagent Selection

Design idea to avoid AMES positives.

[Reaction scheme: reductive amination.]

Search the database for aniline, and get "Chemists who requested aniline also requested:" [structures; all AMES negatives]. The advantage of such a feature is the inherent knowledge transfer. In the dream scenario such a reagent suggestion could solve an existing problem.

Medicinal Chemistry Poll

"Pre-defined sets?"
"Too diverse recommendations?"
"Already better! Since I get everything in one go."

Most Frequently Checked-Out Reagents

Other information is easily accessible: just ask the right question.

[Tables: top 5 amines and top 5 aldehydes, with columns No. Checked-out and Reagent; structures and counts are not preserved in this transcript.]

Summary

Recommendation systems are useful alternatives to search algorithms since they help users to discover items they might not have found by themselves.
We presented a novel dynamic similarity measure: personalized information was used to produce reagent recommendations, using Amazon.com's item-to-item collaborative filtering technique.
Low threshold for trying: the first prototype was finished within 1-2 weeks (as all infrastructure was in place).
Maintaining: data can readily be updated nightly or weekly.
"In the dream scenario such a [reagent] suggestion could solve an existing problem": not there just yet (too little data; need more info).
Our recommendations are, more or less, orthogonal to other similarity measures.
Positive comments in a small MedChem poll.
In the end, what we want is happy, satisfied customers!

Acknowledgments

Jens Sadowski for presenting!

Abstract

Amazon.com's "People who bought [this book] also bought [these books]" is a popular feature on numerous web sites nowadays.
The use of such a recommender system can be exploited in many areas, also in drug design. In the current work, a system to recommend reagents has been developed, using the item-to-item collaborative filtering technique. The goal is to enhance discovery, surfacing reagents from deep in our corporate reagent database: reagents that medicinal chemists might not have found on their own. Another potential advantage of using personalized information is the inherent knowledge transfer. That is, in a dream scenario a reagent recommendation could solve an existing problem. Moreover, this novel similarity measure differs from other similarity measures, as it is based on user-item information and not on descriptions of molecular structures. It will be shown that the recommendations are, more or less, orthogonal to other methods.