Learning to rank through user preference data

Analysis of aggregated preference data. Ranking items as independent, static elements. Evaluation of the algorithms, the assumptions, and the weighting and shifting of implicit preferences. Ranking items as a function of their attributes.

Category: Programming, computers and cybernetics
Type: diploma thesis
Language: English
Date added: 30.08.2016

· Joachims, as expected, tended to put a big penalty on the first-ranked element.

· When the orderings are judged under different assumptions, the one built from weighted preferences seems to obtain good scores overall under all of them.

So, there does not seem to be a clear winner, and we do not know which kind of ordering customers would prefer. Unfortunately, it was not possible to test these orderings in an online setting, and doing so would probably require a very large number of clicks - definitely more than there were in the data - in order to determine whether one order is significantly better than the others under whatever criteria they are evaluated (number of clicks, number of purchases, revenue, average rank of the clicks, etc.).

Some obvious shortfalls of such methods are that they favor items that have been in the catalog for a longer time, and that they cannot rank new items. One way of tackling the first problem could be to weight preferences according to how long the clicked item has been in the catalog. However, since each preference is between two items, it is not clear whether this would really be a good solution. To tackle the problem of ranking brand new items, if it is possible to establish the similarity between items (from their features, for example, or from their text descriptions if these are representative), a new item could initially be ranked next to the existing item to which it is most similar, and from there on be left to accumulate clicks. The time of introduction of the items was, unfortunately, not available for this work.

Likewise, it might be better to assign higher weights to more recent preferences, but again, the date of each click was not available.

5. Ranking items as a function of their attributes

One caveat with the methods from the previous section is that items often share similar attributes, and it would be reasonable to expect that characteristics such as the brand of a product influence its attractiveness. Thus, the clicks might be somewhat transferable from one product to another to the extent that important characteristics are shared, but such similarity cannot be captured if items are treated as atomic or independent elements. In some situations, customers might also judge the attractiveness of items based solely on their constituent characteristics - for example, it is very reasonable to assume that the attractiveness of different hard disks in a computer hardware store is given by their brand, capacity, price, connection type, etc.

Thus, if the attributes that make up a product are measurable or can be categorized, items could instead be defined by their features, and customers might judge products by an unknown utility function $U$ such that $U(a_X) > U(a_Y)$ whenever product $X$ is preferred to product $Y$, where $a_X$ denotes the set of attributes of product $X$. If this assumption holds, then the ideal ordering would be the one that ranks items by their utility in descending order. For these purposes, the exact values of $U$ are meaningless, and two utility functions $U_1$ and $U_2$ can be considered equivalent as long as $U_1(a_X) > U_1(a_Y) \iff U_2(a_X) > U_2(a_Y)$ for every pair of products $X$ and $Y$.

This approach presents some inherent advantages over the previous one: it allows new products to be ranked immediately, does not favor older products, does not require all items to have a reasonable chance of being seen (thus it can be used for larger catalogs with thousands of items), can identify potentially attractive items that are not normally seen, and suffers less from the bias of the preferences. On the other hand, it requires more data and suffers from approximation inaccuracy and representation bias.

The problem of estimating such utility functions has been studied in Marketing Research in the context of conjoint analysis, although the goal there is not to rank items but to assess the value of each characteristic by making the estimated utility function linear - that is, to find a vector of positive weights $w$ such that the utility of a product $X$ with attribute vector $a_X$ is $U(a_X) = w^\top a_X$. Nevertheless, this simplified approach is also suitable for ranking, and some ranking algorithms for search engines follow a similar approach, with features relating documents to queries instead of descriptive features (Burges, 2010; Hang, 2011). As a side effect of ranking with a linear function, it is also possible to obtain direct estimates of the attractiveness of different attributes, so it can also be used, for example, to know whether one brand is more attractive than another, ceteris paribus, and the coefficients can be manually altered according to marketing goals such as favoring a certain brand over another.
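As an illustration of ranking with a linear utility, the following minimal sketch sorts a toy catalog by the weighted sum of its attributes and checks that rescaling the weights by a positive constant leaves the order unchanged. The item names, attribute values and weights are all hypothetical and are not taken from the data used in this work.

```python
# Hypothetical weights and attribute vectors (illustrative values only).
weights = [2.0, 0.5, -1.0]
items = {
    "A": [1.0, 0.0, 0.3],
    "B": [0.0, 1.0, 0.9],
    "C": [1.0, 1.0, 0.1],
}

def linear_utility(w, x):
    """Utility of an item as the dot product of weights and attributes."""
    return sum(wi * xi for wi, xi in zip(w, x))

def rank_items(w, catalog):
    """Return item names sorted by descending utility."""
    return sorted(catalog, key=lambda name: linear_utility(w, catalog[name]),
                  reverse=True)

ranking = rank_items(weights, items)
# Multiplying all weights by a positive constant changes the utility values
# but not the order they induce.
rescaled_ranking = rank_items([10.0 * wi for wi in weights], items)
```

Only the order induced by the utility matters here, which is why the two rankings coincide.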

One of the most popular approaches to conjoint analysis (Hauser & Rao, 2004) is to administer a questionnaire to different people in which they are asked to choose the best among two or more products (known as choice-based conjoint analysis, in contrast with designs where users rate or rank the products), and one of the fundamental problems is dealing with the differences that arise between individuals (Allenby & Rossi, 1998; Evgeniou, Pontil & Toubia, 2007). The simpler algorithms work by estimating individual utility functions for each person and then averaging the coefficients to obtain an overall feature attractiveness list. The product attributes are categorized, and since the number of possible combinations of different features is usually very large and people get tired after answering a couple of questions, the items to compare in each question are not picked at random, but are rather obtained from an orthogonal design that aims to make as many attribute comparisons as possible with as few questions as possible. More recent approaches in internet surveys use adaptive questionnaires instead, where the products to compare are chosen according to the uncertainty of the coefficients (Abernethy et al., 2008; Lin, 2008).

In contrast to the explicit surveys used in conjoint analysis, when mining implicit pairwise preferences as described in this work there are far larger amounts of data, but the data is biased and a lot noisier, and users do not all provide the same product comparisons. A naïve approach would simply merge all the generated preferences as if they came from the same user, and then estimate an aggregated utility function.

When estimating utility functions, some of the most popular approaches have been based on logistic regression or Hierarchical Bayes models (Ben-Akiva et al., 1997; Evgeniou, Pontil & Toubia, 2007). In the case of logistic regression, the problem is usually modeled as making the probability of choosing the preferred item in each question higher than the probability of choosing the non-preferred item(s). The exact mathematical formulations vary from work to work, and most of the advances have been in dealing with heterogeneity in consumer preferences.

More recently, newer ideas have been tried based on advances in machine learning (Evgeniou, Boussios & Zacharia, 2005; Chapelle & Harchaoui, 2005). In particular, Evgeniou, Boussios & Zacharia (2005) proposed a different mathematical way of viewing the problem, using a single utility function for all the respondents (as is desired here), with inequalities instead of logistic-based probabilities, and introducing errors as slack variables. The intuitive goal is to find a weight vector $w$ such that $w^\top a_X \geq w^\top a_Y$ whenever $X$ is preferred to $Y$ (with $a_X$ the attribute vector of product $X$), while minimizing the slack variables (the error terms) and adding a penalization on the norm of $w$ to avoid overfitting.
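An optimization problem of the kind described above can be written roughly as follows. This is a sketch reconstructed from the prose, not a verbatim copy of the formulation in Evgeniou, Boussios & Zacharia (2005): the symbols $a_i^{+}$ and $a_i^{-}$ (attribute vectors of the preferred and non-preferred item in the $i$-th comparison), the slack variables $\xi_i$, and the unit margin are notational assumptions.

$$
\min_{w,\,\xi} \; \sum_{i=1}^{n} \xi_i + \lambda \lVert w \rVert^2
\quad \text{subject to} \quad
w^\top a_i^{+} \geq w^\top a_i^{-} + 1 - \xi_i, \qquad \xi_i \geq 0, \qquad i = 1, \dots, n .
$$

Each comparison that is ranked correctly with enough margin has $\xi_i = 0$, so the sum of slacks measures how badly the preferences are violated, while $\lambda$ controls the trade-off with the size of the weights.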

This can be solved efficiently in polynomial time using interior-point methods and has a unique solution, while the optimal lambda parameter can be found by grid search with cross-validation. Intuitively, if using it for ranking, it would be desirable to put a different penalization on the weights, so that some could be exactly zero and the method could perform variable selection (Tibshirani, 1996), but this makes the optimization problem a lot harder and requires different optimization methods that are not efficient (Ye, Chen & Xie, 2011). Additionally, in order to make sure that all attributes have positive weights, as is often desired in conjoint analysis, virtual comparisons are added in which a product containing one attribute and nothing else is preferred to a product containing no attributes, rather than adding these constraints directly to the optimization problem.

Evgeniou, Boussios & Zacharia (2005) report the results obtained from this algorithm to be superior to those of logistic regression in different experiments as measured by proportion of correctly classified pairs (one being preferred to the other), and report kernel-based models to overfit the data in their experiments.

This problem turns out to be equivalent to that of building a one-class SVM having as features the difference of the features between the preferred and non-preferred items (Hang, 2011). This work will follow that approach with some minimal differences. First, instead of the typical C-SVM (named so because of its hyperparameter $C$, which is inversely proportional to the penalty $\lambda$), whose hyperparameter can lie anywhere in the unbounded range $(0, \infty)$, nu-SVM (Schölkopf et al., 1999) will be used, whose hyperparameter $\nu$ is constrained to lie in $(0, 1]$ and is thus easier to tune. Second, the pairwise preferences will be weighted according to their value in the preferences table as in the previous section - the result is equivalent to adding one observation for each repeated pairwise preference in the table. Since the goal is to rank items, no positive-attribute constraints will be added, but negativity constraints will be added for price by introducing virtual comparisons in the training data.
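The idea of learning weights from feature differences, with weighted preferences and a virtual comparison for price, can be sketched as below. This is not the nu-SVM from libsvm used in this work: as a stand-in, it runs plain subgradient descent on a weighted hinge loss with a squared-norm penalty. The function names, the three columns, the data and the hyperparameter values are all hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_utility(pairs, n_features, lam=0.01, lr=0.1, epochs=200):
    """Fit weights w by subgradient descent on a weighted hinge loss.

    pairs: list of (preferred_attrs, other_attrs, weight) tuples.
    """
    w = [0.0] * n_features
    total_weight = sum(weight for _, _, weight in pairs)
    for _ in range(epochs):
        # Gradient of the penalty term lam * ||w||^2.
        grad = [2.0 * lam * wi for wi in w]
        for preferred, other, weight in pairs:
            # Features of one training example: preferred minus non-preferred.
            diff = [p - o for p, o in zip(preferred, other)]
            # A pair contributes only while its margin is below 1.
            if dot(w, diff) < 1.0:
                for j, dj in enumerate(diff):
                    grad[j] -= weight * dj / total_weight
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

# Hypothetical columns: [brand_a, brand_b, price].
pairs = [
    # A brand_a item was preferred to a brand_b item 3 times.
    ([1.0, 0.0, 0.2], [0.0, 1.0, 0.1], 3.0),
    # Virtual comparison: "no price" is preferred to "price only",
    # weighted higher to push the price coefficient below zero.
    ([0.0, 0.0, 0.0], [0.0, 0.0, 1.0], 5.0),
]
w = train_linear_utility(pairs, n_features=3)
```

Weighting a pair by its count has the same effect on the loss as repeating that pair as many times in the training data.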

In theory, it would be possible to use the kernel trick to obtain non-linear utility functions in an efficient manner, but one big problem is that ranking with them would not be as straightforward: the model would only be able to tell whether one item is more attractive than another, and these relationships are not guaranteed to be transitive, so we would be back to the same problem as before.

If a non-linear function without kernels is desired, then it is also possible to perform a polynomial expansion of the columns, but this increases their number very quickly - for example, if there were 100 columns, a degree-2 polynomial expansion would give 5,150 columns, degree-3 would give 176,850, and so on. The current implementation of one-class SVM in libsvm is reported to have a running time between $O(n^2)$ and $O(n^3)$ in the number of training observations, so this approach will not be explored here due to the required run time.
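The growth in the number of columns can be checked with a short calculation. The sketch below counts the monomials of total degree 1 up to the chosen degree, excluding the constant term; the function name is hypothetical.

```python
from math import comb

def n_poly_columns(n_columns, degree):
    """Columns after a full polynomial expansion, without the constant term."""
    # There are C(n + d, d) monomials of total degree <= d in n variables,
    # one of which is the constant.
    return comb(n_columns + degree, degree) - 1

degree2 = n_poly_columns(100, 2)
degree3 = n_poly_columns(100, 3)
```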

Experiment design

In order to test the algorithm, pairwise preferences were obtained as described in section 2 (for simplicity, under Agichtein's assumption only) from a different section of the same webpage, this time containing women's jackets. This section contains 119 items, and there were 296 browsing sessions with at least 1 click, totaling 662 clicks. The click distribution is depicted below, along with a smoothing line fit using the least-squares method:

This click-rank distribution indicates that many users browse deep down the catalog, so all elements have a reasonable but unequal chance of being seen. The number of items is low enough that it is still possible to run the previous algorithms, while also being high enough to reasonably expect to rank items by an approximated utility function. Unlike the children's coloring books from the previous section, the products here can be reasonably described by attributes such as jacket type, brand, price, having a zipper vs. buttons, having or lacking a cap, material, etc.

These features were obtained by directly crawling the retailer's webpage, which contained semi-structured data with the product characteristics, and underwent the following processing. Some equivalent features were merged, and levels of similar attributes were also merged, such as “blue” and “dark-blue”, to the extent that their names could be matched in the Russian language. Non-visible characteristics such as the country of origin were eliminated. For categorical attributes that can take multiple values, the values were divided by the number of values present - so a jacket whose top half is blue and whose bottom half is red would have 0.5 in the columns corresponding to the red and blue colors. The same processing was applied to material, but using the proportions specified in the product information - so a product that is 60% cotton and 40% polyester would have a value of 0.6 in the cotton column and 0.4 in the polyester column.

Numerical attributes such as length were not categorized (since the goal is not to find optimal levels), but were rather mean centered, scaled, and then divided by some arbitrary values to diminish their weight relative to other attributes. Categorical values were coded as one column per category rather than with typical dummy coding (this does not pose any mathematical problems), and missing values - which were very common in all products - were treated as not having any value in the columns corresponding to the missing attribute.

Price was a problem, since in some cases discounts were introduced or removed, or prices changed permanently, between the time the clicks happened and the time the data was crawled from the page. In some cases, products went out of stock and their price was no longer shown. To handle this, missing prices were imputed by weighted kNN with k=5, and the variable was also mean centered and scaled.
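The coding of categorical attributes described above can be sketched as follows: one column per category, fractional values for multi-valued attributes, stated proportions for material, and zeros for missing attributes. The dictionary layout, column names and the example jacket are hypothetical.

```python
def encode_item(item, columns):
    """Encode one item as a list of floats, one entry per column."""
    row = dict.fromkeys(columns, 0.0)
    # Multi-valued categorical attribute: split the weight equally.
    colors = item.get("colors", [])
    for color in colors:
        row["color=" + color] += 1.0 / len(colors)
    # Material: use the proportions stated in the product information.
    for material, share in item.get("material", {}).items():
        row["material=" + material] += share
    # Single-valued categorical attribute: one column per category.
    if "brand" in item:
        row["brand=" + item["brand"]] = 1.0
    # Attributes missing from the item simply keep zeros in their columns.
    return [row[col] for col in columns]

columns = ["color=blue", "color=red", "color=green",
           "material=cotton", "material=polyester", "brand=x"]
jacket = {
    "colors": ["blue", "red"],
    "material": {"cotton": 0.6, "polyester": 0.4},
    # The brand is missing for this item.
}
encoded = encode_item(jacket, columns)
```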

After all this processing, there were a total of 26 attributes, with a mixture of numerical, categorical and binary values, and after setting one column per categorical level, there were 115 columns. Among the possible pairs of items, there were 5,354 non-zero entries in the preferences table. The accuracy of the algorithm was estimated by 3-fold cross-validation: these 5,354 pairs were divided into 3 folds at random, the weights were trained on two of them (in some cases, some attributes were entirely missing from the training data), and the accuracy was evaluated by applying the utility function to the pairs in the hold-out fold, which was not possible in the previous section because the items were independent entities. This accuracy was then weighted by the magnitude of the preferences.
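The weighted accuracy on held-out pairs can be sketched as below: a pair counts as correct when the preferred item receives the higher utility, and each pair counts in proportion to its preference weight. The function name, weights and pairs are hypothetical, and ties are counted as errors by assumption.

```python
def weighted_pairwise_accuracy(w, pairs):
    """Share of preference weight on pairs that the utility orders correctly.

    pairs: list of (preferred_attrs, other_attrs, weight) tuples.
    """
    correct = 0.0
    total = 0.0
    for preferred, other, weight in pairs:
        u_preferred = sum(wi * xi for wi, xi in zip(w, preferred))
        u_other = sum(wi * xi for wi, xi in zip(w, other))
        total += weight
        if u_preferred > u_other:  # ties count as errors (assumption)
            correct += weight
    return correct / total

w = [1.0, -1.0]
held_out = [
    ([1.0, 0.0], [0.0, 1.0], 3.0),  # ordered correctly by w
    ([0.0, 1.0], [1.0, 0.0], 1.0),  # ordered incorrectly by w
]
accuracy = weighted_pairwise_accuracy(w, held_out)
```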

Then, the obtained ranking was also compared in terms of score (as defined in section 4) and correlations with the items sorted by clicks and optimized with Metropolis-Hastings swapping. Unlike the previous case, Kwik-Sort turned out to reach a far lower score, so only Metropolis-Hastings will be illustrated. Scores for random permutations will also be provided.

Different models were tried, using the full feature set and using a reduced feature set which excludes some features that intuitively seemed unimportant such as jacket length, reducing the data to 9 attributes, all of them categorical or binary, comprising 79 columns, and purposefully excluding price. The hyperparameters were all obtained by cross-validated accuracy, which was also used to decide which model was the best.

Unlike the previous case, score might not be the best metric to judge, due to the large number of items - not all of them are seen in all browsing sessions, although the presentation bias, as depicted in the previous plot, is small.

The coefficient for price was 3.14. Even after adding virtual comparisons to make price negative and weighting them higher than any other product comparison, price still came out positive, although its magnitude was low compared to other attributes.

These coefficients are indicative of the overall liking of such attributes ceteris paribus (e.g. the color orange seems to be more attractive overall than the color pink, leaving everything else constant). Since the utility values are scale-less abstract numbers, attributes with negative coefficients should not be thought of as being detrimental to items' attractiveness, only as less attractive. Likewise, near-zero coefficients should not be interpreted as being neutral.

Unfortunately, it was not possible to test the results in an online setting.

Taking it further

One advantage of using feature-based inequalities is that they also allow virtual comparisons between non-products, which offers the possibility of introducing more information. For example, whenever a user applies a filter, a virtual example could be added such that the selected attributes are preferred to an empty-attribute product, or if they sort the list by date or price, a virtual example making this attribute positive or negative could be added. This information was, unfortunately, not collected during the time of this experiment.
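A minimal sketch of how such virtual examples could be built is shown below. The function names, the column names and the reading of an ascending sort as a preference for lower values are assumptions made for illustration; no such events were available in the data.

```python
def virtual_pair_from_filter(selected_columns, columns, weight=1.0):
    """Selected attributes are preferred to an empty-attribute product."""
    selected = set(selected_columns)
    preferred = [1.0 if col in selected else 0.0 for col in columns]
    empty = [0.0] * len(columns)
    return preferred, empty, weight

def virtual_pair_from_sort(column, columns, ascending, weight=1.0):
    """Sorting by a column hints at the sign of its coefficient.

    Assumption: an ascending sort (e.g. price from low to high) means that
    lower values are preferred, so the empty product beats the unit product.
    """
    unit = [1.0 if col == column else 0.0 for col in columns]
    empty = [0.0] * len(columns)
    if ascending:
        return empty, unit, weight
    return unit, empty, weight

columns = ["color=blue", "brand=x", "price"]
filter_pair = virtual_pair_from_filter(["color=blue"], columns)
sort_pair = virtual_pair_from_sort("price", columns, ascending=True)
```

The resulting tuples have the same (preferred, non-preferred, weight) shape as ordinary comparisons, so they could be appended to the training pairs directly.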

There could also be personalized rankings. The preferences from all people in both this and the previous section were all mixed together to obtain a global table of preferences, but it is intuitive that different people have different tastes. If it is possible to identify users - for example, by their IP or user account - and if there is information available about the users, such as past purchases and demographic data, they could be clustered and different orderings shown to people in different clusters, to account for the differences in taste. Such information was not available for this work.

Another important characteristic is that the preferences in some cases change with seasons - in this case, the data was collected during the winter season, and the column for “winter season” had a large positive weight (10.95), whereas the “summer season” had a large negative weight (-9.35). Maybe the preference for other attributes such as the color would also change with the season. As such, it might be convenient to have different coefficients for each season, or to weight the preferences according to the current season and the season when the clicks happened.

In the context of recommender systems, it has been reported that users also value diversity in the list of recommended products, and recommendations that score lower in offline metrics but are diverse end up faring better in online settings (Ziegler et al., 2005). The same effect has been reported in search engines (Radlinski, Kleinberg & Joachims, 2008). As descriptive product features are available here, these same methods intended to diversify recommendation lists could be applied - it would be really undesirable to have all the first 20 products be of the same color, for example, and doing this would avoid such potential situations. Since it was not possible to try the results online, this approach was not explored here.

Finally, explicit data on preferences could also be mixed in if it is possible to collect it, perhaps weighting it higher.

6. Summary and conclusions

This work explored ideas for ordering items in electronic catalogs based on aggregated user preferences. The relationship between user preferences and click behavior has been thoroughly studied in the context of search engines with the aim of deducing preferences from clicks to produce better rankings. Different hypotheses of how user preferences drive clicks have been studied, aided with explicit judgments and eye tracking equipment, and some have been found to be very accurate. Simple rules for deducing preferences between two items have been proposed, and these implicit pairwise preferences have been found to agree with explicitly-stated preferences, but they are noisy and biased. Nevertheless, they are easy and free to collect, compared with explicit preferences, and can also be collected in electronic catalogs.

One common way of arranging items in a catalog is sorting them by clicks or purchases, which seems to intuitively favor attractive items. Some of the shortfalls of this naïve approach were examined here, and it was proposed to use pairwise preferences deduced from click behavior to come up with better orderings that can avoid such problems, by putting more preferred products at the top. Using only incomplete pairwise preferences for ranking presents its own problems and properties, but it is nevertheless possible to generate a complete ranking from them, and to judge given rankings by how well they respect these preferences.

It was proposed to view the catalog items either as independent, indivisible elements or as a collection of their attributes, and it was described in which situations such approaches would be more or less representative.

New algorithms were proposed for the case when items are treated as independent elements, based on iterative optimization rather than on graph techniques, which are the dominant approach in the literature, and these were compared to existing algorithms on preferences deduced from clicks in a catalog section containing a few dozen items. One of them - Metropolis-Hastings swapping - was found to be at least as good as the existing algorithms in terms of building an order that satisfies as many preferences as possible while violating as few as possible.

The algorithm was run under different hypotheses for deducing pairwise preferences, and the results of following each hypothesis were compared and were found to be similar. Problems of ranking items under these assumptions were described, and some ideas taking time into consideration were also proposed, but time data was not available to experiment with.

A different approach treating items as a collection of their intrinsic attributes was described, and the problem was found to be similar to that of choice-based conjoint analysis. The most appropriate algorithm was identified - namely, the equivalent of a one-class support vector machine with features being the difference of the features of preferred and non-preferred items - and was run with semi-structured data crawled from another section of the catalog's webpage plus the implicit pairwise preferences. This catalog section contained over a hundred items and 600+ clicks, so it was possible to examine it under both views of the items. The product data was incomplete and presented some problems, but the results were still satisfactory. A simple linear function turned out to be accurate, convenient and useful for other purposes too. Advantages and disadvantages of following this approach were identified, and some improvements were proposed, but they could not be tried due to a lack of data.

Unfortunately, it was not possible to run A/B tests of the results from the proposed methods vs. sorting by clicks, so at this point their advantages are only based on their theoretical ability to represent user preferences.

References

1. Abernethy, J., Evgeniou, T., Toubia, O., & Vert, J. P. (2008). Eliciting consumer preferences using robust adaptive choice questionnaires. Knowledge and Data Engineering, IEEE Transactions on, 20(2), 145-155.

2. Ackerman, B., & Chen, Y. (2011). Evaluating rank accuracy based on incomplete pairwise preferences. In Proceedings of the 2nd International Workshop on User-centric Evaluation of Recommender Systems and their Interfaces.

3. Agichtein, E., Brill, E., Dumais, S., & Ragno, R. (2006, August). Learning user interaction models for predicting web search result preferences. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 3-10). ACM.

4. Ailon, N., Charikar, M., & Newman, A. (2008). Aggregating inconsistent information: ranking and clustering. Journal of the ACM (JACM), 55(5), 23.

5. Allenby, G. M., & Rossi, P. E. (1998). Marketing models of consumer heterogeneity. Journal of Econometrics, 89(1), 57-78.

6. Ben-Akiva, M., McFadden, D., Abe, M., Böckenholt, U., Bolduc, D., Gopinath, D., ... & Steinberg, D. (1997). Modeling methods for discrete choice analysis. Marketing Letters, 8(3), 273-286.

7. Burges, C. J. (2010). From ranknet to lambdarank to lambdamart: An overview. Learning, 11, 23-581.

8. Chapelle, O., & Harchaoui, Z. (2005). A machine learning approach to conjoint analysis. Advances in neural information processing systems, 17, 257-264.

9. Craswell, N., Zoeter, O., Taylor, M., & Ramsey, B. (2008, February). An experimental comparison of click position-bias models. In Proceedings of the 2008 International Conference on Web Search and Data Mining (pp. 87-94). ACM.

10. Dou, Z., Song, R., Yuan, X., & Wen, J. R. (2008, October). Are click-through data adequate for learning web search rankings?. In Proceedings of the 17th ACM conference on Information and knowledge management (pp. 73-82). ACM.

11. Evgeniou, T., Boussios, C., & Zacharia, G. (2005). Generalized robust conjoint estimation. Marketing Science, 24(3), 415-429.

12. Evgeniou, T., Pontil, M., & Toubia, O. (2007). A convex optimization approach to modeling consumer heterogeneity in conjoint estimation. Marketing Science, 26(6), 805-818.

13. Fox, S., Karnawat, K., Mydland, M., Dumais, S., & White, T. (2005). Evaluating implicit measures to improve web search. ACM Transactions on Information Systems (TOIS), 23(2), 147-168.

14. Freund, Y., Iyer, R., Schapire, R. E., & Singer, Y. (2003). An efficient boosting algorithm for combining preferences. The Journal of machine learning research, 4, 933-969.

15. Geanakoplos, J. (2005). Three brief proofs of Arrow's impossibility theorem. Economic Theory, 26(1), 211-215.

16. Hang, L. I. (2011). A short introduction to learning to rank. IEICE TRANSACTIONS on Information and Systems, 94(10), 1854-1862.

17. Hardie, B. G., Johnson, E. J., & Fader, P. S. (1993). Modeling loss aversion and reference dependence effects on brand choice. Marketing science, 12(4), 378-394.

18. Hauser, J. R., & Rao, V. R. (2004). Conjoint analysis, related modeling, and applications. In Marketing Research and Modeling: Progress and Prospects (pp. 141-168). Springer US.

19. Holland, S., Ester, M., & Kießling, W. (2003). Preference mining: A novel approach on mining user preferences for personalized applications (pp. 204-216). Springer Berlin Heidelberg.

20. Hu, Y., Koren, Y., & Volinsky, C. (2008, December). Collaborative filtering for implicit feedback datasets. In Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on (pp. 263-272). Ieee.

21. Joachims, T. (2002, July). Optimizing search engines using clickthrough data. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 133-142). ACM.

22. Joachims, T., Granka, L., Pan, B., Hembrooke, H., Radlinski, F., & Gay, G. (2007). Evaluating the accuracy of implicit feedback from clicks and query reformulations in web search. ACM Transactions on Information Systems (TOIS), 25(2), 7.

23. Jung, S. Y., Hong, J. H., & Kim, T. S. (2005). A statistical model for user preference. Knowledge and Data Engineering, IEEE Transactions on, 17(6), 834-843.

24. Konstan, J. A., Miller, B. N., Maltz, D., Herlocker, J. L., Gordon, L. R., & Riedl, J. (1997). GroupLens: applying collaborative filtering to Usenet news. Communications of the ACM, 40(3), 77-87.

25. Lee, T. Q., Park, Y., & Park, Y. T. (2008). A time-based approach to effective recommender systems using implicit feedback. Expert systems with applications, 34(4), 3055-3062.

26. Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). Mining of massive datasets. Cambridge University Press.

27. Lin, J. J. (2008). An Optimal Design Search with Conjoint Analysis Using Genetic Algorithm. Tamkang Journal of Science and Engineering, 11(1), 73-84.

28. Nag, B., & Solutions, S. D. (2008). Vibes: A Platform-Centric Approach to Building Recommender Systems. IEEE Data Eng. Bull., 31(2), 23-31.

29. Oard, D. W., & Kim, J. (2001). Modeling information content using observable behavior.

30. Park, S. T., & Chu, W. (2009, October). Pairwise preference regression for cold-start recommendation. In Proceedings of the third ACM conference on Recommender systems (pp. 21-28). ACM.

31. Parra, D., & Amatriain, X. (2011, July). Walk the talk: analyzing the relation between implicit and explicit feedback for preference elicitation. In Proceedings of the 19th international conference on User modeling, adaption, and personalization (pp. 255-268). Springer-Verlag.

32. Parra, D., Karatzoglou, A., Amatriain, X., & Yavuz, I. (2011). Implicit feedback recommendation via implicit-to-explicit ordinal logistic regression mapping. Proceedings of the CARS-2011.

33. Radlinski, F., & Joachims, T. (2006, May). Minimally invasive randomization for collecting unbiased preferences from clickthrough logs. In Proceedings of the National Conference on Artificial Intelligence (Vol. 21, No. 2, p. 1406). Menlo Park, CA; Cambridge, MA; London; AAAI Press; MIT Press; 1999.

34. Radlinski, F., Kleinberg, R., & Joachims, T. (2008, July). Learning diverse rankings with multi-armed bandits. In Proceedings of the 25th international conference on Machine learning (pp. 784-791). ACM.

35. Rendle, S., Freudenthaler, C., Gantner, Z., & Schmidt-Thieme, L. (2009, June). BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence (pp. 452-461). AUAI Press.

36. Salimans, T., Paquet, U., & Graepel, T. (2012, September). Collaborative learning of preference rankings. In Proceedings of the sixth ACM conference on Recommender systems (pp. 261-264). ACM.

37. Cohen, W. W., Schapire, R. E., & Singer, Y. (1998). Learning to order things. Advances in Neural Information Processing Systems, 10, 451.

38. Schölkopf, B., Williamson, R. C., Smola, A. J., Shawe-Taylor, J., & Platt, J. C. (1999, December). Support Vector Method for Novelty Detection. In NIPS (Vol. 12, pp. 582-588).

39. Schulze, M. (2011). A new monotonic, clone-independent, reversal symmetric, and condorcet-consistent single-winner election method. Social Choice and Welfare, 36(2), 267-303.

40. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 267-288.

41. Ye, G. B., Chen, Y., & Xie, X. (2011). Efficient variable selection in support vector machines via the alternating direction method of multipliers. In International Conference on Artificial Intelligence and Statistics (pp. 832-840).

42. Yilmaz, E., Aslam, J. A., & Robertson, S. (2008, July). A new rank correlation coefficient for information retrieval. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval (pp. 587-594). ACM.

43. Young, H. P., & Levenglick, A. (1978). A consistent extension of Condorcet's election principle. SIAM Journal on applied Mathematics, 35(2), 285-300.

44. Ziegler, C. N., McNee, S. M., Konstan, J. A., & Lausen, G. (2005, May). Improving recommendation lists through topic diversification. In Proceedings of the 14th international conference on World Wide Web (pp. 22-32). ACM.


    реферат [373,9 K], добавлен 10.09.2014

  • Значение атрибута TITLE тега HTML-документа. Возможности HTML для разработчиков Web-страниц. Параметры тега , регулирующие отступы вокруг изображения. Оформление комментариев в CSS. Теги логического форматирования текста (phrase elements).

    тест [19,9 K], добавлен 11.10.2012

  • Особенности работы с графическими изображениями Java Script. Способы динамического управления слоями. Рассмотрение примеров использования операторов цикла. Характеристика свойств объекта form: encoding, elements, checkbox. Возможности документов HTML.

    курсовая работа [167,7 K], добавлен 09.02.2013

  • Роль информации в мире. Теоретические основы анализа Big Data. Задачи, решаемые методами Data Mining. Выбор способа кластеризации и деления объектов на группы. Выявление однородных по местоположению точек. Построение магического квадранта провайдеров.

    дипломная работа [2,5 M], добавлен 01.07.2017

  • Определение программы управления корпоративными данными, ее цели и предпосылки внедрения. Обеспечение качества данных. Использование аналитических инструментов на базе технологий Big Data и Smart Data. Фреймворк управления корпоративными данными.

    курсовая работа [913,0 K], добавлен 24.08.2017

  • Анализ проблем, возникающих при применении методов и алгоритмов кластеризации. Основные алгоритмы разбиения на кластеры. Программа RapidMiner как среда для машинного обучения и анализа данных. Оценка качества кластеризации с помощью методов Data Mining.

    курсовая работа [3,9 M], добавлен 22.10.2012

  • Методика и основные этапы построения модели бизнес-процессов верхнего уровня исследуемого предприятия, его организационной структуры, классификатора. Разработка модели бизнес-процесса в IDEF0 и в нотации процедуры, применением Erwin Data Modeler.

    курсовая работа [1,6 M], добавлен 01.12.2013

  • Изучение возможностей AllFusion ERwin Data Modeler и проектирование реляционной базы данных (БД) "Санатория" на основе методологии IDEF1x. Определение предметной области, основных сущностей базы, их первичных ключей и атрибутов и связи между ними.

    лабораторная работа [197,5 K], добавлен 10.11.2009

  • Перспективные направления анализа данных: анализ текстовой информации, интеллектуальный анализ данных. Анализ структурированной информации, хранящейся в базах данных. Процесс анализа текстовых документов. Особенности предварительной обработки данных.

    реферат [443,2 K], добавлен 13.02.2014

  • Характеристика та класифікація CASE-засобів, технологія їх впровадження. Структура і функції CASE-засобу Silverrun. Переваги, результати застосування та ключові функції CA ERwin Data Modeler. Проектування роботи інтернет-магазину за допомогою UML-діаграм.

    курсовая работа [1,5 M], добавлен 07.02.2016

  • Общее понятие о системе Earth Resources Data Analysis System. Расчет матрицы преобразования космоснимка оврага. Инструменты геометрической коррекции, трансформирование. Создание векторных слоев. Оцифрованные классы объектов. Процесс подключения скрипта.

    курсовая работа [4,3 M], добавлен 17.12.2013

  • Управление электронным обучением. Технологии электронного обучения e-Learning. Программное обеспечение для создания e-Learning решений. Компоненты LMS на примере IBM Lotus Learning Management System и Moodle. Разработка учебных курсов в системе Moodle.

    курсовая работа [146,6 K], добавлен 11.06.2009

Работы в архивах красиво оформлены согласно требованиям ВУЗов и содержат рисунки, диаграммы, формулы и т.д.
PPT, PPTX и PDF-файлы представлены только в архивах.
Рекомендуем скачать работу.