Text Mining the Slashdot Data - Social Media Analysis

Social Media Analysis — Text Mining Meets Network Mining

4.3 Text Mining the Slashdot Data

As future work on the fusion approaches, we intend to (1) use ensembles of heterogeneous classifiers for decision fusion; and, for data fusion, (2) use other rank aggregation measures and (3) investigate different feature selection rates and their impact on the selection of the final feature subset.
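To make items (2) and (3) concrete, the following is a minimal sketch of one common rank aggregation measure, the Borda count, followed by a selection-rate cutoff. It is an illustration only: the function names `borda_aggregate` and `select_top`, the feature labels, and the 50% rate are hypothetical and are not taken from the work described here.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Aggregate several feature rankings (best feature first) with a Borda count."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            # The top-ranked feature earns n - 1 points, the last one earns 0.
            scores[feature] += n - 1 - position
    # Sort by descending score; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda f: (-scores[f], f))


def select_top(ranking, rate):
    """Keep the top fraction `rate` of a ranking (always at least one feature)."""
    k = max(1, round(rate * len(ranking)))
    return ranking[:k]


# Hypothetical rankings produced by three different feature selection methods.
rankings = [
    ["f1", "f2", "f3", "f4"],
    ["f2", "f1", "f4", "f3"],
    ["f1", "f3", "f2", "f4"],
]
aggregated = borda_aggregate(rankings)
subset = select_top(aggregated, rate=0.5)
print(aggregated, subset)
```

Changing `rate` changes how many of the aggregated features survive, which is the kind of effect item (3) proposes to study.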

For PF-DFS, we intend to: (1) investigate the use of other evaluation measures on the Pareto front; (2) analyze other configurations of the validation strategy that allow greater coverage of the instances used in the validation phase; and (3) evaluate the stability of feature selection (CATENI; COLLA, 2016) in the context of dynamic feature selection, thereby improving the quality of the results produced by PF-DFS.
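As a reminder of what a Pareto front is, the sketch below filters a set of candidates down to the non-dominated ones when every objective is to be maximized. It is not the PF-DFS procedure itself: the candidate names and the two objectives (read here as a hypothetical accuracy and a hypothetical fraction of features removed) are assumptions chosen only for illustration.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` on every objective and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, score in candidates.items()
        if not any(
            dominates(other, score)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]


# Hypothetical candidates: (accuracy, fraction of features removed), both maximized.
candidates = {
    "subset_a": (0.90, 0.40),
    "subset_b": (0.85, 0.70),
    "subset_c": (0.80, 0.60),
    "subset_d": (0.70, 0.90),
}
front = pareto_front(candidates)
print(front)
```

Here `subset_c` is dropped because `subset_b` is at least as good on both objectives. Swapping in other evaluation measures, as item (1) proposes, would only change the tuples being compared.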

References

ARAÚJO, D. et al. A combination method for reducing dimensionality in large datasets. In: VILLA, A. E.; MASULLI, P.; RIVERO, A. J. P. (Ed.). Artificial Neural Networks and Machine Learning – ICANN 2016. Cham: Springer International Publishing, 2016. p. 388–397. ISBN 978-3-319-44781-0.

BACKER, E.; JAIN, A. K. A clustering performance measure based on fuzzy set decomposition. IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE, n. 1, p. 66–75, 1981.

BATTITI, R. Using mutual information for selecting features in supervised neural net learning. Trans. Neur. Netw., IEEE Press, Piscataway, NJ, USA, v. 5, n. 4, p. 537–550, Jul. 1994. ISSN 1045-9227. Available at: <http://dx.doi.org/10.1109/72.298224>.

BIHL, T. J.; TEMPLE, M. A.; BAUER, K. W. Feature selection fusion (fsf) for aggregating relevance ranking information with application to zigbee radio frequency device identification. In: 2016 IEEE National Aerospace and Electronics Conference (NAECON) and Ohio Innovation Summit (OIS). [S.l.: s.n.], 2016. p. 80–87. ISSN 2379-2027.

BROWN, G. A new perspective for information theoretic feature selection. In: DYK, D. V.; WELLING, M. (Ed.). Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS-09). Journal of Machine Learning Research - Proceedings Track, 2009. v. 5, p. 49–56. Available at: <http://jmlr.csail.mit.edu/proceedings/papers/v5/brown09a/brown09a.pdf>.

BROWN, G. et al. Conditional likelihood maximisation: A unifying framework for information theoretic feature selection. J. Mach. Learn. Res., JMLR.org, v. 13, p. 27–66, Jan. 2012. ISSN 1532-4435. Available at: <http://dl.acm.org/citation.cfm?id=2188385.2188387>.

CATENI, S.; COLLA, V. Improving the stability of wrapper variable selection applied to binary classification. v. 8, p. 214–225, Mar. 2016.

CHANDRASHEKAR, G.; SAHIN, F. A survey on feature selection methods. Computers & Electrical Engineering, Elsevier, v. 40, n. 1, p. 16–28, 2014.

CHEN, F.-L.; LI, F.-C. Combination of feature selection approaches with svm in credit scoring. Expert systems with applications, Elsevier, v. 37, n. 7, p. 4902–4909, 2010.

DANTAS, C. A.; NUNES, R. d. O.; XAVIER, A. M. P. C. e. J. C. Investigating the impact of similarity metrics in an unsupervised-based feature selection method. In: 2017 Brazilian Conference on Intelligent Systems (BRACIS). [S.l.: s.n.], 2017. p. 55–60.

DEB, K. Multi-objective optimization. In: Search methodologies. [S.l.]: Springer, 2014. p. 403–449.

DHEERU, D.; TANISKIDOU, E. K. UCI Machine Learning Repository. 2017. Available at: <http://archive.ics.uci.edu/ml>.

DIETTERICH, T. G. Approximate statistical tests for comparing supervised classification learning algorithms. Neural computation, MIT Press, v. 10, n. 7, p. 1895–1923, 1998.

EVERITT, B.; LANDAU, S.; LEESE, M. Cluster Analysis. 4th. ed. [S.l.]: Arnold, 2001. ISBN 1-58488-369-3.

FACELI, K.; LORENA, A.; CARVALHO, A. Inteligência Artificial: Uma Abordagem de Aprendizado de Máquina. [S.l.]: Grupo Gen - LTC, 2000. ISBN 9788521620150.

FLEURET, F. Fast binary feature selection with conditional mutual information. J. Mach. Learn. Res., JMLR.org, v. 5, p. 1531–1555, Dec. 2004. ISSN 1532-4435. Available at: <http://dl.acm.org/citation.cfm?id=1005332.1044711>.

FONSECA, C. M.; FLEMING, P. J. An overview of evolutionary algorithms in multiobjective optimization. Evolutionary computation, MIT Press, v. 3, n. 1, p. 1–16, 1995.

FRIEDMAN, M. The Use of Ranks to Avoid the Assumption of Normality Implicit in the Analysis of Variance. Journal of the American Statistical Association, American Statistical Association, v. 32, n. 200, p. 675–701, Dec. 1937. ISSN 01621459.

GIBBONS, J.; CHAKRABORTI, S. Nonparametric Statistical Inference. [S.l.]: Marcel Dekker Incorporated, 2003. (Statistics, textbooks and monographs). ISBN 9780824755225.

GORDON, G. J. et al. Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Res, v. 62, p. 4963–4967, 2002.

GUYON, I.; ELISSEEFF, A. An introduction to variable and feature selection. J. Mach. Learn. Res., JMLR.org, v. 3, p. 1157–1182, Mar. 2003. ISSN 1532-4435.

HANSEN, P.; JAUMARD, B. Cluster analysis and mathematical programming. Mathematical programming, Springer, v. 79, n. 1-3, p. 191–215, 1997.

HIRA, Z. M.; GILLIES, D. F. A review of feature selection and feature extraction methods applied on microarray data. Advances in bioinformatics, Hindawi, v. 2015, 2015.

HONG, Y. et al. Unsupervised feature selection using clustering ensembles and population based incremental learning algorithm. Pattern Recognition, Elsevier, v. 41, n. 9, p. 2742–2756, 2008.

JAFFEL, Z.; FARAH, M. A symbiotic organisms search algorithm for feature selection in satellite image classification. In: IEEE. Advanced Technologies for Signal and Image Processing (ATSIP), 2018 4th International Conference on. [S.l.], 2018. p. 1–5.

JAIN, A.; ZONGKER, D. Feature selection: evaluation, application, and small sample performance. IEEE Transactions on Pattern Analysis and Machine Intelligence, v. 19, n. 2, p. 153–158, Feb 1997. ISSN 0162-8828.

JAIN, A. K.; DUBES, R. C. Algorithms for Clustering Data. Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1988. ISBN 0-13-022278-X.

JANECEK, A. et al. On the relationship between feature selection and classification accuracy. In: . -: [s.n.], 2008. v. 91-106, n. 4, p. 90–105. Available at: <http://eprints.cs.univie.ac.at/378/>.

JESUS, J.; ARAÚJO, D.; CANUTO, A. Fusion approaches of feature selection algorithms for classification problems. In: 2016 5th Brazilian Conference on Intelligent Systems (BRACIS). [S.l.: s.n.], 2016. p. 379–384.

JESUS, J.; CANUTO, A.; ARAÚJO, D. Estudo sobre o impacto do particionamento dos dados na seleção dinâmica de atributos. Proceedings of ENIAC, p. 10, Oct. 2017.

JESUS, J.; CANUTO, A.; ARAÚJO, D. Dynamic feature selection based on pareto front optimization. In: 2018 International Joint Conference on Neural Networks (IJCNN). [S.l.: s.n.], 2018.

JESUS, J. K. L.; CANUTO, A. M. P.; ARAÚJO, D. S. A. A feature selection approach based on information theory for classification tasks. In: 2017 ICANN. [S.l.: s.n.], 2017.

JIANG, S.-y.; WANG, L.-x. Efficient feature selection based on correlation measure between continuous and discrete features. Inf. Process. Lett., Elsevier North-Holland, Inc., Amsterdam, The Netherlands, v. 116, n. 2, p. 203–215, Feb. 2016. ISSN 0020-0190. Available at: <https://doi.org/10.1016/j.ipl.2015.07.005>.

JOLLIFFE, I. T. Principal Component Analysis. Berlin; New York: Springer-Verlag, 1986.

KANUNGO, T. et al. An efficient k-means clustering algorithm: Analysis and implementation. IEEE Trans. Pattern Anal. Mach. Intell., IEEE Computer Society, Washington, DC, USA, v. 24, n. 7, p. 881–892, Jul. 2002. ISSN 0162-8828. Available at: <http://dx.doi.org/10.1109/TPAMI.2002.1017616>.

KOHAVI, R.; QUINLAN, J. R. Handbook of data mining and knowledge discovery. In: KLÖSGEN, W.; ZYTKOW, J. M. (Ed.). New York, NY, USA: Oxford University Press, Inc., 2002. chap. Data Mining Tasks and Methods: Classification: Decision-tree Discovery, p. 267–276. ISBN 0-19-511831-6. Available at: <http://dl.acm.org/citation.cfm?id=778212.778254>.

KOTSIANTIS, S. B.; ZAHARAKIS, I.; PINTELAS, P. Supervised machine learning: A review of classification techniques. Emerging artificial intelligence applications in computer engineering, v. 160, p. 3–24, 2007.

KULLBACK, S.; LEIBLER, R. A. On information and sufficiency. Ann. Math. Statist., The Institute of Mathematical Statistics, v. 22, n. 1, p. 79–86, Mar. 1951.

KUNCHEVA, L. Combining Pattern Classifiers: Methods and Algorithms. New York: Wiley, 2004. 376 p.

KUNCHEVA, L. Ludmila k. collection lkc. 2004.

LI, J. et al. Feature selection: A data perspective. arXiv:1601.07996, 2016.

LI, J.; LIU, H. Challenges of feature selection for big data analytics. IEEE Intelligent Systems, v. 32, n. 2, p. 9–15, Mar 2017. ISSN 1541-1672.

LIKAS, A.; VLASSIS, N.; VERBEEK, J. J. The global k-means clustering algorithm. Pattern recognition, Elsevier, v. 36, n. 2, p. 451–461, 2003.

LIU, H.; MOTODA, H. Computational Methods of Feature Selection. [S.l.]: Chapman & Hall/CRC, 2007. ISBN 1584888784.

MARLER, R.; ARORA, J. Survey of multi-objective optimization methods for engineering. Structural and Multidisciplinary Optimization, v. 26, n. 6, p. 369–395, Apr 2004. ISSN 1615-1488.

MITCHELL, T. M. Machine Learning. 1. ed. New York, NY, USA: McGraw-Hill, Inc., 1997. ISBN 0070428077, 9780070428072.

NGUYEN, X. V. et al. Effective global approaches for mutual information based feature selection. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM, 2014. (KDD ’14), p. 512–521. ISBN 978-1-4503-2956-9. Available at: <http://doi.acm.org/10.1145/2623330.2623611>.

NOVAKOVIĆ, J. Toward optimal feature selection using ranking methods and classification algorithms. Yugoslav Journal of Operations Research, v. 21, n. 1, 2016.

NUNES, R. d. O. et al. An unsupervised-based dynamic feature selection for classification tasks. In: 2016 IJCNN. [S.l.: s.n.], 2016. p. 4213–4220.

NUNES, R. d. O.; DANTAS, C. A.; XAVIER, A. M. P. C. e. J. C. Dynamic feature selection for classifier ensembles. Accepted in Brazilian Conference on Intelligent Systems, 2018.

PENG, H.; LONG, F.; DING, C. Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell., IEEE Computer Society, Washington, DC, USA, v. 27, n. 8, p. 1226–1238, Aug. 2005. ISSN 0162-8828. Available at: <http://dx.doi.org/10.1109/TPAMI.2005.159>.

PEREZ, M. A. F. Um método heurístico para o problema de escalonamento multiobjetivo em vários ambientes de máquinas. Dissertation (Master's) - PUC-RIO, 2012.

PRATI, R. C. Combining feature ranking algorithms through rank aggregation. In: The 2012 International Joint Conference on Neural Networks (IJCNN), Brisbane, Australia, June 10-15, 2012. [s.n.], 2012. p. 1–8. Available at: <http://dx.doi.org/10.1109/IJCNN.2012.6252467>.

RODRIGUEZ-LUJAN, I. et al. Quadratic programming feature selection. J. Mach. Learn. Res., JMLR.org, v. 11, p. 1491–1516, Aug. 2010. ISSN 1532-4435. Available at: <http://dl.acm.org/citation.cfm?id=1756006.1859900>.

ROGATI, M.; YANG, Y. High-performing feature selection for text classification. In: Proceedings of the 11th CIKM. New York, NY, USA: ACM, 2002. (CIKM ’02), p. 659–661. ISBN 1-58113-492-4. Available at: <http://doi.acm.org/10.1145/584792.584911>.

SAEYS, Y.; ABEEL, T.; PEER, Y. Van de. Robust feature selection using ensemble feature selection techniques. In: SPRINGER. Joint European Conference on Machine Learning and Knowledge Discovery in Databases. [S.l.], 2008. p. 313–325.

SEVILLE, B. G. BIGS BioInformatics Research Group of Seville Repository. 2004.

SHANNON, C. E. A mathematical theory of communication. Bell system technical journal, v. 27, 1948.

SHEN, Q.; DIAO, R.; SU, P. Feature selection ensemble. In: VORONKOV, A. (Ed.). Turing-100. The Alan Turing Centenary. [S.l.]: EasyChair, 2012. (EPiC Series in Computing, v. 10), p. 289–306. ISSN 2040-557X.

TANG, J.; ALELYANI, S.; LIU, H. Feature selection for classification: A review. Data classification: Algorithms and applications, CRC Press, p. 37, 2014.

TRAWIŃSKI, B. et al. Nonparametric statistical analysis for multiple comparison of machine learning regression algorithms. International Journal of Applied Mathematics and Computer Science, Versita, v. 22, n. 4, p. 867–881, 2012.

VANSCHOREN, J. et al. Openml: Networked science in machine learning. SIGKDD Explorations, ACM, New York, NY, USA, v. 15, n. 2, p. 49–60, 2013. Available at: <http://doi.acm.org/10.1145/2641190.2641198>.

XU, R.; WUNSCH, D. Survey of clustering algorithms. IEEE Transactions on neural networks, IEEE, v. 16, n. 3, p. 645–678, 2005.

XUE, B. et al. A survey on evolutionary computation approaches to feature selection. IEEE Transactions on Evolutionary Computation, IEEE, v. 20, n. 4, p. 606–626, 2016.

YIJING, L. et al. Adapted ensemble classification algorithm based on multiple classifier system and feature selection for classifying multi-class imbalanced data. Knowledge-Based Systems, Elsevier, v. 94, p. 88–104, 2016.

ZHANG, Y.; GONG, D.-w.; CHENG, J. Multi-objective particle swarm optimization approach for cost-based feature selection in classification. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), IEEE Computer Society Press, v. 14, n. 1, p. 64–75, 2017.

A

Individual Results - Methods

From the document Case Studies Using Open-Source Tools (pages 123-128)