
APPROXIMATE SEMANTIC MATCHING OF MUSIC CLASSES ON THE INTERNET

9.6 Future work

The presented general approximation scheme can be improved in several directions. For example, not all disjunct-conjunct pairs contribute equally to the tested formulas. Disjuncts and conjuncts can differ in size, i.e., in the number of literals they contain. Literals may also differ in size with respect to the sets of instances they denote. Accounting for these differences, e.g., by weighting, may result in a more accurate sloppiness measure.

Zharko Aleksovski, Warner ten Kate and Frank van Harmelen


[Plot omitted. Axes: Sloppiness (0 to 1) and Number of equivalences (0 to 45000).]

Figure 9.6. Number of equivalence relations inferred between ADN and MM for different values of the sloppiness parameter.

Using background knowledge is another way to improve the mapping scheme. If two concepts are known to be synonyms, they can be treated as equivalent in the matching process, which yields a better match. Other relations between concepts, such as subclass, will also improve the quality of the results. For example, using the fact that the Chicago region is part of America, the method can discover that Blues, Chicago is a subclass of Blues, American.
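As a rough illustration of this idea, the sketch below treats a class label as a conjunction of comma-separated terms and checks subsumption term by term. This representation, the function names, and the encoding of background knowledge as (narrower, broader) pairs and synonym sets are all assumptions made for the example; they are not the matching procedure of this chapter.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical background knowledge, invented for this example.
# (narrower, broader) pairs: "Chicago" is covered by "American".
BROADER: Set[Tuple[str, str]] = {("chicago", "american")}
# Unordered synonym pairs.
SYNONYMS: Set[FrozenSet[str]] = {frozenset({"r&b", "rhythm and blues"})}


def is_narrower_or_equal(term: str, other: str,
                         broader: Set[Tuple[str, str]],
                         synonyms: Set[FrozenSet[str]]) -> bool:
    """True if `term` equals `other`, is a synonym of it, or reaches it
    by following broader-than links (transitively)."""
    seen: Set[str] = set()
    stack: List[str] = [term]
    while stack:
        current = stack.pop()
        if current == other or frozenset((current, other)) in synonyms:
            return True
        if current in seen:
            continue
        seen.add(current)
        # Continue the search from every term that is broader than `current`.
        stack.extend(b for n, b in broader if n == current)
    return False


def split_label(label: str) -> List[str]:
    """Split a label such as 'Blues, Chicago' into lower-cased terms."""
    return [part.strip().lower() for part in label.split(",") if part.strip()]


def class_is_subclass(sub_label: str, super_label: str,
                      broader: Set[Tuple[str, str]],
                      synonyms: Set[FrozenSet[str]]) -> bool:
    """Assumed rule: `sub_label` is a subclass of `super_label` if every
    term of the superclass is matched by an equal, synonymous, or
    narrower term of the subclass."""
    sub_terms = split_label(sub_label)
    super_terms = split_label(super_label)
    return all(
        any(is_narrower_or_equal(s, t, broader, synonyms) for s in sub_terms)
        for t in super_terms
    )


if __name__ == "__main__":
    print(class_is_subclass("Blues, Chicago", "Blues, American", BROADER, SYNONYMS))
    print(class_is_subclass("Blues, Chicago", "Blues, American", set(), set()))
```

With the background pair in place the first call succeeds, while the second call, which has no background knowledge, cannot relate Chicago to American and therefore finds no subclass relation.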

A prerequisite is the availability of such background knowledge. We are not aware of an ontology of this kind for the music domain. One approach is to create one through knowledge discovery mechanisms. We conducted some preliminary experiments in which we considered two ways to extract relations between terms from the music domain. In the first we used The Free Dictionary (http://www.thefreedictionary.com/) as a source, and in the second we used Google (http://www.google.com/). For The Free Dictionary, we measured how strongly two terms are related by the co-occurrence of words between the pages that describe the terms. In the Google case, we assumed that related terms occur on the same pages; the number of Google hits when querying for both terms, relative to the number of hits when querying for each term separately, was then used as the strength measure for the term relation.

The experiments produced useful results, and we plan to continue in this direction in the future.
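The two relatedness measures can be sketched as follows. The text does not give exact formulas, so both normalizations here are assumptions: word co-occurrence between pages is approximated by the Jaccard overlap of their word sets, and the hit-count measure divides the joint hit count by the smaller of the two single-term counts. The `hits` callable is a hypothetical stand-in for a search-engine lookup, and the counts in `TOY_HITS` are made up for illustration.

```python
import re
from typing import Callable, Dict


def page_word_overlap(page_a: str, page_b: str) -> float:
    """Assumed measure: Jaccard overlap of the word sets of two pages."""
    words_a = set(re.findall(r"[a-z]+", page_a.lower()))
    words_b = set(re.findall(r"[a-z]+", page_b.lower()))
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def hit_count_strength(term_a: str, term_b: str,
                       hits: Callable[[str], int]) -> float:
    """Assumed measure: hits for both terms together, relative to the
    smaller of the hit counts for each term on its own."""
    hits_a = hits(term_a)
    hits_b = hits(term_b)
    hits_both = hits(f"{term_a} {term_b}")
    smaller = min(hits_a, hits_b)
    if smaller == 0:
        return 0.0  # no evidence for at least one of the terms
    return hits_both / smaller


# Made-up hit counts, used only to exercise the function.
TOY_HITS: Dict[str, int] = {
    "blues": 1000,
    "jazz": 800,
    "blues jazz": 200,
    "polka": 50,
    "blues polka": 1,
}


def toy_hits(query: str) -> int:
    """Hypothetical lookup that replaces a real search-engine query."""
    return TOY_HITS.get(query, 0)


if __name__ == "__main__":
    print(hit_count_strength("blues", "jazz", toy_hits))
    print(hit_count_strength("blues", "polka", toy_hits))
    print(page_word_overlap("slow blues guitar", "fast blues guitar"))
```

Other normalizations, such as dividing by the larger count or by the size of the union, would be equally consistent with the description; the choice affects how strongly rare terms are favoured.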

9.7 Conclusion

In this chapter, we have presented a new method for approximate matching between classes from different concept hierarchies, together with the results of applying this method to the music domain. The method is based on the approach of semantic models [Bouquet et al., 2003], and it discovers matches using logical inference.

We discussed the current problems with music artist classifications on the Internet, based on music content data extracted from Internet music providers. In the course of this analysis, we identified the need to integrate music content from different providers. Further, we argued that fuzziness, one of the main characteristics of the domain, makes the problem of matching music classes from different sources even more severe.

We applied our approximate matching method to music data extracted from the Internet, and presented and discussed the first results from these experiments. There is a clear indication that the method helps to deal with this problem.

This is preliminary work. Additional research should focus not only on implementing the suggested improvements and comparing against other state-of-the-art methods, but also on testing with richer data and with data from other domains. Due to the limited size of the test data, we could not accurately assess the performance of the method in our study.

Acknowledgements

We would like to take this opportunity to thank Heiner Stuckenschmidt for his useful feedback and fruitful discussions. Our thanks are also due to Aleksandar Pechkov for his feedback on the relation extraction from the Internet, and to Perry Groot for his feedback and the translation into LaTeX.

References

Aucouturier, Jean-Julien, and Francois Pachet [2003]. Representing musical genre: A state of the art. Journal of New Music Research, 32(1): 83–93.

Bilenko, Mikhail, Raymond Mooney, William Cohen, Pradeep Ravikumar, and Stephen Fienberg [2003]. Adaptive name matching in information integration. IEEE Intelligent Systems.

Bouquet, Paolo, Luciano Serafini, and Stefano Zanobini [2003]. Semantic coordination: A new approach and an application. In Proc. of 2nd Int. Semantic Web Conf. (ISWC), Sanibel Island, Florida, USA, pages 130–145.

Hayes, Conor, and Padraig Cunningham [2003]. Context boosting collaborative recommendations. Technical Report TCD-CS-2003-26, Trinity College Dublin, Computer Science Department.

Ichise, Ryutaro, Hideaki Takeda, and Shinichi Honiden [2003]. Integrating multiple internet directories by instance-based learning. In Proc. 18th Int. Joint Conf. on Artificial Intelligence (IJCAI), Acapulco, Mexico, pages 22–28.

Maynard, Diana, Giorgos Stamou, Heiner Stuckenschmidt, Ilya Zaihrayeu, Jesus Barrasa, Jerome Euzenat, Manfred Hauswirth, Marc Ehrig, Mustafa Jarrar, Paolo Bouquet, Pavel Shvaiko, Rose Dieng-Kuntz, Ruben Lara Hernandez, Sergio Tessaris, Sven Van Acker, and Thanh-Le Bach [2004]. State of the art on ontology alignment. Knowledge Web Deliverable D2.2.3, INRIA, Saint Ismier.

Mendelson, E. [1997]. Introduction to Mathematical Logic. Chapman & Hall.

Pachet, Francois, and Daniel Cazaly [2000]. A taxonomy of musical genres. In Proc. Content-Based Multimedia Information Access (RIAO), Paris, France, pages 1238–1245.

ten Kate, Warner, Herman ter Horst, and Steffen Pauws [2003]. Semantics in media systems: Adapting machine operation to the human context. In Proc. 1st Int. WS on Socio-Cognitive Grids, Santorini, Greece, pages 11–18.

Chapter 10