Discourse analysis based segregation of relevant document segments for knowledge acquisition

N. Madhusudanan; Amaresh Chakrabarti; B. Gurumoorthy

doi:10.1017/S0890060416000408

Published online by Cambridge University Press: 04 October 2016

Affiliation (all authors): Virtual Reality Laboratory, Centre for Product Design and Manufacturing, Indian Institute of Science, Bangalore, India
Reprint requests to: N. Madhusudanan, Virtual Reality Laboratory, Centre for Product Design and Manufacturing, Indian Institute of Science, Bangalore 560 012, India. E-mail: madhu@cpdm.iisc.ernet.in

Abstract

Documents are a useful source of expert knowledge in organizations and can be used to foresee, at an earlier stage of a product's life cycle, potential issues that might occur in later stages, along with their solutions. In this research, the earlier and later stages are design and assembly, respectively. Even when such documents are available online, it is difficult for users to access the knowledge they contain. It is therefore desirable to automatically extract this knowledge and store it in a computer-accessible or manipulable form. This paper describes an approach for the first step in this acquisition process: automatically identifying segments of documents that are relevant to aircraft assembly, so that they can be further processed for acquiring expert knowledge. Identifying relevant segments avoids processing unrelated information, which is costly and may distract from the domain of interest. The approach has two steps. The first step identifies sentences that form a coherent segment of text, within which the topic does not shift. The second step classifies segments by whether they fall within the topics of interest for knowledge acquisition, here aircraft assembly. Together, these steps filter out unrelated segments, which need not be processed for subsequent knowledge acquisition. Both steps are implemented by analyzing the contents of the documents. Using methods of discourse analysis, in particular discourse representation theory, a list of discourse entities is obtained. The difference in discourse entities between sentences is used to distinguish between segments. The list of discourse entities in a segment is compared against a domain ontology for classification. The implementation of these steps and the results of validation on sample texts are described.
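The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the discourse entities of each sentence have already been extracted (for example by a discourse representation theory parser) and are given as sets of strings. The function names, the Jaccard overlap measure, and both thresholds (`min_overlap`, `min_coverage`) are hypothetical choices made only for illustration.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two entity sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def segment(sentence_entities: List[Set[str]],
            min_overlap: float = 0.1) -> List[List[int]]:
    """Group consecutive sentence indices into segments.

    Assumption: a topic shift is signalled when a sentence shares too few
    discourse entities with those accumulated in the current segment.
    """
    segments: List[List[int]] = []
    seen: Set[str] = set()  # entities of the segment being built
    for i, ents in enumerate(sentence_entities):
        if segments and jaccard(ents, seen) >= min_overlap:
            segments[-1].append(i)
            seen |= ents
        else:
            segments.append([i])
            seen = set(ents)
    return segments


def is_relevant(segment_entities: Set[str],
                ontology_terms: Set[str],
                min_coverage: float = 0.3) -> bool:
    """Assumption: a segment is in-domain when enough of its entities
    also appear as terms in the domain ontology."""
    if not segment_entities:
        return False
    covered = len(segment_entities & ontology_terms)
    return covered / len(segment_entities) >= min_coverage


# Hypothetical entity sets for four sentences and a toy ontology.
sentences = [
    {"wing", "rivet", "panel"},
    {"rivet", "panel", "gap"},
    {"canteen", "menu"},
    {"menu", "lunch"},
]
ontology = {"wing", "rivet", "panel", "fuselage", "gap"}

segs = segment(sentences)
print(segs)  # [[0, 1], [2, 3]]
for seg in segs:
    entities = set().union(*(sentences[i] for i in seg))
    print(seg, is_relevant(entities, ontology))
# [0, 1] True
# [2, 3] False
```

In this toy run, the first two sentences share entities and form one segment that the ontology check keeps; the last two form a second segment that is filtered out before any further knowledge acquisition.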

Keywords

Aircraft Assembly; Discourse Analysis; Discourse Representation Theory; Text Segmentation

Information

Type: Special Issue Articles
Information: AI EDAM, Volume 30, Special Issue 4: Engineering Design Informatics, November 2016, pp. 446–465

DOI: https://doi.org/10.1017/S0890060416000408
Copyright: Copyright © Cambridge University Press 2016
