Modern Computing Technology and Traditional Chinese Medicine Information Processing
2012-8
Zhejiang University Press
Edited by Wu Zhaohui, Chen Huajun, Jiang Xiaohong
233
Elsevier Insights provides high quality specialized content across a range of disciplines, including life sciences, physical sciences, social sciences, engineering, computing, and finance. Through fast-track publication, Elsevier Insights offers the reader cutting-edge information, available in eBook or print format.
Preface
1 Overview of Knowledge Discovery in Traditional Chinese Medicine
1.1 Introduction
1.2 The State of the Art of TCM Data Resources
1.2.1 Traditional Chinese Medical Literature Analysis and Retrieval System
1.2.2 Figures and Photographs of Traditional Chinese Drug Database
1.2.3 Database of Chinese Medical Formulae
1.2.4 Database of Chemical Composition from Chinese Herbal Medicine
1.2.5 Clinical Medicine Database
1.2.6 TCM Electronic Medical Record Database
1.3 Review of KDTCM Research
1.3.1 Knowledge Discovery for CMF Research
1.3.2 Knowledge Discovery for CHM Research
1.3.3 Knowledge Discovery for Research of TCM Syndrome
1.3.4 Knowledge Discovery for TCM Clinical Diagnosis
1.4 Discussions and Future Directions
1.5 Conclusions
2 Integrative Mining of Traditional Chinese Medicine Literature and MEDLINE for Functional Gene Networks
2.1 Introduction
2.2 Connecting TCM Syndrome to Modern Biomedicine by Integrative Literature Mining
2.3 Related Work on Biomedical Literature Mining
2.4 Name Entity and Relation Extraction Methods
2.4.1 Bubble-Bootstrapping Method
2.4.2 Relation Weight Computing
2.5 MeDisco/3S System
2.6 Results
2.6.1 Functional Gene Networks
2.6.2 Functional Analysis of Genes from Syndrome Perspective
2.7 Conclusions
3 MapReduce-Based Network Motif Detection for Traditional Chinese Medicine
3.1 Introduction
3.2 Related Work
3.3 MapReduce-Based Pattern Finding
3.3.1 MRPF Framework
3.3.2 Neighbor Vertices Finding and Pattern Initialization
3.3.3 Pattern Extension
3.3.4 Frequency Computing
3.4 Application to Prescription Compatibility Structure Detection
3.4.1 Motifs Detection Results
3.4.2 Performance Analysis
3.5 Conclusions
4 Data Quality for Knowledge Discovery in Traditional Chinese Medicine
4.1 Introduction
4.2 Key Data Quality Dimensions in TCM
4.2.1 Representation Granularity
4.2.2 Representation Consistency
4.2.3 Completeness
4.3 Methods to Handle Data Quality Problems
4.3.1 Handling Representation Granularity
4.3.2 Handling Representation Consistency
4.3.3 Handling Completeness
4.4 Conclusions
5 Service-Oriented Data Mining in Traditional Chinese Medicine
5.1 Introduction
5.2 Related Work
5.2.1 Traditional Data Mining Software
5.2.2 Data Mining Systems for Specific Field
5.2.3 Distributed Data Mining Platform
5.2.4 The Spora Demo
5.3 System Architecture and Data Mining Service
5.3.1 Hierarchical Structure
5.3.2 Service Operator Organization
5.3.3 User Interaction and Visualization
5.4 Case Studies
5.4.1 Case 1: Domain-Driven KDD Support for TCM
5.4.2 Case 2: Data Mining Based on Distributed Resources
5.4.3 Case 3: Data Mining Process as a Service
5.5 Conclusions
……
6 Semantic E-Science for Traditional Chinese Medicine
7 Ontology Development for Unified Traditional Chinese Medical Language System
8 Causal Knowledge Modeling for Traditional Chinese Medicine Using OWL 2
9 Dynamic Subontology Evolution for Traditional Chinese Medicine Web Ontology
10 Semantic Association Mining for Traditional Chinese Medicine
11 Semantic-Based Database Integration for Traditional Chinese Medicine
12 Probabilistic Semantic Relationship Discovery from Traditional Chinese Medical Literature
13 Deriving Similarity Graphs from Traditional Chinese Medicine Linked Data on the Semantic Web
to use different notions and expressions to describe one concept. When the information from different dynasties is collected in databases, the problem of representation inconsistency is also introduced. For instance, Panax ginseng and Radix Ginseng could both refer to the Chinese herbal medicine ginseng in English. The situation becomes more complex in Chinese: there are 10 aliases for ginseng. Another example is the inconsistent weight units used in different Chinese medical formulae, which has been mentioned before. Before data analysis and knowledge discovery can be carried out on these data, such representation consistency issues must be addressed to ensure the reliability of the final results.

4.2.3 Completeness

One of the biggest problems hampering the effective use of TCM resources is the incompleteness of data. Take DCMF as an example: two crucial attributes of DCMF are ingredients and efficacy. The attribute ingredients has already been described, and efficacy is a textual attribute containing a description of the remedy principle in the TCM background. For historical reasons, among the 85,917 valid records in which the attribute value of ingredients is not null, only 15,671 records are stored with efficacy not null. That is to say, 81.76% of the data in the attribute efficacy is missing. Identifying such phenomena in TCM data and treating this problem is an important task in data analysis.

4.3 Methods to Handle Data Quality Problems

Because of the data quality problems mentioned previously, it is extremely important to carry out the necessary data preprocessing before data analysis and knowledge discovery. Jiang et al.
indicated that data preprocessing was the key to the knowledge discovery of the compatibility rule of TCM formulae. Thus, it is vital to explore preprocessing methods for TCM data. The data quality problems mentioned in the last section are the main obstacles to high data quality in TCM. In this section, we introduce the preprocessing methods used to handle these problems.

4.3.1 Handling Representation Granularity

The procedure we use to treat the representation granularity problem is called structurizing, i.e., splitting a data field that holds multiple data elements into multiple separate data fields. To handle the example problem of representation granularity mentioned in the previous section, the concept of a herb information unit (HIU) is defined: the name of a Chinese herbal medicine, followed by the preparation method, dosage, and weight unit. From this perspective, the attribute ingredients usually consists of multiple HIUs separated by commas. To use all the information in this field effectively, we first split ingredients into multiple HIUs. Second, we divide each HIU into four fields: the name of the Chinese herbal medicine, preparation method, dosage, and weight unit. In practice, this two-step extraction involves many details and exceptions. For instance, in many records the comma delimiter is replaced by a semicolon or period, or is missing altogether; the preparation method, dosage, or weight unit is also missing or misspelled in many records. To implement the two-step splitting, a splitting-rule-based system named field splitter was developed in 2003. Dozens of specific splitting rules, such as "keep between A and B" and "replace A with B", are defined. Users can form their own splitting settings by organizing these rules. The field splitter system has been found to work well over the years. This is the structurizing method we use to address representation granularity problems in TCM.
……
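The two-step splitting described in Section 4.3.1 can be sketched roughly as follows. This is not the field splitter system itself, whose rules are not reproduced in this excerpt; it is a minimal illustration under stated assumptions. The record layout (an English-transliterated "name (preparation) dosage unit"), the sample record, and the names `HIU`, `parse_hiu`, and `split_ingredients` are all hypothetical. The sketch tolerates semicolons or periods in place of commas and missing preparation, dosage, or unit values, but it does not attempt to recover missing delimiters or correct misspellings.

```python
import re
from typing import List, NamedTuple, Optional


class HIU(NamedTuple):
    """One herb information unit (field names are illustrative)."""
    name: str
    preparation: Optional[str]
    dosage: Optional[float]
    unit: Optional[str]


# Step 1 delimiters: comma, semicolon, or a period that is not a decimal point.
_DELIMS = re.compile(r"[,;]|\.(?!\d)")

# Step 2 pattern (assumed layout): name, optional "(preparation)",
# then an optional dosage that may be followed by a weight unit.
_HIU = re.compile(
    r"^(?P<name>[^()\d]+?)\s*"
    r"(?:\((?P<prep>[^)]*)\)\s*)?"
    r"(?:(?P<dosage>\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z]+)?)?$"
)


def parse_hiu(text: str) -> HIU:
    """Step 2: divide one HIU string into its four fields."""
    text = text.strip()
    m = _HIU.match(text)
    if m is None:
        # Unparseable unit: keep the raw text so it can be reviewed by hand.
        return HIU(text, None, None, None)
    prep = (m.group("prep") or "").strip() or None
    dosage = m.group("dosage")
    return HIU(
        m.group("name").strip(),
        prep,
        float(dosage) if dosage else None,
        m.group("unit"),
    )


def split_ingredients(field: str) -> List[HIU]:
    """Step 1: split the ingredients field into HIUs, then parse each one."""
    parts = (p.strip() for p in _DELIMS.split(field))
    return [parse_hiu(p) for p in parts if p]


if __name__ == "__main__":
    # Hypothetical record mixing delimiters and omitting some fields.
    record = "ginseng (roasted) 3 liang; licorice 1.5 qian. jujube"
    for hiu in split_ingredients(record):
        print(hiu)
```

Missing values are kept as `None` rather than guessed, so the incompleteness discussed in Section 4.2.3 stays visible after structurizing. A rule-based system such as the one described would presumably layer record-specific rules on top of a generic pattern like this one.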