MedScan is a fast and flexible biomedical information extraction technology. It uses dictionaries to identify individual biomedical terms (proteins, cellular processes, small molecules, diseases, etc.) mentioned in literature articles, and applies advanced natural language processing techniques to detect the relationships among those terms within each article. The overall process of detecting, identifying, extracting, and assembling the terms and relationships is termed Information Harvesting.
Information extracted by MedScan covers multiple aspects of protein function, including protein modification, cellular localization, protein-protein interactions, gene expression regulation, molecular transport and synthesis, as well as association with diseases and regulation of various cellular processes. This scope can be broadened by modifying the information extraction rules and the dictionaries. Dictionaries can be assembled on any topic or area represented in the literature you wish to harvest.
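To make the dictionary-plus-relationship idea concrete, here is a minimal Python sketch. It is not MedScan's implementation or API: the dictionary entries, the trigger words, and the function names `find_terms` and `extract_relations` are all hypothetical, and a simple "trigger word between two adjacent terms" heuristic stands in for real natural language processing.

```python
import re

# Hypothetical toy dictionary: surface form -> entity type.
# A real dictionary would be far larger and include synonyms.
DICTIONARY = {
    "p53": "Protein",
    "MDM2": "Protein",
    "apoptosis": "CellProcess",
}

# Hypothetical trigger words mapped to relation types.
TRIGGERS = {
    "binds": "Binding",
    "inhibits": "Regulation",
    "induces": "Regulation",
}


def find_terms(sentence):
    """Return (start, term, type) for every dictionary term, in text order."""
    hits = []
    for term, etype in DICTIONARY.items():
        # Match the term only when it is not embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        for match in re.finditer(pattern, sentence):
            hits.append((match.start(), term, etype))
    return sorted(hits)


def extract_relations(sentence):
    """Emit (subject, relation, object) when a trigger word lies
    between two adjacent dictionary terms."""
    terms = find_terms(sentence)
    relations = []
    for (start1, term1, _), (start2, term2, _) in zip(terms, terms[1:]):
        between = sentence[start1 + len(term1):start2]
        for word, relation in TRIGGERS.items():
            if re.search(r"\b" + re.escape(word) + r"\b", between):
                relations.append((term1, relation, term2))
    return relations


if __name__ == "__main__":
    print(extract_relations("MDM2 binds p53 and p53 induces apoptosis."))
```

Adding an entry to `DICTIONARY` or `TRIGGERS` changes what the sketch picks up, which loosely mirrors how broadening the dictionaries and extraction rules broadens the scope of what is harvested.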
- High accuracy of the extracted information based on advanced natural language processing (NLP) technology
- High-speed information processing
- Customizable dictionaries
- Customizable information extraction rules and patterns
- Multiple input formats: PubMed XML, HTML, Microsoft Word, plain text, some forms of PDF, and archives
- Advanced information filtering capabilities
- Integration with Pathway Studio software for visualization and analysis of the extracted information on a pathway diagram
- Integration with PubMed and Google search engines