1. Torvik VI, Smalheiser NR: A quantitative model for linking two disparate sets of articles in MEDLINE. Bioinformatics; 2007 Jul 1;23(13):1658-65

  • [Source] The source of this record is MEDLINE®, a database of the U.S. National Library of Medicine.
  • [Title] A quantitative model for linking two disparate sets of articles in MEDLINE.
  • BACKGROUND: Identifying information that implicitly links two disparate sets of articles is a fundamental and intuitive data mining strategy that can help investigators address real scientific questions.
  • The Arrowsmith two-node search finds title words and phrases (so-called B-terms) that are shared across two sets of articles within MEDLINE and displays them in a manner that facilitates human assessment.
  • A serious stumbling-block has been the lack of a quantitative model for predicting which of the hundreds if not thousands of B-terms computed for a given search are most likely to be relevant to the investigator.
  • METHODOLOGY/PRINCIPAL FINDINGS: Using a public two-node search interface, field testers devised a set of two-node searches under real-life conditions, and a certain number of B-terms were marked relevant.
  • These were employed as 'gold standards'; each B-term was characterized according to eight complementary features that were strongly correlated with relevance.
  • A logistic regression model was developed that permits one to estimate the probability of relevance for each B-term, to rank B-terms according to their likely relevance, and to estimate the overall number of relevant B-terms inherent in a given two-node search.
  • CONCLUSIONS/SIGNIFICANCE: The model greatly simplifies and streamlines the process of carrying out a two-node search, and may be applicable to a number of other literature-based discovery applications, including the so-called one-node search and related gene-centric strategies that incorporate implicit links to predict how genes may be related to each other and to human diseases.
  • This should encourage much wider exploration of text mining for implicit information among the general scientific community.
  • AVAILABILITY: Two-node searches can be carried out freely at http://arrowsmith.psych.uic.edu.
  • SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
  • [MeSH-major] Algorithms. Artificial Intelligence. Information Storage and Retrieval / methods. MEDLINE. Natural Language Processing. Pattern Recognition, Automated / methods. Periodicals as Topic. Terminology as Topic
  • [MeSH-minor] Vocabulary, Controlled
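
The abstract describes a logistic regression model that scores each B-term, ranks B-terms by likely relevance, and estimates how many relevant B-terms a search contains. A minimal sketch of that idea follows. The abstract does not list the eight features or their fitted coefficients, so the intercept, the weights, the feature values, and the example B-terms below are all hypothetical placeholders, not the authors' model.

```python
import math

# Hypothetical coefficients. The abstract mentions eight features but does
# not name them or give fitted values, so these numbers are placeholders.
INTERCEPT = -2.0
WEIGHTS = [0.8, 0.5, 0.3, 0.6, 0.4, 0.2, 0.7, 0.1]


def relevance_probability(features):
    """Logistic model: P(relevant) = 1 / (1 + exp(-(b0 + sum(w_i * x_i))))."""
    if len(features) != len(WEIGHTS):
        raise ValueError("expected %d feature values" % len(WEIGHTS))
    z = INTERCEPT + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))


def rank_b_terms(b_terms):
    """Return (term, probability) pairs, most likely relevant first."""
    scored = [(term, relevance_probability(f)) for term, f in b_terms.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def expected_relevant_count(b_terms):
    """Estimate the number of relevant B-terms as the sum of probabilities."""
    return sum(relevance_probability(f) for f in b_terms.values())


if __name__ == "__main__":
    # Made-up B-terms with made-up feature vectors, for illustration only.
    example = {
        "term_a": [1, 1, 0, 1, 0, 1, 1, 0],
        "term_b": [0, 0, 0, 1, 0, 0, 0, 0],
        "term_c": [1, 0, 1, 1, 1, 0, 1, 1],
    }
    for term, p in rank_b_terms(example):
        print(f"{term}: {p:.3f}")
    print(f"expected relevant B-terms: {expected_relevant_count(example):.2f}")
```

Summing the per-term probabilities is one common way to turn a calibrated classifier into a count estimate; whether the authors estimate the overall number of relevant B-terms this way is an assumption here.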

  • (PMID = 17463015.001).
  • [ISSN] 1367-4811
  • [Journal-full-title] Bioinformatics (Oxford, England)
  • [ISO-abbreviation] Bioinformatics
  • [Language] eng
  • [Grant] United States / NLM NIH HHS / LM / LM007292; United States / NLM NIH HHS / LM / LM08364
  • [Publication-type] Journal Article; Research Support, N.I.H., Extramural
  • [Publication-country] England