1. Kim D, Yu H: Figure text extraction in biomedical literature. PLoS One; 2011 Jan 13;6(1):e15338

  • [Source] The source of this record is MEDLINE®, a database of the U.S. National Library of Medicine.
  • [Title] Figure text extraction in biomedical literature.
  • BACKGROUND: Figures are ubiquitous in biomedical full-text articles, and they represent important biomedical knowledge.
  • However, the sheer volume of biomedical publications has made it necessary to develop computational approaches for accessing figures.
  • Therefore, we are developing the Biomedical Figure Search engine (http://figuresearch.askHERMES.org) to allow bioscientists to access figures efficiently.
  • Since text frequently appears in figures, automatically extracting such text may assist the task of mining information from figures.
  • Little research, however, has been conducted exploring text extraction from biomedical figures.
  • METHODOLOGY: We first evaluated an off-the-shelf Optical Character Recognition (OCR) tool on its ability to extract text from figures appearing in biomedical full-text articles.
  • We then developed a Figure Text Extraction Tool (FigTExT) to improve the performance of the OCR tool for figure text extraction through the use of three innovative components: image preprocessing, character recognition, and text correction.
  • We first developed image preprocessing to enhance image quality and to improve text localization.
  • Then we adapted the off-the-shelf OCR tool on the improved text localization for character recognition.
  • Finally, we developed and evaluated a novel text correction framework by taking advantage of figure-specific lexicons.
  • RESULTS/CONCLUSIONS: The evaluation on 382 figures (9,643 figure texts in total) randomly selected from PubMed Central full-text articles shows that FigTExT performed with 84% precision, 98% recall, and 90% F1-score for text localization and with 62.5% precision, 51.0% recall and 56.2% F1-score for figure text extraction.
  • When limiting figure texts to those judged by domain experts to be important content, FigTExT performed with 87.3% precision, 68.8% recall, and 77% F1-score.
  • FigTExT significantly improved the performance of the off-the-shelf OCR tool we used, which on its own performed with 36.6% precision, 19.3% recall, and 25.3% F1-score for text extraction.
  • In addition, our results show that FigTExT can extract texts that do not appear in figure captions or other associated text, further suggesting the potential utility of FigTExT for improving figure search.
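The F1-scores quoted above follow from the reported precision and recall values. As a minimal sketch (the function name `f1_score` and the dictionary labels are illustrative choices, not taken from the paper), the harmonic mean can be recomputed like this:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the abstract, in percent.
# The keys are informal labels chosen for this example.
reported = {
    "text localization": (84.0, 98.0),
    "figure text extraction": (62.5, 51.0),
    "important figure text": (87.3, 68.8),
    "off-the-shelf OCR alone": (36.6, 19.3),
}

# Recompute F1 in percent, rounded to one decimal place.
f1_percent = {
    name: round(100 * f1_score(p / 100, r / 100), 1)
    for name, (p, r) in reported.items()
}

for name, value in f1_percent.items():
    print(f"{name}: F1 = {value}%")
```

After rounding, the recomputed values agree with the F1-scores stated in the abstract.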

  • (PMID = 21249186.001).
  • [ISSN] 1932-6203
  • [Journal-full-title] PloS one
  • [ISO-abbreviation] PLoS ONE
  • [Language] ENG
  • [Grant] United States / NCRR NIH HHS / RR / R21 RR024933; United States / NCRR NIH HHS / RR / 1R21RR024933
  • [Publication-type] Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't
  • [Publication-country] United States
  • [Other-IDs] NLM/ PMC3020938


2. Prasad R, McRoy S, Frid N, Joshi A, Yu H: The biomedical discourse relation bank. BMC Bioinformatics; 2011;12:188

  • [Title] The biomedical discourse relation bank.
  • BACKGROUND: Identification of discourse relations, such as causal and contrastive relations, between situations mentioned in text is an important task for biomedical text-mining.
  • A biomedical text corpus annotated with discourse relations would be very useful for developing and evaluating methods for biomedical discourse processing.
  • However, little effort has been made to develop such an annotated resource.
  • RESULTS: We have developed the Biomedical Discourse Relation Bank (BioDRB), in which we have annotated explicit and implicit discourse relations in 24 open-access full-text biomedical articles from the GENIA corpus.
  • Guidelines for the annotation were adapted from the Penn Discourse TreeBank (PDTB), which has discourse relations annotated over open-domain news articles.
  • We introduced new conventions and modifications to the sense classification.
  • We report reliable inter-annotator agreement of over 80% for all sub-tasks.
  • Experiments for identifying the sense of explicit discourse connectives show the connective itself as a highly reliable indicator for coarse sense classification (accuracy 90.9% and F1 score 0.89).
  • These results are comparable to results obtained with the same classifier on the PDTB data.
  • With more refined sense classification, there is degradation in performance (accuracy 69.2% and F1 score 0.28), mainly due to sparsity in the data.
  • The size of the corpus was found to be sufficient for identifying the sense of explicit connectives, with classifier performance stabilizing at about 1900 training instances.
  • Finally, the classifier performs poorly when trained on PDTB and tested on BioDRB (accuracy 54.5% and F1 score 0.57).
  • CONCLUSION: Our work shows that discourse relations can be reliably annotated in biomedical text.
  • Coarse sense disambiguation of explicit connectives can be done with high reliability by using just the connective as a feature, but more refined sense classification requires either richer features or more annotated data.
  • The poor performance of a classifier trained in the open domain and tested in the biomedical domain suggests significant differences in the semantic usage of connectives across these domains, and provides robust evidence for a biomedical sublanguage for discourse and the need to develop a specialized biomedical discourse annotated corpus.
  • The results of our cross-domain experiments are consistent with related work on identifying connectives in BioDRB.
  • [MeSH-major] Computational Biology / methods. Data Mining. Software
  • [MeSH-minor] Humans. Natural Language Processing. Semantics
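The abstract reports that the connective alone is a strong feature for coarse sense classification. One simple way to picture a connective-only classifier is a most-frequent-sense lookup, sketched below. This is an illustration under stated assumptions, not the classifier used in the paper: the function names, the sense labels, and the toy examples are all invented here and are not drawn from BioDRB or the PDTB.

```python
from collections import Counter, defaultdict


def train_connective_baseline(examples):
    """Map each connective to the sense it most often carries in the training pairs."""
    counts = defaultdict(Counter)
    for connective, sense in examples:
        counts[connective.lower()][sense] += 1
    return {conn: senses.most_common(1)[0][0] for conn, senses in counts.items()}


def predict(model, connective, default="Unknown"):
    """Look up the majority sense; unseen connectives fall back to `default`."""
    return model.get(connective.lower(), default)


def accuracy(model, examples):
    """Fraction of (connective, sense) pairs whose sense is predicted correctly."""
    correct = sum(predict(model, conn) == sense for conn, sense in examples)
    return correct / len(examples)


# Invented toy data: (connective, coarse sense) pairs. Labels are hypothetical.
train = [
    ("because", "Cause"),
    ("because", "Cause"),
    ("however", "Contrast"),
    ("but", "Contrast"),
    ("since", "Cause"),
    ("since", "Temporal"),
    ("since", "Cause"),
    ("after", "Temporal"),
]
test = [
    ("because", "Cause"),
    ("since", "Temporal"),  # ambiguous connective: the lookup gets this one wrong
    ("However", "Contrast"),
    ("after", "Temporal"),
]

model = train_connective_baseline(train)
acc = accuracy(model, test)
print(f"accuracy on toy test set: {acc:.2f}")
```

The ambiguous connective in the toy data shows why such a lookup can work for coarse senses yet degrade when a connective's usage shifts, for example across domains.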

  • (PMID = 21605399.001).
  • [ISSN] 1471-2105
  • [Journal-full-title] BMC bioinformatics
  • [ISO-abbreviation] BMC Bioinformatics
  • [Language] ENG
  • [Publication-type] Journal Article; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.
  • [Publication-country] England
  • [Other-IDs] NLM/ PMC3130691


3. Cohen KB, Johnson HL, Verspoor K, Roeder C, Hunter LE: The structural and content aspects of abstracts versus bodies of full text journal articles are different. BMC Bioinformatics; 2010 Sep 29;11:492

  • [Title] The structural and content aspects of abstracts versus bodies of full text journal articles are different.
  • BACKGROUND: An increase in work on the full text of journal articles and the growth of PubMed Central have the potential to create a major paradigm shift in how biomedical text mining is done.
  • However, there has so far been no comprehensive characterization of how the bodies of full text journal articles differ from the abstracts that until now have been the subject of most biomedical text mining research.
  • RESULTS: We examined the structural and linguistic aspects of abstracts and bodies of full text articles, the performance of text mining tools on both, and the distribution of a variety of semantic classes of named entities between them.
  • We found marked structural differences, with longer sentences in the article bodies and much heavier use of parenthesized material in the bodies than in the abstracts.
  • We found content differences with respect to linguistic features.
  • Three of the four linguistic features that we examined differed statistically significantly in their distribution between the two genres.
  • We also found content differences with respect to the distribution of semantic features.
  • There were significantly different densities per thousand words for three out of four semantic classes, and clear differences in the extent to which they appeared in the two genres.
  • With respect to the performance of text mining tools, we found that a mutation finder performed equally well in both genres, but that a wide variety of gene mention systems performed much worse on article bodies than they did on abstracts.
  • POS tagging was also more accurate in abstracts than in article bodies.
  • CONCLUSIONS: Aspects of structure and content differ markedly between article abstracts and article bodies.
  • A number of these differences may pose problems as the text mining field moves more into the area of processing full-text articles.
  • However, these differences also present a number of opportunities for the extraction of data types, particularly those found in parenthesized text, that are present in article bodies but not in article abstracts.
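The abstract compares densities per thousand words and the use of parenthesized material across the two genres. A rough sketch of such a measurement follows. It assumes whitespace tokenization and non-nested parentheses, the function names are hypothetical, and the two sample sentences are invented for illustration. It is not the authors' procedure.

```python
import re


def per_thousand_words(count, n_words):
    """Normalize a raw count to a density per 1,000 words."""
    return 1000.0 * count / n_words if n_words else 0.0


def parenthetical_density(text):
    """Parenthesized spans per 1,000 whitespace-delimited words (no nesting handled)."""
    n_words = len(text.split())
    n_parens = len(re.findall(r"\([^()]*\)", text))
    return per_thousand_words(n_parens, n_words)


# Invented sample sentences, one in each style.
abstract_like = "We found marked structural differences between the two genres."
body_like = (
    "Mutations (e.g. A42G) were detected in three samples (Table 2) using the tool."
)

print(f"abstract-like: {parenthetical_density(abstract_like):.1f} per 1,000 words")
print(f"body-like:     {parenthetical_density(body_like):.1f} per 1,000 words")
```

Normalizing by word count makes texts of very different lengths, such as an abstract and an article body, directly comparable.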

  • (PMID = 20920264.001).
  • [ISSN] 1471-2105
  • [Journal-full-title] BMC bioinformatics
  • [ISO-abbreviation] BMC Bioinformatics
  • [Language] ENG
  • [Grant] United States / NLM NIH HHS / LM / 1 R01LM010120-D1; United States / NIGMS NIH HHS / GM / R01GM083649; United States / NLM NIH HHS / LM / R01LM008111; United States / NLM NIH HHS / LM / R01LM009254
  • [Publication-type] Journal Article; Research Support, N.I.H., Extramural
  • [Publication-country] England
  • [Other-IDs] NLM/ PMC3098079