The central feature of BioMedLib is that users find “better answers”, and find them “faster”, than with other search engines.
How does BioMedLib find better answers than other engines?
1. BioMedLib performs a comprehensive analysis of your query. This is like mapping the query to MeSH terms or other controlled vocabularies, as search engines such as Ovid/Embase/PubMed do, but 100 times more comprehensive. For example, MeSH has 25 thousand terms (concepts), while BioMedLib’s knowledgebase contains over 2 million biomedical concepts, expressed by 20 million unique terms (synonyms). A more comprehensive query mapping and expansion has many advantages; for example, the user misses fewer results because of the particular words chosen for the query.
2. BioMedLib tags every instance of the 20 million terms (synonyms) that express the 2 million biomedical concepts, in each of the 20 million articles of the MEDLINE database. This is similar to the MeSH terms added to each article by the NLM indexers, but much more comprehensive. BioMedLib uses optimized, automated algorithms that assign the tags without human intervention. This automation makes it possible to retag the whole corpus of 20 million articles about twice per year, when the major vocabularies used in BioMedLib’s knowledgebase are updated.
Given that MeSH, Mtree and other vocabularies (used by other search engines) improve with each annual edition, when was the last time Ovid/Embase/PubMed retagged their whole corpus of articles?
3. BioMedLib computes an advanced hybrid relevance score, proprietary to BML, for each article matching each query that you submit. By default BioMedLib shows the most relevant articles first, since they have the highest chance of being the “best answers” to your query.
Moreover, you can specify a range of publication dates, within which the articles are then sorted by relevance, thus providing you with both timeliness and relevance.
The factors that are used in computing BioMedLib’s relevance score include:
3.1 Semantics: the presence of relations, in addition to the presence of query words. BioMedLib looks for triplets {Concept1, Relation, Concept2} in each sentence of each article, where Concept1 and Concept2 are from your query. BioMedLib assigns a higher relevance score to an article that contains such relation triplets.
3.2 Meaning-based (concept-based) search, besides text-word searching. When you submit a query like ‘heart attack’, BioMedLib not only searches for the words ‘heart’ and ‘attack’, but also searches for the biomedical concept with ID = C0027051, which is a “Disease or Syndrome” defined as “gross necrosis of the myocardium, as a result of interruption of the blood supply to the area”, and which has about 100 synonyms taken from about 150 different vocabularies. Other engines may perform a similar operation, but they understand only a small fraction of the 2 million concepts that BioMedLib understands. BioMedLib assigns a higher relevance score to an article that contains the concept than to an article that contains the query words only in separate and unrelated places.
3.3 BioMedLib’s relevance score includes all the factors used in a typical TF-IDF (term frequency – inverse document frequency) operation. Field-weighting is one such factor: an article whose “major” MeSH terms match your query is given a higher relevance score than an article where only minor MeSH terms match.
Therefore BioMedLib utilizes an optimized hybrid of multiple scoring systems, some unique to BML, to deliver its superior relevance score.
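As a rough illustration of how the factors in 3.1 to 3.3 could be combined, the Python sketch below adds a field-weighted TF-IDF term, a concept-match bonus, and a relation-triplet bonus. BioMedLib’s actual score is proprietary and is not described here; the function names, the field names, and every weight are assumptions invented for this example.

```python
import math

# Hypothetical field weights: a match in a "major" MeSH heading counts more
# than a match in a minor heading or in the abstract text.
FIELD_WEIGHTS = {"major_mesh": 3.0, "minor_mesh": 1.5, "abstract": 1.0}


def tf_idf(term, article, doc_freq, n_docs):
    """Field-weighted term frequency times inverse document frequency."""
    tf = sum(
        weight * article.get(field, []).count(term)
        for field, weight in FIELD_WEIGHTS.items()
    )
    idf = math.log(n_docs / (1 + doc_freq.get(term, 0)))
    return tf * idf


def hybrid_score(query_terms, query_concepts, article, doc_freq, n_docs,
                 concept_bonus=2.0, relation_bonus=4.0):
    """Combine keyword, concept, and relation evidence into one number."""
    # 3.3: keyword evidence, weighted by the field where the term occurs.
    score = sum(tf_idf(t, article, doc_freq, n_docs) for t in query_terms)
    # 3.2: reward articles tagged with the query's concept IDs.
    tagged = set(article.get("concepts", []))
    score += concept_bonus * len(set(query_concepts) & tagged)
    # 3.1: reward triplets whose two concepts both come from the query.
    for c1, _relation, c2 in article.get("triplets", []):
        if c1 != c2 and c1 in query_concepts and c2 in query_concepts:
            score += relation_bonus
    return score
```

With these made-up weights, an article that contains a relation triplet between two query concepts outranks an otherwise identical article that merely mentions the query words.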
How do users find the best answers faster with BioMedLib?
1. BioMedLib takes each article matching your query and summarizes it, so that you spend less time deciding whether the article is useful for you (the screening process) and whether you should go ahead and obtain its full text.
The text summarization of BioMedLib has three steps:
1.1 Selecting sentences of the article that contain your query words;
1.2 Selecting sentences that express semantic relation between the query concepts;
1.3 Highlighting important relevant terms, and the relations between them;
You also have the option of viewing the whole abstract (via the PubMed link, as in “view PubMed record for the above article (PMID = 15171183).”), and of customizing the highlighting process (changing the color choices, or turning it off completely).
2. BioMedLib takes all the articles matching your query, and sorts them by their relevance to your question. It then shows the most relevant answers at the top of the results. This saves users significant time, because the best answers are gathered together rather than sprinkled across a large set of results; otherwise the user would need to screen tens or hundreds of articles to find the most relevant ones.
BioMedLib provides three sorting methods:
2.1 By relevance, the superior BML relevance score;
2.2 By relevance, within a publication date range;
2.3 By publication date.
3. BioMedLib provides link-outs to the full texts of the articles. It automatically resolves and displays the content of your subscribed resources, and it clearly marks the articles that have open-access free full text (BML marks such hyperlinks with “[Free full-text]”).
4. BioMedLib provides multiple alternative output options (besides the abstract, and full text).
4.1 Export to citation management software;
4.2 Saving the search results in PDF format;
4.3 Subscription to RSS feeds of your search;
4.4 Sending the search results by email;
4.5 Saving the history of the queries you submitted, and their results.
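The three summarization steps (1.1 to 1.3 above) can be pictured with a small Python sketch. It is a simplification under stated assumptions, not BioMedLib’s implementation: sentences are split on end punctuation, a “relation” is approximated as two or more query words occurring in the same sentence, and highlighting is done with asterisks. The function name `summarize` is hypothetical.

```python
import re


def summarize(abstract, query_words):
    """Return the sentences that mention the query, with matches highlighted."""
    words = [w.lower() for w in query_words if w.strip()]
    if not words:
        return []
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in words) + r")\b", re.IGNORECASE
    )
    sentences = re.split(r"(?<=[.!?])\s+", abstract.strip())

    picked = []
    for sentence in sentences:
        hits = {m.lower() for m in pattern.findall(sentence)}
        if not hits:
            continue  # Step 1.1: keep only sentences with a query word.
        # Step 1.2 (approximation): a sentence with two or more distinct
        # query words is treated as expressing a relation between them.
        relation_like = len(hits) >= 2
        # Step 1.3: highlight the matched terms.
        highlighted = pattern.sub(lambda m: "*" + m.group(0) + "*", sentence)
        picked.append((relation_like, highlighted))

    # Stable sort: relation-like sentences first, original order otherwise.
    picked.sort(key=lambda item: not item[0])
    return [text for _, text in picked]
```

A real system would detect relations between concepts rather than count co-occurring words; the co-occurrence test only stands in for that step.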
BioMedLib disambiguation feature
BioMedLib detects ambiguous phrases in the user’s query, and interactively suggests ways to clarify the query, so that the user gets more precise results.
For example, the query ‘peg complications’ contains the word ‘peg’, which can mean different things (hence ambiguous):
1. Concept C0032478: 400 polyethylene glycol;
2. Concept C0032483: glycol polyethylene;
3. Concept C0072225: 1 2 dihydroxypropane;
4. Concept C0176751: peg percutaneous endoscopic gastrostomy;
Or the query ‘cold’ can mean these different concepts:
1. Concept C0009264: low temperature physical force;
2. Concept C0009443: cold common;
3. Concept C0010412: cold therapy regime therapy;
4. Concept C0024117: cafl chronic airflow limitation;
5. Concept C0234192: cold sensation function;
6. Concept C0719425: cold brand of chlorpheniramine phenylpropanolamine;
BioMedLib presents each suggestion with a checkbox. You simply check the meanings you intended (and uncheck the meanings you don’t want), and resubmit the clarified query. BioMedLib will then automatically screen all the results and choose the ones that match the meaning you intended (the disambiguation process). This saves the user significant time, as the resulting articles are right on target, and the user does not need to screen hundreds of articles searching for the right sense (the intended meaning) of the word in the query.
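The checkbox step can be sketched in Python as a filter over concept-tagged articles. The concept IDs and labels are copied from the ‘cold’ list above; the `disambiguate` function, the article records, and their PMIDs are made up for illustration and do not describe BioMedLib’s internals.

```python
# Candidate senses for the query word 'cold', copied from the list above.
COLD_SENSES = {
    "C0009264": "low temperature physical force",
    "C0009443": "cold common",
    "C0010412": "cold therapy regime therapy",
    "C0024117": "cafl chronic airflow limitation",
    "C0234192": "cold sensation function",
    "C0719425": "cold brand of chlorpheniramine phenylpropanolamine",
}


def disambiguate(articles, checked_ids):
    """Keep the articles tagged with at least one sense the user checked."""
    unknown = set(checked_ids) - set(COLD_SENSES)
    if unknown:
        raise ValueError(f"not a candidate sense: {sorted(unknown)}")
    wanted = set(checked_ids)
    return [a for a in articles if wanted & set(a["concepts"])]


# Hypothetical articles with placeholder PMIDs and concept tags.
articles = [
    {"pmid": 1, "concepts": ["C0009443"]},  # common cold
    {"pmid": 2, "concepts": ["C0009264"]},  # low temperature
    {"pmid": 3, "concepts": ["C0024117"]},  # chronic airflow limitation
]

# The user checks only the 'cold common' box.
common_cold_only = disambiguate(articles, ["C0009443"])
```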
BioMedLib’s “related articles”
BioMedLib provides three types of the “related articles” feature.
1. The standard version of the “related articles” feature, as done in other engines. This focuses on a single article as the starting point, and is usually computed off-line, with the results stored and then shown to the user on request. This is done in BioMedLib by a query like ‘15380493[related]’, where the integer is the PMID of the starting article and the search tag [related] signifies that the user is asking for articles related to the article with PMID 15380493;
2. BioMedLib enables the user to customize the starting point of the “related articles” with one or a few words.
In practice, the user usually starts with a query, which retrieves a set of articles. The user reviews the results and finds one of them to be very relevant. Then the user asks for the “related articles” of that very relevant article. However, in the standard version, there is no way to incorporate the additional important information contained in the original query that the user started with. In other words, in the standard “related articles” version, no matter what query the user used to arrive at the relevant article, the results of the “related articles” action will be exactly the same.
BioMedLib allows the user to keep the original query as part of the “related articles” computation, thus providing a more precise view of the starting article, one that matches the user’s information needs much better. This is done in BioMedLib by a query like ‘15380493[related] heart attack treatment’, where the integer is the PMID of the starting article and the query words ‘heart attack treatment’ express the specific view of the article that the user is interested in.
3. BioMedLib enables the user to start the “related articles” with multiple articles. In other words, the user can ask for “related articles” to a set of given articles, rather than a single article. In BioMedLib a query like ‘(10881502 17437750)[related]’ will retrieve articles that are related to both PMIDs 10881502 and 17437750. You can choose bigger sets of multiple starting articles (the multi-article relatedness). And you can further customize the query by adding one or a few query words that express your view of interest.
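The three query forms above share one shape: a PMID or a parenthesized list of PMIDs, the [related] tag, and optional view words. The Python sketch below parses that shape. The grammar is inferred only from the three examples in this section, so the regular expression and the `parse_related` function are assumptions, not BioMedLib’s actual parser.

```python
import re

# One PMID, or a parenthesized space-separated list of PMIDs, followed by
# the [related] tag and optional free-text view words.
RELATED_RE = re.compile(
    r"^\s*(?:\((?P<many>\d+(?:\s+\d+)*)\)|(?P<one>\d+))"
    r"\[related\]\s*(?P<view>.*)$"
)


def parse_related(query):
    """Split a [related] query into starting PMIDs and optional view words."""
    match = RELATED_RE.match(query)
    if match is None:
        raise ValueError(f"not a [related] query: {query!r}")
    ids = match.group("many") or match.group("one")
    pmids = [int(p) for p in ids.split()]
    view_words = match.group("view").split()
    return pmids, view_words


single = parse_related("15380493[related]")
with_view = parse_related("15380493[related] heart attack treatment")
multi = parse_related("(10881502 17437750)[related]")
```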
BioMedLib’s semantic query formulation
Suppose the user is asking for “articles discussing side-effects of medications (clinical drugs) in children published in the past 5 years”. How would you do that?
The challenge with queries like the above is that the user is asking for a concept that covers very many specific instances.
In the above query the user is asking for ‘clinical drugs’, among others, but is not specifying a particular drug. In other words, any type of “drug” will qualify for the query. Likewise, any type of “side-effect” will qualify. The user is asking for all concepts with the “semantic type” of drug, and all concepts with the semantic type of side-effect.
BioMedLib enables the user to search for 135 semantic types, which in turn are organized hierarchically. For example, the query ‘drug[semtyp] injury[semtyp]’ retrieves all articles that discuss medications and injury or poisoning (side-effects). You can narrow down the query to a range of publication dates, as in ‘drug[semtyp] injury[semtyp] 2005:2010[pubdate]’. And you can use MeSH headings to further narrow down to children: ‘drug[semtyp] injury[semtyp] 2005:2010[pubdate] ((“infant newborn”)[mh] or “infant”[mh] or (“child preschool”)[mh] or “child”[mh] or “adolescent”[mh])’. This will answer the question posed above.
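If you build such queries often, the string can be assembled programmatically. The Python sketch below reproduces the final example query from its parts, using straight quotes. The tags [semtyp], [pubdate] and [mh] and the parenthesized multiword headings are taken from the examples above; the helper names are hypothetical and this is not an official client.

```python
def mesh_clause(heading):
    """Format one MeSH heading the way the example query above does."""
    quoted = f'"{heading}"'
    # The example wraps multiword headings in an extra pair of parentheses.
    if " " in heading:
        quoted = f"({quoted})"
    return f"{quoted}[mh]"


def build_semantic_query(semantic_types, year_from=None, year_to=None,
                         mesh_headings=()):
    """Assemble a query string from semantic types, dates, and MeSH terms."""
    parts = [f"{name}[semtyp]" for name in semantic_types]
    if year_from is not None and year_to is not None:
        parts.append(f"{year_from}:{year_to}[pubdate]")
    if mesh_headings:
        # OR the headings together inside one parenthesized group.
        parts.append("(" + " or ".join(mesh_clause(h) for h in mesh_headings) + ")")
    return " ".join(parts)


query = build_semantic_query(
    ["drug", "injury"],
    year_from=2005,
    year_to=2010,
    mesh_headings=["infant newborn", "infant", "child preschool",
                   "child", "adolescent"],
)
```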
Meaning-based search
BioMedLib provides “meaning-based search” in addition to the widely available “keyword search”.
In a keyword search, the search engine looks for literal occurrences of the user’s words in each document. For example, if the user submits the query ‘coronary attack’, the keyword search will often return results that are dramatically different from those for the query ‘heart attack’.
In the meaning-based search, the search engine is aware that both queries ‘heart attack’ and ‘coronary attack’ point to the same meaning (the same UMLS concept with ID of C0027051). Therefore the BioMedLib search engine will find not only documents that literally match the user’s query words, but also all other documents that express the same meaning by using “synonyms” of the user’s words. This ensures that the user won’t miss relevant documents just because of the particular words chosen to express the question.
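The difference between the two search modes can be shown with a toy Python sketch. The three-entry synonym table stands in for the knowledgebase and is only illustrative; the concept ID C0027051 is the one quoted above, while the function names and the article records are assumptions.

```python
# Toy synonym table. Only a few of the synonyms of concept C0027051 are
# listed here; the table is illustrative, not BioMedLib's knowledgebase.
TERM_TO_CONCEPT = {
    "heart attack": "C0027051",
    "coronary attack": "C0027051",
    "myocardial infarction": "C0027051",
}


def normalize(text):
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def meaning_search(query, articles):
    """Return PMIDs that match the query literally or by concept."""
    needle = normalize(query)
    concept = TERM_TO_CONCEPT.get(needle)
    hits = []
    for article in articles:
        literal = needle in normalize(article["text"])
        by_meaning = concept is not None and concept in article["concepts"]
        if literal or by_meaning:
            hits.append(article["pmid"])
    return hits


# Hypothetical articles with placeholder PMIDs; "concepts" plays the role
# of the tags assigned to each article.
articles = [
    {"pmid": 1, "text": "Outcomes after myocardial infarction.",
     "concepts": ["C0027051"]},
    {"pmid": 2, "text": "Panic attack and heart rate.", "concepts": []},
]
```

Here both ‘heart attack’ and ‘coronary attack’ retrieve article 1, even though neither phrase occurs in its text, and neither retrieves article 2, which contains the words ‘heart’ and ‘attack’ only in unrelated places.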
Semantic search
BioMedLib provides “semantic search”, on top of the meaning-based search (which in turn is added on top of keyword search).
When you submit a multiword query, you of course want to see documents that contain those words. Beyond that, the document should use the query words in a “related” fashion.
For example, if you submit ‘plastic surgery infection’, then not only should the documents contain the three words, but they should also describe some relationship between them. You don’t want a document that talks about infection in one paragraph, and talks about plastic surgery somewhere else, unrelated to the infection; do you?
BioMedLib makes sure that documents expressing a relationship between the concepts of your query get higher ranks, so that you find them at the top of the first page of results.
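A crude way to picture this ranking is to count the sentences in which two or more query concepts appear together, as in the Python sketch below. Same-sentence co-occurrence is only a stand-in chosen for this example; how BioMedLib actually detects relationships is not described here, and the function names are hypothetical.

```python
import re


def relatedness(text, concept_terms):
    """Count the sentences that mention at least two of the query concepts."""
    sentences = re.split(r"(?<=[.!?])\s+", text.lower())
    terms = [t.lower() for t in concept_terms]
    count = 0
    for sentence in sentences:
        present = sum(1 for term in terms if term in sentence)
        if present >= 2:
            count += 1
    return count


def rank(documents, concept_terms):
    """Sort documents so that those relating the concepts come first."""
    return sorted(
        documents,
        key=lambda doc: relatedness(doc["text"], concept_terms),
        reverse=True,
    )


# Hypothetical documents for the query 'plastic surgery infection'.
documents = [
    {"id": "a", "text": "Infection is common in hospitals. "
                        "Plastic surgery is popular."},
    {"id": "b", "text": "Infection after plastic surgery was rare."},
]
ranked = rank(documents, ["plastic surgery", "infection"])
```

Document b mentions both concepts in one sentence and is ranked first; document a contains the same words but never connects them.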
Better keyword search
BioMedLib, besides its meaning-based search, provides a more accurate keyword search as well.
For example, submit the double-quoted query “single dose erythromycin” to PubMed, and PubMed will tell you that there is no article in MEDLINE having the exact phrase “single dose erythromycin”. This is not true; in other words, PubMed is missing articles that have the exact phrase “single dose erythromycin”. Try it with BioMedLib, and you will see that the articles with PubMed IDs 7081971 and 3335066 actually have the exact phrase.
[Kroboth PD, Brown A, Lyon JA, Kroboth FJ, Juhl RP. Pharmacokinetics of single-dose erythromycin in normal and alcoholic liver disease subjects. Antimicrob Agents Chemother. 1982 Jan;21(1):135-40. PMID: 7081971.]
[Malinverni R, Overholser CD, Bille J, Glauser MP. Antibiotic prophylaxis of experimental endocarditis after dental extractions. Circulation. 1988 Jan;77(1):182-7. PMID: 3335066.]
You can try the IDs 7081971 and 3335066 in PubMed and see that the two articles do exist in PubMed’s backend database, but PubMed is unable to find them in response to the exact phrase query “single dose erythromycin” (as of November 2009 and August 2010).
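Note that the first title above spells the phrase with a hyphen (“single-dose erythromycin”). The Python sketch below shows one way an exact-phrase match can still succeed: tokenize both the text and the phrase, treating hyphens and punctuation as separators, and then compare token sequences. Neither engine’s tokenization is documented here, so this is an assumption about how such matching might work, not a description of BioMedLib or PubMed.

```python
import re


def tokens(text):
    """Lowercase word tokens; hyphens and punctuation act as separators."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(text, phrase):
    """True if the phrase's tokens occur consecutively, in order, in text."""
    haystack, needle = tokens(text), tokens(phrase)
    if not needle:
        return False
    n = len(needle)
    return any(
        haystack[i:i + n] == needle for i in range(len(haystack) - n + 1)
    )


# Title of PMID 7081971, as cited above.
title = ("Pharmacokinetics of single-dose erythromycin in normal and "
         "alcoholic liver disease subjects.")
found = contains_phrase(title, "single dose erythromycin")
```

Word order still matters under this scheme, so a phrase such as ‘erythromycin single dose’ does not match the same title.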
Hassle-free advanced search
The advantages of BioMedLib’s meaning-based search and semantic search come with zero extra hassle for the user. You enter your query as usual, and BioMedLib engine will expand and optimize your query automatically, no questions asked.
And if you don’t like the idea, just uncheck the “expand the query” box in the “[+] Query is expanded” area; and you will get a simple keyword search!
(Note: sometimes the “[+] Query is expanded” area is denoted as “[+] Clarify the query”)