CRAB 3.0 is a fully integrated Text Mining tool aimed at supporting literature review in chemical cancer risk assessment.
CRAB 3.0 supports the gathering of risk assessment literature via PubMed, semantic classification of the literature for cancer risk assessment, statistical analysis of the classified literature, and efficient reading and study of the literature.
Given the double exponential growth rate of biomedical literature over recent years, there is a pressing need to develop technology that can make information in published literature more accessible and useful for scientists. Such technology can be based on text mining. Drawing on techniques from natural language processing, information retrieval and data mining, text mining can automatically retrieve, extract and discover novel information even in huge collections of written text.
The CRAB project develops text mining technology to support one of the most literature-dependent areas of biomedicine: chemical health risk assessment. This task is complex and time-consuming, requiring a thorough review of existing scientific data on a particular chemical. Covering human, animal, cellular and other mechanistic data from various fields of biomedicine, this data is highly varied and therefore difficult to harvest from literature databases by manual means.
The project has developed a tool that automates literature review and analysis in chemical risk assessment by extracting relevant scientific data in published literature and classifying it according to multiple qualitative dimensions. Developed in close collaboration with risk assessors, the tool allows navigating the classified dataset in various ways and sharing the data with other users.
Currently applicable to cancer, the tool could be straightforwardly adapted to support the assessment and study of other important health risks related to chemicals (e.g. allergy, asthma, reproductive disorders, among many others).
CRAB 3.0 is a joint project between the University of Cambridge, UK and the Karolinska Institutet, Sweden.
CRAB 3.0 is a fully redesigned application offering instant classification results and 'just-in-time' argument zoning.
CRAB 2.0 is an extended and fully integrated text mining tool aimed at supporting the entire process of literature review in real-life chemical cancer risk assessment (CRA). CRAB 2.0 not only supports the semantic classification of CRA literature on the basis of scientific evidence, like CRAB, but also the gathering of the relevant literature via the PubMed query interface, the statistical analysis of the classified literature, and efficient reading and information extraction from the classified literature.
In addition, it introduces new features including:
CRAB is the first text mining tool for assisting literature review in chemical cancer risk assessment (CRA). This tool enables classifying PubMed abstracts for a given chemical semantically according to the scientific evidence used for CRA and visualizing the results in a taxonomy-like structure.
Imran Ali, Yufan Guo, Ilona Silins, Johan Högberg, Ulla Stenius, Anna Korhonen. 2016. Grouping chemicals for health risk assessment: A text mining-based case study of polychlorinated biphenyls (PCBs). Toxicol Lett. 2016 Jan 22; 241:32-7. doi: 10.1016/j.toxlet.
Ilona Silins, Anna Korhonen, Ulla Stenius. 2014. Evaluation of carcinogenic modes of action for pesticides in fruit on the Swedish market using a text-mining tool. Front Pharmacol. 2014 Jun 23;5:145. doi: 10.3389/fphar.
Yufan Guo, Ilona Silins, Ulla Stenius, Anna Korhonen. Active learning-based information structure analysis of full scientific articles and two applications for biomedical literature review. Bioinformatics. 2013 Jun 1;29(11):1440-7. doi: 10.1093/bioinformatics/btt163.
Sandeep Kadekar, Ilona Silins, Anna Korhonen, Kristian Dreij, Lauy Al-Anati, Johan Högberg and Ulla Stenius. 2012. Exocrine pancreatic carcinogenesis and autotaxin expression. PLoS One. 2012;7(8):e43209. doi: 10.1371/journal.pone.0043209.
Anna Korhonen, Diarmuid O'Séaghdha, Ilona Silins, Lin Sun, Johan Högberg and Ulla Stenius. 2012. Text mining for literature review and knowledge discovery in cancer risk assessment and research. PLoS One. 2012;7(4):e33427.
Ilona Silins, Anna Korhonen, Johan Högberg and Ulla Stenius. 2012. Data and literature gathering in chemical cancer risk assessment. Integrated Environmental Assessment and Management. 2012 Jan 3. doi: 10.1002/ieam.1278.
Yufan Guo, Anna Korhonen, Ilona Silins and Ulla Stenius. 2011. Weakly-supervised learning of information structure of scientific abstracts - is it accurate enough to benefit real-world tasks in biomedicine? Bioinformatics 2011; doi: 10.1093/bioinformatics/btr536.
Yufan Guo, Anna Korhonen, Maria Liakata, Ilona Silins, Johan Hogberg and Ulla Stenius. 2011. A comparison and user-based evaluation of models of textual information structure in the context of cancer risk assessment. BMC Bioinformatics 2011, 12:69.
Anna Korhonen, Lin Sun, Ilona Silins, and Ulla Stenius. 2009. The First Step in the Development of Text Mining Technology for Cancer Risk Assessment: Identifying and Organizing Scientific Evidence in Risk Assessment Literature. In BMC Bioinformatics 10:303.
Yufan Guo, Diarmuid Ó Séaghdha, Ilona Silins, Lin Sun, Johan Högberg, Ulla Stenius and Anna Korhonen. 2014. CRAB: A text mining tool for supporting literature review in chemical cancer risk assessment. To appear in Proceedings of COLING 2014. Dublin, Ireland.
Anna Korhonen, Yufan Guo, Meliha Yetisgen-Yildiz, Ulla Stenius, Masashi Narita, Pietro Lio. 2014. Improving Literature-Based Discovery with Text Mining. In Proceedings of CIBB 2014. Cambridge, UK.
Yufan Guo, Ilona Silins, Roi Reichart and Anna Korhonen. 2012. CRAB Reader: A Tool for Analysis and Visualization of Argumentative Zones in Scientific Literature. In Proceedings of COLING 2012. Mumbai, India.
Yufan Guo, Anna Korhonen, Maria Liakata, Ilona Silins, Lin Sun and Ulla Stenius. 2010. Identifying the Information Structure of Scientific Abstracts: An Investigation of Three Different Schemes. In Proceedings of the BioNLP 2010. Uppsala, Sweden.
Lin Sun, Anna Korhonen, Ilona Silins, and Ulla Stenius. 2009. User-Driven Development of Text Mining Resources for Cancer Risk Assessment. In Proceedings of the BioNLP 2009. Boulder, Colorado.
Ian Lewin, Ilona Silins, Anna Korhonen, Johan Hogberg, and Ulla Stenius. 2008. A New Challenge for Text Mining: Cancer Risk Assessment. In Proceedings of the ISMB BioLINK Special Interest Group on Text Data Mining. Toronto, Canada.
Ilona Silins, Anna Korhonen, Yufan Guo, Ulla Stenius. 2014. A text mining approach for chemical risk assessment and cancer research. Eurotox 2014, Edinburgh, UK.
Ilona Silins, Anna Korhonen, Johan Högberg, Ulla Stenius. 2011. A text mining approach to identify chemicals' modes of action in risk assessment of combined exposures. Toxicology of Mixtures Conference, Arlington, VA, USA.
Sandeep Kadekar, Ilona Silins, Anna Korhonen, Johan Hogberg, Kristian Dreij, and Ulla Stenius. 2010. Carcinogen-induced inflammation and pancreatic cancer. 101st Annual Meeting of the American Association for Cancer Research. Washington D.C., USA.
Ilona Silins, Anna Korhonen, Lin Sun, Johan Högberg, and Ulla Stenius. 2010. Chemical Carcinogenesis and Biomedical Text Mining. Karolinska Institutet Cancer Conference, Stockholm, Sweden.
Ilona Silins, Anna Korhonen, Johan Hogberg, Lin Sun, and Ulla Stenius. 2009. Improved Cancer Risk Assessment Using Text Mining. Proceedings of the 100th Annual Meeting of the American Association for Cancer Research. Denver, Colorado, USA.
Emma Westerholm, Jordi Boix, Hanna Miettinen, Robert Roos, Elsa Antunes-Fernandes, Remco Westerink, Majorie van Duursen, Mia Stenberg, Sara Carreira, Miroslav Machala, Ilona Silins, Ulla Stenius, Krister Halldin, Annika Hanberg, and Helen Hakansson. 2009. ATHON NDL-PCB effect database - a tool to facilitate the cumulative risk assessment of NDL-PCBs. In Toxicology Letters, Volume 189, Supplement 1, 13 September 2009. Abstracts of the 46th Congress of the European Societies of Toxicology.
Anna Korhonen, Ian Lewin, Ilona Silins, Johan Hogberg, and Ulla Stenius. 2008. CRAB - Cancer Risk Assessment and Biomedical Text Mining. European Conference on Computational Biology. Sardinia, Italy.
Enter a search term into the main search box, for example benzyl chloride
Then press RETURN on your keyboard or click on the icon.
The results of the search will be displayed below the search box:
The red circle next to each tag indicates the number of abstracts found for the search term that are categorised with that tag:
If no abstracts have been found for a particular tag, the tag will be greyed out:
To view the abstracts for a particular tag, click on the red circle next to the tag:
This will load the top 50 abstracts for the tag below the results area:
To load the next 50 abstracts (if available), scroll to the bottom of the abstracts and click the View more button:
The Argument Zones for each abstract - the parts of the abstract corresponding to "Background", "Objective", "Method", "Result", "Conclusion", "Related work" and "Future work" - will be calculated in the background.
Before an abstract's Argument Zones have been calculated, the abstract will be greyed out:
Once the Argument Zones for an abstract have been calculated, they will be highlighted in the abstract:
Show or hide particular Argument Zones by clicking on the Argument Zone buttons at the top of the abstracts section:
To view a graph of the results, click on the blue graph button
Three different graphs will be displayed, one for each of the top-level sections, i.e. "Scientific Evidence", "Mode of Action" and "Toxicokinetics".
Clicking the bar or label of a graph will show the associated abstracts:
To download a graph as an image, click the red download button
To download the raw data as a CSV spreadsheet, click the red Excel button
To download the data as a Word document, click the red Word button
To limit the search to a particular year date range, add the following text to your search:
AND ("START_YEAR"[Date - Publication] : "END_YEAR"[Date - Publication])
For example:
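A search for benzyl chloride limited to 2000-2010 (years chosen purely for illustration) would read `benzyl chloride AND ("2000"[Date - Publication] : "2010"[Date - Publication])`. If you build many such searches, the string can be assembled programmatically. The sketch below is only an illustration: the helper name `with_year_range` is hypothetical and not part of CRAB.

```python
def with_year_range(term: str, start_year: int, end_year: int) -> str:
    """Append a publication-year filter, in the syntax shown above, to a search term.

    Hypothetical helper for illustration; CRAB itself only needs the final string.
    """
    if start_year > end_year:
        raise ValueError("start_year must not be later than end_year")
    return (
        f'{term} AND ("{start_year}"[Date - Publication] : '
        f'"{end_year}"[Date - Publication])'
    )


# Illustrative values: any search term and year range can be substituted.
query = with_year_range("benzyl chloride", 2000, 2010)
print(query)
```

The printed string can be pasted directly into the main search box.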
You can expand or collapse particular parent tags by clicking on the orange icon to the left of the tag:
You can also expand or collapse every tag by clicking on the green expand icon or the orange collapse icon to the left of the results area:
The content of each abstract can be hidden by clicking on the title of the abstract:
The content of all abstracts can be expanded or contracted by clicking on the green expand icon or the orange collapse icon to the left of the abstracts area:
You can view the original PubMed article by clicking on the icon to the left of the abstract's title:
It takes 1-2 seconds to calculate the Argument Zones for a particular abstract from scratch, though this data will be cached once it has been calculated for a particular abstract.
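The effect of this caching can be pictured with a minimal sketch. It assumes that zones are keyed by an abstract identifier. `compute_zones` is a hypothetical stand-in for the real zoning step, and the identifiers and returned values are placeholders. How CRAB actually stores its cache is not described here.

```python
from functools import lru_cache

calls = []  # records every time the slow path actually runs


def compute_zones(abstract_id: str) -> tuple:
    """Hypothetical stand-in for the slow zoning step (1-2 seconds in CRAB)."""
    calls.append(abstract_id)
    # Placeholder result: (zone label, start offset, end offset).
    return (("Background", 0, 0),)


@lru_cache(maxsize=None)
def get_zones(abstract_id: str) -> tuple:
    """Return the zones for an abstract, computing them only on first request."""
    return compute_zones(abstract_id)


get_zones("abstract-1")  # computed from scratch
get_zones("abstract-1")  # served from the cache, no recomputation
print(len(calls))
```

Only the first request for a given abstract pays the 1-2 second cost. Later requests for the same abstract return immediately.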
The status of the Argument Zoning process will appear on the bottom-right of the screen in a grey box:
You can share searches by clicking on the share button on the top-right of the screen and copying the URL:
Multiple searches can be created, allowing data analysis to be carried out across several datasets.
To create an additional search, click on the Add tab button:
Repeat your search as before.
The name of the search in the tab will reflect the text content of the search.
To change the name of the search tab, click on the tab's text box directly:
To create a dataset representing all records, enter the wildcard character * into the search box.
This can be used to run a significance test of a particular term against the entire database (see Data analysis, below).
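To give a feel for what such a comparison involves, the sketch below runs a Pearson chi-squared test on a 2x2 table: abstracts carrying a given tag versus not, for the search term versus the rest of the database. This is an assumption made for illustration. This guide does not state which significance test CRAB applies, and the counts used are invented.

```python
import math


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-squared test (1 degree of freedom) on the table

               tagged   not tagged
        term     a          b
        rest     c          d

    Returns (statistic, p_value). Illustrative only; not CRAB's documented method.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one abstract")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Invented counts: 40 of 200 abstracts for the term carry the tag,
# against 960 of the remaining 9800 abstracts in the database.
stat, p_value = chi_squared_2x2(40, 160, 960, 8840)
print(f"chi-squared = {stat:.2f}, p = {p_value:.2g}")
```

A small p-value suggests the tag is over- or under-represented for the term relative to the rest of the database. With very small counts an exact test would be a more cautious choice.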
To delete a search, click on the delete icon
To analyze one or more datasets, click on the data analysis icon
This will take you to the Analyze data screen:
Select the data analysis function from the Select function dropdown:
Select the parent node from the Select parent node dropdown to select a subset of tags to carry out data analysis on:
Select the datasets from the searches that have been completed:
Click the blue plus icon to run the analysis.
The graph/table will be added to the bottom of the page:
To download the image of a graph, click the download icon
To download the CSV data for a table, click the Excel image icon
To delete a data analysis element, click the delete icon
Note: The state of the data analysis page is not saved in the URL and is not recreated when a URL is shared.