{"id":272,"date":"2022-01-01T12:00:37","date_gmt":"2022-01-01T12:00:37","guid":{"rendered":"http:\/\/blogs.kent.ac.uk\/kei-case-studies\/?p=272"},"modified":"2023-06-30T09:56:17","modified_gmt":"2023-06-30T08:56:17","slug":"272","status":"publish","type":"post","link":"https:\/\/blogs.kent.ac.uk\/kei-case-studies\/2022\/01\/01\/272\/","title":{"rendered":"TRN: Investigating the feasibility of extracting academic profile information from web sources"},"content":{"rendered":"<p><a href=\"http:\/\/trn.net\/\">The Research Network (TRN)<\/a>\u00a0is based in Kent and provides pharmaceutical R&amp;D consultancy, project management, outsourcing, training and due diligence services. Their expertise in drug discovery and development disciplines covers both small molecule and biotherapeutics. They regularly collaborate with universities and research organisations to progress new drug discovery ventures from lead identification through to early clinical development.<\/p>\n<p>The project between TRN and\u00a0<a href=\"https:\/\/www.kent.ac.uk\/\">University of Kent<\/a>\u00a0academics was\u00a0<a href=\"https:\/\/eira.ac.uk\/funding-opportunities\/\">funded through EIRA\u2019s Innovation Voucher scheme<\/a>. It explored whether it was possible to use data mining techniques to extract accurate profile information of potential collaborators. This study was conducted to investigate the feasibility of mining academic information, which would then be used to find similarities between academics who could potentially collaborate in the future.<\/p>\n<h2><span style=\"color: inherit;font-family: inherit;font-size: 30px\">The Challenge<\/span><\/h2>\n<p>Finding the right kind of scientific expertise needed to contribute to cutting edge drug discovery research presents a huge challenge. Information on expertise which could be utilised is often dispersed and comes from disparate sources, such as a company website or university profile pages. 
This project explored the possibility of using data mining techniques to extract academic profile information, providing a simple and effective way of shortlisting potential collaborators.<\/p>\n<h2>The Approach<\/h2>\n<p>University of Kent academics\u00a0<a href=\"https:\/\/www.kent.ac.uk\/mathematics-statistics-actuarial-science\/people\/1097\/www.kent.ac.uk\/mathematics-statistics-actuarial-science\/people\/1097\/bentham-james\">Dr James Bentham<\/a>\u00a0and\u00a0<a href=\"https:\/\/www.kent.ac.uk\/physical-sciences\/people\/1318\/www.kent.ac.uk\/physical-sciences\/people\/1318\/hiscock-jennifer\">Dr Jennifer Hiscock<\/a>\u00a0are experts in the fields of machine learning, data extraction and bioscience.<\/p>\n<p>Their approach to the project was:<\/p>\n<ol>\n<li>To identify potential sources of information on academics and their collaborations<\/li>\n<li>To carry out pilot web scraping (data extraction)<\/li>\n<li>To consider data storage and access requirements<\/li>\n<li>To carry out pilot pre-processing and data mining<\/li>\n<\/ol>\n<p>Various websites were investigated to identify salient attributes which could then be extracted, stored and used as the basis for data mining to identify potential scientific investigators.<\/p>\n<p>One way of identifying individual academics and areas of research was to generate lists of university departments, which could then be used for further data gathering. Web scraping was carried out for five universities:<\/p>\n<ul>\n<li>University of Kent<\/li>\n<li>University of Warwick<\/li>\n<li>University of Sheffield<\/li>\n<li>Imperial College London<\/li>\n<li>King\u2019s College London<\/li>\n<\/ul>\n<p>The text from the personal webpages was pre-processed in R (a programming language used for statistical analysis and visualisation) to remove extraneous information. A vector space model was used to analyse the words in each description, and a graphical plot (word cloud) could be generated. 
A vector space model represents each piece of text as a vector of word weights, typically in a space with many dimensions, so that the closeness of a particular word or description to a target can be measured. This could then be used to cluster related skills, in order to match academics who shared interests in similar topics.<\/p>\n<h2>The Result<\/h2>\n<p>The study showed that rich information was available on individual academics and their collaborations. However, this information was found to be fragmented. Web scraping proved feasible for university department and staff lists, free text from personal webpages, university repositories, and research council websites.<\/p>\n<p>Data storage and access were straightforward. Natural language processing methods were applied to the data successfully, finding similarities between academics based on free text. Network analysis produced meaningful and useful results, which described potential collaboration networks. The recommended next phase of development was to refine and combine these methods further.<\/p>\n<p><a href=\"http:\/\/trn.net\/project\/andy_mcelroy\/\">Andy McElroy, the CEO of TRN<\/a>,\u00a0had this to say about the project:<\/p>\n<blockquote><p>&#8220;<em>This short project provided useful insights into available information on academic expertise and projects and has positioned us well for the second project to pilot the analysis and use of this information to highlight collaboration opportunities.<\/em>&#8221;<\/p>\n<p><span style=\"font-size: 14px\">Andy McElroy, CEO of TRN<\/span><\/p><\/blockquote>\n<h2>Next Steps<\/h2>\n<p>TRN have applied for an EIRA Innovation Voucher to build on the work of the feasibility study, collaborating with the academics who carried out this project. 
The aim of the new project is to take the concept of data extraction of researcher profiles through to a testable prototype system, which can be used to find suitable skilled scientists needed to contribute to cutting edge research projects.<\/p>\n<footer><strong>\u00a0Interested in working with our academics? <a href=\"https:\/\/www.kent.ac.uk\/knowledge-exchange-innovation\/contact\">Contact our business and innovation gateway team<\/a> to discuss how your business can access the University of Kent&#8217;s expertise.<\/strong><\/footer>\n","protected":false},"excerpt":{"rendered":"<p>The Research Network (TRN)\u00a0is based in Kent and provides pharmaceutical R&amp;D consultancy, project management, outsourcing, training and due diligence services. Their expertise in drug discovery &hellip; <a href=\"https:\/\/blogs.kent.ac.uk\/kei-case-studies\/2022\/01\/01\/272\/\">Read&nbsp;more<\/a><\/p>\n","protected":false},"author":74795,"featured_media":430,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[256065,275480],"tags":[13897,281564,278374,244813,278385,223259],"_links":{"self":[{"href":"https:\/\/blogs.kent.ac.uk\/kei-case-studies\/wp-json\/wp\/v2\/posts\/272"}],"collection":[{"href":"https:\/\/blogs.kent.ac.uk\/kei-case-studies\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.kent.ac.uk\/kei-case-studies\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.kent.ac.uk\/kei-case-studies\/wp-json\/wp\/v2\/users\/74795"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.kent.ac.uk\/kei-case-studies\/wp-json\/wp\/v2\/comments?post=272"}],"version-history":[{"count":4,"href":"https:\/\/blogs.kent.ac.uk\/kei-case-studies\/wp-json\/wp\/v2\/posts\/272\/revisions"}],"predecessor-version":[{"id":278,"href":"https:\/\/blogs.kent.ac.uk\/kei-case-studies\/wp-json\/wp\/v2\/posts\/272\/revisions\/278"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/
\/blogs.kent.ac.uk\/kei-case-studies\/wp-json\/wp\/v2\/media\/430"}],"wp:attachment":[{"href":"https:\/\/blogs.kent.ac.uk\/kei-case-studies\/wp-json\/wp\/v2\/media?parent=272"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.kent.ac.uk\/kei-case-studies\/wp-json\/wp\/v2\/categories?post=272"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.kent.ac.uk\/kei-case-studies\/wp-json\/wp\/v2\/tags?post=272"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}