Strathclyde Text Analytics Group

Text analytics is concerned with inference from written communications. Modern developments in computational techniques and power have created opportunities to analyse vast quantities of text data to provide more effective decision support. The applications of these methods are widespread; the challenges are considerable, but the potential benefits are substantial. As such, text analytics draws academics from across all four faculties of the university, with interests ranging from developing fundamental techniques to applying methods to enhance outcomes.

The following provides an example of some of our work in Text Analytics. If you are interested in learning more about our activity, please contact our group or make direct enquiries to particular academics.

Current projects

James Bowden and Daniel Broby

We are using textual clues to identify the responsiveness of social reporting metrics to changes in sentiment. We use opinion mining and sentiment analysis to identify changes in how the public perceives the social responsiveness of a company. This will help us create a real-time index that instantly captures when social metrics change. We hope it will provide a useful tool for the $15.02 trillion invested in funds using Socially Responsible screening. Our contribution is expected to be the removal of significant subjective evaluation from the benchmarking of socially responsible investment.
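To illustrate the basic idea, the sketch below scores invented news snippets about a hypothetical company with a tiny hand-built sentiment lexicon and averages the scores into a simple daily index. The lexicon, snippets, and company name are all illustrative assumptions, not the project's actual method or data.

```python
# Hypothetical sketch: lexicon-based sentiment over news snippets mentioning
# a company, aggregated into a simple daily index. All data is invented.

POSITIVE = {"praised", "sustainable", "ethical", "donates", "transparent"}
NEGATIVE = {"fined", "pollution", "scandal", "exploits", "violation"}

def sentiment(text: str) -> int:
    """Score one snippet: +1 per positive cue word, -1 per negative."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def daily_index(snippets: list[str]) -> float:
    """Average sentiment across the day's snippets; 0.0 if none."""
    return sum(map(sentiment, snippets)) / len(snippets) if snippets else 0.0

day = [
    "Acme praised for sustainable packaging",
    "Acme fined over pollution incident",
    "Acme donates to local schools",
]
print(round(daily_index(day), 2))
```

Recomputing this index as new text arrives is what makes the real-time aspect possible; a production system would replace the lexicon with learned sentiment models.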

Andrew Wodehouse, Jonathan Corney and Ross Maclachlan

Our research seeks to utilise the patent database more effectively in the engineering design process. In our latest work we have set out an approach that first utilises crowdsourcing to summarise patents and then applies text analysis in relation to three affective parameters: appearance, ease of use, and semantics. This has resulted in novel patent clusters that provide an alternative perspective on relevant technical data, and that differ significantly from classifications using only functional requirements. The established interfaces and workflows emerging from the research support a new paradigm for the use of big data in engineering design, and could be applicable to other settings trying to establish rich, user-centric information.
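A toy sketch of the second stage: scoring crowd-written patent summaries against the three affective parameters and grouping each summary by its dominant parameter. The keyword lists and summaries below are invented for illustration; the actual analysis is more sophisticated.

```python
# Toy affective clustering of patent summaries. Keyword lists and the
# example summaries are fabricated, not from the real project.

AFFECTIVE = {
    "appearance": {"sleek", "colour", "finish", "visual", "elegant"},
    "ease_of_use": {"grip", "simple", "intuitive", "one-handed", "effortless"},
    "semantics": {"evokes", "suggests", "symbolic", "conveys", "meaning"},
}

def dominant_parameter(summary: str) -> str:
    """Count keyword overlaps per parameter; return the best-matching one."""
    words = set(summary.lower().split())
    scores = {p: len(words & kws) for p, kws in AFFECTIVE.items()}
    return max(scores, key=scores.get)

def cluster(summaries: list[str]) -> dict[str, list[str]]:
    groups: dict[str, list[str]] = {p: [] for p in AFFECTIVE}
    for s in summaries:
        groups[dominant_parameter(s)].append(s)
    return groups

patents = [
    "a sleek handle with an elegant visual finish",
    "an intuitive one-handed grip that is simple to operate",
    "a form that evokes motion and conveys meaning",
]
for parameter, group in cluster(patents).items():
    print(parameter, len(group))
```

Grouping by affective rather than functional vocabulary is what yields the alternative clusters described above.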

Zachary Greene

Systematically measuring policy goals has emerged as a major challenge for social scientists. Although traditional research tools such as surveys offer insights into priorities, these tools come with serious limitations and oftentimes reflect broader biases and the institutional context. Political scientists have turned to textual data from parliaments, interviews, newspapers and online sources to provide new insights. Politicians’ speeches reveal information about their policy preferences and the issues they care about. Likewise, the tone of debates predicts the broader mood towards an issue or politician. Scholars in the School of Government and Public Policy have used both supervised and unsupervised models for computational text analysis to help answer big political science questions. For example, work by Dr Greene uses speeches from party congresses to evaluate the disagreements between party members and the centrality of parties’ factions. Dr Brandenburg evaluates the tone of news coverage on the popularity of party leaders and major politicians. Other projects focus on social media networks and debates, the content of parties’ election programmes in diverse international settings, and measuring newspapers’ bias towards candidates based on their gender and broader political background.
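One of the simplest techniques in this family is dictionary-based issue coding: measuring how much attention a speech pays to each issue by counting words from issue dictionaries. The dictionaries and the speech below are invented for illustration, and the models used in the actual research are considerably richer.

```python
# Minimal dictionary-based issue coding of a political speech.
# Issue dictionaries and the example speech are invented.

ISSUES = {
    "economy": {"tax", "jobs", "growth", "wages", "budget"},
    "environment": {"climate", "emissions", "renewable", "pollution"},
}

def issue_shares(speech: str) -> dict[str, float]:
    """Share of issue-dictionary hits that each issue receives in a speech."""
    words = speech.lower().split()
    counts = {i: sum(w in kws for w in words) for i, kws in ISSUES.items()}
    total = sum(counts.values()) or 1   # avoid division by zero
    return {i: c / total for i, c in counts.items()}

speech = "our budget cuts tax and creates jobs while emissions targets slip"
print(issue_shares(speech))
```

Comparing such shares across speakers or factions gives a crude but transparent measure of differing priorities, which supervised and unsupervised models then refine.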

Stuart Mackie, Leif Azzopardi and Yashar Moshfeghi

Procurement legislation stipulates that information about the goods, services, or works that tax-funded authorities wish to purchase is made publicly available in a procurement contract notice. However, for businesses wishing to tender for such competitive opportunities, finding relevant procurement contract notices presents a challenging professional search task, requiring businesses to spend hours wading through hundreds of potential opportunities every day in order to find ones that are relevant and of value to their business. As part of a Knowledge Transfer Project with BIP Solutions Ltd, we are implementing machine learning algorithms that learn from the searcher’s interactions with the text and system. The machine learning tailors and optimises the search algorithm to find and deliver more relevant and valuable opportunities to businesses.
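One simple way a search system can learn from a searcher's interactions is Rocchio-style relevance feedback: expanding the query with terms from clicked results before re-ranking. The notices, query, and weighting below are illustrative assumptions, not the project's actual algorithm.

```python
# Sketch of click-based relevance feedback for re-ranking contract notices.
# The notices, query and the 0.5 feedback weight are invented.

from collections import Counter

def score(query_terms: Counter, doc: str) -> float:
    """Sum the weight of each document word that appears in the query model."""
    return sum(query_terms[w] for w in doc.lower().split())

def rerank(query: str, clicked: list[str], notices: list[str]) -> list[str]:
    """Expand the query with terms from clicked notices, then re-sort."""
    terms = Counter(query.lower().split())
    for doc in clicked:
        for w in doc.lower().split():
            terms[w] += 0.5          # feedback terms get half weight
    return sorted(notices, key=lambda d: score(terms, d), reverse=True)

notices = [
    "road resurfacing works tender",
    "school catering services contract",
    "bridge maintenance works framework",
]
clicked = ["road resurfacing works tender"]
print(rerank("works", clicked, notices)[0])
```

After one click, notices sharing vocabulary with the clicked result rise above unrelated ones, which is the behaviour a tailored search algorithm needs.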

Leif Azzopardi and Martin Halvey

Health policy makers are faced with complex and difficult questions with significant societal and economic implications, e.g. Should beta-blockers be given to heart-attack survivors? What are the benefits of minimum alcohol pricing? To support evidence-based practice and decision making, systematic reviews are performed to identify, assess and synthesise all the relevant evidence available. However, such reviews require tens of thousands of pounds in labour costs simply to review and identify evidence from thousands of potentially relevant medical articles. Our team has been developing methods, tools and datasets to facilitate the research and development of machine learning models that perform continuous active learning, helping to speed up and automate the processes of identifying, reviewing and extracting relevant evidence when conducting systematic reviews.
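The core continuous active learning loop can be sketched very simply: rank unreviewed documents with the current model, have the reviewer judge the top one, update the model with that label, and repeat. The word-weight "model", abstracts, and simulated oracle below are toy assumptions standing in for a real classifier and human reviewer.

```python
# Toy continuous active learning (CAL) loop over simulated abstracts.
# The word-weight model and oracle judgements are invented stand-ins.

from collections import defaultdict

def cal(docs: dict[str, str], oracle: dict[str, bool], seed_terms: set[str]):
    weights = defaultdict(float)
    for t in seed_terms:
        weights[t] = 1.0
    reviewed, found = [], []
    todo = set(docs)
    while todo:
        # Rank remaining docs by current model; review the top-scoring one.
        top = max(sorted(todo),
                  key=lambda d: sum(weights[w] for w in docs[d].split()))
        todo.remove(top)
        reviewed.append(top)
        relevant = oracle[top]            # simulated human judgement
        if relevant:
            found.append(top)
        for w in docs[top].split():       # online update of word weights
            weights[w] += 1.0 if relevant else -1.0
    return reviewed, found

docs = {
    "d1": "beta blockers reduce mortality after heart attack",
    "d2": "school meals and attendance rates",
    "d3": "heart attack survivors and beta blockers trial",
}
oracle = {"d1": True, "d2": False, "d3": True}
reviewed, found = cal(docs, oracle, {"beta", "blockers"})
print(reviewed)
```

Note that both relevant abstracts are reviewed before the irrelevant one: surfacing relevant evidence early is precisely what reduces the labour cost of screening.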

Leif Azzopardi

Social media use has been steadily increasing among minors and teenagers, who turn to online social networks to express their emotions and feelings as well as to look for attention and advice, and this can have serious negative consequences. Recent research has shown that the more time they spend on such sites, the greater their psychological distress, the poorer their mental health and, worryingly, the higher their levels of suicidal ideation, self-harm and anorexia. Consequently, our team has been investigating and developing early risk detection methods that harvest and analyse the posts of social media users to determine their likelihood of depression, self-harm, and anorexia using large-scale neural language models and deep learning methods. The project aims to accurately identify high-risk users as early as possible, in order to provide rapid interventions, potentially reducing the escalation of these disorders and improving the mental health and wellbeing of social media users.
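The project itself uses large neural language models; as a deliberately simplified stand-in, the sketch below flags a user whose posting history repeatedly matches risk-signal cue words. The cue words, posts, and threshold are all invented assumptions chosen only to show the shape of an early risk detection pipeline.

```python
# Much-simplified stand-in for neural early risk detection: flag a user
# whose posts frequently contain risk cues. Cues, posts and the threshold
# are invented for illustration only.

RISK_CUES = {"hopeless", "worthless", "alone", "hurting"}

def risk_score(posts: list[str]) -> float:
    """Fraction of posts containing at least one risk cue."""
    flagged = sum(any(w in RISK_CUES for w in p.lower().split()) for p in posts)
    return flagged / len(posts) if posts else 0.0

def is_high_risk(posts: list[str], threshold: float = 0.5) -> bool:
    return risk_score(posts) >= threshold

posts = ["feeling hopeless again", "nobody cares i am alone", "nice day out"]
print(round(risk_score(posts), 2))
```

A real system replaces the cue set with a learned model and scores posts as they arrive, so the earliest reliable signal can trigger an intervention.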

Amal Htait and Leif Azzopardi

User-generated online content, such as book, movie, and travel reviews as well as social media posts, provides a wealth of information regarding the products and services discussed, as well as revealing information about users and their personalities. One of the key problems, however, is that with so much user-generated content being created, it is difficult to extract the meaningful and valuable comments and reviews. So how do we identify content that is useful for tasks such as rating, summarising and classifying products and services? For example, which reviews will be most helpful to a user deciding whether or not to purchase an item? As such, we have been developing specialised sentiment analysis methods which use word embeddings and language models to identify how positive or negative content is, and its intensity. Using these methods, we have been processing online reviews to improve recommendation and retrieval of related items, as well as monitoring the sentiment of social media posts to provide insights into people’s personality and outlook.
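A common seed-word idea behind embedding-based sentiment is to score a word by its similarity to positive seed words minus its similarity to negative ones, with the signed magnitude suggesting intensity. The tiny two-dimensional "embeddings" below are fabricated purely for illustration; the actual methods use real word embeddings and language models.

```python
# Seed-word polarity from (fabricated) word embeddings. Real systems use
# high-dimensional embeddings trained on large corpora.

import math

EMB = {
    "good": (0.9, 0.1), "great": (0.95, 0.05), "excellent": (1.0, 0.0),
    "bad": (0.1, 0.9), "awful": (0.0, 1.0), "terrible": (0.05, 0.95),
    "delightful": (0.92, 0.08), "dreadful": (0.04, 0.96),
}
POS_SEEDS, NEG_SEEDS = ["good", "great"], ["bad", "awful"]

def cos(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def polarity(word: str) -> float:
    """Positive score -> positive word; the magnitude suggests intensity."""
    v = EMB[word]
    pos = sum(cos(v, EMB[s]) for s in POS_SEEDS) / len(POS_SEEDS)
    neg = sum(cos(v, EMB[s]) for s in NEG_SEEDS) / len(NEG_SEEDS)
    return pos - neg

print(polarity("delightful") > 0, polarity("dreadful") < 0)
```

Averaging such word polarities over a review gives a document-level sentiment score that can feed ranking or recommendation.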

Leif Azzopardi, Yashar Moshfeghi and Massimiliano Vasile

Organisations, like the European Space Agency (ESA), hold a wealth of operational expertise within their staff. Their experts can explain why a particular design choice was made, how a design arose and what to consider for future designs. Yet, when staff leave or retire, this tacit knowledge is often lost forever. While capturing this knowledge is important, the process for capturing, processing and sharing this expertise is often beyond the remit, experience, resources and capabilities of most organisations, and comes at a high cost (e.g. detailed interviews) or low quality (e.g. generic structured questionnaires). In this project, with the ESA, we are using advances in machine learning, natural language processing and information retrieval to create assistants that interactively, and in a conversational manner, elicit knowledge about the projects staff have been working on. This information is then catalogued and recorded within their knowledge base for future use and value extraction.
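At its simplest, the elicit-and-catalogue idea can be sketched as a scripted interviewer that asks follow-up questions about a project and files the answers into a small knowledge base. The question schema, project name, and scripted answers below are invented assumptions; the real assistants drive the conversation with NLP and information retrieval rather than a fixed script.

```python
# Deliberately simple sketch of conversational knowledge elicitation:
# ask each question, record the expert's answer under the project.
# Schema, project name and answers are invented.

QUESTIONS = {
    "design_choice": "Why was this design chosen?",
    "alternatives": "What alternatives were considered?",
    "lessons": "What should future designs take into account?",
}

def elicit(project: str, answer_fn, kb: dict) -> dict:
    """Ask every question via answer_fn and catalogue the replies."""
    record = {slot: answer_fn(q) for slot, q in QUESTIONS.items()}
    kb[project] = record
    return kb

# Simulated expert standing in for an interactive session.
scripted = {
    "Why was this design chosen?": "mass constraints on the launcher",
    "What alternatives were considered?": "a deployable boom",
    "What should future designs take into account?": "thermal cycling",
}
kb = elicit("solar-array-hinge", scripted.get, {})
print(kb["solar-array-hinge"]["design_choice"])
```

Structuring answers under named slots is what makes the captured expertise searchable later, rather than leaving it buried in interview transcripts.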

Mark Dunlop

Touchscreen text entry is core to much of our interaction with each other and with online systems. Under the surface of modern text entry are complex predictive tap and language models based on large corpus analysis and dynamic updating. Using participatory design techniques, we have worked, for example, with older adults to develop novel keyboard approaches that better support awareness of potential errors, and with stroke survivors to develop word-filling interfaces to support more flexible and social text communication. Our approach is typically a combination of facilitating design workshops and co-design of solutions, tied with developing prototype mobile systems. We have experience of working with various machine learning models to support text entry and improve accuracy, for example of watch-face text entry.
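A minimal example of the corpus-based language modelling that underlies predictive keyboards: count word bigrams in a corpus, then suggest the most frequent followers of the last typed word. The toy corpus below is an invented stand-in for large-scale corpus analysis.

```python
# Minimal bigram next-word predictor of the kind behind predictive
# touchscreen keyboards. The training corpus is a toy stand-in.

from collections import Counter, defaultdict

def train(corpus: list[str]) -> dict:
    """Count, for every word, how often each word follows it."""
    bigrams = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for a, b in zip(words, words[1:]):
            bigrams[a][b] += 1
    return bigrams

def suggest(model: dict, last_word: str, k: int = 3) -> list[str]:
    """Top-k candidate next words after the last typed word."""
    return [w for w, _ in model[last_word.lower()].most_common(k)]

corpus = [
    "see you soon", "see you later", "see you soon then",
    "thank you very much",
]
model = train(corpus)
print(suggest(model, "you"))
```

Real keyboards combine such a language model with a probabilistic tap model over key locations, and update counts dynamically as the user types.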