Linking science and strategy: challenges and opportunities for using technology mining
Overview of the development of technologies as a basis for economic growth. Development of an approach for linking science and policy through the application of Technology Mining to data sources indicating S&T development and the implementation of STI policies.
Category | Production and technology |
Type | diploma thesis |
Language | English |
Date added | 04.08.2016 |
File size | 2.0 M |
National Research University Higher School of Economics
Institute for Statistical Studies and Economics of Knowledge
MASTER THESIS
LINKING SCIENCE AND STRATEGY: CHALLENGES AND OPPORTUNITIES FOR USING TECHNOLOGY MINING
Student: Pavel Bakhtin
Group: 141
Supervisor: PhD, Professor Ozcan Saritas
Submission date: 16.05.2016
Moscow, 2016
List of Abbreviations
Abbreviation | Meaning |
STI | Science, Technology and Innovation |
S&T | Science and Technology |
NLP | Natural Language Processing |
R&D | Research and Development |
TM | Technology Mining |
GTMS | Global Technology Trend Monitoring System |
NSF | National Science Foundation |
NIH | National Institutes of Health |
SAO | Subject-action-object analysis |
List of Tables
Table | Name |
Table 1 | NSF awards for perovskite solar cells |
List of Figures
Figure | Name |
Figure 1 | Risk Assessment and Horizon Scanning (RAHS) - a foresight support system for the German Federal Armed Forces |
Figure 2 | The conceptual model of linking Science and Strategy through Technology Mining |
Figure 3 | Semantic map of S&T terms for research area “Energy & Fuels” with topic names |
Figure 4 | Semantic map of S&T terms for research area “Energy & Fuels” with all terms |
Figure 5 | Scatter plot for the topic “Energy consumption” |
Figure 6 | Scatter plot for the topic “Anode material” |
Figure 7 | Scatter plot for the topic “Hydrogen production” |
Figure 8 | Scatter plot for the topic “Carbon dioxide” |
Figure 9 | Scatter plot for the topic “Solar cell” |
Figure 10 | Scatter plot for the topic “Fuel cell” |
Figure 11 | Scatter plot for the topic “X-ray diffraction” |
Figure 12 | Scatter plot for the topic “Chemical process” |
Figure 13 | “Energy & Fuels” topics' distribution diagram |
Figure 14 | Structure changes of “Energy & Fuels” topics |
Figure 15 | NSF grants for “solar energy” topic, 2000-2015 |
Figure 16 | Trend analysis of “perovskite solar cell” |
Figure 17 | “Perovskite solar cell” on the semantic map of S&T terms |
Science and Technology (S&T) development is a foundation for the economic growth of any country and a source of solutions to many grand challenges. Strategic goals and priorities set for S&T development aim at prioritizing and concentrating a country's resources. However, the high monetary and time costs and uncertain success of Research and Development (R&D), the chaotic nature of S&T advancements, technology diffusion and knowledge exchange between different research areas create a demand for periodic interlinking of S&T development with Strategy realization. The growing amount of data related to the Science, Technology and Innovation (STI) sphere calls for new ways to handle big data and provide evidence-based STI policy. The present study develops an approach for linking Science and Strategy through the application of Technology Mining (TM) to data sources indicating S&T development and STI policy implementation. It reviews existing TM methods and tools and suggests the processing of various STI-related data sources for the identification of S&T topics and trends, and their cross-linking analysis on the timeline. A case study in the research area of “Energy & Fuels” demonstrates two types of linkages: STI policy affecting S&T development and, in turn, trends influencing further formulation of policy. A lack of linkage is also informative: it signals either a potential breakthrough or a lack of progress.
Introduction
Scientific advancements and breakthroughs shape the foundation of the modern world. Fundamental, or basic, science is driven by curiosity and aims at exploring and discovering new knowledge, while applied science tries to provide solutions for existing problems. The distribution of Research and Development (R&D) activities by actor (industries, higher education institutions, public research institutions and other organizations) may vary from country to country. However, the process of setting global priorities and strategic goals for Science and Technology (S&T) development as a foundation for economic growth (Sokolov & Chulok, 2015), from both the research topic and investment perspectives, always remains the responsibility of governments as a part of strategic planning and Foresight projects.
Policy-makers do not always take Foresight seriously due to long-term horizons and lack of scientific credibility (Cagnin et al., 2015), or, in other words, data evidence. The rising demand for evidence-based policy affects Foresight activities, especially at the first phase of gathering knowledge about the current state of S&T development activities (Amanatidou et al., 2012), or, in terms of Systemic Foresight Methodology (Saritas, 2013), the “Intelligence” phase. Technology Mining (TM) has been introduced as a set of quantitative methods and tools used to extract and aggregate S&T-related information, such as lists of phrases, topics, product and technology descriptions, key players of development and much more, from large volumes of scientific texts. Porter (2009, p. 3) defined Technology Mining, or “tech mining”, as “text mining of science & technology information resources”.
Many researchers employ TM for their studies. For example, de Miranda Santo et al. (2006) apply TM to scientific publications in order to spot major topics in the sphere of nanotechnology. Altuntas et al. (2015) look at patent data to assess technologies' level of readiness and their potential for further development. Zhang et al. (2016) use National Science Foundation (NSF) awards data to identify technology topics and their dynamics, and attempt to forecast future development in the computer science area of big data research.
Big data evidence supports long-term S&T strategy development. However, the high monetary and time costs and uncertain success of R&D, the chaotic nature of S&T advancements, technology diffusion and knowledge exchange between different research areas create big threats for Strategy realization. Hence, Science, Technology and Innovation (STI) policy implemented to achieve strategic goals in the long term needs periodic evaluation and adjustment according to interim indications of progress or unexpected breakthroughs in non-supported research fields. In other words, an instrument for cross-analyzing documents related separately to S&T development and STI policy is required.
This study attempts to identify data sources that would indicate the progress of S&T development and the policy instruments implemented in accordance with Strategy realization. It suggests TM methods and tools to bridge Science and Strategy in the form of a decision-making support instrument for periodic monitoring purposes.
Two hypotheses are tested within the study with the help of case analysis. First, Tech Mining approaches applied to data sources that indicate S&T development allow the identification of S&T topics and trends. Second, the data extracted from STI policy documents can be linked with S&T development.
To present the results of the study, the paper is organized into four major chapters: literature review, proposed methodology and approach, findings with case examples of the application of the proposed methods and tools, and further discussion and conclusion. Each chapter is divided into several subchapters.
The literature review introduces the problem of S&T priority setting, strategy development and STI policy implementation, as well as Foresight as one of the main supporting instruments. The Technology Mining subchapter describes the state of the art in TM, bibliometric and patent analysis, and Natural Language Processing (NLP). The modern methods and approaches described help to grasp the main ideas proposed by the study.
The methodology and approach chapter first deals with a number of potential data sources that are applicable for the analysis of S&T development and for linking it with STI policy instruments. Then, the process of identifying S&T topics through the extraction of terms and phrases, their network analysis and clustering is described in detail, covering a wide range of methods and tools. Afterwards, the main approach for spotting trends in the processed data is introduced. Finally, an algorithm for linking S&T development and strategic goals by correlating the results of the analysis applied to two sets of data sources is demonstrated.
The findings chapter introduces, in the same order, the results of the analysis and the main visualizations for each logical part of the methodology. First, the semantic map of S&T terms and phrases is described for two sets of data sources. Second, the topic diagram is displayed as a result of term and phrase network analysis and clustering. Third, trends are identified as a result of the dynamic spotting process. Finally, the main types of linkages between Science and Strategy are demonstrated on the case of the “Energy & Fuels” research area.
The discussion and conclusion chapter focuses on theoretical and practical implications, such as a discussion of the proposed methods and tools, some results of their application to case examples, and the implementation of the ideas in the Global Technology Trend Monitoring System (GTMS). The paper is wrapped up with the limitations of the work and plans for further research and development.
Literature review
STI policy for driving S&T development
The long-term future is uncertain and investment in the development of Science and Technology is always risky (de Oliveira et al., 2016). Hence, national S&T policies (later STI) appeared in the 1960s (Henriques & Larédo, 2013) to support S&T development at the national and international levels.
Based on the OECD model, the main functions of such policies are as follows (Henriques & Larédo, 2013):
S&T planning of national activities.
Priority setting as part of national strategy due to impossibility of addressing all challenges and possibilities.
Allocation of resources and professional administration.
Beck et al. (2016) demonstrate in their study that policy-induced expenditure is significant for radical innovation, or, in terms of S&T, breakthroughs. At the same time, due to the interdependence of existing challenges and opportunities, there cannot be only one policy instrument: a mix of policy instruments is required (Borrás & Edquist, 2013).
The problems listed above and future uncertainties lead to policies and strategies that do not always protect against existing threats and do not always take advantage of the related uncertainties (Hansen et al., 2016). All of that pushed the implementation of Foresight as a future-oriented STI policy instrument.
Martin (1995) defines Foresight as “the process involved in systematically attempting to look into the longer-term future of science, technology, the economy, and society with the aim of identifying areas of strategic research and the emerging new technologies likely to yield the greatest economic and social benefits”. Miles & Keenan (2002) state that Foresight is the application of a ‘systematic’, ‘participatory’, ‘future-intelligence-gathering and medium-to-long-term vision building process’ to ‘informing present-day decisions and mobilizing joint actions’. Sokolov & Chulok (2015) point out the role of Foresight in shaping STI policy and setting priorities for the allocation of R&D funds.
Generally, the Foresight process consists of gathering knowledge with the help of expert evaluation and horizon scanning (analysis of literature and patents, including the use of Technology Mining), using creativity to forecast potential scenarios of future development, integration and interpretation of the collected information, intervention into current S&T development with the help of policy instruments and various programmes, evaluation of impacts, and, finally, interaction between stakeholders throughout the whole activity (Saritas, 2013).
A similar process, displayed in Figure 1, is even used to support Foresight in the German Federal Armed Forces (Durst et al., 2015).
Figure 1: Risk Assessment and Horizon Scanning (RAHS) - a foresight support system for the German Federal Armed Forces
Source: (Durst et al., 2015)
Georghiou & Harper (2011), with the help of Rafael Popper, compiled a list of the most popular foresight objectives based on 50 exercises:
Analysis of the future potential of technologies (22%)
Support of policy or strategy development (17%)
Network building (14%)
Priority setting for S&T (11.5%)
Methodology and capacity building (9.5%)
Articulating supply and demand (9.5%)
Public engagement (5.5%)
Other (10.6%)
The impact of Foresight on Strategy development adds coherence to STI policy (Pietrobelli & Puppato, 2015). Foresight allows predicting future demand for innovation solutions and their market potential, and is the basis for integrated roadmaps (Vishnevskiy et al., 2015).
However, one of the challenges is that policy-makers do not always take Foresight seriously due to long-term horizons and lack of scientific credibility (Cagnin et al., 2015). Furthermore, the focus of Foresight is more on broad-based priority setting rather than on engineering major changes in STI policy (Georghiou & Harper, 2011).
In order to solve the problem, the realization of the S&T strategy developed through the Foresight process should be constantly monitored and linked with S&T development.
R&D programmes and grants are among the most important STI policy instruments in the US (AAAS, 2016) and can be monitored and analyzed in a systemic way through the available databases, such as those of the National Science Foundation (NSF), the National Institutes of Health (NIH) and others. Moreover, project funding modelled on the experience of the NSF (Smith, 1990) has been on the increase in a number of EU countries (Lepori et al., 2007), in contrast to the decline of block funding (Martin & Irvine, 1992).
The next subchapter of the literature review focuses on Technology Mining methods and tools for analyzing STI-related documents, which are further used in the methodology to process documents indicating S&T development and STI policy implementation.
Technology Mining for monitoring S&T development
Once long-term S&T priorities are set, a strategy is developed and STI policy instruments are implemented, the main question that arises is how to monitor the progress and at the same time continuously analyze other events that could potentially affect the plans (e.g. a new breakthrough technology developed in another country, or an economic crisis leading to the decline of some markets). The era of big data in STI-related documents brings many challenges: a single expert can no longer physically read all the literature related to their field in a limited amount of time. Moreover, the convergence between S&T areas (e.g. new materials for smart sensors implanted in a human body for health monitoring) is impossible to assess without wide multidisciplinary knowledge. That creates the demand for methods and tools that would help to analyze large numbers of documents, aggregating numerical and text data together for further assessment by all types of stakeholders: from policymakers and technology analysts to scientists and the wider public.
Historically, Technology Mining is defined as the combination of bibliometrics, patent analysis and different Natural Language Processing (NLP) tools to collect, process and represent competitive technological intelligence in a visually understandable form (Porter & Cunningham, 2004; Yoon B., 2008). Since 2004, the active development and application of TM methods and tools have demonstrated their use in a wide range of topics and tasks. They include:
Extraction of the list of terms (Judea et al., 2014; Anick et al., 2014).
Classification of documents (Jones, 1965) and identification of topics (Blei, 2012; Zhang et al., 2016).
Spotting potential trends (Li et al., 2011; Kim et al., 2012; Saritas & Burmaoglu, 2015).
Identification of early warnings about threats (Sun et al., 2015).
Spotting opportunities to solve problems and challenges, such as various diseases (Carvalho et al., 2015).
Finding key players to cooperate with or keep track of the progress (Kerr et al., 2006).
The first thing that needs to be considered for Technology Mining is the choice of STI-related documents. Based on the review of TM articles (de Miranda Santo et al., 2006; Trumbah et al., 2006; Li et al., 2011; Altuntas et al., 2015; Saritas & Burmaoglu, 2015; Zhang et al., 2016), the most popular types of documents are as follows:
scientific publications (articles, conference proceedings, books, etc.);
patents and/or patent families (a list of patents grouped by the same object of invention);
scientific grants;
policy documents;
analytical reports and other related materials.
In order to perform systemic research based on the chosen data sources, researchers tend to scope their field of study by downloading a sample of documents from a dedicated database and using various criteria like classifications and citation indexes. Without such scoping, the data have no boundaries, so it is impossible to judge whether enough documents have been analyzed to draw any conclusions. Early research data in the form of articles can be found in the Web of Science, Scopus, PubMed, Compendex, Inspec and Medline databases. Insights about technologies close to introduction on the market can be identified in Derwent Thomson Innovation, Orbit, Business Source Premier, ABI Inform and other patent and business databases. Information about grants is available from national agencies, such as the National Science Foundation, the National Institutes of Health and many more.
Documents are extracted using existing classifications (research areas, IPC, CPC and others) or expert-defined queries (Mikova & Sokolova, 2014). To find, download and process the most relevant documents that potentially contain information about emerging S&T developments, researchers apply iterative processes from general search categories and terms to more concrete words and phrases (Haung et al., 2015).
Once documents are extracted, TM methods and tools for the processing and analysis of data need to be considered. There are two main types of data TM deals with: structured meta-data (author's name and affiliation, publication date, research areas, keywords, etc.) and text (title, abstract, the whole article or book). The main idea behind TM processing is to combine, aggregate and map different parts of the data together for further interpretation. An example could be a matrix of authors' countries and publication dates that demonstrates the distribution of publications over time for each country. The more sophisticated TM methods concentrate on in-depth text analytics to, for example, identify topics based on terms in the abstract.
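The country-by-date matrix mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: the `records` list and the function name `country_year_matrix` are hypothetical, and the metadata is assumed to be already reduced to (country, year) pairs.

```python
from collections import Counter

# Hypothetical metadata: one (author country, publication year) pair per publication.
records = [
    ("US", 2014), ("US", 2015), ("DE", 2014), ("US", 2014), ("CN", 2015),
]

def country_year_matrix(records):
    """Count publications per (country, year) and return a nested dict."""
    counts = Counter(records)
    countries = sorted({country for country, _ in records})
    years = sorted({year for _, year in records})
    # Counter returns 0 for combinations that never occur.
    return {c: {y: counts[(c, y)] for y in years} for c in countries}

matrix = country_year_matrix(records)
for country, row in matrix.items():
    print(country, row)
```

Each row of the resulting table is a time series for one country, which is the kind of aggregated view that can then be plotted or compared across countries.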
TM's text analytics is based either on authors' keywords and their mapping to publication dates and other meta-data, or on the application of NLP methods to unstructured text. From the perspective of NLP, TM methods can be logically divided into two groups: those that are based on linguistic rules and patterns (Li et al., 2011), and those that depend more on text statistics. The majority of the newest approaches are hybrid and attempt to combine linguistics and statistics, at the same time allowing for heuristics.
Application of NLP to a text generally includes sentence-splitting, tokenization (splitting text into tokens - bits of text data), part-of-speech tagging for each token (e.g. noun, adjective, verb, etc.), morphological analysis (finding the lemma for the token - the token “is” would have the lemma “be”), named entity recognition (searching for organizations' or people's names), syntactic parsing (dependencies between tokens in the sentence) and other additional functions (Manning et al., 2014). The basis of NLP is the combination of machine-learning approaches applied to very large amounts of training text data, linguistic rules and heuristics (Chen & Manning, 2014).
Out of all NLP functions, the most commonly used are sentence-splitting, tokenization, part-of-speech tagging and morphological analysis, as they allow finding separate terms in the text and performing various text analytics along with the available meta-data. However, the application of syntactic parsing (or syntactic analysis) allows finding relationships between terms within the text. For example, Verhaegen et al. (2009), Yoon & Kim (2011; 2012a; 2012b) and Park et al. (2013) developed the “property-function” approach aimed at extracting and analyzing ‘adjective’ plus ‘noun’ (property of a system, technology or other entity) and ‘verb’ plus ‘noun’ (function of a system, technology or other entity) syntactic relationships. Wang et al. (2015) developed a methodology to identify trends based on subject-action-object (SAO) analysis of terms connected by verbs. Angeli et al. (2015) further develop the approach of triplet analysis (SAO), or in their words “Open Information Extraction”, by deepening the extraction of inter-clause relationships, which are missed during the direct analysis of subjects and objects.
Such advancements make it possible to move forward from pure term statistics to the identification of more sophisticated relationships between objects (technologies, products and other entities), enabling ontology development for the research field.
From the point of view of TM, it is also important to mention topic modeling, a machine-learning approach to finding patterns in the data and identifying key topics as collections of keywords based on a set of training data (Blei, 2012). The approach is valuable for TM once there is a big set of base documents that systemically represent different S&T areas, so that new documents can later be mapped against the existing topics.
One more approach to be considered in TM is pattern analysis in time series (Assfalg et al., 2009). It allows the clustering of terms based on their similar dynamics over time. Hence, terms with different trending behavior can be grouped separately.
Lahoti et al. (2015) demonstrated in their study the application of TM to validating and refining a technology roadmap through the analysis of trends. Such an approach reduces the distance between policy documents and indications of progress.
To conclude the chapter, the main benefits of TM are the scope of the analysis, which is limited only by the number of available STI-related documents, the speed (from several minutes to several hours depending on the hardware configuration and required calculations) and reproducibility. The latter benefit brings objectivity to the analysis of S&T development, since policymakers and other stakeholders can transparently see how the study is conducted if required. Moreover, it means there is a possibility of periodic monitoring and of mapping results against strategic goals, as well as comparing them with the results achieved during previous monitoring stages. Hence, TM has the potential to be the linking instrument between S&T development and Strategy realization.
The methodology chapter builds on the reviewed literature about S&T priority setting, strategy development and STI policy implementation, as well as on the possibilities brought by TM. It combines and further enhances state-of-the-art TM methods and tools for the purpose of monitoring S&T development and linking it with STI policy measures.
Methodology and Approach
Data sources
The whole methodology proposed in the study can be demonstrated using the conceptual model in Figure 2. In order to test the hypotheses listed in the introduction, this study uses the research field “Energy & Fuels” as the case for the application of the proposed methodology.
Figure 2: The conceptual model of linking Science and Strategy through Technology Mining
Source: author's proposal
Both S&T development and STI policy measures are represented by certain types of documents. The first step in establishing a Technology Mining approach for monitoring S&T development and linking it with STI policy measures is the choice of data to be processed and analyzed.
Because the study is conceptual research aimed at developing a linking instrument between Science and Strategy, the methodology is not tied to any particular country or research area. In other words, the data sources considered need to represent international development or have global impact, and at the same time cover the major areas of S&T without any specificity.
Scientific publications in high-impact journals are collected under research areas and categories within databases like Web of Science (WoS) or Scopus. This provides a good opportunity to analyze scientific advancements over time. Using the data about the most cited articles (for example, the top 10%) within a research area for each year allows collecting the documents with the highest impact. For the case study, the paper uses the 10% most cited scientific articles in the research area of “Energy & Fuels” for each year from 2005 to 2015 in the Web of Science.
Other sources of information are not employed in practice. However, from the conceptual point of view it is also important to include patents, annual scientific conference materials and various analytical reports related to the research field. In terms of patents, only international patents (PCT) or triadic patent families should be considered in order to normalize the contribution from each country. Otherwise, due to differences in patenting culture between countries, the number and focus of patents might vary significantly.
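The per-year selection of the most cited articles can be sketched as follows. This is a minimal illustration, not the procedure used to build the case-study sample: the record layout (`year`, `citations`), the function name `top_cited_share` and the rounding rule (round up, keep at least one article per year) are all assumptions.

```python
from collections import defaultdict

def top_cited_share(articles, percent=10):
    """For each year, keep the `percent` most cited articles (at least one)."""
    by_year = defaultdict(list)
    for article in articles:
        by_year[article["year"]].append(article)
    selected = {}
    for year, items in by_year.items():
        # Integer ceiling of len(items) * percent / 100 (assumed rounding rule).
        k = max(1, (len(items) * percent + 99) // 100)
        ranked = sorted(items, key=lambda a: a["citations"], reverse=True)
        selected[year] = ranked[:k]
    return selected

# Hypothetical sample: 20 articles from 2005 with 0..19 citations each.
sample = [{"year": 2005, "citations": c} for c in range(20)]
print([a["citations"] for a in top_cited_share(sample)[2005]])
```

Ranking within each year, rather than over the whole period, keeps older articles, which have had more time to accumulate citations, from crowding out recent ones.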
From the perspective of STI policy related documents it is important to include relevant technology roadmaps (such as, for example, NASA roadmap for nanocoating materials for the use in space stations), R&D programmes, grants and other policy measures that contribute to the development of innovation clusters, science and technology parks, etc.
Scientific grants represent a very important S&T policy instrument designed to advance science by funding research proposals that are in line with the priorities and strategic goals of a country. The US R&D funding system relies heavily on competitive grant finance, and information about awards is publicly accessible. One of the biggest agencies, with an annual budget of $7.5 billion, that supports the majority of S&T areas is the National Science Foundation (NSF). Its mission is "to promote the progress of science; to advance the national health, prosperity, and welfare; to secure the national defense…" (NSF, 2016). Due to the openness and wide scope of research areas in the NSF awards database, it was chosen as the main source of STI policy-related documents for the case study.
Since both Web of Science and NSF cover energy-related topics, these two sources are analyzed in the attempt to find the linkage between Science and Strategy.
Identification of topics
Once sources of data are chosen, the next step is the processing of data and further identification of topics.
The present study uses the latest developments in NLP to process the abstracts of scientific articles downloaded from Web of Science, which represent the 10% most cited documents in the area of “Energy & Fuels” for each year from 2005 to 2015.
All abstracts are split into sentences, words are tokenized and lemmatized, and syntactic relationships are calculated using the “property-function” approach described in the literature review.
First, bigrams (phrases that combine two words) are built using the syntactic relationships “adjective modifier” and “compound”. Based on their location in the sentences, bigrams are then further combined into longer phrases. Such phrases are checked for statistical relevance: if a phrase does not occur in more than five documents, it is replaced by a phrase with fewer words. For example, if “hybrid tandem solar cell” is a very rare combination, the algorithm picks “tandem solar cell” instead.
Second, all phrases are filtered by stopword lists in order to get rid of irrelevant words. That turns phrases into meaningful terms that can be used for further analysis.
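The statistical relevance check on phrases can be illustrated with a small sketch. It assumes that document frequencies are already available in a dictionary (`doc_freq`, hypothetical), that a phrase is shortened by dropping its leftmost word as in the “hybrid tandem solar cell” example, and that a phrase is discarded if even its two-word form is too rare; the study does not spell out these details.

```python
def back_off(phrase, doc_freq, min_docs=5):
    """Drop leading words until the phrase occurs in more than `min_docs` documents.

    Returns None when even the two-word tail is too rare (an assumed behaviour).
    """
    words = phrase.split()
    while len(words) >= 2:
        candidate = " ".join(words)
        if doc_freq.get(candidate, 0) > min_docs:
            return candidate
        words = words[1:]  # shorten the phrase from the left
    return None

# Hypothetical document frequencies.
doc_freq = {
    "hybrid tandem solar cell": 2,
    "tandem solar cell": 9,
    "solar cell": 120,
}
print(back_off("hybrid tandem solar cell", doc_freq))  # tandem solar cell
```

Stopword filtering would then be applied to the phrases that survive this check.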
Then, from the table of terms and the documents they occur in, the co-occurrence matrix is calculated for terms that appear in the same document. In other words, if “solar cell” and “power distribution” appear together in one sentence of one document, they get a co-occurrence value of one. If the co-occurrence is spotted in more documents, the value increases.
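A co-occurrence count of this kind can be sketched as below. The text leaves open whether co-occurrence is counted per sentence or per document; this sketch assumes the document level, and the input format (one list of terms per document) and the function name `cooccurrence` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(doc_terms):
    """Count, for every unordered term pair, the number of documents containing both."""
    counts = Counter()
    for terms in doc_terms:
        # Deduplicate within a document and sort so each pair has one canonical key.
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] += 1
    return counts

docs = [
    ["solar cell", "power distribution"],
    ["solar cell", "power distribution", "anode material"],
    ["anode material"],
]
pairs = cooccurrence(docs)
print(pairs[("power distribution", "solar cell")])  # 2
```

Storing only the pairs that actually occur keeps the matrix sparse, which matters when the vocabulary runs to thousands of terms.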
Once the co-occurrence matrix is built, the network analysis can be performed. Each term in the matrix represents a separate node of the graph. The number of documents in which two terms co-occur represents the weight of the edge between their nodes. Network analysis helps to calculate the centrality of each node (the sum of the weights of the edges that connect to the node).
Then, a clustering algorithm is applied to divide the graph into subgraphs of the most closely related nodes. In other words, the more often a group of terms co-occur with each other, the higher the probability that they end up in the same cluster.
From the practical point of view, the “fast greedy” algorithm was used for the case study as one of the quickest graph clustering algorithms for large amounts of data. More about the algorithm can be found in Kumar et al. (2015).
Once clustering is done, all terms are divided into several groups. These groups are perceived as S&T topics because, according to the big data statistics, the terms within each group are maximally close to each other.
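The centrality described here, the sum of the weights of the edges attached to a node, is often called weighted degree or node strength. A minimal sketch, assuming the edges are given as a dictionary from term pairs to co-occurrence counts (a hypothetical input format):

```python
from collections import defaultdict

def weighted_degree(edges):
    """Sum the weights of all edges incident to each node."""
    strength = defaultdict(int)
    for (a, b), weight in edges.items():
        strength[a] += weight
        strength[b] += weight
    return dict(strength)

# Hypothetical co-occurrence counts.
edges = {
    ("solar cell", "power distribution"): 2,
    ("solar cell", "anode material"): 1,
}
centrality = weighted_degree(edges)
print(max(centrality, key=centrality.get))  # solar cell
```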
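To make the clustering step more concrete, the sketch below shows a naive greedy modularity optimisation. It assumes that the algorithm referred to belongs to this family, which starts with every term in its own community and repeatedly merges the pair of communities that increases modularity the most. The code is not the optimised implementation used in the case study; the function name and the toy graph are hypothetical.

```python
def greedy_modularity(edges):
    """Merge communities greedily while some merge still increases modularity."""
    m = sum(edges.values())                      # total edge weight
    nodes = sorted({n for pair in edges for n in pair})
    comm = {n: i for i, n in enumerate(nodes)}   # node -> community id
    degree = {i: 0.0 for i in comm.values()}     # community -> summed weighted degree
    for (a, b), w in edges.items():
        degree[comm[a]] += w
        degree[comm[b]] += w
    while True:
        # Total edge weight between each pair of distinct communities.
        between = {}
        for (a, b), w in edges.items():
            ca, cb = comm[a], comm[b]
            if ca != cb:
                key = (min(ca, cb), max(ca, cb))
                between[key] = between.get(key, 0.0) + w
        best_gain, best_pair = 0.0, None
        for (ci, cj), e in between.items():
            # Modularity change caused by merging communities ci and cj.
            gain = e / m - degree[ci] * degree[cj] / (2 * m * m)
            if gain > best_gain:
                best_gain, best_pair = gain, (ci, cj)
        if best_pair is None:
            break                                # no merge improves modularity
        ci, cj = best_pair
        for n in nodes:
            if comm[n] == cj:
                comm[n] = ci
        degree[ci] += degree.pop(cj)
    groups = {}
    for n, c in comm.items():
        groups.setdefault(c, set()).add(n)
    return sorted(groups.values(), key=lambda g: sorted(g))

# Toy graph: two tightly linked triples joined by one weak edge.
edges = {
    ("solar cell", "perovskite"): 3, ("solar cell", "photovoltaic"): 3,
    ("perovskite", "photovoltaic"): 3, ("anode", "lithium"): 3,
    ("anode", "battery"): 3, ("lithium", "battery"): 3,
    ("solar cell", "battery"): 1,
}
for topic in greedy_modularity(edges):
    print(sorted(topic))
```

On this toy graph the weak bridge between “solar cell” and “battery” is not enough to justify a merge, so two groups remain. The naive version recomputes all pairwise weights on every pass, which is fine for a demonstration but too slow for a real term network.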
Trend spotting
After all extracted terms are grouped into topics, their trending behavior can be analyzed. For this purpose the study uses the occurrences of terms in the abstracts of publications over the time period (2005-2015). Such a representation of the data is perceived as a set of time series, where the main task is to divide all terms by their trend patterns as part of the dynamic clustering. To achieve the task, Pearson's correlation coefficient was considered as the similarity criterion for the terms' dynamics:
$$ r(x, y) = \frac{\sum_{n} (x_n - \bar{x})(y_n - \bar{y})}{\sqrt{\sum_{n} (x_n - \bar{x})^2}\,\sqrt{\sum_{n} (y_n - \bar{y})^2}} $$
where $x$, $y$ are two different terms; $x_n$ is the number of publication abstracts in the year $n$ in which $x$ occurs; $\bar{x}$ is the average yearly number of publication abstracts over the whole time period in which $x$ occurs ($y_n$ and $\bar{y}$ are defined likewise for $y$).
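As an illustration, Pearson's correlation between the yearly occurrence series of two terms can be computed as below. The series are hypothetical, and returning 0.0 for a completely flat series (where the coefficient is undefined) is an assumption of this sketch.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient of two equally long yearly series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        return 0.0  # assumed convention for a flat series
    return cov / (dev_x * dev_y)

# Hypothetical yearly counts of two terms.
rising = [1, 2, 4, 7, 11]
falling = [9, 8, 6, 3, 1]
print(round(pearson(rising, falling), 3))
```

Two terms with a coefficient close to 1 rise and fall together over the period, regardless of how frequent each term is in absolute numbers.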
All terms were then clustered into three types of relative trending patterns using k-means clustering:
High trending
Medium trending
Low trending
The trending index itself was calculated for each term as the slope of its linear trend. That is, the higher the slope, the bigger the increase in the term's occurrence in scientific articles from 2005 to 2015.
Based on both the topic distribution of terms and the trend patterns, the cross-cluster analysis was performed. For each topic it demonstrated the total number of related terms, as well as the distribution of high, medium and low trending terms. The higher the number of high trending terms in a topic, the more trending the topic is.
Finally, the distribution of topics along the whole time period was calculated based on co-occurrence matrices for each separate year. The structure changes of topics between the years were estimated based on the flow of terms from one topic to another during the whole time period.
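The sketch below shows one possible reading of these two steps: a least-squares slope as the trending index, followed by k-means with three clusters applied to the slopes. How the correlation-based similarity and the slope were actually combined in the study is not specified, so clustering the slopes directly is an assumption, as are the function names and the sample counts.

```python
def slope(counts):
    """Least-squares slope of yearly counts against the year index 0, 1, 2, ..."""
    n = len(counts)
    mean_t = (n - 1) / 2
    mean_c = sum(counts) / n
    num = sum((t - mean_t) * (c - mean_c) for t, c in enumerate(counts))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den

def kmeans_1d(values, k=3, iterations=100):
    """Lloyd's algorithm on scalars; initial centres are min, median, max for k=3."""
    ordered = sorted(values)
    centres = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: abs(v - centres[j])) for v in values]
        new = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new.append(sum(members) / len(members) if members else centres[j])
        if new == centres:
            break
        centres = new
    return labels, centres

# Hypothetical yearly occurrence counts per term.
series = {
    "perovskite solar cell": [0, 0, 1, 3, 8, 20],
    "fuel cell": [10, 11, 12, 12, 13, 14],
    "x-ray diffraction": [9, 9, 9, 9, 9, 9],
    "anode material": [5, 6, 8, 9, 10, 12],
    "carbon dioxide": [7, 7, 6, 7, 7, 8],
}
terms = list(series)
labels, _ = kmeans_1d([slope(series[t]) for t in terms])
names = ["low", "medium", "high"]  # centres start in ascending order
trend = {t: names[lab] for t, lab in zip(terms, labels)}
print(trend)
```

Because the groups are relative, a “high trending” term here is only high compared with the other terms in the same sample.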
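The cross-cluster analysis amounts to a cross-tabulation of topic membership against trend group. A minimal sketch, assuming two hypothetical dictionaries that map each term to its topic and to its trend group:

```python
from collections import Counter, defaultdict

def cross_cluster(term_topic, term_trend):
    """Count terms per (topic, trend group)."""
    table = defaultdict(Counter)
    for term, topic in term_topic.items():
        table[topic][term_trend[term]] += 1
    return {topic: dict(counts) for topic, counts in table.items()}

def high_share(table, topic):
    """Share of high trending terms among all terms of a topic."""
    counts = table[topic]
    return counts.get("high", 0) / sum(counts.values())

# Hypothetical assignments.
term_topic = {
    "perovskite solar cell": "Solar cell", "tandem solar cell": "Solar cell",
    "silicon wafer": "Solar cell", "lithium anode": "Anode material",
}
term_trend = {
    "perovskite solar cell": "high", "tandem solar cell": "high",
    "silicon wafer": "low", "lithium anode": "medium",
}
table = cross_cluster(term_topic, term_trend)
print(table["Solar cell"], round(high_share(table, "Solar cell"), 2))
```

Comparing shares rather than raw counts keeps large topics from looking more trending simply because they contain more terms; whether the study normalised in this way is not stated.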
Cross-linking of S&T development
Once S&T topics, trends and structural changes over the time period are identified, all of these objects can be linked with the yearly statistics of the same terms in the NSF awards.
If an S&T topic is trending during, for example, 2005-2010, then NSF awards for the period of approximately 2000-2010 need to be analyzed using the same terms as in the S&T topic.
Once the statistics for NSF awards are calculated, the trends can be compared on the timeline. If NSF awarded organizations for a topic that becomes trending over the next several years, this demonstrates a direct linkage between the STI policy instrument and S&T development.
In the same way, if an S&T topic becomes trending in scientific publications and NSF later awards organizations doing research on a similar topic, this demonstrates another type of linkage: trends affecting policy.
If no connection is found between NSF awards and scientific publications, two situations are possible. First, it can be the result of a “scientific push”, or breakthrough, which has not yet been considered by policy makers. This is a serious warning for policy makers. Second, it can be the result of a “policy pull” which has not yet resulted in the appearance of any trend. That might indicate a lack of progress in the research field or its complexity.
The subsequent decision at the policy-making level is whether to continue R&D funding or not.
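The comparison on the timeline can be sketched as a simple rule-based check. This is an illustrative simplification, not the procedure used in the study: it assumes yearly counts of publications and awards for the same set of terms, defines the "onset" of a series as the first year in which the count reaches a threshold, and compares the two onsets. The thresholds, function names and numbers are all hypothetical.

```python
def onset_year(counts_by_year, threshold):
    """First year in which the count reaches the threshold, else None."""
    for year in sorted(counts_by_year):
        if counts_by_year[year] >= threshold:
            return year
    return None

def classify_linkage(publications, awards, pub_threshold=5, award_threshold=1):
    """Label the linkage between a publication trend and award activity."""
    pub_onset = onset_year(publications, pub_threshold)
    award_onset = onset_year(awards, award_threshold)
    if pub_onset is None and award_onset is None:
        return "no signal"
    if award_onset is None:
        return "scientific push"          # trend without policy response
    if pub_onset is None:
        return "policy pull"              # funding without visible trend
    if award_onset < pub_onset:
        return "policy-induced"           # awards precede the trend
    if pub_onset < award_onset:
        return "trends affecting policy"  # trend precedes the awards
    return "simultaneous"

# Hypothetical yearly counts (illustrative only)
case_a = classify_linkage({2009: 1, 2010: 2, 2011: 6, 2012: 15},
                          {2005: 2, 2006: 3, 2007: 3})
case_b = classify_linkage({2012: 2, 2013: 9, 2014: 40},
                          {2013: 0, 2014: 3, 2015: 12})
case_c = classify_linkage({2013: 7, 2014: 20}, {})
```

A fixed threshold is the crudest possible onset criterion; a slope-based or change-point criterion could replace `onset_year` without changing the classification logic.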
Findings
Semantic map of S&T terms
The following findings were obtained with the proposed methodology. Figure 3 displays the semantic map of S&T terms. Colors denote the individual topics. The names of the topics are generated automatically based on the most central element.
Figure 3: Semantic map of S&T terms for research area “Energy & Fuels” with topic names
Source: HSE Global Trend Monitoring System, calculations based on the abstracts of 10% of the most cited Web of Science articles in the research area “Energy & Fuels”
As a result, the following list of topics was generated for the research area of “Energy & Fuels”:
Energy consumption - topic of energy systems, power management and integration of renewable sources of energy
Anode material - topic of batteries
Hydrogen production - topic about hydrogen and biomass production
Carbon dioxide - topic about fossil fuels, diesel and ecological problems
Solar cell - topic about solar energy
Fuel cell - topic about fuel cells and catalytic activity
X-ray diffraction - topic about properties of light, various spectroscopy and microscopy methods, or, in other words, photonics
Chemical process - topic about processes involving oil
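The automatic naming of topics by their most central element can be illustrated as follows. The centrality measure used here, the summed co-occurrence weight of a term with other terms of the same topic, is an assumption made for the sketch. The function name and the co-occurrence counts are likewise hypothetical.

```python
from collections import defaultdict

def name_topics(cooccurrence, term_topic):
    """Name each topic after its term with the largest within-topic weight."""
    strength = defaultdict(float)
    for (a, b), weight in cooccurrence.items():
        topic_a = term_topic.get(a)
        if topic_a is not None and topic_a == term_topic.get(b):
            strength[a] += weight  # only links inside one topic are counted
            strength[b] += weight
    members = defaultdict(list)
    for term, topic_id in term_topic.items():
        members[topic_id].append(term)
    # Highest strength wins; ties are broken alphabetically.
    return {topic_id: min(terms, key=lambda t: (-strength[t], t))
            for topic_id, terms in members.items()}

# Hypothetical co-occurrence counts and topic ids (illustrative only)
cooccurrence = {
    ("solar cell", "thin film"): 12,
    ("solar cell", "silicon"): 9,
    ("thin film", "silicon"): 3,
    ("fuel cell", "catalyst"): 7,
    ("fuel cell", "membrane"): 5,
    ("solar cell", "catalyst"): 2,  # cross-topic link, ignored for naming
}
term_topic = {
    "solar cell": 0, "thin film": 0, "silicon": 0,
    "fuel cell": 1, "catalyst": 1, "membrane": 1,
}
topic_names = name_topics(cooccurrence, term_topic)
```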
Figure 4: Semantic map of S&T terms for research area “Energy & Fuels” with all terms
Source: HSE Global Trend Monitoring System, calculations based on the abstracts of 10% of the most cited Web of Science articles in the research area “Energy & Fuels”
Topic diagram
Topic diagrams represent the distribution of terms based on their centrality and trending index. Calculations are described in the methodology part.
Figure 5: Scatter plot for the topic “Energy consumption”
Source: HSE Global Trend Monitoring System, calculations based on the abstracts of 10% of the most cited Web of Science articles in the research area “Energy & Fuels”
Figure 6: Scatter plot for the topic “Anode material”
Source: HSE Global Trend Monitoring System, calculations based on the abstracts of 10% of the most cited Web of Science articles in the research area “Energy & Fuels”
Figure 7: Scatter plot for the topic “Hydrogen production”
Source: HSE Global Trend Monitoring System, calculations based on the abstracts of 10% of the most cited Web of Science articles in the research area “Energy & Fuels”
Figure 8: Scatter plot for the topic “Carbon dioxide”
Source: HSE Global Trend Monitoring System, calculations based on the abstracts of 10% of the most cited Web of Science articles in the research area “Energy & Fuels”
Figure 9: Scatter plot for the topic “Solar cell”
Source: HSE Global Trend Monitoring System, calculations based on the abstracts of 10% of the most cited Web of Science articles in the research area “Energy & Fuels”
Figure 10: Scatter plot for the topic “Fuel cell”
Source: HSE Global Trend Monitoring System, calculations based on the abstracts of 10% of the most cited Web of Science articles in the research area “Energy & Fuels”
Figure 11: Scatter plot for the topic “X-ray diffraction”
Source: HSE Global Trend Monitoring System, calculations based on the abstracts of 10% of the most cited Web of Science articles in the research area “Energy & Fuels”
Figure 12: Scatter plot for the topic “Chemical process”
Source: HSE Global Trend Monitoring System, calculations based on the abstracts of 10% of the most cited Web of Science articles in the research area “Energy & Fuels”
S&T trends
Based on the distribution of trending terms across topics, the following topics are developing fastest in the “Energy & Fuels” area:
Solar cell
Anode material
Energy consumption
Fuel cell
Figure 13: “Energy & Fuels” topics' distribution diagram
Source: HSE Global Trend Monitoring System, calculations based on the abstracts of 10% of the most cited Web of Science articles in the research area “Energy & Fuels”
Figure 14: Structure changes of “Energy & Fuels” topics
Source: HSE Global Trend Monitoring System, calculations based on the abstracts of 10% of the most cited Web of Science articles in the research area “Energy & Fuels”
The structure changes demonstrate a rising interest in solar cells and their convergence with studies on catalytic activity and photonics.
Science and STI policy linkages
The following trends were identified and used for the linkage analysis with NSF awards.
Based on the structure changes and the S&T topic diagram, solar cell is one of the most popular topics in the “Energy & Fuels” area. According to NSF data, this area is strongly supported by NSF grants, so this linkage can be called “policy-induced”.
Figure 15: NSF grants for “solar energy” topic, 2000-2015
Source: Analysis performed using NSF Awards Database, data retrieved May 04, 2016 from http://www.nsf.gov/
Notes: the search query is "SOLAR CELL" OR "SOLAR FILM" OR "PHOTOVOLTAIC CELL" OR "PHOTOVOLTAIC FILM" OR "SOLAR ENERGY" OR "SOLAR BATTERY"
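How yearly award statistics could be derived from such an OR query is sketched below. The sketch works on award records that are already available locally and does not call any NSF service. The record fields (`title`, `abstract`, `start_date`), the case-insensitive substring matching and the sample records are assumptions; the actual NSF search may match phrases differently.

```python
from collections import Counter
from datetime import datetime

# Phrases of the OR query quoted in the note above, lower-cased
SOLAR_PHRASES = [
    "solar cell", "solar film", "photovoltaic cell",
    "photovoltaic film", "solar energy", "solar battery",
]

def matches_any(text, phrases):
    """True if at least one phrase occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in phrases)

def awards_per_year(awards, phrases):
    """Count matching awards by the year of their start date (MM/DD/YYYY)."""
    counts = Counter()
    for award in awards:
        text = f"{award['title']} {award.get('abstract', '')}"
        if matches_any(text, phrases):
            year = datetime.strptime(award["start_date"], "%m/%d/%Y").year
            counts[year] += 1
    return dict(sorted(counts.items()))

# Hypothetical award records (illustrative only)
awards = [
    {"title": "Low-cost solar cells from abundant materials",
     "start_date": "08/01/2014"},
    {"title": "Photovoltaic film degradation pathways",
     "start_date": "07/01/2015"},
    {"title": "Wind turbine blade fatigue",
     "start_date": "07/01/2015"},
    {"title": "Grid storage",
     "abstract": "Pairs solar energy with batteries.",
     "start_date": "09/15/2015"},
]
yearly = awards_per_year(awards, SOLAR_PHRASES)
```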
One of the most promising trends in the “solar cell” topic is “perovskite solar cell”.
Figure 16: Trend analysis of “perovskite solar cell”
Source: HSE Global Trend Monitoring System, calculations based on the abstracts of 10% of the most cited Web of Science articles in the research area “Energy & Fuels”
Figure 17: “Perovskite solar cell” on the semantic map of S&T terms
Source: HSE Global Trend Monitoring System, calculations based on the abstracts of 10% of the most cited Web of Science articles in the research area “Energy & Fuels”
Based on the NSF awards data for perovskite solar cells in Table 1, it can be concluded that no policy support existed before the “scientific push”, or breakthrough. Hence, the S&T trend affects further policy making, which is the second type of linkage between Science and Strategy.
Table 1: NSF awards for perovskite solar cells
Award Number | Title | Program(s) | Start Date | End Date | Awarded Amount To Date
1437656 | Surface Analytical Investigation on Organometal Trihalide Perovskite | ENERGY FOR SUSTAINABILITY | 08/01/2014 | 07/31/2017 | $330,000.00
1437230 | Understanding Excitons for Lead-Free Perovskite Photovoltaics | ENERGY FOR SUSTAINABILITY | 08/15/2014 | 07/31/2017 | $329,395.00
1438681 | Organometal Halide Perovskites: Sequential Vapor Deposition And Device Study Toward Highly Efficient Thin-Film Solar Cells | ENERGY FOR SUSTAINABILITY | 09/01/2014 | 08/31/2017 | $345,000.00
1515619 | EAPSI: Illuminating fundamental material properties to enable highly efficient and inexpensive solar cells | EAPSI | 06/01/2015 | 05/31/2016 | $5,070.00
1505535 | Combined Macroscopic and Nanoscopic Studies of the Photovoltaic Behavior of Organic Perovskite Materials | ELECTRONIC/PHOTONIC MATERIALS | 07/01/2015 | 06/30/2018 | $316,306.00
1507351 | Collaborative Research: High Efficiency Tandem Perovskite-Copper Indium Selenide Solar Cell | ELECT, PHOTONICS, & MAG DEVICE | 07/01/2015 | 06/30/2018 | $190,000.00
1507291 | Collaborative Proposal: High Efficiency Tandem Perovskite/CIS Solar Cell | ELECT, PHOTONICS, & MAG DEVICE | 07/01/2015 | 06/30/2018 | $190,000.00
1464735 | Infrared Electro-Optical Spectroscopy of Degradation Pathways in Organo-Halide Perovskite Photovoltaics | Chem Struct,Dynmcs&Mechansms A | 08/01/2015 | 07/31/2018 | $395,396.00
1538893 | RII Track-2 FEC: Low-Cost, Efficient Next-Generation Solar Cells for the Coming Clean Energy Revolution | RESEARCH INFRASTRUCTURE IMPROV | 08/01/2015 | 07/31/2019 | $2,000,000.00
1509955 | High performance Perovskite/CIGS tandem solar cells | OFFICE OF MULTIDISCIPLINARY AC, ELECT, PHOTONICS, & MAG DEVICE, ELECTRONIC/PHOTONIC MATERIALS | 08/01/2015 | 07/31/2018 | $449,819.00
1549917 | CAREER: Scalable Electrospray Processing of High-Efficiency Perovskite Solar Cells | Manufacturing Machines & Equip, NANOMANUFACTURING, Materials Eng. & Processing | 08/10/2015 | 07/31/2020 | $500,000.00
1506504 | Characterization and Control of Structure, Energetics and Electrical Properties at Interfaces between Perovskite Active Layers and Charge-Collection Electrodes | ELECTRONIC/PHOTONIC MATERIALS | 08/15/2015 | 07/31/2017 | $300,000.00
1510121 | SusChEM: Collaborative Research: Hybrid perovskite inspired pathways towards green and stable ionic PV absorbers | ENERGY FOR SUSTAINABILITY | 08/15/2015 | 07/31/2018 | $157,418.00
1510948 | SusChEM: Collaborative Research: Hybrid perovskite inspired pathways towards green and stable ionic PV absorbers | ENERGY FOR SUSTAINABILITY | 08/15/2015 | 07/31/2018 | $252,581.00
1507803 | Femtosecond Microscopy of Charge Transport in Perovskite Thin Films | CONDENSED MATTER PHYSICS | 09/01/2015 | 08/31/2018 | $429,940.00
1639790 | I-Corps: Customer Discovery for Light Weight Photovoltaics | I-Corps | 05/01/2016 | 10/31/2016 | $50,000.00
Source: Analysis performed using NSF Awards Database, data retrieved May 04, 2016 from http://www.nsf.gov/
Notes: the search query is PEROVSKITE AND ("SOLAR CELL" OR "SOLAR FILM" OR "PHOTOVOLTAIC CELL" OR "PHOTOVOLTAIC FILM")
Discussion and Conclusions
Theoretical implications
Science and Technology development is a foundation for the economic growth of any country. Foresight helps to set long-term priorities, develop strategy and suggest STI policy instruments for strategy realization. However, the high cost of R&D in money and time, the uncertainty of its success, the chaotic nature of S&T advancements, technology diffusion and knowledge exchange between different research areas create a demand for periodically interlinking S&T development with Strategy realization and adjusting policy accordingly.
The growing amount of data related to the STI sphere brings both challenges and opportunities. On the one hand, it makes it almost impossible for experts in various research fields to keep track of all the latest advancements. On the other hand, it drives the development of new ways to handle big data and to provide evidence-based STI policy.
The paper develops an approach for linking Science and Strategy through the application of Technology Mining to data sources indicating S&T development and STI policy implementation. It reviews existing TM methods and tools and suggests the processing of various STI-related data sources for the identification of S&T topics and trends.
From a theoretical point of view, one of the implications is the described interconnection between three research fields: Foresight, STI policy and Technology Mining. Foresight acts as a participatory priority-setting framework, STI policy as an instrument for achieving strategic goals, and Technology Mining as a provider of methods and algorithms for processing and analyzing large amounts of data in order to link and monitor S&T development along with STI policy implementation.
Moreover, the paper stresses the importance of trend monitoring as one of the main ways to track S&T development and its structural changes over time. Trends in this sense represent quantitative changes in the dynamics of occurrence of certain phrases in documents (e.g. the rising occurrence of the “solar cell” topic over the last 10 years). Grouping phrases relative to one another by the growth, stagnation or decrease of their occurrence makes it possible to assess and compare S&T topics as clusters of such phrases.
The trend-monitoring algorithm suggested by the paper, applied to the research area of “Energy & Fuels”, demonstrates evidence-based formulation of S&T topics and trends using abstracts of scientific publications as the core source of phrase statistics. Some of the identified trends (e.g. “perovskite solar cell”) were validated using external sources, such as MIT Technology Review.
Furthermore, the paper makes progress in bridging topics and trends identified in different types of data sources (e.g. scientific publications and grants) and in correlating them on the timeline in terms of impact. It became evident that linkages between S&T development and STI policy measures may run in different directions. For example, “perovskite solar cells”, an emerging R&D trend in solar energy that promises high power conversion efficiency and a low cost per film, appeared in scientific publications as a breakthrough. Only after their potential was recognized did policy makers begin to support further development with grants and R&D programmes. In contrast, the majority of “green technologies” are supported by STI policy first, which later leads to various trends in scientific publications.
All of the described results lead to one major conclusion. Strategy affects Science and Technology development, and scientific breakthroughs in turn affect the way the Strategy is realized. STI policy is the way to control and adjust the process of achieving strategic goals. By interlinking the contents of STI policy documents (texts of grants, R&D programmes, etc.) and documents indicating S&T development (scientific publications, patents, conference materials, STI news, analytical reports) through Technology Mining methods and tools, it is possible to keep policy makers and other stakeholders aware of the progress, related challenges, weak signals of new developments and dangerous wild cards.
Practical implications
The paper suggests a wide range of Technology Mining methods and tools for evidence-based identification of Science and Technology topics, development trends and structural changes over time. Based on the trend data from different sources, a bridging algorithm for linking S&T development and STI policy implementation is introduced.
From a practical point of view, the paper pays particular attention to the data sources used for the TM analysis. For spotting S&T development the paper identifies Web of Science, Scopus and similar databases of scientific publications as among the most important sources.
Patent databases are not demonstrated in the form of case studies, but resources such as Orbit FamPat and Derwent Thomson Innovation are listed as potential systemic sources of data.
For the assessment of STI policy, attention is given to the US grant system. More specifically, different sources of funding, such as the National Institutes of Health, Department of Energy, NASA, National Science Foundation and others, are examined with regard to the practical application of TM methods and tools.
Case studies in the area of “Energy & Fuels”, and more specifically solar energy, are performed using the 10% most cited scientific publications and the abstracts of all NSF awards. This provides justification for the TM methods developed and used within the work.
Finally, based on all suggested approaches, the Global Technology Trend Monitoring System (GTMS) was developed under the ownership of the Institute for Statistical Studies and Economics of Knowledge (ISSEK), National Research University Higher School of Economics, as the main practical implication of the work. ISSEK employees now use the system for the analysis of S&T topics and trends in various research areas, the extraction of relevant keywords, the collection of bibliometric statistics and other tasks.
Limitations
The main limitation of the work is its high dependence on the availability, quality and quantity of the data sources used by Technology Mining methods and tools to indicate, separately, S&T development and the implementation of STI policy.
The paper suggests scientific publications, namely the most cited articles in the Web of Science database, as one of the main sources of data indicating scientific advancements. However, even in such systemic databases there is a bias towards the research fields accepted by the journals that are indexed by WoS. In other words, journals play a major role in the choice of topics used later in the analysis. That creates a barrier for new topics to emerge unless they are seriously considered and taken into account by journals' editors and reviewers.
Furthermore, the citation count itself is a rather controversial indicator. On the one hand, it allows choosing the papers with the highest impact on Science. On the other hand, the human factor and networks of scientists are major determinants of the citations made in publications. The more people in a research field are familiar with each other and the more they meet at various conferences, the greater the chance that they read and cite each other's publications. That makes it difficult for new researchers with bright ideas to popularize their work and create real impact.
The time lag between the submission of a scientific work and its publication is also to be considered. Normally it takes up to one year for researchers to pass through the whole reviewing and revision process and have their work publicly available. University working papers partially solve the problem in that case, but are not a complete solution.
Finally, the systemic collection of documents on the implementation of STI policy instruments related to the realization of strategic goals may not always be easy for analysts. In the case of “Energy & Fuels”, only NSF awards were mapped to S&T development. To broaden the work, other types of data also need to be considered, such as R&D programmes and, for example, awards by the Department of Energy.
Further research
Future work will focus on extending the data sources related to either S&T development or STI policy implementation, analyzing the full text of documents rather than only abstracts, and developing ontologies for deeper semantic analysis of relationships in the text.
Data sources will be extended to systemically cover weak signals related to various S&T developments, as well as forecasted market values of future products and technologies. Those sources will include materials of international scientific conferences, which tend to keep up with the newest ideas and developments, and various analytical reports, such as market research, forecasts by consultancies, venture funds and other private companies. Information from listed sources will be mapped with S&T development and STI policy instruments through Technology Mining methods and tools based on similar terminology and topics.