
Transposon recognition by machine learning methods

An overview of machine learning applications for genome data analysis. Molecular medicine and gene therapy. DNA and RNA mobile genetic elements. Possible applications of transposons. Cross-checking of the software algorithm. Recognition of mobile elements.

Category	Programming, computers and cybernetics
Type	Thesis
Language	English
Date added	10.12.2019
File size	1.5 MB

· First, a dinucleotide shuffling algorithm was used to obtain the shuffled sequences. Hence, the dinucleotide composition of shuffled L1 is the same as that of unshuffled L1. The same holds for Alu transposable element sequences and shuffled Alu.

· Second, the 3' untranslated tails of L1 and Alu differ considerably in sequence composition, so they can be distinguished well.
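The first point relies on a defining property of dinucleotide shuffling: the shuffled sequence keeps exactly the same dinucleotide counts as the original. The shuffling tool used in the thesis is not named in this section; the sketch below is an Altschul-Erickson-style shuffle that builds a random Eulerian path through the dinucleotide graph, and all function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def dinucleotide_counts(seq):
    """Count overlapping dinucleotides, e.g. 'ACG' -> {AC: 1, CG: 1}."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def _tree_reaches_root(last_edge, root):
    """True if following every vertex's chosen exit edge always ends at root."""
    for start in last_edge:
        vertex, seen = start, set()
        while vertex != root:
            if vertex in seen:  # a cycle: the exit edges do not form a tree
                return False
            seen.add(vertex)
            vertex = last_edge[vertex]
    return True


def dinucleotide_shuffle(seq, rng=None):
    """Shuffle seq while preserving its dinucleotide counts (hypothetical helper)."""
    rng = rng or random.Random()
    if len(seq) < 3:
        return seq
    # Multigraph with one edge a -> b for every dinucleotide ab in the sequence.
    edges = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        edges[a].append(b)
    first, last = seq[0], seq[-1]
    # Choose a random exit edge for every vertex except the final one, retrying
    # until the exit edges form a tree directed towards the final vertex.
    while True:
        last_edge = {v: rng.choice(out) for v, out in edges.items() if v != last}
        if _tree_reaches_root(last_edge, last):
            break
    # Shuffle each vertex's other edges and put its exit edge at the end.
    order = {}
    for v, out in edges.items():
        rest = list(out)
        if v != last:
            rest.remove(last_edge[v])
        rng.shuffle(rest)
        if v != last:
            rest.append(last_edge[v])
        order[v] = rest
    # Walk the Eulerian path from the first nucleotide, using each edge once.
    result, vertex, used = [first], first, defaultdict(int)
    for _ in range(len(seq) - 1):
        nxt = order[vertex][used[vertex]]
        used[vertex] += 1
        result.append(nxt)
        vertex = nxt
    return "".join(result)
```

Because every dinucleotide edge is used exactly once, the first and last nucleotides and all dinucleotide counts are preserved. A consequence worth keeping in mind: if each sequence is shuffled individually, its per-sequence dinucleotide counts cannot separate it from its own shuffle, and only higher-order features such as trinucleotides can.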

The feature importance analysis (see Figure 12) identified the most significant di- and trinucleotides for each model. Different groups of dinucleotides and trinucleotides turned out to be the most influential for separating the 50 bp 3' untranslated regions of L1 from shuffled sequences and those of Alu from shuffled sequences. Of particular interest in this research is the recognition of the combined set of L1 and Alu 50 bp 3' untranslated regions and the extraction of significant features common to this set. For the 50 bp 3'-end sequences, these common features are mostly trinucleotides rather than dinucleotides, and mostly GC-rich ones, which reflects a compositional bias at the 3'-ends of both transposable elements.
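The di- and trinucleotide features discussed above can be produced by counting overlapping k-mers. A minimal sketch, assuming raw counts of all 16 dinucleotides and 64 trinucleotides as features; the helper names and the GC-rich cut-off (at least two G/C bases) are hypothetical. Such vectors would then be passed to a classifier whose feature importances can be inspected as in Figure 12.

```python
from itertools import product


def kmer_feature_names(ks=(2, 3), alphabet="ACGT"):
    """All k-mers for the given k values in a fixed order: 16 + 64 = 80 names."""
    return ["".join(p) for k in ks for p in product(alphabet, repeat=k)]


def kmer_features(seq, ks=(2, 3), alphabet="ACGT"):
    """Overlapping k-mer counts of seq, keyed by k-mer (hypothetical helper)."""
    seq = seq.upper().replace("U", "T")  # treat RNA input as DNA letters
    counts = dict.fromkeys(kmer_feature_names(ks, alphabet), 0)
    for k in ks:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in counts:  # skip k-mers containing N or other symbols
                counts[kmer] += 1
    return counts


def gc_rich(kmer, min_gc=2):
    """Assumed definition: a k-mer is GC-rich if it has at least min_gc G/C bases."""
    return sum(base in "GC" for base in kmer) >= min_gc
```

For example, `kmer_features("ACGTAC")` yields AC = 2 and one count each for CG, GT, TA, ACG, CGT, GTA and TAC, with the remaining 72 features equal to zero.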

Figure 12. Feature importance heatmap of 50 base pairs model

4.3 Stem-loop statistical models

The second model relies on sequence characteristics extracted from 3'-end stem-loops. Shuffled sequences were used as the alternative class.

The Receiver Operating Characteristic (ROC) curves of the models that use only sequence characteristics to recognize 3'-end stem-loops are presented in Figure 13. All models separating L1 3' untranslated region stem-loops, Alu 3'-end stem-loops, or their combined set from stem-loops of shuffled sequences achieved AUC >= 0.96. In general, the sequence-based models for stem-loop recognition perform almost as well as the models recognizing the 50 bp ends of mobile genetic elements, as expected.
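To make the AUC >= 0.96 threshold concrete: the area under the ROC curve equals the probability that a randomly chosen positive example (e.g. an L1 stem-loop) receives a higher model score than a randomly chosen negative one (a shuffled stem-loop). A minimal pure-Python sketch of this pairwise (Mann-Whitney) formulation; the function name is hypothetical, and libraries such as scikit-learn provide a vectorised equivalent.

```python
def roc_auc(pos_scores, neg_scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as one half. This O(n*m) form is for illustration only.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Three of the four positive/negative pairs are ordered correctly here.
print(roc_auc([0.9, 0.8], [0.1, 0.85]))  # 0.75
```

An AUC of 0.96 therefore means that about 96% of positive/negative pairs are ranked in the right order by the model score.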

Figure 13. Receiver Operating Characteristic of stem-loop sequence based model.

The ROC AUC of the stem-loop sequence models is slightly lower than that of the 50 bp sequence-based models. This could be explained by the fact that a stem-loop sequence is only 20-30 bases long, considerably shorter than the 50 bases used by the previous set of models, so its k-mer statistics are estimated from fewer positions.
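The length argument can be quantified by counting overlapping k-mer windows: a sequence of length L contains L - k + 1 overlapping k-mers, while there are 4^k possible k-mers. Taking 25 nt as an illustrative stem-loop length within the 20-30 nt range:

```latex
N_k(L) = L - k + 1, \qquad N_3(25) = 23, \qquad N_3(50) = 48, \qquad 4^3 = 64
```

A 25 nt stem-loop can therefore populate at most 23 of the 64 trinucleotide features, compared with 48 for a 50 bp end, so its k-mer profile is sparser and noisier.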

As the Precision-Recall plot (Figure 14) shows, all models keep both precision and recall high, i.e. they make few false positive and few false negative predictions. The training datasets are well balanced.

Figure 14. Precision-Recall of stem-loop sequence based model.

The feature importance analysis (see Figure 15) showed that the top-10 most significant features are mostly dinucleotides, both when L1 and Alu 3' untranslated region stem-loops are recognized jointly and when they are recognized separately. Among them are complementary dinucleotide pairs such as CA-GT and TC-GA for L1 versus shuffled, and AG-CT and TC-GA for Alu versus shuffled, reflecting internal compositional constraints of the analyzed sequences.
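The coupling between complementary dinucleotides follows from base pairing: in a perfectly paired stem, the 3' arm read 5'->3' is the reverse complement of the 5' arm, so every dinucleotide on one arm is matched by its reverse complement on the other (e.g. TC with GA, AG with CT). A minimal sketch, assuming DNA letters and an idealised stem without bulges, mismatches or wobble pairs; the helper names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string, e.g. 'TC' -> 'GA'."""
    return seq.translate(COMPLEMENT)[::-1]


def dinucleotide_counts(seq):
    """Count overlapping dinucleotides in seq."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def arm_counts_mirror(left_arm, right_arm):
    """True if each dinucleotide on the left arm is matched, count for count,
    by its reverse complement on the right arm (both arms read 5'->3')."""
    left = dinucleotide_counts(left_arm)
    right = dinucleotide_counts(right_arm)
    return all(right[reverse_complement(d)] == n for d, n in left.items())


left = "GCATCAG"
right = reverse_complement(left)  # the perfectly paired opposite arm
print(right, arm_counts_mirror(left, right))  # CTGATGC True
```

In real stem-loops, bulges and non-canonical pairs break this symmetry, so the counts of complementary dinucleotides are correlated rather than identical.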

Some features, such as the CA, GA and AC dinucleotides, have high importance in all experiments. There are also clusters of features; for example, all transposon stem-loops clearly differ from shuffled sequences in the count of TGC triplets.

Figure 15. Feature importance heatmap of stem-loop sequence-based models.

4.4 Physical, chemical and structural property-based models

The last set of machine learning models was built on the structural and physical properties of a stem, taking bulges and loops into account. These models use position-specific properties of the nucleotide pairs of a stem, including geometrical features (twist, tilt, roll, rise, shift and slide) and physical and chemical features (entropy, enthalpy, Gibbs free energy and hydrophilicity). For loops, position-specific sequence composition was considered for the first five loop positions only, giving 20 binary features that describe a loop (5 positions × 4 nucleotides). A bulge of 3 nucleotides on the left and/or right arm of the stem (if present) was also considered: all bulge nucleotide positions were fixed, and 24 binary features were used to characterize it. The results of the modelling are shown in Table 1.
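The loop and bulge encoding described above amounts to position-specific one-hot vectors: 5 loop positions × 4 nucleotides = 20 features, and 2 stem arms × 3 bulge positions × 4 nucleotides = 24 features. A minimal sketch; splitting the 24 bulge features into left and right arms is our reading of the text, and all function and feature names are hypothetical.

```python
NUCLEOTIDES = "ACGT"


def one_hot_positions(segment, n_positions, prefix):
    """Position-specific one-hot features: n_positions x 4 binary values.

    Positions beyond the end of segment (a short or absent bulge) stay 0.
    """
    segment = segment.upper().replace("U", "T")  # accept RNA letters
    features = {}
    for pos in range(n_positions):
        base = segment[pos] if pos < len(segment) else None
        for nt in NUCLEOTIDES:
            features[f"{prefix}{pos}_{nt}"] = int(base == nt)
    return features


def loop_bulge_features(loop, left_bulge="", right_bulge=""):
    """20 loop features (first 5 loop positions) plus 24 bulge features
    (up to 3 positions on each stem arm)."""
    features = one_hot_positions(loop, 5, "loop")
    features.update(one_hot_positions(left_bulge, 3, "bulgeL"))
    features.update(one_hot_positions(right_bulge, 3, "bulgeR"))
    return features
```

For example, `loop_bulge_features("TTCAGA", left_bulge="AG")` returns 44 binary features: loop positions 0-4 encode T, T, C, A and G (the sixth loop base is ignored), the left bulge sets `bulgeL0_A` and `bulgeL1_G`, and all right-bulge features are 0.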

Figure 16. Receiver Operating Characteristic of physical, chemical and structure properties-based models.

The ROC curves in Figure 16 show that all classes are clearly distinguishable. The Precision-Recall plots in Figure 17 indicate that the classes are well balanced and that the models keep both precision and recall high, with few false positive and few false negative predictions.

Although some ROC AUC values of the stem-loop models based on structural, physical and chemical parameters are slightly lower than those of the previous models, using this predictor set can be justified by the much higher interpretability of the chosen characteristics.

Figure 17. Precision-Recall of physical, chemical and structure properties-based models.

Feature importance analysis of these models showed that L1 and Alu 3'-end stem-loops are distinguished from shuffled stem-loops using almost completely different sets of top-10 characteristics. Analysis of the combined set of L1 and Alu 3' untranslated region stem-loops revealed the properties that are significant for both transposon families.

The structural characteristics roll, shift and tilt turned out to be the most significant parameters for distinguishing L1 3' untranslated region stem-loops from stem-loops of shuffled sequences (shuffled stem-loops), together with two stem positions counting from the loop: LS0, next to the loop, and LS3, next to the bulge (see Figure 18). For distinguishing Alu stem-loops from shuffled stem-loops, many parameters were identified as important at the first stem nucleotide position counting from the loop (position LS0 in Figure 19). The most important features also include characteristics of different classes: energy features such as enthalpy and free energy, the geometric characteristics rise and tilt, and hydrophilicity. The top-10 most important parameters for the joint set of L1 and Alu 3' untranslated region stem-loops also include the geometric parameter rise at two stem positions, tilt, shift, roll, and hydrophilicity at four stem positions (Figure 19).

Considering only positions with at least one important parameter, the first stem position was important for three parameters: rise, hydrophilicity and tilt (position LS0 in Figure 19); the fourth stem position (LS3) was important for two features: shift and roll. Four of the ten significant parameters for recognizing the combined set of L1 and Alu 3' untranslated region stem-loops are hydrophilicity values at stem positions close to the loop (LS0, LS1) and close to the stem base (LS6, LS7).

Figure 18. Feature importance heatmap of physical, chemical and structure properties-based models.

Figure 19. Top-10 most important features of physical, chemical and structural property-based models.

A short summary of the parameters and positions found significant across all discussed experiments is presented in Figure 20. The most frequent parameter is hydrophilicity (11 occurrences), followed by rise (9), loop-specific positions (9), tilt (6) and shift (5). Three stem regions were highlighted as significant: the first dinucleotide position next to the loop (LS0, 17 times), the position close to the bulge (LS3, 10 times), and the positions near the stem base (LS6, LS7; 7 times).
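Summary counts like those in Figure 20 can be obtained by tallying parameter names and stem positions over the top-10 lists of all experiments. A minimal sketch, assuming features are named `<parameter>_<position>` (e.g. `hydrophilicity_LS0`); this naming scheme, the function name and the toy lists in the example are hypothetical and do not reproduce the thesis data.

```python
from collections import Counter


def tally_top_features(top_lists):
    """Count how often each parameter and each position occurs across the
    top-10 feature lists of several experiments.

    Every name must contain at least one underscore; the part after the last
    underscore is taken as the position, e.g. 'free_energy_LS0'.
    """
    params, positions = Counter(), Counter()
    for features in top_lists:
        for name in features:
            param, position = name.rsplit("_", 1)
            params[param] += 1
            positions[position] += 1
    return params, positions


toy_tops = [
    ["hydrophilicity_LS0", "rise_LS0", "shift_LS3"],
    ["hydrophilicity_LS6", "free_energy_LS0", "tilt_LS0"],
]
params, positions = tally_top_features(toy_tops)
print(params.most_common(1), positions["LS0"])  # [('hydrophilicity', 2)] 4
```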

Figure 20. Top-10 most important features of physical, chemical and structural property-based models.

As Figure 21 shows, the most important features are concentrated in several dense regions along the stem-loop. The first stem nucleotide (LS0), i.e. the nucleotide next to the loop, is considered the most important by almost all discussed models. The fourth position (LS3) is considered significant by 5 of the 6 models. The models involving Alu mobile genetic elements pointed to the LS6-LS7 and LS0 coordinates, which suggests that some characteristics are specific to this transposon family.

Figure 21. Top-10 most important features of physical, chemical and structural property-based models mapped to positions of stem-loops.

As Figure 22 shows, the most important features mapped to specific stem-loop positions cluster into several groups: for example, the first three experiments revealed the importance of position LS3, while the experiments that included Alu sequences highlighted the significance of other positions. It is also evident that bulge positions do not appear among the top-10 significant features in the experiments that included Alu transposable elements.

Figure 22. Most important features of physical, chemical and structural property-based models mapped to positions of stem-loops heatmap plot.

4.5 Combined results analysis

Table 1. Recognition of 3'-ends and 3'-end stem-loops of L1 and Alu sequences.

Class 1	Class 2	AUC 50bp	AUC SL1	AUC SL2	Accuracy 50bp	Accuracy SL1	Accuracy SL2	Precision 50bp	Precision SL1	Precision SL2	Recall 50bp	Recall SL1	Recall SL2
Alu 3'-end	shuffled	0.99	0.99	0.99	0.99	0.97	0.98	0.99	0.98	0.99	0.99	0.96	0.97
Alu 3'-end	L1 3' UTR	0.99	0.99	0.99	0.99	0.99	0.99	0.99	0.99	0.99	0.99	0.99	0.99
Alu 3'-end + L1 3' UTR	shuffled	0.99	0.98	0.99	0.98	0.95	0.97	0.98	0.97	0.99	0.98	0.93	0.94
L1 3' UTR	shuffled	0.99	0.97	0.98	0.98	0.93	0.94	0.98	0.96	0.98	0.98	0.90	0.90
L1 3' UTR	L1 5' UTR	1.00	0.99	0.99	0.99	0.96	0.98	0.99	0.96	0.98	0.99	0.97	0.98
L1 5' UTR + L1 3' UTR	shuffled	0.99	0.98	0.98	0.97	0.93	0.94	0.97	0.96	0.99	0.96	0.90	0.89

Note: 50bp denotes models for recognizing 50 base pairs 3'-ends, SL1 denotes models for recognizing 3'-end stem-loops with sequence-based models, and SL2 denotes models for recognizing 3'-end stem-loops with physical, chemical and structure-based models.
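The Accuracy, Precision and Recall columns of Table 1 follow the standard definitions, with Class 1 taken as positive: Accuracy = (TP + TN) / (TP + TN + FP + FN), Precision = TP / (TP + FP), Recall = TP / (TP + FN). A minimal pure-Python sketch; the 0.5 decision threshold and the function name are assumptions, since this section does not state the threshold used.

```python
def threshold_metrics(y_true, scores, threshold=0.5):
    """Accuracy, precision and recall of the predictions score >= threshold.

    y_true holds 1 for Class 1 (e.g. Alu 3'-end) and 0 for Class 2
    (e.g. shuffled).
    """
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


print(threshold_metrics([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
# {'accuracy': 0.5, 'precision': 0.5, 'recall': 0.5}
```

Unlike AUC, these three metrics depend on the chosen threshold, which is one reason why the AUC and accuracy columns can differ for the same model.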

SUMMARY

This chapter reviews the conducted experiments. It presents Receiver Operating Characteristic curves, Precision-Recall plots and feature importance heatmaps, which indicate that all the sequence classes used are clearly distinguishable.

It was also shown that five feature groups are most frequently significant: hydrophilicity (11 occurrences), followed by rise (9), loop-specific positions (9), tilt (6) and shift (5). One of the key results of these experiments is the set of common most important features that could be used to separate L1 and Alu transposons from shuffled sequences.

DISCUSSION

The main goal of this research was to explore the ability of machine-learning models to recognize the 3' UTRs and 3' UTR stem-loops of the most active transposons in the human genome, L1 and Alu, both separately and as a joint set, and to distinguish L1 from Alu and the L1 5'-end from the L1 3'-end. Two types of machine-learning models were constructed using different types of features: sequence composition, and chemical, physical and geometrical properties of nucleotide pairs derived from experimental data. Sequence-based models consider the nucleotide composition of a whole sequence, while structure-based models use only the structural properties of a stem and position-specific characteristics of loops and bulges.

Strictly speaking, the so-called structure-based model also uses sequence information, mapped to a different alphabet of dinucleotide properties. All these properties were derived experimentally, but the experiments were done for a narrow class of RNA duplexes. Obtaining the structural properties of RNA dinucleotides required analyzing X-ray crystal structures of RNA duplexes and filtering out structures containing proteins, drugs, mismatches, overhanging bases or unusual bases in canonical Watson-Crick pairs [42]. The authors of [42] analyzed the remaining structures at the canonical base-pair level. The resulting geometrical parameters (shift, slide, rise, tilt, roll and twist) correspond to averages over static crystallographic structures of naked RNA. As noted in [42], filtering removed a considerable amount of experimental data, but it ensures that the perturbations stay within the harmonic limit.

The enthalpy, entropy and free energy values used in this research were taken from [43]. These parameters were obtained from optical melting experiments on RNA duplexes and take the composition of duplex ends into account. Hydrophilicity values for the 16 dinucleotides were also determined experimentally, as described in [44].
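In the nearest-neighbour model of [43], the free energy (and likewise enthalpy and entropy) of a Watson-Crick duplex is approximated as an initiation term plus a sum of terms for adjacent base-pair stacks. A minimal sketch of this additive structure for a fully paired stem; the numbers in `TOY_STACKS` are hypothetical placeholders, not the values published in [43], and terminal-pair and symmetry corrections are omitted.

```python
# Hypothetical stacking free energies (kcal/mol), keyed by the 5' arm
# dinucleotide of each stack. Placeholders only, not the values from [43].
TOY_STACKS = {"GC": -3.0, "CA": -2.0, "AU": -1.0}


def stem_energy(stem_5p, table, initiation=0.0):
    """Initiation term plus the sum of nearest-neighbour stacking terms over
    the 5' arm of a stem without bulges or mismatches."""
    total = initiation
    for i in range(len(stem_5p) - 1):
        total += table[stem_5p[i:i + 2]]
    return total


print(stem_energy("GCAU", TOY_STACKS))                  # -6.0
print(stem_energy("GCAU", TOY_STACKS, initiation=4.0))  # -2.0
```

Because the model is additive over stacks, the same dinucleotide tables can also be used position by position, which is how per-position energy features can be derived.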

The selected features correspond to dinucleotides, so the model essentially uses a numerical mapping of sequence information; nevertheless, this approach has two key advantages:

· Machine-learning algorithms tend to work better with continuous numerical values than with one-hot encodings or integer counts derived from k-mer statistics.

· Machine-learning algorithms can capture tendencies related to adjacent nucleotide pairs, which makes the resulting models interpretable.
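The numerical mapping described above replaces each dinucleotide step of the stem by experimentally derived property values, giving one feature per property and position. A minimal sketch; the orientation (LS0 as the step next to the loop), the feature names and the values in `TOY_TABLES` are hypothetical placeholders rather than the data from [42]-[44].

```python
# Hypothetical per-dinucleotide property tables (placeholder values only).
TOY_TABLES = {
    "rise": {"GC": 3.3, "CA": 3.1, "AU": 3.2},
    "hydrophilicity": {"GC": 0.1, "CA": 0.2, "AU": 0.3},
}


def stem_property_features(stem_5p, property_tables):
    """Map each dinucleotide step of the stem to its property values.

    stem_5p is the 5' arm read 5'->3', so its last step is next to the loop
    and becomes position LS0 (an assumed convention for this sketch).
    """
    steps = [stem_5p[i:i + 2] for i in range(len(stem_5p) - 1)]
    steps.reverse()  # LS0 = step closest to the loop
    features = {}
    for name, table in property_tables.items():
        for pos, step in enumerate(steps):
            features[f"{name}_LS{pos}"] = table[step]
    return features


print(stem_property_features("GCAU", TOY_TABLES)["rise_LS0"])  # 3.2
```

Each feature then carries a physical meaning (for example, the rise of the step next to the loop), which is what makes the importance rankings in Figures 18-22 interpretable.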

Both types of models showed comparable performance (AUC >= 0.96) in all experiments, but feature importance analysis of the structure-based models highlighted the characteristics most significant for L1 and Alu 3'-end stem-loops: shift, rise, tilt and hydrophilicity.

The crystal structure of proteins bound to a stem-loop showed that the entire stem-loop and its flanking sequence regions take part in binding [45]. It was shown experimentally that the proteins recognize the shape of the stem-loop rather than its exact nucleotides, although the sequence determines that shape. Recognition specificity is provided by nucleotides in the loop: the first (U) and the third (U) nucleotides are highly conserved.

Here, we examined RNA structural properties of the stem-loops at the 3' UTRs of L1 and Alu sequences with different machine learning models. Both stem and loop parameters appeared among the top-10 most significant features, whether the task was recognizing L1 3'-end stem-loops only, Alu 3'-end stem-loops only, or the joint set of Alu and L1 elements.

The geometrical parameters shift, slide and rise are translational base-pair step parameters, as opposed to the rotational ones (tilt, roll and twist). These parameters affect the width and height of double-stranded RNA and therefore the dimensions of the stem. All of them influence the flexibility of A-RNA, which, in the context of transposon stem-loop recognition, is subject to evolutionary selection. The fact that energy parameters of the dinucleotides next to the loop were important for Alu stem-loops, but did not appear as significant for L1 stem-loops, reflects a structural peculiarity of the Alu family.

CONCLUSION

In this Master's thesis, data processing and machine learning pipelines were proposed to explore whether machine-learning models that use sequence and structure information about 3'-ends can distinguish mobile genetic element families from shuffled sequences and from each other. Two types of machine-learning models were constructed, using different sets and types of features, to test the hypothesis that transposon 3'-ends are important for active retrotransposition. The proposed models revealed features that are important for the recognition of transposable elements, and these characteristics may also be significant for transposon recognition by the cellular machinery.

The analysis and the obtained results could potentially be used to build next-generation genome editing techniques. They can also guide further experiments, including in vitro testing, aimed at revealing details of the transposition mechanism. The global personalized medicine market is one of the fastest-growing sectors of the economy, and it relies heavily on modern data processing techniques. The experimental pipeline presented in this Master's thesis is an attempt to build an analytical system for transposon recognition.

Bibliography

[1]. “Global Precision Medicine Market: Analysis and Forecast 2017-2026.”

[2]. “(PDF) Survey on the Emergence of Big Data,” ResearchGate. [Online]. Available: https://www.researchgate.net/publication/323116268_Survey_on_the_Emergence_of_Big_Data. [Accessed: 07-Apr-2019].

[3]. J. D. Watson and F. H. C. Crick, “Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid,” Nature, vol. 171, no. 4356, p. 737, Apr. 1953.

[4]. Mouse Genome Sequencing Consortium et al., “Initial sequencing and comparative analysis of the mouse genome,” Nature, vol. 420, no. 6915, pp. 520-562, Dec. 2002.

[5]. P. Deininger, “Alu elements: know the SINEs,” Genome Biol., vol. 12, no. 12, p. 236, Dec. 2011.

[6]. “Going non-viral: the Sleeping Beauty transposon system breaks on through to the clinical side: Critical Reviews in Biochemistry and Molecular Biology: Vol 52, No 4.” [Online]. Available: https://www.tandfonline.com/doi/full/10.1080/10409238.2017.1304354. [Accessed: 13-Apr-2019].

[7]. D. Bouard, N. Alazard-Dany, and F.-L. Cosset, “Viral vectors: from virology to transgene expression,” Br. J. Pharmacol., vol. 157, no. 2, pp. 153-165, May 2009.

[8]. K. Hu et al., “High-performance gene expression and knockout tools using sleeping beauty transposon system,” Mob. DNA, vol. 9, no. 1, p. 33, Nov. 2018.

[9]. “Enhancement of adenovirus infection and adenoviral vector-mediated gene delivery by bromodomain inhibitor JQ1 | Scientific Reports.” [Online]. Available: https://www.nature.com/articles/s41598-018-28421-x. [Accessed: 13-Apr-2019].

[10]. “CRISPR/Cas9-Mediated Genome Editing-Challenges and Opportunities.” [Online]. Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6042012/. [Accessed: 14-Apr-2019].

[11]. M. Cassandri et al., “Zinc-finger proteins in health and disease,” Cell Death Discov., vol. 3, p. 17071, Nov. 2017.

[12]. F. D. Urnov et al., “Highly efficient endogenous human gene correction using designed zinc-finger nucleases,” Nature, vol. 435, no. 7042, pp. 646-651, Jun. 2005.

[13]. C. R. L. Huang, K. H. Burns, and J. D. Boeke, “Active transposition in genomes,” Annu. Rev. Genet., vol. 46, pp. 651-675, 2012.

[14]. C. Feschotte and E. J. Pritham, “DNA Transposons and the Evolution of Eukaryotic Genomes,” Annu. Rev. Genet., vol. 41, pp. 331-368, 2007.

[15]. B. Chenais, “Transposable elements in cancer and other human diseases,” Curr. Cancer Drug Targets, vol. 15, no. 3, pp. 227-242, 2015.

[16]. R. Cordaux and M. A. Batzer, “The impact of retrotransposons on human genome evolution,” Nat. Rev. Genet., vol. 10, no. 10, pp. 691-703, Oct. 2009.

[17]. D. C. Hancks and H. Kazazian, “SVA retrotransposons: Evolution and genetic instability,” Semin. Cancer Biol., vol. 20, no. 4, pp. 234-245, Aug. 2010.

[18]. Y. Hayashi, M. Kajikawa, T. Matsumoto, and N. Okada, “Mechanism by which a LINE protein recognizes its 3' tail RNA,” Nucleic Acids Res., vol. 42, no. 16, pp. 10605-10617, Sep. 2014.

[19]. D. Grechishnikova and M. Poptsova, “Conserved 3' UTR stem-loop structure in L1 and Alu transposons in human genome: possible role in retrotransposition,” BMC Genomics, vol. 17, no. 1, p. 992, Dec. 2016.

[20]. M. Petrillo, G. Silvestro, P. P. Di Nocera, A. Boccia, and G. Paolella, “Stem-loop structures in prokaryotic genomes,” BMC Genomics, vol. 7, p. 170, Jul. 2006.

[21]. G. G. Schumann, N. V. Fuchs, P. Tristán-Ramos, A. Sebe, Z. Ivics, and S. R. Heras, “The impact of transposable element activity on therapeutically relevant human stem cells,” Mob. DNA, vol. 10, no. 1, p. 9, Mar. 2019.

[22]. “The Role of Retrotransposons in Gene Family Expansions in the Human and Mouse Genomes.” [Online]. Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5631067/. [Accessed: 15-Apr-2019].

[23]. P.-É. Jacques, J. Jeyakani, and G. Bourque, “The majority of primate-specific regulatory sequences are derived from transposable elements,” PLoS Genet., vol. 9, no. 5, p. e1003504, May 2013.

[24]. P. Larranaga, Machine learning in bioinformatics, vol. 7. 2006.

[25]. G. Louppe, L. Wehenkel, A. Sutera, and P. Geurts, “Understanding variable importances in forests of randomized trees,” in Advances in Neural Information Processing Systems 26, C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2013, pp. 431-439.

[26]. T. K. Ho, “Random Decision Forests,” in Proceedings of the Third International Conference on Document Analysis and Recognition (Volume 1) - Volume 1, Washington, DC, USA, 1995, pp. 278-.

[27]. P. H. Russell, R. L. Johnson, S. Ananthan, B. Harnke, and N. E. Carlson, “A large-scale analysis of bioinformatics code on GitHub,” PLoS ONE, vol. 13, no. 10, Oct. 2018.

[28]. B. Ekmekci, C. E. McAnany, and C. Mura, “An Introduction to Programming for Bioscientists: A Python-Based Primer,” PLoS Comput. Biol., vol. 12, no. 6, p. e1004867, 2016.

[29]. D. Chicco, “Ten quick tips for machine learning in computational biology,” BioData Min., vol. 10, Dec. 2017.

[30]. H. Fang, Y.-F. Huang, A. Radhakrishnan, A. Siepel, G. J. Lyon, and M. C. Schatz, “Scikit-ribo enables accurate estimation and robust modeling of translation dynamics at codon resolution,” Cell Syst., vol. 6, no. 2, pp. 180-191.e4, Feb. 2018.

[31]. A. Abraham et al., “Machine learning for neuroimaging with scikit-learn,” Front. Neuroinformatics, vol. 8, Feb. 2014.

[32]. S. van der Walt et al., “scikit-image: image processing in Python,” PeerJ, vol. 2, Jun. 2014.

[33]. “Python Data Analysis Library -- pandas: Python Data Analysis Library.” [Online]. Available: https://pandas.pydata.org/. [Accessed: 28-Apr-2019].

[34]. “Matplotlib: Python plotting -- Matplotlib 3.0.3 documentation.” [Online]. Available: https://matplotlib.org/. [Accessed: 02-May-2019].

[35]. “seaborn: statistical data visualization -- seaborn 0.9.0 documentation.” [Online]. Available: https://seaborn.pydata.org/. [Accessed: 02-May-2019].

[36]. H. Khan, A. Smit, and S. Boissinot, “Molecular evolution and tempo of amplification of human LINE-1 retrotransposons since the origin of primates,” Genome Res., vol. 16, no. 1, pp. 78-87, Jan. 2006.

[37]. A. L. Price, E. Eskin, and P. A. Pevzner, “Whole-genome analysis of Alu repeat elements reveals complex evolutionary history,” Genome Res., vol. 14, no. 11, pp. 2245-2252, Nov. 2004.

[38]. B. Alipanahi, A. Delong, M. T. Weirauch, and B. J. Frey, “Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning,” Nat. Biotechnol., vol. 33, no. 8, pp. 831-838, Aug. 2015.

[39]. “DNA punctuation.” [Online]. Available: http://www.dnapunctuation.org/. [Accessed: 27-Apr-2019].

[40]. “Dinucleotide property database.” [Online]. Available: http://diprodb.fli-leibniz.de/. [Accessed: 27-Apr-2019].

[41]. “Project Jupyter.” [Online]. Available: https://www.jupyter.org. [Accessed: 28-Apr-2019].

[42]. A. Pérez, A. Noy, F. Lankas, F. J. Luque, and M. Orozco, “The relative flexibility of B-DNA and A-RNA duplexes: database analysis,” Nucleic Acids Res., vol. 32, no. 20, pp. 6144-6151, 2004.

[43]. T. Xia et al., “Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick base pairs,” Biochemistry, vol. 37, no. 42, pp. 14719-14735, Oct. 1998.

[44]. I. Barzilay, J. L. Sussman, and Y. Lapidot, “Further studies on the chromatographic behaviour of dinucleoside monophosphates,” J. Chromatogr., vol. 79, pp. 139-146, May 1973.

[45]. D. Tan, W. F. Marzluff, Z. Dominski, and L. Tong, “Structure of histone mRNA stem-loop, human stem-loop binding protein and 3'hExo ternary complex,” Science, vol. 339, no. 6117, pp. 318-321, Jan. 2013.

Appendix.

Scripts for merging tables

# Developed by AlexShein 04.2018
# Merges several ';'-separated tables (e.g. processed *.pal files) into one file.

import argparse
import logging

import pandas as pd

log = logging.getLogger('parallel_processing_v2.py')
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')


def merge_dataframes(files):
    """Read every file from a comma-separated list of names and concatenate the tables."""
    dataframes = []
    for filename in files.split(','):
        dataframes.append(pd.read_csv(filename, sep=';'))
    return pd.concat(dataframes)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description='Merge processed *.pal files.',
        usage='python3 ./merge_dataframes.py -output_file merged.csv -files target.csv,non_target.csv',
    )
    parser.add_argument(
        '-output_file',
        dest='output_file',
        help='Name of file to store results',
        required=True,
    )
    parser.add_argument(
        '-files',
        dest='files',
        help='Names of files to merge, comma separated',
        required=True,
    )
    args = parser.parse_args()

    log.info("Starting merge")
    result_df = merge_dataframes(args.files)

    # Drop the index column written by an earlier to_csv() call, if it is present.
    result_df = result_df.drop(columns=['Unnamed: 0'], errors='ignore')

    log.info("Writing to file")
    result_df.to_csv(args.output_file, sep=';')
    log.info("Done")
