Database on biodiversity: A temporal analysis of scientific production

RC: 127632

BIBLIOMETRIC REVIEW

TEDESCHI, Victor Hugo Pancera ^[1], TSUNODA, Denise Fukumi ^[2]

TEDESCHI, Victor Hugo Pancera. TSUNODA, Denise Fukumi. Database on biodiversity: A temporal analysis of scientific production. Revista Científica Multidisciplinar Núcleo do Conhecimento. Year 05, Ed. 09, Vol. 06, p. 68-81. September 2020. ISSN: 2448-0959, Access link: https://www.nucleodoconhecimento.com.br/biology/scientific-production

ABSTRACT

This research examines biodiversity databases through a bibliographic survey of the Scopus database, using search terms in English and keyword limiters to restrict the results to database-related documents. The documents were analyzed bibliometrically using Biblioshiny (the graphical interface of the Bibliometrix R package). A total of 352 documents published between 1984 and 2020 were retrieved. The analysis showed an increase in publications from 2006 onward. Researchers from the United States (54) produced the largest number of publications, while Brazil (11) ranks sixth. The results point to trends in work on the subject and thus provide a direction for future research.

Keywords: Taxonomic database, Biodiversity Informatics, Dataset, Bibliometrics.

1. INTRODUCTION

Biological diversity on the planet is very high, and some estimates put the number of species in the millions. Brazil, according to conservative estimates, is home to 13% of the world’s biota. This estimate is attributed to Brazil having the largest river system in the world as well as five biomes, which earns the country the title of “megadiverse country”. It has approximately 165,000 known species, and more are being discovered every day.

Each discovery about a species generates data, continuously and cumulatively, on its characteristics (morphology, nomenclature, phylogeny, etc.), habits (feeding, behavior, etc.), geographic distribution (sightings, collection records, etc.) and genetics (phylogenetics, DNA sequencing, etc.), among others.

All this information needs to be stored in a way that facilitates access and sharing worldwide, which justifies the development and maintenance of databases that store it correctly, as well as retrieval systems specific to the area. With this in mind, the objective of the present work is to explore, through a bibliometric analysis, the production on the subject indexed in Scopus, a reference database that covers peer-reviewed academic titles, open access titles, conference proceedings and trade publications, among others.

2. THEORETICAL FOUNDATION

A database can be defined as a logically coherent collection of data with an inherent meaning, whose interpretation depends on the application; this collection of data abstractly represents a part of the real world.

Biological information can be divided into three dimensions: molecular, organism and ecosystem (WILSON, 2005). According to its application, this information is arranged in three structural types of databases: taxonomic databases (organism), which focus on presenting the morphological information of species; Biodiversity Informatics databases (ecosystem), which are transdisciplinary in character and provide ecological and geographic distribution information; and Bioinformatics databases (molecular), which are aimed at the storage and distribution of molecular data, genes and proteins.

The information that feeds these databases comes from the digitization of museum collections, collection reports and inventory lists, from data extracted from published documents, from the sequencing of materials, etc. The results of this digitization can be stored or shared in a variety of formats, such as text documents, spreadsheets, web pages, relational databases, maps or GIS (Geographic Information System), images, etc. (FRAZIER; WALL; GRANT, 2008).

The digitization of these materials is vital for preserving and sharing species information, but it brings new challenges, such as the reliability of the digitized data. On this point, Ruas (2017) highlights the importance of metadata, that is, information that describes the information contained in the database. In curating the content generated by digitization, metadata guarantee the fidelity and authenticity of the information presented and record the context in which it was captured.

3. METHODOLOGICAL COURSE

The terms used in this research were chosen by experimenting with different sets of words and Boolean operators in the Scopus database. The research flow (from general to specific levels) and the number of documents retrieved in each experiment are shown in Table 1.

Table 1 – Sets of words and Boolean operators used in the Scopus database and the number of documents retrieved

Words and operators	Documents retrieved
ALL ( biodiversity AND database )	91,714
ALL ( taxonomy AND database )	113,470
ALL ( biodiversity OR taxonomy AND database OR dataset )	226,573
ALL ( “biodiversity database” )	1,487
ALL ( “biodiversity database” OR “taxonomy database” )	2,158
ALL ( “biodiversity database” OR “taxonomy database” ) AND ( LIMIT-TO ( EXACTKEYWORD , “Biodiversity” ) OR LIMIT-TO ( EXACTKEYWORD , “Taxonomy” ) OR LIMIT-TO ( EXACTKEYWORD , “Databases, Genetic” ) OR LIMIT-TO ( EXACTKEYWORD , “Factual Database” ) OR LIMIT-TO ( EXACTKEYWORD , “Data Set” ) OR LIMIT-TO ( EXACTKEYWORD , “Protein Database” ) OR LIMIT-TO ( EXACTKEYWORD , “Databases, Protein” ) OR LIMIT-TO ( EXACTKEYWORD , “Biodiversity Informatics” ) OR LIMIT-TO ( EXACTKEYWORD , “Data Quality” ) ) AND ( EXCLUDE ( SUBJAREA , “IMMU” ) OR EXCLUDE ( SUBJAREA , “MEDI” ) OR EXCLUDE ( SUBJAREA , “NEUR” ) OR EXCLUDE ( SUBJAREA , “PHYS” ) OR EXCLUDE ( SUBJAREA , “PHAR” ) OR EXCLUDE ( SUBJAREA , “ARTS” ) OR EXCLUDE ( SUBJAREA , “VETE” ) OR EXCLUDE ( SUBJAREA , “HEAL” ) OR EXCLUDE ( SUBJAREA , “NURS” ) )	723

Source: Prepared by the authors (2020).
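The query strings in Table 1 follow a regular pattern: quoted phrases joined by OR inside ALL ( … ), optionally followed by keyword limiters and subject-area exclusions joined by AND. The following is a minimal sketch of how such a string could be assembled programmatically. The function name `build_scopus_query` and its parameters are hypothetical and are not part of any Scopus tool; the sketch only builds text and does not contact Scopus.

```python
def build_scopus_query(phrases, keywords=(), excluded_areas=()):
    """Assemble a search string in the layout shown in Table 1.

    All names here are illustrative assumptions; the output is plain text
    that would still have to be pasted into the Scopus advanced search.
    """
    # Quoted phrases, searched in all fields and combined with OR.
    quoted = " OR ".join(f'"{p}"' for p in phrases)
    parts = [f"ALL ( {quoted} )"]

    # Optional keyword limiters, one LIMIT-TO clause per exact keyword.
    if keywords:
        limits = " OR ".join(
            f'LIMIT-TO ( EXACTKEYWORD , "{k}" )' for k in keywords
        )
        parts.append(f"( {limits} )")

    # Optional subject-area exclusions, one EXCLUDE clause per area code.
    if excluded_areas:
        excludes = " OR ".join(
            f'EXCLUDE ( SUBJAREA , "{a}" )' for a in excluded_areas
        )
        parts.append(f"( {excludes} )")

    return " AND ".join(parts)


query = build_scopus_query(
    ["biodiversity database", "taxonomy database"],
    keywords=["Biodiversity", "Data Quality"],
    excluded_areas=["MEDI", "NURS"],
)
print(query)
```

Keeping the phrases, limiters and exclusions in separate lists makes each refinement step of the table reproducible by changing a single argument.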

From the results of each search, the first 100 documents were read to check whether they were aligned with the theme of this research, and the terms were adjusted to refine the results. As Table 1 shows, the number of documents retrieved ranged from about 700 to over 200,000 depending on the combination of terms. This step was necessary because terms such as biodiversity and taxonomy are used in different contexts by researchers in the biological sciences.

The search strategy in the Scopus database (Elsevier) consisted of searching for the English terms “biodiversity database” and “taxonomic database” in all fields. A keyword limiter was then applied, selecting the keywords related to data: “Database”, “Data set”, “Data Base”, “database, factual”, “factual database”, “Database Systems”, “data quality” and “data management”. The results are those obtained up to the date of the search, June 28, 2020. The query parameters, with field codes and operators, resulted in:

Metric analyses of word frequency (title, abstract, authors’ keywords, extended keywords), document production frequency (countries, sources, authors) and evolution over time (production by year, use of keywords) were performed using Biblioshiny (the graphical interface of the Bibliometrix package, written in R) and the Microsoft Excel spreadsheet editor. Data were exported directly from the Scopus database in CSV format, which is compatible with Biblioshiny for R v.3.6.3 and with Microsoft Excel 2016.
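The frequency counts described above can be illustrated outside Biblioshiny. The sketch below is in Python rather than R and is not the procedure used in this study; it assumes a CSV export with columns named `Year` and `Author Keywords`, with keywords separated by semicolons. These column names and the three sample rows are assumptions made for illustration, not real records from the search.

```python
import csv
import io
from collections import Counter

# Toy stand-in for a CSV export; the rows are illustrative, not real data.
SAMPLE_CSV = """Year,Author Keywords
2006,Database; Biodiversity
2006,Data quality; GBIF
2010,database; Taxonomy
"""


def load_records(handle):
    """Read every row of the export into a list of dictionaries."""
    return list(csv.DictReader(handle))


def documents_per_year(records):
    """Count how many documents were published in each year."""
    return Counter(int(r["Year"]) for r in records)


def keyword_frequency(records, column="Author Keywords"):
    """Count keyword occurrences, ignoring case and surrounding spaces."""
    counts = Counter()
    for r in records:
        for kw in (r.get(column) or "").split(";"):
            kw = kw.strip().lower()
            if kw:
                counts[kw] += 1
    return counts


records = load_records(io.StringIO(SAMPLE_CSV))
print(documents_per_year(records))
print(keyword_frequency(records).most_common(3))
```

Lower-casing the keywords merges variants such as “Database” and “database”; whether singular and plural forms should also be merged is a separate decision that this sketch does not make.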

4. PRESENTATION AND DISCUSSION OF RESULTS

The analysis comprises the 352 documents retrieved in the Scopus search on June 28, 2020, of which 272 are published articles, 30 are conference proceedings and one is a book chapter. The earliest publication found dates from 1984. The 352 documents retrieved were written by 1,978 authors. A total of 2,068 extended keywords (generated automatically by the database) and 1,033 authors’ keywords were identified.

As shown in Graph 1, there was no significant production on the subject between 1984 and 1996; after this period, the number of publications increases. In 2017 there was a marked drop in production, followed by a recovery the next year. At the time of this research, 2020 already had 13 published articles.

Graph 1 – Evolution of scientific production in the Scopus database from 1984 to 2020

Source: Prepared by the authors (2020).

The distribution of documents by country shows the United States of America (54) with the highest number of publications in this area, followed by the United Kingdom (32), which indicates a predominance of documents in English. This result may have been influenced by the use of search terms in English. On the other hand, several journals published in other languages require an abstract in English as a mandatory element. Brazil (11) appears in sixth place in number of publications, as shown in Graph 2.

Graph 2 – Contribution of scientific production in the world

Among the 352 documents retrieved, PLOS ONE published the highest number of articles (20), followed by Ecological Informatics (14). Graph 3 presents the 10 sources with the highest number of publications.

Graph 3 – Sources with the highest number of productions

The 10 most productive researchers are shown in Table 2 and the 10 most cited in Table 3. The most productive (10 articles) and most cited (241 citations) author is Dr. Jorge M. Lobo, Research Professor at the Department of Biogeography and Global Change of the Museo Nacional de Ciencias Naturales, in Madrid, Spain. The author’s most referenced article is “Use of niche models in invasive species risk assessments”, co-authored with A. Jiménez-Valverde, A. T. Peterson, J. Soberón, J. M. Overton and P. Aragón, which is also the third most cited article (384), as presented in Table 4. This work analyzes the use of species occurrence data to build predictive models of the risk of invasive species establishment, and addresses problems common in large biodiversity data sets, such as georeferencing errors and species misidentifications.

Table 2 – Most productive authors

Authors	Articles
LOBO JM	10
HORTAL J	9
PAGE RDM	7
SOBERÓN J	6
BOOTH TH	5
COSTELLO MJ	5
KREFT H	5
ARIÑO AH	4
GURALNICK R	4

Source: Prepared by the authors (2020).

Table 3 – Most cited authors

Authors	Citations
LOBO J M	241
PETERSON A T	223
HORTAL J	217
GUISAN A	132
COSTELLO M J	117
GRAHAM C H	114
SOBERÓN J	109
FERRIER S	108
JIMÉNEZ VALVERDE A	108

Source: Prepared by the authors (2020).

The five most cited articles are listed in Table 4. The most cited is “SequenceMatrix: concatenation software for the fast assembly of multi-gene datasets with character set and codon information”, by Vaidya, G., Lohman, D. J. and Meier, R., published in 2011. In this work, the authors present the SequenceMatrix software, used to assemble multiple genes from different datasets, highlight its ease of use as a strength and describe its main functionalities. The software also includes features for detecting and correcting errors in the datasets.

Andelman and Fagan (2000) assess whether so-called “flagship” or “umbrella” species are efficient surrogates in conservation: instead of focusing on the conservation of several areas, efforts focus on these few species, which in turn helps conserve entire areas. To test their hypothesis, the authors used three databases with different scales of coverage.

Jayasiri et al. (2015) describe the creation of a web-based database focused on the diversity of fungi, intended to improve the accuracy of scientific names, with an emphasis on taxonomy. The database has 76 curators specialized in the different groups, which helps ensure data reliability.

Lobo, Jiménez-Valverde and Hortal (2010) deal with the use of data on the absence of species from certain regions, as recorded in databases, in the generation of distribution models. They carried out a case study of a beetle species with a known distribution to demonstrate the possible errors in the use of absence data and the importance of such data in modeling distribution maps.

Table 4 – The five most cited articles

Articles	Total Citations
VAIDYA, G.; LOHMAN, D. J.; MEIER, R. SequenceMatrix: Concatenation software for the fast assembly of multi-gene datasets with character set and codon information. Cladistics, v. 27, n. 2, p. 171–180, 2011.	847
ANDELMAN, S. J.; FAGAN, W. F. Umbrellas and flagships: Efficient conservation surrogates or expensive mistakes? Proceedings of the National Academy of Sciences of the United States of America, v. 97, n. 11, p. 5954–5959, 2000.	447
JIMÉNEZ-VALVERDE, A. et al. Use of niche models in invasive species risk assessments. Biological Invasions, v. 13, n. 12, p. 2785–2797, 2011.	384
JAYASIRI, S. C. et al. The Faces of Fungi database: fungal names linked with morphology, phylogeny and human impacts. Fungal Diversity, v. 74, n. 1, p. 3–18, 2015.	355
LOBO, J. M.; JIMÉNEZ-VALVERDE, A.; HORTAL, J. The uncertain nature of absences and their importance in species distribution modelling. Ecography, v. 33, n. 1, p. 103–114, 2010.	325

Source: Prepared by the authors (2020).

The five documents most cited in the references of the retrieved documents are shown in Table 5. The most cited is “Interoperability of biodiversity databases: biodiversity information on every desktop”, by Edwards, Lane and Nielsen, published in 2000. The article deals with GBIF, which was created to facilitate the digitization of biodiversity data and make them freely accessible, and presents the future prospects for biodiversity data.

Hortal, Lobo and Jiménez‐Valverde (2007) present a case study on the limitations found in biodiversity databases, focusing on a database of seed‐plant diversity. Soberón and Peterson (2004) discuss the potential of Biodiversity Informatics and comment on the application of its methods in biodiversity management, and not just in fundamental studies and information sharing. Soberón et al. (2007) demonstrate the use of the biodiversity data held in databases to estimate richness at different resolutions of geographic distribution. Bisby (2000) deals with the emergence of large global biological information systems.

Table 5 – The five most cited documents in the references in the period

Documents cited in the references	Number of Citations
EDWARDS, James L.; LANE, Meredith A.; NIELSEN, Ebbe S. Interoperability of biodiversity databases: biodiversity information on every desktop. Science, v. 289, n. 5488, p. 2312-2314, 2000.	49
HORTAL, Joaquín; LOBO, Jorge M.; JIMÉNEZ‐VALVERDE, Alberto. Limitations of biodiversity databases: case study on seed‐plant diversity in Tenerife, Canary Islands. Conservation Biology, v. 21, n. 3, p. 853-863, 2007.	37
SOBERÓN, Jorge; PETERSON, Townsend. Biodiversity informatics: managing and applying primary biodiversity data. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, v. 359, n. 1444, p. 689-698, 2004.	34
SOBERÓN, Jorge et al. Assessing completeness of biodiversity databases at different spatial scales. Ecography, v. 30, n. 1, p. 152-160, 2007.	25
BISBY, Frank A. The quiet revolution: biodiversity informatics and the internet. Science, v. 289, n. 5488, p. 2309-2312, 2000.	23

Source: Prepared by the authors (2020).

The keywords most used by the authors are database (49), followed by biodiversity (32), taxonomy (26), biodiversity informatics (19) and data quality (19). These frequencies were probably affected by the document selection methodology used in this research. Figure 1 shows the 50 most frequent of the 1,033 authors’ keywords.

Figure 1 – The authors’ 50 most frequent keywords

Regarding the extended keywords, the most used word is biodiversity (236), followed by database (125), taxonomy (88), dataset (66) and classification (62). In the abstracts, the most frequent words are data (1,221), species (856), biodiversity (488), database (315) and databases (268). The most frequent words in the titles of the documents are data (113), biodiversity (100), species (68), database (57) and databases (44).

The 10 most frequent authors’ keywords are represented in Graph 4 according to their number of occurrences accumulated over time. The graph shows exponential growth of the term data quality, which may indicate an increase in interest in studies that evaluate the information contained in existing databases. Alongside it, the term GBIF also grows; it stands for the Global Biodiversity Information Facility, an international network for sharing data on all types of life on Earth. The terms database, biodiversity database, databases and biodiversity informatics have shown a decline in use in recent years.

Graph 4 – Dynamics of the authors’ keywords over time
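The accumulated occurrences plotted in a graph of this kind are running totals of yearly counts. The following Python sketch shows one way to compute them for a single keyword; it is not the Biblioshiny implementation. The function name, the input layout (one `(year, keywords)` pair per document) and the sample documents are hypothetical and chosen only to illustrate the calculation.

```python
from collections import Counter
from itertools import accumulate


def cumulative_occurrences(documents, keyword, first_year, last_year):
    """Return {year: number of documents using `keyword` up to that year}.

    `documents` is an iterable of (year, list_of_keywords) pairs, one per
    document; this layout is an assumption made for the example.
    """
    keyword = keyword.lower()
    per_year = Counter()
    for year, keywords in documents:
        # Count each document once, even if the keyword is repeated.
        if keyword in {k.strip().lower() for k in keywords}:
            per_year[year] += 1

    years = range(first_year, last_year + 1)
    # Years with no occurrence contribute zero to the running total.
    totals = accumulate(per_year[y] for y in years)
    return dict(zip(years, totals))


# Illustrative documents only, not records from the Scopus search.
docs = [
    (2015, ["Data quality"]),
    (2017, ["data quality", "GBIF"]),
    (2017, ["GBIF"]),
]
print(cumulative_occurrences(docs, "data quality", 2015, 2018))
```

Because the series is cumulative, it can only stay flat or rise; a decline in the use of a term therefore appears as a flattening curve rather than a falling one.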

In the co-occurrence analysis of the keywords, some relationships can be seen: a) database – biodiversity – metadata, internet, information system, distribution, conservation and checklist; b) taxonomy – biodiversity informatics and phylogeny – databases – species, classification, nomenclature and biogeography; c) data quality – species richness – biodiversity databases, biodiversity database, GBIF and biological databases; d) data management, data sharing and data integration. Set “c” is worth highlighting, since it shows a focus on the quality of the data held in the databases, particularly GBIF. The separation between sets “a” and “b” is also of interest, as it makes visible a difference in research focus: set “a” concentrates on the distribution, conservation and diversity of species, while set “b” deals more with their nomenclature and classification. Figure 2 presents these relationships.

Figure 2 – The co-occurrence of the keywords used by the authors
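A co-occurrence network of this kind is built from counts of how often two keywords appear in the same document. The sketch below, in Python, shows only that counting step under simple assumptions; the grouping into sets and the network layout produced by Biblioshiny are not reproduced. The function name and the sample keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(documents):
    """Count how many documents contain each unordered pair of keywords.

    `documents` is an iterable of keyword lists, one list per document.
    """
    pairs = Counter()
    for keywords in documents:
        # Normalize and deduplicate so each pair counts once per document.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        # Sorting makes ("a", "b") and ("b", "a") the same key.
        pairs.update(combinations(unique, 2))
    return pairs


# Illustrative keyword lists only, not records from the Scopus search.
docs = [
    ["Data quality", "GBIF"],
    ["GBIF", "data quality", "Species richness"],
    ["Taxonomy", "Nomenclature"],
]
pairs = keyword_cooccurrence(docs)
print(pairs.most_common(2))
```

Each pair and its count can be read as a weighted edge between two keyword nodes, which is the input that a clustering or layout step would then use.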

5. PARTIAL CONSIDERATIONS

This study presented the evolution of scientific production related to biodiversity and taxonomy databases indexed in the Scopus database. The query returned 352 documents, distributed over the period from 1984 to 2020. There was an increase in publications from 2006 onward.

The analyses made it possible to visualize the growth of production and the countries that produced the most on the subject, and to point out the most cited publications.

The identification of the most used keywords was possibly affected by the methodology applied in this research, since those keywords were part of the search terms used in the Scopus database. Even so, the retrieved documents show a growing interest in the quality of the data found in these databases.

This article presents an initial exploratory approach to databases related to biodiversity. Based on the results, issues related to the quality of the data in the most used databases, such as GBIF, will be explored further in the future. Another point to be addressed in future research is how Brazilian species are represented in these databases, and whether interested parties can retrieve this information.

The continuation of this research is in line with the interests of the first author’s dissertation, which proposes the development of an open access database/portal to facilitate the dissemination of data on the diversity of Brazilian fauna, prioritizing reliable, high-quality and constantly updated data, to meet the needs of professionals and researchers in the biological sciences.

REFERENCES

ANDELMAN, S. J.; FAGAN, W. F. Umbrellas and flagships: Efficient conservation surrogates or expensive mistakes? Proceedings of the National Academy of Sciences of the United States of America, v. 97, n. 11, p. 5954–5959, 2000.

BISBY, Frank A. The quiet revolution: biodiversity informatics and the internet. Science, v. 289, n. 5488, p. 2309-2312, 2000.

EDWARDS, James L.; LANE, Meredith A.; NIELSEN, Ebbe S. Interoperability of biodiversity databases: biodiversity information on every desktop. Science, v. 289, n. 5488, p. 2312-2314, 2000.

FRAZIER, C. K.; WALL, J.; GRANT, S. Initiating a Natural History Collection Digitisation Project, version 1.0. Copenhagen: Global Biodiversity Information Facility, 2008. 75 pp.

HORTAL, Joaquín; LOBO, Jorge M.; JIMÉNEZ‐VALVERDE, Alberto. Limitations of biodiversity databases: case study on seed‐plant diversity in Tenerife, Canary Islands. Conservation Biology, v. 21, n. 3, p. 853-863, 2007.

JAYASIRI, S. C. et al. The Faces of Fungi database: fungal names linked with morphology, phylogeny and human impacts. Fungal Diversity, v. 74, n. 1, p. 3–18, 2015.

JIMÉNEZ-VALVERDE, Alberto et al. Use of niche models in invasive species risk assessments. Biological Invasions, v. 13, n. 12, p. 2785-2797, 2011.

LOBO, J. M.; JIMÉNEZ-VALVERDE, A.; HORTAL, J. The uncertain nature of absences and their importance in species distribution modelling. Ecography, v. 33, n. 1, p. 103–114, 2010.

RUA, J. DIGITALIZAÇÃO, PRESERVAÇÃO E ACESSO: contributos para o projeto Museu Digital da U.PORTO. Páginas a&b. S.3, nº especial (2017) 199-229 | DOI 10.21747/21836671/pag2017a13

SOBERÓN, Jorge et al. Assessing completeness of biodiversity databases at different spatial scales. Ecography, v. 30, n. 1, p. 152-160, 2007.

SOBERÓN, Jorge; PETERSON, Townsend. Biodiversity informatics: managing and applying primary biodiversity data. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, v. 359, n. 1444, p. 689-698, 2004.

VAIDYA, G.; LOHMAN, D. J.; MEIER, R. SequenceMatrix: Concatenation software for the fast assembly of multi-gene datasets with character set and codon information. Cladistics, v. 27, n. 2, p. 171–180, 2011.

WILSON, Edward O. Systematics and the future of biology. Proceedings of the National Academy of Sciences, v. 102, n. suppl 1, p. 6520-6521, 2005.

^[1] Graduated in Biological Sciences, Pontifícia Universidade Católica do Paraná (PUC-PR), Paraná, Brazil. Graduated in Industrial Electronics, Faculdade de Tecnologia de Curitiba (FATEC) Curitiba, Paraná, Brazil. Master’s student in Information Management, Universidade Federal do Paraná (UFPR), Curitiba, Paraná, Brazil.

^[2] PhD in Electrical and Computer Engineering, Universidade Tecnológica Federal do Paraná (UTFPR), Curitiba, Paraná, Brazil.

Sent: September, 2020.

Approved: September, 2020.
