NCBI to assist in Virus Hunting Data Science Hackathon January 9-11, 2019


We are pleased to announce the second installment of the SoCal Bioinformatics Hackathon. From January 9-11, 2019, the NCBI will help run a bioinformatics hackathon in Southern California hosted by the Computational Sciences Research Center at San Diego State University!

We’re specifically looking for folks who have experience in computational virus hunting or adjacent fields to identify known, taxonomically-definable and novel viruses from a few hundred thousand metagenomic datasets that we’ll put on cloud infrastructure. This event is for researchers, including students and postdocs, who are already engaged in the use of bioinformatics data or in the development of pipelines for virological analyses from high-throughput experiments. If this describes you, please apply! The event is open to anyone selected for the hackathon and willing to travel to SDSU (see below).

Continue reading

500 organisms annotated with the Eukaryotic Genome Annotation Pipeline


This month, the NCBI Eukaryotic Genome Annotation Pipeline annotated its 500th organism! The lucky winner is Pocillopora damicornis, a stony reef-building coral frequently used as an experimental model, whose larval dispersal and development are affected by environmental changes in the oceans.

Stony coral (Pocillopora damicornis)

Continue reading

Visit NCBI at the ASCB | EMBO meeting in San Diego, December 9-11, 2018


Going to the ASCB | EMBO meeting? Stop by the NCBI booth (#327) to learn about all that NCBI has to offer, ask questions, and provide feedback on how we can better meet your needs for research and teaching.

Booth #327, Exhibit Hall:

  • Sunday, December 9, 9:30 AM – 4:00 PM
  • Monday, December 10, 9:30 AM – 4:00 PM
  • Tuesday, December 11, 9:30 AM – 4:00 PM

Visit the booth anytime during exhibit hours to discuss any topic or just to say hello. We’re also setting aside times at the booth for focused conversations about using particular sets of NCBI resources in your research and teaching.

Discussion Sessions:

Sunday

  • 12:30 PM  NCBI BLAST in research and teaching

Monday

  • 12:30 PM  Jupyter notebooks to teach scripting and NCBI resources

Tuesday

  • 12:30 PM  EDirect for command-line access to NCBI databases
  • 2:00 PM  Jupyter notebooks to teach scripting and NCBI resources

To stay up-to-date about NCBI at ASCB or in general, follow us on Twitter at @NCBI.

Adapting flatfile parsers for GenBank’s new accession formats


As previously announced, GenBank and other INSDC members will expand the accession formats used for sequencing projects by the end of this year. We’re introducing these new formats to accommodate the growth of Whole Genome Shotgun (WGS), Transcriptome Shotgun Assembly (TSA), and Targeted Locus Study (TLS) sequencing projects. More details about those changes are available on NCBI Insights.

You may have to adjust your code and databases to accommodate the new formats’ longer length. In particular, the first line of the flatfile format, referred to as the LOCUS line, includes the “Locus Name” (usually identical to the accession number), which may now grow to as long as 20 characters. See section 3.4.4 of the GenBank release notes for examples of how the LOCUS line might change.

Since 2003, the GenBank release notes have recommended that flatfile parsers split each line into whitespace-separated tokens to accommodate changes like the one described in section 3.4.4. If your flatfile parsers rely solely on column position, you may have to modify them. From our internal testing, it appears that BioPython and BioPerl properly handle most of the examples shown in section 3.4.4 and only have issues with the last theoretical examples, where the sequence length no longer ends at position 40. We recommend adjusting code to accommodate those theoretical examples for future-proofing.
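
As a rough illustration of the whitespace-separated approach, here is a minimal Python sketch. It is not how BioPython or BioPerl parse the LOCUS line. The names `LocusLine` and `parse_locus_line` are hypothetical, and both sample lines (including the 20-character locus name) are invented for illustration. The sketch assumes the usual field order of locus name, length, unit, optional molecule type, optional topology, division, and date; check your own records against the release notes before relying on it.

```python
from dataclasses import dataclass
from typing import Optional

# Topology keywords that may appear on a LOCUS line (assumed set).
TOPOLOGIES = {"linear", "circular"}


@dataclass
class LocusLine:
    name: str
    length: int
    unit: str                      # "bp" for nucleotide, "aa" for protein
    molecule_type: Optional[str]   # e.g. "DNA", "mRNA"; may be absent
    topology: Optional[str]        # "linear" or "circular"; may be absent
    division: str
    date: str


def parse_locus_line(line: str) -> LocusLine:
    """Parse a LOCUS line by splitting on whitespace, ignoring column positions."""
    tokens = line.split()
    if len(tokens) < 6 or tokens[0] != "LOCUS":
        raise ValueError(f"not a recognizable LOCUS line: {line!r}")

    # The leading fields are read from the left ...
    name, length, unit = tokens[1], int(tokens[2]), tokens[3]
    # ... and the trailing fields from the right, so optional middle
    # fields do not shift anything.
    division, date = tokens[-2], tokens[-1]

    middle = tokens[4:-2]
    topology = middle.pop() if middle and middle[-1] in TOPOLOGIES else None
    if len(middle) > 1:
        raise ValueError(f"unexpected extra fields in LOCUS line: {line!r}")
    molecule_type = middle[0] if middle else None

    return LocusLine(name, length, unit, molecule_type, topology, division, date)


# Hypothetical sample lines: a short locus name and a 20-character one.
short_line = "LOCUS       EXAMPLE01               5028 bp    DNA     linear   PLN 21-JUN-1999"
long_line = "LOCUS       XXXXXX01234567890123 123456789012 bp    DNA     linear   PLN 21-JUN-1999"

for sample in (short_line, long_line):
    print(parse_locus_line(sample))
```

Because the fields are found by token order, the longer locus name and the longer sequence length in the second sample do not require any change to the parser.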

Please write to the helpdesk with any questions about the new formats.

November 28 NCBI Minute: Getting the Most from Track Hubs in NCBI’s Genome Data Viewer (GDV)


This webinar is intended for both new and experienced Track Hubs users.

Join us November 28, 2018 at noon EST for an NCBI Minute explaining what GDV’s Track Hubs are and how they can help you in your research.

Register here: https://bit.ly/2PUHBqz

After this webinar, you’ll be able to:

Continue reading

RefSeq release 91 is public


RefSeq release 91 is accessible online, via FTP and through NCBI’s Entrez programming utilities, E-utilities.

This full release incorporates genomic, transcript, and protein data available as of November 5, 2018. It contains 179,672,083 records, including 125,530,811 proteins, 24,447,570 RNAs, and sequences from 85,308 organisms.

The release is provided in several directories as a complete dataset and as divided by logical groupings.

Continue reading

Discovering associated data in PMC


In the NLM Strategic Plan released earlier this year, we noted that “[c]reating efficient ways to link the literature with associated datasets enables knowledge generation and discovery.” To that end, PMC is now aggregating data citations, data availability statements and supplementary materials, as available, in an Associated Data box. This box displays only on articles that have one or more of these features.

Figure 1. The Associated Data box is outlined in red.

To limit your search to records with an Associated Data box, you can use the new “Associated Data” facet on the search results page.

Figure 2. You can click on “Associated Data” (outlined in red) under Article attributes to limit your search to records with an Associated Data box.

We hope that by exposing this content in a consistent format that is easy to find and easy to access, we will help you more readily find the datasets you need to further accelerate discovery and advance health. As part of our ongoing commitment to making data findable, accessible, interoperable, and re-usable (FAIR), we encourage you to contact us with your feedback on these updates and with any other suggestions you may have for improving discovery of related data in PMC.

Autosuggest comes to Gene, Nucleotide and other databases


If you’ve been searching in Gene, Nucleotide, Protein, Genome or Assembly databases, you’ve probably noticed the new search experience we introduced in September to interpret several common language searches and offer improved results. We’re excited to announce we’ve added as-you-type suggestions to the search bar in these databases.

Here’s a peek at the new menu in the NCBI Gene database.

"human" is typed in the search box and a drop-down menu shows the most popular results

Figure 1. Typing into the search box brings up automatic suggestions of the most popular queries.

Continue reading

MedGen: Your search engine for human medical genetics


MedGen is a free, comprehensive resource for one-stop access to essential information on phenotypic health topics related to medical genetics as collected from established high-quality sources. It integrates terminology from multiple primary ontologies (or nomenclatures) to facilitate standardization and more accurate results from search queries.

Some things you can do in MedGen:

Continue reading