1 Sample Information

The following Sample Information Table shows information for up to 10 samples. For projects with more than 10 samples, the Complete Sample Information Table can be accessed by clicking on the first link below the table.

Sample Information Table:
1. sample_id: unique identification code assigned to each sample.
2. customer_label: sample name provided for the project and used in the analyses.
3. raw_seq_file: name of the raw sequencing file associated with each sample.
4. total_reads: total number of reads in each raw sequencing file.
5. avg_read_length: average read length (in bases) within each raw sequencing file.
6. download_raw_file: click the link in this column to download the raw FASTQ file for each sample.


sample_id   customer_label   raw_seq_file        total_reads   avg_read_length   download_raw_file
in3684_8    D6323.210224     in3684_8.fastq.gz   1792146       10318             Click Here


Complete Sample Information Table
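For reference, the total_reads and avg_read_length values can be recomputed from a downloaded raw file. The following is a minimal Python sketch, assuming the file is a gzipped FASTQ with standard four-line records (header, sequence, "+", quality). The function name `fastq_read_stats` is hypothetical, and it is an assumption that the report rounds the average to a whole number.

```python
import gzip
from typing import Tuple


def fastq_read_stats(path: str) -> Tuple[int, float]:
    """Return (total_reads, avg_read_length) for a gzipped FASTQ file.

    Hypothetical helper; assumes standard 4-line FASTQ records
    (header, sequence, '+', quality) with no line wrapping.
    """
    total_reads = 0
    total_bases = 0
    with gzip.open(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # The sequence is the second line of every 4-line record.
            if line_number % 4 == 1:
                total_reads += 1
                total_bases += len(line.strip())
    # Guard against an empty file to avoid dividing by zero.
    avg_read_length = total_bases / total_reads if total_reads else 0.0
    return total_reads, avg_read_length
```

For example, `fastq_read_stats("in3684_8.fastq.gz")` would be expected to return a read count matching the table above, provided the report counts reads in the same way.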

2 Contig Files

Click on the link below to view the contig FASTA files generated for each sample:

Results

3 Binning Results

Click on the link below to view the table containing the binning results generated for each sample:

Results

4 Taxonomy Profiles

Click on the link below to view the taxonomy analysis results:

Results

5 Materials and Methods

The samples were processed and analyzed with the PacBio Metagenomic Sequencing Service (Zymo Research, Irvine, CA). Specific details for the project can be found in the final PDF report.

DNA Extraction: One of three DNA extraction kits was used, depending on the sample type and sample volume. In most cases, DNA was extracted with the ZymoBIOMICS®-96 MagBead DNA Kit (Zymo Research, Irvine, CA) on an automated platform. In some cases, the ZymoBIOMICS® DNA Miniprep Kit (Zymo Research, Irvine, CA) was used. For some low-biomass samples, such as skin swabs, the ZymoBIOMICS® DNA Microprep Kit (Zymo Research, Irvine, CA) was used because it permits a lower elution volume, resulting in more concentrated DNA samples. In most cases, microbial samples were lysed mechanically during DNA extraction, i.e., by bead beating at maximum speed on a Vortex-Genie 2 for 40 minutes, unless otherwise indicated.

PacBio Library Preparation: DNA obtained with these extraction methods has a fragment size of 7-15 kb, which is ideal for PacBio library preparation without additional size selection or fragmentation. When customers provide extracted DNA, additional size selection or fragmentation may be needed. The PacBio library was prepared using the SMRTbell® Prep Kit 3.0 and the SMRTbell® Barcoded Adapter Plate 3.0.

Sequencing: The final library was sequenced on 8M SMRT cells on the PacBio Sequel IIe system.

Bioinformatics Analysis: The bioinformatics analysis was modeled on the HiFi-MAG-Pipeline and Taxonomic-Profiling-Sourmash workflows (Portik et al., 2022) from the pb-metagenomics-tools GitHub repository (https://github.com/PacificBiosciences/pb-metagenomics-tools, as of April 1, 2023), with some modifications. In brief, the analysis consisted of two parts: (1) metagenomic assembly and binning, and (2) assembly-independent taxonomic classification. First, the PacBio HiFi reads were assembled into contigs using hifiasm-meta (Feng et al., 2022). The contigs were binned into bins/MAGs (metagenome-assembled genomes) separately with MetaBAT 2 (Kang et al., 2019) and SemiBin2 (Pan et al., 2023), and the binning results from the two tools were merged using DAS Tool (Sieber et al., 2018). The quality of the bins/MAGs was assessed using CheckM2 (Chklovski et al., 2022). High-quality MAGs (CheckM2 completeness >60% and contamination <10%) were selected and taxonomically annotated with GTDB-Tk (Chaumeil et al., 2022). After that, the PacBio HiFi reads were classified with sourmash (Brown and Irber, 2016) to obtain the taxonomic composition. The full GTDB database (R07-RS207) was used for bacterial and archaeal identification, and the pre-formatted GenBank databases (v. 2022.03) provided by sourmash (https://sourmash.readthedocs.io/en/latest/databases.html) were used for virus, protozoa and fungi identification. In addition, to facilitate the identification of novel microbes, the high-quality MAGs were formatted and used as a reference database for the sourmash search.
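The high-quality MAG selection amounts to a threshold filter on the CheckM2 output. The sketch below is a minimal illustration, assuming a tab-separated report with `Name`, `Completeness` and `Contamination` columns (the layout of CheckM2's `quality_report.tsv`; whether this project used that file directly is an assumption). The function name is hypothetical, and the strict inequalities mirror the >60% and <10% thresholds stated above.

```python
import csv
from typing import List, TextIO

# Thresholds stated in the Methods: completeness > 60% and contamination < 10%.
MIN_COMPLETENESS = 60.0
MAX_CONTAMINATION = 10.0


def select_high_quality_mags(report: TextIO) -> List[str]:
    """Return the names of bins that pass the high-quality MAG thresholds.

    Hypothetical helper; `report` is assumed to be a tab-separated,
    CheckM2-style table with "Name", "Completeness" and "Contamination"
    columns (values in percent).
    """
    selected = []
    for row in csv.DictReader(report, delimiter="\t"):
        completeness = float(row["Completeness"])
        contamination = float(row["Contamination"])
        # Strict inequalities, matching "> 60%" and "< 10%".
        if completeness > MIN_COMPLETENESS and contamination < MAX_CONTAMINATION:
            selected.append(row["Name"])
    return selected
```

A typical call would open the report with `open("quality_report.tsv", newline="")` and pass the file handle; the selected names would then be the bins handed to GTDB-Tk.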
The resulting taxonomy and abundance information was further used (1) to perform alpha- and beta-diversity analyses; (2) to create microbial composition barplots with QIIME (Caporaso et al., 2010); (3) to create taxa abundance heatmaps with hierarchical clustering (based on Bray-Curtis dissimilarity); and (4) for biomarker discovery with LEfSe (Segata et al., 2011) using default settings (p<0.05 and LDA effect size >2), where applicable.
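The Bray-Curtis dissimilarity between two samples with abundance vectors u and v is BC = sum(|u_i - v_i|) / sum(u_i + v_i), which ranges from 0 (identical composition) to 1 (no shared taxa) for non-negative abundances. Below is a minimal pure-Python sketch of the pairwise matrix that a hierarchical clustering step would take as input. The function names are hypothetical, and because the linkage method used for the heatmaps is not stated here, the clustering itself is left out.

```python
from typing import Dict, List, Sequence


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles.

    BC = sum(|u_i - v_i|) / sum(u_i + v_i). Both profiles must list
    the same taxa in the same order.
    """
    if len(u) != len(v):
        raise ValueError("profiles must have the same number of taxa")
    denominator = sum(a + b for a, b in zip(u, v))
    if denominator == 0:
        raise ValueError("both profiles are empty")
    return sum(abs(a - b) for a, b in zip(u, v)) / denominator


def bray_curtis_matrix(profiles: Dict[str, Sequence[float]]) -> List[List[float]]:
    """Square pairwise matrix; rows and columns follow the dict's insertion order."""
    names = list(profiles)
    return [[bray_curtis(profiles[a], profiles[b]) for b in names] for a in names]
```

In practice, SciPy provides the same measure as `scipy.spatial.distance.braycurtis`, and a square matrix like this one can be converted with `scipy.spatial.distance.squareform` before being passed to `scipy.cluster.hierarchy.linkage`.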

6 References

Brown, C. T., & Irber, L. (2016). sourmash: a library for MinHash sketching of DNA. Journal of Open Source Software, 1(5), 27.

Caporaso, J. G., Kuczynski, J., Stombaugh, J., Bittinger, K., Bushman, F. D., Costello, E. K., et al. (2010). QIIME allows analysis of high-throughput community sequencing data. Nature Methods, 7, 335-336.

Chaumeil, P. A., Mussig, A. J., Hugenholtz, P., & Parks, D. H. (2022). GTDB-Tk v2: memory friendly classification with the genome taxonomy database. Bioinformatics, 38(23), 5315-5316.

Chklovski, A., Parks, D. H., Woodcroft, B. J., & Tyson, G. W. (2022). CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning. bioRxiv, 2022-07.

Feng, X., Cheng, H., Portik, D., & Li, H. (2022). Metagenome assembly of high-fidelity long reads with hifiasm-meta. Nature Methods, 19(6), 671-674.

Kang, D. D., Li, F., Kirton, E., Thomas, A., Egan, R., An, H., & Wang, Z. (2019). MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ, 7, e7359.

Pan, S., Zhu, C., Zhao, X. M., & Coelho, L. P. (2021). SemiBin: incorporating information from reference genomes with semi-supervised deep learning leads to better metagenomic assembled genomes (MAGs). bioRxiv, 2021-08.

Pan, S., Zhao, X. M., & Coelho, L. P. (2023). SemiBin2: self-supervised contrastive learning leads to better MAGs for short- and long-read sequencing. bioRxiv, 2023-01.

Portik, D. M., Brown, C. T., & Pierce-Ward, N. T. (2022). Evaluation of taxonomic classification and profiling methods for long-read shotgun metagenomic sequencing datasets. BMC Bioinformatics, 23(1), 541.

Segata, N., Izard, J., Waldron, L., Gevers, D., Miropolsky, L., Garrett, W. S., & Huttenhower, C. (2011). Metagenomic biomarker discovery and explanation. Genome Biology, 12, R60.

Sieber, C. M., Probst, A. J., Sharrar, A., Thomas, B. C., Hess, M., Tringe, S. G., & Banfield, J. F. (2018). Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nature Microbiology, 3(7), 836-843.