Coursera was launched in 2012 by Daphne Koller and Andrew Ng with the goal of giving life-changing learning experiences to students all around the world. Today, Coursera is a global online learning platform that gives anyone, anywhere access to online courses and degrees from top institutions and corporations.
Introduction to Genomic Technologies Coursera Quiz Answers
Week 1 Quiz Answers
Quiz 1: Overview and Molecular Biology
Q1. The central dogma of molecular biology tells us that information is passed from
- DNA to RNA to protein
- DNA to methylation to RNA to protein
- DNA to RNA to methylation to protein
- RNA to DNA to protein
Q2. Which of the following is one of the major drivers of the sequencing revolution that began after 2008?
- Decreased cost of sequencing
- Improved Sanger sequencing
- Decreased computational analysis time
- Increased sample collection
Q3. Which of the following is an exclusive characteristic of genomics compared to traditional biology?
- Studies considering the entire genome
- Targeted studies of one or a few genes
- Measurements of molecules in the Central Dogma
- Clever experimental design
Q4. Genomic data science involves techniques from which of these disciplines?
- Computer Science
- Molecular Biology
- Statistics
- All of these options
Q5. Which of the following is an activity that genomic data scientists do not perform?
- Pipetting
- Population genomics
- Preprocessing and normalization
- Statistics and machine learning
Q6. Which of these is not one of the DNA nucleic acids?
- Tyrosine
- Thymine
- Adenine
- Guanine
- Alanine
Q7. Transcription is a process that converts DNA to
- genes
- polymerases
- RNA
- ribosomes
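As a rough illustration of what transcription does at the sequence level, the sketch below converts a DNA coding strand into the matching RNA string by replacing thymine (T) with uracil (U). The function name `transcribe` and the example sequence are illustrative assumptions, not part of the course material, and the sketch ignores the enzymes and regulation involved in the real process.

```python
def transcribe(coding_strand: str) -> str:
    """Return the RNA sequence matching a DNA coding strand (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


# Hypothetical example sequence
print(transcribe("ATGGCT"))  # AUGGCU
```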
Q8. The cost to sequence a human genome today, in U.S. dollars, is approximately
- $30 million
- None of these options
- $1000
- $20,000
Q9. DNA encodes instructions for
- Creating an entire human being from scratch
- Regulating body temperature
- Helping us to see objects
- Enveloping viruses that infect a cell
Q10. One major difference between humans and bacteria is
- Human cells have a nucleus, and bacterial cells do not.
- The human genome is made of DNA, while bacteria are made of RNA.
- Human genes are first transcribed to RNA, while bacterial genes are not.
- Human proteins are made of combinations of 20 amino acids, while bacterial proteins use a smaller set of 12 amino acids.
Week 2
Quiz 1: Measurement Technology
Q1. Genome assembly refers to
- A computational method for reconstructing chromosomes from short reads
- A computational method to identify the genes being expressed in a cell or tissue
- The process whereby a cell copies its DNA
- A method for capturing gene sequences
Q2. Which of the following is not true about DNA?
- It is a double-stranded molecule
- It doesn’t matter which direction you write the sequence in
- Each strand has a direction
- One strand is complementary to the other
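The ideas of strand direction and complementarity in Q2 can be sketched in a few lines of Python. This is a minimal illustration, assuming sequences are plain strings over A, C, G, T written 5' to 3'; the name `reverse_complement` and the example sequence are hypothetical.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, also written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


forward = "ATGC"
print(reverse_complement(forward))  # GCAT
print(forward[::-1])                # CGTA, a different sequence: direction matters
```

Reading a sequence backwards gives a different string from both the original and its reverse complement, which is why the direction in which a sequence is written has to be stated.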
Q3. RNA molecules are translated into
- Modified RNA molecules
- Introns
- DNA molecules
- Proteins
Q4. Messenger RNA is
- A template from which proteins are constructed by ribosomes
- A special signal that helps a cell communicate with other cells
- A reverse copy of DNA
- The genetic material inherited by offspring
Q5. DNA is copied into DNA in order to
- Replicate a cell
- Create species diversity
- Encourage evolutionary changes
- Respond to an infection
Q6. Evolutionary biology involves the study of
- The process of natural selection that allows some DNA mutations to survive and cause others to die out
- How the cell membrane is formed
- The process through which RNA is exported from the nucleus
- The origin of the very first living organisms
Q7. Which of the following can we measure with next generation sequencing?
- DNA-protein binding
- Cell structure
- Protein levels
- RNA secondary structure
Q8. What is the first step in ChIP-sequencing to measure protein-DNA binding?
- Cross-linking proteins to the DNA
- Sequencing the bound DNA fragments
- Antibody pulldown of the linked protein-DNA fragments
- Fragmenting the DNA
Q9. Which of the following can be measured using bisulfite conversion and then sequencing?
- DNA methylation
- DNA secondary structure
- DNA variants
- DNA-protein binding
Q10. What is the primary measurement technology used in most modern genomics experiments?
- Nanopore sequencing
- Polymerase chain reaction
- Next generation sequencing
- Sanger sequencing
- Oligonucleotide arrays
- Western blotting
Week 3
Quiz 1: Computing Technology
Q1. A computer algorithm is
- A description of the memory organization within a computer
- A protocol for transmitting data over a network
- A description of the hardware and software capabilities of a computer
- A precise specification of all the steps needed to compute a solution to a problem
Q2. You can make a program more efficient by
- Re-designing the data structures to require less storage
- Using a faster computer
- Running it on the Amazon cloud
- Replacing your hard drives with faster solid state drives
Q3. DNA sequences can be represented efficiently using
- Two bits for each of the 4 possible bases
- One byte for each of the 4 possible bases
- One codon for each amino acid
- The SAM alignment format
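Because there are only 4 possible bases, each one can be stored in two bits (2^2 = 4). The sketch below shows one way to pack a DNA string into an integer at two bits per base. The particular bit assignment (A=00, C=01, G=10, T=11) and the names `pack` and `unpack` are assumptions made for illustration; real tools may use a different layout.

```python
# Assumed bit assignment; any one-to-one mapping onto 2-bit values would work
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"


def pack(seq: str) -> int:
    """Pack a DNA string into an integer, two bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def unpack(value: int, length: int) -> str:
    """Recover a DNA string of known length from its packed form."""
    bases = []
    for _ in range(length):
        bases.append(BASES[value & 0b11])  # read the lowest two bits
        value >>= 2
    return "".join(reversed(bases))


packed = pack("GATTACA")
print(unpack(packed, 7))  # GATTACA
```

Seven bases fit in 14 bits here, compared with 56 bits at one byte per base. The length is passed to `unpack` separately because leading A bases are encoded as zero bits and would otherwise be lost.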
Q4. A programming language is
- A formal language used to instruct computers what to do
- The way we describe high-level algorithms
- Anything written in Python
- A method for translating between languages such as English and French
Q5. Software engineering involves
- Debugging computer code
- Testing programs on a wide range of examples to see if they perform as expected
- All of these options
- Updating code so that it remains compatible with other software systems.
Q6. Bowtie, TopHat, and Cufflinks are
- Programs for analysis of RNA-seq (transcriptome) data sets
- A web-based system for browsing genome data
- Elements of men’s formal evening wear
- Programs for determining the function of a gene
Q7. Sequence alignment refers to
- Lining up two DNA sequences so that positions with the same base match one another
- Making sure that all the DNA sequences in a file use the same format
- Shuffling the positions in a sequence to randomize them
- Determining the amino acid sequence produced by translating a DNA sequence
Q8. A software pipeline for RNA sequence (RNA-seq) analysis will
- Compare cases to controls and determine which genes were responsible for any differences in phenotype
- Create a database in which one can efficiently store and retrieve results
- Process large raw sequence files into a summary table showing which genes were present
- Automatically update all software to the latest version
Q9. Which of the following is not a computer operating system?
- RedHat Linux
- Mac OS X
- Google Drive
- Unix
Q10. A data set large enough to overwhelm the main memory of a computer would
- Use uncompressed files rather than compressed ones
- Contain more than one genome’s worth of data
- Be larger than the available RAM
- Be at least 100 gigabytes in size
Week 4
Quiz 1: Data Science Technology
Q1. Which of the following are required for sharing a data set?
- An explicit and exact recipe to go from the raw to the tidy data
- The raw data
- A code book describing each variable and its values
- All of these options
Q2. Which of the following should be included in data tidying recipes?
- Power calculations
- Version numbers for software
- Preprocessed data
- Units of variables
Q3. What is the central dogma of statistics?
- Using measurements on a population to infer knowledge about a sample
- Using Bayes rule to calculate probabilities we care about
- Estimating parameters using frequencies of observed events
- Using measurements on a probabilistically selected sample to infer knowledge about a population
Q4. Which of the following are types of variability in all genomic data?
- Phenotypic variability
- Genetic drift
- Variation from changing technology
- Variability due to dropout
Q5. Which of the following will increase power in a statistical analysis?
- Increasing sample size
- Using a new technology
- Adjusting for confounders
- Increasing measurement variation
Q6. If 100 p-values are calculated on a data set with no signal, how many p-values would we expect to be less than 0.05 on average?
- 100
- 20
- 10
- 5
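The reasoning behind Q6 can be checked with a short calculation. When there is no signal, p-values are uniformly distributed between 0 and 1, so each one falls below 0.05 with probability 0.05, and the expected count among 100 tests is 100 × 0.05. The sketch below computes this and compares it with a small simulation; the function names, the number of trials, and the random seed are illustrative assumptions.

```python
import random


def expected_below(n_tests: int, alpha: float) -> float:
    """Expected number of null p-values below alpha (uniform p-values)."""
    return n_tests * alpha


def simulated_below(n_tests: int, alpha: float, n_trials: int, seed: int = 0) -> float:
    """Average count of uniform random p-values below alpha over many trials."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trials):
        total += sum(1 for _ in range(n_tests) if rng.random() < alpha)
    return total / n_trials


print(expected_below(100, 0.05))
print(simulated_below(100, 0.05, n_trials=2000))
```

The simulated average should land close to the calculated expectation, though it varies slightly with the seed and number of trials.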
Q7. If we report 500 results as significant out of 10,000 tests while controlling the family-wise error rate at 5%, about how many false positives do we expect?
- 10
- 0
- 200
- 25
Q8. What is the most common confounder in genomics?
- Batch effects
- Sex
- Genetic background
- Population stratification
Q9. Which of the following can be used to address potential confounders at the experimental design stage?
- Randomization
- Measuring DNA instead of RNA
- Multiple testing correction
- Using linear models
Q10. Which of the following are benefits of making big data as small as possible as soon as possible?
- Reducing the data will reduce the number of hypothesis tests
- Smaller data sets will decrease false discovery rates
- Interactive analysis can improve our ability to make discoveries
- Reduced data will increase the power of statistical
Quiz 2: Course Project
Q1. Why did the authors write this paper?
- To prove that there are a large number of genes shared between humans and bacteria.
- To compute the E-values for the BlastP matches to the proteins from the human proteome.
- To propose a plausible alternative to the hypothesis that genes had been “laterally” transferred to humans.
- To show that sample size formulae for “lateral” gene transfer are not correct.
Q2. What is “lateral gene transfer”?
- When genetic material is passed from the genome of one organism to another through a process other than reproduction.
- When a gene is transferred out of the DNA and permanently lost.
- When genes are transferred out of the nucleus and into the cell.
- When new genetic material is created and transferred to the genome of an organism.
Q3. Why is lateral gene transfer (LGT) from bacteria to humans unlikely?
- Because a bacterium would have to infect a germline cell, enter the nucleus of that cell, and insert some of its DNA into one of the host’s chromosomes, after which the mutation would then have to provide an evolutionary advantage to spread through the population.
- Because bacteria never actually enter human cells during an infection.
- It is not unlikely; in fact, LGT has occurred and it is an ongoing process in the human population.
- There hasn’t been sufficient time since humans and bacteria diverged for laterally transferred genes to spread through the population.
Q4. What are homologs?
- Identical mutations that occurred over evolutionary time.
- Two genes in different organisms that have been mutated at the same rate.
- Genes that are greater than 99% similar in DNA sequence
- Two copies of a gene in different organisms that share a common ancestor.
Q5. What was the main method used to rule out lateral gene transfers between humans and bacteria?
- If a homolog of a bacteria was also found in humans.
- If genes were found to have mutated between eukaryotic genomes and human genomes.
- If a homolog of a gene was found in prokaryotic genomes.
- If a homolog of a gene found in humans was also found in a species of nonvertebrate eukaryotes.
Q6. Why would this method rule out lateral gene transfers?
- Lateral gene transfer is an unusual process compared to standard inheritance and nonvertebrate eukaryotic organisms and humans are evolutionarily “closer” than bacteria and humans. If humans and nonvertebrates share a homologous gene, it was likely not directly passed from bacteria to humans.
- Inheritance of common genes is less common than lateral gene transfer and nonvertebrate eukaryotic organisms and humans are evolutionarily “closer” than bacteria and humans. If humans and bacteria share a homologous gene, it was likely directly passed from bacteria to humans.
- Humans and bacteria are both likely to have shared an evolutionary history with nonvertebrate eukaryotic organisms, so genes are likely to be homologous across all three.
- Nonvertebrate eukaryotic organisms and bacteria are evolutionarily “closer” than invertebrate eukaryotic organisms and humans. If they share a homologous gene, then bacteria are likely to have passed genes directly to humans.
Q7. What are the biological, computational, and statistical parts of Figure 1?
- Biological: the argument that lateral transfer should be ruled out if there are human/nonvertebrate eukaryote homologs.
  Computational: The identification of homologs by performing Blastp searches on known protein sets.
  Statistical: The calculation of the standard error for the sample size curves.
- Biological: the argument that a Blast cutoff of 10^-10 should define homologs
  Computational: The identification of homologs by performing Blastp searches on known protein sets.
  Statistical: Observing and quantifying the trend in genes shared versus genome sample size.
- Biological: the argument that lateral transfer should be ruled out if there are human/nonvertebrate eukaryote homologs.
  Computational: The identification of homologs by performing Blastp searches on known protein sets.
  Statistical: Observing and quantifying the trend in genes shared versus genome sample size.
- Biological: the argument that gene should be ruled out if there are human/nonvertebrate eukaryote homologs.
  Computational: Observing and quantifying the trend in genes shared versus genome sample size.
  Statistical: The identification of homologs by performing Blastp searches on known protein sets.
Q8. What are the biological, computational, and statistical parts of Figure 2?
- Biological: the argument that lateral gene transfer is less common than standard gene flow through reproduction
  Computational: The identification of homologs of human HAS genes by iterative BlastP searches and application of the neighbor-joining algorithm to create the phylogenetic tree.
  Statistical: The inference that humans cluster more closely (have smaller distances to) other eukaryotes than to bacteria.
- Biological: the argument that lateral gene transfer is less common than standard gene flow through reproduction.
  Computational: The calculation of statistical significance of the protein hits in the Blastp search.
  Statistical: The statistical modeling of protein sequences via a Markov Model.
- Biological: the argument that proteins should have more similar sequences if they are evolutionarily closer.
  Computational: The identification of homologs of human HAS genes by iterative BlastP searches and application of the neighbor-joining algorithm to create the phylogenetic tree.
  Statistical: The inference that humans cluster more closely (have smaller distances to) other eukaryotes than to bacteria.
- Biological: the argument that lateral gene transfer is less common than standard gene flow through reproduction
  Computational: The storage of data in a low redundancy protein database.
  Statistical: The inference that humans cluster more closely (have smaller distances to) other eukaryotes than to bacteria.
Q9. The analysis in this paper required multiple data sources. Which of the following data sources was not used in the paper?
- The set of all known genes (at the time) from the malaria parasite, Plasmodium falciparum.
- The set of all known genes (at the time) from completed bacterial genomes.
- The complete set of genes from the fruit fly, nematode worm, yeast, and mustard weed genomes.
- The complete set of noncoding RNA genes from the human genome.
Q10. In the end what is the conclusion of the paper?
- That increasing the number of sequenced genomes is likely to increase the number of potential lateral gene transfer events.
- That a more plausible explanation for the observation of homologous genes found in bacteria and humans but not in non-vertebrate eukaryotes is gene loss and low sample size.
- That the argument for lateral gene transfer is statistical because we must average over multiple possible transfer events.
- That genes are more likely to be laterally transferred from certain types of bacteria to humans.
Review:
We recommend enrolling in this course to learn new skills from specialists. We believe it will be worthwhile.