Course Outline

Foundations of Audio Classification

  • Sound event types: environmental, mechanical, human-generated
  • Overview of use cases: surveillance, monitoring, automation
  • Audio classification vs detection vs segmentation

Audio Data and Feature Extraction

  • Types of audio files and formats
  • Sampling rate, windowing, frame size considerations
  • Extracting MFCCs, chroma features, mel-spectrograms
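As a rough illustration of how sampling rate, window length, and hop size interact, the sketch below counts how many analysis frames fit in a clip. The 16 kHz rate, 25 ms window, and 10 ms hop are values assumed for the example, not settings prescribed by the course, and `frame_count` is a hypothetical helper.

```python
def frame_count(num_samples, frame_size, hop_size):
    """Return the number of full frames that fit in the signal (no padding)."""
    if num_samples < frame_size:
        return 0
    return 1 + (num_samples - frame_size) // hop_size


# Assumed example settings: 16 kHz audio, 25 ms window, 10 ms hop.
sample_rate = 16_000
frame_size = sample_rate * 25 // 1000   # 400 samples per frame
hop_size = sample_rate * 10 // 1000     # 160 samples between frame starts
num_samples = sample_rate * 1           # one second of audio

n_frames = frame_count(num_samples, frame_size, hop_size)
print(n_frames)
```

Each frame would then become one column of a feature matrix such as a mel-spectrogram or an MFCC sequence.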

Data Preparation and Annotation

  • UrbanSound8K, ESC-50, and custom datasets
  • Labeling sound events and temporal boundaries
  • Balancing datasets and augmenting audio

Building Audio Classification Models

  • Using convolutional neural networks (CNNs) for audio
  • Model input: raw waveform vs features
  • Loss functions, evaluation metrics, and overfitting

Event Detection and Temporal Localization

  • Frame-based and segment-based detection strategies
  • Post-processing detections using thresholds and smoothing
  • Visualizing predictions on audio timelines
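The post-processing step above can be sketched in a few lines: threshold the per-frame probabilities, smooth the resulting binary sequence with a small median filter, and merge consecutive active frames into events. This is a minimal sketch under assumed values (threshold 0.5, filter width 3), and the function names are hypothetical rather than taken from any library.

```python
def median_smooth(values, width=3):
    """Median-filter a sequence; windows are truncated at the edges."""
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        window = sorted(values[max(0, i - half): i + half + 1])
        # Lower median, so truncated two-element edge windows do not
        # switch a frame on by themselves.
        smoothed.append(window[(len(window) - 1) // 2])
    return smoothed


def frames_to_events(probs, threshold=0.5, hop_seconds=0.01, width=3):
    """Turn per-frame probabilities into (onset, offset) pairs in seconds."""
    active = [1 if p >= threshold else 0 for p in probs]
    active = median_smooth(active, width)
    events, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i                      # event onset
        elif not is_active and start is not None:
            events.append((start * hop_seconds, i * hop_seconds))
            start = None                   # event offset
    if start is not None:                  # event runs to the end of the clip
        events.append((start * hop_seconds, len(active) * hop_seconds))
    return events


# Illustrative probabilities: an isolated spike, then a longer event.
probs = [0.1, 0.9, 0.1, 0.8, 0.9, 0.7, 0.2, 0.1]
print(frames_to_events(probs, hop_seconds=1.0))
```

With these inputs the filter removes the isolated spike and fills the one-frame gap before the longer event, which shows how smoothing trades a little onset precision for fewer spurious detections.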

Advanced Topics and Real-Time Processing

  • Transfer learning for low-data scenarios
  • Deploying models with TensorFlow Lite or ONNX
  • Streaming audio processing and latency considerations

Project Development and Application Scenarios

  • Designing a full pipeline: ingestion to classification
  • Developing a proof-of-concept for surveillance, quality control, or monitoring
  • Logging, alerting, and integration with dashboards or APIs

Summary and Next Steps

Requirements

  • An understanding of machine learning concepts and model training
  • Experience with Python programming and data preprocessing
  • Familiarity with digital audio fundamentals

Audience

  • Data scientists
  • Machine learning engineers
  • Researchers and developers in audio signal processing

Duration

  • 21 hours
