Speech Processing


Teaching Staff: Karydis Ioannis
Code: MO310
Course Type: Direction of BCI - Compulsory
Course Level: Undergraduate
Course Language: Greek
Semester: 8th
ECTS: 5
Teaching Units: 3
Lecture Hours: 2
Lab/Tutorial Hours: 2L
Total Hours: 4
E Class Page: https://opencourses.ionio.gr/courses/DDI119/
Curricula: Revamped Curriculum in Informatics from 2025
Short Description:

Speech processing is the field of science that deals with the analysis, processing, understanding and synthesis of the human voice by computer systems. It is a core subject of speech technology and has applications in areas such as speech recognition and understanding, voice synthesis, dialogue systems, voice biometrics and assistive technologies for people with disabilities. Today, speech technologies are integrated into smart devices, virtual assistants and multimedia interaction environments.

The theoretical part of the course covers:

  • Modeling of speech production: the speech production mechanism, speech sounds.
  • Digital preprocessing of the speech signal: selection of sampling frequency, digitization, short-term analysis of the speech signal, selection of frame length, pre-emphasis, selection of window function, frame shift.
  • Acoustic parameters: parameter extraction, acoustic information for speaker discrimination, energy and zero crossings, fundamental frequency, pitch estimation methods, spectrogram, vocal tract resonances (formants), linear prediction coefficients (LPC), filter bank, reflection coefficients, cepstral coefficients.
  • Basic speech processing techniques.
  • Hidden Markov models: definition and fundamental algorithms.
  • Speech recognition/understanding systems, speaker recognition systems.
  • Speech synthesis.
  • Digital noise removal techniques.

The laboratory part consists of the practical application of the speech processing techniques presented in the theoretical part, using the open source software GNU Octave.

Objectives - Learning Outcomes:

The course aims to give students an understanding of the basic concepts of speech processing and to cultivate scientific thinking about speech processing technologies and their extensions.

Students will also have the opportunity to:

  • understand the stages of development of speech processing applications,
  • be able to design, develop and manage corresponding processes,
  • detect the opportunities for the development of the technology,
  • come into contact with related research issues.

Upon completion of the thematic modules, students are able to:

  • plan the development of speech processing applications,
  • implement speech processing usage scenarios,
  • execute the design and development of a complete software package furnishing speech processing capabilities, and also
  • detect business-professional opportunities.
Syllabus:

Introduction to Signal Processing

Introduction to Digital Signal Processing

  • What is a discrete signal and what is a discrete system
  • Fourier Transform
  • Signal Sampling and Digitization
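
As a small illustration of the Fourier transform and sampling topics listed above, the following Python sketch samples a sinusoid and locates its strongest frequency component with a naive discrete Fourier transform. The function names, the 8 kHz sampling rate and the 64-sample length are assumptions chosen for the example and do not come from the course material. Python is used only for illustration here, since the course labs use GNU Octave.

```python
import cmath
import math


def sample_sine(freq_hz, sample_rate_hz, num_samples):
    """Sample a unit-amplitude sinusoid at a fixed sampling rate (hypothetical helper)."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(num_samples)]


def dft(signal):
    """Naive O(N^2) discrete Fourier transform of a real or complex sequence."""
    total = len(signal)
    return [sum(x * cmath.exp(-2j * math.pi * k * n / total)
                for n, x in enumerate(signal))
            for k in range(total)]


def dominant_frequency(signal, sample_rate_hz):
    """Frequency in Hz of the strongest bin between 0 and half the sampling rate."""
    spectrum = dft(signal)
    half = len(signal) // 2
    peak_bin = max(range(half + 1), key=lambda k: abs(spectrum[k]))
    return peak_bin * sample_rate_hz / len(signal)
```

With these assumed values, a 1000 Hz tone sampled at 8000 Hz is found at 1000 Hz. A 7000 Hz tone sampled at the same rate is also reported at 1000 Hz, because it lies above half the sampling rate and aliases. This is why the choice of sampling frequency matters.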

Speech Production

  • Speech Production
  • Modeling the Speech Production System

Any acoustic phenomenon can be characterized by three parameters: intensity, frequency, and time. The human perception of sound intensity depends on the frequency of the sound.

Basic steps in speech production

  • Formation of the idea that we want to communicate
  • Conversion of the idea into a linguistic structure using related words and phrases
  • Ordering of the words according to the grammatical rules of the language used
  • Addition of prosodic features such as pitch (frequency) and intensity
  • The brain produces a series of commands that move the vocal system which in turn produces the sound (acoustic) waves

Speech Preprocessing & Speech Parameterization

  • How do we convert speech from an acoustic signal to a digital one?
  • Introduction to speech parameterization: how can we retain only those parameters of a speech signal that characterize it?

Speech digitization includes the following steps:

  • conversion of the acoustic signal into electrical
  • amplification of the level of the electrical signal coming from the microphone
  • passage of the electrical signal through a low-pass filter to cut off high frequencies
  • conversion of the analog signal to digital
  • separating the digital speech signal into short time frames (framing)
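
The last two steps above (analog-to-digital conversion and framing) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the samples are taken to be already band-limited values in the range [-1.0, 1.0], and the 16-bit depth and Hamming window are illustrative choices rather than values prescribed by the course.

```python
import math


def quantize(samples, bits=16):
    """Map samples in [-1.0, 1.0] to signed integers; out-of-range values are clipped."""
    max_level = 2 ** (bits - 1) - 1
    return [int(round(max(-1.0, min(1.0, s)) * max_level)) for s in samples]


def split_into_frames(samples, frame_length, frame_shift):
    """Split a signal into (possibly overlapping) frames.

    A trailing partial frame is dropped, which is one common convention.
    """
    frames = []
    start = 0
    while start + frame_length <= len(samples):
        frames.append(samples[start:start + frame_length])
        start += frame_shift
    return frames


def hamming_window(length):
    """Hamming window coefficients (assumed window choice, length must be > 1)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def apply_window(frame, window):
    """Multiply a frame sample by sample with a window of the same length."""
    return [s * w for s, w in zip(frame, window)]
```

For example, assuming an 8000 Hz sampling rate, a 25 ms frame is 200 samples and a 10 ms frame shift is 80 samples, so `split_into_frames(signal, 200, 80)` produces overlapping frames that can each be passed through `apply_window` before further analysis.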

Speech Parameterization

Speech parameterization: how can we retain only those parameters of a speech signal that characterize it?

Digitized signal data has high information redundancy:

  • extraction of appropriate parameters
  • only necessary information for a specific use
  • result: substantial data volume compression and easy use

Modeling parameter requirements:

  • High recognition reliability
  • Short computational time required for their determination
  • Small information flow
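
As a hedged illustration of this data reduction, the sketch below computes two of the acoustic parameters named in the course description, energy and zero crossings, for a single frame. The function names are hypothetical, and the normalizations (mean energy per sample, crossings per adjacent sample pair) are assumptions, since several conventions exist.

```python
def short_time_energy(frame):
    """Mean squared amplitude of one frame (assumed normalization)."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    Zero is treated as non-negative; the frame must hold at least two samples.
    """
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)
```

Each frame, however long, is reduced to two numbers that are cheap to compute. Voiced speech tends to show high energy and a low zero-crossing rate, while unvoiced speech tends to show the opposite.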

Linear Predictive Coding

Based on the previous values of a function, can we calculate its value at position n?
The concept of uncertainty is introduced in predicting a future value of a function

Predicting future values based on existing (known) values is widely used:

  • Meteorology
  • Stock market
  • Signal coding/compression (image/sound/data) for transmission
  • In biology for predicting population evolution

Linear prediction is a simple method of predicting future values based on a linear combination of existing values
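
A minimal Python sketch of this idea follows. It estimates the prediction coefficients with the autocorrelation method and the Levinson-Durbin recursion, which is one standard approach and not necessarily the one used in the course. The function names are hypothetical, and the predictor convention assumed is x̂[n] = a_1·x[n-1] + ... + a_p·x[n-p].

```python
def autocorrelation(signal, max_lag):
    """Autocorrelation values r[0..max_lag] of a finite frame."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the linear prediction normal equations.

    Returns (coefficients a_1..a_p, reflection coefficients, residual error).
    Assumes r[0] > 0, i.e. the frame is not pure silence.
    """
    a = [0.0] * (order + 1)
    error = r[0]
    reflection = []
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        reflection.append(k)
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], reflection, error


def predict_next(history, coeffs):
    """Predict the next sample as a linear combination of the latest samples."""
    return sum(a_k * history[-k] for k, a_k in enumerate(coeffs, start=1))
```

For a signal in which every sample is 0.9 times the previous one, the first coefficient estimated this way comes out very close to 0.9 and the second very close to 0, so `predict_next` reproduces the next sample almost exactly. The recursion also yields the reflection coefficients as a by-product.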

Speech Recognition

Why speech recognition?

  • Speech is the dominant and most widespread way of human communication
  • The best human-computer interface!
  • Most computer users speak faster than they type.
  • Humans learn to speak before they learn to write: even toddlers could use computers
  • People with limited motor skills (or even limited education) will be able to use computers
  • More natural communication with television, kitchen, coffee maker, front door (intelligent homes)
  • Virtual Reality systems with speech recognition
  • Computer/console games

A speech recognition system is a system that transcribes speech into text

  • It acts like a typist: it "listens" to what the user says and converts it into written text
  • Speech recognition does not imply speech understanding
  • Understanding falls within the field of artificial intelligence (AI)
  • Many systems can recognize speech; none can truly "understand" it today

Speaker Recognition

A speech recognition system is a system that converts the speech signal into text

  • We are interested in what the speaker is saying
  • Speech recognition does not imply speech understanding

Speech analysis: a speech recognition system uses the speech parameterization methods covered earlier in the course

Speech recognition systems are categorized according to

  • the type of speech they can recognize
  • the type of speaker
  • the size of the vocabulary they support
  • the recognition unit
  • the recognition technique

Speech Synthesis

Speech synthesis is the conversion of text into a speech signal

  • What is the difference between a speech synthesis system and a CD-player?
  • A speech synthesis system algorithmically generates a new speech signal, while a CD-player or an mp3 file reproduces a stored speech signal.

Speech synthesis applications

  • Human-computer interface
  • People with speech impairments
  • People with visual impairments
  • Telecommunications (reading messages, directory information, telephone information, news, etc.)
  • Entertainment (video games)
Suggested Bibliography:
Teaching Methods:
  • In-person interactive lectures
  • In-person labs
  • Optional project work
New Technologies:
  • Interactive whiteboard
  • Slides with animation
  • Computer software for the individual components of speech processing
  • Design and development of speech processing applications in GNU Octave
Evaluation Methods:

With optional assignment

  • Grade = 60% assignment, 40% written exam
  • Required pass condition: written exam grade >= 5/10

Without optional assignment

  • Grade = 100% written exam
  • Required pass condition: written exam grade >= 5/10
