The telephony standard is 8-bit mono PCM with uLaw encoding at a sampling rate of 8 kHz. Because this format is fixed, any audio file uploaded to Twilio will be transcoded to it. The standard is designed for voice: an 8 kHz sampling rate cannot represent frequencies above 4 kHz, and telephone networks typically pass only about 300 Hz to 3.4 kHz. That range provides acceptable voice quality. It isn't suitable for high-quality music reproduction, although music will still come through with minimally acceptable results.
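To put numbers on the format described above, here is a small arithmetic sketch. The function names are illustrative and not part of any Twilio API; the only inputs are the 8 kHz sample rate, 8-bit samples, and mono channel count stated in the text.

```python
SAMPLE_RATE_HZ = 8000   # telephony sample rate
BITS_PER_SAMPLE = 8     # uLaw stores one byte per sample
CHANNELS = 1            # mono


def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


def bitrate_bps(sample_rate_hz, bits_per_sample, channels):
    """Raw (uncompressed) bit rate of a PCM stream in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def payload_bytes(seconds):
    """Audio payload size in bytes for a clip in the telephony format."""
    return int(seconds * bitrate_bps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS) / 8)


print(nyquist_hz(SAMPLE_RATE_HZ))
print(bitrate_bps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS))
print(payload_bytes(30))
```

The first value is the ceiling on any frequency content that can survive transcoding, which is why high-frequency detail in music is lost.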
Recording
For any audio conversion, start with the best possible source recording: a well-recorded voice in a room with good acoustics, captured with a professional-quality microphone and preamp. You can achieve the best results with careful mic placement close to the sound source. For voice, place the mic slightly below or to the side of the speaker's mouth to avoid distortion from plosives. You can also use a pop filter to avoid this distortion.
Record your source at a 44.1 kHz or 48 kHz sample rate to a 16- or 24-bit mono uncompressed WAV or AIFF file. If available, compressor/limiter and equalization processors can help you get the very best audio quality.
Post processing
After recording, archive your recordings in that source format. Transcoding to the telephony standard degrades the audio quality considerably, and keeping a high-quality archive gives you the option of reusing the source material.
Use an audio editing program such as Audacity (a very suitable free utility) to trim leading and trailing silence from the recording, normalize the volume, and apply any equalization to the source file.
If you have stereo sources, convert them to mono before uploading to your server for use with Twilio. This lets you monitor the audio for any stereo-to-mono phase artifacts.
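For readers who want to see what trimming and normalizing actually do to the samples, here is a minimal sketch. It assumes the audio is already loaded as a list of floats between -1.0 and 1.0; the function names, the 0.01 silence threshold, and the 0.9 target peak are illustrative choices, not values from Audacity or Twilio.

```python
def trim_silence(samples, threshold=0.01):
    """Drop leading and trailing samples whose magnitude is below threshold."""
    start = 0
    end = len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


def normalize_peak(samples, target_peak=0.9):
    """Scale samples so the loudest one reaches target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # nothing to scale in an all-silent clip
    gain = target_peak / peak
    return [s * gain for s in samples]


clip = [0.0, 0.001, 0.2, -0.4, 0.1, 0.002, 0.0]
processed = normalize_peak(trim_silence(clip))
print(processed)
```

An audio editor does the same thing with better silence detection and metering; the sketch only shows the order of operations (trim first, then normalize).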
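The phase artifacts mentioned above are easiest to understand from an extreme case. This sketch assumes a simple averaging downmix (one common approach, not necessarily what your editor or Twilio uses) and shows that content which is out of phase between the left and right channels cancels completely when summed to mono.

```python
import math


def downmix_to_mono(left, right):
    """Average the left and right channels sample by sample."""
    return [(l + r) / 2 for l, r in zip(left, right)]


def peak(samples):
    """Largest absolute sample value."""
    return max(abs(s) for s in samples)


# A 440 Hz test tone, 10 ms long at 48 kHz.
tone = [math.sin(2 * math.pi * 440 * n / 48000) for n in range(480)]

in_phase = downmix_to_mono(tone, tone)
out_of_phase = downmix_to_mono(tone, [-s for s in tone])

print(peak(in_phase))      # same level as the original tone
print(peak(out_of_phase))  # the tone has cancelled itself out
```

Real recordings are rarely fully inverted, but partial cancellation can thin out or hollow a voice, which is why it is worth listening to the mono version yourself.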
Sample Rate Conversion & Transcoding
Keep in mind that there will be unavoidable compression artifacts once the file is transcoded to 8-bit uLaw. These will manifest mostly as loss of transient response. Also keep in mind that playback on a mobile device will sound considerably worse than on a landline phone, due to additional transcoding to GSM format and the adverse impact of poor cell reception.
When uploading an audio file to your server for use with Twilio, you have two options for handling the sample rate conversion that is part of the overall transcoding of the audio file. Depending on your audio software's capabilities, either option may yield the better result, so it is worthwhile to try both:
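To illustrate where the 8-bit uLaw artifacts come from, here is a sketch of the companding idea using the continuous mu-law formula with mu = 255. This is an assumption-laden illustration: real telephony codecs (G.711) use a segmented approximation of this curve, so the code below is not bit-exact, and the function names are made up for this example.

```python
import math

MU = 255  # companding constant used by mu-law


def mulaw_compress(x):
    """Map a sample in [-1, 1] onto a logarithmic scale in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y):
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def quantize_8bit(v):
    """Round a value in [-1, 1] to one of 255 evenly spaced levels."""
    return round(v * 127) / 127


def roundtrip_mulaw(x):
    """Compress, quantize to 8 bits, then expand back."""
    return mulaw_expand(quantize_8bit(mulaw_compress(x)))


def roundtrip_linear(x):
    """Quantize to 8 bits with no companding, for comparison."""
    return quantize_8bit(x)


for sample in (0.01, 0.9):
    print(sample,
          abs(roundtrip_mulaw(sample) - sample),
          abs(roundtrip_linear(sample) - sample))
```

In this sketch the quiet sample comes back more accurately with companding than with plain 8-bit rounding, while the loud sample comes back less accurately. That trade-off favors speech, and it is one reason sharp, loud peaks lose definition.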
1) Upload the high-quality file to your server and let Twilio handle all aspects of the transcoding.
2) Transcode your audio file in advance. If your audio editor has sample rate conversion and encoding capabilities, this option gives you some degree of control over the final results. Also, the quality of sample rate converters (going from high-quality 44.1 kHz or 48 kHz down to 8 kHz) varies depending on the algorithm used, so you can compare Twilio's conversion with that of your own audio software.
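As one possible way to try option 2 outside an audio editor, the sketch below builds a command line for FFmpeg. FFmpeg is not mentioned in the original guidance and is an assumption here; any tool with sample rate conversion and uLaw encoding would serve. The file names are hypothetical. The script only prints the command, so nothing is converted unless you run it yourself.

```python
import shlex


def build_ffmpeg_command(source_path, output_path):
    """Return an FFmpeg argument list that produces 8 kHz mono uLaw audio."""
    return [
        "ffmpeg",
        "-i", source_path,     # high-quality source (e.g. 48 kHz WAV)
        "-ar", "8000",         # resample to 8 kHz
        "-ac", "1",            # downmix to mono
        "-c:a", "pcm_mulaw",   # encode as 8-bit uLaw PCM
        output_path,
    ]


# Hypothetical file names, for illustration only.
command = build_ffmpeg_command("greeting-master.wav", "greeting-8k-ulaw.wav")
print(shlex.join(command))

# To actually run the conversion (requires FFmpeg to be installed):
#   import subprocess
#   subprocess.run(command, check=True)
```

Listen to the result next to a call that plays the untouched high-quality file, and keep whichever version sounds better over the phone.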
Avoid Lossy format conversions
Always use the best source recording, and avoid any workflow that converts one lossy format to another (e.g., MP3 to 8-bit uLaw). Doing so will introduce additional artifacts.
Also avoid the temptation to compensate for the limited bandwidth of 8 kHz audio by over-emphasizing the higher frequencies in your audio source. This doesn't accomplish much and can make your results sound worse, but you can experiment with modest amounts of EQ.
Pro tip: Use an equalizer to roll off low frequencies (under 200 Hz) to help remove room background noise, emphasize the 2-3 kHz range to improve intelligibility, and notch out 1.2 kHz slightly to smooth out harsh-sounding voices.
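The low-frequency roll-off in the tip above can be sketched with a simple first-order high-pass filter. This is a gentle filter chosen for brevity, not the filter any particular equalizer uses, and the helper names are illustrative. The boost and notch parts of the tip need a parametric EQ and are not covered here.

```python
import math


def high_pass(samples, sample_rate_hz, cutoff_hz=200.0):
    """First-order high-pass filter: attenuates content below cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


def sine(freq_hz, sample_rate_hz, seconds):
    """Generate a unit-amplitude sine wave as a list of floats."""
    count = int(sample_rate_hz * seconds)
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(count)]


def rms(samples):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


RATE = 48000
rumble = sine(50, RATE, 1.0)     # stand-in for low-frequency room noise
speech = sine(2500, RATE, 1.0)   # stand-in for the intelligibility range

print(rms(high_pass(rumble, RATE)) / rms(rumble))   # well below 1: rumble reduced
print(rms(high_pass(speech, RATE)) / rms(speech))   # close to 1: speech range kept
```

The two ratios show the intent of the tip: low-frequency room noise is reduced while the range that carries speech intelligibility passes through nearly unchanged.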