The JavaSound API adds audio capabilities to the Java platform. It has been part of J2SE since version 1.3; it supports the WAV, AU, and AIFF audio formats and provides MIDI support. It doesn't support some other audio formats, such as MP3, but it provides a flexible plugin architecture allowing any third-party vendor to add custom audio format support through the JavaSound Service Provider Interfaces (SPIs). This article deals with this plugin architecture and API, how to write and use a custom SPI implementation, how metadata such as title, artist, and copyright are exposed, and how multiple SPI implementations could be integrated in an application such as a player or a game.
The JavaSound API provides a plugin architecture, allowing third parties to support new formats such as MP3, Ogg Vorbis, FLAC, Monkey's Audio, and more. This architecture allows the JVM to discover and load plugins at runtime. Each plugin must implement the service provider interfaces. One implementation is needed for each new audio format supported. That's the reason why you can find one SPI implementation for MP3, one for Monkey's Audio, and so on.
To be loaded, the SPI implementation must be available in the JVM runtime classpath. To play audio, the JVM looks in the META-INF/services folder for text files named javax.sound.sampled.spi.AudioFileReader and javax.sound.sampled.spi.FormatConversionProvider. These files contain the concrete class names of the SPI implementation that will be instantiated; they are needed for loading and decoding audio data. Then, when an application needs to play an audio file, JavaSound tries each SPI implementation in turn and throws UnsupportedAudioFileException if none matches. Thus, for a JavaSound-based application (such as an audio player, game, or educational program), developers don't have to pay attention to audio-format support. Instead, the application just needs to use the JavaSound API. SPI classes are needed at runtime only and not at build time, so, in addition to technical advantages, their use could have business advantages when integrating GNU GPL-based solutions.
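The lookup described above can be observed from application code. The following is a minimal sketch, not part of the MP3 SPI: it assumes Java 6 or later, where java.util.ServiceLoader reads the same META-INF/services provider files, and the class name InstalledAudioProviders is made up for illustration. It lists whichever AudioFileReader and FormatConversionProvider implementations are visible at runtime.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;
import javax.sound.sampled.spi.AudioFileReader;
import javax.sound.sampled.spi.FormatConversionProvider;

// Hypothetical helper class, for illustration only.
class InstalledAudioProviders {

    /** Returns the class names of every provider registered for the given SPI type. */
    static <T> List<String> providerNames(Class<T> spiType) {
        List<String> names = new ArrayList<>();
        // ServiceLoader instantiates each class named in the provider files.
        for (T provider : ServiceLoader.load(spiType)) {
            names.add(provider.getClass().getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("File readers: " + providerNames(AudioFileReader.class));
        System.out.println("Converters:   " + providerNames(FormatConversionProvider.class));
    }
}
```

With an MP3 SPI JAR on the classpath, its reader and converter classes should show up in these lists next to the built-in ones.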
The JavaZOOM team provides an open source MP3 SPI implementation. It focuses on MP3 playing only. It relies on JLayer, an open source Java library that decodes and converts MP3 (MPEG 1, 2, and 2.5, Layers 1, 2, and 3) frames to PCM, the standard for uncompressed audio data. The JavaSound service provider interfaces allow callers to read, convert, and write audio data, but to play MP3, we only need the read and convert features. JavaZOOM's MP3 SPI does not allow MP3 encoding.
Thus, the JavaSound API requires us to implement the AudioFileReader and
FormatConversionProvider abstract classes. First, let's focus on our MpegAudioFileReader
that extends AudioFileReader. Six methods must be implemented; three return an AudioFileFormat instance from an
input (File, URL, or InputStream) and three return an
AudioInputStream instance.
public abstract AudioFileFormat getAudioFileFormat(File file) throws UnsupportedAudioFileException, IOException
public abstract AudioFileFormat getAudioFileFormat(URL url) throws UnsupportedAudioFileException, IOException
public abstract AudioFileFormat getAudioFileFormat(InputStream stream) throws UnsupportedAudioFileException, IOException
public abstract AudioInputStream getAudioInputStream(File file) throws UnsupportedAudioFileException, IOException
public abstract AudioInputStream getAudioInputStream(URL url) throws UnsupportedAudioFileException, IOException
public abstract AudioInputStream getAudioInputStream(InputStream stream) throws UnsupportedAudioFileException, IOException
To avoid code duplication, we developed one more generic method for getAudioFileFormat:
public AudioFileFormat getAudioFileFormat(InputStream inputStream, long mediaLength) throws UnsupportedAudioFileException, IOException
Indeed, File and URL could be seen as InputStreams with a known length. We also did the same for getAudioInputStream. The work of our getAudioFileFormat is to read and parse the first MP3 frame to:
- Check that the InputStream is a valid MP3 stream (if not, then it throws an UnsupportedAudioFileException).
- Return an MpegAudioFileFormat instance with all of the audio properties read from that frame.
MpegAudioFileFormat extends AudioFileFormat by adding MP3-specific, high-level audio properties such as metadata (ID3 tags). Its constructor needs a Type and an AudioFormat:
MpegAudioFileFormat(AudioFileFormat.Type type, int byteLength, AudioFormat format, int frameLength)
We also extended AudioFormat to MpegAudioFormat to add MP3-specific properties (VBR, CRC flag, padding, etc.). Unlike AudioFileFormat, AudioFormat includes low-level audio properties such as sampling rate, channels, frame size, and AudioFormat.Encoding. We defined multiple AudioFormat.Encoding constants, one for each combination of MPEG version and layer:
public class MpegEncoding extends AudioFormat.Encoding
{
    public static final AudioFormat.Encoding MPEG1L1 =
        new MpegEncoding("MPEG1L1");
    public static final AudioFormat.Encoding MPEG1L2 =
        new MpegEncoding("MPEG1L2");
    public static final AudioFormat.Encoding MPEG1L3 =
        new MpegEncoding("MPEG1L3");
    public static final AudioFormat.Encoding MPEG2L1 =
        new MpegEncoding("MPEG2L1");
    public static final AudioFormat.Encoding MPEG2L2 =
        new MpegEncoding("MPEG2L2");
    public static final AudioFormat.Encoding MPEG2L3 =
        new MpegEncoding("MPEG2L3");
    public static final AudioFormat.Encoding MPEG2DOT5L1 =
        new MpegEncoding("MPEG2DOT5L1");
    public static final AudioFormat.Encoding MPEG2DOT5L2 =
        new MpegEncoding("MPEG2DOT5L2");
    public static final AudioFormat.Encoding MPEG2DOT5L3 =
        new MpegEncoding("MPEG2DOT5L3");

    public MpegEncoding(String strName)
    {
        super(strName);
    }
}
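To illustrate how the first frame can select one of these encodings, here is a minimal sketch of reading the MPEG version and layer bits from the first two bytes of a frame header. It is not JLayer's or the MP3 SPI's actual parsing code; the class and method names are hypothetical. It relies only on the MPEG audio header layout: 11 sync bits, then 2 version bits, then 2 layer bits.

```java
// Hypothetical helper class, for illustration only.
class MpegHeaderSketch {

    /**
     * Maps the first two bytes of an MPEG audio frame header to an encoding
     * name such as "MPEG1L3". Returns null when the bytes are not a valid
     * frame start.
     */
    static String encodingName(int b0, int b1) {
        // Frame sync: the 11 leading bits must all be set.
        if ((b0 & 0xFF) != 0xFF || (b1 & 0xE0) != 0xE0) {
            return null;
        }
        int versionBits = (b1 >> 3) & 0x03; // 00 = MPEG 2.5, 10 = MPEG 2, 11 = MPEG 1
        int layerBits = (b1 >> 1) & 0x03;   // 01 = Layer III, 10 = Layer II, 11 = Layer I
        if (versionBits == 1 || layerBits == 0) {
            return null; // reserved values
        }
        String version = versionBits == 3 ? "MPEG1"
                       : versionBits == 2 ? "MPEG2"
                       : "MPEG2DOT5";
        int layer = 4 - layerBits; // layer bits count downwards
        return version + "L" + layer;
    }

    public static void main(String[] args) {
        System.out.println(encodingName(0xFF, 0xFB)); // prints "MPEG1L3"
    }
}
```

A real reader has more to do, such as skipping a leading ID3v2 tag and validating the remaining header fields (bitrate index, sampling rate index) before accepting the stream.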
Now, let's focus on our MpegFormatConversionProvider, which extends FormatConversionProvider:
public abstract AudioInputStream getAudioInputStream(AudioFormat.Encoding targetEncoding, AudioInputStream sourceStream)
public abstract AudioInputStream getAudioInputStream(AudioFormat targetFormat, AudioInputStream sourceStream)
public abstract AudioFormat.Encoding[] getSourceEncodings()
public abstract AudioFormat.Encoding[] getTargetEncodings()
public abstract AudioFormat.Encoding[] getTargetEncodings(AudioFormat sourceFormat)
public abstract AudioFormat[] getTargetFormats(AudioFormat.Encoding targetEncoding, AudioFormat sourceFormat)
The getSourceEncodings and getTargetEncodings methods return the lists of source and target encodings supported by the conversion provider. For MP3, it's important that the returned AudioFormat.Encodings indicate only the combinations of sampling rate, bitrate, and channels that are allowed by the MP3 header specification.
The getAudioInputStream methods return an AudioInputStream with the specified format (or encoding) from the given source AudioInputStream. For instance, the MP3 SPI could return a decoded 44.1 kHz/16-bit/stereo PCM stream given a 44.1 kHz/128 kbps/joint stereo input MP3 stream. To save time, we used the low-level classes of Tritonus. They provide convenient support for matrix format conversion and a circular buffer implementation (to store decoded PCM data), both of which are needed for most SPI implementations. This way, the main job of our MpegFormatConversionProvider is to call the JLayer API to synchronize and get decoded frames.
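For readers unfamiliar with the circular buffer mentioned above, the following is a minimal single-threaded sketch of the idea: the decoder writes PCM bytes in, the output stream reads them out, and positions wrap around a fixed array. It is not the Tritonus class (which is not reproduced here); the name PcmRingBuffer and its methods are assumptions made for illustration.

```java
// Hypothetical class, for illustration only; not thread-safe.
class PcmRingBuffer {
    private final byte[] buffer;
    private int readPos; // index of the oldest stored byte
    private int size;    // number of bytes currently stored

    PcmRingBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        buffer = new byte[capacity];
    }

    /** Number of bytes that can be read right now. */
    int available() {
        return size;
    }

    /** Stores as many bytes as fit and returns how many were stored. */
    int write(byte[] src, int off, int len) {
        int n = Math.min(len, buffer.length - size);
        int writePos = (readPos + size) % buffer.length;
        for (int i = 0; i < n; i++) {
            buffer[(writePos + i) % buffer.length] = src[off + i];
        }
        size += n;
        return n;
    }

    /** Copies up to len stored bytes into dst and returns how many were copied. */
    int read(byte[] dst, int off, int len) {
        int n = Math.min(len, size);
        for (int i = 0; i < n; i++) {
            dst[off + i] = buffer[(readPos + i) % buffer.length];
        }
        readPos = (readPos + n) % buffer.length;
        size -= n;
        return n;
    }
}
```

A production buffer would typically block or signal when full or empty, since decoding and playback usually run at different paces.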
Thanks to the JavaSound plugin architecture, using the MP3 SPI in your application is as easy as using the JavaSound API.
To get MP3 information (such as channels, sampling rate, and other metadata), call the static method AudioSystem.getAudioFileFormat(file). It returns an instance of MpegAudioFileFormat, from which you can get audio properties. Note that the AudioSystem class acts as the entry point to the sampled-audio system resources.
File file = new File("filename.mp3");
AudioFileFormat baseFileFormat = null;
AudioFormat baseFormat = null;
baseFileFormat = AudioSystem.getAudioFileFormat(file);
baseFormat = baseFileFormat.getFormat();
// Audio type such as MPEG1 Layer3, or Layer 2, or ...
AudioFileFormat.Type type = baseFileFormat.getType();
// Sample rate in Hz (e.g. 44100).
float frequency = baseFormat.getSampleRate();
To play an MP3 file, first call AudioSystem.getAudioInputStream(file) to get an AudioInputStream from the file, then select the target format (i.e., PCM) according to the input MP3 channels and sampling rate, and finally get an AudioInputStream with the target format. If JavaSound doesn't find a matching SPI implementation supporting the MP3-to-PCM conversion, then it will throw an IllegalArgumentException.
File file = new File("filename.mp3");
AudioInputStream in= AudioSystem.getAudioInputStream(file);
AudioInputStream din = null;
AudioFormat baseFormat = in.getFormat();
AudioFormat decodedFormat =
new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
baseFormat.getSampleRate(),
16,
baseFormat.getChannels(),
baseFormat.getChannels() * 2,
baseFormat.getSampleRate(),
false);
din = AudioSystem.getAudioInputStream(decodedFormat, in);
// Play now.
rawplay(decodedFormat, din);
in.close();
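The target format built above can be factored into a small helper, and AudioSystem.isConversionSupported offers a way to check for a suitable converter before asking for the decoded stream. The sketch below makes a few assumptions: the class name PcmTargetFormat is made up, and the demo uses a u-law format only as a stand-in source, since an MP3 source format requires the MP3 SPI at runtime.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;

// Hypothetical helper class, for illustration only.
class PcmTargetFormat {

    /** Builds a 16-bit signed little-endian PCM format with the source's rate and channels. */
    static AudioFormat pcmFor(AudioFormat source) {
        int channels = source.getChannels();
        int bytesPerSample = 2; // 16 bits per sample
        return new AudioFormat(
                AudioFormat.Encoding.PCM_SIGNED,
                source.getSampleRate(),
                16,
                channels,
                channels * bytesPerSample, // frame size in bytes
                source.getSampleRate(),    // for PCM, frame rate equals sample rate
                false);                    // little-endian
    }

    public static void main(String[] args) {
        // Stand-in source format; with the MP3 SPI installed this would come
        // from AudioSystem.getAudioInputStream(file).getFormat().
        AudioFormat source =
                new AudioFormat(AudioFormat.Encoding.ULAW, 8000f, 8, 1, 1, 8000f, false);
        AudioFormat target = pcmFor(source);
        System.out.println("Target: " + target);
        System.out.println("Conversion supported: "
                + AudioSystem.isConversionSupported(target, source));
    }
}
```

Checking first lets an application report a missing SPI cleanly instead of handling the exception afterwards.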
Second, you have to send the decoded PCM data to a SourceDataLine. This means
you have to
load PCM data from the decoded AudioInputStream into the SourceDataLine
buffer until the end of file is reached.
JavaSound will send this data to the sound card. Once the file is exhausted, the
line resources must be closed.
private void rawplay(AudioFormat targetFormat,
                     AudioInputStream din)
    throws IOException, LineUnavailableException
{
    byte[] data = new byte[4096];
    SourceDataLine line = getLine(targetFormat);
    if (line != null)
    {
        // Start
        line.start();
        int nBytesRead = 0, nBytesWritten = 0;
        while (nBytesRead != -1)
        {
            nBytesRead = din.read(data, 0, data.length);
            if (nBytesRead != -1)
                nBytesWritten = line.write(data, 0, nBytesRead);
        }
        // Stop
        line.drain();
        line.stop();
        line.close();
        din.close();
    }
}

private SourceDataLine getLine(AudioFormat audioFormat)
    throws LineUnavailableException
{
    SourceDataLine res = null;
    DataLine.Info info =
        new DataLine.Info(SourceDataLine.class, audioFormat);
    res = (SourceDataLine) AudioSystem.getLine(info);
    res.open(audioFormat);
    return res;
}
If you're familiar with the JavaSound API, you will notice that the source code for playing MP3 is similar to what you'd use to play a WAV file. The source code sample above has no dependencies upon the MP3 SPI implementation. It's transparent for the developer.
Notice that if the file to play was stored on a web server, we would have used:
URL url = new URL("http://www.myserver.com/filename.mp3");
AudioInputStream in= AudioSystem.getAudioInputStream(url);
instead of:
File file = new File("filename.mp3");
AudioInputStream in= AudioSystem.getAudioInputStream(file);
Most audio formats include metadata such as title, album, comments, compression quality,
encoding, and copyright. ID3 tags, used for MP3,
are the best-known metadata format.
Depending on ID3 version (v1 or v2), they can be found either at the end or at the beginning of
an MP3 file. They include information such as duration, title, album, artist, track number, date,
genre, copyright, etc. They can even include lyrics and pictures.
The famous (and free) SHOUTcast
streaming MP3 server, from Nullsoft, uses a different scheme in order to provide
additional metadata such as title streaming, which
allows a player to display the current song being played from the online radio stream. All of
these metadata items need to be parsed and exposed through the SPI implementation. As of J2SE 1.5,
the JavaSound API standardizes the passing of metadata parameters through an immutable
java.util.Map:
File file = new File("filename.mp3");
AudioFileFormat baseFileFormat =
AudioSystem.getAudioFileFormat(file);
Map properties = baseFileFormat.properties();
String key_author = "author";
String author = (String) properties.get(key_author);
String key_duration = "duration";
Long duration = (Long) properties.get(key_duration);
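Because the set of keys depends on the SPI, casts like the ones above can fail or return null when a property is absent. The following is a minimal sketch of a more defensive way to read the map; the class MetadataSketch and its methods are hypothetical, and the demo builds an AudioFileFormat by hand (with a made-up author and duration) so that it runs without any MP3 file.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;

// Hypothetical helper class, for illustration only.
class MetadataSketch {

    /** Returns the property as a String, or the fallback if it is absent or of another type. */
    static String text(AudioFileFormat format, String key, String fallback) {
        Object value = format.properties().get(key);
        return (value instanceof String) ? (String) value : fallback;
    }

    /** Formats the "duration" property (microseconds) as m:ss, or "?" when it is missing. */
    static String duration(AudioFileFormat format) {
        Object value = format.properties().get("duration");
        if (!(value instanceof Long)) {
            return "?";
        }
        long totalSeconds = ((Long) value) / 1_000_000L;
        return String.format(Locale.ROOT, "%d:%02d", totalSeconds / 60, totalSeconds % 60);
    }

    public static void main(String[] args) {
        // Hand-built stand-in for what an SPI would return.
        Map<String, Object> props = new HashMap<>();
        props.put("author", "Demo Artist");
        props.put("duration", 185_000_000L);
        AudioFormat pcm = new AudioFormat(44100f, 16, 2, true, false);
        AudioFileFormat format = new AudioFileFormat(
                AudioFileFormat.Type.WAVE, pcm, AudioSystem.NOT_SPECIFIED, props);

        System.out.println(text(format, "author", "unknown")); // prints "Demo Artist"
        System.out.println(duration(format));                   // prints "3:05"
    }
}
```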
All metadata keys and types should be provided in the SPI documentation. However, common properties include:
- "duration" (Long): Playback duration of the file, in microseconds
- "author" (String): Name of the author of the file
- "title" (String): Title of the file
- "copyright" (String): Copyright message
- "comment" (String): Arbitrary text
Adding MP3 audio capabilities to the Java platform means adding JAR files containing the MP3 SPI implementation to the runtime CLASSPATH. Adding Ogg Vorbis, Speex, FLAC, or Monkey's Audio support would be similar, but could generate conflicts that make other SPI implementations fail. The following situation could occur:
1. The runtime CLASSPATH includes both MP3 and Ogg Vorbis SPIs.
2. The application asks JavaSound to play an MP3 file.
3. AudioSystem tries the Ogg Vorbis SPI first.
4. The Ogg Vorbis SPI fails to detect that the stream is not Ogg Vorbis and doesn't throw UnsupportedAudioFileException.
5. In the best case, you get an exception (NullPointerException, ArrayIndexOutOfBoundsException), and in the worst case, you will hear weird noises or just deadlock.
In the example above, it's true that the problem comes from the Ogg Vorbis SPI implementation, but it's not easy for the SPI provider to have reliable controls (just think about streaming). Thus, each SPI provider has to pay attention to the others. That's the main practical drawback of the JavaSound plugin architecture. So don't be surprised if you have problems making multiple SPIs work together in your application.
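One defensive measure a provider can take is to check a format's signature bytes before accepting a stream, and to rewind the stream so that the next provider sees it from the start. The sketch below shows the idea for Ogg, whose pages begin with the four bytes "OggS". It is an illustration under stated assumptions, not code from any existing Ogg Vorbis SPI; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import javax.sound.sampled.UnsupportedAudioFileException;

// Hypothetical helper class, for illustration only.
class MagicCheckSketch {
    private static final byte[] OGG_MAGIC = {'O', 'g', 'g', 'S'};

    /**
     * Throws UnsupportedAudioFileException unless the stream starts with "OggS".
     * The stream must support mark/reset; it is rewound in both cases.
     */
    static void requireOgg(InputStream in)
            throws IOException, UnsupportedAudioFileException {
        if (!in.markSupported()) {
            throw new IOException("stream must support mark/reset");
        }
        in.mark(OGG_MAGIC.length);
        byte[] head = new byte[OGG_MAGIC.length];
        int count = 0;
        try {
            // read() may return fewer bytes than requested, so loop.
            while (count < head.length) {
                int r = in.read(head, count, head.length - count);
                if (r < 0) {
                    break; // end of stream before the signature was complete
                }
                count += r;
            }
        } finally {
            in.reset(); // leave the stream untouched for other providers
        }
        if (count < head.length || !Arrays.equals(head, OGG_MAGIC)) {
            throw new UnsupportedAudioFileException("not an Ogg stream");
        }
    }
}
```

A signature check is cheap but not sufficient on its own: formats without a fixed signature at offset zero, such as MP3 with optional leading tags, or streams joined mid-way, need more careful validation.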
JMF stands for Java Media Framework. It's an optional J2SE package that adds multimedia support to the Java platform. It includes audio (GSM, QuickTime, etc.), video (AVI, QuickTime, H.263, etc.), and RTP streaming features. JMF provides a plugin architecture, but it is not compliant with that of JavaSound. In fact, MP3 support was previously included in JMF, but it was removed in 2002 because of licensing issues.
JavaSound rocks. It provides a plugin architecture allowing any third-party provider to add custom audio format support, such as for MP3 files. The API is flexible enough to plug most heterogeneous (lossy, lossless) audio formats, whatever their parameters and metadata, into the Java platform: "Write once, play anywhere."
The JavaZOOM Team are the authors of the open source projects JLayer and jlGui.
Copyright © 2007 O'Reilly Media, Inc.