Claudio Midolo / design + technology: December 2007

ABSTRACT

The aim of this paper is to explain the design investigation behind the creation of the soundFishing interface: a portable, semi-autonomous, digital tool that is able to analyze the sonic environment around us and extract some particular sounds out of it.
The main purpose of this project is to draw attention to those every-day life sonic perceptions that we usually don’t pay too much attention to, and to enable the individual to rediscover both the power and the value that they carry.
I like to call it soundFishing because I believe that this tool allows users to fish sounds directly from his or hers every-day life activities.
The development of this concept and the research project is here outlined.

INTRODUCTION

The initial inspiration for the soundFishing project can be tracked back to a subway routine trip from Brooklyn to Manhattan that took place in early October 2007. Usually the train is very crowded, but that morning I found myself alone and, since I had nothing else to do but wait for the train to stop at my destination, I started listening to those environmental sounds I usually don’t pay attention to. At the beginning they seemed to be just random audio events caused by the motion of the train, but the more I was paying attention to them, the subtlest the pattern became and, in the end, they really merged into a strange, yet fascinating musical piece made of rhythmical accelerations, repetitions and vibrations.

MOTIVATION

That experience had a double impact on me: first it revealed to me the real value carried by those ephemeral sonic perceptions, then it also made me think about the fact that all that “sound matter” is often silenced: it usually goes wasted because we are constantly surrounded by it, leaving us unable to really understand its value. My project tries to find a solution to this issue saving these sound fragments from oblivion, rescuing them from the world’s indifference and letting them tell a different story about it: a tale about the places we live in, the people we meet and the experiences we are going through every day. They speak to us about something that maybe we didn’t know and never noticed before.

CONCEPTS

The following concepts represent the core of the soundFishing project and its development:

Sounds as an intimate diary: the audio captured from the environment will build up a sonic diary of the events that take place during the user's everyday life.
Non conscious action: the basic difference between a traditional textual diary and this diary of sounds is that the former is created consciously by the user who has the power to personally intervene on it, deciding what and when to write something. On the other hand the sonic diary produced by the soundFishing interface is composed “unconsciously” by the user who has just the power to set the basic logical rules that will control the capturing of the sound events: the user can not decide explicitly what and when to record. I believe that the control loss embedded in the tool’s functionality can result into a surprise effect and induce curiosity towards an otherwise rather obvious final output.

Generative sampling - Automation: the nature of the interface will be intimately procedural and algorithmic as the user will define a set of rules that will regulate the recording process. The interface works on its own without any direct control: it operates as an autonomous audio filtering agent continuously browsing the environment for events to happen. Once found the sonic events in compliance with the user’s instruction, the device starts to capture the sounds. This process can be linked to Manovich’s concept of Automation:

"The numerical coding of media (principle 1) and the modular structure of a media object (principle 2) allow for the automation of many operations involved in media creation, manipulation and access. Thus human intentionality can, in part, be removed from the creative process."

" 'High Level' automation of media creation, which requires a computer to understand, to a certain degree, the meanings embedded in the objects being generated, that is, their semantics."
"The Internet, which can be thought of as one huge distributed media database, also crystallized the basic condition of the new information society: overabundance of information of all kinds. One response was an idea of software 'agents' designed to automate searching for relevant information. Some agents act as filters that deliver small amounts of information..."
The concept of the agent Manovich refers to in his book “The Language of New Media” is very similar to that behind the soundFishing interface: they are both media softwares that analyze and filter a particular environment. In Manovich the filter is applied to a virtual environment, such as the Internet, whereas the soundFishing interface acts on a sonic layer, in order to extract some valuable data out of it.

Multiplicity: generative processes lead to multiplicity. In order to capture the essence of this concept, I consider relevant the differences between the following two statements:

"I want to record the sound of the police car siren that is now patrolling the street."
and

"I want to record all and only the loud sounds that I’ll come across in my daily routine today."

A huge difference lies between these two statements: the first sentence leads to a simple, but rather obvious, result. On the contrary, the second statement opens up to many possible results, giving the user a glimpse of the almost infinitely wide spectrum of possibilities that we come across in our daily experiences and depicting only a tiny portion of the space of potential. In a comment posted to the teemingvoid Blog on October 29, 2007, Mitchell Whitelaw noted:

"Multiplicity here is a way to get a perceptual grasp on something quite abstract - that space of possibility. We get a visual ‘feel’ for that space, but also a sense of its vastness, a sense of what lies beyond the visualization."

"Multiplicity refers to the specific space of potential in any single system, by actualizing a subset of points within it."

Expanded cinema: - Sound sharing and mixing: the audio events captured by soundFishing can be valuable to the single user – thanks to certain personal parameters - but what about their value according to other people’s subjective evaluation? Why should some people find sounds coming from somebody else’s experience interesting or even useful?
The easiest answer could be that curiosity is the reason for that. A second point is that these often extramusical sounds can be used to produce something else, a musical piece for example, a sound effect or anything related to audio remixing and production.
I believe that the soundFishing functionality can be channeled into two different outputs. The first can be the creation of a social network community like Youtube, but instead of video, audio samples taken from the users’ every-day life experiences can be shared and mashed. Another expression can be one in the spirit of Expanded Cinema, where the author defines the technosphere as a symbiosis between man and machine:
"The computer liberates man from specialization and amplifies intelligence.”
According to a summarizing analysis of his work published by a popular open source, Youngblood compares computer processing to human neural processing, where logic and intelligence are the brain's software. According to him computer software will become more important than hardware and that in the future super-computers will design ever more advanced computers.
His vision of the future is represented by the Aesthetic Machine:
“Aesthetic application of technology is the only means of achieving new consciousness to match our environment."
It is also stated that according to Youngblood creativity will be shared between man and machine. This idea can be supported by the 1010ap-fm01 case, as it is explained on the homonymous website:

"fm transposes non-metaphoric systems and grammar theory (of computer languages, abstraction and data containers) to the realm of expanded cinema. The base proposal concerns the development of a scripting language, data structures, and suitable file system for the automated production and grammatical expression of endless cinema."

According to this specific point of view, the soundFishing interface can become an extension of the human ear and memory, allowing a more powerful perception of the sonic environment and a more effective memorization of sounds in the form of digital samples. These samples can then feed another generative system which assembles them algorithmically to produce further sonic experiences.

PRECEDENTS

The Dictaphone
A sound recording device most commonly used to record speech for later playback or to be typed into print.

Microsoft SenseCam
On the Microsoft website, in the project section, it is stated that: “SenseCam is a wearable digital camera that is designed to take photographs passively, without user intervention, while it is being worn. Unlike a regular digital camera or a camera phone, SenseCam does not have a viewfinder or a display that can be used to frame photos. Instead, it is fitted with a wide-angle (fish-eye) lens that maximizes its field-of-view. This ensures that nearly everything in the wearer’s view is captured by the camera, which is important because a regular wearable camera would likely produce many uninteresting images. SenseCam also contains a number of different electronic sensors. These include light-intensity and light-color sensors, a passive infrared (body heat) detector, a temperature sensor, and a multiple-axis accelerometer. These sensors are monitored by the camera’s microprocessor, and certain changes in sensor readings can be used to automatically trigger a photograph to be taken.”

Remembrance Agent
“The Remembrance Agent (RA) is a program which augments human memory by displaying a list of documents which might be relevant to the user's current context. Unlike most information retrieval systems, the RA runs continuously without user intervention. Its unobtrusive interface allows a user to pursue or ignore the RA's suggestions as desired.”
Forget-Me-Not
“...assume we could construct a device that accompanied the user everywhere, and which captured important data and context from his or her life. Furthermore, assume it would organize these data into a form that mimicked the episodic memory structures created naturally by the user. Needing to recall a detail from a past event, and armed with our device, the user could then draw upon his or her own, possibly fading, episodic memory, to locate similar episodes and data stored in the permanent memory of the device. In this way, the user could use the small things he or she could remember about the context of the event to retrieve the details that had been forgotten.”

METHODOLOGY

A three stages prototyping process has been followed, according to a hierarchy based on building ease, portability and power. These values have been chosen in order to successfully build and test the interface during a one month time span.
The work carried out during the first stage of the project is based on Processing, a very powerful and versatile programming environment based on Java. Processing, among many other functions, is ideal as a rapid prototyping tool. This program allowed me to build a working software prototype which embodies the main features of the soundFishing tool through a very short period of time.
During the second stage of this project I developed a hardware circuit built around a microcontroller and an audio recorder chip, achieving good portability and power.
Finally, over the third stage, I planned to take the prototype to the maximum portability and power hacking a classic iPod mp3 player - an already existing audio device which in theory could have given me the power to store a huge amount of sound data into a compact, comfortable and common object.

IMPLEMENTATION

The setting for the first prototype was a laptop running a Java applet built in Processing, an external microphone attached to it and a bag to carry them around; the logical rule implemented at this stage was telling the interface to capture all the “loud” sounds relatively to the default volume that characterized the environment.

To summarize the process, the user sets a rule – in this case based on sound volume - editing a configuration file which enables him to choose both the duration of the total recordings, the duration of the volume buffer and, more importantly, the interface tolerance in relation to the volume: in order for the sounds to be captured, the lower this final parameter is, the louder the sounds have to be in relation to the default volume that characterizes the environment.
While the software is running, it listens to the sound input coming from the microphone and continuously calculates the default environmental volume in order to define and adapt the threshold in relation to which a sound is considered a loud event and hence is recorded.

The exact same mechanism was partly implemented in the second prototype built with a PIC 16F88 microcontroller, an external microphone and a ISD5116 audio chip recorder, in fact in this setting the PIC listens to the environmental sounds through the microphone and tells the ISD chip to record it when it matches the “record all and only the loud sounds” rule.

The third prototype based on the Apple iPod could not be physically implemented due to time constrains, but all the documentation proved that the idea was feasible using an external PIC microcontroller connected to the mp3 player through the dock interface; thanks to the iPod serial communication protocol it should be possible for the PIC chip to tell the device when to record and play a target sound file.

EVALUATION

Due to time constrain only the results of the first prototype stage are available in form of digital audio files, these are some sounds “fished” in Brooklyn during the first prototyping stage:

soundFishing sounds of Brooklyn

Although the recording quality is not excellent and the form factor of the device is cumbersome, the basic rule system worked very well, recording just the sound events that matched the rule set by the user.

In order to transform this project into a working tool ready to be distributed to the public, many efforts must be put in shrinking the device to make it wearable, so that the user perceives what he is carrying around not as something detached and cumbersome but as something intimate and easy to wear.
According to this, mobile phones can be considered an interesting platform to work on as they already embody the technical and computational features needed to let the soundFishing interface run as a software application, possibly embedded in their hardwares.

The rule system has to be refined so that many different audio parameters can control the recording process, not just the amplitude but also the frequencies, thus the final output can carry a wider variety spectrum.

Finally a system to access, manage and arrange the audio fragments is desirable, so that the user can create new audio experiences from the samples captured from his or hers life experiences.

The sonic snapshots can be valuable also to other people as creative assets. Musicians and audio producers are always looking for interesting sounds and the output of the soundFishing interface can be appreciated also by these professionals.

CONCLUSION

I believe that this device is much more than an automatic sound recorder: it could, at a first glance, look similar to the old Dictaphone, but only if its technology is considered inattentively. The motivations that have led me to build the soundFishing interface and the context I imagine it can be used for as an asset can explain its features and its nature, bearing in mind the evolution that the digital devices are nowadays experiencing. This process is transforming them into objects as intimate as our personal diaries or as personal as our favorite garments.
I started this project with some clear ideas on my mind and a problem to solve: we are surrounded by sounds, sometimes they are awful and annoying, often they are sublime and inspiring, but in both cases we are losing them, not just because they are volatile in their nature, but because we usually take them for granted. We consider sound a common and unremarkable matter, therefore we don’t travel around waiting to record a sound that could interest or move us. The problem lies precisely here: in fact maybe that “common” sound can be valuable to us or to another person, it can make us laugh or remind us of an important experience or tell us something more about our life. So why not try to save them from oblivion?
The key to really grab the essence of this project is held by the concept of curiosity, a virtue that can turn something usual and useless into something unique and meaningful, a powerful entity that can open the door of knowledge to all of us.

REFERENCES

Lev Manovich, The Language of New Media (Cambridge: The MIT Press, 2002).
Gene Youngblood, Expanded Cinema (New York: EP Dutton, 1971), 180-2.
Youngblood, 189.

Bradley J. Rhodes, “Remembrance Agent: A continuously running automated information retrieval system” in The Proceedings of The First International Conference on The Practical Application Of Intelligent Agents and Multi Agent Technology (PAAM '96),London, The Practical Application Company Ltd, pp. 487-95.

Mik Lamming and Mike Flynn, "Forget-me-not: Intimate Computing in Support of Human Memory” in Proceedings of FRIEND21, '94 International Symposium on Next Generation Human Interface, Meguro Gajoen, Japan.

soundFishing postmortem (DOC)
soundFishing presentation (PDF)

"Latency is basically longest time that you have to wait before you obtain a desired result. For digital audio output it is the time between making a sound in software and finally hearing it."

PortAudio - Latency document

In these days I've been working on an audio project using Processing and the amazing Sonia library by Amit Pitaru.

Sonia is a powerful and easy to use tool to play,record,generate,manipulate audio in real time in Processing built on top of JSyn , a blazing fast audio synthesis and manipulation plugin for Java.

The only problem I've met using these tools is the latency with which, by default, the audio is then managed in Processing :

as you can see the default latency is almost half a second, quite a long time in audio terms.

The most basic implication of this is for example that 417 milliseconds must pass before actually hear the sound once the play button is pressed.

Researching a bit about this topic I found this topic on the Processing forum about "Sound and latency", Amit gives a hint about how to patch this problem:

"on a pc, i created a new batch file called runSonia.bat in the main processing directory, with the following two commands in it:

SET PA_MIN_LATENCY_MSEC=50
processing.exe

this will set the jsyn latency to 50ms and then run processing. if you get click-pop sounds, than increase the latency until its resolved."

This actually worked for me taking my latancy from 417ms to 60ms, a much better result:

The problem is that this is only a temporary solution, in fact the patch applies only when the runSonia.bat is first run and then Processing is evoked by it gaining the new latency settings, but if runSonia is not run , then the 417ms latency is still there in Processing.

A solution emerged in a post by Phil Burk, the creator of JSyn, linking to a very interesting document about PortAudio and latency, in which is clarified how to set a "permanent" environmental variable that controls the latency values for all the applications related to PortAudio, here's the extract:

" Macintosh:

The best thing you can do to improve latency on Mac OS 8 and 9 is to turn off Virtual Memory. PortAudio V18 will detect that Virtual Memory is turned off and use a very low latency.

For Mac OS X the latency is very low because Apple Core Audio is so well written. You can set the PA_MIN_LATENCY_MSEC variable using:

setenv PA_MIN_LATENCY_MSEC 4

Unix:

PortAudio under Unix currently uses a backgroud thread that reads and writes to OSS. This gives you decent but not great latency. But if you raise the priority of the background thread to a very priority then you can get under 10 milliseconds latency. In order to raise your priority you must run the PortAudio program as root! You must also set PA_MIN_LATENCY_MSEC using the appropriate command for your shell.

Windows:

For Windows XP, you can set environment variables as follows:

Select "Control Panel" from the "Start Menu".
Launch the "System" Control Panel
Click on the "Advanced" tab.
Click on the "Environment Variables" button.
Click "New" button under User Variables.
Enter PA_MIN_LATENCY_MSEC for the name and some optimistic number for the value.
Click OK, OK, OK. "

Thanks to this solution the latency can be set permanently on your machine.

-

Claudio Midolo / design + technology

Help yourself and everyone else

I'm

Bookmarks - Thesis

Bookmarks - photography

Kickass

I Love

Past

Monday, December 24, 2007

soundFishing postmortem

Saturday, December 22, 2007

Sounds from Brooklyn

Saturday, December 8, 2007

Openframeworks : Eating data at Eyebeam

Wednesday, December 5, 2007

Processing: how to lower audio latency using Sonia