I wrote this - please excuse the formatting (can't find a version of the UCL specific nroff MM macros)...
use of Labanotation to automate the annotation of video conferencing, something I heard about in the context of AI last week at MPI (31 years later)....and stable diffusion
"Some Experience of co‐authoring with synchronous and
asynchronous computer mediated communication"
J. Crowcroft
M. d’Inverno
Department of Computer Science
ABSTRACT
UCL has a digital multi‐media conferencing system that
connects us to 8 sites in the US and one site in Germany,
over the Internet. It allows up to 4 way video/audio, and
all the normal data communications facilities. These include
Electronic Mail and BBoards as well as a shared multimedia
document editor called mmconf/Slate. The conferencing system
is described in some detail in [1] and [2].
This paper is about specific experiences using this and
related systems. For comparison, the use of more old‐fashioned
asynchronous facilities to co‐author a document is briefly
described.
An approach is also proposed to help with the description of
the visual side of human computer mediated communication.
It should be stated that most of the use of the system has
been self‐referential; although there have been "naive"
users (UCL and MIT librarians), their usage has been
restricted to synchronous meetings. This paper is anecdotal
rather than analytical.
1. Introduction
First let us define the terms in the title:
• Co‐authoring
Two or more people write a text together. There are
several interesting points on the spectrum of
distribution of labour in co‐authoring which interact
with the physical distribution of the authors ‐ we
comment on these in the conclusions.
• Synchronous1
Synchronous communication is soft real time. In long
haul networks such as the Internet, there may be a
delay between sending video/audio/data and receiving it
(Einstein is insurmountable) but it is bounded to some
reasonable number (because of the video compression
technique we use, it can be quite high ‐ as much as 400
msecs, or even 2‐3 seconds under especially bad
conditions!).
____________________
1. Some use the term isochronous ‐ this is wrong.
Isochronous networks maintain clock synchronisation on
bits at all points in the network.
• Asynchronous
By this we mean use of electronic mail and file
transfer/access out of band from direct voice/video
communication between people.
• Computer‐mediated
This is harder to define. The network that carries E‐
mail, files and video/voice is made of a set of links
connected by nodes which are special purpose computers
(switches/routers). For E‐mail (especially multi‐media
mail) the end system that the human uses is a general
purpose computer/workstation, which can apply many
processing techniques to the data. However, for Video
and Voice, only very special purpose processing is done
to maximise the quality (the form). A great deal of
research is needed before more interesting processing
could be done on what is said and how during a video
conference (the content).
However, material from off‐line is presented on the
same screens in our system as the video, so a document
being discussed is juxtaposed to the speakers.
• Webster defines communication as:
com.mu.ni.ca.tion ‐.my:‐n*‐’ka‐‐sh*n n 1: an act or instance of
transmitting 2a: information communicated 2b: a verbal or written message
3: an exchange of information pl 4a: a system (as of telephones) for
communicating 4b: a system of routes for moving troops, supplies, and
vehicles 4c: personnel engaged in communicating 5: a process by which
meanings are exchanged between individuals through a common system of
symbols pl but sing or pl in constr 6a: a technique for expressing ideas
effectively in speech or writing or through the arts 6b: the technology of
the transmission of information
The rest of this note is divided into three main sections:
1. A brief description of a fairly successful production
of a document through asynchronous communication. This
is used to set the parameters for what may be of
interest in synchronous collaboration.
2. Details of two major meetings and the documents
produced from them.
3. An approach to help with the description of the visual
side of human computer mediated communication is
described. This may also be used to help prescribe
which visual channels for communication are used, and
when, in systems which have limited bandwidth and cannot
allow fully cross connected video and audio (many
systems have fully cross connected audio with speaker
video only, for bandwidth reasons).
Finally we present some conclusions.
2. Asynchronous Collaborative Experience
A paper was written by 4 authors, 2 in London, 1 in
Cambridge and one in Colorado. Two of the authors had not
met face to face. The paper involved a repeatable experiment
and performance analysis. It took about 1 week elapsed time
to write and will be published in a respectable (refereed)
journal [4].
It evolved as follows.
i. Author in US E‐mails author in London with anomalous
output from an experiment.
ii. Author in London E‐mails 2nd author in London with
problem statement.
iii. 2nd author in London re‐implements experiment, and
discovers same results. 1st author in London outlines
components in the system that could be causing the
problem.
iv. Author in Colorado writes script to run experiment and
graph output, and sends to London.
v. 2nd Author in London hypothesises problem explanation,
and a possible solution.
vi. Authors in London E‐mail authors in Cambridge and
Colorado with problem and hypothetical solution.
vii. Cambridge author E‐mails colleagues for availability
of code to implement solution.
viii. Meanwhile, draft paper is E‐mailed to interest list.
ix. Member of interest list is editor of journal, asks if
we want to submit it ‐ we do.
x. Referees’ comments sent (anonymously) by E‐mail that
they want to see implementation of solution.
xi. None of the authors has source code to implement
solution, so they ask interest‐list.
xii. Colleague in Sweden on interest list changes code,
compiles and sends new (unix kernel) object to London
(by E‐mail).
xiii. Authors in London run script by author in Colorado,
get correct results. These are (automatically)
included in the final paper which is E‐mailed to the
editor of the journal.
This sequence of events hides two extremely important
factors in the success of the collaboration:
1. Commonality of interest/experience.
The authors came from very similar technical
backgrounds ‐ (by coincidence, all trained in
engineering followed by computing).
2. Independence of tools.
The authors were all able to use tools they were
familiar with for mail processing, editing and
document preparation as well as actually running the
experiment.
It also reveals two of the major advantages of using
asynchronous communication:
1. Authors can work simultaneously rather than having to
gain some token as floor chair or editor‐of‐the‐
moment.
2. Synchronisation of versions of a document is achieved
at the same time as communication of a version ‐ i.e.
when any one author posts a new version, most users
see this at roughly the same time.
These advantages are at some expense ‐ authors may work
simultaneously to the same effect, and replicate each other’s
effort. Authors may diverge, and have to merge subsequent
versions of the work somehow.2 However, replicated work has
scientific value and divergence may well be productive.
The reduction of waiting time to gain the lock on a version
was certainly advantageous (especially given different time
zones, when finding the person who had the lock might take
half a day).
____________________
2. The use of a revision control system was obviated by the
documents arriving as e‐mail, and therefore having a
unique source/author and timestamp/message id. Merging in
many RCS/SCCS type systems makes use of context
difference systems which are available in any case as
part of the normal operating system tools.
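The idea that e‐mailed drafts identify themselves, and that ordinary
context difference tools suffice for merging, can be illustrated with a
minimal Python sketch. This is not a tool the authors used: the names
Draft, latest and diff_drafts, and all the addresses, dates and message
ids, are hypothetical and chosen only for the example.

```python
import difflib
from dataclasses import dataclass


@dataclass(frozen=True)
class Draft:
    """One e-mailed version of the paper (all field values are made up)."""
    author: str        # sender of the message
    timestamp: str     # ISO 8601 date string, sortable as text
    message_id: str    # unique id assigned by the mail system
    text: str          # body of the draft


def latest(drafts):
    """Return the most recent draft; ties are broken by message id."""
    return max(drafts, key=lambda d: (d.timestamp, d.message_id))


def diff_drafts(old, new):
    """Context diff between two drafts, labelled by author and message id."""
    return list(difflib.context_diff(
        old.text.splitlines(),
        new.text.splitlines(),
        fromfile=f"{old.author} {old.message_id}",
        tofile=f"{new.author} {new.message_id}",
        lineterm="",
    ))


a = Draft("london-1", "1991-06-01T10:00", "<1@example>", "Intro\nResults pending\n")
b = Draft("colorado", "1991-06-01T18:30", "<2@example>", "Intro\nResults attached\n")
changes = diff_drafts(a, b)
```

Because every draft carries its own author and message id, no lock or
central repository is needed to decide which two versions are being
compared; the cost is that reconciling the differences is left to the
authors.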
3. Synchronous Meetings and Minutes
In this section we describe two meetings that were held in
rather different ways using the multi‐media conferencing
system, and report some of the users’ experiences.
3.1 The IETF Directory Working Group
The first is a 4 site video conference which made use of the
UCL to US system to hold a research group meeting of around
35 participants. The meeting was held in rooms that are
designed like small studios with a small audience and a
small number of podium speakers, with cameras switchable
from audience to podium or individual speaker. The resulting
document from this meeting was a report of the meeting, which
is available as a UCL technical report.
Workstations were available at each site during the meeting.
Normal editing/document preparation tools were available as
well as normal electronic mail and file transfer.
Groups like this usually chose not to use the shared
editing system for several reasons:
1. Lack of familiarity ‐ mmconf/Slate is a hybrid GUI
WYSIWIS system ‐ partly like Apple MAC, partly like
Sun Open Look and partly X Windows.
2. Size of the meeting and lack of screen real estate to
show the documents to all sites as well as video. It
is felt (whether correctly or not we cannot say) that
the video is more important.
3. This particular group desired independent minutes from each
site ‐ this did not require shared editing. If they
had used shared editing, they might have felt that the
meeting dynamics were undermined by the floor
exchange/token exchange delays.
3.1.1 Pre‐meeting E‐Mail
Setting up the meeting itself with aims clearly stated and
strong chairing seems to be useful. Electronic mail informed
all the users of a strict timetable for the agenda and
presentation based meeting, e.g.:
17:00 Introduction
o Discussion of Videoconference modus operandi
o Agenda
o Minutes of previous meeting
o Matters arising
o No liaisons!
17:15 Document Status. Review status of all working documents,
Internet Drafts, and submitted RFCs.
17:25 Presentation of Pilot Activity
...
18:15 US/Europe liaison issues
18:30 Management of ‘‘experimental’’ object identifiers
18:40 Naming Guidelines (Paul Barker)
19:10 Representing Network Information (Chris Weider)
19:55 Security (Peter Yee)
20:20 Naming in the US in light of NADF 123 (Marshall T. Rose?)
20:50 Date and Venue of next meeting
20:50 ‐‐ 21:00 AOB
Notice the use of a strict timescale and objectives. Also,
this group do a great deal of work by E‐mail (typically a
few messages a day over the last few months, to the group’s
distribution list).
In the event, due to technical difficulties the strict
timescales did not come into play. The meeting started late
(20 mins or so) and the connection went down for at least a
quarter of an hour at one point. The meeting also finished
early ‐ by about a quarter of an hour.
The agenda was not followed strictly as some participants
were not available for the whole meeting.3
3.2 An Informal Meeting
The UK‐US Network Interconnection is managed by a group of
expert network operations staff (the Operational Management
Group ‐ OMG) who have monthly video conferences. These have
roughly 2 people per site, and 3 or 4 sites in the US and
UK.
These meetings are held in a small meeting room with wall
projection TV and a workstation running a shared editor.
This is used to show output from previous meetings and
experiments/measurements of the network’s
availability/performance.
____________________
3. Paul Barker commented: "My feeling was that the meeting
was much like other meetings I go to: those attending
sense when something important is being discussed, and
where progress is being made, and the discussions run on
to allow the issue time."
One document that was produced as a result of the planning
and exchanges of ideas in these meetings can be found in
[3].
The use of a shared view for these meetings is very
successful. Again, like the asynchronous collaboration, this
group has common expertise.
A major difference is that the group is task oriented, and
thus can identify a given member to show a given document,
while the others simply view.
Under some circumstances, others will extract data from some
document or graph, and re‐process it, but this works well in
this environment. No locking or token system is used ‐
simply the ability to show a workstation window or screen to
the other sites.
4. Gesture Detection ‐ Future Work
We are working on annotating video tapes of the live meetings
so that we can make objective comparisons between video and
face to face meetings. To this end, we are adapting one of
the modern Ballet notations as a non‐verbal communication
description language ‐ we’ve called this Balgol ’92.
Dance notation, or the process of recording people’s movement,
dates from Arbeau’s early attempts [5]. A particularly
scholarly recent work is Guest’s [6], which includes a very
useful chapter on the use of computers for annotating
movement.
Recent work in Zoology and Neuro‐physiology has demonstrated
the special applicability of the Eshkol‐Wachmann movement
notation [7] to the objective study of movement and gesture
[8].4 We have dismissed Labanotation and Benesh notation as
deriving too much from the musical context from which they arose,
and for the pragmatic reason that they depend on the human
interpretation of their somewhat hieroglyphic syntax.
Eshkol‐Wachmann, however, is based partly on an engineering
model of movement and is more amenable to computer
recognition and analysis.
Other work related to this is the use of the (Set theory
based) formal specification language Z to specify floor
control schemas ‐ this allows a completely general
description of any floor control algorithm. We have looked
at how speech acts [9][10][11] may be used to help structure
bids/negotiations for the floor. [1]
____________________
4. We are indebted to Michael Recce from the UCL departments
of Computer Science and Anatomy for directing us to the
rich body of work in this area.
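As an informal illustration of what a floor control scheme has to
describe, the following Python sketch models one simple policy: sites
bid for the floor, one site holds it at a time, and a release passes it
to the earliest outstanding bidder. It is not the Z specification, and
the class and method names are hypothetical.

```python
from collections import deque


class FloorControl:
    """Single-token floor control: at most one holder, FIFO queue of bids."""

    def __init__(self, sites):
        self.sites = set(sites)
        self.holder = None          # site currently holding the floor
        self.bids = deque()         # outstanding requests, oldest first

    def request(self, site):
        """A site bids for the floor; it is granted at once if the floor is free."""
        if site not in self.sites:
            raise ValueError(f"unknown site: {site}")
        if self.holder is None:
            self.holder = site
        elif site != self.holder and site not in self.bids:
            self.bids.append(site)
        return self.holder == site

    def release(self, site):
        """The holder gives up the floor; the oldest bidder (if any) gets it."""
        if site != self.holder:
            raise ValueError(f"{site} does not hold the floor")
        self.holder = self.bids.popleft() if self.bids else None
        return self.holder


floor = FloorControl(["UCL", "BBN", "ISI"])
floor.request("UCL")      # floor is free, so UCL holds it
floor.request("ISI")      # queued behind UCL
floor.request("BBN")      # queued behind ISI
floor.release("UCL")      # floor passes to ISI
```

Other policies (a chairperson who grants bids, or priorities derived
from the kind of speech act being attempted) would change only how the
next holder is chosen in release.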
By analogy, we are trying to see if the same could be
applied to automatic gesture recognition to two ends:
1. Description.
There is constant debate comparing face to face
meetings with video conferences [e.g. 12]. We need an
objective measure of increased quality of meeting when
audio and text are enhanced with a view of the
participants. We need to know how many participants
need to be seen, and at what resolution for a given
increase in communication richness against a given
cost (more cameras/displays/bandwidth).
2. Prescription
If we have limited bandwidth, we may use detection of
"significant" movements to switch camera focus/view to
other users in a video‐conference. Judging
significance may be feasible.
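The prescriptive use might look roughly like the following Python
sketch. It assumes that some detector has already produced a movement
score for each site; the threshold, the margin and the function name
choose_view are hypothetical and not part of the system described.

```python
def choose_view(current, scores, threshold=0.5, margin=0.2):
    """Pick which site's video to show.

    current   -- site shown now (or None)
    scores    -- dict mapping site name to a movement score in [0, 1]
    threshold -- minimum score that counts as "significant" (assumed value)
    margin    -- how far a rival must exceed the current site before we
                 switch, to avoid flickering between views (assumed value)
    """
    if not scores:
        return current
    # Sorting first makes ties resolve to the alphabetically first site.
    best = max(sorted(scores), key=lambda site: scores[site])
    if scores[best] < threshold:
        return current                      # nothing significant: stay put
    if current in scores and scores[best] < scores[current] + margin:
        return current                      # rival not clearly more active
    return best
```

For example, choose_view("UCL", {"UCL": 0.1, "ISI": 0.9}) selects ISI,
while a rival scoring only slightly above the site already on screen
leaves the view unchanged.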
Initial work in the UCL Anatomy department has already
demonstrated that it is feasible to recognise limited sets
of postures in rats in real time from video frames. Two
important pre‐requisites are:
1. Foreground/background contrast must be very high
2. The Lexicon of gestures/postures must be known in
advance, and areas of interest outlined.
Both of these requirements can be easily fulfilled in a
videoconference.5
To this end, we have started designing a language with a
lexicon of common postures. The syntax of the language is based
on the idea of Eshkol/Wachmann that we understand the limbs
and body as a mechanical system. Thus we can describe
movement of limbs around a spherical coordinate system
centered on the torso. A gesture is some sequence leading
from one posture to another.
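One way to picture such a language is the Python sketch below. It is
only an illustration under stated assumptions: the limb names, the
two‐entry lexicon and the 45 degree quantisation step are hypothetical
choices made for the example, not the definition of the language being
designed.

```python
from dataclasses import dataclass

STEP = 45  # degrees per unit; an assumed quantisation, chosen for the example


@dataclass(frozen=True)
class Direction:
    """Orientation of a limb in spherical coordinates centred on the torso."""
    azimuth: int    # units of STEP around the vertical axis, 0 = straight ahead
    elevation: int  # units of STEP from straight down (0) to straight up (4)

    @staticmethod
    def from_degrees(azimuth_deg, elevation_deg):
        """Quantise measured angles to the nearest unit."""
        return Direction(round(azimuth_deg / STEP) % (360 // STEP),
                         round(elevation_deg / STEP))


# A posture maps limb names to directions; a lexicon names known postures.
LEXICON = {
    "rest":    {"right_arm": Direction(0, 0), "left_arm": Direction(0, 0)},
    "hand_up": {"right_arm": Direction(0, 4), "left_arm": Direction(0, 0)},
}


def classify(posture):
    """Return the lexicon name matching a posture exactly, else None."""
    for name, reference in LEXICON.items():
        if posture == reference:
            return name
    return None


def gesture(postures):
    """A gesture is the sequence of distinct named postures passed through."""
    names = []
    for posture in postures:
        name = classify(posture)
        if name is not None and (not names or names[-1] != name):
            names.append(name)
    return names


observed = [
    {"right_arm": Direction.from_degrees(3, 5), "left_arm": Direction(0, 0)},
    {"right_arm": Direction.from_degrees(-10, 170), "left_arm": Direction(0, 0)},
]
```

Quantising the angles is what makes a fixed lexicon workable: slightly
different measurements of the same pose collapse onto one entry, and a
gesture reduces to a short list of names.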
____________________
5. We note that many digital video systems use parallel
machines to run the video compression necessary to
transmit video over today’s limited bandwidth networks.
These compression techniques include motion
detection/compensation and also transforming the image to
the frequency domain. Both of these if available to a
gesture detection system would automatically provide
strong hints as to the locality of "significant"
movement.
One modest experiment will be the ability to detect the
visual equivalent of an interruption ‐ currently the audio
systems use silence suppression. It may be useful in
situations with limited bandwidth to employ similar
approaches with the video. Most people attempt to get
attention in a meeting by waving their arm in the air. This
is very easily detected as compared with most other, more
random movements.
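A minimal Python sketch of such a detector follows. It assumes
greyscale frames given as lists of rows of integers, a region of
interest outlined in advance above the participants' heads, and
made‐up thresholds; the function names are hypothetical. It simply
differences successive frames and reports an "interruption" when enough
pixels change in that region for several frames running.

```python
def changed_fraction(prev, curr, region, pixel_threshold=30):
    """Fraction of pixels in region whose brightness changed noticeably.

    prev, curr -- greyscale frames as lists of rows of ints (0..255)
    region     -- (top, bottom, left, right), bottom/right exclusive
    """
    top, bottom, left, right = region
    changed = 0
    for y in range(top, bottom):
        for x in range(left, right):
            if abs(curr[y][x] - prev[y][x]) > pixel_threshold:
                changed += 1
    return changed / ((bottom - top) * (right - left))


def detect_wave(frames, region, min_fraction=0.2, min_run=3):
    """True if movement in region persists for min_run successive frame pairs."""
    run = 0
    for prev, curr in zip(frames, frames[1:]):
        if changed_fraction(prev, curr, region) >= min_fraction:
            run += 1
            if run >= min_run:
                return True
        else:
            run = 0
    return False


def frame(arm_column=None, size=8):
    """Synthetic 8x8 test frame: dark background, bright arm in one column."""
    rows = [[0] * size for _ in range(size)]
    if arm_column is not None:
        for y in range(4):            # the arm occupies the top four rows
            rows[y][arm_column] = 255
    return rows


REGION = (0, 4, 0, 8)                              # top half of the picture
still = [frame() for _ in range(5)]                # nobody moves
waving = [frame(c) for c in (1, 5, 2, 6, 3)]       # arm sweeps side to side
```

Requiring the movement to persist over several frames is the visual
counterpart of a hangover time in silence suppression: a single
changed frame is treated as noise rather than a bid for attention.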
5. Conclusions
The system is available for use, subject to booking. We can
allow video recording of meetings if any researchers wish to
study how a real system is used when the bandwidth is
severely constrained, but the cost‐benefit of the system
over travel is very large.
The distribution of the authors has a profound effect on the
way document authoring is done. In the two video meetings
described, two different approaches were tried in producing
minutes/paper. The directory group had all sites write full
minutes, and then had the chairperson merge them. This may
have produced very accurate minutes, but increased the
workload.
The Operations Management Group meeting was used solely as a
blue‐skying/planning meeting, after which a number of tasks
were allocated, and the users went away, carried out those
tasks, and reported their findings back to the authors of
[3] who then wrote their paper. This is similar to the
experience with the purely asynchronously authored paper
[4], although here the success was based on shared
experience of the area of discourse (via education) rather
than an initial face to face or video meeting.
In the broadest sense, documents just record some
information, so the videos of the conferences/meetings are
themselves a form of document. Indeed, we predict that in
the future, (automatically) edited highlights of a video
meeting may replace minutes.
6. References and Acknowledgements
Thanks to Steve Hardcastle‐Kille and the members of the
IETF’s OSI‐DS Group for permission to quote their meeting
experience and minutes. Thanks to anonymous referees and
Angela Sasse of UCL for comments on movement notation.
1. "Multimedia TeleConferencing over International Packet
Switched Networks." J. Crowcroft, P.T. Kirstein, D.
Timm, Proc IEEE TriComm ’91, April 1991
2. "Specification, Design, and Implementation of an
Interactive Conferencing System", Mark d’Inverno, Jon
Crowcroft, April 1991, Proceedings of Infocom 91,
IEEE
3. "Traffic Analysis of some UK‐US Academic Network
Data.", Crowcroft, J and Wakeman, I. Proc INET ’91
Copenhagen, June 1991
4. "Layering Considered Harmful", J. Crowcroft, D.
Sirovica, I. Wakeman, Z.Wang, To appear, IEEE
Networks, Jan 1992.
5. Orchesography (Thoinot Arbeau, 1589, trans Mary Stewart
Evans, Dover 1967)
6. Dance Notation, Ann Hutchinson Guest, Dance Books,
1984
7. Movement Notation, N Eshkol & A Wachmann, London,
Weidenfeld and Nicolson, 1958.
8. Golani, I, "Homeostatic motor processes in mammalian
interactions: a choreography of display", Perspectives
in Ethology, vol 2, pp 69‐134, Plenum Press, NY, 1976
9. Speech Act Theory and Pragmatics, J.R.Searle Ed.,
Reidel Publishing Company, 1980, pp 40‐53, S. Davies ‐
Behaviour...
10. Foundations of Illocutionary Logic J.R.Searle,
Vanderveken, Cambridge, 1985 pp36 ‐40, exposition of
illocutionary point ‐ taxonomy of points
11. On Human Communication, Cherry. MIT Press 1957.
12. Heath C, Luff, P. "Disembodied Conduct: Communication
through video in a multi‐media office environment",
ACM SIG HCI 1990.
7. Appendix: Extracts from IETF DS Report and Comments
7.1 Comments
What the users said about the real time element:
• BBN: Not as good as a face to face meeting, but better
than E‐mail.
• RIACS: might be more effective to choose a few items
and discuss to focus on the issues.
• ISI: Technical quality appalling ‐ too much delay.
Echo annoying. Sound poor. Scale: E‐mail ‐‐ 1, in
person ‐‐ 10, then generally video ‐‐ 7, but this time
‐‐ 4 due to the delay and quality. An on‐line terminal
may help.
• UCL
• (SEK): ‘‘interesting’’, some useful discussion.
Presentations did not work. If too technical,
interchange did not work.
• Colin Robbins: The delay and quality made it very
hard to hold a real meeting.
• Comments from Paul Barker: "You get a lot of
information from seeing someone and hearing them
talk. I’d never met any of the people from the
States before, but I was rapidly able to form a
picture of who the "politicians" were, as opposed
to the strongly technical. In E‐mail, I edit
myself very carefully ‐ I missed not being able to
do this in "strange" company!!
The large delay6 in the system made me very self
conscious when talking. To some extent, I played
with the technology to see how long the
propagation delays were ‐ I’d scratch an ear and
wait for the image in front of me to copy me!
The delay also made for a very presentational
style of address to the meeting. For various
reasons I was rather under‐prepared to talk on a
subject which I had to present to the meeting.
Whereas at a live meeting I would have been fairly
happy to have "chatted" informally on the subject,
such chat and half thought out ideas seemed to jar
somewhat with the formal style of presentation
(with the speaker very definitely holding the
floor). Self‐consciousness again, but it made me
feel rather ridiculous."
____________________
6. At the time this meeting was held, the 4 way meeting was
run by relaying n sites in the US through a single
mixer/quadruplexor site ‐ this involved decompressing the
video, mixing and recompressing. The CODECs which perform
compression do so partly by buffering a number of frames
and differencing them, thus introducing large delays.
Relaying this is not a good idea (in fact one may
introduce artifacts in the picture this way too, due to
different lossy compression techniques interfering).
• Comments from Steve Titcombe:
• Quality of sound
The sound quality was fairly reasonable, from
all sites except one, which seemed to have
electronic bubble noises popping and bursting
every time someone from that site talked. This
was not too annoying, but it did mean that
you had to listen carefully to hear what was
being said.
• Quality of Pictures
Picture quality at the UCL site was very
good, the screen monitoring what was being
broadcast out was very good. Other sites’
pictures were pretty good, but when split
down into a 2x2 grid for four sites, it was
possible to see if someone had a beard or
not, but no more. (This was referring to
their shots of an entire room.)
7.2 Report
The meeting report is highly structured, and follows the
sequence of events in the meeting.
Reports often attempt to hide the author of each
contribution, whereas this one, being merged from 4 versions
produced at each site, repeatedly states the origin of each
idea.
It is not highly readable, except perhaps to members of the
group present.