Wednesday, January 18, 2023

ballet notation and video conferencing. back in 1992...

 I wrote this - please excuse the formatting (can't find a version of the UCL specific nroff MM macros)...

use of Labanotation to automate the annotation of video conferencing, something I heard about in the context of AI last week at MPI (31 years later)....and stable diffusion

"Some Experience of co‐authoring with synchronous and
	      asynchronous computer mediated communication"
			       J. Crowcroft
			       M. d’Inverno
		      Department of Computer Science


       UCL has	a  digital  multi‐media	 conferencing  system  that
       connects	 us  to	 8 sites in the US and one site in Germany,
       over the Internet. It allows up to 4  way  video/audio,	and
       all the normal data communications facilities. These include
       Electronic Mail and BBoards as well as a	 shared	 multimedia
       document editor called mmconf/Slate. The conferencing system
       is described in some detail in [1] and [2].

       This paper is about specific experiences using this and
       related systems. For comparison, the use of more
       old‐fashioned asynchronous facilities to co‐author a
       document is briefly described.

       An approach is also proposed to help with the description of
       the visual side of human computer mediated communication.

       It should be stated that most of the use of the	system	has
       been  self‐referential;	although  there	 have  been "naive"
       users  (UCL  and	 MIT  librarians),  their  usage  has  been
       restricted  to synchronous meetings. This paper is anecdotal
       rather than analytical.

       1.  Introduction

       First let us define the terms in the title:

	  • Co‐authoring

	    Two or more people write a	text  together.	 There	are
	    several   interesting   points   on	  the  spectrum	 of
	    distribution of labour in co‐authoring  which  interact
	    with  the  physical	 distribution  of  the authors ‐ we
	    comment on these in the conclusions.

	  • Synchronous¹

	    Synchronous communication is soft real time. In long
	    haul networks such as the Internet, there may be a
	    delay between sending video/audio/data and receiving it
	    (Einstein is insurmountable), but it is bounded to some
	    reasonable number (because of the video compression
	    technique we use, it can be quite high ‐ as much as 400
	    msecs, or even 2‐3 seconds under especially bad
	    conditions).

       1. Some use the term isochronous ‐ this is wrong.
	  Isochronous networks maintain clock synchronisation on
	  bits at all points in the network.

	  • Asynchronous

	    By	this  we  mean	use  of	 electronic  mail  and file
	    transfer/access out of  band  from	direct	voice/video
	    communication between people.

	  • Computer‐mediated

	    This is harder to define. The network that carries
	    E‐mail, files and video/voice is made of a set of links
	    connected by nodes which are special purpose computers
	    (switches/routers). For E‐mail (especially multi‐media
	    mail) the end system that the human uses is a general
	    purpose computer/workstation, which can apply many
	    processing techniques to the data. However, for Video
	    and Voice, only very special purpose processing is done
	    to maximise the quality (the form). A great deal of
	    research is needed before more interesting processing
	    could be done on what is said, and how, during a video
	    conference (the content).

	    However, off‐line material is presented in our system on
	    the same screens as the video, so a document being
	    discussed is juxtaposed with the speakers.

	  • Communication

	    Webster defines communication (noun) as:

	       1:  an act or instance of transmitting
	       2a: information communicated
	       2b: a verbal or written message
	       3:  an exchange of information
	       4a: (plural) a system (as of telephones) for communicating
	       4b: a system of routes for moving troops, supplies, and
		   vehicles
	       4c: personnel engaged in communicating
	       5:  a process by which meanings are exchanged between
		   individuals through a common system of symbols
	       6a: (plural but singular or plural in construction) a
		   technique for expressing ideas effectively in speech
		   or writing or through the arts
	       6b: the technology of the transmission of information

       The rest of this note is divided into three main sections:

	 1.  A brief description of a fairly successful	 production
	     of a document through asynchronous communication. This
	     is used to set the	 parameters  for  what	may  be	 of
	     interest in synchronous collaboration.

	 2.  Details of two major meetings and the documents
	     produced from them.

	 3.  An approach to help with the description of the visual
	     side of human computer mediated communication is
	     described. This may also be used to help prescribe
	     which visual channels for communication are used when,
	     for systems which have limited bandwidth and cannot
	     allow fully cross connected video and audio (many
	     systems have fully cross connected audio, with video of
	     the speaker only, for bandwidth reasons).

       Finally we present some conclusions.

       2.  Asynchronous Collaborative Experience

       A  paper	 was  written  by  4  authors,	2  in  London, 1 in
       Cambridge and one in Colorado. Two of the  authors  had	not
       met face to face. The paper involved a repeatable experiment
       and performance analysis. It took about 1 week elapsed  time
       to  write  and will be published in a respectable (refereed)
       journal [4].

       It evolved as follows.

	 i.  Author in US E‐mails author in London  with  anomalous
	     output from an experiment.

	ii.  Author  in	 London	 E‐mails  2nd author in London with
	     problem statement.

       iii.  2nd author in London re‐implements experiment, and
	     obtains the same results. 1st author in London outlines
	     components in the system that could be causing the
	     anomaly.

	iv.  Author in Colorado writes script to run experiment and
	     graph output, and sends to London.

	 v.  2nd Author in London hypothesises problem explanation,
	     and a possible solution.

	vi.  Authors  in  London  E‐mail  author  in  Cambridge and
	     Colorado with problem and hypothetical solution.

       vii.  Cambridge author E‐mails colleagues  for  availability
	     of code to implement solution.

       viii. Meanwhile, draft paper is E‐mailed to interest list.

	ix.  Member  of interest list is editor of journal, asks if
	     we want to submit it ‐ we do.

	 x.  Referees' comments sent (anonymously) by E‐mail: they
	     want to see an implementation of the solution.

	xi.  None of the authors has source code to implement the
	     solution, so they ask the interest list.

       xii.  Colleague in Sweden on  interest  list  changes  code,
	     compiles  and sends new (unix kernel) object to London
	     (by E‐mail).

       xiii. Authors in London run the script written by the author
	     in Colorado and get correct results. These are
	     (automatically) included in the final paper, which is
	     E‐mailed to the editor of the journal.

       This  sequence  of  events  hides  two  extremely  important
       factors in the success of the collaboration:

	 1.  Commonality of interest/experience.

	     The  authors  came	  from	 very	similar	  technical
	     backgrounds   ‐   (by   coincidence,  all	trained	 in
	     engineering followed by computing).

	 2.  Independence of tools.

	     The authors were all able to use tools they were
	     familiar with for mail processing, editing and
	     document preparation, as well as for actually running
	     the experiment.

       It  also	 reveals  two  of  the	major  advantages  of using
       asynchronous communication:

	 1.  Authors can work simultaneously rather than having to
	     gain some token, such as floor chair or editor‐of‐the‐…

	 2.  Synchronisation of versions of a document is  achieved
	     at	 the same time as communication of a version ‐ i.e.
	     when any one author posts a new  version,	most  users
	     see this at roughly the same time.

       These advantages come at some expense: authors may work
       simultaneously to the same effect, and replicate each other's
       effort. Authors may diverge, and have to merge subsequent
       versions of the work somehow.² However, replicated work has
       scientific value and divergence may well be productive.
       The reduction of waiting time to gain the lock on a version
       was certainly advantageous (especially given different time
       zones, when finding the person who had the lock might take
       half a day).

       2. The use of a revision control system was obviated by the
	  documents arriving as e‐mail, and therefore having a
	  unique source/author and timestamp/message id. Merging in
	  many RCS/SCCS type systems makes use of context
	  difference systems, which are available in any case as
	  part of the normal operating system tools.
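       Footnote 2 relies on the context-difference tools that come with
       the operating system. The sketch below is an illustration only. It
       uses Python's standard difflib module as a stand-in for Unix
       diff -c. It renders a context diff between two e-mailed drafts.
       Given their common ancestor, it also lists the lines that both
       authors changed, which are where a manual merge would be needed.
       The function names and the line-level overlap rule are our own
       assumptions, not the procedure the authors followed.

```python
import difflib
from typing import List, Set


def context_diff_text(old: List[str], new: List[str], old_name: str, new_name: str) -> str:
    """Render a `diff -c` style context diff between two e-mailed drafts."""
    return "".join(difflib.context_diff(old, new, fromfile=old_name, tofile=new_name))


def touched_base_lines(base: List[str], revised: List[str]) -> Set[int]:
    """Indices of lines in the common ancestor that `revised` changed.

    Replaced or deleted lines are recorded by index; an insertion is
    recorded at the index of the base line it was inserted in front of.
    """
    touched: Set[int] = set()
    matcher = difflib.SequenceMatcher(None, base, revised, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            touched.update(range(i1, i2))
        elif tag == "insert":
            touched.add(i1)
    return touched


def overlapping_edits(base: List[str], draft_a: List[str], draft_b: List[str]) -> List[int]:
    """Base lines edited by both authors: where a manual merge is needed."""
    return sorted(touched_base_lines(base, draft_a) & touched_base_lines(base, draft_b))


# Hypothetical drafts: each list element is one line of the document.
base = ["Title\n", "Intro\n", "Method\n", "Results\n"]
london = ["Title\n", "Introduction\n", "Method\n", "Results\n"]
colorado = ["Title\n", "Intro\n", "Method\n", "Results (revised)\n"]
print(overlapping_edits(base, london, colorado))  # [] : no line edited twice
print(context_diff_text(london, colorado, "london.txt", "colorado.txt"))
```

       An empty list means the two drafts touched disjoint parts of the
       ancestor and could be combined mechanically. Any index it returns
       needs a human decision.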

       3.  Synchronous Meetings and Minutes

       In this section we describe two meetings that were  held	 in
       rather  different  ways	using  the multi‐media conferencing
       system, and report some of the users’ experiences.

       3.1  The IETF Directory Working Group

       The first is a 4 site video conference which made use of the
       UCL  to US system to hold a research group meeting of around
       35 participants. The meeting was	 held  in  rooms  that	are
       designed	 like  small  studios  with  a small audience and a
       small number of podium  speakers,  with	cameras	 switchable
       from audience to podium or individual speaker. The resulting
       document from this meeting was a report of the meeting, which
       is available as a UCL technical report.

       Workstations were available at each site during the meeting.
       Normal editing/document preparation tools were available	 as
       well as normal electronic mail and file transfer.

       Groups like this usually choose not to use the shared
       editing system, for several reasons:

	 1.  Lack of familiarity ‐ mmconf/Slate is a hybrid GUI
	     WYSIWIS system ‐ partly like the Apple Mac, partly like
	     Sun Open Look and partly like X Windows.

	 2.  Size of the meeting and lack of screen real estate to
	     show the documents to all sites as well as video. It
	     is felt (whether correctly or not we cannot say) that
	     the video is more important.

	 3.  This particular group desired independent minutes from
	     each site ‐ this did not require shared editing. If they
	     had used shared editing, they may have felt that the
	     meeting dynamics were undermined by the floor
	     exchange/token exchange delays.

       3.1.1  Pre‐meeting E‐Mail

       Setting up the meeting itself with aims clearly stated and
       strong chairing seems to be useful. Electronic mail informed
       all the users of a strict timetable for the agenda and
       presentation based meeting:

       17:00 Introduction
	    o Discussion of Videoconference modus operandi
	    o Agenda
	    o Minutes of previous meeting
	    o Matters arising
	    o No liaisons!

       17:15 Document Status.  Review status of all working documents,
	   Internet Drafts, and submitted RFCs.
       17:25 Presentation of Pilot Activity
       18:15 US/Europe liaison issues
       18:30 Management of ‘‘experimental’’ object identifiers
       18:40 Naming Guidelines (Paul Barker)
       19:10 Representing Network Information (Chris Weider)
       19:55 Security (Peter Yee)
       20:20 Naming in the US in light of NADF 123 (Marshall T. Rose?)
       20:50 Date and Venue of next meeting
       20:50 ‐‐ 21:00 AOB

       Notice  the  use of a strict timescale and objectives. Also,
       this group do a great deal of work by E‐mail (typically a
       few messages a day over the last few months, to the group's
       distribution list).

       In the event,  due  to  technical  difficulties	the  strict
       timescales did not come into play.  The meeting started late
       (20 mins or so) and the connection went down for at least  a
       quarter	of an hour at one point.  The meeting also finished
       early ‐ of order a quarter of an hour.

       The agenda was not followed strictly  as	 some  participants
       were not available for the whole meeting.³

       3.2  An Informal Meeting

       The  UK‐US  Network Interconnection is managed by a group of
       expert networks operations staff (the Operational Management
       Group ‐ OMG) who have monthly video conferences. These have
       roughly 2 people per site, and 3 or 4 sites in the US and …
       These meetings are held in a small meeting room with wall
       projection TV and a workstation running a shared editor.
       This is used to show output from previous meetings and
       experiments/measurements of the networks.


       3. Paul Barker commented: "My feeling was that the meeting
	  was much like other meetings I go to: those attending
	  sense when something important is being discussed, and
	  where progress is being made, and the discussions run on
	  to allow the issue time."


       One document that was produced as a result of the planning
       and exchanges of ideas in these meetings can be found in
       [3].
       The  use	 of  a	shared	view  for  these  meetings  is very
       successful. Again, like the asynchronous collaboration, this
       group has common expertise.

       A  major	 difference is that the group is task oriented, and
       thus can identify a given member to show a  given  document,
       while the others simply view.

       Under some circumstances, others will extract data from some
       document or graph, and re‐process it, but this works well in
       this  environment.   No	locking	 or  token system is used ‐
       simply the ability to show a workstation window or screen to
       the other sites.

       4.  Gesture Detection ‐ Future Work

       We are working on annotating video tapes of the live meetings
       so that we can make objective comparisons between video and
       face to face meetings. To this end, we are adapting one of
       the modern Ballet notations as a non‐verbal communication
       description language ‐ we’ve called this Balgol ’92.

       Dance notation, the process of recording people's movement,
       dates from Arbeau’s early attempts [5]. A particularly
       scholarly recent work is Guest’s [6], which includes a very
       useful chapter on the use of computers for annotating …
       Recent work in Zoology and Neuro‐physiology has demonstrated
       the special applicability of  the  Eshkol‐Wachmann  movement
       notation	 [7] to the objective study of movement and gesture
       [8].⁴ We have dismissed Labanotation and Benesh notation as
       deriving too much from the musical context from which they arose,
       and for the pragmatic reason that they depend on	 the  human
       interpretation	of   their  somewhat  hieroglyphic  syntax.
       Eshkol‐Wachmann, however, is based partly on an	engineering
       model   of   movement  and  is  more  amenable  to  computer
       recognition and analysis.

       Other work related to this is the use of the (set theory
       based) formal specification language Z to specify floor
       control schemas ‐ this allows a completely general
       description of any floor control algorithm. We have looked
       at how speech acts [9][10][11] may be used to help structure
       bids/negotiations for the floor [1].

       4. We are indebted to Michael Recce from the UCL departments
	  of Computer Science and Anatomy for directing us to the
	  rich body of work in this area.
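       The Z floor-control schemas themselves are not reproduced here.
       As a rough illustration only, the sketch below expresses one
       possible floor-control policy in Python, in the style of a Z
       state schema. The policy is first come, first served, which is
       our assumption. The sketch has a state with an invariant, and
       operations that change the state only when their preconditions
       hold. The class Floor and its operations request, release and
       withdraw are hypothetical names. They loosely mirror speech acts
       such as bidding for, yielding and abandoning a bid for the floor.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional


@dataclass
class Floor:
    """State: at most one site holds the floor; other bidders wait in FIFO order."""
    holder: Optional[str] = None
    waiting: Deque[str] = field(default_factory=deque)

    def invariant(self) -> bool:
        # The holder never also waits, and no site is queued twice.
        return self.holder not in self.waiting and len(set(self.waiting)) == len(self.waiting)

    def request(self, site: str) -> None:
        """A bid for the floor: granted at once if the floor is free."""
        if site == self.holder or site in self.waiting:
            return  # precondition not met: a duplicate bid leaves the state unchanged
        if self.holder is None:
            self.holder = site
        else:
            self.waiting.append(site)
        assert self.invariant()

    def release(self, site: str) -> None:
        """Only the holder may release; the longest-waiting bidder goes next."""
        if site != self.holder:
            return  # precondition not met
        self.holder = self.waiting.popleft() if self.waiting else None
        assert self.invariant()

    def withdraw(self, site: str) -> None:
        """A waiting site abandons its bid."""
        if site in self.waiting:
            self.waiting.remove(site)
```

       A different policy, such as a chairperson granting the floor
       explicitly, would keep the same state and invariant and change
       only the operations. This separation is what a schema-based
       description buys.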

       By analogy, we are trying to see if the same could be
       applied to automatic gesture recognition, with two ends in mind:

	 1.  Description.

	     There  is	constant  debate  comparing  face  to  face
	     meetings  with video conferences [e.g. 12]. We need an
	     objective measure of increased quality of meeting when
	     audio  and	 text  are  enhanced  with  a  view  of the
	     participants. We need to know  how	 many  participants
	     need  to  be  seen, and at what resolution for a given
	     increase in communication	richness  against  a  given
	     cost (more cameras/displays/bandwidth).

	 2.  Prescription.

	     If	 we have limited bandwidth, we may use detection of
	     "significant" movements to switch camera focus/view to
	     other    users    in   a	video‐conference.   Judging
	     significance may be feasible.
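       Here is a rough sketch of the prescriptive use. It assumes that
       greyscale frames are available per site as lists of pixel rows.
       Each site is scored by the mean absolute difference between
       consecutive frames. This simple frame differencing stands in for
       the motion information a codec could supply, as footnote 5 notes.
       The single video slot then goes to the most active site. The
       threshold and the switching margin are hypothetical values,
       chosen for illustration.

```python
from typing import Dict, List, Optional

Frame = List[List[int]]  # greyscale pixel rows, values 0-255 (assumed format)


def motion_energy(prev: Frame, curr: Frame) -> float:
    """Mean absolute difference between two consecutive frames."""
    total = 0
    count = 0
    for row_p, row_c in zip(prev, curr):
        for p, c in zip(row_p, row_c):
            total += abs(c - p)
            count += 1
    return total / count if count else 0.0


def choose_view(energies: Dict[str, float], current: Optional[str],
                threshold: float = 10.0, margin: float = 1.5) -> Optional[str]:
    """Pick which site's camera to transmit in the single video slot.

    threshold: minimum energy that counts as a "significant" movement (assumed).
    margin: a challenger must beat the current site by this factor, so the
    view does not flap between sites with similar motion (assumed).
    """
    best = max(energies, key=energies.get, default=None)
    if best is None or energies[best] < threshold:
        return current  # nothing significant: keep the current view
    if current is None or best == current:
        return best
    if energies[best] >= margin * energies.get(current, 0.0):
        return best
    return current
```

       The margin plays the same role as a hangover in audio switching.
       It trades responsiveness for a steadier picture.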

       Initial work in	the  UCL  Anatomy  department  has  already
       demonstrated  that  it is feasible to recognise limited sets
       of postures in rats in real  time  from	video  frames.	Two
       important pre‐requisites are:

	 1.  Foreground/background contrast must be very high

	 2.  The  Lexicon  of  gestures/postures  must	be known in
	     advance, and areas of interest outlined.

       Both of these requirements can be easily fulfilled in a …
       To this end, we have started designing a language with a
       lexicon of common postures. The syntax of the language is
       based on the idea of Eshkol/Wachmann that we understand the
       limbs and body as a mechanical system. Thus we can describe
       movement of limbs around a spherical coordinate system
       centered on the torso. A gesture is some sequence leading
       from one posture to another.
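       Balgol '92 itself is not specified in this note. The following is
       a minimal sketch, under our own assumptions, of the two ideas
       just described. First, limb directions are expressed in
       torso-centred spherical coordinates as (azimuth, elevation).
       Second, a gesture is a sequence of postures drawn from a lexicon
       known in advance. The 45-degree quantisation step, the limb names
       and the three lexicon entries are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Direction = Tuple[int, int]     # (azimuth, elevation) in degrees, centred on the torso
Posture = Dict[str, Direction]  # limb name -> quantised direction

STEP = 45  # angular unit of the coarse grid (assumed for this sketch)


def quantise(azimuth: float, elevation: float) -> Direction:
    """Snap a measured limb direction onto the coarse spherical grid."""
    az = round(azimuth / STEP) * STEP % 360
    el = max(-90, min(90, round(elevation / STEP) * STEP))
    return (az, el)


# Hypothetical lexicon of postures known in advance (elevation -90 = hanging down).
LEXICON: Dict[str, Posture] = {
    "rest": {"right_arm": (0, -90), "left_arm": (0, -90)},
    "hand_raised": {"right_arm": (0, 90), "left_arm": (0, -90)},
    "pointing": {"right_arm": (0, 0), "left_arm": (0, -90)},
}


def classify(observed: Posture) -> Optional[str]:
    """Name of the lexicon posture matching every limb, or None if unknown."""
    for name, posture in LEXICON.items():
        if all(observed.get(limb) == direction for limb, direction in posture.items()):
            return name
    return None


def describe_gesture(frames: List[Posture]) -> List[str]:
    """A gesture as the sequence of distinct known postures it passes through."""
    sequence: List[str] = []
    for frame in frames:
        name = classify(frame)
        if name is not None and (not sequence or sequence[-1] != name):
            sequence.append(name)
    return sequence
```

       For example, a participant lowering, raising and lowering the
       right arm would be described as ["rest", "hand_raised", "rest"].
       Coarse quantisation is what makes a small, fixed lexicon workable.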


       5. We note that many digital video systems use parallel
	  machines to run the video compression necessary to
	  transmit video over today's limited bandwidth networks.
	  These compression techniques include motion
	  detection/compensation and also transforming the image to
	  the frequency domain. Both of these, if available to a
	  gesture detection system, would automatically provide
	  strong hints as to the locality of "significant"
	  movements.

       One  modest  experiment	will  be  the ability to detect the
       visual equivalent of an interruption ‐ currently	 the  audio
       systems	use  silence  suppression.  It	may  be	 useful	 in
       situations  with	 limited  bandwidth   to   employ   similar
       approaches  with	 the  video.  Most  people  attempt  to get
       attention in a meeting by waving their arm in the air.  This
       is  very	 easily	 detected as compared with most other, more
       random movements.
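       The interruption detector can be sketched in the same way as an
       audio silence-suppression gate. The sketch assumes that some
       tracker already reports, per frame, the elevation of a
       participant's arm in degrees. That is an assumption: the note
       does not say how this would be measured. The code fires an event
       only when the arm stays raised for a few frames. It ends the
       event only after a hangover of several low frames, so brief
       random movements are ignored. The function name and all
       thresholds are hypothetical.

```python
from typing import Iterable, List


def detect_hand_raises(elevations: Iterable[float], raise_deg: float = 60.0,
                       onset_frames: int = 3, hangover_frames: int = 5) -> List[int]:
    """Return the frame indices at which a 'hand up' event starts.

    The arm must stay at or above raise_deg for onset_frames consecutive
    frames before an event fires (rejecting random movement), and an event
    ends only after hangover_frames consecutive frames below the threshold.
    """
    events: List[int] = []
    active = False
    above = 0
    below = 0
    for i, elev in enumerate(elevations):
        if elev >= raise_deg:
            above += 1
            below = 0
            if not active and above >= onset_frames:
                active = True
                events.append(i - onset_frames + 1)  # index where the raise began
        else:
            below += 1
            above = 0
            if active and below >= hangover_frames:
                active = False
    return events
```

       A short dip while the arm is up does not start a second event.
       This matches the way a speech detector's hangover bridges short
       pauses inside one utterance.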

       5.  Conclusions

       The system is available for use, subject to booking. We	can
       allow video recording of meetings if any researchers wish to
       study how a real	 system	 is  used  when	 the  bandwidth	 is
       severely	 constrained,  but  the	 cost‐benefit of the system
       over travel is very large.

       The distribution of the authors has a profound effect on the
       way  document  authoring	 is done. In the two video meetings
       described, two different approaches were tried in  producing
       minutes/paper. The directory group had all sites write full
       minutes, and then had the chairperson merge them. This may
       have produced very accurate minutes, but increased the …

       The Operational Management Group meeting was used solely as a
       blue‐skying/planning meeting, after which a number of tasks
       were allocated, and the users went away, carried out those
       tasks, and reported their findings back to the authors of
       [3], who then wrote their paper. This is similar to the
       experience  with	 the  purely  asynchronously authored paper
       [4],  although  here  the  success  was	based	on   shared
       experience  of  the area of discourse (via education) rather
       than an initial face to face or video meeting.

       In  the	broadest  sense,   documents   just   record   some
       information,  so	 the videos of the conferences/meetings are
       themselves a form of document.  Indeed, we predict  that	 in
       the  future,  (automatically)  edited  highlights of a video
       meeting may replace minutes.

       6.  References and Acknowledgements

       Thanks to Steve Hardcastle‐Kille	 and  the  members  of	the
       IETF’s  OSI‐DS  Group  for permission to quote their meeting
       experience and minutes.	Thanks to  anonymous  referees	and
       Angela Sasse of UCL for comments on movement notation.

	 1.  "Multimedia TeleConferencing over International Packet
	     Switched Networks."  J. Crowcroft, P.T.  Kirstein,	 D.
	     Timm, Proc IEEE TriComm ’91, April 1991


	 2.  "Specification, Design, and Implementation of an
	     Interactive Conferencing System", M. d’Inverno, J.
	     Crowcroft, Proc IEEE Infocom ’91, April 1991.

	 3.  "Traffic  Analysis	 of  some  UK‐US  Academic  Network
	     Data.",  Crowcroft,  J  and Wakeman, I.  Proc INET ’91
	     Copenhagen, June 1991

	 4.  "Layering	Considered  Harmful",  J.   Crowcroft,	 D.
	     Sirovica,	 I.   Wakeman,	 Z.Wang,  To  appear,  IEEE
	     Networks, Jan 1992.

	 5.  Orchesography, Thoinot Arbeau, 1589 (trans. Mary
	     Stewart Evans, Dover, 1967).

	 6.  Dance Notation, Ann Hutchinson Guest, Dance Books.

	 7.  Movement Notation, N. Eshkol & A. Wachmann, London,
	     Weidenfeld and Nicolson, 1958.

	 8.  Golani,  I,  "Homeostatic motor processes in mammalian
	     interactions: a choreography of display", Perspectives
	     in Ethology, vol 2, pp 69‐134, Plenum Press, NY, 1976

	 9.  Speech Act Theory and Pragmatics, J.R. Searle (ed.),
	     Reidel Publishing Company, 1980, pp 40‐53, S. Davies ‐ …

	10.  Foundations of Illocutionary Logic, J.R. Searle,
	     D. Vanderveken, Cambridge, 1985, pp 36‐40 (exposition
	     of illocutionary point; taxonomy of points).

	11.  On Human Communication, C. Cherry, MIT Press, 1957.

	12.  C. Heath, P. Luff, "Disembodied Conduct: Communication
	     through video in a multi‐media office environment",
	     ACM SIG HCI 1990.

       7.  Appendix: Extracts from IETF DS Report and Comments

       7.1  Comments

       What the users said about the real time element:

	  • BBN:  Not as good as a face to face meeting, but better
	    than E‐mail.

	  • RIACS: might be more effective to choose  a	 few  items
	    and discuss to focus on the issues.


	  • ISI: Technical quality appalling ‐ too much delay.
	    Echo annoying. Sound poor. Scale: E‐mail ‐‐ 1, in
	    person ‐‐ 10, then generally video ‐‐ 7, but this time
	    ‐‐ 4 due to the delay and quality. On‐line terminal
	    may help.

	  • UCL

	       • (SEK):	  ‘‘interesting’’,  some useful discussion.
		 Presentations did  not	 work.	 If  too  technical
		 interchange did not work.

	       • Colin	Robbins: The delay and quality made it very
		 hard to hold a real meeting.

	       • Comments from Paul  Barker:  "You  get	 a  lot	 of
		 information  from  seeing someone and hearing them
		 talk.	I’d never met any of the  people  from	the
		 States	 before,  but  I was rapidly able to form a
		 picture of who the "politicians" were, as supposed
		 to  the  strongly  technical.	 In  E‐mail, I edit
		 myself very carefully ‐ I missed not being able to
		 do this in "strange" company!!

		 The large delay⁶ in the system made me very self
		 conscious when talking.  To some extent, I  played
		 with	the   technology   to	see  how  long	the
		 propagation delays were ‐ I’d scratch an  ear	and
		 wait for the image in front of me to copy me!

		 The  delay  also  made	 for  a very presentational
		 style of address  to  the  meeting.   For  various
		 reasons  I  was rather under‐prepared to talk on a
		 subject which I had to	 present  to  the  meeting.
		 Whereas at a live meeting I would have been fairly
		 happy to have "chatted" informally on the subject,
		 such chat and half thought out ideas seemed to jar
		 somewhat with the  formal  style  of  presentation
		 (with	the  speaker  very  definitely	holding the
		 floor).  Self‐consciousness again, but it made	 me
		 feel rather ridiculous."


       6. At the time this meeting was held, the 4 way meeting was
	  run by relaying n sites in the US through a single
	  mixer/quadruplexor site ‐ this involved decompressing the
	  video, mixing and recompressing. The CODECs which perform
	  compression do so partly by buffering a number of frames
	  and differencing them, thus introducing large delays.
	  Relaying this is not a good idea (in fact one may
	  introduce artifacts in the picture this way too, due to
	  different lossy compression techniques interfering).
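       The cost of relaying can be made explicit with a rough latency
       budget. The symbols are ours, not the authors'. Let d_c be the
       delay of one compress/decompress pass, dominated by the frame
       buffering described above. Let d_m be the mixing time at the
       quadruplexor site. Let d_n be the network delay of a direct path,
       and d_{n,1}, d_{n,2} the network delays on the two legs via the
       mixer.

```latex
\begin{aligned}
D_{\text{direct}} &= d_c + d_n \\
D_{\text{relay}}  &= (d_c + d_{n,1}) + d_m + (d_c + d_{n,2})
                   = 2\,d_c + d_m + d_{n,1} + d_{n,2}
\end{aligned}
```

       When codec buffering dominates, with d_c much larger than d_m and
       the network terms, relaying roughly doubles the end-to-end delay.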


	       • Comments from Steve Titcombe:

		    • Quality of sound

		      The sound quality was fairly reasonable, from
		      all sites except one, which  seemed  to  have
		      electronic bubble noises popping and bursting
		      every time someone from that site talked. This
		      was  not	too  annoying, but it did mean that
		      you had to listen carefully to hear what	was
		      being said.

		    • Quality of Pictures

		      Picture  quality	at  the	 UCL  site was very
		      good, the screen monitoring  what	 was  being
		      broadcast	 out  was  very	 good.	Other sites
		      pictures were pretty  good,  but	when  split
		      down  into  a 2x2 grid for four sites, it was
		      possible to see if someone  had  a  beard	 or
		      not,  but	 no  more.  (This  was referring to
		      their shots of an entire room.)

       7.2  Report

       The meeting report is highly  structured,  and  follows	the
       sequence of events in the meeting.

       Reports often attempt to hide the author of each
       contribution, whereas this one, being merged from 4 versions
       produced at each site, repeatedly states the origin of each
       contribution.
       It is not highly readable, except perhaps to members of	the
       group present.
