I wrote this - please excuse the formatting (can't find a version of the UCL specific nroff MM macros)...
It discusses the use of dance notation (the paper considers Labanotation, but settles on Eshkol-Wachmann) to automate the annotation of video conferencing, something I heard about in the context of AI last week at MPI (31 years later)... and Stable Diffusion.
"Some Experience of co-authoring with synchronous and asynchronous computer-mediated communication"

J. Crowcroft, M. d'Inverno
Department of Computer Science

ABSTRACT

UCL has a digital multimedia conferencing system that connects us to 8 sites in the US and one site in Germany, over the Internet. It allows up to 4-way video/audio, and all the normal data communications facilities. These include electronic mail and BBoards, as well as a shared multimedia document editor called mmconf/Slate. The conferencing system is described in some detail in [1] and [2]. This paper is about specific experiences using this and related systems. For comparison, the use of more old-fashioned asynchronous facilities to co-author a document is briefly described. An approach is also proposed to help with the description of the visual side of human computer-mediated communication. It should be stated that most of the use of the system has been self-referential; although there have been "naive" users (UCL and MIT librarians), their usage has been restricted to synchronous meetings. This paper is anecdotal rather than analytical.

1. Introduction

First let us define the terms in the title:

• Co-authoring
Two or more people write a text together. There are several interesting points on the spectrum of distribution of labour in co-authoring, which interact with the physical distribution of the authors; we comment on these in the conclusions.

• Synchronous(1)

____________________
1. Some use the term isochronous; this is wrong. Isochronous networks maintain clock synchronisation on bits at all points in the network.

Synchronous communication is soft real time.
In long-haul networks such as the Internet, there may be a delay between sending video/audio/data and receiving it (Einstein is insurmountable), but it is bounded by some reasonable value. Because of the video compression technique we use, it can be quite high: as much as 400 msec, or even 2-3 seconds under especially bad conditions!

• Asynchronous
By this we mean use of electronic mail and file transfer/access, out of band from direct voice/video communication between people.

• Computer-mediated
This is harder to define. The network that carries e-mail, files and video/voice is made of a set of links connected by nodes, which are special-purpose computers (switches/routers). For e-mail (especially multimedia mail), the end system that the human uses is a general-purpose computer/workstation, which can apply many processing techniques to the data. However, for video and voice, only very special-purpose processing is done, to maximise the quality (the form). A great deal of research is needed before more interesting processing could be done on what is said, and how, during a video conference (the content). However, material from off-line is presented on the same screens as the video in our system, so a document being discussed is juxtaposed with the speakers.

• Webster defines communication as:
com·mu·ni·ca·tion, n.
1: an act or instance of transmitting
2a: information communicated
2b: a verbal or written message
3: an exchange of information
4 (pl.) a: a system (as of telephones) for communicating
4b: a system of routes for moving troops, supplies, and vehicles
4c: personnel engaged in communicating
5: a process by which meanings are exchanged between individuals through a common system of symbols
6 (pl. but sing. or pl. in constr.) a: a technique for expressing ideas effectively in speech or writing or through the arts
6b: the technology of the transmission of information

The rest of this note is divided into three main sections:

1. A brief description of a fairly successful production of a document through asynchronous communication. This is used to set the parameters for what may be of interest in synchronous collaboration.
2. Details of two major meetings and the documents produced from them.
3. A description of an approach to help with the description of the visual side of human computer-mediated communication. This may also be used to help prescribe which visual channels of communication are used when, in systems which have limited bandwidth and cannot allow fully cross-connected video and audio (for bandwidth reasons, many systems have fully cross-connected audio but video of the speaker only).

Finally we present some conclusions.

2. Asynchronous Collaborative Experience

A paper was written by 4 authors: 2 in London, 1 in Cambridge and 1 in Colorado. Two of the authors had not met face to face. The paper involved a repeatable experiment and performance analysis. It took about 1 week elapsed time to write and will be published in a respectable (refereed) journal [4]. It evolved as follows.

i. The author in the US e-mails an author in London with anomalous output from an experiment.
ii. The author in London e-mails the 2nd author in London with a problem statement.
iii. The 2nd author in London re-implements the experiment, and discovers the same results. The 1st author in London outlines components in the system that could be causing the problem.
iv. The author in Colorado writes a script to run the experiment and graph the output, and sends it to London.
v. The 2nd author in London hypothesises an explanation of the problem, and a possible solution.
vi. The authors in London e-mail the authors in Cambridge and Colorado with the problem and the hypothetical solution.
vii. The Cambridge author e-mails colleagues about the availability of code to implement the solution.
viii. Meanwhile, a draft paper is e-mailed to an interest list.
ix. A member of the interest list is the editor of a journal, and asks if we want to submit it; we do.
x. The referees' comments are sent (anonymously) by e-mail: they want to see an implementation of the solution.
xi. None of the authors have source code to implement the solution, so they ask the interest list.
xii. A colleague in Sweden on the interest list changes the code, compiles it and sends a new (Unix kernel) object to London (by e-mail).
xiii. The authors in London run the script written by the author in Colorado, and get correct results. These are (automatically) included in the final paper, which is e-mailed to the editor of the journal.

This sequence of events hides two extremely important factors in the success of the collaboration:

1. Commonality of interest/experience. The authors came from very similar technical backgrounds (by coincidence, all trained in engineering followed by computing).
2. Independence of tools. The authors were all able to use tools they were familiar with for mail processing, editing and document preparation, as well as for actually running the experiment.

It also reveals two of the major advantages of using asynchronous communication:

1. Authors can work simultaneously rather than having to gain some token as floor chair or editor-of-the-moment.
2. Synchronisation of versions of a document is achieved at the same time as communication of a version, i.e. when any one author posts a new version, most users see it at roughly the same time.

These advantages come at some expense: authors may work simultaneously to the same effect, and replicate each other's effort. Authors may diverge, and have to merge subsequent versions of the work somehow.(2) However, replicated work has scientific value, and divergence may well be productive. The reduction of waiting time to gain the lock on a version was certainly advantageous (especially given different time zones, when finding the person who had the lock might take half a day).

____________________
2. The use of a revision control system was obviated by the documents arriving as e-mail, and therefore having a unique source/author and timestamp/message id. Merging in many RCS/SCCS-type systems makes use of context-difference tools, which are available in any case as part of the normal operating system utilities.

3. Synchronous Meetings and Minutes

In this section we describe two meetings that were held in rather different ways using the multimedia conferencing system, and report some of the users' experiences.

3.1 The IETF Directory Working Group

The first is a 4-site video conference which made use of the UCL to US system to hold a research group meeting of around 35 participants. The meeting was held in rooms that are designed like small studios, with a small audience and a small number of podium speakers, and with cameras switchable from audience to podium or individual speaker. The resulting document from this meeting was a report of the meeting, which is available as a UCL technical report.

Workstations were available at each site during the meeting. Normal editing/document preparation tools were available, as well as normal electronic mail and file transfer. Groups like this usually choose not to use the shared editing system, for several reasons:

1. Lack of familiarity: mmconf/Slate is a hybrid GUI WYSIWIS system, partly like the Apple Mac, partly like Sun Open Look and partly like X Windows.
2. The size of the meeting, and the lack of screen real estate to show the documents to all sites as well as the video. It is felt (whether correctly or not we cannot say) that the video is more important.
3. This particular group wanted independent minutes from each site, which did not require shared editing. If they had used shared editing, they might have felt that the meeting dynamics were undermined by the floor exchange/token exchange delays.

3.1.1 Pre-meeting E-Mail

Setting up the meeting itself with aims clearly stated and strong chairing seems to be useful.
Electronic mail informed all the users of a strict timetable for the agenda and presentation-based meeting, e.g.:

17:00 Introduction
   o Discussion of Videoconference modus operandi
   o Agenda
   o Minutes of previous meeting
   o Matters arising
   o No liaisons!
17:15 Document Status. Review status of all working documents, Internet Drafts, and submitted RFCs.
17:25 Presentation of Pilot Activity
...
18:15 US/Europe liaison issues
18:30 Management of "experimental" object identifiers
18:40 Naming Guidelines (Paul Barker)
19:10 Representing Network Information (Chris Weider)
19:55 Security (Peter Yee)
20:20 Naming in the US in light of NADF 123 (Marshall T. Rose?)
20:50 Date and Venue of next meeting
20:50 -- 21:00 AOB

Notice the use of a strict timescale and objectives. Also, this group does a great deal of work by e-mail (typically a few messages a day over the last few months, to the group's distribution list).

In the event, due to technical difficulties, the strict timescales did not come into play. The meeting started late (by 20 minutes or so) and the connection went down for at least a quarter of an hour at one point. The meeting also finished early, by about a quarter of an hour. The agenda was not followed strictly, as some participants were not available for the whole meeting.(3)

____________________
3. Paul Barker commented: "My feeling was that the meeting was much like other meetings I go to: those attending sense when something important is being discussed, and where progress is being made, and the discussions run on to allow the issue time."

3.2 An Informal Meeting

The UK-US Network Interconnection is managed by a group of expert network operations staff (the Operational Management Group, OMG) who have monthly video conferences. These have roughly 2 people per site, and 3 or 4 sites in the US and UK. These meetings are held in a small meeting room with a wall-projection TV and a workstation running a shared editor. This is used to show output from previous meetings and experiments/measurements of the network's availability/performance. One document that was produced as a result of the planning and exchanges of ideas in these meetings can be found in [3].

The use of a shared view for these meetings has been very successful. Again, like the asynchronous collaboration, this group has common expertise. A major difference is that the group is task-oriented, and thus can identify a given member to show a given document, while the others simply view. Under some circumstances, others will extract data from some document or graph and re-process it, but this works well in this environment. No locking or token system is used, simply the ability to show a workstation window or screen to the other sites.

4. Gesture Detection - Future Work

We are working on annotating video tapes of the live meetings so that we can make objective comparisons between video and face-to-face meetings. To this end, we are adapting one of the modern ballet notations as a non-verbal communication description language; we've called this Balgol '92.

Dance notation, or the process of recording people's movement, dates from Arbeau's early attempts [5]. A particularly scholarly recent work is Guest's [6], which includes a very useful chapter on the use of computers for annotating movement.
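To make computer annotation of movement a little more concrete, here is a minimal Python sketch in the spirit of the approach described later in this section: a posture records limb directions in a spherical coordinate system centred on the torso, a small lexicon of postures is known in advance, and a gesture is a sequence of postures. It is not Balgol '92. The limb names, angles, 30-degree tolerance and the functions classify and is_bid_for_floor are all hypothetical, and extracting limb directions from video frames is assumed rather than shown. As a toy version of the arm-waving example, is_bid_for_floor reports an arm raised after a resting posture.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: each limb's direction is (azimuth, elevation) in degrees,
# in a spherical coordinate system centred on the torso.
# Elevation +90 points straight up, -90 straight down, 0 is horizontal.
Direction = Tuple[float, float]


@dataclass
class Posture:
    name: str
    limbs: Dict[str, Direction]  # limb name -> direction


def angular_distance(a: Direction, b: Direction) -> float:
    """Angle in degrees between two directions on the unit sphere."""
    az1, el1 = map(math.radians, a)
    az2, el2 = map(math.radians, b)
    cos_d = (math.sin(el1) * math.sin(el2)
             + math.cos(el1) * math.cos(el2) * math.cos(az1 - az2))
    # Clamp to guard against rounding just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_d))))


# A tiny, made-up lexicon of postures, "known in advance".
LEXICON = [
    Posture("rest", {"right_upper_arm": (0, -90), "left_upper_arm": (0, -90)}),
    Posture("right_arm_raised", {"right_upper_arm": (0, 80), "left_upper_arm": (0, -90)}),
    Posture("left_arm_raised", {"right_upper_arm": (0, -90), "left_upper_arm": (0, 80)}),
]


def classify(observed: Dict[str, Direction],
             lexicon: List[Posture] = LEXICON,
             tolerance_deg: float = 30.0) -> Optional[str]:
    """Name of the closest lexicon posture, or None if even the best match is too far."""
    best_name, best_cost = None, float("inf")
    for posture in lexicon:
        # Cost is the worst single-limb mismatch against this posture.
        cost = max(angular_distance(observed[limb], d)
                   for limb, d in posture.limbs.items())
        if cost < best_cost:
            best_name, best_cost = posture.name, cost
    return best_name if best_cost <= tolerance_deg else None


def is_bid_for_floor(frames: List[Dict[str, Direction]]) -> bool:
    """Treat a gesture as a posture sequence: an arm raised after a resting posture."""
    seen_rest = False
    for frame in frames:
        label = classify(frame)
        if label == "rest":
            seen_rest = True
        elif seen_rest and label in ("right_arm_raised", "left_arm_raised"):
            return True
    return False
```

Scoring a posture by its worst-matching limb keeps the sketch simple and makes a single wildly misplaced limb enough to reject a match; a real recogniser would need to cope with noisy, partial limb estimates.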
Recent work in zoology and neurophysiology has demonstrated the special applicability of the Eshkol-Wachmann movement notation [7] to the objective study of movement and gesture [8].(4) We have dismissed Labanotation and Benesh notation as deriving too much from the musical context from which they arose, and for the pragmatic reason that they depend on human interpretation of their somewhat hieroglyphic syntax. Eshkol-Wachmann, however, is based partly on an engineering model of movement and is more amenable to computer recognition and analysis.

____________________
4. We are indebted to Michael Recce from the UCL departments of Computer Science and Anatomy for directing us to the rich body of work in this area.

Other work related to this is the use of the (set-theory-based) formal specification language Z to specify floor control schemas; this allows a completely general description of any floor control algorithm. We have looked at how speech acts [9][10][11] may be used to help structure bids/negotiations for the floor [1]. By analogy, we are trying to see if the same could be applied to automatic gesture recognition, to two ends:

1. Description. There is constant debate comparing face-to-face meetings with video conferences [e.g. 12]. We need an objective measure of the increased quality of a meeting when audio and text are enhanced with a view of the participants. We need to know how many participants need to be seen, and at what resolution, for a given increase in communication richness against a given cost (more cameras/displays/bandwidth).
2. Prescription. If we have limited bandwidth, we may use detection of "significant" movements to switch camera focus/view to other users in a video conference.

Judging significance may be feasible. Initial work in the UCL Anatomy department has already demonstrated that it is feasible to recognise limited sets of postures in rats in real time from video frames. Two important prerequisites are:

1. Foreground/background contrast must be very high.
2. The lexicon of gestures/postures must be known in advance, and areas of interest outlined.

Both of these requirements can be easily fulfilled in a videoconference.(5)

____________________
5. We note that many digital video systems use parallel machines to run the video compression necessary to transmit video over today's limited-bandwidth networks. These compression techniques include motion detection/compensation and also transforming the image to the frequency domain. Both of these, if available to a gesture detection system, would automatically provide strong hints as to the locality of "significant" movement.

To this end, we have started designing a language with a lexicon of common postures. The syntax of the language is based on the idea of Eshkol/Wachmann that we understand the limbs and body as a mechanical system. Thus we can describe movement of limbs around a spherical coordinate system centred on the torso. A gesture is some sequence leading from one posture to another.

One modest experiment will be the ability to detect the visual equivalent of an interruption; currently the audio systems use silence suppression. It may be useful in situations with limited bandwidth to employ similar approaches with the video. Most people attempt to get attention in a meeting by waving their arm in the air. This is very easily detected, compared with most other, more random movements.

5. Conclusions

The system is available for use, subject to booking. We can allow video recording of meetings if any researchers wish to study how a real system is used when bandwidth is severely constrained; even so, the cost-benefit of the system compared with travel is very large.

The distribution of the authors has a profound effect on the way document authoring is done. In the two video meetings described, two different approaches were tried in producing minutes/paper.
The directory group had all sites write full minutes, and then had the chairperson merge them. This may have produced very accurate minutes, but increased the workload. The Operations Management Group meeting was used solely as a blue-skying/planning meeting, after which a number of tasks were allocated, and the users went away, carried out those tasks, and reported their findings back to the authors of [3], who then wrote their paper. This is similar to the experience with the purely asynchronously authored paper [4], although in that case the success was based on shared experience of the area of discourse (via education) rather than on an initial face-to-face or video meeting.

In the broadest sense, documents just record some information, so the videos of the conferences/meetings are themselves a form of document. Indeed, we predict that in the future, (automatically) edited highlights of a video meeting may replace minutes.

6. References and Acknowledgements

Thanks to Steve Hardcastle-Kille and the members of the IETF's OSI-DS Group for permission to quote their meeting experience and minutes. Thanks to the anonymous referees, and to Angela Sasse of UCL for comments on movement notation.

1. J. Crowcroft, P.T. Kirstein, D. Timm, "Multimedia TeleConferencing over International Packet Switched Networks", Proc. IEEE TriComm '91, April 1991.
2. M. d'Inverno, J. Crowcroft, "Specification, Design, and Implementation of an Interactive Conferencing System", Proc. IEEE INFOCOM '91, April 1991.
3. J. Crowcroft, I. Wakeman, "Traffic Analysis of some UK-US Academic Network Data", Proc. INET '91, Copenhagen, June 1991.
4. J. Crowcroft, D. Sirovica, I. Wakeman, Z. Wang, "Layering Considered Harmful", IEEE Network, to appear, January 1992.
5. Thoinot Arbeau, Orchesography, 1589; trans. Mary Stewart Evans, Dover, 1967.
6. Ann Hutchinson Guest, Dance Notation, Dance Books, 1984.
7. N. Eshkol, A. Wachmann, Movement Notation, Weidenfeld and Nicolson, London, 1958.
8. I. Golani, "Homeostatic motor processes in mammalian interactions: a choreography of display", Perspectives in Ethology, vol. 2, pp. 69-134, Plenum Press, New York, 1976.
9. J.R. Searle (ed.), Speech Act Theory and Pragmatics, Reidel Publishing Company, 1980, pp. 40-53, S. Davies - Behaviour...
10. J.R. Searle, D. Vanderveken, Foundations of Illocutionary Logic, Cambridge University Press, 1985, pp. 36-40 (exposition of illocutionary point; taxonomy of points).
11. C. Cherry, On Human Communication, MIT Press, 1957.
12. C. Heath, P. Luff, "Disembodied Conduct: Communication through video in a multi-media office environment", ACM SIGCHI, 1990.

7. Appendix: Extracts from IETF DS Report and Comments

7.1 Comments

What the users said about the real-time element:

• BBN: Not as good as a face-to-face meeting, but better than e-mail.
• RIACS: It might be more effective to choose a few items and discuss them, to focus on the issues.
• ISI: Technical quality appalling: too much delay. Echo annoying. Sound poor. On a scale where e-mail is 1 and in person is 10, video is generally 7, but this time 4, due to the delay and quality. An on-line terminal may help.
• UCL
   • (SEK): "Interesting", some useful discussion. Presentations did not work. If too technical, the interchange did not work.
   • Colin Robbins: The delay and quality made it very hard to hold a real meeting.
• Comments from Paul Barker: "You get a lot of information from seeing someone and hearing them talk. I'd never met any of the people from the States before, but I was rapidly able to form a picture of who the 'politicians' were, as opposed to the strongly technical. In e-mail, I edit myself very carefully; I missed not being able to do this in 'strange' company!! The large delay(6) in the system made me very self-conscious when talking. To some extent, I played with the technology to see how long the propagation delays were: I'd scratch an ear and wait for the image in front of me to copy me!
The delay also made for a very presentational style of address to the meeting. For various reasons I was rather under-prepared to talk on a subject which I had to present to the meeting. Whereas at a live meeting I would have been fairly happy to have 'chatted' informally on the subject, such chat and half-thought-out ideas seemed to jar somewhat with the formal style of presentation (with the speaker very definitely holding the floor). Self-consciousness again, but it made me feel rather ridiculous."

____________________
6. At the time this meeting was held, the 4-way meeting was run by relaying n sites in the US through a single mixer/quadruplexor site; this involved decompressing the video, mixing and recompressing. The CODECs which perform compression do so partly by buffering a number of frames and differencing them, thus introducing large delays. Relaying this is not a good idea (in fact one may also introduce artifacts in the picture this way, due to different lossy compression techniques interfering).

• Comments from Steve Titcombe:
   • Quality of sound: The sound quality was fairly reasonable from all sites except one, which seemed to have electronic bubble noises popping and bursting every time someone from that site talked. This was not too annoying, but it did mean that you had to listen carefully to hear what was being said.
   • Quality of pictures: Picture quality at the UCL site was very good; the screen monitoring what was being broadcast out was very good. Other sites' pictures were pretty good, but when split down into a 2x2 grid for four sites, it was possible to see if someone had a beard or not, but no more. (This was referring to their shots of an entire room.)

7.2 Report

The meeting report is highly structured, and follows the sequence of events in the meeting. Reports often attempt to hide the author of each contribution, whereas this one, being merged from 4 versions produced at each site, repeatedly states the origin of each idea.
It is not highly readable, except perhaps to members of the group present.
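The report above was merged from per-site minutes while keeping the origin of each contribution. A minimal Python sketch of such an attributed merge follows; the data layout (per site, a mapping from agenda item to a list of notes), the function name merge_minutes and the rule that notes differing only in case or spacing count as the same point are all assumptions for illustration, not the procedure the chairperson actually used.

```python
from typing import Dict, List


def merge_minutes(agenda: List[str],
                  site_minutes: Dict[str, Dict[str, List[str]]]) -> str:
    """Merge per-site minutes into one report that records each note's origin.

    agenda: agenda item titles, in meeting order.
    site_minutes: {site: {agenda_item: [note, ...]}} (hypothetical layout).
    Notes that differ only in case or spacing are treated as the same point
    and attributed to every site that recorded them.
    """
    lines = []
    for item in agenda:
        lines.append(item)
        # normalised note text -> (wording first seen, sites that recorded it)
        origins: Dict[str, tuple] = {}
        for site in sorted(site_minutes):
            for note in site_minutes[site].get(item, []):
                key = " ".join(note.lower().split())
                if key not in origins:
                    origins[key] = (note, [])
                origins[key][1].append(site)
        # dicts keep insertion order, so notes appear in the order first recorded
        for wording, sites in origins.values():
            lines.append(f"  [{', '.join(sites)}] {wording}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder notes, not taken from the actual report.
    minutes = {
        "UCL": {"Document Status": ["draft A under review."]},
        "ISI": {"Document Status": ["Draft A under review.", "Draft B submitted."]},
    }
    print(merge_minutes(["Document Status", "AOB"], minutes))
```

Attributing a shared point to every site that recorded it, rather than keeping only one copy, preserves the property the report has: a reader can still see where each idea came from.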