HAL Id: jpa-00225842
https://hal.archives-ouvertes.fr/jpa-00225842
Submitted on 1 Jan 1986
FAST TREATMENT OF TWO-DIMENSIONAL DETECTOR DATA USING PROCESSOR ARRAYS
D. Rimmer
To cite this version:
D. Rimmer. FAST TREATMENT OF TWO-DIMENSIONAL DETECTOR DATA USING PROCESSOR ARRAYS. Journal de Physique Colloques, 1986, 47 (C5), pp.C5-189-C5-192. ⟨10.1051/jphyscol:1986525⟩. ⟨jpa-00225842⟩
JOURNAL DE PHYSIQUE
Colloque C5, supplément au n° 8, Tome 47, août 1986
FAST TREATMENT OF TWO-DIMENSIONAL DETECTOR DATA USING PROCESSOR ARRAYS
D.E. RIMMER
Institut Laue-Langevin, 156X, F-38042 Grenoble Cedex, France
Résumé - As position-sensitive detectors (PSDs; DLS in French) reach ever higher resolution and spectrum acquisition times become ever shorter (high-flux sources), it is necessary to increase the computing power available for real-time data processing. At present the least expensive way of doing so is to add vector processors to the computer, but this solution may prove insufficient for future needs. Two-dimensional arrays of microprocessors could offer a solution, combining enormous potential computing power with an architecture reflecting that of the arrays of numbers supplied by the detector. Although technical difficulties are hampering the production of such machines, the ICL DAP is an example of such hardware that is currently operational, albeit with a limited software environment owing to the short production run of that model. This article deals with the advantages and disadvantages of such machines, based on a simple and brief study of the use of the DAP for PSD data reduction.
Abstract - As PSDs go to higher resolution and frames of counts are collected in shorter periods (from high flux sources) there is a need to increase the on-line computing power. At the present time add-on vectorial processors offer the most cost-effective way of achieving this, but may not be capable of keeping up with future needs. Two-dimensional arrays of microprocessors could offer a solution, combining enormous potential computing capacity with an architecture mirroring that of the data arrays from the detector. Although technical difficulties are hampering the production of such machines, the ICL DAP provides an example of one which is in routine operation, albeit with a limited software environment, due to the short production run of that model. The paper discusses advantages and disadvantages of such machines, based on a brief and simplified study of using the DAP for PSD data reduction.
I - INTRODUCTION
Large PSDs used with strongly scattering crystals and a high flux beam of neutrons or X-rays can generate raw data at a very high rate. In the not too distant future one could envisage a 10^6 cell detector from which frames of data were obtained 10 times per second.
The problem of transferring and storing such quantities of data, together with the desire to have some immediate interpretation of the measurement, will generate a need to put considerably more computing power than has hitherto been necessary, or possible, on-line at the instrument.
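To give a feel for the quantities involved, the following Python sketch estimates the raw data rate for the detector envisaged above. The cell count and frame rate come from the text; the counter width of 2 bytes per cell and the function name are assumptions made purely for illustration.

```python
def raw_data_rate(cells: int, frames_per_second: float, bytes_per_cell: int) -> float:
    """Return the raw data rate in bytes per second."""
    return cells * frames_per_second * bytes_per_cell


CELLS = 10**6            # detector size envisaged in the text
FRAMES_PER_SECOND = 10   # frame rate envisaged in the text
BYTES_PER_CELL = 2       # assumption: 16-bit counters, not stated in the paper

rate = raw_data_rate(CELLS, FRAMES_PER_SECOND, BYTES_PER_CELL)
print(f"{CELLS * FRAMES_PER_SECOND} cell values per second")
print(f"{rate / 1e6:.0f} MB per second under the assumed counter width")
```

Whatever the true counter width, the number of cell values to be handled per second is the product of the first two figures, which is what drives the need for more on-line computing power.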
At the present time the most powerful computers available are the so-called 'vectorial' machines. The stand-alone models (such as CRAY or CDC-CYBER) are not designed for such a real-time role, but those designed as an 'add-on' to a host (such as are marketed by FPS and CSPI) are already being introduced for fast on-line data reduction.
Article published online by EDP Sciences and available at http://dx.doi.org/10.1051/jphyscol:1986525
Whilst these machines are well able to handle current data rates it is not clear how far such an architecture is capable of higher speeds. They achieve their performance with a relatively small number of specialised processors each using the pipelining technique, working at maximum efficiency on long vectors.
For the future it may be necessary to use a machine comprising a very large number of processors working in parallel, synchronously or asynchronously. A deficiency of the vectorial machine for handling PSD data is that it can only treat a two-dimensional array as a set of vectors (or one long vector). One of the dimensions must be handled serially.
For this reason there is considerable interest in the idea of having a two-dimensional array of coupled microprocessors which would map onto the elements of the data arrays generated by the PSD. Each microprocessor would take responsibility for one pixel, or a small group of pixels, depending on the relative dimensions of the detector and processor array.
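The mapping of detector cells onto processors can be sketched as follows. This is a minimal illustration in Python, assuming a simple block decomposition in which each processor owns a rectangular group of neighbouring cells; the function name and the choice of decomposition are assumptions, not something specified in the paper.

```python
def owner_pe(row, col, det_shape, pe_shape):
    """Return (pe_row, pe_col) of the processor responsible for a detector cell.

    det_shape and pe_shape are (rows, columns) of the detector and of the
    processor array. Each processor owns a block of neighbouring cells.
    """
    det_rows, det_cols = det_shape
    pe_rows, pe_cols = pe_shape
    # Ceiling division gives the block of cells handled by each processor.
    block_rows = -(-det_rows // pe_rows)
    block_cols = -(-det_cols // pe_cols)
    return row // block_rows, col // block_cols


# A 128 x 128 detector on a 64 x 64 processor array: 2 x 2 cells per processor.
print(owner_pe(5, 7, (128, 128), (64, 64)))
# Equal dimensions: one cell per processor.
print(owner_pe(63, 0, (64, 64), (64, 64)))
```

A block layout keeps neighbouring cells on the same or adjacent processors, which matters for the nearest-neighbour operations discussed later.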
Construction of a viable cost-effective machine based on such a concept has, up to now, been hampered by technical difficulties. The recent announcement by Inmos of their Transputer (a 32-bit microprocessor designed to be coupled in multiprocessor configurations) may provide a solution to this problem.
There does exist, however, one machine based on these principles which is in routine operation, namely the ICL Distributed Array Processor (DAP). This machine, announced by ICL in the late 1970s, has not been cost-performance competitive, and production has been abandoned in its existing form. The few machines which were built are in use in British universities and Government laboratories. With such a small number of machines sold, ICL's software development and general support have been limited and do not enable a true assessment to be made of the potential value and performance of such a machine. Nevertheless, it does provide a working example of a possible future design of processor array and offers what is probably the only opportunity in Europe to study the task of programming such a device.
This paper is based on experience gained during a one month stay at Edinburgh University, where two DAP machines are installed. Within such a limited period it was only possible to tackle a rather simple project. This comprised a simulation of a PSD sweeping through a heavily populated region of reciprocal space, with summation of the counts found under each Bragg peak.
II - SUMMARY OF DAP FEATURES
This description of the DAP is limited to such essential features as are necessary for understanding the rest of the paper. For more information refer to /1/.
The DAP consists of a set of one-bit processors called processor elements (P.E.s).
Each P.E. has its own memory and can also communicate, via special registers, with its 4 nearest neighbour P.E.s in the ±x, ±y directions. It is this property which creates the connections to make a two-dimensional array.
Operations executed by the P.E.s are synchronised and identical (with respect to their own registers and memory). The only exception is that P.E.s may be deactivated, in which case they simply ignore the broadcast instruction. In the terminology of parallel processors the DAP is an SIMD (single instruction, multiple data-stream) machine.
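The idea of a broadcast instruction that deactivated P.E.s ignore can be imitated in ordinary Python. The sketch below is only a serial emulation for illustration; the function name and the use of nested lists are assumptions and bear no relation to DAP-Fortran syntax.

```python
def simd_apply(op, data, active):
    """Apply the same operation to every element whose P.E. is active.

    data and active are two-dimensional lists of equal shape. Elements whose
    activity flag is False are left unchanged, as if the P.E. had ignored
    the broadcast instruction.
    """
    return [
        [op(value) if flag else value for value, flag in zip(data_row, flag_row)]
        for data_row, flag_row in zip(data, active)
    ]


counts = [[1, 2], [3, 4]]
mask = [[True, False], [False, True]]
print(simd_apply(lambda v: v * 10, counts, mask))
```

On a real processor array every element would be handled in the same machine cycle; here the list comprehension merely reproduces the result.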
The synchronous application of the same operation to every element of the array gives the DAP an enormous potential computing power. However, to fully exploit it requires that the algorithms have an appropriate structure. The penalty, in loss of speed, for presenting simple scalar operations is obviously very severe.
Each Edinburgh University DAP comprises an array of 64 x 64 P.E.s. This suggests that its potential power is 128 times that of a normal 32-bit machine of the same speed circuitry, since the 4096 one-bit P.E.s together handle the equivalent of 4096/32 = 128 32-bit words at a time. In fact it could be higher since it is more efficient than a normal machine at handling 1-bit logical operations.
The DAP can have no peripheral devices directly attached to it. All activities such as input/output are handled by a host machine which must be from the ICL 2900 series. The DAP not having gone into full-scale production, the software available for it is limited. Even allowing for this, one can say that programming a machine of this type presents a number of problems. For example, the assignment of memory space for arrays (of any dimension) is at the discretion of the programmer, since he must optimise the lay-out with respect to the P.E.s in the knowledge of the functions he wishes to carry out. The 'one-bit logical' is an important data type for setting deactivation masks. A Fortran-type language, DAP-Fortran, has been produced to provide for these, and other, operations necessary for the efficient programming of the DAP.
As a general rule, a program must be completely redesigned if it is to run efficiently on the DAP. This contrasts with the vectorial machines which, whilst they will benefit from reoptimisation, will normally accept ordinary Fortran programs and make every effort to pipeline the operations.
III - APPLICATION TO PSD DATA REDUCTION
At first sight the DAP architecture seems ideally suited to handling data from a PSD. But how much improvement in speed can one really hope to achieve? This clearly depends on the opportunities to fully exploit the parallelism, by applying the same operation to each element of the array.
Consider, for example, the matter of background elimination. If, for the treatment of a particular frame, the critical background threshold is deemed to be already known (either constant, or a simple function of position) all cells can be tested immediately and simultaneously to see if they contain significant above-background counts.
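The test against a known threshold is a single element-wise comparison, which can be sketched in Python as below. The function name is hypothetical, and the threshold is assumed to be supplied either as one number or as a two-dimensional list of the same shape as the frame.

```python
def significance_mask(frame, threshold):
    """Return a logical mask that is True where a cell exceeds its threshold.

    threshold is either a single number (constant background) or a
    two-dimensional list of the same shape as frame (position-dependent).
    """
    rows, cols = len(frame), len(frame[0])
    if not isinstance(threshold, list):
        # Expand a constant threshold so every cell has its own value.
        threshold = [[threshold] * cols for _ in range(rows)]
    # Each comparison is independent of the others, so on a processor
    # array all of them could be carried out in one step.
    return [
        [frame[r][c] > threshold[r][c] for c in range(cols)]
        for r in range(rows)
    ]


frame = [[1, 9], [7, 2]]
print(significance_mask(frame, 5))
print(significance_mask(frame, [[0, 10], [10, 0]]))
```

The resulting mask is exactly the kind of one-bit logical array that serves as an activity mask for later operations.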
But if one's procedure requires the background level to be calculated from statistics obtained from the current frame, then one must compute the necessary information, which necessarily involves some serial operations (even though functions such as summing the counts over all cells can be speeded up by parallel operations).
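One common way in which a sum over all cells can be speeded up by parallel operations is pairwise (tree) summation, in which the number of sequential steps grows as the logarithm of the number of cells. The Python sketch below illustrates the principle only; it is not a description of how the DAP itself performs the sum, and the function name is hypothetical.

```python
def parallel_sum(values):
    """Pairwise (tree) summation.

    Returns (total, steps), where steps is the number of sequential stages.
    Within a stage all additions are independent and could run at once.
    """
    vals = list(values)
    if not vals:
        return 0, 0
    steps = 0
    while len(vals) > 1:
        if len(vals) % 2:
            vals.append(0)  # pad so that every element has a partner
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals), 2)]
        steps += 1
    return vals[0], steps


# 64 x 64 = 4096 cells are summed in 12 stages instead of 4095 serial additions.
total, steps = parallel_sum(range(4096))
print(total, steps)
```

The stages themselves remain sequential, which is the residual serial element mentioned in the text.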
Concerning the identification of the edge of Bragg peaks, if this is to be done using an algorithm based simply on inspection of the counts found in this and earlier frames, then the process should be very fast. Assuming one P.E. per detector cell, each P.E. knows the count for its own cell, and can immediately obtain those for its nearest neighbours. It can thus make a first guess as to whether its cell lies firmly in the background, in the interior of a peak, or on the edge of a peak. This information can then be circulated to nearest neighbours and the first assessment confirmed or modified. Groups of cells will quickly come to a consensus that they belong to the same peak. The time to achieve this is necessarily a function of the complexity of the algorithm, but is constrained essentially only by the time to pass the information across the peak.
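The two stages described above, a first guess from nearest neighbours followed by circulation of information until a consensus is reached, can be sketched in Python as follows. The particular rules used here (a significant cell is 'interior' only if all four neighbours are significant; cells agree on a peak by repeatedly adopting the smallest label among themselves and their neighbours) are assumptions chosen for simplicity, not the algorithm used in the study.

```python
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # the four nearest neighbours


def neighbours(r, c, rows, cols):
    """Yield coordinates of the nearest neighbours lying inside the array."""
    for dr, dc in NEIGHBOURS:
        rr, cc = r + dr, c + dc
        if 0 <= rr < rows and 0 <= cc < cols:
            yield rr, cc


def classify(mask):
    """First guess per cell: 'B' background, 'I' peak interior, 'E' peak edge."""
    rows, cols = len(mask), len(mask[0])
    result = []
    for r in range(rows):
        row = []
        for c in range(cols):
            if not mask[r][c]:
                row.append("B")
            else:
                significant = sum(1 for rr, cc in neighbours(r, c, rows, cols) if mask[rr][cc])
                row.append("I" if significant == 4 else "E")
        result.append(row)
    return result


def label_peaks(mask):
    """Let connected significant cells agree on a common label.

    Returns (labels, sweeps). Background cells carry the label None.
    """
    rows, cols = len(mask), len(mask[0])
    # Every significant cell starts with a unique label.
    labels = [[r * cols + c if mask[r][c] else None for c in range(cols)] for r in range(rows)]
    sweeps, changed = 0, True
    while changed:
        changed = False
        new = [row[:] for row in labels]  # synchronous update of all cells
        for r in range(rows):
            for c in range(cols):
                if labels[r][c] is None:
                    continue
                candidates = [labels[r][c]] + [
                    labels[rr][cc] for rr, cc in neighbours(r, c, rows, cols)
                    if labels[rr][cc] is not None
                ]
                if min(candidates) != labels[r][c]:
                    new[r][c] = min(candidates)
                    changed = True
        labels = new
        sweeps += 1
    return labels, sweeps


mask = [[0, 1, 1, 0, 0],
        [0, 1, 1, 0, 1],
        [0, 0, 0, 0, 1]]
print(label_peaks(mask)[0])
```

The number of sweeps needed grows with the extent of the largest peak, not with the size of the frame, which corresponds to the remark that the time is constrained by passing the information across the peak.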
However, if the peak limits are to be determined by incorporating additional information, such as previously-derived data on position of centres and shapes of peaks, then one is introducing position-dependent rules, which, unless they can be easily converted to arrays of parameters for use in a standard algorithm, will destroy the parallelism. This point has not been followed up in detail.
The final activity, of summing the cell-counts relating to the same Bragg peak, also seems to be a non-parallel operation, since the Bragg peaks are distinct and must be individually labelled. Although this appears to involve a serial operation, the loop is only over those peaks which are active in any frame. This would normally be one or two orders of magnitude less than the number of cells in the frame.
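The structure of this summation, a serial loop over active peaks with a masked sum inside it, can be sketched in Python as below. The function name and the representation of peak labels (an array holding a label for each peak cell and None for background) are assumptions for illustration.

```python
def integrate_peaks(frame, labels):
    """Return {label: summed counts}, looping serially over active peaks only.

    labels has the same shape as frame; background cells carry None.
    """
    active = sorted({lab for row in labels for lab in row if lab is not None})
    totals = {}
    for lab in active:  # serial part: one pass per active peak
        # Masked sum over the whole frame. On a processor array the mask
        # test would be applied to all cells at once.
        totals[lab] = sum(
            count
            for frame_row, label_row in zip(frame, labels)
            for count, cell_label in zip(frame_row, label_row)
            if cell_label == lab
        )
    return totals


frame = [[0, 5, 6, 0, 0],
         [0, 7, 8, 0, 3],
         [0, 0, 0, 0, 4]]
labels = [[None, 1, 1, None, None],
          [None, 1, 1, None, 9],
          [None, None, None, None, 9]]
print(integrate_peaks(frame, labels))
```

The cost of the serial part is proportional to the number of active peaks, which is why that number governs the overall gain in speed.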
For the examples studied, with admittedly very simple algorithms, increases in speed by factors of between 20 and 60 were obtained. This is with reference to running the same algorithm entirely on the host ICL 2976, which may be taken as an equivalent-technology serial machine. The precise factor was principally a function of the mean number of Bragg peaks simultaneously active in any frame, as suggested above.
It has not been felt worthwhile investing any further time to speed up the programs since any refinement would be dependent on the particular features of the DAP, which may well not be followed in any future realisations of processor arrays.
The one single development which would bring the greatest improvement would be to introduce a 3-dimensional array of processors. One could then map the active Bragg peaks onto the elements of the third dimension, and operate on them simultaneously with all cell elements. For technical reasons such a machine must still be a long way off.
IV - CONCLUSION
One must be cautious not to draw sweeping conclusions from such a superficial exploration of a major new area. However, it was reassuring to discover that even with the somewhat rudimentary programming tools available it was possible, within one month, to learn DAP-Fortran and grasp the concepts of parallel programming sufficiently well to be able to convert some simple but non-trivial programs and achieve improvements in speed to levels which were not too far from the target speed of the machine.
The general impression gained from the exercise was that, in spite of the problems connected with using what must be regarded as a prototype processor array, there was a certain intellectual satisfaction in tackling the problem on a machine whose hardware architecture mirrored the data structures being treated, leading to a feeling that this provides in principle a natural approach to the problem.
It would be interesting to compare the DAP philosophy with the alternative, namely that of using clusters of microprocessors operating asynchronously, but with the possibility of communicating with one another. The author has no direct experience of such systems, but suggests that one possible approach would be to assign one microprocessor to each peak, with instruction to search each frame for contributing cells. This would seem suited to the situation in which one has prior knowledge of peak positions and shapes.
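The one-worker-per-peak idea can be imitated with a pool of threads, each given the prior knowledge for one peak. This is only a sketch under stated assumptions: the prior knowledge is reduced to a centre and a half-width defining a square box, and the function names are hypothetical. It uses Python's standard `concurrent.futures` module as a stand-in for a cluster of microprocessors.

```python
from concurrent.futures import ThreadPoolExecutor


def integrate_region(frame, centre, half_width):
    """Sum the counts in a square box around a known peak centre."""
    r0, c0 = centre
    total = 0
    for r in range(max(0, r0 - half_width), min(len(frame), r0 + half_width + 1)):
        for c in range(max(0, c0 - half_width), min(len(frame[0]), c0 + half_width + 1)):
            total += frame[r][c]
    return total


def integrate_known_peaks(frame, peaks):
    """peaks maps a peak name to (centre, half_width); one task per peak."""
    with ThreadPoolExecutor(max_workers=max(1, len(peaks))) as pool:
        futures = {
            name: pool.submit(integrate_region, frame, centre, half_width)
            for name, (centre, half_width) in peaks.items()
        }
        # Workers finish independently; results are gathered at the end.
        return {name: future.result() for name, future in futures.items()}


frame = [[1, 2, 0, 0],
         [3, 4, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 9]]
print(integrate_known_peaks(frame, {"a": ((0, 0), 1), "b": ((3, 3), 0)}))
```

In contrast to the SIMD approach, each worker here follows its own path through the frame, so position-dependent rules cost nothing in parallelism.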
The author looks forward to the time when multiprocessor arrays become commercially viable, feeling that they could make an important contribution to the problem of handling data from PSDs.
ACKNOWLEDGEMENT
The author wishes to thank Prof. G.S. Pawley of Edinburgh University for making it possible to carry out this project, and for specific help in understanding the DAP machine and programming environment.
REFERENCES
/1/ Burke, P.G. and Delves, L.M. (Eds), Computer Physics Communications 26 (1982) 217.