Hereditary Disease Foundation
3960 Broadway, 6th Floor – New York, NY 10032
phone: 212.928.2121, fax: 212.928.2172


Workshop Reports



In preparation for animal testing of compounds that have shown efficacy in in vitro and cell-based screens for Huntington’s disease, the Hereditary Disease Foundation (HDF) convened a workshop in Cardiff, Wales to consider the optimal approach for animal testing. Mouse models are needed to test important motor and cognitive dysfunctions that mimic those seen in humans. Several investigators showed videotapes demonstrating the progression of symptoms in mice using a variety of assessments that test motor and cognitive functions. Participants raised concerns about the standardization and reproducibility of these assessments if results are to be compared across a diverse set of laboratories. Their concerns fell into three general areas: methods, equipment, and background strains of mice. Neuropathologic assessments were also demonstrated and discussed. Concern was raised about standardization of immunostaining reagents. While cognitive testing will be extremely important in the search for treatments, few cognitive assessments have been validated in mice. Work is ongoing in this area. The group also considered whether there is a preferred mouse model in which to test potential therapeutics. The R6/2 mouse develops pathology very early and quickly, allowing more rapid throughput; yet a more slowly developing model might be more representative of the human disease and would allow testing of agents in earlier stages of the disease. At the end of the meeting, a subgroup drew up recommendations for a standard protocol that could be used across many labs to test potential therapies. These recommendations require further discussion and will be presented at the Boston meeting in August, 2002.


By the end of 2002, scientists are likely to have identified several dozen compounds with demonstrated efficacy in in vitro and cell-based screens for agents to combat Huntington’s disease (HD). In preparation for the next step, animal testing, the Hereditary Disease Foundation (HDF) convened a workshop in July, 2002 to consider the optimal approach to animal testing. The 22 scientists who attended the meeting in Cardiff, Wales, discussed which models to use as well as the neuropathologic and behavioral markers that should be assessed.  By the end of the meeting, a set of recommendations was developed for presentation to other HD researchers, in an attempt to achieve consistency and reproducibility among different investigators searching for treatments.

The human condition

            To begin the meeting, participants were introduced to Richard Evans, a 46-year-old man with HD, and his wife, Jenny. Richard learned at the age of 38 that he carried the expanded allele that had led to his mother’s death at the age of 56. A few years after having the predictive gene test, Richard began showing symptoms – problems with memory, irritability, clumsiness, and difficulty carrying on conversations. In recent years, he has experienced low mood, worsening memory, lack of concentration, some involuntary movements, problems with balance, and some choking on food. What bothers him the most, he said, is the loss of independence. In 1997, Richard became part of a research effort at Cardiff University investigating neural transplantation as a treatment for HD. He said the research project “gives you a reason to get up every day,” despite the hardship of the day-long neuropsychiatric testing he undergoes every six months.

            Blair Leavitt reviewed the three major spheres of dysfunction in HD patients: motor, cognitive, and psychiatric. While the disease has historically been defined by the involuntary movements, or chorea, after which it was named Huntington’s chorea, Leavitt said disruptions in voluntary movements are often more functionally limiting. The human condition informs people working with animals about what they need to model, he said. The movement disorder tends to be more consistently progressive, while the neuropsychiatric problems can be wildly variable and often do not correlate with progression of the illness. Cognitive problems tend to have a more reliable and stable progression, he said, but many people have almost no cognitive impairment.

Anne Rosser, the neurologist who introduced Richard, said that in human subjects, set-switching tasks, in which subjects must perform a task under one set of rules and then switch to another set of rules without knowing what the rules are, are a more consistent measure of disease progression than the development of chorea. In Richard’s words, “When they change the rules, I get frustrated.”

Richard also noted problems with concentration. Walking requires concentration, he said, as does speaking to a group. He was able to speak to the workshop participants by focusing on the one person who was speaking to him directly. Rosser said there are tests in the neuropsychological battery that assess concentration, such as the Stroop interference test and Trail Making B test. These tests are sensitive to progression of HD.

Modeling in the mouse

            In mouse models of HD, numerous behavioral tests are used to test important motor and cognitive dysfunctions that mimic those seen in humans.  Elizabeth M. Simpson noted the importance of finding tasks in mice that test neural pathways as similar as possible to those pathways that are impaired in humans. “The closer they come, the more likely you are to get a treatment that is applicable in humans,” she said.

Several investigators showed videos of the tests used in their labs and the types of responses seen in the various mouse models. For example, Jenny Morton showed the progression of impairments in R6/2 mice between the ages of 6 and 12 weeks. She used the open field task, which assesses general activity and anxiety; beam walking, which measures fine coordination and balance; and a swimming tank test, which measures swimming speed and assesses other motor abilities. Simon Brooks demonstrated several other tests, including motor tasks such as paw reaching and the rotarod; the visual cliff avoidance task; the nine-hole box test, which is sensitive to attentional defects caused by striatal lesions; and the conditional righting, contact righting, and negative geotaxis reflex tests.

Robert Ferrante also showed videos demonstrating the progressive features in R6/2 mice. At 15 weeks, when the disease is quite advanced, the mouse is disheveled, has marked body wasting, bradykinesia, marked ataxia, severe dystonia with sustained contraction of the forelimbs, and helicoptering of the hind limbs. Videos of mice at 14, 12, 10, 8, 6, and 4 weeks clearly illustrate how these features progress. At 4 weeks, there is often no diagnostic abnormality, he said, and the mice are relatively normal in terms of posturing and foraging. By 6 weeks, mild symptoms are apparent; and by 8 weeks, severe neuropathology is seen. Steven Dunnett suggested that Ferrante operationalize his observations so that other investigators could utilize and report comparable results from a similar protocol.

Anton van Dellen showed videos of open field, rotarod, and cognitive tests of R6/1 mice. He noted the difficulty of characterizing and quantifying some of the behaviors observed, such as circling and grooming. Wild-type mice progress through a normal grooming sequence, he said, but HD mice get distracted and do not progress normally. Jenny Morton said that grooming behaviors are very complicated and that HD mice have been reported to get the order of their grooming pattern mixed up. ‘Circling’ behavior should also be differentiated from ‘running.’ Though the two may look superficially similar, the turning associated with ‘running’ is the result of the space-limiting walls, said Elizabeth M. Simpson.

Susan Browne showed videos comparing the pattern of progressive phenotypic changes in five different HD mouse models: R6/2, N171-82Q, the YAC-72Q line, the inducible “tet-off” line and MacDonald’s murine knock-in (Hdh-Q92 and Hdh-Q111) line. 

            Sally Ditzler described a 2-minute phenotype assessment based on the SHIRPA protocol. The assessment requires close observation over the 2-minute period, with evaluators assigning quantitative scores to the various measures: latency on the rotarod, clasping, weight, lifespan, physical appearance (e.g., tremors and appearance of eyes), body and tail position, grooming behavior, and activity. A composite score is calculated following the assessment.

Standardizing assessments

            Several people raised concerns about the standardization and reproducibility of these behavioral assessments if they are going to be compared among a diverse set of investigators and laboratories.  Their concerns fell into three general areas: methods, equipment, and strains of mice.

Methods. Elizabeth M. Simpson noted that since mice are nocturnal, the sensitivity of certain tests depends on the phase of the light/dark cycle. Anton van Dellen said his tests are all done under a red light at night. Handling and housing also need to be controlled. Simpson said her lab singly houses mice for all behavioral work. Michael Levine noted that when the room in which R6/2 mice were kept was isolated from extrinsic noise, the incidence of seizures in the mice was reduced. Simpson added that mice are more sensitive to auditory insult than to handling. While noise in the vivarium may not be under the control of investigators, it nonetheless needs to be controlled as much as possible in the testing environment. Tests of learning and memory such as the water maze may be confounded by the motor disabilities of the mice. Food deprivation may also confound the results, said Simpson. Gender has not emerged as a significant factor. Gillian Bates said she always breeds through the males but tests in the females.

Equipment.  Probably the most universally used motor task involves the rotarod. The rotarod is easier to quantify than other assessments of motor function such as the open field task, said van Dellen.  Some investigators use an accelerating speed protocol, while some use a constant speed protocol, which requires more handling as the mice are placed on the rotarod repeatedly.  Moreover, people train the mice differently, record measurements differently, give the mice different rest periods between tests, and present their data and analyze statistics differently. Susan Browne uses the rotarod to study the rate of motor decline in 5 different mouse models. She said they can pick up more subtle differences using the accelerating rotarod and that the results are more reproducible day to day.

Emma Hockly noted a further complication with the rotarod: mice learn to grip the rod and ride around rather than falling off. She overcame this problem by modifying the rod with inner tubes from bicycle tires, which are smooth rather than grooved. She also noted that you cannot rely on the trip plate to measure the time when the mice drop off. In her tests of R6/2 mice, she accelerates the rod from 4 rpm to 40 rpm over 600 seconds and does three trials a day without seeing a drop-off in performance in the third trial. Assessments are done on consecutive days: on four consecutive days at 4 weeks, using data from the last two days to set the baseline; and on three consecutive days at 8 and 12 weeks, using data from the second and third trials. They also weigh the mice and measure grip strength using a home-made grip strength apparatus. She said they have not had good experience with commercially available grip strength instruments.
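To make the accelerating protocol concrete, the following is a minimal sketch of how a latency to fall maps onto rod speed for the 4 rpm to 40 rpm, 600-second schedule described above. It assumes the ramp is linear and that the rod holds its top speed once the ramp ends; neither detail was stated at the workshop, and the function name and defaults are illustrative only.

```python
def rotarod_speed_rpm(t_seconds, start_rpm=4.0, end_rpm=40.0, ramp_seconds=600.0):
    """Rod speed (rpm) at time t for an accelerating rotarod trial.

    Assumption (not from the report): the acceleration is linear, and the
    rod stays at end_rpm if a mouse is still on it after the ramp ends.
    """
    if t_seconds < 0:
        raise ValueError("time must be non-negative")
    # Fraction of the ramp completed, capped at 1.0 once the ramp is over.
    fraction = min(t_seconds / ramp_seconds, 1.0)
    return start_rpm + (end_rpm - start_rpm) * fraction


# Speed the rod had reached when a mouse fell at 300 seconds.
speed_at_fall = rotarod_speed_rpm(300.0)
```

Reporting the speed at fall alongside the latency is one way to compare results between labs whose ramps differ, since the same latency corresponds to different rod speeds under different schedules.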

Susan Browne commented that measuring rotarod performance at three different time points may not give a clear measure of when motor performance starts to decline or its rate of decline. “Different approaches give different types of information that may be more applicable to different studies,” she said. Browne also said that failure to ‘trip the time switch’ should not present a major problem as the mice must be watched carefully in all studies and fall time can be easily recorded.

Strains of mice. Various mouse transgenic models are used among the laboratories represented and each has its benefits (discussed further below). But even for the same transgene, there will be genetic variability introduced when the background strains and breeding approaches differ. Especially when you push the limits, said Elizabeth M. Simpson, subtle genetic differences may result in significant behavioral differences. Genetic strain background effects are best defined and studied by breeding full congenic strains on several different pure genetic backgrounds (e.g., C57BL/6J, CBA/J, and 129S1/SvImJ [JAX®, 002448]) and then looking for commonalities in these different strains. Simpson said that because human Huntington features are defined across different genetic backgrounds, the ideal mouse model will have human-like phenotypic features that are consistent across different genetic backgrounds.

Neuropathologic assessments

Robert Ferrante, Blair Leavitt, Michael Levine, and Gillian Bates showed slides demonstrating the neuropathologic features they see in R6/2, YAC128 (full length), and 140 CAG repeat knock-in mice. In R6/2 mice at 21 days of age, Ferrante sees slight separation of the ventricles, the beginnings of aggregate formation in the cortex with diffuse staining of long projection neurons in the neostriatum, nuclear aggregation in the hippocampus, but no immunostaining for huntingtin (htt) in the striatum. At six weeks, he sees 50% striatal volume loss, immunostaining for htt throughout the neocortex, and a diffuse pattern of staining in the neostriatum, cortex, and striatum. Neither Jenny Morton nor Gill Bates has seen the atrophy of the striatum that Ferrante reported.

             In YAC128 mice at 6 months of age, Leavitt showed no significant volume change or cell loss in the striatum or cortex. By 9 months of age, however, he sees significant volume change, increased density, and some loss of cells. At 12 months, ventricular dilation is seen along with a 30% volume change. While no clear nuclear inclusions are seen, there appears to be an increase in fine nuclear staining. By 18 months of age (very late in the life of a YAC128 mouse), nuclear inclusions appear with more aggregates in the cortex than the striatum.

            Mike Levine presented slides of the knock-in models studied in the laboratory of Marie-Francoise Chesselet. Both behavioral anomalies and neuropathology, similar to that shown by Robert Ferrante, are seen earlier in the 140Q mice compared to the 94Q mice. In the 94Q mice, there is only a small amount of cell loss, but this has not yet been assessed in the 140Q mice.  Behavioral changes follow the same pattern (earlier in the 140Qs), he said, noting that in both the 94Q and 140Q models, behavioral changes are seen before the appearance of visible microaggregates.

Standardization of immunostaining methods may also be helpful if results from various labs are to be compared. Many people are using the EM48 polyclonal antibody created by Xiao-Jiang Li against huntingtin protein. Gill Bates used a different antibody, a polyclonal antibody (S830) raised in sheep against huntingtin. Her slides, from frozen sections of the R6/2 hippocampus, 15 microns (approximately one cell) thick, showed extranuclear aggregates in the CA2 as well as in the striatum.  Jenny Morton said that Paul Patterson has made a series of 8 monoclonal antibodies (MW1-8) that recognize different fragments of the exon 1 protein. The EM48 and MW antibodies are available through the HDF website, and characterization of the antibodies has been published[i].

Cognitive testing

            The comments of Richard and Jenny Evans highlighted the importance of finding treatments that improve cognitive functioning, yet few cognitive assessments have been validated in mice.  Even fewer tests exist that look at the emotional, or affective, phenotype. This is particularly important for drug testing because the cognitive and affective phenotype may progress differently from the motor phenotype in response to different drug treatments, said Jenny Morton.

Connie Atwell noted that most cognitive tasks require motor abilities that eventually decline in both humans and mouse models. Tests that might be useful in early stages of the disease may not be useful in later stages, suggesting that two sets of tests may be required. However, Steve Dunnett said that in tests such as the forced choice swimming task, mice can learn to swim from light to dark but have trouble reversing when the light is switched to the opposite end of the tank. Since ability to switch is the important variable being measured, slower swimming caused by motor impairments does not make the test unusable; it only means that the test takes longer to complete.

Other cognitive tests that may be useful include the 9-hole nose poke test, which tests sequence learning and is sensitive to attentional defects, and prepulse inhibition (PPI), a test of attention and anxiety that is used in schizophrenia research to study abnormalities in cortico-striato-thalamic circuitry. Dunnett said that while PPI is highly quantifiable, specificity may be a problem. Elizabeth M. Simpson said she studies affective disturbances and has several tests established to study aggression in mice.

Christoph Kosinski asked whether any cognitive tests will fulfill the demands required for a test to be useful in drug development: easy to carry out, does not require much training, easy to monitor, gives reliable results with small variability, and can be used in both early and late stages of disease.  Steven Dunnett said some of the tests are useful at various stages of the disease, but that different rules drop out at different stages. More complex rules would be required to identify deficits in early stages of the disease.

Mouse models

            One of the goals of this meeting, according to Allan Tobin, was to assess whether the human condition is represented more accurately in full-length models as compared to models with a truncated gene. Robert Ferrante noted that a mouse model that is similar to the human disease would be detrimental in some ways because of the slowness of the human disease. Ethan Signer, however, noted that even a slowly developing mouse model may have early features that could be used as models of the human disease.

            “There has to be a very good reason for going to a slower model if it now appears that all the pathology and behavior in R6/2 mice parallels that in humans,” said Tobin. Elizabeth M. Simpson, however, said that investing in just one model would be poor investing. Different models should prove valuable for different reasons.

            Levine concurred. Pathological changes occur in the R6/2 mouse very early, even before birth. If this model is used to test treatments, he said, you have to consider that if you start treatment later than weaning you may have missed the crucial time period for the treatment to work, and see negative effects that confound the results. The advantage of slower-developing models is that you get the same sequence of events but over a protracted time course during which treatments may be tested. “If the goal is to deal with treatment, you have to combine the two approaches:  a rapid model to use as a screen and slower models to separate things out.”

            Gabrielle Schilling discussed the Borchelt mice she uses, transgenic mice with the prion promoter (PrP) and 18, 44, or 82 CAG repeats. The behavior of these mice resembles that of the R6 models. However, the different promoter results in protein expression in different areas of the brain and no inclusions are seen in peripheral tissue. Michael Levine discussed work with two different sets of Aronin/DiFiglia transgenic mice with 18, 46, or 100 CAG repeats. The behavioral phenotype of these mice is less well characterized, although they appear similar to the R6/2 except on a longer time scale.

            Knock-in mice from Peggy Shelbourne’s lab and inducible strains from Ai Yamamoto/Rene Hen’s lab were also mentioned as models worthy of consideration for drug testing. Sally Ditzler’s lab is starting a colony of Detloff 125 CAG mice, which have expanded repeats in both alleles and show phenotypic changes by 6 months of age. Ethan Signer mentioned that Chris Ross has developed an inducible full-length mouse model; however, data from this model have not yet been published and the mice are not yet available. Another full-length model, produced by Dan Tagle, apparently is not available for use by other labs. More information about the various mouse models can be found in a recent paper by Menalled and Chesselet [ii] and on the HDF website.

            Several investigators raised concerns about the availability and genetic stability of several transgenic mouse strains. At one time, said Ethan Signer, the HDF had a policy that in order to receive support, labs must make their mice available through The Jackson Labs. However, Michael Levine and Elizabeth M. Simpson noted that demand must usually meet a certain threshold before The Jackson Labs will rapidly breed and disseminate any particular strain. Moreover, Robert Ferrante said that his lab had observed phenotypic changes in mice obtained from The Jackson Labs. Ethan Signer noted that disturbingly different results have been obtained in different labs testing R6/2 mice. Some of these differences may be due to handling or environmental enrichment, he said. However, he encouraged investigators to trade strains and lines to ensure that the differences are not the result of genetic changes. He said the High-Q foundation will support such efforts. Another alternative, said Levine, feasible only for labs that have breeding capability, is to send breeding pairs of mice to labs that want to use a certain strain. Ferrante said his lab has a transgene core funded by the NIH to do this.

            The Cure Huntington’s Disease Initiative (CHDI) was encouraged to look into the best way to facilitate more efficient distribution of mouse models.

Recommendations for development trials

            During the final session of the workshop, Michael Levine, Gillian Bates, Jenny Morton, and Steven Dunnett outlined suggested preferred practices for drug testing in mice, based on the previous discussions. They planned to present these recommendations at the HDF 2002 meeting to be held in Boston, August 9-11, 2002[iii]. There was some discussion about the need for these recommendations. A recent paper in Science[iv] addressed some of these same issues. According to Allan Tobin, the HDF plans to contract research and would want all investigators to follow a standardized protocol. David Reynolds said that in the pharmaceutical industry, standardized screening protocols are always used. Connie Atwell added that human clinical trials always use a standard protocol, and that moving from mouse to human trials would be smoother and faster if the mouse trial is set up to be as similar to a human trial as possible.

            Carl Johnson added, “There is going to come a time when there are 100 compounds and we are going to want to know roughly how to prioritize them. So we will have to compare a compound tested in Blair’s lab with one tested in Jenny’s lab.” That time, said Allan Tobin, may be as early as December, 2002.

·        Which mouse lines to use? A number of mouse models are available, each with its own merits. In order to produce results that are comparable yet inclusive of the various models, the group suggested that each lab run tests in two types of mice: one with a rapidly developing phenotype (e.g., R6/2, R6/1, or N171) and one with full length genomic htt and a slowly developing phenotype (e.g., knock-in or YAC). Susan Browne noted that her lab is already using multiple HD mouse models to assess the efficacy of therapeutic agents.

·        What is the optimal genetic background and derivation strategy? The group recommended avoiding starting with a mixed background and then inbreeding, as this may result in changing phenotypes that make it difficult to study the mutation in question. New mutant strains should be backcrossed onto different inbred strains (e.g., C57BL/6J, CBA/J, and 129S1/SvImJ [JAX®, 002448]) and maintained as congenic inbred strains. As Elizabeth M. Simpson proposed, characteristics seen consistently on all background strains would constitute the phenotype most relevant to human studies. Incipient congenic mice (i.e., backcrossed for at least 5 generations or the equivalent when using “speed congenics”) can begin to be bred to a second inbred strain and the defined F1 hybrids used for analysis, a strategy advocated by Simpson.

Because of reported variability in results obtained using R6/2 mice, the group recommended genotyping these mice and reporting repeat size in all publications and in breeding colonies.
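As a rough guide to what the backcrossing recommendation above implies, the sketch below gives the textbook expectation for how much donor genome remains, at loci unlinked to the transgene, after a given number of backcross generations. It assumes the common convention that the first cross to the recipient strain counts as generation N1 and that no marker-assisted (“speed congenic”) selection is used; the function name is illustrative, and actual values in any individual mouse vary around this expectation.

```python
def expected_donor_fraction(n_generations):
    """Expected fraction of donor-strain genome at unlinked loci.

    Assumptions (not from the report): generation N1 is the first cross
    to the recipient inbred strain, each further backcross halves the
    remaining donor contribution on average, and no marker-assisted
    selection is applied.
    """
    if n_generations < 1:
        raise ValueError("need at least one generation")
    return 0.5 ** n_generations


# At the fifth generation (the incipient congenic stage mentioned above),
# the expected recipient-strain contribution is 1 - 0.5**5.
recipient_at_n5 = 1.0 - expected_donor_fraction(5)
```

Under these assumptions, about 97% of the unlinked genome is expected to derive from the recipient strain by the fifth generation, while the region flanking the transgene remains donor-derived for much longer.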

·        Housing. Housing conditions have been shown to dramatically affect phenotype.  Nigel Wood said he thought it was much more important to standardize animal husbandry before standardizing the assessments that are to be used. The group recommended standardization and control of these factors: access to food and water, single vs. group housing, environmental enrichment, and light/dark cycles. However, it was noted that some behavioral tests are not sensitive to the light/dark cycle. Several strategies were discussed for ensuring that mice receive adequate food and water, including using a longer drinking spout and putting food on the floor of the cage.

·        Pathology. The group recommended that a minimum level of neuropathology should be reported at the disease stages tested in order to correlate with behavioral phenotype. Stains should include Nissl stain (cresyl violet), ubiquitin stain (DAKO antibody), htt (e.g., EM48 antibody), and possibly GFAP. To ensure further consistency, they recommended analyzing coronal sections at the level of the mid-striatum.

·        Behavioral phenotypes. This was the main focus of the workshop. Recommendations were grouped into four categories: neurological screening, motor assessments, cognitive tests, and affective tests. For general neurological screening, an operationalized form of the SHIRPA protocol can be used for screening new lines and for establishing an overall clinical description. Short forms, such as the one described by Sally Ditzler, are available. However, some investigators reported that in their hands, SHIRPA data lack the reliability and power associated with some automated tests. Automated tests would be better, said David Reynolds, because they reduce variability introduced by different operators.

      Motor assessments should include tests of activity, gait, reflexes, balance, motor coordination, and muscle strength. The rotarod test of balance was generally accepted as the most sensitive and could be standardized in terms of equipment and protocol (accelerating vs. constant revolutions). Modifications of the equipment might overcome some of the problems seen with mice clinging to the rod.

      Cognitive assessments should also be included in drug testing protocols, yet at this time it is unclear which cognitive tests should be included. Cognitive functions that could be tested include memory and learning or prefrontal, executive functions. In selecting the appropriate tests, it will be important to separate out motor from cognitive dysfunction.

      Few data are currently available from affective assessments in mice, although these issues are of paramount importance to people with HD. Tests of motivation, sensitivity to reward, and emotional responses such as depression, anxiety, and aggression might be extremely useful in evaluating the effectiveness of a drug if they could be validated in mice.

·        Design issues.  Standardization of study design was also recommended. As mentioned earlier, the mouse environment should be standardized in terms of food and water provision, environmental enrichment, number of transgenic and non-transgenic mice per cage, and operator consistency. Time of drug administration and time points to be tested over the lifetime of the animal should also be standardized. 

The question was raised about the number of animals in each treatment arm that would yield acceptable power. Emma Hockly said that in her studies of R6/2 mice using the rotarod protocol, about 13 mice are needed for an 80% chance of detecting an improvement of 25%.  The power will vary depending on the test and the mouse model, and will need to be calculated each time, said Gillian Bates. Michael Levine added that some measures will be amenable to parametric statistics and some will not.
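The kind of calculation Gillian Bates describes can be sketched with a standard normal-approximation formula for comparing two group means. This is not the calculation behind Emma Hockly’s figure, which was not presented in detail; the standardized effect size (expected difference divided by the standard deviation of the measure), the two-sided 5% significance level, and the function name are all assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def mice_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate animals per arm for a two-group comparison of means.

    Uses the normal approximation n = 2 * ((z_alpha + z_power) / d)**2,
    where d is the standardized effect size. Assumptions (not from the
    report): two-sided test, equal group sizes, equal variances, and a
    roughly normally distributed measure.
    """
    if effect_size <= 0:
        raise ValueError("effect size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical example: a difference equal to one standard deviation.
n_large_effect = mice_per_group(1.0)
# Halving the effect size roughly quadruples the number of animals needed.
n_smaller_effect = mice_per_group(0.5)
```

Because the required number scales with the inverse square of the effect size, a test with high between-animal variability needs far more mice to detect the same percentage improvement, which is why the power has to be recalculated for each test and model. For measures that are not amenable to parametric statistics, as Michael Levine noted, a formula of this kind would not apply directly.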

Susan Browne added that the criteria for “animal dropout” from experimental groups should be standardized. Moreover, she said that the time of onset of motor decline and the rate of decline may be more informative than performance in the last surviving mouse. She is currently working with a statistician to develop the best possible way of analyzing experiments in the future.

Jenny Morton suggested that studies of pharmacokinetics, LD50s, mechanism, clearance and toxicity be contracted out to a contract research organization (CRO) that has the resources and expertise in this area.  David Reynolds mentioned that drugs on the market are already known to have pharmacokinetics good enough that they can be dosed by mouth, be absorbed from the gut, and get into the brain. Allan Tobin said approved drugs that showed promise in in vitro and cell-based screens may be the first drugs that will be tested in animals.

Some final thoughts

            The workshop ended with some final thoughts about the wisdom of trying to standardize protocols across labs. Minka vanBeuzekom floated the idea of a core facility for testing drugs, with additional validation in the individual laboratories. Simon Brooks suggested that more progress would be made if different labs geared up to specialize in one particular area, for example, working on cognitive tests. Connie Atwell noted that cognitive tests that are sufficient for going forward with trials do not really exist yet for mouse models of HD or other neurodegenerative diseases. She said that NINDS will probably be developing a solicitation along these lines in the next year. 

Allan Tobin said the idea is to provide some infrastructure that would allow both trials and the support of discovery-based research. He added that this network idea will be either validated or proved faulty over the next year and a half.  People will vote with their feet as to whether they wish to be included in this multicenter trial or not. 


[i] Ko J, Ou S, Patterson PH. 2001. New anti-huntingtin monoclonal antibodies: implications for huntingtin conformation and its binding properties. Brain Res Bull 56(3-4):319-329.

[ii] Menalled LB, Chesselet M-F. 2002. Mouse models of Huntington’s disease. Trends in Pharmacological Sciences 23(1):32-39.

[iii] PowerPoint slides from this presentation are available on the HDF website.

[iv] Crabbe JC, Wahlsten D, Dudek BC. 1999. Genetics of mouse behavior: Interactions with laboratory environment. Science 284:1670-1672.


This site is copyrighted © 2008 Hereditary Disease Foundation. All rights reserved.